Secure Inference Serving Stack
End-to-end serving stack for privacy-preserving machine learning inference.
This project investigates how to operationalize privacy-preserving inference in a production-like serving environment, combining runtime scheduling and operator-level optimization under practical deployment constraints.
Key contributions include:
- latency-aware orchestration for encrypted operators (a minimal scheduling sketch follows this list)
- service-level performance profiling tools (a profiling sketch also follows)
- reproducible deployment scripts for benchmark workloads
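
One way to picture latency-aware orchestration is as a slack-based priority queue over encrypted operators, where each operator's estimated run time and deadline determine dispatch order. The sketch below illustrates that idea only; `EncryptedOp`, `LatencyAwareScheduler`, and the earliest-deadline-first policy are assumptions for illustration, not this project's actual API.

```python
# Sketch of latency-aware orchestration for encrypted operators.
# All names and the slack-based policy are hypothetical, not the
# project's API; latency estimates would come from profiling data.
import heapq
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class EncryptedOp:
    # Priority = deadline minus estimated run time: the op with the
    # least slack is dispatched first (earliest-deadline-first style).
    slack: float
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)

class LatencyAwareScheduler:
    def __init__(self) -> None:
        self._queue: list[EncryptedOp] = []

    def submit(self, name, run, est_latency_s, deadline_s):
        slack = deadline_s - time.monotonic() - est_latency_s
        heapq.heappush(self._queue, EncryptedOp(slack, name, run))

    def drain(self) -> None:
        # Execute ops in slack order; a real orchestrator would also
        # batch compatible ciphertext operations and overlap I/O.
        while self._queue:
            heapq.heappop(self._queue).run()

if __name__ == "__main__":
    sched = LatencyAwareScheduler()
    now = time.monotonic()
    sched.submit("he_matmul", lambda: print("run he_matmul"),
                 est_latency_s=0.8, deadline_s=now + 1.0)
    sched.submit("he_relu", lambda: print("run he_relu"),
                 est_latency_s=0.1, deadline_s=now + 5.0)
    sched.drain()  # dispatches he_matmul first: it has less slack
```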
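
The profiling tools can likewise be sketched as per-stage latency collection with summary statistics, assuming stages are timed in-process; `StageProfiler` and its methods are illustrative only, not the shipped tooling.

```python
# Minimal sketch of service-level stage profiling; names are
# illustrative assumptions, not this project's actual tooling.
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager

class StageProfiler:
    def __init__(self) -> None:
        self._samples = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        # Time a named stage of the request path (e.g. encrypt, infer).
        start = time.perf_counter()
        try:
            yield
        finally:
            self._samples[name].append(time.perf_counter() - start)

    def report(self) -> None:
        # Print mean and approximate p95 latency per stage; a production
        # tool would export these to a metrics backend instead.
        for name, xs in sorted(self._samples.items()):
            xs_sorted = sorted(xs)
            p95 = xs_sorted[max(0, int(0.95 * len(xs_sorted)) - 1)]
            print(f"{name}: mean={statistics.mean(xs):.4f}s p95={p95:.4f}s")

if __name__ == "__main__":
    prof = StageProfiler()
    for _ in range(20):
        with prof.stage("encrypt"):
            time.sleep(0.001)
        with prof.stage("inference"):
            time.sleep(0.002)
    prof.report()
```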