Secure Inference Serving Stack
End-to-end serving stack for privacy-preserving machine learning inference.
This project investigates how to operationalize privacy-preserving inference in a production-like serving environment, combining runtime scheduling and operator-level optimization under practical deployment constraints.
Key contributions include:
- latency-aware orchestration for encrypted operators (a minimal scheduling sketch follows this list)
- service-level performance profiling tools (a profiling sketch also follows)
- reproducible deployment scripts for benchmark workloads
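
One way to picture latency-aware orchestration is as a slack-based priority queue over encrypted operators, where each operator's estimated run time and deadline determine dispatch order. The sketch below illustrates that idea only; `EncryptedOp`, `LatencyAwareScheduler`, and the earliest-deadline-first policy are assumptions for illustration, not this project's actual API.

```python
# Sketch of latency-aware orchestration for encrypted operators.
# All names and the slack-based policy are hypothetical, not the
# project's API; latency estimates would come from profiling data.
import heapq
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class EncryptedOp:
    # Priority = deadline minus estimated run time: the op with the
    # least slack is dispatched first (earliest-deadline-first style).
    slack: float
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)

class LatencyAwareScheduler:
    def __init__(self) -> None:
        self._queue: list[EncryptedOp] = []

    def submit(self, name, run, est_latency_s, deadline_s):
        slack = deadline_s - time.monotonic() - est_latency_s
        heapq.heappush(self._queue, EncryptedOp(slack, name, run))

    def drain(self) -> None:
        # Execute ops in slack order; a real orchestrator would also
        # batch compatible ciphertext operations and overlap I/O.
        while self._queue:
            heapq.heappop(self._queue).run()

if __name__ == "__main__":
    sched = LatencyAwareScheduler()
    now = time.monotonic()
    sched.submit("he_matmul", lambda: print("run he_matmul"),
                 est_latency_s=0.8, deadline_s=now + 1.0)
    sched.submit("he_relu", lambda: print("run he_relu"),
                 est_latency_s=0.1, deadline_s=now + 5.0)
    sched.drain()  # dispatches he_matmul first: it has less slack
```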
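
The profiling tools can likewise be sketched as per-stage latency collection with summary statistics, assuming stages are timed in-process; `StageProfiler` and its methods are illustrative only, not the shipped tooling.

```python
# Minimal sketch of service-level stage profiling; names are
# illustrative assumptions, not this project's actual tooling.
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager

class StageProfiler:
    def __init__(self) -> None:
        self._samples = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        # Time a named stage of the request path (e.g. encrypt, infer).
        start = time.perf_counter()
        try:
            yield
        finally:
            self._samples[name].append(time.perf_counter() - start)

    def report(self) -> None:
        # Print mean and approximate p95 latency per stage; a production
        # tool would export these to a metrics backend instead.
        for name, xs in sorted(self._samples.items()):
            xs_sorted = sorted(xs)
            p95 = xs_sorted[max(0, int(0.95 * len(xs_sorted)) - 1)]
            print(f"{name}: mean={statistics.mean(xs):.4f}s p95={p95:.4f}s")

if __name__ == "__main__":
    prof = StageProfiler()
    for _ in range(20):
        with prof.stage("encrypt"):
            time.sleep(0.001)
        with prof.stage("inference"):
            time.sleep(0.002)
    prof.report()
```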