At the boundary of model development and MLOps lies the balance between the speed of deploying new models and ensuring operational constraints. These include factors like low-latency prediction, the absence of vulnerabilities in dependencies, and the need for model behavior to stay reproducible for years. The longer the list of constraints, the longer it usually takes to move a model from its development environment into production. In this talk, we present how we managed to square this seemingly impossible circle and achieve both rapid, highly dynamic model development and a stable, high-performance deployment.
At QuantCo, we ship sklearn-based models in a real-time service that guarantees 24/7 uptime with low-latency (millisecond) responses. At the same time, we adhere to strict regulatory and security policies under which every model must remain available for 3-5 years while its dependencies are kept up to date. As the technical foundation, we use ONNX to transform our dynamic Python pipelines into static, low-overhead model definitions. To ensure the cost of this model transformation does not slow down our data scientists, we have developed an open-source library named Spox to streamline these operations as much as possible. Combined with a suitable model-serving infrastructure, we can satisfy the needs of our data scientists (fast development and deployment) and those of corporate IT (vulnerability-free, multi-year stability) without compromising efficiency.
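As a flavor of what this looks like in practice, below is a minimal sketch of building a tiny ONNX graph with Spox. It follows the style of Spox's public API (typed arguments, lazy Vars, versioned opset modules, and a build step that yields an onnx.ModelProto); the geometric-mean computation itself is purely illustrative and not part of our production pipelines.

```python
import numpy as np
import onnx
from spox import argument, build, Tensor, Var
import spox.opset.ai.onnx.v17 as op


def geometric_mean(x: Var, y: Var) -> Var:
    # Compose standard ONNX operators (Mul, Sqrt) on lazy Vars.
    return op.sqrt(op.mul(x, y))


# Declare typed graph inputs: rank-1 float32 tensors with symbolic length 'N'.
a = argument(Tensor(np.float32, ("N",)))
b = argument(Tensor(np.float32, ("N",)))

# Build a static, self-contained onnx.ModelProto from the traced computation.
model: onnx.ModelProto = build(inputs={"a": a, "b": b}, outputs={"c": geometric_mean(a, b)})
```

The resulting model is a static artifact that can be served by any ONNX runtime, independently of the Python environment in which it was authored.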
Speakers: Christian Bourjau, Jakub Bachurski