Proper monitoring of machine learning models in production is essential to avoid performance issues. Setting up monitoring can be easy for a single model, but it often becomes challenging at scale, or when alert fatigue sets in from tracking too many metrics and dashboards.
In this talk, I will introduce the concept of test-based ML monitoring. I will explore how to prioritize metrics based on risks and model use cases, integrate checks into the prediction pipeline, and standardize them across similar models and throughout the model lifecycle. I will also take an in-depth look at batch model monitoring architecture and the use of open-source tools for setup and analysis.
Have you ever deployed a machine learning model in production only to realize that it wasn't performing as well as you expected, or discovered too late that corrupted data had caused a drop in model quality? Proper monitoring can help avoid this. Typically, it involves checking the quality of the input data, monitoring the model's responses, and detecting any changes that might lead to a drop in model quality.
However, setting up monitoring is often easier said than done. First, while it is easy to write a few assertions for data quality checks or track accuracy for a single model you created, it is much harder to do so consistently and at scale as the number of models and pipelines and the volume of data grow. Second, building monitoring dashboards that track many metrics often leads to alert fatigue and does little to help with root cause analysis when something goes wrong.
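For a single model, such ad-hoc checks might look like a handful of assertions on the input batch. Here is a minimal sketch; the column names, categories, and thresholds are hypothetical, not taken from any specific project:

```python
import pandas as pd

def check_input_batch(batch: pd.DataFrame) -> None:
    """A few ad-hoc data quality assertions for one model's input batch."""
    # Required columns are present (column names are illustrative)
    expected = {"user_id", "amount", "country"}
    assert expected.issubset(batch.columns), f"Missing columns: {expected - set(batch.columns)}"

    # No missing values in a critical feature
    assert batch["amount"].notna().all(), "Nulls found in 'amount'"

    # Values stay inside an expected range (thresholds are assumptions)
    assert batch["amount"].between(0, 10_000).all(), "'amount' outside expected range"

    # Category set has not silently expanded
    known_countries = {"US", "DE", "GB"}
    unknown = set(batch["country"].unique()) - known_countries
    assert not unknown, f"Unexpected categories in 'country': {unknown}"
```

This works for one model, but copy-pasting such assertions across dozens of pipelines is exactly where consistency and scale break down.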
In this talk, I will introduce the idea of test-based ML monitoring and how it can help you keep your models in check in production. I will cover the following:
- The difference between testing and monitoring, and when one is better than the other
- How to prioritize metrics and tests for each model based on risks and model use cases
- How to integrate checks into the model prediction pipeline and standardize them across similar models and throughout the model lifecycle
- An in-depth look at batch model monitoring architecture, including setup and analysis of results using open-source tools (a rough sketch follows below)
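As a rough illustration of the test-based approach, the sketch below shows how a standardized suite of named pass/fail checks can run inside a batch prediction pipeline and be reused across similar models. It is plain Python with hypothetical check names, columns, and thresholds; the talk itself covers how to do this with open-source tooling:

```python
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class TestResult:
    name: str
    passed: bool
    detail: str = ""

# A "test" is just a named function that returns a pass/fail result for a batch.
Check = Callable[[pd.DataFrame], TestResult]

def no_missing_values(columns: list[str]) -> Check:
    def _check(batch: pd.DataFrame) -> TestResult:
        missing = {c: int(batch[c].isna().sum()) for c in columns if batch[c].isna().any()}
        return TestResult("no_missing_values", not missing, str(missing))
    return _check

def share_of_new_categories(column: str, reference: set, max_share: float = 0.05) -> Check:
    def _check(batch: pd.DataFrame) -> TestResult:
        share = float((~batch[column].isin(reference)).mean())
        return TestResult(f"new_categories:{column}", share <= max_share, f"share={share:.2%}")
    return _check

def run_suite(batch: pd.DataFrame, checks: list[Check]) -> list[TestResult]:
    """Run all checks; the pipeline decides whether failures block scoring or only alert."""
    return [check(batch) for check in checks]

# A suite definition that can be shared across similar models; thresholds are assumptions.
suite = [
    no_missing_values(["amount", "country"]),
    share_of_new_categories("country", {"US", "DE", "GB"}),
]

batch = pd.DataFrame({"amount": [10.0, 250.0, None], "country": ["US", "DE", "FR"]})
results = run_suite(batch, suite)
failed = [r for r in results if not r.passed]
if failed:
    # e.g. skip scoring, send an alert, or log the results for root cause analysis
    print("Failed checks:", [(r.name, r.detail) for r in failed])
```

The point of the pattern is that each check returns an explicit pass/fail result instead of a raw metric on a dashboard, so the prediction pipeline can act on failures directly and the same suite can be attached to every model of the same type.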