Trust Fall: Hidden Gems in MLFlow that Improve Model Credibility
PyTexas 2023

Verifying and trusting model performance results is a particularly grueling challenge in machine learning projects. This talk will explore how we can use Python to instill confidence in model metrics and how to keep models versioned to increase transparency and accessibility across the team. The tactics demonstrated will help developers save precious development time and increase transparency by incorporating metric tracking early on.

Machine learning projects are challenging, but verifying and trusting model performance results is a particularly grueling part of the work, and the problem only grows with more team members and a larger scope. Engineers need to integrate model performance tracking throughout the project lifecycle to ensure that their model results are reliable, reproducible and transparent. This talk will explore how we can use Python to instill confidence in model metrics and how to keep models versioned to increase transparency and accessibility across the team.

I'll present three MLFlow features that can improve transparency and credibility in model results. Engineers who aren't applying these techniques risk presenting false or irreproducible results and wasting precious development time retracing their steps to find the best model version.

* MLFlow Autologging: This feature requires minimal code and stores all relevant configurations, parameters and metrics, reducing the likelihood that some aspect of the model goes uncaptured. It makes it easy to compare the results of experiments and determine the final model.

* MLFlow System Tags: A small but mighty feature, these reserved tags store important metadata. They capture the state of the codebase at the time of the run, such as the git branch, commit hash and developer name. This helps solve an issue familiar to machine learning engineers: how to rerun the exact training pipeline code another developer used to generate the best metrics in an experiment.

* MLFlow Model Registry: As the project nears the finish line, it can be daunting to sift through experiments and runs. Using MLFlow's Python API, we'll show how engineers can version their best models and transition them between stages. This makes it easy to share model performance and encourage other engineers to reproduce the results before pushing the model to production.

By the end of this talk, Python developers will know how to leverage these techniques to increase model performance credibility, integrate tracking to enhance collaboration over the project lifecycle, and version models so they are always ready for production.
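For readers who want a concrete starting point, here is a minimal sketch (not taken from the talk) of how the three features fit together. It assumes a scikit-learn model, a tracking backend that supports the Model Registry (for example, a SQLite-backed tracking server), and a placeholder registered-model name of "demo-classifier".

```python
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Autologging: parameters, training metrics, and the fitted model are
#    captured automatically for supported libraries (scikit-learn here).
mlflow.sklearn.autolog()

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run() as run:
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)          # params and training metrics logged
    model.score(X_test, y_test)          # recent MLflow versions also log this score
    # Autologging stores the fitted model under the "model" artifact path by default.

# 2. System tags: reserved "mlflow.*" tags record who ran the code and
#    which source revision produced the run.
client = MlflowClient()
tags = client.get_run(run.info.run_id).data.tags
print(tags.get("mlflow.user"))               # developer name
print(tags.get("mlflow.source.name"))        # entry-point script
print(tags.get("mlflow.source.git.commit"))  # commit hash, if run from a git repo

# 3. Model Registry: version the model logged above and move it through
#    stages so teammates can find and reproduce the production candidate.
model_uri = f"runs:/{run.info.run_id}/model"
version = mlflow.register_model(model_uri, "demo-classifier")
client.transition_model_version_stage(
    name="demo-classifier",
    version=version.version,
    stage="Staging",
)
```

Registering the run's model and moving it to "Staging" mirrors the workflow described above: teammates can pull that exact version, check the run's system tags for the source commit, and reproduce the metrics before anything is promoted to production.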

Speakers: Krishi Sharma