In this talk, we will share some of the lessons we have learned over the last four years of developing ML solutions and deploying and maintaining them in production in real enterprise operations.
On the development side, we will share our insights into overcoming challenges that commonly arise when a team works through iterative experimentation, with a particular focus on achieving high levels of traceability and reproducibility. By combining common MLOps best practices, such as versioning data and models together with code and tracking experiments, we have set up a methodology that makes it practically impossible for team members to avoid working in a highly reproducible way, while still leaving them the flexibility to experiment rapidly.
When it comes to deployment, the MLOps practices that have served us particularly well are the principles of early and controlled deployment (shadow-mode, canary, and blue-green deployments), the careful definition of key business and technical metrics, and an obsessive focus on observability and monitoring. We will also touch upon some of the non-technical challenges we have commonly encountered along the way.
Beyond sharing our own experiences and lessons learned, we hope to encourage a constructive discussion, drawing on the community's broad experience to keep evolving best practices in the field.