
Maximizing Efficiency and Scalability in Open-Source MLOps: A Step-by-Step Approach


PyCon DE & PyData Berlin 2023

This talk presents a novel approach to MLOps that combines the benefits of open-source technologies with the power and cost-effectiveness of cloud computing platforms. Using tools such as Terraform, MLflow, and Feast, we demonstrate how to build a scalable and maintainable ML system on the cloud that is accessible to ML Engineers and Data Scientists. Our approach leverages managed cloud services across the entire ML lifecycle, reducing maintenance complexity and overhead while avoiding the vendor lock-in and additional costs of managed MLOps SaaS offerings. This allows organizations to take full advantage of machine learning while minimizing cost and complexity.

Building a machine learning (ML) system on a cloud platform can be a challenging and time-consuming task, especially when it comes to selecting the right tools and technologies. In this talk, we present a comprehensive approach to building scalable and maintainable ML systems on the cloud using open-source technologies such as MLflow, Feast, and Terraform.

MLflow is a powerful open-source platform that simplifies the end-to-end ML lifecycle, including experimentation, reproducibility, and deployment. It lets you track and compare different runs of your ML models and deploy them to environments such as staging or production with ease.

Feast is an open-source feature store that enables you to store and serve features for training, serving, and evaluating ML models. It integrates with MLflow, enabling you to track feature versions and dependencies and to deploy feature sets to different environments.

Terraform is a widely used open-source infrastructure-as-code (IaC) tool that enables you to define and manage your cloud resources declaratively. It lets you automate the provisioning and management of your ML infrastructure, such as compute clusters, databases, and message brokers, saving time and effort.

We will demonstrate how these open-source technologies can be used together to build an ML system on the cloud, discuss the benefits and trade-offs of using them, and share best practices and lessons learned from our own experience building ML systems on the cloud, providing guidance for attendees looking to do the same.

Speakers: Paul Elvers