Accelerate Model Training with an Easy to Use High-Performance AI/ML Stack for the Cloud

The advent of large-scale machine learning models has exacerbated the ongoing problem of resource and infrastructure management for ML practitioners. How can a data scientist with little or no DevOps knowledge train and deploy models that require compute clusters with dozens or hundreds of nodes and GPU resources? In this talk, Michael Clifford will discuss how members of Red Hat’s Emerging Technologies team leverage two open source projects, Ray and Open Data Hub, to simplify distributed training and cloud-based resource allocation for their team.

We will cover:

* An overview of Open Data Hub and Ray
* A detailed discussion of how we’ve integrated Ray with Open Data Hub to improve the user experience for developing large machine learning models
* A demonstration of a real-world use case where Ray is used to accelerate an AI/ML workload on Open Data Hub
* A discussion of CodeFlare, the open source project developing this work to improve ML workflow tooling in the cloud

By the end of this talk, attendees will have a better understanding of how to build high-performance and scalable AI/ML systems.

Speakers: Michael Clifford, Erik Erlandson

FOSSY 2023