From Jupyter Notebook to production code

FOSDEM 2021

Have you ever had trouble sharing your Jupyter Notebooks? Ever struggled with code that only "works on my machine"? Do you consider your research and development process smooth and straightforward? Is your code scalable? Tough questions, I know. But if you've mentally answered 'no' to any of those, you could use a tool to ease some of the pain points in your workflow. Kedro is an open-source Python library that helps data scientists write data pipelines following software engineering best practices from the start. Sometimes described as the Django of ML/DS projects, Kedro is an opinionated framework based on Cookiecutter Data Science that brings modularity and scalability to data science projects.

In this talk, I will walk through the workflow of a Kedro project, introduce some of the framework's most notable features, such as the Data Catalog, and show how to convert a Jupyter Notebook into a Kedro project that scales and supports team collaboration.
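To make the conversion idea concrete, here is a minimal plain-Python sketch of the pattern behind it: each notebook cell becomes a pure function with named inputs and outputs, and a small runner resolves those names against a dictionary that stands in for the Data Catalog. It deliberately does not import Kedro, and every name in it (`Node`, `run`, `clean`, `average_price`, the dataset names, the sample data) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One former notebook cell: a pure function plus the dataset names it reads and writes."""
    func: Callable
    inputs: List[str]
    output: str


def clean(raw: List[dict]) -> List[dict]:
    # Former "cleaning" cell: drop rows with a missing price.
    return [row for row in raw if row["price"] is not None]


def average_price(rows: List[dict]) -> float:
    # Former "analysis" cell: compute the mean price.
    return sum(row["price"] for row in rows) / len(rows)


def run(nodes: List[Node], catalog: Dict[str, object]) -> Dict[str, object]:
    """Run nodes in order, reading inputs from and writing outputs to a copy of the catalog."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    for n in nodes:
        data[n.output] = n.func(*(data[name] for name in n.inputs))
    return data


# Hypothetical stand-in for a catalog entry; a real project would load this from storage.
catalog = {"raw_prices": [{"price": 10.0}, {"price": None}, {"price": 20.0}]}

pipeline = [
    Node(clean, ["raw_prices"], "clean_prices"),
    Node(average_price, ["clean_prices"], "mean_price"),
]

result = run(pipeline, catalog)
print(result["mean_price"])
```

Because the functions no longer depend on notebook state, each one can be tested and reused on its own. In Kedro itself, the corresponding pieces are nodes, pipelines, and Data Catalog entries declared in configuration rather than in code.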

Talk structure

Intro (5 min)
The problem(s) (10 min)
The solution (5 min)
Demo - convert Notebook to Kedro project (15 min)
Q&A (5 min)

Audience

This talk is aimed at data engineers, machine learning engineers, and data scientists who wish to learn how to write code beyond the Jupyter Notebook. The audience is expected to know the basics of Python and Jupyter Notebooks. All levels are welcome.

Key takeaways

By the end of this talk, attendees are expected to understand the basic setup of a Kedro project, know how to convert a Jupyter Notebook into a Kedro project, and be able to visualize the resulting data pipelines using the Kedro-Viz extension.

Speakers: Lais Carvalho, Matteo Bertucci