Code Cleanup: A Data Scientist's Guide to Sparkling Code

PyCon DE & PyData Berlin 2023

Does your production code look like it’s been copied from Untitled12.ipynb? Are your engineers complaining about the code, but you can’t find the time to improve the code base? This talk will go through some of the basics of clean coding and how best to implement them in a data science team.

Data scientists often have different backgrounds and priorities than software engineers. Much of the code data scientists write never makes it to production, so it does not always meet the standards of production-ready code written by a development team. While lax requirements make sense for one-off analyses, they can make production code hard to maintain and complicate collaboration with software engineers. Since production code is not (always) the main output of a data science team, it can also be hard to prioritize code quality. In this presentation, we will go over some of the main principles of clean code and talk about practical steps that data science teams can take to improve their code. We will focus specifically on strategies that teams can implement to slowly and steadily improve an existing code base. This talk is aimed at data scientists who may not have a strong background in software engineering but are interested in improving code quality and collaborating more effectively with software engineering teams.

Speaker: Corrie Bartelheimer