Fullstack datascientist v.2021 (how much of software engineering should a modern datascientist know)

PyCon Sweden 2021

Live broadcast: https://www.youtube.com/watch?v=UujU3xOo038

What are the essential software engineering skills a data scientist should have to successfully bring their own work to production? We (Sergei Beilin, Ph.D., software engineering consultant in AI/ML, and his wife Natalia Beylina, Ph.D., data scientist) will go through the most important things a modern data scientist needs to know about software engineering, from both the software engineer's and the data scientist's point of view, drawing on our own experience.

We will discuss:

* programming language(s): how much of the language should one know?
* execution models, orchestration, containerization: Kubernetes, Kubeflow, Airflow, Spark/Databricks, etc.
* storage, network protocols/APIs, file formats: from CSV to Delta, from JSON to Avro
* modern systems architecture concepts to understand
* how the whole system architecture and infrastructure landscape will dictate the way you deploy and run your work
* tools and DevOps practices
* processes: integrating data scientists' workflow into a typical agile process
* bad practices to avoid: a few examples we've seen ourselves

Data science has moved from universities and research labs into commercial companies of every size, across many business areas. It is moving from the experimentation phase into production, and not everyone knows how to build teams around data science projects. Data scientists need to know more about software engineering, especially when they have to work largely alone, without proper support from software engineers. Data science is no longer just some experimental code, and no, a Jupyter notebook is not enough. The industrialization of data science requires more and broader skills.

We (Sergei Beilin, Ph.D., software engineering consultant in AI/ML, and his wife Natalia Beylina, Ph.D., data scientist) will go through the most important things a modern data scientist needs to know about software engineering, from both the software engineer's and the data scientist's point of view, drawing on our own experience. We both have Ph.D.s in mathematics and worked for quite some time in research and education, so at some point we went through the "research to business" mindset shift ourselves. In this talk we have tried to collect our experience of working on different data science projects and in different companies, as well as of helping others move to data science from other fields.

Speakers: Sergei Beilin, Natalia Beylina