What could possibly go wrong? - An incomplete guide on how to prevent, detect & mitigate biases in data products

PyCon DE & PyData Berlin 2023

In this talk, I want to look at data ethics through a practical lens and facilitate the discussion about how we can integrate ethical data practices into our day-to-day work. I will shed some light on the multiple sources of bias in data applications: where are the potential pitfalls, and how can we prevent, detect and mitigate them early so they never become a risk for our data product? I will walk you through the different stages of a data product lifecycle and dive deeper into the questions we as data professionals have to ask ourselves throughout the process. Furthermore, I will present methods, tools and libraries that can support our work. There is no universal solution, since tools and strategies need to be chosen to address the specific requirements of the use case and models at hand, but my talk will provide a good starting point for your own data ethics journey.

Terms like trustworthy, responsible or ethical AI have been popular buzzwords for some time. But while we've seen some startling examples of ‘AI gone wrong’, such as Facebook falsely classifying Black people as ‘Primates’, Amazon’s hiring algorithm discriminating against women, or the A-level algorithmic grading fiasco in the UK, for many data projects ethical considerations only come into play as an afterthought - if at all. Experience has shown that more accountability and transparency are needed in AI systems, and regulatory initiatives such as the EU AI Act make it increasingly important to treat the topic as a first-class citizen throughout the whole development process. While the implementation of legal initiatives and ethics guidelines raises awareness and brings the topic into focus, the topic often remains quite abstract and difficult to translate into our day-to-day work. Therefore, I want to look at it through a practical lens and facilitate the discussion about how we can establish ethical data practices. I will shed some light on the multiple sources of bias in data applications: where are the potential pitfalls, and how can we prevent, detect and mitigate them early so they never become a risk for our data product? I will walk you through the different stages of a data product lifecycle and dive deeper into the questions we as data professionals have to ask ourselves throughout the process. Furthermore, I will present methods, tools and libraries that can support our work. There is no universal solution, since tools and strategies need to be chosen to address the specific requirements of the use case and models at hand, but my talk will provide a good starting point for your own data ethics journey.

Speakers: Lea Petters