You've got trust issues, we've got solutions: Differential Privacy

PyCon DE & PyData Berlin 2023

In the era of big data, where large volumes of information are collected and analyzed for insights into human behavior, data privacy has become a hot topic. Because private information can be misused once it leaks, not all data can be released for research. This talk discusses Differential Privacy, a cutting-edge cybersecurity technique designed to preserve an individual’s privacy: how it is employed to minimize the risks associated with private data, its applications in various domains, and how Python eases the task of employing it in our models with PyDP.

Because private information can be misused once it leaks, how should privacy be protected? One might think that simply anonymizing the personally identifiable fields in a dataset would be enough, but this can leave the entire dataset useless and unfit for analysis. Research has also shown that by statistically cross-referencing an anonymized dataset with other available data, private information can often be re-identified! The session will start with a brief overview of current privacy standards and the possible risks of handling customer data. This will lay the foundation for introducing Differential Privacy, a cutting-edge cybersecurity technique designed to preserve an individual’s privacy by manipulating data in a way that does not render it useless for analysis. Developers will gain insight into the concept of Differential Privacy, how it is employed to minimize the risks associated with private data, its practical applications in various domains, and how Python eases the task of employing it in our models with PyDP. As the talk progresses, a walkthrough of a real-life practical example, along with a nifty visualization, will acquaint the audience with PyDP and show how differentially private results closely approximate what the unfiltered data would have provided.
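
To make the idea of results that stay close to the unfiltered data more concrete, here is a minimal sketch of one common differentially private building block, the Laplace mechanism, applied to a bounded mean. It is not the PyDP API and not necessarily the example used in the talk; the function names, the bounds, and the epsilon value are illustrative assumptions. It also assumes the dataset size is public and that neighboring datasets differ by replacing one record.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_bounded_mean(values, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP mean via the Laplace mechanism (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # Clamp every value so one record can move the sum by at most (upper - lower).
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    # Assumption: n is public, so the sensitivity of the mean is (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: 1,000 made-up ages between 18 and 90.
data_rng = random.Random(42)
ages = [data_rng.randint(18, 90) for _ in range(1000)]

true_mean = sum(ages) / len(ages)
noisy_mean = private_bounded_mean(ages, lower=18, upper=90, epsilon=1.0)
print(f"true mean: {true_mean:.2f}, private mean: {noisy_mean:.2f}")
```

A smaller epsilon gives stronger privacy but noisier answers, and a larger dataset shrinks the noise relative to the true value, which is why aggregate statistics can remain useful.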

Speakers: Vikram Waradpande, Sarthika Dhawan