We present BastionLab, an open-source privacy framework written in Rust for confidential data science collaboration.
We aim to help data owners open access to their datasets to outside data scientists. Current approaches, such as giving access to a Jupyter notebook, provide no fine-grained control over what is shared. Datasets can easily be extracted from them, so they offer few privacy guarantees and make data collaboration difficult.
BastionLab provides an interactive interface for data scientists to explore remote datasets while addressing the privacy concerns of data owners: only results that comply with the privacy policy defined by the data owner can be returned.
Data exposure is limited in three ways: data scientists never have direct access to the data; they can only use a limited set of operators, which precludes running arbitrary code to exfiltrate data; and a strict access control policy is enforced. Differential Privacy and Trusted Execution Environments are also supported for stronger privacy guarantees.
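To make the idea of a restricted operator set and a policy check concrete, here is a minimal sketch in plain Rust. It is not BastionLab's actual API: the names `Op`, `Policy`, `Outcome`, `Row` and `run`, as well as the "minimum aggregation size" rule, are assumptions chosen for illustration. The point is only that requests are expressed as a closed set of operations rather than arbitrary code, and that a result leaves the server only if the policy accepts it.

```rust
/// Hypothetical closed set of operations a remote data scientist may request.
/// Anything not listed here simply cannot be expressed, so no arbitrary code runs.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Keep only rows whose `region` equals the given value.
    FilterRegion(String),
    /// Aggregate: mean of `age` over the remaining rows.
    MeanAge,
    /// Request the raw rows (always refused in this sketch).
    Collect,
}

/// Hypothetical record type standing in for a row of the remote dataset.
struct Row {
    region: String,
    age: f64,
}

/// Hypothetical policy set by the data owner: an aggregate may only be
/// returned if it was computed over at least `min_agg_size` rows.
struct Policy {
    min_agg_size: usize,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Value(f64),
    Rejected(&'static str),
}

/// Executes the plan server-side and returns a value only if the policy allows it.
fn run(policy: &Policy, data: &[Row], plan: &[Op]) -> Outcome {
    let mut rows: Vec<&Row> = data.iter().collect();
    for op in plan {
        match op {
            Op::FilterRegion(region) => rows.retain(|row| &row.region == region),
            Op::MeanAge => {
                // Require at least one row so the mean is always well defined.
                if rows.len() < policy.min_agg_size.max(1) {
                    return Outcome::Rejected("aggregation over too few rows");
                }
                let sum: f64 = rows.iter().map(|row| row.age).sum();
                return Outcome::Value(sum / rows.len() as f64);
            }
            Op::Collect => return Outcome::Rejected("raw rows cannot leave the server"),
        }
    }
    Outcome::Rejected("plan must end with an aggregation")
}
```

With a policy of `min_agg_size: 3`, a mean over a group of three rows would be returned, while the same mean over a single row, or a `Collect` request, would be rejected. A real system would need a richer operator set and policy language; this sketch only illustrates the control flow.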
We will walk through an example showing how a COVID dataset could be shared with a remote data scientist for data exploration, cleaning and visualization, while making sure only anonymized results are communicated.
The server side of BastionLab is developed in Rust for its memory safety, performance and community. Rust also lets us build on cutting-edge libraries like polars, an open-source DataFrame library written in Rust that can be several times faster than pandas, the go-to DataFrame library in Python.