In this talk, we report on our experiences switching from pandas to Polars in a real-world ML project. Polars is a new high-performance dataframe library for Python, based on Apache Arrow and written in Rust. We will compare the performance of Polars with that of the popular pandas library and show how Polars can provide significant speed improvements for data manipulation and analysis tasks. We will also discuss the distinctive features of Polars, such as its ability to handle large datasets that do not fit into memory, and how it feels in practice to make the switch from pandas. This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python.
The pandas library is one of the most widely used tools for working with data in the Python ecosystem. However, pandas can be slow on medium-sized and larger datasets, and many users have been looking for faster alternatives. In this talk, we introduce Polars, a new high-performance dataframe library for Python, based on Apache Arrow and written in Rust. We will report on our experiences switching from pandas to Polars in a real-world ML project.
We will compare the performance of Polars with that of pandas across various use cases and show how Polars can provide significant speed improvements for common data manipulation and analysis tasks. Because of its speed, it can even be an alternative in cases where people normally use distributed systems like Spark. For example, we will demonstrate how Polars can process large datasets with minimal overhead, and how its extensive use of parallelization can provide an additional speed boost.
We will also discuss how Polars compares to other popular options like DuckDB and cuDF.
This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python. Whether you are a pandas user looking for a faster alternative, or a Spark user interested in a simpler alternative, this talk will provide valuable insights and practical examples.