Introduction to Data Analysis Using Pandas

PyCon UK 2022

Data often doesn’t come in the best format for analysis, and understanding it enough to extract insights requires both time and the skills to filter, aggregate, reshape, and visualize it. This session will equip you with the knowledge you need to effectively use pandas to make this process easier.

This tutorial is for anyone with basic knowledge of Python and an interest in learning how to analyze data in Python. We will be working with Jupyter Notebooks, so attendees should familiarize themselves with the interface (i.e., know how to run/edit a cell) beforehand.

#### Section 1: Getting Started With Pandas

We will begin by introducing the `Series`, `DataFrame`, and `Index` classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter the data.

#### Section 2: Data Wrangling

To prepare our data for analysis, we need to perform data wrangling. In this section, we will learn how to clean and reformat data (e.g. renaming columns, fixing data type mismatches), restructure/reshape it, and enrich it (e.g. discretizing columns, calculating aggregations, combining data sources).

We will take breaks for exercises throughout and all solutions, slides, and notebooks will be provided.

#### Environment Setup

Follow the setup instructions [here](https://github.com/stefmolin/pandas-workshop#setup-instructions) to get your environment up and running before the session.
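
For a flavor of the kinds of operations named in Sections 1 and 2, here is a minimal sketch. It is not taken from the workshop materials: the toy data, column names, and bin edges are all invented for illustration, and the workshop itself may approach these steps differently.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame(
    {
        "City": ["London", "Cardiff", "Leeds", "London"],
        "temp_c": ["18.5", "16.0", "15.5", "21.0"],  # numbers stored as strings
    }
)

# Section 1 ideas: the building blocks, plus filtering
column = df["City"]                  # a single column is a Series
index = df.index                     # row labels are held in an Index
london = df[df["City"] == "London"]  # filter rows with a Boolean mask

# Section 2 ideas: clean, enrich, aggregate
clean = (
    df.rename(columns={"City": "city"})  # rename a column
    .astype({"temp_c": "float64"})       # fix a data type mismatch
)
# Discretize temperatures into two assumed bands: (0, 17] and (17, 30]
clean["band"] = pd.cut(clean["temp_c"], bins=[0, 17, 30], labels=["cool", "warm"])
# Aggregate: mean temperature per city
mean_by_city = clean.groupby("city")["temp_c"].mean()
```

Running this in a notebook cell and then evaluating `clean` or `mean_by_city` in the next cell shows the results.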

Speakers: Stefanie Molin