Live Stream: https://youtu.be/gnFzZRkQZ2c
This workshop is a zero-to-hero tutorial on how to solve classification tasks using deep learning. It kicks off with a simple classification task on synthetic data, first in low and then in high dimension. Then a harder classification task is tackled, based on FashionMNIST, a well-known dataset containing images of clothes. Apart from solving the classification task itself, we will show how to generate and analyze embedding vectors that can be used to solve downstream tasks different from the original classification problem on which the model was trained. Finally, we will face a more advanced type of classification problem, namely predicting links on a graph using Graph Neural Networks. Link prediction will be demonstrated on an open source dataset that contains information about collaborations among authors of scientific papers. The goal of this workshop is to show how we can use Python to solve the aforementioned tasks, taking into account both the data science aspects and the engineering and project lifecycle ones. In particular, the Python packages covered in the workshop are PyTorch, PyTorch-Lightning, and Deep Graph Library.
A zero-to-hero workshop that demonstrates how to solve classification tasks on datasets of increasing complexity. The workshop will present both how to solve the tasks and how to structure a codebase according to software development best practices.
The first challenge presented in the workshop will be the classification of synthetically generated Gaussian blobs. We will first classify low-dimensional Gaussian blobs and then extend the approach to higher-dimensional blobs. The demo will also showcase TensorBoard as a tool to monitor model learning. The model presented in this initial part of the tutorial is the fully connected Multi-Layer Perceptron (MLP), the most well-known type of neural network.
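As a rough illustration of this first challenge, the sketch below samples Gaussian blobs and fits a small MLP to them in plain PyTorch. It is not the workshop's code: the helper names `make_blobs` and `train_mlp`, the blob centers, the hidden size and the optimizer settings are all assumptions chosen for the example, and TensorBoard logging is left out.

```python
import torch
from torch import nn


def make_blobs(n_per_class, centers, std, generator):
    """Sample n_per_class points around each center (hypothetical helper)."""
    n_classes, dim = centers.shape
    noise = torch.randn(n_classes, n_per_class, dim, generator=generator) * std
    # Rows are ordered class by class, so labels can be built with repeat_interleave.
    x = (centers[:, None, :] + noise).reshape(-1, dim)
    y = torch.arange(n_classes).repeat_interleave(n_per_class)
    return x, y


def train_mlp(x, y, n_classes, hidden=32, epochs=200, lr=0.05):
    """Full-batch training of a one-hidden-layer MLP (assumed hyperparameters)."""
    model = nn.Sequential(
        nn.Linear(x.shape[1], hidden),
        nn.ReLU(),
        nn.Linear(hidden, n_classes),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    return model


torch.manual_seed(0)
gen = torch.Generator().manual_seed(0)
# Three well-separated 2D blobs; the centers are arbitrary illustrative values.
centers = torch.tensor([[-3.0, -3.0], [3.0, 3.0], [-3.0, 3.0]])
x, y = make_blobs(100, centers, std=0.5, generator=gen)
model = train_mlp(x, y, n_classes=3)

with torch.no_grad():
    accuracy = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```

Moving to higher-dimensional blobs only requires passing a `centers` tensor with more columns, since the input size of the MLP is read from the data.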
The second challenge will be the classification of garments from images. To this end, we will use the well-known FashionMNIST dataset. The model presented in this second part is the Convolutional Neural Network (CNN), the mainstream neural network for image analysis. After solving the classification task, we will demonstrate how to obtain numerical embeddings from the CNN that can be used to solve a multitude of downstream tasks. We will demonstrate how these embeddings can be used to find and cluster similar items.
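The idea of reusing a classifier's internal representation can be sketched as follows. This is an assumed minimal architecture, not the one used in the workshop: the class `SmallCNN`, its layer sizes, the 64-dimensional embedding and the helper `most_similar` are hypothetical. Random tensors with the FashionMNIST image shape (1 x 28 x 28) stand in for real images, so nothing is downloaded and the model is untrained.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Tiny CNN whose penultimate layer doubles as an embedding (assumed design)."""

    def __init__(self, n_classes=10, embedding_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, embedding_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(embedding_dim, n_classes)

    def embed(self, images):
        # Everything except the final classification layer.
        return self.features(images)

    def forward(self, images):
        return self.classifier(self.embed(images))


def most_similar(embeddings, query_index, k=3):
    """Indices of the k items closest to the query by cosine similarity."""
    normalized = F.normalize(embeddings, dim=1)
    similarity = normalized @ normalized[query_index]
    similarity[query_index] = -float("inf")  # exclude the query itself
    return similarity.topk(k).indices


torch.manual_seed(0)
images = torch.rand(8, 1, 28, 28)  # stand-in batch with FashionMNIST-shaped inputs
model = SmallCNN().eval()
with torch.no_grad():
    logits = model(images)
    embeddings = model.embed(images)

neighbours = most_similar(embeddings, query_index=0, k=3)
print(logits.shape, embeddings.shape, neighbours.tolist())
```

With a trained model, the same embeddings could be passed to any clustering or nearest-neighbour routine; cosine similarity is just one reasonable choice here.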
In the third and final part of the workshop, we will work with graph data and try to predict whether two authors of scientific papers are co-authors or not. We will demonstrate this task using the collab open source dataset, a graph in which the nodes represent authors and the edges connect co-authors. The problem is framed as a link prediction task, which is, to a certain extent, analogous to a classification task, since we try to predict whether or not an edge exists between two nodes. The model that will solve this task consists of two stages. The model presented in this final part of the tutorial is the Graph Neural Network (GNN), a powerful formalism for analysing graph data.
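To make the analogy with classification concrete, here is a minimal link prediction sketch in plain PyTorch on a toy co-authorship graph. It assumes one common two-stage design, a message-passing encoder followed by a dot-product scorer; the text does not spell out the workshop's two stages, so this split is an assumption. The classes `MeanAggregationLayer` and `LinkPredictor` are hypothetical and hand-written; they are not the Deep Graph Library API, and the eight-author graph is invented for illustration.

```python
import torch
from torch import nn


class MeanAggregationLayer(nn.Module):
    """One message-passing step: average neighbour features, then a linear map."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, adjacency, h):
        degree = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        neighbour_mean = adjacency @ h / degree
        return self.linear(torch.cat([h, neighbour_mean], dim=1))


class LinkPredictor(nn.Module):
    """Stage 1 encodes nodes with a small GNN; stage 2 scores node pairs."""

    def __init__(self, n_nodes, dim=16):
        super().__init__()
        self.node_features = nn.Embedding(n_nodes, dim)  # learnable input features
        self.layer1 = MeanAggregationLayer(dim, dim)
        self.layer2 = MeanAggregationLayer(dim, dim)

    def encode(self, adjacency):
        h = torch.relu(self.layer1(adjacency, self.node_features.weight))
        return self.layer2(adjacency, h)

    def score(self, h, pairs):
        # Dot product of the two node embeddings; higher means "more likely an edge".
        return (h[pairs[:, 0]] * h[pairs[:, 1]]).sum(dim=1)


torch.manual_seed(0)
# Toy graph: authors 0-3 all co-author with each other, and so do authors 4-7.
groups = ([0, 1, 2, 3], [4, 5, 6, 7])
positive = torch.tensor([[i, j] for g in groups for i in g for j in g if i < j])
negative = torch.tensor([[i, j] for i in range(4) for j in range(4, 8)])

adjacency = torch.zeros(8, 8)
adjacency[positive[:, 0], positive[:, 1]] = 1.0
adjacency[positive[:, 1], positive[:, 0]] = 1.0

pairs = torch.cat([positive, negative])
labels = torch.cat([torch.ones(len(positive)), torch.zeros(len(negative))])

model = LinkPredictor(n_nodes=8)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()  # binary classification: edge vs. no edge

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model.score(model.encode(adjacency), pairs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    probabilities = torch.sigmoid(model.score(model.encode(adjacency), pairs))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

For brevity the sketch trains and evaluates on the same edges and uses those edges for message passing as well. A realistic setup would hold out a set of edges for evaluation and sample negative pairs rather than enumerate them.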
The models will be implemented using PyTorch-Lightning, which enforces a modular and maintainable software structure.