How to baseline in NLP and where to go from there

PyCon DE & PyData Berlin 2023

In this talk, we will explore the build-measure-learn paradigm and the role of baselines in natural language processing (NLP). We will cover the common NLP tasks of classification, clustering, search, and named entity recognition, and describe the baseline approaches that can be used for each task. We will also discuss how to move beyond these baselines through weak learning and transfer learning. By the end of this talk, attendees will have a better understanding of how to establish and improve upon baselines in NLP.

This talk explores the role of baselines in natural language processing (NLP) and how to move beyond them through weak learning and transfer learning.

First, I will introduce the build-measure-learn paradigm, a framework for developing and improving products or systems: build a solution, measure its performance, and learn from the results to improve the solution iteratively. Baselines are essential to this process because they provide a starting point and a benchmark against which later iterations are measured.

Next, I will cover the common NLP tasks of classification, clustering, search, and named entity recognition (NER), and describe baseline approaches for each. These baselines may not be the most sophisticated solutions, but they are often quick and easy to implement, and they serve as a reference and as guidance for further improvement.

Finally, I will discuss how to move on from these baselines. One option is to use insights from the baselines to build a weak learning system, a machine learning model that relies on human-written rules or patterns rather than a large hand-labeled dataset. Another option is transfer learning, which adapts a pre-trained model to a new task or domain by fine-tuning its parameters on a smaller dataset.

Together, these steps give a practical guide to establishing baselines in NLP and moving beyond them.
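As a rough illustration of the build-measure-learn loop for classification, the sketch below compares a majority-class baseline with a small set of hand-written rules combined by voting. It is not taken from the talk: the toy data, the rule functions, and names such as `rule_vote` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data: (text, label) pairs invented for this sketch.
DATA = [
    ("great talk, loved the demos", "pos"),
    ("terrible audio and boring slides", "neg"),
    ("loved the speaker", "pos"),
    ("boring and too long", "neg"),
    ("great examples", "pos"),
]


def accuracy(predict, data):
    """Fraction of examples where predict(text) equals the gold label."""
    return sum(predict(text) == label for text, label in data) / len(data)


# Build: the simplest baseline always predicts the most frequent label.
majority_label = Counter(label for _, label in DATA).most_common(1)[0][0]


def majority_baseline(text):
    return majority_label


# Learn: hand-written rules, each voting for a label or abstaining (None).
def rule_positive_words(text):
    return "pos" if any(w in text for w in ("great", "loved")) else None


def rule_negative_words(text):
    return "neg" if any(w in text for w in ("boring", "terrible")) else None


RULES = [rule_positive_words, rule_negative_words]


def rule_vote(text):
    """Majority vote over non-abstaining rules; fall back to the baseline."""
    votes = [v for v in (rule(text) for rule in RULES) if v is not None]
    if not votes:
        return majority_baseline(text)
    return Counter(votes).most_common(1)[0][0]


# Measure: compare both approaches with the same metric.
baseline_acc = accuracy(majority_baseline, DATA)
rules_acc = accuracy(rule_vote, DATA)
print(f"majority baseline: {baseline_acc:.2f}")
print(f"rule voting:       {rules_acc:.2f}")
```

The rules here are scored on the same examples they were written against, so the numbers only show the mechanics of the comparison. In practice, the measurement would use held-out data, and the rule votes could serve as noisy labels for training a model.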

Speakers: Tobias Sterbak