Contributing to an open-source content library for NLP

PyCon DE & PyData Berlin 2023

Bricks is an open-source content library for natural language processing (NLP) that provides building blocks to quickly and easily enrich, transform, or analyze text data for machine learning projects. For many Pythonistas, contributing to an open-source project seems intimidating. In this tutorial, we offer a hands-on experience in which programmers and data scientists learn how to code their own building blocks and easily share their creations with the community.

We will prepare some simple use cases so that attendees with novice machine learning and NLP skills can participate in the session. A basic understanding of Python is required, but everyone who wants to learn more about machine learning, NLP, or open-source contributions is welcome.

A brick is a modular piece of software that enriches, transforms, or analyzes text data for natural language processing, a sub-domain of machine learning. What sets a brick apart from a simple code snippet is its suitability for multiple execution environments. A brick module can also be executed in a demo playground, allowing users to try out different inputs to see if the brick meets their needs.

In this session, we will begin by outlining some ideas for building a brick. After fleshing out our ideas, we will make the code usable in different environments, such as the playground for testing inputs. Since spaCy is commonly used in many NLP projects, we will also build a variant of the code that takes a spaCy document as input. Add some documentation, and voilà! You now have a brick.
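The idea of one piece of logic offered in several variants can be sketched roughly as follows. This is a minimal illustration, not Bricks' actual module layout or API: the function names, the `{"text": ...}` request shape, and the `wordCount` key are all hypothetical. The spaCy variant only relies on the `is_punct` and `is_space` token attributes, so the block itself has no spaCy import.

```python
import re

# Hypothetical brick: every name here is illustrative, not part of Bricks itself.
WORD_PATTERN = re.compile(r"\w+")


def word_count(text: str) -> int:
    """Core logic: count word-like runs of characters in a plain string."""
    return len(WORD_PATTERN.findall(text))


def word_count_playground(request: dict) -> dict:
    """Playground-style variant: takes a JSON-like dict and returns one.

    The {"text": ...} input shape and the "wordCount" key are assumptions.
    """
    return {"wordCount": word_count(request["text"])}


def word_count_spacy(doc) -> int:
    """spaCy variant: expects a spaCy Doc (or any iterable of tokens that
    expose is_punct and is_space) and counts the remaining tokens."""
    return sum(1 for token in doc if not (token.is_punct or token.is_space))
```

With spaCy installed, passing the result of `spacy.blank("en")("Bricks are fun!")` to `word_count_spacy` would exercise the third variant without downloading a trained pipeline. Note that the regex-based and tokenizer-based counts can differ on some inputs (contractions, for example), which is one reason to test a brick with varied inputs.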

Speakers: Leonard Püttmann