The materials presented during this tutorial are open source and can be used by coaches and tutors who want to teach their students how to use Python for text processing and text classification. Students need only a minimal understanding of programming in any language.
The materials presented at this tutorial were initially created for high school and university students to help them get started with their first machine learning project using textual data. Machine learning on textual data is more accessible for beginners because it typically does not involve missing data imputation, normalisation or scaling. It is also easier to analyse and interpret the results (e.g. why a document was misclassified). There are many introductory NLP courses online; however, they are not free, and they either cover only the basics¹ or do not cover machine learning algorithms², treating models as a black box. They also do not show how to conduct research correctly (e.g. setting a baseline or basing design decisions on proper validation). These materials, in the form of Jupyter notebooks, can be used by teachers to guide their students through an NLP research project from start to finish.

These materials are, of course, not limited to teachers and tutors at academic institutions. Many companies rely on customer reviews, social media, client records and other content written in natural language, but often use sub-optimal tools to analyse it (such as MS Excel). These materials will give working professionals the tools they need to get started with text analysis and teach them the fundamentals of machine learning, so that they can automate document labelling and other manual tasks with the help of document classification (e.g. Is a customer review positive or negative? Is a certain document about topic X or topic Y?). A minimal understanding of programming in any language is required; all necessary Python libraries will be covered.

The aim of the tutorial is to present the materials, which contain 7 "lectures", several practical exercises with solutions, and a case study. They can be covered either in 10 hours spread over a 10-week term or in a 2-day workshop.
¹ https://www.udemy.com/course/natural-language-processing/
² https://www.udemy.com/course/nlp-natural-language-processing-with-python/
Speakers: Lisa Andreevna Chalaguine