Improving Machine Learning from Human Feedback

PyCon DE & PyData Berlin 2023

Large generative models rely on massive data sets that are collected automatically. For example, GPT-3 was trained with data from “Common Crawl” and “WebText2”, among other sources. As the saying goes, bigger isn’t always better. While powerful, these data sets (and the models trained on them) often come at a cost, bringing their “internet-scale biases” along with their “internet-trained models.” This raises the question: is unsupervised learning the best future for machine learning?
ML researchers have developed new model-tuning techniques to address the known biases within existing models and improve the models’ performance (as measured by response preference, truthfulness, toxicity, and result generalization), all at a fraction of the initial training cost. This talk will explore these Reinforcement Learning from Human Feedback (RLHF) techniques and how open-source machine learning tools like PyTorch and Label Studio can be used to tune off-the-shelf models using direct human feedback.
We’ll start by covering traditional RLHF, in which a model is given a set of prompts and generates an output for each. Human annotators then grade these prompt/output pairs, ranking them according to a desired metric. The rankings are then used as a reinforcement learning data set to optimize the model to produce results closer to the metric criteria.
Next, we’ll discuss recent advances within this field and the advantages they provide. One advance we’ll dive into is the use of Human Language Feedback, in which ranks are replaced with human-language summaries that take advantage of the “full expressiveness of language that humans use.” This contextual feedback, along with the original prompt and output of the model, is used to generate a new set of model refinements. The model is then tuned with these refinements so that its new output matches the human feedback.
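The refinement step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method from any particular paper or library: the helper names (`build_refinement_prompt`, `bow_cosine`, `pick_refinement`, `make_finetuning_record`) are hypothetical, the prompt wording is invented for the example, and the idea of keeping the candidate refinement that is most similar to the feedback (scored here with a toy bag-of-words cosine) is an assumption this abstract does not spell out. Generating the candidates with an actual model is left out.

```python
import math
from collections import Counter


def build_refinement_prompt(prompt, output, feedback):
    """Combine prompt, model output, and human feedback into one request.

    The wording of this template is illustrative only.
    """
    return (
        f"Prompt:\n{prompt}\n\n"
        f"Model output:\n{output}\n\n"
        f"Human feedback:\n{feedback}\n\n"
        "Rewrite the output so that it addresses the feedback:\n"
    )


def bow_cosine(a, b):
    """Toy similarity: cosine between whitespace-token count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def pick_refinement(candidates, feedback):
    """Keep the candidate refinement that best matches the feedback (assumed step)."""
    return max(candidates, key=lambda c: bow_cosine(c, feedback))


def make_finetuning_record(prompt, refinement):
    """One supervised example: original prompt in, chosen refinement out."""
    return {"prompt": prompt, "completion": refinement}


if __name__ == "__main__":
    feedback = "mention the budget figures and the deadline"
    candidates = [
        "The cat sat.",
        "The report omits the budget figures and the deadline.",
    ]
    best = pick_refinement(candidates, feedback)
    print(make_finetuning_record("Summarize the report.", best))
```

In a real pipeline the similarity score would more plausibly come from a learned embedding model, and the resulting records would feed a standard supervised finetuning run.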
In a 2022 study, researchers at NYU reported that “using only 100 samples of human-written feedback finetunes a GPT-3 model to roughly human-level summarization ability.” Advances like these are improving accuracy and reducing bias. Finally, we’ll leave you with examples and resources on implementing these training methods using publicly available models and open-source tools like PyTorch and Label Studio to help retrain models for targeted applications.
As this industry continues to grow, evolve, and expand into more widespread applications, we must approach this space with ethics and sustainability in mind. By combining the power and expansiveness of these widely popular “internet-scale models” with specific, targeted, human approaches, we can avoid the “internet-scale biases” that threaten the legitimacy and trustworthiness of the industry as a whole.
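As one small illustration of the ranking step in traditional RLHF described earlier, the sketch below turns an annotator's ranking into preference pairs and scores them with a pairwise logistic (Bradley-Terry style) loss. That loss is a common choice for training a reward model from rankings, but using it here is an assumption: this abstract only says the rankings become a reinforcement learning data set. The function names are hypothetical, and the code uses only the Python standard library. A real reward model would more likely be a PyTorch module trained by gradient descent on this loss.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Expand a best-first ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each pair
    is always the higher-ranked output.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed stably.

    Uses softplus(x) = max(x, 0) + log1p(exp(-|x|)) with x = -(difference).
    The loss shrinks as the preferred output's reward pulls ahead.
    """
    diff = reward_preferred - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def mean_ranking_loss(ranked_outputs, reward_fn):
    """Average pairwise loss of a scoring function over one ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    losses = [pairwise_loss(reward_fn(p), reward_fn(r)) for p, r in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical annotator ranking for one prompt, best output first.
    ranked = ["clear, correct answer", "partly correct", "off-topic"]
    # Stand-in reward function for the demo: longer text scores higher.
    print(mean_ranking_loss(ranked, reward_fn=len))
```

Once a reward function has been fit this way, it can serve as the reward signal when the original model is optimized with a reinforcement learning algorithm.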

Speakers: Erin Mikail Staples, Nikolai