Predicting areas for PR Comments based on Code Vectors & Mailing List Data

FOSDEM 2019

Many of us have seen the small PR that changes 20 lines and draws 20 comments, and the large PR with over 1k changed lines that sails through (or stalls) with no comments, because who has the time to read all that text? What if we could predict which areas of a PR are more likely to need attention? This talk will explore creating a model to predict the areas of a PR that are likely to generate comments, using a combination of historic comment data and stack traces from mailing lists. Once we’ve built this model, we’ll explore how to serve it live with Kubeflow, as well as how Kubeflow and K8s can work together to let us scale our model training to a larger set of projects.

Speakers: Holden Karau, Kris Nova