Intel GFX CI: Validation done the Linux way
FOSDEM 2019

The Linux community is slowly moving towards better quality through automated testing to prevent regressions in the mainline and stable trees. However, Linux is full of HW-specific code, which makes it impossible for individual developers to validate their patches on every platform and leads to regressions. In this talk, we will explain how we addressed these issues by taking inspiration from Linux's development model, and how we extended it to the development of our test suite, CI infrastructure and bug handling.

After 2 years of activity, this led Linus Torvalds to say that i915's quality has greatly improved compared to other graphics drivers.

Linux's development model has been described as akin to a bazaar, where any developer can make changes to Linux as long as they strictly improve its state, without regressing any application that currently runs on it. This allows Linux users to update their kernels and benefit from the work of all developers without having to fix anything in their applications when a new version arrives. Unfortunately, it is impossible for developers to try their changes on all the different hardware and userspace combinations in use in the wild.

Typically, a developer will mostly test the feature he or she is working on with the hardware at hand before submitting the patch for review. Once reviewed, the patch can land in a staging repository controlled by the maintainer of the subsystem the patch is changing. Validation of the staging tree is then performed before these changes are sent to Linus Torvalds (or one of his maintainers). Regressions caught at this point require bisecting the issue, which is time-consuming and usually done by a separate team, which may become a bottleneck. Sometimes regressions are let through, in the hope that they can be fixed during the -rc cycles.
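As an illustration of the bisection step mentioned above, here is a minimal sketch (not from the talk) of automating it with git bisect run; the build_and_test.sh script and the good/bad commits are hypothetical placeholders for whatever reproduces the regression:

    #!/usr/bin/env python3
    """Minimal sketch of automating a regression bisect with `git bisect run`.

    Assumptions (not from the talk): the regression is reproducible by a
    hypothetical ./build_and_test.sh script that exits 0 on pass and
    non-zero on failure, and the good/bad commits below are placeholders.
    """
    import subprocess

    GOOD = "v5.0"                  # last known good tag/commit (placeholder)
    BAD = "HEAD"                   # first known bad commit (placeholder)
    TEST = "./build_and_test.sh"   # hypothetical build-and-test script

    def git(*args):
        subprocess.run(["git", *args], check=True)

    # Start the bisect session and let git drive the test script at each step;
    # git prints the first bad commit once the search converges.
    git("bisect", "start", BAD, GOOD)
    git("bisect", "run", TEST)
    git("bisect", "reset")         # return to the original branch when done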

To address this bottleneck, the developer should be responsible for validating the change completely. This leads to a virtuous cycle: not only can developers rework their patches until they no longer break anything (saving other people's time), but they also become more aware of the impact their changes have on userspace, which improves their understanding of the driver and leads to better patches in the future.

To enable putting the full cost of integration on developers, validation needs to become 100% automated, cover 100% of the code/HW exercised by userspace use cases, and provide timely results to even the most time-pressured developers. To reach these very ambitious objectives, driver developers and validation engineers need to be considered as one team: the CI system developers need to provide a system capable of reaching these objectives, and driver developers need to develop a test suite capable of achieving 100% code coverage of the whole driver on the CI system provided to them.
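To make the code-coverage objective more concrete, here is a minimal sketch (not part of the talk) of harvesting kernel coverage counters for one driver. It assumes the kernel under test was built with CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, that debugfs is mounted at /sys/kernel/debug, and that only drivers/gpu/drm/i915 is of interest:

    #!/usr/bin/env python3
    """Minimal sketch: collect gcov counters for a single kernel driver.

    Assumptions (not from the talk): CONFIG_GCOV_KERNEL and
    CONFIG_GCOV_PROFILE_ALL are enabled, debugfs is mounted, and the
    paths below are placeholders for a real setup.
    """
    from pathlib import Path
    import shutil

    GCOV_ROOT = Path("/sys/kernel/debug/gcov")
    DRIVER_SUBDIR = "drivers/gpu/drm/i915"   # driver whose coverage we want
    OUT = Path("coverage-i915")              # where to stash the counters

    for gcda in GCOV_ROOT.rglob("*.gcda"):
        if DRIVER_SUBDIR not in str(gcda):
            continue
        # Mirror the layout so gcov/lcov can later match the runtime
        # counters (.gcda) with the notes files (.gcno) from the build.
        dest = OUT / gcda.relative_to(GCOV_ROOT)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(gcda, dest)

    print(f"Copied coverage counters for {DRIVER_SUBDIR} into {OUT}/")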

Finally, this increased understanding of how validation is done allows developers to know whether their patch series will be properly validated, which reduces the risk of letting regressions land in Linux.

The devil, however, lies in the details, so in this talk we will explain how we are going from theory to practice, what our current status is, and what we are doing to get closer to our ambitious goal! We will describe the current developer workflow and demonstrate how we empowered developers by providing timely testing as a transparent service to anyone sending patches to our mailing lists.

Speakers: Martin Peres