The Linux community is slowly moving towards better quality through automated testing, to prevent regressions in the mainline and stable trees. However, Linux is full of hardware-specific code, which makes it impossible for individual developers to validate their patches on every platform, and this leads to regressions. In this talk, we will explain how we solved these issues by taking inspiration from Linux's development model, and how we extended it to the development of our test suite, CI infrastructure and bug handling.
After two years of activity, this led Linus Torvalds to say that i915's quality has greatly improved compared to that of other graphics drivers.
Linux's development model has been described as being
akin to a bazaar, where any developer can make changes to Linux as long
as they strictly improve the state of Linux, without regressing any
application that currently runs on it. This allows Linux users to update
their kernels and benefit from the work of all developers, without
having to fix anything in their applications when a new version is released.
Unfortunately, it is impossible for developers to try their changes on
all the different hardware and userspace combinations in use in the wild.
Typically, developers will mostly test the feature they are working on
with the hardware at hand before submitting a patch for review. Once
reviewed, the patch can land in a staging repository controlled by the
maintainer of the subsystem the patch is changing. Validation of the
staging tree is then performed ahead of sending these changes to Linus
Torvalds (or one of his maintainers). Regressions caught at this point
require bisecting the issue, which is time-consuming and usually done by
a separate team, and that team may become a bottleneck. Sometimes
regressions are let through in the hope of fixing them during the -rc cycles.
To address this bottleneck, developers should be responsible for
validating their changes completely. This creates a virtuous cycle: not
only can developers rework their patches until they no longer break
anything (saving other people's time), they also become more aware of
how their changes interact with userspace, which deepens their
understanding of the driver and leads to better patches in the future.
To enable putting the full cost of integration on developers, validation
needs to become 100% automated, cover 100% of the code and hardware
exercised by userspace use cases, and provide timely results to even the
most time-pressured developers. To reach these ambitious objectives,
driver developers and validation engineers need to work as one team: the
CI system developers need to provide a system capable of meeting the
objectives, and the driver developers need to write a test suite that
achieves 100% code coverage of the whole driver on the CI system
provided to them.
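Tracking progress towards such a code-coverage goal requires measuring it; one way to do so for a single driver is the kernel's own gcov support. A sketch of the relevant build configuration (this illustrates the mechanism described in Documentation/dev-tools/gcov.rst, not necessarily the exact setup of our CI):

```make
# Kernel .config: enable gcov-based coverage collection; the counters
# appear under /sys/kernel/debug/gcov once debugfs is mounted.
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y

# drivers/gpu/drm/i915/Makefile: profile only this driver, rather than
# the whole kernel (which CONFIG_GCOV_PROFILE_ALL would do).
GCOV_PROFILE := y
```

After running the test suite, the per-file .gcda data can be fed to the usual gcov/lcov tooling to spot uncovered driver code.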
Finally, this improved understanding of how validation is done lets
developers know whether their patch series will be properly validated,
which reduces the risk of regressions landing in Linux.
The devil, however, lies in the details, so in this talk we will explain
how we are going from theory to practice, what our current status is,
and what we are doing to get closer to our ambitious goal! We will
describe the current developer workflow and demonstrate how we empowered
developers by providing timely testing as a transparent service to
anyone sending patches to our mailing lists.