Interactive debugging inside CI systems

FOSDEM 2021

Commonly used CI systems operate as SaaS solutions, where the user does not run the CI stack locally. This creates a debugging pitfall: developers cannot easily reproduce a failure locally and cannot examine it interactively. This talk proposes an inverted design, where a self-operated CI tool can be used both in the cloud and locally, supporting interactive debugging sessions.

The recent surge of CI systems has created an interesting new problem, where a failure occurs in a specific test environment but does not appear in the familiar environment used by the developer.

This problem is compounded by the batch nature of such systems, where all a developer can do is push additional patches to a branch to trigger an asynchronous execution process.

During the development of the Ubuntu Core operating system, this problem was amplified by the fact that building and testing a full OS image is a time-consuming process, leading to cycles that spanned hours and led to frustration. Snapd developers created the spread program to solve this problem, among others.

The SaaS solution became a thin wrapper around spread, which allocates, provisions, uses and finally discards the test environment. Crucially, almost all errors can be reproduced locally, as spread runs as a standalone tool, using QEMU, LXD, Google Compute Engine or Linode as executors.

This allows anyone to run spread locally, in interactive mode, and explore the problem without putting additional load on the centralized CI system, greatly improving the debugging process.
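
The allocate, provision, use and discard lifecycle described above, together with an interactive hook on failure, can be sketched roughly as follows. This is a minimal illustration in Python and not spread's actual implementation or API: `FakeBackend`, `run_task` and `on_failure` are hypothetical names, and the backend only records which steps were taken instead of starting a real machine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeBackend:
    """Hypothetical stand-in for an executor such as QEMU or LXD."""
    name: str
    log: List[str] = field(default_factory=list)

    def allocate(self) -> str:
        self.log.append("allocate")
        return f"{self.name}:machine-0"

    def provision(self, machine: str) -> None:
        self.log.append("provision")

    def discard(self, machine: str) -> None:
        self.log.append("discard")


def run_task(
    backend: FakeBackend,
    task: Callable[[str], None],
    on_failure: Optional[Callable[[str, Exception], None]] = None,
) -> bool:
    """Allocate, provision, use and finally discard one test machine."""
    machine = backend.allocate()
    try:
        backend.provision(machine)
        backend.log.append("use")
        task(machine)
        return True
    except Exception as err:
        # Interactive hook (assumption): a real tool would open a shell
        # on the still-running machine at this point.
        if on_failure is not None:
            backend.log.append("debug")
            on_failure(machine, err)
        return False
    finally:
        # The machine is discarded whether the task passed or failed.
        backend.discard(machine)
```

Because the same function can run against any backend, the centralized service and a developer's laptop can share one code path; only the chosen executor and the presence of the failure hook differ.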

Speakers: Zygmunt Krynicki