Troika: Submit, monitor, and interrupt jobs on any HPC system with the same interface

FOSDEM 2023

There is a wide variety of HPC systems across the world, and nearly as many ways of interacting with them through job submission systems. As a result, migrating complex HPC workflows from one system to another can be challenging. We present Troika, a tool that aims to abstract the details of the job submission system away from the user, providing a single entry point for submitting, monitoring, and interrupting jobs on multiple HPC systems. With Troika, users write a site-agnostic job script with directives, which is translated, based on configuration, into a script that the target job submission system understands.
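
As a rough illustration of configuration-driven directive translation, the following Python sketch rewrites generic directives into Slurm or PBS ones. The `#JOB key=value` syntax, the `SITE_CONFIGS` table, and the `translate` function are hypothetical and do not reflect Troika's actual directive syntax or configuration format.

```python
"""Sketch of translating site-agnostic directives into scheduler directives.

Assumptions (hypothetical, not Troika's actual syntax or config format):
generic directives look like "#JOB key=value", and each site config maps
a generic key to a template for that site's scheduler.
"""

import re

# Hypothetical per-site configuration: generic key -> native directive template.
SITE_CONFIGS = {
    "slurm": {
        "name": "#SBATCH --job-name={}",
        "walltime": "#SBATCH --time={}",
        "output": "#SBATCH --output={}",
    },
    "pbs": {
        "name": "#PBS -N {}",
        "walltime": "#PBS -l walltime={}",
        "output": "#PBS -o {}",
    },
}

# Matches a generic directive line such as "#JOB walltime=01:00:00".
DIRECTIVE = re.compile(r"^#JOB\s+(\w+)=(.*)$")


def translate(script: str, site: str) -> str:
    """Rewrite generic #JOB directives into the given site's native directives."""
    mapping = SITE_CONFIGS[site]
    out = []
    for line in script.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            out.append(line)  # ordinary script line: keep unchanged
            continue
        key, value = match.group(1), match.group(2).strip()
        if key not in mapping:
            raise ValueError(f"directive {key!r} not supported on site {site!r}")
        out.append(mapping[key].format(value))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    generic = "#!/bin/bash\n#JOB name=forecast\n#JOB walltime=01:00:00\necho hello\n"
    print(translate(generic, "slurm"))
    print(translate(generic, "pbs"))
```

Keeping the mapping in data rather than in code is what would let a new site be supported by editing configuration alone, which matches the spirit of the approach described above.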

Troika has been designed with extensibility in mind, to support as many job submission systems as possible, as well as differences in how sites use them. Troika is free software written in Python, exposing multiple entry points for hooks and plug-ins. It is a fundamental part of ECMWF's 24/7 time-critical operational and research workflows, acting as the glue between the batch scheduler and the workflow manager, where it handles hundreds of thousands of jobs each day. We will present how Troika works and give insights into its current and future applications.
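
Extension through hooks and plug-ins can be pictured with a minimal sketch built on the standard library's `importlib.metadata` entry points. The group names, the `pre_submit` and `at_exit` events, and the `register`, `discover`, `load_plugin_hooks` and `run_hooks` helpers are hypothetical illustrations of the general pattern, not Troika's real extension API.

```python
"""Sketch of a hook registry with plug-in discovery via package entry points.

All event names, group names, and helpers here are hypothetical; they
illustrate the general pattern, not Troika's actual extension API.
"""

from importlib.metadata import entry_points
from typing import Any, Callable, Dict, List

# Hypothetical hook events and the callables registered for each.
HOOKS: Dict[str, List[Callable[..., Any]]] = {"pre_submit": [], "at_exit": []}


def register(event: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function as a hook for `event`."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        HOOKS[event].append(func)
        return func
    return decorator


def discover(group: str) -> Dict[str, Any]:
    """Load every object that installed packages advertise under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def load_plugin_hooks(event: str, group_prefix: str = "example_sketch.hooks.") -> None:
    """Append hooks that installed third-party packages expose for `event`."""
    HOOKS[event].extend(discover(group_prefix + event).values())


def run_hooks(event: str, *args: Any) -> List[Any]:
    """Call each hook for `event` in registration order and collect results."""
    return [hook(*args) for hook in HOOKS[event]]


@register("pre_submit")
def describe_submission(script_path: str) -> str:
    # Example built-in hook; a real one might validate or patch the script.
    return f"about to submit {script_path}"


if __name__ == "__main__":
    load_plugin_hooks("pre_submit")  # no-op unless a package declares the group
    print(run_hooks("pre_submit", "job.sh"))
```

Discovering plug-ins through package metadata means a site could ship its own hooks as a separately installed package without modifying the core tool.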

Speakers: Olivier Iffrig, Axel Bonet