Many emerging scientific workflows that target high-end HPC systems require a complex interplay with resource and job management software (RJMS). However, portable, efficient, and easy-to-use scheduling of these workflows is still an unsolved problem. In this talk, we present Flux, a next-generation RJMS designed specifically to address the key scheduling challenges of modern workflows in a scalable, easy-to-use, and portable manner. At the heart of Flux lies its ability to be seamlessly nested within batch allocations created by itself as well as by other system schedulers (e.g., SLURM, MOAB, LSF), serving the target workflows as their “personal RJMS instances”. In particular, Flux’s consistent and rich set of well-defined APIs portably and efficiently supports workflows that feature non-traditional patterns such as complex co-scheduling, massive ensembles of small jobs, and coordination among jobs in an ensemble.
We will also cover how the Flux-Framework project is structured around open-source development, including our use of the Collective Code Construction Contract (C4), RFCs, LGPL, and various online open-source platforms. We discuss how these choices of open-source processes have influenced the repo structure, the code, our collaborations, and even the sub-teams within the project.
Expected prior knowledge / intended audience: Attendees should have basic knowledge of batch job systems; knowledge of or experience with running scientific workflows is a plus. The talk includes some background on common workflows. It should be of interest to HPC users, workflow developers, and admins.
Speaker bio: Stephen Herbein is a computer scientist in Livermore Computing at Lawrence Livermore National Laboratory. His research interests include batch job scheduling, parallel IO, and data analytics. He is a part of the Flux team, developing next-generation IO-aware and multi-level schedulers for HPC.
Links to previous talks by the speaker:
- http://flux-framework.org/papers/Flux-DevDay-2018-Slides.pdf
- https://github.com/flux-framework/tutorials
See https://herbein.net/Herbein_CV.pdf for more (including papers on Flux)