- 1Barcelona Supercomputing Center (BSC), Earth Sciences, Barcelona, Spain
- 2Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
- 3RIKEN Center for Computational Science, Kobe, Japan
Modern experimentation with Earth System Models (ESMs) is accelerated by automated workflows that handle steps such as simulation execution, post-processing, and cleaning, while remaining portable and tracking provenance. On shared HPC platforms, however, users usually face long queue times, which increase the time to solution. The community has proposed aggregating workflow tasks into a single submission to reduce queue time, with promising results. The drawback is that the workflow manager must then handle the remote task execution that the HPC scheduler would otherwise perform.
Therefore, we propose to integrate two workflow managers to create a versatile and general solution for the execution of these aggregated workflows: one that orchestrates the workflow globally and another that is in charge of running tasks within an allocation, which we refer to as "in situ."
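The in situ role described above can be illustrated with a minimal sketch. It assumes a global manager has already obtained a single allocation and handed over a set of tasks with dependencies; a local thread pool then stands in for the in situ manager. The function `run_in_situ`, its parameters, and the three example tasks are hypothetical and are not the API of HyperQueue, Flux, or PyCOMPSs.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_in_situ(tasks, deps, max_workers=4):
    """Run tasks inside one allocation, respecting their dependencies.

    tasks: dict mapping task name -> zero-argument callable (hypothetical)
    deps:  dict mapping task name -> set of task names that must finish first
    Returns a dict mapping task name -> result.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    results = {}
    running = {}  # future -> task name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # A task is ready once all of its dependencies have results.
            ready = [n for n, d in remaining.items() if d.issubset(results)]
            for name in ready:
                running[pool.submit(tasks[name])] = name
                del remaining[name]

            if not running:
                # Nothing is running and nothing can start.
                raise ValueError("dependency cycle or unknown dependency")

            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()

    return results


# Illustrative three-step chain: simulation -> post-processing -> cleaning.
example_tasks = {
    "sim": lambda: "sim-output",
    "post": lambda: "post-output",
    "clean": lambda: "cleaned",
}
example_deps = {"post": {"sim"}, "clean": {"post"}}
example_results = run_in_situ(example_tasks, example_deps)
print(sorted(example_results))
```

In this sketch the batch scheduler is contacted only once, for the allocation itself; every task launch after that is decided locally, which is the responsibility that shifts from the HPC scheduler to the in situ manager.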
In this work, we performed a qualitative and quantitative comparison of three suitable and representative workflow and workload managers running in situ (HyperQueue, Flux, and PyCOMPSs) on three of the top 20 HPC systems: Lumi, MareNostrum 5, and Fugaku. In the qualitative part, we evaluated the portability and setup, failure tolerance, programmability, and provenance tracking of each tool. In the quantitative part, we measured total runtime, task runtime, CPU and memory usage, disk writes, and node imbalance of workflows running a memory-bound, a CPU-bound, and an IO-intensive application.
Our initial results yield recommendations to the community on which workflow manager to use in situ. HyperQueue's easy installation and portability make it the best solution for non-x86 platforms. Flux had the easiest running setup because it is designed to run nested in Slurm. Finally, PyCOMPSs is the only one of the three tools that provides provenance tracking with RO-Crates.
How to cite: Giménez de Castro Marciani, M., Acosta, M., Utrera, G., Castrillo, M., and Wahib, M.: Accelerating Earth System Workflows with In Situ Workflow Task Management, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11759, https://doi.org/10.5194/egusphere-egu26-11759, 2026.