EWoPe: Environmental Workflow Persistence methodology for multi-stage computational processes reproducibility
- 1University of Genoa, DISTAV Department of Earth, Environment and Life Sciences, Genoa, Italy
- 2CNR-IMATI Institute for Applied Mathematics and Information Technologies "Enrico Magenes", Genoa, Italy
Over the past few decades, geoscientists have progressively exploited and integrated techniques and tools from Applied Mathematics, Statistics, and Computer Science to investigate and simulate natural phenomena. Depending on the situation, the sequence of computational steps may vary, leading to intricate workflows that are difficult to reproduce or revisit. To ensure that such workflows can be repeated for validation, peer review, or further investigation, it is necessary to implement strategies for workflow persistence, that is, the ability to maintain the continuity, integrity, and reproducibility of a workflow over time.
In this context, we propose an efficient strategy to support the workflow persistence of Geoscience pipelines: the Environmental Workflow Persistence methodology (EWoPe). Our approach makes it possible to document each workflow step, including details about data sources, processing algorithms, parameters, and final and intermediate outputs. This documentation aids in understanding the workflow's methodology, promotes transparency, and ensures replicability.
Our methodology views workflows as hierarchical tree data structures. In this representation, each node describes data, either input data or the result of a computational step, and each arc is a computational step that uses its respective nodes as inputs to generate output nodes. The relationship between input and output can be either one-to-one or one-to-many, which gives the flexibility to support either a single outcome or multiple outcomes from a single input.
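As an illustration only, the tree representation described above could be sketched in Python as follows. This is not the EWoPe or MUSE implementation: the class `DataNode`, its methods, and the step and file names are hypothetical and chosen purely for the example. Each node stands for a piece of data, and the arc that produced it is stored as a step label on the node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/children
class DataNode:
    """A node of the workflow tree: input data or the result of a step (hypothetical)."""
    name: str
    step: Optional[str] = None  # label of the arc (computational step) that produced this node
    parent: Optional["DataNode"] = field(default=None, repr=False)
    children: List["DataNode"] = field(default_factory=list, repr=False)

    def apply(self, step: str, output_names: List[str]) -> List["DataNode"]:
        """Run one step on this node; one output is one-to-one, several are one-to-many."""
        outputs = [DataNode(name=n, step=step, parent=self) for n in output_names]
        self.children.extend(outputs)
        return outputs

    def path_to_root(self) -> List[str]:
        """Names of the nodes from this one back to the original input data."""
        path = []
        node: Optional["DataNode"] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path


# Hypothetical workflow: one-to-one step followed by a one-to-many step.
raw = DataNode("samples")
(grid,) = raw.apply("spatial_discretization", ["grid"])
realizations = grid.apply("stochastic_simulation", ["realization_1", "realization_2"])
print(realizations[1].path_to_root())  # ['realization_2', 'grid', 'samples']
```

Storing the step on the output node rather than in a separate arc object is a design choice made here for brevity; it works because, in a tree, every node other than the root is produced by exactly one arc.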
The approach ensures the persistence of workflows by employing JSON (JavaScript Object Notation) encoding. JSON is a lightweight data interchange format designed to be easy for humans to read and for machines to parse and generate. Under this approach to workflow persistence, each node within a workflow consists of two elements. One encodes the raw data itself (in one or more files). The other is a JSON file whose metadata describe the computational step responsible for generating the raw data, including references to the input data and the parameters. Such a JSON file serves to trace and certify the source of the data, offering a starting point for retracing the workflow backward to its original input data or to an intermediate result of interest.
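A minimal sketch of this two-element node layout, under stated assumptions, is given below. The abstract does not specify the JSON schema, so the field names (`data_files`, `step`, `input`, `parameters`), the file naming scheme, and the helper functions `write_node` and `trace_back` are all hypothetical. The sketch writes a data file plus a JSON descriptor for each node and then follows the input references backward to the original input data.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def write_node(directory: Path, name: str, data: str, step: Optional[str] = None,
               input_node: Optional[str] = None, parameters: Optional[dict] = None) -> Path:
    """Persist one node: a raw data file plus a JSON file describing how it was generated."""
    data_path = directory / f"{name}.dat"
    data_path.write_text(data)
    metadata = {  # hypothetical schema, not the one used by EWoPe
        "data_files": [data_path.name],
        "step": step,            # None marks original input data
        "input": input_node,     # reference to the node used as input
        "parameters": parameters or {},
    }
    meta_path = directory / f"{name}.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path


def trace_back(directory: Path, name: str) -> List[str]:
    """Follow the 'input' references from a node back to the original input data."""
    lineage = []
    current: Optional[str] = name
    while current is not None:
        metadata = json.loads((directory / f"{current}.json").read_text())
        lineage.append(current)
        current = metadata["input"]
    return lineage


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    write_node(workdir, "samples", "raw measurements")
    write_node(workdir, "grid", "discretized domain", step="spatial_discretization",
               input_node="samples", parameters={"cell_size": 10})
    write_node(workdir, "realization_1", "simulated field", step="stochastic_simulation",
               input_node="grid", parameters={"seed": 42})
    print(trace_back(workdir, "realization_1"))  # ['realization_1', 'grid', 'samples']
```

Because each descriptor records only its own generating step and input reference, the trace can start from any intermediate result of interest, not just from a final output.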
Currently, the EWoPe methodology has been implemented and integrated into MUSE (Modeling Uncertainty as a Support for Environment) (Miola et al., STAG2022), a computational infrastructure to evaluate spatial uncertainty in multi-scenario applications such as environmental geochemistry, reservoir geology, or infrastructure engineering. MUSE allows running specific multi-stage workflows that involve spatial discretization algorithms, geostatistics, and stochastic simulations; the use of the EWoPe methodology in MUSE can be seen as an example of its deployment.
By making input-output relationships transparent and thus ensuring the reproducibility of results, the EWoPe methodology offers significant benefits to both scientists and the downstream communities that use environmental computational frameworks.
Acknowledgements: Funded by the European Union - NextGenerationEU and by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.5, project “RAISE - Robotics and AI for Socio-economic Empowerment” (ECS00000035) and by PON "Ricerca e Innovazione" 2014-2020, Asse IV "Istruzione e ricerca per il recupero", Azione IV.5 "Dottorati su tematiche green" DM 1061/2021.
How to cite: Miola, M., Cabiddu, D., Pittaluga, S., and Vetuschi Zuccolini, M.: EWoPe: Environmental Workflow Persistence methodology for multi-stage computational processes reproducibility, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8711, https://doi.org/10.5194/egusphere-egu24-8711, 2024.