EGU23-6857, updated on 09 Jan 2024
EGU General Assembly 2023
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Convergence of HPC, Big Data and Machine Learning for Earth System workflows

Donatello Elia1, Sonia Scardigno1, Alessandro D'Anca1, Gabriele Accarino1,2, Jorge Ejarque3, Francesco Immorlano1,2, Daniele Peano1, Enrico Scoccimarro1, Rosa M. Badia3, and Giovanni Aloisio1,2
  • 1Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC), Lecce, Italy
  • 2Università del Salento, Dept. of Engineering for Innovation, Lecce, Italy
  • 3Barcelona Supercomputing Center (BSC), Barcelona, Spain

Typical end-to-end Earth System Modelling (ESM) workflows comprise several steps, including data pre-processing, numerical simulation, output post-processing, and data analytics and visualization. The approaches currently available for implementing scientific workflows in the climate context do not integrate the entire set of components into a single workflow in a transparent manner. The increasing use of High Performance Data Analytics (HPDA) and Machine Learning (ML) in climate applications further exacerbates these issues. A more integrated approach would make it possible to support next-generation ESM and to improve the workflow in terms of execution and energy consumption.

Moreover, seamless integration of HPDA and ML components into the ESM workflow will open the way to novel applications and support larger-scale pre- and post-processing. However, these components typically have different deployment requirements, ranging from HPC (for ESM simulation) to Cloud computing (for HPDA and ML). It is paramount to provide scientists with solutions capable of hiding the technical details of the underlying infrastructure and improving workflow portability.

In the context of the eFlows4HPC project, we are exploring the use of innovative workflow solutions that integrate approaches from HPC, HPDA and ML to support end-to-end ESM simulations and post-processing, with a focus on the analysis of extreme events (e.g., heat waves and tropical cyclones). In particular, the envisioned solution exploits PyCOMPSs for the management of parallel pipelines, task orchestration and synchronization, PyOphidia for climate data analytics, and ML frameworks (i.e., TensorFlow) for data-driven event detection models. This contribution presents the approaches being explored within the project to address the convergence of HPC, Big Data and ML into a single end-to-end ESM workflow.
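The pattern described above (parallel pipelines, one per simulation, followed by a synchronization step) can be illustrated with a minimal sketch. It is not the project's implementation: Python's standard `concurrent.futures` stands in for PyCOMPSs, synthetic random temperatures stand in for ESM output, and a simple threshold rule stands in for the PyOphidia analytics and the TensorFlow detection models. All function names, the 30 °C threshold and the 3-day minimum duration are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def simulate_member(seed, n_days=90):
    """Hypothetical stand-in for one ESM run: synthetic daily maximum
    temperatures in degrees Celsius, reproducible for a given seed."""
    rng = random.Random(seed)
    return [25.0 + rng.gauss(0.0, 4.0) for _ in range(n_days)]


def detect_heat_waves(temps, threshold=30.0, min_days=3):
    """Return (start, end) day indices, end exclusive, of every run of at
    least min_days consecutive days strictly above threshold.
    Assumed rule for illustration, not the detection model of the abstract."""
    events = []
    start = None
    for day, temp in enumerate(temps):
        if temp > threshold:
            if start is None:
                start = day  # a hot spell begins
        else:
            if start is not None and day - start >= min_days:
                events.append((start, day))
            start = None
    # Close a spell that lasts until the final day of the series.
    if start is not None and len(temps) - start >= min_days:
        events.append((start, len(temps)))
    return events


def process_member(seed):
    """One pipeline: simulation followed by post-processing."""
    return detect_heat_waves(simulate_member(seed))


def run_workflow(seeds):
    """Run one pipeline per ensemble member in parallel, then synchronize
    by collecting every result into a single dictionary."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process_member, seeds))
    return dict(zip(seeds, results))


if __name__ == "__main__":
    for member, events in run_workflow([1, 2, 3]).items():
        print(f"member {member}: {len(events)} heat wave(s) at {events}")
```

In a task-based framework such as PyCOMPSs, the functions would instead be declared as tasks and scheduled by the runtime across the available resources; the sketch only conveys the shape of the dependency graph.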

How to cite: Elia, D., Scardigno, S., D'Anca, A., Accarino, G., Ejarque, J., Immorlano, F., Peano, D., Scoccimarro, E., Badia, R. M., and Aloisio, G.: Convergence of HPC, Big Data and Machine Learning for Earth System workflows, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6857, 2023.
