EGU26-18841, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-18841
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Monday, 04 May, 14:00–15:45 (CEST), Display time Monday, 04 May, 14:00–18:00
Hall X4, X4.118
Efficient large-scale data structuring to support Earth System Science analytics workflows
Donatello Elia, Gabriele Tramonte, Cosimo Palazzo, Valentina Scardigno, and Paola Nassisi
  • CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy

The amount of data produced by Earth System Models (ESMs) is continuously growing, driven by their increasing resolution and complexity. Efficient approaches to data access, management and analysis are therefore needed more than ever to tackle the challenges posed by these large volumes. Moreover, the data generated by ESM simulations may be organized in a way that is not the most effective for data analytics, which slows down scientists’ productivity. In this context, novel data formats and proper chunking strategies can significantly speed up access to and processing of Earth system data and, in turn, the whole analysis workflow.
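Why chunking matters can be illustrated with a simplified cost model: on a regular chunk grid, a reader has to load every chunk that intersects the requested hyperslab. The sketch below assumes that every intersected chunk is read whole and ignores compression, caching and I/O parallelism; the (time, lat, lon) dimensions and the two chunk layouts are hypothetical and not taken from the evaluation described here.

```python
import math
from typing import Sequence


def chunks_touched(start: Sequence[int], stop: Sequence[int],
                   chunks: Sequence[int]) -> int:
    """Number of chunks that intersect the hyperslab [start, stop).

    Simplified model (an assumption): a regular chunk grid aligned at index 0
    in every dimension, where each intersected chunk must be read whole.
    """
    total = 1
    for lo, hi, c in zip(start, stop, chunks):
        # Index of the last chunk minus index of the first chunk, plus one.
        total *= (hi - 1) // c - lo // c + 1
    return total


def bytes_read(start: Sequence[int], stop: Sequence[int],
               chunks: Sequence[int], itemsize: int = 4) -> int:
    """Uncompressed bytes loaded under the same model (edge chunks counted as full)."""
    return chunks_touched(start, stop, chunks) * math.prod(chunks) * itemsize


# Hypothetical daily float32 field, 10 years on a 1-degree grid:
# (time, lat, lon) = (3650, 180, 360).
map_chunks = (1, 180, 360)      # one chunk per time step
series_chunks = (3650, 10, 10)  # whole time axis, 10x10 spatial tiles

point_series = ((0, 90, 180), (3650, 91, 181))  # one grid cell, all time steps
one_map = ((0, 0, 0), (1, 180, 360))            # all grid cells, one time step

for name, c in (("map", map_chunks), ("series", series_chunks)):
    print(name, chunks_touched(*point_series, c), chunks_touched(*one_map, c),
          bytes_read(*point_series, c))
```

Under this model, the map-oriented layout reads 3650 chunks (about 946 MB uncompressed) to extract a single grid-point time series that holds only 14.6 kB of useful data, whereas the time-series layout needs a single chunk. For a single map the situation is reversed (1 versus 648 chunks), which is why no one chunking suits every analytics operation.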

In the scope of ESiWACE3 (Centre of Excellence in Simulation of Weather and Climate in Europe), we investigated the impact of different data formats and chunking configurations on high-performance data analytics operations and workflows. In particular, we evaluated the performance of the well-known NetCDF format and of the more recent cloud-native Zarr format, which is increasingly used in Earth Science data analytics workflows and machine learning applications. Results show that using a proper data format and structure can noticeably reduce the time required to execute these analytics workflows, provided that the data structure (e.g., chunking) is carefully tuned.

This work presents the main outcomes of this evaluation and how we are exploiting this knowledge to enhance Earth system data management workflows. In particular, the results have contributed to more efficient access, delivery and analysis of large-scale data in CMCC’s tools and services, which are involved in different initiatives, including the ICSC (National Centre on High Performance Computing, Big Data and Quantum Computing).

How to cite: Elia, D., Tramonte, G., Palazzo, C., Scardigno, V., and Nassisi, P.: Efficient large-scale data structuring to support Earth System Science analytics workflows, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18841, https://doi.org/10.5194/egusphere-egu26-18841, 2026.