EGU26-9928, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9928
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 04 May, 17:15–17:25 (CEST)
 
Room 2.24
Research data infrastructure evolution for handling km scale simulations of a warming world
Kameswarrao Modali1, Karsten Peters-von Gehlen1, Fabian Wachsmann1, Florian Ziemen1, Carsten Hinz2, Rajveer Saini2, and Siddhant Tibrewal3
  • 1Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany (modali@dkrz.de)
  • 2Jülich Supercomputing Center (JSC), Jülich, Germany
  • 3Max Planck Institute for Meteorology (MPI-M), Hamburg, Germany

With advancing technical capabilities, Earth System Models (ESMs) are rapidly moving toward much higher spatial resolutions, down to the kilometer scale, to better capture key processes and feedbacks needed for robust climate impact assessments. This growing model complexity places significant demands on data infrastructures, which must evolve to support the widespread application of high-resolution simulations.

This evolution is needed at every stage of the ESM simulation data life cycle: the choice of variables to include in the simulation output, the output format, the residence time and transfer of the data across the active storage tiers, and the final move to the cold storage tier (tape) for long-term archival. Tools that make these data discoverable must also be developed and deployed. The evolving infrastructure must take hardware constraints into account and should ideally follow the FAIR principles.

As part of the Warm World Easier project, these developments comprised: adapting the model output to Zarr, a cloud-native format; developing bespoke tools such as ‘zarranalyzer’, which handle the movement of data across storage tiers by creating tarballs that are also suitable for tape; creating reference files in Parquet format for these tarballs to summarize the entire dataset; and ingesting these into a metadata catalog that follows the SpatioTemporal Asset Catalog (STAC) standard. Finally, a virtual machine was set up to host the STAC catalog, with appropriate access rights for the data providers and data curators within the federated structure as well as for the end users.
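The tarball-plus-reference idea can be illustrated with a minimal Python sketch using only the standard library. It is not the ‘zarranalyzer’ tool, whose internals are not described here: the function names (`pack_store`, `build_references`, `read_chunk`, `stac_item`), the record layout, and the asset key are all hypothetical, and the references are kept as plain Python records rather than written to Parquet. The sketch assumes a Zarr store laid out as a directory of files and an uncompressed tarball, so that each chunk occupies one contiguous byte range.

```python
import os
import tarfile


def pack_store(store_dir, tar_path):
    """Pack every file of a directory-based store into an uncompressed tarball."""
    with tarfile.open(tar_path, "w") as tar:
        for root, _dirs, files in os.walk(store_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Use the store-relative path as the key, e.g. "tas/0.0".
                key = os.path.relpath(full, store_dir).replace(os.sep, "/")
                tar.add(full, arcname=key)


def build_references(tar_path):
    """Return one record per packed file: key, tarball, byte offset, length."""
    refs = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                refs.append({
                    "key": member.name,
                    "tar": tar_path,
                    "offset": member.offset_data,  # where the file's bytes start
                    "length": member.size,
                })
    return refs


def read_chunk(ref):
    """Read one file's bytes straight from the tarball without unpacking it."""
    with open(ref["tar"], "rb") as f:
        f.seek(ref["offset"])
        return f.read(ref["length"])


def stac_item(item_id, refs_href, start, end):
    """Build a minimal STAC Item (as a dict) that points to a reference file.

    The asset key "references" is an illustrative choice, not part of STAC.
    """
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # no footprint in this sketch
        "properties": {
            "datetime": None,  # allowed when start/end datetimes are given
            "start_datetime": start,
            "end_datetime": end,
        },
        "links": [],
        "assets": {"references": {"href": refs_href, "title": "Reference file"}},
    }
```

With such records, a reader that knows a chunk's key can fetch its bytes by offset and length from the tarball, and the catalog entry tells users where the reference file lives.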

Applying this data handling concept to km-scale ESM data bridges the gap between infrastructures that produce flagship datasets and those that enable their efficient and reliable reuse by the community. For example, data generated at large, compute-focused HPC centers with limited storage could be transferred to partner centers that provide specialized data services for long-term access and reuse. 

Through the federated and seamless setup of the research data infrastructure, data handling is abstracted away from the data users. The developed setup thus provides an end-to-end solution that makes km-scale ESM simulation output available to a broader scientific community tackling the urgent societal problems arising from a warming planet.

How to cite: Modali, K., Peters-von Gehlen, K., Wachsmann, F., Ziemen, F., Hinz, C., Saini, R., and Tibrewal, S.: Research data infrastructure evolution for handling km scale simulations of a warming world, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9928, https://doi.org/10.5194/egusphere-egu26-9928, 2026.