EGU25-10288, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-10288
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
STAC for federated data access to high-volume ESM datasets in preparation for Exascale
Kameswar Rao Modali1, Karsten Peters-von Gehlen1, Florian Ziemen1, Carsten Hinz2, Rajveer Saini2, and Martin Schultz2
  • 1Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
  • 2Jülich Supercomputing Center (JSC), Jülich, Germany

Thanks to advanced modeling capabilities and increased computational power, certain Earth system models (ESMs) can now perform simulations at extremely high resolutions, approaching 1 km. The data from these simulations drive many downstream scientific research applications as well as decision-making tools that support policy making. These applications in turn depend on shared or standalone computational resources at HPC infrastructures. As a result, the design of the federated data access system must revolve around a triad comprising:

  • Data

  • Analysis Tools

  • Computing resources

Further, the format of the data produced at each HPC infrastructure varies with the Earth system model. In addition, each center has its own combination of storage tiers, each of which is subject to specific hardware constraints. Data usage patterns also differ with the focus of the scientific research. Hence, the organization of the data at each data center for efficient discoverability within the federated data access system should account for:

  • Technicalities of the data (format, size, file count, etc.)

  • Usage pattern of the data

  • Constraints arising from the specifications and limitations of the storage tiers

SpatioTemporal Asset Catalogs (STAC) fundamentally support the discovery of data associated with a specific geographic location and a particular time instant or period. ESM data are a natural fit for such a representation. In the present work, we provide an overview of the application of STAC to federated data access within the WarmWorld project at the DKRZ and JSC HPC centers. We explain how each of the aforementioned factors has been addressed at each data center and demonstrate concrete benefits for data producers and reusers.
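
As an illustration of why ESM output maps naturally onto STAC, the following minimal sketch builds a STAC Item (a GeoJSON Feature with a bounding box and a time range) as a plain Python dictionary and checks whether it matches a space-time query. It is not the catalog layout used in the project: the item id, the asset URL, the media type string, and the `example:storage_tier` property are hypothetical placeholders chosen for this example, and the helper names `make_stac_item` and `item_matches` are assumptions as well.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, start, end, href, media_type, extra_properties=None):
    """Build a minimal STAC Item as a plain dict (no external libraries)."""
    west, south, east, north = bbox
    properties = {
        # STAC allows "datetime" to be null when a start/end range is given.
        "datetime": None,
        "start_datetime": start.isoformat(),
        "end_datetime": end.isoformat(),
    }
    properties.update(extra_properties or {})
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": properties,
        "links": [],
        "assets": {"data": {"href": href, "type": media_type, "roles": ["data"]}},
    }


def item_matches(item, bbox, start, end):
    """Return True if the item overlaps the query bounding box and time range."""
    i_west, i_south, i_east, i_north = item["bbox"]
    q_west, q_south, q_east, q_north = bbox
    overlaps_space = (
        i_west <= q_east and q_west <= i_east
        and i_south <= q_north and q_south <= i_north
    )
    i_start = datetime.fromisoformat(item["properties"]["start_datetime"])
    i_end = datetime.fromisoformat(item["properties"]["end_datetime"])
    overlaps_time = i_start <= end and start <= i_end
    return overlaps_space and overlaps_time


# Hypothetical global dataset covering the year 2020.
item = make_stac_item(
    item_id="example-run-2020",  # placeholder id
    bbox=(-180.0, -90.0, 180.0, 90.0),
    start=datetime(2020, 1, 1, tzinfo=timezone.utc),
    end=datetime(2020, 12, 31, tzinfo=timezone.utc),
    href="https://example.org/data/example-run-2020.zarr",  # placeholder URL
    media_type="application/vnd+zarr",  # assumed media type string
    extra_properties={"example:storage_tier": "disk"},  # hypothetical custom field
)

# Query: a regional box in June 2020.
query_box = (5.0, 47.0, 15.0, 55.0)
june_start = datetime(2020, 6, 1, tzinfo=timezone.utc)
june_end = datetime(2020, 6, 30, tzinfo=timezone.utc)

if item_matches(item, query_box, june_start, june_end):
    print(json.dumps(item["assets"], indent=2))
```

In a federated setting, the same kind of space-time filter would be applied by each center's catalog, so that a user only needs the returned asset links to locate the data, regardless of the format or storage tier behind them.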

How to cite: Modali, K. R., Peters-von Gehlen, K., Ziemen, F., Hinz, C., Saini, R., and Schultz, M.: STAC for federated data access to high-volume ESM datasets in preparation for Exascale, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10288, https://doi.org/10.5194/egusphere-egu25-10288, 2025.