- 1 Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
- 2 Jülich Supercomputing Center (JSC), Jülich, Germany
Owing to advanced modeling capabilities and increased computational power, some Earth system models (ESMs) can now run simulations at extremely high resolutions, approaching 1 km. The data from these simulations drive many downstream scientific research applications as well as decision-making tools that support policy making. These applications in turn depend on shared or standalone computational resources at HPC infrastructures. As a result, the design of a federated data access system must revolve around a triad comprising:
- Data
- Analysis tools
- Computing resources
Further, the format of the data produced at each HPC infrastructure varies with the Earth system model. Each center also has its own combination of storage tiers, each of which is subject to specific hardware constraints, and data usage patterns differ with the focus of the scientific research. Hence, the organization of the data at each data center, for efficient discoverability within the federated data access system, should cater to:
- Technicalities of the data (format, size, file count, etc.)
- Usage pattern of the data
- Constraints arising from the specifications and limitations of the storage tiers
SpatioTemporal Asset Catalogs (STAC) fundamentally support the discovery of data associated with a specific geographic location and a particular time instant or duration. ESM data are a natural fit for such a representation. In the present work, we provide an overview of the application of STAC to federated data access within the WarmWorld project at the DKRZ and JSC HPC centers. We explain how each of the aforementioned factors has been addressed at each data center and demonstrate concrete benefits for data producers and reusers.
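To illustrate the kind of discovery STAC enables, the following is a minimal sketch, not the catalog implementation used in the project. It builds two STAC-Item-like records as plain Python dictionaries and filters them by bounding box and time range. The item IDs, the `site` property, the `example.org` URLs, the media types, and the helper names `make_item` and `search` are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone


def make_item(item_id, bbox, start, end, href, media_type, site):
    """Build a minimal STAC-Item-like dict (illustrative, not schema-validated)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {
            # For data covering a time range, "datetime" may be null
            # when start_datetime and end_datetime are both given.
            "datetime": None,
            "start_datetime": start.isoformat(),
            "end_datetime": end.isoformat(),
            "site": site,  # hypothetical custom property naming the hosting center
        },
        "links": [],
        "assets": {"data": {"href": href, "type": media_type}},
    }


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return IDs of items whose footprint and time range overlap the query."""
    hits = []
    for item in items:
        props = item["properties"]
        item_start = datetime.fromisoformat(props["start_datetime"])
        item_end = datetime.fromisoformat(props["end_datetime"])
        if (bbox_intersects(item["bbox"], bbox)
                and item_start <= end and start <= item_end):
            hits.append(item["id"])
    return hits


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Two hypothetical datasets hosted at different centers.
ITEMS = [
    make_item("global-run-2020", (-180.0, -90.0, 180.0, 90.0),
              utc(2020, 1, 1), utc(2020, 12, 31),
              "https://example.org/dkrz/global-run-2020.zarr",
              "application/vnd+zarr", "DKRZ"),
    make_item("regional-run-2021", (5.0, 47.0, 15.0, 55.0),
              utc(2021, 1, 1), utc(2021, 12, 31),
              "https://example.org/jsc/regional-run-2021.nc",
              "application/x-netcdf", "JSC"),
]

# Query: a small box in northern Germany during June 2020.
print(search(ITEMS, (9.0, 53.0, 11.0, 54.0), utc(2020, 6, 1), utc(2020, 6, 30)))
```

Because each record carries its own footprint, time range, and asset link, a query of this form can be answered from metadata alone, without opening the underlying files on either center's storage tiers.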
How to cite: Modali, K. R., Peters-von Gehlen, K., Ziemen, F., Hinz, C., Saini, R., and Schultz, M.: STAC for federated data access to high-volume ESM datasets in preparation for Exascale, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10288, https://doi.org/10.5194/egusphere-egu25-10288, 2025.