EGU26-9202, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9202
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 08 May, 16:15–18:00 (CEST), Display time Friday, 08 May, 14:00–18:00
 
Hall X4, X4.98
PID-Driven Global Access to Flagship km-scale Climate Simulation Data
Karsten Peters-von Gehlen¹, Kameswar Rao Modali², Florian Ziemen², Martin Bergemann², Christopher Kadow², Karl-Hermann Wieners³, Siddhant Tibrewal³, Ivonne Anders², Katharina Berger², Tobias Kölling³, Lukas Kluft³, Marco Kulüke², and Fabian Wachsmann²
  • ¹Deutsches Klimarechenzentrum GmbH (DKRZ), Datamanagement, Hamburg, Germany (peters@dkrz.de)
  • ²Deutsches Klimarechenzentrum GmbH (DKRZ), Datamanagement, Hamburg, Germany
  • ³Max Planck Institute for Meteorology, Hamburg, Germany

The climate science enterprise both produces and depends on extremely large datasets to meet the needs of diverse scientific and downstream user communities. As climate models are increasingly run at kilometre-scale resolution, data volumes grow rapidly and place ever greater demands on data-handling infrastructures. Individual flagship simulations are no longer used by a single research group but are routinely reused by dozens or even hundreds of researchers worldwide. Consequently, the data must be easy to find, access and reuse, their provenance must be transparent, and the full heritage of the simulation data should be preserved in a machine-actionable manner to ensure scientific rigour, explainability and reproducibility.

In this contribution, we present a conceptual, infrastructure-level approach, developed within the WarmWorld project, that leverages the versatility of globally unique persistent identifiers (PIDs) to address these challenges. Specifically, we illustrate that by assigning handles to simulation datasets at the point of production, simulation data stored locally at an HPC data centre can become part of a globally interoperable data ecosystem. In our concept, each handle profile contains a URL at which the dataset can be opened. In addition, machine-actionable metadata, such as detailed provenance information describing the model configuration used, or the data reuse license and citation, would be available from the handle landing page. The motivation behind our approach is thus akin to that of the FAIR Digital Object (FDO) specifications.
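To make the idea of a handle profile more concrete, the following is a minimal sketch, assuming the public Handle.net REST proxy (`https://hdl.handle.net/api/handles/<handle>`), which returns a JSON record whose `values` list holds typed entries. The handle prefix `21.T12345`, the example URLs and the `PROVENANCE` and `LICENSE` types are hypothetical placeholders, not the actual WarmWorld profile. The function `profile_from_record` simply collapses such a record into a type-to-value mapping, from which a client could read the URL at which the dataset can be opened.

```python
import json
import urllib.request

# Public proxy of the Handle.net REST API (real endpoint; requires network).
HANDLE_API = "https://hdl.handle.net/api/handles/"


def fetch_handle_record(handle: str, timeout: float = 10.0) -> dict:
    """Download the JSON record of a handle, e.g. '21.T12345/example'."""
    with urllib.request.urlopen(HANDLE_API + handle, timeout=timeout) as resp:
        return json.load(resp)


def profile_from_record(record: dict) -> dict:
    """Collapse the typed 'values' list of a handle record into {type: value}.

    Only string-valued entries are kept; administrative entries such as
    HS_ADMIN use other formats and are skipped. If a type occurs more than
    once, the entry with the highest index wins.
    """
    profile = {}
    for entry in sorted(record.get("values", []), key=lambda e: e.get("index", 0)):
        data = entry.get("data", {})
        if data.get("format") == "string":
            profile[entry["type"]] = data["value"]
    return profile


# Hypothetical record: the prefix, suffix, URLs and the PROVENANCE/LICENSE
# types are illustrative only, not the WarmWorld handle profile.
EXAMPLE_RECORD = {
    "responseCode": 1,
    "handle": "21.T12345/warmworld-example",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "https://data.example.org/warmworld/example"}},
        {"index": 2, "type": "PROVENANCE",
         "data": {"format": "string",
                  "value": "https://data.example.org/warmworld/example/provenance.json"}},
        {"index": 3, "type": "LICENSE",
         "data": {"format": "string", "value": "CC-BY-4.0"}},
        {"index": 100, "type": "HS_ADMIN",
         "data": {"format": "admin",
                  "value": {"handle": "0.NA/21.T12345", "index": 200}}},
    ],
}

if __name__ == "__main__":
    profile = profile_from_record(EXAMPLE_RECORD)
    print(profile["URL"])  # the location a client would open
```

Keeping the parsing separate from the network call means the same logic could be reused whether the record comes from the global proxy or from a local handle server.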

Finalized simulation datasets would be exposed through globally accessible SpatioTemporal Asset Catalogs (STAC), in which PIDs serve as the authoritative entry point for discovery and access. Data access would be handled by system libraries that resolve storage locations across heterogeneous storage tiers. Crucially, data access is to be globally open and require no credentials, reflecting a strong demand from the climate research community that was clearly demonstrated during the WCRP kilometre-scale hackathon (May 2025).
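One way a PID could serve as the entry point in a STAC catalog is sketched below as a minimal STAC 1.0.0 Item built from plain dictionaries. It is an assumption for illustration, not the WarmWorld catalog layout: the item id, handle and title are hypothetical, and pointing the data asset at the handle resolver URL (plus a `cite-as` link, the link relation defined in RFC 8574) is one possible design. Because a client following `https://hdl.handle.net/<handle>` is redirected to the handle's current URL value, the catalog entry need not change when data move between storage tiers; only the handle record is updated.

```python
import json
from datetime import datetime, timezone

# Handle.net proxy: prefixing a handle with this URL makes it resolvable by any client.
HANDLE_RESOLVER = "https://hdl.handle.net/"


def stac_item_for_dataset(item_id: str, handle: str, start: datetime,
                          end: datetime, title: str) -> dict:
    """Build a minimal STAC 1.0.0 Item whose data asset is reached via a handle."""
    pid_url = HANDLE_RESOLVER + handle
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        # Global simulation output: the footprint is the whole globe.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-180, -90], [180, -90], [180, 90],
                             [-180, 90], [-180, -90]]],
        },
        "bbox": [-180, -90, 180, 90],
        "properties": {
            # For a time range, STAC allows datetime = null when the
            # start and end are given explicitly.
            "datetime": None,
            "start_datetime": start.isoformat(),
            "end_datetime": end.isoformat(),
            "title": title,
        },
        # 'cite-as' (RFC 8574) marks the handle as the identifier to cite.
        "links": [{"rel": "cite-as", "href": pid_url}],
        "assets": {
            # The asset href is the PID itself, not a storage path.
            "data": {"href": pid_url, "roles": ["data"]},
        },
    }


if __name__ == "__main__":
    item = stac_item_for_dataset(
        "warmworld-example",                  # hypothetical item id
        "21.T12345/warmworld-example",        # hypothetical handle
        datetime(2020, 1, 1, tzinfo=timezone.utc),
        datetime(2020, 12, 31, tzinfo=timezone.utc),
        "Example km-scale simulation (hypothetical)",
    )
    print(json.dumps(item, indent=2))
```

Using plain dictionaries keeps the sketch dependency-free; a real catalog would more likely be generated and validated with dedicated STAC tooling.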

Systematically assigning handles to locally stored datasets, and pragmatically leveraging them, can thus enable scalable and interoperable access to flagship climate datasets across infrastructures and communities. This effectively integrates traditionally closed HPC data environments into the global data space and connects them with other large-scale data holdings.

How to cite: Peters-von Gehlen, K., Modali, K. R., Ziemen, F., Bergemann, M., Kadow, C., Wieners, K.-H., Tibrewal, S., Anders, I., Berger, K., Kölling, T., Kluft, L., Kulüke, M., and Wachsmann, F.: PID-Driven Global Access to Flagship km-scale Climate Simulation Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9202, https://doi.org/10.5194/egusphere-egu26-9202, 2026.