- 1 Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre (JSC), Jülich, Germany (c.hinz@fz-juelich.de)
- 2 Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre (JSC), Jülich, Germany
- 3 Deutsches Klimarechenzentrum GmbH (DKRZ), Datamanagement, Hamburg, Germany
- 4 University of Cologne, Department of Mathematics and Computer Science, Cologne, Germany
Jülich Supercomputing Centre (JSC) is forging a public dataspace for Earth system data. Data will be made available on both storage clusters at JSC, ExaStore and Jülich Storage Cluster (JUST), which provide petabyte-scale storage to the exascale system JUPITER and our pre-exascale systems, respectively. We provide insights into the ongoing implementation of new data management services and the tools selected for data access. This also covers the creation of a metadata catalog based on the SpatioTemporal Asset Catalog (STAC) specification.
Background:
Improvements in computational speed lead to better simulations in Earth System Modeling (ESM), by allowing them to resolve scales of a few kilometers. The volume of the resulting data greatly increases with the improvements in resolution and poses challenges for data processing and storage.
Training machine learning (ML) models for weather and climate applications is an increasingly popular use case in ESM. These models require fast access to datasets, which is supported by a dedicated structure within the datasets, with anemoi-zarr being a prominent file structure.
Numerical and ML applications demand easy and FAIR access to datasets. To simplify subsequent data processing and analysis, data must be accessible without creating individual local copies, either through shared storage or over the web.
JSC is a multipurpose high-performance computing (HPC) center with ESM being a major user group. With Europe's first exascale system JUPITER, JSC has become the host for a second HPC infrastructure, including the dedicated storage cluster ExaStore. ExaStore is designed to provide the high bandwidth, low latency, and scalability required to efficiently support data-intensive workloads on JUPITER.
Jülich MeteoCloud is a central data repository for meteorological data on JUST, which is accessible from our pre-exascale systems, such as JUWELS and JURECA-DC. It covers a wide range of datasets, from reanalysis data to satellite observations, and currently holds about 4 PB in total. With the extension to ExaStore, we introduce a new branch for ML-ready datasets. The limited overall storage capacity at JSC calls for a reduction of data duplicates, in particular across project data spaces, and requires services for data movement and for on-demand staging of ML-ready datasets.
Within the WarmWorld Easier project, JSC and the German Climate Computing Center (DKRZ) co-develop and deploy services for data access. A core aspect is the findability of data, which is ensured with STAC. Each asset provides the information needed to open the dataset described by the particular catalog entry in a specific way, for example a file path for access from disk or a URL for access through a web service.
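The role of assets can be illustrated with a minimal sketch. It assumes a STAC-like item, written as a plain Python dictionary, that carries one asset per access route. The item ID, asset keys, file path, URL, and the helper `select_href` are hypothetical placeholders chosen for illustration and do not describe the actual catalog at JSC.

```python
from urllib.parse import urlparse

# Hypothetical STAC-like item: all identifiers, paths, and URLs are placeholders.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-dataset",
    "assets": {
        # Asset for access from disk on shared storage (assumed path).
        "disk": {
            "href": "/path/to/example-dataset.zarr",
            "title": "Access from shared storage",
            "roles": ["data"],
        },
        # Asset for access through a web service (assumed URL).
        "web": {
            "href": "https://data.example.org/example-dataset.zarr",
            "title": "Access through a web service",
            "roles": ["data"],
        },
    },
}


def is_web_href(href: str) -> bool:
    """Return True if the href is an HTTP(S) URL rather than a file path."""
    return urlparse(href).scheme in ("http", "https")


def select_href(item: dict, on_cluster: bool) -> str:
    """Pick the asset href matching the access route.

    on_cluster=True selects the file path, on_cluster=False the web URL.
    """
    for asset in item["assets"].values():
        if is_web_href(asset["href"]) != on_cluster:
            return asset["href"]
    raise LookupError("no asset available for the requested access route")


print(select_href(item, on_cluster=True))
print(select_href(item, on_cluster=False))
```

In this sketch, a user working on the HPC systems would obtain the file path, while an external user would obtain the URL, both from the same catalog entry.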
With a combination of these approaches we will improve the infrastructure for Earth system sciences at JSC and provide reliable, low-latency access to stored datasets. As a first use case we will include ML-ready datasets for the WeatherGenerator project in the MeteoCloud.
How to cite: Hinz, C., Grießbach, S., Hoffmann, L., Kreshpa, E., Modali, K., Peters-von Gehlen, K., Rushchanskii, K., Saini, R., Stein, O., and Schultz, M.: Creation of a Public Dataspace for Earth System Data at Jülich Supercomputing Centre, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19486, https://doi.org/10.5194/egusphere-egu26-19486, 2026.