EGU26-9640, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9640
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 08:30–10:15 (CEST), Display time Tuesday, 05 May, 08:30–12:30
 
Hall X4, X4.85
A Service-Oriented Distributed Zarr Solution for Climate Data Access Across Heterogeneous HPC and Storage Infrastructures
Mostafa Hadizadeh, Martin Bergemann, Etor Lucio Eceiza, Andrej Fast, and Christopher Kadow
  • German Climate Computing Centre (DKRZ), Hamburg, Germany

Modern climate archives are increasingly distributed across heterogeneous storage systems, while analysis workflows are becoming more interactive, distributed, and cloud-native. Moreover, many high-performance computing (HPC) centres host large climate datasets on traditional file-based storage infrastructures, whereas computational resources are often located at different sites. This separation between data location and compute resources creates significant barriers to efficient, interactive, and scalable data access.

This situation calls for climate data access services that are scalable, flexible, and independent of specific client-side environments, while supporting common climate data formats such as NetCDF, GeoTIFF, Zarr, HDF5, and GRIB. Yet efficient remote access to large and heterogeneous climate archives remains a major bottleneck for modern scientific workflows.

We present a service, the Freva Data Loader, which implements the logic required to open datasets from diverse storage backends and expose them as Zarr chunks through a lightweight, web-friendly REST interface with modern authentication mechanisms.
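As a rough illustration of this request pattern, the snippet below constructs an authenticated request for a Zarr endpoint. The host, path, query parameters, and token placeholder are hypothetical assumptions for the sketch, not the documented Freva Data Loader API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters -- illustrative only, not the
# actual Freva Data Loader API.
base_url = "https://freva.example.org/api/data-loader"
params = {"dataset": "cmip6.mpi-esm1-2-hr.tas", "format": "zarr"}

# The service would resolve this request to a Zarr-compatible endpoint
# for the named dataset.
request_url = f"{base_url}/zarr?{urlencode(params)}"

# Access is authenticated via an OAuth2 bearer token (placeholder here).
headers = {"Authorization": "Bearer <oauth2-access-token>"}
```

The dataset identifier and query layout are placeholders; the point is that a plain HTTPS request plus a standard OAuth2 bearer token is all a client needs.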

The Freva Data Loader is implemented as a stateless worker service exposing a REST interface for dataset access and Zarr endpoint generation. Upon receiving an authenticated request, the service resolves dataset metadata, opens the underlying data from the appropriate storage backend (e.g. POSIX file systems or object storage), and exposes the data as a Zarr-compatible, chunked stream. Authentication and authorisation are handled centrally using OAuth2, ensuring secure and controlled access across institutional boundaries. Requests are coordinated by a Loader component and distributed to worker instances via a message broker (Redis), enabling asynchronous execution and horizontal scalability.
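To make the chunked-stream idea concrete, the following sketch shows how a worker might translate a Zarr chunk key from a request path into a slice of the underlying array. The key layout follows standard Zarr v2 conventions; the function name and the mapping itself are assumptions about the service internals, not its actual code.

```python
def chunk_key_to_slices(chunk_key: str, chunk_shape: tuple) -> tuple:
    """Map a Zarr v2 chunk key such as "3.0.1" to index slices into
    the source array, one slice per dimension."""
    indices = [int(part) for part in chunk_key.split(".")]
    return tuple(
        slice(i * size, (i + 1) * size)
        for i, size in zip(indices, chunk_shape)
    )

# A variable chunked as (1, 96, 192) over (time, lat, lon): chunk key
# "3.0.1" addresses time step 3, the first lat block, the second lon block.
slices = chunk_key_to_slices("3.0.1", (1, 96, 192))
# -> (slice(3, 4), slice(0, 96), slice(192, 384))
```

Because each chunk request is independent and carries all the information needed to locate the data, workers can stay stateless and be scaled horizontally behind the broker.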

The service decouples data access from client-side tooling and enables users and applications to access data stored on traditional POSIX HPC file systems, tape archives, and cloud-based object storage through a unified Zarr interface. Instead of transferring complete files between data centres or downloading them in full, clients retrieve only the required data chunks on demand. Users and client applications can request chunked array access over the network and process data incrementally, supporting interactive exploration and scalable downstream computation using cloud-native, chunked storage semantics, while remaining compatible with existing analysis stacks based on Zarr, xarray, and Dask.

How to cite: Hadizadeh, M., Bergemann, M., Lucio Eceiza, E., Fast, A., and Kadow, C.: A Service-Oriented Distributed Zarr Solution for Climate Data Access Across Heterogeneous HPC and Storage Infrastructures, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9640, https://doi.org/10.5194/egusphere-egu26-9640, 2026.