Zarr at scale: virtualization, sharding, and performance optimizations for Earth science data

Max Jones; Joe Hamman; Davis Bennett; Kyle Barron; Justus Magin

doi:https://doi.org/10.5194/egusphere-egu26-15196

EGU26-15196, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Max Jones¹, Joe Hamman², Davis Bennett, Kyle Barron¹, and Justus Magin³

¹Development Seed, Washington, D.C., United States of America (max@developmentseed.org)
²Earthmover
³LOPS - Laboratoire d'Oceanographie Physique et Spatiale, UMR 6523 CNRS-IFREMER-IRD-Univ.Brest-IUEM

As geoscientific datasets continue to grow in size and complexity, the Zarr community has developed a modern, open-source solution for storage and I/O of multi-dimensional arrays and metadata. Zarr offers a high-performance, highly scalable, cloud-native container for scientific data, which allows scientists to transcend the constraints of individual files and think in terms of coherent datasets. Zarr’s potential has led to widespread adoption across government, industry, and academia. In this presentation, we offer practical guidance on using recent features of the Zarr ecosystem, including:

- Sharding to reduce the number of files, benefiting HPC users in particular
- Virtualization via VirtualiZarr and Icechunk to enable high-performance access to data spread across NetCDF4/HDF5, GRIB, or GeoTIFF files
- Custom data types, compression schemes, and variable chunk grids
- Client-side (i.e., in-browser) rendering of large multidimensional geospatial datasets
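
To make the sharding item concrete, the following is a minimal arithmetic sketch of why grouping chunks into shards reduces the number of stored files. It is not taken from the presentation: the array, chunk, and shard shapes are hypothetical, the helper `count_objects` is an illustrative name, and metadata files are ignored. It assumes only that each shard is stored as one object holding a whole number of chunks along every axis.

```python
import math

def count_objects(array_shape, unit_shape):
    """Count the storage objects needed when an array is split into units of unit_shape."""
    return math.prod(math.ceil(n / u) for n, u in zip(array_shape, unit_shape))

# Hypothetical global grid with dimensions (time, lat, lon).
array_shape = (3600, 720, 1440)
chunk_shape = (10, 180, 360)     # chunks: the unit that is read and decompressed
shard_shape = (360, 720, 1440)   # shards: the unit that is written to storage

# Assumption: a shard must contain a whole number of chunks along each axis.
assert all(s % c == 0 for s, c in zip(shard_shape, chunk_shape))

unsharded = count_objects(array_shape, chunk_shape)   # one file per chunk
sharded = count_objects(array_shape, shard_shape)     # one file per shard
chunks_per_shard = count_objects(shard_shape, chunk_shape)

print(f"files without sharding: {unsharded}")
print(f"files with sharding:    {sharded}")
print(f"chunks per shard:       {chunks_per_shard}")
```

Under these assumed shapes the read granularity stays at the chunk level while the file count drops by the number of chunks per shard, which matters on HPC file systems that limit or penalize large file counts.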

Through concrete examples and best practices, we demonstrate how the Zarr ecosystem enables researchers to work with multi-terabyte datasets as seamlessly as small files.
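
The virtualization idea mentioned above can be pictured as a manifest that maps each chunk key to a byte range inside an existing file, so the data can be read chunk by chunk without being rewritten. The sketch below is a conceptual illustration only and does not use the VirtualiZarr or Icechunk APIs: the file contents, the chunk keys, the `manifest` layout, and the `read_chunk` helper are all hypothetical, and real chunks would usually be compressed and located by a format-specific parser.

```python
import os
import tempfile

# Stand-in for an existing archival file: a header followed by two chunks.
header = b"HEADER--"
chunk_a = b"A" * 12
chunk_b = b"B" * 8

with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(header + chunk_a + chunk_b)
    legacy_path = f.name

# Hypothetical manifest: chunk key -> location of bytes that already exist on disk.
manifest = {
    "temperature/c/0": {
        "path": legacy_path,
        "offset": len(header),
        "length": len(chunk_a),
    },
    "temperature/c/1": {
        "path": legacy_path,
        "offset": len(header) + len(chunk_a),
        "length": len(chunk_b),
    },
}

def read_chunk(key):
    """Fetch one chunk by byte range, leaving the source file untouched."""
    ref = manifest[key]
    with open(ref["path"], "rb") as src:
        src.seek(ref["offset"])
        return src.read(ref["length"])

first = read_chunk("temperature/c/0")
second = read_chunk("temperature/c/1")
os.remove(legacy_path)  # clean up the temporary stand-in file
```

On object storage the same lookup would translate into a ranged read request rather than a local seek, which is what lets many archival files be presented as one dataset.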

How to cite: Jones, M., Hamman, J., Bennett, D., Barron, K., and Magin, J.: Zarr at scale: virtualization, sharding, and performance optimizations for Earth science data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15196, https://doi.org/10.5194/egusphere-egu26-15196, 2026.