- ¹Development Seed, Washington, D.C., United States of America (max@developmentseed.org)
- ²Earthmover
- ³LOPS - Laboratoire d'Oceanographie Physique et Spatiale, UMR 6523 CNRS-IFREMER-IRD-Univ. Brest-IUEM
As geoscientific datasets continue to grow in size and complexity, the Zarr community has developed a modern, open-source format for storing and accessing multidimensional arrays and their metadata. Zarr provides a high-performance, highly scalable, cloud-native container for scientific data that lets scientists move beyond the constraints of individual files and think in terms of coherent datasets. These capabilities have led to widespread adoption across government, industry, and academia. In this presentation, we offer practical guidance on using the newest features of the Zarr ecosystem, including:
- Sharding, which packs many chunks into a single stored object to reduce the number of files, benefiting HPC users in particular
- Virtualization via VirtualiZarr and Icechunk, which enables high-performance access to data spread across existing NetCDF4/HDF5, GRIB, or GeoTIFF files without rewriting them
- Custom data types, compression schemes, and variable chunk grids
- Client-side (i.e., in-browser) rendering of large multidimensional geospatial datasets
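To make the file-count benefit of sharding concrete, the short Python sketch below counts stored objects for a hypothetical array with and without sharding (metadata documents are ignored). The array, chunk, and shard shapes are illustrative assumptions, not values from the presentation; the one rule borrowed from Zarr v3 sharding is that each shard must hold a whole number of inner chunks along every dimension.

```python
import math

def n_blocks(shape, block):
    """Number of blocks needed to tile `shape`; edge blocks may be partial."""
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block))

# Hypothetical array: one year of hourly data on a 720 x 1440 grid.
shape = (8760, 720, 1440)
chunks = (24, 180, 180)    # hypothetical inner chunk shape (unit of reading)
shards = (240, 720, 720)   # hypothetical shard shape (unit of storage)

# Inner chunks must tile each shard exactly.
assert all(s % c == 0 for s, c in zip(shards, chunks))

unsharded_objects = n_blocks(shape, chunks)   # one stored object per chunk
sharded_objects = n_blocks(shape, shards)     # one stored object per shard
chunks_per_shard = math.prod(s // c for s, c in zip(shards, chunks))

print(unsharded_objects, sharded_objects, chunks_per_shard)  # prints: 11680 74 160
```

Because each shard carries an index of its inner chunks' byte ranges, readers can still fetch individual chunks, so read granularity stays fine while the file count drops by more than two orders of magnitude in this example.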
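Virtualization rests on a simple idea: instead of copying data out of an archival file, a chunk manifest records where each chunk's bytes already live (file path, byte offset, and length), and readers fetch those byte ranges directly. The sketch below illustrates the idea with only the Python standard library. The toy file layout, the manifest dictionary shape, and the helper names are hypothetical and are not the VirtualiZarr or Icechunk API; real NetCDF4/HDF5 or GRIB chunks are usually compressed and would also need decoding after the read.

```python
import os
import struct
import tempfile

HEADER_BYTES = 16    # hypothetical fixed-size header in the toy "legacy" file
CHUNK_VALUES = 4     # hypothetical: each chunk holds four little-endian float64 values
CHUNK_BYTES = CHUNK_VALUES * 8

def write_legacy_file(path):
    """Write a toy binary file: a header followed by two uncompressed chunks."""
    with open(path, "wb") as f:
        f.write(b"HDR".ljust(HEADER_BYTES, b"\x00"))
        f.write(struct.pack("<4d", 1.0, 2.0, 3.0, 4.0))
        f.write(struct.pack("<4d", 5.0, 6.0, 7.0, 8.0))

def build_manifest(path, n_chunks):
    """Map each chunk key to the byte range where that chunk already lives."""
    return {
        str(i): {"path": path, "offset": HEADER_BYTES + i * CHUNK_BYTES, "length": CHUNK_BYTES}
        for i in range(n_chunks)
    }

def read_chunk(manifest, key):
    """Fetch one chunk with a single byte-range read; nothing was copied up front."""
    ref = manifest[key]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        raw = f.read(ref["length"])
    return list(struct.unpack(f"<{CHUNK_VALUES}d", raw))

with tempfile.TemporaryDirectory() as tmp:
    legacy = os.path.join(tmp, "legacy.bin")
    write_legacy_file(legacy)
    manifest = build_manifest(legacy, n_chunks=2)
    second_chunk = read_chunk(manifest, "1")

print(second_chunk)  # prints: [5.0, 6.0, 7.0, 8.0]
```

In object storage the same lookup roughly becomes an HTTP range request against the original file, which is why a virtual dataset can be opened without rewriting the archive.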
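Variable chunk grids let chunk sizes differ along an axis, for example one chunk per calendar month of daily data. Locating an element then requires the cumulative chunk boundaries instead of a single integer division. The sketch below shows that lookup along one axis; the function names and the month-based layout are illustrative assumptions, and the metadata encoding of variable chunk grids in Zarr is defined by the specification work, not by this code.

```python
from bisect import bisect_right
from itertools import accumulate

def chunk_edges(chunk_sizes):
    """Cumulative boundaries: chunk i covers [edges[i], edges[i + 1])."""
    return [0, *accumulate(chunk_sizes)]

def locate(index, edges):
    """Return (chunk number, offset within chunk) for a global index on one axis."""
    if not 0 <= index < edges[-1]:
        raise IndexError(f"index {index} outside [0, {edges[-1]})")
    chunk = bisect_right(edges, index) - 1
    return chunk, index - edges[chunk]

# Hypothetical layout: daily data for a non-leap year, one chunk per month.
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
edges = chunk_edges(month_lengths)

print(locate(0, edges))    # prints: (0, 0)   i.e. 1 January
print(locate(59, edges))   # prints: (2, 0)   i.e. 1 March
print(locate(364, edges))  # prints: (11, 30) i.e. 31 December
```

The binary search keeps each lookup logarithmic in the number of chunks, so irregular chunking costs little compared with a regular grid.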
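Client-side rendering of large arrays is practical largely because a viewer only needs the chunks that intersect the current view, not the whole dataset. A minimal sketch of that selection step, assuming a 2-D array with a regular chunk grid and a viewport that lies inside the array; the function name, array shape, and viewport are hypothetical, and fetching, decoding, and drawing the chunks in a browser are left out.

```python
import math

def visible_chunks(rows, cols, chunk_shape):
    """Grid indices of chunks overlapping the half-open window rows=[r0, r1), cols=[c0, c1).

    Assumes the window lies inside the array, so no clipping is done.
    """
    (r0, r1), (c0, c1) = rows, cols
    ch_r, ch_c = chunk_shape
    return [
        (i, j)
        for i in range(r0 // ch_r, math.ceil(r1 / ch_r))
        for j in range(c0 // ch_c, math.ceil(c1 / ch_c))
    ]

# Hypothetical 4096 x 8192 image stored in 256 x 256 chunks (16 x 32 = 512 chunks in total).
needed = visible_chunks(rows=(500, 800), cols=(1000, 1300), chunk_shape=(256, 256))
print(len(needed), needed[0], needed[-1])  # prints: 9 (1, 3) (3, 5)
```

Here the viewer requests 9 of 512 chunks, and panning or zooming only triggers requests for the chunks that newly enter the window.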
Through concrete examples and best practices, we demonstrate how the Zarr ecosystem enables researchers to work with multi-terabyte datasets as seamlessly as with small files.
How to cite: Jones, M., Hamman, J., Bennett, D., Barron, K., and Magin, J.: Zarr at scale: virtualization, sharding, and performance optimizations for Earth science data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15196, https://doi.org/10.5194/egusphere-egu26-15196, 2026.