- 1University of Oxford, Department of Physics, Oxford, United Kingdom
- 2University of Helsinki, Helsinki, Finland
- 3European Centre for Medium-Range Weather Forecasts, Bonn, Germany
- 4National Centre for Atmospheric Science, Department of Meteorology, University of Reading, Reading, United Kingdom
- 5Colorado School of Mines, Golden, Colorado, United States of America
- 6The National Center for Atmospheric Research, Boulder, Colorado, United States of America
The rapid growth of weather and climate datasets increases the pressure on data centres and hinders scientific analysis and data distribution. For example, kilometre-scale weather and climate models can generate 20 gigabytes of data per second when run operationally, making it generally infeasible to store all output unless advanced compression is applied.
To address this challenge, novel lossy compression techniques have been proposed with compression factors beyond 100x, including, more recently, so-called neural compressors, which learn compact representations of climate data. However, if applied without care, lossy compression can remove valuable information from a dataset for downstream applications that are often unknown in advance. It is therefore important to validate that the compression process does not alter scientific conclusions drawn from the data. Yet whether a compression error is tolerable is rarely well defined and is often easier for domain experts to assess.
Here, we address this challenge by presenting a benchmark suite for lossy compression of climate data (atmosphere, ocean, and land). We define datasets that can be used to train neural compressors, together with corresponding evaluation methods. Compressors must pass a set of tests for each dataset while compressing to the smallest possible file size at reasonable (de)compression speed. To ensure evaluation on a diverse set of inputs, the benchmark covers climate variables following various statistical distributions, at medium to very high resolution in time (hourly to yearly) and space (~1 km to 150 km). The evaluation tests cover single- and multi-variable compression of gridded data with stable or changing statistics, for both random data access and large archives, on medium to very large datasets.
To provide reference points for the compression levels achievable with current state-of-the-art lossy compressors, we also evaluate a set of baseline compressors (SZ3, ZFP, Real Information) on our benchmark tasks. The benchmark serves as a quality check for new compressors and as a step towards a standardization of climate data compression, aiming to make compressors with high compression factors safe to use and widely supported.
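To illustrate the kind of evaluation the baseline compressors undergo, the sketch below shows an error-bounded lossy compression round trip in the spirit of SZ3 and ZFP. It is not part of the benchmark itself: uniform quantization plus DEFLATE stands in for the real codecs, the temperature-like test field is synthetic, and all function names are hypothetical. The key checks are the ones the benchmark applies, namely that the reconstruction stays within the error tolerance and that the compression factor is measured against the raw data size.

```python
import zlib
import numpy as np

def compress_with_tolerance(field: np.ndarray, abs_tol: float) -> bytes:
    """Stand-in lossy compressor: quantize values to a uniform grid of
    width 2*abs_tol (bounding the absolute error by abs_tol), then
    apply lossless DEFLATE to the quantized integers."""
    quantized = np.round(field / (2.0 * abs_tol)).astype(np.int32)
    return zlib.compress(quantized.tobytes(), level=9)

def decompress(payload: bytes, shape: tuple, abs_tol: float) -> np.ndarray:
    """Invert the stand-in compressor: inflate, then rescale."""
    quantized = np.frombuffer(zlib.decompress(payload), dtype=np.int32)
    return quantized.reshape(shape) * (2.0 * abs_tol)

# Synthetic temperature-like field in kelvin (hypothetical test data).
rng = np.random.default_rng(0)
field = rng.normal(280.0, 5.0, size=(128, 128)).astype(np.float64)

tol = 0.1  # absolute error tolerance in K
payload = compress_with_tolerance(field, tol)
recon = decompress(payload, field.shape, tol)

# Benchmark-style checks: error bound respected, compression factor > 1.
max_err = float(np.abs(field - recon).max())
compression_factor = field.nbytes / len(payload)
print(f"max error: {max_err:.3f} K, compression factor: {compression_factor:.1f}x")
```

Real baselines such as ZFP and SZ3 replace the quantize-plus-DEFLATE step with transform- or prediction-based coding and reach much higher compression factors, but the evaluation contract is the same: a stated error bound, a verified reconstruction, and a measured size reduction.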
How to cite: Reichelt, T., Tyree, J., Kloewer, M., Dueben, P., Lawrence, B., Hammerling, D., Baker, A., Faghih-Naini, S., and Stier, P.: ClimateBenchPress: A Benchmark for Compression of Climate Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15912, https://doi.org/10.5194/egusphere-egu25-15912, 2025.