- 1Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, United States of America (runderwood@anl.gov)
- 2The University of Chicago, Chicago, United States of America
- 3University of Houston, Houston, United States of America
- 4Florida State University, Tallahassee, United States of America
As climate and weather scientists strive to improve the accuracy of their models and their understanding of our world, weather and climate models have increased in resolution to the kilometer scale and grown more complex, increasing their demands for data storage. A recent SCREAM run at 3.5 km resolution produced nearly 4.5 TB of data per simulated day, and the recent CMIP6 simulations produced nearly 28 PB of data. At the same time, storage and power capacity at facilities conducting climate experiments are not increasing at the same rate as the volume of climate and weather datasets, leading to a pressing need to reduce data volumes. While some in the weather and climate community have adopted lossless compression, these techniques frequently produce compression ratios on the order of 1.3$\times$, which are insufficient to alleviate storage constraints on facilities. Therefore, additional techniques, such as science-preserving lossy compression that can achieve higher compression ratios, are necessary to overcome these challenges.
While data compression is an important topic for climate and weather applications, many current assessments of the effectiveness of compression on climate and weather datasets do not consider the state of the art in compressor design and instead assess scientific compressors that are 3-11 years old, substantially behind the state of the art. In this report:
- We evaluate current state-of-the-art scientific lossy compressors against the state-of-the-art quality assessment criteria proposed for the ERA5 dataset, in order to identify the gaps between required performance and the capabilities of current compressors.
- We present new capabilities that allow us to build an automated, user-friendly, and extensible pipeline for quickly finding compressor configurations that maximize compression ratios while preserving the scientific integrity of the data, using codes developed as part of the NSF FZ project.
- We demonstrate a number of capabilities that facilitate use within the weather and climate community, including NetCDF, HDF5, and GRIB file format support; support for innovation via Python, R, and Julia as well as low-level languages such as C/C++; implementations of commonly used climate quality metrics, including dSSIM; and the ability to add new metrics in high-level languages.
- Utilizing this pipeline, we find that with advanced scientific compressors, it is possible to achieve a 6.4$\times$ or greater improvement in compression ratio over previously evaluated compressors.
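The idea of searching for a configuration that maximizes compression ratio subject to a quality constraint can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the pipeline or any compressor evaluated in this work: the toy compressor (uniform quantization followed by zlib), the use of PSNR as a stand-in quality metric in place of criteria such as dSSIM, the 60 dB threshold, the synthetic data, and all function names are hypothetical. The search also assumes that quality decreases roughly monotonically as the error bound grows, so it returns the largest accepted bound rather than a proven optimum.

```python
import math
import struct
import zlib


def compress(values, abs_bound):
    """Toy error-bounded compressor: uniform quantization, then zlib."""
    step = 2.0 * abs_bound  # rounding to a multiple of step errs by <= abs_bound
    codes = [round(v / step) for v in values]
    return zlib.compress(struct.pack(f"<{len(codes)}q", *codes), 9)


def decompress(blob, abs_bound):
    """Invert compress(): inflate, unpack the integer codes, rescale."""
    step = 2.0 * abs_bound
    payload = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(payload) // 8}q", payload)
    return [c * step for c in codes]


def psnr(original, decoded):
    """Peak signal-to-noise ratio in dB, using the data's value range as the peak."""
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0.0:
        return math.inf
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(mse)


def find_largest_bound(values, min_psnr, lo=1e-8, iterations=30):
    """Bisect (in log space) for the largest error bound whose PSNR >= min_psnr.

    Returns (bound, compression_ratio, psnr) for the last accepted bound,
    or None if no tested bound met the threshold.
    """
    hi = max(values) - min(values)
    best = None
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        blob = compress(values, mid)
        quality = psnr(values, decompress(blob, mid))
        if quality >= min_psnr:
            ratio = 8 * len(values) / len(blob)  # input counted as 8-byte doubles
            best = (mid, ratio, quality)
            lo = mid  # constraint met: try a looser bound
        else:
            hi = mid  # constraint violated: tighten the bound
    return best


# Synthetic, temperature-like 1D signal (illustrative only).
values = [280.0 + 10.0 * math.sin(i / 50.0) + 0.5 * math.sin(i / 3.0)
          for i in range(4096)]
result = find_largest_bound(values, min_psnr=60.0)
print(result)
```

In a realistic setting the toy compressor would be replaced by an actual error-bounded lossy compressor and PSNR by the community's chosen quality criteria; the structure of the search (compress, decompress, score, adjust the bound) is the part this sketch is meant to convey.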
How to cite: Underwood, R., Liu, J., Zhao, K., Di, S., and Cappello, F.: Evaluating Advanced Scientific Compressors on Climate Datasets, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7371, https://doi.org/10.5194/egusphere-egu25-7371, 2025.