Improving lossy compression for climate datasets with SZ3
- Argonne National Laboratory, MCS, Lemont, United States of America (cappello@anl.gov)
The projected growth of climate data volumes through 2030 poses a major challenge for the climate science community. This is particularly true for CMIP7, which is projected to require about an exabyte of storage capacity. Several climate research teams are exploring error-bounded lossy compression as a potential solution to this problem. Various lossy compression schemes have been proposed, combining different forms of decorrelation (transforms, prediction, HoSVD, DNN), quantization (linear, non-linear, vector), and encoding (dictionary-based, variable-length, etc.) algorithms. Our experience with different applications shows that compression methods often need to be customized and optimized to fit the specific characteristics of the datasets being compressed and the user requirements on compression quality, ratio, and throughput. However, none of the existing lossy compression software for scientific data has been designed to be customizable. To address this issue, we developed SZ3, an innovative, customizable, modular compression framework. SZ3 is a full C++ refactoring of SZ2 that enables the specialization, addition, or removal of each stage of the lossy compression pipeline to fit the specific characteristics of the datasets and the requirements of the use case. This extreme flexibility allows SZ3 to be adapted to many different use cases, from ultra-high compression for visualization to ultra-high-speed compression between the CPU (or GPU) and memory. Thanks to its unique combination of features (customization, high compression ratio, high compression throughput, and excellent accuracy preservation), SZ3 won a 2021 R&D 100 Award. In this presentation, we introduce SZ3 and a new prediction-based decorrelation method that significantly improves compression ratios for climate datasets over state-of-the-art lossy compressors while preserving the same data accuracy.
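The prediction, quantization, and encoding stages mentioned above can be illustrated with a minimal sketch of prediction-based error-bounded compression. This is written in Python for brevity and is not SZ3's C++ API: the names `lorenzo_1d`, `zero_predictor`, `compress`, and `decompress` are hypothetical, the data are assumed to be 1D, and `zlib` stands in for a real encoding stage. Each value is predicted from already-reconstructed neighbours, the prediction error is linearly quantized with bin width 2e (e being the absolute error bound), and the integer codes are losslessly encoded, so every decompressed value stays within e of the original. Details that production compressors handle, such as a bounded number of quantization bins, are omitted.

```python
import struct
import zlib
from typing import Callable, List, Sequence

# Hypothetical stage signature for illustration only (NOT the SZ3 API):
# a predictor sees the values reconstructed so far and the current index.
Predictor = Callable[[Sequence[float], int], float]

_HEADER = "<Id"  # element count (uint32) + error bound (float64)


def lorenzo_1d(recon: Sequence[float], i: int) -> float:
    """1D Lorenzo predictor: predict x[i] from the previous *reconstructed* value."""
    return recon[i - 1] if i > 0 else 0.0


def zero_predictor(recon: Sequence[float], i: int) -> float:
    """No decorrelation at all; included to show that the stage is swappable."""
    return 0.0


def compress(data: Sequence[float], eb: float, predict: Predictor = lorenzo_1d) -> bytes:
    """Prediction + linear quantization + lossless encoding with |x - x'| <= eb."""
    codes: List[int] = []
    recon: List[float] = []
    for i, x in enumerate(data):
        pred = predict(recon, i)
        # Linear quantization of the prediction error, bin width 2*eb.
        q = round((x - pred) / (2 * eb))
        codes.append(q)
        # Mirror the decompressor so both sides predict from identical values.
        recon.append(pred + 2 * eb * q)
    # Encoding stage: zlib is a stand-in for a real entropy/lossless backend.
    payload = struct.pack(f"<{len(codes)}q", *codes)
    return struct.pack(_HEADER, len(codes), eb) + zlib.compress(payload, 9)


def decompress(blob: bytes, predict: Predictor = lorenzo_1d) -> List[float]:
    """Invert the encoding, then rebuild values as prediction + dequantized error."""
    n, eb = struct.unpack_from(_HEADER, blob)
    body = zlib.decompress(blob[struct.calcsize(_HEADER):])
    codes = struct.unpack(f"<{n}q", body)
    recon: List[float] = []
    for i, q in enumerate(codes):
        recon.append(predict(recon, i) + 2 * eb * q)
    return recon
```

Swapping `zero_predictor` for `lorenzo_1d` changes only the decorrelation stage. On smooth data the Lorenzo variant produces small, repetitive quantization codes that encode much more compactly, a miniature version of the stage-level customization described for SZ3.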
Experiments based on CESM datasets show that SZ3 can achieve compression ratios up to 300% higher than SZ2 at the same compression error bound and with similar compression throughput.
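For reference, the two quantities in this comparison are commonly defined as below, where x_i are the original values, \tilde{x}_i the decompressed values, and e the error bound. The abstract does not state which error-bound mode was used, so the point-wise absolute bound shown here is an assumption (SZ-family compressors also support, for example, bounds relative to the data value range).

```latex
\mathrm{CR} = \frac{\text{size of original data (bytes)}}{\text{size of compressed data (bytes)}},
\qquad
\max_{i} \, \lvert x_i - \tilde{x}_i \rvert \le e
```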
How to cite: Cappello, F., Di, S., and Underwood, R.: Improving lossy compression for climate datasets with SZ3, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9741, https://doi.org/10.5194/egusphere-egu22-9741, 2022.