EGU22-8762
https://doi.org/10.5194/egusphere-egu22-8762
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Lossy Data Compression and the Community Earth System Model

Allison H. Baker¹, Dorit M. Hammerling², Alex Pinard², and Haiying Xu¹
  • ¹The National Center for Atmospheric Research, Boulder, Colorado, United States of America
  • ²Colorado School of Mines, Golden, Colorado, United States of America

Climate models such as the Community Earth System Model (CESM) typically produce enormous amounts of output data, and storage capacities have not increased as rapidly as processor speeds over the years. As a result, the cost of storing these data volumes has become increasingly problematic and has forced climate scientists to make hard choices about which variables to save, how frequently to write output, how long to run simulations, and how large to make ensembles, all of which can negatively impact science objectives. We have therefore been investigating lossy data compression as a means of reducing CESM storage requirements. Lossy compression, by definition, does not exactly preserve the original data, but it achieves higher compression ratios than lossless compression and consequently smaller storage requirements. However, as with any data reduction approach, we must exercise extreme care when applying lossy compression to climate output to avoid introducing artifacts that could affect scientific conclusions. Our focus has been on better understanding the effects of lossy compression on spatio-temporal climate data and on gaining user acceptance via careful analysis and testing. In this talk, we will describe the challenges and concerns that we have encountered when compressing climate data from CESM, and we will discuss the development of climate-specific metrics and tools that enable scientists to evaluate the effects of lossy compression on their own data and to optimize compression for each variable. In particular, we will present our Large Data Comparison for Python (LDCPy) package for visualizing and computing statistics on differences between multiple datasets, which enables climate scientists to discover potentially relevant compression-induced artifacts in their data.
Additionally, we will demonstrate the usefulness of the Data SSIM (DSSIM), an alternative that we developed to the popular structural similarity index measure (SSIM). The DSSIM can be applied directly to floating-point data when evaluating differences due to lossy compression in large volumes of simulation data.
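
To make this kind of evaluation concrete, the following is a minimal Python sketch that compares an original field with a lossy reconstruction. It is an illustration under stated assumptions, not the LDCPy API and not the authors' DSSIM: the helper names (shave_bits, compare_fields, global_ssim, demo_field) are hypothetical, mantissa truncation stands in for a real lossy compressor, the synthetic field stands in for CESM output, and the similarity score is the standard SSIM formula evaluated once over the whole floating-point field, since the abstract does not give the DSSIM definition.

```python
import numpy as np

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 float32


def shave_bits(data, keep_bits):
    """Stand-in lossy step (assumption): truncate float32 mantissas to keep_bits."""
    if not 0 <= keep_bits <= MANTISSA_BITS:
        raise ValueError("keep_bits must be between 0 and 23")
    drop = MANTISSA_BITS - keep_bits
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    bits = np.ascontiguousarray(data, dtype=np.float32).view(np.uint32)
    return (bits & mask).view(np.float32)


def compare_fields(orig, recon):
    """Basic difference statistics between an original and a reconstructed field."""
    a = np.asarray(orig, dtype=np.float64)
    b = np.asarray(recon, dtype=np.float64)
    diff = b - a
    return {
        "max_abs_error": float(np.max(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff**2))),
        "mean_error": float(np.mean(diff)),
        "pearson_r": float(np.corrcoef(a.ravel(), b.ravel())[0, 1]),
    }


def global_ssim(orig, recon):
    """Standard SSIM formula on floats, using one window covering the whole field.

    Both fields are rescaled by the original's range so the usual constants
    for a dynamic range of 1 apply. This is NOT the authors' DSSIM.
    """
    a = np.asarray(orig, dtype=np.float64)
    b = np.asarray(recon, dtype=np.float64)
    lo = a.min()
    span = (a.max() - lo) or 1.0  # guard against a constant field
    a = (a - lo) / span
    b = (b - lo) / span
    c1, c2 = 0.01**2, 0.03**2
    mu_a, mu_b = a.mean(), b.mean()
    cov = np.mean((a - mu_a) * (b - mu_b))
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)


def demo_field(seed=0):
    """Synthetic temperature-like lat/lon field (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    lat = np.linspace(-np.pi / 2, np.pi / 2, 64)[:, None]
    lon = np.linspace(0, 2 * np.pi, 128)[None, :]
    field = 280.0 + 10.0 * np.cos(lat) * np.sin(lon)
    field = field + rng.normal(0.0, 1.0, (64, 128))
    return field.astype(np.float32)


if __name__ == "__main__":
    field = demo_field()
    for keep in (8, 12, 16):
        recon = shave_bits(field, keep)
        print(keep, compare_fields(field, recon), global_ssim(field, recon))
```

Sweeping the number of retained bits per variable, as in the loop above, is one simple way to look for the most aggressive setting whose statistics remain acceptable for that variable. A single global score can hide localized artifacts, so windowed measures and visual inspection of the difference field remain important.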

How to cite: Baker, A. H., Hammerling, D. M., Pinard, A., and Xu, H.: Lossy Data Compression and the Community Earth System Model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8762, https://doi.org/10.5194/egusphere-egu22-8762, 2022.