- (1) Deutsches Klimarechenzentrum GmbH (DKRZ), Data Management, Hamburg, Germany (peters@dkrz.de)
- (2) University of Helsinki, Institute for Atmospheric and Earth System Research, Helsinki, Finland
- (3) European Centre for Medium-Range Weather Forecasts (ECMWF), Bonn, Germany
- (4) Deutsches Klimarechenzentrum GmbH (DKRZ), Hamburg, Germany
It is apparent that the data volumes expected from current and upcoming Earth System Science research and operational activities strain the capabilities of HPC and associated data infrastructures. Individual research projects running global Earth System Models (ESMs) at spatial resolutions of 5 km or finer can easily occupy several petabytes of disk space. With multiple such projects running on a single HPC infrastructure, storing the data alone becomes a challenge. In addition, community-driven activities such as model intercomparison projects, which are conducted for both conventional and high-resolution model setups, add to the strain on storage systems. Hence, when planning next-generation HPC systems, the storage requirements of state-of-the-art ESM-centered projects have to be well understood so that systems remain fit for use five years after the initial planning stage.
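To make the scale concrete, a back-of-envelope estimate already lands in the petabyte range for a single year of hourly 3-D output at 5 km resolution. The following is a minimal sketch; all parameters are illustrative assumptions, not figures from any specific project:

```python
# Back-of-envelope storage estimate for a global 5 km simulation.
# Every parameter below is an illustrative assumption.

EARTH_SURFACE_KM2 = 510e6      # approximate surface area of the Earth
grid_spacing_km = 5.0          # horizontal resolution
n_levels = 90                  # assumed vertical levels
n_variables = 30               # assumed 3-D output variables
n_timesteps = 365 * 24         # assumed hourly output for one year
bytes_per_value = 4            # float32, uncompressed

n_columns = EARTH_SURFACE_KM2 / grid_spacing_km**2
total_bytes = (n_columns * n_levels * n_variables
               * n_timesteps * bytes_per_value)
print(f"~{total_bytes / 1e15:.1f} PB per simulated year (uncompressed)")
# -> roughly 2 PB, before restarts, 2-D fields, or ensemble members
```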
Because computational hardware costs per unit of performance (FLOP or byte) are no longer decreasing as they did in past decades, the key figures of HPC systems no longer improve substantially from one generation to the next. The mismatch between the demands of research and what future systems can offer is therefore evident.
One apparent solution to this problem is simply to reduce the amount of ESM simulation data stored on a system, and data compression is one candidate for achieving this. Current ESM projects already employ application-side lossless compression, which reduces storage space; however, decompression may incur performance penalties, especially when read patterns are misaligned with the compression block sizes. Lossy compression offers the potential for considerably higher compression ratios without access penalties on data retrieval. Its suitability, however, is highly content-dependent, raising the question of which lossy compression methods are best suited for specific datasets. Applied at large scale, lossy compression also prompts the consideration of how such data reduction could shape the design of next-generation HPC architectures.
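As a minimal, self-contained sketch of this trade-off (using only NumPy and Python's standard zlib; the bit-rounding helper and all parameters are our own illustrative assumptions, not a method prescribed by the project), discarding mantissa bits before lossless compression typically raises the compression ratio markedly on smooth geophysical fields:

```python
import zlib
import numpy as np

def bitround(a, keepbits):
    """Hypothetical helper: zero all float32 mantissa bits beyond
    `keepbits` (round-toward-zero for brevity; production bit-rounding
    codecs round to nearest)."""
    trailing = 23 - keepbits  # float32 has 23 mantissa bits
    mask = np.uint32((0xFFFFFFFF >> trailing) << trailing)
    return (a.astype(np.float32).view(np.uint32) & mask).view(np.float32)

rng = np.random.default_rng(0)
# Smooth synthetic field standing in for ESM output (e.g., temperature in K).
x = np.linspace(0, 8 * np.pi, 1_000_000)
field = (288 + 10 * np.sin(x) + rng.normal(0, 0.1, x.size)).astype(np.float32)

raw = field.tobytes()
lossless = zlib.compress(raw, 6)                                  # lossless only
lossy = zlib.compress(bitround(field, keepbits=7).tobytes(), 6)   # lossy + lossless

print(f"lossless ratio:        {len(raw) / len(lossless):.1f}x")
print(f"bitround + zlib ratio: {len(raw) / len(lossy):.1f}x")
```

A design note on this family of approaches: bit rounding keeps the data valid IEEE-754 floats, so downstream tools read them unchanged and only the discarded precision is lost; how many bits can safely be discarded is exactly the content-dependent question raised above.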
Since lossy compression has so far seen little adoption in the ESM community, we present a key development of the ongoing ESiWACE3 project: an openly accessible Jupyter-based online laboratory for testing lossy compression techniques on ESM output datasets. The tool currently ships with a set of notebooks that allow users to objectively evaluate the impact of lossy compression by comparing analyses performed on the compressed data with those performed on the original input data (a sketch of such a check is given below). With some compressors promising compression ratios of 10x-1000x, providing such tools to verify compression quality is essential. The motivation behind the online compression laboratory is to foster acceptance of lossy compression techniques by conveying first-hand experience and immediate feedback on the benefits and drawbacks of applying lossy compression algorithms.
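For illustration, the kind of objective check such a notebook can perform might look like the following sketch; the metric selection, function names, and the synthetic error model are our assumptions, not the laboratory's actual notebook code:

```python
import numpy as np

def compression_report(original, reconstructed, compressed_nbytes):
    """Summarize how much storage was saved and what it cost in accuracy."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return {
        "compression ratio": original.nbytes / compressed_nbytes,
        "max abs error": float(np.max(np.abs(diff))),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "Pearson r": float(np.corrcoef(original.ravel(),
                                       reconstructed.ravel())[0, 1]),
    }

rng = np.random.default_rng(1)
original = rng.normal(280.0, 15.0, (180, 360)).astype(np.float32)
# Stand-in for a decompressed field: original plus a small perturbation
# emulating lossy-compression error (purely illustrative).
reconstructed = original + rng.normal(0.0, 0.05, original.shape).astype(np.float32)

# Assume the compressed representation took 1/20 of the original size.
print(compression_report(original, reconstructed, original.nbytes // 20))
```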
Going one step further, we illustrate the impact that applying lossy compression to ESM data at large scale can have on design decisions for upcoming HPC infrastructures. Among other things, we show that broader acceptance and application of lossy compression techniques enables more efficient resource utilization and allows funds saved through reduced storage demands to be reinvested more smartly, potentially leading to the acquisition of smaller systems and thus to increased research output per resource used.
How to cite: Peters-von Gehlen, K., Tyree, J., Faghih-Naini, S., Dueben, P., Squar, J., and Fuchs, A.: Lossy Data Compression Exploration in an Online Laboratory and the Link to HPC Design Decisions, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19418, https://doi.org/10.5194/egusphere-egu25-19418, 2025.