SC2.5 | Data compression and reduction for Earth System Sciences datasets in practice
Co-organized by AS6/CL6/ESSI6/GI2/GM11/HS11/NP9
Convener: Juniper Tyree | Co-conveners: Sara Faghih-Naini, Clément Bouvier, Oriol Tinto
Fri, 08 May, 14:00–15:45 (CEST)
 
Room -2.82
Earth System Sciences (ESS) datasets, particularly those generated by high-resolution numerical models, continue to grow in resolution and size. These datasets are essential for advancing ESS and support critical activities such as climate change policymaking, weather forecasting in the face of increasingly frequent natural disasters, and modern applications like machine learning.

The storage, usability, transfer and shareability of such datasets have become a pressing concern within the scientific community. State-of-the-art applications now produce outputs so large that even the most advanced data centres and infrastructures struggle not only to store them but also to keep them usable and processable, including for downstream machine learning. Ongoing and upcoming community initiatives, such as digital twins and the 7th phase of the Coupled Model Intercomparison Project (CMIP7), are already pushing infrastructures to their limits. With future investment in hardware likely to remain constrained, a viable way forward is to explore (lossy) data compression and reduction methods that balance efficiency with the needs of diverse stakeholders. Interest in compression has therefore grown as a means to 1) make data volumes more manageable and 2) reduce transfer times and computational costs, while 3) preserving the quality required for downstream scientific analyses.

Nevertheless, many ESS researchers remain cautious about lossy compression, concerned that critical information or features may be lost for specific downstream applications. Identifying these use-case-specific requirements and ensuring they are preserved during compression are essential steps toward building trust so that compression can become widely adopted across the community.

This short course offers a practical introduction to compressing ESS datasets with various compression frameworks and shares tips on preserving important data properties throughout the compression process. After the hands-on exercises, which use either your own or provided data, time will be set aside for debate and discussion to address questions about lossy compression and to exchange wishes and concerns regarding this family of methods. A short document summarising the discussion will be produced and made freely available afterwards.

To learn more about recent advances in data compression, please also join the ESSI2.2 oral and poster sessions.

The short course will include short pitch presentations that introduce the following scientific compressors, given by their developers unless noted otherwise:

  • Bit Rounding (Milan Klöwer)
  • ZFP (Peter Lindstrom, presented by Juniper Tyree)
  • SPERR (Sam Li)
  • EBCC (Langwen Huang)
  • LC (Martin Burtscher, presented by Juniper Tyree)
  • SZ (Robert Underwood)
  • LibPressio (Robert Underwood)
  • Compression Safeguards (Juniper Tyree)

Afterwards, you will have the chance to try out these compressors on a few provided datasets, or on datasets you bring, in a few example Jupyter notebooks.
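
To give a feel for what the hands-on part involves, the following is a minimal sketch of the idea behind bit rounding, written with plain NumPy. It is not the implementation presented in the course or shipped in the course notebooks: the function name `bit_round`, the synthetic temperature-like field, the choice of 7 kept mantissa bits, and the use of `zlib` as the lossless backend are all assumptions made for illustration. The sketch rounds each float32 value to a fixed number of mantissa bits, which zeroes the trailing bits and lets a lossless compressor shrink the data further, and then checks the relative error that was introduced.

```python
import zlib

import numpy as np


def bit_round(values: np.ndarray, keepbits: int) -> np.ndarray:
    """Round float32 values to `keepbits` mantissa bits (hypothetical helper).

    Uses round-to-nearest, ties-to-even on the raw bit pattern.
    Assumes all inputs are finite (no NaN or infinity handling).
    """
    if not 1 <= keepbits <= 22:
        raise ValueError("keepbits must be between 1 and 22 for float32")
    bits = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32)
    drop = 23 - keepbits  # number of trailing mantissa bits to discard
    # Adding (half ulp - 1) plus the last kept bit rounds ties to even.
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    last_kept_bit = (bits >> np.uint32(drop)) & np.uint32(1)
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    rounded = (bits + half_minus_one + last_kept_bit) & mask
    return rounded.view(np.float32)


# Synthetic stand-in for a 2D temperature field in kelvin (assumption).
rng = np.random.default_rng(0)
field = (280.0 + 10.0 * rng.standard_normal((256, 256))).astype(np.float32)

rounded = bit_round(field, keepbits=7)

# Compare lossless compression of the original and the rounded data.
raw_size = len(zlib.compress(field.tobytes()))
rounded_size = len(zlib.compress(rounded.tobytes()))

# Largest relative error introduced by the rounding step.
max_rel_error = float(np.max(np.abs(rounded - field) / np.abs(field)))

print(f"compressed size, original: {raw_size} bytes")
print(f"compressed size, rounded:  {rounded_size} bytes")
print(f"maximum relative error:    {max_rel_error:.2e}")
```

Keeping `keepbits` mantissa bits bounds the relative error by 2^-(keepbits+1) for finite, non-zero values, so the number of kept bits is the knob that trades size against precision. Which value is acceptable depends on the downstream analysis, which is exactly the kind of use-case-specific requirement the course discussion is meant to surface.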

If you want to run all examples on your own laptop, please visit https://github.com/climet-eu/egu26-compression-sc2-5 before the session and use the provided instructions to set up your Python environment and download all Python packages and files. During the session, we will also provide alternative ways to try out the compressors, though they may not run as quickly as a native installation. Please note that WiFi access at EGU is typically congested and should not be relied upon to download datasets during the course, though we will have USB sticks with the datasets available.

During the last 30 minutes of the course, the presenters and participants will discuss the community's wishes and concerns regarding lossy compression of Earth System data.


Speakers

  • Milan Klöwer, University of Oxford, United Kingdom
  • Samuel Li, Nvidia Corporation, United States of America
  • Langwen Huang
  • Robert Underwood, Argonne National Laboratory, United States of America
  • Juniper Tyree, University of Helsinki, Finland