EGU25-15003, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-15003
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 30 Apr, 16:50–17:00 (CEST) | Room -2.32
Exploring Lossy Data Compression in an Online Laboratory for Climate Science and Meteorology
Juniper Tyree (1), Sara Faghih-Naini (2), Peter Dueben (2), Karsten Peters-von Gehlen (3), and Heikki Järvinen (1)
  • (1) Dynamical Meteorology, Institute for Atmospheric and Earth System Research, University of Helsinki, Finland (juniper.tyree@helsinki.fi)
  • (2) Earth System Modelling, European Centre for Medium-Range Weather Forecasts
  • (3) Data Management, Deutsches Klimarechenzentrum, Germany

While the output volumes of high-resolution weather and climate models are increasing exponentially, data storage, access, and analysis methods have not kept pace. Data compression is a vital tool for coping with this growth in data production. As lossless compression no longer achieves the required compression ratios, lossy compression should be applied instead. However, information loss sounds alarming. Mounting research shows that model and measurement data contain “false information” (e.g. noise or uncertainty from measurements or numerical inaccuracies) that can be removed for better compression without degrading the data quality. Still, a convincing argument for lossy data compression can only be made by domain scientists trying it out for themselves.
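
To make the idea of removing "false information" concrete, the following minimal sketch rounds away the low-order mantissa bits of noisy float32 values and then applies a lossless compressor. It is an illustration under stated assumptions, not the method used in the compression laboratory: the synthetic temperature-like signal, the choice of 10 kept mantissa bits, the helper names, and the use of zlib as the lossless stage are all hypothetical choices made for this example. It uses only the Python standard library.

```python
import math
import random
import struct
import zlib


def round_mantissa(value: float, keep_bits: int) -> float:
    """Round a float32 value to `keep_bits` explicit mantissa bits."""
    if not 1 <= keep_bits <= 22:
        raise ValueError("keep_bits must be between 1 and 22")
    drop = 23 - keep_bits  # float32 stores 23 explicit mantissa bits
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Add half of the dropped range, then zero the dropped bits (round to nearest).
    bits = (bits + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded


def pack_f32(values: list[float]) -> bytes:
    """Serialise values as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def compression_ratio(values: list[float]) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    raw = pack_f32(values)
    return len(raw) / len(zlib.compress(raw, 6))


# Hypothetical data: a smooth signal plus small random noise, stored as float32.
random.seed(42)
n = 20_000
signal = [
    280.0 + 10.0 * math.sin(20.0 * math.pi * i / n) + random.gauss(0.0, 0.05)
    for i in range(n)
]
field = list(struct.unpack(f"<{n}f", pack_f32(signal)))

# Assumption: only 10 mantissa bits carry real information in this toy field.
rounded = [round_mantissa(v, keep_bits=10) for v in field]

ratio_lossless = compression_ratio(field)
ratio_lossy = compression_ratio(rounded)
max_rel_error = max(abs(r - v) / abs(v) for r, v in zip(rounded, field))

print(f"lossless only:       {ratio_lossless:.2f}x")
print(f"rounding + lossless: {ratio_lossy:.2f}x")
print(f"max relative error:  {max_rel_error:.2e}")
```

Keeping 10 mantissa bits bounds the relative rounding error by 2^-11, so the error introduced here stays well below the noise that was added to the signal. How many bits are actually meaningful depends on the variable and the dataset, which is exactly the kind of question that needs to be explored on one's own data.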

Interactive code notebooks (e.g. Jupyter) have become popular for sharing and communicating computational experiments, analyses, and visualizations. While sharing a notebook is easy, running it requires hosting a JupyterLab server and installing all Python and system libraries that the notebook depends on. This initial setup cost makes it hard to quickly experiment with a shared notebook, e.g. to test a practical example of lossy data compression for oneself.

As part of the EuroHPC ESiWACE, Phase 3, Centre of Excellence (https://www.esiwace.eu/), we have been developing an Online Laboratory for Climate Science and Meteorology (https://lab.climet.eu), a JupyterLab instance that runs serverless, entirely within your web browser, and comes with many libraries pre-installed. With the online lab, which builds on the Pyodide and JupyterLite community projects, running and exploring a shared notebook can start within a minute. We use the online laboratory to provide domain scientists with an online compression laboratory, https://compression.lab.climet.eu, which lowers the barrier to experimenting with the effect of lossy compression on their own data. The lab also supports URL schemes to preload third-party notebooks (and repositories) hosted via Git, as Gists, or behind any URL, so that sharing a ready-to-run notebook is as easy as sharing, e.g., https://lab.climet.eu/v0.2/github/juntyr/climet-lab-demo/v0.2.0/demo.ipynb. We are also working on quickly turning existing example notebooks from static documentation into interactive documentation that invites immediate further exploration.
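
As a rough illustration of how such a share link could be assembled, the sketch below rebuilds the example URL above from its parts. The interpretation of the path segments (lab version, the literal `github`, repository owner, repository name, Git reference, notebook path) is an assumption inferred from that single example and not a documented specification; the function name and parameters are hypothetical. The lab's own documentation is the authority on the supported URL schemes.

```python
from urllib.parse import quote

# Base address of the online laboratory, as given in the abstract.
LAB_BASE = "https://lab.climet.eu"


def github_notebook_url(
    owner: str,
    repo: str,
    ref: str,
    path: str,
    lab_version: str = "v0.2",
) -> str:
    """Build a share link for a notebook hosted on GitHub.

    Assumption: the segment order
    <lab version>/github/<owner>/<repo>/<ref>/<path>
    is inferred from the one example URL in the abstract.
    """
    segments = [lab_version, "github", owner, repo, ref]
    prefix = "/".join(quote(segment, safe="") for segment in segments)
    # Keep "/" in the notebook path so that nested folders stay intact.
    return f"{LAB_BASE}/{prefix}/{quote(path, safe='/')}"


url = github_notebook_url("juntyr", "climet-lab-demo", "v0.2.0", "demo.ipynb")
print(url)
```

With the arguments shown, the function reproduces the example link from the text; whether other repositories, references, or nested paths resolve in the same way has not been verified here.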

In this session, we want to showcase the online laboratory and the services it can provide to the Earth science community by demonstrating live how it is used in the compression laboratory and in other applications. We also hope to gather feedback on the future direction of its development and on collaborations with other open science tools, so that it can best serve our communities.

How to cite: Tyree, J., Faghih-Naini, S., Dueben, P., Peters-von Gehlen, K., and Järvinen, H.: Exploring Lossy Data Compression in an Online Laboratory for Climate Science and Meteorology, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15003, https://doi.org/10.5194/egusphere-egu25-15003, 2025.