EGU26-20782, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-20782
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Monday, 04 May, 10:45–12:30 (CEST), Display time Monday, 04 May, 08:30–12:30
 
Hall X4, X4.58
Neural Compression of Remote Sensing Data for the Pre-Training of Geospatial Foundation Models
Sebastian Hoffmann1,2, Markus Zehner1, Vitus Benson1,2, Marieke Wesselkamp1, Georg Martius3,4, and Markus Reichstein1,2
  • 1Max Planck Institute for Biogeochemistry (shoffmann@bgc-jena.mpg.de)
  • 2ELLIS Unit Jena
  • 3University of Tübingen
  • 4ELLIS Unit Tübingen

Over the course of a decade, a single Earth observation satellite mission, such as Sentinel-2, can generate more than 10 petabytes of data. While this wealth of information offers a unique opportunity for pre-training geospatial foundation models, storing and processing such massive datasets is challenging, even for powerful HPC systems. One potential solution is the use of lossy compression techniques, which remove irrelevant information (e.g., noise and redundancy) while preserving as much relevant content as possible. This approach enables significantly larger training datasets for self-supervised learning, potentially offsetting the loss in data quality and yielding performance gains in downstream tasks. However, at the time of writing, the application of lossy compression in this context remains largely underexplored.

Here, we use Vector Quantized Variational Autoencoders (VQ-VAEs) to compress Sentinel-2 and Sentinel-1 data. We show that the VQ-VAE achieves compression ratios of up to 65× with minimal reconstruction error. Compared to classic, general-purpose compression techniques such as JPEG 2000, the VQ-VAE attains 2–3× higher compression ratios at the same reconstruction error. We also present ablation studies on pre-training masked autoencoders (MAEs) with compressed versus uncompressed data under a fixed physical storage budget, reflecting the constraints of resource-limited HPC systems. Finally, inspired by previous work in computer vision, we explore using the learned quantization scheme to construct a probabilistic masked autoencoder. Instead of predicting a deterministic reflectance or backscatter value, our probabilistic model predicts a categorical distribution over the learned codewords and is trained with a cross-entropy loss. This formulation naturally incorporates uncertainty or bimodality into the masked-autoencoding task, for example under cloudy conditions.
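The two core operations described above can be illustrated in a few lines: vector quantization assigns each latent vector to its nearest codeword, and the probabilistic MAE head is then trained to predict that codeword's index with a cross-entropy loss. The following is a minimal, framework-free sketch with a toy 4-entry codebook; the codebook values, dimensions, and function names are illustrative assumptions, not the authors' implementation (real codebooks are learned and far larger).

```python
import math

# Toy VQ codebook: 4 codewords in a 2-D latent space (illustrative only).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def quantize(z):
    """Vector quantization: index of the codeword nearest to latent z."""
    return min(range(len(CODEBOOK)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(z, CODEBOOK[k])))

def cross_entropy(logits, target_idx):
    """Cross-entropy between a predicted categorical distribution over
    codewords (given as logits) and the true codeword index, computed
    with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_partition = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_partition - logits[target_idx]

# A latent vector close to codeword (1.0, 0.0) quantizes to index 1.
target = quantize((0.9, 0.1))

# For each masked token, the probabilistic MAE head emits logits over the
# codebook; a confident, correct prediction yields a lower loss than an
# uncertain (uniform) one.
confident_loss = cross_entropy([0.1, 5.0, 0.2, 0.1], target)
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target)
```

Because the head outputs a full distribution rather than a single value, ambiguous inputs (e.g. cloud-contaminated pixels) can place probability mass on several plausible codewords instead of regressing to their mean.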

Our results demonstrate the potential and feasibility of neural compression techniques for the pre-training of large geospatial foundation models. Looking ahead, we aim to incorporate our findings into the training of WeatherGenerator-Land, an upcoming multi-modal foundation model for Earth's land surface. WeatherGenerator-Land will be used for vegetation forecasting, prediction of land-atmosphere interactions, and high-resolution land surface temperature forecasting, with a particular focus on heat waves and urban heat islands.

How to cite: Hoffmann, S., Zehner, M., Benson, V., Wesselkamp, M., Martius, G., and Reichstein, M.: Neural Compression of Remote Sensing Data for the Pre-Training of Geospatial Foundation Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20782, https://doi.org/10.5194/egusphere-egu26-20782, 2026.