Towards Foundation Models for Earth Observation; Benchmarking Datasets and Performance on Diverse Downstream Tasks

Anna Jungbluth; Matt Allen; Francisco Dorr; Joseph Gallego-Mejia; Laura Martínez-Ferrer; Freddie Kalaitzis; Raúl Ramos-Pollán

doi:https://doi.org/10.5194/egusphere-egu24-11514


EGU24-11514, updated on 09 Mar 2024


EGU General Assembly 2024

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.


Anna Jungbluth¹, Matt Allen², Francisco Dorr³, Joseph Gallego-Mejia⁴, Laura Martínez-Ferrer⁵, Freddie Kalaitzis⁶, and Raúl Ramos-Pollán⁷


¹European Space Agency, Climate Office, United Kingdom (anna.jungbluth@esa.int)
²University of Cambridge
³Frontier Development Lab
⁴Universidad Nacional de Colombia
⁵Universitat de València
⁶University of Oxford
⁷Universidad de Antioquia

Satellite-based Earth Observation (EO) is crucial for monitoring land changes and natural hazards on a global scale. In addition to optical imagery, synthetic aperture radar (SAR) has proven indispensable, since radar pulses can penetrate clouds and detect millimeter-scale changes of the ground surface. While SAR polarimetry data is readily available (e.g. via Google Earth Engine), interferometric products are harder to obtain due to complex pre-processing requirements.

In general, using the information contained in EO data (both optical and SAR) for specific downstream tasks often requires specialized analysis pipelines that are not easily accessible to the scientific community. In the context of applying machine learning to EO, self-supervised learning (SSL), in which models learn features from data without being provided with explicit labels, offers great potential to fully leverage the wealth and complexity of the available data.

In this work, we apply self-supervised learning techniques to create pre-trained models that can leverage the features learned from unlabelled EO data for a variety of downstream tasks. More specifically, we pre-train our models on optical imagery (Sentinel-2) or SAR data (Sentinel-1), and fine-tune them to predict local events (e.g. fires, floods) and annual land characteristics (e.g. vegetation percentage, land cover, biomass). We compare a number of state-of-the-art SSL techniques (MAE¹, DINO², VICReg³, CLIP⁴) that have performed well on standard image- or text-based tasks. By adapting these models to our use case, we demonstrate the potential of SSL for EO, and show that self-supervised pre-training strongly reduces the requirement for labels.
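To make one of the compared techniques more concrete, the following is a minimal sketch of the random patch masking step used in masked autoencoders (MAE): a tile is cut into patches, most patches are hidden, and the model is trained to reconstruct them from the visible remainder. The function names, the toy 8x8 single-band tile, the patch size and the 75% mask ratio (the default reported in the MAE paper) are illustrative assumptions, not the configuration used in this work.

```python
import random


def patchify(tile, patch_size):
    """Cut a 2D tile (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order. Hypothetical helper for
    illustration only.
    """
    height, width = len(tile), len(tile[0])
    if height % patch_size or width % patch_size:
        raise ValueError("tile size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in tile[top:top + patch_size]]
            patches.append(patch)
    return patches


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Randomly split patch indices into (visible, masked) lists.

    mask_ratio=0.75 is an assumed value, not taken from the abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Toy 8x8 single-band tile whose pixel value encodes its position.
tile = [[row * 8 + col for col in range(8)] for row in range(8)]
patches = patchify(tile, patch_size=2)
visible, masked = random_patch_mask(len(patches), mask_ratio=0.75, seed=0)

# Only the visible patches would be passed to the encoder; the decoder
# would be trained to reconstruct the masked ones.
encoder_input = [patches[i] for i in visible]
```

In a real pipeline the patches would come from multi-band Sentinel-1 or Sentinel-2 tiles and be handled as tensors; plain lists are used here only to keep the sketch dependency-free.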

In addition to the pre-trained models, we provide global benchmarking datasets of EO input data and associated downstream tasks ready for use in machine learning pipelines. Our data contains more than 25 TB of co-registered and aligned tiles, covering South America, the US, Europe, and Asia. By comparing how well our pre-trained models perform on data from regions and time periods not seen during training, we investigate the generalizability of SSL techniques for EO research. With this, our work provides a first step towards creating EO foundation models that can predict anything, anywhere on Earth.
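The regional and temporal generalization test described above can be sketched as a hold-out split over tile metadata. This is only an illustration under stated assumptions: the `Tile` record, its fields, the choice of held-out region and the example years are all hypothetical and do not reflect the actual structure of the benchmark datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical metadata record for one co-registered tile."""
    tile_id: str
    region: str
    year: int


def split_tiles(tiles, held_out_regions, held_out_years):
    """Split tiles into train, regional-test and temporal-test lists.

    A tile from a held-out region always goes to the regional test set,
    even if its year is also held out. Tiles from seen regions but
    held-out years form the temporal test set.
    """
    train, regional_test, temporal_test = [], [], []
    for tile in tiles:
        if tile.region in held_out_regions:
            regional_test.append(tile)
        elif tile.year in held_out_years:
            temporal_test.append(tile)
        else:
            train.append(tile)
    return train, regional_test, temporal_test


# Made-up example tiles; region names follow the coverage listed above.
tiles = [
    Tile("t0", "South America", 2020),
    Tile("t1", "South America", 2021),
    Tile("t2", "US", 2020),
    Tile("t3", "Europe", 2021),
    Tile("t4", "Asia", 2020),
    Tile("t5", "Asia", 2021),
]

train, regional_test, temporal_test = split_tiles(
    tiles, held_out_regions={"Asia"}, held_out_years={2021}
)
```

Keeping the two test sets separate makes it possible to report how performance degrades in an unseen region and in an unseen year independently.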

1. He, K. et al. Masked Autoencoders Are Scalable Vision Learners. (2021).

2. Caron, M. et al. Emerging Properties in Self-Supervised Vision Transformers. (2021).

3. Bardes, A., Ponce, J. & LeCun, Y. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. (2021).

4. Radford, A. et al. Learning Transferable Visual Models From Natural Language Supervision. (2021).

How to cite: Jungbluth, A., Allen, M., Dorr, F., Gallego-Mejia, J., Martínez-Ferrer, L., Kalaitzis, F., and Ramos-Pollán, R.: Towards Foundation Models for Earth Observation; Benchmarking Datasets and Performance on Diverse Downstream Tasks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11514, https://doi.org/10.5194/egusphere-egu24-11514, 2024.