- European Space Agency (ESA), ESRIN, Φ-lab, Italy
Recent advances in AI and Self-Supervised Learning (SSL) have revolutionized large-scale computer vision models, enabling exceptional performance on downstream remote sensing tasks with minimal labelled data. These models are pre-trained on large amounts of unlabelled data and then fine-tuned on specific downstream Earth Observation (EO) applications [1-3]. The scarcity of large-scale labelled datasets, and the technical challenges of annotating the vast volumes of data collected by satellites, pose significant barriers to achieving high accuracy in many important downstream tasks that require extensive labelling to be effective. Furthermore, the dynamic nature of the Earth adds complexity, as labels captured at a single moment in time quickly become outdated.
Consequently, SSL and Foundation Models offer a powerful solution to these challenges. By pre-training a Foundation Model on extensive unlabelled data, only a small amount of labelled data is required for supervised fine-tuning on downstream tasks. This reduces the need for labelled EO data and enables the development of general-purpose EO Foundation Models capable of solving a diverse range of problems.
In this work, we train the EO Foundation Model PhilEO Version 1.0 [2] on the MajorTOM dataset [6]. The model is pre-trained on 23 TB of Sentinel-2 data (the Core-S2L2A subset), scaling up the pre-training data from less than 1 TB in the PhilEO-Globe dataset [7]. Training uses a combination of a reconstruction loss and auxiliary task losses, including a Mean Squared Error (MSE) loss for geo-location (longitude and latitude) prediction. The architecture is a modified U-Net, and training is conducted on the Leonardo Davinci-1 supercomputer.
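The combined objective described above can be sketched as follows. This is a minimal illustrative example, not the actual PhilEO training code: the weighting factor `geo_weight` and the normalisation of the longitude/latitude targets are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class PretrainLoss(nn.Module):
    """Reconstruction loss plus an auxiliary geo-location MSE term (sketch)."""

    def __init__(self, geo_weight: float = 0.1):
        super().__init__()
        self.geo_weight = geo_weight  # hypothetical weighting, not from the abstract
        self.mse = nn.MSELoss()

    def forward(self, recon, target, geo_pred, geo_true):
        # Reconstruction term: pixel-wise MSE between the reconstructed
        # patch and the input Sentinel-2 patch.
        loss_recon = self.mse(recon, target)
        # Auxiliary term: MSE on predicted (longitude, latitude),
        # here assumed normalised to a fixed range.
        loss_geo = self.mse(geo_pred, geo_true)
        return loss_recon + self.geo_weight * loss_geo

loss_fn = PretrainLoss()
x = torch.rand(2, 10, 64, 64)              # batch of multi-spectral patches (10 bands)
recon = x + 0.01 * torch.randn_like(x)     # stand-in for the U-Net reconstruction
geo_true = torch.tensor([[0.1, -0.3], [0.5, 0.2]])  # normalised lon/lat targets
geo_pred = geo_true + 0.05                 # stand-in for the geo-location head output
loss = loss_fn(recon, x, geo_pred, geo_true)
```

In practice the auxiliary weight would be tuned so that the geo-location term regularises the representation without dominating the reconstruction objective.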
In this work, we also extend the capabilities of the evaluation framework for EO Foundation Models that we recently introduced in [2], developing PhilEO-Bench++. To take advantage of multi-level features and of the U-Net-like architecture during fine-tuning on downstream tasks, we use the UPerNet decoder [4]. Furthermore, to strengthen the evaluation of EO Foundation Models, we perform confidence quantification and assessment [5] on both classification and regression tasks, including land cover semantic segmentation and building density pixel-wise regression.
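As one simple way to obtain a per-pixel confidence map for semantic segmentation, the maximum softmax probability can be used. This is a generic sketch of the idea of confidence quantification, not the specific method of [5]:

```python
import torch
import torch.nn.functional as F

def segmentation_confidence(logits: torch.Tensor) -> torch.Tensor:
    """Per-pixel confidence as the maximum softmax probability.

    logits: (B, C, H, W) raw class scores from the segmentation head.
    Returns a (B, H, W) map in [1/C, 1]; low values flag uncertain pixels.
    """
    probs = F.softmax(logits, dim=1)
    conf, _ = probs.max(dim=1)
    return conf

logits = torch.randn(1, 11, 32, 32)   # e.g. 11 land-cover classes (assumed count)
conf_map = segmentation_confidence(logits)
```

Thresholding such a map allows low-confidence pixels to be flagged for review or excluded from downstream products.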
Experiments on the PhilEO-Bench downstream tasks of building density estimation, road segmentation, and land cover mapping demonstrate the effectiveness of our model. For building density regression with n-shot fine-tuning, the PhilEO model trained on MajorTOM achieves an MSE of 0.0191 for n=50 and 0.0058 for n=100.
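The n-shot protocol above amounts to fine-tuning on n randomly sampled labelled examples and reporting MSE on a held-out set. A minimal sketch of these two pieces (the sampling seed and dataset size are illustrative assumptions):

```python
import numpy as np

def n_shot_subset(num_samples: int, n: int, seed: int = 0) -> np.ndarray:
    """Draw n distinct training indices for an n-shot fine-tuning run."""
    rng = np.random.default_rng(seed)
    return rng.choice(num_samples, size=n, replace=False)

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, the metric reported for building density regression."""
    return float(np.mean((pred - target) ** 2))

idx = n_shot_subset(10000, 50)   # e.g. 50-shot subset of a labelled pool
```

Averaging over several random subsets (different seeds) gives a more robust n-shot estimate than a single draw.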
References:
[1] D. Szwarcman, et al., “Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications,” arXiv:2412.02732, 2024.
[2] C. Fibaek, et al., “PhilEO Bench: Evaluating Geo-Spatial Foundation Models,” in Proceedings IGARSS, 2024.
[3] N. Dionelis, et al., “Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI,” arXiv:2406.18295, 2024.
[4] T. Xiao, et al., “Unified Perceptual Parsing for Scene Understanding,” 2018.
[5] N. Dionelis and N. Longepe, “Fine-Tuning Foundation Models with Confidence Assessment for Enhanced Semantic Segmentation,” 2024.
[6] A. Francis and M. Czerkawski, “MajorTOM: Expandable Datasets for Earth Observation,” IGARSS, 2024.
[7] B. Le Saux, et al., “The PhilEO Geospatial Foundation Model Suite,” EGU, 2024.
How to cite: Dionelis, N., Musto, R., Paoletti, G., Bosmans, J., Naylor, P., Sarti, S., Di Matteo, F., Cascarano, G., Fibaek, C., and Longepe, N.: Generalist Geospatial Foundation Model PhilEO on Satellite Sentinel-2 MajorTOM Multi-Spectral Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16635, https://doi.org/10.5194/egusphere-egu25-16635, 2025.