EGU26-2530, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-2530
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 14:00–15:45 (CEST), Display time Tuesday, 05 May, 14:00–18:00
 
Hall X4, X4.50
Efficient Earth Observation Representation Learning Using Metadata-Aware Mixture-of-Experts Masked Autoencoder
Mohanad Albughdadi1, Marica Antonacci2, Vasileios Baousis3, Federico Fornari2, Tolga Kaprol1, and Claudio Pisa2
  • 1European Centre for Medium-Range Weather Forecasts, Bonn, Germany
  • 2European Centre for Medium-Range Weather Forecasts, Bologna, Italy
  • 3European Centre for Medium-Range Weather Forecasts, Reading, UK

Large-scale foundation models trained on multi-sensor satellite imagery have driven recent advances in Earth Observation (EO) tasks. Although such models achieve impressive transferability across diverse downstream tasks, their computational and memory demands hinder accessibility, reproducibility, and deployment in resource-constrained environments. This work explores a compact and efficient alternative: a metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) for EO representation learning (Albughdadi, 2025).

The proposed MoE-MAE is a self-supervised transformer-based architecture with only 2.5 million parameters. It combines sparse expert routing with geo-temporal conditioning: sparse routing allows token specialization while keeping active computation low, and geo-temporal conditioning injects latitude, longitude, and cyclic temporal attributes directly into the model. This design lets the model exploit the spatial and temporal regularities inherent in EO data without requiring dense, computationally costly transformers.
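These two ingredients can be illustrated with a minimal sketch, in NumPy rather than the model's actual framework; the exact feature set and routing details are hypothetical and stand in for the scheme described in Albughdadi (2025):

```python
import numpy as np

def geo_temporal_features(lat, lon, day_of_year):
    """Encode metadata as bounded, cyclic features (illustrative scheme:
    sin/cos of latitude, longitude, and day-of-year, so that nearby
    locations and adjacent dates map to nearby feature vectors)."""
    d = 2.0 * np.pi * day_of_year / 365.25
    return np.array([
        np.sin(np.radians(lat)), np.cos(np.radians(lat)),
        np.sin(np.radians(lon)), np.cos(np.radians(lon)),
        np.sin(d), np.cos(d),
    ])

def top_k_routing(gate_logits, k=2):
    """Sparse expert routing: keep only the k largest gate logits per
    token and renormalize over the selected experts, so most expert
    feed-forward blocks stay inactive for any given token."""
    idx = np.argsort(gate_logits)[::-1][:k]       # top-k expert indices
    w = np.exp(gate_logits[idx] - gate_logits[idx].max())
    return idx, w / w.sum()                        # indices + mixture weights
```

Only the experts returned by `top_k_routing` run for a token, which is what keeps active computation low even as total parameter count grows.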

The model is pretrained on the BigEarthNet-Landsat (BEN-LS) dataset (Corley et al., 2025) using a masked reconstruction loss augmented with auxiliary unmasked and load-balancing losses to encourage stable expert utilization. The learned encoder representations are then evaluated via linear probing on two benchmark datasets: (1) BEN-LS, a multi-label land-cover dataset with explicit metadata, and (2) EuroSAT-Landsat (EuroSAT-LS) (Corley et al., 2025), a single-label classification dataset without metadata. Despite the encoder’s small size (~2.3 M parameters), the proposed MoE-MAE achieves results competitive with models orders of magnitude larger. On BEN-LS, the frozen encoder reaches a micro mean average precision of 0.767, comparable to SSL4EO-L ViT-S/16 MoCo v2 (0.775) (Stewart et al., 2023). On EuroSAT-LS, the model maintains strong transferability, achieving 84.2% accuracy even in the absence of geo-temporal metadata.
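The two loss components can be sketched as follows; the masked MSE is the standard MAE objective, while the load-balancing term uses a common Switch-Transformer-style formulation, which is an assumption and not necessarily the paper's exact auxiliary loss:

```python
import numpy as np

def masked_reconstruction_loss(pred, target, mask):
    """MSE computed only on masked patches, as in MAE-style pretraining.
    pred, target: (patches, dim); mask: (patches,) with 1 = masked."""
    se = ((pred - target) ** 2).mean(axis=-1)      # per-patch squared error
    return float((se * mask).sum() / mask.sum())

def load_balancing_loss(gate_probs, expert_assignment, n_experts):
    """Auxiliary load-balancing loss (Switch-Transformer style): penalizes
    routers that send most tokens to a few experts. Equals 1.0 under
    perfectly uniform routing, larger when routing is skewed.
    gate_probs: (tokens, experts) softmax router outputs.
    expert_assignment: (tokens,) routed expert index per token."""
    f = np.bincount(expert_assignment, minlength=n_experts) / len(expert_assignment)
    p = gate_probs.mean(axis=0)                    # mean router prob per expert
    return n_experts * float(np.dot(f, p))
```

In training, the total objective would be the masked reconstruction term plus small weighted copies of the auxiliary terms, keeping expert utilization stable without dominating the reconstruction signal.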

Ablation and visualization studies reveal expert specialization across spatial patterns: some experts respond primarily to vegetation, others to water or textured regions, demonstrating interpretable behaviour and complementary feature learning. Additionally, only about half of the model’s expert feed-forward capacity is activated per token, confirming computational sparsity in practice. These findings suggest that such models can retain strong representational power while substantially reducing training and inference costs.

This work presents a first step toward small-scale architectures for EO representation learning, integrating metadata and leveraging sparse computation to approach the performance of massive transformers. Future work will extend the framework to multi-sensor and multi-temporal datasets to capture dynamic Earth processes efficiently.

Albughdadi, M. (2025). Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation. arXiv:2509.10919.

Stewart, A. J., Lehmann, N., Corley, I. A., Wang, Y., Chang, Y.-C., Braham, N. A. A., Sehgal, S., Robinson, C., & Banerjee, A. (2023). SSL4EO-L: Datasets and Foundation Models for Landsat Imagery. arXiv:2312.05241.

Corley, I., Sharma, L., & Crasto, R. (2025). Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models. arXiv:2506.08780.

How to cite: Albughdadi, M., Antonacci, M., Baousis, V., Fornari, F., Kaprol, T., and Pisa, C.: Efficient Earth Observation Representation Learning Using Metadata-Aware Mixture-of-Experts Masked Autoencoder, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2530, https://doi.org/10.5194/egusphere-egu26-2530, 2026.