- ¹ Ecole Normale Supérieure de Paris, Inria, CNRS, Paris, France
- ² Kayrros SAS, Paris, France
- ³ Centre Borelli, CNRS, ENS Paris-Saclay, Université Paris-Saclay, Gif-sur-Yvette, France
- ⁴ Laboratoire des Sciences du Climat et de l’Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
In a context of rapid environmental change, robust tree species mapping is essential. It enables better quantification of forest biomass, facilitates climate change adaptation through improved forest management, and supports biodiversity preservation. However, the few existing ground-truth datasets suffer from geographic sparsity, semantic inconsistencies and class imbalance, which causes current methods to overfit to local context and makes them unsuitable for accurate large-scale tree species mapping. It is therefore imperative to design methods that learn spatially invariant representations for tree species mapping.
The surge in Earth Observation missions has unlocked vast amounts of Satellite Image Time Series (SITS), which capture the phenology and spectral dynamics that are valuable for tree species classification. Leveraging this data, a growing number of Foundation Models (FMs) pre-trained with Self-Supervised Learning (SSL) have been introduced. Yet, because tree species datasets are dominated by patch-level annotations, FMs are primarily evaluated on classification rather than segmentation tasks, which prevents the production of pixel-level maps. Furthermore, spatial generalization remains largely unexplored, partly because of the geographic sparsity of the labels. As a result, current models often overfit to local context: they perform well on training areas but fail to generalize to new spatial domains. This work therefore focuses on rigorous evaluation of spatial generalization and on the development of methods that produce large-scale, pixel-level tree species maps while overcoming current spatial domain shifts.
To quantify this generalization gap, we propose a spatial zero-shot domain adaptation evaluation protocol in which frozen FMs are linearly probed on a segmentation task in one geographical region and tested on geographically distinct, unseen regions. We align three European datasets (TreeSatAI, PureForest and a regional dataset covering Poland) into six classes to benchmark state-of-the-art FMs pre-trained on SITS (AnySat, ALISE, Presto), and we introduce a new architecture that addresses current limitations.
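The protocol above can be illustrated with a minimal sketch. It is not the authors' implementation: the features are synthetic stand-ins for frozen FM embeddings, the regional "shift" is a hypothetical additive offset, and `FEAT_DIM`, `make_region` and `fit_linear_probe` are illustrative names. A linear probe for segmentation is treated here as a per-pixel linear classifier (equivalent to a 1x1 convolution over the feature map), fitted on a source region and scored on a held-out split of that region and on a shifted target region.

```python
import numpy as np

N_CLASSES = 6  # the abstract aligns the datasets into six classes
FEAT_DIM = 8   # hypothetical embedding size, kept tiny for the demo


def make_region(rng, n_pixels, shift):
    """Synthetic stand-in for frozen per-pixel FM features of one region."""
    labels = rng.integers(0, N_CLASSES, size=n_pixels)
    centers = 3.0 * np.eye(N_CLASSES, FEAT_DIM)  # one cluster per class
    feats = centers[labels] + rng.normal(size=(n_pixels, FEAT_DIM)) + shift
    return feats, labels


def fit_linear_probe(feats, labels, lr=0.3, epochs=300):
    """Softmax regression on frozen features (the encoder is never updated)."""
    n, d = feats.shape
    W, b = np.zeros((d, N_CLASSES)), np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def accuracy(feats, labels, W, b):
    return float(((feats @ W + b).argmax(axis=1) == labels).mean())


rng = np.random.default_rng(0)
no_shift = np.zeros(FEAT_DIM)
region_shift = np.zeros(FEAT_DIM)
region_shift[0] = 4.0  # assumed geographic offset in feature space

train_x, train_y = make_region(rng, 3000, no_shift)     # source region, train
test_x, test_y = make_region(rng, 1000, no_shift)       # source region, held out
unseen_x, unseen_y = make_region(rng, 1000, region_shift)  # unseen region

W, b = fit_linear_probe(train_x, train_y)
acc_in = accuracy(test_x, test_y, W, b)
acc_out = accuracy(unseen_x, unseen_y, W, b)
print(f"in-region: {acc_in:.3f}  unseen region: {acc_out:.3f}  gap: {acc_in - acc_out:.3f}")
```

Reporting the difference between in-region and unseen-region scores, rather than a single pooled score, is what isolates the spatial generalization gap from overall probe quality.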
We propose an SSL framework based on the TimeSFormer backbone, which captures complex spatio-temporal dynamics using divided space and time attention. The model is pre-trained as a Masked Auto-Encoder on a European-scale unlabeled Sentinel-2 dataset to learn robust phenological features. To mitigate the observed spatial generalization gap, we investigate several strategies, including auxiliary conditioning and thermal temporal positional encoding.
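As a rough illustration of divided space and time attention, the sketch below applies single-head self-attention first along the time axis (each spatial patch attends only to its own history) and then along the spatial axis (each date attends only across patches). It is a simplified sketch, not the proposed model: layer normalization, multi-head projections, the MLP sub-block and positional encodings are omitted, and the weight shapes and function names are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head attention over the second-to-last axis of x: (..., L, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def temporal_attention(tokens, weights):
    """tokens: (T, N, D). Each spatial patch attends across its T time steps."""
    x = np.swapaxes(tokens, 0, 1)                 # (N, T, D)
    x = x + self_attention(x, *weights)           # residual connection
    return np.swapaxes(x, 0, 1)                   # back to (T, N, D)


def spatial_attention(tokens, weights):
    """tokens: (T, N, D). Each time step attends across its N patches."""
    return tokens + self_attention(tokens, *weights)


def divided_space_time_block(tokens, params):
    """Time attention followed by space attention, instead of joint attention."""
    return spatial_attention(temporal_attention(tokens, params["time"]), params["space"])


rng = np.random.default_rng(0)
T, N, D = 5, 4, 8  # time steps, spatial patches, channels (toy sizes)
params = {
    name: [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)]
    for name in ("time", "space")
}
tokens = rng.normal(size=(T, N, D))
out = divided_space_time_block(tokens, params)
print(out.shape)
```

Splitting attention this way replaces one attention over all T·N tokens with attentions over T and over N tokens separately, which is the usual motivation for the divided scheme on long time series.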
Our evaluation protocol reveals a significant drop in accuracy for state-of-the-art models when they are applied to unseen regions. This decline suggests that current FMs capture geographically dependent features rather than intrinsic tree species characteristics, resulting in a spatial generalization gap.
Experiments confirm that the proposed architecture learns semantically rich features, as evidenced by its strong ability to reconstruct missing time steps in satellite image time series.
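A minimal sketch of how reconstruction of missing time steps can be scored is given below. It hides a random subset of dates in a toy time series and computes the mean squared error on the hidden dates only, as is usual for masked auto-encoding. Everything here is an assumption made for illustration: the series is a synthetic seasonal signal, the band count and masking ratio are arbitrary, and linear interpolation stands in for the learned decoder.

```python
import numpy as np


def mask_time_steps(series, mask_ratio, rng):
    """series: (T, C). Returns a zero-filled copy and a mask (True = hidden)."""
    n_steps = series.shape[0]
    n_hidden = int(round(mask_ratio * n_steps))
    hidden = np.zeros(n_steps, dtype=bool)
    hidden[rng.choice(n_steps, size=n_hidden, replace=False)] = True
    visible = series.copy()
    visible[hidden] = 0.0
    return visible, hidden


def masked_mse(pred, target, hidden):
    """Reconstruction error computed on the hidden time steps only."""
    return float(np.mean((pred[hidden] - target[hidden]) ** 2))


def interpolate_hidden(visible, hidden):
    """Baseline stand-in for a decoder: linear interpolation in time."""
    t = np.arange(visible.shape[0])
    pred = visible.copy()
    for c in range(visible.shape[1]):
        pred[hidden, c] = np.interp(t[hidden], t[~hidden], visible[~hidden, c])
    return pred


rng = np.random.default_rng(0)
T, C = 20, 10  # hypothetical: 20 acquisition dates, 10 spectral bands
t = np.arange(T)[:, None]
phases = rng.uniform(0.0, 2.0 * np.pi, size=(1, C))
series = 0.5 + 0.1 * np.sin(2.0 * np.pi * t / T + phases)  # toy seasonal signal

visible, hidden = mask_time_steps(series, mask_ratio=0.5, rng=rng)
mse_interp = masked_mse(interpolate_hidden(visible, hidden), series, hidden)
mse_zero = masked_mse(visible, series, hidden)  # trivial zero-fill baseline
print(f"interpolation MSE: {mse_interp:.5f}  zero-fill MSE: {mse_zero:.5f}")
```

A learned model would be expected to improve on the interpolation baseline where the signal is not locally linear, for example around abrupt phenological transitions.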
By quantifying the spatial domain shift, proposing a resilient SSL architecture and applying domain adaptation strategies, this work addresses the important challenge of generalization in label-scarce regimes. It supports high-resolution forest monitoring, a prerequisite for precise carbon accounting and forest biodiversity conservation.
How to cite: Brood, S., Dumeur, I., Anger, J., de Truchis, A., Sean, E., Fayad, I., d'Aspremont, A., and Ciais, P.: Addressing Geographical Domain Shift in Tree Species Mapping via Foundation Models using Satellite Image Time Series, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13546, https://doi.org/10.5194/egusphere-egu26-13546, 2026.