- 1School of Computer Science, University College Dublin, Ireland. (mehran.alizadehpirbasti@ucdconnect.ie)
- 2Division of Computing Science and Mathematics, University of Stirling, Stirling, U.K.
Pretraining geospatial foundation models (FMs) is expensive, and architectural choices control inductive bias for multiscale context, cross-resolution behavior, and band/sensor variation. Therefore, benchmarking reduces the risk of scaling the “wrong” base. A model benchmarking framework for geospatial image segmentation is a critical prerequisite for developing robust and scalable geospatial FMs. In the emerging era of Earth observation FMs, success hinges on strong, well-characterized base architectures that can generalize across sensors, modalities, and geographies. The extreme heterogeneity of Earth observation vision data (different spectral bands, resolutions, and regions) makes such generalization especially challenging, underscoring the need for systematic, controlled benchmarking across diverse model families to identify viable architectures for different scenarios.
Our work rigorously evaluates a broad spectrum of segmentation architectures and backbones under consistent conditions. We benchmark classical convolutional architectures (U-Net, DeepLab, UPerNet, FPN, PAN, and LinkNet) alongside modern transformer-based models (Dense Prediction Transformer (DPT) and SegFormer). For this comparison, we use representative backbones from both CNNs (ResNet and MobileNet) and the Mix Vision Transformer (MiT). By comparing these heterogeneous models on an equal footing, we determine which architectural patterns and hybrid combinations yield representations most conducive to generalization. This diversity in evaluation identifies well-founded architectural bases for geospatial FMs.
To guide architecture selection and pipeline design, we deploy a comprehensive suite of metrics covering both accuracy and efficiency. We evaluate segmentation accuracy via IoU, Dice, and boundary F1 score, and we also measure efficiency (convergence speed and inference latency). These holistic benchmarks reveal critical trade-offs: for instance, some lightweight CNN models excel in speed, while transformer-based models achieve higher boundary F1 scores. By capturing such nuances, our benchmark informs which architectures are best suited as general-purpose base models. It highlights how certain encoder–decoder combinations optimally balance performance and efficiency, and flags architectures with high transfer-readiness for new tasks and domains.
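To make the accuracy metrics concrete, the following is a minimal sketch of IoU and Dice for binary segmentation masks. The function names and the toy 4×4 masks are illustrative, not part of the benchmark framework itself; in practice these metrics would be computed per class and averaged, and boundary F1 would additionally match predicted and reference mask contours within a pixel tolerance.

```python
import numpy as np

def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union: |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0  # both empty -> perfect match

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2 * inter / total if total else 1.0

# Toy example: predicted mask covers 4 pixels, reference covers 6,
# and they overlap on 4 pixels.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
print(iou_score(pred, target))   # 4 / 6 ≈ 0.667
print(dice_score(pred, target))  # 2 * 4 / (4 + 6) = 0.8
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is why reporting both helps separate gross-overlap quality from the penalty for false positives and negatives.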
The result is a reproducible, transferable model landscape that serves as a blueprint for FM development. Our benchmark framework effectively preconditions the FM pipeline, enabling researchers to enter the scaling phase with proven architecture candidates that have demonstrated cross-task and cross-sensor robustness. This “model landscape” allows subsequent large-scale pretraining to confidently build on architectures that ensure broad downstream generalization even in agentic (autonomous) deployment scenarios.
Finally, we situate this work within the broader trend toward sensor-agnostic, self-supervised FMs in Earth observation. We argue that intelligent architecture search must precede any massive self-supervised pretraining effort. Early vetting of architectures under diverse conditions ensures that large-scale training resources are invested in the most promising designs. In summary, we frame this hybrid benchmarking framework as a strategic new layer in the geospatial FM ecosystem. The insights extend beyond segmentation, providing a reference point for building fine-tunable, sensor-agnostic foundation models that can be readily adapted to various downstream tasks and even deployed onboard satellites or other edge platforms. By solidifying architecture evaluation as an essential step, this work makes a serious scientific and strategic contribution toward the next generation of Earth observation AI.
How to cite: Alizadeh Pirbasti, M., McArdle, G., and Akbari, V.: Segmentation Model Benchmarking: A Strategic Prerequisite for Robust Geospatial Foundation Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21700, https://doi.org/10.5194/egusphere-egu26-21700, 2026.