EGU26-9940, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9940
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 04 May, 15:00–15:10 (CEST)
Room C
SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation
Kai-Hendrik Cohrs1, Maria Gonzalez-Calabuig1, Vishal Nedungadi2, Zuzanna Osika3, Ruben Cartuyvels4, Steffen Knoblauch5, Joppe Massant6, Shruti Nath7, Patrick Ebel8, and Vasileios Sitokonstantinou2
  • 1University of València, Image Processing Laboratory (IPL), Valencia, Spain (kai.cohrs@uv.es)
  • 2Wageningen University & Research, Wageningen, Netherlands
  • 3Delft University of Technology, Delft, Netherlands
  • 4European Space Agency, Italy
  • 5Heidelberg University, Heidelberg, Germany
  • 6Ghent University, Ghent, Belgium
  • 7University of Oxford, Oxford, UK
  • 8Google Research

Following recent advances in foundation models for natural language processing and computer vision, there is growing interest in leveraging geospatial foundation models (GFMs) for Earth system monitoring and climate-relevant applications. In particular, GFMs promise to support large-scale observation of climate-driven extreme events such as wildfires, floods and landslides. However, despite strong benchmark results, recent studies indicate that GFMs applied to land-cover modelling and hazard mapping can behave unreliably under real-world conditions. Pretraining datasets often underrepresent rare or extreme environmental regimes, leading to degraded model performance precisely in situations where robust predictions are most critical for climate risk assessment and disaster response. Furthermore, GFMs are often surpassed by simple supervised baselines, highlighting the need for systematic reliability analysis, including out-of-distribution (OOD) detection and uncertainty quantification.

We present SHRUG-FM (systematic handling of real-world uncertainty in geospatial foundation models), a reliability-aware prediction framework that integrates three complementary signals: (1) OOD detection in the input space, (2) OOD detection in the embedding space and (3) task-specific predictive uncertainty obtained from decoder ensembles. We evaluate SHRUG-FM on climate-relevant extreme-event applications, including burn-scar, flood and landslide segmentation. Our results show that elevated OOD scores consistently co-locate with degraded model performance, while uncertainty-based indicators successfully capture many low-confidence and erroneous predictions. By linking these reliability signals to hydro-environmental descriptors from HydroATLAS, we further demonstrate that model failures cluster in distinct geographic and hydroclimatic regimes, revealing interpretable gaps in the pretraining distribution and guiding future dataset design.
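As an illustration of how two of these signals could be computed, the following is a minimal sketch and not the SHRUG-FM implementation. The abstract does not name the OOD detector, so a Mahalanobis distance to the training embeddings is used here purely as a stand-in; the same function could be applied to summary statistics of the inputs for an input-space score. The function names and array shapes are assumptions made for this example.

```python
import numpy as np


def mahalanobis_ood_score(train_emb, test_emb, eps=1e-6):
    """Stand-in OOD score (assumption): Mahalanobis distance of each test
    embedding to the mean of the training embeddings.

    train_emb: (n_train, d) array, test_emb: (n_test, d) array.
    """
    mean = train_emb.mean(axis=0)
    # Small ridge term keeps the covariance matrix invertible.
    cov = np.cov(train_emb, rowvar=False) + eps * np.eye(train_emb.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = test_emb - mean
    # Quadratic form diff^T * cov_inv * diff, evaluated per test row.
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def ensemble_uncertainty(member_probs):
    """Predictive uncertainty from a decoder ensemble (hypothetical layout).

    member_probs: (n_members, n_samples, H, W) foreground probabilities.
    Returns one value per sample: the across-member variance, averaged
    over all pixels of that sample.
    """
    return member_probs.var(axis=0).mean(axis=(1, 2))
```

Higher values of either score flag a sample as less reliable; how the scores are thresholded or combined is left open here, since the abstract does not specify it.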

SHRUG-FM delivers practical, operationally relevant diagnostics for Earth system monitoring and prediction. It enables selective prediction, rejection strategies, and reliability-aware quality control. These capabilities are essential for integrating GFMs into real-world workflows for climate impact assessment, hazard monitoring and early warning systems. Future work will extend the framework to additional foundation models and climate-driven hazards.
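Selective prediction with a rejection rule can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used in SHRUG-FM: `scores` is any per-sample reliability score where higher means less reliable, `errors` is any per-sample error measure, and both function names are hypothetical.

```python
import numpy as np


def selective_predict(scores, threshold):
    """Boolean mask of accepted samples: those whose reliability score
    (higher = less reliable) is at or below the threshold."""
    return np.asarray(scores) <= threshold


def risk_coverage(scores, errors):
    """Risk-coverage curve.

    Samples are accepted in order of increasing score. For each number of
    accepted samples, returns the coverage (fraction accepted) and the
    risk (mean error over the accepted samples).
    """
    order = np.argsort(scores)
    sorted_err = np.asarray(errors, dtype=float)[order]
    counts = np.arange(1, len(sorted_err) + 1)
    coverage = counts / len(sorted_err)
    risk = np.cumsum(sorted_err) / counts
    return coverage, risk
```

If the reliability score is informative, risk should stay low at small coverage and rise as less reliable samples are admitted; a flat curve would suggest the score carries little information about model error.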

How to cite: Cohrs, K.-H., Gonzalez-Calabuig, M., Nedungadi, V., Osika, Z., Cartuyvels, R., Knoblauch, S., Massant, J., Nath, S., Ebel, P., and Sitokonstantinou, V.: SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9940, https://doi.org/10.5194/egusphere-egu26-9940, 2026.