EGU26-13502, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-13502
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 08 May, 08:30–10:15 (CEST), Display time Friday, 08 May, 08:30–12:30
 
Hall A, A.67
Integrating SAR and Multispectral Satellite Observations for Flood Inundation Mapping: A Cross-Modal Fusion Framework Leveraging Foundation Models and Gated Attention Mechanism
Yen Cheng Chen1 and Li Pen Wang2
  • 1National Taiwan University, Civil Engineering, Taipei City, Taiwan (yencheng91322@caece.net)
  • 2Department of Civil and Environmental Engineering, Imperial College London, London, UK

Flood inundation mapping has become increasingly critical as climate change intensifies the frequency and severity of flooding worldwide, amplifying risks to populations, infrastructure, and ecosystems. Recent advances in Earth Observation (EO) have opened unprecedented opportunities to monitor flood dynamics across large spatial scales. However, significant challenges remain due to the limitations of single-sensor approaches. While multispectral imagery provides rich semantic information, it is frequently constrained by cloud cover during flood events. Conversely, Synthetic Aperture Radar (SAR) offers all-weather capability but suffers from signal ambiguity in complex terrains and urban environments. Effectively integrating these heterogeneous modalities therefore remains a challenge, particularly with limited labelled flood event data.

In this study, we propose a deep learning-based cross-modal fusion framework that leverages the representational capacity of Remote Sensing Foundation Models (RSFMs). High-level feature embeddings are extracted from Sentinel-1 SAR and Sentinel-2 multispectral imagery by initialising modality-specific encoders with pretrained weights from state-of-the-art multi-modal foundation models, providing a robust and semantically aligned feature space despite limited task-specific training data.
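
As an illustration of this initialisation step, the sketch below builds the two modality-specific encoders from a pretrained Vision Transformer. The abstract names neither the foundation model nor the library, so a generic ImageNet-pretrained ViT from the timm package stands in here for the actual RSFM checkpoints; the input channel counts follow the Sentinel-1 (VV/VH) and Sentinel-2 (13-band) products.

    import torch
    import timm

    def build_encoder(in_chans: int):
        # timm re-maps the patch-embedding weights to the requested channel
        # count while keeping the remaining pretrained transformer intact
        return timm.create_model(
            "vit_base_patch16_224",   # stand-in for an RSFM checkpoint
            pretrained=True,
            in_chans=in_chans,
            num_classes=0,            # feature extractor, no classifier head
        )

    sar_encoder = build_encoder(in_chans=2)    # Sentinel-1: VV and VH backscatter
    opt_encoder = build_encoder(in_chans=13)   # Sentinel-2: 13 spectral bands

    x_sar = torch.randn(1, 2, 224, 224)          # dummy SAR patch
    x_opt = torch.randn(1, 13, 224, 224)         # dummy multispectral patch
    z_sar = sar_encoder.forward_features(x_sar)  # (1, 197, 768) token embeddings
    z_opt = opt_encoder.forward_features(x_opt)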

To integrate the multi-modal representations, we adopt a Gated Cross-Modal Attention mechanism, which adaptively modulates the information flow from each modality based on its observation reliability. Specifically, the model is trained to prioritise SAR features to ensure spatial continuity under cloud-obscured conditions, while simultaneously leveraging richer optical semantics to disambiguate SAR signals, correcting, for example, false detections caused by radar shadowing or smooth impervious surfaces.
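
A minimal sketch of one way such a gate can be realised is given below, assuming PyTorch and token-level features from the two encoders above; the query/key roles, head count, and the sigmoid gate conditioned on both modalities are illustrative choices, not details taken from the abstract.

    import torch
    import torch.nn as nn

    class GatedCrossModalAttention(nn.Module):
        # SAR tokens query the optical tokens; a learned sigmoid gate then
        # decides, per token, how much optical context to admit, so that
        # unreliable (e.g. cloud-obscured) optical evidence can be suppressed.
        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
            self.norm = nn.LayerNorm(dim)

        def forward(self, sar_tokens, opt_tokens):
            # sar_tokens, opt_tokens: (B, N, dim) sequences from the encoders
            opt_ctx, _ = self.cross_attn(sar_tokens, opt_tokens, opt_tokens)
            g = self.gate(torch.cat([sar_tokens, opt_ctx], dim=-1))  # (B, N, dim)
            return self.norm(sar_tokens + g * opt_ctx)               # gated fusion

    fusion = GatedCrossModalAttention(dim=768)
    fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 196, 768))

Keeping SAR as the residual stream in this sketch mirrors the stated priority on spatial continuity: even where the gate closes fully, the SAR features pass through unchanged.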

To assess the generalisation of the proposed framework across diverse regions and sensor conditions, we trained and evaluated our model using a comprehensive dataset compiled from publicly available benchmarks, including Kuro Siwo and WorldFloods. Our framework not only establishes a new benchmark for all-weather flood monitoring but also demonstrates the critical role of remote sensing foundation models in overcoming the limitations of traditional, data-hungry fusion approaches.

How to cite: Chen, Y. C. and Wang, L. P.: Integrating SAR and Multispectral Satellite Observations for Flood Inundation Mapping: A Cross-Modal Fusion Framework Leveraging Foundation Models and Gated Attention Mechanism, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13502, https://doi.org/10.5194/egusphere-egu26-13502, 2026.