EGU26-13502, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-13502
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 08 May, 08:30–10:15 (CEST), Display time Friday, 08 May, 08:30–12:30
 
Hall A, A.67
Integrating SAR and Multispectral Satellite Observations for Flood Inundation Mapping: A Cross-Modal Fusion Framework Leveraging Foundation Models and Gated Attention Mechanism
Yen Cheng Chen1 and Li Pen Wang2
  • 1National Taiwan University, Civil Engineering, Taipei City, Taiwan (yencheng91322@caece.net)
  • 2Department of Civil and Environmental Engineering, Imperial College London, London, UK

Flood inundation mapping has become increasingly critical as climate change intensifies the frequency and severity of flooding worldwide, amplifying risks to populations, infrastructure, and ecosystems. Recent advances in Earth Observation (EO) have opened unprecedented opportunities to monitor flood dynamics across large spatial scales. However, significant challenges remain due to the limitations of single-sensor approaches. While multispectral imagery provides rich semantic information, it is frequently constrained by cloud cover during flood events. Conversely, Synthetic Aperture Radar (SAR) offers all-weather capability but suffers from signal ambiguity in complex terrains and urban environments. Effectively integrating these heterogeneous modalities therefore remains a challenge, particularly with limited labelled flood event data.

In this study, we propose a deep learning-based cross-modal fusion framework that leverages the representational capacity of Remote Sensing Foundation Models (RSFMs). High-level feature embeddings are extracted from Sentinel-1 SAR and Sentinel-2 multispectral imagery by initialising modality-specific encoders with pretrained weights from state-of-the-art multi-modal foundation models, providing a robust and semantically aligned feature space despite limited task-specific training data.
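
As an illustration of this initialisation step, the sketch below builds the two modality-specific encoders from a pretrained Vision Transformer. The abstract names neither the foundation model nor the library, so a generic ImageNet-pretrained ViT from the timm package stands in here for the actual RSFM checkpoints; the input channel counts follow the Sentinel-1 (VV/VH) and Sentinel-2 (13-band) products.

    import torch
    import timm

    def build_encoder(in_chans: int):
        # timm re-maps the patch-embedding weights to the requested channel
        # count while keeping the remaining pretrained transformer intact
        return timm.create_model(
            "vit_base_patch16_224",   # stand-in for an RSFM checkpoint
            pretrained=True,
            in_chans=in_chans,
            num_classes=0,            # feature extractor, no classifier head
        )

    sar_encoder = build_encoder(in_chans=2)    # Sentinel-1: VV and VH backscatter
    opt_encoder = build_encoder(in_chans=13)   # Sentinel-2: 13 spectral bands

    x_sar = torch.randn(1, 2, 224, 224)          # dummy SAR patch
    x_opt = torch.randn(1, 13, 224, 224)         # dummy multispectral patch
    z_sar = sar_encoder.forward_features(x_sar)  # (1, 197, 768) token embeddings
    z_opt = opt_encoder.forward_features(x_opt)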

To integrate the multi-modal representations, we adopt a Gated Cross-Modal Attention mechanism, which adaptively modulates the information flow from each modality based on its observation reliability. Specifically, the model is trained to prioritise SAR features to ensure spatial continuity under cloud-obscured conditions, while simultaneously leveraging richer optical semantics to disambiguate SAR signals, correcting, for example, false detections caused by radar shadowing or smooth impervious surfaces.
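
A minimal sketch of one way such a gate can be realised is given below, assuming PyTorch and token-level features from the two encoders above; the query/key roles, head count, and the sigmoid gate conditioned on both modalities are illustrative choices, not details taken from the abstract.

    import torch
    import torch.nn as nn

    class GatedCrossModalAttention(nn.Module):
        # SAR tokens query the optical tokens; a learned sigmoid gate then
        # decides, per token, how much optical context to admit, so that
        # unreliable (e.g. cloud-obscured) optical evidence can be suppressed.
        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
            self.norm = nn.LayerNorm(dim)

        def forward(self, sar_tokens, opt_tokens):
            # sar_tokens, opt_tokens: (B, N, dim) sequences from the encoders
            opt_ctx, _ = self.cross_attn(sar_tokens, opt_tokens, opt_tokens)
            g = self.gate(torch.cat([sar_tokens, opt_ctx], dim=-1))  # (B, N, dim)
            return self.norm(sar_tokens + g * opt_ctx)               # gated fusion

    fusion = GatedCrossModalAttention(dim=768)
    fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 196, 768))

Keeping SAR as the residual stream in this sketch mirrors the stated priority on spatial continuity: even where the gate closes fully, the SAR features pass through unchanged.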

To assess the generalisation of the proposed framework across diverse regions and sensor conditions, we trained and evaluated our model using a comprehensive dataset compiled from publicly available benchmarks, including Kuro Siwo and WorldFloods. Our framework not only establishes a new benchmark for all-weather flood monitoring but also demonstrates the critical role of remote sensing foundation models in overcoming the limitations of traditional, data-hungry fusion approaches.

How to cite: Chen, Y. C. and Wang, L. P.: Integrating SAR and Multispectral Satellite Observations for Flood Inundation Mapping: A Cross-Modal Fusion Framework Leveraging Foundation Models and Gated Attention Mechanism, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13502, https://doi.org/10.5194/egusphere-egu26-13502, 2026.