- State Key Laboratory of Hydroscience and Engineering & Department of Hydraulic Engineering, Tsinghua University, China (17870367514@163.com)
Flood and surface water mapping from satellite observations remains challenging due to the complementary yet heterogeneous characteristics
of optical and synthetic aperture radar (SAR) data. While deep learning has achieved promising results, existing studies are often evaluated on
isolated datasets or focus on a single modality, limiting their comparability and operational relevance. In this study, we conduct a large-scale, systematic evaluation of optical, SAR, and combined optical–SAR learning strategies for flood and surface water mapping across multiple public satellite benchmarks. Using a common training and evaluation protocol, we compare lightweight convolutional networks and large pretrained vision models under single-modality and multimodal settings. The analysis reveals that attention-based multimodal fusion consistently improves water delineation accuracy on most datasets, while model capacity and preprocessing choices play a critical role in balancing missed detections and false alarms. On global-scale benchmarks, moderately sized backbones coupled with dedicated fusion mechanisms achieve robust performance without relying on extremely large models. These findings provide practical guidance for selecting architectures and fusion strategies in operational flood mapping and establish a reproducible benchmark for future optical and SAR studies.
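To illustrate the kind of attention-based optical–SAR fusion evaluated here, the sketch below shows a minimal cross-attention module in which optical encoder features query SAR encoder features. It is illustrative only: the class name, channel sizes, and the use of PyTorch's MultiheadAttention are assumptions, not the study's actual implementation.

import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse optical and SAR feature maps with cross-attention (illustrative sketch).

    Optical features act as queries; SAR features supply keys and values,
    so the optical stream can selectively attend to radar evidence.
    """

    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, opt_feat: torch.Tensor, sar_feat: torch.Tensor) -> torch.Tensor:
        # opt_feat, sar_feat: (B, C, H, W) feature maps from two encoder branches
        b, c, h, w = opt_feat.shape
        q = opt_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) optical queries
        kv = sar_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) SAR keys/values
        fused, _ = self.attn(q, kv, kv)           # optical tokens attend to SAR tokens
        fused = self.norm(fused + q)              # residual connection + normalization
        return fused.transpose(1, 2).reshape(b, c, h, w)


# Example: fuse hypothetical 64-channel feature maps from the two branches
fusion = CrossAttentionFusion(channels=64, num_heads=4)
opt = torch.randn(2, 64, 32, 32)   # optical encoder features
sar = torch.randn(2, 64, 32, 32)   # SAR encoder features
out = fusion(opt, sar)             # (2, 64, 32, 32) fused representation

A fused map of this kind can then be passed to a segmentation head in place of single-modality features, which is one common way such fusion blocks are inserted into encoder–decoder water-mapping networks.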
How to cite: Xiao, J., Li, Z., and Tian, F.: Evaluating multimodal optical and SAR learning strategies for flood and surface water delineation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7320, https://doi.org/10.5194/egusphere-egu26-7320, 2026.