EGU26-19324, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-19324
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 06 May, 11:30–11:40 (CEST)
 
Room 1.14
Improving Land Cover Semantic Segmentation through Deep Supervision
Sara Mobsite, Renaud Hostache, Laure Berti-Équille, Emmanuel Roux, and Joris Guérin
  • French National Research Institute for Sustainable Development, ESPACE-DEV, France

Increased interactions between humans, animals, and the environment contribute to wildlife habitat fragmentation and increase the risk of infectious disease emergence and transmission. These interactions can be characterized and analyzed through an understanding of land use and land cover (LULC) dynamics and spatial characteristics. LULC characterization is a key preliminary step for addressing eco-epidemiological questions using a landscape-based approach. The landscape, as the observable outcome of the spatio-temporal dynamics of environmental, animal, and human populations and their interactions at different spatial and temporal scales, allows the adoption of One Health and Planetary Health approaches. 

Automated analysis and characterization of LULC can be achieved through the application of deep learning techniques to satellite data. However, supervised pixel-level LULC classification using deep learning requires large amounts of expert-verified labeled data. When working with high-resolution imagery, the availability of well-labeled datasets is considerably more limited than for low-resolution products. In addition, class imbalance, underrepresentation of certain land cover categories, and their uneven spatial distribution pose major challenges. As a result, models relying on a single learning task often exhibit limited generalization performance in real-world settings. 

To address these challenges, we propose a deep learning autoencoder architecture that leverages both high- and low-resolution land cover maps. The model uses combined optical Sentinel-2 and radar Sentinel-1 data as input to the encoder. During decoding, low-resolution land cover maps are incorporated to capture the global spatial structure of the landscape. This information, introduced at early decoding stages, guides the learning process toward meaningful semantic representations at coarser scales. Subsequently, deeper decoding layers focus on learning finer semantic details under the supervision of high-resolution labels. 
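The multi-scale supervision described above can be summarized as a weighted combination of losses: auxiliary losses computed against low-resolution maps at early decoder stages, plus the main loss computed against high-resolution labels at the final layer. The sketch below is a minimal illustration of this weighted-sum formulation, not the authors' exact implementation; the function name and signature are hypothetical.

```python
def deep_supervision_loss(final_loss, coarse_losses, aux_weight=0.10):
    """Combine the high-resolution loss with auxiliary coarse-scale losses.

    final_loss:    scalar loss from the last decoder layer, computed
                   against the high-resolution labels
    coarse_losses: list of scalar losses from earlier decoder stages, each
                   computed against a low-resolution land cover map
    aux_weight:    weighting factor for the auxiliary terms (0.10 gave the
                   best performance in the reported experiments)
    """
    return final_loss + aux_weight * sum(coarse_losses)
```

In a training loop, each auxiliary loss would come from an extra prediction head attached to an intermediate decoder layer, compared against the correspondingly downsampled label map.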

We evaluated the proposed approach using the DFC2020 dataset, which consists of 5,128 samples with original LULC maps at 10-meter spatial resolution. Low-resolution supervision maps were generated by downsampling the original labels using nearest-neighbor interpolation. We assessed the impact of introducing deep supervision at different decoder depths. Results show that applying deep supervision early in the decoder with a weighting factor of 0.10 yielded the best performance. The mean Intersection over Union (IoU) improved from 46.28 ± 2.28% to 53.82 ± 0.71% across five independent runs. Moreover, the proposed model outperformed the widely used U-Net architecture, which achieved an IoU of 50.93 ± 1.25%.
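The two ingredients of this evaluation, nearest-neighbor label downsampling and mean IoU, can each be sketched in a few lines. The snippet below is illustrative only (the function names are hypothetical, and it assumes an integer downsampling factor with the common keep-every-k-th-pixel convention, plus flattened 1D label vectors for the IoU):

```python
def downsample_nn(labels, factor):
    """Nearest-neighbor downsampling of a 2D label map.

    For an integer factor, keep every `factor`-th pixel in each
    dimension, so each output cell inherits the class of its
    block's top-left pixel (no averaging, which would invent
    non-existent class values).
    """
    return [row[::factor] for row in labels[::factor]]


def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over all classes present
    in the prediction or the reference (flattened label vectors)."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

Keeping discrete class indices (rather than interpolating values) is what makes nearest-neighbor the natural choice for downsampling categorical label maps.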

These results demonstrate the effectiveness of deep supervision in enhancing pixel-level land cover classification by exploiting low-resolution information to improve global feature learning prior to refining fine-scale spatial details. This work was conducted within the framework of the MOSAIC Horizon Europe project, part of the Planetary Health cluster. 

 

How to cite: Mobsite, S., Hostache, R., Berti-Équille, L., Roux, E., and Guérin, J.: Improving Land Cover Semantic Segmentation through Deep Supervision, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19324, https://doi.org/10.5194/egusphere-egu26-19324, 2026.