EGU26-11394, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-11394
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 14:00–15:45 (CEST), Display time Tuesday, 05 May, 14:00–18:00
 
Hall X4, X4.51
HiD-FM: A High-Resolution Remote Sensing Foundation Model with Knowledge Distillation and Feature Fusion for Image Semantic Segmentation
Guosen Xu1, Huanfeng Shen1, Xinghua Li2, Mingjie Xu1, Dekun Lin1, and Tao Jiang1
  • 1School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, PR China (gsenxu@whu.edu.cn)
  • 2School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, PR China

The emergence of foundation models marks a transformative era in Earth observation, delivering powerful and adaptable tools to tackle the complexities of processing massive volumes of satellite imagery. Land cover mapping currently faces two primary obstacles: 1) the prohibitive cost of data annotation and its heavy reliance on high-quality labels; and 2) the pronounced spectral and spatial variability of identical ground objects caused by differences in acquisition time, location, and sensor. Visual Foundation Models (VFMs), with their strong generalization capabilities, offer a means to bridge this domain gap effectively. Motivated by this, we propose HiD-FM, a high-resolution remote sensing foundation model that leverages knowledge distillation and feature fusion. Specifically, HiD-FM undergoes self-supervised pre-training on a dataset of one million high-resolution unlabeled images. By combining knowledge distillation with feature fusion, it injects the generalization power of pre-trained VFMs into a semi-supervised learning framework, thereby improving performance on unlabeled data and enhancing fine-grained feature representation. Extensive experiments on semantic segmentation tasks demonstrate that HiD-FM consistently outperforms representative remote sensing foundation models (RSFMs) such as RVSA, SMLFR, and CMID, particularly in data-scarce scenarios. On the LoveDA and GID-15 datasets, our method surpasses both specialized models and existing foundation models across various labeling ratios. Notably, using only 30% of the training data, HiD-FM achieved an overall accuracy (OA) of 83.19% on the GID-15 dataset. Furthermore, transfer learning experiments on GF-2 imagery across diverse spatiotemporal contexts yielded superior visual results. HiD-FM enables rapid and cost-effective adaptation to target domains, thereby significantly advancing remote sensing image interpretation.
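To make the training idea concrete, the following is a minimal conceptual sketch (not the authors' implementation) of how a frozen pre-trained VFM teacher could be distilled into a segmentation student and fused with its features inside a semi-supervised loop. All module names, feature shapes, loss weights, and the confidence threshold are illustrative assumptions, not details taken from the abstract.

```python
# Illustrative sketch, assuming a PyTorch setup; not the HiD-FM codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillFusionHead(nn.Module):
    """Hypothetical head that fuses student backbone features with projected
    teacher (VFM) features before the segmentation classifier."""
    def __init__(self, student_dim, teacher_dim, num_classes):
        super().__init__()
        self.proj = nn.Conv2d(teacher_dim, student_dim, kernel_size=1)   # align channels
        self.fuse = nn.Conv2d(2 * student_dim, student_dim, kernel_size=3, padding=1)
        self.classifier = nn.Conv2d(student_dim, num_classes, kernel_size=1)

    def forward(self, f_student, f_teacher):
        f_teacher = self.proj(f_teacher)
        f_teacher = F.interpolate(f_teacher, size=f_student.shape[-2:],
                                  mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([f_student, f_teacher], dim=1))
        return self.classifier(fused), f_teacher

def semi_supervised_step(student, teacher, head, x_labeled, y, x_unlabeled,
                         lam_distill=0.5, lam_unsup=0.5, conf_thresh=0.9):
    """One illustrative step: supervised cross-entropy on labeled data,
    feature-level distillation toward the frozen teacher, and confidence-
    filtered pseudo-label loss on unlabeled data (assumed weights)."""
    with torch.no_grad():                          # pre-trained VFM stays frozen
        t_lab = teacher(x_labeled)
        t_unlab = teacher(x_unlabeled)

    s_lab = student(x_labeled)
    logits_lab, t_proj = head(s_lab, t_lab)
    logits_lab = F.interpolate(logits_lab, size=y.shape[-2:],
                               mode="bilinear", align_corners=False)
    loss_sup = F.cross_entropy(logits_lab, y, ignore_index=255)

    # Distillation: pull student features toward the projected teacher features.
    loss_distill = F.mse_loss(s_lab, t_proj)

    # Pseudo-labels on unlabeled data, kept only where predictions are confident.
    s_unlab = student(x_unlabeled)
    logits_unlab, _ = head(s_unlab, t_unlab)
    conf, pseudo = logits_unlab.softmax(dim=1).max(dim=1)
    mask = conf > conf_thresh
    if mask.any():
        loss_unsup = F.cross_entropy(logits_unlab, pseudo, reduction="none")
        loss_unsup = (loss_unsup * mask).sum() / mask.sum()
    else:
        loss_unsup = logits_unlab.sum() * 0.0      # keep the graph, contribute zero

    return loss_sup + lam_distill * loss_distill + lam_unsup * loss_unsup
```

In this sketch the student and teacher are assumed to return feature maps of shape (B, C, H, W); the relative loss weights and pseudo-label filtering strategy are placeholders for whatever scheme the full paper specifies.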

How to cite: Xu, G., Shen, H., Li, X., Xu, M., Lin, D., and Jiang, T.: HiD-FM: A High-Resolution Remote Sensing Foundation Model with Knowledge Distillation and Feature Fusion for Image Semantic Segmentation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11394, https://doi.org/10.5194/egusphere-egu26-11394, 2026.