- 1School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, PR China (gsenxu@whu.edu.cn)
- 2School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, PR China
The emergence of foundation models marks a transformative era in Earth observation, delivering powerful and adaptable tools to tackle the complexities of processing massive satellite imagery. Currently, land cover mapping faces two primary obstacles: 1) the prohibitive cost of data annotation and heavy reliance on high-quality labels; and 2) the significant spectral and spatial variability of identical ground objects caused by differences in temporal phases, locations, and sensors. Visual Foundation Models (VFMs), with their potent generalization capabilities, offer a means to effectively bridge this domain gap. Motivated by this, we propose HiD-FM, a high-resolution remote sensing foundation model leveraging knowledge distillation and feature fusion. Specifically, HiD-FM undergoes self-supervised pre-training on a dataset of one million high-resolution unlabeled images. By synergizing knowledge distillation with feature fusion, it integrates the generalization power of pre-trained VFMs into a semi-supervised learning framework, thereby boosting performance on unlabeled data and enhancing fine-grained feature representation. Extensive experiments on semantic segmentation tasks demonstrate that HiD-FM consistently outperforms existing remote sensing foundation models (RSFMs) such as RVSA, SMLFR, and CMID, particularly in data-scarce scenarios. On the LoveDA and GID-15 datasets, our method surpasses both specialized models and existing foundation models across various labeling ratios. Notably, using only 30% of the training data, HiD-FM achieves an overall accuracy (OA) of 83.19% on the GID-15 dataset. Furthermore, transfer learning experiments on GF-2 imagery across diverse spatiotemporal contexts yield superior visualization results. HiD-FM enables rapid and cost-effective adaptation to target domains, thereby significantly advancing the field of remote sensing interpretation.
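To illustrate the core mechanism described above, the sketch below shows one common way a frozen VFM "teacher" can distill knowledge into a student backbone on unlabeled imagery, alongside a simple feature-fusion step. All function names, shapes, and the cosine-distance formulation are illustrative assumptions on our part, not the actual HiD-FM implementation.

```python
# Hedged sketch of feature-level knowledge distillation with feature fusion.
# Assumption: the frozen VFM teacher emits per-image feature vectors, and the
# student's features are aligned to them through a learned linear projection.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit length for cosine comparison."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def distillation_loss(student_feats, teacher_feats, proj):
    """Cosine-distance loss between projected student features and frozen
    teacher features, averaged over the batch.

    student_feats: (B, Ds) student backbone features
    teacher_feats: (B, Dt) frozen VFM features (no gradient flows here)
    proj:          (Ds, Dt) learned projection aligning feature dimensions
    """
    s = l2_normalize(student_feats @ proj)   # (B, Dt) projected + normalized
    t = l2_normalize(teacher_feats)          # (B, Dt) normalized teacher
    cos_sim = np.sum(s * t, axis=1)          # per-sample cosine similarity
    return float(np.mean(1.0 - cos_sim))     # 0 when perfectly aligned

def fuse_features(student_feats, teacher_feats, proj, alpha=0.5):
    """Simple feature fusion: convex combination of projected student
    features and teacher features (one plausible fusion scheme)."""
    return alpha * (student_feats @ proj) + (1.0 - alpha) * teacher_feats

# Toy check: identical (identity-projected) features give zero loss.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 8))
P = np.eye(8)
assert distillation_loss(S, S, P) < 1e-6
```

In a semi-supervised setup of this kind, such a distillation term would typically be added to the supervised segmentation loss on the labeled subset, so the unlabeled images still contribute a training signal through the teacher.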
How to cite: Xu, G., Shen, H., Li, X., Xu, M., Lin, D., and Jiang, T.: HiD-FM: A High-Resolution Remote Sensing Foundation Model with Knowledge Distillation and Feature Fusion for Image Semantic Segmentation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11394, https://doi.org/10.5194/egusphere-egu26-11394, 2026.