EGU26-11672, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-11672
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 14:00–15:45 (CEST), Display time Tuesday, 05 May, 14:00–18:00
 
Hall X4, X4.55
Sentinel-2 Image Retrieval with Global, Cross-modal Embeddings
Yijie Zheng1,2, Weijie Wu1,2, Bingyue Wu3, Guoqing Li1, Mikolaj Czerkawski4, and Konstantin Klemmer5,6
  • 1Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China (zhengyijie23@mails.ucas.ac.cn)
  • 2School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China
  • 3Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
  • 4Asterisk Labs, London, UK
  • 5LGND AI, Inc., San Francisco, USA
  • 6University College London, London, UK

Recent advances in Earth embeddings have opened new frontiers for the geosciences, enabling efficient analysis of vast volumes of geospatial data. However, the practical use of these embeddings is often hindered by complex software environments and the need for specialized computational expertise. To help democratize access to Earth embeddings, we introduce EarthEmbeddingExplorer, an open-source, web-based application designed to make Earth embeddings more accessible, understandable, and interactive for the broader geoscience community.

EarthEmbeddingExplorer integrates multiple state-of-the-art foundation models, including SatCLIP, FarSLIP, and SigLIP, to support cross-modal retrieval of Sentinel-2 imagery via text, image, and geographic-location queries. Our implementation uses the MajorTOM Core-S2L2A dataset as the primary data source: we pre-computed approximately 250,000 embeddings per model from a uniform spatial sampling of the MajorTOM grid, yielding representative global coverage of about 1.2% of the Earth's land surface. To ensure accessibility, all models and datasets are hosted on open platforms, specifically ModelScope and Hugging Face. The application provides an intuitive interface for visualizing the geographical distribution of retrieved results, rendering top-match thumbnails, and exporting comprehensive metadata. Such transparent, low-cost access to large-scale embedding analysis is essential for identifying model-specific strengths and limitations. By enabling instant cross-model comparisons within specific spatiotemporal contexts, EarthEmbeddingExplorer lets users evaluate model performance for their own monitoring needs and domains of interest.
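At its core, retrieval with pre-computed embeddings reduces to a nearest-neighbor search in a shared embedding space: a text, image, or location query is encoded into the same space as the archived imagery, and the closest image embeddings are returned. The following minimal sketch illustrates this with random arrays standing in for real model outputs; the embedding dimension, array sizes, and variable names are illustrative and not taken from the EarthEmbeddingExplorer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a bank of pre-computed image embeddings (here 1,000 x 512,
# in place of the ~250,000 per model described above) and a query embedding
# produced by a text, image, or location encoder.
image_embeddings = rng.standard_normal((1000, 512)).astype(np.float32)
query = rng.standard_normal(512).astype(np.float32)

# L2-normalize so that the dot product equals cosine similarity.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Score every archived image against the query and keep the top 5 matches;
# in the application, these indices map back to Sentinel-2 thumbnails
# and their metadata.
scores = image_embeddings @ query
top_k = np.argsort(scores)[::-1][:5]
```

Because the embeddings are computed once and stored, only the query needs to pass through a model at search time, which is what keeps the interaction low-cost for the user.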

Ongoing development focuses on expanding EarthEmbeddingExplorer's capabilities by integrating additional embedding models, such as DINOv2, and increasing global spatial coverage. We are also implementing FAISS-based vector similarity search to enable near-instantaneous queries across tens of millions of global embeddings. Future iterations will prioritize a modular software architecture, standardized APIs, and detailed documentation to facilitate community-driven contributions of new embedding models and datasets. The web application is accessible at https://huggingface.co/spaces/ML4Sustain/EarthExplorer and at https://www.modelscope.cn/studios/VoyagerX/EarthExplorer.

How to cite: Zheng, Y., Wu, W., Wu, B., Li, G., Czerkawski, M., and Klemmer, K.: Sentinel-2 Image Retrieval with Global, Cross-modal Embeddings, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11672, https://doi.org/10.5194/egusphere-egu26-11672, 2026.