- ¹CloudFerro S.A., Warsaw, Poland
- ²Asterisk Labs, London, England
Foundation Models enable rich semantic representations of Earth Observation (EO) data through embeddings learned from large, heterogeneous, and often unlabeled datasets. One of their most impactful applications is semantic similarity search, which allows EO data discovery based on context and meaning rather than metadata alone.
This work presents global EO embedding datasets deployed within the Copernicus Data Space Ecosystem (CDSE), enabling large-scale semantic and similarity search across satellite imagery. The embeddings are generated using multimodal Foundation Models that map EO imagery and textual queries into a shared embedding space, so that natural-language queries can retrieve semantically related observations. This approach supports the discovery of complex geospatial patterns such as land cover types, human activities, or environmental phenomena without explicit labeling.
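To illustrate how retrieval in a shared embedding space can work, the following minimal sketch ranks image embeddings by cosine similarity to a text-query embedding. The 4-dimensional vectors are toy values and the helper names `l2_normalize` and `rank_by_cosine` are hypothetical; the actual models, dimensions, and similarity measure used for the CDSE datasets are not specified here.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def rank_by_cosine(query_vec, image_embs, top_k=3):
    """Return indices and scores of the top_k embeddings most similar to the query."""
    q = l2_normalize(query_vec)
    e = l2_normalize(image_embs)
    scores = e @ q                       # cosine similarity per image
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy stand-ins for embeddings produced by a multimodal model (assumed values).
image_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
text_query = np.array([1.0, 0.05, 0.0, 0.0])  # stand-in for an encoded text query

top_ids, top_scores = rank_by_cosine(text_query, image_embs, top_k=2)
```

Because both modalities live in the same space, the same ranking function serves image-to-image and text-to-image search; only the source of the query vector changes.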
To ensure global consistency and scalability, embedding generation and indexing follow the Major TOM standard, which provides a unified geospatial reference framework based on a global grid of points. Major TOM enables sampling across EO missions while avoiding destructive preprocessing, thus preserving raw, undistorted pixel values.
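As a rough illustration of what a global grid of points can look like, the sketch below places points in latitude rows and reduces the number of points per row towards the poles so that spacing stays approximately uniform. This is a simplified construction written for this example, not the Major TOM specification; the function name `grid_points`, the spherical Earth radius, and the 1000 km spacing are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation (assumption)


def grid_points(spacing_km):
    """Return (lat, lon) cell centres with roughly spacing_km between neighbours."""
    # Number of latitude rows from pole to pole.
    n_rows = max(1, round(math.pi * EARTH_RADIUS_KM / spacing_km))
    lat_step = 180.0 / n_rows
    points = []
    for r in range(n_rows):
        lat = -90.0 + (r + 0.5) * lat_step
        # The circle of latitude shrinks with cos(lat), so fewer columns fit.
        circumference = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(lat))
        n_cols = max(1, round(circumference / spacing_km))
        lon_step = 360.0 / n_cols
        for c in range(n_cols):
            points.append((lat, -180.0 + (c + 0.5) * lon_step))
    return points


points = grid_points(1000.0)  # coarse spacing chosen to keep the example small
```

A fixed, mission-independent set of points like this lets samples from different sensors be referenced to the same locations.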
Efficient similarity search over tens of millions of high-dimensional embeddings is achieved through FAISS vector indexing techniques, which return query results with very low latency even for global-scale datasets. Foundation Model embeddings, combined with standardized geospatial indexing and high-performance vector search, form a practical and scalable foundation for next-generation EO data discovery.
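A minimal sketch of vector search with FAISS is shown below, assuming an exact inner-product index (`faiss.IndexFlatIP`) over L2-normalized vectors, which yields cosine similarity. The index type actually deployed is not stated above; at tens of millions of vectors an approximate index would typically be considered instead. The random 64-dimensional data and the helper `build_and_search` are hypothetical, and the code falls back to an equivalent exact NumPy search when `faiss` is not installed.

```python
import numpy as np

try:
    import faiss  # optional dependency; exact NumPy fallback is used otherwise
except ImportError:
    faiss = None


def build_and_search(embeddings, queries, k=5):
    """Return (scores, ids) of the k most cosine-similar embeddings per query."""
    xb = np.ascontiguousarray(embeddings, dtype=np.float32)
    xq = np.ascontiguousarray(queries, dtype=np.float32)
    # Normalize so that inner product equals cosine similarity.
    xb = xb / np.linalg.norm(xb, axis=1, keepdims=True)
    xq = xq / np.linalg.norm(xq, axis=1, keepdims=True)
    if faiss is not None:
        index = faiss.IndexFlatIP(xb.shape[1])  # exact inner-product index
        index.add(xb)
        scores, ids = index.search(xq, k)
        return scores, ids
    # Fallback: brute-force similarity matrix, sorted per query.
    sims = xq @ xb.T
    ids = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, ids, axis=1)
    return scores, ids


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 64)).astype(np.float32)  # toy database
# Queries are slightly perturbed copies of the first three database vectors.
queries = embeddings[:3] + 0.01 * rng.standard_normal((3, 64)).astype(np.float32)

scores, ids = build_and_search(embeddings, queries, k=5)
```

Each row of `ids` lists database positions ordered from most to least similar, which in an EO setting could be mapped back to grid cells or product identifiers.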
How to cite: Augustyn, B., Kluczek, M., Bojanowski, J., and Czerkawski, M.: Similarity Search of Earth Observation Data Using Foundation Model Embeddings, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14376, https://doi.org/10.5194/egusphere-egu26-14376, 2026.