- 1 CloudFerro S.A., Warsaw, Poland
- 2 Asterisk Labs
Embeddings provide a compact representation of data in a lower-dimensional vector space, enabling faster and more efficient analysis compared to direct processing of high-dimensional data. Satellite imagery is an example of such data, as it is characterized by large volume and high dimensionality. With the rapid development of AI foundation models, embedding-based approaches can increasingly replace classical remote sensing techniques in tasks such as classification and regression, while maintaining or even improving the quality of results.
This work leverages the Global Embeddings Dataset from the Copernicus Data Space Ecosystem, which contains embeddings generated by multiple models, including SSL4EO DINOv2, SigLIP, DeCUR, and MMEarth. These models differ in sensing modality, input resolution, and embedding dimensionality, enabling diverse analyses based on heterogeneous data sources. Data standardization using the MajorTOM format facilitates automated processing and seamless integration of embeddings derived from different models.
Following the MajorTOM standard, more than 8 million images (9.368 trillion pixels, approximately 62 terabytes of raw satellite data) were processed to generate over 170 million embeddings. This scale demonstrates the feasibility of embedding-based approaches for efficient management and analysis of large-scale Earth observation datasets.
Embedding-based representations enable effective detection of environmental changes, which can be categorized as either abrupt events, such as wildfires, deforestation, or floods, or long-term processes, including river desiccation and gradual ecosystem degradation. Such change detection capabilities are applicable across multiple domains, including urban development, defense, and environmental monitoring. By operating on compressed representations, embeddings allow for efficient similarity and change analysis over temporal sequences, significantly accelerating the processing of large satellite data archives.
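The similarity and change analysis over temporal sequences can be illustrated with a minimal sketch. It assumes that each location has a time-ordered stack of embedding vectors from a single model, and it uses cosine distance as the change score. The function names, the toy vectors, and the threshold of 0.3 are hypothetical choices for illustration and are not taken from this work. Distances between consecutive acquisitions highlight abrupt events, while distances to the first acquisition track gradual drift.

```python
import numpy as np


def _unit_rows(embeddings):
    """Scale each embedding (one per row) to unit length."""
    e = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(e, axis=1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    return e / np.clip(norms, 1e-12, None)


def consecutive_cosine_distance(embeddings):
    """Cosine distance between each embedding and the previous one in time."""
    unit = _unit_rows(embeddings)
    similarity = np.sum(unit[:-1] * unit[1:], axis=1)
    return 1.0 - similarity


def drift_from_baseline(embeddings):
    """Cosine distance of every embedding to the first (baseline) one."""
    unit = _unit_rows(embeddings)
    return 1.0 - unit @ unit[0]


def flag_abrupt_changes(embeddings, threshold=0.3):
    """Indices of time steps whose distance to the previous step exceeds
    the threshold. The threshold is an illustrative assumption and would
    need tuning per model and per application."""
    distances = consecutive_cosine_distance(embeddings)
    return np.flatnonzero(distances > threshold) + 1


# Toy time series for one location: stable, then an abrupt change at step 2.
series = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])
changed_steps = flag_abrupt_changes(series)
baseline_drift = drift_from_baseline(series)
```

Because only short vectors are compared, the same computation can be batched over many locations with a single matrix operation instead of re-reading the underlying imagery.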
How to cite: Ostrowski, B., Bojanowski, J. S., Kluczek, M., and Czerkawski, M.: AI Foundation Models for Near Real-Time Environmental Monitoring from Satellite Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9658, https://doi.org/10.5194/egusphere-egu26-9658, 2026.