- ¹Asterisk Labs, London, United Kingdom
- ²CloudFerro S.A., Warsaw, Poland
The current landscape of geospatial AI models is expanding rapidly, with new open-source models released nearly every month. As the number of these general-purpose models grows, many claiming state-of-the-art performance, it becomes increasingly difficult to judge their suitability for specific tasks or spatiotemporal contexts. A key step towards the democratisation of model benchmarking can be made by releasing large-scale datasets of pre-computed embeddings, such as those shared within the Major TOM project.
Yet, even with easy access to dense, global embeddings from a given model, it is not clear how to evaluate them at a global scale, given the scarcity and spatiotemporal biases of high-quality labels. This work explores a set of evaluation tests that can be conducted on a global scale, moving beyond canonical use cases to understand the inherent biases of individual models.
First, a set of proxy tasks with worldwide coverage is introduced. In this benchmark prototype, several sensitivity variables are tested: time, location (estimating spatiotemporal context), and VIIRS nightlights data (a proxy for human activity). While these are not traditional downstream tasks, all three variables have the advantage of uniform quality across the entire dataset. This enables a standardised and fair evaluation of representations extracted from Sentinel-2 and Sentinel-1 data across a range of pre-trained encoders within the Major TOM Embedding suite.
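To make the proxy-task idea concrete, below is a minimal sketch of a linear probe that predicts a globally available target (here, VIIRS nightlight radiance) from frozen embeddings. The random arrays, embedding dimension, and the choice of a ridge regressor are illustrative assumptions; the abstract does not specify the benchmark's actual probe or protocol.

```python
# Minimal proxy-task probe sketch: a ridge regressor predicts
# log-nightlight radiance from frozen embeddings. All data below
# are random stand-ins for real Major TOM embeddings and labels.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 768))     # per-patch embeddings (stand-in shape)
y = rng.gamma(2.0, 1.0, size=10_000)   # VIIRS radiance proxy (stand-in values)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, np.log1p(y_tr))  # log1p tames the heavy tail
print("held-out R^2:", r2_score(np.log1p(y_te), probe.predict(X_te)))
```

A frozen-encoder linear probe is one common way to compare representations on an equal footing, since differences in performance then stem from the embeddings themselves rather than from task-specific fine-tuning.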
Second, a suite of techniques for comparing the internal representation geometries of latent-space vectors from multiple models is introduced, evaluating the similarities and differences between individual models. This approach requires no reference labels, enabling a deeper understanding of the geospatial semantic relationships encoded by different architectures.
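The abstract does not name a specific comparison technique; one widely used, label-free option is linear centred kernel alignment (CKA), sketched below under the assumption that paired embeddings of the same patches are available from two models (the array shapes are hypothetical).

```python
# Label-free geometry comparison sketch: linear CKA between the
# embeddings that two models produce for the same set of patches.
import numpy as np

def linear_cka(a: np.ndarray, b: np.ndarray) -> float:
    """Linear CKA between (n_samples, dim_a) and (n_samples, dim_b) matrices."""
    a = a - a.mean(axis=0)                       # centre each feature
    b = b - b.mean(axis=0)
    cross = np.linalg.norm(b.T @ a, "fro") ** 2  # cross-covariance energy
    return cross / (np.linalg.norm(a.T @ a, "fro") *
                    np.linalg.norm(b.T @ b, "fro"))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(5_000, 768))    # stand-in embeddings, model A
emb_b = rng.normal(size=(5_000, 1024))   # stand-in embeddings, model B
print("CKA similarity:", linear_cka(emb_a, emb_b))  # 1.0 = identical geometry
```

CKA is invariant to orthogonal transformations and isotropic scaling of the embeddings, which makes it suitable for comparing models with different latent dimensionalities.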
Ultimately, this work advances the large-scale evaluation of deep learning models for Earth observation data, utilising these model comparisons to develop a set of recommendations for future benchmarking efforts within the Earth Science community.
How to cite: Czerkawski, M., Kluczek, M., and Bojanowski, J. S.: Time, Space, and Nightlights: Global Evaluation of Major TOM Earth Embeddings, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13425, https://doi.org/10.5194/egusphere-egu26-13425, 2026.