- 1School of Engineering and Applied Sciences, Harvard University, Allston, MA, USA
- 2Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
Recent research in geospatial machine learning has demonstrated that models pretrained with self-supervised learning on Earth observation data can perform well on downstream tasks with limited training data. However, most existing geospatial benchmark datasets have few data modalities and poor global representation, limiting the ability to evaluate multimodal pretrained models at global scales. To fill this gap, we introduce MMEarth-Bench, a collection of five new multimodal downstream tasks with 12 input modalities, globally distributed data, and both in- and out-of-distribution test splits. We benchmark a diverse set of pretrained models on MMEarth-Bench and find that multimodal models generally perform best. While pretraining tends to improve model robustness in limited data settings, geographic generalization abilities remain poor, and using multimodal inputs at test time can sometimes lead to geographic overfitting. To facilitate model adaptation to new downstream tasks and geographic domains, we propose a model-agnostic method for test-time training with multimodal reconstruction (TTT-MMR) that uses all the modalities available at test time, regardless of whether the pretrained model accepts them as input. We show that TTT-MMR improves model performance on both random and geographic test splits, and that geographic batching (TTT-MMR-Geo) leads to a good trade-off between regularization and specialization during TTT. Our dataset, code, and visualization tool are linked from the project page at https://lgordon99.github.io/mmearth-bench.
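The core idea of test-time training with multimodal reconstruction can be illustrated with a toy sketch: at test time, the encoder is updated by gradient descent on a self-supervised loss that reconstructs auxiliary modalities from the model input. The linear encoder/decoder, dimensions, learning rate, and data below are illustrative assumptions, not the actual MMEarth-Bench implementation; the real method is model-agnostic and applies to pretrained deep networks.

```python
# Toy sketch of test-time training with multimodal reconstruction:
# adapt a shared encoder on a test batch by reconstructing an extra
# modality that is available at test time but not used as model input.
# All shapes, weights, and hyperparameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_lat, d_aux = 8, 4, 3  # input dim, latent dim, auxiliary-modality dim
W_enc = rng.normal(scale=0.1, size=(d_in, d_lat))   # shared encoder weights
W_dec = rng.normal(scale=0.1, size=(d_lat, d_aux))  # reconstruction head for the extra modality

def recon_loss(x, y_aux, W_enc, W_dec):
    """Mean-squared error of reconstructing the auxiliary modality."""
    z = x @ W_enc
    return float(np.mean((z @ W_dec - y_aux) ** 2))

# A test batch: model inputs plus an auxiliary modality observed at test time.
# (With geographic batching, each batch would hold geographically nearby samples.)
x_test = rng.normal(size=(16, d_in))
y_aux = x_test[:, :d_aux] + 0.01 * rng.normal(size=(16, d_aux))

lr, n, d = 0.05, x_test.shape[0], d_aux
loss_before = recon_loss(x_test, y_aux, W_enc, W_dec)
for _ in range(100):
    # Manual gradients of mean((z @ W_dec - y_aux)**2) w.r.t. both weight matrices.
    z = x_test @ W_enc
    err = 2.0 * (z @ W_dec - y_aux) / (n * d)
    grad_dec = z.T @ err
    grad_enc = x_test.T @ (err @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
loss_after = recon_loss(x_test, y_aux, W_enc, W_dec)
print(loss_after < loss_before)  # → True: reconstruction improves after TTT steps
```

After the adaptation steps, the updated encoder would be used for the downstream prediction on that batch; the trade-off mentioned in the abstract is between adapting too little (no specialization) and too much (forgetting the pretrained representation).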
How to cite: Gordon, L., Belongie, S., Igel, C., and Lang, N.: MMEarth-Bench: Global Environmental Tasks for Multimodal Geospatial Models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12736, https://doi.org/10.5194/egusphere-egu26-12736, 2026.