ESSI1.9 | Advancing Foundation Models for Earth and Climate: Benchmarking, Best Practices and AI-Enabled Scientific Understanding
Convener: Takuya Kurihana (ECS) | Co-conveners: Nikolaos Dionelis (ECS), Anna Jungbluth, Conrad Albrecht, Gabriele Cavallaro, Valentine Anantharaj
Orals | Mon, 28 Apr, 08:30–10:15 (CEST) | Room -2.92
Posters on site | Attendance Mon, 28 Apr, 10:45–12:30 (CEST) | Display Mon, 28 Apr, 08:30–12:30 | Hall X4
Posters virtual | Attendance Tue, 29 Apr, 14:00–15:45 (CEST) | Display Tue, 29 Apr, 14:00–18:00 | vPoster spot 4
Geospatial Foundation Models (GeoFMs) have shown great promise in a wide range of applications for Earth Observation (EO) and Earth System Modelling (ESM), as well as for Weather and Climate. With the increasing number of models being published, model inter-comparison is key to identifying the best GeoFM for deployment. This session aims to highlight efforts on model development, benchmarking, fine-tuning, and best practices for utilizing GeoFMs in real-world applications. We invite submissions focused on creating GeoFMs that leverage multi-modal, multi-temporal, and multi-resolution datasets towards sensor independence. Diverse FMs for EO, ESM, Weather, and Climate can revolutionize data analysis by handling text, imagery, and time series, enabling insights into natural hazards and climate resilience. Our session will cover advances in data curation, model architecture, scaling, benchmarking, pretraining, fine-tuning, and MLOps for GeoFMs, including use cases and deployment strategies.

The topics of our session revolving around GeoFMs are:
1. Benchmarks & Evaluation: Establish standardized fair evaluation metrics and benchmarks to assess the performance and capabilities of GeoFMs in multi-modal data analysis, ensuring reliability and efficiency.
2. Pre-Training Strategies & Best Practices: Discuss data sampling strategies, proxy tasks, and scalable model training for efficient pre-training of GeoFMs, as well as guidelines for using existing pre-trained GeoFMs across a diverse set of applications, with a focus on how to decide which models are best suited to particular use cases.
3. Sensor Independence: GeoFMs can process data from various sensors, enabling holistic analysis of the Earth's dynamics.
4. Multi-Modal/Temporal: GeoFMs offer novel approaches to multi-modal data analysis and spatio-temporal change detection.
5. Scientific Insights: Highlighting the scientific insights enabled through the creation of GeoFMs, particularly in relation to geo-physical principles and causal relations.
6. Community Involvement & Impact: How to build an open-science community around GeoFMs that is easily accessible to all while keeping an eye on potential societal, environmental, and economic impacts when deploying GeoFMs.

We aim to foster discussions on current applications, challenges, and opportunities of GeoFMs, seeking contributions from AI and domain researchers, climate modelers, industry experts, and stakeholders in AI, HPC, and Big Data.

Orals: Mon, 28 Apr | Room -2.92

The oral presentations are given in a hybrid format supported by a Zoom meeting featuring on-site and virtual presentations. The button to access the Zoom meeting appears just before the time block starts.
08:30–08:35
Advancing the Development of Foundation Models
08:35–08:45 | EGU25-3328 | On-site presentation
Rahul Ramachandran, Tsengdar Lee, and Kevin Murphy

NASA has collected—and continues to amass—petabytes of scientific data, ranging from the vastness of galaxies to the intricacies of cellular biology. These ever-expanding datasets provide unparalleled opportunities for discovery but pose significant challenges for managing data and extracting meaningful insights. Artificial intelligence (AI) and machine learning (ML) are emerging as transformative tools for addressing these issues. However, state-of-the-art deep neural networks often require large volumes of labeled training data, which are costly and time-intensive to generate. AI foundation models (FMs) offer a promising alternative by leveraging self-supervised learning to identify patterns within data. These FMs enable diverse applications with reduced dependence on compute resources and labeled datasets.

NASA’s Office of the Chief Science Data Officer has formulated a "5+1" strategy to develop AI foundation models for science. This strategy emphasizes creating foundation models (FMs) pre-trained using flagship datasets from each of NASA’s science divisions while building a science-specific language model to support cross-divisional applications. Key achievements include the release of INDUS, an encoder language model trained on scientific publications and technical documents; two versions of the Prithvi Geospatial model for environmental monitoring applications; and the Prithvi Weather and Climate model, designed to reconstruct atmospheric states from incomplete data and forecast future states. Additionally, a heliophysics foundation model for space weather applications is under development and is scheduled for release by mid-2025.

To encourage NASA’s research and application communities to use these FMs in their work and to support NASA’s new Earth Science to Action Strategy, the Earth Science Division has developed additional research and application solicitations to further enhance these FMs and to build applications and tools leveraging these FMs. These announcements are available in NASA’s Research Opportunities in Space and Earth Science (ROSES 2025).

NASA has forged strategic partnerships with private sector organizations, academia, and other entities grounded in open science principles to build these models. Each model is designed around a specific set of scientific use cases to ensure relevance and practical impact. All models and associated use case notebooks are shared openly. 

This presentation will provide an overview of the foundation models released to date, the workflows used in their design and development, and the roadmap for future models. It will also highlight upcoming workshops aimed at equipping the broader scientific community to effectively integrate these models into their research.

How to cite: Ramachandran, R., Lee, T., and Murphy, K.: AI Foundation Models for Science: Current Initiatives, Workflow, and Future Roadmap, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-3328, https://doi.org/10.5194/egusphere-egu25-3328, 2025.

08:45–08:55 | EGU25-18029 | ECS | On-site presentation
Riccardo Musto, Giancarlo Paoletti, Nikolaos Dionelis, Simone Sarti, Fabio Di Matteo, Jente Bosmans, Peter Naylor, Giacomo Donato Cascarano, Casper Fibaek, and Nicolas Longépé

Foundation Models are emerging as a transformative paradigm in Earth observation, offering powerful solutions to the challenges of processing and understanding satellite imagery at scale. The scarcity of large-scale labeled datasets and the technical challenges of annotating the vast volumes of data collected by satellites pose significant barriers to achieving high accuracy in many important downstream tasks. Furthermore, the dynamic nature of Earth adds complexity, as labels tied to a specific geographical region at a particular moment in time are insufficient to capture the evolving characteristics of the environment. Self-supervised learning techniques have emerged as a promising solution, enabling models to learn rich representations from unlabeled data while requiring minimal supervised fine-tuning for specific applications.
In this work, we present GeoDINO, a novel foundation model that adapts the DINO self-supervised learning architecture to handle multi-spectral Sentinel-2 data. While the original DINO framework has shown remarkable success in computer vision tasks through its teacher-student architecture and self-distillation approach, we extend it significantly for Earth observation applications. Our key innovation lies in the addition of multiple supervised auxiliary tasks: after the encoder generates representations, we attach specialized MLPs designed to predict various geospatial attributes including climate zones, permanent water bodies and geographical coordinates. Both the teacher and student networks are trained to predict these auxiliary labels, with the teacher network being updated through Exponential Moving Average (EMA) of the student's weights. This modification enables our model to learn not only from the self-supervised distillation process but also from the rich spatial and temporal information inherent in satellite imagery.
We are currently training GeoDINO on MajorTOM, a comprehensive Sentinel-2 dataset comprising 23TB of Core-S2L2A data, exploiting the Leonardo Davinci-1 Supercomputer. Furthermore, to validate our approach, we are also training the model on FastTOM and TinyTOM, two subsets of MajorTOM. Finally, the model will be evaluated within the PhilEO Bench framework to assess its performance on different tasks, including land cover classification, change detection, and building density estimation. Looking ahead, we plan to transition to the DINOv2 architecture to further enhance our model's capabilities. Through this research, we aim to demonstrate how self-supervised learning techniques, when properly adapted for Earth observation data, can address the fundamental challenges of data scarcity and temporal dynamics in remote sensing applications. The development of GeoDINO represents a step toward more efficient and adaptable Earth observation systems that can leverage the vast amounts of available satellite data while minimizing the need for extensive labeled datasets.
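
A minimal sketch of the auxiliary-head idea described above, not the authors' implementation: a shared encoder feeds both a DINO-style projection head and extra MLPs that predict geospatial attributes, while the teacher tracks the student via an exponential moving average. The placeholder encoder, head sizes, and momentum value are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

class GeoDINOStudent(nn.Module):
    def __init__(self, encoder, embed_dim=768, n_climate_zones=30):
        super().__init__()
        self.encoder = encoder                                      # e.g. a ViT backbone
        self.dino_head = nn.Sequential(nn.Linear(embed_dim, 2048),
                                       nn.GELU(), nn.Linear(2048, 65536))
        self.climate_head = nn.Linear(embed_dim, n_climate_zones)   # climate zone classification
        self.water_head = nn.Linear(embed_dim, 1)                   # permanent water fraction
        self.coord_head = nn.Linear(embed_dim, 2)                   # latitude/longitude regression

    def forward(self, x):
        z = self.encoder(x)
        return self.dino_head(z), self.climate_head(z), self.water_head(z), self.coord_head(z)

student = GeoDINOStudent(encoder=nn.Identity())    # placeholder encoder for the sketch
teacher = copy.deepcopy(student)                   # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad = False

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    # Teacher weights follow an exponential moving average of the student's weights.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```
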
References:  
[1] M. Caron, et al., “Emerging Properties in Self-Supervised Vision Transformers”, arXiv:2104.14294, 2021
[2] C. Fibaek, et al., “PhilEO Bench: Evaluating Geo-Spatial Foundation Models,” in Proceedings IGARSS, 2024. 
[3] N. Dionelis, et al., “Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI,” arXiv:2406.18295, 2024. 
[4] N. Dionelis and N. Longepe, “Fine-Tuning Foundation Models with Confidence Assessment for Enhanced Semantic Segmentation,” 2024. 
[5] A. Francis and M. Czerkawski, “MajorTOM: Expandable Datasets for Earth Observation,” IGARSS, 2024. 
[6] B. Le Saux, et al., “The PhilEO Geospatial Foundation Model Suite,” EGU, 2024.

How to cite: Musto, R., Paoletti, G., Dionelis, N., Sarti, S., Di Matteo, F., Bosmans, J., Naylor, P., Donato Cascarano, G., Fibaek, C., and Longépé, N.: GeoDINO: A Vision Foundation Model for Earth Observation Leveraging DINO Architecture and Sentinel-2 Multi-Spectral Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18029, https://doi.org/10.5194/egusphere-egu25-18029, 2025.

08:55–09:05 | EGU25-19378 | On-site presentation
Guido Grosse, Pedram Ghamisi, Gabriele Cavallaro, Martin Herold, Andreas Huth, and Irena Hajnsek and the 3D-ABC Team

Understanding the global carbon budget with its carbon sources and sinks is scientifically important and economically relevant. In particular, vegetation and soils are major and highly dynamic carbon pools in the Earth System and a substantial part of the terrestrial carbon budget is influenced by land use changes, vegetation dynamics, and soil processes.

Recent advances in Foundation Models (FMs) are transforming AI, enabling remarkable generalization and zero-shot learning capabilities. Within the Helmholtz Foundation Model Initiative, we are developing the 3D-ABC FM, a tool targeting the accurate mapping of above- and below-ground carbon stocks in vegetation and soils at high spatial resolution. 3D-ABC aims to provide a seamless understanding of terrestrial carbon distribution by integrating multimodal remote sensing, climate, and elevation datasets, and addressing complex challenges such as multi-dimensionality and multi-resolution in FMs. Our unique 3D-ABC partnership brings together key capacities from the domains of remote sensing, carbon monitoring, AI, and high-performance computing to take on such FM development.

The 3D-ABC FM integrates large-scale remote sensing data, including multispectral satellite imagery from the Harmonized Landsat-Sentinel-2 (HLS) dataset, TanDEM-X InSAR coherence data, and 3D lidar data from space (GEDI, ICESat 1&2), aircraft, and ground-based platforms. We also aim to incorporate ERA-5 Land climate reanalysis information, GLO-30 digital elevation data, as well as local lidar and field data on vegetation, soils, and carbon flux parameters. High-resolution forest models will be used to benchmark carbon fluxes.

To accommodate the diverse data modalities assembled for 3D-ABC and to address eight selected downstream tasks, the AI model employs an adaptive architecture, integrating a multi-modal input processor, an FM encoder, an adaptive fusion neck, and task-specific prediction heads. The multi-modal input processor handles data with varying spectral dimensions, automatically mapping inputs to a unified feature space. The FM encoder extracts generalized deep features from the normalized inputs, which are then integrated into universal feature representations through the adaptive fusion neck. This fusion enhances interactions across modalities. Finally, the universal features are decoded into various outputs tailored to the specific needs of downstream tasks. In the first FM training phase, a pretraining strategy leverages a masked autoencoder to train the multi-modal input processor, the encoder, and the fusion neck in an unsupervised manner, enabling the model to develop robust representation capabilities. In the second phase, by leveraging the principles of transfer learning, the pretrained model is fine-tuned using labeled data from various downstream tasks.
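
A schematic sketch of the adaptive architecture outlined above, with assumed module names, channel counts, and a trivial fusion rule rather than the actual 3D-ABC implementation: per-modality input processors map inputs into a unified feature space, a shared encoder extracts deep features, a fusion neck merges modalities, and task-specific heads decode the outputs.

```python
import torch
import torch.nn as nn

class MultiModalProcessor(nn.Module):
    """Maps each modality (with its own channel count) into a unified feature space."""
    def __init__(self, channels_per_modality, dim=256):
        super().__init__()
        self.proj = nn.ModuleDict({name: nn.Conv2d(c, dim, kernel_size=1)
                                   for name, c in channels_per_modality.items()})

    def forward(self, inputs):                       # dict: modality name -> tensor
        return {name: self.proj[name](x) for name, x in inputs.items()}

class AdaptiveFusionNeck(nn.Module):
    """Merges per-modality features into one universal representation (simple average here)."""
    def forward(self, feats):
        return torch.stack(list(feats.values()), dim=0).mean(dim=0)

# Assumed channel counts per modality, purely for illustration.
processor = MultiModalProcessor({"hls": 6, "tandemx": 2, "gedi": 1})
encoder = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.GELU())   # stand-in FM encoder
neck = AdaptiveFusionNeck()
heads = nn.ModuleDict({"agb_carbon": nn.Conv2d(256, 1, 1),               # task-specific heads
                       "soil_carbon": nn.Conv2d(256, 1, 1)})

inputs = {"hls": torch.randn(1, 6, 64, 64),
          "tandemx": torch.randn(1, 2, 64, 64),
          "gedi": torch.randn(1, 1, 64, 64)}
fused = neck({k: encoder(v) for k, v in processor(inputs).items()})
outputs = {task: head(fused) for task, head in heads.items()}
```
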

3D-ABC targets the use of the JUWELS Booster and JUPITER high-performance computing (HPC) systems located at the Jülich Supercomputing Centre (JSC). The JUWELS Booster comprises 936 compute nodes, each equipped with four NVIDIA A100 GPUs. JUPITER, the first European exascale supercomputer, is currently being installed at JSC. Its Booster module will consist of ~6,000 compute nodes, each featuring four NVIDIA GH200 GPUs. To make efficient use of JUPITER, 3D-ABC is leveraging the JUPITER Research and Early Access Program, which provides early access for code optimization and preparation to ensure FM applications are ready for deployment when the system becomes operational in 2025.

How to cite: Grosse, G., Ghamisi, P., Cavallaro, G., Herold, M., Huth, A., and Hajnsek, I. and the 3D-ABC Team: Towards a Foundation Model for Global Terrestrial 3D Above and Below Ground Carbon Stock Mapping (3D-ABC), EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19378, https://doi.org/10.5194/egusphere-egu25-19378, 2025.

09:05–09:15 | EGU25-19892 | On-site presentation
Sandro Fiore, Gabriele Padovani, Takuya Kurihana, Massimiliano Fronza, and Valentine Anantharaj

The growing interest in deep learning and large language models (LLMs) in recent years highlights their remarkable adaptability and ability to generalize, drawing researchers from a wide array of disciplines. Despite their promise, these advances have in many instances exposed a lack of transparency and rigor in development processes. While the rapid pace of research undoubtedly offers numerous benefits, it has also led to a growing number of works conducted superficially and without rigor. Code that is not accompanied by documentation, and results that are not reproducible, inevitably lead to confusion among researchers and erode trust in the proposed work. The complexity of data manipulation, characterized by ad hoc transformations, exacerbates these issues by hindering the traceability of processes, and hyperparameter tuning introduces additional difficulties, requiring repeated experimentation that consumes excessive computational resources, especially for large models. 

To address these challenges, we introduce yProv4ML, a Python library that provides an accessible option for tracking dataset and model statistics, hyperparameters, and energy metrics. It allows for the comparison of sets of experiments and introduces a suite of directives to easily track the flow of information through provenance metadata. 

yProv4ML is a component of the yProv framework, a research project on multi-level provenance management which provides scientists with a rich software ecosystem, including a web service to manage, track, and analyze provenance documents. Leveraging the PROV-JSON standard for provenance artifact recording, yProv4ML ensures comprehensive documentation and reproducibility while offering an integration process similar to that of well-known libraries such as MLflow.

During the last year, yProv4ML was integrated into a variety of use cases in different domains (i.e., Climate Science, High Energy Physics, and Earth Observation) in the context of the interTwin (https://www.intertwin.eu/) and ClimateEurope2 (https://climateurope2.eu/) EU projects, as well as the ICSC Italian National Project (https://www.supercomputing-icsc.it/en/icsc-home/). The collection of provenance data in these use cases not only facilitated the reproducibility of experiments, but also helped diagnose performance bottlenecks and ensure the reliability and integrity of results, all of which are critical to advancing the field of large-scale ML in a trustworthy manner.

How to cite: Fiore, S., Padovani, G., Kurihana, T., Fronza, M., and Anantharaj, V.: Enabling Seamless Provenance Collection in Large-Scale Machine Learning Tasks, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19892, https://doi.org/10.5194/egusphere-egu25-19892, 2025.

09:15–09:20
Benchmarks and Datasets for Evaluating Science Foundation Models
09:20–09:40 | EGU25-3302 | solicited | On-site presentation
Hamed Alemohammad, Sam Khallaghi, Denys Godwin, Rufai Balogun, Sujit Roy, and Rahul Ramachandran

There has been significant growth in the development and utilization of foundation models for geospatial applications. These models are trained on large-scale unlabeled data and commonly evaluated on downstream tasks using labeled datasets. While this approach provides a platform to assess the performance of a model on specific downstream tasks, there has been limited effort to quantify the characteristics of the foundation model itself after pre-training. 

Explainable AI (XAI) approaches aim to increase the accuracy and transparency of AI models and to make their results interpretable. In the case of geospatial foundation models, it is essential to assess if the model learns the spectral, spatial and temporal properties of geospatial data, and how this learning impacts the accuracy of model predictions. 

To this end, we introduce a new global XAI benchmark for geospatial foundation models using multispectral remote sensing imagery. This benchmark contains separate tasks that allow the user to test a foundation model’s properties in the embedding space and demonstrate whether the model has learned spectral, spatial, and temporal features. The spectral task consists of a set of chips with homogeneous spatial patterns from all major land cover classes. The spatial task uses the same data as the spectral task, but the regular spatial patterns are replaced with heterogeneous features representative of their true distribution. Finally, the temporal task includes a set of chips with pre- and post-event time series imagery for disturbances such as wildfire and flood. 

In this presentation, we will demonstrate the results of using this benchmark to evaluate the properties of multiple geospatial foundation models.

How to cite: Alemohammad, H., Khallaghi, S., Godwin, D., Balogun, R., Roy, S., and Ramachandran, R.: An Explainable AI (XAI) Benchmark for Geospatial Foundation Models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-3302, https://doi.org/10.5194/egusphere-egu25-3302, 2025.

09:40–09:50 | EGU25-9632 | On-site presentation
Marcin Kluczek, Mikolaj Czerkawski, and Jędrzej S. Bojanowski

The rapid growth of Earth Observation (EO) data from the Copernicus programme presents new opportunities for applying artificial intelligence (AI) and machine learning (ML) techniques. This work introduces a global embedding framework designed to improve the analysis of large EO datasets from Sentinel-1 and Sentinel-2 imagery. Following the Major TOM standard, we process over 8 million images, encompassing 9.368 trillion pixels of raw data, to generate more than 170 million embeddings from 62 terabytes of satellite data.

To enable this, a set of commonly used vision models from both the general and remote sensing domains (SigLIP, DINOv2, SSL4EO, DeCUR, and MMEarth) is employed to derive efficient embedding representations of the input data. These embeddings support various applications, including text-to-image and image-to-image retrieval, as well as zero-shot classification, allowing for more effective integration of EO data into AI pipelines and providing valuable insights into global phenomena.
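
An illustrative sketch of image-to-image retrieval on such precomputed embeddings, assuming a hypothetical (N, D) embedding matrix stored on disk rather than the authors' pipeline: vectors are L2-normalised and ranked by cosine similarity.

```python
import numpy as np

archive = np.load("embeddings.npy")            # hypothetical (N, D) embedding matrix
archive = archive / np.linalg.norm(archive, axis=1, keepdims=True)

def retrieve(query_vec, top_k=10):
    """Return indices of the top_k archive items most similar to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = archive @ q                       # cosine similarity against all N items
    return np.argsort(-scores)[:top_k]

best = retrieve(archive[0])                    # e.g. scenes most similar to item 0
```
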

Built on the CloudFerro cloud platform, the current approach efficiently processes large-scale data, with experiments demonstrating its usefulness in Earth Observation analysis. The results highlight the system’s reliability across different applications, emphasizing its potential to support data-driven decision-making on a global scale. This study also discusses key strategies for scalable cloud computing, GPU optimization, and multithreaded CPU processing to handle large volumes of EO data efficiently. 

How to cite: Kluczek, M., Czerkawski, M., and Bojanowski, J. S.: Developing Global Embeddings from Sentinel-1 and Sentinel-2 Data to Enhance Earth Observation Analysis, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9632, https://doi.org/10.5194/egusphere-egu25-9632, 2025.

09:50–10:00 | EGU25-8460 | ECS | On-site presentation
Elena Plekhanova, Damien Robert, Johannes Dollinger, Philipp Brun, Jan Dirk Wegner, and Niklaus E. Zimmermann

With the biodiversity crisis and land use intensification, macroecological questions related to biodiversity assessment and conservation are becoming increasingly pressing. Such questions require global datasets such as satellite imagery. Traditional methods using satellite data rely heavily on supervised learning and annotated datasets, which are limited and difficult to generalize across geographical scales. In recent years, self-supervised learning (SSL) has opened the doors to learning expressive representations of massive datasets without annotations,  thus revolutionizing the analysis of remote sensing imagery. However, currently available datasets for pre-training such models have a skewed geographical distribution, focusing on cities and agricultural areas while failing to adequately represent regions of high ecological interest, such as rainforests or polar latitudes.

We propose a new Sentinel-2A (10 m resolution) multiband dataset, globally distributed on a regular grid across the landmass (250k locations). At each location, the dataset captures four different seasons determined from the local EVI curve and includes the NDVI, which is widely used in ecological applications. Our temporal sampling is specifically designed to align with plant phenology rather than ad-hoc calendar dates. We use these data to pre-train Momentum Contrast and Seasonal Contrast SSL models, which show comparable performance on commonly used benchmarks and improved performance on macroecological downstream tasks such as species distribution modelling. We anticipate that the dataset and model will be valuable for macroecological applications, such as deep species distribution modeling or large-scale biodiversity assessments.

How to cite: Plekhanova, E., Robert, D., Dollinger, J., Brun, P., Wegner, J. D., and Zimmermann, N. E.: SeCo-Eco: Global multiband seasonal pre-training dataset and self-supervised model for ecological applications, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8460, https://doi.org/10.5194/egusphere-egu25-8460, 2025.

10:00–10:10 | EGU25-7220 | ECS | On-site presentation
Konlavach Mengsuwan and Masahiro Ryo

Foundation models have shown substantial potential in enhancing Earth observation by providing high accuracy while minimizing the need for manual annotation. However, the combined application of multiple foundation models for processing ground-truthing data remains largely underexplored. This study introduces a new approach for ground-level land use classification and ground-truthing by integrating a vision foundation model, the Segment Anything Model (SAM), with a general-purpose large language model (GPT-4o). Using high-resolution thermal and RGB imagery captured from human-eye height with a handheld camera, the proposed method generates object-level land use classifications and surface temperature profiles. Data collection was conducted in the Lusatia region of Germany, covering diverse land use types. SAM was utilized to segment complex landscape structures into meaningful elements such as roads, water bodies, and trees, followed by GPT-4o, which classified these segments into custom-defined land use categories. At a broad level (7 classification types), the workflow achieved approximately 80% accuracy, with high F1 scores for categories such as Road (0.89), Vegetation (0.82), and Built Structure (0.81). At a finer level (28 classification types), the method attained around 64% accuracy, effectively classifying detailed sub-classes such as Asphalt-Concrete Road (F1 = 0.85), Brick Road (F1 = 0.86), Tree (F1 = 0.74), and Arable Land (F1 = 0.68). By overlaying thermal imagery with classified segments, the method revealed distinct microclimatic patterns across land use types, with agricultural land showing the lowest surface temperatures (p < 0.001). The proposed workflow underscores the potential of combining SAM and GPT-4o to deliver robust ground-truthing data using portable cameras, advancing AI-enabled environmental monitoring.
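
A hedged sketch of the two-stage workflow described above, using the public segment-anything and OpenAI Python APIs; the checkpoint path, input file, category list, and prompt wording are illustrative assumptions rather than the authors' configuration.

```python
import base64, io
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from openai import OpenAI

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")   # hypothetical checkpoint path
mask_generator = SamAutomaticMaskGenerator(sam)
client = OpenAI()

image = np.array(Image.open("scene_rgb.jpg"))                   # hypothetical ground-level photo
masks = mask_generator.generate(image)                          # object-level segments from SAM

def classify(segment_rgb, categories):
    """Ask GPT-4o to label one masked crop with a custom land-use category."""
    buf = io.BytesIO()
    Image.fromarray(segment_rgb).save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Classify this image segment into one of: {', '.join(categories)}. "
                         "Answer with the category name only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

labels = []
for m in masks:
    ys, xs = np.where(m["segmentation"])                         # bounding box of the segment
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    labels.append(classify(crop, ["Road", "Vegetation", "Built Structure",
                                  "Water Body", "Soil", "Sky", "Other"]))
```
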

How to cite: Mengsuwan, K. and Ryo, M.: Ground-level land surface classification and thermal analysis using foundation models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7220, https://doi.org/10.5194/egusphere-egu25-7220, 2025.

10:10–10:15

Posters on site: Mon, 28 Apr, 10:45–12:30 | Hall X4

The posters scheduled for on-site presentation are only visible in the poster hall in Vienna. If authors uploaded their presentation files, these files are linked from the abstracts below.
Display time: Mon, 28 Apr, 08:30–12:30
Poster session
X4.55 | EGU25-8272
Mohanad Albughdadi, Marica Antonacci, Vasileios Baousis, Federico Fornari, Tolga Kaprol, and Claudio Pisa

The detection of environmental changes caused by natural disasters is critical for rapid response and effective management. In this study, we present a methodology for unsupervised change detection that leverages optical Sentinel-2 [1] and Synthetic Aperture Radar (SAR) Sentinel-1 [2] imagery accessed through public SpatioTemporal Asset Catalogs (STAC) [3], together with an Earth Observation (EO) foundation model, namely Clay [4]. The analysis was conducted independently for each dataset to capitalize on the unique properties of these satellite sensors. Sentinel-1 offers robust surface texture sensitivity with its all-weather, day-and-night imaging capability, while Sentinel-2 provides detailed spectral and spatial information critical for vegetation and land-use analysis.

The Clay foundation model, a large-scale pretrained Vision Transformer trained on EO data from various missions (Sentinel-1, Sentinel-2, Landsat, Planet, NAIP, LINZ, and MODIS), was used to extract spatially and spectrally rich embeddings from Sentinel-1 and Sentinel-2 images. The model takes as input the satellite imagery along with information about location and time, and outputs mathematical representations of a given area at a certain time on Earth’s surface. The images were fed to the model as patches of size 256×256, along with the timestamp of the scene, the spatial location, and other metadata of the input image, to estimate the embeddings, which can be rearranged to a size of 1024×32×32. These embeddings were then analyzed using pixel-wise distance metrics to quantify changes between pre- and post-event imagery, and the resulting distance image was spatially interpolated to the size of the input image.
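
A rough sketch of the distance-and-interpolation step described above, not the Clay codebase: per-patch embeddings of shape 1024×32×32 from pre- and post-event scenes are compared cell by cell with a cosine distance, and the 32×32 distance map is interpolated back to the 256×256 input patch. The choice of cosine distance and bilinear interpolation is an assumption.

```python
import numpy as np
from scipy.ndimage import zoom

def change_map(emb_pre, emb_post):
    """emb_* : arrays of shape (1024, 32, 32) for one 256x256 image patch."""
    a = emb_pre.reshape(1024, -1)
    b = emb_post.reshape(1024, -1)
    cos = (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-8)
    dist = (1.0 - cos).reshape(32, 32)          # high values indicate strong change
    return zoom(dist, 256 / 32, order=1)        # bilinear upsampling to the input size

emb_pre = np.random.rand(1024, 32, 32)          # placeholders for model outputs
emb_post = np.random.rand(1024, 32, 32)
change = change_map(emb_pre, emb_post)          # (256, 256) change-intensity map
```
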

The approach was validated on satellite imagery of the Valencia region in Spain, an area significantly impacted by recent flooding on the 29th October 2024. For Sentinel-1, the method effectively highlighted surface water changes and structures affected by the floods in two scenes acquired on the 7th October and the 12th November 2024, while Sentinel-2 data captured variations in vegetation areas impacted by the floods using two scenes acquired on the 1st October and the 10th November 2024. By analyzing the datasets independently, this framework demonstrates the complementary insights offered by radar and optical imagery in assessing disaster impacts.

This study highlights the potential of leveraging open satellite data available via STAC catalogs and EO foundation models for unsupervised change detection in disaster monitoring, enabling rapid response without relying on specialized models tailored to specific regions. Unlike traditional approaches that require retraining for new areas due to geographical variability, this methodology is both scalable and adaptable, providing a generalizable framework for environmental monitoring, disaster response, and resilience planning. The results emphasize the value of integrating multi-sensor satellite imagery to enhance understanding of disaster impacts, facilitating more informed and timely decision-making.

References:

[1] https://earth-search.aws.element84.com/v1

[2] https://planetarycomputer.microsoft.com/api/stac/v1

[3] https://stacspec.org/en

[4] https://clay-foundation.github.io/model/index.html

How to cite: Albughdadi, M., Antonacci, M., Baousis, V., Fornari, F., Kaprol, T., and Pisa, C.: Unsupervised Change Detection Using Sentinel-1 and Sentinel-2 Imagery with the Clay Foundation Model: A Case Study of Flood-Affected Areas in Valencia Spain, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-8272, https://doi.org/10.5194/egusphere-egu25-8272, 2025.

X4.56 | EGU25-19131
Conrad Albrecht, Ruben Gonzalez, Nassim Ait Ali Braham, Ranjini Bangalore, and Thomas Brunschwiler

Hyperspectral imagery (HSI) provides rich spectral information that is the basis for applications such as mineral mapping, trace gas identification, and precision agriculture. Yet, the development of HSI Foundation Models (FMs) is less advanced compared to multi-spectral remote sensing modalities.

In this study, we leverage the SpectralEarth dataset [1] to explore practical aspects of training robust HSI FMs. In particular, we shed light on the role of:

  • model architecture (transformers vs. convolutional networks),
  • self-supervised learning methods (contrastive vs. masked autoencoders),
  • model size & training data volume,
  • and the resulting computational requirements.

Through extensive experiments, this study aims to provide concrete guidelines for the development and effective application of FMs in the HSI domain. Moreover, we report on findings that identify downstream applications where hyperspectral imagery has an edge over multi-spectral imagery [2], and where such an advantage is less likely to be expected.

 

References

[1] Braham, Nassim Ait Ali, et al. "SpectralEarth: Training Hyperspectral Foundation Models at Scale." arXiv preprint arXiv:2408.08447 (2024)

[2] Bangalore, Ranjini, et al. "Hyperspectral foundation model trained by spectral reconstruction for greenhouse gas emission estimation", annual meeting of the American Geophysical Union (2024)

How to cite: Albrecht, C., Gonzalez, R., Braham, N. A. A., Bangalore, R., and Brunschwiler, T.: A Practical Guide to Hyperspectral Foundation Models, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19131, https://doi.org/10.5194/egusphere-egu25-19131, 2025.

X4.57 | EGU25-14068
Valentine Anantharaj, Takuya Kurihana, Gabriele Padovani, and Sandro Fiore

AI foundation models have already demonstrated their potential across a wide range of science application domains. They derive their power from large volumes of data and from the computational methods used to exploit them with unprecedented amounts of compute power. We are inundated with data but have managed to exploit only a small fraction of what is available. The Earth System Grid Federation (ESGF) is hosting nearly 16 PB of data from the Coupled Model Intercomparison Project (CMIP6), expected to grow by a further 5–10 times in the CMIP7 era. The NASA Earth Observation Data and Information System (EOSDIS) archive is expected to exceed 600 PB by 2030. 

AI-enabled solutions will require integrating multimodal data while being cognizant of the energy footprint introduced by the data and the computational methods. Currently, the energy consumption of transformer-based foundation models scales with the amount of data and the corresponding model sizes. This impediment needs to be mitigated by developing data-efficient methods that also lead to energy efficiency across all scales. There is little guidance in the research community on developing a computational plan for the optimal use of resources when building foundation models from multimodal scientific data. Benchmarks based on LLM scaling are still insufficient for vision transformers (ViTs), commonly adopted for geoscientific applications. We need a suite of community benchmarks based on ViT backbones and other methods at different scales to understand energy-efficient methods for different classes of science problems.

Relatively few studies have focused on the issue of data efficiency for training science foundation models. We have adopted a smart sampling approach to extract the most informative samples, an effective means of significantly reducing the training data. We trained two ViT models, one with all available MODIS data over the ocean and another using an intelligently sampled subset, and applied the models to classify clouds over the ocean. Our preliminary results indicate that reasonably accurate models can be trained with only a fraction of the total training data. Reductions in data translate directly into improvements in energy efficiency. 

How to cite: Anantharaj, V., Kurihana, T., Padovani, G., and Fiore, S.:  Data efficiency: The master key for unlocking energy efficiency, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14068, https://doi.org/10.5194/egusphere-egu25-14068, 2025.

X4.58 | EGU25-14848 | ECS
Takuya Kurihana, Valentine Anantharaj, and Moetasim Ashfaq

Autoregressive transformer-based weather foundation models (FMs) with a few hundred million to a billion parameters have demonstrated generalizability across downstream applications ranging from regional forecasting to downscaling. They occasionally outperform traditional physics-based models in medium-range forecast skill while running significantly faster. These ML approaches are designed for timeframes ranging from hourly up to 10 days, and sub-seasonal forecasting, defined as a range spanning two weeks to two months, often receives less attention as a downstream task due to the inherent challenges of predicting the chaotic nature of atmospheric systems. However, sub-seasonal to seasonal forecasts have socio-economic impacts, informing responses to seasonal extreme weather events and economic activities. While community-standard benchmarking studies exist for medium-range forecasts, the benchmarking of sub-seasonal forecasts still needs further effort. In this study, we aim to fine-tune foundation models for sub-seasonal prediction of various variables in order to conduct comprehensive benchmarking of weather foundation models. In particular, to reduce task complexity, our fine-tuning task forecasts two-week-averaged atmospheric variables with a forecasting lead time of two weeks. For this task, we resample the community-standard dataset, WeatherBench, into a two-week-averaged dataset. We primarily work with the Oak Ridge Base Foundation Model for Earth System Predictability (ORBIT) and extend the benchmarking to other FMs, including the Aurora, ClimaX, and Prithvi WxC models. Our initial fine-tuning task uses a 100-million-parameter ORBIT model to predict geopotential height at 200 hPa with a two-week lead time, a key indicator for extreme precipitation in Central Southeast Asia. The preliminary results demonstrate that the fine-tuned ORBIT predicts realistic spatial distributions, achieving an MSE of 24.32 m when evaluated against the 2018 data. This comprehensive sub-seasonal forecasting benchmark can highlight whether weather FMs capture underlying principles of atmospheric dynamics, thereby enabling their performance to be extended to longer forecast lead times. 
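
A minimal sketch of the resampling step described above, assuming a WeatherBench-style file with a "z" variable on a "level" coordinate (variable, coordinate, and path names are assumptions): fields are averaged into non-overlapping two-week windows, and each window is paired with the following window as the two-week-lead target.

```python
import xarray as xr

ds = xr.open_dataset("weatherbench_z200.nc")             # hypothetical WeatherBench file
z200 = ds["z"].sel(level=200)                             # geopotential at 200 hPa
biweekly = z200.resample(time="14D").mean()               # non-overlapping two-week averages

inputs = biweekly.isel(time=slice(0, -1))                 # window starting at time t
targets = biweekly.isel(time=slice(1, None))              # window two weeks later (the target)
```
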

How to cite: Kurihana, T., Anantharaj, V., and Ashfaq, M.: Fine-tuning Foundation Models for Benchmarking Prediction Skills for Sub-seasonal Forecasting , EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14848, https://doi.org/10.5194/egusphere-egu25-14848, 2025.

X4.59 | EGU25-16635 | ECS
Nikolaos Dionelis, Riccardo Musto, Giancarlo Paoletti, Jente Bosmans, Peter Naylor, Simone Sarti, Fabio Di Matteo, Giacomo Cascarano, Casper Fibaek, and Nicolas Longepe

Recent advancements in AI and Self-Supervised Learning (SSL) have revolutionized large-scale computer vision models, enabling exceptional performance on downstream tasks in remote sensing with minimal labelled data. These models are pre-trained on large amounts of unlabelled data and then fine-tuned on specific downstream Earth Observation (EO) applications [1-3]. The scarcity of large-scale labelled datasets and the technical challenges of annotating the vast volumes of data collected by satellites pose significant barriers to achieving high accuracy in many important downstream tasks, which require extensive labelling at a large scale to be effective. Furthermore, the dynamic nature of Earth adds complexity, as labels at a particular moment in time are not enough. 

Consequently, SSL and Foundation Models offer a powerful solution to these challenges. By pre-training a Foundation Model on extensive unlabelled data, only a small amount of labelled data is required for supervised fine-tuning on downstream tasks. This approach reduces the need for labelled EO data. This enables the development of general-purpose EO Foundation Models capable of solving a diverse range of problems. 

In this work, we train the EO Foundation Model PhilEO Version 1.0 [2] on the MajorTOM dataset [6]. The model is trained on 23 TB of Sentinel-2 Core-S2L2A data, scaling up the pre-training data from less than 1 TB in the PhilEO-Globe dataset [7]. The model is trained using a combination of reconstruction and auxiliary task losses, including the Mean Squared Error (MSE) for geo-location longitude and latitude prediction. The architecture is a modified U-Net. The training is conducted on the Leonardo Davinci-1 supercomputer. 
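
A rough sketch of the combined objective described above, with an assumed weighting rather than the PhilEO training code: a pixel reconstruction MSE plus an auxiliary MSE on predicted longitude/latitude.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(recon, target, coord_pred, coord_true, lambda_coord=0.1):
    loss_recon = F.mse_loss(recon, target)              # image reconstruction term
    loss_coord = F.mse_loss(coord_pred, coord_true)     # geo-location regression term
    return loss_recon + lambda_coord * loss_coord       # weighting is an assumption
```
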

In this work, we also extend the capabilities of the evaluation framework for EO Foundation Models we recently introduced in [2], developing PhilEO-Bench++. To take advantage of multi-level features and of the U-Net-like architecture, we use the UPerNet decoder [4] for fine-tuning on downstream tasks. Furthermore, to strengthen the evaluation of EO Foundation Models, we also perform confidence quantification and assessment [5] on both classification and regression tasks, including land cover semantic segmentation and building density pixel-wise regression. 

Experiments on the PhilEO-Bench downstream tasks of building density estimation, road segmentation, and land cover mapping demonstrate the effectiveness of our model. For building density regression, for n-shots n=50 and n=100, the PhilEO model trained on MajorTOM achieves MSEs of 0.0191 and 0.0058, respectively.

 

References:  

[1] D. Szwarcman, et al., “Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications,” arXiv:2412.02732, 2024. 

[2] C. Fibaek, et al., “PhilEO Bench: Evaluating Geo-Spatial Foundation Models,” in Proceedings IGARSS, 2024. 

[3] N. Dionelis, et al., “Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI,” arXiv:2406.18295, 2024. 

[4] T. Xiao, et al., “Unified Perceptual Parsing for Scene Understanding,” 2018.

[5] N. Dionelis and N. Longepe, “Fine-Tuning Foundation Models with Confidence Assessment for Enhanced Semantic Segmentation,” 2024. 

[6] A. Francis and M. Czerkawski, “MajorTOM: Expandable Datasets for Earth Observation,” IGARSS, 2024. 

[7] B. Le Saux, et al., “The PhilEO Geospatial Foundation Model Suite,” EGU, 2024.

How to cite: Dionelis, N., Musto, R., Paoletti, G., Bosmans, J., Naylor, P., Sarti, S., Di Matteo, F., Cascarano, G., Fibaek, C., and Longepe, N.: Generalist Geospatial Foundation Model PhilEO on Satellite Sentinel-2 MajorTOM Multi-Spectral Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16635, https://doi.org/10.5194/egusphere-egu25-16635, 2025.

X4.60 | EGU25-19821
Eloisa Bentivegna, Valentine Anantharaj, Johannes Schmude, Sujit Roy, Ankur Kumar, Amy Lin, Sharana Shivanand, Theodore Papamarkou, Richard Allmendinger, Manil Maskey, and Rahul Ramachandran

AI-based weather emulators have begun to rival the accuracy of traditional numerical solvers at a fraction of the computational cost. The question of whether they can be reliably deployed in all use cases (e.g., for the forecast of extreme scenarios), however, is still open. We outline an ensembling strategy based on architectural variations of the Prithvi WxC foundation model (FM), highlighting the impact of each of these variations on physical accuracy and the ability to capture distributional extremes. A simple ensemble of 100 models is sufficient to observe the complex mapping between configuration parameters and the forecast sensitivity of different atmospheric variables. We characterize some features of this mapping and connect them to the task of predicting various weather extremes.

How to cite: Bentivegna, E., Anantharaj, V., Schmude, J., Roy, S., Kumar, A., Lin, A., Shivanand, S., Papamarkou, T., Allmendinger, R., Maskey, M., and Ramachandran, R.: From architecture to atmospheric sensitivity: studying forecast uncertainty with Prithvi-WxC, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19821, https://doi.org/10.5194/egusphere-egu25-19821, 2025.

X4.61 | EGU25-14422 | ECS | Highlight
Ankur Kumar, Sujit Roy, Udaysankar Nair, Manil Maskey, and Rahul Ramachandran

Weather Foundation Models in general represent a significant advancement in computational weather prediction by leveraging data-driven techniques to improve accuracy and speed. This study evaluates the Prithvi WxC foundation model for weather and climate by systematically assessing its adherence to fundamental physical constraints governing atmospheric processes. While traditional model validation primarily focuses on error statistics of modeled fields, this work takes a more comprehensive approach, incorporating a series of process-based tests to ensure the model's consistency with key atmospheric principles.

We first test the model's compliance with conservation of mass, ensuring that it respects the fundamental principle that mass is neither created nor destroyed within the atmosphere. Next, we examine the model's representation of geostrophic balance, critical for large-scale flow, by evaluating the relationship between the pressure gradient and the Coriolis force. The hypsometric equation is also applied to assess the vertical consistency of the model’s simulations, verifying that changes in pressure are appropriately related to temperature and height. To further evaluate large-scale flow dynamics, we analyze the model’s consistency with the thermal wind relation, ensuring that temperature gradients are correctly reflected in the vertical wind profile. Finally, we test the model's treatment of radiative and convective processes by comparing its representation of convection to established parameterizations in conventional weather models, assessing its ability to properly simulate convective processes. 
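
For reference, the standard textbook forms of the balance relations mentioned above (not necessarily the exact diagnostics used by the authors):

```latex
\begin{align}
  f\,u_g &= -\frac{1}{\rho}\frac{\partial p}{\partial y}, \qquad
  f\,v_g  =  \frac{1}{\rho}\frac{\partial p}{\partial x}
  && \text{(geostrophic balance)} \\
  z_2 - z_1 &= \frac{R_d\,\bar{T}_v}{g}\,\ln\!\frac{p_1}{p_2}
  && \text{(hypsometric equation)} \\
  \frac{\partial \mathbf{v}_g}{\partial \ln p} &= -\frac{R_d}{f}\,\mathbf{k}\times\nabla_p T
  && \text{(thermal wind relation)}
\end{align}
```
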

The results of these tests highlight the Prithvi WxC model's strengths and areas for improvement in terms of physical consistency. By adhering to these atmospheric principles, the findings of this study offer valuable insights into how the model can be refined, enhancing its potential applications in both weather forecasting and climate research.

 

How to cite: Kumar, A., Roy, S., Nair, U., Maskey, M., and Ramachandran, R.: Comprehensive Validation of the Prithvi WxC FM Through Atmospheric Process Analysis, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14422, https://doi.org/10.5194/egusphere-egu25-14422, 2025.

X4.62 | EGU25-17133
Esther Rodrigo-Bonet, Jordi Cerda, Michele Ronco, and Gustau Camps-Valls

Food insecurity is typically modelled using inter-regional data comprising economic, geophysical, and social variables. Such datasets are often of varying granularity, with each variable corresponding to a certain granularity level (e.g., GDP is a national variable, while disaster displacement can be local or regional). Additionally, each level exhibits specific causal relations among its variables. Since countries affected by food insecurity are usually underdeveloped, collecting such variables is a challenging task, leading to highly incomplete datasets. To deal with the multi-level complexity and incomplete nature of the data, we propose to build a hierarchical causal graph (HCG) structure of the variables, which can then be injected into different imputation methods. Specifically, we propose to classify the variables at different granularity levels and use causal graph discovery to learn a causal graph at each level. We test the proposed approach for imputing food insecurity using a dataset of 300+ economic, geophysical, and social variables for more than 70 countries.

How to cite: Rodrigo-Bonet, E., Cerda, J., Ronco, M., and Camps-Valls, G.: Hierarchical Causal Graph-Based Methods for Imputing Food Insecurity, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17133, https://doi.org/10.5194/egusphere-egu25-17133, 2025.

X4.63 | EGU25-1534 | ECS
Jurrian Doornbos and Önder Babur

Uncrewed aerial vehicles (UAVs) have been identified as an important tool supporting more detailed remote sensing applications than satellite-based platforms, from agriculture to forest monitoring and mountain sensing. The insights UAVs can offer, owing to their flexibility, high precision, and sensor variety, go far beyond previous approaches to measuring forest health, field-crop yield, and even rockfall risk. This flexibility, however, also poses a problem: flight conditions, sensor types, flight heights, and viewing angles all affect the generalization of approaches developed with UAVs. These supervised approaches also rely on large amounts of human-labelled data. A pathway to reducing high labelling requirements is unsupervised training with Vision Transformers (ViTs). Vision Transformers pretrained on large datasets generalize well to unseen data, with only a few supervised samples required to specialize them for an application. However, these models are often trained on massive web-scraped RGB datasets. Furthermore, RGB ViTs miss the infrared domain, which carries crucial vegetation information. Finally, UAV imagery is captured exclusively from the aerial perspective, which is missing in existing pretraining datasets.

We present an openly available, pre-trained Vision Transformer specifically for UAV multispectral imagery across various domains. Furthermore, various downstream applications such as canopy height modelling and semantic segmentation are evaluated and compared against RGB baselines. The main contribution is the openly available training dataset, and the pre-trained models, with recipes for finetuning a task-specific head.

The dataset is built around multispectral image contributions from the ICAERUS Drone Data Analytics Library and an additional database search on Zenodo and Data in Brief (Table 1). This is followed by a quality check, including radiometric calibration and spectral alignment. Furthermore, all data are quantized to 16-bit float and sliced into smaller 224x224 chips with four channels (Green, Red, Red Edge, and NIR); a summary of the included datasets is presented in Table 1. DINOv2-s and DINOv2-b were chosen for the architecture as they are well documented and provide a state-of-the-art vision foundation model. The training was done in minibatches of size 32, for 6 days on two V100 GPUs.
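
A sketch of the chipping step described above, assuming a channels-last array layout and in-memory processing: a calibrated four-band mosaic is cast to 16-bit float and sliced into non-overlapping 224x224 chips.

```python
import numpy as np

def make_chips(mosaic, chip=224):
    """mosaic: (H, W, 4) reflectance array; returns (N, 224, 224, 4) float16 chips."""
    mosaic = mosaic.astype(np.float16)                      # quantize to 16-bit float
    h = (mosaic.shape[0] // chip) * chip                    # drop the ragged border
    w = (mosaic.shape[1] // chip) * chip
    chips = (mosaic[:h, :w]
             .reshape(h // chip, chip, w // chip, chip, 4)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, chip, chip, 4))
    return chips

chips = make_chips(np.random.rand(1000, 1200, 4))           # placeholder mosaic
```
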

Early experiments suggest that the pre-trained models outperform existing DINOv2-s and DINOv2-b pre-trained foundation models, both in the clarity of the learned features and when tuned on UAV-specific tasks (canopy height modelling and semantic segmentation).

Table 1. Included datasets for pretraining; total size on disk is 399 GB.

How to cite: Doornbos, J. and Babur, Ö.: Features from Multispectral Drone Data: Curating, training and distributing Transformers for all, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-1534, https://doi.org/10.5194/egusphere-egu25-1534, 2025.

X4.64 | EGU25-12338
Yang Lu, Jianan Jiang, Jia Zhong, and Ziming Zou

Aurora is an important manifestation of solar-terrestrial physical processes. Auroral activity changes rapidly in spatial and intensity distribution during a substorm, especially during the expansion phase. In this work, a newly developed aurora evolution model is presented, comprising aurora image prediction and substorm expansion-phase duration prediction. The aurora image prediction model is based on a Convolutional Long Short-Term Memory network, trained on aurora images captured by the ultraviolet imager on the Polar satellite during substorm expansion phases. Given the images after the onset, the model can predict the following aurora image sequences during the substorm expansion phase. However, the image prediction model works well only for 30–45 minutes, which is close to the duration of the expansion phase. Considering this, the expansion-phase duration prediction model is trained using solar wind and interplanetary magnetic field data; using traditional machine learning methods, the duration is predicted from these physical parameters.

How to cite: Lu, Y., Jiang, J., Zhong, J., and Zou, Z.: Aurora Evolution Model During the Substorm Expansion Phase using Machine Learning based method , EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12338, https://doi.org/10.5194/egusphere-egu25-12338, 2025.

X4.65 | EGU25-19016 | ECS
Erik Scheurer, Jiangtao Wang, Rocco Sedona, Stefano Maurogiovanni, Benedikt Blumenstiel, Johannes Jakubik, Paolo Fraccaro, Thomas Brunschwiler, Stefan Kesselheim, and Gabriele Cavallaro

Earth observation (EO) yields large-scale, multimodal datasets collected from various satellite missions, DEMs, land-use data, and textual metadata. Foundation models like 4M (Massively Multimodal Masked Modeling) can learn a joint embedding space that bridges modality gaps, mitigates missing data issues, and facilitates partial spatio-temporal alignment [1]. However, directly training such foundation models on the vast, high-dimensional original EO datasets is not only computationally intensive but also imposes substantial demands on storage resources.

To address this, one can leverage a VQ-VAE (Vector Quantized Variational AutoEncoder) as a neural compressor to transform high-dimensional multimodal inputs into a few discrete indices, significantly reducing data volume while preserving critical information. By inverting the tokenization process, we can reconstruct the original high-dimensional data with minimal quality loss, aided by adversarial and perceptual losses that enhance reconstruction fidelity.

Traditional VQ-based approaches, however, face challenges such as inefficient codebook utilization and limited latent space representation. To overcome these, we propose scaling strategies that complement 4M’s tokenizer-based architecture. By expanding the codebook size, latent dimensions, and network depth, our method captures the complexity of EO modalities more effectively. Specifically, we employ spherical quantization techniques like Grouped Spherical Quantization (GSQ) to address limitations in traditional approaches [2]. GSQ constrains codebook vectors to a spherical surface, stabilizing training, preventing code collapse, and promoting uniform codebook usage. Unlike standard VQ, GSQ uses spherical initialization and normalization to maintain consistent distances among codebook entries, ensuring robust latent space coverage even under extreme compression or large codebooks. From our empirical and ablation studies, alternative methods like LFQ (Lookup-Free Quantization), FSQ (Finite Scalar Quantization), and RVQ (Residual Vector Quantizer) often exhibit limitations, such as tightly coupling the latent dimension to codebook size or relying on specialized training losses. In contrast, spherical-based techniques effectively decouple latent dimensions from codebook vocabulary, providing greater flexibility and scalability as data demands increase.
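
A simplified sketch of the spherical quantization idea characterised above, not the GSQ reference implementation (grouping, codebook losses, and initialization details are omitted): both codebook entries and latents are projected onto the unit sphere, and each latent is assigned its nearest entry by maximum cosine similarity, with a straight-through estimator for gradients.

```python
import torch
import torch.nn.functional as F

def spherical_quantize(latents, codebook):
    """latents: (N, D) encoder outputs; codebook: (K, D) learnable entries."""
    z = F.normalize(latents, dim=-1)            # constrain latents to the unit sphere
    c = F.normalize(codebook, dim=-1)           # constrain codebook entries to the sphere
    indices = (z @ c.t()).argmax(dim=-1)        # nearest entry = max cosine similarity
    quantized = c[indices]
    # Straight-through estimator so gradients flow back to the encoder.
    quantized = z + (quantized - z).detach()
    return quantized, indices

codebook = torch.randn(16384, 16, requires_grad=True)   # assumed codebook size and latent dim
z_q, idx = spherical_quantize(torch.randn(8, 16), codebook)
```
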

Our approach enables neural compressors to adapt to varying scales of compression and complexity without compromising performance. Comprehensive scalability experiments—examining large codebooks, deeper networks, and diverse compression ratios—assessed the generalizability of the proposed compression strategies and demonstrated their effectiveness on high-dimensional, large-scale EO data with minimal information loss. By integrating advanced compression techniques with scalable architectures, this framework establishes a robust foundation for addressing multimodal challenges in EO research that significantly reduces the difficulty of training foundation models on multimodal high-dimensional EO data.

References

[1] Mizrahi, D., Bachmann, R., Kar, O. F., Yeo, T., Gao, M., Dehghan, A., & Zamir, A. (2023). 4M: Massively Multimodal Masked Modeling (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2312.06647

[2] Wang, J., Qin, Z., Zhang, Y., Hu, V. T., Ommer, B., Briq, R., & Kesselheim, S. (2024). Scaling Image Tokenizers with Grouped Spherical Quantization (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2412.02632

Acknowledgments

This work is performed in the Embed2Scale (Earth Observation & Weather Data Federation With AI Embeddings) project, funded by the EU’s Horizon Europe program under Grant Agreement number 101131841.

How to cite: Scheurer, E., Wang, J., Sedona, R., Maurogiovanni, S., Blumenstiel, B., Jakubik, J., Fraccaro, P., Brunschwiler, T., Kesselheim, S., and Cavallaro, G.: Scalable Efficient Compression in Large-Scale Earth Observation, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19016, https://doi.org/10.5194/egusphere-egu25-19016, 2025.

X4.66 | EGU25-17360 | ECS
Gabriele Padovani, Ankur Kumar, Takuya Kurihana, Sandro Fiore, and Valentine Anantharaj

AI foundation models hold considerable promise for leveraging the vast and diverse datasets available in atmospheric and geoscientific research. These models have the potential to advance scientific discovery by capturing complex spatial and temporal relationships inherent in earth system processes. However, the development and deployment of such models is often hindered by limited computational resources.

Accurate reconstruction of fine-scale atmospheric features from coarse-resolution data is a critical challenge in geoscientific modeling, as well as a benchmark for understanding the performance of climate-related models. High-resolution atmospheric data are essential for capturing localized phenomena, such as convective systems, topographic effects, and land-atmosphere interactions, that influence weather patterns and climate processes. However, the generation and storage of high-resolution datasets are computationally expensive, necessitating methods that can infer fine-scale structures from lower-resolution observations.

The primary objective of this study is to validate PrithviWxC [4], a ViT-based foundation model [1], for the task of downscaled image reconstruction. While Prithvi WxC was trained on 160 atmospheric variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) dataset [2], we implement a smaller version of the initial model with 21 million parameters, pretrained on the same dataset using a set of six variables.

The evaluation process assesses the model's capacity to reconstruct fine-scale features through the downscaling of atmospheric data. In this procedure, inputs at a high spatial resolution of 1 km from [5] are first coarsened to 25 km resolution, matching the European Centre for Medium-Range Weather Forecasts Reanalysis v5 (ERA5) [3], and subsequently upscaled to recover the original fine-grained structure. This process serves as a benchmark for assessing the model's capacity to learn and preserve spatial details during resolution transformations, which is an essential requirement for geoscientific modeling tasks.
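
The coarsen-then-reconstruct benchmark can be summarized by the short sketch below. The block-averaging degradation operator, the interpolation mode, and the model interface are assumptions made for illustration and are not necessarily the exact pipeline used in the study.

```python
# Illustrative coarsen-then-reconstruct evaluation for downscaled image reconstruction.
# The degradation operator (block averaging), interpolation, and model call are assumptions.
import torch
import torch.nn.functional as F


def downscaling_benchmark(model, x_hires: torch.Tensor, factor: int = 25) -> torch.Tensor:
    """x_hires: (B, C, H, W) fields at ~1 km resolution; returns the reconstruction error."""
    # Coarsen the ~1 km fields to ~25 km (comparable to the ERA5 grid) by block averaging.
    x_coarse = F.avg_pool2d(x_hires, kernel_size=factor)
    # Upsample back to the original grid so the model input matches the target shape.
    x_up = F.interpolate(x_coarse, size=x_hires.shape[-2:], mode="bilinear", align_corners=False)
    # The fine-tuned model (assumed to map coarse fields to fine-scale fields) reconstructs detail.
    x_rec = model(x_up)
    return F.mse_loss(x_rec, x_hires)
```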

We fine-tune the model for downscaled image reconstruction on a set of 30,784 patches of 128x128 pixels and validate its output, produced after training on a limited temporal period, on tiles from the ERA5 dataset that span all seasons. In particular, we aim to highlight the model's ability to generalize to data domains beyond its pretraining distribution, demonstrating its adaptability and the transferability of knowledge embedded within ViT architectures. By applying PrithviWxC to a knowledge domain distinct from its original training context, we demonstrate the potential for cross-domain learning in geoscientific applications.

 

REFERENCES

[1] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).

[2] Gelaro, Ronald, et al. "The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2)." Journal of Climate 30.14 (2017): 5419-5454.

[3] Hersbach, Hans, et al. "The ERA5 global reanalysis." Quarterly Journal of the Royal Meteorological Society 146.730 (2020): 1999-2049.

[4] Schmude, Johannes, et al. "Prithvi WxC: Foundation model for weather and climate." arXiv preprint arXiv:2409.13598 (2024).

[5] Wedi, Nils P., et al. "A baseline for global weather and climate simulations at 1 km resolution." Journal of Advances in Modeling Earth Systems 12.11 (2020): e2020MS002192.

How to cite: Padovani, G., Kumar, A., Kurihana, T., Fiore, S., and Anantharaj, V.: PrithviWxC Foundation Model Validation on Weather Downscaling for Cross Domain Learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17360, https://doi.org/10.5194/egusphere-egu25-17360, 2025.

X4.67
|
EGU25-16756
Rikard Vinge, Michael L Marszalek, Jannik Schneider, and Conrad M Albrecht

With the rapidly growing production and utilization of Earth Observation (EO) data, the past decade has sparked interest in the efficient compression of EO data into low-dimensional embeddings. In a parallel development, EO Foundation Models (FMs), trained on large amounts of unlabeled data to be used in a wide range of applications, also rely on low-dimensional embeddings to distill representations of EO data [1, 2, 3]. In this respect, EO FMs may serve as (lossy) neural compressors that improve data transfer and lower storage needs, effectively reducing the carbon footprint of EO data [4].

While development in EO FMs advances rapidly, there is a need for a novel benchmark scheme to evaluate the quality of (compressed) embeddings. Claims of being “foundational” or providing a “general-purpose representation” need to be put to the test.

As part of the Horizon Europe project “Embed2Scale” [5], co-funded by the European Union (Horizon Europe contract No. 101131841), the Swiss State Secretariat for Education, Research and Innovation (SERI), and UK Research and Innovation (UKRI), we present a novel approach to benchmark learnt compression of multimodal Copernicus Sentinel data across various relevant application domains. In the form of a competition, contestants provide embeddings that are evaluated on a diverse set of problems based on real-life use cases relevant to the research community, governments, and corporate businesses. The problems are hidden from the contestants in order to evaluate the applicability of the embeddings to unknown problems. The benchmark statistically evaluates downstream-task performance by fine-tuning neural networks that fit on commodity hardware, reflecting the practically relevant scenario in which end users rarely have access to costly and energy-intensive acceleration hardware. The overall performance, i.e. the aggregate evaluation across all of the benchmark’s problems, is what counts, ensuring a diverse and fair assessment of the embeddings. After the competition, the datasets in the benchmark are published and made available to the community.
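
A hypothetical sketch of such an evaluation loop is given below: a lightweight head is fit on the submitted embeddings for each hidden downstream task, and the per-task scores are aggregated into a single figure. The probe, the split, and the mean aggregation are illustrative assumptions, not the Embed2Scale protocol.

```python
# Hypothetical aggregate scoring of submitted embeddings over hidden downstream tasks.
# The linear probe, data split, and mean aggregation are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def score_embeddings(embeddings: np.ndarray, hidden_tasks: dict) -> float:
    """embeddings: (N, D) submitted vectors; hidden_tasks: task name -> (N,) label array."""
    scores = []
    for name, labels in hidden_tasks.items():
        x_tr, x_te, y_tr, y_te = train_test_split(
            embeddings, labels, test_size=0.3, random_state=0
        )
        probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)  # fits on commodity hardware
        scores.append(probe.score(x_te, y_te))                     # per-task accuracy
    return float(np.mean(scores))                                  # overall benchmark score
```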

[1] X. Sun et al., “RingMo: A remote sensing foundation model with masked image modeling,” IEEE Transactions on Geoscience and Remote Sensing, 2022.

[2] D. Wang et al., “Advancing plain vision transformer toward remote sensing foundation model,” IEEE Transactions on Geoscience and Remote Sensing, 2022.

[3] C. Bodnar et al., “Aurora: A foundation model of the atmosphere,” Tech. Rep., 2024.

[4] R. Wilkinson, M. M. Mleczko, R. J. W. Brewin, K. J. Gaston, M. Mueller, J. D. Shutler, X. Yan, and K. Anderson, “Environmental impacts of earth observation data in the constellation and cloud computing era,” Science of The Total Environment, vol. 909, 168584, 2024, ISSN 0048-9697, https://doi.org/10.1016/j.scitotenv.2023.168584

[5] https://embed2scale.eu/

How to cite: Vinge, R., Marszalek, M. L., Schneider, J., and Albrecht, C. M.: Earth Observation embeddings at the test: A novel benchmark to evaluate (neural) compression for satellite imagery, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16756, https://doi.org/10.5194/egusphere-egu25-16756, 2025.

X4.68
|
EGU25-20521
|
ECS
Sujit Roy, Ankur Kumar, Rohit Lal, Udaysankar Nair, Manil Maskey, and Rahul Ramachandran

Accurate hurricane intensity estimation is critical for disaster preparedness, yet remains challenging for weather models trained on coarse-resolution datasets. This study proposes a hybrid approach that integrates NASA-IBM's Prithvi WxC model with a deep learning-based Hurricane Intensity Estimation (HIE) model. While the Prithvi WxC model excels in global atmospheric predictions, its coarse-grained outputs can struggle with precise hurricane intensity estimation. To address this, the HIE model is triggered when it identifies a hurricane in the Prithvi model output, providing corrected intensity predictions based on high-resolution data.

A dataset was created for training and evaluation, consisting of 6,000 unique initial conditions from 1980 to 2024 that resulted in hurricanes across all major basins. Ground truth hurricane tracks and intensity data were obtained from the HURDAT database. The training phase focused on hurricane cases from 1980 to 2000, building a foundational understanding of global hurricane characteristics. Subsequently, the model was fine-tuned with 2000–2020 data to account for basin-specific variations and improve regional accuracy. The remaining cases (2020–2024) are reserved for validation and assessment. The HIE model employs advanced deep learning techniques to refine key intensity metrics, such as maximum sustained wind speeds and central pressure. By addressing the limitations of Prithvi WxC's coarse-resolution training data, the HIE model achieves greater precision, leveraging fine-grained atmospheric and oceanographic features. This two-step framework, hurricane detection by Prithvi WxC followed by intensity refinement by the HIE model, capitalizes on the strengths of both models to deliver improved predictions.
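
A schematic of the detect-then-refine control flow is sketched below. The wind-speed threshold, the field names, and the HIE model interface are assumptions introduced purely for illustration, not the authors' implementation.

```python
# Schematic of the two-step framework: scan the coarse forecast for a hurricane-strength
# vortex and, only if one is found, query the HIE model for refined intensity metrics.
# Threshold, field names, and the HIE interface are illustrative assumptions.
import numpy as np


def estimate_intensity(forecast_fields: dict, hie_model, wind_threshold: float = 33.0):
    """forecast_fields: coarse model output, e.g. {'u10': ..., 'v10': ...} arrays in m/s."""
    wind_speed = np.hypot(forecast_fields["u10"], forecast_fields["v10"])
    if wind_speed.max() < wind_threshold:
        return None                                # no hurricane-strength vortex detected
    # Hand the detected case to the specialized deep-learning model for correction.
    refined = hie_model(forecast_fields)           # assumed to return refined metrics
    return {
        "max_wind_ms": refined["vmax"],            # refined maximum sustained wind speed
        "central_pressure_hpa": refined["pmin"],   # refined central pressure
    }
```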

This hybrid design highlights the potential of combining foundation models such as Prithvi WxC with specialized deep-learning frameworks to overcome existing limitations in hurricane intensity estimation. By incorporating diverse data sources and leveraging modern machine-learning techniques, the approach bridges the gap between coarse-grained global models and the need for precise regional forecasting.

How to cite: Roy, S., Kumar, A., Lal, R., Nair, U., Maskey, M., and Ramachandran, R.: Integrating Prithvi WxC with a Hurricane Intensity Estimation Model for Accurate Hurricane Forecasting, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20521, https://doi.org/10.5194/egusphere-egu25-20521, 2025.

Posters virtual: Tue, 29 Apr, 14:00–15:45 | vPoster spot 4

The posters scheduled for virtual presentation are visible in Gather.Town. Attendees are asked to meet the authors during the scheduled attendance time for live video chats. If authors uploaded their presentation files, these files are also linked from the abstracts below. The button to access Gather.Town appears just before the time block starts. Onsite attendees can also visit the virtual poster sessions at the vPoster spots (equal to PICO spots).
Display time: Tue, 29 Apr, 08:30–18:00
Chairpersons: Filippo Accomando, Andrea Vitale

EGU25-16634 | Posters virtual | VPS19

Finetuning and Benchmarking an AI Foundation Model for Cloud Gap Imputation  

Tadie Birihan Medimem, Gabriele Padovani, Takuya Kurihana, Ankur Kumar, Farid Melgani, Valentine G Anantharaj, and Sandro Luigi Fiore
Tue, 29 Apr, 14:00–15:45 (CEST) | vP4.17

Abstract: Cloud cover poses a significant obstacle to harnessing multi-spectral satellite imagery for various Earth observation applications, including disaster response and land use and land cover mapping. To address this issue, this study investigates the potential of the Prithvi WxC foundation model (Schmude et al., 2024), a deep learning architecture designed for weather and climate applications, to perform cloud gap imputation. By leveraging its ability to capture atmospheric dynamics and predict missing data, Prithvi WxC offers a promising solution.

The primary objective is to assess the accuracy and efficiency of Prithvi WxC in reconstructing cloudy pixels in Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data, MOD09 (Vermote, 2015). MOD09 data provide valuable information about the Earth's surface, cloud cover, and atmospheric conditions, which informs the Prithvi WxC model during the fine-tuning and imputation process.

This research evaluates the Prithvi WxC foundation model for cloud gap imputation applications and benchmarks its performance against other foundation models, such as Prithvi EO (Jakubik et al., 2023, 2024). The process begins with preprocessing the MOD09 dataset, filtering out missing and cloudy pixels to create clean visible patches, while real-world cloudy patches are used as masks. The preprocessed data is then resampled to align with the temporal and spatial resolution requirements of both the Prithvi WxC and Prithvi EO foundation models. Through rigorous fine-tuning strategies, these models learn to reconstruct the masked regions, effectively filling the gaps caused by cloud cover. Finally, the fine-tuned foundation models are benchmarked using quantitative metrics, such as the Structural Similarity Index Measure (SSIM) and Mean Absolute Error (MAE), complemented by qualitative visual analysis.
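
As a concrete illustration of this last step, the sketch below scores a reconstructed patch against the original clear-sky patch, computing MAE on the cloud-masked pixels and SSIM over the full patch. The function name and masking convention are assumptions, not the study's code.

```python
# Illustrative scoring of a cloud-gap-imputed patch against the clear-sky reference.
# MAE is computed on the masked (formerly cloudy) pixels, SSIM on the whole patch.
import numpy as np
from skimage.metrics import structural_similarity as ssim


def evaluate_imputation(pred: np.ndarray, target: np.ndarray, cloud_mask: np.ndarray) -> dict:
    """pred, target: (H, W) reflectance patches; cloud_mask: True where clouds were masked out."""
    mae = float(np.mean(np.abs(pred[cloud_mask] - target[cloud_mask])))
    data_range = float(target.max() - target.min())
    ssim_score = float(ssim(pred, target, data_range=data_range))
    return {"MAE": mae, "SSIM": ssim_score}
```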

This research explores the potential of the Prithvi WxC foundation model, pre-trained on extensive weather and climate data, to improve cloud gap imputation in satellite imagery, and benchmarks it against Earth observation foundation models such as Prithvi EO. Through this evaluation, we aim to enhance scientific understanding via multi-modal and sensor-independent approaches.

References

Schmude, Johannes, et al.: Prithvi WxC: Foundation Model for Weather and Climate, arXiv preprint arXiv:2409.13598, 2024.

Roger, C., Vermote, E. F., and Ray, J. P.: MODIS Surface Reflectance User's Guide, Collection 6, NASA, https://modis-land.gsfc.nasa.gov/pdf/MOD09_UserGuide_v1.4.pdf, 2015.

Szwarcman, Daniela, et al.: Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications, https://arxiv.org/abs/2412.02732, 2024.

Jakubik, Johannes, et al.: Foundation Models for Generalist Geospatial Artificial Intelligence, 2023.

Vermote, Eric: MOD09 MODIS/Terra L2 Surface Reflectance, 5-Min Swath 250 m, 500 m, and 1 km, NASA LP DAAC, NASA GSFC and MODAPS SIPS, http://doi.org/10.5067/MODIS/MOD09.061, 2015.

How to cite: Medimem, T. B., Padovani, G., Kurihana, T., Kumar, A., Melgani, F., Anantharaj, V. G., and Fiore, S. L.: Finetuning and Benchmarking an AI Foundation Model for Cloud Gap Imputation , EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16634, https://doi.org/10.5194/egusphere-egu25-16634, 2025.