EGU26-18011, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-18011
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 04 May, 16:20–16:30 (CEST) | Room -2.92
ESFM - A foundation model framework for heterogeneous data integration
Firat Ozdemir [1], Yun Cheng [1], Salman Mohebi [1], Fanny Lehmann [2], Simon Adamov [3], Leonardo Trentini [5], Langwen Huang [7], Levi Lingsch [6], Zhenyi Zhang [5], Oliver Fuhrer [3], Benedikt Soja [5], Siddhartha Mishra [6], Torsten Hoefler [7], Sebastian Schemm [4], and Mathieu Salzmann [1]
  • [1] Swiss Data Science Center, ETH Zürich and EPFL, Zurich and Lausanne, Switzerland
  • [2] ETH AI Center, ETH Zurich, Zurich, Switzerland
  • [3] ETH Zurich and MeteoSwiss, Zurich, Switzerland
  • [4] University of Cambridge, Cambridge, UK
  • [5] Institute of Geodesy and Photogrammetry, ETH Zurich, Zurich, Switzerland
  • [6] Computational and Applied Mathematics Laboratory, ETH Zurich, Zurich, Switzerland
  • [7] Scalable Parallel Computing Laboratory, ETH Zurich, Zurich, Switzerland

With the increased availability of diverse, high-quality weather data, including reanalysis, satellite, surface station, and climate model data, the number of data-driven foundation models (FMs) in the environmental field has grown significantly in recent years, with forecasting performance matching and sometimes exceeding that of physics-based numerical models. However, most FMs are trained on a single dataset or on a few datasets with similar sampling and/or resolution properties. While these models achieve remarkable results on the datasets and variables they are trained on, it would be hard to anticipate similar performance when observations are partially missing across different dimensions at test time. Similarly, typical design choices risk limiting the use of these FMs on the other heterogeneous datasets that matter to the broader Earth sciences community.

We propose the Earth System Foundation Model (ESFM), an FM capable of handling heterogeneous observations (i) across different resolutions, (ii) of both spatially gridded and non-gridded nature, and (iii) with little to extreme sparsity. We achieve this through simple architectural design choices and a masked training protocol. Specifically, we bin similar grid resolutions together and optimize a separate set of tokenizers for each significantly different resolution bin, so that a single FM can accommodate observations across resolutions. Similarly, we tokenize non-gridded data (i.e., station data) separately, with a patch size of a single pixel. Finally, we use variable-specific tokenizers coupled with learnable missing-observation tokens, which allow ESFM to naturally accommodate varying subsets of available variables across different spatiotemporal positions.
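
The tokenization scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bin edges, patch sizes, embedding dimension, variable names (`t2m`, `u10`), and all function and class names are hypothetical and chosen only for illustration, and fixed random values stand in for the learnable parameters. Only the overall structure follows the text: resolution binning, a separate single-pixel tokenizer for station data, and per-variable tokenizers with a missing-observation token.

```python
import random
from bisect import bisect_right

EMBED_DIM = 4                    # hypothetical token width
BIN_EDGES = [0.5, 2.0]           # hypothetical bin edges in degrees
PATCH_SIZE = {0: 8, 1: 4, 2: 2}  # hypothetical patch side length per resolution bin
STATION_BIN = "station"          # non-gridded data get their own tokenizers


def resolution_bin(res_deg):
    """Map a grid resolution (degrees) to the index of its resolution bin."""
    return bisect_right(BIN_EDGES, res_deg)


class VariableTokenizer:
    """Linear patch embedding for one variable in one resolution bin."""

    def __init__(self, patch_size, rng):
        self.n_in = patch_size * patch_size
        # Random values stand in for parameters that would be learned.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(self.n_in)]
                       for _ in range(EMBED_DIM)]
        self.missing_token = [rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]

    def embed(self, patch):
        """Embed a flattened patch, or return the missing token if patch is None."""
        if patch is None:
            return list(self.missing_token)
        if len(patch) != self.n_in:
            raise ValueError(f"expected {self.n_in} values, got {len(patch)}")
        return [sum(w * x for w, x in zip(row, patch)) for row in self.weight]


def build_tokenizers(variables, seed=0):
    """Create one tokenizer per (variable, bin), plus a single-pixel station one."""
    rng = random.Random(seed)
    patch_sizes = dict(PATCH_SIZE)
    patch_sizes[STATION_BIN] = 1  # single-pixel patch for non-gridded data
    return {(v, b): VariableTokenizer(p, rng)
            for v in variables for b, p in patch_sizes.items()}


def tokenize(tokenizers, patches, res_deg=None, gridded=True):
    """Route each variable's patch (or None if unobserved) to its tokenizer."""
    bin_id = resolution_bin(res_deg) if gridded else STATION_BIN
    return {v: tokenizers[(v, bin_id)].embed(p) for v, p in patches.items()}


tokenizers = build_tokenizers(["t2m", "u10"])
# Coarse gridded sample in which u10 is unobserved at this position.
coarse = tokenize(tokenizers, {"t2m": [1.0] * 4, "u10": None}, res_deg=5.625)
# Station sample: a single value per variable.
station = tokenize(tokenizers, {"t2m": [280.0]}, gridded=False)
```

Every input, whatever its resolution, gridding, or set of available variables, ends up as tokens of the same width, which is what would let a single backbone consume them. In a trained model the projection weights and missing tokens would be optimized jointly with the backbone.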

In this exploratory study, we show that ESFM is a flexible FM that achieves strong forecasting performance on ERA5 under adverse setups in which test data are missing along any dimension (spatio-temporal and inter-variable). We further test the forecasting performance of ESFM on very sparse satellite imagery (3% pixel occupancy) as well as on station data.
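
To make the notion of pixel occupancy concrete, the following Python sketch thins a fully observed field down to a target fraction of observed pixels. It is an illustration only: the helper names are hypothetical, and uniform random subsampling is an assumption made here for simplicity. The abstract does not describe how the sparse satellite data are distributed, and real satellite coverage is typically structured rather than uniform.

```python
import random


def occupancy(field):
    """Fraction of observed (non-None) pixels in a flattened field."""
    return sum(x is not None for x in field) / len(field)


def sparsify(field, keep_fraction, seed=0):
    """Keep a random subset of pixels and mark the rest as missing (None)."""
    rng = random.Random(seed)
    n_keep = round(keep_fraction * len(field))
    keep = set(rng.sample(range(len(field)), n_keep))
    return [x if i in keep else None for i, x in enumerate(field)]


dense = [float(i) for i in range(1000)]      # fully observed toy field
sparse = sparsify(dense, keep_fraction=0.03)  # about 3% of pixels remain
```

At this occupancy only 30 of the 1000 toy pixels carry a value, so nearly every input position has to be represented by a missing-observation token.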

The proposed framework, which is also compatible with backbone architectures other than the one we experimented with, provides a general approach for integrating diverse Earth system data sources with varying resolutions, sampling patterns, and availability. This makes ESFM particularly relevant for the broader environmental sciences and Earth and space sciences, where challenges related to data heterogeneity and missing observations are central to the development of next-generation data-driven environmental modeling systems.

How to cite: Ozdemir, F., Cheng, Y., Mohebi, S., Lehmann, F., Adamov, S., Trentini, L., Huang, L., Lingsch, L., Zhang, Z., Fuhrer, O., Soja, B., Mishra, S., Hoefler, T., Schemm, S., and Salzmann, M.: ESFM - A foundation model framework for heterogeneous data integration, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18011, https://doi.org/10.5194/egusphere-egu26-18011, 2026.