EGU25-13544, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-13544
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 28 Apr, 16:20–16:50 (CEST) | Room -2.93
Hybrid data assimilation and machine learning algorithms for sparse observational data
Sibo Cheng (1), Hongwei Fan (2), Yilin Zhuang (3), Tobias Sebastian Finn (1), Lya Lugon (1), Karine Sartelet (1), Karthik Duraisamy (3), Rossella Arcucci (2), and Marc Bocquet (1)
  • (1) CEREA, ENPC and EDF R&D, Institut Polytechnique de Paris, Île-de-France, France
  • (2) Department of Earth Science and Engineering, Imperial College London, London, UK
  • (3) Department of Aerospace Engineering, University of Michigan, Ann Arbor, 48105, MI, United States

Reconstructing spatiotemporal systems from sparse observations remains a long-standing challenge in several domains, including geoscience, air pollution and fluid dynamics. While various data assimilation (DA) and machine learning (ML) methods have shown potential, they still face significant challenges (see [1]): 

  • The computational burden of conventional DA algorithms (including error covariance specification), particularly for multivariate, high-dimensional systems.
  • Sparse and movable sensor placement, which makes conventional ML models (which typically require fixed, regularly distributed input data) cumbersome to apply.
  • The ill-defined nature of the sparse reconstruction problem, which poses significant risks of overfitting.

We will present our recent work addressing these challenges. Specifically, we developed latent DA algorithms [2] to reduce the computational burden of variational DA methods. These algorithms show strong potential for efficiently assimilating sparse observations within a reduced-order latent space constructed by neural networks, using the TorchDA library [3]. TorchDA enables GPU implementation of mainstream data assimilation methods and supports non-explicit state-observation transformation functions, provided they can be learned by a neural network.
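
As a minimal illustration of assimilation in a reduced latent space, the sketch below solves a 3D-Var-type problem for a latent vector z rather than for the full state. It is not the algorithm of [2] and does not use the TorchDA API: the decoder is assumed to be a linear map (a stand-in for a trained neural network decoder), the error covariances are assumed to be scalar multiples of the identity, and all names (latent_3dvar, b_var, r_var) are hypothetical.

```python
import numpy as np

def latent_3dvar(z_b, y, decoder, obs_idx, b_var, r_var):
    """Minimise J(z) = |z - z_b|^2 / (2 b_var) + |y - H D z|^2 / (2 r_var).

    Assumptions (illustrative only): D is a linear decoder given as a matrix,
    H samples the decoded state at the indices obs_idx, and both error
    covariances are scalar multiples of the identity.
    """
    HD = decoder[obs_idx, :]                      # observation operator applied to the decoder
    k = z_b.size
    # J is quadratic because HD is linear, so the minimiser solves the normal equations
    A = np.eye(k) / b_var + HD.T @ HD / r_var
    rhs = z_b / b_var + HD.T @ y / r_var
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(0)
n, k, m = 200, 5, 20                              # full-state size, latent size, number of sensors

# Orthonormal columns as a stand-in for a trained decoder
decoder = np.linalg.qr(rng.standard_normal((n, k)))[0]
z_true = rng.standard_normal(k)
x_true = decoder @ z_true

obs_idx = rng.choice(n, size=m, replace=False)    # sparse sensor locations
y = x_true[obs_idx] + 0.01 * rng.standard_normal(m)
z_b = z_true + 0.5 * rng.standard_normal(k)       # perturbed background in latent space

z_a = latent_3dvar(z_b, y, decoder, obs_idx, b_var=0.25, r_var=1e-4)

err_b = np.linalg.norm(decoder @ z_b - x_true)    # background error in physical space
err_a = np.linalg.norm(decoder @ z_a - x_true)    # analysis error in physical space
print(f"background error {err_b:.3f}, analysis error {err_a:.3f}")
```

The linear system solved here has the size of the latent space (5 unknowns) rather than of the full state (200 unknowns), which is where the cost reduction comes from. With a nonlinear neural decoder the same cost function would typically be minimised iteratively using automatic differentiation.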

We have also employed advanced deep learning techniques, including Voronoi-tessellation CNNs [4] and Vision Transformer-based autoencoders [5], to learn mappings from sparse observations to the complete physical space. These approaches effectively address challenges such as movable sensor placements and varying sensor numbers. Their integration with DA algorithms has also been evaluated.
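
The Voronoi-tessellation idea can be sketched as a preprocessing step: each grid cell takes the value of its nearest sensor, so any number of sensors at any positions yields an image of fixed size that a CNN can consume. The code below is an illustrative sketch under simplifying assumptions (integer sensor positions on a regular grid, Euclidean distance), not the implementation of [4]; the function name voronoi_input and the two-channel layout (tessellated field plus sensor mask) are choices made for this example.

```python
import numpy as np

def voronoi_input(sensor_rc, sensor_values, height, width):
    """Build a fixed-size (2, height, width) array from scattered sensors.

    Channel 0: Voronoi tessellation (each cell holds its nearest sensor's value).
    Channel 1: binary mask marking the sensor locations.
    sensor_rc is an integer array of (row, col) positions, assumed distinct.
    """
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1)            # (H*W, 2)
    # Squared distance from every grid cell to every sensor
    d2 = ((grid[:, None, :] - sensor_rc[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                                      # index of nearest sensor
    field = sensor_values[nearest].reshape(height, width)
    mask = np.zeros((height, width))
    mask[sensor_rc[:, 0], sensor_rc[:, 1]] = 1.0
    return np.stack([field, mask])

# Two sensor layouts with different sensor counts give inputs of identical shape
three = voronoi_input(np.array([[2, 3], [10, 12], [5, 14]]),
                      np.array([1.0, -2.0, 0.5]), 16, 16)
five = voronoi_input(np.array([[0, 0], [3, 12], [8, 8], [12, 2], [15, 15]]),
                     np.array([0.2, 0.4, 0.6, 0.8, 1.0]), 16, 16)
print(three.shape, five.shape)
```

Because the network input no longer depends on how many sensors there are or where they sit, the same trained model can be applied when sensors move or drop out.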

Finally, our recent work [6] explores the utility of generative AI techniques, particularly denoising diffusion models, for field reconstruction from sparse observations. Generative AI methods offer two main advantages. First, they produce a sample from a probability distribution rather than predicting the mean as a fixed output, which can help mitigate overfitting caused by the ill-defined problem. Second, they inherently function as ensemble predictors by generating several samples, facilitating uncertainty quantification, which is essential in data assimilation. Numerical results on cases ranging from fluid dynamics benchmarks to semi-operational air pollution simulations will also be discussed.
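
The ensemble view of a generative reconstruction can be illustrated with a toy sampler. The sketch below is not a diffusion model and is not the method of [6]: as a stand-in, it draws fields from a Gaussian process conditioned on the sparse observations (via Matheron's rule), then summarises the samples by their ensemble mean and spread. The kernel, length scale, grid, and observation values are all assumptions made for this example.

```python
import numpy as np

def sample_conditional_fields(grid, obs_idx, obs_values, n_samples, length_scale, rng):
    """Draw 1D fields that match the observations (Gaussian stand-in for a generative model)."""
    diff = grid[:, None] - grid[None, :]
    K = np.exp(-0.5 * (diff / length_scale) ** 2)           # squared-exponential prior covariance
    w, V = np.linalg.eigh(K)
    root = V * np.sqrt(np.clip(w, 0.0, None))               # K ~= root @ root.T
    prior = root @ rng.standard_normal((grid.size, n_samples))
    K_oo = K[np.ix_(obs_idx, obs_idx)]
    gain = np.linalg.solve(K_oo, K[obs_idx, :]).T           # K_xo K_oo^{-1}, shape (n, m)
    # Matheron's rule: correct each prior draw with its own observation misfit
    posterior = prior + gain @ (obs_values[:, None] - prior[obs_idx, :])
    return posterior.T                                      # (n_samples, n)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 64)
obs_idx = np.array([5, 20, 40, 60])                         # sparse sensor locations
obs_values = np.array([0.0, 1.0, -0.5, 0.3])

ensemble = sample_conditional_fields(grid, obs_idx, obs_values, 200, 0.1, rng)
ens_mean = ensemble.mean(axis=0)                            # point estimate
ens_spread = ensemble.std(axis=0)                           # pointwise uncertainty

print("max spread at sensors:      ", ens_spread[obs_idx].max())
print("max spread away from sensors:", ens_spread.max())
```

The spread collapses at the sensors and grows between them, which is the kind of spatially varying uncertainty estimate that a single deterministic prediction cannot provide.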

[1] Cheng, S., Quilodrán-Casas, C., Ouala, S., Farchi, A., Liu, C., Tandeo, P., Fablet, R., Lucor, D., Iooss, B., Brajard, J., Xiao, D., Janjic, T., Ding, W., Guo, Y., Carrassi, A., Bocquet, M. and Arcucci, R., 2023. Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA Journal of Automatica Sinica.

[2] Cheng, S., Chen, J., Anastasiou, C., Angeli, P., Matar, O.K., Guo, Y.K., Pain, C.C. and Arcucci, R., 2023. Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models. Journal of Scientific Computing.

[3] Cheng, S., Min, J., Liu, C. and Arcucci, R., 2025. TorchDA: A Python package for performing data assimilation with deep learning forward and transformation functions. Computer Physics Communications.

[4] Cheng, S., Liu, C., Guo, Y. and Arcucci, R., 2024. Efficient deep data assimilation with sparse observations and time-varying sensors. Journal of Computational Physics.

[5] Fan, H., Cheng, S., de Nazelle, A.J. and Arcucci, R., 2024. ViTAE-SL: a vision transformer-based autoencoder and spatial interpolation learner for field reconstruction. Computer Physics Communications.

[6] Zhuang, Y., Cheng, S. and Duraisamy, K., 2025. Spatially-aware diffusion models with cross-attention for global field reconstruction with sparse observations. Computer Methods in Applied Mechanics and Engineering.

How to cite: Cheng, S., Fan, H., Zhuang, Y., Finn, T. S., Lugon, L., Sartelet, K., Duraisamy, K., Arcucci, R., and Bocquet, M.: Hybrid data assimilation and machine learning algorithms for sparse observational data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13544, https://doi.org/10.5194/egusphere-egu25-13544, 2025.