Hybrid data assimilation and machine learning algorithms for sparse observational data

Sibo Cheng; Hongwei Fan; Yilin Zhuang; Tobias Sebastian Finn; Lya Lugon; Karine Sartelet; Karthik Duraisamy; Rossella Arcucci; Marc Bocquet

doi:https://doi.org/10.5194/egusphere-egu25-13544

Session NP1.1

EGU25-13544, updated on 15 Mar 2025

EGU General Assembly 2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Oral | Monday, 28 Apr, 16:20–16:50 (CEST)

Room -2.93

Sibo Cheng¹, Hongwei Fan², Yilin Zhuang³, Tobias Sebastian Finn¹, Lya Lugon¹, Karine Sartelet¹, Karthik Duraisamy³, Rossella Arcucci², and Marc Bocquet¹

¹CEREA, ENPC and EDF R&D, Institut Polytechnique de Paris, Île-de-France, France
²Department of Earth Science and Engineering, Imperial College London, London, UK
³Department of Aerospace Engineering, University of Michigan, Ann Arbor, 48105, MI, United States

Reconstructing spatiotemporal systems from sparse observations remains a long-standing challenge in several domains, including geoscience, air pollution and fluid dynamics. While various data assimilation (DA) and machine learning (ML) methods have shown potential, they still face significant challenges (see [1]):

(i) The computational burden of conventional DA algorithms (including error covariance specification), particularly for multivariate, high-dimensional systems.
(ii) Sparse and movable sensor placement, which makes conventional ML models (typically requiring fixed and regularly distributed input data) cumbersome.
(iii) The ill-defined nature of the sparse reconstruction problem, which poses significant risks of overfitting.

We will present our recent work aimed at addressing these challenges. More specifically, we developed latent DA algorithms [2] to reduce the computational burden of variational DA methods. These algorithms show strong potential for efficiently assimilating sparse observations within a reduced-order latent space constructed by neural networks, thanks to the TorchDA library [3]. TorchDA enables GPU implementations of mainstream data assimilation methods and supports non-explicit state-observation transformation functions, provided they can be learned by a neural network.
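As a rough illustration of the latent assimilation idea, the sketch below minimises a 3D-Var cost function over a latent vector instead of the full state. It is not the TorchDA API and not the algorithm of [2]: the linear decoder `D` stands in for a trained neural network, and the diagonal error covariances, the fixed-step gradient descent, and all names and values are assumptions made for this example.

```python
# Latent 3D-Var sketch (illustrative only; NOT the TorchDA API).
# Assumptions: a linear decoder D replaces a trained neural network, and
# the error covariances are diagonal: B = sigma_b^2 I, R = sigma_r^2 I.

def decode(z, D):
    """Map a latent vector z (length k) to a full state x = D z (length n)."""
    return [sum(D_ij * z_j for D_ij, z_j in zip(row, z)) for row in D]


def cost_and_grad(z, z_b, y, obs_idx, D, sigma_b, sigma_r):
    """Return J(z) and its gradient, with
    J(z) = |z - z_b|^2 / (2 sigma_b^2) + |y - H(D z)|^2 / (2 sigma_r^2),
    where H selects the state entries listed in obs_idx."""
    x = decode(z, D)
    innov = [y_m - x[i] for y_m, i in zip(y, obs_idx)]   # y - H(D z)
    dz = [z_j - zb_j for z_j, zb_j in zip(z, z_b)]       # z - z_b
    cost = (sum(d * d for d in dz) / (2 * sigma_b**2)
            + sum(r * r for r in innov) / (2 * sigma_r**2))
    grad = [
        dz[j] / sigma_b**2
        - sum(D[i][j] * r for r, i in zip(innov, obs_idx)) / sigma_r**2
        for j in range(len(z))
    ]
    return cost, grad


def latent_3dvar(z_b, y, obs_idx, D, sigma_b=1.0, sigma_r=0.5,
                 lr=0.1, n_iter=200):
    """Minimise J by fixed-step gradient descent, starting from z_b."""
    z = list(z_b)
    for _ in range(n_iter):
        _, grad = cost_and_grad(z, z_b, y, obs_idx, D, sigma_b, sigma_r)
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return z


# Toy problem: 4 grid points, 2 latent variables, 2 observed points.
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
z_b = [0.0, 0.0]      # background (prior) latent state
obs_idx = [0, 3]      # only grid points 0 and 3 carry sensors
y = [1.0, 0.5]        # observed values at those points

z_a = latent_3dvar(z_b, y, obs_idx, D)   # analysis in latent space
x_a = decode(z_a, D)                     # reconstructed full field
```

The optimisation runs over two latent variables rather than four grid values; in realistic settings, that difference in dimension is where the computational saving comes from. The step size is tuned to this toy problem only.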

We have also employed advanced deep learning techniques, including Voronoi-tessellation CNNs [4] and Vision Transformer-based autoencoders [5], to learn mappings from sparse observations to the complete physical space. These approaches effectively address challenges such as movable sensor placements and varying sensor numbers. Their integration with DA algorithms has also been evaluated.
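One common way to feed scattered sensors to a CNN is to rasterise them into a Voronoi image, in which every grid cell takes the value of its nearest sensor, together with a mask marking the sensor locations. The sketch below shows that preprocessing step only. It is an assumption about how such an input can be built, not code from [4], and the function name, grid size and sensor values are hypothetical.

```python
# Voronoi-style rasterisation of sparse sensors (illustrative sketch).
# Each grid cell receives the value of its nearest sensor, so the output
# shape is fixed regardless of how many sensors there are or where they sit.

def voronoi_image(sensors, height, width):
    """sensors: list of (row, col, value) tuples on a height x width grid.

    Returns (field, mask): field holds the nearest-sensor value per cell,
    mask is 1 at sensor locations and 0 elsewhere.
    """
    field = [[0.0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Nearest sensor by squared Euclidean distance; on a tie,
            # min() keeps the sensor that appears first in the list.
            nearest = min(
                sensors,
                key=lambda s: (r - s[0]) ** 2 + (c - s[1]) ** 2,
            )
            field[r][c] = nearest[2]
    for sr, sc, _ in sensors:
        mask[sr][sc] = 1
    return field, mask


# Two sensors on a 4 x 4 grid.
sensors = [(0, 0, 1.0), (3, 3, 5.0)]
field, mask = voronoi_image(sensors, height=4, width=4)

# Moving or adding sensors changes the content but not the shape.
field_3, mask_3 = voronoi_image(sensors + [(1, 2, 2.0)], height=4, width=4)
```

Because the output always has the grid's shape, the same network input layout can be reused when sensors move or their number changes.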

Finally, our recent work [6] explores the utility of generative AI techniques, particularly denoising diffusion models, for field reconstruction from sparse observations. Generative AI methods offer two main advantages. First, they draw a sample from a probability distribution rather than predicting the mean as a fixed output, which can help mitigate the overfitting caused by the ill-defined problem. Second, they inherently function as ensemble predictors by generating several samples, which facilitates uncertainty quantification, an essential ingredient of data assimilation. Numerical results on test cases ranging from fluid dynamics benchmarks to semi-operational air pollution simulations will also be discussed.
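The ensemble view of generative reconstruction can be sketched as follows. Here `sample_field` is a hypothetical stand-in that adds Gaussian noise to a fixed field; it is not a diffusion model and not the method of [6]. Only the post-processing step is the point: a pointwise mean and spread computed from several generated samples.

```python
# Ensemble statistics from generated samples (illustrative sketch).
# sample_field is a HYPOTHETICAL placeholder for one draw from a
# conditional generative model; here it just perturbs a fixed field.
import random
import statistics


def sample_field(rng, base, noise=0.1):
    """Return one pseudo-sample: base field plus Gaussian noise."""
    return [b + rng.gauss(0.0, noise) for b in base]


def ensemble_stats(samples):
    """Pointwise ensemble mean and spread (sample standard deviation)."""
    columns = list(zip(*samples))          # one tuple per grid point
    mean = [statistics.fmean(col) for col in columns]
    spread = [statistics.stdev(col) for col in columns]
    return mean, spread


rng = random.Random(0)                     # seeded for reproducibility
base = [1.0, 2.0, 3.0]                     # toy 3-point "field"
ensemble = [sample_field(rng, base) for _ in range(50)]
mean, spread = ensemble_stats(ensemble)
```

In a data assimilation setting, a spread of this kind could serve as an estimate of reconstruction uncertainty, although how it is used would depend on the specific scheme.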

[1] Cheng, S., Quilodrán-Casas, C., Ouala, S., Farchi, A., Liu, C., Tandeo, P., Fablet, R., Lucor, D., Iooss, B., Brajard, J., Xiao, D., Janjic, T., Ding, W., Guo, Y., Carrassi, A., Bocquet, M. and Arcucci, R., 2023. Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA Journal of Automatica Sinica

[2] Cheng, S., Chen, J., Anastasiou, C., Angeli, P., Matar, O.K., Guo, Y.K., Pain, C.C. and Arcucci, R., 2023. Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models. Journal of Scientific Computing

[3] Cheng, S., Min, J., Liu, C. and Arcucci, R., 2025. TorchDA: A Python package for performing data assimilation with deep learning forward and transformation functions. Computer Physics Communications

[4] Cheng, S., Liu, C., Guo, Y. and Arcucci, R., 2024. Efficient deep data assimilation with sparse observations and time-varying sensors. Journal of Computational Physics

[5] Fan, H., Cheng, S., de Nazelle, A.J. and Arcucci, R., 2024. ViTAE-SL: a vision transformer-based autoencoder and spatial interpolation learner for field reconstruction. Computer Physics Communications

[6] Zhuang, Y., Cheng, S. and Duraisamy, K., 2025. Spatially-aware diffusion models with cross-attention for global field reconstruction with sparse observations. Computer Methods in Applied Mechanics and Engineering

How to cite: Cheng, S., Fan, H., Zhuang, Y., Finn, T. S., Lugon, L., Sartelet, K., Duraisamy, K., Arcucci, R., and Bocquet, M.: Hybrid data assimilation and machine learning algorithms for sparse observational data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13544, https://doi.org/10.5194/egusphere-egu25-13544, 2025.