- 1Thales Service Numériques, Toulouse, France
- 2EUMETSAT, Darmstadt, Germany
The Meteosat Third Generation (MTG) mission represents a major step forward in geostationary meteorological observation by combining, onboard Meteosat-12, multiple instruments with highly complementary characteristics. Among them, the Flexible Combined Imager (FCI) provides multispectral images of the full Earth disk every ten minutes with a spatial resolution reaching 1 km at nadir, while the Lightning Imager (LI) observes the same scene at a much higher temporal sampling, but with a coarser spatial resolution of approximately 4.5 km at nadir. Although designed for distinct operational purposes, these two sensors offer a unique opportunity for joint exploitation, as they observe identical atmospheric phenomena under fundamentally different spatio-temporal trade-offs. In this context, Thales investigates the use of artificial intelligence techniques to leverage this complementarity and generate enhanced observation products from existing MTG-I data.
The core hypothesis of this work is that the high temporal density of LI observations implicitly encodes fine-scale spatial information. In other words, temporal correlations within LI time series can partially compensate for the sensor's lower spatial resolution. By exploiting these correlations, fine spatial features can be reconstructed from high-frequency temporal sampling. The availability of matching high-resolution reference data makes it possible to learn this reconstruction without resorting to artificially degraded training data.
To implement this hypothesis, a hybrid deep learning architecture combining convolutional neural networks (CNNs) and Transformers is proposed. CNN components are used to efficiently extract local spatial structures, such as gradients, cloud edges, and internal texture patterns, while Transformer-based attention mechanisms model short- and long-range temporal dependencies across successive LI acquisitions. This combination enables a joint representation of spatial detail and temporal coherence, while remaining compatible with large data volumes and near-operational processing constraints.
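The abstract does not disclose the actual network; as a minimal NumPy sketch of the two ingredients it names, the toy functions below combine a 2D convolution applied per frame (the CNN part, extracting local structures such as edges) with scaled dot-product self-attention over the time axis (the Transformer part, mixing information across successive acquisitions). Array shapes, the kernel, and all function names are illustrative assumptions, not the operational model.

```python
import numpy as np

def spatial_conv(frames, kernel):
    # frames: (T, H, W) image sequence; apply a small 2D kernel to each
    # frame with "same" (edge-replicated) padding, mimicking a CNN layer
    # that extracts local spatial structure (gradients, cloud edges).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(frames, ((0, 0), (ph, ph), (pw, pw)), mode="edge")
    T, H, W = frames.shape
    out = np.zeros((T, H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[:, i:i + H, j:j + W]
    return out

def temporal_attention(features):
    # features: (T, H, W); treat each frame as one token and apply
    # scaled dot-product self-attention over the time axis, so every
    # output frame is a learned-free weighted mix of all T frames.
    T, H, W = features.shape
    x = features.reshape(T, -1)                    # (T, H*W) tokens
    scores = x @ x.T / np.sqrt(x.shape[1])         # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over time
    return (weights @ x).reshape(T, H, W)
```

In the real architecture, both stages carry trainable weights and are stacked; the sketch only shows how spatial filtering and temporal attention compose on an LI-like sequence.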
The proposed approach is evaluated on two complementary scientific tasks. The first focuses on spatial super-resolution of LI images using LI temporal sequences alone. The second addresses the fusion of FCI and LI data to generate a product combining high spatial resolution with high temporal frequency. In both cases, the results are conclusive. Using FCI images as a cross-reference makes it possible to assess the physical consistency of reconstructed features and to prevent the introduction of spurious, non-physical details. The super-resolved products remain radiometrically consistent with the input observations, with low radiance discrepancies (RMSE below 1), while recovering finer spatial structures than conventional interpolation methods. Compared to standard single-image super-resolution (SISR), CNN + temporal Conv1D, and CNN + sparse Conv3D baselines, the hybrid CNN–Transformer model achieves the best overall performance.
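The radiometric-consistency evaluation above can be sketched in a few lines: an RMSE between radiance fields, plus a block-averaging step that degrades the super-resolved product back to the coarse input grid before comparison. The downsampling-by-block-mean check is a common, hypothetical choice for this sketch; the abstract does not specify the exact degradation operator used.

```python
import numpy as np

def radiance_rmse(reconstructed, reference):
    # Root-mean-square radiance discrepancy between two fields,
    # the kind of metric behind the "RMSE below 1" figure.
    diff = np.asarray(reconstructed, float) - np.asarray(reference, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def block_mean(img, factor):
    # Average factor x factor blocks to emulate mapping a super-resolved
    # field back onto the coarser input grid (assumed degradation model).
    H, W = img.shape
    return img.reshape(H // factor, factor, W // factor, factor).mean(axis=(1, 3))
```

A consistency check then compares `block_mean(super_resolved, 3)` against the original coarse observation with `radiance_rmse`, so that recovered fine structure is rewarded only when it does not distort the measured radiances.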
Looking ahead, the proposed method shows strong potential for operational deployment. Its computational efficiency allows approximately one hour of MTG data (about sixty full-disk Earth images) to be processed in less than five minutes on standard computing infrastructure equipped with a single NVIDIA H100 GPU, paving the way for the routine generation of high-resolution, high-frequency products from existing geostationary missions.
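The throughput claim implies a simple per-image budget, worked out below from the figures quoted in the abstract (sixty images, five minutes); the derived numbers are arithmetic consequences, not additional measurements.

```python
# Back-of-the-envelope throughput implied by the stated figures:
# ~60 full-disk images processed in under 5 minutes on one H100.
images = 60
wall_time_s = 5 * 60                      # 5 minutes, upper bound
per_image_s = wall_time_s / images        # at most 5 s per full-disk image
images_per_hour = 3600 / per_image_s      # sustained rate if the bound holds
```

At that rate the processing keeps pace with the mission's own acquisition cadence with ample margin, which is what makes routine generation plausible.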
How to cite: Dublé, N., Tanguy, S., Arsene, L., Poulain, V., Puechmaille, D., Hinojo Comellas, O., and Stoicescu, M.: AETHER: AI Enhancement for Third-gen Earth observing ImageR. Reaching 3x spatial upsampling and 10x temporal upsampling from existing MTG-I products., EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-22777, https://doi.org/10.5194/egusphere-egu26-22777, 2026.