Benchmarking Deterministic and Generative Machine Learning Models for Statistical Climate Downscaling over Europe

Kevin Debeire; Veronika Eyring; Niels Thuerey

DOI: https://doi.org/10.5194/egusphere-egu26-12407

Session ITS1.8/CL0.2

EGU26-12407, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Kevin Debeire¹,², Veronika Eyring¹,³, and Niels Thuerey²

¹Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany
²School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
³Institute of Environmental Physics (IUP), University of Bremen, Bremen, Germany

Climate models typically operate at coarse spatial resolution (~100 km) due to computational constraints, yet many climate-change impact assessments require fine-scale information (<10 km). In this study, we systematically benchmark three state-of-the-art machine-learning approaches for statistical downscaling, using the storm-resolving ICON NextGEMS dataset as reference. All methods take coarse-resolution climate fields as input and generate physically plausible high-resolution predictions. We compare: (1) UNet, a deterministic encoder–decoder architecture; (2) CorrDiff, which augments the UNet backbone with a diffusion model to produce probabilistic ensembles; and (3) CorrDiff++, which replaces diffusion with flow-matching to improve sampling efficiency. We perform 10× downscaling (0.56° to 0.056°) over central Europe for six surface variables, including temperature, wind, and precipitation. The models are evaluated along multiple dimensions: deterministic accuracy (bias, correlation), probabilistic skill (ensemble reliability and sharpness), and physical realism (energy spectra, temporal coherence, representation of extremes). Our results highlight fundamental trade-offs between computational cost, physical consistency, and uncertainty quantification. These insights provide guidance on when the additional complexity of generative models is justified for climate science applications.
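As an illustration of the evaluation dimensions named above, the following is a minimal NumPy sketch of three common diagnostics: mean bias, spatial pattern correlation, and an ensemble spread-skill ratio as a simple reliability check. It is not the authors' evaluation code. The function names, the block-averaging helper `coarsen`, and the array layouts are assumptions made for this example, and the abstract does not state how the coarse inputs were derived or which reliability measure was used.

```python
import numpy as np


def coarsen(field, factor=10):
    """Block-average a 2D field by an integer factor.

    Hypothetical stand-in for producing a coarse input from a fine grid;
    the abstract does not specify the coarsening procedure.
    """
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


def bias(pred, ref):
    """Mean difference between prediction and reference."""
    return float(np.mean(pred - ref))


def pattern_correlation(pred, ref):
    """Pearson correlation across all grid points."""
    return float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1])


def spread_skill_ratio(ensemble, ref):
    """Ratio of ensemble spread to RMSE of the ensemble mean.

    ensemble has shape (members, ny, nx). Values near 1 suggest a
    reliable ensemble; values well below 1 suggest overconfidence.
    No finite-ensemble-size correction is applied here.
    """
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    rmse = np.sqrt(np.mean((ensemble.mean(axis=0) - ref) ** 2))
    return float(spread / rmse)
```

A deterministic model such as a UNet yields a single field, so only `bias` and `pattern_correlation` apply to it directly. The spread-skill ratio needs several samples per input, which is what the generative models provide.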

How to cite: Debeire, K., Eyring, V., and Thuerey, N.: Benchmarking Deterministic and Generative Machine Learning Models for Statistical Climate Downscaling over Europe, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12407, https://doi.org/10.5194/egusphere-egu26-12407, 2026.