EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

A two-stage machine learning framework using global satellite data of cloud classes for process-oriented model evaluation

Arndt Kaps1, Axel Lauer1, Gustau Camps-Valls2, Pierre Gentine3,4, Luis Gómez-Chova2, and Veronika Eyring1,5
  • 1Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany
  • 2Image Processing Laboratory (IPL), University of València, València, Spain
  • 3Department of Earth and Environmental Engineering, Columbia University, NY, USA
  • 4Center for Learning the Earth with Artificial intelligence and Physics (LEAP), Columbia University, NY, USA
  • 5University of Bremen, Institute of Environmental Physics (IUP), Bremen, Germany

Clouds play a key role in weather and climate but are challenging to simulate with global climate models, as the relevant physics include non-linear processes on scales covering several orders of magnitude in both the temporal and spatial dimensions. The numerical representation of clouds in global climate models therefore requires a high degree of parameterization, which makes careful evaluation a prerequisite not only for assessing the skill in reproducing observed climate but also for building confidence in projections of future climate change. Current evaluation methods usually compare multiple large-scale physical properties in the model output to observational data. Here, we introduce a two-stage data-driven machine learning framework for process-oriented evaluation of clouds in climate models based directly on widely known cloud types. The first step relies on CloudSat satellite data to assign cloud labels, in line with the cloud types defined by the World Meteorological Organization (WMO), to MODIS pixels using deep neural networks. Since the method is supervised and trained on labels provided by CloudSat, the predicted cloud types remain objective and do not require a posteriori labeling. The second step consists of a regression algorithm that predicts fractional cloud types from retrieved cloud physical variables. This step aims to ensure that the method can be used with any data set providing physical variables comparable to those from MODIS. In particular, we use a Random Forest regression that acts as a transfer model to evaluate the relatively coarse-resolution output of climate models and allows the use of varying input features. As a proof of concept, the method is applied to coarse-grained ESA Cloud CCI data. The predicted cloud type distributions are physically consistent and show the expected features of the different cloud types.
This demonstrates how advanced observational products can be used with this method to obtain cloud type distributions from coarse data, allowing for a process-based evaluation of clouds in climate models.
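
The hand-over between the two steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that the fractional cloud types used as regression targets are obtained by counting per-pixel labels inside each coarse grid cell; the abstract does not describe the actual procedure. The function name `coarse_grain`, the `block` parameter, and the label list (clear sky plus the eight cloud types commonly used in CloudSat classification products) are hypothetical choices, not taken from the authors' code.

```python
from collections import Counter

# Assumed label set for illustration: clear sky plus eight cloud types.
CLOUD_TYPES = ["clear", "Ci", "As", "Ac", "St", "Sc", "Cu", "Ns", "Dc"]


def coarse_grain(labels, block):
    """Turn a 2D grid of per-pixel cloud type labels into per-cell fractions.

    labels: list of rows, each a list of strings drawn from CLOUD_TYPES
    block:  edge length, in pixels, of one square coarse grid cell
    Returns a 2D list of dicts mapping cloud type -> fraction of pixels.
    Incomplete blocks at the right and bottom edges are dropped.
    """
    n_rows, n_cols = len(labels), len(labels[0])
    total = block * block
    cells = []
    for r0 in range(0, n_rows - block + 1, block):
        row_cells = []
        for c0 in range(0, n_cols - block + 1, block):
            # Count how often each label occurs inside this coarse cell.
            counts = Counter(
                labels[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            row_cells.append({t: counts.get(t, 0) / total for t in CLOUD_TYPES})
        cells.append(row_cells)
    return cells


if __name__ == "__main__":
    pixel_labels = [
        ["Sc", "Sc", "Cu", "Cu"],
        ["Sc", "clear", "Cu", "Cu"],
    ]
    for cell in coarse_grain(pixel_labels, block=2)[0]:
        # Show only the cloud types that are present in the cell.
        print({t: f for t, f in cell.items() if f > 0})
```

In such a setup, the per-cell fractions would serve as targets for the regression step, with cell-averaged physical variables as input features; the choice of features and cell size is left open here.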

How to cite: Kaps, A., Lauer, A., Camps-Valls, G., Gentine, P., Gómez-Chova, L., and Eyring, V.: A two-stage machine learning framework using global satellite data of cloud classes for process-oriented model evaluation, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-676, 2022.

