EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Machine learning workflow for deriving regional geoclimatic clusters from high-dimensional data

Sebastian Lehner1,2, Katharina Enigl1,2, and Matthias Schlögl1,3
  • 1Department Climate-Impact-Research, GeoSphere Austria, Vienna, Austria
  • 2Department of Meteorology and Geophysics, University of Vienna, Vienna, Austria
  • 3Institute of Statistics, University of Natural Resources and Life Sciences, Vienna, Austria

Geoclimatic regions represent climatic forcing zones and constitute important spatial entities that serve as a basis for a broad range of analyses in earth system sciences. The large number of geospatial variables relevant for obtaining consistent clusters results in a high-dimensional feature space, especially when working with high-resolution gridded data, which can make the derivation of such regions complex. This is compounded by typical characteristics of geoclimatic data such as multicollinearity, nonlinear effects and potentially complex interactions between features. We therefore present a nonparametric machine learning workflow, consisting of dimensionality reduction and clustering, for deriving geospatial clusters of similar geoclimatic characteristics. We demonstrate the applicability of the proposed procedure using a comprehensive dataset of climatological and geomorphometric data from Austria, aggregated to the recent climatological normal period from 1992 to 2021.
The modelling workflow consists of three major sequential steps: (1) linear dimensionality reduction using Principal Component Analysis (PCA), yielding a reduced, orthogonal sub-space, (2) nonlinear dimensionality reduction applied to the reduced sub-space using Uniform Manifold Approximation and Projection (UMAP), and (3) clustering of the learned manifold projection via Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). The contribution of the input features to the clustering result is then assessed via permutation feature importance of random forest models, which are trained by treating the cluster labels as the target of a supervised classification problem. Results show the flexibility of the defined workflow and exhibit good agreement with both quantitatively derived and synoptically informed characterizations of geoclimatic regions from other studies. However, this flexibility entails certain challenges with respect to hyperparameter settings, which require careful exploration and tuning. The proposed workflow may serve as a blueprint for deriving consistent geospatial clusters exhibiting similar geoclimatic attributes.
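
The feature-importance step can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract uses random forest models, whereas this sketch swaps in a simple nearest-centroid classifier as a stand-in so that it runs without third-party packages. The data are synthetic, the labels merely play the role of cluster labels from the clustering step, and all function names are hypothetical. The idea shown is that shuffling one feature at a time and measuring the drop in classification accuracy indicates how strongly that feature determines cluster membership.

```python
import random


def fit_centroids(X, y):
    """Return the mean feature vector of each cluster label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, X):
    """Assign each row to the label of its nearest centroid."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda lab: sq_dist(row, centroids[lab]))
            for row in X]


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(centroids, X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(centroids, X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic stand-in data (assumption): feature 0 separates the two
# "clusters", feature 1 is pure noise.
rng = random.Random(42)
X, y = [], []
for label, centre in ((0, 0.0), (1, 5.0)):
    for _ in range(100):
        X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)

# For brevity the classifier is scored on its training data; a held-out
# set is usually preferable in practice.
model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
print(importances)
```

In this toy setting the informative feature receives a clearly larger importance than the noise feature. With real data, the same procedure would be applied to the original input features, using the cluster assignments as class labels.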

How to cite: Lehner, S., Enigl, K., and Schlögl, M.: Machine learning workflow for deriving regional geoclimatic clusters from high-dimensional data, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-17197, 2023.