EGU24-17782, updated on 11 Mar 2024
https://doi.org/10.5194/egusphere-egu24-17782
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Variability among Machine Learning Explanations for Precipitation Forecasting in Köppen Climate Zones

Ali Ulvi Galip Senocak1, Sinan Kalkan2, M. Tugrul Yilmaz3, Ismail Yucel4, and Muhammad Amjad5
  • 1Middle East Technical University, Graduate School of Natural and Applied Sciences, Department of Civil Engineering, Türkiye (e162324@metu.edu.tr)
  • 2Middle East Technical University, Graduate School of Natural and Applied Sciences, Department of Computer Engineering, Türkiye (skalkan@metu.edu.tr)
  • 3Middle East Technical University, Graduate School of Natural and Applied Sciences, Department of Civil Engineering, Türkiye (tuyilmaz@metu.edu.tr)
  • 4Middle East Technical University, Graduate School of Natural and Applied Sciences, Department of Civil Engineering, Türkiye (iyucel@metu.edu.tr)
  • 5National University of Sciences and Technology, Islamabad, Pakistan (amjadiqm@gmail.com)

A plethora of studies have used machine learning for quantitative precipitation forecasting. However, only a fraction of those studies have focused on the explainability of the utilized machine learning models. Consequently, to the best of the authors' knowledge, the variability in explainability concerning predictor clusters (i.e., grouped predictor categories based on shared attributes such as climate categories) has not received attention in the literature.

This study aims to address this gap by analyzing variability in model-level explanations across different Köppen Climate Zones (i.e., arid, temperate, and continental climates). To this end, Türkiye is selected as the study area because it has a complex topography and a wide variety of climate types. The dataset covers 687 stations spanning 10 different climate zones (clustered into B, C, and D Köppen climate zones) and contains more than one million rows over a four-year period. While the ground truth is defined as the daily observed precipitation amount, the predictors consist of daily total precipitation forecasts of numerical weather prediction models (ECMWF, GFS, ALARO, and WRF) with a 24-hour lead time, geographical parameters (elevation, roughness, slope, aspect, distance to the sea, latitude, and longitude), and seasonality (day of the year and month). The study uses a multi-layer perceptron (MLP) with two hidden layers and Gaussian Error Linear Unit (GELU) non-linearity as the machine learning method (Root Mean Squared Error = 3.6 mm/day). It uses the Huber loss (delta = 1.5) as the loss function to mitigate the adverse effects of the long-tailed dataset. The Local Interpretable Model-agnostic Explanations (LIME) approach is used to explain the predictions of the MLP. Topographical, coordinate-based, and seasonality predictors are each treated as a group, while distance to the sea is kept as a separate predictor.
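To illustrate why a Huber loss dampens the influence of rare heavy-precipitation days, the following is a minimal sketch of the standard Huber loss with delta = 1.5. The function names and the example values are illustrative assumptions and are not taken from the study's code.

```python
def huber_loss(residual, delta=1.5):
    """Standard Huber loss for one residual (observed minus predicted, mm/day).

    Quadratic for |residual| <= delta, linear beyond it, so large errors on
    extreme-precipitation days are penalised less than under squared error.
    """
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


def mean_huber_loss(observed, predicted, delta=1.5):
    """Average Huber loss over paired observations and predictions."""
    losses = [huber_loss(o - p, delta) for o, p in zip(observed, predicted)]
    return sum(losses) / len(losses)


# Hypothetical example: a 1 mm/day error stays in the quadratic regime,
# while a 3 mm/day error falls in the linear regime.
small = huber_loss(1.0)   # 0.5 * 1.0**2
large = huber_loss(3.0)   # 1.5 * (3.0 - 0.75)
```

With squared error, the 3 mm/day residual would contribute 4.5 (using the same 0.5 factor); under the Huber loss it contributes 3.375, and the gap widens as residuals grow.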

The importance assessments of predictors are compared using drop-out loss, which quantifies the decline in model performance that occurs when a predictor is removed and thereby shows the relevance of each predictor to the model's predictions. Analysis results indicate that the ECMWF forecasts are the most important predictor for the model in all three climate types, with a drop-out loss value of 0.531 for arid (B) climate zones, 1.617 for temperate (C) climate zones, and 0.901 for continental (D) climate zones. Seasonality contributes more to the predictions for continental climate zones (0.05 vs 0.02 for both arid and temperate zones). Another noteworthy result is that the distance to the sea predictor negatively affects the model over arid zones (-0.03) while positively contributing over both continental (0.013) and temperate zones (0.102). Moreover, the drop-out loss for distance to the sea (0.102) exceeds that of the WRF forecast (0.076) over temperate climate zones. This might be related to the average distance to the sea (0.99 degrees over temperate, 1.66 over arid, and 1.72 over continental zones). Similarly, topographical parameters have a positive effect over arid (0.003) and continental zones (0.014) while having a negative effect over temperate (-0.012) zones. These results indicate both that multi-model machine learning designs can be beneficial for complex datasets and that the influence of parameters can vary over different input clusters.
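The drop-out loss described above can be sketched as follows. This is a minimal illustration that assumes the predictor (or predictor group) is "removed" by jointly permuting its values across rows, and that the reported value is the increase in RMSE relative to the unpermuted baseline; the abstract does not state either detail, and all names (`dropout_loss`, `predict`, the column names) are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def dropout_loss(predict, rows, observed, group, n_repeats=10, seed=0):
    """Mean increase in RMSE when the columns in `group` are permuted jointly.

    predict  : callable mapping one row (a dict) to a prediction
    rows     : list of dicts, one per station-day
    observed : list of observed precipitation values
    group    : list of column names treated as a single predictor group
    """
    rng = random.Random(seed)
    baseline = rmse(observed, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        shuffled = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            # Take all grouped columns from the same donor row so that
            # within-group relationships are preserved.
            for name in group:
                new_row[name] = rows[order[i]][name]
            shuffled.append(new_row)
        permuted = rmse(observed, [predict(r) for r in shuffled])
        increases.append(permuted - baseline)
    return sum(increases) / n_repeats
```

Under this definition, a positive value means the model relies on the predictor, while a negative value means the model scores better without the information it carries. Computing the quantity separately on the rows of each climate zone would give per-zone values comparable in kind to those reported above.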

How to cite: Senocak, A. U. G., Kalkan, S., Yilmaz, M. T., Yucel, I., and Amjad, M.: Variability among Machine Learning Explanations for Precipitation Forecasting in Köppen Climate Zones, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17782, https://doi.org/10.5194/egusphere-egu24-17782, 2024.