Quantile regression forests for post-processing ECMWF ensemble precipitation forecasts: hyperparameter optimization and comparison to EMOS
- Royal Netherlands Meteorological Institute (KNMI), R&D Weather and Climate modeling, De Bilt, Netherlands
Ensemble forecasts are important because they characterize forecast uncertainty, which is fundamental when forecasting extreme weather. However, ensemble forecasts are often biased and underdispersed, and thus need to be post-processed.
A common approach for this is ensemble model output statistics (EMOS), where a parametric distribution is fitted with a limited number of predictors. With recent advances in computer science and increasing amounts of available data, machine learning techniques such as random forests are becoming more popular for high-dimensional regression problems. In this research, we explore the use of the quantile regression forest (QRF), a random forest adapted for conditional quantile estimation, applied to medium-range gridded probabilistic precipitation forecasts. QRFs are non-parametric and allow for a larger number of predictors, which means they may capture dependencies that a simple EMOS model would miss.
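The conditional quantile estimation that distinguishes a QRF from an ordinary random forest can be illustrated with a minimal sketch. It assumes the forest is already grown and that the leaf index of every training sample and of the query point is known for each tree; the function names `qrf_weights` and `weighted_quantiles` and the toy data are hypothetical and are not taken from the software used in this study. Each training observation receives a weight based on how often it shares a leaf with the query point, and quantiles are read from the resulting weighted empirical distribution.

```python
import numpy as np

def qrf_weights(train_leaves, query_leaves):
    """Weight of each training sample for one query point.

    train_leaves: (n_train, n_trees) leaf index of each training sample per tree.
    query_leaves: (n_trees,) leaf index of the query point per tree.
    """
    train_leaves = np.asarray(train_leaves)
    query_leaves = np.asarray(query_leaves)
    same_leaf = train_leaves == query_leaves[None, :]  # (n_train, n_trees)
    leaf_sizes = same_leaf.sum(axis=0)                 # samples in the query's leaf, per tree
    per_tree = same_leaf / leaf_sizes                  # each column sums to 1
    return per_tree.mean(axis=1)                       # average over trees

def weighted_quantiles(y_train, weights, probs):
    """Quantiles of the weighted empirical distribution of y_train."""
    y_train = np.asarray(y_train, dtype=float)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    cdf = np.cumsum(np.asarray(weights)[order])
    idx = np.searchsorted(cdf, probs, side="left")
    idx = np.clip(idx, 0, len(y_sorted) - 1)           # guard against rounding in the CDF
    return y_sorted[idx]

# Toy example (made-up values): four observed precipitation amounts, two trees.
y_train = np.array([0.0, 1.0, 2.0, 5.0])
train_leaves = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
query_leaves = np.array([1, 1])

w = qrf_weights(train_leaves, query_leaves)
q = weighted_quantiles(y_train, w, [0.1, 0.5, 0.9])
```

Because the quantiles are taken from observed training values, the predictive distribution needs no parametric assumption, in contrast to EMOS.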
A QRF takes several hyperparameters that influence how the decision trees in the forest are constructed. We explore the minimum number of samples a node must contain to be split further (minimum node size) and the number of predictors considered at each split (mtry). A hyperparameter space is constructed by setting ranges for both minimum node size and mtry, and the optimal hyperparameter set is determined by a cross-validated grid search, in which each model is assessed using the continuous ranked probability skill score (CRPSS). For comparison, EMOS is applied with a zero-adjusted gamma (ZAGA) distribution, using a limited number of predictors that are physically related to precipitation. Both methods are verified on a separate testing data set and evaluated using several scores, including the CRPSS and the Brier skill score (BSS).
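The scores and the grid search described above can be sketched as follows. This is an illustration under stated assumptions, not the verification code used in the study: the CRPS uses the standard ensemble estimator, the skill scores take the form 1 - score / reference score, and `score_fn` is a hypothetical stand-in for fitting a QRF with a given (minimum node size, mtry) pair and returning its cross-validated CRPSS. All function names and numbers are made up for the example.

```python
import itertools
import numpy as np

def crps_ensemble(members, obs):
    """Mean CRPS of an ensemble; members is (n_cases, n_members), obs is (n_cases,)."""
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean absolute error of the members against the observation.
    term1 = np.abs(members - obs[:, None]).mean(axis=1)
    # Mean absolute difference between all member pairs (ensemble spread).
    term2 = np.abs(members[:, :, None] - members[:, None, :]).mean(axis=(1, 2))
    return float((term1 - 0.5 * term2).mean())

def brier_score(prob, event):
    """Mean squared difference between forecast probability and binary outcome."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    return float(((prob - event) ** 2).mean())

def skill_score(score, score_ref):
    """Skill relative to a reference forecast (1 is perfect, 0 is no improvement)."""
    return 1.0 - score / score_ref

def grid_search(score_fn, node_sizes, mtry_values):
    """Evaluate score_fn on every (min_node_size, mtry) pair and return the best."""
    results = {
        (n, m): score_fn(n, m)
        for n, m in itertools.product(node_sizes, mtry_values)
    }
    best = max(results, key=results.get)  # higher skill score is better
    return best, results

# Toy verification data (made-up values): two cases, two members each.
obs = [1.0, 1.0]
crps_model = crps_ensemble([[0.0, 2.0], [1.0, 1.0]], obs)
crps_ref = crps_ensemble([[0.0, 4.0], [1.0, 1.0]], obs)  # e.g. a raw-ensemble benchmark
crpss = skill_score(crps_model, crps_ref)

bs = brier_score([0.8, 0.2], [1, 0])

# Hypothetical score surface standing in for a cross-validated QRF fit.
def toy_score(min_node_size, mtry):
    return -((min_node_size - 5) ** 2) - (mtry - 3) ** 2

best, results = grid_search(toy_score, node_sizes=[1, 5, 10], mtry_values=[2, 3, 4])
```

In practice `score_fn` would fit the forest on the training folds and score it on the held-out fold, so the cost of the search grows with the product of the two hyperparameter ranges.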
We consider four years (November 2018 – October 2022) of archived operational ECMWF-IFS ensemble forecasts for the Netherlands. The data are split into November 2018 – October 2021 for training and cross-validation, and October 2021 – October 2022 for testing, with the data separated by season, initialization time and lead time. Forecasts are post-processed up to +10 days. Ensemble statistics of more than 60 forecast variables are used as predictors. Spatially and temporally aggregated, gauge-adjusted radar observations are used as the predictand. The raw ensemble serves as the benchmark.
The results of this research will determine which method will be used to post-process the ensemble precipitation forecasts for the early warning center (EWC) of the Royal Netherlands Meteorological Institute. The most suitable method could differ between shorter and longer lead times.
How to cite: van der Kooij, E., Squintu, A., Whan, K., and Schmeits, M.: Quantile regression forests for post-processing ECMWF ensemble precipitation forecasts: hyperparameter optimization and comparison to EMOS, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-14560, https://doi.org/10.5194/egusphere-egu23-14560, 2023.