NP5.1 | Advances in statistical post-processing, blending, and verification of deterministic and probabilistic forecasts
Co-organized by AS1/CL5/HS13
Convener: Maxime Taillardat (ECS) | Co-conveners: Stéphane Vannitsem, Jochen Broecker, Sebastian Lerch (ECS), Stephan Hemri (ECS), Daniel S. Wilks, Julie Bessac (ECS)
Orals | Wed, 26 Apr, 14:00–15:45 (CEST) | Room -2.31
Posters on site | Attendance Tue, 25 Apr, 14:00–15:45 (CEST) | Hall X4
Posters virtual | Attendance Tue, 25 Apr, 14:00–15:45 (CEST) | vHall ESSI/GI/NP
Statistical post-processing techniques for weather, climate, and hydrological forecasts are powerful approaches to compensate for effects of errors in model structure or initial conditions, and to calibrate inaccurately dispersed ensembles. These techniques are now an integral part of many forecasting suites and are used in many end-user applications such as wind energy production or flood warning systems. Many of these techniques are flourishing in the statistical, meteorological, climatological, hydrological, and engineering communities. The methods range in complexity from simple bias correction up to very sophisticated distribution-adjusting techniques that take into account correlations among the prognostic variables.

At the same time, considerable effort is being put into combining multiple forecasting sources in order to get reliable and seamless forecasts on time ranges from minutes to weeks. Such blending techniques are currently being developed in many meteorological centers. These forecasting systems are indispensable for societal decision making, for instance to help better prepare for adverse weather. Thus, there is a need for an objective statistical framework for "forecast verification", i.e. qualitative and quantitative assessment of forecast performance.

In this session, we invite presentations dealing with both theoretical developments in statistical post-processing and evaluation of their performances in different practical applications oriented toward environmental predictions, and new developments dealing with the problem of combining or blending different types of forecasts in order to improve reliability from very short to long time scales.

Orals: Wed, 26 Apr | Room -2.31

Chairpersons: Maxime Taillardat, Stéphane Vannitsem, Jochen Broecker
14:00–14:05
Assessing predictive performance
14:05–14:15 | EGU23-9083 | NP5.1 | solicited | Virtual presentation
Barbara Casati, Cristian Lussana, and Alice Crespi

The ERA5 global reanalysis has been compared against a high-resolution regional reanalysis (COSMO-REA6) by means of scale-separation diagnostics based on 2d Haar discrete wavelet transforms. The presented method builds upon existing methods and enables the assessment of bias, error and skill for individual spatial scales, separately. A new skill score (evaluated against random chance) and the Symmetric Bounded Efficiency are introduced. These are compared to the Nash-Sutcliffe and the Kling-Gupta Efficiencies, evaluated on different scales, and the benefits of symmetric statistics are illustrated. As expected, the wavelet statistics show that the coarser resolution ERA5 products underestimate small-to-medium scale precipitation compared to COSMO-REA6. The newly introduced skill score shows that the ERA5 control member (EA-HRES), despite its higher variability, exhibits better skill in representing small-to-medium scales with respect to the smoother ensemble members. The Symmetric Bounded Efficiency is suitable for the intercomparison of reanalyses, since it is invariant with respect to the order of comparison.
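
The idea of assessing error on individual spatial scales can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a square grid whose side is a power of two, and it uses nested block averages, which correspond to the smooth (father-wavelet) components of a 2D Haar transform. The function names `block_mean` and `haar_scale_mse` are hypothetical. Because the scale components are orthogonal, the per-scale terms add up to the total mean squared error.

```python
import numpy as np

def block_mean(field, size):
    """Average `field` over non-overlapping size x size blocks, kept on the original grid."""
    n = field.shape[0]
    coarse = field.reshape(n // size, size, n // size, size).mean(axis=(1, 3))
    return np.kron(coarse, np.ones((size, size)))

def haar_scale_mse(forecast, reference):
    """Split the MSE of forecast minus reference into contributions per spatial scale.

    Assumes square fields with side length 2**L. Returns L + 1 values:
    one per detail scale (finest first) plus the squared domain-mean error.
    """
    err = np.asarray(forecast, float) - np.asarray(reference, float)
    n = err.shape[0]
    levels = n.bit_length() - 1          # L for n = 2**L
    parts = []
    smooth = err
    for j in range(1, levels + 1):
        coarser = block_mean(err, 2 ** j)
        # Detail at this scale: what is lost when smoothing one step further.
        parts.append(float(np.mean((smooth - coarser) ** 2)))
        smooth = coarser
    parts.append(float(np.mean(smooth ** 2)))  # largest scale: domain-mean error squared
    return parts
```

A skill score per scale could then be formed by dividing each term by the corresponding term of a reference forecast; the choice of reference is left open here.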

How to cite: Casati, B., Lussana, C., and Crespi, A.: Scale-separation diagnostics and the Symmetric Bounded Efficiency for the inter-comparison of precipitation reanalyses, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9083, https://doi.org/10.5194/egusphere-egu23-9083, 2023.

14:15–14:25 | EGU23-2701 | NP5.1 | Virtual presentation
Annette Möller and Friederike Grupe

This work investigates several statistical tests in the context of probabilistic weather forecasting and ensemble postprocessing. The tests are commonly used for comparing predictive performance of e.g. two statistical postprocessing models.  

In the first part of the analysis a case study is conducted on temperature data consisting of observations and ensemble forecasts. The tests are applied to compare the performance of two probabilistic temperature forecasts at different stations, for different lead times, investigating several standard verification metrics to measure prediction performance. The analysis shows that the tests generally behave consistently in the context of temperature forecasts. However, for certain scenarios some tests might be preferred over the others. In general, the combination of the original Diebold-Mariano test with the continuous ranked probability score (CRPS) to assess forecast accuracy leads to the most consistent and reliable results.

The second part of the analysis uses simulated data to investigate the general behaviour of the tests in different postprocessing scenarios as well as their size and power properties. Again, the original Diebold-Mariano test appears to perform most reliably and shows no noticeable inconsistent behaviour for different simulation settings.
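
For readers unfamiliar with the test, a Diebold-Mariano statistic computed from two series of per-case scores (for example CRPS values) might look like the sketch below. It is an illustration under stated assumptions, not the authors' code: the function name `diebold_mariano` and the `max_lag` argument are hypothetical, the long-run variance uses a plain rectangular window, and the p-value relies on the asymptotic standard normal distribution.

```python
import math
import numpy as np

def diebold_mariano(scores_a, scores_b, max_lag=0):
    """Two-sided Diebold-Mariano test on paired score series.

    scores_a, scores_b: per-case scores of two forecasts (lower is better).
    max_lag: number of autocovariance lags in the variance estimate
             (0 assumes serially uncorrelated score differences).
    Returns (statistic, p_value); a negative statistic favours forecast A.
    """
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    n = d.size
    dc = d - d.mean()
    # Long-run variance of the score differences (can be non-positive
    # for max_lag > 0 with this simple window; not handled here).
    lrv = np.dot(dc, dc) / n
    for k in range(1, max_lag + 1):
        lrv += 2.0 * np.dot(dc[k:], dc[:-k]) / n
    stat = d.mean() / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p_value
```

Swapping the two score series flips the sign of the statistic and leaves the p-value unchanged.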

How to cite: Möller, A. and Grupe, F.: Investigating properties of statistical tests for comparing predictive performance with application to probabilistic weather forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2701, https://doi.org/10.5194/egusphere-egu23-2701, 2023.

14:25–14:35 | EGU23-11660 | NP5.1 | ECS | On-site presentation
Sam Allen and Johanna Ziegel

It is often stated that the goal of probabilistic forecasting is to issue predictive distributions that are as sharp as possible, subject to being calibrated. To assess the calibration of ensemble forecasts, it is customary to employ rank histograms. Rank histograms not only assess whether or not an ensemble prediction system is calibrated, but they also reveal what (if any) systematic biases are present in the forecasts. This information can readily be relayed back to forecasters, helping to improve future predictions. Such is the utility of rank histograms that several extensions have been proposed to evaluate the calibration of probabilistic forecasts for multivariate outcomes. These extensions typically introduce a so-called pre-rank function that condenses the multivariate forecasts and observations into univariate objects, from which a standard rank histogram can be constructed. Several different approaches to construct multivariate rank histograms have been proposed, each of which differs in the choice of pre-rank function. Existing pre-rank functions typically aim to preserve as much information as possible when condensing the multivariate forecasts and observations into univariate objects. Although this is sensible when testing for multivariate calibration, the resulting rank histograms are often difficult to interpret, and are therefore rarely used in practice.
We argue that the principal utility of these histogram-based diagnostic tools is that they provide forecasters with additional information regarding the deficiencies that exist in their forecasts, in turn allowing them to address these shortcomings more readily; interpretation is therefore a key requirement. We demonstrate that there are very few restrictions on the choice of pre-rank function when constructing multivariate rank histograms, meaning forecasters need not restrict themselves to the few proposed already, but can instead choose a pre-rank function on a case-by-case basis, depending on what information they want to extract from their forecasts. We illustrate this by introducing a range of possible pre-rank functions when assessing the calibration of probabilistic spatial field forecasts. The pre-rank functions that we introduce are easy to interpret, easy to implement, and they provide complementary information. Several pre-rank functions can therefore be employed to achieve a more complete understanding of the multivariate forecast performance. Finally, having chosen suitable pre-rank functions, tests for univariate calibration based on rank histograms can readily be applied to the multivariate rank histograms. We illustrate this here using e-values, which provide a theoretically attractive way to sequentially test for the calibration of probabilistic forecasts.
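
The construction described above can be sketched in a few lines. This is only an illustration, assuming a simple pre-rank function that maps each field to a single number on its own (for instance the spatial mean); the names `prerank_rank` and `rank_histogram` are hypothetical, and ties are broken at random.

```python
import numpy as np

def prerank_rank(ens_fields, obs_field, prerank, rng):
    """Rank of the observation among the ensemble members after pre-ranking.

    ens_fields: array of shape (m, ...) holding m member fields.
    obs_field:  array with the shape of a single member field.
    prerank:    function mapping one field to a scalar (assumed field-wise).
    Returns an integer rank between 1 and m + 1.
    """
    obs_value = float(prerank(obs_field))
    member_values = np.array([prerank(f) for f in ens_fields], dtype=float)
    below = int(np.sum(member_values < obs_value))
    ties = int(np.sum(member_values == obs_value))
    # Random tie-breaking: place the observation uniformly among tied members.
    return below + int(rng.integers(0, ties + 1)) + 1

def rank_histogram(ranks, n_members):
    """Counts of ranks 1..n_members + 1 over all forecast cases."""
    return np.bincount(ranks, minlength=n_members + 2)[1:]
```

Passing `np.mean`, `np.max` or `np.var` as `prerank` gives histograms that target the mean level, the peak value or the spatial variability of the field, respectively.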

How to cite: Allen, S. and Ziegel, J.: Assessing the calibration of multivariate ensemble forecasts: E-values and the choice of pre-rank function, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11660, https://doi.org/10.5194/egusphere-egu23-11660, 2023.

Forecast post-processing
14:35–14:45 | EGU23-946 | NP5.1 | ECS | On-site presentation
Mariana Clare, Zied Ben Bouallegue, Matthew Chantry, Martin Leutbecher, and Thomas Haiden

The large data volumes available in weather forecasting make post-processing an attractive field for applying machine learning. In turn, novel statistical machine learning methods that can be used to generate uncertainty information from a deterministic forecast are of great interest to forecast users, especially given the computational cost of running high resolution ensembles. In this work, we show how one such method, a Bayesian Neural Network (BNN), can be used to post-process a single global high resolution forecast for 2m temperature. This methodology both improves the accuracy of the forecast and adds uncertainty information, by predicting the distribution of the forecast error relative to its own analysis.

Here we assess both model and data uncertainty using two different BNN approaches. In the first approach, the BNN’s parameters are defined to be distributions rather than deterministic parameters, thereby generating an ensemble of models that can be used to quantify model uncertainty. In the second approach, the BNN remains deterministic but predicts a distribution rather than a deterministic output thereby quantifying data uncertainty. Our BNN results are benchmarked against simpler statistical methods, as well as statistics from the ECMWF operational ensemble.

Finally, in order to add trustworthiness to the BNN predictions, we apply an explainable AI technique (Layerwise Relevance Propagation) so as to understand whether the variables on which the BNN bases its prediction are physically reasonable or whether it is instead learning spurious correlations.

How to cite: Clare, M., Ben Bouallegue, Z., Chantry, M., Leutbecher, M., and Haiden, T.: Combining Bayesian Neural Networks with explainable AI techniques for trustworthy probabilistic post-processing, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-946, https://doi.org/10.5194/egusphere-egu23-946, 2023.

14:45–14:55 | EGU23-13849 | NP5.1 | On-site presentation
John Bjørnar Bremnes

In recent years neural networks have successfully been applied to probabilistic post-processing of numerical weather prediction forecasts. In the Bernstein Quantile Networks (BQN) method, predictive quantile distributions are specified by Bernstein polynomials, and their coefficients are linked to input features through flexible neural networks. However, precipitation presents an additional challenge due to its mixed discrete-continuous nature, with a considerable proportion of dry events for short accumulation periods. In this presentation, it is demonstrated how the BQN method can be adapted to mixed discrete-continuous variables like precipitation by introducing a latent variable and treating zero precipitation cases as censored data. The method is tested on both synthetic and real precipitation forecast data and compared to an approach where a model of the probability of precipitation is combined with a model of precipitation amounts using the laws of probability.

How to cite: Bremnes, J. B.: Censored Bernstein quantile networks for probabilistic precipitation forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13849, https://doi.org/10.5194/egusphere-egu23-13849, 2023.

14:55–15:05 | EGU23-2592 | NP5.1 | ECS | On-site presentation
Romain Pic, Clément Dombry, Maxime Taillardat, and Philippe Naveau

Most Numerical Weather Prediction (NWP) systems use statistical postprocessing methods to correct for bias and underdispersion errors made by ensemble forecasting. This underdispersion leads to an underestimation of extreme events. Thus, many statistical postprocessing methods have been used to take into consideration the extremal behavior of meteorological phenomena such as precipitation. State-of-the-art techniques are based on Machine Learning combined with knowledge from Extreme Value Theory in order to improve forecasts. However, some of the best techniques do not consider the spatial dependency between locations. For example, Taillardat et al. (2019) train a different Quantile Regression Forest at each location of interest and Rasp & Lerch (2018) use neural networks with an embedding for the station's information in order to train a global model.
The dataset used corresponds to 3-h precipitation amounts produced by the radar-based observations of ANTILOPE and the 17-member ensemble forecast system called PEAROME. The dataset covers the south of France with a grid resolution of 0.025 degrees. Our method uses a U-Net-like neural network in order to take into account the spatial structure of the data, and the output of our model is a parameterized law at each grid point. Among the choices available in the literature, we focused on the Extended Generalized Pareto Distribution and the truncated logistic with a point mass at 0. The model is trained by minimizing scoring rules such as the Continuous Ranked Probability Score, the Log-Score or weighted versions of these scoring rules. The method developed here is then compared to the raw ensemble as well as state-of-the-art techniques through scoring rules, skill scores and ROC curves.

References:

  • L. Pacchiardi, R. Adewoyin, P. Dueben, and R. Dutta. Probabilistic forecasting with generative networks via scoring rule minimization. Dec. 2021. arXiv:2112.08217.
  • M. Taillardat, A.-L. Fougères, P. Naveau, and O. Mestre. Forest-based and semiparametric methods for the postprocessing of rainfall ensemble forecasting. Weather and Forecasting, 34(3):617–634, June 2019. doi: 10.1175/waf-d-18-0149.1.

How to cite: Pic, R., Dombry, C., Taillardat, M., and Naveau, P.: U-Net based Methods for the Postprocessing of Precipitation Ensemble Forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2592, https://doi.org/10.5194/egusphere-egu23-2592, 2023.

15:05–15:15 | EGU23-10153 | NP5.1 | On-site presentation
Gregory Duane, Francine Schevenhoven, and Jeffrey Weiss

The established benefits of post-processing the results of multi-model ensembles, even by simple averaging, suggest a more radical approach: The models should be combined more frequently at run time so as to form a single “supermodel”. Simple nudging of models to one another, as frequently as the models might assimilate data from observations, combines model fusion with a reasonable degree of model autonomy.

Key to the success of the supermodeling approach is the phenomenon of chaos synchronization, known in the field of nonlinear dynamics, wherein two chaotic systems synchronize when connected through only a few of their variables, despite sensitive dependence on initial conditions. Synchronization gives rise to consensus among models. The nudging coefficients can be trained so that this consensus agrees with observations, because the effective dynamics of the trained supermodel, regarded as a single dynamical system, matches the dynamics of nature. Yet the number of independent nudging coefficients that must be trained is far less than the number of trainable parameters in a typical climate model.

It is expected that supermodeling will be especially useful for improving the representation of localized structures, such as blocking patterns, which will wash out if de-synchronized output fields of different models are combined by averaging. We confirm a hypothesis that such coherent structures will synchronize even when the underlying fields do not, because the internal synchronization within each structure reinforces synchronization between models: A configuration of CAM4 and CAM5 models, of different resolution, connected by nudging, exhibits correlated blocking activity even when the flows do not otherwise synchronize.

We further explore the basis for correlated blocking activity in a pair of coupled quasi-geostrophic channel models. The local synchronization error is lower in a region of the channels where blocks form than elsewhere in the channels. Blocking correlations emerge as a vestige of “chimera synchronization”, the phenomenon in which complete synchronization of two spatially extended systems is intermittent in space as well as time. Such partial synchronization of different models in the regions of blocks - and of other structures such as jets, fronts, and large-scale convection - would be particularly useful for projecting climate-change patterns in extreme events associated with those structures.

How to cite: Duane, G., Schevenhoven, F., and Weiss, J.: Synchronization of Blocking Patterns in Different Models, Connected So As to Form a “Supermodel” of Future Climate, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-10153, https://doi.org/10.5194/egusphere-egu23-10153, 2023.

15:15–15:25 | EGU23-14712 | NP5.1 | ECS | On-site presentation
Antonello A. Squintu, Eva van der Kooij, Kirien Whan, and Maurice Schmeits

In the framework of KNMI’s Early Warning Center (EWC), ECMWF ensemble (ENS) predictions are used to issue medium-range forecasts of severe weather. Timely forecasts of wind gust extremes are important to prevent potential damage. However, ensemble forecasts are affected by biases and under- or over-dispersion. These errors lead to a reduction in the skill of the forecasts, especially for long lead times and for extreme cases, such as windstorms and deep convective episodes. Hence, statistical post-processing is a fundamental step in the establishment of a skillful weather alert system for extreme wind gust events.

However, weather models like ECMWF-IFS are subject to frequent updates, which include changes in the calculation of certain diagnostic variables and consequently in the statistical features of their ensemble distribution. This is the case for ECMWF wind gust forecasts, whose bias was reduced with the last update in October 2021. Therefore, the use of pre-update wind gust forecasts in the training of the post-processing model must be considered with care.

In the context of the development of an Ensemble Model Output Statistics (EMOS) model, this limitation has been tackled by reconstructing wind gust forecasts with a preliminary EMOS model. This step has been performed by including in the regression those variables that are used by ECMWF for the calculation of wind gusts, which were less affected by the update.

The reconstructed wind gust forecasts have been added to a set of summary statistics of the ensemble distribution of variables physically related to wind gusts. A process of forward selection has been applied to identify the most relevant contributions to the general EMOS model, highlighting reconstructed wind gusts as the most important predictor for all lead times.

The post-processed forecasts obtained with this experimental EMOS model have been verified and compared to those calculated with a conventional EMOS model (fitted ignoring the above caveats) and with the results of a non-parametric Quantile Regression Forest. These models have been trained on the same period (2018-2021) and tested on the period that followed the update (2021-2022), including only grid points and stations that cover the territory of the Netherlands and distinguishing between summer and winter half-years. The method showing the best performance will be employed operationally for the post-processing of ECMWF-ENS wind gust forecasts over the Netherlands and will be used in the EWC weather alert system.

How to cite: Squintu, A. A., van der Kooij, E., Whan, K., and Schmeits, M.: NWP model updates and post-processing: a strategy for an EMOS model on ECMWF wind gusts forecasts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14712, https://doi.org/10.5194/egusphere-egu23-14712, 2023.

15:25–15:35 | EGU23-17348 | NP5.1 | On-site presentation
Martin Widmann, Noemi Gonczol, Michael Angus, and Robert Neal

Accurate predictions of heavy precipitation in India are vital for impact-orientated forecasting, and an essential requirement for mitigating the impact of damaging flood events. Operational forecasts from non-convection-permitting models can have large biases in the intensities of heavy precipitation, and while convection-permitting models can perform better, their operational use over large areas is not yet feasible. Statistical postprocessing can reduce these biases for relatively little computational cost, but few studies have focused on postprocessing forecasts of monsoonal rainfall.

We present a postprocessing method for operational precipitation forecasts based on local precipitation distributions for 30 Indian weather types. It is applied to ensemble forecasts for daily precipitation with 12 km spatial resolution and lead times of up to 10 days from the Indian National Centre for Medium Range Weather Forecasting (NCMRWF) Ensemble Prediction System (NEPS). The method yields local probabilistic forecasts that are the weighted mean of the observed local precipitation distributions for each weather type, with weights given by the relative frequency of the weather types in the forecast ensemble.
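
The weighting step can be made concrete with a small sketch for a single location and a single threshold. It assumes that each ensemble member has already been assigned a weather-type index and that the climatological exceedance probability for each type is known; the function name and inputs are hypothetical and not taken from the authors' system.

```python
import numpy as np

def weather_type_probability(member_types, clim_exceed_prob):
    """Probability of exceeding a threshold as a weather-type mixture.

    member_types:     weather-type index (0-based) assigned to each ensemble member.
    clim_exceed_prob: observed exceedance frequency at this location for each type.
    """
    clim = np.asarray(clim_exceed_prob, dtype=float)
    # Weights are the relative frequencies of the weather types in the ensemble.
    weights = np.bincount(member_types, minlength=clim.size) / len(member_types)
    return float(weights @ clim)
```

For example, with members assigned to types `[0, 0, 1, 2]` and type-wise exceedance frequencies `[0.1, 0.5, 0.9]`, the weights are 0.5, 0.25 and 0.25 and the forecast probability is 0.4.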

The general forecast skill is determined through the Continuous Ranked Probability Skill Score (CRPSS) and the skill for predicting the exceedance of the local 90th percentile is quantified through the Brier Skill Score (BSS). The CRPSS shows moderate improvement over most of India for forecasts with one day lead time, and substantial improvements almost everywhere for longer lead times. The BSS for one day forecasts indicates a spatially complex pattern of higher and lower performance, while for longer lead times the forecasts for heavy precipitation are improved almost everywhere. The improvements with respect to both measures are particularly high over mountainous or wet regions. We will also present reliability diagrams for the raw and postprocessed forecasts of threshold exceedances.
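
As a reminder of how the second measure is defined, the Brier Skill Score compares the Brier score of the forecast with that of a reference forecast. The sketch below assumes a constant reference probability, which for exceedance of the local 90th percentile would naturally be 0.1; the function name is hypothetical.

```python
import numpy as np

def brier_skill_score(prob, event, ref_prob):
    """Brier Skill Score of probability forecasts against a constant reference.

    prob:     forecast probabilities of the event, one per case.
    event:    1 where the event occurred, 0 otherwise.
    ref_prob: constant reference probability (e.g. the climatological frequency).
    """
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    bs = np.mean((prob - event) ** 2)
    bs_ref = np.mean((ref_prob - event) ** 2)
    return float(1.0 - bs / bs_ref)
```

A value of 1 indicates a perfect forecast, 0 indicates no improvement over the reference, and negative values indicate a forecast worse than the reference.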

How to cite: Widmann, M., Gonczol, N., Angus, M., and Neal, R.: Postprocessing of ensemble precipitation forecasts over India using weather types, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-17348, https://doi.org/10.5194/egusphere-egu23-17348, 2023.

15:35–15:45 | EGU23-9328 | NP5.1 | On-site presentation
Jonas Bhend, Jonathan Demaeyer, Sebastian Lerch, Cristina Primo, Bert Van Schaeybroeck, Aitor Atencia, Zied Ben Bouallègue, Jieyu Chen, Markus Dabernig, Gavin Evans, Jana Faganeli Pucer, Ben Hooper, Nina Horat, David Jobst, Janko Merše, Peter Mlakar, Annette Möller, Olivier Mestre, Maxime Taillardat, and Stéphane Vannitsem

Statistical postprocessing of forecasts from numerical weather prediction systems is an important component of modern weather forecasting systems. A growing variety of postprocessing methods has been proposed, but a comprehensive, community-driven comparison of their relative performance is yet to be established. Important reasons for this lack include the absence of a fair intercomparison protocol and the difficulty of constructing a common comprehensive dataset that can be used to perform such an intercomparison. Here we introduce the first version of the EUPPBench, a dataset of time-aligned medium-range forecasts and observations over Central Europe, with the aim to facilitate and standardize the intercomparison of postprocessing methods. This dataset is publicly available [1] and includes station and gridded data as well as ensemble forecasts for training (20 years) and validation (2 years) based on the ECMWF system. The initial dataset is the basis of an ongoing activity to establish a benchmarking platform for postprocessing of medium-range weather forecasts. We showcase a first benchmark of several methods for the adjustment of near-surface temperature forecasts and outline the future plans for the benchmark activity.

[1] https://github.com/EUPP-benchmark/climetlab-eumetnet-postprocessing-benchmark

How to cite: Bhend, J., Demaeyer, J., Lerch, S., Primo, C., Van Schaeybroeck, B., Atencia, A., Ben Bouallègue, Z., Chen, J., Dabernig, M., Evans, G., Faganeli Pucer, J., Hooper, B., Horat, N., Jobst, D., Merše, J., Mlakar, P., Möller, A., Mestre, O., Taillardat, M., and Vannitsem, S.: The EUPPBench postprocessing benchmark, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-9328, https://doi.org/10.5194/egusphere-egu23-9328, 2023.

Posters on site: Tue, 25 Apr, 14:00–15:45 | Hall X4

Chairpersons: Sebastian Lerch, Maxime Taillardat
Assessing predictive performance
X4.91 | EGU23-12316 | NP5.1
Zied Ben Bouallegue

Reliability is a key attribute of an ensemble forecast. Typically, this means that one expects that the ensemble spread reflects the potential error of the corresponding ensemble mean forecast. In the realistic case of an imperfect forecast, reliability deficiencies can be diagnosed with tools such as the reliability diagram and the rank histogram. Along with the computation of scores, the use of these diagnostic tools is common practice in ensemble forecast verification when assessing univariate forecasts. But what does reliability mean in practical terms when assessing multivariate forecasts? Here the concept of reliability is revisited in the simplest of the multivariate cases: the bivariate forecast. As a result, we propose a set of new diagnostic tools with an emphasis on the cross-variable reliability aspect. Real case examples are used for illustration and discussion.

How to cite: Ben Bouallegue, Z.: On the reliability of bivariate forecasts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12316, https://doi.org/10.5194/egusphere-egu23-12316, 2023.

X4.92 | EGU23-8824 | NP5.1 | ECS
Maxime Taillardat, Anne-Laure Fougères, Philippe Naveau, and Raphaël De Fondeville

Verifying probabilistic forecasts for extreme events is a highly active research area because popular media and public opinions are naturally focused on extreme events, and biased conclusions are readily made. In this context, classical verification methods tailored for extreme events, such as thresholded and weighted scoring rules, have undesirable properties that cannot be mitigated, and the well-known continuous ranked probability score (CRPS) is no exception.

Here, we define a formal framework for assessing the behavior of forecast evaluation procedures with respect to extreme events, which we use to demonstrate that assessment based on the expectation of a proper score is not suitable for extremes. Alternatively, we propose studying the properties of the CRPS as a random variable by using extreme value theory to address extreme event verification. An index is introduced to compare calibrated forecasts, which summarizes the ability of probabilistic forecasts for predicting extremes. The strengths and limitations of this method are discussed using both theoretical arguments and simulations.
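
Treating the CRPS as a random variable requires the score of each individual forecast case rather than only its average. For an ensemble forecast, one common way to obtain it is the empirical form sketched below, which treats the members as an equally weighted discrete distribution; the function name is hypothetical and the sketch is not tied to the index proposed by the authors.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a single ensemble forecast against a scalar observation.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with X and X' drawn
    independently from the equally weighted ensemble members.
    """
    x = np.asarray(members, dtype=float)
    mean_abs_error = np.mean(np.abs(x - obs))
    mean_abs_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(mean_abs_error - 0.5 * mean_abs_spread)
```

Applying this function to every forecast case yields a sample of CRPS values whose distribution, including its upper tail, can then be studied.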

How to cite: Taillardat, M., Fougères, A.-L., Naveau, P., and De Fondeville, R.: Evaluating probabilistic forecasts of extremes using continuous ranked probability score distributions, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8824, https://doi.org/10.5194/egusphere-egu23-8824, 2023.

X4.93 | EGU23-12242 | NP5.1
Marion Mittermaier and Eric Gilleland

Spatial sampling remains a conundrum for verification. The observations that are required are rarely on a grid, nor are they homogeneously spaced. They are often located where there are people and easy access, and do not sample the variable in a representative way. In an aggregate sense, scores derived from such observation locations will give areas with greater observation density more weight if the variations in network density are not accounted for. Furthermore, the performance in some parts of the domain may not be represented at all if there are no observations there. Gridded analyses on the other hand often provide complete coverage, and offer great ease of use, but adjacent grid boxes are not independent. Given this relative wealth of coverage and uniform sampling, we tend to use all available grid points for computing aggregate scores for an area or region, despite knowing that this is likely to produce too-narrow confidence intervals and inflate any statistical significance that may be present.

In this presentation a variety of approaches, both empirical and statistical, are explored to establish what we ought to include when computing aggregate scores. Three different empirical sampling approaches are compared to selections from statistical coverage or network design algorithms. The first empirical option is what is termed “strict” sub-sampling, whereby a sample is taken from the full grid and the reduction in sample size is explored by repeatedly taking a sub-sample from the previous sub-sample. The second is a systematic reduction in sample size whereby each sample is drawn from the original grid, taking every other grid point, then every 3rd grid point, every 4th, etc. The third is a mean computed from N random draws of decreasing sample size. These empirical options do not respect land or sea locations. They are intended purely to examine the behaviour and stability of the sample score. The coverage design algorithms provide a methodology for deriving homogeneous samples for irregularly spaced surface networks over land, and regularly spaced sampling of grids over the ocean, to achieve an optimal blend of sampling for regions that cover both land and sea. These sample sizes and sample scores are compared to a statistically computed effective sample size.
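
The second empirical option, taking every k-th grid point from the original grid, is simple enough to sketch directly. The function below is a hypothetical illustration that assumes the per-grid-point score contributions are held in a 2D array and that the aggregate score is their plain mean.

```python
import numpy as np

def strided_score(score_grid, step):
    """Aggregate score from every `step`-th grid point in both directions.

    score_grid: 2D array of per-grid-point score contributions.
    step:       sampling stride (1 uses the full grid).
    Returns (mean score of the sample, number of points in the sample).
    """
    sample = np.asarray(score_grid, dtype=float)[::step, ::step]
    return float(sample.mean()), int(sample.size)
```

Evaluating this for increasing values of `step` shows how the aggregate score drifts as the sample shrinks, which is the kind of stability check the empirical options are designed for.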

Some interesting and surprising results emerge. One is that as little as 1% of the total number of grid points may be sufficient for measuring the performance of the forecast on a grid, though the proportion of the total will always depend (to varying degrees) on the variable, the threshold or event of interest, the metric or score, and the characteristics of the geographical region of interest.

How to cite: Mittermaier, M. and Gilleland, E.: Exploring empirical and statistical approaches for determining an appropriate sample size for aggregate scores, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12242, https://doi.org/10.5194/egusphere-egu23-12242, 2023.

X4.94 | EGU23-11230 | NP5.1
Clément Dombry, Romain Pic, Philippe Naveau, and Maxime Taillardat

The theoretical advances on the properties of scoring rules over the past decades have broadened the use of scoring rules in probabilistic forecasting. In meteorological forecasting, statistical postprocessing techniques are essential to improve the forecasts made by deterministic physical models. Numerous state-of-the-art statistical postprocessing techniques are based on distributional regression evaluated with the Continuous Ranked Probability Score (CRPS). However, theoretical properties of such minimization of the CRPS have mostly considered the unconditional framework (i.e. without covariables) and infinite sample sizes. We circumvent these limitations and study the rate of convergence in terms of CRPS of distributional regression methods. We find the optimal minimax rate of convergence for a given class of distributions. Moreover, we show that the nearest neighbor method and the kernel method for distributional regression reach the optimal rate of convergence in dimension larger than 2 and in any dimension, respectively.
Associated article: https://doi.org/10.1016/j.ijforecast.2022.11.001
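To make the objects in this abstract concrete, the sketch below computes the CRPS of an empirical (ensemble) forecast and builds a nearest neighbor distributional forecast for a single scalar covariate. It is an illustration under simplifying assumptions (one-dimensional covariate, absolute-difference distance, hypothetical function names) and does not reproduce the authors' theoretical setting.

```python
def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of `members` against `obs`.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
    with X and X' independent draws from F.
    """
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return term1 - term2


def knn_forecast(train_x, train_y, x, k):
    """Nearest neighbor distributional regression (1-D covariate):
    the forecast is the empirical distribution of the responses
    belonging to the k training covariates closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return [train_y[i] for i in order[:k]]
```

A forecast for a new covariate value is then scored with `crps_ensemble(knn_forecast(train_x, train_y, x, k), y)`.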

How to cite: Dombry, C., Pic, R., Naveau, P., and Taillardat, M.: Mathematical Properties of Continuous Ranked Probability Score Forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11230, https://doi.org/10.5194/egusphere-egu23-11230, 2023.

Forecast post-processing
X4.95
|
EGU23-14425
|
NP5.1
|
ECS
|
Jakob Wessel, Chris Ferro, and Frank Kwasniok

Numerical weather prediction (NWP) models usually output their forecasts at many different lead times. For example, the Met Office ensemble prediction system for the UK (MOGREPS-UK) predicts atmospheric variables on a 2.2 km grid for up to 126 h at hourly and sub-hourly timesteps. Even though applications often require information across this range of lead times, many post-processing methods in the literature are either applied at a fixed lead time or fitted as individual models for each lead time. This is also the case in systems used in practice such as the Met Office IMPROVER system. However, this is 1) computationally expensive because it requires the training of multiple models if users are interested in information at multiple lead times and 2) restrictive because it limits the data available for training post-processing models and the usability of fitted models.

In this work we investigate the lead time dependence of ensemble post-processing methods by looking at ensemble forecasts in an idealized Lorenz96 system as well as temperature forecast data from the Met Office MOGREPS-UK system. First, we investigate the lead time dependence of estimated model parameters in nonhomogeneous Gaussian regression (NGR, a standard ensemble post-processing technique) and find substantial smoothness. Secondly, we look at the usability of models fitted for one lead time and employed at another. Thirdly, we fit models that are “lead time continuous”, meaning they work for multiple lead times simultaneously by including lead time as a covariate using spline regression. We show that these models can achieve similar performance to the classical “lead time separated” models, whilst saving substantial computation time. Fourthly and finally, we make first steps towards the development of a computationally cheap model that includes seasonality and works continuously over the lead time, needing to be fitted only once.
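The idea of a lead time continuous NGR model can be illustrated with a small sketch. Here the location coefficients vary linearly with lead time, which is only a stand-in for the spline regression used in the abstract; the coefficient names and the variance model `c + d * ens_var` are assumptions made for illustration. The closed-form CRPS of a Gaussian forecast is included because it is the usual objective when fitting NGR.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def ngr_predict(ens_mean, ens_var, lead_time, coef):
    """Predictive mean and standard deviation of a lead-time-dependent NGR.

    Assumption: a(t) = a0 + a1*t and b(t) = b0 + b1*t (linear in lead time,
    a simplification of a spline basis); variance = c + d * ens_var.
    """
    a = coef["a0"] + coef["a1"] * lead_time
    b = coef["b0"] + coef["b1"] * lead_time
    mu = a + b * ens_mean
    sigma = math.sqrt(coef["c"] + coef["d"] * ens_var)
    return mu, sigma
```

Because the coefficients are functions of lead time, a single set of parameters serves every lead time, instead of one fitted model per lead time.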

How to cite: Wessel, J., Ferro, C., and Kwasniok, F.: Lead time continuous statistical post-processing of ensemble weather forecasts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14425, https://doi.org/10.5194/egusphere-egu23-14425, 2023.

X4.96
|
EGU23-8594
|
NP5.1
|
ECS
|
David Huk, Rilwan Adewoyin, and Ritabrata Dutta

This work develops a novel method for generating conditional probabilistic rainfall forecasts with temporal and spatial dependence. A two-step procedure is employed. Firstly, marginal location-specific distributions are modelled independently of one another. Secondly, a spatial dependency structure is learned in order to make these marginal distributions spatially coherent.
To learn marginal distributions over rainfall values, we propose a class of models termed Joint Generalised Neural Models (JGNMs). These models expand the linear part of generalised linear models with a deep neural network allowing them to take into account non-linear trends of the data while learning the parameters for a distribution over the outcome space.
In order to understand the spatial dependency structure of the data, a model based on censored copulas is presented. It is designed for the particularities of rainfall data and incorporates the spatial aspect into our approach. Uniting our two contributions, namely the JGNM and the Censored Spatial Copulas into a single model, we get a probabilistic model capable of generating possible scenarios on short to long-term timescales, able to be evaluated at any given location, seen or unseen. We show an application of it to a precipitation downscaling problem on a large UK rainfall dataset and compare it to existing methods.

How to cite: Huk, D., Adewoyin, R., and Dutta, R.: Joint Generalized Neural Models and Censored Spatial Copulas for Probabilistic Rainfall Forecasting, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8594, https://doi.org/10.5194/egusphere-egu23-8594, 2023.

X4.97
|
EGU23-12232
|
NP5.1
|
ECS
|
Faranak Tootoonchi, Andrijana Todorović, Thomas Grabs, and Claudia Teutschbein

Climate models are used to generate future hydroclimatic projections for exploring how climate change may affect water resources. Their outputs, however, feature systematic errors due to parametrization and simplification of processes at the spatiotemporal scales required for impact studies. To minimize the adverse effects of such biases, an additional bias adjustment step is typically required.

Over the past decade, adjustment methods with different levels of complexity have been developed that consider one or several variables at a time, consequently adjusting one or multiple features of climate model simulations. Despite attempts in developing such methods and the growing use of some, the selection of methods for accurate simulation of streamflow remains subjective and still highly debated. In this study, we seek to answer whether sophisticated multivariate bias adjustment methods outperform simple univariate methods in the simulation of streamflow signatures.

To this end, we systematically investigated the ability of two simple univariate and two advanced multivariate methods to accurately represent various hydrological signatures relevant for water resources management in high latitudes. We offer practical guidelines for choosing the most suitable bias adjustment methods based on the objective of each study (i.e., hydrologic signatures of interest) and the hydroclimatic regime of the study catchments.

How to cite: Tootoonchi, F., Todorović, A., Grabs, T., and Teutschbein, C.: Impacts of uni- and multivariate bias adjustment methods on simulations of hydrological signatures in high latitude catchments, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12232, https://doi.org/10.5194/egusphere-egu23-12232, 2023.

X4.98
|
EGU23-15152
|
NP5.1
|
ECS
Francesco Zanetta, Daniele Nerini, Matteo Buzzi, and Mark A. Liniger

Correctly representing surface wind is critical for applications such as renewable energy, snow modelling or warning systems. However, numerical weather prediction models with their limited resolution cannot fully represent the strong variability due to complex topography. Downscaling techniques (functionally equivalent to postprocessing when the ground truth is given by observational data) can achieve remarkable results in reducing systematic biases of raw models and can be calibrated to yield accurate probabilistic information at any point in space.

These techniques can be further improved at analysis time by including real-time measurements, making it possible to produce a probabilistic sub-grid resolution analysis of surface wind. Such a product would enable other interesting applications, such as detailed climatologies or nowcasting, and could serve as a ground truth for training deep learning-based postprocessing models with generative approaches, making it possible to model spatially and temporally consistent ensembles.

The first important challenge is to integrate measurements in a statistically optimized and efficient way. Here, we share our ongoing work and preliminary results in a comparative analysis of different approaches, from naïve interpolations to geostatistical techniques or novel approaches based on neural networks. The analysis is based on a multi-year archive of hourly wind observations and NWP analyses from the operational COSMO-1E model over Switzerland. 

How to cite: Zanetta, F., Nerini, D., Buzzi, M., and Liniger, M. A.: Towards sub-kilometer resolution probabilistic analysis of surface wind in complex terrain, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15152, https://doi.org/10.5194/egusphere-egu23-15152, 2023.

X4.99
|
EGU23-2628
|
NP5.1
|
ECS
Sameer Balaji Uttarwar, Anna Napoli, Diego Avesani, and Bruno Majone

Global seasonal weather forecasts have inherent biases compared to observational datasets over mountainous regions. This can be attributed to the model's inaccurate representation of local and global environmental processes on the Earth. In this context, the objective of this study is to assess the variation of seasonal weather forecast biases with respect to static and dynamic environmental variables over the Trentino-South Tyrol region (north-eastern Italian Alps), characterized by complex terrain.

The research employs the latest fifth-generation seasonal weather forecast system (SEAS5) dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), available at a horizontal grid resolution of 0.125° x 0.125° with 25 ensemble members in a re-forecast period from 1981 to 2016. The reference dataset is a high-resolution gridded observation (250 m x 250 m) over the region of interest. The spatiotemporal variation of monthly weather (i.e., precipitation and temperature) forecast biases over the region is inferred using several statistical indicators at observational dataset grid resolution. The static and dynamic environmental variables (i.e., terrain characteristics and large-scale atmospheric circulation indices, respectively) are used one at a time to interpret their relationship with monthly weather forecast biases using linear regression. A statistically significant linear relation between monthly weather forecast biases and terrain characteristics, as well as large-scale atmospheric circulation indices, has been found depending on seasonality and ensemble members.

Given significant univariate linear correlation, a simple linear bias reduction model is developed and assessed by implementing a random subsampling technique in which the regression parameters are simulated by splitting the data into calibration (70%) and validation (30%). The results reveal a reduction in the monthly weather forecast bias over the region.
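The random subsampling procedure can be sketched as below. This is a minimal illustration, not the authors' code: a single predictor (for example elevation) is regressed against the forecast bias on a random 70% calibration split, and the mean absolute bias on the remaining 30% is compared before and after subtracting the fitted line. The function names, the number of repeats and the choice of mean absolute bias as the indicator are assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def random_subsampling(x, bias, n_repeats=100, calib_frac=0.7, seed=0):
    """Repeatedly split into calibration/validation sets and return the
    mean absolute validation bias before and after linear bias reduction."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_cal = round(calib_frac * len(x))
    before, after = [], []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cal, val = idx[:n_cal], idx[n_cal:]
        slope, intercept = fit_line([x[i] for i in cal], [bias[i] for i in cal])
        # Residual bias once the regression estimate is removed.
        resid = [bias[i] - (slope * x[i] + intercept) for i in val]
        before.append(sum(abs(bias[i]) for i in val) / len(val))
        after.append(sum(abs(r) for r in resid) / len(val))
    return sum(before) / n_repeats, sum(after) / n_repeats
```

Averaging over many random splits reduces the dependence of the result on any single choice of calibration and validation sets.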

This study demonstrates that the local and global environmental variables should be explicitly considered in the bias correction and downscaling of the seasonal weather forecasts over complex terrain.

How to cite: Uttarwar, S. B., Napoli, A., Avesani, D., and Majone, B.: Seasonal Weather Forecast Biases Dependence on Static and Dynamic Environmental Variables in the Alpine Region, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2628, https://doi.org/10.5194/egusphere-egu23-2628, 2023.

X4.100
|
EGU23-14560
|
NP5.1
|
ECS
Eva van der Kooij, Antonello Squintu, Kirien Whan, and Maurice Schmeits

Ensemble forecasts are important due to their ability to characterize forecast uncertainty, which is fundamental when forecasting extreme weather. Ensemble forecasts are however often biased and underdispersed and thus need to be post-processed.

A common approach for this is the use of ensemble model output statistics (EMOS), where a parametric distribution is fitted with a limited number of predictors. With recent advances in computer science and increased amounts of data available, machine learning techniques, like random forests, are becoming more popular for high dimensional regression problems. In this research, we explore the use of the quantile regression forest (QRF), a random forest adapted for conditional quantile estimation, applied to medium range gridded probabilistic precipitation forecasts. QRFs are non-parametric and allow for a larger number of predictors, which means they can possibly consider more dependencies that might otherwise not be captured with a simple EMOS.

A QRF takes several hyperparameters that influence the way the decision trees in the forest are constructed. We explore the minimum number of samples needed in a leaf to split it (minimum node size) and the number of predictors explored in each split (mtry). A hyperparameter space is constructed by setting ranges for both minimum node size and mtry, and the optimal hyperparameter set is determined by performing a cross-validated grid search. Here, each model is assessed based on the continuous ranked probability skill score (CRPSS). For comparison, EMOS is applied with a zero-adjusted gamma (ZAGA) distribution, using a limited number of predictors that are physically correlated to precipitation. Both methods are verified on a separate testing data set and evaluated using several scores, including CRPSS and Brier skill score (BSS).
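The verification and tuning steps described above can be sketched generically. The skill score compares a score with that of a reference forecast (here the raw ensemble would play that role), and the grid search picks the hyperparameter pair with the best cross-validated value. The `evaluate` callable is hypothetical: it stands for whatever routine trains a QRF with the given settings and returns its cross-validated CRPSS.

```python
from itertools import product


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(score, reference_score):
    """Generic skill score for negatively oriented scores (CRPS, Brier score):
    1 is perfect, 0 matches the reference, negative is worse than the reference."""
    return 1.0 - score / reference_score


def grid_search(evaluate, min_node_sizes, mtrys):
    """Return the (min_node_size, mtry) pair maximising evaluate(min_node_size, mtry).

    `evaluate` is a user-supplied (hypothetical) function returning a
    cross-validated skill score for one hyperparameter combination.
    """
    return max(product(min_node_sizes, mtrys), key=lambda pair: evaluate(*pair))
```

The same `skill_score` helper yields CRPSS when given mean CRPS values and BSS when given Brier scores.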

We consider 4 years (November 2018 – October 2022) of archived operational ECMWF-IFS ensemble forecasts for the Netherlands. The data is split into November 2018 – October 2021 for training and cross-validation, and October 2021 – October 2022 for testing, separating data for season, initialization time and lead-time. Forecasts are post-processed up to +10 days. Ensemble statistics on 60+ forecast variables are used as predictors. Spatially and temporally aggregated, gauge-adjusted radar observations are used as predictand. The raw ensemble is considered as the benchmark.

The results of this research will determine what method will be used to post-process the ensemble precipitation forecasts in the context of the early warning center (EWC) of the Royal Netherlands Meteorological Institute. The most suitable method could differ between shorter and longer lead times.

How to cite: van der Kooij, E., Squintu, A., Whan, K., and Schmeits, M.: Quantile regression forests for post-processing ECWMF ensemble precipitation forecasts: hyperparameter optimization and comparison to EMOS, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14560, https://doi.org/10.5194/egusphere-egu23-14560, 2023.

X4.101
|
EGU23-5821
|
NP5.1
|
ECS
|
Lucas Schmutz, Soulivanh Thao, Mathieu Vrac, and Gregoire Mariethoz

General circulation models (GCMs) are extremely important for making future climate projections. These projections are used extensively by policymakers to manage responses to anthropogenic global warming and climate change.

To extract a robust global signal and evaluate uncertainties, individual models are often assembled in Multi-Model Ensembles (MMEs). Various approaches to combine individual models have been developed, such as the Multi-Model Mean (MMM) or its weighted variants.

Recently, Thao et al. (2022) proposed a model comparison approach based on graph cuts. Graph cut optimization was developed in the field of computer vision to efficiently approximate a solution for low-level computer vision tasks such as image segmentation (Boykov et al., 2001). Applied to MMEs, it allows selecting for each gridpoint the best-performing model and produces a patchwork of models that maximizes performances while avoiding spatial discontinuities. Thus, it considers the local performance of individual models in contrast with approaches such as MMM or similar methods that use global weights.

Here we propose a new multivariate combination approach of MMEs based on graph cuts. Compared to the existing univariate method, our approach ensures that the relationships between variables, that are present in GCMs, are locally preserved while providing coherent spatial fields. Moreover, we measure the local performance of models using the Hellinger distance between multi-decadal distributions. This allows a combination of models that is not only indicative of the average behavior (e.g. mean temperature or mean precipitation) but of the entire multivariate distribution, including more extreme values that have a high societal and environmental impact.
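The Hellinger distance between two multivariate distributions can be sketched with joint histograms. This is an illustrative discretised version only: the binning scheme, bin widths and function names are assumptions, and the abstract does not state how the multi-decadal distributions are estimated.

```python
import math
from collections import Counter


def joint_histogram(samples, bin_widths):
    """Relative frequencies of multivariate samples on a regular grid of bins.

    `samples` is a list of tuples, e.g. (temperature, precipitation) pairs
    at one grid point over a multi-decadal period.
    """
    counts = Counter(
        tuple(math.floor(v / w) for v, w in zip(sample, bin_widths))
        for sample in samples
    )
    n = len(samples)
    return {cell: c / n for cell, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts.

    0 means identical distributions, 1 means disjoint supports.
    """
    cells = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2 for c in cells
    )
    return math.sqrt(0.5 * total)
```

Because the histogram is built on the joint values, the distance is sensitive to differences in the dependence between variables and in the tails, not only to differences in the means.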

REFERENCES 

Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239. https://doi.org/10.1109/34.969114

Thao, S., Garvik, M., Mariethoz, G., & Vrac, M. (2022). Combining global climate models using graph cuts. Climate Dynamics, February. https://doi.org/10.1007/s00382-022-06213-4

How to cite: Schmutz, L., Thao, S., Vrac, M., and Mariethoz, G.: A multivariate approach to combine general circulation models using graph cuts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5821, https://doi.org/10.5194/egusphere-egu23-5821, 2023.

X4.102
|
EGU23-13327
|
NP5.1
|
ECS
Luca Monaco, Roberto Cremonini, and Francesco Laio

Direct model output forecasts by Numerical Weather Prediction models (NWPs) present some limitations caused by errors mostly due to sensitivity to initial conditions, sensitivity to boundary conditions and deficiencies in parametrization schemes (e.g. orography).
These sources of error are unavoidable, and chaotic atmospheric dynamics make prediction errors grow rapidly over the course of the forecast, inducing both systematic and random errors.
Nonetheless, in the last 50 years the impact of these sources of error on NWPs has decreased significantly, even in long-term forecasts, thanks for instance to ever-increasing computational capability, but it is still not negligible.
Moreover, different NWPs have specific strengths and weaknesses that can be identified empirically. For instance, in the case of precipitation forecasts in north-west Italy, low-resolution models (e.g. ECMWF-IFS) are more reliable in terms of space and time in predicting the average precipitation, while high-resolution models (e.g. COSMO-2I) tend to forecast the maximum precipitation better. Research purposes apart, these limitations must be seen in an operational context, where the skill of weather forecasts and their associated uncertainty are of the utmost importance to the forecaster and, in general, to the user of a given forecast system.

To tackle these limitations of NWPs and the need for an uncertainty-quantified meteorological forecast, we propose a machine learning-based multimodel post-processing technique for precipitation forecast. We focus on precipitation since it is the most important variable in the issue of spatially localized weather alert notice by the Italian Civil Protection system and at the same time it is one of the most challenging variables to forecast. 
We use a Convolutional Neural Network (CNN) to obtain deterministic and probabilistic forecast grids over 24h up to 48h focusing on North-West Italy, using several high and low-resolution deterministic NWPs as input and using high-resolution rain-gauge corrected radar observations for the training. The effect of the usage of different convolutional parameters (e.g. stride, padding) is taken into account. The deterministic output grid is chosen as the grid with the lowest mean square error obtained during the training, and it is compared with the linear regression of the input NWPs and with every single model. The probabilistic output grid is generated by considering the statistical ensemble of the twenty grids with the lowest mean square error obtained during the training, and it is compared with the logistic regression of the input NWPs and with ECMWF-EPS as a benchmark, both at different precipitation thresholds.
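One simple way to turn an ensemble of output grids into a probabilistic forecast at a precipitation threshold is to count, for each grid cell, the fraction of grids exceeding that threshold. The sketch below shows this under the assumption that the "statistical ensemble" is used in this way, which the abstract does not specify; the function name is hypothetical and grids are plain nested lists.

```python
def exceedance_probability(grids, threshold):
    """Per-cell probability of exceeding `threshold`, estimated as the
    fraction of ensemble grids whose value in that cell is above it.

    `grids` is a list of equally shaped 2-D lists (rows of precipitation values).
    """
    n = len(grids)
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(g[i][j] > threshold for g in grids) / n for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

With twenty grids, the resulting probabilities take values in steps of 0.05, and they can be verified against observed exceedances with scores such as the Brier score.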

How to cite: Monaco, L., Cremonini, R., and Laio, F.: Towards a machine learning based multimodel for precipitation forecast over the italian peninsula, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13327, https://doi.org/10.5194/egusphere-egu23-13327, 2023.

Posters virtual: Tue, 25 Apr, 14:00–15:45 | vHall ESSI/GI/NP

Chairperson: Maxime Taillardat
Forecast post-processing
vEGN.5
|
EGU23-2902
|
NP5.1
|
ECS
|
David Jobst, Annette Möller, and Jürgen Groß

Statistical postprocessing of ensemble forecasts has become a common practice in research to correct biases and errors in calibration. Meanwhile, machine learning methods such as quantile regression forests or neural networks are often suggested as promising candidates in literature. However, interpretation of these methods is not always straightforward. 
Therefore, we propose D-vine (drawable vine) copula based postprocessing, in which the graphical D-vine model serves as the building plan for constructing a multivariate conditional copula. The conditional copula is based on this tractable model using bivariate copulas, which can describe linear as well as non-linear relationships between the response variable and its covariates. Additionally, our highly data-driven model selects the covariates based on their predictive strength and thus provides a natural variable selection mechanism, facilitating interpretability of the model. Finally, (non-crossing) quantiles from the obtained conditional distribution can be utilized as postprocessed ensemble forecasts.
In a case study for the postprocessing of 10 m surface wind speed ensemble forecasts with 24 hour lead time we compare local and global D-vine copula based models to the zero-truncated ensemble model output statistics (tEMOS) for different sets of predictor variables at 60 surface weather stations in Germany. Furthermore, we investigate different types of training periods for both methods. We observe that the D-vine based postprocessing yields a comparable performance with respect to tEMOS models if wind speed ensemble variables are included only and a significant improvement if additional meteorological and station specific weather variables are integrated. The case study indicates that training periods capturing seasonal patterns are performing best for both models. Additionally, we provide a criterion for calculating the variable importance in D-vine copulas and utilize it to outline which predictor variables are the most important for the correction of 10 m surface wind speed ensemble forecasts.

How to cite: Jobst, D., Möller, A., and Groß, J.: D-Vine Copula based Postprocessing of Wind Speed Ensemble Forecasts, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-2902, https://doi.org/10.5194/egusphere-egu23-2902, 2023.

vEGN.6
|
EGU23-1365
|
NP5.1
|
Bobby Antonio, Andrew McRae, Dave MacLeod, Fenwick Cooper, John Marsham, Laurence Aitchison, Tim Palmer, and Peter Watson

Existing weather models are known to have poor skill over Africa, where there are regular threats of drought and floods that present significant risks to people's lives and livelihoods. Improved precipitation forecasts could help mitigate the negative effects of these extreme weather events, as well as providing significant financial benefits to the region. Building on work that successfully applied a state-of-the-art machine learning method (a conditional Generative Adversarial Network, cGAN) to postprocess precipitation forecasts in the UK, we present a novel way to improve precipitation forecasts in East Africa. We address the challenge of realistically representing tropical convective rainfall in this region, which is poorly simulated in conventional forecast models. We use a cGAN to postprocess ECMWF high resolution forecasts at 0.1 degree resolution and 6-18h lead times, using the iMERG dataset as ground truth, and investigate how well this model can correct bias, produce reliable probability distributions and create samples of rainfall with realistic spatial structure. We will also present performance in extreme rainfall events. This has the potential to enable cost effective improvements to early warning systems in the affected areas.

How to cite: Antonio, B., McRae, A., MacLeod, D., Cooper, F., Marsham, J., Aitchison, L., Palmer, T., and Watson, P.: Improving post-processing of East African precipitation forecasts using a generative machine learning model, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1365, https://doi.org/10.5194/egusphere-egu23-1365, 2023.