Advances in diagnostics, sensitivity, uncertainty analysis, and hypothesis testing of Earth and environmental systems models

Proper characterization of uncertainty remains a major challenge and is inherent to many aspects of modelling, such as structural development, hypothesis testing, parameter estimation, and the adequate characterization of parameters, forcing data, and initial and boundary conditions. Useful methods for addressing this challenge include uncertainty analysis, sensitivity analysis and inversion (calibration), whether in Bayesian, geostatistical or conventional form.

This session invites contributions that discuss advances, in theory and/or application, in methods for sensitivity analysis (SA), uncertainty analysis (UA) and inversion applicable to all Earth and Environmental Systems Models (EESMs). This includes all areas of hydrology, such as classical hydrology, subsurface hydrology and soil science.

Topics of interest include (but are not limited to):
1) Novel methods for effective characterization of sensitivity and uncertainty,
2) Novel approaches for parameter estimation, data inversion and data assimilation,
3) Novel methods for spatial and temporal evaluation/analysis of models,
4) Single- versus multi-criteria SA/UA/inversion,
5) The role of data information and error on SA/UA (e.g., input/output error, model structure error, worth of data), and
6) Improving the computational efficiency of SA/UA/inversion (efficient sampling, surrogate modelling, parallel computing, model pre-emption, etc.).

Contributions addressing any or all aspects of sensitivity/uncertainty, including those related to structural development, hypothesis testing, parameter estimation, data assimilation, forcing data, and initial and boundary conditions, are invited. We also invite applications of the above research questions to scientifically built machine-learning models.

Co-organized by NP5
Convener: Juliane Mai | Co-conveners: Hoshin Gupta, Anneli Guthke, Wolfgang Nowak, Cristina Prieto, Saman Razavi, Thomas Wöhling
vPICO presentations: Wed, 28 Apr, 15:30–17:00 (CEST)

Chairpersons: Juliane Mai, Wolfgang Nowak, Cristina Prieto
Model calibration and inverse modeling
William Farmer, Ghazal Shabestanipour, Jonathan Lamontagne, and Richard Vogel

There is an increasing need to develop stochastic watershed models that use post-processing methods to generate stochastic streamflow ensembles from deterministic watershed models (DWMs). Stochastic streamflow ensembles are needed for a wide variety of water resource planning applications, relating to both short-term forecasting and long-range simulation. Current methods often involve post-processing of ordinary, differenced residuals, defined as the difference between the simulations (S) and observations (O). However, ordinary, differenced residuals from daily and sub-daily DWMs exhibit a high degree of non-normality, heteroscedasticity and stochastic persistence, which calls for extremely complex post-processing methods. Using deterministic simulations of daily streamflow at over 1,400 sites across the United States, we document that logarithmically transformed ratio residuals, defined as the natural log of the quotient of S divided by O, are approximately homoscedastic, approximately normally distributed, and well represented by an autoregressive process. These characteristics make them preferable to ordinary, differenced residuals for post-processing DWMs. Although issues with seasonal fluctuation and long-term persistence are not fully resolved, this simple transformation addresses much of the stochastic complexity of the residuals from a deterministic watershed model and produces streamflow ensemble simulations that more accurately replicate essential elements of the statistical distributions of streamflow (including design events, higher-order moments and extreme values). The use of this transformation and autoregressive models demonstrates that more accurate stochastic modeling of natural resource phenomena can be achieved with relatively elegant solutions that support natural resource management in the past, present and future.
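The transformation described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: it computes the log-ratio residuals e_t = ln(S_t / O_t), fits a first-order autoregressive model by lag-1 moments (the abstract only says an autoregressive process fits well; the AR(1) order and the moment estimator are our choice), and back-transforms simulated residuals into streamflow traces via O = S * exp(-e). All function names are hypothetical.

```python
import numpy as np

def log_ratio_residuals(sim, obs):
    """Residuals e_t = ln(S_t / O_t); both series must be strictly positive."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if np.any(sim <= 0) or np.any(obs <= 0):
        raise ValueError("log-ratio residuals need strictly positive flows")
    return np.log(sim / obs)

def fit_ar1(e):
    """Fit e_t - mu = phi * (e_{t-1} - mu) + eps_t by lag-1 moments."""
    mu = e.mean()
    d = e - mu
    phi = np.dot(d[1:], d[:-1]) / np.dot(d, d)
    sigma_eps = d.std() * np.sqrt(1.0 - phi**2)  # innovation sd from the stationary sd
    return mu, phi, sigma_eps

def stochastic_ensemble(sim, mu, phi, sigma_eps, n_members, rng):
    """Streamflow traces Q = S * exp(-e), with e simulated from the fitted AR(1)."""
    sim = np.asarray(sim, dtype=float)
    n = sim.size
    e = np.empty((n_members, n))
    # first value drawn from the stationary distribution of the AR(1) process
    e[:, 0] = mu + rng.normal(0.0, sigma_eps / np.sqrt(1.0 - phi**2), n_members)
    for t in range(1, n):
        e[:, t] = mu + phi * (e[:, t - 1] - mu) + rng.normal(0.0, sigma_eps, n_members)
    return sim * np.exp(-e)
```

Because the residuals are modelled on the log scale, every ensemble member stays strictly positive, which an additive model of the differenced residuals S - O does not guarantee.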

How to cite: Farmer, W., Shabestanipour, G., Lamontagne, J., and Vogel, R.: Stochastic watershed models using a logarithmic transformation of ratio residuals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-517, https://doi.org/10.5194/egusphere-egu21-517, 2021.

Lele Shu, Hao Chen, and Xianhong Meng

Hydrologic models are ideal tools for experimenting with and understanding water movement and storage in a watershed, from the upper mountains to the river outlet. Nevertheless, model performance, suitability and data availability are the primary challenges for a modeler. This study introduces the Simulator for Hydrologic Unstructured Domains (SHUD), an integrated surface-subsurface hydrological model based on the semi-discrete Finite Volume Method. Although SHUD applies a fine time step (minutes) and flexible spatial domain decomposition (meters to kilometers) to simulate fully coupled surface-subsurface hydrology, it solves watershed-scale problems efficiently and dependably. Numerous applications in the USA have demonstrated the performance and suitability of SHUD in humid, data-rich watersheds.
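"Semi-discrete" means that space is divided into control volumes while time is left continuous, which turns the governing equation into a system of ordinary differential equations (the method of lines). The sketch below illustrates only that idea, on a hypothetical 1D diffusion problem with uniform cells, no-flux boundaries and a simple forward-Euler integrator; SHUD itself works on unstructured meshes with coupled surface and subsurface equations and its own time integration, none of which this attempts to reproduce.

```python
import numpy as np

def fv_rhs(h, D, dx):
    """Semi-discrete finite-volume right-hand side dh/dt for 1D diffusion, no-flux ends."""
    # flux across each interior face i+1/2, proportional to the head gradient
    face_flux = -D * np.diff(h) / dx
    # zero flux through the two outer boundaries
    flux = np.concatenate(([0.0], face_flux, [0.0]))
    # net inflow per cell divided by the cell width
    return -(flux[1:] - flux[:-1]) / dx

def integrate(h0, D, dx, t_end):
    """Advance the ODE system with forward Euler at a stable time step."""
    h = np.asarray(h0, dtype=float).copy()
    dt_max = 0.4 * dx**2 / D  # explicit stability limit is 0.5 * dx**2 / D
    n_steps = int(np.ceil(t_end / dt_max))
    dt = t_end / n_steps
    for _ in range(n_steps):
        h += dt * fv_rhs(h, D, dx)
    return h
```

Writing the update in flux form makes the scheme conservative: whatever leaves one cell enters its neighbour, so with closed boundaries the total storage is preserved up to round-off.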

In this research, we demonstrate the deployment of SHUD in two data-scarce watersheds in northwestern China using global datasets, validate the simulations against local observational data, and assess the model's efficiency and suitability. The first is the Upstream Heihe River (UHR), a typical semi-arid mountainous watershed. The second is the Yellow River Source (YRS), the upstream region of the Yellow River, which contributes more than 50% of the river's total discharge. The results, figures and analysis based on SHUD simulations driven by global datasets highlight the model's suitability and efficiency in data-limited, even ungauged, watersheds. SHUD is a useful modeling platform for hydrology and water-related coupling studies.

How to cite: Shu, L., Chen, H., and Meng, X.: Deployment, calibration, and efficiency of SHUD model in cold and arid watersheds, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7949, https://doi.org/10.5194/egusphere-egu21-7949, 2021.

Xinyu Li, Prajna Kasargodu Anebgailu, and Jörg Dietrich

The calibration of hydrological models using bio-inspired meta-heuristic optimization techniques has been extensively tested for finding optimal model parameters. The shuffled frog-leaping algorithm (SFLA) is a population-based cooperative search technique in which virtual interacting frogs are distributed into multiple memeplexes. The frogs search locally in each memeplex and are periodically shuffled into new memeplexes to ensure global exploration. Although it was developed for discrete optimization, it can also be used to solve multi-objective combinatorial optimization problems.
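A simplified, single-objective, continuous-variable version of the SFLA loop described above can be sketched as follows. Assumptions (ours, not the study's): minimization; after ranking, frogs are dealt into memeplexes round-robin; the whole memeplex (rather than a sampled sub-memeplex) is used in the local search; and the worst frog first leaps towards the memeplex best, then towards the global best, and is otherwise replaced by a random frog. This is not the implementation used with HYPE, and it omits the multi-objective extension.

```python
import numpy as np

def sfla(f, lower, upper, n_memeplexes=5, frogs_per_memeplex=10,
         n_shuffles=50, n_local=10, rng=None):
    """Simplified shuffled frog-leaping algorithm (minimisation within box bounds)."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_frogs = n_memeplexes * frogs_per_memeplex
    frogs = rng.uniform(lower, upper, size=(n_frogs, lower.size))
    fit = np.array([f(x) for x in frogs])
    for _ in range(n_shuffles):
        # shuffle step: rank all frogs, then deal them into memeplexes round-robin
        order = np.argsort(fit)
        frogs, fit = frogs[order], fit[order]
        global_best = frogs[0].copy()
        for m in range(n_memeplexes):
            idx = np.arange(m, n_frogs, n_memeplexes)  # members of memeplex m
            for _ in range(n_local):
                best = idx[np.argmin(fit[idx])]
                worst = idx[np.argmax(fit[idx])]
                # 1) leap of the worst frog towards the memeplex best
                step = rng.random(lower.size) * (frogs[best] - frogs[worst])
                trial = np.clip(frogs[worst] + step, lower, upper)
                f_trial = f(trial)
                if f_trial >= fit[worst]:
                    # 2) otherwise leap towards the global best
                    step = rng.random(lower.size) * (global_best - frogs[worst])
                    trial = np.clip(frogs[worst] + step, lower, upper)
                    f_trial = f(trial)
                if f_trial >= fit[worst]:
                    # 3) otherwise replace the worst frog with a random one
                    trial = rng.uniform(lower, upper)
                    f_trial = f(trial)
                frogs[worst], fit[worst] = trial, f_trial
    i = int(np.argmin(fit))
    return frogs[i], fit[i]
```

The defaults of 5 memeplexes with 10 frogs each mirror the configuration reported in the abstract; the numbers of shuffles and local steps are placeholders.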

In this study, the catchment model Hydrological Predictions for the Environment (HYPE) is calibrated for streamflow and nitrate concentration using SFLA. HYPE is a semi-distributed watershed model that simulates runoff and other hydrological processes based on both physical and conceptual laws. SFLA with 200 runtimes and 5 memeplexes of 10 frogs each is used to calibrate 22 model parameters. It is compared with manual calibration and with the Differential Evolution Markov Chain (DEMC) method from the HYPE-tool. Preliminary statistical performance measures for streamflow calibration show that SFLA converges fastest and is more stable than the DEMC method. The best SFLA run achieved an NSE of 0.68 and a PBIAS of 7.72 during streamflow calibration. In comparison, HYPE-tool DEMC produced a best NSE of 0.45 and a PBIAS of -3.37, while manual calibration resulted in an NSE of 0.64 and a PBIAS of 2.01.
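For reference, the two statistics quoted above can be computed as below. NSE follows its standard definition. Sign conventions for PBIAS differ between tools; the sketch assumes the common convention PBIAS = 100 * sum(O - S) / sum(O), under which positive values indicate that the model underestimates the observed total. Whether HYPE-tool uses the same sign is not stated in the abstract.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is no better than the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias (assumed convention: positive means the model underestimates)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)
```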

How to cite: Li, X., Kasargodu Anebgailu, P., and Dietrich, J.: Multi-objective parameter optimization of the HYPE model using shuffled frog-leaping algorithm, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8244, https://doi.org/10.5194/egusphere-egu21-8244, 2021.

Chengcheng Gong, Wenke Wang, Zaiyong Zhang, Harrie-Jan Hendricks Franssen, Fabien Cochand, and Philip Brunner

Bare soil evaporation is a key component of the soil water balance. Accurate estimation of evaporation is thus critical for sustainable water resources management, especially in arid and semi-arid regions. Numerical models are widely used for estimating bare soil evaporation. Although models allow evaporation dynamics to be explored under different hydrological and climatic conditions, their robustness depends on the reliability of the imposed parameters, which are typically obtained through model calibration. Even if a perfect match between observed and simulated variables is obtained, the predictions are not necessarily reliable, either because of model structural errors or because the inverse problem is ill-posed. While this is conceptually well known, it remains unclear how the temporal resolution and length of the observations used for calibration influence the reliability of the parameters and the predictions.

We used data from a lysimeter experiment in the Guanzhong Basin, China, to systematically explore the influence of calibration period length on the calibrated parameters and on the uncertainty of evaporation predictions. Soil water content dynamics and water level were monitored every 5 minutes. We set up twelve models using the fully coupled, physically based HydroGeoSphere model with different calibration period lengths (one, three, six and fourteen months). The evaporation rates estimated by the models for the calibration and validation periods were compared with the measured evaporation rates. We also predicted cumulative one-year evaporation and quantified the uncertainty of the evaporation predicted by the models calibrated over different period lengths. Several key conclusions can be drawn: (1) The extinction depth is a very important parameter for soil water content dynamics in the vadose zone but is poorly informed unless the calibration data include significantly different depths to groundwater. (2) A longer calibration period (six or fourteen months) did not necessarily yield more reliable predictions of evaporation rates. (3) Preliminary results indicate that uncertainty can be reduced if the calibration period includes not only climatic forcing similar to that of the prediction period but also similar water table conditions. Our results have implications for reducing uncertainty when unsaturated-saturated models are used to predict evaporation.

How to cite: Gong, C., Wang, W., Zhang, Z., Hendricks Franssen, H.-J., Cochand, F., and Brunner, P.: Influence of calibration period length on predictions of evaporation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8986, https://doi.org/10.5194/egusphere-egu21-8986, 2021.

Wendy Sharples, Andrew Frost, Ulrike Bende-Michl, Ashkan Shokri, Louise Wilson, and Elisabeth Vogel

Ensuring future water security in a changing climate is becoming a top priority for Australia, which is already dealing with the ongoing socio-economic and environmental impacts of record-breaking bushfires, infrastructure damage from recent flash flooding events, and the prospect of continuing compromised water sources in both regional towns and large cities. In response to these significant impacts, the Australian Bureau of Meteorology is providing a hydrological projections service, using its national operational hydrological model (the Australian Water Resources Assessment model, AWRA-L, www.bom.gov.au/water/landscape) to project future hydrological fluxes and states at a resolution of 5 km out to the end of this century, driven by downscaled meteorological inputs from an ensemble of curated global climate models and emissions scenarios.

Continental model calibration using a long record of Australian observational data has been employed across components of the water balance to tune the model parameters to Australia's varied hydro-climate, thereby reducing uncertainty associated with inputs and hydrological model structure. This approach has been shown to improve the accuracy of simulated hydrological fields and the skill of short-term and seasonal forecasts. However, to improve model performance and stability for use in hydrological projections, it is desirable to choose a model parameterization that produces reasonable hydrological responses under conditions of climate variability as well as under historical conditions. To this end, we have developed a two-stage approach. First, a variance-based sensitivity analysis for water balance components (e.g., ephemeral flow, average to high flow, recharge, soil moisture and evapotranspiration) is performed to rank the parameters most influential on each component. Parameters that are insensitive across components are then fixed to previously optimized values, reducing the number of calibratable parameters and thereby the dimensionality and uncertainty of the calibration process. Second, the model configured with this reduced set of calibratable parameters is calibrated with a multi-objective evolutionary algorithm (Borg MOEA, www.borgmoea.org) to capture the tradeoffs between the water balance component performance objectives under variable climate conditions (e.g., wet, dry and historical) and across climate regions derived from the natural resource management model (https://nrmregionsaustralia.com.au/).
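The first stage, a variance-based sensitivity analysis followed by fixing insensitive parameters, can be sketched with plain NumPy. Assumptions: parameters rescaled to independent U(0, 1) inputs; a vectorized model returning one scalar objective per parameter set; the Saltelli (2010) estimator for first-order Sobol indices and the Jansen estimator for total-order indices; and a hypothetical screening threshold of 0.01 on the total-order index. The study's actual SA design, sample sizes and thresholds are not given in the abstract, and the toy model stands in for AWRA-L.

```python
import numpy as np

def sobol_indices(model, n_params, n_samples, rng):
    """First-order (Saltelli 2010) and total-order (Jansen) Sobol indices on U(0,1)^k.

    Costs n_samples * (n_params + 2) model evaluations."""
    A = rng.random((n_samples, n_params))
    B = rng.random((n_samples, n_params))
    f_a, f_b = model(A), model(B)
    var_y = np.var(np.concatenate([f_a, f_b]))
    first, total = np.empty(n_params), np.empty(n_params)
    for i in range(n_params):
        AB = A.copy()
        AB[:, i] = B[:, i]  # matrix A with column i taken from B
        f_ab = model(AB)
        first[i] = np.mean(f_b * (f_ab - f_a)) / var_y
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / var_y
    return first, total

def insensitive_parameters(total, threshold=0.01):
    """Indices of parameters whose total-order index is below a (hypothetical) threshold."""
    return np.flatnonzero(total < threshold)

def toy(X):
    """Hypothetical additive model: x0 matters most, x1 a little, x2 not at all."""
    return 4.0 * X[:, 0] + 2.0 * X[:, 1]

S, ST = sobol_indices(toy, n_params=3, n_samples=20000, rng=np.random.default_rng(0))
```

For this additive toy model the exact first-order indices are 0.8, 0.2 and 0, so the third parameter would be fixed before the multi-objective calibration stage.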

The decreased dimensionality is shown to improve the stability and robustness of both the existing calibration routine (shuffled complex evolution) and the multi-objective routine. Examining the tradeoffs between the water balance component objective functions and in-situ validation data under historical, wet and dry periods and across different Australian climate regions, we show that no single parameter set fits the whole continent, so some concessions need to be made in choosing a suitable model parameterization. Future work could therefore include developing sets of parameters suited to specific regions or climate conditions in Australia. The approach outlined in this study could be employed to improve confidence in any hydrological model used to simulate the future impacts of climate change.

How to cite: Sharples, W., Frost, A., Bende-Michl, U., Shokri, A., Wilson, L., and Vogel, E.: Improving continental scale hydrological model performance and stability under variable climate conditions in order to improve the assessment of future water resources, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9284, https://doi.org/10.5194/egusphere-egu21-9284, 2021.

Samy Chelil, Hind Oubanas, Hocine Henine, Igor Gejadze, Pierre Olivier Malaterre, and Julien Tournebize

Over the last decade, the application of inverse modeling approaches has expanded into hydrology. Here, inverse modeling is used to adjust the input parameters of a new agricultural subsurface drainage model (SIDRA-RU) using observations of the model output. SIDRA-RU is a semi-conceptual, semi-analytical model that transforms rainfall into daily drainage discharge. The model is divided into two modules. The first consists of a conceptual reservoir that converts net rainfall into recharge; the second simulates the drainage discharge and the water table level midway between the drains, based on a solution of the Boussinesq equation.

The adjoint model of SIDRA-RU was successfully generated using the automatic differentiation tool TAPENADE. This adjoint model is first used to explore the local and global adjoint sensitivities of a response function defined over the simulated drainage discharge (model output) with respect to the model input parameters. Next, the most influential parameters are estimated using both a classical calibration algorithm (PAP-GR) and the variational data assimilation method 4D-VAR. For the latter, a simple stochastic procedure is proposed to keep the minimization from becoming trapped in local minima.
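The adjoint idea can be illustrated on a hypothetical one-parameter linear reservoir (storage S_t, outflow Q_t = k * S_t, update S_{t+1} = (1 - k) * S_t + P_t) rather than on SIDRA-RU itself. The adjoint below is derived by hand, whereas the study generated its adjoint automatically with TAPENADE. A single backward sweep returns the exact gradient of the misfit J(k) = 0.5 * sum_t (Q_t - O_t)^2; in general one such sweep yields the gradient with respect to all parameters at roughly the cost of one extra model run, which is what makes variational methods such as 4D-VAR practical. The sketch checks the result against central finite differences.

```python
import numpy as np

def forward(k, P, S0):
    """Linear reservoir: Q_t = k * S_t, S_{t+1} = (1 - k) * S_t + P_t."""
    T = len(P)
    S = np.empty(T + 1)
    Q = np.empty(T)
    S[0] = S0
    for t in range(T):
        Q[t] = k * S[t]
        S[t + 1] = (1.0 - k) * S[t] + P[t]
    return S, Q

def cost(k, P, S0, obs):
    """Least-squares misfit J(k) between simulated and observed outflow."""
    _, Q = forward(k, P, S0)
    return 0.5 * np.sum((Q - obs) ** 2)

def adjoint_gradient(k, P, S0, obs):
    """dJ/dk from one backward sweep over the adjoint variables lambda_t = dJ/dS_t."""
    S, Q = forward(k, P, S0)
    r = Q - obs
    lam_next = 0.0  # lambda_T = 0: the final storage does not enter J
    grad = 0.0
    for t in range(len(P) - 1, -1, -1):
        # direct dependence on k: dQ_t/dk = S_t and dS_{t+1}/dk = -S_t
        grad += r[t] * S[t] - lam_next * S[t]
        # lambda_t = r_t * dQ_t/dS_t + lambda_{t+1} * dS_{t+1}/dS_t
        lam_next = r[t] * k + (1.0 - k) * lam_next
    return grad
```

A gradient-based minimizer restarted from several random initial values of k (a simple multi-start, which may differ from the stochastic procedure used in the study) can use this gradient directly.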

Our results show that the drainage discharge simulations obtained with 4D-VAR are better than those obtained with the PAP-GR calibration algorithm, particularly in terms of water balance: the cumulative discrepancy between simulated and observed water volumes was less than 5 mm over the five years of daily drainage discharge data from the Chantemerle agricultural field. However, numerical tests of the convergence of the variational calibration method indicate potential equifinality issues, highlighted by mutual compensation between the physical soil parameters (Ksat and µ) and the parameters governing the conceptual SIDRA-RU reservoir (Sinter and SSDI). The sensitivity analysis showed that the parameters with the greatest impact on drainage discharge are those controlling the reactivity and recession of the water level in soils, followed by those governing the start of the drainage season.

How to cite: Chelil, S., Oubanas, H., Henine, H., Gejadze, I., Malaterre, P. O., and Tournebize, J.: On the use of inverse modeling to improve subsurface drainage simulations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14768, https://doi.org/10.5194/egusphere-egu21-14768, 2021.

Wenting Li, Chunhua Yang, Jie Han, Fengxue Zhang, Lijuan Lan, and Yonggang Li

Municipal wastewater treatment plants (WWTPs) treat domestic sewage, industrial wastewater and rainfall runoff for reuse, enabling sustainable utilization of freshwater resources. To guarantee the safety, reliability and profitability of a WWTP, efficient process monitoring and control are becoming increasingly important. However, because of economic and technical constraints, it is infeasible to place sensors at every process parameter location. It is therefore necessary to design an optimal sensor placement scheme that yields the maximum information about plant conditions. Practical issues in WWTPs, such as harsh physical conditions, fluctuating water quantity and variable process parameters, make the optimal sensor placement problem especially complicated. Furthermore, the sensor placement problem involves multiple objectives with complex nonlinear relationships. This study focuses on obtaining the optimal flow sensor placement scheme for a WWTP in terms of cost, information richness and redundancy. First, a WWTP system model is constructed based on graph theory and on structural observability and redundancy criteria. Next, an industrial condition weighting factor strategy is introduced to measure the importance of the variables in different processing units, transforming the optimal flow sensor placement problem for the whole process into a discrete multi-objective optimization problem. Then, a novel metaheuristic, the discrete multi-objective state transition algorithm (DMOSTA), is proposed to obtain the optimal trade-off solution set. Finally, an evaluation strategy is applied to select the best flow sensor placement scheme from the solution set. The proposed method is applied to three WWTPs of different dimensions. Comparative results show that the optimal flow sensor placement scheme based on the proposed method has the best overall performance with regard to sensor cost, process variable observability, sensor redundancy and computational cost.
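The trade-off solution set mentioned above is a Pareto front. A minimal sketch of extracting the non-dominated solutions and picking one compromise is shown below. Assumptions: every objective is expressed as a cost to be minimized (objectives to be maximized, such as information richness or redundancy, would be negated first), and the compromise rule (smallest min-max-normalized distance to the ideal point) is a hypothetical stand-in for the paper's evaluation strategy, which the abstract does not describe. The candidate values are invented for illustration.

```python
import numpy as np

def pareto_front(F):
    """Indices of non-dominated rows of F (n_solutions x n_objectives, all minimised)."""
    F = np.asarray(F, dtype=float)
    keep = []
    for i in range(len(F)):
        # row j dominates row i if it is no worse everywhere and strictly better somewhere
        dominated = np.any(np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep, dtype=int)

def compromise(F, front):
    """Front member closest to the ideal point after min-max normalisation (hypothetical rule)."""
    P = np.asarray(F, dtype=float)[front]
    lo, hi = P.min(axis=0), P.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for flat objectives
    return int(front[np.argmin(np.linalg.norm((P - lo) / span, axis=1))])

# Hypothetical candidate placement schemes, two cost-like objectives each
F = [[1, 5], [2, 3], [3, 4], [4, 1], [2, 3]]
front = pareto_front(F)
```

Here the third candidate is dominated by the second, while candidates with identical objective values are both kept, since neither is strictly better.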

How to cite: Li, W., Yang, C., Han, J., Zhang, F., Lan, L., and Li, Y.: Optimal sensor placement method for wastewater treatment plants based on discrete multi-objective state transition algorithm, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15061, https://doi.org/10.5194/egusphere-egu21-15061, 2021.

Bayesian approaches and inference
Michelle Viswanathan, Tobias KD Weber, Andreas Scheidegger, and Thilo Streck

Crop models are used to evaluate the impact of climate change on food security by simulating plant phenology, yield, biomass and leaf area index. Plant phenology defines the timing of crucial growth stages and physiological processes that influence organ appearance and assimilate partitioning. It is governed by environmental factors such as solar radiation, temperature and water availability. Plant phenology is not only specific to the crop species but also depends on the cultivar, and the growth of a given cultivar can vary with the environment. Common crop models cannot fully capture the influence of the environment on phenology, which results in cultivar-specific parameters that are environment-dependent. These parameter estimates may be unreliable when data are limited, and crucial species-specific information is ignored. Conversely, in large regional-scale models covering multiple cultivars and environments, information about the cultivars grown is generally not available. In this case, a shared set of parameters for the crop species would suppress within-species differences, leading to unreliable predictions.

A Bayesian hierarchical framework enables us to alleviate these problems by honouring the multi-level data structure. Additionally, it can reflect uncertainty from different sources, for example, model inputs and measurements. In this study, we implement a Bayesian hierarchical framework to estimate parameters of the Soil-Plant-Atmosphere System Simulation (SPASS) model for simulating the phenological development of different cultivars of silage maize grown across the contrasting climatic regions of Germany.

We used data from the German weather service on the phenological development stages of silage maize grown across Germany between 2009 and 2019. During this period, silage maize was grown in over 3000 unique location-year combinations. Maize crops were differentiated into early, mid-early, mid-late and late ripening groups and were further classified into cultivars within each ripening group. Within the hierarchical framework, we estimate maize species-specific parameters as well as parameters per ripening group and cultivar, through Bayesian model calibration. We analyse the influence of environmental conditions on parameter estimates, to further develop the hierarchical structure. We perform cross-validation to assess the prediction quality of the parameterized model.
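The core mechanism of such a hierarchy, partial pooling, can be shown with a normal-normal model in which each cultivar parameter theta_j is drawn from its ripening-group distribution N(mu, tau^2) and the observations of that cultivar scatter around theta_j with variance sigma^2. Treating mu, tau and sigma as known is a simplifying assumption: the study estimates its hierarchy by Bayesian calibration of SPASS, which this does not reproduce, and the function name and inputs are hypothetical. Under that assumption the conditional posterior of theta_j has a closed form.

```python
import numpy as np

def partial_pooling(y_by_cultivar, mu_group, tau, sigma):
    """Conditional posterior (mean, sd) of each cultivar parameter theta_j in a
    normal-normal hierarchy with known group mean mu, group sd tau and noise sd sigma."""
    out = {}
    for name, y in y_by_cultivar.items():
        y = np.asarray(y, dtype=float)
        precision = y.size / sigma**2 + 1.0 / tau**2  # data precision + group precision
        data_term = y.sum() / sigma**2                 # equals n_j * mean(y_j) / sigma^2
        mean = (data_term + mu_group / tau**2) / precision
        out[name] = (mean, np.sqrt(1.0 / precision))
    return out
```

Cultivars with few observations are pulled towards the group mean, and a cultivar with no observations (for instance a future cultivar) falls back to the group-level estimate and its spread.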

With this approach, we show that the parameter estimates are robust and account for differences between cultivars and ripening groups as well as for different environmental conditions. The parameterized model can be used for large-scale phenology predictions of silage maize grown across Germany. These parameter estimates may perform better than independent species- or cultivar-specific estimates in predicting the phenology of future cultivars whose specific characteristics are not known.

How to cite: Viswanathan, M., Weber, T. K., Scheidegger, A., and Streck, T.: A Bayesian hierarchical approach to improve model parameter estimates and predictions of silage maize phenology in Germany, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7962, https://doi.org/10.5194/egusphere-egu21-7962, 2021.

Sebastian Reuschen, Teng Xu, Fabian Jobst, and Wolfgang Nowak

Geostatistical inference (or inversion) methods are commonly used to estimate the spatial distribution of heterogeneous soil properties (e.g., hydraulic conductivity) from indirect measurements (e.g., piezometric heads). One approach is Bayesian inversion, which combines prior assumptions (prior models) with indirect measurements to predict soil parameters and their uncertainty, expressed in the form of a posterior parameter distribution. This approach is mathematically rigorous and elegant but has a disadvantage: in realistic settings, analytical solutions do not exist, and numerical evaluation via Markov chain Monte Carlo (MCMC) methods can become computationally prohibitive. Especially when treating spatially distributed parameters for heterogeneous materials, constructing efficient MCMC methods is a major challenge.

Here, we present two novel MCMC methods that extend and combine existing MCMC algorithms to speed up convergence for spatial parameter fields. First, we present the sequential pCN-MCMC, a combination of the sequential Gibbs sampler and the preconditioned Crank-Nicolson MCMC (pCN-MCMC). The sequential pCN-MCMC is more efficient (converges faster) than existing methods. It can be used for Bayesian inversion of multi-Gaussian prior models, which are often used for single-facies systems. Second, we present the parallel-tempering sequential Gibbs MCMC. This MCMC variant enables realistic inversion of multi-facies systems, by which we mean systems with several facies in which we model both the spatial position of the facies (via training images and multiple-point geostatistics) and the internal heterogeneity of each facies (via multi-Gaussian fields). The proposed MCMC variant is the first efficient method to find the posterior parameter distribution for such multi-facies systems with internal heterogeneities.
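For orientation, the standard pCN building block can be sketched for a zero-mean Gaussian prior N(0, C) with Cholesky factor L. The proposal u' = sqrt(1 - beta^2) * u + beta * L * xi leaves the prior invariant, so the acceptance ratio contains only the likelihood, which keeps acceptance rates usable as the parameter field is refined. The sequential (block-wise Gibbs) extension and the parallel-tempering variant presented in the abstract are not shown; the step size beta and the function names are assumptions.

```python
import numpy as np

def pcn_mcmc(log_likelihood, prior_chol, n_iter, beta, rng):
    """Preconditioned Crank-Nicolson MCMC for a zero-mean Gaussian prior N(0, L L^T)."""
    dim = prior_chol.shape[0]
    u = prior_chol @ rng.standard_normal(dim)  # start from a prior draw
    ll = log_likelihood(u)
    samples = np.empty((n_iter, dim))
    n_accept = 0
    for it in range(n_iter):
        # pCN proposal: shrink the current state and add a scaled prior draw
        xi = prior_chol @ rng.standard_normal(dim)
        u_prop = np.sqrt(1.0 - beta**2) * u + beta * xi
        ll_prop = log_likelihood(u_prop)
        # the prior is invariant under the proposal, so only the likelihood ratio enters
        if np.log(rng.random()) < ll_prop - ll:
            u, ll = u_prop, ll_prop
            n_accept += 1
        samples[it] = u
    return samples, n_accept / n_iter
```

For a spatial field, prior_chol would be the Cholesky factor of the prior covariance matrix on the grid; for large grids this is usually replaced by a cheaper factorization or a spectral method.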

We demonstrate the applicability and efficiency of the two proposed methods on synthetic hydrogeological test problems and show that they outperform existing state-of-the-art MCMC methods. With the two proposed MCMCs, we enable modellers to perform (1) faster Bayesian inversion of multi-Gaussian random fields for single-facies systems and (2) Bayesian inversion of more realistic fields for multi-facies systems with internal heterogeneity at affordable computational effort.

How to cite: Reuschen, S., Xu, T., Jobst, F., and Nowak, W.: Novel MCMC methods for Bayesian inference of spatial parameter fields, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2560, https://doi.org/10.5194/egusphere-egu21-2560, 2021.

Peter Walter, Ishani Banerjee, Anneli Guthke, Kevin Mumford, and Wolfgang Nowak

Bayesian model selection (BMS) can be used to objectively rank competing models of different structure and with different parameters upon comparison with validation data sets. This technique requires the evaluation of Bayesian Model Evidence (BME). BME is the likelihood of the data under an assumed model, averaged over the prior probability distribution of that model's parameters.
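The brute-force numerical route is to average the likelihood over draws from the prior. A minimal sketch, working in log space for numerical stability, is given below together with a toy Gaussian example in which BME is known analytically. The function names and the toy model are hypothetical; for large, highly resolved data sets the likelihood becomes very peaked and this estimator needs prohibitively many prior draws.

```python
import numpy as np

def log_bme_monte_carlo(log_likelihood, prior_draws):
    """log BME = log E_prior[p(D | theta)], estimated from prior draws.

    `log_likelihood` must accept the array of draws and return one value per draw."""
    ll = np.asarray(log_likelihood(prior_draws), dtype=float)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))  # log-mean-exp avoids underflow

# Toy check: y ~ N(theta, sigma^2) with prior theta ~ N(0, 1), so y ~ N(0, 1 + sigma^2)
y, sigma = 0.7, 0.5

def log_lik(theta):
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (y - theta) ** 2 / sigma**2

rng = np.random.default_rng(0)
estimate = log_bme_monte_carlo(log_lik, rng.normal(0.0, 1.0, 100_000))
exact = -0.5 * np.log(2 * np.pi * (1 + sigma**2)) - 0.5 * y**2 / (1 + sigma**2)
```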

Exact and fast analytical solutions for BME exist only under strong assumptions. For that reason, other techniques and approximations for BMS/BME have been developed. While mathematical approximations via information criteria may suffer from strong biases in real-world applications, numerical methods avoid such assumptions but require high computational effort. This becomes prohibitive if the data set is very large, e.g., highly resolved in space and time.

To still enable the use of BME as a probabilistic and rigorous model performance metric, we have developed the “Method of Forced Probabilities”, a fast way to numerically compute BME for models that predict time series and fulfill the Markov chain property in time. The core idea is to swap the direction of evaluation: instead of comparing thousands of forward runs of the model with the observed data (many model runs on random parameter realizations), we force the model to follow the data at each time step and record the individual probabilities of the model performing these exact transitions (single evaluation).
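The swap in evaluation direction can be illustrated with a toy two-state Markov chain (hypothetical, not the invasion percolation model): the probability that the model reproduces an observed state sequence is the product of the transition probabilities it is forced through, whereas the forward approach must simulate many free runs and count the exact hits. For fixed parameters both estimate the same quantity; averaging the forced probability over parameter draws would then give BME.

```python
import numpy as np

def forced_probability(P, observed):
    """Probability that a Markov chain with transition matrix P reproduces `observed`
    exactly, given its first state: a single pass over the data."""
    return float(np.prod([P[a, b] for a, b in zip(observed[:-1], observed[1:])]))

def brute_force_probability(P, observed, n_runs, rng):
    """Same quantity from free forward runs: fraction of runs that match the data exactly."""
    hits = 0
    for _ in range(n_runs):
        state = observed[0]
        for target in observed[1:]:
            state = rng.choice(len(P), p=P[state])
            if state != target:
                break  # this run has left the observed path
        else:
            hits += 1
    return hits / n_runs

P = np.array([[0.9, 0.1], [0.2, 0.8]])
observed = [0, 0, 1, 1, 0]
```

For the five-step sequence above the forced probability is 0.9 * 0.1 * 0.8 * 0.2 = 0.0144. With thousands of time steps almost no free run matches exactly, so the brute-force estimate collapses to zero, while the forced product remains computable (in log space if necessary).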

As a test case for demonstration, we use invasion percolation (IP) models to simulate multiphase flow in porous media. The underlying, highly resolved data set was obtained from an experiment of slow gas injection into water-saturated, homogeneous sand in a 25 cm × 25 cm acrylic glass cell. Images were acquired at 30 images per second using the light transmission technique. Since IP models fulfill the Markov chain property, the Method of Forced Probabilities can be applied to evaluate their BME. Results confirm that the proposed method is a scalable, inexpensive alternative to standard Monte Carlo methods for analyzing the model-data mismatch.
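For readers unfamiliar with invasion percolation: each pore (here a grid cell) has a random entry threshold, and at every step the invading gas enters the perimeter cell with the lowest threshold. The minimal sketch below implements that rule with a priority queue on a hypothetical grid with four-neighbour connectivity and no trapping or buoyancy; the IP models compared in the study may include additional physics.

```python
import heapq

import numpy as np

def invasion_percolation(thresholds, start, n_steps):
    """Grow the invaded cluster by always invading the perimeter cell with the lowest threshold."""
    n_rows, n_cols = thresholds.shape
    invaded = [start]
    seen = {start}
    frontier = []  # min-heap of (threshold, cell) on the cluster perimeter

    def push_neighbours(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(frontier, (thresholds[nr, nc], (nr, nc)))

    push_neighbours(start)
    for _ in range(n_steps):
        if not frontier:
            break  # the whole grid has been invaded
        _, cell = heapq.heappop(frontier)
        invaded.append(cell)
        push_neighbours(cell)
    return invaded
```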

How to cite: Walter, P., Banerjee, I., Guthke, A., Mumford, K., and Nowak, W.: The Method of Forced Probabilities: a Computation Trick for Bayesian Model Evidence, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8381, https://doi.org/10.5194/egusphere-egu21-8381, 2021.

Uncertainty quantification and sensitivity analyses
Abhinav Gupta, Ganeshchandra Mallya, and Rao Govindaraju

A hydrological model incurs three types of uncertainty: measurement, structural and parametric. Measurement uncertainty arises from errors in the measurements of rainfall and streamflow data. Structural uncertainty arises from errors in the mathematical representation of hydrological processes. Parametric uncertainty is a consequence of the limited data available to calibrate the model and of measurement and structural uncertainties.

Recently, the separation of structural and measurement uncertainties was identified as one of the twenty-three unsolved problems in hydrology. Information about measurement and structural uncertainties is typically available in the form of residual time series, that is, the difference between observed and simulated streamflow time series. The residual time series, however, provides only an aggregate measure of measurement and structural uncertainties, so the two are inseparable without additional information. In this study, we used the random forest (RF) algorithm to gather additional information about measurement uncertainties using hydrological data from several watersheds. The uncertainty bounds obtained by RF were then compared against those obtained by two other methods: rating-curve analysis and the recently proposed runoff-coefficient method. Rating-curve analysis yields uncertainty in streamflow measurements only, whereas the runoff-coefficient method yields uncertainty in both rainfall and streamflow measurements. The results are promising for using data across different watersheds to construct measurement uncertainty bounds. The preliminary results of this study will be presented at the meeting.

How to cite: Gupta, A., Mallya, G., and Govindaraju, R.: Separation of Structural and Measurement Uncertainties in Watershed Hydrological Models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-993, https://doi.org/10.5194/egusphere-egu21-993, 2021.

Giacomo Bertoldi, Stefano Campanella, Emanuele Cordano, and Alberto Sartori

Proper characterization of uncertainty remains a major research and operational challenge in Earth and Environmental Systems Models (EESMs). In fact, model calibration is often more an art than a science: the modeller must make several discretionary choices, guided more by experience and intuition than by the scientific method. In practice, this means that the result of calibration (CA) could be suboptimal. One of the challenges of CA is the large number of parameters involved in EESMs, which are therefore usually selected with the help of a preliminary sensitivity analysis (SA). Moreover, the computational burden of EESMs and the large volume of the search space make SA and CA very time-consuming processes.

This work applies a modern HPC approach to optimize a complex, over-parameterized hydrological model, improving the computational efficiency of SA/CA. We apply the derivative-free optimization algorithms implemented in the Facebook Nevergrad Python library (Rapin and Teytaud, 2018) on an HPC cluster, using the Dask framework (Dask Development Team, 2016).

The approach has been applied to the GEOtop hydrological model (Rigon et al., 2006; Endrizzi et al., 2014) to predict the time evolution of variables such as soil water content and evapotranspiration at several mountain agricultural sites in South Tyrol with different elevations, land covers (pasture, meadow, orchard) and soil types.

We performed simulations on one-dimensional domains, where the model solves the energy and water budget equations in a soil column and neglects lateral water fluxes. Even when neglecting the variation of parameters across soil layers and assuming a homogeneous column, there are tens of parameters controlling soil and vegetation properties, of which only a few are available from measurements.

Because the interpretation of global SA can be difficult or misleading, and the number of model evaluations needed by SA is comparable to that needed by CA, we adopted the following strategy. We performed CA using a full set of continuous parameters and then performed SA after CA, reusing the samples collected during CA, to interpret the results. Given the computational challenges mentioned above, however, this strategy is feasible only with HPC resources. For this reason, we focused on the computational aspects of calibration from an HPC perspective and examined the scaling of these algorithms and their implementation up to 1024 cores on a cluster. Other issues we had to address were the complex shape of the search space and the robustness of CA and SA against model convergence failures.
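Running SA after CA on samples that have already been collected can be illustrated with a simple given-data estimator of the first-order index, S_i = Var(E[Y | X_i]) / Var(Y). The estimator sorts the logged samples along one parameter and compares bin means. This is a minimal sketch under stated assumptions, not the estimator used by the authors. Samples gathered by an optimizer are concentrated in promising regions rather than spread over the whole parameter space, so indices computed this way describe sensitivity within the explored region and should be read with caution. Failed runs (non-finite objective values) are simply dropped. The bin count and the illustrative data are arbitrary choices.

```python
import math
from statistics import fmean, pvariance

def first_order_index(xs, ys, n_bins=10):
    """Given-data estimate of S_i = Var(E[Y|X_i]) / Var(Y) by binning X_i.

    xs, ys: paired samples, e.g. one parameter's values and the objective
    values logged during calibration. Pairs with non-finite ys are removed.
    """
    pairs = sorted(((x, y) for x, y in zip(xs, ys) if math.isfinite(y)),
                   key=lambda p: p[0])
    ys_ok = [y for _, y in pairs]
    total_var = pvariance(ys_ok)
    if total_var == 0:
        return 0.0
    n = len(pairs)
    # Equal-count bins along the sorted parameter axis (empty bins dropped).
    bins = [pairs[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]
    bins = [b for b in bins if b]
    grand = fmean(ys_ok)
    between = sum(len(b) * (fmean(y for _, y in b) - grand) ** 2 for b in bins) / n
    return between / total_var

# Illustrative samples: y depends only on x1; x2 cycles through 10 values.
x1 = [i / 999 for i in range(1000)]
x2 = [(i % 10) / 9 for i in range(1000)]
y = list(x1)
s1 = first_order_index(x1, y)  # close to 1
s2 = first_order_index(x2, y)  # close to 0
```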

HPC techniques make it possible to calibrate models with a large number of parameters within a reasonable computing time while exploring the parameter space properly. This is particularly important for noisy, multimodal objective functions. In our case, HPC was essential to determine the parameters controlling the water retention curve, which is highly nonlinear. The developed framework, which is published and freely available on GitHub, also shows how libraries and tools used in the machine learning community can be useful and easily adapted to EESM calibration.
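The retention curve's strong nonlinearity becomes clear from the van Genuchten (1980) form with m = 1 - 1/n, a common parametrization in hydrological models. The abstract does not state which retention model or units were calibrated, so the formula, parameter names and values below are assumptions for illustration. They show that modest changes in the shape parameter n strongly change the water content at a given suction.

```python
def van_genuchten_theta(h, theta_r, theta_s, alpha, n):
    """Volumetric water content [-] at pressure head h (h < 0 when unsaturated).

    theta(h) = theta_r + (theta_s - theta_r) / (1 + (alpha*|h|)**n)**m,
    with m = 1 - 1/n. alpha must be in the inverse units of h (e.g. 1/m with
    h in m) and n must exceed 1.
    """
    if n <= 1.0:
        raise ValueError("n must be greater than 1")
    if h >= 0.0:
        return theta_s  # saturated conditions
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * abs(h)) ** n) ** m

# Water content at h = -1 m for two shape parameters (illustrative values).
for n_shape in (1.2, 2.5):
    print(n_shape, round(van_genuchten_theta(-1.0, 0.05, 0.45, 2.0, n_shape), 3))
```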

How to cite: Bertoldi, G., Campanella, S., Cordano, E., and Sartori, A.: An empirical study on the GEOtop hydrological model optimal estimation and uncertainty reduction using supercomputers, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15768, https://doi.org/10.5194/egusphere-egu21-15768, 2021.

Spatial bootstrapping for model-free estimation of subcatchment parameter uncertainty for a semi-distributed rainfall runoff model
Everett Snieder and Usman Khan
Sabine M. Spiessl, Dirk-A. Becker, and Sergei Kucherenko

Because of their highly nonlinear, non-monotonic or even discontinuous behavior, sensitivity analysis of final repository models can be a demanding task. Repository model outputs typically span several orders of magnitude and are highly skewed. In a probabilistic investigation, many output values are very low or even zero. Although this is desirable from the standpoint of repository safety, it can distort the evidence from sensitivity analysis. For the safety assessment of the system, the highest output values matter most, and if there are only a few of them, their dependence on specific parameters may appear insignificant. Applying a transformation weights the model output values differently in the sensitivity analysis, according to their magnitude. Probabilistic methods of higher-order sensitivity analysis, applied to appropriately transformed model outputs, make a more robust identification of relevant parameters and their interactions possible. This type of sensitivity analysis is typically done by decomposing the total unconditional variance of the model output into partial variances corresponding to the terms of the ANOVA decomposition, from which sensitivity indices of increasing order can be computed. The indices used most often are the first-order index (SI1) and the total-order index (SIT). SI1 quantifies the individual impact of one parameter on the model output, whereas SIT represents the total effect of one parameter, including its interactions with all other parameters. The second-order sensitivity indices (SI2) describe the interactions between two model parameters.
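For reference, the variance-based quantities named above can be written as follows. The definitions assume independent input parameters X_1, ..., X_k, which is the standard Sobol'/ANOVA setting; dependent inputs require extended definitions. In the abstract's notation, SI1 corresponds to S_i, SI2 to S_ij and SIT to S_{T_i}. Here \mathbf{X}_{\sim i} denotes all parameters except X_i.

```latex
\begin{aligned}
f(X_1,\dots,X_k) &= f_0 + \sum_{i=1}^{k} f_i(X_i) + \sum_{i<j} f_{ij}(X_i,X_j) + \dots + f_{1\dots k}(X_1,\dots,X_k),\\
V = \operatorname{Var}(Y) &= \sum_{i=1}^{k} V_i + \sum_{i<j} V_{ij} + \dots + V_{1\dots k},\\
V_i &= \operatorname{Var}\big(\mathbb{E}[Y \mid X_i]\big), \qquad
V_{ij} = \operatorname{Var}\big(\mathbb{E}[Y \mid X_i, X_j]\big) - V_i - V_j,\\
S_i &= \frac{V_i}{V}, \qquad S_{ij} = \frac{V_{ij}}{V}, \qquad
S_{T_i} = \frac{\mathbb{E}\big[\operatorname{Var}(Y \mid \mathbf{X}_{\sim i})\big]}{V}
        = 1 - \frac{\operatorname{Var}\big(\mathbb{E}[Y \mid \mathbf{X}_{\sim i}]\big)}{V}.
\end{aligned}
```

The difference S_{T_i} - S_i measures how much of parameter i's influence acts through interactions. This is why SI1 alone can understate the relevance of parameters in strongly interacting models.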

In this work, global sensitivity analysis was performed with three kinds of output transformations (log, shifted and Box-Cox) and two metamodeling approaches, namely the Random-Sampling High Dimensional Model Representation (RS-HDMR) [1] and the Bayesian Sparse Polynomial Chaos Expansion (BSPCE) [2]. Both approaches are implemented in the SobolGSA software [3, 4], which was used in this work. We analyzed the time-dependent output with two sensitivity analysis approaches, the pointwise and the generalized approach. In the pointwise approach, the output at each time step is analyzed independently. The generalized approach considers the averaged output contributions at all previous time steps in the analysis of the current step. The results indicate that robustness can be improved by an appropriate choice of transformation and of the coefficients for the transformation and the metamodel.
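The transformations and the two ways of handling time-dependent output can be sketched as follows. This is an illustration under assumptions, not SobolGSA code. "Shifted" is taken here to mean the logarithm of the output plus a positive shift, and the zero floor of the plain log is an arbitrary choice. The partial variances V_i(t) and total variances V(t) are assumed to be already estimated, for example by a metamodel such as RS-HDMR or BSPCE. The generalized index uses the cumulative form S_i^gen(t_k) = sum_{l<=k} V_i(t_l) / sum_{l<=k} V(t_l).

```python
import math

def log_transform(y, floor=1e-30):
    """Natural logarithm; values below `floor` (e.g. exact zeros) are clipped first."""
    return [math.log(max(v, floor)) for v in y]

def shifted_log_transform(y, shift):
    """log(y + shift) with shift > 0 (assumed meaning of 'shifted')."""
    return [math.log(v + shift) for v in y]

def box_cox(y, lam, shift=0.0):
    """Box-Cox transform of z = y + shift: (z**lam - 1)/lam, or log(z) if lam == 0."""
    out = []
    for v in y:
        z = v + shift
        if z <= 0:
            raise ValueError("Box-Cox requires y + shift > 0")
        out.append(math.log(z) if lam == 0 else (z ** lam - 1.0) / lam)
    return out

def pointwise_indices(v_i, v_total):
    """S_i(t) = V_i(t) / V(t), each time step analyzed on its own."""
    return [a / b if b > 0 else 0.0 for a, b in zip(v_i, v_total)]

def generalized_indices(v_i, v_total):
    """Cumulative ratio sum_{l<=k} V_i(t_l) / sum_{l<=k} V(t_l) for every step k."""
    out, acc_i, acc = [], 0.0, 0.0
    for a, b in zip(v_i, v_total):
        acc_i += a
        acc += b
        out.append(acc_i / acc if acc > 0 else 0.0)
    return out
```

Because the generalized index sums variances before dividing, time steps with large output variance dominate it. The pointwise index, in contrast, can swing strongly at steps where the total variance is small.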

[1] M. Zuniga, S. Kucherenko, N. Shah (2013). Metamodelling with independent and dependent inputs. Computer Physics Communications, 184 (6): 1570-1580.

[2] Q. Shao, A. Younes, M. Fahs, T.A. Mara (2017). Bayesian sparse polynomial chaos expansion for global sensitivity analysis. Computer Methods in Applied Mechanics and Engineering, 318: 474-496.

[3] S. M. Spiessl, S. Kucherenko, D.-A. Becker, O. Zaccheus (2018). Higher-order sensitivity analysis of a final repository model with discontinuous behaviour. Reliability Engineering and System Safety, https://doi.org/10.1016/j.ress.2018.12.004.

[4] SobolGSA software (2021). User manual https://www.imperial.ac.uk/process-systems-engineering/research/free-software/sobolgsa-software/.

How to cite: Spiessl, S. M., Becker, D.-A., and Kucherenko, S.: Comprehensive global sensitivity analysis of a repository model using different types of transformations and metamodeling techniques, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6131, https://doi.org/10.5194/egusphere-egu21-6131, 2021.

Model structural and modeling decisions
Cristina Prieto, Dmitri Kavetski, Nataliya Le Vine, César Álvarez, and Raúl Medina

In hydrological modelling, identifying the model mechanisms best suited to represent individual hydrological (physical) processes is a major research and operational challenge. We present a statistical hypothesis-testing perspective to identify dominant hydrological mechanisms. The method combines: (i) Bayesian estimation of the posterior probabilities of individual mechanisms from a given ensemble of model structures; (ii) a test statistic that defines a “dominant” mechanism as a mechanism more probable than all its alternatives given the observed data; and (iii) a flexible modelling framework to generate model structures from combinations of available mechanisms. The uncertainty in the test statistic is approximated via bootstrap from the ensemble of model structures. Synthetic and real-data experiments are conducted using 624 model structures from the hydrological modelling system FUSE and data from the Leizarán catchment in northern Spain. The findings show that the mechanism identification method is reliable: it identifies the correct mechanism as dominant in all synthetic trials where an identification is made. As data and model errors increase, statistical power (identifiability) decreases, manifesting as trials where no mechanism is identified as dominant. The real-data case study results are broadly consistent with the synthetic analysis, with dominant mechanisms identified for 4 of 7 processes. Insights into which processes are most and least identifiable are also reported. The mechanism identification method is expected to contribute to broader community efforts to improve model identification and process representation in hydrology.
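The first two ingredients and the bootstrap step can be illustrated with a toy ensemble. In this minimal sketch, which is not the authors' implementation, each structure is a tuple giving the mechanism chosen for every process. Structure posteriors come from log marginal likelihoods under equal prior weights, and the posterior of a mechanism is the summed posterior of all structures that use it. The bootstrap counts how often each mechanism is strictly the most probable when structures are resampled with replacement. The evidence values, names and number of resamples are hypothetical.

```python
import math
import random
from collections import defaultdict

def mechanism_probs(ensemble, process):
    """Posterior probability of each mechanism option for one process.

    ensemble: list of (structure, log_evidence) pairs; structure is a tuple
    giving the chosen option for every process. Equal prior weight per
    ensemble member is assumed; logs are shifted by their maximum for stability.
    """
    top = max(le for _, le in ensemble)
    weights = [(s[process], math.exp(le - top)) for s, le in ensemble]
    total = sum(w for _, w in weights)
    probs = defaultdict(float)
    for option, w in weights:
        probs[option] += w / total
    return dict(probs)

def dominance_frequency(ensemble, process, n_boot=500, seed=0):
    """Share of bootstrap resamples (structures drawn with replacement) in which
    each option is strictly more probable than all of its alternatives."""
    rng = random.Random(seed)
    wins = defaultdict(int)
    for _ in range(n_boot):
        resample = [rng.choice(ensemble) for _ in ensemble]
        ranked = sorted(mechanism_probs(resample, process).items(),
                        key=lambda kv: kv[1], reverse=True)
        # A resample containing a single option counts that option as dominant.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            wins[ranked[0][0]] += 1
    return {option: k / n_boot for option, k in wins.items()}

# Toy ensemble: process 0 has options "A"/"B", process 1 has "x"/"y".
ensemble = [(("A", "x"), 0.0), (("A", "y"), -0.5),
            (("B", "x"), -5.0), (("B", "y"), -6.0)]
probs = mechanism_probs(ensemble, process=0)
freq = dominance_frequency(ensemble, process=0)
```

A real test would compare such bootstrap frequencies against a significance threshold, so that "no dominant mechanism" remains a possible outcome when the evidence is weak.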

How to cite: Prieto, C., Kavetski, D., Le Vine, N., Álvarez, C., and Medina, R.: Identification of dominant hydrological mechanisms using Bayesian inference, multiple statistical hypothesis testing and flexible models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3446, https://doi.org/10.5194/egusphere-egu21-3446, 2021.

Janneke Remmers, Ryan Teuling, and Lieke Melsen

Scientific hydrological modellers make multiple decisions during the modelling process, for example regarding the calibration period and the performance metrics, and these decisions affect the model results to different degrees. In this project, modelling decisions refer to the decisions made throughout the whole modelling process, not just in the definition of the model structure. Each model output is a hypothesis about reality: an interpretation of the real system underpinned by scientific reasoning and/or expert knowledge. Currently, there is a lack of knowledge and understanding about which modelling decisions are taken and why. Consequently, the influence of modelling decisions is unknown. Quantifying this influence, as done in this study, can raise awareness among scientists. The study is based on an analysis of interviews with scientific hydrological modellers and thus takes actual practices into account. The modelling decisions identified from the interviews are subsequently implemented and evaluated in a controlled modelling environment, in our case the modular modelling framework Raven. The variation in the results is analysed to determine which decisions affect the results and how. This study pinpoints which aspects are important to consider when studying modelling decisions, and can be an incentive to clarify and improve modelling procedures.
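One simple way to attribute variation in results to individual decisions is a full-factorial experiment followed by a main-effect variance decomposition. The sketch below assumes that each decision has a finite list of choices and that every combination can be run. A toy additive function stands in for a Raven simulation, and the decision names and scores are hypothetical. Interactions between decisions would show up as variance that no single main effect explains.

```python
from itertools import product
from statistics import fmean, pvariance

def main_effect_shares(options, run):
    """Run every combination of decision choices and return, per decision, the
    share of result variance explained by its main effect.

    options: dict mapping decision name -> list of choices
    run: callable taking {decision: choice} and returning a scalar result
    The full-factorial design is balanced, so each group mean averages over
    all combinations of the other decisions.
    """
    names = list(options)
    combos = [dict(zip(names, choice)) for choice in product(*options.values())]
    results = [run(c) for c in combos]
    total = pvariance(results)
    shares = {}
    for name in names:
        group_means = [fmean(r for c, r in zip(combos, results) if c[name] == choice)
                       for choice in options[name]]
        shares[name] = pvariance(group_means) / total if total > 0 else 0.0
    return shares

# Hypothetical decisions and a toy additive "model" standing in for Raven runs.
OPTIONS = {"calibration_period": ["wet", "dry"], "metric": ["NSE", "KGE"]}

def toy_run(decisions):
    base = {"wet": 0.8, "dry": 0.6}[decisions["calibration_period"]]
    return base + {"NSE": 0.0, "KGE": 0.05}[decisions["metric"]]

shares = main_effect_shares(OPTIONS, toy_run)
```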

How to cite: Remmers, J., Teuling, R., and Melsen, L.: The impact of modelling decisions in hydrological modelling, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2467, https://doi.org/10.5194/egusphere-egu21-2467, 2021.

Diana Spieler and Niels Schütze

Recent investigations have shown that it is possible to calibrate model structures and model parameters simultaneously to identify appropriate models for a given task (Spieler et al., 2020). However, this is computationally challenging, as different model structures may use different numbers of parameters. While some parameters may be shared between model structures, others might be relevant for only a few structures, which in theory requires the calibration of conditionally active parameters. Additionally, shared parameters may have different effects in different model structures, so their optimal values can differ across structures. In this study, we tested how two current “off-the-shelf” mixed-integer optimization algorithms perform when handling these peculiarities during the automatic model structure identification (AMSI) process recently introduced by Spieler et al. (2020).
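A common way to give a mixed-integer optimizer a fixed-length search vector despite conditionally active parameters is to include every parameter of the whole model space and ignore those the chosen structure does not use. The sketch below illustrates that encoding only. The process names, options, parameters and bounds are hypothetical and are not taken from AMSI or its model space. The parameter `smax` is shared by both soil options, mirroring the shared parameters discussed above.

```python
import random

# Hypothetical model space: an integer option per process, each option listing
# the parameters it activates. Names and bounds are illustrative only.
MODEL_SPACE = {
    "snow": {0: [], 1: ["ddf", "t_melt"]},
    "soil": {0: ["smax"], 1: ["smax", "beta"]},
    "routing": {0: ["k_fast"], 1: ["k_fast", "k_slow"]},
}
PARAM_BOUNDS = {
    "ddf": (1.0, 6.0), "t_melt": (-2.0, 2.0), "smax": (50.0, 500.0),
    "beta": (0.5, 5.0), "k_fast": (0.1, 0.9), "k_slow": (0.001, 0.1),
}

def random_vector(rng):
    """Sample [option per process..., value for every parameter...]."""
    options = [rng.choice(list(opts)) for opts in MODEL_SPACE.values()]
    values = [rng.uniform(lo, hi) for lo, hi in PARAM_BOUNDS.values()]
    return options + values

def decode(vector):
    """Split a flat vector into the structure and its active parameters only."""
    n_proc = len(MODEL_SPACE)
    structure = dict(zip(MODEL_SPACE, (int(v) for v in vector[:n_proc])))
    values = dict(zip(PARAM_BOUNDS, vector[n_proc:]))
    active = {p for proc, opt in structure.items() for p in MODEL_SPACE[proc][opt]}
    return structure, {p: v for p, v in values.items() if p in active}

structure, params = decode(random_vector(random.Random(42)))
```

Inactive entries still occupy dimensions of the search space, which is one reason why this simple encoding can waste optimization effort.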

To validate the current performance of the AMSI approach, we conduct a benchmark experiment with a model space of 6912 different model structures. First, all model structures are independently calibrated and validated for three hydro-climatically different catchments using the CMA-ES algorithm and the KGE as the objective function; we refer to this as the standard calibration procedure. We identify the best-performing model structure(s) based on validation performance and analyze the range of performance as well as the number of structures performing in a similar range. Second, we run AMSI on all three catchments to automatically identify the most suitable model structure based on KGE performance. Two mixed-integer optimization algorithms are used: DDS and CMA-ES. Afterwards, we compare the results to the best-performing models of the standard calibration of all 6912 model structures.
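For reference, the KGE (Gupta et al., 2009) combines correlation, variability ratio and bias ratio into one score, and a perfect simulation scores 1. The minimal standard-library sketch below uses population statistics throughout. It assumes neither series is constant and that the observed mean is non-zero, and it is not necessarily the exact variant used in the study. An optimizer that minimizes would typically work with 1 - KGE.

```python
import math
from statistics import fmean, pstdev

def kge(simulated, observed):
    """Kling-Gupta efficiency:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r the Pearson correlation, alpha = sd_sim / sd_obs and
    beta = mean_sim / mean_obs.
    """
    mu_s, mu_o = fmean(simulated), fmean(observed)
    sd_s, sd_o = pstdev(simulated), pstdev(observed)
    cov = fmean((s - mu_s) * (o - mu_o) for s, o in zip(simulated, observed))
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = [3.0, 5.0, 4.0, 8.0, 6.0]   # illustrative values only
score = kge([2 * o for o in obs], obs)  # r = 1, alpha = 2, beta = 2
```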

Within this experimental setup, we analyze whether the best-performing model structure(s) identified by AMSI are identical to the best-performing structures of the standard calibration, and whether performance differs when different optimization algorithms are used for AMSI. We also test whether AMSI, using “off-the-shelf” mixed-integer optimization algorithms, can identify the best-performing model structures for a catchment at a fraction of the computational cost of the standard calibration procedure.

Spieler, D., Mai, J., Craig, J. R., Tolson, B. A., & Schütze, N. (2020). Automatic Model Structure Identification for Conceptual Hydrologic Models. Water Resources Research, 56(9). https://doi.org/10.1029/2019WR027009

How to cite: Spieler, D. and Schütze, N.: How good does Automatic Model Structure Identification work? A Benchmark Study with 6915 Model Structures., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12232, https://doi.org/10.5194/egusphere-egu21-12232, 2021.

Gowri Reghunath and Pradeep Mujumdar

The hydrological cycle is governed by a number of complex processes that occur at different spatial and temporal scales. Hydrological modelling plays an integral role in improving the understanding of hydrological behaviour and process complexity across a range of scales. Different hydrological models have different strengths in representing hydrological processes, and the performance and applicability of each model can differ between catchments depending on catchment characteristics and dominant hydrological processes. Given the wide variety of model structures, it is important to evaluate how different hydrological models capture process dynamics in various catchments. This study aims at a comprehensive evaluation of the performance of two widely used hydrological models, namely the HEC-Hydrologic Modeling System (HEC-HMS) and the Variable Infiltration Capacity (VIC) model, in simulating various water balance components in the sub-catchments of the Cauvery River Basin, a major river basin in Peninsular India. The basin is characterized by extensive regional variability in land use patterns, water availability and water demands. The chosen models differ in the complexity of their model structures, the methods adopted to simulate water balance components, and the representation of geographical information and of meteorological and physiographical inputs. The models are calibrated against observed streamflow at various gauge locations, and simulated water balance components such as evapotranspiration and baseflow are assessed at annual and seasonal time scales. The impact of the spatial representation of input variables and model parameters (lumped versus distributed) is also evaluated for both models. This work provides insights into the applicability of different hydrological models for simulating hydrological processes in catchments with high regional complexity. It also aids in identifying effective models and model parameters, which can be useful for hydrological data transfer between catchments as well as for predictions in ungauged basins.
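Assessing simulated components at annual and seasonal scales mostly amounts to aggregating daily output and comparing totals. The sketch below is illustrative only. The season definition (June to September as monsoon) is an assumption that the abstract does not state, and the reference series could be another model or an independent estimate. Percent bias is defined here as positive for overestimation, although sign conventions vary in the literature.

```python
from collections import defaultdict
from datetime import date

# Hypothetical season definition: June-September as monsoon, the rest non-monsoon.
SEASON_OF_MONTH = {m: ("monsoon" if 6 <= m <= 9 else "non-monsoon") for m in range(1, 13)}

def annual_totals(dates, values):
    """Sum daily values (e.g. evapotranspiration in mm/day) per calendar year."""
    totals = defaultdict(float)
    for d, v in zip(dates, values):
        totals[d.year] += v
    return dict(totals)

def seasonal_totals(dates, values, season_of_month=SEASON_OF_MONTH):
    """Sum daily values per (year, season)."""
    totals = defaultdict(float)
    for d, v in zip(dates, values):
        totals[(d.year, season_of_month[d.month])] += v
    return dict(totals)

def percent_bias(simulated, reference):
    """100 * (sum(sim) - sum(ref)) / sum(ref); positive means overestimation."""
    return 100.0 * (sum(simulated) - sum(reference)) / sum(reference)

days = [date(2001, 1, 1), date(2001, 7, 1), date(2002, 7, 1)]  # illustrative
et = [1.0, 2.0, 3.0]
```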

How to cite: Reghunath, G. and Mujumdar, P.: A comparative assessment of HEC-HMS and VIC hydrological models for simulating hydrological processes in Cauvery River Basin, India, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14161, https://doi.org/10.5194/egusphere-egu21-14161, 2021.