Multi-dataset, multi-variable, and multi-objective techniques to improve prediction of hydrological and water quality models and their Bayesian applications



Our invited speaker is Lucy Marshall from the University of New South Wales, Sydney, Australia.

The use of multiple datasets and multi-objective functions has been shown to improve the performance of hydrologic and water quality models by extracting complementary information from multiple data sources or multiple features of modelled variables. This is useful if more than one variable (e.g., runoff and snow cover, sediment or pollutant concentration) or more than one characteristic of the same variable (e.g., flood peaks and recession curves) is of interest.
Similarly, a multi-model approach can overcome shortcomings of individual models, while testing a model at multiple scales helps to improve our understanding of how the model functions in relation to catchment processes. Finally, quantifying multiple uncertainty sources makes it possible to identify their individual contributions, which is critical for uncertainty reduction and decision making under uncertainty. In this respect, Bayesian approaches emerge as very powerful tools for comprehensive handling of uncertainty in data, model structure and parameters.

This session welcomes contributions that apply one or more of the multi-aspects in hydrologic and water quality studies. In particular, we seek studies covering the following issues:
• Frameworks using multi-objectives or multi-variables to improve the identification (prediction) of hydrologic or water quality models
• Studies using multi-model or multiple-data-driven approaches
• Use of multiple scales or sites to improve understanding of catchment processes
• Assimilation of remotely sensed data or use of multi-datasets to improve model identification
• Hypothesis testing with one of the multi-aspects
• Metaheuristics (e.g., Monte Carlo) or Bayesian approaches in combination with multi-aspects of model identification
• Techniques to optimize model calibration or uncertainty quantification via multi-aspect analyses
• Studies handling multiple uncertainty sources in a modelling framework
• Bayesian applications to address the problem of scaling (e.g. disparity between process, observations, model resolution and predictions) through hierarchical models
• Bayesian approaches to model water quality in data sparse environments
• Applications of Bayesian Belief Networks as decision support tools
• Application of machine learning and data mining approaches to learn from large, multiple or high-resolution data sets.

Convener: Anna E. Sikorska-Senoner | Co-conveners: Miriam Glendell, David C. Finger, Alberto Montanari, Ibrahim Alameddine, Lorenz Ammann, James E. Sample
vPICO presentations | Wed, 28 Apr, 09:00–10:30 (CEST)

Chairpersons: Anna E. Sikorska-Senoner, Alberto Montanari, Lorenz Ammann
Multiple techniques in hydrological models
Lucy Marshall

The latest generation of integrated hydrologic models provides new opportunities to better understand and hypothesize about the connections between hydrological, ecological and energy transfer processes across a range of scales. Parallel to this has been unprecedented growth in new technologies to observe components of Earth’s biophysical system through satellite remote sensing or on-the-ground instruments. However, along with growth in available data and advanced modelling platforms comes a challenge to ensure models are representative of catchment systems and are not unrealistically confident in their predictions. Many hydrologic and ecosystem variables are measured infrequently, measured with significant error, or are measured at a scale different to their representation in a model. In fact, the modelled variable of interest is frequently not directly observed but inferred from surrogate measurements. This introduces errors in model calibration that will affect whether our models are representative of the systems we seek to understand.

In recent years, Bayesian inference has emerged as a powerful tool in the environmental modeler’s toolbox, providing a convenient framework in which to model parameter and observational uncertainties. The Bayesian approach is ideal for multivariate model calibration, by defining proper prior distributions that can be considered analogous to the weighting often prescribed in traditional multi-objective calibration. 

In this study, we develop a multi-objective Bayesian approach to hydrologic model inference that explicitly capitalises on a priori knowledge of observational errors to improve parameter estimation and uncertainty estimation. We introduce a novel error model, which partitions observation and model residual error according to prior knowledge of the estimated uncertainty in the calibration data. We demonstrate our approach in two case studies: an ecohydrologic model where we make use of the known uncertainty in satellite retrievals of Leaf Area Index (LAI), and a water quality model using turbidity as a proxy for Total Suspended Solids (TSS). Overall, we aim to demonstrate the need to properly account for known observational errors in hydrologic model calibration.
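One simple way to picture an error model that separates known observation error from model residual error is a Gaussian likelihood whose total variance is the sum of the two. The sketch below is only an illustration under that assumption (independent Gaussian errors, additive variances); the function and argument names are hypothetical and this is not the error model used in the study.

```python
import math

def log_likelihood(obs, sim, obs_sd, model_sd):
    """Gaussian log-likelihood with total variance = obs_sd[i]**2 + model_sd**2.

    obs_sd holds the (assumed known) observation error of each data point;
    model_sd is the unknown model residual error that would be inferred.
    """
    total = 0.0
    for y, m, s in zip(obs, sim, obs_sd):
        var = s * s + model_sd * model_sd
        total += -0.5 * (math.log(2.0 * math.pi * var) + (y - m) ** 2 / var)
    return total

# Hypothetical LAI-like observations with point-wise retrieval uncertainty.
obs = [2.1, 3.4, 1.8]
sim = [2.0, 3.0, 2.0]
obs_sd = [0.1, 0.5, 0.2]
print(log_likelihood(obs, sim, obs_sd, model_sd=0.3))
```

With this form, a data point with a large known observation error penalises a given misfit less than a precisely observed one, which is the intuition behind weighting calibration data by their known uncertainty.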

How to cite: Marshall, L.: Incorporating observational errors in multivariate hydrologic model calibration: the value in known unknowns, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10509, https://doi.org/10.5194/egusphere-egu21-10509, 2021.

Silja Stefnisdóttir, Anna E. Sikorska-Senoner, Eyjólfur I. Ásgeirsson, and David C. Finger

Hydrological models are crucial components of water and environmental resource management, providing simulations of streamflow, snow cover, and glacier mass balances. Effective model calibration is challenging, however, especially if a multi-objective or multi-dataset calibration is necessary to generate realistic simulations of all flow components under consideration.

In this study, we explore the value of metaheuristics for multi-dataset calibration to simulate streamflow, snow cover and glacier mass balances using the HBV model in the glaciated catchment of the Rhonegletscher in Switzerland. We evaluate the performance of three metaheuristic calibration methods, i.e., Monte Carlo (MC), Simulated Annealing (SA) and Genetic Algorithms (GA), with regard to these three datasets. For all three methods, we compare the model performance using the 100 best and the 10 best optimized parameter sets.

Our results demonstrate that all three metaheuristic methods can generate realistic simulations of the snow cover, the glacier mass balance and the streamflow. The comparison of these three methods reveals that GA provides the most accurate simulations (with the narrowest confidence intervals) for all three datasets, for both the 100 and the 10 best simulations. However, when all 100 simulations are used, GA also yields some of the worst solutions, which are eliminated if only the 10 best solutions are considered.

Based on our results, we conclude that GA-based multi-dataset calibration provides more accurate and more precise simulations than MC or SA. This conclusion is supported by a reduction of the parameter equifinality and an improvement of the Pareto frontier for GA in comparison to the other two metaheuristic methods. This method should therefore lead to more reproducible and consistent hydrological simulations.
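The simplest of the three methods, Monte Carlo calibration against several datasets, can be sketched as follows. This is a toy illustration only: the three objective functions are hypothetical stand-ins for efficiency scores on streamflow, snow cover and glacier mass balance, the equal-weight averaging is an assumption, and no HBV model is involved.

```python
import random

def monte_carlo_calibrate(objectives, bounds, n_samples=2000, n_best=100, seed=42):
    """Sample parameters uniformly and keep the n_best sets by mean score."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_samples):
        params = [rng.uniform(low, high) for low, high in bounds]
        # Equal-weight average over datasets: an assumed aggregation rule.
        score = sum(f(params) for f in objectives) / len(objectives)
        scored.append((score, params))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n_best]

# Toy stand-ins for efficiency on streamflow, snow cover and glacier mass balance.
objectives = [
    lambda p: 1.0 - (p[0] - 0.3) ** 2,
    lambda p: 1.0 - (p[1] - 0.7) ** 2,
    lambda p: 1.0 - (p[0] + p[1] - 1.0) ** 2,
]
bounds = [(0.0, 1.0), (0.0, 1.0)]

best_100 = monte_carlo_calibrate(objectives, bounds, n_best=100)
best_10 = best_100[:10]
print(best_10[0])
```

Comparing the spread of simulations from `best_100` and `best_10` mirrors the 100-best versus 10-best comparison described in the abstract.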

How to cite: Stefnisdóttir, S., Sikorska-Senoner, A. E., Ásgeirsson, E. I., and Finger, D. C.: Advantages of metaheuristics for multi-dataset calibration of hydrological models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1395, https://doi.org/10.5194/egusphere-egu21-1395, 2021.

Georgia Papacharalampous, Hristos Tyralis, Demetris Koutsoyiannis, and Alberto Montanari

Probabilistic hydrological modelling methodologies often comprise two-stage post-processing schemes, thereby allowing the exploitation of the information provided by conceptual or physically-based rainfall-runoff models. They might also require issuing an ensemble of rainfall-runoff model simulations by using the rainfall-runoff model with different input data and/or different parameters. For obtaining a large number of rainfall-runoff model parameters in this regard, Bayesian schemes can be adopted; however, such schemes are accompanied by computational limitations (that are well-recognized in the literature). Therefore, in this work, we investigate the replacement of Bayesian rainfall-runoff model calibration schemes by computationally convenient non-Bayesian schemes within probabilistic hydrological modelling methodologies of the above-defined family. For our experiments, we use a methodology of this same family that is additionally characterized by the following distinguishing features: It (a) is in accordance with a theoretically consistent blueprint, (b) allows the exploitation of quantile regression algorithms (which offer larger flexibility than parametric models), and (c) has been empirically proven to harness the “wisdom of the crowd” in terms of average interval score. We also use a parsimonious conceptual rainfall-runoff model and 50-year-long monthly time series observed in 270 catchments in the United States to apply and compare 12 variants of the selected methodology. Six of these variants simulate the posterior distribution of the rainfall-runoff model parameters (conditional on the observations of a calibration period) within a Bayesian Markov chain Monte Carlo framework (first category of variants), while the other six variants use a simple computationally efficient approach instead (second category of variants). 
Six indicative combinations of the remaining components of the probabilistic hydrological modelling methodology (i.e., its post-processing scheme and its error model) are examined, each being used in one variant from each of the above-defined categories. In this specific context, the two large-scale calibration schemes (each representing a different “modelling culture” in our tests) are compared using proper scores and large-scale benchmarking. Overall, our findings suggest that the compared “modelling cultures” can lead to mostly equally good probabilistic predictions.
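For readers unfamiliar with the interval score mentioned above, the following minimal sketch computes it for a central (1 − alpha) prediction interval: the interval width plus a penalty of 2/alpha times the distance by which the observation falls outside the interval. The function names are illustrative and the code is not taken from the study.

```python
def interval_score(lower, upper, obs, alpha=0.1):
    """Interval score of a central (1 - alpha) prediction interval (lower is better)."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - obs, 0.0)  # penalty if obs < lower
    above = (2.0 / alpha) * max(obs - upper, 0.0)  # penalty if obs > upper
    return width + below + above

def mean_interval_score(lowers, uppers, observations, alpha=0.1):
    """Average interval score over a series of predictions."""
    scores = [
        interval_score(lo, up, y, alpha)
        for lo, up, y in zip(lowers, uppers, observations)
    ]
    return sum(scores) / len(scores)

print(interval_score(1.0, 3.0, 2.0))  # observation inside the interval
print(interval_score(1.0, 3.0, 4.0))  # observation above the interval
```

Narrow intervals are rewarded, but only as long as they still cover the observations.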

How to cite: Papacharalampous, G., Tyralis, H., Koutsoyiannis, D., and Montanari, A.: Large-scale calibration of conceptual rainfall-runoff models for two-stage probabilistic hydrological post-processing, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-18, https://doi.org/10.5194/egusphere-egu21-18, 2021.

Omar Cenobio-Cruz, Anaïs Barella-Ortiz, Pere Quintana-Seguí, and Luis Garrote

The SASER (SAFRAN-SURFEX-Eaudyssée-RAPID) hydrological modeling chain is a physically based, distributed hydrological model that has been implemented over two domains: Iberia and the Pyrenees. Currently, it is used for drought studies (HUMID project) and water resources analysis (PIRAGUA project).

In this modeling chain, SAFRAN provides the meteorological forcing, SURFEX is the land surface model (LSM) that performs the water and energy balances, and Eaudyssée-RAPID simulates daily streamflow. SAFRAN and SURFEX are run at a spatial resolution of 5 km for the Iberian implementation and 2.5 km for the Pyrenean one. Daily streamflow is calculated by the RAPID river routing scheme at a spatial resolution of 1 km in both cases. SAFRAN analyzes daily observed precipitation, which is then interpolated to the hourly scale; relative humidity is currently used to distribute the daily precipitation over the hours of the day.

SASER simulates streamflow adequately in the Ebro basin (KGE > 0.5 at 62% of near-natural gauging stations when the LSM is run at a spatial resolution of 2.5 km). However, due to the lack of a hydrogeological model, low flows are often poorly reproduced by this scheme. Furthermore, peak flows could also be improved.
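The Kling-Gupta efficiency (KGE) quoted above combines correlation, variability ratio and bias ratio into one score, with 1 indicating a perfect match. The sketch below uses the commonly cited 2009 formulation; the abstract does not state which KGE variant was used, so this choice is an assumption.

```python
import math

def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

observed = [1.0, 2.0, 3.0, 4.0]
print(kge(observed, observed))                    # perfect simulation
print(kge(observed, [2.0 * q for q in observed])) # doubled flows
```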

This work aims to improve the simulation of high and low flows by correcting the hourly distribution of precipitation and by adding linear reservoirs.

Increasing the spatial resolution from 5 to 2.5 km has notably improved peak flows. However, most peak flows are still underestimated. One way of improving simulated streamflow is to improve the hourly distribution of precipitation, as SAFRAN distributes precipitation through the day with unrealistically low hourly intensities. This affects runoff generation and, thus, peak flow. We have used two ERA-Interim driven RCM simulations from the CORDEX project to improve the hourly distribution of precipitation. As a result, we now produce more realistic temporal patterns of hourly precipitation.

The current SASER implementation is not able to sustain low flows. A physically based solution (a hydrogeological model) would be desirable, but as it is difficult to implement, we chose to introduce a linear reservoir, following Artinyan et al. (2008) and Getinara et al. (2014). The reservoir is able to improve low flows in most near-natural subbasins. The challenge now is how to set its parameters in non-natural basins.
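A linear reservoir stores incoming water and releases it at a rate proportional to its storage, which produces the slow exponential recession needed to sustain low flows. The following is a generic sketch with an explicit Euler step; the parameter name `k` and the way the reservoir would be coupled to the land surface model drainage are assumptions, as the abstract does not describe them.

```python
def linear_reservoir(recharge, k, storage0=0.0, dt=1.0):
    """Route recharge through dS/dt = R - k*S with outflow Q = k*S.

    Uses an explicit Euler step, so k*dt should stay below 1.
    """
    storage = storage0
    outflow = []
    for r in recharge:
        q = k * storage           # outflow proportional to storage
        storage += (r - q) * dt   # water balance of the reservoir
        outflow.append(q)
    return outflow

# A single recharge pulse followed by dry days gives an exponential recession.
print(linear_reservoir([10.0, 0.0, 0.0, 0.0], k=0.5))
```

A smaller `k` gives a slower recession and hence higher sustained low flows for the same recharge.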

How to cite: Cenobio-Cruz, O., Barella-Ortiz, A., Quintana-Seguí, P., and Garrote, L.: Improvement of the simulation of high and low flows in the LSM based hydrological modeling chain SASER applied to the Ebro river basin, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7779, https://doi.org/10.5194/egusphere-egu21-7779, 2021.

John M. Quilty and Anna E. Sikorska-Senoner

Despite significant efforts to improve the calibration of hydrological models, when applied to real-world case studies, model errors (residuals) remain. These residuals impair flow estimates and can lead to unreliable design, management, and operation of water resources systems. Since these residuals are auto-correlated, they should be treated with appropriate methods that do not require limiting assumptions (e.g., that the residuals follow a Gaussian distribution).

This study introduces a novel data-driven framework to account for residuals of hydrological models. Our framework relies on a conceptual-data-driven approach (CDDA) that integrates two models, i.e., a hydrological model (HM) with a data-driven (i.e., machine learning) model (DDM), to simulate an ensemble of residuals from the HM. In the first part of the CDDA, an HM is used to generate an ensemble of streamflow simulations for different parameter sets. Afterwards, residuals associated with each simulation are computed and a DDM is developed to predict the residuals. Finally, the original streamflow simulations are coupled with the DDM predictions to produce the CDDA output, an improved ensemble of streamflow simulations. The proposed CDDA is a useful approach since it respects hydrological processes via the HM and it profits from the DDM’s ability to simulate the complex (nonlinear) relationship between residuals and input variables.
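The three CDDA steps described above (simulate, model the residuals, add the predicted residuals back) can be sketched in a few lines. To keep the example self-contained, the data-driven model here is a one-predictor least-squares line, which is a deliberate simplification swapped in for the machine learning models of the study; all names are hypothetical, and the fit is done in-sample, whereas a real application would train and evaluate on separate periods.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def cdda_correct(obs, sims, predictor):
    """Correct each ensemble member with its own residual model."""
    corrected = []
    for sim in sims:
        residuals = [o - s for o, s in zip(obs, sim)]      # step 2: HM residuals
        slope, intercept = fit_line(predictor, residuals)  # step 2: stand-in DDM
        corrected.append(                                  # step 3: HM + DDM
            [s + slope * p + intercept for s, p in zip(sim, predictor)]
        )
    return corrected

# Toy data: two HM simulations (step 1) and precipitation as the predictor.
precip = [0.0, 2.0, 4.0, 1.0]
obs = [2.0, 4.0, 6.0, 5.5]
sims = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
print(cdda_correct(obs, sims, precip))
```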

To explore the utility of CDDA, we focus principally on identifying the best DDM and input variables to mimic HM residuals. For this purpose, we have explored eight different DDM variants and multiple input variables (observed precipitation, air temperature, and streamflow) at different lag times prior to the simulation day. Based on a case study involving three Swiss catchments, the proposed CDDA framework is shown to be very promising at improving ensemble streamflow simulations, reducing the mean continuous ranked probability score by 16–29% when compared to the standalone HM. It was found that eXtreme Gradient Boosting (XGB) and Random Forests (RF), each using 29 input variables, were the strongest predictors of the HM residuals. However, similar performance could be achieved by selecting only the six most important (of the original 29) input variables and re-training the XGB and RF models.
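The continuous ranked probability score (CRPS) used above measures how well an ensemble matches a single observation; for a one-member ensemble it reduces to the absolute error. A minimal sketch of the standard empirical estimator is given below. The abstract does not say which estimator was used, so this is an assumption.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS: mean |x_i - y| minus half the mean |x_i - x_j|."""
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return accuracy - spread

def mean_crps(ensembles, observations):
    """Average CRPS over time steps (lower is better)."""
    scores = [crps_ensemble(e, y) for e, y in zip(ensembles, observations)]
    return sum(scores) / len(scores)

print(crps_ensemble([1.0, 3.0], 2.0))
```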

Additional experimentation shows that by converting CDDA to a stochastic framework (i.e., to account for important uncertainty sources), significant gains in model performance can be achieved.

How to cite: Quilty, J. M. and Sikorska-Senoner, A. E.: Learning from one’s errors: A data-driven approach for mimicking an ensemble of hydrological model residuals, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13244, https://doi.org/10.5194/egusphere-egu21-13244, 2021.

Carolina Natel de Moura, João Marcos Carvalho, and Jan Seibert

Global meteorological and hydrological datasets have become increasingly available in the past few decades, marked by an increase in the number of large datasets, often including hundreds of catchments. These datasets bring two main advantages: the ability to perform hydrological modeling over a large number of catchments with different hydroclimatic characteristics, which leads to more robust hypothesis testing, and the ability to address the uncertainties related to the hydrological model input data. However, their added value for hydrological modeling is not yet fully understood. The main questions surrounding the use of multi-source and large-scale datasets concern how much value these datasets add to the performance of hydrological models: how different are these datasets, how accurate are they, and does their use result in similar or rather different hydrological simulations? Other questions are how we can better combine them for improved predictions, and what the average uncertainty of the input datasets in hydrological modeling is. We aim here to investigate these issues using Brazilian catchments as case studies. The Brazilian hydrometeorological network has several issues to overcome, such as an unevenly distributed spatial network resulting in data-scarce areas, a large amount of missing data, and the lack of standardized and transparent quality analysis. In this study, we used a national hydrometeorological dataset (CAMELS-BR) along with several global forecast and reanalysis meteorological datasets, such as the CFSv2 and ECMWF, for streamflow prediction using the data-driven Long Short-Term Memory (LSTM) model. Initial results indicate that the calibration of a recurrent neural network clearly depends on the data source. Moreover, the tested global meteorological products are found to be suitable for hydrological modeling. The combination of different data sources in the hydrological model seems to be beneficial, especially in areas where ground-level gauge stations are scarce.

How to cite: Natel de Moura, C., Carvalho, J. M., and Seibert, J.: Value of multi-source dataset for hydrological catchment modeling, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7138, https://doi.org/10.5194/egusphere-egu21-7138, 2021.

Moctar Dembélé, Bettina Schaefli, and Grégoire Mariéthoz

The diversity of remotely sensed or reanalysis-based rainfall data is steadily increasing, which on the one hand opens new perspectives for large-scale hydrological modelling in data-scarce regions, but on the other hand poses challenging questions regarding parameter identification and transferability under multiple input datasets. This study analyzes the variability of hydrological model performance when (1) a set of parameters is transferred from the calibration input dataset to a different meteorological dataset and, conversely, when (2) an input dataset is used with a parameter set originally calibrated for a different input dataset.

The research objective is to highlight the uncertainties related to input data and the limitations of hydrological model parameter transferability across input datasets. An ensemble of 17 rainfall datasets and 6 temperature datasets from satellite and reanalysis sources (Dembélé et al., 2020), corresponding to 102 combinations of meteorological data, is used to force the fully distributed mesoscale Hydrologic Model (mHM). The mHM model is calibrated for each combination of meteorological datasets, thereby resulting in 102 calibrated parameter sets, almost all of which give similar model performance. Each of the 102 parameter sets is used to run the mHM model with each of the 102 input datasets, yielding 10404 scenarios that serve for the transferability tests. The experiment is carried out for a decade from 2003 to 2012 in the large and data-scarce Volta River basin (415,600 km²) in West Africa.
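The scenario counts quoted above follow directly from the experimental design: 17 × 6 forcing combinations, each calibrated once and then crossed with every forcing combination. The short sketch below enumerates them; the dataset labels are placeholders, not the actual product names.

```python
from itertools import product

# Placeholder labels for the 17 rainfall and 6 temperature datasets.
rainfall = [f"P{i:02d}" for i in range(1, 18)]
temperature = [f"T{i}" for i in range(1, 7)]

# Each forcing combination yields one calibrated parameter set.
forcings = list(product(rainfall, temperature))

# Every parameter set (indexed by its calibration forcing) is run with every forcing.
scenarios = [(calibrated_on, run_with) for calibrated_on in forcings for run_with in forcings]
own_forcing = [s for s in scenarios if s[0] == s[1]]

print(len(forcings), len(scenarios), len(own_forcing))
```

The 102 scenarios in `own_forcing` are the ordinary calibration runs; the remaining ones are the transfer experiments.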

The results show that there is a high variability in model performance for streamflow (mean CV=105%) when the parameters are transferred from the original input dataset to other input datasets (test 1 above). Moreover, the model performance is in general lower and can drop considerably when parameters obtained under all other input datasets are transferred to a selected input dataset (test 2 above). This underlines the need for model performance evaluation when input datasets and parameter sets other than those used during calibration are used to run a model. Our results represent a first step in tackling the question of parameter transferability to climate change scenarios. An in-depth analysis of the results at a later stage will shed light on which model parameterizations might be the main source of performance variability.
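The coefficient of variation (CV) quoted above expresses the spread of model performance relative to its mean, so a value above 100% means the standard deviation exceeds the mean. A minimal sketch follows; whether the study used the population or sample standard deviation is not stated, and the population form is assumed here.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by |mean|, in percent."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.pstdev(values) / abs(mean)

# Hypothetical performance scores of one parameter set under several forcings.
print(coefficient_of_variation([1.0, 3.0]))
```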

Dembélé, M., Schaefli, B., van de Giesen, N., & Mariéthoz, G. (2020). Suitability of 17 rainfall and temperature gridded datasets for large-scale hydrological modelling in West Africa. Hydrology and Earth System Sciences (HESS). https://doi.org/10.5194/hess-24-5379-2020

How to cite: Dembélé, M., Schaefli, B., and Mariéthoz, G.: Parameter transferability between multiple gridded input datasets challenges hydrological model performance under changing climate, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8782, https://doi.org/10.5194/egusphere-egu21-8782, 2021.

Multiple techniques in water quality & groundwater models
Alban de Lavenne, Göran Lindström, Johan Strömqvist, Charlotta Pers, Alena Bartosova, and Berit Arheimer

Conceptual hydrological models can move towards process-oriented modelling when addressing broader issues than flow modelling alone. For instance, water quality modelling generally requires understanding of pathways and travel times. However, conceptual modelling often relies on a calibration procedure of discharge at the outlet, which aggregates all processes at the catchment scale. As the number of parameters increases, such an approach can lead to model over-parametrisation issues. In this study we tested if adding a second kind of observation, specifically sediment data, can help distinguish surface runoff from total discharge. This new constraint relies on the hypothesis that in-stream sediment concentrations are strongly influenced by surface runoff (through erosion and remobilisation). We tested our hypothesis by applying a multi-objective calibration (simulation performance on discharge and suspended sediment) to the World-Wide HYPE hydrological model (WWH) and we used this framework to evaluate new surface flow modelling routines. We gathered data for 111 catchments spread over the USA where both discharge and sediment observations were available at a daily time step at locations suitable for WWH.

Results show that, in comparison to a single-objective calibration on discharge, this multi-objective calibration enables a significant improvement in the simulation performance of suspended sediments without a significant impact on the performance of discharge. This illustrates the benefits of multi-objective calibration rather than using two calibrations made one after the other. In addition, this evaluation framework highlights the advantage of a new process description for surface runoff in the WWH model that relates soil moisture conditions to surface runoff ratio. The new surface runoff routine resulted in similar discharge performances as the original one but with fewer parameters, which reduces equifinality and can prevent inadequate model complexity in data-poor areas.
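One common way to inspect the trade-off between two calibration objectives, such as discharge and suspended sediment performance, is to keep only the non-dominated (Pareto-optimal) parameter sets. The sketch below illustrates that idea with made-up scores; the abstract does not state how the two objectives were combined, so this is not a description of the method actually used.

```python
def pareto_front(points):
    """Return the points not dominated by any other (both objectives maximised)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (discharge score, sediment score) pairs for four parameter sets.
scores = [(0.8, 0.2), (0.7, 0.6), (0.5, 0.5), (0.3, 0.9)]
print(pareto_front(scores))
```

A parameter set such as (0.7, 0.6) gives up little discharge performance for a much better sediment score, which is the kind of compromise a multi-objective calibration looks for.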

How to cite: de Lavenne, A., Lindström, G., Strömqvist, J., Pers, C., Bartosova, A., and Arheimer, B.: Evaluation of surface runoff model hypothesis by multi-objective calibration using discharge and sediment data, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15426, https://doi.org/10.5194/egusphere-egu21-15426, 2021.

Petra Hulsman, Hubert Savenije, and Markus Hrachowitz

In the Luangwa river basin in Zambia, the total water storage observed by GRACE (Gravity Recovery and Climate Experiment) shows an oscillating pattern, such that the annual minimum/maximum storage decreased in 2002–2006, after which it increased until 2010, which was again followed by a storage decrease. However, this pattern was not reproduced by a standard conceptual hydrological model. Similarly, previous studies illustrated the inability of standard conceptual hydrological models to reproduce long-term storage variations in many river basins world-wide. This study identified processes that potentially caused the observed long-term storage variations in the Luangwa basin through data analysis and model hypothesis testing. During data analysis, long-term storage variations were compared to satellite-based products for precipitation, actual evaporation, potential evaporation and NDVI observations. During model hypothesis testing, we analysed 1) four different model forcing combinations and 2) five alternative model hypotheses for groundwater export to neighbouring basins. The results indicated that the benchmark model did not reproduce the observed long-term storage variations partly due to the forcing data and partly due to the missing process of regional groundwater export. Alternative model forcing data affected the modelled annual maximum storage, whereas the annual minima improved when adapting the model structure to allow for regional groundwater export from a deeper groundwater layer. In other words, standard conceptual hydrological models can reproduce long-term storage variations when using a suitable model structure.

How to cite: Hulsman, P., Savenije, H., and Hrachowitz, M.: Improving modelled long-term storage variations with standard hydrological models in data-scarce regions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-699, https://doi.org/10.5194/egusphere-egu21-699, 2021.

Doris Duethmann, Aaron Smith, Lukas Kleine, Chris Soulsby, and Doerthe Tetzlaff

It is widely acknowledged that calibrating and evaluating hydrological models only against streamflow may lead to inconsistencies of internal model states and large parameter uncertainties. Soil moisture is a key variable for the energy and water balance, which affects the partitioning of solar radiation into latent and sensible heat as well as the partitioning of precipitation into direct runoff and catchment storage. In contrast to ground-based measurements, satellite-derived soil moisture (SDSM) data are widely available and new data products benefit from improved spatio-temporal resolutions. Here we use a soil water index product based on data fusion of microwave data from METOP ASCAT and Sentinel-1 CSAR for calibrating the process-based ecohydrological model EcH2O-iso in the 66 km² Demnitzer Millcreek catchment in NE Germany. Available field measurements in and close to this intensively monitored catchment include soil moisture data from 74 sensors and water stable isotopes in precipitation, stream and soil water. Water stable isotopes provide information on flow pathways, storage dynamics, and the partitioning of evapotranspiration into evaporation and transpiration. Accounting for water stable isotopes in the ecohydrologic model therefore provides further insights regarding the consistency of internal processes. We first compare the SDSM data to the ground-based measurements. Based on a Monte Carlo approach, we then investigate the trade-off between model performance in terms of soil moisture and streamflow. In situ soil moisture and water stable isotopes are further consulted to evaluate the internal consistency of the model. Overall, we find relatively good agreement between satellite-derived and ground-based soil moisture dynamics. Preliminary results suggest that including SDSM in the model calibration can improve the simulation of internal processes, but uncertainties of the SDSM data should be accounted for. The findings of this study are relevant for reliable ecohydrological modelling in catchments that lack detailed field measurements for model evaluation.

How to cite: Duethmann, D., Smith, A., Kleine, L., Soulsby, C., and Tetzlaff, D.: Value of satellite-derived soil moisture data to improve the internal consistency of process-based ecohydrological models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14123, https://doi.org/10.5194/egusphere-egu21-14123, 2021.

Emilie Rouzies, Claire Lauvernet, and Arthur Vidard

Intensive use of pesticides in agricultural catchments leads to widespread contamination of rivers and groundwater. Pesticides applied to fields are transferred to water bodies via surface and subsurface pathways through the interaction of various physical processes. These transfers are also strongly influenced by landscape elements that can accelerate, slow down, or dissipate water and contaminant flows. The PESHMELBA model has been developed to simulate pesticide fate in small agricultural catchments and to represent the landscape elements in an explicit way. It is characterized by a process-oriented approach and a modular structure that couples different models.

In the long run, we aim to set up and compare different landscape organization scenarios for decision-making support. However, before considering such operational use of PESHMELBA, uncertainties must be quantified and reduced. Additionally, the model is physically based and fully spatialized, which leads to a large set of parameters that must be carefully estimated. To tackle both objectives, we set up a data assimilation framework based on satellite images and in situ data, and we evaluate the potential of the Ensemble Smoother for joint variable-parameter assimilation. Assimilating surface moisture images allows for direct correction of variables and parameters in the top part of the soil. However, because the PESHMELBA structure is based on a dynamic parallel code coupler (OpenPALM), the impact of such correction on other compartments and other physical processes has to be carefully assessed.

In this preliminary study, a fairly simple virtual hillslope inspired by a realistic catchment is set up and data assimilation is performed in twin experiments, i.e., using virtual surface moisture images. The potential of this technique for improving the overall performance of the model is examined, and the sensitivity to the assimilation framework (ensemble size, frequency of observations, errors, etc.) is assessed. Valuable information on how the coupling functions is obtained, which helps anticipate performance in a real case. The identified limitations of surface moisture assimilation also give useful indications about existing gaps and pave the way for multi-source data assimilation.
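The core of an ensemble smoother update can be illustrated for a single parameter and a single observation: each ensemble member is shifted in proportion to the ensemble covariance between the parameter and the predicted observation. The sketch below is a scalar, stochastic (perturbed-observation) version with a hypothetical linear forward model; it is a teaching example and not the PESHMELBA assimilation setup.

```python
import random

def ensemble_smoother_update(params, predicted, observation, obs_var, seed=0):
    """Update a scalar parameter ensemble from one observation."""
    rng = random.Random(seed)
    n = len(params)
    mean_p = sum(params) / n
    mean_y = sum(predicted) / n
    cov_py = sum((p - mean_p) * (y - mean_y) for p, y in zip(params, predicted)) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in predicted) / (n - 1)
    gain = cov_py / (var_y + obs_var)  # scalar Kalman-type gain
    return [
        # Each member sees the observation perturbed with its assumed error.
        p + gain * (observation + rng.gauss(0.0, obs_var ** 0.5) - y)
        for p, y in zip(params, predicted)
    ]

# Twin experiment: the "true" parameter is 2.0 and the forward model is y = 3 * theta.
prior = [0.5, 1.0, 1.5, 2.5]
predicted = [3.0 * theta for theta in prior]
posterior = ensemble_smoother_update(prior, predicted, observation=6.0, obs_var=1e-6)
print(posterior)
```

In this twin experiment the synthetic observation is generated from a known parameter value, so the quality of the update can be checked directly against the truth.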


How to cite: Rouzies, E., Lauvernet, C., and Vidard, A.: How to reduce uncertainties in a coupled and spatialized water quality model using data assimilation?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14738, https://doi.org/10.5194/egusphere-egu21-14738, 2021.

Jia-Ying Dai, You-Jia Chen, Gwo-Wen Hwang, and Su-Ting Cheng

Dissolved oxygen (DO) is a critical factor controlling the health and survival of aquatic life. In the lower Danshuei River of Taiwan, DO has occasionally dropped below 2 mg/L, leading to several fish kill events. Since 2018, the Taipei city government has continuously monitored hourly DO and other water quality factors at the Cheng-Mei Bridge and Cheng-De Bridge sites. At most other sites, however, monitoring is conducted only once a month. To provide DO predictions sufficient for preventing fish kills, a mechanistic DO model is required. In this study, we therefore developed a system dynamics DO model using the STELLA Architect software. The model considers oxygen exchange across the air-water and up/downstream interfaces, together with the instream processes of reaeration, photosynthesis, sediment oxygen demand (SOD), biochemical oxygen demand (BOD), respiration, and deoxygenation. As input for simulating daily DO at Cheng-Mei Bridge, we used meteorological data, water quality data, and HEC-RAS-simulated hydrological data (flow rate, cross-section area, and hydraulic depth). Field measurements ranging from 0.21 to 10.34 mg/L were used to calibrate (Jan. to Aug. 2018) and validate (Sep. to Dec. 2018) the simulation results. The simulations showed reasonably good accuracy, with a root mean square error (RMSE) ranging from 0.5 to 1.5 mg/L and a percentage root mean square error (PRMSE) ranging from 5 to 15%. Moreover, DO was most sensitive to the hydrological data, the deoxygenation coefficient, and the reaeration coefficient, and meteorological conditions such as temperature and wind speed were also important variables triggering the hypoxia or anoxia that caused fish kills.
Consequently, to better avoid or mitigate fish kills, we believe this physically based DO model coupled with meteorological variables will offer useful information for predicting DO conditions along the lower Danshuei River, allowing managers to take preventative action.
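
The two accuracy measures quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define PRMSE, so the sketch assumes it is the RMSE expressed as a percentage of the mean observed value, and the DO values are hypothetical.

```python
import math

def rmse(observed, simulated):
    """Root mean square error between paired observations and simulations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))

def prmse(observed, simulated):
    """RMSE as a percentage of the mean observation (assumed normalisation)."""
    return 100.0 * rmse(observed, simulated) / (sum(observed) / len(observed))

# Hypothetical daily DO values in mg/L, for illustration only.
do_observed = [2.0, 4.0, 6.0, 8.0]
do_simulated = [3.0, 4.0, 5.0, 8.0]

print(round(rmse(do_observed, do_simulated), 3))
print(round(prmse(do_observed, do_simulated), 1))
```

Other normalisations (for example by the observed range) are also in use, so the percentage depends on the convention chosen.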

How to cite: Dai, J.-Y., Chen, Y.-J., Hwang, G.-W., and Cheng, S.-T.: A Mechanistic Dissolved Oxygen Modeling for Riverine Fish Kill Prevention, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9327, https://doi.org/10.5194/egusphere-egu21-9327, 2021.

Marvin Höge, Anneli Guthke, and Wolfgang Nowak

In environmental modelling, multiple models are usually plausible, e.g. for predicting a certain quantity of interest. Using model rating methods, we typically want to identify the single best model or the optimal average of these models. However, such methods are often applied improperly, which can lead to false conclusions.

Using three different Bayesian approaches to model selection or averaging as examples (namely 1. Bayesian Model Selection and Averaging (BMS/BMA), 2. Pseudo-BMS/BMA and 3. Bayesian Stacking), we show how very similar-looking methods pursue vastly different goals and lead to deviating results for model selection or averaging.

All three yield a weighted average of predictive distributions. Yet only Bayesian Stacking has the goal of averaging for improved predictions in the sense of an actual (optimal) model combination. The other approaches aim ultimately at finding a single best model, albeit on different premises, and use model averaging only as a preliminary stage to prevent rash model choice.

We want to foster their proper use by, first, clarifying their theoretical background and, second, contrasting their behaviors in an applied groundwater modelling task. Third, we show how the insights gained from these Bayesian methods are transferable to other (also non-Bayesian) model rating methods, and we draw general conclusions about multi-model usage based on model weighting.
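
To make the contrast between the three weighting schemes concrete, the sketch below computes each set of weights for two models. It is a simplified illustration under stated assumptions, not the authors' implementation: equal prior model probabilities for BMA, pointwise predictive densities that stand in for leave-one-out values, and a plain EM update for the stacking weights. All numbers and function names are hypothetical.

```python
import math

def bma_weights(log_evidences):
    """Posterior model weights from log marginal likelihoods (equal priors)."""
    top = max(log_evidences)
    unnorm = [math.exp(v - top) for v in log_evidences]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def pseudo_bma_weights(pred_dens):
    """Weights from each model's summed log predictive density."""
    n_models = len(pred_dens[0])
    elpd = [sum(math.log(row[k]) for row in pred_dens) for k in range(n_models)]
    return bma_weights(elpd)

def stacking_weights(pred_dens, iterations=500):
    """Weights maximising the log score of the weighted mixture.

    pred_dens[i][k] is the predictive density of model k at held-out
    observation i. Uses the EM update for mixture weights with fixed
    components.
    """
    n_models = len(pred_dens[0])
    w = [1.0 / n_models] * n_models
    for _ in range(iterations):
        resp_sum = [0.0] * n_models
        for row in pred_dens:
            mix = sum(wk * p for wk, p in zip(w, row))
            for k, p in enumerate(row):
                resp_sum[k] += w[k] * p / mix
        w = [r / len(pred_dens) for r in resp_sum]
    return w

# Hypothetical predictive densities: model 0 is better at three
# observations, model 1 is much better at the fourth.
pred_dens = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.05, 0.9]]

print(bma_weights([-10.0, -12.0]))
print(pseudo_bma_weights(pred_dens))
print(stacking_weights(pred_dens))
```

In this toy case the pseudo-BMA weights concentrate on the model with the better overall score, while the stacking weights keep a larger share for the second model because it covers the observation the first model predicts poorly.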



How to cite: Höge, M., Guthke, A., and Nowak, W.: Better Than Just Average: The Many Faces of Bayesian Model Weighting Methods and What They Tell Us about Multi-Model Use, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2192, https://doi.org/10.5194/egusphere-egu21-2192, 2021.

Bayesian applications in water quality modelling
Mohamad Abbas and Ibrahim Alameddine

Inland water bodies are variable and complex environments that are indispensable for maintaining biodiversity and providing ecosystem services. The ecological functions of these environments are increasingly threatened by several stressors, including climate change, human activities and natural stressors. Anthropogenic eutrophication has become one of the most pressing causes of water quality degradation in freshwater ecosystems worldwide. The eutrophication process accelerates the occurrence of algal blooms, with the dominance of potentially toxic cyanobacterial species. As a result, assessing and monitoring change in the eutrophic status of these systems is necessary for adopting efficient and adaptive water quality management plans. While conventional monitoring methods provide accurate snapshots of eutrophication metrics at discrete points, they do not provide synoptic coverage of the status of a water body in space and time. Compared with in situ monitoring, remote sensing provides an effective method to assess the water quality dynamics of water bodies globally at a relatively high spatio-temporal resolution. Yet the full potential of remote sensing for assessing eutrophication in inland freshwater systems has so far been limited by the need to develop site-specific models that require extensive local calibration and validation. This constraint is associated with the poor transferability of these models between systems. In this work, we develop a Bayesian hierarchical modelling (BHM) framework that provides comprehensive models for predicting chlorophyll-a levels, Secchi disk depth (SDD), and total suspended solids across the contiguous United States (US) based on Landsat 5, 7 and 8 surface reflectance data. The proposed BHM is able to assess, account for, and quantify the variability associated with lake, ecoregion, and trophic status.
The model is developed based on the AquaSat database, which contains more than 600,000 observations collected between 1984 and 2019 from lakes and reservoirs across the contiguous US. Compared with the pooled model, the BHM improved the predictions of SDD and chlorophyll-a the most, whereas no such improvement was observed for total suspended solids (TSS). Using the ecoregion categorization to develop the BHM structure proved to be the most advantageous.
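
The difference between a pooled model and a hierarchical one can be illustrated with the simplest possible case, a normal-normal model for group means. The sketch below is an assumption-laden toy, not the BHM of the abstract: the within-group variance `sigma2` and between-group variance `tau2` are treated as known, the grand mean is plugged in for the population mean, and the group labels and values are hypothetical.

```python
def partial_pooling_means(groups, sigma2, tau2):
    """Posterior group means under a normal-normal hierarchical model.

    Assumes known within-group variance sigma2 and between-group
    variance tau2, and plugs in the grand mean as the population mean.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    estimates = {}
    for name, values in groups.items():
        n = len(values)
        group_mean = sum(values) / n
        # Weight on the group's own data grows with its sample size.
        shrink = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
        estimates[name] = shrink * group_mean + (1.0 - shrink) * grand_mean
    return grand_mean, estimates

# Hypothetical log-transformed water quality values for two ecoregions.
samples = {"ecoregion_A": [1.0, 1.2, 0.8, 1.0], "ecoregion_B": [3.0]}
pooled, by_group = partial_pooling_means(samples, sigma2=1.0, tau2=1.0)
print(pooled, by_group)
```

The sparsely sampled group is pulled further towards the pooled mean than the well-sampled one, which is the mechanism that lets a hierarchical model borrow strength across lakes or ecoregions.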

How to cite: Abbas, M. and Alameddine, I.: A Bayesian hierarchical model for assessing the eutrophication status of inland freshwater systems in the contiguous United States from Landsat time series: the promise of a universal transferable model, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2288, https://doi.org/10.5194/egusphere-egu21-2288, 2021.

Craig Wilkie, Surajit Ray, Marian Scott, and Claire Miller

Rivers are vital parts of the hydrosphere, providing ecosystem services and water for drinking and agriculture. However, rapid industrialisation and urbanisation globally are leading to increasing pollution in many rivers. On their own, many in-river monitoring efforts in lower-middle-income countries do not yet provide enough information to adequately understand the general state of, or trends in, freshwater ecosystems, which hampers efforts to mitigate water quality degradation. However, new sources of data such as satellites, drones and sondes provide better spatial and temporal coverage of the river network. This talk presents a statistical downscaling approach, based on Bayesian hierarchical modelling, for fusing data from these different sources into a single product with improved accuracy and coverage compared to that of an individual source. The model development is motivated by the Ramganga river in northern India, a source of irrigation for crops and of drinking water that supports millions of people but suffers from heavy metal and nutrient pollution caused by population pressures, intensive agriculture and industries along its length, leading to water quality degradation and biodiversity loss. The work takes place as part of the Ramganga Water Data Fusion Project, funded by the UK Global Challenges Research Fund, with the aim of informing work such as risk-based modelling and the design of future monitoring to improve mitigation efforts.

How to cite: Wilkie, C., Ray, S., Scott, M., and Miller, C.: Bayesian spatiotemporal statistical modelling of water quality within rivers, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10843, https://doi.org/10.5194/egusphere-egu21-10843, 2021.

Miriam Glendell, Mads Troldborg, Zisis Gagkas, Andy Vinten, Allan Lilly, and Donald Reid

Pesticides are contaminants of priority concern regulated under the EU Water Framework Directive 2000 (WFD) and its daughter Directives. Article 7 of the WFD promotes a ‘prevention-led’ approach that prioritises pollution prevention at source rather than costly drinking water treatment.

However, the effectiveness of pollution mitigation measures in catchment systems is uncertain and catchment management needs to consider local biophysical, agronomic, and social aspects. Local risk assessment and management of water contamination in drinking water catchments informed by process-based models is costly and often hindered by lack of data. Therefore, spatial risk indices have been developed to evaluate the intrinsic risk from pesticide pollution. However, these risk assessment approaches do not explicitly account for uncertainties in complex processes and their interactions. 

In this study, we developed a probabilistic decision support tool (DST) based on spatial Bayesian Belief Networks (BBN) to inform field-level pesticide mitigation strategies in a small drinking water catchment (3.5 km²) with limited observational data. The DST accounts for the spatial heterogeneity of soil properties, topographic connectivity, and agronomic practices; the temporal variability of climatic and hydrological processes; and uncertainties related to pesticide properties and the effectiveness of management interventions. Furthermore, the graphical nature of the BBN facilitated interactive model development and evaluation with stakeholders, while the ability to integrate diverse data sources allowed an effective use of available data.

The risk of pesticide loss via two pathways (overland flow and leaching to groundwater) was simulated for five active ingredients. Risk factors included climate and hydrology (temperature, rainfall, evapotranspiration, overland and subsurface flow), soil properties (texture, organic matter content, hydrological properties), topography (slope, distance to surface water/depth to groundwater), landcover and agronomic practices, pesticide properties and usage. The effects on risk reduction of mitigation measures such as delayed pesticide application timing; 10%, 25% and 50% application rate reduction; field buffers; and presence/absence of soil pan were evaluated.

Sensitivity analysis identified the application rates, rainfall, and overland flow connectivity as the most important risk factors. Pesticide pollution risk from surface water runoff showed clear spatial variability across the study catchment, while groundwater leaching risk was more uniform. Combined interventions of 50% reduced pesticide application rate, management of plough pan, delayed application timing and field buffer installation reduced the probability of high risk from overland flow in several fields.
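
How a Bayesian belief network turns conditional probability tables into a risk estimate for a given intervention can be sketched with a three-node toy network (rainfall and application rate as parents of overland-flow risk). The structure and every probability below are hypothetical and chosen only for illustration; they are not taken from the DST.

```python
# All probabilities below are hypothetical, for illustration only.
P_RAIN = {"low": 0.7, "high": 0.3}
P_HIGH_RISK = {  # P(risk = high | rainfall, application rate)
    ("low", "full"): 0.10,
    ("low", "reduced"): 0.05,
    ("high", "full"): 0.60,
    ("high", "reduced"): 0.30,
}

def prob_high_risk(rate):
    """Marginal probability of high risk for a chosen application rate."""
    return sum(P_RAIN[r] * P_HIGH_RISK[(r, rate)] for r in P_RAIN)

def prob_rain_given_high_risk(rain, rate):
    """Bayes' rule: diagnostic reasoning from risk back to rainfall."""
    return P_RAIN[rain] * P_HIGH_RISK[(rain, rate)] / prob_high_risk(rate)

print(prob_high_risk("full"), prob_high_risk("reduced"))
print(prob_rain_given_high_risk("high", "full"))
```

Setting the application rate node to a different state and recomputing the marginal is the same kind of query used to compare mitigation scenarios, only on a far smaller network.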

The DST provided a probabilistic dynamic field-scale assessment of ‘critical risk areas’ of pesticide pollution in time and space and is easily transferable to neighbouring catchments.

How to cite: Glendell, M., Troldborg, M., Gagkas, Z., Vinten, A., Lilly, A., and Reid, D.: A probabilistic decision support tool for field level pesticide risk assessment in a small drinking water catchment on the Island of Jersey, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1178, https://doi.org/10.5194/egusphere-egu21-1178, 2021.

Kerr Adams, Miriam Glendell, Marc Metzger, Rachel Helliwell, Christopher (Kit) Macleod, and Sarah Gillman

The cumulative impacts of future climatic and socio-economic change have the potential to threaten the resilience of freshwater catchments and the important socio-ecological services they provide. Working with stakeholder groups from Scottish Water (the statutory corporation that provides water and sewerage services across Scotland) and the Scottish Environment Protection Agency (the environmental regulator), we established a participatory method for developing a Bayesian Network (BN) model to simulate the resilience of the Eden catchment, in eastern Scotland, to future pressures. The Eden catchment spans approximately 319 km²; arable farming is the major land use, and the catchment falls within the Strathmore, Fife and Angus Nitrate Vulnerable Zone. The participatory method involves co-developing a BN model structure by conceptually mapping land management, water resource and wastewater services. Working with stakeholders, appropriate baseline data are identified to define and parameterise variables that represent the Eden catchment system and future scenarios. Key factors including climate, land-use and population change were combined in future scenarios and are represented in the BN through causal relationships. Scenarios consider shocks and changes to the catchment system within a 2050 time horizon. Resilience is measured by simulating the impacts of the future scenarios and their influence on natural, social and manufactured capitals within a probabilistic framework. Relationships between specific components of the catchment system can be evaluated using sensitivity analysis and strength of influence to better understand the interactions between specific variables. The participatory modelling improved the structure of the BN through collaborative learning with stakeholders, increasing understanding of the catchment system and stakeholder confidence in the probabilistic outputs.
This participatory method delivered a purpose-built, user-friendly decision support tool to help stakeholders understand the cumulative impacts of both climatic and socio-economic factors on catchment resilience.

How to cite: Adams, K., Glendell, M., Metzger, M., Helliwell, R., Macleod, C., and Gillman, S.: Participatory methods for developing a Bayesian network model for simulating catchment resilience under future scenarios, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9876, https://doi.org/10.5194/egusphere-egu21-9876, 2021.

Rosa F Ropero, M Julia Flores, and Rafael Rumí

Environmental data often contain missing values or lack information, which makes modelling tasks difficult. Within the framework of the SAICMA Research Project, a flood risk management system is modelled for the Andalusian Mediterranean catchment using information from the Andalusian Hydrological System. Hourly data were collected from October 2011 to September 2020 and present two issues:

  • In the Guadarranque River, there are no data for the dam level variable from May to August 2020, probably because of sensor damage.
  • No information about river level is collected in the lower part of the Guadiaro River, which makes it difficult to estimate flood risk in the coastal area.

To avoid removing the dam variable (or the missing months) from the entire model, or even abandoning the modelling of one river system, this abstract aims to provide modelling solutions based on Bayesian networks (BNs) that overcome these limitations.

Guadarranque River. Missing values.

The dataset contains 75687 observations of 6 continuous variables. BN regression models based on fixed structures (Naïve Bayes, NB, and Tree Augmented Naïve Bayes, TAN) were learnt using the complete dataset (until September 2019) with the aim of predicting the dam level variable as accurately as possible. A test scenario was run with data from October 2019 to March 2020, and the predictions for the target variable were compared with the real data. Results show that both NB (RMSE: 6.29) and TAN (RMSE: 5.74) are able to predict the behaviour of the target variable.

In addition, a BN with an expert-defined structure was learnt with the real data and with both datasets containing values imputed by NB and TAN. Results show that the models learnt with imputed data (NB: 3.33; TAN: 3.07) have a lower error than the model learnt with real data (4.26).

Guadiaro River. Lack of information.

The dataset contains 73636 observations of 14 continuous variables. Since the rainfall variables have a high percentage of zero values (over 94%), they were discretised by the equal-frequency method with 4 intervals. The aim is to predict flooding risk in the coastal area, but no data are collected there. Thus, an unsupervised classification based on hybrid BNs was performed. Here, the target variable classifies all observations into a set of homogeneous groups and gives, for each observation, the probability of belonging to each group. Results show a total of 3 groups:

  • Group 0, “Normal situation”: rainfall values equal to 0 and a very low mean river level.
  • Group 1, “Storm situation”: mean rainfall values over 0.3 mm and mean values of all river level variables double those of group 0.
  • Group 2, “Extreme situation”: both rainfall and river level means take their highest values, far above those of the two previous groups.
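
The equal-frequency discretisation applied to the rainfall variables can be sketched as below. This is a generic illustration with hypothetical values and function names, not the authors' procedure. It also shows why zero-inflated series need care: when most values are zero, the cut points coincide at zero, and the abstract does not state how this was handled.

```python
def equal_frequency_edges(values, n_bins):
    """Interior cut points splitting sorted values into n_bins equal-count groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the interval containing value (left-closed intervals)."""
    return sum(value >= edge for edge in edges)

# Hypothetical non-zero hourly rainfall values in mm.
rain = [0.2, 0.4, 0.6, 1.0, 1.5, 2.0, 4.0, 8.0]
edges = equal_frequency_edges(rain, n_bins=4)
print(edges, [assign_bin(v, edges) for v in rain])

# A zero-inflated series collapses every cut point onto zero.
zero_inflated = [0.0] * 16 + [1.0]
print(equal_frequency_edges(zero_inflated, n_bins=4))
```

One common workaround, assumed here rather than taken from the abstract, is to treat zero as its own class and bin only the non-zero values.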

Although validation shows that this methodology is able to identify extreme events, further work is needed. To this end, data from the autumn-winter season (from October 2020 to March 2021) will be used. Including this new information will make it possible to check whether the latest extreme events (the flooding event in December and Storm Filomena in January) are identified.




How to cite: F Ropero, R., Flores, M. J., and Rumí, R.: Missing values and lack of information in water management datasets: an approach based on Bayesian Networks, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10004, https://doi.org/10.5194/egusphere-egu21-10004, 2021.

Jingshui Huang, Pablo Merchan-Rivera, Gabriele Chiogna, Markus Disse, and Michael Rode

Water quality models make it possible to study dissolved oxygen (DO) dynamics and the resulting DO balances. However, the low temporal resolution of measurement data commonly limits the reliability of disentangling and quantifying instream DO process fluxes using models. This limited temporal resolution can result in equifinality of model parameter sets. In this study, we aim to quantify the effect of combining emerging high-frequency monitoring techniques with water quality modelling on 1) improving the estimation of the model parameters and 2) reducing the forward uncertainty of the continuous quantification of instream DO balance pathways.

To this end, synthetic measurements at a given series of frequencies are used to calibrate the parameters of a conceptual water quality model of an agricultural river in Germany. The frequencies range from 15-min intervals through daily and weekly to monthly. A Bayesian inference approach using the DREAM algorithm is adopted to perform the uncertainty analysis of the DO simulation. Furthermore, the propagated uncertainties in daily fluxes of different DO processes, including reaeration, phytoplankton metabolism, benthic algae metabolism, nitrification, and organic matter deoxygenation, are quantified.

We hypothesize that the uncertainty will be larger when the measurement frequency of the calibration data is limited. We also expect that high-frequency measurements will significantly reduce the uncertainty of flux estimations of different DO balance components. This study highlights the critical role of high-frequency data in supporting model parameter estimation and its significant value in disentangling DO processes.
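
The hypothesis can be illustrated with a deliberately simple toy problem. The sketch below replaces the DREAM sampler with a grid approximation of the posterior and the water quality model with a one-parameter first-order reaeration curve; the parameter values, noise level and sampling schedules are all hypothetical assumptions, and the synthetic data are noise-free.

```python
import math

DO_SAT, DO_0, SIGMA = 9.0, 4.0, 0.5   # mg/L; hypothetical values
K_TRUE = 0.5                           # 1/day; hypothetical reaeration rate

def simulate_do(k, t):
    """DO recovering towards saturation by first-order reaeration."""
    return DO_SAT - (DO_SAT - DO_0) * math.exp(-k * t)

def posterior_sd(times):
    """Posterior standard deviation of k on a grid, flat prior on [0.01, 2]."""
    data = [simulate_do(K_TRUE, t) for t in times]  # noise-free synthetic data
    grid = [0.01 * i for i in range(1, 201)]
    log_like = [
        -sum((y - simulate_do(k, t)) ** 2 for y, t in zip(data, times))
        / (2.0 * SIGMA ** 2)
        for k in grid
    ]
    top = max(log_like)
    weights = [math.exp(v - top) for v in log_like]
    total = sum(weights)
    mean = sum(w * k for w, k in zip(weights, grid)) / total
    var = sum(w * (k - mean) ** 2 for w, k in zip(weights, grid)) / total
    return math.sqrt(var)

daily = [float(d) for d in range(29)]      # daily samples over four weeks
weekly = [0.0, 7.0, 14.0, 21.0, 28.0]      # weekly samples over four weeks

sd_daily = posterior_sd(daily)
sd_weekly = posterior_sd(weekly)
print(sd_daily, sd_weekly)
```

In this toy setting the weekly samples miss most of the recovery curve, so the posterior for the rate stays wide, whereas daily sampling constrains it much more tightly.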

How to cite: Huang, J., Merchan-Rivera, P., Chiogna, G., Disse, M., and Rode, M.: Can high-frequency data enable better parameterization of water quality models and disentangling of DO processes?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8936, https://doi.org/10.5194/egusphere-egu21-8936, 2021.

Matthias Pucher, Peter Flödl, Daniel Graeber, Klaus Felsenstein, Thomas Hein, and Gabriele Weigelhofer

The carbon cycle in aquatic environments is of high interest because of its effects on water quality and greenhouse gas production, as well as its alteration through anthropogenic activities with unknown outcomes. Uptake and release of dissolved organic matter (DOM) compounds depend on their molecular structure and are strongly linked to N and P dynamics. Current research has not fully revealed the complex patterns behind these processes.

To investigate the interactions between DOM components, we performed ten plateau addition experiments with different, realistic, complex DOM leachates (cow dung, pig dung, corn, leaves and nettles) in a small stream. The DOM quality was determined by fluorescence measurements and parallel factor (PARAFAC) decomposition and the nutrient concentrations were measured at eleven consecutive points in the stream at plateau conditions. The hydrological transport processes were incorporated by using the results of a 1-D hydrodynamic model.

The nutrient spiralling concept and its application in nutrient dynamics is a valuable basis for the analysis of our data. However, we could not find a data analysis approach that suited the nature of our questions and data. Based on previously observed nutrient uptake models, we extended the nutrient spiralling concept by additional non-linear terms to analyse interactions between different DOM components.

We developed the “Interactions in Nutrient Spirals using BayesIan REgression (INSBIRE)” approach to analyse DOM uptake and retention mechanisms. This approach can disentangle complex and interacting biotic and abiotic drivers in nutrient uptake metrics, show their variability and quantify their error distribution. We successfully used INSBIRE to show DOM-compound-specific interactions and draw conclusions from the data of our experiment. The applicability of INSBIRE still has to be tested in other studies, but we see high potential not only for DOM dynamics but for any kind of solute dynamics where interactions are crucial.
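
For orientation, the classical first-order nutrient spiralling relation that such extensions start from can be sketched as follows. It estimates the uptake length from plateau concentrations along the reach by a log-linear least-squares fit. This is the textbook baseline only, not INSBIRE or its non-linear interaction terms, and the eleven concentrations are synthetic.

```python
import math

def uptake_length(distances, concentrations):
    """Estimate uptake length S_w (m) from plateau concentrations.

    Fits ln(C_x) = ln(C_0) - x / S_w by ordinary least squares;
    concentrations are assumed background- and dilution-corrected.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no net uptake detected (non-negative slope)")
    return -1.0 / slope

# Synthetic plateau data at eleven points, generated with S_w = 50 m.
x_m = [10.0 * i for i in range(11)]
conc = [5.0 * math.exp(-x / 50.0) for x in x_m]
print(uptake_length(x_m, conc))
```

A Bayesian regression replaces the point estimate of the slope with a posterior distribution, which is what allows uncertainty in the uptake metrics to be reported.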

How to cite: Pucher, M., Flödl, P., Graeber, D., Felsenstein, K., Hein, T., and Weigelhofer, G.: Complex interactions of in-stream DOM and nutrient spiralling unravelled by Bayesian regression analysis, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7804, https://doi.org/10.5194/egusphere-egu21-7804, 2021.