Hydroinformatics: computational intelligence, systems analysis, optimisation, data science, and innovative sensing techniques

Hydroinformatics has emerged over the last decades to become a recognised and established field of independent research within the hydrological sciences. It is concerned with data acquisition, and with the development and hydrological application of mathematical modelling, information technology, systems science and computational intelligence tools. The field also faces the challenges of so-called Big Data: data sets that are large in both size and complexity. The methods and technologies for handling, visualising and extracting knowledge from such data are often referred to as Data Science.

The aim of this session is to provide an active forum in which to demonstrate and discuss the integration and appropriate application of emergent computational technologies in a hydrological modelling context. Topics of interest are expected to cover a broad spectrum of theoretical and practical activities that would be of interest to hydro-scientists and water-engineers. We aim to address the following classes of methods and technologies:

* Predictive and analytical models based on the methods of statistics, computational intelligence and machine learning: neural networks (including deep learning), fuzzy systems, genetic programming, cellular automata, chaos theory, etc.
* Innovative sensing techniques: satellites, gauges and citizens (crowdsourcing)
* Methods for the analysis of complex data sets, including remote sensing data: principal and independent component analysis, time series analysis, information theory, etc.
* Specific concepts and methods of Big Data and Data Science
* Optimisation methods associated with heuristic search procedures: various types of evolutionary algorithms, randomised and adaptive search, etc.
* Applications of systems analysis and optimisation in water resources
* Hybrid modelling involving different types of models both process-based and data-driven, combination of models (multi-models), etc.
* Data assimilation and model reduction in integrated modelling
* Novel methods of analysing model uncertainty and sensitivity
* Software architectures for linking different types of models and data sources

Applications could belong to any area of hydrology or water resources: rainfall-runoff modelling, flow forecasting, sedimentation modelling, analysis of meteorological and hydrologic data sets, linkages between numerical weather prediction and hydrologic models, model calibration, model uncertainty, optimisation of water resources, etc.

Co-organized by ESSI1/NH1
Convener: Dimitri Solomatine | Co-conveners: Ghada El Serafy, Amin Elshorbagy, Dawei Han, Thaine H. Assumpção, Fernando Nardi, Serena Ceola, Maurizio Mazzoleni
vPICO presentations: Fri, 30 Apr, 09:00–12:30 (CEST)

vPICO presentations: Fri, 30 Apr

Chairpersons: Dimitri Solomatine, Dawei Han
Machine learning
Daniel Klotz, Frederik Kratzert, Martin Gauch, Alden K. Sampson, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter, and Grey Nearing

Uncertainty is a central part of hydrological inquiry. Deep Learning provides us with new tools for estimating these inherent uncertainties. The currently best performing rainfall-runoff models are based on Long Short-Term Memory (LSTM) networks. However, most LSTM-based modelling studies focus on point estimates.

Building on the success of LSTMs for point predictions, this contribution explores different extensions that directly provide uncertainty estimates. We find that the resulting models provide excellent estimates in our benchmark for daily rainfall-runoff across hundreds of basins. We provide an intuitive overview of these strong results, the benchmarking procedure, and the approaches used for obtaining them.

In short, we extend the LSTMs in two ways to obtain uncertainty estimates. First, we parametrize LSTMs so that they directly provide uncertainty estimates in the form of mixture densities. This is possible because the LSTM is a general function approximator. It requires minimal a priori knowledge of the sampling distribution and provides an estimation technique for the aleatoric uncertainty of the given setup. Second, we use Monte Carlo Dropout to randomly mask connections of the network. This enforces an implicit approximation to a Gaussian Process and therefore provides a tool to estimate a form of epistemic uncertainty. In the benchmark, the mixture-density-based approaches provide the better estimates, especially those that use Asymmetric Laplacians as components.
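To make the mixture-density idea concrete, the following sketch evaluates the negative log-likelihood of an observation under a mixture of Asymmetric Laplacians. The parameter values are hypothetical stand-ins for what a trained network head might emit for one day, not the authors' actual configuration.

```python
import math

def asym_laplace_pdf(y, m, scale, kappa):
    """Density of an asymmetric Laplace distribution with mode m,
    scale > 0 and asymmetry kappa > 0 (kappa = 1 is symmetric)."""
    lam = 1.0 / scale
    s = 1.0 if y >= m else -1.0
    coef = lam / (kappa + 1.0 / kappa)
    return coef * math.exp(-(y - m) * lam * s * (kappa ** s))

def mixture_nll(y, weights, components):
    """Negative log-likelihood of y under a mixture of asymmetric
    Laplacians; `components` is a list of (m, scale, kappa) tuples."""
    density = sum(w * asym_laplace_pdf(y, *c)
                  for w, c in zip(weights, components))
    return -math.log(density + 1e-300)  # guard against log(0)

# Hypothetical two-component mixture for one forecast day:
weights = [0.7, 0.3]
components = [(2.0, 0.5, 1.0), (4.0, 1.0, 1.5)]
print(mixture_nll(2.0, weights, components))   # likely observation -> low NLL
print(mixture_nll(12.0, weights, components))  # unlikely observation -> high NLL
```

In training, this NLL would be averaged over all observations and minimised with respect to the network parameters that produce the weights and component parameters.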

How to cite: Klotz, D., Kratzert, F., Gauch, M., K. Sampson, A., Klambauer, G., Brandstetter, J., Hochreiter, S., and Nearing, G.: Uncertainty estimation with LSTM based rainfall-runoff models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13308, https://doi.org/10.5194/egusphere-egu21-13308, 2021.

Lingling Ni, Dong Wang, Jianfeng Wu, and Yuankun Wang

With increasing water requirements and weather extremes, effective planning and management of water issues has been of great concern over the past decades. Accurate and reliable streamflow forecasting is a critical step for water resources supply and the prevention of natural disasters. In this study, we developed a hybrid model (namely GMM-XGBoost), coupling extreme gradient boosting (XGBoost) with a Gaussian mixture model (GMM), for monthly streamflow forecasting. The proposed model is based on the principle of modular modelling, where a complex problem is divided into several simple ones. GMM was applied to cluster streamflow into several groups, using the features selected by a tree-based method. Each group was then used to fit several single XGBoosts, and the prediction is a weighted average of the single models. Two streamflow datasets were used to evaluate the performance of the proposed model. The prediction accuracy of GMM-XGBoost was compared with that of support vector machine (SVM) and standalone XGBoost. The results indicated that although all three models yielded quite good performance on one-month-ahead forecasting, with high Nash-Sutcliffe efficiency coefficient (NSE) and low root mean squared error (RMSE), GMM-XGBoost provided the best accuracy with significant improvement of forecasting accuracy. It can be inferred from the results that (1) XGBoost is applicable for streamflow forecasting and, in general, performs better than SVM; (2) the cluster-analysis-based modular model is helpful in improving accuracy; (3) the proposed GMM-XGBoost model is a superior alternative, which can provide accurate and reliable predictions for optimal water resources management.
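The modular cluster-then-predict scheme can be sketched as follows. For illustration only, the fitted GMM is replaced by fixed one-dimensional Gaussian components, and the per-cluster XGBoost experts by simple linear functions; everything here is a hypothetical stand-in for the actual trained models.

```python
import math

def gaussian(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x, clusters):
    """Soft cluster memberships for a feature value x; `clusters` is a
    list of (weight, mu, sigma) tuples, standing in for a fitted GMM."""
    dens = [w * gaussian(x, mu, s) for w, mu, s in clusters]
    total = sum(dens)
    return [d / total for d in dens]

def modular_predict(x, clusters, experts):
    """Weighted average of per-cluster experts (here simple linear
    functions standing in for per-cluster XGBoost models)."""
    r = responsibilities(x, clusters)
    return sum(ri * f(x) for ri, f in zip(r, experts))

clusters = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]        # e.g. low-flow / high-flow regimes
experts = [lambda x: 1.0 + 0.1 * x, lambda x: 5.0 + 0.5 * x]
print(modular_predict(-2.0, clusters, experts))  # dominated by the first expert
print(modular_predict(2.0, clusters, experts))   # dominated by the second expert
```

The weighting step is what distinguishes this from a hard cluster-and-switch model: near cluster boundaries both experts contribute, which keeps the combined prediction smooth.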

Note: This study has been published in Journal of Hydrology (Ni, L., Wang, D., Wu, J., Wang, Y., Tao, Y., Zhang, J. and Liu, J., 2020. Streamflow forecasting using extreme gradient boosting model coupled with Gaussian mixture model. Journal of Hydrology, 586.).

How to cite: Ni, L., Wang, D., Wu, J., and Wang, Y.: A hybrid model coupling extreme gradient boosting model with Gaussian mixture model for streamflow forecasting, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3590, https://doi.org/10.5194/egusphere-egu21-3590, 2021.

Tanja Morgenstern, Sofie Pahner, Robert Mietrach, and Niels Schütze

Long short-term memory (LSTM) networks are able to learn and replicate the relationships of multiple climate and hydrological temporal variables, and are therefore theoretically suitable for data-driven modelling and forecasting of rainfall-runoff behaviour. However, they exhibit some prediction errors occasionally found in data-driven models: phase-shift errors, oscillations and total failures. The phase-shift error is a particularly significant challenge, as it occurs when using hourly precipitation and runoff data for catchments with short response times.

In order to detect and eliminate these errors, we investigated four approaches, of which the first two are of structural nature, while the last two modify the input time series by certain transformations:
1. The use of encoder-decoder architectures for LSTM networks.
2. Offsetting the start of the flood forecast to the forecast time step of interest.
3. The inversion of the input time series.
4. Including subsequently observed precipitation data as a “best precipitation forecast”.

We tested the four approaches on five different pilot catchments located in Saxony, Germany with relatively short response times. The results show no advantage of the structural approaches. In contrast, the modification of the input time series shows potential for improving the predictive quality of flood forecasting in a potential operational application.
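Approach 4 above, including subsequently observed precipitation as a "best precipitation forecast", amounts to a particular way of assembling the input windows. A minimal sketch, with invented series lengths and window sizes:

```python
def make_training_sample(precip, runoff, t, lookback, lead):
    """Assemble one training sample for a forecast issued at time t with
    lead time `lead`: runoff history up to t, plus precipitation up to
    t + lead (the subsequently observed 'best precipitation forecast').
    Target is the runoff at t + lead."""
    runoff_history = runoff[t - lookback:t + 1]
    precip_with_forecast = precip[t - lookback:t + lead + 1]
    target = runoff[t + lead]
    return runoff_history, precip_with_forecast, target

# Toy hourly series for a catchment with a short response time:
precip = list(range(100))
runoff = [0.5 * p for p in precip]
q_hist, p_in, y = make_training_sample(precip, runoff, t=50, lookback=24, lead=6)
print(len(q_hist), len(p_in), y)  # 25 31 28.0
```

In operation, the observed future precipitation would be replaced by an actual precipitation forecast of the same length; the training setup above represents the idealised best case.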

How to cite: Morgenstern, T., Pahner, S., Mietrach, R., and Schütze, N.: Flood forecasting in small catchments using deep learning LSTM networks, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15072, https://doi.org/10.5194/egusphere-egu21-15072, 2021.

Uwe Ehret

In this contribution, I will suggest an approach to build models as ordered and connected collections of multivariate, discrete probability distributions (dpd's). This approach can be seen as a Machine-Learning (ML) approach as it allows very flexible learning from data (almost) without prior constraints. Models can be built on dpd's only (fully data-based model), but they can also be included into existing process-based models at places where relations among data are not well-known (hybrid model). This provides flexibility for learning similar to including other ML approaches - e.g. Neural Networks - into process-based models, with the advantage that the dpd's can be investigated and interpreted by the modeler as long as their dimensionality remains low. Models based on dpd's are fundamentally probabilistic, and model responses for out-of-sample situations can be assured by dynamically coarse-graining the dpd's: The farther a predictive situation is from the learning situations, the coarser/more uncertain the prediction will be, and vice versa.

I will present the main elements and steps of such dpd-based modeling using the example of several systems, ranging from simple deterministic (ideal spring) to complex (hydrological system), and will discuss the influence of i) the size of the available training data set, ii) the choice of the dpd priors, and iii) binning choices on the models' predictive power.
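A toy sketch of the core idea, assuming a one-dimensional predictor and equal-width bins (the actual approach is multivariate and more refined): a dpd is a table of bin counts, and out-of-sample queries are answered by dynamically coarse-graining until training data is found, so predictions farther from the learning situations become coarser.

```python
from collections import Counter

def fit_dpd(xs, ys, bin_width):
    """Fit a 2-D discrete probability distribution: counts per (x_bin, y_bin)."""
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(int(x // bin_width), int(y // bin_width))] += 1
    return counts

def conditional(counts, x_bin, coarsen=0):
    """Conditional distribution p(y_bin | x_bin). If the x_bin was never
    seen, widen the query (dynamic coarse-graining): the farther the
    situation is from the training data, the coarser the answer."""
    hits = Counter()
    for (xb, yb), c in counts.items():
        if abs(xb - x_bin) <= coarsen:
            hits[yb] += c
    total = sum(hits.values())
    if total == 0:                       # out-of-sample: coarsen further
        return conditional(counts, x_bin, coarsen + 1)
    return {yb: c / total for yb, c in hits.items()}, coarsen

xs = [0.2, 0.4, 1.1, 1.3, 2.2]          # e.g. rainfall
ys = [0.1, 0.2, 1.0, 1.2, 2.0]          # e.g. runoff
dpd = fit_dpd(xs, ys, bin_width=1.0)
print(conditional(dpd, 1))   # in-sample: sharp answer at coarsen=0
print(conditional(dpd, 7))   # far out-of-sample: answered only after coarsening
```

Unlike a neural network, the fitted table `dpd` can be inspected directly by the modeler, which is the interpretability advantage mentioned above.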

How to cite: Ehret, U.: Hybrid modeling using multivariate, discrete probability distributions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2866, https://doi.org/10.5194/egusphere-egu21-2866, 2021.

Han Li, Han Chen, Jinhui Jeanne Huang, Edward McBean, Jiawei Zhang, Junjie Gao, and Zhingqing Lan

Prediction of vegetation transpiration (T) is of increasing importance in water resources management and agricultural practices, in particular to facilitate precision irrigation. Traditional dual-source modelling for evapotranspiration (ET) partitioning requires an extensive array of ground-level parameters and needs model correction and calibration to attain model certainty. In response, a quick and low-cost method is described to predict T using artificial intelligence (AI) modelling based on meteorological factors, crop growth status factors and soil parameters. This study compares Random Forest (RF) and Support Vector Regression (SVR) in building AI models using three years (2014–2017) of continuous high-resolution monitoring data in a cabbage farmland. Input data included air temperature (Ta), solar radiation (Ra), relative humidity (RH), vapor pressure deficit (VPD), wind speed (Ws), soil moisture (SM), vegetation height (H), and leaf area index (LAI). The results show that soil surface resistance calculations using a Monte Carlo iterative method, together with vegetation stomatal resistance calculations accounting for carbon dioxide concentration and emission, improve the performance of the original Shuttleworth–Wallace (S-W) model. In addition, the AI modelling indicates that Ta and Ra are essential inputs for both model types. When there are sufficient observation data, or only soil and vegetation data are lacking, the RF model is recommended for use. When there are only limited data, or the critical Ta and Ra data are lacking, the SVR model is preferred. Scientific guidance is provided for precision irrigation in agriculture, indicating which AI model can best estimate T and water demand for irrigation planning and water management.

How to cite: Li, H., Chen, H., Huang, J. J., McBean, E., Zhang, J., Gao, J., and Lan, Z.: Partitioning of daily evapotranspiration using a modified Shuttleworth-Wallace model, Random Forest and Support Vector Regression, for a cabbage farmland, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4024, https://doi.org/10.5194/egusphere-egu21-4024, 2021.

Raphael Schneider, Hans Jørgen Henriksen, Julian Koch, Lars Troldborg, and Simon Stisen

The DK-model (https://vandmodel.dk/in-english) is a national water resource model covering all of Denmark. Its core is a distributed, integrated surface-subsurface hydrological model at 500m horizontal resolution. Recent efforts created a version at a higher resolution of 100m. The higher resolution was desired by end-users, amongst others, and serves to better represent surface and near-surface phenomena such as the location of the uppermost groundwater table. The groundwater table, presently located close to the surface across substantial parts of the country and partly expected to rise, is of great interest, as is its future development under climate change. A rising groundwater table is associated with potential risks for infrastructure, agriculture and ecosystems. However, the 25-fold jump in resolution of the hydrological model also increases the computational effort. Hence, it was deemed unfeasible to run the 100m-resolution hydrological model nation-wide with an ensemble of climate models to evaluate climate change impact. The full ensemble run could only be performed with the 500m version of the model. To still produce the desired outputs at 100m resolution, a downscaling method was applied, as described in the following.

Five selected subcatchment models covering around 9% of Denmark were run with five selected climate models at 100m resolution (using less than 3% of the computational time for hydrological models compared to a national, full ensemble run at 100m). Using the simulated changes at 100m resolution from those models as training data, combined with a set of covariates including the simulated changes in 500m resolution, Random Forest (RF) algorithms were trained to downscale simulated changes from 500m to 100m.
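The training data for such a downscaling step pair each fine-resolution target with covariates from the enclosing coarse cell. A minimal sketch of that pairing, assuming a simple nested-grid layout with a factor of 5 per axis (the actual covariate set and grids are richer):

```python
def coarse_index(i_fine, j_fine, factor=5):
    """Map a 100 m cell index to the 500 m cell containing it:
    a 5 x 5 block of fine cells per coarse cell gives the 25-fold jump."""
    return i_fine // factor, j_fine // factor

def training_pairs(fine_change, coarse_change, factor=5):
    """Pair each 100 m simulated change (target) with the change simulated
    by the 500 m model for the enclosing cell (one of the covariates)."""
    pairs = []
    for i, row in enumerate(fine_change):
        for j, target in enumerate(row):
            ci, cj = coarse_index(i, j, factor)
            pairs.append((coarse_change[ci][cj], target))
    return pairs

# Toy 10x10 fine grid nested inside a 2x2 coarse grid:
fine = [[0.1 * (i + j) for j in range(10)] for i in range(10)]
coarse = [[0.0, 1.0], [2.0, 3.0]]
pairs = training_pairs(fine, coarse)
print(len(pairs), pairs[0])  # 100 (0.0, 0.0)
```

A Random Forest regressor would then be trained on such (covariates, target) pairs from the five subcatchments and applied to the remaining ~91% of the country.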

Generalizing the trained RF algorithms, Denmark-wide maps of expected climate-change-induced changes to the shallow groundwater table at 100m resolution were modelled. To verify the downscaling results, the RF algorithms were, amongst other checks, successfully validated against results from a sixth hydrological subcatchment model at 100m resolution that was not used in training the algorithms.

The experience gained also opens for various other applications of similar algorithms where computational limitations inhibit running distributed hydrological models at fine resolutions: The results suggest the potential to downscale other model outputs that are desired at fine resolutions.

How to cite: Schneider, R., Henriksen, H. J., Koch, J., Troldborg, L., and Stisen, S.: Using machine learning to downscale simulations of climate change induced changes to the shallow groundwater table, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7170, https://doi.org/10.5194/egusphere-egu21-7170, 2021.

Juan F. Farfán-Durán and Luis Cea

In recent years, the application of model ensembles has received increasing attention in the hydrological modelling community due to the interesting results reported in several studies carried out in different parts of the world. The main idea of these approaches is to combine the results of the same hydrological model or a number of different hydrological models in order to obtain more robust, better-fitting models, reducing at the same time the uncertainty in the predictions. The techniques for combining models range from simple approaches such as averaging different simulations, to more complex techniques such as least squares, genetic algorithms and more recently artificial intelligence techniques such as Artificial Neural Networks (ANN).

Despite the good results that model ensembles are able to provide, the models selected to build the ensemble have a direct influence on the results. Contrary to intuition, it has been reported that the best fitting single models do not necessarily produce the best ensemble. Instead, better results can be obtained with ensembles that incorporate models with moderate goodness of fit. This implies that the selection of the single models might have a random component in order to maximize the results that ensemble approaches can provide.

The present study is carried out using hydrological data on an hourly scale between 2008 and 2015 corresponding to the Mandeo basin, located in the Northwest of Spain. In order to obtain 1000 single models, a hydrological model was run using 1000 sets of parameters sampled randomly in their feasible space. Then, we have classified the models in 3 groups with the following characteristics: 1) The 25 single models with highest Nash-Sutcliffe coefficient, 2) The 25 single models with the highest Pearson coefficient, and 3) The complete group of 1000 single models.

The ensemble models are built with 5 models as the input of an ANN and the observed series as the output. Then, we applied the Random-Restart Hill-Climbing (RRHC) algorithm, choosing 5 random models in each iteration to re-train the ANN in order to identify a better ensemble. The algorithm is applied to build 50 ensembles in each group of models. Finally, the results are compared to those obtained by optimizing the model using a gradient-based method by means of the following goodness-of-fit measures: the Nash-Sutcliffe coefficient (NSE), the Nash-Sutcliffe coefficient adapted for high flows (HF-NSE), the Nash-Sutcliffe coefficient adapted for low flows (LF-NSE), and the coefficient of determination (R2).
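The selection loop can be sketched as follows. For brevity the ANN combiner is replaced by a plain average of the selected members, and the model pool, observations and hyperparameters are toy stand-ins, so this only illustrates the RRHC search mechanics.

```python
import math
import random

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    mean_obs = sum(obs) / len(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - err / var

def rrhc(models, obs, k=5, restarts=10, steps=200, seed=0):
    """Random-restart hill climbing over k-member subsets of the model pool.
    Here the ensemble combiner is a plain average (the study trains an ANN)."""
    rng = random.Random(seed)

    def score(subset):
        sim = [sum(models[m][t] for m in subset) / k for t in range(len(obs))]
        return nse(obs, sim)

    best, best_score = None, float("-inf")
    for _ in range(restarts):
        subset = rng.sample(range(len(models)), k)
        current = score(subset)
        for _ in range(steps):
            cand = list(subset)
            cand[rng.randrange(k)] = rng.randrange(len(models))  # swap one member
            if len(set(cand)) == k and score(cand) > current:
                subset, current = cand, score(cand)
        if current > best_score:
            best, best_score = subset, current
    return best, best_score

# Toy pool: 30 noisy copies of a "true" hydrograph.
rng = random.Random(1)
obs = [10 + 5 * math.sin(t / 5) for t in range(50)]
models = [[o + rng.gauss(0, 2) for o in obs] for _ in range(30)]
subset, ens_nse = rrhc(models, obs, k=5)
print(sorted(subset), round(ens_nse, 3))
```

Averaging several noisy members already cancels much of their independent error, which is why even moderately fitting single models can form a strong ensemble.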

The results show that the RRHC algorithm can identify adequate ensembles. The ensembles built using the group of models selected based on the NSE outperformed the model optimized by the gradient method in 64% of the cases in at least 3 of the 4 coefficients, both in the calibration and validation stages, followed by the ensembles built with the group of models selected based on the Pearson coefficient, with 56%. In the case of the third group, no ensembles were identified that outperformed the gradient-based method. However, most of the ensembles outperformed the 1000 individual models.

Keywords: Multi-model ensemble; Single-model ensemble; Artificial Neural Networks; Hydrological Model; Random-restart Hill-climbing


How to cite: Farfán-Durán, J. F. and Cea, L.: Building hydrological single-model ensembles using artificial neural networks and a combinatorial optimization approach, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8256, https://doi.org/10.5194/egusphere-egu21-8256, 2021.

Pascal Horton and Olivia Martius

Analog methods (AMs) are statistical downscaling methods often used for precipitation prediction in different contexts, such as operational forecasting, past climate reconstruction, or climate change impact studies. They usually rely on predictors describing the atmospheric circulation and the moisture content of the atmosphere to sample similar meteorological situations in the past and establish a probabilistic forecast for a target date. AMs can be based on outputs from numerical weather prediction models in the context of operational forecasting, or on outputs from climate models in climatic applications.

AMs can combine multiple predictors organized in subsequent levels of analogy that refine the selection of similar situations. The development of such methods is usually a manual process in which some predictors are assessed in different structures. As most AMs use multiple predictors, a comprehensive assessment of all combinations quickly becomes impossible. The selection of predictors in the application of the AM often builds on previous work and does not evolve much. However, the climate models providing the predictors evolve continuously, and new variables might become relevant for consideration in AMs. Moreover, the best predictors might change from one region to another, or for another predictand of interest. There is a need for a method to automatically explore potential variables for AMs and to extract the ones that are relevant for a predictand of interest.

We propose using genetic algorithms (GAs) to proceed to an automatic selection of the predictor variables along with all other parameters of the AM. We even let the GAs automatically pick the best analogy criteria, i.e. the metric that quantifies the analogy between two situations. The first test consisted of letting the GAs select the single best variable to predict daily precipitation for each of 25 selected catchments in Switzerland. The results showed great consistency in terms of spatial patterns and the underlying meteorological processes. Then, different structures were assessed by varying the number of levels of analogy and the number of variables per level. Finally, multiple optimizations were conducted on the 25 catchments to identify the 12 variables that provide the best prediction when considered together.
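A genetic algorithm over binary inclusion masks is one simple way to frame such automatic variable selection. The sketch below uses a made-up fitness function in place of the actual analog-forecast skill score, so it only illustrates the search mechanics (tournament selection, crossover, mutation), not the study's GA configuration.

```python
import random

def ga_select(n_vars, fitness, pop_size=30, generations=60, p_mut=0.1, seed=0):
    """Minimal genetic algorithm over binary masks (1 = variable kept):
    tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)
            parent1 = max(a, b, key=fitness)        # tournament selection
            a, b = rng.sample(pop, 2)
            parent2 = max(a, b, key=fitness)
            cut = rng.randrange(1, n_vars)
            child = parent1[:cut] + parent2[cut:]   # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical skill score: variables 0, 3 and 7 are informative; extras cost.
informative = {0, 3, 7}
def skill(mask):
    kept = {i for i, b in enumerate(mask) if b}
    return len(kept & informative) - 0.2 * len(kept - informative)

best = ga_select(10, skill)
print(best, skill(best))
```

In the actual application, evaluating `fitness` means running the analog method with the candidate predictor set and scoring its precipitation forecasts, which is why the GA's ability to avoid exhaustive enumeration matters.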

How to cite: Horton, P. and Martius, O.: Automatic input variable selection for analog methods using genetic algorithms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9375, https://doi.org/10.5194/egusphere-egu21-9375, 2021.

Laura Viviana Garzon Useche, Karel Aldrin Sánchez Hernández, Gerald Augusto Corzo Pérez, and German Ricardo Santos Granados

The importance of knowing and representing rural and urban development in water management is vital for its sustainability. An essential part of this management requires that stakeholders become more aware of the consequences of decisions and, in some way, can link decisions towards sustainability. For this, a mobile-app serious game called Water Citizens has been proposed to disseminate knowledge and to provide a better understanding of the way decisions affect the Sustainable Development Goals (SDGs). A complex model of a pilot region (Combeima in Ibague, Colombia) has been developed, and the model results are fed into equations to estimate fluctuations of SDGs in the region. Running this complex model in real time for a mobile application requires an extensive high-performance computing system linked to a large and complex network setup. To solve this problem, a fast yet accurate surrogate model is proposed.

Therefore, this study contemplates an analysis of methods to forecast sustainable development indicators evaluated through climate change scenarios for the period 1989-2039. The proposed scenarios associated the public health, livestock, agriculture, engineering, education and environment sectors with climate variables, climate change projections, land cover and land use, water demands (domestic, agricultural and livestock) and water quality (BOD and TSS). This generates the possibility that each player can make decisions representing the actions that affect or contribute to the demand, availability and quality of water in the region.

Consequently, a set of indicators was selected to recreate the dimensions of each sector and reflect its relationship with the Sustainable Development Goals, as opposed to the decisions made by each player. In addition, three categories were considered for the levels of sustainability of the calculated SDG values: low (0.0 - 0.33), medium (0.34 - 0.66) and high (0.67 - 1.0).

Self-learning techniques have been employed in the analysis of decision-making problems. In this study, k-nearest neighbours (k-NN) and a multilayer perceptron network (MLP) were used. Through an analysis based on the responses of the players and the sustainability indexes, a multiple correlation analysis was developed in order to consolidate the learning dataset, which was randomly partitioned in proportions 0.7 and 0.3 for the training and test subsets, respectively. Subsequently, model fitting and performance evaluation were carried out, analysing the MSE error metric and the confusion matrix.
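As an illustration of the k-NN step, the sketch below classifies a query point into the three sustainability categories defined above; the feature vectors, labels and k value are invented for the example and do not reflect the study's actual dataset.

```python
import math
from collections import Counter

def knn_classify(query, samples, labels, k=3):
    """k-nearest-neighbour majority vote using Euclidean distance."""
    dist = [(math.dist(query, s), label) for s, label in zip(samples, labels)]
    dist.sort(key=lambda p: p[0])
    votes = Counter(label for _, label in dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalised feature vectors -> sustainability category:
samples = [(0.1, 0.2), (0.2, 0.1), (0.5, 0.5), (0.6, 0.4), (0.9, 0.8), (0.8, 0.9)]
labels = ["low", "low", "medium", "medium", "high", "high"]
print(knn_classify((0.15, 0.15), samples, labels))  # low
print(knn_classify((0.85, 0.85), samples, labels))  # high
```

The MLP would be evaluated on the same 0.7/0.3 split, and the confusion matrix then compares predicted against true categories for both classifiers.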

Finally, the results of this study will make it possible to determine the potential of supervised learning models as a decision-making tool for the evaluation of sustainable development, as well as to obtain a better abstraction and representation of the water resource in the face of challenges related to climate adaptation and citizen-action measures for water sustainability, besides generating new approaches for the use of artificial intelligence in land-use planning and climate adaptation processes.

How to cite: Garzon Useche, L. V., Sánchez Hernández, K. A., Corzo Pérez, G. A., and Santos Granados, G. R.: A Machine Learning Sustainable Development Goals Model for Water Resources Serious Gaming. Case Study:  Combeima River, Colombia., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13929, https://doi.org/10.5194/egusphere-egu21-13929, 2021.

Camilo Andres Gonzalez Ayala, Santiago Duarte Prieto, Ana Escalera, Gerald Corzo Perez, Hector Angarita, and German Santos Granados

The socio-economic development of a country depends mainly on adequate integrated water resources management (IWRM). Mining and agriculture are two main economic activities in Bolivia that negatively impact water resource quality and availability. Also, every year, floods and droughts hit the most vulnerable populations in different regions of Bolivia. Floods represent the greatest hydroclimatological risk factor in the country, along with landslides caused by heavy precipitation. Alongside these challenges, there is also inefficient water treatment for water supply, which can lead to other problems such as diseases. Nowadays, media such as newspapers, television and radio report on these water resources problems, which are experienced year after year in the country. Furthermore, due to advances in technology, this information can be found digitally. In the same way, people have made use of social networks, such as Twitter, to express their opinion on specific topics. The type of information found both in the media and in social networks is called qualitative information.

This digital information will be extracted using web crawling and web scraping techniques that allow the process to be automated. This process is performed by applying keywords in the context of water resources in Bolivia, such as the names of different water bodies in a basin. Once the information has been extracted, it will be transformed into a quantitative form, in such a way that it is useful for the planning and decision-making processes of IWRM in Bolivia.

The purpose of this research is focused on the application of Natural Language Processing to the digital information found for three hydrological basins located in Bolivia, in order to recognize how Bolivian society relates to the management of water resources. These hydrological basins are La Paz - Choqueyapu, Tupiza and Pampa - Huari. Initially, the digital information studied in this research consists of three Bolivian newspapers and the information found on Twitter. A sentiment analysis classification model is developed in the Python programming language. In order to preserve the semantic information and the different words in the text, the Word2Vec model will be used. The extracted digital information is pre-processed, eliminating punctuation marks and stop words that add no sentiment to a text. Once the information is pre-processed, it is divided into two sets, training and testing. The training data will be used to train the Word2Vec model. The result of the model is a value that determines the positive, neutral or negative sentiment of the text. Once the model is trained, the testing data that has not been used will be applied in order to evaluate the performance of the model.
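The pre-processing step described above (lowercasing, removing punctuation and stop words) can be sketched as follows; the stop-word list here is a tiny invented stand-in, where a real pipeline would use a full Spanish stop-word list, and the example sentence is likewise hypothetical.

```python
import string

# Tiny stand-in stop-word list; a real pipeline would use a full Spanish list.
STOP_WORDS = {"el", "la", "de", "del", "en", "y", "es", "un", "una"}

def preprocess(text):
    """Lowercase, strip punctuation, and drop stop words that carry no sentiment."""
    table = str.maketrans("", "", string.punctuation + "¡¿")
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("¡La contaminación del río Choqueyapu es un problema grave!"))
# ['contaminación', 'río', 'choqueyapu', 'problema', 'grave']
```

The remaining tokens are what would be fed to Word2Vec, which maps each word to a vector so that the classifier can work on numeric features rather than raw text.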

This research helps to identify key elements, actors, frequent words related to IWRM, factors related to river health and improve the concept of citizen science. The results are mapped by geolocation, as a frequency distribution considering the digital perception (sentiment analysis) found and the frequency in which a topic is mentioned in the analysed digital information.

How to cite: Gonzalez Ayala, C. A., Duarte Prieto, S., Escalera, A., Corzo Perez, G., Angarita, H., and Santos Granados, G.: Natural Language Processing In Integrated Water Resources Management. Case Study: Three Bolivian River Basins, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13198, https://doi.org/10.5194/egusphere-egu21-13198, 2021.

Santiago Duarte, Gerald Corzo, and Germán Santos

The Bogotá River Basin is an important basin in Cundinamarca, Colombia's central region. Due to the complexity of the dynamical climatic system in tropical regions, it can be difficult to predict and use the information of GCMs at the basin scale. This region is especially influenced by ENSO and non-linear climatic oscillation phenomena. Furthermore, considering that climatic processes are essentially non-linear and possibly chaotic, this may reduce the effectiveness of downscaling techniques in the region.

In this study, we apply chaotic downscaling to see whether we can identify synchronicity that allows us to predict better. It was possible to clearly identify the time aggregation that best captures the maximum relations between the variables at different spatial scales. In addition, this research proposes a new combination of multiple attractors. Few analyses have evaluated the existence of synchronicity between two or more attractors, and fewer still have considered the chaotic behaviour of attractors derived from climatic time series at different spatial scales.

Thus, we evaluate general synchronization between multiple attractors of various climate time series. The Mutual False Nearest Neighbours (MFNN) parameter is used to test the "Synchronicity Level" (the existence of any type of synchronization) between two different attractors. Two climatic variables were selected for the analysis: precipitation and temperature. Likewise, two information sources are used: at the basin scale, local climatic gauge stations with daily data; and at the global scale, the output of the MPI-ESM-MR model with a spatial resolution of 1.875°x1.875° for both climatic variables (1850-2005). In the downscaling process, two Representative Concentration Pathway (RCP) scenarios are used, RCP 4.5 and RCP 8.5.

For the attractor reconstruction, the time delay is obtained through the autocorrelation and mutual information functions. The False Nearest Neighbors (FNN) method allowed finding the embedding dimension to unfold the attractor. This information was used to identify deterministic chaos at different time scales (e.g. 1, 2, 3 and 5 days) and spatial scales using the Lyapunov exponents. These results were used to test the synchronicity between the various chaotic attractor sets using the MFNN method and time-delay relations. An optimization function was used to find the attractor distance relation that increases the synchronicity between the attractors. These results demonstrated the potential of synchronicity in chaotic attractors to improve rainfall and temperature downscaling results at aggregated daily time steps. Knowledge of the information loss related to multiple reconstructed attractors can enable better construction of downscaling models. This is new information for the downscaling process. Furthermore, synchronicity can improve the selection of neighbours for nearest-neighbour methods by looking at the behaviour of synchronized attractors. This analysis can also allow the classification of unique patterns and relationships between climatic variables at different temporal and spatial scales.
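The attractor reconstruction step rests on time-delay (Takens) embedding: a scalar series is unfolded into vectors of lagged values. A minimal sketch, with an invented series and with the delay and embedding dimension fixed by hand (in the study they come from the mutual information and FNN analyses):

```python
import math

def delay_embed(series, m, tau):
    """Reconstruct delay vectors [x(t), x(t - tau), ..., x(t - (m-1)*tau)]
    from a scalar time series (Takens embedding with dimension m, delay tau)."""
    start = (m - 1) * tau
    return [[series[t - k * tau] for k in range(m)]
            for t in range(start, len(series))]

# Hypothetical daily series standing in for a climatic observable:
series = [math.sin(0.1 * t) for t in range(100)]
vectors = delay_embed(series, m=3, tau=5)
print(len(vectors), len(vectors[0]))  # 90 3
```

The resulting point clouds are the attractors whose pairwise synchronicity the MFNN parameter then quantifies.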

How to cite: Duarte, S., Corzo, G., and Santos, G.: Identification of Synchronicity in Deterministic Chaotic Attractors for the Downscaling Process in the Bogotá River Basin, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14752, https://doi.org/10.5194/egusphere-egu21-14752, 2021.

Yiannis Kontos, Theodosios Kassandros, Konstantinos Katsifarakis, and Kostas Karatzas

Groundwater pollution numerical simulations coupled with Genetic Algorithms (GAs) lead to a vast computational load; simplifying the flow field can compensate in design applications, but not in real-time/operational ones. Various Machine Learning/Deep Learning (ML/DL) methods and problem formulations were tested and evaluated for the real-time inverse problem of aquifer pollution source identification. The aim is to investigate data-driven approaches that replace flow simulation with trained ML/DL models which identify the source faster, yet efficiently enough.

Steady flow in a 1500 m x 1500 m theoretical confined, isotropic aquifer of known characteristics is studied. Two pumping wells (PWs) near the southern boundary provide irrigation/drinking water, defining the flow together with a varying north-south natural flow. Six suspected possible sources, capable of instantaneous leakage, may spread a conservative pollutant. Particle tracking simulates advective mass transport in a 2D flow field for 2500 one-day timesteps. The 14x14 inner field grid nodes serve as locations of sources, PWs and monitoring wells (MWs; for simple daily yes/no pollution detection and/or drawdown measurement). 15,246 combinations of 6 source numbers, 21 north-south hydraulic gradients, and 11+11 PW1/PW2 flow rates were simulated with the authors' existing software, providing the necessary datasets for ML training/evaluation.

Two basic ML/DL approaches were implemented: Classification (CL) and Computer Vision (CV). In CL, every source is a discrete class, while each MW is a discrete variable. The target variable Y can equal 1 to 6, while the input variables X can be: a) 0/1 (MWi polluted or not), b) the first day of MWi’s pollution, c) the duration of MWi’s pollution, d) the hydraulic drawdown of MWi. For added realism, the two southern rows of 28 MWs and the MWs on/around the PWs are concealed. CL features the advantage of facilitating Correlation-based Feature Subset Selection (CFSS), indirectly leading to a pseudo-optimization of the monitoring network that minimizes the number of MWs (though not the sampling frequency), based solely on the source-identification efficiency criterion. As a downside, the time dimension and the spatial correlation of MWs are not considered. With approach (b) being the best scheme, Random Forests (RFs; 86.5576% accuracy), Multi-Layer Perceptron (MLP; 77.5%), and Nearest Neighbors (NN; 86.5%) were tested. CFSS led to only 8 MWs being important, so training with the optimal subsets gave promising results: RF=85.4%, MLP=73.1%, NN=85.4%. In CV, the MWs’ pollution input data on a 10-day basis (0-60, 800-onwards concealed) were formulated into 14x14-pixel black/white images, i.e. 14x14 binary (0,1) matrices, with the t=0 image being the desideratum. A Convolutional Neural Network (CNN; U-Net architecture for image segmentation) achieved 97.1% accuracy. A Convolutional Long Short-Term Memory Neural Network (CLSTM), trained to back-propagate predictions for each given time step with unchanged data formulation (60-800d, step 10), exhibits 82.3% accuracy. CLSTM’s performance is timestep-sensitive; the best results (98% accuracy) were yielded using configuration 5-800d, step 6.
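To make the CL formulation concrete, the nearest-neighbour variant of scheme (b) — source identification from the first day each MW detects pollution — can be sketched with a 1-NN classifier on hypothetical data. The feature layout, well count and random scenarios below are illustrative assumptions, not the study's dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical training data: each row holds the first day each monitoring
# well detects pollution (scheme b); the label is the source number (1-6).
n_scenarios, n_mws = 300, 20
X_train = rng.integers(0, 100, size=(n_scenarios, n_mws)).astype(float)
y_train = rng.integers(1, 7, size=n_scenarios)

def predict_nn(x, X, y):
    """1-nearest-neighbour source identification: assign the source of the
    most similar simulated scenario."""
    d = np.linalg.norm(X - x, axis=1)
    return y[np.argmin(d)]

# a query identical to a simulated scenario recovers that scenario's source
pred = predict_nn(X_train[17], X_train, y_train)
print(pred, y_train[17])
```

In the study the library of labelled scenarios comes from the 15,246 particle-tracking simulations; here it is random numbers standing in for them.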

Concluding, CL’s CFSS minimizes the input space, while CV approaches yield more promising results in terms of accuracy. Each approach has certain constraints in operational applicability, concerning the number of MWs, the sampling resolution and the total elapsed time. This process paves the way for realistic inverse problem solutions, ML-GAs monitoring network optimization, and real-time pollution detection operational systems. 

How to cite: Kontos, Y., Kassandros, T., Katsifarakis, K., and Karatzas, K.: Groundwater pollution monitoring and the inverse problem of source identification. Evaluation of various Machine Learning methods, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16388, https://doi.org/10.5194/egusphere-egu21-16388, 2021.

Vasileios Kourakos, Andreas Efstratiadis, and Ioannis Tsoukalas

Hydrological calibrations with historical data are often deemed insufficient for deducing safe estimates of a model structure that imitates, as closely as possible, the anticipated catchment behaviour. In order to address this issue, we investigate a promising strategy that uses synthetic time series as drivers, preserving the probabilistic properties and dependence structure of the observed data. The key idea is to calibrate a model on the basis of synthetic rainfall-runoff data and validate it against the full observed data sample. To this aim, we carried out a proof of concept on a few representative catchments, testing several lumped conceptual hydrological models with alternative parameterizations and across two time scales, monthly and daily. Next, we attempted to reinforce the validity of the recommended methodology by employing monthly stochastic calibrations in 100 MOPEX catchments. As before, a number of different hydrological models were used, in order to show that calibration with stochastic inputs is independent of the chosen model. The results highlight that in most cases the new approach leads to stronger parameter identifiability and stable predictive capacity across different temporal windows, since the model is trained over a much wider range of hydroclimatic conditions.
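The calibrate-on-synthetic/validate-on-observed idea can be illustrated with a deliberately tiny stand-in: a single-parameter linear reservoir and a gamma rainfall generator. The model, parameter value and grid search are all hypothetical choices for the sketch, not the models or stochastic generator used in the study:

```python
import numpy as np

def linear_reservoir(rain, k):
    """Toy lumped model: storage drains at a fraction k per step."""
    s, q = 0.0, []
    for r in rain:
        s += r
        out = k * s
        s -= out
        q.append(out)
    return np.array(q)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of a simulation against observations."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(1)
rain_obs = rng.gamma(1.5, 2.0, size=240)       # "observed" forcing
q_obs = linear_reservoir(rain_obs, k=0.3)

# synthetic forcing with the same marginal distribution
# (stand-in for a stochastic rainfall generator), 10x longer than the record
rain_syn = rng.gamma(1.5, 2.0, size=2400)
q_syn = linear_reservoir(rain_syn, k=0.3)

# calibrate on the synthetic record, validate against the full observed sample
grid = np.linspace(0.05, 0.95, 19)
k_cal = grid[np.argmax([nse(q_syn, linear_reservoir(rain_syn, k)) for k in grid])]
print(k_cal, nse(q_obs, linear_reservoir(rain_obs, k_cal)))
```

The longer synthetic record exposes the parameter to a wider range of conditions than the short observed one, which is the intuition behind the study's approach.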

How to cite: Kourakos, V., Efstratiadis, A., and Tsoukalas, I.: Can hydrological model identifiability be improved? Stress-testing the concept of stochastic calibration, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11704, https://doi.org/10.5194/egusphere-egu21-11704, 2021.

Mohammad Sina Jahangir and John Quilty

Hydrological forecasts at different horizons are often made using different models. These forecasts are usually temporally inconsistent (e.g., monthly forecasts may not sum to yearly forecasts), which may lead to misaligned or conflicting decisions. Temporal hierarchical reconciliation (or simply, hierarchical reconciliation) methods can be used to obtain consistent forecasts at different horizons. However, their effectiveness in the field of hydrology has not yet been investigated. Thus, this research assesses hierarchical reconciliation for precipitation forecasting, given its high importance in hydrological applications (e.g., reservoir operations, irrigation, drought and flood forecasting). Original precipitation forecasts (ORFs) were produced using three different models: ‘automatic’ Exponential Time-Series Smoothing (ETS), Artificial Neural Networks (ANN), and Seasonal Auto-Regressive Integrated Moving Average (SARIMA). The forecasts were produced at six timescales, namely monthly, 2-monthly, quarterly, 4-monthly, bi-annual, and annual, for 84 basins selected from the Canadian model parameter experiment (CANOPEX) dataset. Hierarchical reconciliation methods including Hierarchical Least Squares (HLS), Weighted Least Squares (WLS), and Ordinary Least Squares (OLS), along with the Bottom-Up (BU) method, were applied to obtain consistent forecasts at all timescales.
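The reconciliation step itself is compact: stacking the aggregation constraints into a summing matrix S and projecting the (incoherent) base forecasts onto the space of coherent forecasts. The sketch below shows OLS reconciliation for a reduced monthly/quarterly/annual hierarchy with made-up base forecasts; the study's six-level hierarchy and WLS/HLS weightings would follow the same pattern:

```python
import numpy as np

# Hierarchy: 12 monthly values (bottom) aggregate to 4 quarters and 1 year.
S = np.vstack([
    np.ones((1, 12)),                      # annual = sum of 12 months
    np.kron(np.eye(4), np.ones((1, 3))),   # each quarter = sum of 3 months
    np.eye(12),                            # the months themselves
])

rng = np.random.default_rng(0)
base_monthly = rng.gamma(2.0, 30.0, size=12)
# independent base forecasts at each level, deliberately made inconsistent
y_hat = S @ base_monthly + rng.normal(0, 5, size=17)

# OLS reconciliation: project base forecasts onto the coherent subspace
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat

# reconciled forecasts are coherent: the annual value equals the monthly sum
print(y_tilde[0], y_tilde[5:].sum())
```

WLS replaces `inv(S.T @ S)` with `inv(S.T @ W_inv @ S)` for a diagonal weight matrix; BU simply takes the bottom row block and re-aggregates.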

Generally, ETS and ANN showed the best and worst performance, respectively, according to a wide range of performance metrics (root mean square error (RMSE), normalized RMSE (nRMSE), mean absolute error (MAE), normalized MAE (nMAE), and the Nash-Sutcliffe Efficiency index (NSE)). The results indicated that hierarchical reconciliation has a dissimilar impact on the ORFs’ accuracy in different basins and timescales, improving the RMSE in some cases while worsening it in others. It was also highlighted that the hierarchical reconciliation methods performed at different levels for different forecast models. According to the RMSE and MAE, the BU method outperformed the hierarchical methods for ETS forecasts, while for ANN and SARIMA forecasts, HLS and OLS, respectively, improved the forecasts more substantially. The sensitivity of the ORFs to hierarchical reconciliation was assessed using the RMSE. It was shown that both accurate and inaccurate ORFs can be improved through hierarchical reconciliation; in particular, the effectiveness of hierarchical reconciliation appears to depend more on the ORF accuracy than on the type of hierarchical reconciliation method.

While in the present work, the effectiveness of hierarchical reconciliation for hydrological forecasting was assessed via data-driven models, the methodology can easily be extended to process-based or hybrid (process-based data-driven) models. Further, since hydrological forecasts at different timescales may have different levels of importance to water resources managers and/or policymakers, hierarchical reconciliation can be used to weight the different timescales according to the user’s preference/desired goals.

How to cite: Jahangir, M. S. and Quilty, J.: Improving hydrological forecasts through temporal hierarchal reconciliation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13303, https://doi.org/10.5194/egusphere-egu21-13303, 2021.

Seong Jin Noh, Hyeonjin Choi, and Bomi Kim

We present an approach that combines two data-centric approaches, data assimilation (DA) and deep learning (DL), from the perspective of hydrologic forecasting. DA is a statistical approach based on Bayesian filtering that produces optimal estimates of the states and/or parameters of a dynamic model using observations. By extracting information from both the model and observational data, DA improves not only the performance of numerical modeling but also the understanding of uncertainties in predictions. While DA complements information gaps between model and observational data, DL constructs a new modeling system by extracting and abstracting information solely from data, without relying on conventional knowledge of hydrologic systems. In the new approach, an ensemble of deep learning models is updated by real-time data assimilation whenever a new observation becomes available. In the presentation, we will focus on discussing the potential of combining these two data-centric approaches.
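The update step for such an ensemble can be illustrated with a scalar ensemble Kalman filter (EnKF) analysis, one standard Bayesian-filtering choice (the abstract does not specify the filter, so treat this as a generic sketch; the forecast values and observation are made up):

```python
import numpy as np

def enkf_update(ensemble, obs, obs_var, rng):
    """Scalar ensemble Kalman update: nudge every ensemble member toward
    a stochastically perturbed observation."""
    x = np.asarray(ensemble, dtype=float)
    p = x.var(ddof=1)                    # ensemble spread = forecast uncertainty
    k = p / (p + obs_var)                # Kalman gain
    perturbed = obs + rng.normal(0, np.sqrt(obs_var), size=x.size)
    return x + k * (perturbed - x)

rng = np.random.default_rng(3)
# stand-in for an ensemble of deep-learning streamflow predictions (m3/s)
forecasts = rng.normal(120.0, 15.0, size=50)
analysis = enkf_update(forecasts, obs=100.0, obs_var=4.0, rng=rng)
print(forecasts.mean(), analysis.mean())
```

The analysis mean moves toward the observation and the spread contracts, which is the sense in which DA corrects the DL ensemble in real time.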


How to cite: Noh, S. J., Choi, H., and Kim, B.: An ensemble of deep learning models with data assimilation for hydrologic forecasting, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16249, https://doi.org/10.5194/egusphere-egu21-16249, 2021.

Yu Li, Jinhui Jeanne Huang, and Ran Yan

Leakage in water supply systems is a worldwide problem, occurring not only in China but also in Japan, the US, and Europe. It not only wastes water resources but also raises drinking-water safety issues. The traditional solution is the Minimum Night Flow method with manual leak detectors. This solution can only find leakage at night, and engineers have to search for the leaking point more or less at random using leak detectors. It not only relies heavily on domain knowledge and expertise but is also labour-intensive, and the response time is long, often several days. Here, time series analysis based on a dynamic time warping algorithm is used to detect anomalies in the time series of pressure and flow stations, and the risk coefficient of each pipe network is determined using a neural network combined with existing data. Water treatment plants do not even have to install new sensors if the budget is limited.
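The dynamic time warping (DTW) distance underlying the anomaly detection can be sketched directly. The example series (a daily pressure pattern, a time-jittered copy, and a copy with a sustained leak-like drop) are hypothetical; the point is that DTW tolerates timing shifts but flags level changes:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-time-warping distance between two series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# hypothetical daily pressure pattern at a station
normal = np.sin(np.linspace(0, 2 * np.pi, 48)) + 3.0
shifted = np.roll(normal, 3)   # timing jitter only: DTW stays small
leaking = normal - 0.8         # sustained pressure drop: DTW grows
print(dtw_distance(normal, shifted), dtw_distance(normal, leaking))
```

A station whose current-day DTW distance to its reference pattern exceeds a threshold would be flagged as anomalous.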

How to cite: Li, Y., Huang, J. J., and Yan, R.: Leakage detection in water pipe networks using machine learning, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9345, https://doi.org/10.5194/egusphere-egu21-9345, 2021.

Ran Yan, Yu Li, and Jinhui Jeanne Huang

In January 2016 and December 2020, eastern and southern China, including Shanghai, experienced rapid drops in temperature along with snow. These cold waves also had a severe impact on water distribution networks. Leakage from a pipe network causes serious economic loss and waste of water resources. Nonetheless, cold waves are not the only factor affecting leakage from a pipe network; other factors include the burial depth of pipes, pipe materials, pipe diameters, break history, and so on. In this work, we use machine learning methods and Bayesian distribution regression to explore the relationship between pipe leaks and these impact factors. Based on the results, risk maps of water distribution networks are generated. This research indicates which risk factors are important for leakage detection and water loss management in urban water supply networks, which is promising for wide practical application given the rapid expansion of available data.

How to cite: Yan, R., Li, Y., and Huang, J. J.: A novel method to diagnose factors influencing in leakage in water distribution systems including extreme weather events, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8156, https://doi.org/10.5194/egusphere-egu21-8156, 2021.

Big data and spatio-temporal data analysis
Giulia Giani, Miguel Angel Rico-Ramirez, and Ross Woods

A widely accepted objective methodology to select individual rainfall-streamflow events is missing, which makes it difficult to synthesize findings from independent research initiatives. In fact, the selection of individual events is a fundamental step in many hydrological studies, but the importance and impact of the choices made at this stage are largely unrecognised.

The event selection methods found in the literature start by looking at either the rainfall timeseries or the streamflow timeseries. Moreover, most of the methodologies involve hydrograph separation, which is a highly uncertain step and can be performed using many different algorithms. Further increasing the subjectivity of the procedure, a wide range of ad hoc conditions are usually applied (e.g. peak-over-threshold, minimum duration of rainfall event, minimum duration of dry spell, minimum rainfall intensity…).

For these reasons, we present a new methodology to extract rainfall-streamflow events which minimizes the conceptual hypotheses and user’s choices, and bases the identification of the events mainly on the joint fluctuations of the two signals. The proposed methodology builds upon a timeseries analysis technique to estimate catchment response time, the Detrending Moving-average Cross-correlation Analysis-based method.
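The response-time estimate at the core of such a method can be illustrated with a simpler stand-in: finding the lag that maximizes the rainfall-streamflow cross-correlation (the authors use the DMCA-based estimator; the synthetic data and lag search below are an illustrative sketch, not their algorithm):

```python
import numpy as np

def response_lag(rain, flow, max_lag=48):
    """Lag maximizing the rainfall-streamflow cross-correlation: a simple
    stand-in for a catchment response-time estimate."""
    r = (rain - rain.mean()) / rain.std()
    f = (flow - flow.mean()) / flow.std()
    corrs = [np.mean(r[:-lag] * f[lag:]) for lag in range(1, max_lag)]
    return 1 + int(np.argmax(corrs))

rng = np.random.default_rng(7)
rain = rng.gamma(0.3, 5.0, size=2000)   # spiky synthetic rainfall
# synthetic streamflow responding mostly 6 steps after the rainfall
flow = (0.6 * np.roll(rain, 6) + 0.3 * np.roll(rain, 7)
        + 0.1 * np.roll(rain, 8) + rng.normal(0, 0.1, size=2000))
print(response_lag(rain, flow))
```

Once the response time is known, candidate events can be delimited where the lag-aligned joint fluctuations of the two signals rise together, without hydrograph separation.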

The proposed method has the advantage of looking simultaneously at the evolution in time of rainfall and streamflow timeseries, providing a more systemic detection of events. Moreover, the presented method can easily be adapted to extract events at different time resolutions (provided the resolution is fine enough to capture the delay between the rainfall and streamflow responses).

Properties of the events extracted with the proposed method are compared with those of events extracted with the traditional approach (based on hydrograph separation), to show the strengths and weaknesses of the two techniques and to suggest in which situations the proposed method can be most useful.

How to cite: Giani, G., Rico-Ramirez, M. A., and Woods, R.: An innovative, timeseries-analysis-based method to extract rainfall-streamflow events from continuous timeseries, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4619, https://doi.org/10.5194/egusphere-egu21-4619, 2021.

Maria Kireeva, Timophey Samsonov, and Ekaterina Rets

River hydrograph analysis provides valuable information about the temporal and spatial variability of river discharge. One of the most important operations is hydrograph separation, which aims at decomposing the total streamflow into components. Numerous approaches for hydrograph separation have been developed to date. Most of them traditionally separate the streamflow into general quickflow and baseflow components, but it is also possible to obtain a more specific quickflow separation with subdivision into genetic components, such as seasonal snowmelt, rain, thaw, etc. We present the general framework for river hydrograph analysis and separation provided by the newly released GrWat package, which has been developed over several years. The framework includes: a simple tabular data model for representing the hydrograph and the climatic (temperature and precipitation) daily data needed to separate the quickflow into genetic components; spatial analysis operations for automatic extraction of climatic data from reanalysis datasets covering the river basin; automated interpolation of missing data considering the autocorrelation; fast implementations of multiple algorithms for hydrograph separation; computation of more than 30 interannual and long-term characteristics of the separated hydrograph components; scale-space transformation for hierarchical decomposition of the hydrograph; and high-quality plotting and reporting of the results of the analysis. One of the prominent features of the framework is a powerful algorithm for genetic hydrograph separation, which is capable not only of extracting the baseflow, seasonal, thaw and rain flood components, but also of cutting out the short-time rain floods that complicate the shape of the seasonal flood. The baseflow separation is performed in the first stage and can be initialized by any of the baseflow separation algorithms available in the package. In the second stage the quickflow is separated into genetic components.
Such a modular structure provides a flexible way to experiment with different combinations of algorithms and to select the approach which serves best the goal of the analysis and the specific features of the hydrograph.
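As a flavour of the first-stage step, one widely used baseflow filter (the one-parameter Lyne-Hollick recursive digital filter) can be sketched as below. GrWat is an R package; this Python sketch with a synthetic hydrograph is only an illustration of the kind of algorithm that could initialize the baseflow stage, not GrWat's implementation:

```python
import numpy as np

def lyne_hollick_baseflow(q, alpha=0.925):
    """Single-pass Lyne-Hollick filter: high-pass the streamflow to get
    quickflow; baseflow is the remainder, bounded by the total flow."""
    q = np.asarray(q, dtype=float)
    qf = np.zeros_like(q)
    for i in range(1, len(q)):
        qf[i] = alpha * qf[i - 1] + 0.5 * (1 + alpha) * (q[i] - q[i - 1])
        qf[i] = min(max(qf[i], 0.0), q[i])   # quickflow within [0, q]
    return q - qf

# synthetic hydrograph: steady baseflow of 10 with two storm peaks
t = np.arange(120)
q = 10 + 60 * np.exp(-0.5 * ((t - 30) / 4) ** 2) \
       + 40 * np.exp(-0.5 * ((t - 80) / 6) ** 2)
base = lyne_hollick_baseflow(q)
print(base.min(), base.max())
```

Practical implementations usually run the filter in several forward/backward passes; a single pass is kept here for brevity.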

The study was supported by the Russian Science Foundation, grant No. 19-77-10032.

How to cite: Kireeva, M., Samsonov, T., and Rets, E.: Comprehensive analysis and separation of river hydrograph using the GrWat R package, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14261, https://doi.org/10.5194/egusphere-egu21-14261, 2021.

Tatsuya Ishikawa, Takao Moriyama, Paolo Fraccaro, Anne Jones, and Blair Edwards
Floods have significant impacts on social and economic activities, with flood frequency projected to increase in many regions of the world due to climate change. Quantification of current and future flood risk at lead times of months to years is potentially of high value for planning activities in a wide range of humanitarian and business applications across multiple sectors. However, there are many technical and methodological challenges in producing accurate, local predictions that also adequately quantify uncertainty. Multiple geospatial datasets are freely available to improve flood predictions, but their size and complexity mean they are difficult to store and combine. Generation of flood inundation risk maps requires the combination of several static geospatial data layers with potentially multiple simulation models and ensembles of climate inputs.
Here we present a geospatial climate impact modelling framework, which we apply to the challenge of flood risk quantification. Our framework is modular, scalable, cloud-based, and allows for the easy deployment of different impact models and model components with a range of input datasets (of different spatial and temporal scales) and model configurations.
The framework allows us to use automated tools to carry out AI-enabled parameter calibration, model validation and uncertainty quantification/propagation, with the ability to quickly run the impact models for any location where the appropriate data are available. We can additionally trial different sources of input data, pulling data from IBM PAIRS Geoscope and other sources, as we have done with our pluvial flood models.
In this presentation, we provide pluvial flood risk assessments generated through our framework. We calibrate our flood models to accurately reproduce inundations derived from historical precipitation datasets, validated against flood maps obtained from corresponding satellite imagery, and quantify uncertainties for hydrological parameters. Probabilistic flood risk is generated through ensemble execution of such models, incorporating climate change and model parameter uncertainties.
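The ensemble step — propagating climate and parameter uncertainty into a probabilistic risk estimate — reduces to Monte Carlo sampling over both sources of uncertainty. The toy "model" (depth proportional to rainfall times a runoff coefficient), the distributions and the damage threshold below are all hypothetical stand-ins for the framework's calibrated flood models:

```python
import numpy as np

rng = np.random.default_rng(11)
n_ens = 5000

# toy ensemble: event rainfall (climate uncertainty) and a runoff
# coefficient (model-parameter uncertainty) combine into a flood depth
rain = rng.gamma(4.0, 20.0, size=n_ens)           # mm per event
runoff_coeff = rng.uniform(0.3, 0.7, size=n_ens)  # calibrated range
depth = 0.01 * runoff_coeff * rain                # hypothetical depth (m)

threshold = 0.5  # m: inundation depth considered damaging
risk = (depth > threshold).mean()                 # exceedance probability
print(round(risk, 3))
```

In the real framework each ensemble member is a full hydrological/hydraulic simulation; the exceedance statistics over the ensemble are what populate the risk map.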

How to cite: Ishikawa, T., Moriyama, T., Fraccaro, P., Jones, A., and Edwards, B.: A geospatial and temporal analytics framework for flood risk mapping, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14064, https://doi.org/10.5194/egusphere-egu21-14064, 2021.

Biswa Bhattacharya and Junaid Ahmad

Satellite-based rainfall estimates (SBREs) are used as an alternative to gauge rainfall in hydrological studies, particularly for basins with data issues. However, these data products exhibit errors which cannot always be corrected by bias correction methods such as Ratio Bias Correction (RBC). Data fusion, or data merging, can be a potentially good approach for merging various satellite rainfall products into a fused dataset, which can benefit from all the data sources and may minimise the error in rainfall estimates. Data merging methods commonly applied in meteorology and hydrology are the Arithmetic Merging Method (AMM), Inverse Error Squared Weighting (IESW) and Error Variance (EV). Among these methods EV is popular; it merges bias-corrected SBREs using the minimisation-of-variance principle.

In this research we investigated the possibility of using K nearest neighbours (KNN) as a data merging method. Four satellite rainfall products were used in this study, namely CMORPH, PERSIANN CDR, TRMM 3B42 and MSWEP. MSWEP was used as a reference dataset for comparing the merged rainfall dataset, since it is itself a merged product. All these products were downloaded at 0.25° x 0.25° spatial and daily temporal scale. Satellite products are known to behave differently at different temporal and spatial scales. Based on climatic and physiographic features, the Indus basin was divided into four zones. Rainfall products were compared at daily, weekly, fortnightly, monthly and seasonal scales, whereas the spatial scales were gauge locations, zonal scales and the basin scale. The RBC method was used to correct the bias of the satellite products at monthly and seasonal scales. With bias correction, the daily normalised root mean square error (NRMSE) was reduced by up to 20% for CMORPH, 22% for PERSIANN CDR and 14% for TRMM at the Indus basin scale for the monthly scale, which is why the monthly bias-corrected data were used for merging. Merging satellite products can be fruitful, benefiting from the strengths of each product while minimising their weaknesses. Four different merging methods, i.e. AMM, IESW, EV and the K Nearest Neighbour method (KNN), were used, and their performance was checked in two seasons, i.e. the non-wet and wet seasons. AMM and EV performed similarly, whereas IESW performed poorly at zonal scales. The KNN merging method outperformed all other merging methods and gave the lowest error across the basin. Daily NRMSE was reduced to 0.3 at the Indus basin scale with the KNN method; AMM and EV reduced the error to 0.45, in comparison to errors of 0.8, 0.65 and 0.53 produced by CMORPH, PERSIANN CDR and TRMM, respectively, in the wet season.
The KNN merged product gave the lowest error at the daily scale in both the calibration and validation periods, which demonstrates that merging improves rainfall estimates in sparsely gauged basins.
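One plausible reading of KNN merging — predicting each day's reference rainfall as the mean reference value over the k historical days with the most similar combination of satellite estimates — can be sketched as below. The three "products", their biases and the gamma rainfall are synthetic stand-ins, and the exact KNN formulation used in the study may differ:

```python
import numpy as np

def knn_merge(X_train, y_train, X_query, k=5):
    """Merge satellite estimates by k-nearest-neighbour regression in the
    space of the individual products."""
    merged = np.empty(len(X_query))
    for i, x in enumerate(X_query):
        d = np.linalg.norm(X_train - x, axis=1)
        merged[i] = y_train[np.argsort(d)[:k]].mean()
    return merged

rng = np.random.default_rng(5)
truth = rng.gamma(0.5, 8.0, size=1200)   # reference rainfall (stand-in for MSWEP)
# three hypothetical biased/noisy products (stand-ins for the SBREs)
products = np.column_stack(
    [truth * b + rng.normal(0, 1.5, truth.size) for b in (0.8, 1.2, 1.05)]
)
products = np.clip(products, 0, None)

train, test = slice(0, 1000), slice(1000, 1200)
merged = knn_merge(products[train], truth[train], products[test])
rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
print(rmse(merged, truth[test]), rmse(products[test, 0], truth[test]))
```

In this synthetic setting the merged estimate beats the worst individual product, mirroring the error reductions reported for the Indus basin.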


Key words: Merging, data fusion, K nearest neighbour, KNN, error variance, Indus.

How to cite: Bhattacharya, B. and Ahmad, J.: K nearest neighbour in merging satellite rainfall estimates from diverse sources in sparsely gauged basins, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14650, https://doi.org/10.5194/egusphere-egu21-14650, 2021.

Jaku Rabinder Rakshit Pally and Vidya Samadi

Due to the importance of object detection in video analysis and image annotation, it is widely utilized in a number of computer vision tasks such as face recognition, autonomous vehicles, activity recognition, object tracking and identity verification. Object detection involves not only the classification and identification of objects within images, but also their localization and tracing, by creating bounding boxes around the objects and labelling them with their respective prediction scores. Here, we leverage and discuss how connected vision systems can be used to embed cameras, image processing, Edge Artificial Intelligence (AI), and data connectivity capabilities for flood label detection. We adopted the engineering definition of label detection, whereby a label is a sequence of discrete measurable observations obtained using a capturing device such as a web camera or smartphone. We built a Big Data service of around 1000 images (an image annotation service), including image geolocation information, from various flooding events in the Carolinas (USA), with a total of eight different object categories. Our platform has several smart AI tools and task configurations that can detect objects’ edges or contours, which can be manually adjusted with a threshold setting so as to best segment the image. The tool has the ability to train on the dataset and predict labels for large-scale datasets, and can be used as an object detector to drastically reduce the amount of time spent per object, particularly for real-time image-based flood forecasting. This research is funded by the US National Science Foundation (NSF).

How to cite: Pally, J. R. R. and Samadi, V.: Application of Image Processing and Big Data Science for Flood Label Detection, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3709, https://doi.org/10.5194/egusphere-egu21-3709, 2021.

Chairpersons: Thaine H. Assumpção, Maurizio Mazzoleni
Emanuele Ciancia, Alessandra Campanelli, Teodosio Lacava, Angelo Palombo, Simone Pascucci, Nicola Pergola, Stefano Pignatti, Valeria Satriano, and Valerio Tramutoli

The assessment of Total Suspended Matter (TSM) spatiotemporal variability plays a key role in inland water management, considering how these fluctuations affect water transparency, light availability, and physical, chemical, and biological processes. All the above-mentioned topics highlight the need to develop innovative methodologies of data analysis that are able to handle multi-mission and multi-source remote sensing data, fostering the implementation of integrated and sustainable approaches. Sentinel-2A MultiSpectral Instrument (MSI) and Landsat 8 Operational Land Imager (OLI) data offer unique opportunities for investigating certain in-water constituents (e.g., TSM and chlorophyll-a), mainly owing to their spatial resolution (10–60 m). Furthermore, the joint use of these sensors offers the opportunity to build time series with an improved revisit time, thus enabling limnologists, aquatic ecologists and water resource managers to enhance their monitoring efforts. In this framework, the potential of combined MSI–OLI data in characterizing the multi-temporal (2014–2018) TSM variability in Pertusillo Lake (Basilicata region, Southern Italy) has been evaluated in this work. In particular, a customized MSI-based TSM model (R2=0.81) has been developed and validated using ground truth data acquired during specific measurement campaigns. The model was then exported to OLI data through an inter-calibration procedure (R2=0.87), allowing for the generation of a merged MSI–OLI multi-temporal TSM dataset. The analysis of the derived multi-year TSM monthly maps has shown the influence of hydrological factors on the TSM seasonal dynamics over two sub-regions of the lake, the west and east areas. The western side appears more affected by inflowing rivers and water level fluctuations, whose effects tend to decrease longitudinally, leading to less sediment within the eastern sub-area. The achieved results highlight how the proposed methodological approach (i.e. in situ data collection, satellite data processing and modeling) can be exported to other inland waters that deserve to be investigated for better management of water quality and monitoring systems.

How to cite: Ciancia, E., Campanelli, A., Lacava, T., Palombo, A., Pascucci, S., Pergola, N., Pignatti, S., Satriano, V., and Tramutoli, V.: On the potential of multi-source remote sensing data in characterizing the Total Suspended Matter variability in inland waters., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14711, https://doi.org/10.5194/egusphere-egu21-14711, 2021.

Water systems modelling, management and optimization
Dionysios Nikolopoulos, Panagiotis Kossieris, and Christos Makropoulos

Urban water systems are designed with the goal of delivering their service for several decades. The infrastructure will inevitably face long-term uncertainty in a multitude of parameters from the hydroclimatic and socioeconomic realms (e.g., climate change, limited supply of water in terms of quantity and acceptable quality, population growth, shifting demand patterns, industrialization), as well as from the conceptual realm of the decision maker (e.g., changes in policy, system maintenance incentives, investment rate, expansion plans). Because urban water systems are highly complex, a holistic analysis involves the use of various models that individually pertain to smaller sub-systems and a variety of metrics to assess performance, with the analysis accomplished at different temporal and spatial scales for each sub-system. In this work, we integrate a water resources management model with a water distribution model and a water demand generation model at smaller (household and district) scales, allowing us to simulate urban water systems “from source to tap”, covering the entire water cycle. We also couple a stochastic simulation module that supports the representation of uncertainty throughout the water cycle. The performance of the integrated system under long-term uncertainty is assessed with the novel measure of system resilience, i.e. the degree to which a water system continues to perform under progressively increasing disturbance. This evaluation is essentially a framework of systematic stress-testing, where the disturbance is described via stochastically changing parameters in an ensemble of scenarios that represent future world views. The framework is showcased through a synthesized case study of a medium-sized urban water system.
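The stress-testing idea — evaluating a performance metric under progressively increasing disturbance and summarizing the resulting curve — can be sketched minimally. The reliability metric, demand scaling and single-capacity "system" below are illustrative assumptions, not the integrated source-to-tap models of the study:

```python
import numpy as np

def reliability(demand, capacity):
    """Fraction of total demand met over the horizon: a simple
    performance metric for the toy system."""
    supplied = np.minimum(demand, capacity)
    return supplied.sum() / demand.sum()

rng = np.random.default_rng(2)
base_demand = rng.normal(100.0, 10.0, size=365).clip(min=0)  # daily demand
capacity = 115.0                                             # system capacity

# stress-test: progressively scale demand upward
# (a stand-in for, e.g., population-growth scenarios)
stress_levels = np.linspace(1.0, 1.6, 13)
curve = [reliability(base_demand * s, capacity) for s in stress_levels]

# resilience summarized as mean performance across the stress range
resilience = float(np.mean(curve))
print(round(resilience, 3))
```

In the full framework each point of the curve would be an ensemble of stochastic simulations of the whole water cycle rather than a single deterministic run.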


This research is carried out / funded in the context of the project “A resilience assessment framework for water supply infrastructure under long-term uncertainty: A Source-to-Tap methodology integrating state of the art computational tools” (MIS 5049174) under the call for proposals “Researchers' support with an emphasis on young researchers - 2nd Cycle”. The project is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme Human Resources Development, Education and Lifelong Learning 2014-2020.

How to cite: Nikolopoulos, D., Kossieris, P., and Makropoulos, C.: Stochastic stress-testing approach for assessing resilience of urban water systems from source to tap, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13284, https://doi.org/10.5194/egusphere-egu21-13284, 2021.

Antonio Candelieri, Riccardo Perego, Ilaria Giordani, and Francesco Archetti

Two approaches are possible in Pump Scheduling Optimization (PSO): explicit and implicit control. The first assumes that the decision variables are pump statuses/speeds to be set at prefixed times. Thus, the problem is to efficiently search among all the possible schedules (i.e., configurations of the decision variables) to optimize the objective function – typically minimization of energy-related costs – while satisfying hydraulic feasibility. Since both the energy cost and the hydraulic feasibility are black-box, the problem is usually addressed through simulation-optimization, where every schedule is simulated on a “virtual twin” of the real-world water distribution network. A plethora of methods have been proposed, such as meta-heuristics and evolutionary and nature-inspired algorithms. However, addressing PSO via explicit control can imply many decision variables for real-world water distribution networks, increasing with the number of pumps and the number of time intervals for actuating the control, and requiring a huge number of simulations to obtain a good schedule.

On the contrary, implicit control aims at controlling pump statuses/speeds depending on some control rules related, for instance, to pressure in the network: a pump is activated if the pressure (at specific locations) is lower than a minimum threshold, deactivated if the pressure exceeds a maximum threshold, and otherwise its status/speed is not modified. These thresholds are the decision variables, and their values – usually set heuristically – significantly affect the performance of the operations. Compared to explicit control, implicit control approaches allow a significant reduction in the number of decision variables, at the cost of a more complex search space, due to the introduction of further constraints and conditions among the decision variables. Another important advantage offered by implicit control is that the decision is not restricted to prefixed schedules but can be taken any time new data arrive from SCADA, making this approach more suitable for on-line control.
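A minimal sketch of such a threshold rule (hypothetical thresholds and pressure readings; in practice the thresholds are the decision variables to be optimized, not fixed constants):

```python
def implicit_control(pump_on, pressure, p_min, p_max):
    """Threshold rule: switch the pump on below p_min, off above p_max,
    otherwise keep the current status (hysteresis band)."""
    if pressure < p_min:
        return True
    if pressure > p_max:
        return False
    return pump_on

# (p_min, p_max) are the decision variables whose values are optimized;
# the rule itself can be re-evaluated whenever new SCADA data arrive.
status, history = False, []
for pressure in [28, 35, 52, 61, 44]:     # e.g. pressure readings [m]
    status = implicit_control(status, pressure, p_min=30, p_max=60)
    history.append(status)
```

The hysteresis band between the two thresholds is what keeps the pump from toggling on every small pressure fluctuation.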

The main contributions of this paper are to show that:

  • thresholds-based rules for implicit control can be learned through an active learning approach, analogous to those used to implement Automated Machine Learning;
  • the active learning framework is well-suited for the implicit control setting: the lower dimensionality of the search space, compared to explicit control, substantially improves computational efficiency;
  • the hydraulic simulation model can be replaced by a Deep Neural Network (DNN): the working assumption, experimentally investigated, is that SCADA data can be used to train an accurate DNN that predicts the relevant outputs (i.e., energy and hydraulic feasibility), avoiding the costs of designing, developing, validating and executing a “virtual twin” of the real-world water distribution network.

The overall system has been tested on a real-world water distribution network.

How to cite: Candelieri, A., Perego, R., Giordani, I., and Archetti, F.: Active learning of optimal controls for pump scheduling optimization, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12610, https://doi.org/10.5194/egusphere-egu21-12610, 2021.

Andrea Ponti, Antonio Candelieri, and Francesco Archetti

The issue of vulnerability and robustness in networked systems can be addressed by several methods. The most widely used are based on a set of centrality and connectivity measures from network theory which basically relate vulnerability to the loss of efficiency caused by the removal of some nodes and edges. Another related view is given by the analysis of the spectra of the adjacency and Laplacian matrices of the graph associated to the networked system.

The main contribution of this paper is the introduction of a new set of vulnerability metrics given by the distance between the probability distribution of node-node distances in the original network and that of the network resulting from the removal of nodes/edges. Two such probabilistic measures have been analysed: the Jensen-Shannon (JS) divergence and the Wasserstein (WST) distance, also known as the Earth Mover's distance. This name comes from its informal interpretation as the minimum energy cost of moving and transforming a pile of dirt in the shape of one probability distribution into the shape of the other; the cost is quantified by the amount of dirt moved times the moving distance. The Wasserstein distance can be traced back to the works of Gaspard Monge in 1781 and Lev Kantorovich in 1942. Wasserstein distances are generally well defined and provide an interpretable distance metric between distributions. Computing Wasserstein distances requires, in general, the solution of a constrained linear optimization problem which, when the support of the probability distributions is multidimensional, is very large.

An advantage of the Wasserstein distance is that, under quite general conditions, it is a differentiable function of the parameters of the distributions, which makes it possible to use it to assess the sensitivity of network robustness to distributional perturbations. The computational results for two real-life water distribution networks confirm that the values of the JS and WST distances are strongly related to the criticality of the removed edges. Both are more discriminating, at least for water distribution networks, than efficiency-based and spectral measures. A general methodological scheme has been developed connecting different modelling and computational elements, concepts and analysis tools to create a framework suitable for analysing robustness. This modelling and algorithmic framework can also support the analysis of other networked infrastructures, such as power grids, gas distribution networks and transit networks.
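The core idea, a distance between node-node distance distributions before and after an edge removal, can be illustrated on a toy graph. This pure-Python sketch uses the one-dimensional closed form of the Wasserstein distance (mean difference of sorted samples), not the general linear-programming formulation discussed above:

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (BFS)."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pair_distances(adj):
    """Sorted list of all node-node distances (the empirical distribution)."""
    return sorted(bfs_distances(adj, u)[v]
                  for u, v in combinations(sorted(adj), 2))

def wasserstein_1d(a, b):
    """W1 between two equal-size 1-D empirical distributions:
    mean absolute difference of the sorted samples."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def remove_edge(adj, u, v):
    return {k: [w for w in ws if (k, w) not in ((u, v), (v, u))]
            for k, ws in adj.items()}

# Toy network: a 6-node ring with one chord (0-3)
adj = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3],
       3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
base = pair_distances(adj)
impact = wasserstein_1d(base, pair_distances(remove_edge(adj, 0, 3)))
```

A larger `impact` flags a more critical edge: removing the chord lengthens several shortest paths and visibly shifts the distance distribution.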

How to cite: Ponti, A., Candelieri, A., and Archetti, F.: Vulnerability and robustness of networked infrastructures: beyond typical graph-based measures, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12708, https://doi.org/10.5194/egusphere-egu21-12708, 2021.

José Pinho, Isabel Iglesias, Willian Melo, Ana Bio, Paulo Avilez-Valente, José Vieira, Luisa Bastos, and Fernando Veloso-Gomes

Spits are landforms that present a complex morphology, which depends on currents, waves, sediment transport, tidal range and anthropogenic changes. Their position and shape are subject to extreme events like river flood discharges and storms. They can also respond to processes that take place at larger time scales, such as plate tectonics, sea level rise or even climatological patterns with teleconnections all over the world, such as the well-known North Atlantic Oscillation (NAO) or El Niño-Southern Oscillation (ENSO). This is the case of the Douro river mouth sand spit, located on the northern coast of Portugal. This naturally dynamic sand spit, which has moved landwards over the past decades, has caused frequent nuisance to navigation, affecting the width and depth of the navigation channel. Therefore, a breakwater was constructed in an attempt to stabilise the sand spit and the estuary inlet.

Validated hydrodynamic numerical models (openTELEMAC-MASCARET and Delft3D) of the Douro river estuary have demonstrated the ability to accurately describe the estuarine hydrodynamic patterns and water elevation under extreme flood conditions. Model results showed that for higher river flow discharges the sand spit is partially inundated.

In this work a morphodynamic model (Delft3D) of the estuary was implemented to assess both the morphodynamics of the sand spit under extreme events, including the effect of sea level rise due to climate change, and the variation of extreme water levels along the estuary due to spit erosional processes that can occur during flood events.

Preliminary results show that the sand spit will be locally eroded under the higher river flood discharges, forming a system of two secondary channels, with one channel located near the breakwater’s southern extremity and the other, narrower, near the south bank. Associated with these two channels, two depositional bars will form in front of the channels at the coastal platform. However, the inner submerged sand spit undergoes sedimentation in all of the simulated scenarios. Consequently, neither the river mouth discharge conditions nor the water levels inside the estuary undergo significant changes in the simulated scenarios.

These results will be complemented with further analyses considering the sediment size influence, tidal level, storm surge, sea level rise and river flood discharges.

Acknowledgements: To the Strategic Funding UIDB/04423/2020 and UIDP/04423/2020 (FCT and ERDF) and to the project EsCo-Ensembles (PTDC/ECI-EGC/30877/2017, NORTE 2020, Portugal 2020, ERDF and FCT). The authors also want to acknowledge the data provided by EDP and IH.

How to cite: Pinho, J., Iglesias, I., Melo, W., Bio, A., Avilez-Valente, P., Vieira, J., Bastos, L., and Veloso-Gomes, F.: Assessment of a Sand Spit Morphodynamics Under Extreme Flood Events, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5778, https://doi.org/10.5194/egusphere-egu21-5778, 2021.

Thibault Malou and Jérome Monnier

Spatial altimetry provides a substantial amount of water surface height data from multi-mission satellites (especially Jason-3, Sentinel-3A/B and the forthcoming NASA-CNES SWOT mission). To best exploit the potential of spatial altimetry, the present study proposes the derivation of a model adapted to the scale of the spatial observations: a diffusive-wave-type model adapted to a double scale [1].

Moreover, Green-like kernels can be employed to derive covariance operators; therefore, they may provide an approximation of the covariance kernel of the background error in variational data assimilation processes. Following the derivation of the aforementioned original flow model, we present the derivation of a Green kernel which provides an approximation of the covariance kernel of the background error for the bathymetry (i.e. the control variable) [2].

This approximation of the covariance kernel is used to infer the bathymetry in the classical Saint-Venant (shallow-water) equations with better accuracy and faster convergence than when no adequate covariance operator is introduced [3].
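For context, the background covariance enters the bathymetry inference through the standard variational data assimilation cost function (generic textbook form, stated here for orientation rather than reproduced from [2]), where $b$ is the bathymetry control variable, $b_b$ its background value, $B$ the background error covariance operator whose kernel is approximated by the derived Green kernel, $H$ the observation operator, $y$ the altimetry observations and $R$ the observation error covariance:

```latex
J(b) = \frac{1}{2}\,\big(H(b) - y\big)^{\mathsf{T}} R^{-1} \big(H(b) - y\big)
     + \frac{1}{2}\,\big(b - b_b\big)^{\mathsf{T}} B^{-1} \big(b - b_b\big)
```

Minimizing $J$ with a well-chosen $B$ regularizes the inversion, which is why a physically-based kernel can outperform a generic exponential one.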

Moreover, this Green kernel helps to analyze the sensitivity of the double-scale diffusive waves (or even the Saint-Venant’s equations) with respect to the bathymetry.

Numerical results are analyzed on realistic datasets (derived from measurements of the Rio Negro, Amazon basin).

The double-scale diffusive wave model provides more accurate results than the classical version. Next, in terms of inversions, the derived physically-based covariance operators improve the inferences compared to the usual exponential one.

[1] T. Malou, J. Monnier "Double-scale diffusive wave equations dedicated to spatial river observations". In prep.

[2] T. Malou, J. Monnier "Physically-based covariance kernel for variational data assimilation in spatial hydrology". In prep.

[3] K. Larnier, J. Monnier, P.-A. Garambois, J. Verley. "River discharge and bathymetry estimations from SWOT altimetry measurements". Inv. Pb. Sc. Eng (2020).

How to cite: Malou, T. and Monnier, J.: Double-scale diffusive wave model dedicated to spatial river observation and associated covariance kernel for variational data assimilation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10355, https://doi.org/10.5194/egusphere-egu21-10355, 2021.

Dandan Liu, Yiheng Chen, Jinhui Huang, and Xiaogang Shi

With the accelerating urbanization in developing countries, the threats of extreme rainfall and flood events are increasing. The impact of flood disasters severely threatens the safety of cities with huge populations. In order to quantitatively study the impact of urbanization on urban floods, the hydrological characteristics of two adjacent basins will be analyzed and compared in this study: Shenzhen, China, which has urbanized rapidly over the past 40 years, and Hong Kong, China, which was already urbanized.

The methods of this study comprise two main components. Firstly, in order to clarify the urbanization process of the study regions, a geospatial database of the impervious surface area of the two adjacent basins from 1986 to 2018 was obtained. In addition, this study intends to predict the impervious area of the study regions in future years based on urban planning, using an adaptive-cell deep learning analysis method. Secondly, in order to simulate waterlogging in the two regions, this study selects specific flood events to establish and calibrate SWMM models. By changing the impervious area of the two regions, hydrological outputs such as surface runoff and sensitivity under different scenarios can be obtained.
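As an illustration of the scenario logic (not the SWMM engine itself), a simple composite runoff-coefficient sweep shows the kind of runoff-versus-imperviousness relationship the study will quantify; the coefficients, storm depth and catchment area below are hypothetical:

```python
def composite_runoff(rain_mm, area_km2, imperv_frac, c_imp=0.9, c_perv=0.2):
    """Runoff volume (1000 m^3) from a composite runoff coefficient:
    area-weighted average of impervious and pervious coefficients."""
    c = imperv_frac * c_imp + (1.0 - imperv_frac) * c_perv
    return c * rain_mm * 1e-3 * area_km2 * 1e6 / 1e3

# Scenario sweep: same design storm, increasing imperviousness
scenarios = {f: composite_runoff(rain_mm=50, area_km2=10, imperv_frac=f)
             for f in (0.2, 0.4, 0.6, 0.8)}
```

Each SWMM scenario run plays the same role as one entry of this sweep, but with full hydrologic and hydraulic routing behind it.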

After model simulation, we will analyze the results as follows. Firstly, the variation of runoff and flood peak with impervious area will be analyzed. Secondly, by comparing the simulation results of the two regions, the sensitivity of flood events to urbanization will be evaluated. Finally, based on the predicted simulation results, the future flood situation in the study area will be evaluated, which provides guidance for urban flood prevention.

How to cite: Liu, D., Chen, Y., Huang, J., and Shi, X.: Quantifying the Effects of Urbanization on Floods for two adjacent basins in Shenzhen and Hong Kong, China., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4904, https://doi.org/10.5194/egusphere-egu21-4904, 2021.

Jiawei Zhang, Han Chen, Jinhui Jeanne Huang, Edward McBean, Han Li, Zhiqing Lan, and Jun Jie Gao

Accurately estimating and mapping soil evaporation (E) and vegetation transpiration (T) in urban woodland areas is of great significance for precision irrigation and for urban water resource allocation and management. However, dual-source models customized for satellite imagery are lacking. This research, for the first time, developed a dual-source approach to predict E and T in an urban garden area. The method improves on the MOD16 algorithm in the following aspects: (1) an enhanced calculation method for net radiation flux (Rn) and soil heat flux (G) is proposed; (2) the determination of vegetation canopy resistance incorporates the impact of carbon dioxide emissions; (3) a physical mechanism-based β estimation method is proposed to replace the empirical values in the original model. Our model was tested on 40 cloudless days using 10 m Sentinel-2 imagery of the Guiwan garden area in Shenzhen city, southern China. The Shuttleworth-Wallace, FAO dual-Kc and Priestley-Taylor models were used to evaluate model performance; the results suggest that the modified MOD16 model successfully estimates and partitions ET in the city garden area and outperforms the previous MOD16 algorithm. The spatial distribution patterns demonstrate that E and T present obvious seasonal changes, in the range of 23-150 W/m2 for E and 31-186 W/m2 for T, showing that a large amount of water is lost through ET in the urban garden area. Sensitivity analysis shows that the improved MOD16 model is most sensitive to vegetation index products and solar radiation, so accurate inputs of these two parameter types should be prioritized. The modified MOD16 model significantly improves the accuracy of ET simulation in high-resolution, small-scale areas and provides a powerful tool for quantifying E and T in urban areas and assessing the impact of climate change on the urban hydrological cycle.

How to cite: Zhang, J., Chen, H., Huang, J. J., McBean, E., Li, H., Lan, Z., and Gao, J. J.: A modified MOD16 algorithm to estimate soil evaporation and vegetation transpiration in urban garden area, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8161, https://doi.org/10.5194/egusphere-egu21-8161, 2021.

Hydroinformatics platforms and systems integration
Vidya Samadi and Rakshit Pally

Floods are among the most destructive natural hazards, affecting millions of people across the world and leading to severe loss of life and damage to property, critical infrastructure, and agriculture. The Internet of Things (IoT), machine learning (ML), and Big Data are exceptionally valuable tools for catastrophe readiness and for collecting vast amounts of actionable data. The aim of this presentation is to introduce the Flood Analytics Information System (FAIS) as a data gathering and analytics system. The FAIS application is designed to integrate crowd intelligence, ML, and natural language processing of tweets to provide warnings, with the aim of improving flood situational awareness and risk assessment. FAIS has been beta-tested during major hurricane events in the US, where successive storms caused extensive damage and disruption. The prototype successfully identifies a dynamic set of at-risk locations/communities using USGS river gauge height readings and geotagged tweets intersected with watershed boundaries. The list of prioritized locations can be updated as the river monitoring system and conditions change over time (typically every 15 minutes). The prototype also performs flood frequency analysis (FFA) using various probability distributions, with the associated uncertainty estimation, to assist engineers in designing safe structures. This presentation will discuss the FAIS functionalities and the real-time implementation of the prototype across the south and southeast USA. This research is funded by the US National Science Foundation (NSF).
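As an illustration of the FFA component, a Gumbel (EV1) distribution fitted by the method of moments is one common choice among the various probability distributions mentioned; the gauge peaks below are hypothetical and the sketch is not FAIS's actual implementation:

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def gumbel_fit(annual_maxima):
    """Method-of-moments fit of a Gumbel (EV1) distribution."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi        # scale parameter
    mu = mean - EULER_GAMMA * beta               # location parameter
    return mu, beta

def return_level(mu, beta, T):
    """Flow magnitude with return period T years: F(q) = 1 - 1/T."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))

peaks = [312, 289, 401, 356, 298, 510, 377, 333, 290, 445]  # hypothetical
mu, beta = gumbel_fit(peaks)
q100 = return_level(mu, beta, 100)   # 100-year design discharge
```

Uncertainty estimation would then come from, e.g., resampling the annual maxima and refitting, giving a confidence band around each return level.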

How to cite: Samadi, V. and Pally, R.: The Convergence of IoT, Machine Learning, and Big Data for Advancing Flood Analytics Knowledge, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7782, https://doi.org/10.5194/egusphere-egu21-7782, 2021.

Tianrui Pang, Jiping Jiang, Fengyuan Zhang, Harsh Yadav, Yunlei Men, Peng Wang, and Yi Zheng

In the era of the smart city, developing environmental decision support systems (EDSS) that integrate monitoring, modelling, planning and control for smart management of urban river water quality has been widely accepted and implemented around the world. Constructing and coupling localized water quality models, such as the popular USEPA WASP and EFDC models, to meet different management requirements is fundamental for surface water EDSS development. However, few studies have reported techniques for coupling the advanced version of the WASP program with an EDSS platform.

Traditional pathways of EDSS integration or model coupling, e.g. database-oriented interaction, are non-modular, offer little sharing and reuse, and make system updating difficult. With the development of cloud computing and web services, service-oriented design is the future trend of model coupling.

In this paper, a generic interface/module interacting with the WASP V7.5 program is developed and a technical route for tight coupling is proposed. The web service encapsulation of localized WASP models and advanced cloud computing services are implemented with the help of the OpenGMS framework and the SaaS (software as a service) pattern. To meet the basic requirements of urban water quality management, water assimilative capacity allocation and pollution load reduction planning are worked out by the cloud computing services, which enables operational running of the EDSS.

The study area is located in the Maozhou River Basin, an urban river basin in Shenzhen, China. According to the national water environment code and regulations, COD (chemical oxygen demand) and NH3-N (ammonia nitrogen) are set as the endpoints for supervision, and the corresponding WASP model of the Maozhou River is constructed and calibrated with historical field data. The computing components and web services are integrated into the comprehensive water quality management platform of the Maozhou River through model configuration and controlling data parameterization. One version of the Maozhou River EDSS has been deployed at the Shenzhen ecological and environmental intelligent management and control center and has been online since January 2021.

Along with WASP, service-oriented encapsulation of EFDC- and SWMM-based computing components for particular management purposes is also implemented following the same technical route, since both models are developed by USEPA and have similar inputs and outputs. The model coupling and platform integration technology presented in this paper provides a valuable paradigm for linking other environmental models into specific management business. Under the proposed technical pathway, the model interface between the computation engine and the system business layer can be easily updated along with changes in management requirements. It offers the merits of rapid development and easy deployment and maintenance.

How to cite: Pang, T., Jiang, J., Zhang, F., Yadav, H., Men, Y., Wang, P., and Zheng, Y.: Pathway to Encapsulate Water Quality Models as Cloud Computing Services and Couple with Environmental DSS for Managing Urban River Water Quality, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8115, https://doi.org/10.5194/egusphere-egu21-8115, 2021.

Felipe Troncoso, Nancy Hitschfeld, Pedro Sanzana, Francisco Suárez, and José Muñoz

Water resources management requires specialized computer tools that allow explicit integration of surface and groundwater fluxes, which generally have domains with different spatial discretizations. On the one hand, a surface hydrological domain, D1, is typically segmented into sub-basins, elevation contour bands or hydrological response units. These elements are usually represented by grids, triangles, or simple irregular polygons. In D1, the elements are connected to each other and incorporated into a drainage network that defines a surface topology, t1. On the other hand, an aquifer domain, D2, is organized in hydrogeological units, which can be represented by geometrical elements such as grids, triangulations, or Voronoi or Quadtree diagrams. In D2, a regular connectivity is typically associated with structured meshes, which defines a groundwater topology, t2. We present a new tool called GeoLinkage (v.geolinkage) that creates an ESRI-format linkage shapefile of the new surface-groundwater topology, t1-2. This Python-based open-source tool has a graphical user interface (GUI) as an add-on for GRASS-GIS and was constructed using the PyGRASS and FloPy libraries. It was developed to be used with WEAP-MODFLOW models, but it can also be used with other water resources management models. GeoLinkage allows processing models within reasonable computation times, which facilitates scenario analysis. It calculates the locations of the surface element geometries (nodes and arcs) using the GRASS platform and connects them to each element of a structured mesh in MODFLOW models. GeoLinkage was applied to obtain groundwater levels and water demand coverage in the Azapa Valley, a hyper-arid zone in the desert of Chile, where a grid of 70,305 cells and six fields with detailed geometry were processed in only 12 min.
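The node-to-cell linkage step can be illustrated with simple grid arithmetic; this is a hypothetical helper, not GeoLinkage's actual API, and assumes a uniform structured MODFLOW-style grid:

```python
def cell_index(x, y, x0, y0, dx, dy, nrow, ncol):
    """Map a surface-element node (x, y) to the (row, col) of a uniform
    structured grid whose origin (x0, y0) is the upper-left corner;
    rows are counted downward, as in MODFLOW structured grids."""
    col = int((x - x0) // dx)
    row = int((y0 - y) // dy)
    if not (0 <= row < nrow and 0 <= col < ncol):
        return None          # node falls outside the aquifer grid
    return row, col

# Link each node of a toy surface element to its groundwater cell
nodes = [(125.0, 980.0), (340.0, 730.0), (510.0, 510.0)]
linkage = {pt: cell_index(*pt, x0=0.0, y0=1000.0, dx=100.0, dy=100.0,
                          nrow=10, ncol=10) for pt in nodes}
```

The real tool additionally handles arcs, attribute transfer and shapefile output, but the t1-to-t2 connection ultimately reduces to lookups of this kind repeated over every surface geometry.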

How to cite: Troncoso, F., Hitschfeld, N., Sanzana, P., Suárez, F., and Muñoz, J.: GeoLinkage: a GRASS-GIS plugin to integrate surface waters and groundwater in WEAP-MODFLOW models, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13909, https://doi.org/10.5194/egusphere-egu21-13909, 2021.

Citizen science
Louise Petersson, Marie-Claire ten Veldhuis, Govert Verhoeven, Zoran Kapelan, Innocent Maholi, and Hessel Winsemius

We demonstrate a framework for urban flood modeling with community mapped data, particularly suited for flood risk management in data-scarce environments. The framework comprises three principal stages: data acquisition with survey design and quality assurance, model development and model implementation for flood prediction. We demonstrate that data acquisition based on community mapping can be affordable, comprehensible, quality assured and open source, making it applicable in resource-strained contexts. The framework was demonstrated and validated on a case study in Dar es Salaam, Tanzania. The results obtained show that the community mapped data supports flood modeling on a level of detail that is currently inaccessible in many parts of the world. The results obtained also show that the community mapping approach is appropriate for datasets that do not require extensive training, such as flood extent surveys where it is possible to cross-validate the quality of reports given a suitable number and density of data points. More technically advanced features such as dimensions of urban drainage system elements still require trained mappers to create data of sufficient quality. This type of mapping can, however, now be performed in new contexts thanks to the development of smartphones. Future research is suggested to explore how community mapping can become an institutionalized practice to fill in important gaps in data-scarce environments.

How to cite: Petersson, L., ten Veldhuis, M.-C., Verhoeven, G., Kapelan, Z., Maholi, I., and Winsemius, H.: Community Mapping Supports Comprehensive Urban Flood Modeling for Flood Risk Management in a Data-Scarce Environment, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4221, https://doi.org/10.5194/egusphere-egu21-4221, 2021.

Jan Seibert, Simon Etter, Barbara Strobl, Sara Blanco, Mirjam Scheller, Franziska Schwarzenbach, and Ilja van Meerveld

Citizen science observations are potentially useful to complement existing monitoring networks. This is also the case in hydrology, where we often lack spatially distributed observations. Engaging the public might help to overcome the lack of data in hydrology. So far, most hydrological citizen science projects have been based on the use of different instruments or installations. For stream level observations, a staff gauge is installed in the river but it is difficult to scale this type of citizen science approach to a large number of sites because these gauges cannot be installed everywhere (or by everyone). Here, we present an evaluation of the CrowdWater smartphone app that allows the collection of hydrological data without any physical installation or specialized instruments. With the help of a free app, citizens can report the stream level, soil moisture conditions, the presence of water in temporary streams, plastic pollution in streams and on streambanks, as well as general information on streams. The approach is similar to geocaching, with the difference that instead of finding treasures, hydrological measurement sites are set up. These sites can be found by the initiator or other citizen scientists to take additional measurements at a later time. For the water level measurements, a virtual staff gauge approach is used instead of a physical staff gauge. A picture of a staff gauge is digitally inserted into a photo of a stream bank or a bridge pillar and serves as a reference of the water level. During a subsequent field visit, the stream level is compared to the virtual staff gauge on the first picture. In this presentation, we discuss how well the water level class observations agreed with measured stream levels, and in which months and during which flow conditions citizens submitted their stream level observations. We also highlight methods to ensure data quality, and illustrate how these water level data can be used in hydrological model calibration. 
We also give an update on new activities in the CrowdWater project.

How to cite: Seibert, J., Etter, S., Strobl, B., Blanco, S., Scheller, M., Schwarzenbach, F., and van Meerveld, I.: CrowdWater: How well can citizens observe water levels and other hydrological variables using a smartphone app?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13285, https://doi.org/10.5194/egusphere-egu21-13285, 2021.

Mila Sari, Bongkot Ngamsom, Alexander Iles, Jeanette Rotchell, Will Mayes, Mark Lorch, Nicole Pamme, and Samantha Richardson

Monitoring water quality traditionally involves experts collecting samples for laboratory-based analysis; a time-consuming, costly process [1]. It has been recognised that frequent measurements are needed to understand patterns and pressures of changing contaminant concentrations [2]. One approach empowers citizens with simple tools, enabling them to monitor water quality regularly [3]. Generally, citizen-led sampling has involved volunteers collecting samples for later analysis by experts. We describe an approach comprising a series of paper-based sensors that, when coupled with a smartphone, enable citizens to participate in simultaneous collection of samples and generation of on-site measurements.

We developed paper microfluidic analytical devices (PADs) for the detection of contaminants (nutrients, metals, organics). All devices were designed to be simple to use with rapid colour readout achieved with minimal user input. Filter paper was patterned with hydrophobic wax barriers to create reaction zones. Within these zones, chemical reagents were stored, that would, upon sample addition, change colour proportionally to the analyte concentration. After addition and drying of reagents, devices were sealed by lamination with a hole cut to allow for sample entry. For water analysis, the devices were placed directly onto the water sample and incubated for a short time (< 10 min). The coloured reaction products were visible to the naked eye; more precise quantification was achieved by capturing a digital image followed by colour intensity analysis.
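The colour-intensity quantification step amounts to inverting a linear calibration curve; the sketch below uses hypothetical standards and intensity values, not the authors' calibration data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a calibration curve."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Calibration standards (mg/L) vs. mean colour intensity extracted from
# a digital image of the reaction zone (hypothetical values)
standards = [0.0, 2.0, 5.0, 10.0]
intensity = [4.0, 22.0, 49.0, 94.0]
slope, intercept = fit_line(standards, intensity)

def concentration(sample_intensity):
    """Invert the calibration curve for an unknown sample."""
    return (sample_intensity - intercept) / slope
```

In the devices described here the same calibration can be anchored by the printed coloured squares, so each photo carries its own reference.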

We adapted spectroscopic determination chemistry so that it was suitable for use on a portable paper platform, successfully developing separate devices for phosphate (LOD 3 mg L-1), copper (LOD 2 mg L-1), chromium (LOD 0.5 mg L-1), nickel (LOD 3 mg L-1), and triclosan (LOD 3 mg L-1). To detect the very low concentrations (µg L-1 level) of contaminants (metals, organics) usually found in the environment, we aim to combine the simple paper-based readout with an in-field pre-concentration step. By incorporating an electrospun membrane into a simple filtration system, adsorption of copper ions on the membrane surface was demonstrated. Coupling such a pre-concentration method with colour-generating paper readout devices would potentially provide a simple means for on-site monitoring at environmentally relevant levels.

Citizen-led sampling was undertaken to monitor phosphates in fresh water across the Humber region (UK), Belgium, Germany and the Netherlands. Devices featured six reaction zones, two control zones and an internal calibration (coloured squares). Results were captured using a custom-developed app, RiverDIP (Natural Apptitude), that also recorded location, turbidity (photos), GPS, date, time and waterbody. Submitted data were analysed and subsequently plotted on an online map, allowing volunteers to see all sampling efforts, with more than 300 results returned so far. Engagement with volunteers was investigated to empower people by informing them of sources of domestic pollution.

In summary, we have developed a series of simple-to-use paper-based devices to detect water contaminants and demonstrated the feasibility of citizen-led sampling to monitor water quality. Future work will involve further development towards a system for simple onsite pre-concentration and monitoring of heavy metals involving volunteers in the sampling process.


[1] J. Environ. Manage., 87, 2008, 639-648.
[2] Sens., 5, 2005, 4-37.
[3] Front. Ecol. Environ., 10, 2012, 298-304.

How to cite: Sari, M., Ngamsom, B., Iles, A., Rotchell, J., Mayes, W., Lorch, M., Pamme, N., and Richardson, S.: Simple-to-use paper microfluidic devices for monitoring contaminants in fresh water, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7834, https://doi.org/10.5194/egusphere-egu21-7834, 2021.

Elena von Benzon, Elizabeth Bagshaw, Michael Prior-Jones, Isaac Sobey, Rupert Perkins, and Simon Browning

We present the first trial of an accurate, low-cost wireless sensor, the ‘Hydrobean’, and a base station designed for use by citizen scientists in catchment water quality monitoring. This novel wireless sensor network addresses key concerns identified with current volunteer monitoring programmes, including temporal discontinuity and insufficient data quality. Hydrobean continuously measures electrical conductivity, temperature and pressure and wirelessly transmits these data to an online portal for observation and download by users. These parameters can be used to assess catchment water quality status, with excursions from baseline conditions detected in real time at high temporal resolution. Citizen scientists have an increasingly important role to play in enhancing our scientific understanding of catchment water quality, but their contribution has so far been limited by barriers to accessing suitable monitoring equipment. Traditional grab sampling techniques result in key contamination incidents being missed and limit trend analysis, as samples are analysed discretely, typically on a weekly or monthly basis. Additionally, the quality of data obtained from the basic chemical test kits commonly used by citizen scientists does not meet the requirements of many data users. This research explores the role of low-cost wireless sensor networks in advancing the potential of citizen scientists in monitoring catchment water quality. Monitoring equipment available to citizen scientists generally needs to be low cost, so it is unlikely to rival professional-standard monitoring techniques in the foreseeable future. However, reliable, low-cost sensors which enable continuous, real-time monitoring do now exist for a limited range of water quality parameters and have been used in the development of the wireless sensor network presented here.
Critically, Hydrobean and its base station are low cost, low maintenance, portable and robust, in order to meet the requirements of community monitoring programmes. Ultimately, a model will be integrated into the real-time analysis of data collected by the wireless sensor network to predict when and where contamination incidents are likely to affect catchment water quality. We report initial field results of the Hydrobean wireless sensor network and discuss ways in which the basic design can be improved in future versions. 
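The abstract does not specify how Hydrobean detects excursions from baseline conditions; a minimal sketch of one plausible approach — a rolling-baseline threshold check on the conductivity stream — is given below. The function name, window length, and threshold `k` are all illustrative assumptions, not details from the study:

```python
from collections import deque

def detect_excursions(readings, window=12, k=3.0):
    """Flag readings that deviate from a rolling baseline.

    readings: iterable of (timestamp, conductivity) pairs.
    window:   number of recent readings that form the baseline.
    k:        excursion threshold, in baseline standard deviations.
    Returns a list of (timestamp, value) pairs flagged as excursions.
    """
    baseline = deque(maxlen=window)
    flagged = []
    for t, value in readings:
        if len(baseline) == window:
            mean = sum(baseline) / window
            std = (sum((x - mean) ** 2 for x in baseline) / window) ** 0.5
            if std > 0 and abs(value - mean) > k * std:
                flagged.append((t, value))
                continue  # keep the excursion out of the baseline window
        baseline.append(value)
    return flagged

# Synthetic conductivity trace (µS/cm): a stable baseline with one spike.
trace = ([(t, 450 + (t % 3)) for t in range(24)]
         + [(24, 900)]
         + [(t, 450 + (t % 3)) for t in range(25, 30)])
print(detect_excursions(trace))  # → [(24, 900)]
```

A fixed multiple of the rolling standard deviation is the simplest option; a deployed system would likely also account for diurnal cycles and sensor drift.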

How to cite: von Benzon, E., Bagshaw, E., Prior-Jones, M., Sobey, I., Perkins, R., and Browning, S.: A low-cost wireless sensor network for citizen science water quality monitoring, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12426, https://doi.org/10.5194/egusphere-egu21-12426, 2021.

Julian Klaus, David Hannah, and Kwok Pan Chun

Crowd-sourcing of hydrological data with volunteer citizen scientists has the potential to overcome severe data limitations in space and time. However, several aspects of the reliability, quality, and value of crowd-sourced data are under debate. In this contribution, we present results of a citizen science experiment involving 300 high school students in Luxembourg. The students used self-built rainfall collectors to sample precipitation over selected 24-hour periods, covering Luxembourg at the national scale (~2500 km2), and subsequently measured the amounts. Following data collection and archiving, we evaluated the quality of the data by benchmarking the crowd-sourced values against data collected with a dense network of ~50 tipping buckets across the country. This was done by kriging both data sets. We found that the areal precipitation at the national scale derived from the two data sources was consistent, albeit with a rather systematic bias between them. This bias was in the same range as the bias between tipping bucket data and average amounts from several measurements with a self-built sampler at the same location. The students' data showed a clearly higher variance than the national data but were still able to resolve finer-scale variations than the national network. We observed the largest differences between the two data sets in urban settings. Here, it is not clear whether the students' data were less robust when acquired in an urban setting or whether the differences arose from urban rainfall processes not observed by the national network, whose stations are placed at open sites. With our proposed experiment and the statistical data analysis, we were able to quality-control crowd-sourced precipitation data and showed that they are reliable. This increases confidence for many studies relying on similar samplers. 
Yet, some individual samples deviated considerably from the kriged national network, indicating that sampling with only a few citizen observers could lead to higher uncertainty in the data. While some limitations exist, we showed that data from citizens are of high quality and provide valuable information for hydrological studies.
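The benchmarking workflow described above — interpolating both point data sets onto a common grid and comparing areal precipitation — can be sketched roughly as follows. For simplicity this sketch substitutes inverse-distance weighting for the kriging used in the study, and all station coordinates, rainfall values, and the injected bias are synthetic:

```python
import numpy as np

def idw_grid(xy, values, grid_x, grid_y, power=2.0):
    """Interpolate scattered points onto a grid with inverse-distance
    weighting (a simple stand-in for the kriging used in the study)."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    d = np.linalg.norm(pts[:, None, :] - xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-9)            # avoid division by zero at stations
    w = d ** -power
    z = (w * values).sum(axis=1) / w.sum(axis=1)
    return z.reshape(gy.shape)

rng = np.random.default_rng(0)

# Synthetic "national network" (50 gauges) and "student" (300 collectors)
# data over a 50 km x 50 km domain; the student data carry a small wet bias.
def truth(xy):
    return 10 + 0.05 * xy[:, 0]        # mm, gentle west-east gradient

gauges = rng.uniform(0, 50, size=(50, 2))
students = rng.uniform(0, 50, size=(300, 2))
gauge_p = truth(gauges) + rng.normal(0, 0.2, 50)
student_p = truth(students) + 0.5 + rng.normal(0, 0.8, 300)  # bias + noise

grid = np.linspace(0, 50, 25)
field_g = idw_grid(gauges, gauge_p, grid, grid)
field_s = idw_grid(students, student_p, grid, grid)

# Difference of grid means: a national-scale estimate of the systematic bias.
areal_bias = field_s.mean() - field_g.mean()
print(f"areal precipitation bias: {areal_bias:.2f} mm")
```

Comparing the two interpolated fields, rather than raw point values, is what allows an areal (national-scale) bias estimate even though the two networks share no station locations.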

How to cite: Klaus, J., Hannah, D., and Chun, K. P.: How reliable are crowd-sourced data in hydrology? Lessons learned from a citizen science experiment, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4825, https://doi.org/10.5194/egusphere-egu21-4825, 2021.