HS3.4

Deep learning in hydrological science

Machine learning (ML) and Deep Learning (DL) have seen accelerated adoption across Hydrology and the broader Earth Sciences. This session highlights the continued integration of ML, and its many variants, including DL, into traditional and emerging hydrology-related workflows. Abstracts are solicited related to novel theory development, novel methodology, or practical applications of ML in hydrological modeling. This might include, but is not limited to, the following:

(1) Development of novel DL models or modeling workflows.
(2) Integrating DL with process-based models and/or physical understanding.
(3) Improving understanding of the (internal) states/representations of ML/DL models.
(4) Understanding the reliability of ML/DL, e.g., under non-stationarity.
(5) Deriving scaling relationships or process-related insights with ML/DL.
(6) Modeling human behavior and impacts on the hydrological cycle.
(7) Hazard analysis, detection, and mitigation.
(8) Natural Language Processing in support of models and/or modeling workflows.

Co-organized by ESSI1
Convener: Frederik Kratzert (ECS) | Co-conveners: Martin Gauch (ECS), Thomas Lees (ECS), Daniel Klotz (ECS), Grey Nearing
Presentations | Wed, 25 May, 15:55–18:17 (CEST) | Room 2.15

15:55–16:00
16:00–16:10 | EGU22-6191 | solicited | On-site presentation
Tim Franken, Cedric Gullentops, Vincent Wolfs, Willem Defloor, Pieter Cabus, and Inge De Jongh

Belgium ranks 23rd out of 164 countries for water scarcity, and third highest in Europe, according to the World Resources Institute. The warm, dry summers of the past few years have made it clear that Flanders has little if any buffer to cope with a sharp increase in water demand or a prolonged period of dry weather. To increase resilience and preparedness against droughts, we developed hAIdro, an operational early warning system for low flows that enables timely, local, and effective measures against water shortages. Data-driven rainfall-runoff models are at the core of the forecasting system, which forecasts droughts up to 30 days ahead.

The architecture of the data-driven hydrological models is inspired by the Multi-Timescale Long Short-Term Memory (MTS-LSTM, [1]), which makes it possible to integrate past and future data in one prediction pipeline. The model architecture consists of three LSTMs organized in a branched structure. The historical branch processes the historical meteorological data, remote sensing data, and static catchment features into encoded state vectors. These are passed through fully connected layers to an hourly and a daily forecasting branch, which make runoff predictions on short (72 hours) and long (30 days) time horizons, respectively. The forecasting branches are fed with forecasts of rainfall and temperature, static catchment features, and discharge observations. The novelty of the proposed model structure lies in the way discharge observations are incorporated: only the most recent discharge observations are used in the forecasting branches, to minimize the consequences of missing discharge observations in an operational context. The models are trained using a weighted Nash-Sutcliffe Efficiency (NSE) as the objective function, which puts additional emphasis on low flows. Results show that the newly created data-driven models perform well compared to calibrated lumped hydrological PDM models [2] on various performance metrics, including Log-NSE and NSE.
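The abstract does not state how the NSE is weighted. As a minimal sketch, the block below assumes weights inversely proportional to the observed flow, which is one common way to emphasize low flows; the function name `weighted_nse` and the offset `eps` are hypothetical and not taken from hAIdro.

```python
from typing import Sequence


def weighted_nse(obs: Sequence[float], sim: Sequence[float], eps: float = 0.1) -> float:
    """Weighted Nash-Sutcliffe Efficiency with larger weights on low flows.

    ASSUMPTION: the weights w_t = 1 / (obs_t + eps) are chosen for
    illustration only; they are not the weighting used by the authors.
    Flows are assumed to be non-negative.
    """
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    weights = [1.0 / (o + eps) for o in obs]
    mean_obs = sum(obs) / len(obs)
    # Weighted squared error of the simulation.
    num = sum(w * (o - s) ** 2 for w, o, s in zip(weights, obs, sim))
    # Weighted squared error of the "always predict the mean" benchmark.
    den = sum(w * (o - mean_obs) ** 2 for w, o in zip(weights, obs))
    if den == 0.0:
        raise ValueError("observations have zero variance")
    return 1.0 - num / den
```

With this choice, an error of one unit on a low observed flow lowers the score more than the same error on a high flow. To use it as a training objective, one would minimize `1 - weighted_nse(...)`.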

We developed a custom cloud-based operational forecasting system, called hAIdro, to bring the data-driven hydrological models into production. hAIdro processes large quantities of local meteorological measurements, radar rainfall data, and ECMWF extended-range forecasts to make probabilistic forecasts up to 30 days ahead. hAIdro has been forecasting runoff twice a day for 262 locations spread over Flanders since April 2021. A continuous monitoring and evaluation framework provides valuable insights into the online model performance and the informative value of hAIdro.

[1] Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S. “Rainfall–Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network.” Hydrol. Earth Syst. Sci., 25, 2045–2062, 2021.

[2] Moore, R. J. “The PDM rainfall-runoff model.” Hydrol. Earth Syst. Sci., 11, 483–499, 2007.

How to cite: Franken, T., Gullentops, C., Wolfs, V., Defloor, W., Cabus, P., and De Jongh, I.: An operational framework for data driven low flow forecasts in Flanders, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6191, https://doi.org/10.5194/egusphere-egu22-6191, 2022.

16:10–16:17 | EGU22-5130 | On-site presentation
Matteo Giuliani, Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli, and Andrea Castelletti

A drought is a slowly developing natural phenomenon that can occur in all climatic zones and can be defined as a temporary but significant decrease in water availability. Over the past three decades, the cost of droughts in Europe amounted to over 100 billion euros, with the recent summer droughts being unprecedented in the last 2,000 years. Although drought monitoring and management are extensively studied in the literature, capturing the evolution of drought dynamics and associated impacts across different temporal and spatial scales remains a critical, unsolved challenge.

In this work, we contribute a machine learning procedure named FRIDA (FRamework for Index-based Drought Analysis) for the identification of impact-based drought indexes. FRIDA is a fully automated data-driven approach that relies on advanced feature extraction algorithms to identify relevant drought drivers from a pool of candidate hydro-meteorological predictors. The selected predictors are then combined into an index that serves as a surrogate of the drought conditions in the considered area, represented by either observed or simulated water deficits or remotely sensed information on crop status. Notably, FRIDA leverages multi-task learning algorithms to upscale the analysis over a large region where drought impacts might depend on diverse but potentially correlated drivers. FRIDA captures the heterogeneous features of the different sub-regions while efficiently using all available data and exploiting the commonalities across sub-regions. As a result, the predictions have lower uncertainty than those obtained by training a separate model for each sub-region. Several real-world examples will be used to provide a synthesis of recent applications of FRIDA in case studies featuring diverse hydroclimatic conditions and variable levels of data availability.

How to cite: Giuliani, M., Bonetti, P., Metelli, A. M., Restelli, M., and Castelletti, A.: Advancing drought monitoring via feature extraction and multi-task learning algorithms, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5130, https://doi.org/10.5194/egusphere-egu22-5130, 2022.

16:17–16:24 | EGU22-6362 | On-site presentation
Tanja Morgenstern, Jens Grundmann, and Niels Schütze

Reliable forecasts of water level and discharge are necessary for efficient disaster management in case of a flood event. The methods of flood forecasting are developing rapidly, and artificial neural networks (ANN) are part of this development. ANN are data-driven models and are therefore sensitive to the quality, quantity, and relevance of their input and training data.

Previous studies at the Institute of Hydrology and Meteorology at TU Dresden used both hourly discharge and precipitation time series to model the precipitation-runoff process with ANN, e.g. deep learning LSTM networks (Long Short-Term Memory, a subcategory of ANN). The precipitation data were derived from area averages of radar data, in which the spatial structure of the precipitation, and thus important information for rainfall-runoff modelling, is lost. This is a problem especially for small-scale convective rainfall events.

As part of the KIWA project, we carry out a study with the aim of improving the reliability of flood forecasts of our LSTM networks by supplementing the input data with statistical precipitation information. For this purpose, we are adding statistical information such as area maximum and minimum of precipitation intensity, as well as its standard deviation over the area, to the area mean values of precipitation from the hourly radar data.
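The input features described above can be sketched as follows. This is a minimal illustration, assuming one hourly radar field stored as a nested list and a boolean catchment mask of the same shape; the function name `area_precip_stats` and the data layout are hypothetical and not taken from the KIWA project.

```python
from statistics import mean, pstdev
from typing import Dict, List


def area_precip_stats(grid: List[List[float]], mask: List[List[bool]]) -> Dict[str, float]:
    """Summarize one hourly radar precipitation field over a catchment.

    grid: precipitation intensity per radar cell (e.g. mm/h).
    mask: True where the cell lies inside the catchment (assumed layout).
    """
    cells = [
        value
        for row, mask_row in zip(grid, mask)
        for value, inside in zip(row, mask_row)
        if inside
    ]
    if not cells:
        raise ValueError("mask selects no cells")
    return {
        "mean": mean(cells),   # the conventional area-average input
        "max": max(cells),     # hints at convective cores
        "min": min(cells),
        "std": pstdev(cells),  # spatial spread of intensity over the area
    }
```

Each hourly time step would then contribute four precipitation features to the LSTM input instead of the area mean alone.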

As this information contains details on the precipitation intensity distribution over the area, we expect an improvement of the discharge prediction quality, as well as an improvement of the timing. In addition, we expect the LSTM network to learn from the statistical information to better assess the relevance and quality of the given precipitation values and to recognize the spatial uncertainties inherent to the area means. The resulting knowledge of the network should now enable it to forecast the discharge while communicating information on the uncertainty of the current discharge forecast.

We present the preliminary results of this investigation based on small pilot catchments in Saxony (Germany) with differing hydrological and geographical characteristics.

How to cite: Morgenstern, T., Grundmann, J., and Schütze, N.: Flood Forecasting With LSTM Networks: Enhancing the Input Data With Statistical Precipitation Information, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6362, https://doi.org/10.5194/egusphere-egu22-6362, 2022.

16:24–16:31 | EGU22-2254 | Presentation form not yet defined
Jakub Langhammer

Machine learning has shown great promise for hydrological modeling because, unlike conventional approaches, it allows efficient processing of big data provided by the recent automatic monitoring networks. This research presents the Support Vector Machine (SVM) model designed for modeling floods in a montane environment based on data from a distributed automated sensor network. The study aimed to test the reliability of the SVM model to predict the different types of flood events occurring in the environment of a mid-latitude headwater basin, experiencing the effects of climate and land use change. 

The sensor network uses four hydrological and two meteorological stations, located in the headwaters of the montane Vydra basin, which is experiencing intense forest disturbance, a rise in air temperatures, and frequent flood events. The automated hydrological stations have been operating in the study area for ten years, recording water levels at a 10-minute interval with online access to data. The meteorological stations monitor air temperature, precipitation, and snow cover depth at the same time step.

The model was built using Support Vector Machines (SVM), specifically the nu-SVR algorithm, employing the LibSVM library. It was trained and validated on a complex sample of hydrological observations and tested on scenarios covering different types of extreme events. The simulation scenarios included floods from a single summer storm, recurrent storms, prolonged regional rain, snowmelt, and a rain-on-snow event.

The data-driven SVM model proved robust and performed well in simulating hydrological time series. The RMSE model performance ranged from 0.91 to 0.97 for individual scenarios, without substantial errors in the fit of the trend, timing of the events, peak values, and flood volumes. The model reliably reconstructed even complex flood events, such as rain-on-snow episodes and flooding from recurrent precipitation.

The research showed that the data-driven SVM model provides a reliable and robust tool for simulating flood events from sensor network data. The model proved reliable in a montane environment featuring rapid runoff generation, transient environmental conditions, and variability of flood event types. The SVM model handled big data volumes from sensor networks efficiently and, under such conditions, is a promising approach for operational flood forecasting and hydrological research.

How to cite: Langhammer, J.: Flood forecasting using sensor network and Support Vector Machine model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2254, https://doi.org/10.5194/egusphere-egu22-2254, 2022.

16:31–16:38 | EGU22-8471 | ECS | Presentation form not yet defined
Etienne Fluet-Chouinard, William Aeberhard, Eniko Szekely, Massimiliano Zappa, Konrad Bogner, Sonia I. Seneviratne, and Lukas Gudmundsson

The prediction of streamflow in gauged and ungauged basins is a central challenge of hydrology and is increasingly being met by machine learning and deep learning models. With increasing data volumes and advances in modeling techniques, the capacity of deep learning tools to compete with and complement physics-based hydrological models over a variety of settings and scales is still being explored. Here, we present initial results of the MAchine learning for Swiss (CH) river FLOW estimation (MACH-Flow) project. We train machine learning models on daily discharge data from 260 gauging stations across Switzerland covering the 1980–2020 time window. The river gauging stations we included have catchment areas ranging between 0.1 and 3000 km² and average streamflow between 0.1 and 100 m³/s. We also test a range of predictor features, including air temperature, precipitation, incoming radiation, and relative humidity, as well as a number of static catchment variables. We evaluated multiple model architectures of varying complexity, including models that focus on runoff predictions over individual headwater catchments, such as neural networks and long short-term memory (LSTM) networks. We also investigate Graph Neural Networks capable of leveraging information from neighbouring stations when making point-location predictions. Predictions are generated at gauging locations as well as over 307 land units used for drought monitoring. We benchmark and compare deep learning methods against two process-based hydrological models: 1) the PREcipitation-Runoff-EVApotranspiration HRU Model (PREVAH), used operationally by Swiss federal agencies, and 2) the comparatively streamlined Simple Water Balance Model (SWBM). We compared the deep learning and physics-based models with regard to predicting daily river discharge as well as low flows during drought conditions, which are essential for water managers and planners in Switzerland.

We find that most deep learning methods with sufficient tuning and lookback periods can compete with the streamflow predictions from process-based models, particularly at gauging stations on larger non-regulated rivers where hydrodynamic time lags are significant. Finally, we discuss the prospects for generating discharge predictions across all river segments of Switzerland using deep learning methods, along with challenges and opportunities to achieve this goal.

How to cite: Fluet-Chouinard, E., Aeberhard, W., Szekely, E., Zappa, M., Bogner, K., I. Seneviratne, S., and Gudmundsson, L.: Machine learning-derived predictions of river flow across Switzerland, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8471, https://doi.org/10.5194/egusphere-egu22-8471, 2022.

Coffee break
17:00–17:07 | EGU22-3825 | ECS | On-site presentation
Andreas Wunsch, Tanja Liesch, Guillaume Cinkus, Nataša Ravbar, Zhao Chen, Naomi Mazzilli, Hervé Jourde, and Nico Goldscheider

Despite many existing approaches, modeling karst water resources remains challenging and often requires solid system knowledge. Artificial Neural Network approaches offer a convenient solution by establishing a simple input-output relationship on their own. However, in this context, temporal and especially spatial data availability is often an important constraint, as usually no or few climate stations are available within a karst spring catchment. Hence, spatial coverage is often unsatisfactory and can introduce severe uncertainties. To overcome these problems, we use 2D Convolutional Neural Networks (CNN) to directly process gridded meteorological data, followed by a 1D-CNN to perform karst spring discharge simulation. We investigate three karst spring catchments in the Alpine and Mediterranean region with different hydrometeorological characteristics and hydrodynamic system properties. We compare our 2D models both to existing modeling studies in these regions and to our own 1D models, which are conventionally based on climate station input data. Our results show that our models are excellently suited to model karst spring discharge and rival the simulation results of existing approaches in the respective areas. The 2D models show a better fit than the 1D models in two of three cases and learn the relevant parts of the input data themselves. By performing a spatial input sensitivity analysis, we can further show their usefulness for localizing the position of karst catchments.

How to cite: Wunsch, A., Liesch, T., Cinkus, G., Ravbar, N., Chen, Z., Mazzilli, N., Jourde, H., and Goldscheider, N.: Karst spring discharge modeling based on deep learning using spatially distributed input data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3825, https://doi.org/10.5194/egusphere-egu22-3825, 2022.

17:07–17:14 | EGU22-1767 | Presentation form not yet defined
Morgan Buire, Manon Ahlouche, Renaud Jougla, and Robert Leconte

Improving streamflow forecasts helps reduce the socio-economic impacts of hydrology-related damage. Among the applications, improving hydropower production is a challenge, even more so in a context of climate change. Deep learning models drew the attention of scientists working on forecasting models based on physical laws after gaining recognition in other domains. Artificial Neural Networks (ANN) offer promising performance for streamflow forecasts, including good accuracy and shorter run times compared to traditional physically based models.

The objective of this study is to compare different spatial discretization schemes of inputs in an ANN model for streamflow forecasting. The study focuses on the “Au Saumon” watershed in Southern Quebec (Canada) during summer periods, with a forecast window of 7 days at a daily timestep. Parameterization of the ANN was a key preliminary step: the number of neurons in the hidden layer was first optimized, leading to 6 neurons. The model was trained on an 11-year dataset (2000-2005 and 2007-2011), followed by model validation on one dry (2012) and one wet (2006) year to take into account extreme hydrologic regimes.

The physically based hydrological model Hydrotel serves as the reference against which we compare our results. Hydrotel represents watershed heterogeneity using hydrological units based on land use, soil type, and topography, called Relatively Homogeneous Hydrological Units (RHHU). The Nash-Sutcliffe Efficiency (NSE) is the main evaluation criterion. As a preliminary step, we verified that the ANN model can satisfactorily mimic Hydrotel. With the same model inputs, that is, the same variables (total precipitation, daily maximum and minimum temperatures, and soil surface humidity) at the same spatial discretization, the ANN forecasts were found to be better than those of Hydrotel for lead times of one to seven days.

Three different watershed spatial discretizations were tested: global, fully distributed, and semi-distributed. For the global model, the hydrometeorological data used as inputs to the ANN model were averaged across all RHHUs. This reduces complexity at the cost of spatial detail and heterogeneity. For the fully distributed model, a regular grid of six cells of 28 km × 28 km was defined, covering the whole watershed. For the semi-distributed model, the spatial distribution of the input data was that of the RHHUs. For this discretization, the state variables (soil moisture and outflow) were updated at each forecast timestep, either on all RHHUs or only on the RHHU of the outlet.

The accuracy differed depending on the spatial discretization of the inputs. The fully distributed model performed worst, with an NSE of 0.85, while the global model, surprisingly, performed better, with an NSE of 0.93. Moreover, updating soil moisture on all the RHHUs of the semi-distributed model improved the NSE across the entire forecast window.

This research will assess the performance of the ANN model when developed using ERA5-Land precipitation and temperature reanalysis data and ground observations of soil moisture. Given the promising results obtained with the fully and semi-distributed models, our ANN model will be tested with state variables retrieved from satellite data, such as surface soil moisture from the SMAP and SMOS missions.

How to cite: Buire, M., Ahlouche, M., Jougla, R., and Leconte, R.: Forecasting streamflow using Artificial Neural Network (ANN) with different spatial discretizations of the watershed : use case on the Au Saumon watershed in Quebec (Canada)., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1767, https://doi.org/10.5194/egusphere-egu22-1767, 2022.

17:14–17:21 | EGU22-3946 | ECS | Virtual presentation
Aiden Durrant, David Haro, and Georgios Leontidis

The management of water resource systems is a longstanding and inherently complex problem, balancing an increasing number of interests to meet short- and long-term objectives sustainably. The difficulty of analyzing large-scale, multi-reservoir water systems is compounded by the scale and interpretation of the historic data. Therefore, to assist in the decision-making processes for water allocation, we propose the use of machine learning, specifically deep learning, to uncover and interpret relationships in high-dimensional data that can enable more accurate forecasting.

We explore the problem of reservoir level prediction as a pilot study, comparing traditional machine learning approaches to our proposal of spatial-temporal graph neural networks that embed the topological nature of the water system. The graph convolutional neural network explicitly captures spatial interaction among segments of river within the system. The graph is constructed as follows: nodes represent the reservoir and river monitoring stations; edges define the characteristics of the river sections connecting these stations (i.e. distance, flow, etc.); and multiple states of this graph are used, each at a different measurement interval. We then train the network to predict the water level of a node (reservoir measurement station) from previous time intervals. The proposed network is trained on historic data of the Ebro basin, Spain, from 1981 to 2018, specifically utilizing river gauging station flow rates and reservoir fill levels, with the addition of characteristics defining each component of the water system.
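The graph construction described above can be illustrated with a toy data structure. This is a sketch only: the station names, values, and the inverse-distance aggregation are assumptions made for illustration, and the aggregation is a simple stand-in for a learned graph convolution, not the authors' network.

```python
from typing import Dict, List, Tuple

Node = str
Edge = Tuple[Node, Node]  # (upstream station, downstream station)

# Hypothetical toy system: two river gauges draining into one reservoir.
# Edge attributes describe the connecting river section.
edges: Dict[Edge, Dict[str, float]] = {
    ("gauge_a", "reservoir"): {"distance_km": 12.0},
    ("gauge_b", "reservoir"): {"distance_km": 30.0},
}

# One graph state per measurement interval: node -> observed value.
snapshots: List[Dict[Node, float]] = [
    {"gauge_a": 5.0, "gauge_b": 3.0, "reservoir": 40.0},
    {"gauge_a": 6.0, "gauge_b": 2.0, "reservoir": 41.0},
]


def aggregate_upstream(
    state: Dict[Node, float], edges: Dict[Edge, Dict[str, float]]
) -> Dict[Node, float]:
    """Inverse-distance weighted mean of upstream values per downstream node.

    Nodes without upstream neighbours do not appear in the result.
    """
    weighted_sum: Dict[Node, float] = {}
    weight_total: Dict[Node, float] = {}
    for (upstream, downstream), attrs in edges.items():
        weight = 1.0 / attrs["distance_km"]
        weighted_sum[downstream] = weighted_sum.get(downstream, 0.0) + weight * state[upstream]
        weight_total[downstream] = weight_total.get(downstream, 0.0) + weight
    return {node: weighted_sum[node] / weight_total[node] for node in weighted_sum}


# Upstream signal reaching the reservoir at each measurement interval.
upstream_signal = [aggregate_upstream(state, edges) for state in snapshots]
```

In a spatial-temporal graph neural network, the fixed weights used here would be replaced by learned functions of the node and edge features, and the sequence of snapshots would feed a temporal model.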

We validate our approaches over a 4-year period, making predictions across various time frames, showing the robustness to various circumstances, and meeting necessary objective requirements ranging from daily to monthly forecasting. As an extension, we also investigate the use of our predictions to allow for drought identification, demonstrating just one of many use-cases where machine learning can uncover vital information that can lead to better management and planning decisions. 

How to cite: Durrant, A., Haro, D., and Leontidis, G.: Graph Neural Networks for Reservoir Level Forecasting and Draught Identification , EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3946, https://doi.org/10.5194/egusphere-egu22-3946, 2022.

17:21–17:28 | EGU22-9771 | ECS | On-site presentation
Chanoknun Wannasin, Claudia Brauer, Remko Uijlenhoet, Paul Torfs, and Albrecht Weerts

Reservoirs and dams are essential infrastructures for human utilization and management of water resources; yet modelling real-time reservoir operation and controlled reservoir outflow remains a challenge. Artificial intelligence techniques, especially machine learning and deep learning, have become increasingly popular in hydrological forecasting, including reservoir operation. In this study, we applied a recurrent neural network (RNN) and a long short-term memory (LSTM) network to model the reservoir operation and outflow of a large-scale multi-purpose reservoir at the real-time (daily) timescale. This study aims to investigate the capabilities of RNN and LSTM models in simulating and reforecasting the real-time reservoir outflow, considering the uncertainties in model inputs, model training-testing periods, and different model algorithms. The Sirikit reservoir in Thailand was selected as a case study. The main inputs for the RNN and LSTM models were daily reservoir inflow, daily storage, and the month of the year. We applied the distributed wflow_sbm model for reservoir inflow simulation (using MSWEP precipitation data) and ensemble inflow reforecasting (using ECMWF precipitation data). Daily reservoir storage was obtained from observations and real-time recalculation based on the reservoir water balance. The models were trained and tested with 10-fold cross-validation. Results show that both RNN and LSTM models have high accuracies for real-time simulations and reasonable accuracies for multi-step reforecasts, and that LSTM exhibits better model performance in forecasting mode. The performance varied between cross-validation folds and was highly related to the extreme events included in either the training or the test period. With further understanding of how reservoir inflow uncertainty influences reservoir operation, we conclude that the models are potentially applicable in real-time reservoir operation and decision-making for operational water management.
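The storage recalculation from the reservoir water balance can be sketched as a single daily update. This is a simplified illustration under stated assumptions: evaporation, direct rainfall, and seepage are ignored, and the function name, units, and capacity clipping are hypothetical rather than taken from the study.

```python
SECONDS_PER_DAY = 86_400


def update_storage(
    storage_mcm: float,
    inflow_m3s: float,
    outflow_m3s: float,
    capacity_mcm: float,
) -> float:
    """Daily reservoir water balance: S(t+1) = S(t) + (Q_in - Q_out) * dt.

    Storage is in million cubic metres (MCM); flows are daily means in m3/s.
    ASSUMPTION: evaporation, rainfall on the lake, and seepage are neglected.
    """
    delta_mcm = (inflow_m3s - outflow_m3s) * SECONDS_PER_DAY / 1e6
    # Keep storage physically plausible: not below empty, not above capacity.
    return min(max(storage_mcm + delta_mcm, 0.0), capacity_mcm)
```

In a multi-step reforecast, the outflow predicted by the neural network for one day would be inserted here to obtain the storage input for the next day.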

How to cite: Wannasin, C., Brauer, C., Uijlenhoet, R., Torfs, P., and Weerts, A.: Simulating and multi-step reforecasting real-time reservoir operation using combined neural network and distributed hydrological model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9771, https://doi.org/10.5194/egusphere-egu22-9771, 2022.

17:28–17:35 | EGU22-3281 | ECS | Presentation form not yet defined
Yegane Khoshkalam, Farshid Rahmani, Alain N. Rousseau, Kian Abbasnezhadi, Chaopeng Shen, and Etienne Foulon

Reliable streamflow predictions are critical for managing water resources, for example for flood warning, agricultural irrigation apportionment, and hydroelectric production. However, there are geographical heterogeneities in the available observed streamflow data, river basin geophysical attributes, and meteorological data to support such predictions. Moreover, in data-sparse regions, both process-based and data-driven models are difficult to calibrate or train sufficiently, which makes satisfactory predictions harder to achieve. Nevertheless, it is possible to transfer knowledge from regions with dense measured data to data-sparse regions. In earlier work, we have shown that transfer learning based on a long short-term memory (LSTM) network, pre-trained over the conterminous United States, could improve daily streamflow prediction in Quebec (Canada) when compared to a semi-distributed hydrological model (HYDROTEL). The dataset used for pre-training (source dataset) was the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS), while the data for the basins located at the target locations (local dataset) were extracted from the Hydrometeorological Sandbox-École de Technologie Supérieure (HYSETS). Both datasets provide access to various types of information with different spatial resolutions. While HYSETS generally spans 1950 to 2018, the records for most of the basins reported in CAMELS go back to 1980. The types of data included in both CAMELS and HYSETS include daily meteorological data (precipitation, temperature, etc.), streamflow observations, and basin physiographic attributes (i.e., considered time-invariant or static).

In this work, the techniques applied to further improve streamflow simulations included the use of: (i) streamflow observations and simulated flows from HYDROTEL as input to the LSTM model, (ii) different forcing (meteorological data) and static attribute data from the source and the local datasets, and (iii) additional basins from HYSETS with similar climatological features for model training. The ultimate goal was to improve the accuracy of the predicted hydrographs, with an emphasis on enhancing the prediction of peak flows by transfer learning, using the Kling-Gupta efficiency (KGE) and Nash-Sutcliffe efficiency (NSE) as metrics. This investigation has revealed the benefits of using transfer learning techniques based on deep learning models to improve streamflow predictions when compared to the application of a distributed hydrological model in data-sparse regions.
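For reference, the Kling-Gupta efficiency can be computed as below. This sketch uses the original 2009 formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with r the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means; whether the study used this or a later variant is not stated, so treat the choice as an assumption.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Sequence


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (2009 formulation); 1.0 is a perfect score."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must have equal length of at least 2")
    mu_o, mu_s = mean(obs), mean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    if sd_o == 0.0 or sd_s == 0.0 or mu_o == 0.0:
        raise ValueError("KGE is undefined for constant series or zero-mean observations")
    cov = mean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)   # linear correlation (timing and shape)
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

Inspecting r, alpha, and beta separately shows whether a low score comes from timing, variability, or bias.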

How to cite: Khoshkalam, Y., Rahmani, F., Rousseau, A. N., Abbasnezhadi, K., Shen, C., and Foulon, E.: Assessment of Transfer Learning Techniques to Improve Streamflow Predictions in Data-Sparse Regions, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3281, https://doi.org/10.5194/egusphere-egu22-3281, 2022.

17:35–17:42 | EGU22-10744 | On-site presentation
Everett Snieder and Usman Khan

Producing accurate rainfall-runoff (RR) simulations is a longstanding challenge in hydrological research. RR models often treat the RR relationship as stationary; in other words, model parameters are assumed to be fixed, time-invariant values. In reality, the RR relationship is continuously changing due to factors such as climate change, rapid urban growth, and construction of hydraulic infrastructure. Therefore, there is a need for hydrological models to be able to adapt to these changes.

The suitability of machine learning (ML) models for flow forecasting has been well established over the past 3 decades. One advantage of such models is their ability to rapidly and continuously adapt to the non-stationary relationship between rainfall and runoff generation. However, changes in model performance and model adaptation in an operational context have not received much attention from the research community.

We present a large-scale framework for daily flow forecasting models in Canada (>100 catchments). In our framework, local artificial neural network (ANN) ensemble models are automatically trained to forecast flow on an individual catchment basis using openly available daily hydrometeorological time series data. The catchments, taken from across Canada, have highly heterogeneous soil groups, land use, and climate. We propose several experiments designed to evaluate the robustness of ANN-based flow forecasting across time. Using the most recent year of observations for validation, we evaluate the effects of providing incrementally increasing amounts of historic observations. Similarly, we quantify changes to ANN model parameters (weights and biases) as the amount of historic training data increases. Finally, we analyse feature importance across time using multiple feature importance algorithms. Our research aims to provide guidance on initial model training and adaptive learning, as ML-based approaches become increasingly adopted for operational use.
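The experiment with incrementally increasing historic data can be sketched as a set of expanding training windows evaluated against a fixed, most recent validation period. The function name, the yearly granularity, and the step size are assumptions for illustration; the abstract does not specify them.

```python
from typing import Iterable, List, Tuple


def expanding_windows(
    years: Iterable[int], n_validation: int = 1, step: int = 1
) -> List[Tuple[List[int], List[int]]]:
    """Return (training_years, validation_years) pairs.

    The validation set is always the most recent `n_validation` years; the
    training set grows backwards in time by `step` years per experiment.
    """
    if n_validation < 1 or step < 1:
        raise ValueError("n_validation and step must be at least 1")
    ordered = sorted(years)
    validation = ordered[-n_validation:]
    pool = ordered[:-n_validation]
    return [(pool[-k:], validation) for k in range(step, len(pool) + 1, step)]


# Example: six years of data, last year held out for validation.
splits = expanding_windows(range(2015, 2021))
```

Training one model per split and comparing validation scores, weights, and feature importances across splits would show how much history a catchment model benefits from.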

How to cite: Snieder, E. and Khan, U.: Large-scale evaluation of temporal trends in ANN behaviour for daily flow forecasts in Canadian catchments., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10744, https://doi.org/10.5194/egusphere-egu22-10744, 2022.

17:42–17:49 | EGU22-3661 | ECS | On-site presentation
Marvin Höge, Andreas Scheidegger, Marco Baity-Jesi, Carlo Albert, and Fabrizio Fenicia

Deep learning methods have repeatedly been shown to outperform conceptual hydrologic models in rainfall-runoff modelling. Although attempts are being made to investigate the internals of such deep learning models, the traceability of model states and processes, and of their interrelations with model input and output, is not yet fully given. Direct interpretability of mechanistic processes has always been considered an asset of conceptual models, one that helps to gain system understanding in addition to predictive capability. We introduce hydrologic Neural ODE models that perform as well as state-of-the-art deep learning methods in rainfall-runoff prediction while maintaining the ease of interpretability of conceptual hydrologic models. In Neural ODEs, model-internal processes that are typically hard-coded in differential equations are replaced by neural networks. Neural ODE models therefore offer a way to fuse deep learning with mechanistic modelling, yielding time-continuous solutions. We demonstrate the basin-specific predictive capability for several hundred catchments of the continental US, and give examples of what the neural networks within the ODE models have learned about the model-internal processes. Further, we discuss the role of Neural ODE models on the middle ground between pure deep learning and pure conceptual hydrologic models.
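The idea of replacing a hard-coded process with a neural network can be sketched with a single-bucket model, dS/dt = P(t) - Q(S), where the outflow law Q(S) is a tiny network. Everything here is an assumption for illustration: the one-bucket structure, the network size, the untrained weights, and the explicit Euler step (a Neural ODE would normally be trained and integrated with a proper ODE solver). It is not the authors' model.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float], float]  # (w1, b1, w2, b2)


def softplus(x: float) -> float:
    """Smooth, strictly positive activation; keeps the outflow non-negative."""
    return math.log1p(math.exp(x))


def outflow_net(storage: float, params: Params) -> float:
    """Neural stand-in for a hard-coded outflow law Q(S): one tanh hidden layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * storage + b) for w, b in zip(w1, b1)]
    return softplus(sum(w * h for w, h in zip(w2, hidden)) + b2)


def simulate(
    precip: Sequence[float], s0: float, params: Params, dt: float = 1.0
) -> Tuple[List[float], float]:
    """Integrate dS/dt = P - Q(S) with explicit Euler steps.

    Returns the simulated outflows and the final storage.
    """
    storage = s0
    flows: List[float] = []
    for p in precip:
        q = outflow_net(storage, params)
        # Never release more water than is available in this step.
        q = min(q, storage / dt + p)
        storage = storage + dt * (p - q)
        flows.append(q)
    return flows, storage


# Hypothetical, untrained weights for a network with two hidden units.
toy_params: Params = ([0.1, 0.05], [0.0, -0.5], [1.0, 0.5], -1.0)
flows, final_storage = simulate([5.0, 0.0, 2.0, 0.0], s0=10.0, params=toy_params)
```

Because the storage state and the water balance stay explicit, the learned function Q(S) can be plotted and inspected directly, which is the interpretability argument made in the abstract.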

How to cite: Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., and Fenicia, F.: Neural ODEs in Hydrology: Fusing Conceptual Models with Deep Learning for Improved Predictions and Process Understanding, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3661, https://doi.org/10.5194/egusphere-egu22-3661, 2022.

17:49–17:56
|
EGU22-4303
|
On-site presentation
Roland Löwe, Rocco Palmitessa, Allan Peter Engsig-Karup, and Morten Grum

Hydrodynamic models (numerical solutions of the Saint Venant equations) are at the core of simulating water movements in natural streams and drainage systems. They enable realistic simulations of water movement and are directly linked to physical system characteristics such as channel slope and diameter. This feature is important for man-made drainage structures as it enables straightforward testing of the effects of varying channel designs. In cities, models with hundreds up to tens of thousands of pipes are commonly used for drainage infrastructure. Their computational expense remains high, and they are not suited for systematic screening of design options, for discussing water management options in workshops, or for many real-time applications such as data assimilation.

Hydrologists have developed many approaches to enable faster simulations. All of these do, however, compromise on the physical detail of the simulated processes (for example, by simulating only flows using linear reservoirs), and usually also on the spatial and temporal resolution of the models (for example, by simulating only flows between key points in the system). The link to physical system characteristics is thus lost. Therefore, it is challenging to incorporate such approaches into planning workflows where changing city plans require a constant revision of water management options.

Recent advances in scientific machine learning enable the creation of fast machine learning surrogates for complex systems that preserve high spatio-temporal detail and a physically accurate simulation. We present such an approach that employs generalized residue networks for the simulation of hydraulics in drainage systems. The key concept is to train neural networks that learn how hydraulic states (level, flow and surcharge volume) at all nodes and pipes in the drainage network evolve from one time step to the next, given a set of boundary conditions (surface runoff). Training is performed against the output of a hydrodynamic model for a short time series.
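
The time-stepping concept can be sketched as a plain residual update, x[t+1] = x[t] + f(x[t], u[t]), where f would be the trained network. The generalized residue network of the abstract is not reproduced here. The function toy_increment below is a hypothetical stand-in (two storages in series with linear outflow) used only so that the example runs, and all names and values are assumptions.

```python
def rollout(initial_state, boundary_inflows, increment):
    """Advance the state step by step: x[t+1] = x[t] + increment(x[t], u[t])."""
    states = [list(initial_state)]
    for inflow in boundary_inflows:
        current = states[-1]
        delta = increment(current, inflow)
        states.append([x + d for x, d in zip(current, delta)])
    return states


def toy_increment(state, inflow, k=0.5):
    """Stand-in for a trained network: two storages in series, linear outflow."""
    upstream, downstream = state
    out_up = k * upstream
    out_down = k * downstream
    return [inflow - out_up, out_up - out_down]


# One pulse of surface runoff followed by two dry time steps.
states = rollout([0.0, 0.0], [4.0, 0.0, 0.0], toy_increment)
for step, state in enumerate(states):
    print(step, state)
```

Learning the increment rather than the next state directly means the network only has to represent the change over one time step, and a long simulation is obtained by applying the same learned step repeatedly.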

Once trained, the surrogates generate the same results as a hydrodynamic model at the same level of detail, and they can be used to quickly simulate the effect of many different rain events and climate scenarios. Considering pipe networks with 50 to 100 pipes, our approach achieves NSE values on the order of 0.95 for the testing dataset. Simulations are performed 10 to 50 times faster than with the hydrodynamic model. Training times are on the order of 25 minutes on a single CPU. The surrogates are system-specific and need to be retrained when the physical system changes. To minimize this overhead, we train surrogates for small subsystems, which can subsequently be linked into a model for a large drainage network.
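
For reference, the Nash-Sutcliffe efficiency (NSE) quoted above compares the squared error of a simulation with the variance of the reference series. A minimal sketch follows, assuming that the hydrodynamic model output serves as the reference; the function name and example values are illustrative.

```python
def nash_sutcliffe_efficiency(reference, simulated):
    """NSE = 1 - sum((ref - sim)^2) / sum((ref - mean(ref))^2)."""
    if len(reference) != len(simulated) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    mean_ref = sum(reference) / len(reference)
    squared_error = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    variance = sum((r - mean_ref) ** 2 for r in reference)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant reference series")
    return 1.0 - squared_error / variance


reference = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe_efficiency(reference, reference)
mean_only = nash_sutcliffe_efficiency(reference, [2.5, 2.5, 2.5, 2.5])
```

A value of 1 indicates a perfect match, and a value of 0 means the surrogate is no better than predicting the mean of the reference series.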

Our approach is an initial application of scientific machine learning for the simulation of hydraulics that is readily combined with other recent developments. Future research should, in particular, explore the application of physics-informed loss functions for bypassing the generation of training data from hydrodynamic simulations, and of graph neural networks to exploit spatial correlation structures in the pipe network.

How to cite: Löwe, R., Palmitessa, R., Engsig-Karup, A. P., and Grum, M.: Fast and detailed emulation of urban drainage flows using physics-guided machine learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4303, https://doi.org/10.5194/egusphere-egu22-4303, 2022.

17:56–18:03
|
EGU22-11110
|
ECS
|
Virtual presentation
Prashant Istalkar, Akshay Kadu, and Basudev Biswal

Modeling the rainfall-runoff process has been a key challenge for hydrologists. Multiple modeling frameworks have been introduced over time to understand and predict the runoff generation process, including physics-based models, conceptual models, and data-driven models. In recent years, the use of deep learning models such as Long Short-Term Memory (LSTM) networks has increased in hydrology because of their ability to learn from sequential input. Studies report that LSTMs outperform well-established hydrological models (e.g. SAC-SMA), which has led authors to question the need for process understanding in the machine learning era. In the current study, we claim that process understanding helps to reduce LSTM model complexity and ultimately improves recession flow prediction. We used past streamflow information as input to an LSTM and predicted ten days of recession flow. To reduce LSTM complexity, we used insights from a conceptual hydrological model that accounts for storage-discharge dynamics. Overall, our study re-emphasizes the need to understand hydrological processes.

How to cite: Istalkar, P., Kadu, A., and Biswal, B.: Physics-informed LSTM structure for recession flow prediction, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11110, https://doi.org/10.5194/egusphere-egu22-11110, 2022.

18:03–18:10
|
EGU22-4639
|
ECS
|
Virtual presentation
William Lidberg, Siddhartho Paul, Florian Westphal, and Anneli Ågren

Ditching is a common forestry practice across northern European boreal forests and in some parts of North America. It lowers the groundwater level in the wet parts of the forest to improve soil aeration and support tree growth. However, intensive ditching poses multidimensional environmental risks, particularly degradation of wetlands and soils, greenhouse gas emissions, increased nutrient and sediment loading to water bodies, and biodiversity loss. At the same time, there is a discrepancy between the potential significance of artificial water bodies such as drainage ditches and their low representation in scientific research and water management policy. A comparison with a national inventory of Sweden showed that only 9 % of drainage ditches are present on the best available map of Sweden. The increasing understanding of the environmental risks associated with forest ditches, together with the poor representation of ditch networks in existing maps of many forest landscapes, makes detailed mapping of these ditches a priority for sustainable land and water management. Here, we combine two state-of-the-art technologies, airborne laser scanning and deep learning, to detect drainage ditches on a national scale.

A deep neural network was trained on airborne laser scanning data and 1607 km of manually digitized ditch channels from 10 regions spread across Sweden. 20 % of the data was set aside for testing the model. The model correctly mapped 82 % of all small drainage channels in the test data, with a Matthews correlation coefficient of 0.72. The approach requires only one topographical index, a high-pass median filter calculated from a digital elevation model with a 1 m spatial resolution. This made it possible to scale up over large areas with limited computational resources, and the trained model was implemented using Microsoft Azure to map ditch channels across all of Sweden. The total mapped channel length was 970 000 km (equivalent to 24 times around the world). Visual inspection indicated that the method also classifies natural stream channels as drainage channels, which suggests that a deep neural network can be trained to detect natural stream channels in addition to drainage ditches. Because only one topographical index is required, the approach can be implemented in other areas with access to high-resolution digital elevation data.
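
The Matthews correlation coefficient reported above can be computed from the four cells of a binary confusion matrix. The sketch below is a generic implementation, not the authors' evaluation code, and the pixel counts in the example are hypothetical values chosen only to show a strongly imbalanced case.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: ditch pixels are rare compared with background pixels.
score = matthews_corrcoef(tp=82, fp=30, fn=18, tn=9870)
print(round(score, 2))
```

Unlike plain accuracy, the coefficient uses all four cells, so a model cannot score well simply by labelling everything as background when ditch pixels are rare.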

How to cite: Lidberg, W., Paul, S., Westphal, F., and Ågren, A.: Mapping Sweden’s drainage ditches using deep learning and airborne laser scanning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4639, https://doi.org/10.5194/egusphere-egu22-4639, 2022.

18:10–18:17
|
EGU22-6231
|
ECS
|
On-site presentation
Guillaume Blanchy, Lukas Albrecht, Johannes Koestel, and Sarah Garré

Adapting agricultural management practices to changing climate is not straightforward. Effects of agricultural management practices (tillage, cover crops, amendment, …) on soil variables (hydraulic conductivity, aggregate stability, …) often vary according to pedo-climatic conditions. Hence, it is important to take these conditions into account in quantitative evidence synthesis. Extracting structured information from scientific publications to build large databases with experimental data from various conditions is an effective way to do this. This database can then serve to explain, and possibly also to predict, the effect of management practices in different pedo-climatic contexts.

However, manually building such a database by going through all publications is tedious, and given the increasing amount of literature, this task is likely to require more and more effort in the future. Natural language processing facilitates this task. In this work, we built a database of near-saturated hydraulic conductivity from tension-disk infiltrometer measurements reported in scientific publications. We used tailored regular expressions and dictionaries to extract coordinates, soil texture, soil type, rainfall, disk diameter and tensions applied. The F1-scores of the extraction range from 0.72 to 0.91.
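
As an illustration of rule-based extraction and its evaluation, the sketch below pulls disk diameters out of free text with a regular expression and scores the result with an F1-score. The pattern, the function names and the example sentences are assumptions made for this example; the expressions actually used in the study are not given in the abstract.

```python
import re

# Matches phrases such as "20 cm diameter" or "200-mm-diameter".
DIAMETER_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)[\s-]*(mm|cm)[\s-]+diameter", re.IGNORECASE
)


def extract_disk_diameters_cm(text):
    """Return all diameters found in the text, converted to centimetres."""
    values = []
    for number, unit in DIAMETER_PATTERN.findall(text):
        value = float(number)
        values.append(value / 10.0 if unit.lower() == "mm" else value)
    return values


def f1_score(extracted, expected):
    """Harmonic mean of precision and recall over sets of extracted values."""
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


sentence = "Infiltration was measured with a 20 cm diameter tension disk."
found = extract_disk_diameters_cm(sentence)
```

The extracted values would be compared against a manually labelled subset of publications, and the F1-score computed per extracted field.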

In addition, we extracted relationships between a set of driver keywords (e.g. ‘biochar’, ‘zero tillage’, …) and variables (e.g. ‘soil aggregate’, ‘hydraulic conductivity’, …) from publication abstracts based on the shortest dependency path between them. The relationships were further classified according to positive, negative or absent correlations between the driver and variable. This technique quickly provides an overview of the different driver-variable relationships and their abundance for an entire body of literature. For instance, we were able to recover the positive correlation between biochar and yield, as well as its negative correlation with bulk density.

How to cite: Blanchy, G., Albrecht, L., Koestel, J., and Garré, S.: Potential of natural language processing for metadata extraction from environmental scientific publications, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6231, https://doi.org/10.5194/egusphere-egu22-6231, 2022.