Sustainable Reservoir Operation and Control Using a Deep Reinforcement Learning Policy Gradient Method
- ¹The Glenn Department of Civil Engineering, Clemson University, Clemson, United States of America (sadeghs@clemson.edu)
- ²Department of Agricultural Science, Clemson University, Clemson, United States of America (samadi@clemson.edu)
The increasing stress on water resource systems has prompted researchers to look for ways to improve the performance of reservoir operations. Changing demands, variable hydrological inputs, and new environmental stresses are among the issues that water managers face. These concerns have sparked interest in applying new techniques to derive reservoir operation policies that improve system performance. As the resolution of the analysis rises, it becomes increasingly difficult to represent a real-world system faithfully with the approaches currently available for determining the best operating policy. One key challenge is the "curse of dimensionality," which arises when the discretization of the state and action spaces becomes finer or when more state or action variables are taken into account. This curse limits the number of state-action variables that can be handled, rendering dynamic programming (DP) and stochastic DP (SDP) ineffective for complex reservoir optimization problems.

Reinforcement learning (RL) is one way to overcome this curse in the stochastic optimization of water resources systems. RL is a well-established and influential technique in machine learning research that can solve a wide range of optimization and simulation challenges. In this study, a continuous-action deep RL algorithm, Deep Deterministic Policy Gradient (DDPG), is applied to solve the DP problem for the Folsom Reservoir system located in California, US. Without requiring any model simplifications or surrendering any of the critical characteristics of DP, the continuous action-space RL method effectively overcomes the dimensionality problem. The system model employs an iterative learning method that accounts for delayed rewards without requiring an explicit probabilistic model of hydrologic processes; by interacting with a simulated environment, it learns the actions that maximize the total expected reward. This research is funded by the US Geological Survey.
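To make the approach concrete, the sketch below shows the core DDPG update for a single-reservoir release policy: a deterministic actor maps the system state to a continuous release, a critic estimates the action value, and target networks with soft (Polyak) updates stabilize the one-step bootstrapped targets through which delayed rewards propagate. This is a minimal PyTorch illustration under stated assumptions; the state variables, network sizes, reward, and batch are hypothetical stand-ins, not the Folsom model used in the study.

```python
# Minimal DDPG sketch (PyTorch) for a single-reservoir release policy.
# Environment details (state variables, bounds, reward) are illustrative
# assumptions, not the Folsom Reservoir model used in the study.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Deterministic policy: state -> continuous release in [0, max_action]."""

    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        # Rescale tanh output from [-1, 1] to [0, max_action].
        return (self.net(state) + 1.0) * 0.5 * self.max_action


class Critic(nn.Module):
    """Action-value function: (state, action) -> Q(s, a)."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))


def soft_update(target, source, tau=0.005):
    """Polyak-average source weights into the target network."""
    for t_param, param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)


def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.99):
    state, action, reward, next_state = batch
    # Critic: regress onto the one-step bootstrapped target computed with
    # the target networks; this is how delayed rewards propagate back.
    with torch.no_grad():
        target_q = reward + gamma * critic_t(next_state, actor_t(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Actor: deterministic policy gradient -- ascend Q at the action
    # chosen by the actor (i.e., minimize -Q).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    soft_update(critic_t, critic)
    soft_update(actor_t, actor)


if __name__ == "__main__":
    # Hypothetical state: [storage, inflow, month-of-year]; action: release.
    state_dim, action_dim, max_release = 3, 1, 1000.0
    actor = Actor(state_dim, action_dim, max_release)
    critic = Critic(state_dim, action_dim)
    actor_t = Actor(state_dim, action_dim, max_release)
    critic_t = Critic(state_dim, action_dim)
    actor_t.load_state_dict(actor.state_dict())
    critic_t.load_state_dict(critic.state_dict())
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    # A random mini-batch stands in for samples from a replay buffer.
    batch = (torch.randn(32, state_dim),
             torch.rand(32, action_dim) * max_release,
             torch.randn(32, 1),
             torch.randn(32, state_dim))
    ddpg_update(batch, actor, critic, actor_t, critic_t, actor_opt, critic_opt)
```

Because the actor outputs a continuous release directly, no discretization of the state or action space is required, which is how a continuous-action method of this kind sidesteps the curse of dimensionality that constrains DP and SDP.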
How to cite: Sadeghi Tabas, S. and Samadi, V.: Sustainable Reservoir Operation and Control Using a Deep Reinforcement Learning Policy Gradient Method, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13011, https://doi.org/10.5194/egusphere-egu22-13011, 2022.