EGU25-15030, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-15030
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
PICO | Monday, 28 Apr, 09:13–09:15 (CEST)
 
PICO spot A, PICOA.14
Filling Streamflow Data Gaps in Indian Catchments Using Machine Learning
Hiren Solanki (1) and Vimal Mishra (1,2)
  • (1) Earth Science, Indian Institute of Technology Gandhinagar, Gandhinagar, 382355, India. (hirenrs@iitgn.ac.in)
  • (2) Civil Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, 382355, India. (vmishra@iitgn.ac.in)

Complete hydrological time series are critical for water resource management, flood and drought forecasting, hydroelectric power optimization, irrigation planning, ecological preservation, and climate change impact assessment. However, significant gaps in streamflow and water level observations, compounded by extreme hydroclimatic events and quality control issues, hinder accurate modeling and informed decision-making in Indian catchments. These challenges are particularly pronounced in regions with high climatic variability, where missing data spans 6 to 12 months. To address this, we combined geomorphological, meteorological, and hydrological parameters with the Random Forest method to gap-fill streamflow data at 352 stations across India, excluding transboundary basins. To improve model training and accuracy, we grouped stations with similar behavior into classes using a k-means clustering algorithm based on catchment characteristics. Pooling stations in this way increased the training data available to the machine learning models. For each class, the model was trained on 80% of the available streamflow data and validated on the remaining 20%. Our results indicate that clustering significantly improves performance, with over 100 stations showing a >25% increase in Nash-Sutcliffe Efficiency (NSE). Model performance was evaluated for continuous data gaps of 1 week, 1 month, 3 months, 6 months, and 1 year, and accuracy declined as the gaps lengthened. Even so, the mean NSE exceeded 0.85 across all clusters. The gap-filled datasets provide robust hydrographs that support precise modeling of streamflow variability, evaluation of climate-hydrology interactions, and improved water resource management strategies.
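As an illustration of the evaluation step described above, the following minimal sketch scores a gap-filling method by hiding a continuous block of observations, filling it, and computing NSE on the hidden values, where NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). It is not the authors' implementation: the function names (nse, mask_gap, linear_fill, score_gap_fill) are hypothetical, and linear interpolation is only a stand-in for the Random Forest model, which is not reproduced here.

```python
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency of simulated values against observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - sse / spread


def mask_gap(series, start, length):
    """Return a copy with a continuous artificial gap (None) plus the hidden values."""
    held_out = list(series[start:start + length])
    masked = list(series)
    masked[start:start + len(held_out)] = [None] * len(held_out)
    return masked, held_out


def linear_fill(masked):
    """Placeholder gap filler (assumption): linear interpolation across each gap."""
    filled = list(masked)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1  # j ends at the first observed value after the gap
        left = filled[i - 1] if i > 0 else None
        right = filled[j] if j < n else None
        if left is None and right is None:
            raise ValueError("series has no observed values")
        left = right if left is None else left
        right = left if right is None else right
        span = j - i + 1  # steps between the two bounding observations
        for k in range(i, j):
            filled[k] = left + (right - left) * (k - i + 1) / span
        i = j
    return filled


def score_gap_fill(series, fill, start, length):
    """NSE of a fill method on one artificial continuous gap."""
    masked, held_out = mask_gap(series, start, length)
    filled = fill(masked)
    return nse(held_out, filled[start:start + length])


if __name__ == "__main__":
    # Made-up daily flows for demonstration only, not data from the study.
    flows = [12.0, 15.0, 30.0, 55.0, 41.0, 28.0, 20.0, 16.0, 14.0, 13.0]
    for gap_days in (2, 4, 6):
        print(gap_days, round(score_gap_fill(flows, linear_fill, 2, gap_days), 3))
```

Repeating score_gap_fill with gap lengths matching 1 week, 1 month, 3 months, 6 months, and 1 year of daily records would give the kind of accuracy-versus-gap-length comparison reported in the abstract. Any callable that takes the masked series and returns a filled one can replace linear_fill.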

How to cite: Solanki, H. and Mishra, V.: Filling Streamflow Data Gaps in Indian Catchments Using Machine Learning, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-15030, https://doi.org/10.5194/egusphere-egu25-15030, 2025.