EGU26-2374, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-2374
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 06 May, 08:35–08:45 (CEST)
Room C
How far can we stretch big-data ideas with limited data? Machine-learned groundwater level predictions at a continental scale with smaller and smaller data sets.
Wolfgang Nowak, Waqas Ahmed, and Emanuel Buccini
  • Universität Stuttgart, Institute for Modelling Hydraulic and Environmental Systems (IWS), LS3 / Stochastic Simulation and Safety Research for Hydrosystems, Stuttgart, Germany (wolfgang.nowak@iws.uni-stuttgart.de)

In deep learning, as in any other modeling endeavor, the quality and scale of the available data matter. Take the prediction of groundwater levels as an example: satellite data have gained considerable attention for monitoring groundwater storage anomalies, but such data have coarse resolution and carry uncertainties from the disaggregation process. At the same time, the lack of sufficiently dense groundwater monitoring networks remains a significant barrier. In many real-world applications, high-quality data are rare: input and target (calibration) data are often noisy, spatially and temporally sparse, and lack spatial resolution, which compromises the predictive power of deep learning models.

In this work, we investigate the robustness of deep learning for estimating groundwater levels at the continental scale from sparse observations. We use the CONUS2 dataset (https://hydroframe.org/parflow-conus2), derived from the physics-based simulator ParFlow. Inspired by a recent study (HydroStartML [1]), we train a deep learning model on this dataset to estimate the water table depth (WTD) from easily accessible and spatially distributed covariates. These covariates include elevation, slope, and hydrogeologic properties such as hydraulic conductivity and net recharge.
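To illustrate the input structure, the sketch below stacks per-covariate grids into a single multi-channel array, as a convolutional model would consume them. All names, grid sizes, and the random values are illustrative assumptions, not taken from the CONUS2 dataset:

```python
import numpy as np

# Hypothetical covariate grids on a shared raster (H x W); in the real
# setting these would be read from the ParFlow-derived CONUS2 data.
H, W = 64, 64
rng = np.random.default_rng(0)
covariates = {
    "elevation": rng.random((H, W)),
    "slope": rng.random((H, W)),
    "hydraulic_conductivity": rng.random((H, W)),
    "net_recharge": rng.random((H, W)),
}

# Stack the grids into one channel-first input tensor (C, H, W),
# the layout most CNN frameworks (and a U-Net) expect.
x = np.stack([covariates[k] for k in sorted(covariates)], axis=0)
print(x.shape)  # one sample with one channel per covariate
```

The target is then a single-channel WTD grid of the same spatial size, so the learning task is an image-to-image regression.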

As the deep-learning model, we implement a U-Net architecture that maps these covariate maps to the water table depth (WTD). Beyond a baseline, in which we train on the entire CONUS2 dataset, we conduct a rigorous ablation study to evaluate the model's robustness under simulated data scarcity, reflecting real-world observational constraints. To simulate data scarcity, we apply a masking protocol that systematically occludes a wide range of fractions of the target data, forcing the U-Net to reconstruct the WTD field from limited information. Finally, we assess model performance with standard metrics such as the Nash-Sutcliffe Efficiency (NSE) and the Root Mean Squared Error (RMSE). Our results demonstrate strong predictive capability even in data-sparse scenarios, validating the approach. However, a spatial analysis of the error distribution reveals a distinct topographical dichotomy: while the network achieves high precision and stability in low-relief plains, it exhibits systematic errors in complex mountainous terrain, where the prediction task is harder because of the larger spatial variability of both the covariates and the target variable. Overall, our findings suggest that, while U-Net architectures are surprisingly robust for groundwater mapping, distinct physical settings may require adaptations to the architecture.
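The masking protocol and evaluation metrics described above can be sketched as follows. This is a minimal illustration on a synthetic WTD grid, assuming the standard textbook definitions of NSE and RMSE; the keep fraction, grid size, and the stand-in "prediction" are hypothetical, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def mask_targets(wtd, keep_fraction, rng):
    """Occlude target cells at random, keeping only `keep_fraction` of
    them as 'observed' -- mimicking a sparse monitoring network."""
    mask = rng.random(wtd.shape) < keep_fraction  # True = observed
    return np.where(mask, wtd, np.nan), mask

def rmse(obs, sim):
    """Root Mean Squared Error over the given cells."""
    return np.sqrt(np.mean((obs - sim) ** 2))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect match; 0 means the
    prediction is no better than the mean of the observations."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic WTD field; a real run would use the CONUS2 target grids.
wtd = rng.random((64, 64))
masked_wtd, mask = mask_targets(wtd, keep_fraction=0.1, rng=rng)

# Stand-in for a U-Net prediction: the truth plus small noise.
pred = wtd + 0.01 * rng.standard_normal(wtd.shape)

# Evaluate only on the cells that were kept as observations.
print(rmse(wtd[mask], pred[mask]), nse(wtd[mask], pred[mask]))
```

Training under this protocol means the loss is computed only on unmasked cells, while the network still predicts the full WTD field, so evaluation on held-out (occluded) cells measures reconstruction skill.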


[1] L. Pawusch, S. Scheurer, W. Nowak, R.M. Maxwell: “HydroStartML: A combined machine learning and physics-based approach to reduce hydrological model spin-up time”, Advances in Water Resources, 206, 2025. https://doi.org/10.1016/j.advwatres.2025.105124

How to cite: Nowak, W., Ahmed, W., and Buccini, E.: How far can we stretch big-data ideas with limited data? Machine-learned groundwater level predictions at a continental scale with smaller and smaller data sets., EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2374, https://doi.org/10.5194/egusphere-egu26-2374, 2026.