EGU26-729, updated on 13 Mar 2026
https://doi.org/10.5194/egusphere-egu26-729
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 05 May, 16:30–16:40 (CEST)
 
Room C
Rapid Prediction of Contaminant Arrival Times in Stream–Aquifer Systems Using Machine Learning
Uğur Boyraz and Hayri Baycan
  • Istanbul University-Cerrahpasa, Faculty of Engineering, Civil Engineering Department, Istanbul, Türkiye (uboyraz@iuc.edu.tr)

Groundwater quality is increasingly threatened by urban, agricultural, and industrial pressures, many of which introduce persistent pollutants into aquifers. Reliable prediction of solute migration toward surface water bodies is therefore critical for sustainable water‐resources management. This study investigates groundwater contamination dynamics by integrating an analytical groundwater-flow solution with a numerical advection–dispersion model and machine learning (ML). The objective is to improve predictive capability for contaminant arrival timing in stream–aquifer systems while reducing the computational burden associated with repeated physical simulations. This work contributes to the growing field of hybrid, data-driven groundwater modelling by demonstrating how machine-learning surrogates can efficiently emulate computationally intensive contaminant-transport simulations.

A hybrid computational framework was developed in which groundwater flow was solved analytically to obtain the spatial distribution of hydraulic heads and the corresponding stream–aquifer interaction fluxes. These analytically derived velocities along the stream boundary were then used as inputs to an explicit finite-difference solution of the advection–dispersion equation (ADE) for an instantaneous point-source release. The aquifer domain was discretized into a 40×40 grid, and Darcy velocities along the (0, y) interface were multiplied by local solute concentrations to obtain spatially distributed mass fluxes. Numerical integration (trapezoidal rule) yielded the total mass discharged into the river as a function of time. The time at which this discharge reached its maximum was extracted and used as the ML target variable. To explore a wide range of hydrogeological behaviors, a synthetic dataset was generated by sampling physically meaningful parameter ranges, including streambed slope, river length, aquifer width, longitudinal and transverse dispersivities, molecular diffusion, hydraulic conductivity, and initial particle positions. A total of 1200 analytical–numerical realizations were generated and partitioned into training and verification subsets to enable unbiased ML evaluation. All realizations were simulated using a uniform grid resolution to maintain numerical consistency across varying aquifer geometries. Preprocessing involved eliminating variables that did not influence arrival timing, such as total contaminant mass. Spearman correlation analysis and physics-based reasoning indicated that the transverse-dispersivity multiplier and molecular diffusion coefficient contributed negligibly to the target variable and were removed. 
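The discharge-integration and peak-extraction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the uniform node spacing `dy`, and the toy velocity and concentration values are all assumptions made for the example, and the concentrations would in practice come from the finite-difference ADE solution.

```python
def trapezoid(values, spacing):
    """Composite trapezoidal rule for samples on a uniform grid."""
    if len(values) < 2:
        return 0.0
    return spacing * (sum(values) - 0.5 * (values[0] + values[-1]))


def discharge_series(boundary_velocity, boundary_conc, dy):
    """Integrate q(y) * C(y, t) along the stream boundary for each time level.

    boundary_velocity: Darcy velocity at each boundary node (hypothetical input)
    boundary_conc: one list of nodal concentrations per stored time level
    """
    series = []
    for conc_at_t in boundary_conc:
        # Nodal mass flux = velocity times local concentration
        flux = [q * c for q, c in zip(boundary_velocity, conc_at_t)]
        series.append(trapezoid(flux, dy))
    return series


def peak_time(times, series):
    """Return the time at which the discharge series is largest."""
    peak_index = max(range(len(series)), key=series.__getitem__)
    return times[peak_index]


# Toy, made-up numbers: three boundary nodes, four time levels.
times = [0.0, 1.0, 2.0, 3.0]
velocity = [1.0, 1.0, 1.0]
conc = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 1.0, 1.0]]
series = discharge_series(velocity, conc, dy=0.5)
arrival = peak_time(times, series)  # candidate ML target for this realization
```

In this sketch the peak is taken from the stored time levels only, so the resolution of the extracted arrival time is limited by the output time step.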
Physics-informed feature engineering was then applied to strengthen predictor–response relationships, producing composite variables such as hydraulic-gradient proxies, dimensionless spatial coordinates, transmissivity-like ratios, and domain geometry indicators. After removing outliers via the IQR method and applying a logarithmic transformation to the target variable, a CatBoostRegressor model was optimized through Bayesian hyperparameter search. Model evaluation using R², RMSE, MAE, MAPE, and PBIAS demonstrated strong predictive skill with minimal bias (e.g., R² = 0.9367). These results indicate that the analytical–numerical–ML framework offers a computationally efficient alternative to repeated contaminant-transport simulations and reliably estimates contaminant-arrival timing across a wide spectrum of hydrogeologic settings.
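The target preprocessing and two of the reported metrics can be illustrated with a short sketch. Several details here are assumptions, since the abstract does not state them: the 1.5 × IQR fence multiplier, the natural logarithm, and the PBIAS sign convention (observed minus simulated). The model itself is not shown, and the "predictions" are made-up numbers standing in for a regressor's output.

```python
import math
import statistics


def iqr_keep_mask(values, k=1.5):
    """Flag values inside the fences [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [lower <= v <= upper for v in values]


def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pbias(observed, simulated):
    """Percent bias, using the (observed - simulated) convention (assumed)."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Made-up arrival times; the last one is an obvious outlier.
arrival_times = [1.0, 2.0, 3.0, 4.0, 100.0]
mask = iqr_keep_mask(arrival_times)
kept = [t for t, keep in zip(arrival_times, mask) if keep]

# Train in log space (natural log assumed), evaluate after back-transforming.
log_target = [math.log(t) for t in kept]
predicted_log = [math.log(v) for v in [1.1, 1.9, 3.2, 3.8]]  # stand-in output
predicted = [math.exp(p) for p in predicted_log]

score = r_squared(kept, predicted)
bias = pbias(kept, predicted)
```

Computing the metrics after back-transformation keeps them in the physical units of arrival time; evaluating in log space instead would weight short and long arrival times differently.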

How to cite: Boyraz, U. and Baycan, H.: Rapid Prediction of Contaminant Arrival Times in Stream–Aquifer Systems Using Machine Learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-729, https://doi.org/10.5194/egusphere-egu26-729, 2026.