- ¹Departamento de Geodinámica, Estratigrafía y Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, C/José Antonio Novais 12, 28040 Madrid, Spain (manuro21@ucm.es)
- ²Instituto Geológico y Minero de España (IGME-CSIC), C/Ríos Rosas 23, 28003 Madrid, Spain
Advances in machine learning offer new opportunities to improve the assessment and regional-scale mapping of groundwater nitrate contamination, a long-standing and widespread environmental problem. This study illustrates this potential in the Duero River Basin (Spain), where nitrate concentrations in aquifers have increased steadily over recent decades due to intensive agricultural and livestock farming. Machine learning techniques were applied to predict groundwater nitrate pollution from monitoring data combined with spatially derived environmental and anthropogenic predictors, framing the problem as a binary classification task based on a threshold concentration of 37.5 mg/L. Several tree-based ensemble algorithms were evaluated, and Random Forest was selected for its superior predictive performance and robustness. Model reliability was ensured through a repeated nested cross-validation strategy, which yielded an ensemble of 50 models and out-of-fold probability estimates. Model performance was evaluated using metrics suited to imbalanced datasets and focused on the minority class, including the F1-score and the Area Under the Precision–Recall Curve. A temporal analysis across different hydrological years was conducted to assess the persistence and spatial variability of nitrate pollution risk over time. Spatial validity and model reliability were further evaluated by comparing predicted risk patterns with officially designated nitrate vulnerable zones (NVZs). This comparison revealed a high degree of agreement, while also identifying areas outside current NVZ boundaries that exhibit similar contamination characteristics, suggesting the presence of potentially unrecognised nitrate pollution risks. Model interpretability was explored using SHAP values, which highlighted precipitation, diffuse agricultural pressures, distance to surface water bodies, NDVI, and soil properties as the most influential predictors of nitrate contamination.
Overall, the results demonstrate the value of interpretable machine learning approaches for improving the assessment, understanding, and management of groundwater nitrate pollution at the basin scale.
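As an illustration of the validation strategy described above, the following minimal sketch shows one way a repeated nested cross-validation with out-of-fold probabilities could be set up in scikit-learn. It is not the authors' implementation. The synthetic data, the 5-fold by 10-repeat outer split (chosen only because it yields 50 fitted models), the small hyperparameter grid, the 0.5 decision cut-off, and the helper name `nested_cv_oof` are all assumptions made for this example. In the study, the binary labels would come from the 37.5 mg/L nitrate threshold rather than from a synthetic generator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    StratifiedKFold,
)


def nested_cv_oof(X, y, n_splits=5, n_repeats=10, seed=0):
    """Repeated nested CV (hypothetical helper).

    Returns the out-of-fold probability of the positive class, averaged
    over repeats, and the list of models fitted on each outer training fold.
    """
    outer = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    grid = {"max_depth": [3, None]}  # assumed, deliberately tiny grid

    oof_sum = np.zeros(len(y))
    oof_count = np.zeros(len(y))
    models = []

    for train_idx, test_idx in outer.split(X, y):
        # Inner loop: tune hyperparameters on the outer training fold only.
        search = GridSearchCV(
            RandomForestClassifier(
                n_estimators=20, class_weight="balanced", random_state=seed
            ),
            param_grid=grid,
            scoring="average_precision",
            cv=inner,
        )
        search.fit(X[train_idx], y[train_idx])

        # Outer loop: predict only on samples the tuned model has not seen.
        oof_sum[test_idx] += search.predict_proba(X[test_idx])[:, 1]
        oof_count[test_idx] += 1
        models.append(search.best_estimator_)

    return oof_sum / oof_count, models


# Synthetic, imbalanced stand-in for the monitoring dataset (assumption).
X, y = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0
)

oof_proba, models = nested_cv_oof(X, y)

# Minority-class metrics computed from out-of-fold predictions.
f1 = f1_score(y, (oof_proba >= 0.5).astype(int))
auprc = average_precision_score(y, oof_proba)
print(f"models: {len(models)}, F1: {f1:.3f}, AUPRC (avg. precision): {auprc:.3f}")
```

Each sample receives one out-of-fold prediction per repeat, so averaging over repeats gives a probability estimate that never relies on a model trained on that sample. Note that `average_precision_score` is a step-wise summary of the precision-recall curve and can differ slightly from a trapezoidal area estimate.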
How to cite: Rodríguez del Rosario, M., Gómez-Escalonilla, V., de la Hera-Portillo, Á., and Martínez-Santos, P.: Reliable and interpretable machine learning for groundwater nitrate pollution mapping: the Duero River Basin (Spain), EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20586, https://doi.org/10.5194/egusphere-egu26-20586, 2026.