Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece

Paraskevas Tsangaratos; Ioannis Matiatos; Ioanna Ilia; Konstantinos Markantonis

doi:https://doi.org/10.5194/egusphere-egu26-21505

EGU26-21505, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Paraskevas Tsangaratos¹, Ioannis Matiatos², Ioanna Ilia³, and Konstantinos Markantonis⁴

¹National Technical University of Athens, Athens, Greece (ptsag@metal.ntua.gr)
²National Technical University of Athens, Athens, Greece (imatiatos@mail.ntua.gr)
³National Technical University of Athens, Athens, Greece (gilia@mail.ntua.gr)
⁴National Technical University of Athens, Athens, Greece (markantonis@mail.ntua.gr)

Groundwater pollution is a persistent, largely hidden risk in Mediterranean farming basins such as Western Thessaly (Greece), where heavy irrigation, seasonal recharge pulses, and highly variable geology can speed up the movement of contaminants from the land surface into aquifers. Intrinsic vulnerability maps are therefore essential for early warning, land-use decisions, and risk-aware governance. However, the widely used DRASTIC index, despite its practicality, relies on fixed weights and linear scoring, which limits its ability to capture nonlinear relationships and time-dependent exposure. To overcome these constraints, we present a hybrid, explainable framework that extends the classic DRASTIC structure with an eighth factor, Transit Time (TT), and pairs the resulting parameter set with a tree-based machine learning approach centered on Random Forest (RF) to improve predictive skill, spatial detail, and interpretability. We build and compare four configurations: a baseline 7-parameter DRASTIC map (DRASTIC A), an extended DRASTIC map with TT (DRASTIC B), an RF model trained on the original seven DRASTIC layers (RF A), and an RF model trained on the seven layers plus TT (RF B). The models draw on thematic raster layers (e.g., depth to groundwater, recharge, soil, aquifer media, vadose zone characteristics) sampled at nitrate monitoring locations, with TT included as a practical proxy for travel-time delay and attenuation processes that influence when and how strongly pollution signals reach the aquifer. Because spatial autocorrelation can inflate performance estimates under ordinary random splits, we adopt spatial cross-validation (block- and buffer-based schemes) to better test real-world transferability, address class imbalance with SMOTE, and evaluate outcomes using accuracy, F1-score, class-wise precision/recall, ROC-AUC, and confusion matrices, with special attention to correctly identifying high and very-high vulnerability areas.
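The fixed-weight, linear scoring that the abstract attributes to DRASTIC can be illustrated with a minimal sketch. The seven weights below are the standard DRASTIC weights from the original method (Aller et al., 1987). The abstract does not report how TT is rated or weighted in DRASTIC B, so `TT_WEIGHT` and the example cell ratings are hypothetical placeholders, not the authors' values.

```python
# Standard DRASTIC weights; each factor is rated on a 1-10 scale per raster cell.
DRASTIC_WEIGHTS = {
    "D": 5,  # depth to groundwater
    "R": 4,  # net recharge
    "A": 3,  # aquifer media
    "S": 2,  # soil media
    "T": 1,  # topography (slope)
    "I": 5,  # impact of the vadose zone
    "C": 3,  # hydraulic conductivity
}

# ASSUMPTION: the weight of Transit Time is not given in the abstract.
TT_WEIGHT = 3  # hypothetical placeholder


def drastic_index(ratings, tt_weight=None):
    """Weighted linear sum of factor ratings for one raster cell.

    ratings maps factor codes to 1-10 ratings. If tt_weight is given,
    ratings must also contain a "TT" entry (eight-factor variant).
    """
    index = sum(DRASTIC_WEIGHTS[f] * ratings[f] for f in DRASTIC_WEIGHTS)
    if tt_weight is not None:
        index += tt_weight * ratings["TT"]
    return index


# Made-up ratings for a single cell, for illustration only.
cell = {"D": 9, "R": 6, "A": 8, "S": 5, "T": 10, "I": 8, "C": 4, "TT": 7}

index_a = drastic_index(cell)                       # seven-factor index
index_b = drastic_index(cell, tt_weight=TT_WEIGHT)  # eight-factor index with TT
```

Because every factor enters only through a constant weight, the index cannot express interactions between factors, which is the limitation the RF models are meant to address.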
Among all approaches, RF B performs best (accuracy 0.8214; F1 0.8788), indicating that combining nonlinear learning with transit-time information discriminates vulnerable zones more clearly and reliably than either index mapping alone or RF without TT. To make the models transparent and defensible for stakeholders, we apply two explainable AI methods, permutation importance and SHAP, to reveal both overall driver rankings and local, pixel-level contributions. Depth to groundwater, vadose zone influence, and recharge consistently stand out as the strongest controls, while TT, although not always dominant in global importance, meaningfully sharpens the spatial tracing of vulnerable corridors and pathways. Finally, to support risk-informed planning under uncertainty, we produce confidence maps based on the maximum predicted class probability and normalized entropy maps that summarize ambiguity across classes. Together these maps separate areas where the model confidently predicts high vulnerability from areas where predictions are uncertain and additional monitoring or field verification is justified; the layers are masked for nodata regions and designed for direct integration into management workflows. Overall, the proposed Random Forest–DRASTIC–Transit Time framework demonstrates how a spatially validated, explainable ML extension of DRASTIC can deliver more detailed, decision-ready vulnerability maps by blending static hydrogeologic controls with dynamic travel-time behavior, offering a scalable pathway for more sustainable groundwater protection as environmental pressures intensify.
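The confidence and normalized-entropy layers described above can be sketched per pixel as follows. This is an illustration under stated assumptions, not the authors' implementation: the four-class setup, the tiny `prob_raster`, and the use of `None` to mark nodata pixels are all hypothetical. Normalized entropy is taken here as Shannon entropy divided by ln(K) for K classes, so that 0 means a fully certain prediction and 1 means a uniform one.

```python
import math


def confidence_and_entropy(probs):
    """Return (max class probability, normalized Shannon entropy) for one pixel.

    probs: class probabilities summing to 1 (at least two classes),
    or None for a nodata pixel, which stays masked in both outputs.
    """
    if probs is None:
        return None, None
    k = len(probs)
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(probs), entropy / math.log(k)


# Hypothetical 2x2 raster of class probabilities (four vulnerability classes assumed).
prob_raster = [
    [[0.25, 0.25, 0.25, 0.25], [0.05, 0.05, 0.10, 0.80]],
    [[1.00, 0.00, 0.00, 0.00], None],  # None marks a nodata pixel
]

maps = [[confidence_and_entropy(p) for p in row] for row in prob_raster]
confidence_map = [[c for c, _ in row] for row in maps]
entropy_map = [[h for _, h in row] for row in maps]
```

A pixel with high confidence and low entropy in a high-vulnerability class is a candidate for action, whereas a pixel with entropy near 1 is a candidate for additional monitoring.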

How to cite: Tsangaratos, P., Matiatos, I., Ilia, I., and Markantonis, K.: Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21505, https://doi.org/10.5194/egusphere-egu26-21505, 2026.