EGU26-21505, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-21505
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 14:00–15:45 (CEST), Display time Tuesday, 05 May, 14:00–18:00
 
Hall X3, X3.105
Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece
Paraskevas Tsangaratos1, Ioannis Matiatos2, Ioanna Ilia3, and Konstantinos Markantonis4
  • 1National Technical University of Athens, ATHENS, Greece (ptsag@metal.ntua.gr)
  • 2National Technical University of Athens, ATHENS, Greece (imatiatos@mail.ntua.gr)
  • 3National Technical University of Athens, ATHENS, Greece (gilia@mail.ntua.gr)
  • 4National Technical University of Athens, ATHENS, Greece (markantonis@mail.ntua.gr)

Groundwater pollution is a persistent, largely hidden risk in Mediterranean farming basins such as Western Thessaly (Greece), where heavy irrigation, seasonal recharge pulses, and highly variable geology can speed up the movement of contaminants from the land surface into aquifers. Intrinsic vulnerability maps are therefore essential for early warning, land-use decisions, and risk-aware governance. The widely used DRASTIC index is practical, but it relies on fixed weights and linear scoring, which limits its ability to capture nonlinear relationships and changing, time-dependent exposure. To overcome these constraints, we present a hybrid, explainable framework that extends the classic DRASTIC structure with an eighth factor, Transit Time (TT), and pairs the resulting parameter set with a tree-based machine learning approach centered on Random Forest (RF) to improve predictive skill, spatial detail, and interpretability. We build and compare four configurations: a baseline 7-parameter DRASTIC map (DRASTIC A), an extended DRASTIC map with TT (DRASTIC B), an RF model trained on the original seven DRASTIC layers (RF A), and an RF model trained on the seven layers plus TT (RF B). The models draw on thematic raster layers (e.g., depth to groundwater, recharge, soil, aquifer media, vadose zone characteristics) sampled at nitrate monitoring locations, with TT included as a practical proxy for the travel-time delay and attenuation processes that influence when and how strongly pollution signals reach the aquifer. Because spatial autocorrelation can inflate performance under ordinary random splits, we adopt spatial cross-validation (block- and buffer-based schemes) to better test real-world transferability, and we address class imbalance with SMOTE. We evaluate outcomes using accuracy, F1-score, class-wise precision/recall, ROC-AUC, and confusion matrices, with special attention to correctly identifying high and very-high vulnerability areas.
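The block-based spatial cross-validation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `assign_blocks` and `spatial_block_folds`, the 2,500 m block size, the four folds, and the randomly generated well coordinates are all hypothetical. The idea is that samples are grouped into square grid blocks and each block is held out as a whole, so nearby (autocorrelated) samples never end up on both sides of a split.

```python
import numpy as np

def assign_blocks(x, y, block_size):
    """Return an integer id of the square grid block containing each sample."""
    col = np.floor((x - x.min()) / block_size).astype(int)
    row = np.floor((y - y.min()) / block_size).astype(int)
    n_cols = col.max() + 1
    return row * n_cols + col

def spatial_block_folds(block_ids, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which no block is split
    between the training and the test set."""
    rng = np.random.default_rng(seed)
    # Shuffle the occupied blocks, then deal them into n_folds groups.
    unique_blocks = rng.permutation(np.unique(block_ids))
    for fold_blocks in np.array_split(unique_blocks, n_folds):
        test_mask = np.isin(block_ids, fold_blocks)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical monitoring-well coordinates in metres (not data from the study).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10_000, size=200)
y = rng.uniform(0, 10_000, size=200)

blocks = assign_blocks(x, y, block_size=2_500)
folds = list(spatial_block_folds(blocks, n_folds=4))
```

Each `(train_idx, test_idx)` pair can then index the feature table and labels when fitting and scoring a classifier. If oversampling such as SMOTE is combined with this scheme, it is usually applied to the training indices of each fold only, so that synthetic samples do not leak into the held-out blocks.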
Among all approaches, RF B performs best (accuracy 0.8214; F1 0.8788), indicating that combining nonlinear learning with transit-time information yields clearer, more reliable discrimination of vulnerable zones than either index mapping alone or RF without TT. To make the models transparent and defensible for stakeholders, we apply two explainable AI methods, permutation importance and SHAP, to reveal both overall driver rankings and local, pixel-level contributions. Depth to groundwater, vadose zone influence, and recharge consistently stand out as the strongest controls, while TT, although not always dominant in global importance, meaningfully sharpens the spatial tracing of vulnerable corridors and pathways. Finally, to support risk-informed planning under uncertainty, we produce confidence maps based on maximum predicted class probability and normalized entropy maps that summarize ambiguity across classes. Together these maps separate areas where the model is confident and vulnerability is high from areas where predictions are uncertain and additional monitoring or field verification is justified. The layers are masked for nodata regions and designed for direct integration into management workflows. Overall, the proposed Random Forest–DRASTIC–Transit Time framework demonstrates how a spatially validated, explainable ML extension of DRASTIC can deliver more detailed, decision-ready vulnerability maps by blending static hydrogeologic controls with dynamic travel-time behavior, offering a scalable pathway for more sustainable groundwater protection as environmental pressures intensify.
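The confidence and normalized entropy layers described above can be illustrated with a short sketch. It assumes a stack of per-pixel class probabilities with shape `(n_classes, rows, cols)`, such as the output of a probabilistic classifier reshaped to the raster grid. The function name `confidence_and_entropy`, the four-class setup, and the toy 2×2 raster are hypothetical and not taken from the study. Confidence is the maximum class probability per pixel, and entropy is divided by `log(n_classes)` so that it ranges from 0 (one class is certain) to 1 (all classes equally likely).

```python
import numpy as np

def confidence_and_entropy(proba, nodata_mask=None):
    """Compute per-pixel confidence and normalized Shannon entropy.

    proba: array of shape (n_classes, rows, cols); probabilities sum to 1 per pixel.
    nodata_mask: optional boolean array (rows, cols), True where there is no data.
    """
    n_classes = proba.shape[0]
    confidence = proba.max(axis=0)
    # Treat 0 * log(0) as 0 by substituting 1 where p == 0, since log(1) = 0.
    safe = np.where(proba > 0, proba, 1.0)
    entropy = -(proba * np.log(safe)).sum(axis=0)
    norm_entropy = entropy / np.log(n_classes)
    if nodata_mask is not None:
        # Mask nodata pixels with NaN in both output layers.
        confidence = np.where(nodata_mask, np.nan, confidence)
        norm_entropy = np.where(nodata_mask, np.nan, norm_entropy)
    return confidence, norm_entropy

# Toy 2x2 raster with four hypothetical vulnerability classes per pixel.
pixels = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]],
    [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]],
])
proba = np.moveaxis(pixels, -1, 0)  # reorder to (n_classes, rows, cols)
nodata = np.array([[False, False], [False, True]])

conf, ent = confidence_and_entropy(proba, nodata)
```

In this toy raster the top-left pixel is fully confident (confidence 1, entropy 0), the top-right pixel is maximally ambiguous (confidence 0.25, entropy 1), and the bottom-right pixel is masked as nodata. Pixels that combine high confidence with a high predicted vulnerability class would be candidates for action, while high-entropy pixels would flag where further monitoring could help.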

How to cite: Tsangaratos, P., Matiatos, I., Ilia, I., and Markantonis, K.: Explainable Machine Learning for Spatio-Temporal Groundwater Vulnerability Mapping: A Random Forest-DRASTIC-Transit Time Framework for Western Thessaly, Greece, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21505, https://doi.org/10.5194/egusphere-egu26-21505, 2026.