EGU25-16141, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-16141
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Spatial Prediction and Assessment of Environmental Drivers of Geogenic Arsenic in European Topsoil: A Machine Learning Approach to Food Safety Risks
Kai-Yun Li1, Gustavo Covatti1,2, Joel Podgorski1, and Michael Berg1
  • 1Eawag, Swiss Federal Institute of Aquatic Science and Technology, Department Water Resources and Drinking Water, 8600 Dübendorf, Switzerland
  • 2Institute of Biogeochemistry and Pollutant Dynamics, ETH Zurich, 8092 Zurich, Switzerland

Arsenic contamination in topsoil, primarily geogenic in origin, presents significant public health risks because arsenic can accumulate in agricultural products. This study uses machine learning to predict geogenic arsenic concentrations in topsoil across Europe. The analysis combines geochemical data from the GEMAS (Geochemical Mapping of Agricultural and Grazing Land Soil in Europe) database with 15 environmental variables (climatic, geological, soil, and hydrological factors) to map the predicted occurrence of arsenic at a spatial resolution of 1 km. A threshold of 20 mg/kg is used, based on general European guidelines and its relevance to potential phytotoxicity risks.

A Random Forest (RF) model is developed to predict the probability of topsoil arsenic exceeding the 20 mg/kg soil guideline value. To ensure robustness, the modeling is repeated over 100 iterations. Recursive Feature Elimination (RFE) improves model efficiency by reducing the number of predictors from 35 to 15. Performance is assessed with metrics including Area Under the Curve (AUC), sensitivity, and specificity. SHapley Additive exPlanations (SHAP) analysis identifies the key predictors, which include distance to mineral deposits, latitude, and hydrological conditions. Preliminary results indicate that 9.2% of European grasslands and 3% of croplands, particularly in France and Spain, are predicted to exceed 20 mg/kg. For each crop category considered, including wheat, maize, rapeseed, and fodder crops, more than 5% of the cultivated area lies in regions with potentially hazardous arsenic levels.
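The workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software used, so scikit-learn is an assumption, as are the synthetic data (a stand-in for the GEMAS samples and their predictors), the reading of "100 iterations" as repeated random train/test splits, the 0.5 probability cutoff, and all hyperparameters. The SHAP step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 35 candidate predictors and a binary
# label, 1 = arsenic above 20 mg/kg. This is not the GEMAS dataset.
X, y = make_classification(
    n_samples=600, n_features=35, n_informative=10,
    weights=[0.85, 0.15], random_state=0,
)

# Recursive Feature Elimination: repeatedly drop the least important
# predictors (5 per step) until 15 remain.
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=15, step=5,
)
selector.fit(X, y)
X_sel = X[:, selector.support_]


def evaluate(seed):
    """Train on one stratified random split and score the held-out part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.2, stratify=y, random_state=seed
    )
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_tr, y_tr)
    prob = rf.predict_proba(X_te)[:, 1]  # probability of exceedance
    pred = (prob >= 0.5).astype(int)     # assumed cutoff
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return roc_auc_score(y_te, prob), sensitivity, specificity


N_ITER = 5  # the abstract reports 100 iterations; kept small for speed
scores = np.array([evaluate(seed) for seed in range(N_ITER)])
mean_auc, mean_sens, mean_spec = scores.mean(axis=0)
print(f"AUC={mean_auc:.2f} sensitivity={mean_sens:.2f} "
      f"specificity={mean_spec:.2f}")
```

For brevity, the sketch selects features once on the full dataset. A stricter evaluation would run the elimination inside each training split so that held-out samples do not influence which predictors are kept.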

The study highlights the environmental variables that are most important for mapping arsenic hotspots and emphasizes the need for regional assessments to better understand arsenic hazards. While it provides an overview of arsenic occurrence in soil across Europe, local geological variability and anthropogenic impacts require further investigation. Future efforts should aim to develop models at regional to national scales to enhance arsenic risk assessments for food safety and public health. This research can support more effective interventions and improve the prediction and management of trace elements in soil across broader regions.

 

How to cite: Li, K.-Y., Covatti, G., Podgorski, J., and Berg, M.: Spatial Prediction and Assessment of Environmental Drivers of Geogenic Arsenic in European Topsoil: A Machine Learning Approach to Food Safety Risks, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16141, https://doi.org/10.5194/egusphere-egu25-16141, 2025.