- University of Illinois, Prairie Research Institute, Champaign, United States of America (egetahun@illinois.edu)
Accurate prediction of nitrate‑nitrogen (NO₃‑N) and total phosphorus (TP) loads is critical for managing water quality in agricultural watersheds, where excess nutrient runoff can contribute to downstream eutrophication. This study applies Random Forest (RF) regression and conditional inference RF to predict monthly NO₃‑N and TP loads at eight monitoring gages within Conservation Reserve Enhancement Program (CREP) watersheds in the Illinois River and Kaskaskia River basins. The machine learning (ML) models were trained using hydroclimatic, land‑use, and nutrient datasets from 2000–2022 and validated with 2023 observations.
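The temporal hold-out described above (fit on 2000–2022, evaluate on 2023) can be sketched as follows. This is only an illustration, not the authors' code: the record layout, the `year` key, and the function name `split_by_year` are hypothetical.

```python
def split_by_year(records, first_train_year=2000, last_train_year=2022,
                  validation_year=2023):
    """Split monthly records into training and validation sets by calendar year.

    Each record is assumed (hypothetically) to be a dict with a "year" key.
    Splitting by time rather than at random keeps the validation year fully
    unseen during training.
    """
    train = [r for r in records
             if first_train_year <= r["year"] <= last_train_year]
    valid = [r for r in records if r["year"] == validation_year]
    return train, valid


# Hypothetical monthly records; the load values are made up for illustration.
records = [
    {"year": 1999, "month": 12, "load": 3.1},
    {"year": 2000, "month": 1, "load": 4.2},
    {"year": 2022, "month": 6, "load": 7.9},
    {"year": 2023, "month": 6, "load": 6.5},
]
train, valid = split_by_year(records)
```

Records outside both windows (here, the 1999 entry) are dropped rather than assigned to either set.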
The predictor variables included discharge, precipitation, temperature, land use, total Kjeldahl nitrogen (TKN), suspended sediment, and septic system density in the watersheds. Multiple modeling strategies were evaluated: full‑feature models, reduced‑feature models (derived by applying importance thresholds or by removing collinear nutrient variables), and hyperparameter‑tuned configurations. Model performance was assessed using Nash–Sutcliffe Efficiency (NSE), the coefficient of determination (R²), and root‑mean‑square error (RMSE), and interpretability was evaluated through feature‑importance metrics and SHapley Additive exPlanations (SHAP) analyses.
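For readers unfamiliar with the three performance metrics, a minimal sketch of their standard definitions follows. The abstract does not say how R² was computed; the squared Pearson correlation used here is an assumption, and the function names and sample values are illustrative only.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root-mean-square error, in the same units as the loads."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Made-up monthly loads: the simulation has a constant bias of +1.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
nse_value = nse(observed, simulated)
rmse_value = rmse(observed, simulated)
r2_value = r_squared(observed, simulated)
```

With these made-up values the constant bias leaves the correlation-based R² at 1 while NSE drops to about 0.2, which is one reason the two metrics are usually reported together.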
Monthly Random Forest models effectively captured seasonal nutrient dynamics. Discharge consistently emerged as the dominant predictor of NO₃‑N loads, while interactions among variables, particularly TKN and suspended sediment, played major roles in predicting TP. Land use and septic system density exhibited limited predictive influence. Model performance was strong across configurations, with training NSE values exceeding 0.95 and validation NSE frequently above 0.9. However, reduced skill during summer and fall suggested the influence of unrepresented processes such as evapotranspiration. The most stable performance across sites and seasons was achieved with hyperparameter‑tuned, full‑feature models. SHAP analyses revealed clear linkages between hydrologic and biogeochemical processes, while Spearman correlation heatmaps highlighted strong covariation among nutrient loads and moderate coupling with climatic variables.
These results demonstrate the value of machine‑learning approaches such as RF as complementary alternatives to process‑based models such as the Soil and Water Assessment Tool (SWAT), offering robust tools for informing nutrient‑reduction strategies and supporting policy decisions in impaired agricultural watersheds.
How to cite: Getahun, E. and Kharosekar, R.: Data‑Driven Modeling of Nutrient Dynamics: Random Forest Predictions of Nitrate and Total Phosphorus Loads in Illinois, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16874, https://doi.org/10.5194/egusphere-egu26-16874, 2026.