- 1 School of Engineering, Morwick G360 Groundwater Research Institute, University of Guelph, Guelph, Ontario, Canada
- 2 Department of Earth and Atmospheric Sciences, Université du Québec à Montréal, Montréal, Québec, Canada
- 3 Ministry of the Environment, Conservation and Parks (MECP), Etobicoke, Ontario, Canada
In North America, the Great Lakes contain approximately 20% of the available surface fresh water in the world. As a result, the Great Lakes Basin (GLB) is well known for its extensive agricultural and food production activities. Such agricultural activities are considered one of the most significant non-point sources of nutrients, particularly nitrogen and phosphorus, transported to surface water and groundwater. This is mainly because synthetic fertilizers and manure are applied to enhance crop productivity and soil fertility. The resulting elevated nutrient concentrations can disrupt aquatic ecosystems, degrade surface water and groundwater quality, and harm both human and aquatic life. However, quantifying nutrient concentrations in agricultural watersheds is challenging because they are influenced by many process parameters, including soil type, climate, and land use conditions. These parameters are highly non-linear and uncertain, which limits the applicability of conventional mathematical models to nutrient transport in surface water and groundwater. Therefore, data-driven models using machine learning (ML) algorithms have been extensively applied to unravel the complexities of nutrient transport in surface water and groundwater and to address the main challenges associated with mathematical models. This is mainly because ML algorithms can handle complex datasets with high uncertainty and non-linearity while accounting for the interdependence between the process parameters. By leveraging historical datasets, ML algorithms can model and explain the cause-and-effect relationships and intricate interdependencies between process parameters, making them well suited for simulating nutrient transport processes in surface and sub-surface water applications. In the current study, different ML algorithms were adopted to predict nutrient concentrations in surface water and groundwater in a sand plain agricultural watershed within the GLB in Ontario, Canada.
These ML algorithms included regression (e.g., artificial neural networks) and classification (e.g., decision trees) techniques to better simulate nutrient concentrations in surface water and groundwater. The ML input variables covered meteorological (e.g., precipitation), hydrogeological (e.g., groundwater levels), and water physico-chemical (e.g., pH) conditions. The performance of these ML algorithms was evaluated using metrics such as the root-mean-square error for regression models and the F1-score for classification models. The optimal ML models were selected according to the outcomes of these evaluation metrics. In addition, the interdependence between the process parameters (e.g., land use and precipitation) and nutrient concentrations was interpreted to determine the parameters governing the nutrient transport process in surface and sub-surface water. The main outcomes of this study can help decision-makers assess the most effective management efforts to protect and improve surface water and groundwater quality in agricultural watersheds. In addition, these insights enable the interpolation of nutrient concentrations from discrete sampling points, facilitating predictions at unmonitored locations across the watershed.
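As an illustration of the two evaluation metrics named above, the following minimal Python sketch computes the root-mean-square error for a regression model and the F1-score for a binary classification model. It is not the authors' implementation: the function names, the binary "exceeds a threshold" labelling, and all numerical values are hypothetical and chosen only to show how the metrics are calculated.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def f1_score(actual, predicted, positive=1):
    """F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical nutrient concentrations (mg/L), invented for illustration.
observed_conc = [2.0, 4.0, 6.0]
predicted_conc = [3.0, 4.0, 5.0]
regression_error = rmse(observed_conc, predicted_conc)

# Hypothetical class labels: 1 = concentration exceeds an assumed threshold.
actual_class = [1, 1, 0, 0, 1]
predicted_class = [1, 0, 0, 1, 1]
classification_f1 = f1_score(actual_class, predicted_class)

print(f"RMSE: {regression_error:.3f} mg/L")
print(f"F1-score: {classification_f1:.3f}")
```

A lower root-mean-square error indicates a better regression model, while an F1-score closer to 1 indicates a better classifier, so candidate models could be ranked on these values when selecting the optimal one.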
How to cite: Elsayed, A., Levison, J., Binns, A., Larocque, M., and Goel, P.: Harnessing Machine Learning for Water Quality Prediction in Agricultural Watersheds, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14665, https://doi.org/10.5194/egusphere-egu25-14665, 2025.