EGU22-6062
https://doi.org/10.5194/egusphere-egu22-6062
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Machine learning-based water quality modeling at national level in data-scarce region 

Holger Virro1, Alexander Kmoch1, Marko Vainu2, and Evelyn Uuemaa1
  • 1University of Tartu, Department of Geography, Tartu, Estonia (evelyn.uuemaa@ut.ee)
  • 2Institute of Ecology, Tallinn University, Uus-Sadama 5, 10120, Tallinn, Estonia

Water quality modeling plays an important role in understanding the magnitude and impact of water quality issues and in providing evidence for policy-making and for implementing measures to mitigate water pollution. Process-based nutrient models are very complex, requiring many input parameters and computationally expensive calibration. Water quality data with high spatial and temporal resolution are also often lacking, because water sampling is expensive and river water quality cannot be measured using remote sensing. Machine learning approaches have been shown to achieve accuracy similar to that of physically based models and even to outperform them when describing nonlinear relationships. We used 242 observation sites on 139 streams in Estonia, amounting to 469 yearly total nitrogen (TN) and 470 yearly total phosphorus (TP) measurements covering the period 2016–2020, to train random forest models for predicting TN and TP concentrations. We used a total of 82 predictor variables, including land cover, soil, climate, and topography parameters, and applied a feature selection strategy to reduce the number of dependent features in the model. The models achieved an accuracy of 82% for TN and 54% for TP. The SHAP (SHapley Additive exPlanations) values used to explain the models showed that the most important features for predicting TN were arable land proportion, soil rock content, and hydraulic conductivity, while the main features affecting TP concentration were the urban and grassland proportions in the catchment. The results indicate that the TN model is a viable alternative to process-based models in Estonia. In the case of TP, the derived feature importances and feature interactions can potentially help improve the corresponding model in the future.
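The SHAP explanations mentioned above build on Shapley values, which split a single prediction into per-feature contributions relative to a baseline. The following is a minimal sketch of that idea only, not the authors' implementation: it computes exact Shapley values by enumerating feature subsets, and a hand-written function stands in for the trained random forest. The function `predict`, its coefficients, the three feature names, and all numeric values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample `x`.

    Features outside a coalition are filled in from each background row,
    and the predictions are averaged over those rows.
    """
    n = len(x)

    def value(subset):
        # Mean prediction when features in `subset` take the values of x
        # and the remaining features come from the background rows.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical stand-in for a trained TN model (NOT fitted to any data).
# Feature order: arable land proportion, soil rock content,
# hydraulic conductivity.
def predict(f):
    return 1.0 + 4.0 * f[0] - 2.0 * f[1] + 3.0 * f[0] * f[2]


# Hypothetical background catchments and one catchment to explain.
background = [
    [0.2, 0.1, 0.5],
    [0.6, 0.3, 1.0],
    [0.4, 0.2, 1.5],
]
x = [0.8, 0.1, 2.0]

phi = shapley_values(predict, x, background)
for name, contribution in zip(
    ["arable_proportion", "rock_content", "hydraulic_conductivity"], phi
):
    print(f"{name}: {contribution:+.3f}")
```

By construction the contributions sum to the difference between the prediction for `x` and the mean prediction over the background rows. Exact enumeration scales exponentially with the number of features, so with dozens of predictors a tree-specific or sampling-based approximation would be needed instead.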

How to cite: Virro, H., Kmoch, A., Vainu, M., and Uuemaa, E.: Machine learning-based water quality modeling at national level in data-scarce region, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6062, https://doi.org/10.5194/egusphere-egu22-6062, 2022.
