EGU24-10984, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-10984
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Comparative analysis of Machine Learning models for predicting the trihalomethanes formation potential in a Drinking Water Treatment Plant in Spain

Mireia Pla-Castellana1, Oriol Gutierrez2,4, Jordi Raich-Montiu3, and Wolfgang Gernjak2,5
  • 1Karlsruhe Institute of Technology, Institute of Meteorology and Climate Research Atmospheric Environmental Research, Stuttgart, Germany (mireiapla1988@gmail.com)
  • 2Catalan Institute for Water Research (ICRA), Girona, Spain
  • 3s::can Iberia Sistemas de Medición, Barcelona, Spain
  • 4Universitat de Girona (UdG), Girona, Spain
  • 5Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain

Trihalomethanes (THMs), which may be harmful to human health if ingested or inhaled, are produced when organic matter reacts with chlorine. Their formation during drinking water treatment must therefore be controlled to ensure safe drinking water.

In this study, the predictive capacity of a Multiple Linear Regression (MLR) model and an Artificial Neural Network (ANN) model was compared using real-time, field-scale data on the THM formation potential (THM FP) from a Spanish Drinking Water Treatment Plant (DWTP). Spectral absorbance data obtained with Spectro::lyser® probes installed at several treatment steps of the plant were the independent variables used to construct the models. Variables were selected with the Stepwise Selection (SS) procedure.
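As an illustration of the variable selection step, the following is a minimal sketch of a forward-only stepwise selection feeding an ordinary least squares fit. It is not the procedure used in the study: the selection criterion (adjusted R²), the stopping threshold `min_gain`, the function names and the synthetic data are all assumptions made for the example, and a full stepwise procedure would typically also test whether previously selected variables should be removed.

```python
import numpy as np


def fit_r2(X, y):
    """Fit OLS with an intercept and return R^2 on the fitting data."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Penalise R^2 for the number of predictors p (n samples)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection (hypothetical helper).

    At each step, add the column that raises adjusted R^2 the most;
    stop when the best gain is below min_gain (an assumed threshold).
    """
    n, k = X.shape
    selected = []
    best = 0.0  # adjusted R^2 of the intercept-only model
    while len(selected) < k:
        candidates = [j for j in range(k) if j not in selected]
        scores = {
            j: adjusted_r2(fit_r2(X[:, selected + [j]], y), n, len(selected) + 1)
            for j in candidates
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


# Synthetic stand-in for absorbance columns: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

selected, best = forward_stepwise(X, y)
print("selected columns:", selected)
print("adjusted R^2:", round(best, 3))
```

With real spectra, neighbouring wavelengths are strongly correlated, so a greedy criterion like this one can pick between near-duplicate columns almost arbitrarily.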

After fitting, the ANN showed a good fit (R² = 0.92; RMSE = 0.77), clearly outperforming the MLR model (R² = 0.35; RMSE = 1.65). Severe multicollinearity among wavelengths is responsible for the difference in accuracy between the two models. Although a prior screening based on the Variance Inflation Factor (VIF) reduced it, multicollinearity remained very high for some of the remaining wavelengths. This produced large spurious correlations, which adversely affected the MLR model's prediction performance (R² = 0.30 in the validation set). The ANN's R² also decreased in the validation set, perhaps indicating slight overtraining, but the resulting value (0.72) was still very high.
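To make the multicollinearity screening concrete, the sketch below computes the VIF of each predictor as 1 / (1 - R²_j), where R²_j comes from regressing column j on all the other columns. The function name, the synthetic data and the column layout are assumptions for illustration only; the abstract does not state which VIF cut-off was applied, and the value of 10 mentioned in the comments is just a commonly quoted rule of thumb.

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor for every column of X (hypothetical helper).

    VIF_j = 1 / (1 - R2_j), with R2_j from an OLS regression (with intercept)
    of column j on the remaining columns. Perfectly collinear columns would
    give R2_j = 1 and a division by zero; that case is not handled here.
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic stand-in for spectra: columns 0 and 1 mimic two neighbouring
# wavelengths that are almost identical; column 2 is independent.
rng = np.random.default_rng(1)
base = rng.normal(size=300)
X = np.column_stack([
    base + rng.normal(scale=0.05, size=300),
    base + rng.normal(scale=0.05, size=300),
    rng.normal(size=300),
])

vifs = vif(X)
for j, value in enumerate(vifs):
    # A VIF above roughly 10 is often read as problematic (rule of thumb).
    print(f"column {j}: VIF = {value:.1f}")
```

In this toy case the two near-duplicate columns get very large VIFs while the independent column stays close to 1, which is the pattern that destabilises MLR coefficients.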

This study shows that Machine Learning models such as Artificial Neural Networks based on spectral absorption data can enhance the ability of operators to respond to critical events, and can become a decisive component of the daily management of drinking water in DWTPs when needed.

How to cite: Pla-Castellana, M., Gutierrez, O., Raich-Montiu, J., and Gernjak, W.: Comparative analysis of Machine Learning models for predicting the trihalomethanes formation potential in a Drinking Water Treatment Plant in Spain, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10984, https://doi.org/10.5194/egusphere-egu24-10984, 2024.