EGU24-7941, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-7941
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China

Xizhi Nong1,2, Cheng Lai1, Lihua Chen1, and Jiahua Wei2
  • 1Guangxi University, College of Civil Engineering and Architecture, Water Resources and Hydropower Engineering, Nanning, China (nongxizhi@gxu.edu.cn)
  • 2State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China

Dissolved oxygen (DO) is an essential indicator for assessing water quality and managing aquatic environments, but accurately understanding and predicting the spatiotemporal variation of DO concentrations under the combined effects of different environmental factors remains challenging. In this study, we propose a practical prediction framework for DO concentrations based on a support vector regression (SVR) model coupled with multiple intelligent techniques (i.e., four data denoising techniques, three feature selection rules, and four hyperparameter optimization methods). The framework was tested on a data matrix (17,532 observations in total) of 12 indicators from three key water quality monitoring stations of the longest inter-basin water diversion project in the world (i.e., the Middle-Route of the South-to-North Water Diversion Project of China), covering the period from 2017 to 2020. The results showed that the proposed framework could accurately predict DO concentration variations at different geographical locations. The model using the “wavelet analysis–LASSO regression–random search–SVR” combination at the Waihuanhe station had the best prediction performance, with Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and coefficient of determination (R²) values of 0.251, 0.063, 0.190, and 0.911, respectively. Combining feature selection with hyperparameter optimization can significantly improve the robustness and accuracy of the prediction model and offers a general, practical way to investigate the environmental drivers of DO concentration variations. For water quality management departments, the framework can also identify the key parameters that warrant attention and monitoring as environmental factors change.
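The four reported scores are standard regression metrics, and two of them are tied together: RMSE is the square root of MSE, which is consistent with the reported pair (0.251² ≈ 0.063). The following minimal sketch shows how such metrics are commonly computed. The function name and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MSE, MAE and R2 for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / n          # mean of squared errors
    mae = sum(abs(r) for r in residuals) / n         # mean of absolute errors
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - (mse * n) / ss_tot                    # 1 - SS_res / SS_tot
    return {"rmse": math.sqrt(mse), "mse": mse, "mae": mae, "r2": r2}


# Hypothetical DO values, for illustration only (not data from the study).
observed = [8.1, 8.4, 9.0, 9.6, 10.2]
predicted = [8.3, 8.3, 9.2, 9.4, 10.1]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE and MSE carry the same information, a comparison between model combinations would typically rest on RMSE (or MSE), MAE, and R² together.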
Further studies that assess potential integrated water quality risk using multiple indicators in mega water diversion projects and similar water bodies are needed.
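The “LASSO regression–random search–SVR” part of the best-performing combination described in the abstract could be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the choice of 11 predictors, the search ranges, the LASSO penalty, and the time-ordered cross-validation are all assumed, and the wavelet denoising step is omitted.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a monitoring record (assumption, not study data):
# 11 hypothetical predictors, of which only the first two drive the target.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 11
X = rng.normal(size=(n_samples, n_features))
y = 9.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.2, size=n_samples)

# Chronological split: the last 60 samples are held out for testing.
split = 240
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # LASSO sets the coefficients of weak predictors to zero; SelectFromModel
    # keeps only the predictors with non-zero coefficients.
    ("select", SelectFromModel(Lasso(alpha=0.05))),
    ("svr", SVR(kernel="rbf")),
])

# Random search samples SVR hyperparameters from log-uniform ranges (assumed).
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "svr__C": loguniform(1e-1, 1e2),
        "svr__gamma": loguniform(1e-3, 1e0),
        "svr__epsilon": loguniform(1e-3, 1e0),
    },
    n_iter=20,
    cv=TimeSeriesSplit(n_splits=4),  # assumes the samples are time-ordered
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

selected = search.best_estimator_.named_steps["select"].get_support()
test_r2 = r2_score(y_test, search.predict(X_test))
print("selected predictors:", np.flatnonzero(selected))
print("best hyperparameters:", search.best_params_)
print("held-out R2:", round(test_r2, 3))
```

Placing the feature selection step inside the pipeline means it is refitted on each training fold, so the cross-validated score is not inflated by information from the validation fold.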

How to cite: Nong, X., Lai, C., Chen, L., and Wei, J.: Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7941, https://doi.org/10.5194/egusphere-egu24-7941, 2024.

Comments on the supplementary material

supplementary materials version 1 – uploaded on 16 Apr 2024, no comments