Estimating Escherichia coli levels using drone-based RGB imagery and machine learning techniques
- 1 USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- 2 Department of Civil, Urban, Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, Republic of Korea
- 3 School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, Republic of Korea
Rapid and efficient quantification of E. coli levels is an important goal of microbial water quality assessment. Remote sensing and machine learning algorithms have recently been applied to this task, but their application is hampered by the limited number of samples and by imbalances in water quality datasets. This study estimated E. coli concentrations in a Maryland irrigation pond during the summer season. We used demosaiced drone-based RGB imagery across visible and infrared spectrum ranges along with 14 water quality parameters. Four machine learning algorithms (Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), and K-nearest Neighbor) were applied under three scenarios: water quality parameters only, water quality parameters combined with drone-based RGB data, and RGB data only. Two data splitting methods were compared: traditional random data splitting (ordinary data splitting) and quantile data splitting, the latter providing a constant splitting ratio across each decile of the E. coli concentration distribution. Quantile data splitting resulted in very good model performance and smaller differences in performance between the training and testing datasets. The RF, GBM, and XGB models, trained with quantile data splitting and hyperparameter optimization, yielded R² values above 0.847 for the training dataset and 0.689 for the test dataset. Integrating water quality and imagery data led to larger R² values, exceeding 0.896 for the test dataset. Shapley additive explanations (SHAP) highlighted the visible blue spectrum intensity and water temperature as the most influential inputs to the RF model. Overall, demosaiced RGB imagery proved to be a valuable predictor of E. coli concentration across the studied irrigation pond.
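The quantile data splitting described above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: the function name `quantile_split`, the 20% test fraction, the random seed, and the synthetic log-normal concentrations are all hypothetical choices made for illustration. The only element taken from the abstract is the idea of applying a constant splitting ratio within each decile of the target distribution.

```python
import random


def quantile_split(values, test_fraction=0.2, n_bins=10, seed=0):
    """Split sample indices so each quantile bin contributes the same test share.

    values        : target values (e.g. E. coli concentrations)
    test_fraction : share of each bin sent to the test set (assumed, not from the source)
    n_bins        : number of quantile bins (10 gives deciles)
    """
    rng = random.Random(seed)
    # Rank the samples by target value, then cut the ranking into equal-count bins.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    train_idx, test_idx = [], []
    for b in range(n_bins):
        bin_idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        rng.shuffle(bin_idx)  # random assignment within the bin
        n_test = round(len(bin_idx) * test_fraction)
        test_idx.extend(bin_idx[:n_test])
        train_idx.extend(bin_idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic, skewed "concentrations" standing in for real measurements (hypothetical data).
data_rng = random.Random(42)
values = [data_rng.lognormvariate(3.0, 1.0) for _ in range(200)]

train_idx, test_idx = quantile_split(values, test_fraction=0.2, n_bins=10, seed=0)
print(len(train_idx), len(test_idx))
```

Because every decile contributes the same share of samples to the test set, the training and test sets cover the same range of the skewed target distribution, which is one plausible reason for the smaller train/test performance gap reported above. An ordinary random split gives no such guarantee when the dataset is small.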
How to cite: Hong, S., Morgan, B., Stocker, M., Smith, J., Kim, M., Cho, K. H., and Pachepsky, Y.: Estimating Escherichia coli levels using drone-based RGB imagery and machine learning techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6648, https://doi.org/10.5194/egusphere-egu24-6648, 2024.