Using explainable machine learning to better understand source and process contributions to atmospheric bio-aerosol

Hao Zhang; Congbo Song; David Topping; Ian Crawford; Martin Gallagher; Man Nin Chan; Hing Bun Martin Lee; Sinan Xing; Tsin Hung Ng; Amos Tai

doi:https://doi.org/10.5194/egusphere-egu24-16338

EGU24-16338, updated on 09 Mar 2024

EGU General Assembly 2024

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Hao Zhang¹, Congbo Song², David Topping¹, Ian Crawford¹, Martin Gallagher¹, Man Nin Chan³, Hing Bun Martin Lee³, Sinan Xing³, Tsin Hung Ng³, and Amos Tai³

¹The University of Manchester, Center of Atmospheric Sciences, Department of Earth and Environmental Sciences, Manchester, United Kingdom (hao.zhang-26@postgrad.manchester.ac.uk)
²National Centre for Atmospheric Science (NCAS), The University of Manchester, Manchester, United Kingdom
³Faculty of Science, The Chinese University of Hong Kong, Hong Kong, China

Atmospheric bio-aerosols are receiving growing attention as determinants of environmental and human health outcomes. However, the lack of fully evaluated end-to-end detection techniques hinders our ability to identify bioaerosol types and understand their environmental drivers, particularly in complex environments. In this study we address these challenges by developing a novel machine learning framework that combines unsupervised deep learning with explainable machine learning techniques. The first step combines a bidirectional long short-term memory autoencoder (BiLSTM-AE) with a relatively new, fast hierarchical clustering technique. Our results indicate that this approach outperforms other models, successfully distinguishing between fungal spores, non-biological aerosols, and pollen based solely on fluorescence information, without the need for training data. Subsequently, using automated machine learning and the SHapley Additive exPlanations (SHAP) method, we quantitatively identified the environmental drivers of each bioaerosol type. The variation in SHAP values indicated that elevated pollen concentrations at night could be attributed to changes in air mass composition and origin. More importantly, we find ambient evidence that pollen may break into smaller fragments when relative humidity (RH) exceeds 90 %, leading to significant changes in its fluorescence spectrum and a rapid increase in its concentration. Overall, we find that combining unsupervised deep learning and explainable machine learning can provide new insights into type-specific bioaerosol processes.
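To illustrate the kind of attribution that SHAP provides, the following is a minimal sketch of exact Shapley values computed by brute force for a single prediction. It is not the study's pipeline: the function `toy_pollen_model`, the feature names `rh`, `temp` and `wind`, and all numerical values are hypothetical, chosen only to show how a threshold effect in humidity would surface as a large attribution. Features missing from a coalition are replaced by their baseline value, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features not in a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_target = dict(without)
                with_target[target] = x[target]
                # Marginal contribution of `target` to this coalition
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_pollen_model(f):
    """Hypothetical stand-in for a fitted model (NOT from the study)."""
    humid_boost = 50.0 if f["rh"] > 90.0 else 0.0  # assumed threshold effect
    return 10.0 + 2.0 * f["temp"] + humid_boost * f["wind"]


# Hypothetical reference conditions and one sample to explain
baseline = {"rh": 60.0, "temp": 20.0, "wind": 1.0}
sample = {"rh": 95.0, "temp": 25.0, "wind": 2.0}

phi = shapley_values(toy_pollen_model, sample, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.1f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets SHAP values be read as additive contributions of each driver. In practice, libraries approximate these values rather than enumerating every coalition.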

How to cite: Zhang, H., Song, C., Topping, D., Crawford, I., Gallagher, M., Chan, M. N., Lee, H. B. M., Xing, S., Ng, T. H., and Tai, A.: Using explainable machine learning to better understand source and process contributions to atmospheric bio-aerosol, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16338, https://doi.org/10.5194/egusphere-egu24-16338, 2024.