- 1 Aalto University, School of Science, Department of Applied Physics, Espoo, Finland (linus.lind@aalto.fi, hilda.sandstrom@aalto.fi, patrick.rinke@aalto.fi)
- 2 Technical University of Munich, TUM School of Natural Sciences, Physics Department, Garching, Germany (patrick.rinke@aalto.fi)
- 3 Munich Data Science Institute, Technical University of Munich, Atomistic Modelling Center, Garching, Germany (patrick.rinke@aalto.fi)
Aerosol formation is a complex process involving numerous molecules whose identities and environmental variations remain largely uncharted (Bianchi et al., 2019). Computational simulations and property prediction tools have emerged to identify compounds likely to participate in particle formation (Elm et al., 2020). In recent years, predictive machine learning models for saturation vapor pressure and partition coefficients have achieved mean absolute errors within one order of magnitude (Besel et al., 2023; Lumiaro et al., 2021), an advance that enables the categorization of molecules into different volatility regions. However, the interpretability of these models is often limited unless the underlying molecular descriptor is itself easily interpretable. Another challenge is that atmospheric molecules possess unique characteristics that may be overlooked by standard molecular representations developed in other chemical domains (Sandström et al., 2024). We hypothesize that, by combining sufficiently informative and interpretable descriptors with modern machine learning methods, chemical insight into these largely unknown chemical spaces can be gained in a data-driven way.
In this contribution, we introduce a new interpretable molecular descriptor, ATMOMACCS, specifically tailored to atmospheric molecules. We demonstrate its competitive performance in predicting various thermodynamic properties, such as saturation vapor pressure, vaporization enthalpy, partition coefficients, and glass-transition temperature, equaling or surpassing published results for four distinct atmospheric molecular datasets (Besel et al., 2023; Wang et al., 2017; Ferraz-Caetano et al., 2024; Li et al., 2020). Our descriptor is based on enumerating atmospherically relevant structural motifs, making it readily interpretable for atmospheric chemists. Additionally, we analyze the relative importance of these motifs with Shapley Additive Explanations (SHAP) values (Lundberg & Lee, 2017), which provides insight into the observed performance improvements. Notably, this analysis shows that explicitly counting the number of carbon atoms is particularly important for property prediction, though less so for water-gas phase partition coefficients. Moreover, general structural motifs are roughly as important as motifs specific to atmospheric organic chemistry, and the combination of these two types of motifs was pivotal for predictive performance.
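The idea of attributing a prediction to motif counts can be illustrated with a minimal sketch. It is not the ATMOMACCS implementation or the authors' SHAP workflow: the motif names, the linear toy model, its weights, and the example counts are all hypothetical, chosen only to show how exact Shapley values distribute the difference between a prediction and a baseline prediction over the entries of a count-based descriptor.

```python
from itertools import combinations
from math import factorial

# Hypothetical motif names; the actual ATMOMACCS motif list is not given here.
MOTIFS = ["n_carbon", "hydroxyl", "hydroperoxide", "nitrate"]

# Made-up weights of a toy linear model for log10 saturation vapor pressure.
WEIGHTS = {"n_carbon": -0.4, "hydroxyl": -2.0, "hydroperoxide": -2.5, "nitrate": -2.2}
INTERCEPT = 2.0


def predict(counts):
    """Toy model: intercept plus a weighted sum of motif counts."""
    return INTERCEPT + sum(WEIGHTS[m] * counts[m] for m in MOTIFS)


def shapley_values(model, x, baseline):
    """Exact Shapley values of `x` relative to a single `baseline` descriptor.

    Features outside a coalition are set to their baseline value. The loop
    visits every coalition, so it is only practical for a handful of features.
    """
    n = len(MOTIFS)
    phi = {}
    for m in MOTIFS:
        others = [o for o in MOTIFS if o != m]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in MOTIFS}
                with_m = dict(without)
                with_m[m] = x[m]
                total += weight * (model(with_m) - model(without))
        phi[m] = total
    return phi


# Hypothetical descriptors: one molecule and one reference (baseline) molecule.
molecule = {"n_carbon": 10, "hydroxyl": 2, "hydroperoxide": 1, "nitrate": 0}
baseline = {"n_carbon": 6, "hydroxyl": 1, "hydroperoxide": 0, "nitrate": 0}

phi = shapley_values(predict, molecule, baseline)
for motif, value in phi.items():
    print(f"{motif:>14s}: {value:+.2f}")
```

For a linear model like this one, each Shapley value reduces to the weight times the difference between the molecule's count and the baseline count, and the values sum to the difference between the two predictions. For nonlinear models and realistic descriptor lengths, exhaustive enumeration is infeasible, which is why approximate SHAP estimators are used in practice.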
Our molecular descriptor, ATMOMACCS, can serve as a vital tool for advancing data-driven atmospheric science, addressing the need for more customized and accurate modelling in the field. Furthermore, the descriptor’s inherent interpretability and its strong performance in machine-learning-based thermodynamic property prediction show promise for further research in atmospheric chemistry.
This work was supported by the VILMA (Virtual laboratory for molecular level atmospheric transformations) centre of excellence funded by the Academy of Finland under grant 346377.
Besel, V. et al. (2023). Sci. Data 10, 1–11.
Bianchi, F. et al. (2019). Chem. Rev. 119, 3472–3509.
Elm, J. et al. (2020). J. Aerosol Sci. 149, 105621.
Ferraz-Caetano, J. et al. (2024). Chemosphere 359, 142257.
Li, Y. et al. (2020). Atmos. Chem. Phys. 20, 8103–8122.
Lumiaro, E. et al. (2021). Atmos. Chem. Phys. 21, 13227–13246.
Lundberg, S. M. and Lee, S.-I. (2017). Adv. Neural Inf. Process. Syst. 30, Curran Associates Inc., ISBN 9781510860964.
Sandström, H. et al. (2024). Adv. Sci. 11, 2306235.
Wang, C. et al. (2017). Atmos. Chem. Phys. 17, 7529–7540.
How to cite: Lind, L., Sandström, H., and Rinke, P.: ATMOMACCS: Predicting atmospheric compound properties, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-5719, https://doi.org/10.5194/egusphere-egu25-5719, 2025.