- ¹University of Helsinki, INAR, Physics, Helsinki, Finland
- ²University of Helsinki, Department of Chemistry, Helsinki, Finland
- ³University of Helsinki, Department of Computer Science, Helsinki, Finland
Oxygenated organic molecules (OOMs), formed in the atmosphere by oxidation of volatile organic compounds, are expected to take part in new particle formation (NPF). To determine their contribution to NPF, it is necessary to locate the global energy minima of OOM clusters. However, the complexity of the potential energy surfaces and the need for expensive quantum chemical calculations make modelling of OOM cluster formation extremely time-consuming. We have previously addressed these bottlenecks by assuming that the minimum cluster energy is likely to be found by maximizing the number of hydrogen bonds between the monomers. Thus, we initially perform a constrained sampling to force random hydrogen bond formation. Additional local minima are then found with metadynamics simulations.
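The random hydrogen bond selection that precedes the constrained sampling can be illustrated with a minimal sketch. It is not the workflow's actual implementation: the function name `random_hbond_pairing`, the `(monomer_id, atom_label)` site format, and the atom labels are all hypothetical, and the sketch only picks which donor and acceptor sites to pair, leaving the constrained optimization itself out.

```python
import random


def random_hbond_pairing(donors, acceptors, n_bonds, seed=None):
    """Pick up to n_bonds random intermolecular donor-acceptor pairs.

    donors, acceptors: lists of (monomer_id, atom_label) tuples (hypothetical format).
    Each donor and each acceptor is used at most once, and pairs within the
    same monomer are excluded. Fewer than n_bonds pairs may be returned if
    the greedy selection runs out of compatible sites.
    """
    rng = random.Random(seed)
    # Enumerate every donor-acceptor combination that spans two monomers.
    candidates = [(d, a) for d in donors for a in acceptors if d[0] != a[0]]
    rng.shuffle(candidates)

    pairing, used_donors, used_acceptors = [], set(), set()
    for d, a in candidates:
        if d in used_donors or a in used_acceptors:
            continue  # site already committed to another hydrogen bond
        pairing.append((d, a))
        used_donors.add(d)
        used_acceptors.add(a)
        if len(pairing) == n_bonds:
            break
    return pairing


# Illustrative dimer: monomer 0 has two donor hydrogens, monomer 1 has one.
donors = [(0, "H12"), (0, "H15"), (1, "H9")]
acceptors = [(0, "O3"), (1, "O4"), (1, "O7")]
pairing = random_hbond_pairing(donors, acceptors, n_bonds=2, seed=1)
```

Each returned pair would then serve as a distance constraint in a subsequent geometry optimization; repeating the call with different seeds yields different starting configurations.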
We further improve upon cluster sampling by replacing the costly DFT methods with the significantly faster UMA and Orb-v3 neural network potentials (NNPs). The pretrained models allow us to optimize cluster geometries and predict cluster binding energies at near quantum chemical accuracy. We study the efficacy of the NNPs by generating dimer clusters of selected C10-sized OOMs. We find that the ability of OOMs to bind strongly is often hindered by the tendency of the monomers to form intramolecular hydrogen bonds. Additionally, we show that C20-sized alpha-pinene accretion products may form clusters without the involvement of inorganic acids or ions, and that their clustering ability with sulfuric acid is comparable to that of ammonia.
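For reference, a cluster binding energy is commonly defined as the energy of the optimized cluster minus the sum of the energies of its separately optimized monomers. The sketch below assumes that definition (the abstract does not state which one is used) and takes the energies as plain numbers, for example as returned by an NNP. The function name and the numerical values are illustrative only.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion factor


def binding_energy(e_cluster, e_monomers):
    """Binding energy: E(cluster) - sum of E(isolated monomers).

    All energies must be in the same unit. A more negative result
    means a more strongly bound cluster.
    """
    return e_cluster - sum(e_monomers)


# Made-up energies in hartree, for illustration only.
e_bind_hartree = binding_energy(-10.0, [-4.0, -5.9])
e_bind_kcal = e_bind_hartree * HARTREE_TO_KCAL_PER_MOL
```

Under this definition, a monomer that lowers its own energy through intramolecular hydrogen bonds raises the reference energy that the cluster must beat, which is consistent with the weaker binding noted above.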
While our approach is more efficient, the sampling becomes less likely to find the true global minimum as cluster complexity increases. To further reduce the number of structures that need to be optimized, we use the previously generated OOM cluster data to train a graph neural network (GNN) model to predict the energies of configurations from graph-based descriptions. GNNs allow us to very quickly find a subset of hydrogen bond pairings most likely to optimize towards a new global energy minimum, though the prediction accuracy is significantly reduced compared to NNPs. Our goal is to train a general model that may also extrapolate to molecules and clusters not included in the training set.
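The pre-screening step can be pictured as ranking candidate hydrogen bond pairings by a cheap predicted energy and passing only the lowest few on to the more expensive NNP optimization. The sketch below is an assumption about how such a filter might look, not the authors' code: `select_lowest_energy` is a hypothetical helper, and `predict_energy` stands in for a trained GNN, replaced here by a toy lookup table.

```python
import heapq


def select_lowest_energy(candidates, predict_energy, k):
    """Return the k candidates with the lowest predicted energy, lowest first.

    predict_energy: any callable mapping a candidate to a number
    (a stand-in for a trained GNN in this sketch).
    """
    return heapq.nsmallest(k, candidates, key=predict_energy)


# Toy stand-in for GNN predictions: pairing label -> predicted energy.
toy_predictions = {"pairing_a": -1.0, "pairing_b": -3.0, "pairing_c": -2.0, "pairing_d": 0.5}
shortlist = select_lowest_energy(list(toy_predictions), toy_predictions.get, k=2)
```

Because the GNN is less accurate than the NNPs, keeping a shortlist of several candidates rather than only the single best prediction leaves room for prediction error before the final optimization.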
How to cite: Kähärä, J., Haitsiukevich, K., Vehkamäki, H., and Kurtén, T.: Sampling of clusters of oxygenated organic molecules enhanced with machine learning models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13803, https://doi.org/10.5194/egusphere-egu26-13803, 2026.