EGU24-20342, updated on 11 Mar 2024
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Building A Machine Learning Model To Predict Sample Pesticide Content Utilizing Thermal Desorption MION-CIMS Analysis

Federica Bortolussi1, Hilda Sandström2, Fariba Partovi3,4, Joona Mikkilä4, Patrick Rinke2, and Matti Rissanen1,3
  • 1Department of Chemistry, University of Helsinki, 00560 Helsinki, Finland
  • 2Department of Applied Physics, Aalto University, Espoo, Finland
  • 3Aerosol Physics Laboratory, Physics Unit, Tampere University, 33720 Tampere, Finland
  • 4Karsa Ltd., A. I. Virtasen aukio 1, 00560 Helsinki, Finland

Pests significantly impact crop yields, leading to food insecurity. Pesticides are substances, or mixtures of substances, made to eliminate or control pests or to regulate the growth of crops.
Currently, more than 1000 pesticides are available on the market. However, their long-lasting environmental impact necessitates strict regulation, especially regarding their presence in food (FAO, 2022). Pesticides also play a role in the atmosphere: once volatilized, they can form oxidized products through photolysis or reactions with OH, and they can be transported over large distances.
The fundamental properties and behaviour of these compounds are still not well understood. Because of their complex structures, even low-level DFT computations can be extremely expensive.
This project applies machine learning (ML) tools to chemical ionization mass spectra, with the ultimate aim of developing a technique that can predict spectral peak intensities and the sensitivity of chemical ionization mass spectrometry (CIMS) to pesticides. The primary challenge is to develop an ML model that comprehensively explains ion-molecule interactions while minimizing computational costs.

Our data set comprises several standard mixtures containing, in total, 716 pesticides, measured at five different concentrations with an Orbitrap atmospheric pressure CIMS equipped with a multi-scheme chemical ionization inlet (MION) (Rissanen et al., 2019; Partovi et al., 2023). The reagents of the ionization methods are CH2Br2, H2O, O2 and (CH3)2CO, generating Br-, H3O+, O2- and [(CH3)2CO + H]+ ions, respectively.

The project follows a general ML workflow: after an exploratory analysis, the data are preprocessed and fed to the ML algorithms, which first classify which ionization methods are able to detect each molecule and then predict the peak intensity of each pesticide; the accuracy of the predictions is assessed by measuring the performance of the models.
A random forest classifier was chosen to predict which ionization methods were able to detect each pesticide. The regression was performed with a kernel ridge regressor. Each algorithm was run with different types of molecular descriptors (topological fingerprint, MACCS keys and many-body tensor representation) to test which one represents the molecular structure most accurately.
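The regression step can be illustrated with a minimal, dependency-free sketch of kernel ridge regression. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic 16-bit vectors stand in for a binary descriptor such as MACCS keys, the targets are made-up log-intensities, and the Gaussian kernel, `GAMMA` and `ALPHA` are arbitrary choices (the abstract does not state the kernel or hyperparameters used).

```python
import math
import random

GAMMA = 0.05   # kernel width (assumed, not from the study)
ALPHA = 1e-2   # ridge regularization strength (assumed)
N_BITS = 16    # length of the toy binary descriptor (assumed)


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def krr_fit(X, y, gamma, alpha):
    """Return dual coefficients c that solve (K + alpha * I) c = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += alpha
    return solve_linear_system(K, y)


def krr_predict(X_train, coeffs, x_new, gamma):
    """Predict as the kernel-weighted sum over all training points."""
    return sum(c * gaussian_kernel(x_t, x_new, gamma)
               for c, x_t in zip(coeffs, X_train))


# Synthetic stand-in data: random bit vectors with a made-up log-intensity.
rng = random.Random(0)
bit_weights = [rng.uniform(-0.5, 0.5) for _ in range(N_BITS)]


def make_sample():
    bits = [rng.randint(0, 1) for _ in range(N_BITS)]
    log_intensity = 5.0 + sum(w * b for w, b in zip(bit_weights, bits))
    return bits, log_intensity


data = [make_sample() for _ in range(60)]
train, held_out = data[:45], data[45:]
X_train = [bits for bits, _ in train]
y_train = [target for _, target in train]

coeffs = krr_fit(X_train, y_train, GAMMA, ALPHA)
predictions = [krr_predict(X_train, coeffs, bits, GAMMA) for bits, _ in held_out]
mae = sum(abs(p - t) for p, (_, t) in zip(predictions, held_out)) / len(held_out)
print(f"held-out mean absolute error (log units): {mae:.3f}")
```

In practice a library implementation such as scikit-learn's `KernelRidge` would replace the hand-written solver, and the kernel and hyperparameters would be selected by cross-validation for each descriptor.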

The results of the exploratory analysis highlight different trends between the positive and negative ionization methods, suggesting that different ion-molecule mechanisms are involved (Figure 1). The classification reaches around 80% accuracy for each ionization method with all four molecular descriptors tested, while the regression predicts the logarithm of the intensities for each ionization method fairly well, reaching an error of 0.5 with MACCS keys for the (CH3)2CO reagent (Figure 2).
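To make the two reported figures of merit concrete, the sketch below computes a classification accuracy and a mean absolute error on log-transformed intensities. The abstract does not state which error metric or logarithm base was used, so the mean absolute error and base 10 are assumptions, and all numbers are invented toy values rather than study data.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def mean_absolute_log10_error(measured, predicted):
    """Mean absolute difference between log10 intensities (assumed metric)."""
    errors = [abs(math.log10(m) - math.log10(p))
              for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


# Toy detection labels for one ionization method (1 = detected, 0 = not).
detected_true = [1, 1, 0, 1, 0]
detected_pred = [1, 0, 0, 1, 0]
toy_accuracy = accuracy(detected_true, detected_pred)

# Toy peak intensities in arbitrary units.
measured_intensity = [1e4, 1e6]
predicted_intensity = [1e4, 1e5]
toy_log_error = mean_absolute_log10_error(measured_intensity, predicted_intensity)

print(f"accuracy: {toy_accuracy:.2f}")
print(f"mean absolute log10 error: {toy_log_error:.2f}")
```

Under the base-10 assumption, an error of 0.5 in log units corresponds to predicted intensities that are off by a factor of about 10^0.5 ≈ 3.2 on average.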

Figure 1: Distribution of pesticide peak intensities for each reagent ion at five different concentrations.

Figure 2: Comparison of the kernel ridge regression (KRR) performance on (CH3)2CO reagent data with four different molecular descriptors.



How to cite: Bortolussi, F., Sandström, H., Partovi, F., Mikkilä, J., Rinke, P., and Rissanen, M.: Building A Machine Learning Model To Predict Sample Pesticide Content Utilizing Thermal Desorption MION-CIMS Analysis, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20342, 2024.