Interpretable pollen classification using empirical feature filtering and random forest models on holographic airflow cytometry data

Andreas Schwendimann; Kilian Koch; Yanick Zeder; Erny Niederberger; Sophie Erb

doi:https://doi.org/10.5194/egusphere-egu26-23287

EGU26-23287, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Andreas Schwendimann¹, Kilian Koch¹, Yanick Zeder¹, Erny Niederberger¹, and Sophie Erb²

¹Swisens AG, Meierhofstrasse 5A, CH-6032 Emmen
²MeteoSwiss, Chemin de l'Aérologie, CH-1530 Payerne

Automatic pollen monitoring has become increasingly important for aerobiology, public health, and climate-related
studies. Across Europe, manual Hirst-type traps are progressively complemented or fully replaced by automatic
instruments that acquire particle-resolved measurements and apply machine-learning–based classification instead of
manual light-microscopic identification. This transition enables real-time pollen information but introduces new
challenges related to data quality, model interpretability, and computational efficiency.

SwisensPoleno instruments are airflow cytometers that measure individual airborne particles in-flight. Each particle is
characterized by an array of sensors, including two orthogonal digital holography images, from which morphological
features are derived. Previous modelling approaches for pollen classification have largely relied on deep learning
architectures leveraging the full images. While these methods can achieve high accuracy, they are computationally
expensive to train and evaluate, are prone to overfitting to the regions where the training data were collected, and
exhibit a black-box nature that complicates error analysis and systematic performance improvements. Persistent
off-season false positives have thus remained difficult to diagnose and mitigate.

Here, we present a fast-feedback classification pipeline that combines manual prefiltering of datasets, automatic
filtering of holography-derived features and a random forest classifier (Figure 1). Prior to model training, datasets are
manually screened and particles are automatically filtered based on deviations from empirically derived feature
distributions. This effectively cleans the training datasets and removes non-representative or artefactual samples. The
resulting training-ready datasets are then used to train random forest models, providing both competitive classification
performance and full interpretability at the feature level.
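
As a minimal sketch of the automatic filtering step, the following Python assumes that a particle is discarded when any of its features falls outside percentile bounds estimated from the dataset itself. The abstract does not state the actual deviation criterion, so the 1st/99th percentile thresholds, the function names, and the feature names (`area`, `eccentricity`) are illustrative assumptions rather than the authors' implementation.

```python
from statistics import quantiles


def empirical_bounds(values, lower_pct=1, upper_pct=99):
    """Return (low, high) bounds taken from the empirical percentiles of one feature.

    The percentile choice is an assumption made for this sketch.
    """
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def filter_particles(particles, feature_names, lower_pct=1, upper_pct=99):
    """Keep only particles whose features all lie within the empirical bounds.

    `particles` is a list of dicts mapping feature name to value
    (a hypothetical data layout chosen for illustration).
    """
    bounds = {
        name: empirical_bounds([p[name] for p in particles], lower_pct, upper_pct)
        for name in feature_names
    }
    kept = [
        p for p in particles
        if all(bounds[n][0] <= p[n] <= bounds[n][1] for n in feature_names)
    ]
    return kept, bounds


if __name__ == "__main__":
    # Synthetic example: 50 plausible particles plus one artefact with a huge area.
    data = [{"area": 100.0 + i, "eccentricity": 0.5} for i in range(50)]
    data.append({"area": 10000.0, "eccentricity": 0.5})
    kept, bounds = filter_particles(data, ["area", "eccentricity"])
    print(len(data), "particles before filtering,", len(kept), "after")
    print("area bounds:", bounds["area"])
```

The retained particles would then serve as training data for the random forest, and the returned bounds make each rejection traceable to a specific feature.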

This novel approach leads to significant performance gains compared to previous methods and successfully addresses
long-standing off-season false-positive issues (Figure 2). Because random forest models are less specific to the
region of their training data than deep-learning models, the classification performance has proven to be robust
across six different locations in Southern Europe over multiple years. The proposed methodology offers a transparent,
computationally

How to cite: Schwendimann, A., Koch, K., Zeder, Y., Niederberger, E., and Erb, S.: Interpretable pollen classification using empirical feature filtering and random forest models on holographic airflow cytometry data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-23287, https://doi.org/10.5194/egusphere-egu26-23287, 2026.