Prediction of airborne allergenic pollen concentrations with machine learning
- 1 Faculty of Earth Sciences and Environmental Management, University of Wrocław, Kosiby 8, 51-621 Wroclaw, Poland (tetiana.vovk@uwr.edu.pl)
- 2 Institute of Computer Science, University of Wroclaw, Joliot-Curie 15, 50-383 Wrocław, Poland
Over the past 30 years, the prevalence of allergies has increased continuously. Allergic rhinitis and asthma are among the most frequent non-communicable diseases and are a serious public health concern worldwide, with the highest prevalence among children and adolescents. Moreover, experts expect climate change to worsen the impact of allergies over the coming decades. It is therefore essential to develop high-quality methods and tools for forecasting airborne allergenic pollen, so that sensitized people can avoid exposure to high concentrations of aeroallergens.
In this study, we aim to develop a tool for predicting pollen concentrations with machine learning (ML) methods, using measured pollen concentrations and modelled meteorological parameters. We focus on birch (Betula) pollen; birch is the most allergenic tree taxon in Central Europe. We use daily pollen concentrations from the Wrocław aerobiological station (south-west Poland) for the years 2006–2022. Pollen grains were collected with a Burkard trap and counted following the recommendations of the International Association for Aerobiology. Meteorological data for the analysed period were simulated with the Weather Research and Forecasting (WRF) model. We test several ML algorithms: Random Forest, XGBoost, Support Vector Regression (SVR) and Multilayer Perceptron (MLP). The algorithms are used to detect days on which pollen concentrations exceed the threshold levels of 20, 75 and 90 pollen grains m⁻³, which correspond to first symptoms, symptoms in all subjects and severe symptoms, respectively.
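The threshold-detection task can be framed as binary classification: for each threshold, a day is labelled 1 if its concentration exceeds the threshold and 0 otherwise. The minimal Python sketch below shows only this labelling step. The function name `exceedance_labels`, the keys of `THRESHOLDS` and the week of concentrations are hypothetical illustrations, not the authors' code or data, and the sketch assumes that "exceeding" means strictly greater than the threshold.

```python
# Threshold levels (pollen grains per m^3) quoted in the abstract; the key names are hypothetical.
THRESHOLDS = {"first_symptoms": 20, "all_subjects": 75, "severe": 90}


def exceedance_labels(daily_conc, threshold):
    """Label each day 1 if its concentration exceeds `threshold`, else 0.

    Assumes "exceeding" means strictly greater than; switch to >= if a day
    exactly at the threshold should also count as an exceedance.
    """
    return [1 if c > threshold else 0 for c in daily_conc]


if __name__ == "__main__":
    # Made-up daily concentrations for one week (illustration only, not station data).
    week = [5, 22, 80, 140, 60, 91, 10]
    for name, level in THRESHOLDS.items():
        print(name, exceedance_labels(week, level))
```

For the made-up week, the severe-symptom labels are `[0, 0, 0, 1, 0, 1, 0]`. Each threshold yields its own binary target, so one option is to train and evaluate a separate classifier per threshold.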
For each ML algorithm, the whole data set was split into a training subset (¾ of the data) and an independent test subset (the remaining ¼), which was reserved for final model verification. The hyperparameters of each model were tuned by cross-validation. As candidate predictors, we test temporal variables and lagged variables, including pollen concentrations and meteorological parameters derived from the mesoscale WRF model (air temperature, relative humidity, wind speed, solar radiation, planetary boundary layer height and rainfall), in order to select those most relevant for predicting pollen concentrations. We also compare the performance of the algorithms using the F1 score, ROC-AUC and PR-AUC. The results of the analysis will be used to forecast pollen concentrations from data provided by an automatic pollen detector newly installed at the station, combined with weather forecasts.
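The evaluation workflow (lagged predictors, a ¾ / ¼ train/test split, and scoring with F1, ROC-AUC and PR-AUC) can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: all helper names are hypothetical, the abstract does not say whether the split is random or chronological (this sketch assumes chronological, which keeps later days out of training), and PR-AUC is computed as average precision, one common estimator of the area under the precision-recall curve.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def lagged_features(columns: Dict[str, Sequence[float]],
                    lags: Sequence[int] = (1, 2)) -> Tuple[List[Row], int]:
    """Build one feature row per day t from each variable's values on days t - lag.

    `columns` maps a variable name (e.g. hypothetical 'pollen', 't2m') to equally
    long daily series. Rows start at day max(lags), the first day with a full
    lag history; that start index is returned so labels can be aligned.
    """
    n = len(next(iter(columns.values())))
    start = max(lags)
    rows = [
        {f"{name}_lag{k}": float(vals[t - k]) for name, vals in columns.items() for k in lags}
        for t in range(start, n)
    ]
    return rows, start


def chronological_split(rows: List[Row], labels: List[int], train_frac: float = 0.75):
    """First `train_frac` of days for training, the remainder as an independent test set."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], labels[:cut], rows[cut:], labels[cut:]


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random exceedance day scores above a random other day (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative day")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as AP = sum_n (R_n - R_{n-1}) * P_n over descending score thresholds."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive day")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Days tied at the same score enter together, as one threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Made-up labels and model scores for five test days (illustration only).
    y = [0, 1, 1, 0, 1]
    p = [0.1, 0.9, 0.6, 0.7, 0.4]
    y_hat = [1 if s >= 0.5 else 0 for s in p]  # 0.5 decision cut-off is an assumption
    print(f1(y, y_hat), roc_auc(y, p), average_precision(y, p))
```

F1 depends on a chosen decision cut-off, whereas ROC-AUC and PR-AUC are computed from the scores directly; PR-AUC is usually the more informative of the two when exceedance days are rare. In practice, scikit-learn provides `f1_score`, `roc_auc_score` and `average_precision_score` in `sklearn.metrics`, and `TimeSeriesSplit` with `GridSearchCV` in `sklearn.model_selection` for cross-validated hyperparameter tuning on time-ordered data.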
How to cite: Vovk, T., Kryza, M., Tomczyk, S., Malkiewicz, M., Lipiński, P., and Werner, M.: Prediction of airborne allergenic pollen concentrations with machine learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9473, https://doi.org/10.5194/egusphere-egu24-9473, 2024.