EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Input-adaptive proxy of air quality parameters: A case study for black carbon in Helsinki, Finland

Pak L Fung1, Martha A Zaidan1, Salla Sillanpää1, Anu Kousa2, Jarkko V Niemi2, Hilkka Timonen3, Joel Kuula3, Erkka Saukko4, Krista Luoma1, Tuukka Petäjä1, Sasu Tarkoma5, Markku Kulmala1, and Tareq Hussein1,6
  • 1Institute for Atmospheric and Earth System Research (INAR)/Physics, University of Helsinki, FI-00560 Helsinki, Finland
  • 2Helsinki Region Environmental Services Authority (HSY), PO Box 100, FI-00066 HSY, Finland
  • 3Atmospheric Composition Research, Finnish Meteorological Institute, FI-00560 Helsinki, Finland
  • 4Pegasor Oy, FI-33100 Tampere, Finland
  • 5Department of Computer Science, University of Helsinki, FI-00560 Helsinki, Finland
  • 6Department of Physics, The University of Jordan, Amman 11942, Jordan

Urban air pollution is a global challenge, and continuous air quality measurement is important for understanding the nature of the problem. However, missing data are a frequent issue in air quality measurement. In this study, we present a modified method that imputes missing data with an input-adaptive proxy. We used black carbon (BC) concentration data from the Mäkelänkatu traffic site (TR) and the Kumpula urban background site (BG) in Helsinki, Finland, in 2017–2018 as training sets. The input-adaptive proxy selects its input variables from the other air quality variables based on their Pearson correlation coefficients with BC. To avoid overfitting, the proxy fits a least-squares model with a bisquare weighting function and allows a maximum of three input variables. The generated models are then evaluated and ranked by adjusted coefficient of determination (adjR2), mean absolute error and root mean square error. BC concentration is first estimated with the best-ranked model. Where the input variables of that model are themselves missing, the proxy falls back to the second-best model, and so on down the ranking, until all the data gaps are filled.
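The selection and fallback steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `candidate_input_sets`, `fill_gaps`), the `pool_size` parameter, the toy data and the model coefficients are all hypothetical, and the bisquare-weighted least-squares fit and the ranking by adjR2, mean absolute error and root mean square error are assumed to have been done beforehand.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson r using only the positions where both series have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def candidate_input_sets(data, target, pool_size=4, max_inputs=3):
    """Rank variables by |r| with the target, then list every set of
    one to max_inputs variables drawn from the top pool_size of them."""
    ranked = sorted(
        (name for name in data if name != target),
        key=lambda name: abs(pearson(data[name], data[target])),
        reverse=True,
    )[:pool_size]
    return [
        combo
        for k in range(1, max_inputs + 1)
        for combo in combinations(ranked, k)
    ]


def fill_gaps(target_series, inputs, ranked_models):
    """Fill each gap with the best-ranked model whose inputs are all present.

    ranked_models is a list of (input_names, predict) ordered best first.
    """
    filled = list(target_series)
    for i, value in enumerate(filled):
        if value is not None:
            continue  # measured value, nothing to impute
        for names, predict in ranked_models:
            values = [inputs[name][i] for name in names]
            if all(v is not None for v in values):
                filled[i] = predict(*values)
                break  # stop at the first usable model
    return filled


# Toy series; None marks a missing measurement (made-up numbers).
data = {
    "BC": [1.0, 2.0, None, 4.0, None, 6.0],
    "NOx": [10.0, 20.0, 30.0, 40.0, None, 60.0],
    "traffic": [100.0, 210.0, 290.0, 405.0, 500.0, 590.0],
}

input_sets = candidate_input_sets(data, "BC")

# Hypothetical pre-fitted models, already ranked best first.
ranked_models = [
    (("NOx", "traffic"), lambda nox, tr: 0.05 * nox + 0.005 * tr),
    (("traffic",), lambda tr: 0.01 * tr),
]

filled = fill_gaps(data["BC"], data, ranked_models)
print(input_sets)
print(filled)
```

In this toy run the first gap is filled by the two-input model, while the second gap falls through to the traffic-only model because NOx is missing at that time step. That fall-through is what lets the cascade reach full coverage as long as at least one ranked model has all its inputs available.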

The input-adaptive proxy filled 100% of the BC data gaps, whereas a traditional proxy filled only 20–80% of them. Furthermore, the overall performance of the input-adaptive proxy is reliable both at TR (adjR2 = 0.86–0.94) and at BG (adjR2 = 0.74–0.91). TR generally shows better regression performance because the BC level there can be mostly explained by traffic count, nitrogen oxides and accumulation mode particles. In contrast, the sources of BC at BG are more heterogeneous, including traffic emissions and residential combustion, and the BC concentration is also influenced by meteorological parameters; the rule of allowing at most three input variables might therefore lead to the lower adjR2. The proxy works slightly better on workdays than on weekends at both sites. At TR, the proxy performs similarly in all seasons, while at BG it performs better in winter and autumn than in the other seasons. The simplicity, full coverage and high reliability of the input-adaptive proxy make it a sound basis for estimating other air quality parameters. Moreover, it can act as an air quality virtual sensor alongside on-site instruments.
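For reference, the adjR2 ranges quoted above can be read against the standard definition of the adjusted coefficient of determination. The symbols are introduced here for illustration and are not notation from the abstract: n is the number of observations used in the fit, p the number of input variables (at most three in this proxy), y_i the measured BC, \hat{y}_i the modelled BC and \bar{y} the mean of the measurements.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{adj}R^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

When n is much larger than p, the factor (n - 1)/(n - p - 1) is close to 1, so adjR2 stays close to R2 and the penalty for adding a second or third input is small.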

How to cite: Fung, P. L., Zaidan, M. A., Sillanpää, S., Kousa, A., Niemi, J. V., Timonen, H., Kuula, J., Saukko, E., Luoma, K., Petäjä, T., Tarkoma, S., Kulmala, M., and Hussein, T.: Input-adaptive proxy of air quality parameters: A case study for black carbon in Helsinki, Finland, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2693, 2020

Display materials

Display material version 2 – uploaded on 05 May 2020, no comments
Authors' names are now included.
Display material version 1 – uploaded on 05 May 2020, no comments