EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

A Deep Learning approach to de-bias Air Quality forecasts, using heterogeneous Open Data sources as reference

Antonio Pérez1, Mario Santa Cruz1, Johannes Flemming2, and Miha Razinger2
  • 1Predictia Intelligent Data Solutions S.L., Santander, Spain
  • 2European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, United Kingdom

The degradation of air quality is a challenge that policy-makers face all over the world. According to the World Health Organisation, air pollution causes an estimated 7 million premature deaths every year. In this context, air quality forecasts are crucial tools that help decision- and policy-makers make data-informed decisions.

Global forecasts, such as those of the Copernicus Atmosphere Monitoring Service (CAMS) model, usually exhibit biases: systematic deviations from observations. Adjusting these biases is typically the first step towards obtaining actionable air quality forecasts. It is especially relevant in health-related decisions, where the metrics of interest depend on specific thresholds.
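
To illustrate why a systematic bias matters for threshold-based metrics, the following minimal sketch counts the hours above a threshold in an observed series and in a forecast that differs from it only by a constant offset. All numbers (concentrations, bias and threshold) are made up for illustration and are not taken from CAMS or OpenAQ; the function name is hypothetical.

```python
def exceedance_hours(values, threshold):
    """Count the hours whose concentration is strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


# Made-up hourly concentrations (arbitrary units), for illustration only.
observed = [38.0, 42.0, 47.0, 51.0, 49.0, 44.0]
bias = -5.0        # hypothetical constant forecast bias
threshold = 45.0   # hypothetical alert threshold

# A forecast that reproduces the observations except for the constant bias.
forecast = [v + bias for v in observed]

obs_hours = exceedance_hours(observed, threshold)
fc_hours = exceedance_hours(forecast, threshold)

# Removing the (here perfectly known) bias restores the exceedance count.
corrected_hours = exceedance_hours([v - bias for v in forecast], threshold)

print(obs_hours, fc_hours, corrected_hours)
```

In this toy case the biased forecast misses most of the observed exceedances, even though it follows the shape of the observations exactly.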

AQ (Air Quality) Bias Correction was a project funded by the ECMWF Summer of Weather Code (ESOWC) 2021 that aimed to improve CAMS model forecasts of air quality variables (NO2, O3, PM2.5), using the in-situ observations provided by OpenAQ as a reference. The adjustment, based on machine learning methods, was performed for a set of locations of interest provided by the ECMWF, for the period June 2019 to March 2021.

The machine learning approach combines three deep learning (DL) models with an additional neural network that merges their outputs. Two of the three DL models are independent and share the same structure, built upon the InceptionTime module: they use both meteorological and air quality variables to exploit the temporal variability and to extract the most meaningful features from the past [t-24h, t-23h, …, t-1h] and future [t, t+1h, …, t+23h] CAMS predictions. The third model processes the static station attributes (longitude, latitude and elevation) with a multilayer perceptron. The features extracted by these three models are fed into another multilayer perceptron, which predicts the upcoming errors at hourly resolution [t, t+1h, …, t+23h]. As a final step, five different initializations are considered and combined with equal weights to obtain a more stable regressor.
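
The data flow around the networks can be sketched in plain Python. This is not the project's code: the networks themselves are left out, the helper names are hypothetical, and two assumptions are made that the abstract does not state, namely that the error is defined as forecast minus observation and that the equal-weight combination is a simple per-hour average of the members' predictions.

```python
def split_windows(series, t, hours=24):
    """Return the past [t-24h, ..., t-1h] and future [t, ..., t+23h] slices."""
    if t < hours or t + hours > len(series):
        raise ValueError("not enough data around index t")
    return series[t - hours:t], series[t:t + hours]


def ensemble_mean(member_predictions):
    """Average per-hour predictions from several members with equal weights."""
    n = len(member_predictions)
    return [sum(hour_values) / n for hour_values in zip(*member_predictions)]


def apply_correction(forecast, predicted_error):
    """Subtract the predicted error (assumed to be forecast minus observation)."""
    return [f - e for f, e in zip(forecast, predicted_error)]


# Toy hourly forecast series: 72 made-up values, one per hour.
series = [float(h) for h in range(72)]
past, future = split_windows(series, t=24)

# Stand-ins for the outputs of five differently initialised models:
# member k predicts a constant error of k for each of the 24 hours.
members = [[float(k)] * 24 for k in range(5)]
mean_error = ensemble_mean(members)

corrected = apply_correction(future, mean_error)
```

Averaging over initializations reduces the dependence of the final prediction on any single random seed, which is the stabilising effect the ensemble step is meant to provide.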

Before the correction, CAMS forecasts of air quality variables were biased regardless of the location of interest and the variable (on average: biasNO2 = -22.76, biasO3 = 44.30, biasPM2.5 = 12.70). In addition, the skill of the model, measured by the Pearson correlation, did not reach 0.5 for any of the variables, with remarkably low values for NO2 and O3 (on average: pearsonNO2 = 0.10, pearsonO3 = 0.14).
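
For reference, the two metrics can be computed as in the sketch below. The sign convention for the bias (mean of forecast minus observation) is an assumption, since the abstract does not define it, and the sample values are made up. The example uses a forecast with a constant offset to show that the metrics are complementary: the correlation is perfect while the bias is not zero.

```python
import math


def mean_bias(forecast, observed):
    """Mean of forecast minus observation (assumed sign convention)."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up values: the forecast equals the observations plus a constant offset.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [15.0, 25.0, 35.0, 45.0]

bias = mean_bias(forecast, observed)
r = pearson(forecast, observed)
```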

The AQ-BiasCorrection models correct these biases. Overall, the number of stations whose bias improves in both the train and test sets is 52 out of 61 (85%) for NO2, 62 out of 67 (92%) for O3, and 80 out of 102 (78%) for PM2.5. Furthermore, the bias is reduced by 1.1%, 9.7% and 13.9% for NO2, O3 and PM2.5 respectively. In addition, the model skill measured by the Pearson correlation increases, with overall improvements in the range of 100-400%.
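
The station counts can be turned into shares with a few lines of Python. The counts come from the text above; the criterion for an improved station (a smaller absolute bias after correction, in both the train and the test set) is an assumption, as the abstract does not spell it out, and the function name is hypothetical.

```python
def station_improves(train_before, train_after, test_before, test_after):
    """Assumed criterion: absolute bias shrinks in both train and test sets."""
    return (abs(train_after) < abs(train_before)
            and abs(test_after) < abs(test_before))


# (improved stations, total stations) per variable, as reported in the text.
counts = {"NO2": (52, 61), "O3": (62, 67), "PM2.5": (80, 102)}

# Share of improved stations per variable, in percent.
shares = {
    variable: 100.0 * improved / total
    for variable, (improved, total) in counts.items()
}

for variable, share in shares.items():
    print(f"{variable}: {share:.1f}% of stations improved")
```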

How to cite: Pérez, A., Santa Cruz, M., Flemming, J., and Razinger, M.: A Deep Learning approach to de-bias Air Quality forecasts, using heterogeneous Open Data sources as reference, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1255, 2022.
