EGU24-10233, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-10233
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Evaluating the Impact of Aggregation Scales and Campaign Durations on Land-Use Regression Models for Air Pollution Estimation with Mobile NO2 Monitoring

Tian Tian1, Marco Helbich1, Zhendong Yuan2, Jules Kerckhoffs2, and Roel Vermeulen2,3
  • 1Utrecht University, Department of Human Geography and Spatial Planning, Utrecht, The Netherlands
  • 2Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
  • 3Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands

Background: In land-use regression (LUR) modelling of air pollutants, mobile measurements are commonly aggregated into road segments of predefined length (e.g., 50 m or 100 m) or raster cells of predefined size (e.g., 10 m or 25 m). These segment lengths and cell sizes are typically chosen ad hoc, yet the choice may affect prediction accuracy, which in turn may lead to heterogeneous results in studies that use the resulting air pollution surfaces.

Aims: We aimed to 1) assess how different aggregation approaches (i.e., segments and cells) affect the accuracy of air pollution predictions from land-use regression models based on mobile measurements, and 2) assess the impact of various aggregation scales and measurement durations on the accuracy of depicting long-term air pollution concentrations.  

Methods: We utilized around 5.6 million mobile nitrogen dioxide (NO2) measurements in Amsterdam, the Netherlands, from May 2019 to February 2020. The mobile measurements were collected across five distinct campaigns of 10, 20, 30, 50, and 70 days. We aggregated mobile measurements from each duration into road segments and cells with varying spatial resolutions (i.e., 25 m, 50 m, 100 m, 150 m, 200 m). A stepwise linear regression (SLR) and a random forest (RF) were trained for each aggregated dataset. Furthermore, 80 long-term stationary NO2 measurements were employed to validate the LUR models.
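The cell-based aggregation step described above can be illustrated with a minimal sketch. It assumes that point coordinates are given in a projected, metre-based system and that readings within a cell are summarised by their arithmetic mean; the abstract does not state which summary statistic was used, and the function name and sample readings are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_to_cells(points, cell_size):
    """Average NO2 readings that fall into the same square raster cell.

    points    -- iterable of (x, y, no2) tuples, x and y in metres (assumed)
    cell_size -- cell side length in metres
    Returns a dict mapping (column, row) cell indices to the mean NO2 value.
    """
    buckets = defaultdict(list)
    for x, y, no2 in points:
        # Integer cell index of the point along each axis.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        buckets[key].append(no2)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings, for illustration only (not study data).
points = [
    (10.0, 10.0, 20.0),
    (30.0, 10.0, 30.0),
    (60.0, 10.0, 40.0),
    (120.0, 10.0, 50.0),
]

# One aggregated dataset per spatial resolution listed in the Methods.
surfaces = {size: aggregate_to_cells(points, size) for size in (25, 50, 100, 150, 200)}
```

In this toy example, coarser cells merge more readings into a single value, which is the kind of scale effect the study evaluates. A segment-based variant would group readings by road-segment identifier instead of by cell index.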

Results: First, in LUR model training, RF consistently outperformed SLR across all spatial scales and measurement durations. The performance of cell-based LUR models fluctuated more across scales than that of segment-based models. The variance explained by the RF-based LUR models decreased with increasing cell size (e.g., from 61% to 48%), whereas the variance explained by the stepwise LUR models increased (e.g., from 19% to 31%). Second, in the long-term validation with stationary NO2 measurements, prediction accuracy varied across scales, but no clear trend was observable. The segment-based LUR models were less sensitive to changes in spatial scale than the cell-based LUR models. Moreover, the duration of the mobile measurement campaign proved vital: longer campaigns (e.g., 50 and 70 days) produced more accurate predictions than shorter ones (e.g., 10 and 20 days).
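For readers less familiar with the accuracy measures, the sketch below shows one common way to compare model predictions with stationary measurements: the coefficient of determination (R²) and the root-mean-square error. That these exact definitions match the metrics used in the study is an assumption, and the observed and predicted values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical long-term NO2 values at four stationary sites (not study data).
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 41.0, 49.0]

r2 = r_squared(observed, predicted)
error = rmse(observed, predicted)
```

Computing these measures once per aggregation scale and campaign duration yields the kind of comparison summarised in the Results.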

Conclusion: By examining the effects of different spatial and temporal aggregation schemes on LUR models, we found that varying the segment length leads to less variability in model training results and long-term air pollution predictions than varying the cell size. Our results suggest that a segment-based approach is more robust and should be used to predict air pollution concentrations.

How to cite: Tian, T., Helbich, M., Yuan, Z., Kerckhoffs, J., and Vermeulen, R.: Evaluating the Impact of Aggregation Scales and Campaign Durations on Land-Use Regression Models for Air Pollution Estimation with Mobile NO2 Monitoring, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10233, https://doi.org/10.5194/egusphere-egu24-10233, 2024.