Neural network studies of air quality and socioeconomic predictors of mortality
- University of Cambridge, Centre for Atmospheric Science, Yusuf Hamied Department of Chemistry, Cambridge, United Kingdom
After the Great Smog of London in 1952, the health impacts of air pollution exposure came sharply into local public awareness. Today, these impacts have been established by epidemiological studies across the world.
Machine learning (ML) techniques applied to this field of study in recent years have demonstrated potential advantages over traditional statistical approaches. These techniques are well suited to large sets of input features, which can describe more holistically the numerous factors affecting human health. Additionally, because these techniques are data-driven, the mathematical relationships between driving factors, confounders, and health outcomes do not need to be defined in advance. Previous ML applications have included the identification of exposure profiles and the prediction of disease rates.
In this work, a simplified feature set was used to develop predictive ML models of daily mortality in Greater London, UK. The input features to the predictive models were: outdoor nitrogen dioxide concentrations recorded by the London Air Quality Network, outdoor temperature measurements recorded by the UK Met Office, and gross disposable household income per capita, as published by the UK Office for National Statistics. Preliminary work explored the trends and correlations observed in the dataset, which spanned the years 1997–2018. Predictive performance was then compared between linear and neural network regressor models. Each of the three input features was also excluded in turn, to test the role it played as a predictor of mortality rates in London.
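The feature-exclusion procedure described above can be sketched as a small leave-one-feature-out loop. This is only an illustration of the general idea, not the authors' code: the column names in `FEATURES` and the `train_and_score` callable (assumed to train a regressor on the given columns and return a test-set error) are hypothetical.

```python
# Hypothetical column names standing in for the three input features.
FEATURES = ["no2", "temperature", "income"]


def leave_one_out_subsets(feature_names):
    """Return the full feature set plus every set with one feature removed."""
    subsets = {"all features": list(feature_names)}
    for dropped in feature_names:
        subsets[f"without {dropped}"] = [f for f in feature_names if f != dropped]
    return subsets


def select_columns(rows, feature_names, keep):
    """Keep only the named columns from a list of per-day feature rows."""
    indices = [feature_names.index(name) for name in keep]
    return [[row[i] for i in indices] for row in rows]


def run_ablation(rows, targets, feature_names, train_and_score):
    """Score a model once per feature subset.

    `train_and_score(X, y)` is an assumed callable that trains a regressor
    (linear or neural network) and returns its test-set error.
    """
    scores = {}
    for label, keep in leave_one_out_subsets(feature_names).items():
        X = select_columns(rows, feature_names, keep)
        scores[label] = train_and_score(X, targets)
    return scores


if __name__ == "__main__":
    # Made-up rows and a stand-in scorer that only counts columns,
    # used here to show the plumbing rather than any real result.
    demo_rows = [[40.0, 8.5, 21000.0], [35.0, 12.0, 21010.0]]
    demo_targets = [150.0, 140.0]
    print(run_ablation(demo_rows, demo_targets, FEATURES, lambda X, y: len(X[0])))
```

Passing the training routine in as a callable keeps the loop independent of the model type, so the same ablation can be repeated for both the linear and the neural network regressor.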
The results showed that, while both regressor architectures learnt to predict seasonal cycles in mortality rates, the neural network made test set predictions with a 73% reduction in mean squared error compared to the equivalent linear model. This illustrates the improved modelling power conferred by the nonlinear nature of neural networks, even though the network used here was shallow.
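For readers unfamiliar with the metric, a relative reduction in mean squared error can be computed as one minus the ratio of the two models' errors on the same test set. The sketch below assumes that definition; the function names are illustrative and the numbers are made up, chosen only so the arithmetic is easy to follow. They are not values from this study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observations and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mse_reduction(y_true, baseline_pred, candidate_pred):
    """Fractional MSE reduction of the candidate relative to the baseline."""
    mse_baseline = mean_squared_error(y_true, baseline_pred)
    mse_candidate = mean_squared_error(y_true, candidate_pred)
    return 1.0 - mse_candidate / mse_baseline


if __name__ == "__main__":
    # Toy values: the baseline misses by 2 each day, the candidate by 1.
    observed = [10.0, 12.0, 11.0, 13.0]
    baseline = [12.0, 10.0, 13.0, 11.0]   # MSE = 4
    candidate = [11.0, 11.0, 12.0, 12.0]  # MSE = 1
    print(relative_mse_reduction(observed, baseline, candidate))
```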
Additionally, the ablation studies demonstrated that both types of model depended on the income input feature to predict general trends in mortality rates accurately over the two decades. Only this input feature provided information about changing trends through time, and its inclusion in this modelling approach was intended to represent the gradual improvement of societal and individual health factors.
In ongoing work, the exploration of factors affecting mortality is extended using long short-term memory (LSTM) neural network architectures. This type of neural network is additionally able to consider the temporal dimension by handling sequences of time-series data points. Information from the sequence of previous time steps is incorporated into a memory vector, which then forms part of the input to the subsequent time step. Sequence length thus corresponds to the length of the time-lagged associations learnt by the network. By varying sequence length, it is then possible to examine the significance of time-lag windows spanning different numbers of days.
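One common way to prepare daily data for a sequence model is to cut it into overlapping windows of a fixed number of days, so that the window length sets the longest time lag the network can see. The sketch below illustrates that windowing step only. The function name, and the choice to pair each window with the target on its final day, are assumptions for illustration rather than details taken from this work.

```python
def make_sequences(features, targets, seq_len):
    """Cut daily data into overlapping windows of `seq_len` consecutive days.

    features: list of per-day feature lists, in chronological order.
    targets:  list of per-day target values, same length as `features`.
    Each window is paired with the target on the window's last day (an
    assumed convention; other pairings are equally possible).
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    if len(features) != len(targets):
        raise ValueError("features and targets must have equal length")

    windows, labels = [], []
    for end in range(seq_len, len(features) + 1):
        windows.append(features[end - seq_len:end])
        labels.append(targets[end - 1])
    return windows, labels


if __name__ == "__main__":
    # Five made-up days with a single feature each.
    days = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    mortality = [0.0, 10.0, 20.0, 30.0, 40.0]
    for lag in (2, 3):
        windows, labels = make_sequences(days, mortality, lag)
        print(lag, len(windows), labels)
```

Re-running the same training procedure with different `seq_len` values is then one way to compare time-lag windows of different lengths.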
How to cite: Wan, M. and Archibald, A.: Neural network studies of air quality and socioeconomic predictors of mortality, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-1535, https://doi.org/10.5194/egusphere-egu23-1535, 2023.