Maximising information in open source health data: A statistical approach for modelling aggregated time series of health outcomes
- The Cyprus Institute, Climate and Atmosphere Research Centre, Nicosia, Cyprus (t.economou@cyi.ac.cy)
One of the major concerns about climate change is its impact on human health. Consequently, much research aims to quantify the effects of environmental exposures, such as heat and air pollution, on human health. The main challenge in such research is the availability of health data. Short-term exposure effects, such as the impact of heat on daily mortality, require daily data. Such data can be obtained at country level, but are generally not open-source or available at a wider spatial extent (e.g., at continental or global scale).
However, open-source repositories do contain health-related data at large spatial scales, albeit at non-optimal temporal resolution. A prime example is Eurostat, which provides, among other things, health data (e.g., mortality) at weekly time steps for the EU member states as well as for other countries on the periphery of Europe.
The primary tool for quantifying the effects of various exposures on mortality and morbidity is the framework of Distributed Lag Non-linear Models (DLNMs). These models, applied to daily data, can capture the effects of environmental exposure across many days (lags). In this work, we exploit the mathematical properties of the Poisson distribution to enable the implementation of DLNMs on temporally aggregated data, and demonstrate that the loss of information is minimal, particularly when the goal is to understand aggregated quantities such as the attributable number of deaths. Using simulated data, we demonstrate that the framework of Generalized Additive Models enables the application of DLNMs to weekly data, emulating the situation of working with data from databases such as Eurostat. We further illustrate our framework using real mortality data from the city of Thessaloniki, Greece, and from Cyprus.
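The abstract does not state which property of the Poisson distribution is used. One plausible reading, given here only as a sketch, is the additivity of independent Poisson variables. The notation is an assumption introduced for illustration: $y_t$ is the daily count, $\mu_t$ its mean, $x_t$ the daily exposure, $f$ the exposure-lag-response surface with maximum lag $L$, and $Y_w$ the count for week $w$.

```latex
\begin{align}
y_t &\sim \mathrm{Poisson}(\mu_t), \qquad
\log \mu_t = \alpha + \sum_{l=0}^{L} f(x_{t-l},\, l), \\
Y_w = \sum_{t \in w} y_t &\sim \mathrm{Poisson}\Big(\sum_{t \in w} \mu_t\Big)
\quad \text{if the } y_t \text{ are independent given the exposures.}
\end{align}
```

Under this reading, the weekly mean is a sum of exponentials of the daily linear predictors rather than the exponential of a weekly predictor, so the daily exposure series still enters the model even though only weekly counts are observed.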
Another implication of our suggested framework is that large-scale studies (e.g., at continental or global scale) can be made more efficient when the goal is to estimate aggregate risk measures. For instance, aggregating daily data to weekly data reduces the amount of data by a factor of 7. We present a sensitivity analysis of the level of aggregation that can be applied before significant loss of information in the estimates starts to occur.
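The data reduction from aggregation can be illustrated with a minimal sketch, which is not the author's implementation. It simulates daily Poisson counts with a hypothetical seasonal mean, sums them over blocks of k days, and checks that the aggregated counts behave like Poisson counts whose means are the block sums of the daily means. The function name `aggregate_counts`, the seasonal mean, and the block lengths are all assumptions chosen for illustration.

```python
import numpy as np


def aggregate_counts(values, k):
    """Sum consecutive blocks of k values; an incomplete final block is dropped."""
    values = np.asarray(values)
    n_blocks = values.size // k
    return values[: n_blocks * k].reshape(n_blocks, k).sum(axis=1)


rng = np.random.default_rng(0)

# Hypothetical setup: about 20 years of daily counts with a seasonal mean.
n_days = 7 * 1040
t = np.arange(n_days)
mu = np.exp(3.0 + 0.2 * np.cos(2 * np.pi * t / 365.25))  # assumed daily means
daily = rng.poisson(mu)

for k in (1, 7, 14, 28):
    agg = aggregate_counts(daily, k)      # aggregated counts
    agg_mu = aggregate_counts(mu, k)      # sum of daily means per block
    # Pearson dispersion: close to 1 if the aggregated counts are Poisson(agg_mu).
    dispersion = np.mean((agg - agg_mu) ** 2 / agg_mu)
    print(f"k={k:2d}  rows={agg.size:5d}  dispersion={dispersion:.3f}")
```

The number of rows falls by a factor of k while the total count is preserved. How much precision is lost in the estimated exposure-response relationship at each k is what the sensitivity analysis in the abstract addresses, and this sketch does not attempt to reproduce it.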
How to cite: Economou, T.: Maximising information in open source health data: A statistical approach for modelling aggregated time series of health outcomes, EMS Annual Meeting 2024, Barcelona, Spain, 1–6 Sep 2024, EMS2024-171, https://doi.org/10.5194/ems2024-171, 2024.