EMS Annual Meeting Abstracts
Vol. 21, EMS2024-782, 2024, updated on 05 Jul 2024
https://doi.org/10.5194/ems2024-782
EMS Annual Meeting 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 06 Sep, 12:45–13:00 (CEST) | Lecture room B5

Statistical summaries for streamed climate data

Katherine Grayson¹, Stephan Thober², Francesc Roura Adserias², Aleksander Lacima-Nadolnik¹, Ehsan Sharifi¹, and Francisco Doblas-Reyes¹,³
  • ¹Barcelona Supercomputing Center, Earth Sciences, Barcelona, Spain (katherine.grayson@bsc.es)
  • ²Helmholtz Centre for Environmental Research, UFZ, Leipzig, Germany
  • ³Institució Catalana de Recerca i Estudis Avançats, Barcelona, 08010, Spain

Projections from global climate models (GCMs) are regularly used to inform climate adaptation policies and socio-economic decisions. As demand grows for more accurate projections, GCMs are being run at increasingly fine spatiotemporal resolution to better resolve physical processes and thereby reduce the uncertainty associated with parametrizations. Yet this increase in resolution, and the resulting growth in output volume, makes archiving in the manner of the current state-of-the-art archives (e.g., CORDEX, CMIP) unfeasible. Moreover, the current archival method has left some data consumers without the data they require, because only a limited number of variables are stored and often only at low frequency (e.g., monthly means). Initiatives like Destination Earth are investigating the novel method of data streaming, in which user applications run as soon as the required data is produced by the climate models. Data streaming gives users access to climate data at the highest available frequency (e.g., hourly) and at native resolution, close to real time as the model runs. Compared with the current simulation paradigm, this greatly reduces the time needed to access climate data and opens up variables and frequencies that were not previously available.

Yet the advent of data streaming in the climate community poses its own set of challenges. Users often require climate data that spans long periods. For example, many hydrological impact models require daily, monthly or annual maximum precipitation values, while in the wind energy sector accurate distributions of wind speed over long periods are essential. Statistics for periods longer than the window during which the climate model output remains accessible can no longer be computed with traditional statistical algorithms. This introduces the one-pass problem: how can summaries, diagnostics or derived quantities be computed by algorithms that see each data point only once (i.e., pass through the data a single time)?
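The two use cases above can be illustrated with a minimal Python sketch. It is not taken from the presented work: the class names `StreamingMax` and `StreamingHistogram`, the chunk layout (time steps by grid cells), the bin edges and the sample values are all assumptions made for illustration. Each object keeps only a small summary and is updated as each chunk arrives, so the chunk can be discarded afterwards.

```python
import numpy as np


class StreamingMax:
    """Running maximum per grid cell; stores one array, not the time series."""

    def __init__(self):
        self.value = None

    def update(self, chunk):
        # chunk has shape (time, cells); reduce over the time axis only.
        chunk_max = np.max(chunk, axis=0)
        if self.value is None:
            self.value = chunk_max
        else:
            self.value = np.maximum(self.value, chunk_max)


class StreamingHistogram:
    """Counts per fixed bin; values outside the bin range are not counted."""

    def __init__(self, edges):
        self.edges = np.asarray(edges, dtype=float)
        self.counts = np.zeros(len(self.edges) - 1, dtype=np.int64)

    def update(self, chunk):
        counts, _ = np.histogram(chunk, bins=self.edges)
        self.counts += counts


# Hypothetical stream: two chunks of 3 time steps over 2 grid cells.
chunks = [
    np.array([[1.0, 4.0], [2.5, 0.5], [0.0, 3.0]]),
    np.array([[6.0, 1.0], [2.0, 2.0], [1.5, 3.5]]),
]
running_max = StreamingMax()
histogram = StreamingHistogram(edges=[0.0, 2.0, 4.0, 8.0])
for chunk in chunks:
    running_max.update(chunk)
    histogram.update(chunk)
```

The maximum is exact under this scheme, whereas the histogram is only as fine as the bin edges chosen before the stream starts, which is one reason distribution estimates are harder in a one-pass setting.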

We present here a detailed analysis of the use of one-pass algorithms to compute statistics on streamed climate data. Unlike traditional two-pass methods, one-pass algorithms do not have access to the full time series needed to estimate the statistic; instead, they process data incrementally each time the model outputs new time steps. While these algorithms have been adopted in other fields such as online trading and machine learning, they have yet to find a foothold in climate science, mainly because they have not been necessary until now. Here we show how one-pass algorithms can be harnessed for use in Earth system digital twins, generating the statistics required by users with minimal loss in accuracy while bypassing unfeasible storage requirements.
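As an illustration of incremental processing, the sketch below uses Welford's algorithm, a well-known one-pass method for the mean and variance. The abstract does not state which algorithms the authors analyse, so this is an assumed example rather than their implementation; the class name `RunningMoments` and the sample values are hypothetical.

```python
class RunningMoments:
    """One-pass mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean, which keeps the recurrence numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Hypothetical stream of values arriving one at a time.
moments = RunningMoments()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    moments.update(value)
```

Only three numbers are stored regardless of the length of the stream, in contrast to a two-pass computation that first needs the mean of the full series and then a second pass to accumulate the squared deviations.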

How to cite: Grayson, K., Thober, S., Roura Adserias, F., Lacima-Nadolnik, A., Sharifi, E., and Doblas-Reyes, F.: Statistical summaries for streamed climate data, EMS Annual Meeting 2024, Barcelona, Spain, 1–6 Sep 2024, EMS2024-782, https://doi.org/10.5194/ems2024-782, 2024.