Building useful datasets for Earth System Model output
- Max-Planck-Institut für Meteorologie, Climate Physics, Hamburg, Germany (tobias.koelling@mpimet.mpg.de)
Global kilometer-scale climate models generate vast quantities of simulation output, presenting significant challenges in using this wealth of data effectively. We approach this challenge from the output user perspective, craft useful dataset requirements and show how this also simplifies data handling and reduces the “time to plot”.
At first, this strategy involves the creation of a consolidated and analysis-ready n-dimensional dataset, directly from the running model. A dataset forms a consistent set of data, which as a whole aims to inform a user about what and what not to expect from the output of a given model run. This is notably distinct from multiple independent datasets or messages, which can’t convey the overall structure and are often (slightly, but annoyingly) inconsistent. Thus, the amount of user surprise can be reduced by reducing the number of output datasets of a model run towards 1. This of course requires synthesizing relevant information from diverse model outputs, but in turn streamlines the accessibility and usability of climate simulation data.
With the goal of the user-perspective of a single large dataset established, we need to ensure that access to this dataset is swift and ergonomic. At the kilometer-scale, horizontal grids of global models outgrow the size of computer screens and the capacity of the human eye, changing viable dataset usage patterns: we can either observe a coarser version of the data globally or high resolution data locally, but not both. We make use of two concepts to cope with this fact: output hierarches and multidimensional chunking.
By accumulating data in both temporal and spatial dimensions, while keeping the dataset structure, users can seemlessly switch between resolutions, reducing the computational burden during post-processing at the large scale perspective. In addition, splitting high-resolution data in compact spatiotemporal chunks, regional subsets can be extracted quickly as well. While adding the hierarchy adds a small amount of extra data to already tight disk space quotas, a good chunk design and state-of-the-art compression techniques reduce storage requirements without adding access time overhead. On top, the approach generates an opportunity for hierarchical storage systems: only those regions and resoultions which are actively worked on have to reside in “hot” storage.
In summary, our collaborative efforts bring together diverse existing strategies to revolutionize the output and post-processing landscape of global kilometer-scale climate models. By creating a single analysis-ready dataset, pre-computing hierarchies, employing spatial chunking, and utilizing advanced compression techniques, we aim to address challenges associated with managing and extracting meaningful insights from these vast simulations. This innovative approach enhances the efficiency of many real-life applications, which is a necessity for analysing multi-decadal kilometer-scale model output.
How to cite: Kölling, T. and Kluft, L.: Building useful datasets for Earth System Model output, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17141, https://doi.org/10.5194/egusphere-egu24-17141, 2024.