EGU23-16795, updated on 26 Feb 2023
https://doi.org/10.5194/egusphere-egu23-16795
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Harmonizing Diverse Geo-Spatiotemporal Data for Event Analytics

Michael Rilee, Kwo-Sen Kuo, Michael Bauer, Niklas Griessbaum, and Dai-Hai Ton-That
  • Bayesics LLC, Bowie, United States of America (kuo@bayesics.com)

Parallelization is the only way to process large amounts of diverse data on reasonably short time scales. However, while parallelization is necessary for performant and scalable BigData analysis, it is not sufficient. We observe that geo-spatiotemporal analyses that integrate diverse datasets most often require spatiotemporal coincidence (i.e., data at the same place and time). Therefore, for parallelization, these large volumes of diverse data must be partitioned and distributed to cluster nodes with spatiotemporal colocation, so that misalignment does not force data movement among the nodes. Such data movement devastates scalability.
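
The colocation requirement can be illustrated with a minimal sketch. It assumes a fixed, coarse longitude-latitude-time binning and a simple modulo placement rule; the names `spacetime_bin`, `node_for`, and `partition`, the bin sizes, and the sample records are all hypothetical and are not part of STARE or of the authors' software.

```python
import math
from collections import defaultdict

def spacetime_bin(lon, lat, t_hours, cell_deg=10.0, window_hours=6.0):
    """Map a (lon, lat, time) sample to a coarse space-time bin (assumed scheme)."""
    return (
        math.floor((lon + 180.0) / cell_deg),
        math.floor((lat + 90.0) / cell_deg),
        math.floor(t_hours / window_hours),
    )

def node_for(bin_key, n_nodes):
    """Place a bin on a node; every dataset must use this same rule."""
    ix, iy, it = bin_key
    return (ix + 37 * iy + 37 * 19 * it) % n_nodes

def partition(records, n_nodes):
    """Group records by node using location and time, never the array index."""
    nodes = defaultdict(list)
    for name, lon, lat, t_hours, value in records:
        key = spacetime_bin(lon, lat, t_hours)
        nodes[node_for(key, n_nodes)].append((name, value))
    return dict(nodes)

# Two made-up records from different datasets that nearly coincide.
model = [("model", 12.3, 45.1, 3.0, 1.7)]
swath = [("swath", 14.9, 47.8, 4.5, 0.4)]
placed = partition(model + swath, n_nodes=4)
```

Because both records fall in the same bin, they are placed on the same node and can be compared without any transfer between nodes. A fixed binning like this one still separates near-coincident samples that straddle a bin edge, which is one limitation of so simple a scheme.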

The prevalent data structure for most geospatial data, e.g., simulation model output and remote sensing data products, is the (raster) array, accompanied by geolocation arrays (i.e., longitude and latitude) of the same shape and size. Through the array index, these arrays establish a correspondence between a data array element and its geolocation. However, this array-index-to-geolocation relation changes from dataset to dataset and even within a dataset (e.g., swath data from LEO satellites). Consequently, array indices cannot be used for partitioning and distribution to achieve consistent spatiotemporal colocation.
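
A small sketch makes the index mismatch concrete. The two 2x3 datasets, their coordinates, and the helper `nearest_index` are invented for illustration, and the planar distance in degrees is a simplification rather than a proper great-circle distance.

```python
# Hypothetical regular grid: the index maps to location by a fixed rule.
grid_lon = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
grid_lat = [[10.0, 10.0, 10.0], [11.0, 11.0, 11.0]]

# Hypothetical swath: same shape, but a skewed and irregular footprint.
swath_lon = [[0.4, 1.1, 1.9], [0.1, 0.9, 1.6]]
swath_lat = [[10.2, 10.5, 10.8], [11.1, 11.4, 11.7]]

def geolocate(lon, lat, i, j):
    """Look up the location of element (i, j) through the geolocation arrays."""
    return lon[i][j], lat[i][j]

def nearest_index(lon, lat, target_lon, target_lat):
    """Return the (i, j) whose location is closest to the target (planar, in degrees)."""
    best, best_d2 = None, float("inf")
    for i, row in enumerate(lon):
        for j, _ in enumerate(row):
            d2 = (lon[i][j] - target_lon) ** 2 + (lat[i][j] - target_lat) ** 2
            if d2 < best_d2:
                best, best_d2 = (i, j), d2
    return best

# The same index (1, 2) refers to different places in the two datasets.
grid_loc = geolocate(grid_lon, grid_lat, 1, 2)
swath_loc = geolocate(swath_lon, swath_lat, 1, 2)

# The swath element closest to the grid element (1, 2) sits in another row.
match = nearest_index(swath_lon, swath_lat, *grid_loc)
```

In this toy case the grid element in row 1 is best matched by a swath element in row 0, so splitting both arrays by row would send the coincident pair to different nodes.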

A simplistic way to address this diversity is through homogenization, i.e., resampling (aka re-gridding) all data involved onto the same grid to establish a fixed array-index-to-geolocation relation. Indeed, this crude approach has become common practice. However, different applications have different requirements for resampling, which influence the choice of interpolation algorithm (e.g., linear, spline, or flux-conserving). Regardless of which algorithm is applied, large amounts of modified and redundant data are created, which not only exacerbates the BigData Volume challenge but also obfuscates the processing and data provenance.
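
As a rough illustration of how the choice of algorithm changes the resampled values, the following sketch resamples the same one-dimensional samples with nearest-neighbor and linear interpolation. The sample positions, values, and function names are hypothetical, and real re-gridding operates on two-dimensional fields on the sphere.

```python
def resample_nearest(xs, vs, targets):
    """Copy the value of the closest native sample to each target position."""
    out = []
    for t in targets:
        k = min(range(len(xs)), key=lambda i: abs(xs[i] - t))
        out.append(vs[k])
    return out

def resample_linear(xs, vs, targets):
    """Interpolate linearly between the two native samples that bracket each target."""
    out = []
    for t in targets:
        for k in range(len(xs) - 1):
            if xs[k] <= t <= xs[k + 1]:
                w = (t - xs[k]) / (xs[k + 1] - xs[k])
                out.append(vs[k] + w * (vs[k + 1] - vs[k]))
                break
        else:
            raise ValueError(f"target {t} lies outside the native samples")
    return out

native_x = [0.0, 1.0, 2.0]    # native sample positions (made up)
native_v = [0.0, 10.0, 0.0]   # native values with a sharp peak
target_x = [0.25, 1.25]       # positions of the common grid

nearest = resample_nearest(native_x, native_v, target_x)  # [0.0, 10.0]
linear = resample_linear(native_x, native_v, target_x)    # [2.5, 7.5]
```

The two results disagree at both target positions, and the linear result contains values that never occur in the native data, so the resampled product is a second, modified copy whose content depends on the algorithm chosen.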

SpatioTemporal Adaptive-Resolution Encoding, STARE, was invented to address the scalability challenge through data harmonization, allowing efficient spatiotemporal colocation of the “native data” without re-gridding. STARE (1) ties its indices directly to space-time coordinate locations, unlike the raster array indices used in current practice, which must go indirectly through the floating-point longitude-latitude arrays to reference geolocation, and (2) embeds neighborhood information in the indices to enable highly performant numerical operations for “joins” such as intersect, union, difference, and complement. These two properties together give STARE its exceptional data-harmonizing power because, when a pair of STARE indices is associated with a data element, we know not only its spatiotemporal location but also its neighborhood, i.e., the spatiotemporal volume (2D in space plus 1D in time) that the data element represents.
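
The idea of an index that carries both location and neighborhood can be sketched with a toy hierarchical encoding. This is not STARE's actual encoding or API: it assumes a plain longitude-latitude quadtree, omits the time dimension, and the names `encode`, `level_of`, `contains`, and `overlaps` as well as the bit layout are hypothetical.

```python
MAX_LEVEL = 8    # deepest quadtree level in this toy scheme
LEVEL_BITS = 4   # low bits reserved for the resolution level

def encode(lon, lat, level):
    """Toy index: left-aligned quadtree path in the high bits, level in the low bits."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level out of range")
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    path = 0
    for _ in range(level):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        quadrant = 0
        if lon >= mid_lon:
            quadrant |= 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            quadrant |= 2
            south = mid_lat
        else:
            north = mid_lat
        path = (path << 2) | quadrant
    path <<= 2 * (MAX_LEVEL - level)  # pad so that all paths are aligned
    return (path << LEVEL_BITS) | level

def level_of(index):
    """Read the resolution level, i.e. the cell size, from the index itself."""
    return index & ((1 << LEVEL_BITS) - 1)

def contains(coarse, fine):
    """True if the cell of `coarse` encloses the cell of `fine` (integer operations only)."""
    lc = level_of(coarse)
    if lc > level_of(fine):
        return False
    shift = LEVEL_BITS + 2 * (MAX_LEVEL - lc)
    return (coarse >> shift) == (fine >> shift)

def overlaps(a, b):
    """Cells in a hierarchy either nest or are disjoint, so overlap means containment."""
    return contains(a, b) or contains(b, a)

region = encode(14.9, 47.8, 3)     # coarse cell, e.g. a large event footprint
pixel = encode(12.3, 45.1, 8)      # fine cell, e.g. a single observation
far = encode(-100.0, -30.0, 8)     # a fine cell elsewhere
```

Here `contains(region, pixel)` is true and `overlaps(region, far)` is false, and both answers come from shifts and comparisons on integers, without consulting any floating-point longitude-latitude arrays.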

These capabilities of STARE-based technologies allow not only the harmonization of diverse datasets but also sophisticated event analytics. In this presentation, we will discuss the application of STARE to the integrative analysis of Extra-Tropical Cyclones and precipitation events, wherein we use STARE to identify and catalog co-occurrences of these two kinds of events so that we may study their relationships using diverse data of the best spatiotemporal resolution available.

How to cite: Rilee, M., Kuo, K.-S., Bauer, M., Griessbaum, N., and Ton-That, D.-H.: Harmonizing Diverse Geo-Spatiotemporal Data for Event Analytics, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-16795, https://doi.org/10.5194/egusphere-egu23-16795, 2023.