EGU23-11601, updated on 22 Oct 2024
https://doi.org/10.5194/egusphere-egu23-11601
EGU General Assembly 2023
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Clustering Geodata Cubes (CGC) and Its Application to Phenological Datasets

Francesco Nattino¹, Ou Ku¹, Meiert W. Grootes¹, Emma Izquierdo-Verdiguier², Serkan Girgin³, and Raúl Zurita-Milla³
  • ¹Netherlands eScience Center, Amsterdam, the Netherlands
  • ²Institute of Geomatics, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
  • ³Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, the Netherlands

Unsupervised classification techniques are becoming essential for extracting information from the wealth of data that Earth observation satellites and other sensors currently provide. These datasets are inherently complex to analyze because of their extent across multiple dimensions (spatial, temporal, and often spectral or band), their size, and the high resolution of current sensors. Traditional one-dimensional cluster analysis approaches, which are designed to find groups of similar elements in datasets such as rasters or time series, may fall short of identifying patterns in these higher-dimensional datasets, often referred to as data cubes. In this context, we present Clustering Geodata Cubes (CGC), an open-source Python package that implements a set of co- and tri-clustering algorithms to group elements simultaneously across two and three dimensions, respectively. The package includes different implementations to efficiently handle both datasets that fit into the memory of a single machine and very large datasets that require cluster computing. A refinement strategy that facilitates the identification of patterns in the data is also provided. We apply CGC to investigate gridded datasets representing the predicted day of the year on which spring onset events (first leaf, first bloom) occur according to a well-established phenological model. Specifically, we consider spring indices computed at high spatial resolution (1 km) and continental scale (conterminous United States) over more than 40 years, and we extract the main spatiotemporal patterns in the data using the co-clustering functionality of CGC.
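To make the idea of co-clustering concrete, the following is a minimal pure-Python sketch. It is an illustration under stated assumptions, not CGC's implementation or API: it assumes a block-average model in which each element is approximated by the mean of its (row cluster, column cluster) block, squared Euclidean residuals, and a deterministic initialization that sorts rows and columns by their means. All three are simplifications chosen here. In a setting like the phenology application, rows could stand for grid cells and columns for years, so the result groups locations and years at the same time. The function names `coclustering`, `block_means`, and `quantile_labels` are hypothetical, and the 4 x 4 matrix holds synthetic toy values.

```python
# Illustrative block-average co-clustering (hypothetical helper names; not CGC's code).
# Each element is approximated by the mean of its (row cluster, column cluster) block;
# row and column labels are updated alternately to reduce the total squared residual.


def block_means(Z, rows, cols, k, l):
    """Return the k x l matrix of block means; empty blocks get 0.0 (arbitrary choice)."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, zrow in enumerate(Z):
        for j, z in enumerate(zrow):
            sums[rows[i]][cols[j]] += z
            counts[rows[i]][cols[j]] += 1
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]


def quantile_labels(values, n_clusters):
    """Deterministic initialization: sort indices by value, split into contiguous groups."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    labels = [0] * len(values)
    for pos, idx in enumerate(order):
        labels[idx] = pos * n_clusters // len(values)
    return labels


def coclustering(Z, k, l, max_iter=100):
    """Partition the rows of Z into k clusters and its columns into l clusters jointly."""
    m, n = len(Z), len(Z[0])
    rows = quantile_labels([sum(zrow) / n for zrow in Z], k)
    cols = quantile_labels([sum(Z[i][j] for i in range(m)) / m for j in range(n)], l)
    for _ in range(max_iter):
        # Row step: assign each row to the row cluster whose block means fit it best.
        mu = block_means(Z, rows, cols, k, l)
        new_rows = [
            min(range(k), key=lambda r: sum((Z[i][j] - mu[r][cols[j]]) ** 2 for j in range(n)))
            for i in range(m)
        ]
        # Column step: same idea, scored against block means from the updated row labels.
        mu = block_means(Z, new_rows, cols, k, l)
        new_cols = [
            min(range(l), key=lambda c: sum((Z[i][j] - mu[new_rows[i]][c]) ** 2 for i in range(m)))
            for j in range(n)
        ]
        if new_rows == rows and new_cols == cols:
            break  # labels no longer change: converged
        rows, cols = new_rows, new_cols
    return rows, cols


# Synthetic toy data: rows 0 and 2 are high in columns 0-1, rows 1 and 3 in columns 2-3.
Z = [
    [10, 10, 0, 0],
    [0, 0, 8, 8],
    [9, 9, 1, 1],
    [2, 2, 9, 9],
]
rows, cols = coclustering(Z, k=2, l=2)
print(rows, cols)  # [0, 1, 0, 1] [1, 1, 0, 0]
```

Here the mean-based initialization mixes the two row groups, and the first row update separates them. Alternating the updates is what distinguishes co-clustering from clustering rows and columns independently: each assignment is scored against block means that depend on the other dimension's current partition. Tri-clustering extends the same scheme with a third label vector (for example, over spectral bands). A practical tool would typically also add multiple restarts and vectorized or distributed computation, which this sketch omits.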

How to cite: Nattino, F., Ku, O., Grootes, M. W., Izquierdo-Verdiguier, E., Girgin, S., and Zurita-Milla, R.: Clustering Geodata Cubes (CGC) and Its Application to Phenological Datasets, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11601, https://doi.org/10.5194/egusphere-egu23-11601, 2023.