EGU21-12324, updated on 16 Apr 2021
https://doi.org/10.5194/egusphere-egu21-12324
EGU General Assembly 2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series

Christina Pacher1, Irene Schicker2, Rosmarie DeWit2, and Claudia Plant1
  • 1Universität Wien, Wien, Austria (christina.pacher@univie.ac.at)
  • 2Zentralanstalt für Meteorologie und Geodynamik - ZAMG
Both clustering and outlier detection play an important role in meteorology. Clustering separates large sets of data points, such as numerical weather prediction (NWP) model data or observation sites, into groups based on characteristics found in the data, so that similar data points end up in the same cluster. Clustering also makes it possible to detect outliers in the data. The resulting clusters are useful in many ways, such as atmospheric pattern recognition (e.g. clustering NWP ensemble predictions to estimate the likelihood of the predicted weather patterns), climate applications (grouping point observations for climate pattern recognition), forecasting (e.g. data pool enhancement using data of similar sites for forecasting applications), urban meteorology, air quality, renewable energy systems, and hydrological applications.
 
Typically, one does not know in advance how many clusters or groups are present in the data. However, algorithms such as K-means require the number of clusters to be defined beforehand. The proposed novel algorithm AWT, a modified combination of several well-known clustering algorithms, does not need this input. It chooses the number of clusters automatically based on a user-defined threshold parameter. Furthermore, the algorithm can be used for heterogeneous meteorological input data as well as for data sets that exceed the available memory size.
Similar to the classical BIRCH algorithm, our method AWT works on a multi-resolution data structure, an Aggregated Wavelet Tree, which is suitable for representing multivariate time series. In contrast to BIRCH, the user does not need to specify the number of clusters K, which is difficult to determine in our application. Instead, AWT relies on a single threshold parameter for clustering and outlier detection. This threshold corresponds to the highest resolution of the tree. Points that are not in any cluster with respect to the threshold are naturally flagged as outliers.
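The role of a single threshold can be illustrated with a minimal sketch. It is not the AWT implementation, which is not specified in this abstract. All names (`haar_approximation`, `threshold_clustering`, `min_size`) are hypothetical. Repeated pairwise averaging stands in for the wavelet tree, a single-pass nearest-centroid assignment stands in for the tree-based clustering, and the rule that clusters smaller than `min_size` count as outliers is an assumption made only for this example.

```python
import numpy as np

def haar_approximation(series, levels):
    """Coarsen a series by repeated pairwise averaging
    (unnormalised Haar approximation coefficients)."""
    x = np.asarray(series, dtype=float)
    for _ in range(levels):
        if len(x) % 2:                 # pad odd lengths by repeating the last value
            x = np.append(x, x[-1])
        x = 0.5 * (x[0::2] + x[1::2])
    return x

def threshold_clustering(features, threshold, min_size=2):
    """Single-pass clustering: a point joins the nearest centroid if it lies
    within `threshold`, otherwise it opens a new cluster. The number of
    clusters is therefore a result, not an input. Clusters with fewer than
    `min_size` members are relabelled -1 (assumed outlier rule)."""
    centroids, counts, labels = [], [], []
    for f in features:
        f = np.asarray(f, dtype=float)
        if centroids:
            dists = [np.linalg.norm(f - c) for c in centroids]
            k = int(np.argmin(dists))
            if dists[k] <= threshold:
                counts[k] += 1
                centroids[k] += (f - centroids[k]) / counts[k]  # running mean
                labels.append(k)
                continue
        centroids.append(f.copy())
        counts.append(1)
        labels.append(len(centroids) - 1)
    labels = np.array(labels)
    for k, n in enumerate(counts):
        if n < min_size:
            labels[labels == k] = -1
    return labels

# Synthetic example: two groups of similar stations and one faulty sensor.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 64)
warm = [np.sin(t) + 5.0 + 0.05 * rng.standard_normal(64) for _ in range(5)]
cool = [np.sin(t) + 0.05 * rng.standard_normal(64) for _ in range(5)]
faulty = [np.full(64, 40.0)]
series = warm + cool + faulty

features = [haar_approximation(s, levels=3) for s in series]
labels = threshold_clustering(features, threshold=2.0)
n_clusters = len(set(labels.tolist()) - {-1})
print(labels, n_clusters)
```

In this toy setting the threshold alone determines that two clusters emerge and that the constant-valued series is left unclustered. A smaller threshold would split the groups further, and a larger one would merge them.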
 
With the recent increase in the use of non-traditional data sources, such as private smart-home weather stations, in NWP models and other forecasting applications, outlier detection and clustering methods are useful for pre-processing and filtering these rather novel data sources. Especially in urban areas, changes in the surface energy balance caused by urbanization result in temperatures generally being higher in cities than in the surrounding areas. To capture the spatial features of this effect, data with high spatial resolution are necessary. Here, privately owned smart-home weather stations are useful, as often only a limited number of official observation sites exist. However, these data need to be pre-processed before they can be used.
  
In this work we apply our novel algorithm AWT to crowdsourced data from the city of Vienna. We demonstrate the skill of the algorithm in outlier detection and filtering as well as in clustering the data, and evaluate it against commonly used algorithms. Furthermore, we show how the algorithm could be used in renewable energy applications.

How to cite: Pacher, C., Schicker, I., DeWit, R., and Plant, C.: AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12324, https://doi.org/10.5194/egusphere-egu21-12324, 2021.
