AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series

Christina Pacher; Irene Schicker; Rosmarie DeWit; Claudia Plant

doi:https://doi.org/10.5194/egusphere-egu21-12324

[Back] [Session ESSI1.6]

EGU21-12324, updated on 16 Apr 2021

https://doi.org/10.5194/egusphere-egu21-12324

EGU General Assembly 2021

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series

Christina Pacher¹, Irene Schicker², Rosmarie DeWit², and Claudia Plant¹

Christina Pacher et al.

¹Universität Wien, Wien, Austria (christina.pacher@univie.ac.at)
²Zentralanstalt für Meteorologie und Geodynamik - ZAMG

Both clustering and outlier detection play an important role in meteorology. With clustering large sets of data points, such as numerical weather predicition (NWP) model data or observation sites, are separated into groups based on the characteristics found in the data grouping similar data points in a cluster. Clustering enables one, too, to detect outliers in the data. The resulting clusters are useful in many ways such as atmospheric pattern recognition (e.g. clustering NWP ensemble predictions to estimate the likelihood of the predicted weather patterns), climate applications (grouping point observations for climate pattern recognition), forecasting(e.g. data pool enhancement using data of similar sites for forecasting applications), in urban meteorology, air quality, renewable energy systems, and hydrologogical applications.

Typically, one does not know in advance how many clusters or groups are present in the data. However, for algorithms such as K-means one needs to define how many clusters one wants to have as an outcome. With the proposed novel algorithm AWT, a modified combination of several well-known clustering algorithms, this is not needed. It chooses the number of clusters automatically based on a user-defined threshold parameter. Furthermore, the algorithm can be used for heterogeneous meteorological input data as well as data sets that exceed the available memory size.

Similar as the classical BIRCH algorithm, our method AWT works on a multi-resolution data structure, an Aggregated Wavelet Tree that is suitable for representing multivariate time series. In contrast to BIRCH, the user does not need to specify the number of clusters K, as that is difficult in our application. Instead, AWT relies on a single threshold parameter for clustering and outlier detection. This threshold corresponds to the highest resolution of the tree. Points that are not in any cluster with respect to the threshold are naturally flagged as outliers.

With the recent increasing usage of non-traditional data sources, such as private, smart-home weather station, in NWP models and other forecasting and applications outlier and clustering methods are useful in pre-processing and filtering these rather novel data sources. Especially in urban areas changes in the surface energy balance caused by urbanization result in temperatures generally being higher in cities than in the surrounding areas. In order to capture the spatial features of this effect data with high spatial resoltion are necessary. Here, these privately owned smart-home weather stations are useful as often only a limited number of official observation sites exist. However, to be able to use these data they need to be pre-processed.

In this work we apply our novel algorithm AWT to crowdsourced data from the city of Vienna. We demonstrate the skill of the algorithm in outlier detection and filtering as well as clustering the data and evaluate it against commonly used algorithms. Furthermore, we show how one could use the algorithm in renewable energy applications.

How to cite: Pacher, C., Schicker, I., DeWit, R., and Plant, C.: AWT - Clustering using an Aggregated Wavelet Tree: A novel automatic unsupervised clustering and outlier detection algorithm for time series, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12324, https://doi.org/10.5194/egusphere-egu21-12324, 2021.

Displays

Display file