- University of Hamburg, Institute of Geophysics, Hamburg, Germany (oliver.boelt@uni-hamburg.de)
Distributed Acoustic Sensing (DAS) turns optical fibers into high-resolution strain sensors by monitoring the scattering of light within the fiber. With channel spacings on the order of a few meters and a typical sampling frequency of 1 kHz, DAS can record a wide range of natural and anthropogenic seismic signals. Furthermore, the optical fibers used for DAS can be several kilometers long and are suitable for long-term measurements over weeks, months or years. The resulting datasets can therefore be very large, reaching several terabytes of data per day. At this volume, manual inspection becomes immensely time-consuming, which makes it challenging to get a good overview of the different types of seismic signals contained in the data.
In this study we aim to automate this process by clustering the data to detect and classify different types of seismic signals. A two-dimensional windowed Fourier transform is used to automatically extract features from the data. In contrast to many other approaches, this allows us to use not only temporal information but also the spatial dimension to further distinguish between different seismic sources and wave types.
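As an illustration of the feature extraction, the following is a minimal sketch of a two-dimensional windowed Fourier transform applied to a time-by-channel array. The function name, window sizes, step sizes, Hann taper and log-amplitude scaling are all assumptions chosen for illustration and are not taken from the study.

```python
import numpy as np

def windowed_fft2_features(data, win_t=64, win_x=16, step_t=32, step_x=8):
    """Slide a 2D window over `data` (time samples x channels) and return
    one log-amplitude frequency-wavenumber spectrum per window.

    All parameter values are illustrative assumptions.
    """
    # Separable 2D Hann taper to reduce spectral leakage at window edges.
    taper = np.outer(np.hanning(win_t), np.hanning(win_x))
    features, positions = [], []
    for t0 in range(0, data.shape[0] - win_t + 1, step_t):
        for x0 in range(0, data.shape[1] - win_x + 1, step_x):
            patch = data[t0:t0 + win_t, x0:x0 + win_x] * taper
            # Real FFT along time (axis 0), complex FFT along space (axis 1).
            # Resulting shape: (win_t // 2 + 1, win_x).
            spectrum = np.abs(np.fft.rfft2(patch, axes=(1, 0)))
            features.append(np.log1p(spectrum).ravel())
            positions.append((t0, x0))
    return np.array(features), np.array(positions)
```

Each row of the returned feature array describes one space-time window, so temporal frequency and spatial wavenumber content both enter the feature set. The accompanying positions array records where each window starts, which allows cluster labels to be mapped back onto the recording.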
The clustering is performed in two steps. First, a Gaussian Mixture Model (GMM) is used to cluster the feature set. Then, the final clusters are obtained by merging similar components of the GMM.
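The two-step procedure could be sketched as follows, using scikit-learn. The abstract does not state how component similarity is measured, so this sketch assumes, purely for illustration, that components are merged by average-linkage agglomerative clustering of their means. The function name, the diagonal covariance type and the component counts are likewise assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering

def cluster_and_merge(features, n_components=8, n_final=3, seed=0):
    """Fit a GMM, then merge similar components into final clusters.

    The merging criterion (distance between component means) is an
    assumption; the study may use a different similarity measure.
    """
    # Step 1: cluster the feature set with a Gaussian Mixture Model.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    component_labels = gmm.fit_predict(features)

    # Step 2: group GMM components whose means lie close together.
    merger = AgglomerativeClustering(n_clusters=n_final, linkage="average")
    component_to_cluster = merger.fit_predict(gmm.means_)

    # Map every sample from its GMM component to its final cluster.
    final_labels = component_to_cluster[component_labels]
    return final_labels, component_to_cluster, gmm
```

Fitting more components than the expected number of signal types lets the mixture approximate non-Gaussian feature distributions, while the merging step reduces the result to a smaller set of interpretable clusters.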
A key advantage of this method is that each final cluster represents a specific frequency distribution and can therefore be turned into a filter. While many clustering approaches only assign a list of labels or cluster memberships to the data, our method provides the ability to directly extract the characteristic seismic signals for each cluster. This helps greatly with cluster interpretation and can also be useful for further applications like event detection or denoising.
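One way such a filter could work is sketched below: a cluster's characteristic amplitude spectrum is normalized to a mask between 0 and 1 and applied to the data in the frequency-wavenumber domain. This is a hypothetical construction, not the filter design used in the study; the function name and the normalization by the maximum are assumptions.

```python
import numpy as np

def apply_cluster_filter(patch, cluster_spectrum):
    """Filter a (time x channel) patch with a mask built from a
    cluster's non-negative amplitude spectrum.

    `cluster_spectrum` must have shape (n_t // 2 + 1, n_x), matching
    the unshifted output layout of np.fft.rfft2(patch, axes=(1, 0)).
    """
    peak = cluster_spectrum.max()
    if peak <= 0:
        raise ValueError("cluster_spectrum must contain a positive value")
    # Assumed normalization: scale the spectrum to a mask in [0, 1].
    mask = cluster_spectrum / peak
    n_t, n_x = patch.shape
    # Forward transform: real FFT along time, complex FFT along space.
    spectrum = np.fft.rfft2(patch, axes=(1, 0))
    # Weight each frequency-wavenumber bin and transform back.
    return np.fft.irfft2(spectrum * mask, s=(n_x, n_t), axes=(1, 0))
```

Applied to a window of data, the mask passes the frequency-wavenumber content that is typical for the cluster and suppresses the rest, so the output approximates the waveform contribution associated with that cluster.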
The proposed procedure is applied to several large DAS datasets, yielding a variety of clusters. By filtering the data for each cluster and interpreting the resulting waveforms together with the long-term spatiotemporal amplitude patterns, sources such as traffic or machinery can be identified.
How to cite: Bölt, O., Hammer, C., and Hadziioannou, C.: Clustering of Large Distributed Acoustic Sensing Datasets, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9174, https://doi.org/10.5194/egusphere-egu26-9174, 2026.