- 1CNRS ITES UMR7063 / FEBUS Optics, FRANCE (camille.huynh@febus-optics.com)
- 2Institut Terre et Environnement de Strasbourg (ITES), CNRS UMR 7063 - Université de Strasbourg, 5 rue René Descartes, F-67084 Strasbourg, FRANCE
- 3Ecole et Observatoire des Sciences de la Terre (EOST), CNRS UAR 830 - Université de Strasbourg, 5 rue René Descartes, F-67084 Strasbourg, FRANCE
- 4NORSAR, Gunnar Randers vei 15, 2007 Kjeller, NORWAY
- 5FEBUS Optics, 2 avenue du Président Pierre Angot, 64000 Pau, FRANCE
Distributed Acoustic Sensing (DAS) enables seismic monitoring by transforming fiber optic cables into dense, cost-effective sensor arrays. However, the vast data volume generated by DAS makes labeling difficult, to the point that labeling can take longer than the processing and research it supports. Traditional supervised machine learning methods require extensive manual labeling of individual events, which is both time-consuming and susceptible to user bias.
To address these challenges, we propose a clustering-based approach that groups similar data, allowing for cluster-level labeling rather than event-by-event annotation. Our method employs a two-step processing chain, denoted (a) and (b). In step (a), the data are represented in a latent space defined by hundreds of features. Two approaches for constructing this latent space are explored: one using human-engineered features based on seismological signal processing, and the other leveraging self-supervised learning via the image-BYOL algorithm, which operates on two-dimensional representations of DAS data. Step (b) applies unsupervised clustering: the dataset is first reduced to 5000 clusters using the K-Means partitioning algorithm, and hierarchical clustering with an inconsistency criterion then condenses these into 500–700 interpretable clusters.
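Step (b) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of scikit-learn and SciPy, the Ward linkage, the inconsistency threshold, the depth, and the function name `two_stage_clustering` are all assumptions made for illustration. The sketch partitions the feature vectors with K-Means, builds a hierarchical tree on the K-Means centroids, and cuts that tree with SciPy's inconsistency criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans


def two_stage_clustering(features, n_kmeans=5000, threshold=1.0, depth=2, seed=0):
    """Return one merged-cluster label per row of `features`.

    Hypothetical helper: linkage method, threshold and depth are
    illustrative assumptions, not values taken from the abstract.
    """
    # Stage 1: partition the latent space into many small K-Means clusters.
    kmeans = KMeans(n_clusters=n_kmeans, n_init=10, random_state=seed)
    kmeans_labels = kmeans.fit_predict(features)

    # Stage 2: build a hierarchical tree on the K-Means centroids
    # (Ward linkage is an assumption) ...
    tree = linkage(kmeans.cluster_centers_, method="ward")
    # ... and cut it with the inconsistency criterion: a node's leaves are
    # kept in one flat cluster only if the node and all of its descendants
    # have an inconsistency coefficient <= threshold.
    centroid_labels = fcluster(tree, t=threshold, criterion="inconsistent", depth=depth)

    # Each sample inherits the merged label of its K-Means centroid.
    # fcluster labels start at 1.
    return centroid_labels[kmeans_labels]


if __name__ == "__main__":
    # Small synthetic stand-in for the latent features (not real DAS data).
    rng = np.random.default_rng(0)
    demo = np.vstack([rng.normal(c, 0.1, size=(100, 8)) for c in (0.0, 5.0, 10.0)])
    merged = two_stage_clustering(demo, n_kmeans=30)
    print(len(np.unique(merged)), "merged clusters")
```

With this criterion the number of merged clusters is not fixed in advance; it follows from the threshold, which may be why the abstract reports a range (500–700) rather than a single value.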
This method was applied to two DAS datasets collected in the Hautes-Pyrénées. The first dataset consists of six weeks of continuous measurements along an 800-m cable in Viella, recorded with a sampling frequency of 400 Hz, a gauge length of 10 m, and a channel spacing of 2.4 m. The second dataset consists of 19 ten-minute recordings along a 91-km cable, with a sampling frequency of 200 Hz, a gauge length of 10 m, and a channel spacing of 4.8 m. Using cluster-based labeling on the Viella dataset, we detected 100% of earthquakes with a magnitude Mw > 2.0 and identified the daily periodicity of anthropogenic events, such as those related to farming activities. Continuous, long-duration (>30 s) seismic signals, primarily generated by mechanical farming engines, showed a clear periodicity, whereas impact-driven or impulsive events were less consistent in timing due to their diverse origins.
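To give a sense of scale, the acquisition parameters above can be turned into rough sample counts. The sketch below assumes that the whole cable length is sampled at the stated channel spacing and that recording is gap-free; the helper name `das_sample_count` is hypothetical, and the resulting figures are order-of-magnitude estimates rather than numbers reported by the authors.

```python
def das_sample_count(cable_length_m, channel_spacing_m, sampling_rate_hz, duration_s):
    """Return (number of channels, total number of samples).

    Assumes the full cable is sampled and the recording has no gaps.
    """
    n_channels = int(cable_length_m // channel_spacing_m)
    samples_per_channel = int(sampling_rate_hz * duration_s)
    return n_channels, n_channels * samples_per_channel


# Viella: 800 m cable, 2.4 m spacing, 400 Hz, six weeks of continuous data.
viella = das_sample_count(800, 2.4, 400, 6 * 7 * 24 * 3600)

# Long cable: 91 km, 4.8 m spacing, 200 Hz, 19 recordings of ten minutes.
long_cable = das_sample_count(91_000, 4.8, 200, 19 * 10 * 60)

print("Viella:", viella)
print("91-km cable:", long_cable)
```

Under these assumptions the Viella dataset amounts to about 333 channels and on the order of 5 × 10^11 samples, which illustrates why event-by-event labeling is impractical.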
These findings highlight the potential of clustering techniques to analyze DAS data efficiently, reducing reliance on manual event labeling. Nevertheless, further improvements are necessary to minimize false positives, particularly for smaller seismic events.
How to cite: Huynh, C., Rimpot, J., Hibert, C., Turquet, A., Stangeland, T., Malet, J.-P., and Lanticq, V.: Unsupervised Learning for In-Depth Analysis of Continuous DAS Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19069, https://doi.org/10.5194/egusphere-egu25-19069, 2025.