EGU26-11636, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-11636
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 14:45–14:55 (CEST)
Room 2.31
Unsupervised pattern recognition for imperfect datasets: a visual workflow for plausibility checks and regime diagnosis in high-dimensional environmental time series
Kenneth Gutiérrez (1,2), Gunnar Lischeid (1,2), Gökben Demir (1), Maren Dubbert (1), Alexander Knohl (3,4), and Christian Markwitz (3)
  • (1) Leibniz Centre for Agricultural Landscape Research (ZALF), Research Area "Data Analysis and Simulation", Müncheberg, Germany (kenneth.gutierrez-garcia@zalf.de)
  • (2) Institute of Environmental Sciences and Geography, University of Potsdam, Potsdam, Germany
  • (3) Bioclimatology, Faculty of Forest Sciences and Forest Ecology, University of Göttingen, Göttingen, Germany
  • (4) Centre of Biodiversity and Sustainable Land Use, University of Göttingen, Göttingen, Germany

Data imperfection, in the form of fragmentation, sensor failures, and high-dimensional noise, remains a persistent challenge in environmental monitoring. As observation networks expand to capture heterogeneous soil-atmosphere interactions, traditional quality control methods based on rigid statistical thresholds often struggle to distinguish sensor errors from genuine, non-linear system dynamics. This study presents a method for extracting knowledge from imperfect and fragmented data, using a multivariate visualization workflow that combines Principal Component Analysis (PCA) and Self-Organizing Maps (SOM) with Sammon Mapping.

We applied this unsupervised learning approach to a high-dimensional dataset (~100 variables) from a field-scale agricultural system. The measurements include soil moisture and temperature, eddy covariance-derived CO2 and energy fluxes, radiation, wind, precipitation, groundwater level, and discharge.

The dataset allowed us to compare a discontinuous period in 2024 with a continuous period in 2025. The results demonstrate that the method extracts coherent structural patterns robustly despite data incompleteness. While PCA effectively separated the dominant thermodynamic baselines from high-frequency hydrologic events, the topological SOM projection provided a rapid, visual plausibility check.
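The PCA step can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the function name `pca_scores`, and the row-wise deletion of incomplete records are assumptions made for illustration only, since the abstract does not state how gaps were handled.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Standardize columns, drop incomplete rows, project onto leading PCs.

    Row-wise deletion of gaps is an assumption of this sketch.
    """
    data = np.asarray(data, dtype=float)
    complete = ~np.isnan(data).any(axis=1)        # rows without missing values
    x = data[complete]
    x = (x - x.mean(axis=0)) / x.std(axis=0)      # z-score each variable
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fraction per component
    scores = x @ vt[:n_components].T              # time series of component scores
    return scores, explained[:n_components], complete

# Hypothetical data: four variables sharing one slow signal, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 500)
slow = np.sin(t)
data = np.column_stack(
    [w * slow + 0.1 * rng.standard_normal(t.size) for w in (1.0, 0.8, -0.9, 0.7)]
)
data[100:150, 2] = np.nan                         # simulated sensor outage

scores, explained, complete = pca_scores(data)
```

In this toy case the first component tracks the shared slow signal, which is analogous to (but far simpler than) a dominant baseline in field data; the remaining components carry the residual variability.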

The visualization revealed possible sensor irregularities as spatial outliers in the 2024 dataset, enabling immediate anomaly detection without manual time-series inspection. Furthermore, the method captured shifts in system dynamics, such as the decoupling of surface moisture from groundwater, which supports its utility for identifying physical regimes in heterogeneous data. We conclude that this visual workflow offers a scalable, data-driven way to move from raw, imperfect observations toward actionable system diagnostics, bridging the gap between data acquisition and process understanding in complex environmental observatories.
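One way to picture how a SOM can expose suspicious records is sketched below. It is a hedged illustration, not the workflow used in the study: the small online SOM, the function names `train_som` and `quantization_error`, the grid size, the learning schedule, and the use of quantization error as an outlier score are all assumptions, and the Sammon Mapping step is omitted entirely.

```python
import numpy as np

def train_som(x, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every map unit, shape (rows * cols, 2).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the codebook from randomly chosen samples.
    weights = x[rng.integers(0, len(x), rows * cols)].copy()
    for i in range(n_iter):
        frac = i / n_iter
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5        # shrinking neighborhood width
        sample = x[rng.integers(0, len(x))]
        bmu = np.argmin(np.sum((weights - sample) ** 2, axis=1))  # best-matching unit
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma**2))         # Gaussian neighborhood on the grid
        weights += lr * h[:, None] * (sample - weights)
    return weights

def quantization_error(x, weights):
    """Distance from each record to its best-matching unit."""
    d = np.linalg.norm(x[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1)

# Hypothetical data: a reference period and a period with one faulty record.
rng = np.random.default_rng(1)
reference = rng.standard_normal((400, 5))
som = train_som(reference)

check = rng.standard_normal((50, 5))
check[10] += 15.0                                  # simulated sensor fault
qe = quantization_error(check, som)
suspect = int(np.argmax(qe))                       # index of the worst-fitting record
```

Records that sit far from every map unit stand out through a large quantization error, which is one simple numerical counterpart to spotting outliers visually on the projected map.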

How to cite: Gutiérrez, K., Lischeid, G., Demir, G., Dubbert, M., Knohl, A., and Markwitz, C.: Unsupervised pattern recognition for imperfect datasets: a visual workflow for plausibility checks and regime diagnosis in high-dimensional environmental time series, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11636, https://doi.org/10.5194/egusphere-egu26-11636, 2026.