EGU26-21643, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-21643
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 06 May, 08:30–10:15 (CEST), Display time Wednesday, 06 May, 08:30–12:30 | Hall X1, X1.41
Explainable Machine Learning for diagnosing Data Quality Issues in Dendrometer-Based Tree Growth Time Series
Valentina Disarlo¹, Anahid Wachsenegger², Jasmin Lampert³, and Anita Zolles⁴
  • ¹ ISTA, Klosterneuburg, Austria and AIT, Data Science & AI, Vienna, Austria (Valentina.Disarlo@ist.ac.at)
  • ² Austrian Institute of Technology, Data Science & AI, Vienna, Austria (Anahid.Wachsenegger@ait.ac.at)
  • ³ Austrian Institute of Technology, Data Science & AI, Vienna, Austria (Jasmin.Lampert@ait.ac.at)
  • ⁴ Bundesforschungs- und Ausbildungszentrum für Wald, Naturgefahren und Landschaft, Vienna, Austria (anita.zolles@bfw.gv.at)

High-frequency dendrometer measurements provide valuable insights into short-term and seasonal tree growth dynamics, enabling detailed analyses of forest responses to climatic variability. In practice, however, their use is strongly limited by data quality issues. Sensor freezing during cold periods, signal drift, data gaps, and site-specific artefacts introduce substantial noise and uncertainty into dendrometer time series. These effects persist even after expert-based corrections and undermine the standard assumption that reliable ground-truth observations are available.

In this study, we investigate how data quality and model performance can be evaluated when both predictions and reference measurements are affected by uncertainty. We analyse multi-year, hourly dendrometer records of individual tree radial growth collected at forest monitoring sites in Austria, combined with in-situ environmental variables such as air temperature, precipitation, and soil moisture. As a modelling baseline, we employ statistically grounded time-series approaches, including exponential smoothing and seasonal autoregressive integrated moving average models with exogenous variables (SARIMAX). Lagged environmental predictors are incorporated to capture delayed physiological responses of trees to climatic drivers and to reflect the strong temporal dependencies present in the data.
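
As a minimal illustration of the lagged-predictor idea (a sketch under stated assumptions, not the authors' implementation), the following Python builds a design matrix in which each growth value is paired with environmental values observed a fixed number of steps earlier. The function name `build_lagged_rows`, the variable names, the lag choices, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_lagged_rows(
    target: Sequence[float],
    drivers: Dict[str, Sequence[float]],
    lags: Sequence[int],
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Pair each target value with driver values observed `lag` steps earlier.

    Rows start at index max(lags) so that every lagged value exists;
    nothing is padded or imputed.
    """
    if not lags or min(lags) < 0:
        raise ValueError("lags must be a non-empty sequence of non-negative integers")
    n = len(target)
    if any(len(series) != n for series in drivers.values()):
        raise ValueError("every driver series must have the same length as the target")

    # Column order: all lags of the first driver, then all lags of the next one.
    names = [f"{name}_lag{lag}" for name in drivers for lag in lags]
    rows: List[List[float]] = []
    y: List[float] = []
    for t in range(max(lags), n):
        rows.append([drivers[name][t - lag] for name in drivers for lag in lags])
        y.append(target[t])
    return names, rows, y


# Toy hourly values, made up purely for illustration.
radius_um = [10.0, 10.5, 11.0, 10.8, 11.4, 12.0]
env = {
    "air_temp": [2.0, 3.0, 5.0, 4.0, 6.0, 7.0],
    "soil_moisture": [0.30, 0.31, 0.29, 0.28, 0.33, 0.35],
}
names, X, y = build_lagged_rows(radius_um, env, lags=[1, 3])
```

Rows built this way could be passed as the exogenous regressors of a SARIMAX model, for example through the `exog` argument of `SARIMAX` in statsmodels; which lags and drivers are appropriate is a modelling decision that this sketch does not address.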

Rather than focusing exclusively on predictive accuracy, we emphasise diagnosing data reliability and understanding how observational noise propagates through time-series models. We show that classical evaluation metrics are often insufficient when the target variable itself is noisy or partially unreliable. To address this, we adapt anomaly detection concepts to the specific characteristics of dendrometer data, developing season-aware diagnostics that help identify implausible growth patterns, abrupt regime changes, and periods of degraded sensor performance while preserving biologically meaningful variability.
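
One simple way to make an anomaly score season-aware is sketched below. It is an assumption-laden illustration, not the diagnostics developed in this study: hourly growth increments are scored with a robust (median/MAD) modified z-score computed separately per meteorological season, so that a value that is ordinary in summer can still be flagged in winter. The season mapping, the conventional 3.5 threshold, and the toy data are assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple

# Meteorological seasons for the Northern Hemisphere (an assumption of this sketch).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def flag_by_season(
    increments: Sequence[float],
    months: Sequence[int],
    threshold: float = 3.5,
) -> List[bool]:
    """Flag increments whose modified z-score within their own season exceeds `threshold`."""
    if len(increments) != len(months):
        raise ValueError("increments and months must have the same length")

    groups: Dict[str, List[float]] = defaultdict(list)
    for value, month in zip(increments, months):
        groups[SEASON_BY_MONTH[month]].append(value)

    # Per-season median and median absolute deviation (MAD).
    stats: Dict[str, Tuple[float, float]] = {}
    for season, values in groups.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        stats[season] = (med, mad)

    flags: List[bool] = []
    for value, month in zip(increments, months):
        med, mad = stats[SEASON_BY_MONTH[month]]
        if mad == 0.0:
            flags.append(False)  # no spread in this season, so no score is computed
            continue
        score = 0.6745 * (value - med) / mad  # modified z-score
        flags.append(abs(score) > threshold)
    return flags


# Toy data: six summer (July) and six winter (January) hourly increments.
summer = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0]
winter = [0.0, 0.1, -0.1, 0.05, -0.05, 1.0]
flags = flag_by_season(summer + winter, [7] * 6 + [1] * 6)
```

In this toy series an increment of 1.0 passes unflagged in summer but is flagged in winter, which is the behaviour a single global threshold cannot reproduce.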

In addition, we explore how model-based explanations can serve as a diagnostic aid for data quality assessment. Feature attribution analyses computed over multi-lag input structures are used to examine when model behaviour is driven by consistent environmental signals and when it becomes unstable or difficult to interpret. Rather than treating explainability as an end in itself, we use these attribution patterns as indicators of potential data issues, such as sensor artefacts or inconsistent environmental responses, that warrant closer expert inspection.
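
To make the idea of attribution patterns as a warning signal concrete, the sketch below uses the simplest possible case, a linear model over lagged features, where the attribution of feature j is w_j (x_j − b_j) relative to a baseline b and the attributions sum exactly to the difference between the prediction and the baseline prediction. The abstract does not state which attribution method or instability measure is used; the weights, baseline, feature names, and the share-shift indicator here are hypothetical.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], intercept: float, row: Sequence[float]) -> float:
    """Prediction of a linear model."""
    return intercept + sum(w * x for w, x in zip(weights, row))


def linear_attributions(
    weights: Sequence[float], row: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Per-feature contribution w_j * (x_j - b_j) relative to a baseline input."""
    return [w * (x - b) for w, x, b in zip(weights, row, baseline)]


def attribution_shares(attributions: Sequence[float]) -> List[float]:
    """Fraction of the total absolute attribution carried by each feature."""
    total = sum(abs(a) for a in attributions)
    if total == 0.0:
        return [0.0 for _ in attributions]
    return [abs(a) / total for a in attributions]


def share_shift(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Half the L1 distance between two share vectors (0 means identical)."""
    return 0.5 * sum(abs(p - c) for p, c in zip(prev, curr))


# Hypothetical fitted model over three lagged features.
names = ["air_temp_lag1", "air_temp_lag24", "soil_moisture_lag1"]
weights = [0.5, 0.2, 4.0]
intercept = 1.0
baseline = [5.0, 5.0, 0.30]  # e.g. typical values of each feature

# Toy inputs: the last row contains an abrupt soil-moisture jump.
rows = [
    [6.0, 5.0, 0.30],
    [7.0, 5.0, 0.30],
    [6.0, 5.0, 0.90],
]

attributions = [linear_attributions(weights, row, baseline) for row in rows]
shares = [attribution_shares(a) for a in attributions]
shifts = [share_shift(shares[i - 1], shares[i]) for i in range(1, len(shares))]
```

Here the first two time steps are explained entirely by the same temperature lag, so the shift between them is zero, while the third step is suddenly dominated by soil moisture. A large shift does not by itself indicate a sensor fault; it only marks a period that may merit expert inspection.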

The combined use of anomaly-aware diagnostics and explanation-informed analysis provides complementary perspectives on uncertainty and noise in high-frequency ecological time series. Our results highlight the importance of data-centric evaluation strategies for tree growth modelling and demonstrate that interpretable statistical baselines remain essential tools when working with noisy environmental sensor data. The proposed framework supports more robust and transparent downstream applications, including growth forecasting, stress detection, and climate impact assessment under increasing climatic variability.

How to cite: Disarlo, V., Wachsenegger, A., Lampert, J., and Zolles, A.: Explainable Machine Learning for diagnosing Data Quality Issues in Dendrometer-Based Tree Growth Time Series, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21643, https://doi.org/10.5194/egusphere-egu26-21643, 2026.