EGU26-14633, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-14633
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Thursday, 07 May, 10:45–12:30 (CEST), Display time Thursday, 07 May, 08:30–12:30
 
Hall A, A.35
Composing Transparent Quality Control Pipelines from Basic Anomaly Descriptions
Peter Lünenschloß, David Schaefer, and Jan Bumberger
  • Helmholtz Centre for Environmental Research, Leipzig, Germany (peter.luenenschloss@ufz.de)

Quality control (QC) and data cleaning remain major bottlenecks in geoscientific data analysis as data volumes, dimensionality, and heterogeneity continue to increase. While machine- and deep-learning-based approaches have demonstrated impressive performance in selected applications, their practical adoption is often limited by the scarcity of sufficiently large labelled training datasets and by the effort required to calibrate and adapt model hyperparameters across datasets and domains, particularly in unsupervised flagging scenarios. Conversely, rule-based, deterministic, and statistical QC approaches offer greater transparency and interpretability, but they are frequently tailored to specific data structures and lack the flexibility required to generalise robustly to varying observational contexts and non-ideal data distributions.

We present a software framework that addresses this gap by enabling the formulation of QC pipelines in terms of a small set of basic anomaly descriptions, such as outliers, noisy regimes, and data gaps. These anomaly notions are intuitively understood by domain experts, while their systematic combination allows the representation of a wide range of anomaly patterns encountered in geoscientific observations.
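The idea of composing a pipeline from basic anomaly notions can be illustrated with a minimal sketch. The detector and combinator names below are illustrative placeholders, not the framework's actual API: each basic description maps a series to boolean flags, and a simple union combinator builds a composite flagger from them.

```python
def flag_outliers(values, z_max=3.0):
    """Basic anomaly description: global z-score outliers (missing values ignored)."""
    obs = [v for v in values if v is not None]
    mean = sum(obs) / len(obs)
    std = (sum((v - mean) ** 2 for v in obs) / len(obs)) ** 0.5 or 1.0
    return [v is not None and abs(v - mean) / std > z_max for v in values]

def flag_gaps(values, gap_value=None):
    """Basic anomaly description: missing readings (encoded here as None)."""
    return [v is gap_value for v in values]

def compose(*detectors):
    """Combine basic descriptions: a point is flagged if any detector flags it."""
    def pipeline(values):
        flags = [d(values) for d in detectors]
        return [any(col) for col in zip(*flags)]
    return pipeline
```

For example, `compose(flag_outliers, flag_gaps)` yields a single flagger covering two distinct anomaly patterns; richer patterns arise from combining more descriptions in the same way.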

The parameters of these compositions are then automatically calibrated with the data at hand, resulting in an instantiated QC pipeline. By internally reducing the calibration problem to the fitting of individual anomaly descriptions defined by only a small number of well-understood parameters, the optimisation achieves robust convergence even with a limited number of supervised examples. Within the framework, such examples can be generated interactively during pipeline construction by domain specialists themselves or imported from existing sources. This design lowers the entry barrier for effective automated quality control while enabling the explicit integration of domain knowledge into the calibration process.
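The reduction of calibration to fitting a small number of well-understood parameters can be sketched for the one-parameter case. The functions below are a hypothetical illustration, not the framework's optimiser: a grid search picks the parameter value that best reproduces a handful of expert-provided labels.

```python
def flag_above(values, threshold):
    """A minimal anomaly description parameterised by one scalar threshold."""
    return [v > threshold for v in values]

def calibrate_threshold(values, labels, detector, candidates):
    """Select the detector parameter that best agrees with supervised labels.

    values:     the observations
    labels:     expert-provided anomaly flags (True = anomalous)
    detector:   a basic anomaly description with one free parameter
    candidates: the grid of parameter values to try
    """
    def agreement(param):
        flags = detector(values, param)
        return sum(f == l for f, l in zip(flags, labels))
    return max(candidates, key=agreement)
```

Because each fitted description has only one or a few interpretable parameters, even a handful of labelled points constrains the search well, which is what makes the interactive labelling workflow practical.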

The framework is implemented as a new module within the open-source quality-control software SaQC, thereby integrating seamlessly with existing data import, preprocessing, and flag management workflows. Calibrated QC pipelines can be exported and stored as portable, human-readable configuration files in a tabular format. These configurations can subsequently be loaded and applied using the SaQC application to new and unseen datasets, enabling reproducible and automated quality control.
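An exported configuration might look like the following fragment; the variable name and the specific function calls shown here are illustrative placeholders rather than a verbatim SaQC configuration, but they convey the human-readable, row-per-test tabular form:

```
varname ; test
level   ; flagRange(min=0.0, max=12.5)
level   ; flagZScore(window="1D", thresh=3.5)
level   ; flagConstants(thresh=0.01, window="6h")
```

Each row applies one flagging step to one variable, so a calibrated pipeline can be inspected, versioned, and re-applied to new datasets without re-running the calibration.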

In the poster, we present the conceptual design of the framework and demonstrate its application to a hydrological dataset, highlighting the transparent, combinatorial configuration interface and the integrated supervision workflow.

 

How to cite: Lünenschloß, P., Schaefer, D., and Bumberger, J.: Composing Transparent Quality Control Pipelines from Basic Anomaly Descriptions, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14633, https://doi.org/10.5194/egusphere-egu26-14633, 2026.