EGU26-14861, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-14861
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 08 May, 10:45–12:30 (CEST), Display time Friday, 08 May, 08:30–12:30 | Hall X2, X2.8
How comparable are geochemical datasets really and why it matters
Marie Katrine Traun¹, Axel D. Renno², Leander Kallas¹, Matthias Willbold¹, Tod Waight³, Dieter Garbe-Schönberg⁴, and Gerhard Wörner¹
  • ¹Geochemistry and Isotope Geology, Geoscience Centre, University of Göttingen, Göttingen, Germany
  • ²Helmholtz-Zentrum Dresden-Rossendorf, Germany
  • ³Department of Geosciences and Natural Resource Management, University of Copenhagen, Copenhagen, Denmark
  • ⁴Institute of Geosciences, Kiel University, Kiel, Germany

In geochemistry, big data applications at both global and regional scales rely on compiling and aggregating numerous smaller datasets into one large dataset. Geochemical data compilations are therefore vulnerable to systematic biases between their constituent datasets. These biases relate to analytical methods and procedures, laboratories, instruments, sample preparation, detection limits, mass interferences, sample matrices, etc. In other words, the preparation and analysis of each sample batch is unique in ways that go beyond a “method” or “laboratory” bias. We define this uniqueness, and the offsets it may cause between analytical batches, as inter-study bias. The only quantitative way to evaluate inter-study bias in geochemical data compilations is through metadata, specifically by assessing the analytical data reported for geochemical reference materials. Reference materials are substances of known composition that are measured alongside unknown samples, as is standard good practice in routine geochemical analysis. In an ideal world, all geochemical studies would report their reference material analyses, and all analytical methods would be refined and calibrated so that measured values match the reference material’s certified value within uncertainty. Only in this case can inter-study bias be considered negligible. Most geochemical big data compilations rest on this assumption and do not explicitly assess the metadata for potential inter-study bias. In the real world, analyses are often not perfectly calibrated and metadata are seldom reported.
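As a rough illustration of the check described above, the following Python sketch compares the value each study reports for a reference material with the certified value. It computes the relative offset and tests whether the two agree within a combined uncertainty. The class and function names, the coverage factor `k = 2`, and all numbers are assumptions made for illustration; they are not taken from the authors' tools or from any real reference material.

```python
from dataclasses import dataclass


@dataclass
class ReferenceMeasurement:
    study: str          # hypothetical study label
    measured: float     # value the study reports for the reference material
    uncertainty: float  # reported uncertainty, same units as `measured`


def relative_bias_percent(measured: float, certified: float) -> float:
    """Relative offset of a measured value from the certified value, in percent."""
    return 100.0 * (measured - certified) / certified


def agrees_within_uncertainty(measured, u_measured, certified, u_certified, k=2.0):
    """True if the offset is no larger than k times the combined uncertainty.

    The uncertainties are combined in quadrature, which assumes they are
    independent and expressed at the same confidence level.
    """
    combined = (u_measured ** 2 + u_certified ** 2) ** 0.5
    return abs(measured - certified) <= k * combined


# Illustrative numbers only; not real certified or published values.
CERTIFIED, U_CERTIFIED = 100.0, 1.0
studies = [
    ReferenceMeasurement("study_A", 100.5, 1.0),
    ReferenceMeasurement("study_B", 106.0, 1.5),
]

# Map each study to (relative bias in percent, agreement flag).
report = {
    s.study: (
        round(relative_bias_percent(s.measured, CERTIFIED), 2),
        agrees_within_uncertainty(s.measured, s.uncertainty, CERTIFIED, U_CERTIFIED),
    )
    for s in studies
}
print(report)
```

In this made-up case, `study_B` is offset by 6 % and falls outside the combined uncertainty, so merging its data with `study_A` without correction would carry an inter-study bias into the compilation.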

To assess the comparability, compilability and inter-study bias of geochemical datasets, we have developed several data quality and outlier-detection tools based on the Geological and Environmental Reference Materials database (GeoReM). We use these tools to showcase the implications of inter-study bias for global geochemical interpretation models, using two well-known big data research topics in geochemistry: 1) the identification of compositional end-members for oceanic basalts and the origin of their mantle source components (colloquially called “the mantle zoo”), and 2) compositional signatures of zircons as tracers for the growth, reworking and evolution of the continental crust. Our take-home message: geochemical datasets must be comparable to be compilable. We therefore advocate assessing inter-study bias and comprehensively reporting metadata and reference material analyses, so that computational geochemistry can progress as a subdiscipline of big data science.
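The abstract does not describe how the outlier-detection tools work. As one generic possibility, and not necessarily the authors' approach, the sketch below flags values reported for a single reference material that deviate strongly from the median of all reported values. It uses a robust z-score based on the median absolute deviation (MAD). The function names, the threshold of 3.5, and the input values are assumptions for illustration only.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores using the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values equal the median, so the MAD cannot
        # scale the deviations; this sketch then reports no deviation at all.
        return [0.0 for _ in values]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for normal data
    return [(v - med) / scale for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value; True marks a candidate outlier."""
    return [abs(z) > threshold for z in robust_z_scores(values)]


# Illustrative values for one element in one reference material, one per study.
reported = [10.0, 10.2, 9.9, 10.1, 12.5]
flags = flag_outliers(reported)
print(flags)
```

A flagged value is only a candidate for closer inspection of the study's metadata; whether it reflects a real analytical offset still has to be judged from the reported methods.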

Keywords: Reference material, GeoReM, outlier, metadata, method bias, isotopes, mantle geochemistry, zircon, GEOROC, crustal evolution

How to cite: Traun, M. K., Renno, A. D., Kallas, L., Willbold, M., Waight, T., Garbe-Schönberg, D., and Wörner, G.: How comparable are geochemical datasets really and why it matters, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14861, https://doi.org/10.5194/egusphere-egu26-14861, 2026.