- ¹Geochemistry and Isotope Geology, Geoscience Centre, University of Göttingen, Göttingen, Germany
- ²Helmholtz-Zentrum, Dresden-Rossendorf, Germany
- ³Department of Geosciences and Natural Resource Management, University of Copenhagen, Copenhagen, Denmark
- ⁴Institute of Geosciences, Kiel University, Kiel, Germany
In geochemistry, big data applications at both global and regional scales rely on compiling and aggregating numerous smaller datasets into one large dataset. Geochemical data compilations are therefore vulnerable to systematic biases between their constituent datasets. These biases relate to analytical methods and procedures, laboratories, instruments, sample preparation, detection limits, mass interferences, sample matrices, etc. In other words, the preparation and analysis of each sample batch is unique in ways that go beyond a simple “method” or “laboratory” bias. We define this uniqueness, and the potential offsets it may cause between analytical batches, as inter-study bias. The only quantitative way to evaluate inter-study bias in geochemical data compilations is through metadata, specifically by assessing the analytical data reported for geochemical reference materials. Reference materials are substances of known composition that are measured alongside unknown samples, as is standard good practice in routine geochemical analysis. In an ideal world, all geochemical studies would report their reference material analyses, and all analytical methods would be refined and calibrated so that measured values match the reference material’s certified value within uncertainty. Only in this case can inter-study bias be considered negligible. Most geochemical big data compilations implicitly assume this ideal case and do not explicitly assess the metadata for potential inter-study bias. In the real world, analyses are often not perfectly calibrated and metadata are only rarely reported.
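As an illustration of the reference material check described above, the following is a minimal sketch, not the tools presented in this contribution. It assumes each study reports one measured value and a standard uncertainty for a shared reference material, and it uses a zeta score (the measured-minus-certified difference divided by the combined uncertainty) with a threshold of 2 as one possible criterion for "within uncertainty". The class and function names, the threshold, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ReferenceAnalysis:
    """One study's measurement of a shared reference material (hypothetical layout)."""
    study: str
    measured: float      # measured concentration or ratio
    measured_unc: float  # standard uncertainty (1 sigma), same units as measured


def zeta_score(measured, measured_unc, certified, certified_unc):
    """Difference from the certified value, scaled by the combined standard uncertainty."""
    return (measured - certified) / sqrt(measured_unc**2 + certified_unc**2)


def relative_bias_percent(measured, certified):
    """Offset of the measured value from the certified value, in percent."""
    return 100.0 * (measured - certified) / certified


def flag_biased_studies(analyses, certified, certified_unc, threshold=2.0):
    """Return {study: relative bias in %} for studies whose |zeta| exceeds the threshold.

    The threshold of 2 is an assumption chosen for illustration only.
    """
    flagged = {}
    for a in analyses:
        z = zeta_score(a.measured, a.measured_unc, certified, certified_unc)
        if abs(z) > threshold:
            flagged[a.study] = relative_bias_percent(a.measured, certified)
    return flagged


# Invented values for a hypothetical reference material with a
# certified value of 100.0 and a standard uncertainty of 1.0.
analyses = [
    ReferenceAnalysis("study_A", 100.5, 1.0),
    ReferenceAnalysis("study_B", 108.0, 1.5),
    ReferenceAnalysis("study_C", 97.0, 2.0),
]

flagged = flag_biased_studies(analyses, certified=100.0, certified_unc=1.0)
for study, bias in flagged.items():
    print(f"{study}: relative bias {bias:+.1f} %")
```

With these invented numbers only `study_B` exceeds the threshold. In a compilation, a study flagged in this way would be a candidate for closer inspection or correction before its data are merged with the others.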
To assess the comparability, compilability and inter-study bias between geochemical datasets, we have developed several data quality and outlier-detection tools based on the Geological and Environmental Reference Material database (GeoReM). We use these tools to showcase the implications of inter-study bias for global geochemical interpretation models, using two well-known geochemical big data research topics: 1) the identification of compositional end-members for oceanic basalts and the origin of their source mantle components (colloquially called “the mantle zoo”) and 2) the compositional signatures of zircons as tracers for the growth, reworking and evolution of the continental crust. Our take-home message: geochemical datasets must be comparable to be compilable. We therefore advocate assessing inter-study bias and comprehensively reporting metadata and reference material analyses, so that computational geochemistry can progress as a subdiscipline of big data science.
Keywords: Reference material, GeoReM, outlier, metadata, method bias, isotopes, mantle geochemistry, zircon, GEOROC, crustal evolution
How to cite: Traun, M. K., Renno, A. D., Kallas, L., Willbold, M., Waight, T., Garbe-Schönberg, D., and Wörner, G.: How comparable are geochemical datasets really and why it matters, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14861, https://doi.org/10.5194/egusphere-egu26-14861, 2026.