EGU General Assembly 2021
© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

When does gap filling of trait data confound taxonomic and functional analyses? 

Julia Joswig1,2, Jens Kattge2,3, Guido Kraemer2,3,4, Miguel Mahecha2,3,4, Nadja Rüger3,5,6, Michael Schaepman1, Franziska Schrodt7, Christian Wirth2,3,8, and Meredith Schuman1,9
  • 1University of Zürich (UZH), Remote Sensing, Geography, Zürich, Switzerland
  • 2Max Planck Institute for Biogeochemistry, Jena, Germany
  • 3German Centre for Integrative Biodiversity Research (iDiv) Leipzig, Germany
  • 4Remote Sensing Centre for Earth System Research, University of Leipzig and Helmholtz Centre for Environmental Research, Leipzig, Germany
  • 5Department of Economics, University of Leipzig, Leipzig, Germany
  • 6Smithsonian Tropical Research Institute, Ancon, Panama
  • 7School of Geography, University of Nottingham, University Park, Nottingham, UK
  • 8Institute of Systematic Botany and Functional Biodiversity, University of Leipzig, Leipzig, Germany
  • 9Department of Chemistry, University of Zürich, Zürich, Switzerland

Data on plant traits are increasingly used to understand relationships between biodiversity and ecosystem processes. Large trait databases are sparse because they are compiled from many smaller and usually more local databases. This sparsity severely limits the potential for both multivariate and global data analyses, and so "gap-filling" (imputation) approaches are commonly used to predict missing trait data prior to analysis. Data imputation can result in large biases and circularity, yet no best practice has evolved for the appropriate use of gap-filled data. Here, we use the TRY database, the largest global database of plant traits, in combination with the commonly used gap-filling algorithm Bayesian Hierarchical Probabilistic Matrix Factorization (BHPMF) to address opportunities and problems introduced by gap-filling. BHPMF is the gap-filling method of choice for both TRY and the large and widely used database sPLOT. It predicts missing trait data using the taxonomic hierarchy and observed patterns of trait variance and trait-trait correlations. We use three metrics: root mean square error to estimate prediction error, the coefficient of variation to assess univariate deviation, and silhouette indices to assess multivariate deviation and clustering strength. We show that gap-filling causes these metrics to deviate when they are calculated for groupings at lower taxonomic levels (within species and within genera), but less so at higher taxonomic levels (family) and for functional groups. Trait-trait correlations are preserved at all levels. The strength of the deviations depends both on the percentage of gaps and on data characteristics, e.g. intra-taxon variability. Gap-filling with dataset-external trait data generally reduces prediction error, but the deviations of intra-taxonomic variation measures depend on the content of the added data.
We conclude that BHPMF gap-filling introduces little bias if used specifically for analyses of traits within functional groups, including growth forms and plant functional types (PFTs), and for analyses of trait-trait correlations. However, we generally discourage the use of BHPMF gap-filled data for analyses of taxonomic groupings at or below the family level. In summary, our study supports decisions on when and how to integrate BHPMF gap-filled trait data in future studies. We conclude with selected best practices for using sparse databases.
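
The three metrics named above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the metrics were computed, so the function names, the use of the population standard deviation, the Euclidean distance, and the toy numbers below are all assumptions made for illustration. The "filled" values are invented and simply mimic an imputation that pulls every value to its group mean; they are not results from the study.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def cv_by_group(values, groups):
    """Coefficient of variation (population SD / mean) within each group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    return {g: pstdev(vs) / mean(vs) for g, vs in by_group.items()}


def mean_silhouette(points, groups):
    """Mean silhouette width (Euclidean); assumes at least two groups."""
    widths = []
    for i, (p, g) in enumerate(zip(points, groups)):
        own, other = [], defaultdict(list)
        for j, (q, h) in enumerate(zip(points, groups)):
            if j == i:
                continue
            (own if h == g else other[h]).append(math.dist(p, q))
        if not own:  # singleton group: silhouette width defined as 0
            widths.append(0.0)
            continue
        a = mean(own)                              # mean distance within own group
        b = min(mean(d) for d in other.values())   # nearest other group
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(widths)


# Hypothetical toy data: one trait, two taxa "A" and "B".
groups = ["A", "A", "B", "B"]
observed = [10.0, 12.0, 20.0, 22.0]
filled = [11.0, 11.0, 21.0, 21.0]  # invented: every value set to its group mean

error = rmse(observed, filled)
cv_observed = cv_by_group(observed, groups)
cv_filled = cv_by_group(filled, groups)
sil_observed = mean_silhouette([(v,) for v in observed], groups)
sil_filled = mean_silhouette([(v,) for v in filled], groups)
```

In this toy case the prediction error is modest, yet the within-group coefficient of variation collapses to zero and the silhouette width rises, which is the kind of deviation at low taxonomic levels that the abstract describes.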

How to cite: Joswig, J., Kattge, J., Kraemer, G., Mahecha, M., Rüger, N., Schaepman, M., Schrodt, F., Wirth, C., and Schuman, M.: When does gap filling of trait data confound taxonomic and functional analyses?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13954, 2021.

