When does gap filling of trait data confound taxonomic and functional analyses?

Julia Joswig; Jens Kattge; Guido Kraemer; Miguel Mahecha; Nadja Rüger; Michael Schaepman; Franziska Schrodt; Christian Wirth; Meredith Schuman

doi:https://doi.org/10.5194/egusphere-egu21-13954


EGU21-13954


EGU General Assembly 2021

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.


Julia Joswig^1,2, Jens Kattge^2,3, Guido Kraemer^2,3,4, Miguel Mahecha^2,3,4, Nadja Rüger^3,5,6, Michael Schaepman¹, Franziska Schrodt⁷, Christian Wirth^2,3,8, and Meredith Schuman^1,9


¹University of Zürich (UZH), Remote Sensing, Geography, Zürich, Switzerland (julia.joswig@geo.uzh.ch)
²Max Planck Institute for Biogeochemistry, Jena, Germany
³German Centre for Integrative Biodiversity Research (iDiv) Leipzig, Germany
⁴Remote Sensing Centre for Earth System Research, University of Leipzig and Helmholtz Centre for Environmental Research, Leipzig, Germany
⁵Department of Economics, University of Leipzig, Leipzig, Germany
⁶Smithsonian Tropical Research Institute, Ancon, Panama
⁷School of Geography, University of Nottingham, University Park, Nottingham, UK
⁸Institute of Systematic Botany and Functional Biodiversity, University of Leipzig, Leipzig, Germany
⁹Department of Chemistry, University of Zürich, Zürich, Switzerland

Data on plant traits are increasingly used to understand relationships between biodiversity and ecosystem processes. Large trait databases are sparse because they are compiled from many smaller and usually more local databases. This sparsity severely limits the potential for both multivariate and global data analyses, so "gap-filling" (imputation) approaches are commonly used to predict missing trait data prior to analysis. Data imputation can result in large biases and circularity, yet no best practice has evolved for the appropriate use of gap-filled data. Here, we use the TRY database, the largest global database of plant traits, in combination with the commonly used gap-filling algorithm Bayesian Hierarchical Probabilistic Matrix Factorization (BHPMF) to address the opportunities and problems introduced by gap-filling. BHPMF is the gap-filling method of choice for both TRY and the large and widely used database sPLOT. It predicts missing trait data using the taxonomic hierarchy and observed patterns of trait variance and trait-trait correlations. We use three metrics: root mean square error estimates, the coefficient of variation to assess univariate deviation, and silhouette indices to assess multivariate deviation and clustering strength. We show that gap-filling causes these metrics to deviate when they are calculated for groupings at lower taxonomic levels (within species and within genera), but less so at higher taxonomic levels (family) and for functional groups. Trait-trait correlations are preserved at all levels. The strength of the deviations depends both on the percentage of gaps and on data characteristics, e.g. intra-taxon variability. Gap-filling with dataset-external trait data generally reduces prediction error, but the deviations of intra-taxonomic variation measures depend on the content of the added data.
We conclude that BHPMF gap-filling introduces little bias if used specifically for analyses of traits within functional groups, including growth forms and plant functional types (PFTs), and for analyses of trait-trait correlations. However, we generally discourage the use of gap-filled data for analyses of taxonomic groupings at or below the family level. In summary, our study supports decisions on when and how to integrate BHPMF gap-filled trait data in future studies. We conclude with selected best practices for using sparse databases.
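
The three metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not run BHPMF: the function names, the two-species toy values, and the use of Euclidean distance and population standard deviation are all assumptions made for illustration. The "gap-filled" values are hand-made to mimic imputation that pulls values toward the species mean, which is one way within-taxon variation can be distorted.

```python
import math
from statistics import mean, pstdev


def rmse(observed, predicted):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def silhouette(points, labels):
    """Mean silhouette index (Euclidean distance).

    Assumes at least two distinct labels and no fully identical data.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances to the other members of the point's own group.
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:  # singleton group: silhouette defined as 0
            scores.append(0.0)
            continue
        a = mean(same)
        # Smallest mean distance to any other group.
        b = min(mean(math.dist(p, q) for q, l in zip(points, labels) if l == other)
                for other in set(labels) if other != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: one trait, two species, three individuals each.
observed = {"sp_a": [10.0, 12.0, 14.0], "sp_b": [20.0, 22.0, 24.0]}
gap_filled = {"sp_a": [11.0, 12.0, 13.0], "sp_b": [21.0, 22.0, 23.0]}

obs_flat = [v for sp in observed for v in observed[sp]]
fill_flat = [v for sp in observed for v in gap_filled[sp]]
labels = [sp for sp in observed for _ in observed[sp]]

error = rmse(obs_flat, fill_flat)
cv_obs = coefficient_of_variation(observed["sp_a"])
cv_fill = coefficient_of_variation(gap_filled["sp_a"])
sil_obs = silhouette([(v,) for v in obs_flat], labels)
sil_fill = silhouette([(v,) for v in fill_flat], labels)

print(f"RMSE: {error:.3f}")
print(f"CV within sp_a: observed {cv_obs:.3f}, gap-filled {cv_fill:.3f}")
print(f"Silhouette: observed {sil_obs:.3f}, gap-filled {sil_fill:.3f}")
```

In this toy case the within-species coefficient of variation shrinks and the silhouette index rises after the hypothetical gap-filling, even though the RMSE is small. This only illustrates how the metrics are computed and how they can diverge; it says nothing about the magnitude of the effects reported in the study.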

How to cite: Joswig, J., Kattge, J., Kraemer, G., Mahecha, M., Rüger, N., Schaepman, M., Schrodt, F., Wirth, C., and Schuman, M.: When does gap filling of trait data confound taxonomic and functional analyses?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13954, https://doi.org/10.5194/egusphere-egu21-13954, 2021.
