WBF2026-772, updated on 10 Mar 2026
https://doi.org/10.5194/wbf2026-772
World Biodiversity Forum 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 17 Jun, 11:30–11:45 (CEST) | Room Sanada 1
Bias Aware Benchmark for Species Distribution Modeling
Emilia Arens1, Nina van Tiel2, Robin Zbinden2, Chiara Vanalli2, Damien Robert1, Lukas Drees1, Benjamin Kellenberger3, Loïc Pellissier4, Jan Dirk Wegner1, and Devis Tuia2
  • 1University of Zurich (emilia.arens@uzh.ch)
  • 2Ecole Polytechnique Fédérale de Lausanne
  • 3Yale University
  • 4ETH Zurich

Understanding how biodiversity patterns are shaped, and modeling them accurately, is critical for establishing effective conservation strategies. Species distribution models (SDMs) are a fundamental tool in this effort, linking species occurrences with environmental drivers to estimate their spatial distribution. While SDMs have traditionally been fitted on one or a few species at a time, recent deep learning-based approaches jointly model thousands of species, with the potential to leverage shared environmental structure and co-occurrence patterns in the data. However, evaluating multi-species models is inherently non-trivial, and we show that existing metrics do not adequately capture model performance.
Most studies dealing with large sets of species summarize performance using a few averaged metrics. However, species distribution modeling is strongly affected by geographic and taxonomic biases in the occurrence data, and a single averaged metric obscures the extent to which these biases shape model performance. Moreover, widely used metrics, such as the area under the receiver operating characteristic curve (AUROC), inadvertently reflect these biases, which makes them more difficult to interpret, especially when aggregated across species. These issues exist in any multi-species setting, but they are amplified when scaling to thousands of species, broader geographic ranges, and more heterogeneous biases.
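As a minimal illustration of how averaging can hide per-species differences, the sketch below computes AUROC per species with the pairwise rank formulation and then takes the unweighted mean across species. The species names, labels, and scores are hypothetical toy values, not data or results from this work, and the helper `auroc` is written for illustration only.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of (presence, absence) pairs ranked correctly.

    Ties between a presence score and an absence score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (presence/absence labels, predicted suitability).
per_species = {
    "species_a": ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
    "species_b": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "species_c": ([1, 1, 0, 0], [0.2, 0.6, 0.5, 0.7]),
}

species_auroc = {name: auroc(y, s) for name, (y, s) in per_species.items()}
macro_auroc = mean(species_auroc.values())

for name, value in species_auroc.items():
    print(f"{name}: {value:.2f}")
print(f"mean over species: {macro_auroc:.2f}")
```

In this toy case two species are ranked perfectly and one is ranked worse than chance, yet the mean alone would suggest uniformly moderate performance.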
To address these gaps, we propose a bias-aware evaluation framework for multi-species SDMs. We define several proxy scores that characterize per-species biases in the training data, including sampling imbalance, occurrence sparsity, and taxonomic neglect. These scores allow us to evaluate model performance across species groups with differing bias levels, revealing, for example, whether a model is robust to geographic bias, struggles with sparsely sampled taxa, or performs well only for well-sampled species. Along with these bias-informed metrics, we introduce a curated, fully non-anonymized global plant dataset combining GBIF citizen-science records with sPlotOpen vegetation plots. This dataset is explicitly designed to enable transparent, species-resolved performance evaluation.
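The idea of reporting performance per bias group can be sketched as follows. This is an assumption-laden illustration rather than the framework itself: the function `stratify_by_bias`, the equal-size binning, and all metric and bias values are hypothetical, and the abstract does not specify how the proxy scores are computed or how species are grouped.

```python
from statistics import mean


def stratify_by_bias(metric, bias, n_groups=3):
    """Report the mean metric for species binned by a per-species bias score.

    Species are sorted by ascending bias score and split into n_groups
    bins of near-equal size (an assumed grouping rule, for illustration).
    """
    species = sorted(metric, key=lambda sp: bias[sp])
    n = len(species)
    report = []
    for g in range(n_groups):
        members = species[g * n // n_groups:(g + 1) * n // n_groups]
        if members:  # skip empty bins when there are few species
            report.append({
                "group": g,
                "species": members,
                "mean_metric": mean(metric[sp] for sp in members),
            })
    return report


# Hypothetical per-species metric values and bias proxy scores
# (a higher score stands for a more strongly biased training sample).
metric = {"sp1": 0.95, "sp2": 0.90, "sp3": 0.80,
          "sp4": 0.70, "sp5": 0.60, "sp6": 0.50}
bias = {"sp1": 0.1, "sp2": 0.2, "sp3": 0.4,
        "sp4": 0.5, "sp5": 0.8, "sp6": 0.9}

report = stratify_by_bias(metric, bias)
for row in report:
    print(row["group"], row["species"], round(row["mean_metric"], 3))
```

Reading the per-group means side by side, instead of one global average, is what would show whether performance degrades as the bias score increases.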
Together, the dataset and the bias-aware evaluation scheme provide the framework needed to test the most pressing unsolved SDM challenges, including long-tailed distributions and spatiotemporal observation biases, at scales and complexities appropriate for modern deep learning-based multi-species distribution models.

How to cite: Arens, E., van Tiel, N., Zbinden, R., Vanalli, C., Robert, D., Drees, L., Kellenberger, B., Pellissier, L., Wegner, J. D., and Tuia, D.: Bias Aware Benchmark for Species Distribution Modeling, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-772, https://doi.org/10.5194/wbf2026-772, 2026.