Bias Aware Benchmark for Species Distribution Modeling

Emilia Arens; Nina van Tiel; Robin Zbinden; Chiara Vanalli; Damien Robert; Lukas Drees; Benjamin Kellenberger; Loïc Pellissier; Jan Dirk Wegner; Devis Tuia

doi:https://doi.org/10.5194/wbf2026-772


WBF2026-772, updated on 10 Mar 2026


World Biodiversity Forum 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Emilia Arens¹, Nina van Tiel², Robin Zbinden², Chiara Vanalli², Damien Robert¹, Lukas Drees¹, Benjamin Kellenberger³, Loïc Pellissier⁴, Jan Dirk Wegner¹, and Devis Tuia²


¹University of Zurich (emilia.arens@uzh.ch)
²Ecole Polytechnique Fédérale de Lausanne
³Yale University
⁴ETH Zurich

Understanding how biodiversity patterns are shaped and accurately modeling them is critical for establishing effective conservation strategies. Species distribution models (SDMs) are a fundamental tool in this effort, linking species occurrences with environmental drivers to estimate their spatial distribution. While SDMs have traditionally been fitted on one or a few species at a time, recent deep learning-based approaches jointly model thousands of species, with the potential to leverage shared environmental structure and co-occurrence patterns in the data. However, evaluating multi-species models is non-trivial, and we show that existing metrics do not adequately capture model performance.
Most studies dealing with large sets of species summarize performance using a few averaged metrics. However, species distribution modeling is strongly affected by geographic and taxonomic biases in the occurrence data, and a single averaged metric obscures the extent to which these biases shape model performance. Moreover, widely used metrics, such as the area under the receiver operating characteristic curve (AUROC), inadvertently reflect these biases, which makes them harder to interpret, especially when aggregated across species. These issues exist in any multi-species setting, but they are amplified when scaling to thousands of species, broader geographic ranges, and more heterogeneous biases.
To address these gaps, we propose a bias-aware evaluation framework for multi-species SDMs. We define several proxy scores that characterize per-species biases in the training data, including sampling imbalance, occurrence sparsity, and taxonomic neglect. These scores allow us to evaluate model performance across species groups with differing bias levels, revealing, for example, whether a model is robust to geographic bias, struggles with sparsely sampled taxa, or performs well only for well-sampled species. Along with these bias-informed metrics, we introduce a curated, fully non-anonymized global plant dataset combining GBIF citizen-science records with sPlotOpen vegetation plots. This dataset is explicitly designed to enable transparent, species-resolved performance evaluation.
Together, the dataset and bias evaluation scheme provide the framework needed to test the most pressing unsolved SDM challenges, including long-tailed distributions and spatiotemporal observation biases, at scales and complexities appropriate for modern deep learning-based multi-species distribution models.
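
As a rough illustration of the stratified evaluation idea, the sketch below groups species by a single bias proxy and reports the mean per-species AUROC within each group instead of one global average. It is not the authors' implementation: the abstract does not specify how the proxy scores are computed, so the proxy used here (the number of training occurrences, standing in for occurrence sparsity), the function names `auroc` and `stratified_auroc`, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a random presence outscores a random absence.

    Ties between a presence and an absence count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auroc(per_species, n_groups=3):
    """Mean per-species AUROC within groups of similar bias level.

    per_species maps a species name to (bias_proxy, labels, scores).
    The bias proxy here is the training occurrence count (an assumption).
    Groups are ordered from the sparsest to the best-sampled species.
    """
    ranked = sorted(per_species, key=lambda sp: per_species[sp][0])
    results = []
    for g in range(n_groups):
        lo = g * len(ranked) // n_groups
        hi = (g + 1) * len(ranked) // n_groups
        members = ranked[lo:hi]
        if members:  # skip empty groups when there are few species
            group_mean = mean(auroc(*per_species[sp][1:]) for sp in members)
            results.append((g, members, group_mean))
    return results


# Hypothetical toy data: (training occurrences, test labels, predicted scores).
toy = {
    "sp_rare": (3, [1, 0, 1, 0], [0.4, 0.6, 0.7, 0.5]),
    "sp_mid": (40, [1, 1, 0, 0], [0.8, 0.5, 0.6, 0.2]),
    "sp_common": (900, [1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
}

overall = mean(auroc(labels, scores) for _, labels, scores in toy.values())
groups = stratified_auroc(toy, n_groups=3)

print(f"overall mean AUROC: {overall:.2f}")
for g, members, value in groups:
    print(f"group {g} {members}: mean AUROC {value:.2f}")
```

In this toy case the overall average hides the fact that the sparsest species is predicted no better than chance, while the per-group view makes it visible. In practice one would likely stratify by several proxies (for example geographic sampling imbalance or taxonomic neglect) and use a tested AUROC implementation.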

How to cite: Arens, E., van Tiel, N., Zbinden, R., Vanalli, C., Robert, D., Drees, L., Kellenberger, B., Pellissier, L., Wegner, J. D., and Tuia, D.: Bias Aware Benchmark for Species Distribution Modeling, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-772, https://doi.org/10.5194/wbf2026-772, 2026.