Deep-Learning Site-Occupancy Models for Disentangling Biases in Species Distribution and Trend Assessment from Citizen Science Data

Raphaël Benerradi; Christophe Botella; Maximilien Servajean; Alexis Joly

doi:https://doi.org/10.5194/wbf2026-738

Session IND11

WBF2026-738, updated on 10 Mar 2026

World Biodiversity Forum 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Oral | Wednesday, 17 Jun, 11:15–11:30 (CEST) | Room Sanada 1

Raphaël Benerradi¹, Christophe Botella¹, Maximilien Servajean¹,², and Alexis Joly¹

¹LIRMM, Inria, Université de Montpellier, Montpellier, France
²Université de Montpellier Paul-Valéry, Montpellier, France

Monitoring species distributions at large spatial and temporal scales is critical for understanding ecological dynamics and informing conservation strategies. Standardized survey protocols can produce presence-absence data that are particularly valuable, but they are costly and time-consuming and require botanical expertise, so the resulting data are limited and sparse. In contrast, opportunistic presence-only data from citizen science programs such as Pl@ntNet offer unprecedented spatial, temporal, and taxonomic coverage. However, they are affected by strong sampling, detection, and reporting biases that can obscure true species distributions and trends. To address these challenges, we propose a framework that integrates deep learning with site-occupancy models to estimate species distributions from presence-only data while explicitly accounting for these biases.

Site-occupancy models allow disentangling species presence from observation processes, yielding more reliable estimates of species presence probabilities. Incorporating deep learning enables these models to be fitted efficiently and flexibly through stochastic gradient-based optimization, making it possible to analyze massive opportunistic datasets at scale.
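
To make the separation between presence and observation concrete, the following is a minimal sketch of a classical single-season site-occupancy likelihood fitted by mini-batch stochastic gradient descent. It is an illustration under simplifying assumptions, not the authors' model: it assumes repeated-visit detection counts rather than presence-only records, a single site covariate with a logistic link for occupancy, and a constant detection probability. All function and parameter names (site_likelihood, nll_and_grad, simulate, fit_sgd, theta) are hypothetical, and the deep-learning component is replaced here by a three-parameter logistic model.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def site_likelihood(psi, p, d, J):
    """Likelihood of d detections in J visits at one site.

    psi: occupancy probability, p: per-visit detection probability.
    The binomial coefficient is omitted (constant in the parameters).
    """
    seen_pattern_if_occupied = p ** d * (1.0 - p) ** (J - d)
    unoccupied_term = 1.0 if d == 0 else 0.0  # absence explains only d == 0
    return psi * seen_pattern_if_occupied + (1.0 - psi) * unoccupied_term


def nll_and_grad(theta, batch, J):
    """Mean negative log-likelihood and its gradient over a batch.

    theta = [a, b, c] with psi = sigmoid(a + b * x) and p = sigmoid(c).
    batch is a list of (covariate x, detection count d) pairs.
    """
    a, b, c = theta
    p = sigmoid(c)
    nll, grad = 0.0, [0.0, 0.0, 0.0]
    for x, d in batch:
        psi = sigmoid(a + b * x)
        q = p ** d * (1.0 - p) ** (J - d)
        absent = 1.0 if d == 0 else 0.0
        lik = psi * q + (1.0 - psi) * absent
        nll -= math.log(lik)
        # Chain rule through the two logistic links.
        d_eta = (q - absent) * psi * (1.0 - psi) / lik
        d_c = psi * q * (d - J * p) / lik
        grad[0] -= d_eta
        grad[1] -= d_eta * x
        grad[2] -= d_c
    n = len(batch)
    return nll / n, [g / n for g in grad]


def simulate(n_sites, J, theta, rng):
    """Draw (covariate, detection count) pairs from the occupancy model."""
    a, b, c = theta
    data = []
    for _ in range(n_sites):
        x = rng.gauss(0.0, 1.0)
        occupied = rng.random() < sigmoid(a + b * x)
        d = sum(rng.random() < sigmoid(c) for _ in range(J)) if occupied else 0
        data.append((x, d))
    return data


def fit_sgd(data, J, rng, epochs=100, batch_size=64, lr=0.1):
    """Plain mini-batch SGD on the mean negative log-likelihood."""
    theta = [0.0, 0.0, 0.0]
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            _, grad = nll_and_grad(theta, data[start:start + batch_size], J)
            theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


rng = random.Random(0)
true_theta = (0.5, 1.0, -0.5)  # illustrative values, not from the abstract
data = simulate(2000, 4, true_theta, rng)
theta_hat = fit_sgd(data, 4, rng)
print("estimated [a, b, c]:", [round(t, 2) for t in theta_hat])
```

The key point is the mixture in site_likelihood: a site with no detections may be either unoccupied or occupied but missed, which is how the model keeps occupancy (psi) apart from detection (p). In a deep-learning variant, the linear predictors would plausibly be replaced by neural-network outputs and the hand-written gradient by automatic differentiation.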

We first validate the approach using realistic simulated datasets, comparing deep-learning-based inference with classical methods, including gradient-based maximum likelihood and Bayesian approaches, to demonstrate both computational scalability and reliability of the resulting predictions.
We then evaluate model performance on benchmark spatial species distribution modeling datasets using presence-only data. Results highlight the ability of deep-learning site-occupancy models to capture spatial variation in occurrence probabilities while mitigating reporting biases.
Finally, we explore the potential of our framework to leverage large citizen science datasets for assessing spatio-temporal changes in species distributions. In particular, we compare trends inferred from opportunistic observations – such as those from Pl@ntNet – using our approach, with trends derived from structured survey protocols to assess similarities, differences, and complementary information provided by these different types of data.

By combining site-occupancy modeling with deep learning on massive opportunistic datasets, our approach could provide new insights into large-scale species distributions and help monitor how they change over time. In addition, the flexibility of deep learning could allow for refined modeling of observer behaviors and detection patterns, enabling more accurate assessments of species distributions and trends from heterogeneous data sources.

How to cite: Benerradi, R., Botella, C., Servajean, M., and Joly, A.: Deep-Learning Site-Occupancy Models for Disentangling Biases in Species Distribution and Trend Assessment from Citizen Science Data, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-738, https://doi.org/10.5194/wbf2026-738, 2026.