- 1 McGill University, Montréal, Canada
- 2 Mila–Quebec AI Institute, Montréal, Canada
- 3 Université de Montréal, Montréal, Canada
- 4 Environmental Computational Science and Earth Observation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Recently, deep learning approaches to species distribution models (SDMs) have increasingly focused on integrating information-rich modalities such as natural language and remote sensing, motivated by the hypothesis that capturing the non-linear relationships between these inputs and species occurrences should help compensate for limited biodiversity data in poorly monitored regions. However, while leveraging additional modalities has been shown to improve predictions in certain settings, we argue that these improvements remain highly dependent on the task formulation and dataset. Using the SatBird dataset (Teng et al., 2023) as an illustrative example, we show that leveraging representations derived from satellite imagery does not consistently translate into performance improvements, especially in low-data regimes. We argue that multimodality should not be treated as a generic stepping stone towards improving deep learning-based SDMs, as it often boils down to the naive assumption that any additional information will be beneficial regardless of its ecological relevance. We also highlight that multimodal approaches in deep learning-based SDMs largely reduce to the inclusion of ever more abiotic covariates, and discuss how such a strategy can increase the risk of overfitting to sampling biases and amplify spurious correlations. Finally, we show that leveraging relevant, context-dependent biotic information offers a particularly promising alternative research direction, considering as case studies our work with 1) BATIS (Villeneuve et al., 2026), a novel Bayesian framework that iteratively refines prior predictions from an uncertainty-aware SDM using limited local observations in data-scarce regions, and 2) CISO (Abdelwahed et al., 2025), a novel transformer-based approach that leverages well-documented species groups to improve predictions for data-limited taxa.
Results with both BATIS and CISO suggest that universal solutions are unlikely to be sufficient to address current limitations in deep learning-based SDMs, and that further improvements in predictive performance are more likely to come from targeted approaches dedicated to specific data gaps and ecological contexts.
How to cite: Villeneuve, C., Teng, M., Akera, B., Radi Abdelwahed, H., Zbinden, R., Pollock, L., Larochelle, H., Tuia, D., and Rolnick, D.: Reevaluating Multimodal Approaches To Deep Species Distribution Models, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-813, https://doi.org/10.5194/wbf2026-813, 2026.