Self-explaining Deep Learning-based Species Distribution Models

Benjamin Kellenberger; Walter Jetz

doi:https://doi.org/10.5194/wbf2026-921

Session IND11

WBF2026-921, updated on 10 Mar 2026

World Biodiversity Forum 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Yale University, Center for Biodiversity and Global Change, New Haven, USA

Many fundamental processes in ecology are nowadays modelled with ever-increasing complexity, partly thanks to advances in data science. Species distribution modelling is no exception to this trend: deep learning-based models in particular have matured steadily in recent years, promising high predictive performance for many species at large scales. Yet a fundamental goal in ecology is not just to predict but to understand observed processes, both environmental and model-intrinsic. Deep learning models are often described as black boxes due to their complexity and hence have a reputation for being unsuitable for either kind of understanding. However, recent years have seen great advances in both unravelling and more explicitly quantifying the decision process of deep neural networks.

In this work, we explore the potential of a deep learning-based species distribution model (SDM) that explains itself by design. The model achieves this via a learned top-K sampling scheme with attention mechanisms on the environmental covariates it receives. Specifically, at each data point the model is forced to select a subset of K covariates (with K defined by the user) that is as useful as possible for predicting species encounter likelihoods. Within this sampling scheme, each covariate is either available to the model or not, rather than being modulated (continuously reweighted) as in regular attention mechanisms. Moreover, unlike post-hoc explainability methods, no auxiliary model is required to explain the SDM's decision process. The result is a set of per-covariate importance scores that are as trustworthy as possible.
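To make the idea of hard top-K covariate selection concrete, here is a minimal forward-only sketch in plain Python. It rests on several assumptions not stated in the abstract: that some scoring network has already produced one attention logit per covariate for each data point, that stochastic selection during training uses the Gumbel-top-k trick, and that per-covariate importance is read off as the fraction of data points at which a covariate was selected. The helper names (`gumbel_top_k_mask`, `apply_mask`, `selection_frequency`) are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def gumbel_top_k_mask(logits: Sequence[float], k: int, rng: random.Random,
                      stochastic: bool = True) -> List[int]:
    """Hypothetical helper: binary mask that keeps exactly k covariates.

    `logits` are per-covariate attention scores for one data point (assumed
    to come from some scoring network). With stochastic=True, Gumbel noise is
    added before taking the top k (Gumbel-top-k trick); with stochastic=False
    the k highest logits are kept.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of covariates")
    perturbed = []
    for i, logit in enumerate(logits):
        noise = 0.0
        if stochastic:
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
            noise = -math.log(-math.log(u))  # standard Gumbel sample
        perturbed.append((logit + noise, i))
    chosen = {i for _, i in sorted(perturbed, reverse=True)[:k]}
    return [1 if i in chosen else 0 for i in range(len(logits))]


def apply_mask(covariates: Sequence[float], mask: Sequence[int]) -> List[float]:
    # Selected covariates pass through unchanged; the others are withheld
    # (zeroed) instead of being down-weighted as in soft attention.
    return [x * m for x, m in zip(covariates, mask)]


def selection_frequency(masks: Sequence[Sequence[int]]) -> List[float]:
    # Per-covariate importance: share of data points at which it was selected.
    return [sum(col) / len(masks) for col in zip(*masks)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits_per_point = [[2.0, -1.0, 0.5, 1.5], [0.1, 3.0, -2.0, 2.5]]
    masks = [gumbel_top_k_mask(lg, 2, rng, stochastic=False) for lg in logits_per_point]
    print(masks)                                           # [[1, 0, 0, 1], [0, 1, 0, 1]]
    print(selection_frequency(masks))                      # [0.5, 0.5, 0.0, 1.0]
    print(apply_mask([10.0, 20.0, 30.0, 40.0], masks[0]))  # [10.0, 0.0, 0.0, 40.0]
    print(gumbel_top_k_mask([0.0] * 5, 3, rng))            # random subset with exactly 3 ones
```

Withholding covariates instead of reweighting them is what allows the selection frequency to be read directly as an importance score in this sketch. Training such a hard selection end-to-end also requires a gradient estimator (for example a straight-through or relaxed top-k estimator), which this forward-only example deliberately omits.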

We evaluate our model on a set of around 830,000 observations of 356 mammal species sampled across North America, comparing prediction performance and investigating the obtained covariate importances. We find that our sampling scheme results in highly consistent covariate combinations across runs, and we further observe plausible correlations with the environmental configuration across the continent. We also investigate correspondence with post-hoc explainability methods and find that agreement leaves room for improvement, highlighting the challenges of explainability for machine learning models in general and for deep learning SDMs in particular.

How to cite: Kellenberger, B. and Jetz, W.: Self-explaining Deep Learning-based Species Distribution Models, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-921, https://doi.org/10.5194/wbf2026-921, 2026.