- ECEO, EPFL, Sion, Switzerland (robin.zbinden@epfl.ch)
Species distribution models (SDMs) are vital tools for monitoring biodiversity. By relating environmental conditions to species occurrences, these statistical models make it possible to map species distributions and provide insights into the key drivers shaping them. In this context, the recently proposed MaskSDM approach offers a novel opportunity to highlight and interpret the influence of input variables and data modalities on model outputs. By leveraging masked data modeling during training, MaskSDM allows any subset of input variables to be flexibly selected, based on their availability at a given location and their relevance to the target species. This flexibility makes it possible to compare both predictions and model performance across different subsets of variables.
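The idea of selecting variable subsets through masking can be illustrated with a minimal sketch. It assumes that each input variable is embedded as one token and that masked tokens are replaced by a shared mask embedding (a learned vector in a real model, zeros here). The function names `sample_variable_mask` and `mask_tokens`, and the uniform sampling of how many variables to hide, are hypothetical illustrations, not MaskSDM's actual implementation.

```python
import numpy as np

def sample_variable_mask(n_vars, rng):
    """Hypothetical training-time masking: hide a random subset of variables.

    Returns a boolean array where True means the variable is masked (hidden).
    At most n_vars - 1 variables are masked, so at least one stays visible.
    """
    n_masked = rng.integers(0, n_vars)  # uniform in 0 .. n_vars - 1
    masked_idx = rng.choice(n_vars, size=n_masked, replace=False)
    mask = np.zeros(n_vars, dtype=bool)
    mask[masked_idx] = True
    return mask

def mask_tokens(tokens, mask, mask_embedding):
    """Replace the tokens of masked variables with a shared mask embedding.

    tokens: (n_vars, d) array, one embedding per input variable.
    mask: (n_vars,) boolean array, True where the variable is hidden.
    mask_embedding: (d,) array standing in for a learned mask token.
    """
    out = tokens.copy()
    out[mask] = mask_embedding  # broadcast the (d,) vector over masked rows
    return out

rng = np.random.default_rng(0)
n_vars, d = 5, 4
tokens = rng.normal(size=(n_vars, d))
mask_embedding = np.zeros(d)  # placeholder for a learned parameter

# Training: a fresh random subset of variables is hidden for each sample.
train_mask = sample_variable_mask(n_vars, rng)
train_input = mask_tokens(tokens, train_mask, mask_embedding)

# Inference: hide exactly the variables that are unavailable or excluded,
# here keeping only variables 0 and 2.
inference_mask = np.ones(n_vars, dtype=bool)
inference_mask[[0, 2]] = False
inference_input = mask_tokens(tokens, inference_mask, mask_embedding)
print(inference_input.shape)  # (5, 4)
```

Because the model sees many random subsets during training, the same trained network can later be queried with any chosen subset, which is what allows predictions and performance to be compared across variable groups.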
In this study, we use MaskSDM to investigate the impact of using time series alongside traditional tabular data for modeling the distribution of plant species across Europe. The temporal dimension of ecological processes is crucial, with phenology playing a significant role in driving plant species behavior and distribution. Time series data effectively capture these dynamic processes and provide the model with valuable information. For our analysis, we use the GeoPlant dataset, which includes monthly climatic time series of temperature and precipitation, as well as satellite-derived time series of six spectral bands at quarterly resolution. These satellite data capture local patterns, such as seasonal vegetation changes and the effects of extreme natural events like wildfires.
Since MaskSDM is based on a transformer model, we evaluate several approaches for tokenizing the time series data and assess the individual contribution of each input. Our results show that the different tokenization methods perform comparably. We then examine the effect of incorporating various types of time series data on model performance and compare it with using tabular data alone: adding satellite time series increases the AUC on the spatially separated test set by 2.4%. Adding climatic time series yields a smaller improvement, likely because the tabular data already include aggregated climatic statistics that are redundant with these time series. The best performance is achieved by combining all time series data with the tabular data, indicating that these data types are complementary.
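What "tokenizing a time series" means can be illustrated with a minimal sketch of three common strategies for turning one multivariate series into transformer tokens: a single token for the whole series, one token per time step, or one token per spectral band. The abstract does not state which tokenization methods were compared, so these strategies are assumptions for illustration. The sizes (8 quarterly steps, embedding size 16) are arbitrary, and the random matrices stand in for learned linear projections and positional embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T quarterly time steps, C = 6 spectral bands, embedding size d.
T, C, d = 8, 6, 16
series = rng.normal(size=(T, C))  # one satellite time series at one location

# Random matrices stand in for learned linear projections (assumption).
W_series = rng.normal(size=(T * C, d)) / np.sqrt(T * C)
W_step = rng.normal(size=(C, d)) / np.sqrt(C)
W_band = rng.normal(size=(T, d)) / np.sqrt(T)
pos_emb = 0.02 * rng.normal(size=(T, d))  # stand-in for learned positional embeddings

# Strategy A: the whole series becomes a single token.
token_whole = series.reshape(-1) @ W_series  # shape (d,)

# Strategy B: one token per time step; each step's C band values are projected,
# and a positional embedding encodes where the step sits in time.
tokens_per_step = series @ W_step + pos_emb  # shape (T, d)

# Strategy C: one token per spectral band; each band's full T-step series is projected.
tokens_per_band = series.T @ W_band  # shape (C, d)

print(token_whole.shape, tokens_per_step.shape, tokens_per_band.shape)
# (16,) (8, 16) (6, 16)
```

Strategy A adds only one token per time series and keeps attention cheap, whereas B and C expose finer temporal or spectral structure to attention at the cost of a longer token sequence. In each case, the resulting tokens would sit alongside the tokens of the tabular variables.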
Finally, we produce species distribution maps based on different combinations of data types. The impact of adding time series data to the tabular data is evident in these maps, which more closely match the spatial distribution of presence observations. These findings emphasize the importance of incorporating time series data, particularly satellite data, into SDMs, as they capture temporal dynamics that are difficult to represent with tabular data alone.
How to cite: Zbinden, R., Charlet, J., Sümbül, G., and Tuia, D.: Evaluating the impact of multimodal climate and satellite time series in species distribution modeling using MaskSDM, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11949, https://doi.org/10.5194/egusphere-egu25-11949, 2025.