- ¹Norwegian University of Science and Technology
- ²Norwegian Institute for Nature Research
Fragmented datasets, sampling bias and inconsistent observation protocols often limit the use of citizen science data for indicator development. Such data are typically collected opportunistically, without a sampling design suited to biodiversity metrics. However, their large volume and broad spatial and taxonomic coverage make them an invaluable source of biodiversity information at scale.
Here, we present a pipeline that integrates heterogeneous datasets to generate large-scale maps of biodiversity metrics. These maps form a basis for management-relevant information tools. We apply integrated species distribution modelling (iSDM) to correct for sampling bias and differences in data collection methods, drawing on the large number of open datasets available through aggregators such as GBIF.
The workflow has four main steps: data acquisition, data integration, iSDM and the production of derived outputs. Input data include structured surveys, opportunistic observations and environmental covariates. We standardise these inputs and combine them in a common iSDM framework. This produces species intensity maps, associated uncertainty estimates and sampling effort maps. We further process these outputs to identify biodiversity hotspots and to summarise species-environment relationships.
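The final step, turning species intensity maps into derived outputs, can be illustrated with a minimal sketch. It is not the pipeline's actual implementation. It assumes each species' intensity map is a list of per-cell Poisson intensities, takes expected richness as the sum of per-species occurrence probabilities, and flags the top fraction of cells as hotspots. The function names, the `cell_area` parameter and the `top_fraction` threshold are hypothetical choices made for illustration.

```python
import math


def occurrence_probability(intensity, cell_area=1.0):
    """P(at least one individual in a cell) under a Poisson point process."""
    return 1.0 - math.exp(-intensity * cell_area)


def expected_richness(intensity_maps, cell_area=1.0):
    """Sum per-species occurrence probabilities, cell by cell.

    intensity_maps: dict mapping species name -> list of per-cell intensities
    (an assumed, simplified stand-in for the iSDM intensity rasters).
    """
    n_cells = len(next(iter(intensity_maps.values())))
    richness = [0.0] * n_cells
    for species, intensities in intensity_maps.items():
        if len(intensities) != n_cells:
            raise ValueError(f"{species}: expected {n_cells} cells")
        for i, lam in enumerate(intensities):
            richness[i] += occurrence_probability(lam, cell_area)
    return richness


def hotspots(richness, top_fraction=0.1):
    """Return sorted indices of the cells in the top fraction by richness."""
    n_top = max(1, round(len(richness) * top_fraction))
    ranked = sorted(range(len(richness)), key=lambda i: richness[i], reverse=True)
    return sorted(ranked[:n_top])


# Toy example: two species on a three-cell grid (made-up intensities).
toy_maps = {"species_a": [0.0, 1.0, 2.0], "species_b": [0.0, 0.0, 3.0]}
toy_richness = expected_richness(toy_maps)
toy_hotspots = hotspots(toy_richness, top_fraction=0.34)
```

A quantile threshold is only one possible hotspot definition. In practice the threshold, and whether uncertainty is propagated into the richness map, would depend on the management question.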
We use Norway as a case study because it has extensive opportunistic citizen science programmes. We produced detailed maps of species richness, biodiversity hotspots, uncertainty and sampling intensity. Our results show the potential of pipelines that integrate disparate datasets, but they also reveal important limitations in the current body of data. In particular, it is not possible to infer and correct for sampling bias without data that allow estimation of the probability of occurrence. In practice, this means data that record both what was observed and what was not observed. Our study therefore demonstrates a clear need to incorporate more structured approaches into citizen science data collection. These should not replace opportunistic, curiosity-driven activity but complement it, sustaining both the large data volumes and the high level of public engagement.
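The identifiability problem behind this limitation can be shown with a small numerical sketch. It assumes a simplified thinned Poisson formulation, in which the expected number of opportunistic records in a cell is the species intensity multiplied by a sampling effort term. The function names and scenario values are hypothetical and are not taken from the study.

```python
import math


def expected_opportunistic_count(intensity, effort):
    """Expected presence-only records: true intensity thinned by sampling effort."""
    return intensity * effort


def detection_probability(intensity, area=1.0):
    """P(species recorded on a structured visit that also reports absences)."""
    return 1.0 - math.exp(-intensity * area)


# Two different ecological situations (made-up values):
# a common species that is poorly sampled, and a rarer one that is well sampled.
scenario_a = {"intensity": 2.0, "effort": 0.5}
scenario_b = {"intensity": 1.0, "effort": 1.0}

count_a = expected_opportunistic_count(**scenario_a)
count_b = expected_opportunistic_count(**scenario_b)

# Presence-only data cannot tell the scenarios apart: the expected counts match.
indistinguishable = math.isclose(count_a, count_b)

# Structured visits with recorded non-detections depend on intensity alone,
# so they separate the two scenarios.
p_detect_a = detection_probability(scenario_a["intensity"])
p_detect_b = detection_probability(scenario_b["intensity"])
```

Under these assumptions both scenarios yield the same expected opportunistic count, while the structured detection probabilities differ. This is why records of what was not observed are needed to separate occurrence from sampling effort.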
How to cite: Finstad, A., Perin, S., Mostert, P., Adjei, K., Togunov, R., and O'Hara, B.: Addressing data fragmentation in biodiversity citizen science data: Pipelines for integrated species distribution Models, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-643, https://doi.org/10.5194/wbf2026-643, 2026.