WBF2026-395, updated on 10 Mar 2026
https://doi.org/10.5194/wbf2026-395
World Biodiversity Forum 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 16 Jun, 09:00–09:15 (CEST)| Room Aspen 1
Literature mining of species traits integrated with genomics to transform biodiversity modelling
Robert Waterhouse1, Donat Agosti2, and Fabio Rinaldi3
  • 1SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland (robert.waterhouse@sib.swiss)
  • 2Plazi, Bern, Switzerland (agosti@amnh.org)
  • 3Scuola Universitaria professionale della Svizzera italiana, Lugano, Switzerland (fabio.rinaldi@supsi.ch)

Combining text mining of species taxonomy and traits with biodiversity genomics is a transformative approach to enhancing how data are used for species and habitat protection. A recently started Swiss National Science Foundation project pursues this approach in an interdisciplinary setting. The project will deliver an open-access knowledge graph and modelling portal that links text-mined traits, genomic indicators, and environmental layers for transparent and reusable analyses. Its main goals are to improve (1) spatiotemporal species distribution mapping and (2) taxonomic richness modelling by integrating organismal traits extracted from the literature with genomic data, and to benchmark these integrated models against occurrence-only baselines.
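To make the idea of a knowledge graph that links traits, genomic indicators, and environmental layers more concrete, the sketch below stores each fact as a subject-predicate-object statement with its provenance, so that every linked value can be traced back to its source. This is a minimal in-memory illustration under our own assumptions, not the project's actual data model: the class name `TraitGenomeGraph`, the predicate prefixes (`trait:`, `genomic:`, `env:`), and all identifiers such as `taxon:example_bat` are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One edge of the graph, carrying provenance so every fact stays traceable."""
    subject: str    # normalised taxon identifier (hypothetical scheme)
    predicate: str  # relation type: a trait, genomic indicator or environmental layer
    obj: str        # value or identifier of the linked resource
    source: str     # provenance: document or repository the statement came from


class TraitGenomeGraph:
    """Hypothetical in-memory store linking taxa to traits, genomic data and environmental layers."""

    def __init__(self):
        # A set de-duplicates identical statements added more than once.
        self._statements = set()

    def add(self, subject, predicate, obj, source):
        self._statements.add(Statement(subject, predicate, obj, source))

    def profile(self, subject):
        """Group everything known about one taxon by predicate, keeping provenance."""
        grouped = defaultdict(list)
        ordered = sorted(self._statements, key=lambda st: (st.predicate, st.obj, st.source))
        for st in ordered:
            if st.subject == subject:
                grouped[st.predicate].append((st.obj, st.source))
        return dict(grouped)


if __name__ == "__main__":
    # Placeholder statements only; no real species data is implied.
    graph = TraitGenomeGraph()
    graph.add("taxon:example_bat", "trait:roost_type", "cave", "doc:example-article")
    graph.add("taxon:example_bat", "genomic:assembly", "assembly:example-accession", "repo:example")
    graph.add("taxon:example_bat", "env:layer", "layer:mean_annual_temperature", "repo:example-climate")
    print(graph.profile("taxon:example_bat"))
```

Keeping provenance on every statement, rather than storing bare values, is one way to support the transparent and reusable analyses the portal is meant to enable; a production system would more likely use an RDF store and shared vocabularies.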

Although species distribution modelling increasingly incorporates traits and genomic information, progress is limited by difficulties in accessing and standardising these data. We address this gap through AI-assisted literature digitisation, named-entity recognition and normalisation, relationship extraction, and semi-automated expert curation that convert heterogeneous sources into machine-actionable formats. Uncertainties in taxonomic richness and the prevalence of undescribed “dark taxa” will be explicitly propagated into model predictions using taxonomic concept reconciliation and uncertainty quantification.
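The chain of named-entity recognition, normalisation, and relationship extraction can be illustrated with a deliberately simple rule-based sketch. It assumes a small hand-made gazetteer that maps name variants to one normalised taxon identifier, and a list of cue phrases that signal a trait; a sentence mentioning a taxon and a cue yields a candidate triple for expert curation. Real pipelines would typically rely on trained models rather than dictionaries and regular expressions, and the dictionaries, identifiers, and example sentences below are invented for illustration only.

```python
import re

# Hypothetical gazetteer: surface name variants -> normalised taxon identifier.
TAXON_NAMES = {
    "bombus terrestris": "taxon:bombus_terrestris",
    "buff-tailed bumblebee": "taxon:bombus_terrestris",
}

# Hypothetical trait cue phrases -> normalised trait predicate.
TRAIT_CUES = {
    "nests in": "trait:nest_site",
    "forages on": "trait:forage_plant",
}


def recognise_taxa(sentence):
    """Dictionary-based NER plus normalisation: return the set of taxon IDs mentioned."""
    lowered = sentence.lower()
    return {
        taxon_id
        for name, taxon_id in TAXON_NAMES.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    }


def extract_relations(sentence):
    """Rule-based relation extraction: taxon mention + cue phrase + rest of the clause."""
    taxa = recognise_taxa(sentence)
    lowered = sentence.lower()
    triples = []
    for cue, predicate in TRAIT_CUES.items():
        match = re.search(r"\b" + re.escape(cue) + r"\s+([^.;]+)", lowered)
        if match:
            value = match.group(1).strip()  # left un-normalised for curators to review
            triples.extend((taxon, predicate, value) for taxon in sorted(taxa))
    return triples


def mine(text):
    """Split text into sentences and collect candidate triples for expert curation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [triple for s in sentences for triple in extract_relations(s)]


if __name__ == "__main__":
    # Invented example text, not taken from any source document.
    example = (
        "The buff-tailed bumblebee nests in underground cavities. "
        "Bombus terrestris forages on many flowering plants."
    )
    for triple in mine(example):
        print(triple)
```

Because both name variants normalise to the same identifier, facts reported under a common name and under a scientific name end up attached to one taxon, which is the point of normalisation; the extracted values are deliberately left as free text so that the semi-automated curation step can map them to controlled terms.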

To overcome data scarcity and heterogeneity, we will mobilise “grey literature” and public genomic repositories to harmonise traits and genetic information for modelling at scale. Achieving this requires advances in biodiversity-focused text mining and the integration of extracted data with genomic analyses for species and population differentiation. The geographic and taxonomic scope is designed around key research questions and centres on birds, bats, and fish in Switzerland, and butterflies, bumblebees, and amphipods in Europe. These groups represent a gradient of taxonomic resolution (well-defined species, cryptic species, and dark taxa), varying volumes of existing knowledge that can be mined from the literature, differing baselines for trait-collection efforts, and increasing genomic data availability. 

Liberating the information on species life histories, interactions, habitat preferences, and other attributes that is trapped in the vast body of published literature is a challenge that must be tackled systematically to advance biodiversity science.

How to cite: Waterhouse, R., Agosti, D., and Rinaldi, F.: Literature mining of species traits integrated with genomics to transform biodiversity modelling, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-395, https://doi.org/10.5194/wbf2026-395, 2026.