WBF2026-845, updated on 10 Mar 2026
https://doi.org/10.5194/wbf2026-845
World Biodiversity Forum 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Tuesday, 16 Jun, 10:30–10:45 (CEST) | Room Aspen 1
Wikimedia as a Platform for Evidence Synthesis: Quantifying Bias and Literature Coverage in Crowd-Sourced Knowledge Graphs
Daniel Mietchen1,2,3 and Jacqueline Dearborn4
  • 1FIZ Karlsruhe — Leibniz Institute for Information Infrastructure, Berlin, Germany
  • 2Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  • 3Institute for Globally Distributed Open Research and Education (IGDORE), Jena, Germany
  • 4Data Futures GmbH, Leipzig, Germany

The extraction and liberation of biodiversity knowledge from scientific literature increasingly depends on both automated and community-driven workflows. The Wikimedia ecosystem, comprising Wikipedia, Wikidata, Wikimedia Commons, Wikisource, Wikispecies and related projects, has become a large, open, multilingual environment that is consistently ranked among the world’s top 10 websites. It reflects many of the challenges and opportunities outlined in the Disentis Roadmap. This contribution presents Wikimedia projects as complementary infrastructures for literature-based biodiversity evidence synthesis and examines how scientific publications enter and are transformed within this ecosystem.

Biodiversity literature reaches Wikimedia through several pathways, including (1) papers cited from encyclopedic entries related to biodiversity and beyond, (2) suitably licensed digital publications or digitized out-of-copyright publications hosted in Wikisource, (3) images and other media extracted from the literature and deposited in Wikimedia Commons, (4) structured bibliographic, taxonomic and methodological metadata represented in Wikidata through both automated imports (e.g., WikiCite workflows) and community editing, and (5) taxonomic concepts curated in Wikispecies. Entities named in the literature can further propagate across Wikimedia projects, creating a rich network of linked context.

We examine how these distributed contributions together form a community-maintained pipeline that captures, structures, and redistributes biodiversity knowledge from the scientific literature to the large, diverse, global and multilingual audience of Wikimedia projects. We outline how taxonomic groups, geographic regions, habitat types and publication types are represented across Wikimedia platforms, how the volume of extracted media compares to the volume of available literature, how community editing patterns introduce or mitigate various types of bias, and how these facets change over time. We also discuss Wikidata’s increasing integration with biodiversity research infrastructures such as BHL, Bionomia and GBIF, and we explore the strengths and limitations of query-based evidence synthesis using Wikidata-related tools such as Scholia.

By situating Wikimedia workflows within the broader goals of the Disentis Roadmap, we highlight how community curation, open licensing, and machine-readable knowledge graphs can complement large-scale digitization and text-mining pipelines. Finally, we outline pathways for further integrating Wikimedia-derived biodiversity information with infrastructures such as GBIF and Biodiversity PMC, thereby enhancing the accessibility, reusability, and interoperability of literature-based biodiversity evidence.

How to cite: Mietchen, D. and Dearborn, J.: Wikimedia as a Platform for Evidence Synthesis: Quantifying Bias and Literature Coverage in Crowd-Sourced Knowledge Graphs, World Biodiversity Forum 2026, Davos, Switzerland, 14–19 Jun 2026, WBF2026-845, https://doi.org/10.5194/wbf2026-845, 2026.