EGU26-7020, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-7020
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 16:15–18:00 (CEST), Display time Tuesday, 05 May, 14:00–18:00 | Hall X4, X4.30
FAIRenrich: Distributed semantic annotation at the repository edge
Alexander Wolodkin, Claus Weiland, Jonas Grieb, and Robert Brylka
  • Senckenberg – Leibniz Institution for Biodiversity and Earth System Research, Senckenberg Digital Collection Technologies (SDCT), Frankfurt, Germany (alexander.wolodkin@senckenberg.de)

Senckenberg’s natural history collections encompass over 45 million physical specimens distributed across 11 facilities, with 1.6 million digitized records accessible in 124 collections. Additional digital objects are stored in various infrastructures, such as Edaphobase, a digital repository for harmonized soil information (physical, chemical, and biological), and the WildlIVE Portal, a platform for FAIR (Findable, Accessible, Interoperable, and Reusable) sharing of biodiversity monitoring data collected with edge sensors such as camera traps. Managing this heterogeneous landscape, which ranges from legacy specimen data from the pre-digital era to newly digitized objects, presents significant challenges regarding legal compliance, data sovereignty, and the implementation of the FAIR principles.

To address this volume and heterogeneity, the FAIRenrich workflow automates the semantic annotation and maintenance of existing digital collection data. Because such enrichment is complex and partially non-deterministic, the workflow relies on AI models suited to these tasks and incorporates an optional human-in-the-loop mechanism. By executing these workflows on a distributed network of stationary and mobile edge computing devices ('last-mile AI'), the architecture ensures strict adherence to data sovereignty and privacy requirements.
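One way to picture the optional human-in-the-loop step is a confidence gate: annotations the model is unsure about go to a curator, the rest are accepted automatically. The sketch below is illustrative only. The `Annotation` fields, the `route` function, the 0.8 threshold, and the example term strings are assumptions made for this example, not part of the FAIRenrich workflow as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    record_id: str
    term: str          # proposed semantic term (hypothetical example values)
    confidence: float  # model score in [0, 1]; assumed to be available


def route(
    annotations: list[Annotation],
    threshold: float = 0.8,      # assumed cut-off, not taken from the abstract
    human_review: bool = True,   # the review step is optional
) -> tuple[list[Annotation], list[Annotation]]:
    """Split annotations into (auto-accepted, queued for human review)."""
    accepted: list[Annotation] = []
    review: list[Annotation] = []
    for ann in annotations:
        if human_review and ann.confidence < threshold:
            review.append(ann)
        else:
            accepted.append(ann)
    return accepted, review


batch = [
    Annotation("rec-1", "example:termA", 0.95),
    Annotation("rec-2", "example:termB", 0.40),
]
accepted, review = route(batch)
print([a.record_id for a in accepted])  # ['rec-1']
print([a.record_id for a in review])    # ['rec-2']
```

With `human_review=False` the gate is bypassed and every annotation is accepted, which corresponds to running the workflow fully automated.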

Beyond data curation, FAIRenrich's distributed architecture enables systemic efficiency through resource pooling. Institutional edge infrastructure is rarely fully utilized; by networking heterogeneous devices, the system dynamically reallocates idle capacity to enrichment tasks. This mirrors industry practice: major technology operators, such as Google, systematically redeploy hardware from their refresh cycles into secondary-use programs.
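The pooling idea above can be sketched as a simple greedy placement: each enrichment task goes to whichever device currently has the most idle capacity. This is a minimal illustration under stated assumptions, not the scheduler FAIRenrich uses. The function `assign_tasks`, the device and task names, and the abstract capacity units are all hypothetical.

```python
import heapq


def assign_tasks(
    idle_capacity: dict[str, float],
    task_costs: dict[str, float],
) -> dict[str, str]:
    """Greedily place tasks (largest first) on the device with most idle capacity.

    Capacities and costs share one abstract unit (an assumption of this sketch).
    Tasks that fit on no device are left out of the result and stay queued.
    """
    # Max-heap via negated capacity; the device name breaks ties deterministically.
    heap = [(-cap, dev) for dev, cap in idle_capacity.items()]
    heapq.heapify(heap)
    placement: dict[str, str] = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        if not heap:
            break
        neg_cap, dev = heap[0]
        if -neg_cap < cost:
            continue  # even the least-loaded device cannot take this task
        heapq.heapreplace(heap, (neg_cap + cost, dev))
        placement[task] = dev
    return placement


devices = {"workstation-a": 4.0, "old-laptop": 2.0, "camera-gateway": 1.0}
tasks = {"ocr-labels": 3.0, "taxon-linking": 2.0, "geo-tagging": 1.5}
print(assign_tasks(devices, tasks))
# {'ocr-labels': 'workstation-a', 'taxon-linking': 'old-laptop'}
```

In this example `geo-tagging` is not placed because no device has 1.5 units left; for delay-tolerant work it can simply wait until capacity frees up.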

FAIRenrich extends this model to legacy hardware: rather than treating end-of-life equipment as waste, the system enables cost-effective redeployment for delay-tolerant semantic enrichment tasks, such as inference workloads without strict latency requirements. By aligning workload scheduling with peaks in renewable generation (e.g., from photovoltaic installations), the approach implements carbon-aware scheduling principles used by major technology operators, reducing infrastructure costs and extending hardware lifecycles. This creates a circular-economy model for research institutions, transforming refresh-cycle surplus into productive scientific infrastructure.
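Aligning a delay-tolerant task with a renewable peak can be illustrated as a sliding-window search over a generation forecast: pick the contiguous block of time slots with the highest expected photovoltaic output. The sketch below assumes an hourly forecast is available as a list of kW values. The function name, the forecast numbers, and the slot length are made up for illustration and are not taken from FAIRenrich.

```python
def pick_greenest_window(pv_forecast_kw: list[float], slots_needed: int) -> int:
    """Return the start index of the contiguous window with the highest total PV output.

    Ties are resolved in favour of the earliest window.
    """
    if slots_needed <= 0 or slots_needed > len(pv_forecast_kw):
        raise ValueError("slots_needed must be between 1 and the forecast length")
    window_sum = sum(pv_forecast_kw[:slots_needed])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(pv_forecast_kw) - slots_needed + 1):
        # Slide the window by one slot: add the entering value, drop the leaving one.
        window_sum += pv_forecast_kw[start + slots_needed - 1] - pv_forecast_kw[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start


# Hypothetical hourly PV forecast in kW for nine consecutive slots.
forecast = [0.0, 0.0, 0.5, 2.0, 4.5, 5.0, 3.0, 1.0, 0.0]
print(pick_greenest_window(forecast, slots_needed=3))  # 4
```

Here a three-slot inference batch would start at slot 4, covering the 4.5, 5.0 and 3.0 kW hours. A real scheduler would also need to respect deadlines and device availability, which this sketch leaves out.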

This contribution demonstrates how FAIRenrich enables sustainable semantic annotation through a distributed edge architecture that simultaneously ensures data sovereignty, optimizes infrastructure utilization, and allows cost-effective redeployment of legacy hardware. The approach offers a scalable blueprint for research institutions seeking to decouple semantic enrichment from project-resource limitations through parallelization, temporal flexibility, and circular infrastructure practices.

How to cite: Wolodkin, A., Weiland, C., Grieb, J., and Brylka, R.: FAIRenrich: Distributed semantic annotation at the repository edge, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7020, https://doi.org/10.5194/egusphere-egu26-7020, 2026.