- ¹MARIS, Nootdorp, Netherlands (paul@maris.nl)
- ²MARIS, Nootdorp, Netherlands (peter@maris.nl)
- ³MARIS, Nootdorp, Netherlands (tjerk@maris.nl)
- ⁴MARIS, Nootdorp, Netherlands (robin@maris.nl)
- ⁵MARIS, Nootdorp, Netherlands (dick@maris.nl)
The Horizon Europe Blue-Cloud 2026 project evolved the pilot Blue-Cloud infrastructure into an ecosystem supporting FAIR and open data and analytical services. This ecosystem is envisioned as a data and analytical component for EDITO and can serve as a blueprint for thematic EOSC instances and Research Infrastructures. Within this context, a concrete plan was developed for high-performance data subsetting capabilities across the Blue-Cloud Virtual Research Environment (VRE), enabling researchers and WorkBench developers to access harmonised and validated Essential Ocean Variables (EOVs) from heterogeneous sources.
To implement this, the project adopted the fully open-source (AGPLv3) Beacon technology developed by MARIS as the core software for deploying data lakes across the VRE. Beacon provides fast, straightforward access to data subsets from large multidisciplinary collections, returning a single harmonised output file regardless of the source formats. Eight monolithic Beacon instances were deployed for major Blue Data Infrastructure (BDI) collections, including the World Ocean Database, ERA5, Copernicus Marine CORA, Euro-Argo, and SeaDataNet. All instances were integrated with the D4Science federated AAI and complemented by dedicated Jupyter notebooks to support reproducible workflows.
Based on extensive testing with the WorkBench teams, two integrated Beacon instances have been developed, combining data from multiple monolithic nodes through Beacon’s federation capabilities. A common metadata profile was set up in collaboration with the WorkBenches to support semantic harmonisation across different data sources, using the NERC Vocabulary Service, semantic tools, and unit conversions. These merged nodes demonstrate cross-infrastructure data integration and represent a significant step toward a European-scale federated data ecosystem.
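To illustrate the idea of a common metadata profile with unit conversions, the following is a minimal sketch and not Beacon's actual implementation. All source names, variable names, and the `PROFILE` table are hypothetical placeholders; in practice the common names would be tied to NERC Vocabulary Service terms rather than the plain strings used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Mapping:
    """Target of a source variable in the (hypothetical) common profile."""
    common_name: str
    common_unit: str
    convert: Callable[[float], float]  # source unit -> common unit


# Hypothetical profile: (source, source variable) -> common variable.
# These keys and names are illustrative assumptions, not real collection fields.
PROFILE: Dict[Tuple[str, str], Mapping] = {
    ("source_a", "TEMP_K"): Mapping("sea_water_temperature", "degC", lambda k: k - 273.15),
    ("source_b", "temperature"): Mapping("sea_water_temperature", "degC", lambda c: c),
    ("source_a", "DEPTH_CM"): Mapping("depth", "m", lambda cm: cm / 100.0),
}


def harmonise(source: str, record: Dict[str, float]) -> Dict[str, float]:
    """Rename variables to the common profile and convert their units.

    Variables without a profile entry are dropped.
    """
    out: Dict[str, float] = {}
    for name, value in record.items():
        mapping = PROFILE.get((source, name))
        if mapping is None:
            continue
        # Round to suppress floating-point noise from the conversion.
        out[mapping.common_name] = round(mapping.convert(value), 6)
    return out


if __name__ == "__main__":
    print(harmonise("source_a", {"TEMP_K": 283.15, "DEPTH_CM": 250.0}))
    print(harmonise("source_b", {"temperature": 10.0, "unmapped": 1.0}))
```

Records from both hypothetical sources end up under the same variable name and unit, which is what allows a merged node to return one harmonised output for data drawn from several collections.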
This presentation will demonstrate how Beacon enables integrated workflows across infrastructures, significantly reducing effort for both data providers and researchers. While widely used in Blue-Cloud, Beacon’s design is domain-agnostic, with ongoing applications in other European and national initiatives, illustrating its potential as an innovative data lake tool for federating infrastructures.
How to cite: Weerheim, P., Thijsse, P., Krijger, T., Kooyman, R., and Schaap, D.: Beacon data lakes for federated, high-performance access to marine data in the Blue-Cloud2026 ecosystem, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10574, https://doi.org/10.5194/egusphere-egu26-10574, 2026.