EGU25-4993, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-4993
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 01 May, 14:45–14:55 (CEST)
Room -2.92
Blue-Cloud 2026 project - Deploying BEACON data lakes for harmonizing ocean data access for Virtual Research Environments
Dick M. A. Schaap, Peter Thijsse, Tjerk Krijger, and Robin Kooyman
  • Marine Information Service MARIS B.V., Nootdorp, Netherlands (dick@maris.nl)

To give users fast and easy access to multidisciplinary data from large collections, MARIS has developed BEACON, a software system that extracts specific data on the fly and with high performance, based on the user’s request. The software has been customised and deployed in the Blue-Cloud 2026 project and several other European projects. It is designed to return a single harmonised file as output, even when the input contains different data types. In January 2025, BEACON 1.0.0 was made publicly available as open-source software, allowing anyone to set up their own BEACON ‘node’ to improve access to their data, or to use existing BEACON nodes from well-known data infrastructures such as Euro-Argo or the World Ocean Database for fast and easy access to harmonised data subsets. More technical details, example applications and general information on BEACON can be found at https://beacon.maris.nl/.

Within Blue-Cloud 2026, BEACON is deployed to give the WorkBenches (WBs) access to harmonised subsets from Blue Data Infrastructures. The WBs aim to generate harmonised and validated data collections of Essential Ocean Variables (EOVs). To this end, a set of monolithic BEACON nodes was set up for relevant data collections such as the World Ocean Database (WOD), CMEMS CORA, Euro-Argo and more. Developments are well underway for parallel deployment of these BEACON instances and related notebooks at the D4Science e-infrastructure as part of the Blue-Cloud VRE, giving access to all registered Blue-Cloud users.

Going one step further, the outputs from multiple monolithic BEACON instances are combined into one merged BEACON node for each WB. Work is ongoing on a structural mapping from each monolithic BEACON instance to the target Common Metadata Profile defined by the WB teams. These mappings will be used in BEACON queries to retrieve contents ‘as-is’ from the monolithic BEACON instances and load them into the merged BEACON instances, giving a common structure for variables, units, values, quality flags, and Common Metadata Profile fields. The structured metadata and data will be supplemented by additional metadata where available for each of the monolithic BEACON instances.
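
The idea of mapping several source structures onto one common structure can be illustrated with a minimal Python sketch. It is not BEACON code and does not use the BEACON query API. The source names, field names, unit conversions and the `FieldMapping`, `harmonise` and `merge` helpers are all hypothetical, chosen only to show how a per-source mapping table could yield records with shared variable names and units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical rule mapping one source field onto a common-profile field."""
    source_name: str     # field name as used in the source collection
    target_name: str     # field name in the (assumed) common profile
    scale: float = 1.0   # unit conversion: target = source * scale + offset
    offset: float = 0.0


# Assumed mapping tables; real mappings would be defined by the WB teams.
MAPPINGS = {
    "source_a": [
        FieldMapping("TEMP", "temperature_degC"),
        FieldMapping("PRES", "pressure_dbar"),
    ],
    "source_b": [
        FieldMapping("temperature_K", "temperature_degC", offset=-273.15),
        FieldMapping("pressure_bar", "pressure_dbar", scale=10.0),
    ],
}


def harmonise(source: str, record: dict) -> dict:
    """Rename and convert the mapped fields of one record; keep its provenance."""
    out = {"source": source}
    for mapping in MAPPINGS[source]:
        if mapping.source_name in record:
            value = record[mapping.source_name]
            out[mapping.target_name] = value * mapping.scale + mapping.offset
    return out


def merge(records_by_source: dict) -> list:
    """Combine records from several sources into one list with a common structure."""
    merged = []
    for source, records in records_by_source.items():
        merged.extend(harmonise(source, record) for record in records)
    return merged


if __name__ == "__main__":
    sample = {
        "source_a": [{"TEMP": 12.5, "PRES": 100.0}],
        "source_b": [{"temperature_K": 283.15, "pressure_bar": 1.0}],
    }
    for row in merge(sample):
        print(row)
```

Keeping the mappings as data rather than as code means that, in this sketch, a new source collection is added by extending the table, without touching the merge logic.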

This presentation will introduce the Blue-Cloud 2026 project and the development of the merged BEACON nodes, explaining how they can serve in practice as data lakes for many VRE applications and how they can be extended to other domains. Examples from the WBs will highlight the reduction in time and effort that researchers spend collecting data.

How to cite: Schaap, D. M. A., Thijsse, P., Krijger, T., and Kooyman, R.: Blue-Cloud 2026 project - Deploying BEACON data lakes for harmonizing ocean data access for Virtual Research Environments, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-4993, https://doi.org/10.5194/egusphere-egu25-4993, 2025.