EGU25-4337, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-4337
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 30 Apr, 14:15–14:25 (CEST)
Room -2.92
BEACON - Accelerating access to multidisciplinary data with Relative Optimized Chunking technology
Robin Kooyman¹, Peter Thijsse², Dick Schaap³, and Tjerk Krijger⁴
  • ¹MARIS, Nootdorp, the Netherlands (robin@maris.nl)
  • ²MARIS, Nootdorp, the Netherlands (peter@maris.nl)
  • ³MARIS, Nootdorp, the Netherlands (dick@maris.nl)
  • ⁴MARIS, Nootdorp, the Netherlands (tjerk@maris.nl)

Fast access to analysis-ready data from a large number of multidisciplinary data resources is key to addressing many of today's societal and scientific challenges, for example via Digital Twins of the Ocean or virtual research environments. Achieving this performance is difficult, however, because the original data are often organised in millions of (observation) files, which makes fast responses hard to deliver. In addition, data from different domains are stored in a wide variety of data infrastructures, each with its own data-access mechanisms, so researchers spend a lot of time just trying to access relevant data. Ideally, users should be able to retrieve analysis-ready data in a uniform way from different data infrastructures according to their own selection criteria, such as spatial or temporal boundaries, parameter types, depth ranges and other filters.

To address this, MARIS has developed, as part of several European projects, a software system called BEACON. Its unique indexing and dynamic chunking system can extract, on the fly and with high performance, exactly the data matching a user's request from millions of (observational) data files containing multiple parameters in diverse units. The system returns a single harmonised file as output, regardless of whether the input contains many different data types or dimensions. In January 2025, BEACON 1.0.0 was publicly released as open-source software, allowing anyone to set up their own BEACON node to improve access to their data, or to use existing BEACON nodes of well-known data infrastructures such as Euro-Argo or the World Ocean Database for fast and easy access to harmonised data subsets. More technical details, example applications and general information on BEACON can be found on the website https://beacon.maris.nl/.
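The idea of a single harmonised output can be illustrated with a minimal sketch. It is not BEACON's implementation: the source records, parameter names (TEMP, DEPTH), units and the conversion table below are hypothetical, chosen only to show how rows that use different units and carry different sets of parameters can be mapped onto one common table, with one fixed unit per column and empty cells where a source lacks a parameter.

```python
import csv
import io

# Hypothetical conversion table: (parameter, source unit) -> conversion to the canonical unit.
TO_CANONICAL = {
    ("TEMP", "degC"): lambda v: v,
    ("TEMP", "K"): lambda v: v - 273.15,
    ("TEMP", "degF"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("DEPTH", "m"): lambda v: v,
    ("DEPTH", "ft"): lambda v: v * 0.3048,
}
# Canonical output columns: one per parameter, each in a single fixed unit.
CANONICAL = {"TEMP": "degC", "DEPTH": "m"}

# Hypothetical input: three sources with different units and different parameters.
SOURCES = [
    {"name": "source_a", "units": {"TEMP": "degC", "DEPTH": "m"},
     "rows": [{"TEMP": 12.5, "DEPTH": 10.0}]},
    {"name": "source_b", "units": {"TEMP": "K", "DEPTH": "ft"},
     "rows": [{"TEMP": 285.65, "DEPTH": 32.8084}]},
    {"name": "source_c", "units": {"TEMP": "degF"},  # no DEPTH column at all
     "rows": [{"TEMP": 54.5}]},
]


def harmonise(sources):
    """Merge rows from heterogeneous sources into one CSV table with canonical units."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source"] + [f"{p}_{u}" for p, u in CANONICAL.items()])
    for src in sources:
        for row in src["rows"]:
            out = [src["name"]]
            for param in CANONICAL:
                if param in row:
                    convert = TO_CANONICAL[(param, src["units"][param])]
                    out.append(round(convert(row[param]), 4))
                else:
                    out.append("")  # parameter absent in this source: leave the cell empty
            writer.writerow(out)
    return buf.getvalue()


table = harmonise(SOURCES)
print(table)
```

A real system would also have to reconcile parameter vocabularies and dimensions (profiles, time series, gridded data), which this sketch deliberately leaves out.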

The presentation will focus on one of BEACON's core features, “Relative Optimized Chunking (ROC)”, a unique dynamic chunking technology developed specifically to make data retrieval as fast as possible. ROC reduces the number of chunks BEACON has to search through when a data request is made. It does this by applying variable-sized chunking on several dimensions at once, such as geo-location, depth and time, so that data points that are relatively close to each other in these dimensions are grouped into the same chunks. This speeds up retrieval because BEACON's index can traverse the millions of datasets with much more precision, finding not only the relevant datasets but also the exact data blocks that contain the relevant data.
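The general idea of chunking on several dimensions at once, and of using a per-chunk index to skip irrelevant chunks, can be sketched as follows. This is a hypothetical illustration, not the ROC algorithm itself, whose details are not given here: it assumes a k-d-tree-style scheme that splits observations at the median, cycling through longitude, latitude, depth and time, until each chunk holds at most `max_rows` rows, and it keeps a min/max bounding box per chunk as the index. A query then skips every chunk whose bounding box does not overlap the requested ranges and only scans rows in the remaining chunks.

```python
from dataclasses import dataclass

DIMS = ("lon", "lat", "depth", "time")


@dataclass
class Chunk:
    bounds: dict  # dimension -> (min, max) over the rows stored in this chunk
    rows: list


def make_chunk(rows):
    """Create a chunk and record its bounding box, which serves as the index entry."""
    bounds = {d: (min(r[d] for r in rows), max(r[d] for r in rows)) for d in DIMS}
    return Chunk(bounds, rows)


def build_chunks(rows, max_rows, level=0):
    """Split rows at the median, cycling through DIMS, until each chunk has <= max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be >= 1")
    if not rows:
        return []
    if len(rows) <= max_rows:
        return [make_chunk(rows)]
    dim = DIMS[level % len(DIMS)]
    ordered = sorted(rows, key=lambda r: r[dim])
    mid = len(ordered) // 2
    return (build_chunks(ordered[:mid], max_rows, level + 1)
            + build_chunks(ordered[mid:], max_rows, level + 1))


def overlaps(chunk, ranges):
    """True if the chunk's bounding box intersects every requested (lo, hi) range."""
    return all(chunk.bounds[d][0] <= hi and chunk.bounds[d][1] >= lo
               for d, (lo, hi) in ranges.items())


def query(chunks, ranges):
    """Return the matching rows and the number of chunks that actually had to be scanned."""
    hits = [c for c in chunks if overlaps(c, ranges)]
    rows = [r for c in hits for r in c.rows
            if all(lo <= r[d] <= hi for d, (lo, hi) in ranges.items())]
    return rows, len(hits)


# Synthetic observations on a small grid, purely for illustration.
observations = [{"lon": i % 10, "lat": (i // 10) % 10, "depth": (i * 7) % 50, "time": i}
                for i in range(1000)]
chunks = build_chunks(observations, max_rows=50)
found, scanned = query(chunks, {"lon": (0, 2), "lat": (0, 2), "time": (0, 99)})
print(f"{len(found)} rows from {scanned} of {len(chunks)} chunks")
```

Because the splits follow the data rather than a fixed grid, dense regions end up in chunks with a small extent and sparse regions in chunks with a large extent, which is one way of making chunk sizes relative to the data.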

The demonstration will use an existing BEACON node in the field of marine science to access data subsets via its REST API and show its performance. This will be done in a Jupyter Notebook by sending JSON requests to the BEACON system. Stepping through the Notebook, the presentation will explain how developers can access and use BEACON, including its most recent developments.
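A JSON request of the kind used in the demonstration could be sent from a notebook roughly as below. Both the endpoint URL and the JSON field names (`query_parameters`, `filters`, `output`, and the coordinate column names) are placeholders assumed for illustration; the actual request schema of a BEACON node should be taken from the documentation at https://beacon.maris.nl/. Only the Python standard library (`json`, `urllib.request`) is used.

```python
import json
import urllib.request

# Placeholder: replace with the query endpoint of a real BEACON node.
# Neither this URL nor its path is taken from BEACON's documentation.
BEACON_QUERY_URL = "https://beacon-node.example.invalid/api/query"


def _range_filter(column, bounds):
    """Build one min/max filter entry (hypothetical field names)."""
    lo, hi = bounds
    if lo > hi:
        raise ValueError(f"empty range for {column}: {lo} > {hi}")
    return {"for_query_parameter": column, "min": lo, "max": hi}


def build_query(parameters, lon, lat, time, depth=None, fmt="netcdf"):
    """Assemble a JSON-serialisable body for a spatial, temporal and optional depth subset.

    The schema is an assumption for illustration, not BEACON's documented API.
    """
    filters = [
        _range_filter("LONGITUDE", lon),
        _range_filter("LATITUDE", lat),
        _range_filter("TIME", time),  # same-format ISO 8601 date strings compare correctly as text
    ]
    if depth is not None:
        filters.append(_range_filter("DEPTH", depth))
    return {
        "query_parameters": [{"column_name": p, "alias": p} for p in parameters],
        "filters": filters,
        "output": {"format": fmt},
    }


def post_query(url, body, timeout=300):
    """POST the body as JSON and return the raw response bytes (e.g. a file to save)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

In a notebook one would call, for example, `post_query(BEACON_QUERY_URL, build_query(["TEMP"], (-10, 10), (40, 60), ("2020-01-01", "2020-12-31")))` after replacing the placeholder URL with a real node address, and write the returned bytes to a file in the requested format.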

How to cite: Kooyman, R., Thijsse, P., Schaap, D., and Krijger, T.: BEACON - Accelerating access to multidisciplinary data with Relative Optimized Chunking technology, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-4337, https://doi.org/10.5194/egusphere-egu25-4337, 2025.