- ¹MARIS, Nootdorp, Netherlands (robin@maris.nl)
- ²MARIS, Nootdorp, Netherlands (peter@maris.nl)
- ³MARIS, Nootdorp, Netherlands (dick@maris.nl)
- ⁴MARIS, Nootdorp, Netherlands (tjerk@maris.nl)
- ⁵MARIS, Nootdorp, Netherlands (paul@maris.nl)
Environmental science increasingly relies on large, heterogeneous, and rapidly growing data collections that must be accessed, subsetted, and harmonised efficiently for use in models, digital twins, AI pipelines, and Virtual Research Environments (VREs). The open-source (AGPLv3) Beacon software developed by MARIS addresses this challenge by enabling cloud-native, high-performance data lakes that are easy and fast for users to access and for providers to set up.
Beacon is designed for very fast, real-time access to data subsets from large collections, returning one harmonised file on the fly. The software can read datasets in a wide variety of file formats (NetCDF, Parquet, Zarr, and Beacon Binary Format), stored either locally or on S3-compatible object stores. Users can subset with SQL or JSON queries on individual datasets, on multiple datasets at the same time, or on entire collections of datasets.
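As an illustration of what a JSON subset query might look like from a client's side, the following is a minimal Python sketch. It is not taken from the Beacon documentation: the query field names (`select`, `filters`, `output`), the `/api/query` path, and the host `beacon.example.org` are all hypothetical placeholders. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request


def build_subset_query(columns, ranges, output_format="parquet"):
    """Assemble a JSON-serialisable subset query.

    The field names used here are assumptions for illustration,
    not the actual Beacon query schema.
    """
    return {
        "select": list(columns),
        "filters": [
            {"column": name, "min": low, "max": high}
            for name, (low, high) in ranges.items()
        ],
        "output": {"format": output_format},
    }


def build_request(base_url, query):
    """Wrap the query in a POST request object without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/query",  # hypothetical endpoint path
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: temperature observations inside a latitude/longitude box.
query = build_subset_query(
    columns=["time", "latitude", "longitude", "temperature"],
    ranges={"latitude": (50.0, 60.0), "longitude": (-5.0, 10.0)},
)
request = build_request("https://beacon.example.org", query)  # placeholder host
```

Passing `request` to `urllib.request.urlopen` would send the query; against a real instance, the endpoint path and query fields would need to be replaced with those documented for Beacon.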
Beacon is written in Rust and C, chosen for their low-level control and their performance advantage over Python-based or traditional database systems. It runs on any platform via Docker containers and consists of a REST API for data querying and index management, combined with core libraries that enable fast data indexing and search. In addition, Beacon helps make a data collection more interoperable by including mappings and allowing harmonisation with other sources on the fly.
From a provider's perspective, setting up a Beacon instance for a data collection is simple. The easiest and fastest way to get an instance running is to use the Beacon Docker Compose file. Connecting Beacon to an existing S3 bucket requires only two additional environment variables: “AWS_ENDPOINT”, which gives the URL of the S3 provider, and “BEACON_S3_BUCKET”, which names the bucket to expose as the data collection for subsetting. An instance can therefore be set up in less than a minute.
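A provider might want to confirm that both S3 variables are present before starting the container. The following Python sketch shows one way to do that. Only the variable names `AWS_ENDPOINT` and `BEACON_S3_BUCKET` come from the text above; the helper `read_s3_settings` and the example values are hypothetical and not part of Beacon.

```python
import os

# The two variables named in the text; everything else here is illustrative.
REQUIRED_S3_VARIABLES = ("AWS_ENDPOINT", "BEACON_S3_BUCKET")


def read_s3_settings(environ=None):
    """Return the two S3 settings, or raise if one is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_S3_VARIABLES if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_S3_VARIABLES}


# Example with placeholder values instead of the real process environment.
settings = read_s3_settings({
    "AWS_ENDPOINT": "https://s3.example.org",  # placeholder S3 provider URL
    "BEACON_S3_BUCKET": "my-data-collection",  # placeholder bucket name
})
```

Calling `read_s3_settings()` without arguments checks the current process environment instead of the example dictionary.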
Once set up, a Beacon instance is immediately accessible through several entry points, such as Jupyter Notebooks or a newly developed user interface called Beacon Studio. Beacon Studio lets users query, explore, download, and visualise data from a Beacon instance without requiring programming skills. Users build and execute queries through simplified menus that describe the contents of the collection. After running a query, they can download the resulting dataset in multiple formats or display the data directly on an interactive map.
This presentation will highlight Beacon’s technological innovations, cloud-ready deployment pathways, successful implementations in the BlueCloud2026 context, and simple, practical applications from a user’s perspective. With its domain-agnostic and scalable architecture, Beacon is now being adopted in national and European initiatives, showcasing its value for a wide variety of use cases.
How to cite: Kooyman, R., Thijsse, P., Schaap, D., Krijger, T., and Weerheim, P.: Beacon: A FAIR high-performance, ARCO data lake technology supporting interoperable environmental research, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10638, https://doi.org/10.5194/egusphere-egu26-10638, 2026.