- ¹SURF, Amsterdam, Netherlands (robert.griffioen@surf.nl)
- ²Utrecht University, Faculty of Geosciences, Physical Geography, Utrecht, Netherlands
Geoscience research faces enormous data growth: larger and more versatile datasets are arriving from satellites, IoT devices, and measurement instruments. Making full use of these data, and meeting the demand for integrated analysis, requires new IT solutions. SURF, the Dutch national digital infrastructure provider for research and education, is investigating a data lakehouse architecture within an innovation project and within the SAGE European Green Deal Data Space project (https://www.greendealdata.eu/). In SAGE we collaborate with geoscientists from the Department of Geography at Utrecht University to process heterogeneous environmental monitoring datasets into data products for further research.
The data lakehouse architecture combines the flexibility of a data lake for handling heterogeneous data and ML workflows with the transactional guarantees of a database (ACID transactions) and the governance of a data warehouse. We explore this architecture using SURF services, such as the object store, and open-source software from existing geoscience ecosystems such as Pangeo and Earthmover. The exact properties of the data lakehouse depend on the software packages used. We present the lakehouse solution for the Utrecht University (UU) use case of serving and publishing exposome data products. Currently, processing of the data products is handled by a batch service. We will discuss how the lakehouse architecture could be extended to both serve the resulting data products and cover the processing stage and subsequent analysis workflows.
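To make the idea of database-like transactions on top of plain file or object storage more concrete, the following is a minimal sketch of one common pattern: data files are written once and never modified, and a commit becomes visible only when a single pointer is swapped atomically. It is not the implementation used in this work. The class `TinyLakehouse`, its methods, and the dataset keys are hypothetical names chosen for illustration, a local directory stands in for the object store, and the sketch covers only atomic publication of a snapshot, not conflict detection between concurrent writers.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


class TinyLakehouse:
    """Toy snapshot store (hypothetical): immutable data files plus one atomically swapped pointer."""

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "data").mkdir(parents=True, exist_ok=True)
        (self.root / "snapshots").mkdir(exist_ok=True)
        self.head = self.root / "HEAD"

    def _current(self):
        # Load the manifest (key -> data file name) that HEAD points to.
        if not self.head.exists():
            return {}
        snapshot_id = self.head.read_text()
        return json.loads((self.root / "snapshots" / snapshot_id).read_text())

    def commit(self, updates):
        """Write all updates as new files, then publish them in one atomic step."""
        manifest = self._current()
        for key, payload in updates.items():
            # Never overwrite: every version of a key gets a fresh file name.
            name = f"{uuid.uuid4().hex}.bin"
            (self.root / "data" / name).write_bytes(payload)
            manifest[key] = name
        snapshot_id = f"{uuid.uuid4().hex}.json"
        (self.root / "snapshots" / snapshot_id).write_text(json.dumps(manifest))
        # Readers see either the old or the new snapshot, never a mix:
        # os.replace swaps the pointer atomically on the same filesystem.
        tmp = self.root / f"HEAD.{uuid.uuid4().hex}.tmp"
        tmp.write_text(snapshot_id)
        os.replace(tmp, self.head)
        return snapshot_id

    def read(self, key):
        # Resolve the key through the current snapshot.
        name = self._current()[key]
        return (self.root / "data" / name).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        store = TinyLakehouse(tmp_dir)
        # Dataset keys below are made up for the example.
        store.commit({"no2/2020": b"v1", "pm25/2020": b"v1"})
        store.commit({"no2/2020": b"v2"})
        print(store.read("no2/2020"))   # b'v2'
        print(store.read("pm25/2020"))  # b'v1'
```

Because earlier snapshots and their data files are left untouched, older versions of a data product remain readable, which is also what makes versioned publication possible in this kind of design. Production systems add further guarantees that this sketch omits, such as detecting conflicting commits.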
How to cite: Griffioen, R., Loffredo, L., Bood, R.-J., Oonk, R., Kuipers, E., Schmitz, O., and Karssenberg, D.: A data lakehouse solution for geoscience workflows, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12863, https://doi.org/10.5194/egusphere-egu26-12863, 2026.