EGU26-12863, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-12863
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 08:30–10:15 (CEST), Display time Tuesday, 05 May, 08:30–12:30
Hall X4, X4.87
A data lakehouse solution for geoscience workflows
Robert Griffioen1, Layla Loffredo1, Robert-Jan Bood1, Raymond Oonk1, Els Kuipers2, Oliver Schmitz2, and Derek Karssenberg2
  • 1SURF, Amsterdam, Netherlands (robert.griffioen@surf.nl)
  • 2Utrecht University, Faculty of Geosciences, Physical Geography, Utrecht, Netherlands

Geoscience research faces enormous data growth, with larger and more diverse datasets arriving from satellites, IoT devices, and measurement instruments. Making full use of these data, and meeting the demand for integrated analysis, requires new IT solutions. SURF, the Dutch national digital infrastructure provider for research and education, is investigating a data lakehouse architecture in the context of an innovation project and of the SAGE European Green Deal Data Space project (https://www.greendealdata.eu/). In SAGE, we collaborate with geoscientists from the Department of Geography at Utrecht University to process heterogeneous environmental monitoring datasets into data products for further research.

The data lakehouse architecture combines the flexibility of a data lake for handling heterogeneous data and ML workflows with the transactional guarantees of a database (ACID transactions) and the governance of a data warehouse. We explore this architecture using SURF services, such as the object store, and open-source software from existing geoscience ecosystems such as Pangeo and Earthmover. The exact properties of the data lakehouse depend on the software packages used. We present the lakehouse solution for the Utrecht University (UU) use case of serving and publishing exposome data products. Currently, the data products are generated by a batch service. We will discuss how the lakehouse architecture could be extended so that it both serves the resulting data products and covers the processing stage and subsequent analysis workflows.

How to cite: Griffioen, R., Loffredo, L., Bood, R.-J., Oonk, R., Kuipers, E., Schmitz, O., and Karssenberg, D.: A data lakehouse solution for geoscience workflows, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12863, https://doi.org/10.5194/egusphere-egu26-12863, 2026.