- (1) Johannes Gutenberg-University Mainz, Institute of Geography, Earth System Modelling Group, Mainz, Germany (reinecke@uni-mainz.de)
- (2) International Centre for Water Resources and Global Change, Koblenz, Germany
- (3) Global Runoff Data Centre (GRDC), Federal Institute of Hydrology, Koblenz, Germany
In situ and remote sensing data are crucial in the Earth sciences because they provide complementary perspectives on environmental phenomena. In situ data, collected directly at the Earth’s surface, offer high accuracy and detailed insight into local conditions, enabling precise measurements of variables such as soil moisture, temperature, and pollutant levels. Remote sensing data, conversely, provide extensive spatial coverage and make it possible to monitor changes over time across vast areas, capturing large-scale patterns and trends that in situ data alone cannot reveal. By combining these two data sources and automatically preprocessing them into Analysis-Ready Data, researchers can enhance scientific insights, improve the robustness of machine learning applications, and refine models used to predict environmental change or assess the impacts of human activity on natural systems. This integrated approach promotes a more comprehensive understanding of complex Earth processes, enabling better-informed decision-making and effective management strategies for sustainable development. However, preprocessing and combining in situ data from different sources can be highly complex, especially for global datasets. Joining these data with remotely sensed products may require substantial computational resources, given the large number of observational records and their high temporal resolution. Here, we present a prototype of CULTIVATE, an open-source data-processing pipeline that efficiently cleans in situ records and combines them with remote sensing data to create an automatically curated database. As new in situ records are inserted, CULTIVATE updates only the affected records in the final database. In this presentation, we showcase CULTIVATE for over 200,000 global groundwater well observation time series that are merged with an extensive list of other time-series products, and we show how data curators can interact with the data-processing pipeline.
We further discuss how this prototype can serve as a blueprint for the future architecture of Research Data Infrastructures, how international standards can be implemented and enforced, and how global data centers can be enabled to use automated data preparation in operational settings.
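The incremental behaviour described in the abstract (only newly inserted or changed in situ records are reprocessed) could look roughly like the following. This is a minimal sketch and not CULTIVATE's actual implementation: the function names (`fingerprint`, `clean`, `incremental_update`), the record fields (`well_id`, `date`, `level_m`), the content-hash change detection, and the assumption that remote sensing values are already sampled per well and date are all hypothetical and chosen purely for illustration.

```python
import hashlib
import json
from typing import Optional


def fingerprint(record: dict) -> str:
    """Stable content hash of a raw record (hypothetical change-detection key)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def clean(record: dict) -> Optional[dict]:
    """Toy cleaning step: reject records without a numeric groundwater level."""
    level = record.get("level_m")
    if isinstance(level, bool) or not isinstance(level, (int, float)):
        return None
    return {
        "well_id": record["well_id"],
        "date": record["date"],
        "level_m": float(level),
    }


def incremental_update(database: dict, seen: dict, incoming: list, remote: dict) -> list:
    """Reprocess only records that are new or whose content has changed.

    database: curated records keyed by (well_id, date)
    seen:     fingerprints of raw records that were already processed
    incoming: raw in situ records (assumed field names)
    remote:   remote sensing values keyed by (well_id, date); assumed to be
              pre-sampled at the well locations
    Returns the keys that were written to the database in this run.
    """
    updated = []
    for raw in incoming:
        key = (raw["well_id"], raw["date"])
        digest = fingerprint(raw)
        if seen.get(key) == digest:
            continue  # unchanged since the last run, nothing to do
        seen[key] = digest
        cleaned = clean(raw)
        if cleaned is None:
            database.pop(key, None)  # a previously valid record became invalid
            continue
        cleaned["remote_value"] = remote.get(key)  # None if no match exists
        database[key] = cleaned
        updated.append(key)
    return updated
```

Calling `incremental_update` twice with the same batch writes records only on the first call; the second call skips every record because its fingerprint is unchanged. A production pipeline would more likely keep this state in a database and join on space and time rather than on an exact key.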
How to cite: Reinecke, R., Bäthge, A., Noack, D., Zink, M., Mischel, S., and Dietrich, S.: A prototype Open-Source data-processing pipeline to efficiently combine in-situ data with remote-sensing observations of the Earth, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-6238, https://doi.org/10.5194/egusphere-egu26-6238, 2026.