- German Aerospace Center (DLR), Institute of Data Science, Jena, Germany (firstname.lastname@dlr.de)
The exponential growth in data generation across scientific domains has amplified the critical role of data science in extracting actionable insights from complex datasets (Chen, 2012; Müller, 2018; Wamba, 2017; Yin, 2015). Traditional data science methodologies, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the Knowledge Discovery in Databases (KDD) process, provide structured frameworks for data processing and model development (Fayyad, 1996; Shearer, 2000). However, these approaches often treat visualization as a terminal step for communicating results rather than as an integral component of the analytical process. Visual analytics addresses this limitation by emphasizing human-computer interaction throughout the analytical workflow, enabling iterative exploration, hypothesis testing, and knowledge generation through interactive visual interfaces (Keim, 2008; Sacha, 2014; Thomas, 2006).

Data scientists increasingly rely on computational notebooks for their flexibility in combining code, data, and visualization within unified environments (Chattopadhyay, 2020; Kosara, 2023). However, traditional notebook platforms face significant challenges, including a lack of reproducibility due to execution order dependencies, limited interactivity, difficult version control, and constrained deployment options (Chattopadhyay, 2020). These limitations create friction when transitioning from exploratory analysis to production systems, particularly for visual analytics applications requiring sophisticated interactive visualizations and real-time analytical capabilities (Barik, 2016; Haertel, 2023).
This research investigates the applicability of modern interactive visualization notebooks as comprehensive platforms for end-to-end data science and visual analytics pipelines. The solution artifact employs Marimo, an open-source Python notebook solution that addresses traditional notebook limitations through reactive cell execution, deterministic execution ordering, and notebooks stored as plain Python code (Kluyver, 2016). The approach integrates multiple technologies, including object storage (e.g., MinIO) for centralized data repositories, analytical databases for efficient data management, and declarative visualization libraries based on Vega and Vega-Lite grammars for flexible interactive graphics (Heer, 2024; VanderPlas, 2018). The methodology is demonstrated through a space weather exploration use case examining the impact of solar activity on Global Navigation Satellite Systems (Su, 2019). The implementation follows the KDD process phases (Fayyad, 1996). The selection phase combines the NEDM space weather model, which provides three-dimensional electron density estimates based on the F10.7 solar flux index, with satellite orbital data (Hoque, 2022). The preprocessing phase calculates rolling averages of solar activity indices and derives satellite identifiers. The transformation phase determines satellite positions using Simplified General Perturbations algorithms and aggregates electron density across spatial grids. The data mining phase creates interactive visualizations of visible satellites, including their calculated electron content values. Finally, the interpretation phase supports interactive selection and recalibration through user-driven dashboard interfaces.
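As an illustration only, two of the steps named above (rolling averages of a solar activity index and aggregation of electron density on a spatial grid) could be sketched in plain Python as follows. This is a minimal sketch and not the demonstrator's implementation: the function names `rolling_mean` and `aggregate_on_grid`, the trailing-window convention, the grid cell size, and all sample values are assumptions made for this example.

```python
from collections import defaultdict, deque


def rolling_mean(values, window):
    """Trailing rolling mean (hypothetical helper).

    The first window-1 outputs average only the values seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # oldest value is about to leave the window
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


def aggregate_on_grid(samples, cell_deg):
    """Average (lat, lon, value) samples per lat/lon cell (hypothetical helper).

    Cells are indexed by floor(lat / cell_deg), floor(lon / cell_deg).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in samples:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: s / n for key, (s, n) in sums.items()}


# Made-up daily index values and density samples, for illustration only.
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0], window=2)
gridded = aggregate_on_grid(
    [(10.0, 20.0, 1.0), (12.0, 21.0, 3.0), (-1.0, 5.0, 2.0)], cell_deg=5.0
)
```

In a reactive notebook, parameters such as `window` and `cell_deg` would typically be bound to interface controls so that dependent cells recompute when the user changes them.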
The demonstrator combines data management, processing, and interactive visual analytics in a single notebook environment. This integration streamlines workflows by reducing friction between disparate tools, enhances transparency through documented, reproducible analytical processes (Kosara, 2023), and enables real-time interactivity with dynamic parameter adjustments and iterative exploration. It also supports visual analytics across the entire knowledge-generation model, from data transformation to insight discovery (Sacha, 2014).
How to cite: Pohl, M. and Reibert, J.: Enhancing Data Science Pipelines through Interactive Environments for Visual Analytics of Spatiotemporal Data, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19078, https://doi.org/10.5194/egusphere-egu26-19078, 2026.