- ¹Institute for Geography, Johannes Gutenberg-University, Mainz, Germany (cthoene@students.uni-mainz.de)
- ²Institute for Geography, Johannes Gutenberg-University, Mainz, Germany (a.baethge@uni-mainz.de)
- ³Institute for Geography, Johannes Gutenberg-University, Mainz, Germany (reinecke@uni-mainz.de)
Earth system data, measured by satellites and terrestrial stations and simulated by increasingly complex models, provide valuable information for identifying functional relationships within the Earth system. These relationships are essential for understanding complex interactions and predicting changes, for example, in climatic or ecological processes, but they often occur only in certain spatiotemporal sections or within certain threshold values. With the increasing spatiotemporal resolution of remote sensing products and models, manual analysis is impractical, and hypothesis-driven approaches can leave hidden relationships undiscovered. Previous work proposed the SONAR (automated diScovery Of fuNctionAl Relationships) decision-tree algorithm to search automatically for functional relationships in Earth system data without a priori assumptions. We analyzed the proposed algorithm using artificially generated data to evaluate SONAR's functionality. We tested whether the choice of statistical indicator (Pearson’s r, Spearman’s ρ, Kendall’s τ, and Mutual Information) influences the functionality of the SONAR algorithm and which factors are important for identifying functional relationships. Using 1512 synthetic data sets and the newly developed SAMPI (Similarity of A Manifested and Prototypical decision tree Indicator) coefficient, we demonstrate how the performance of the algorithm changes under different variations of the data sets, including the number of designated splits, the presence of interfering variables, and the strength and nature of the underlying functional relationships. In particular, we show which statistical indicator provides the best results under these conditions. The results demonstrate that the SONAR algorithm is very versatile, especially when employing the most reliable statistical indicator.
The SONAR algorithm could, therefore, have far-reaching applications, for example, in analyzing climatic patterns or investigating dependencies between environmental factors.
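To illustrate why a relationship that holds only within certain threshold values is hard to see in the full data, the following minimal sketch computes the four statistical indicators named above on a small synthetic data set. It is not the SONAR algorithm and does not reproduce the study's data sets or the SAMPI coefficient. All names (`indicators`, the variables `x`, `y`, `z`, the threshold 0.5, the bin count) are assumptions chosen for illustration, and the estimators are simplified: ranks ignore ties, Kendall's τ is the tau-a variant, and Mutual Information uses a plain equal-width histogram.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson's r from sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """Ordinal ranks; ties are not averaged (assumes continuous data)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman's rho as Pearson's r of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall's tau-a by counting concordant minus discordant pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information in nats."""
    def digitize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = digitize(x), digitize(y)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def indicators(x, y):
    """Hypothetical helper: all four indicators for one data section."""
    return {
        "pearson": pearson(x, y),
        "spearman": spearman(x, y),
        "kendall": kendall(x, y),
        "mutual_information": mutual_information(x, y),
    }


# Synthetic data: y depends linearly on x only where z exceeds 0.5;
# elsewhere y is unrelated noise.
rng = random.Random(0)
n = 400
z = [rng.uniform(0, 1) for _ in range(n)]
x = [rng.uniform(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 0.1) if zi > 0.5 else rng.uniform(0, 2)
     for xi, zi in zip(x, z)]

inside_idx = [i for i in range(n) if z[i] > 0.5]
outside_idx = [i for i in range(n) if z[i] <= 0.5]

full = indicators(x, y)
inside = indicators([x[i] for i in inside_idx], [y[i] for i in inside_idx])
outside = indicators([x[i] for i in outside_idx], [y[i] for i in outside_idx])

for name in full:
    print(f"{name:20s} full={full[name]:.3f} "
          f"inside={inside[name]:.3f} outside={outside[name]:.3f}")
```

In this toy setting, every indicator is diluted when computed over the full data and recovers once the data are restricted to the section where the relationship holds, which is the kind of split a decision-tree search would need to find. Note that Mutual Information is on a different scale from the three correlation coefficients, so values are only comparable within one indicator.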
How to cite: Thöne, C., Bäthge, A., and Reinecke, R.: The effects of different statistical indicators in the new decision-tree-based SONAR algorithm for automated detection of functional relationships in Big Earth Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10444, https://doi.org/10.5194/egusphere-egu25-10444, 2025.