Objectively Determining the Number of Similar Hydrographic Clusters with Unsupervised Machine Learning

Carola Trahms; Yannick Wölker; Arne Biastoch

https://doi.org/10.5194/egusphere-egu23-11687

Session ITS1.13/AS5.2

EGU23-11687, updated on 11 Jun 2025

EGU General Assembly 2023

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel, Physikalische Ozeanographie, Kiel, Germany (ctrahms@geomar.de)

Determining the number of existing water masses and defining their boundaries is the subject of ongoing discussion in physical oceanography. Traditionally, water masses are defined manually by experts, who set constraints on the hydrographic properties that describe them based on experience and prior knowledge. In recent years, clustering, an unsupervised machine learning approach, has been introduced as a tool to identify clusters, i.e., volumes with similar hydrographic properties, without explicitly defining their hydrographic constraints. Until now, however, the number of clusters to search for has still been set manually by an expert.

We propose a method that determines a suitable number of hydrographic clusters in a data-driven way. First, the method averages the data over slices of different sizes along the time or depth axis, since the structure of the hydrographic space changes strongly in time or with depth. Next, it applies clustering algorithms to the averaged data and calculates off-the-shelf evaluation scores (Davies-Bouldin, Calinski-Harabasz, Silhouette Coefficient) for several predefined numbers of clusters. Finally, the optimal number of clusters is determined by analyzing how these scores vary across the numbers of clusters, looking for optima or relevant changes in trend.
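
The three steps of the method can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the authors' implementation. It assumes the data are held in a NumPy array of shape (steps, points, features) with the steps running along the time or depth axis. It also assumes k-means from scikit-learn as the clustering algorithm (the abstract does not name one), non-overlapping averaging windows, and simply taking each score's optimum: the minimum for Davies-Bouldin and the maximum for Calinski-Harabasz and the Silhouette Coefficient. The analysis of changes in trend is not shown. The function names `slice_average`, `score_k_range`, `best_k` and `optimal_k_per_slice` are illustrative only, and the demo uses synthetic data, not VIKING20X output.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def slice_average(data, width):
    # Hypothetical helper. data: array of shape (n_steps, n_points, n_features),
    # e.g. time steps x grid points x (temperature, salinity).
    # Averages over consecutive, non-overlapping windows of `width` steps along
    # axis 0; trailing steps that do not fill a whole window are dropped.
    n_windows = data.shape[0] // width
    trimmed = data[: n_windows * width]
    return trimmed.reshape(n_windows, width, *data.shape[1:]).mean(axis=1)


def score_k_range(points, k_values, seed=0):
    # Cluster `points` (n_points x n_features) with k-means (an assumed choice
    # of algorithm) for each k and compute the three evaluation scores.
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        scores[k] = {
            "davies_bouldin": davies_bouldin_score(points, labels),        # lower is better
            "calinski_harabasz": calinski_harabasz_score(points, labels),  # higher is better
            "silhouette": silhouette_score(points, labels),                # higher is better
        }
    return scores


def best_k(scores):
    # Pick the optimum of each score over k; trend-change analysis is omitted.
    ks = list(scores)
    return {
        "davies_bouldin": min(ks, key=lambda k: scores[k]["davies_bouldin"]),
        "calinski_harabasz": max(ks, key=lambda k: scores[k]["calinski_harabasz"]),
        "silhouette": max(ks, key=lambda k: scores[k]["silhouette"]),
    }


def optimal_k_per_slice(data, widths, k_values):
    # For each window width, average the data and report the per-score
    # optimal k for every averaged slice.
    results = {}
    for width in widths:
        averaged = slice_average(data, width)
        results[width] = [best_k(score_k_range(s, k_values)) for s in averaged]
    return results


# Synthetic stand-in for model output: 12 time steps, 150 grid points whose
# two properties scatter around three well-separated centers.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
membership = np.repeat(np.arange(3), 50)
synthetic = centers[membership] + rng.normal(scale=0.5, size=(12, 150, 2))

result = optimal_k_per_slice(synthetic, widths=[3, 6], k_values=range(2, 7))
for width, picks in result.items():
    print(width, picks)
```

Because all three scores are distance-based, real temperature and salinity fields would likely need to be standardized before clustering. Comparing the picks across window widths and scores is one simple way to check whether a consistent number of clusters emerges.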

For validation, we applied this method to output from the high-resolution Atlantic Ocean model VIKING20X for the subpolar North Atlantic between 1993 and 1997, and discussed the resulting clusters directly with domain experts. Because deep convection changed from strong to weak during these years, the hydrographic properties vary strongly in both time and depth, which poses a particular challenge for our methodology.

Our findings suggest that off-the-shelf cluster evaluation scores can identify an optimal number of clusters that captures the underlying structure of the hydrographic space. The optimal number of clusters identified by our data-driven method agrees with the optimal number found in expert interviews. These findings help to support and objectify water mass definitions across the decisions of multiple experts, and demonstrate the benefit of introducing data science methods into analyses in physical oceanography.

How to cite: Trahms, C., Wölker, Y., and Biastoch, A.: Objectively Determining the Number of Similar Hydrographic Clusters with Unsupervised Machine Learning, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-11687, https://doi.org/10.5194/egusphere-egu23-11687, 2023.