- Institute of Atmospheric Physics, CAS, Prague, Praha 4, Czechia (stryhal@ufa.cas.cz)
Principal component analysis (PCA) and self-organizing maps (SOMs) are two of the most widely used tools in climate research. Although typically applied to different ends, both methods involve a search for a latent, low-dimensional space that facilitates the analysis of complex, high-dimensional datasets, such as those representing atmospheric circulation. Seeds spanning the space defined by a few (typically two) leading principal components (PCs) are sometimes used to initialize a SOM. However, PCA may prove useful even beyond SOM initialization—a possibility explored in the proposed contribution.
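The PCA-based initialization mentioned above can be sketched as follows. This is only an illustrative sketch, not the procedure used in the contribution: the function name `pca_init_som`, the choice to span one standard deviation on either side of the mean, and the assignment of the first grid dimension to PC1 are all assumptions made for the example.

```python
import numpy as np

def pca_init_som(data, n_rows, n_cols):
    """Hypothetical helper: initial SOM weights laid out on the PC1-PC2 plane.

    data: array of shape (n_samples, n_features), e.g. flattened circulation fields.
    Returns an array of shape (n_rows, n_cols, n_features).
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # SVD of the centered data; the rows of vt are the PC directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Standard deviations of the data along PC1 and PC2.
    std = s[:2] / np.sqrt(len(data) - 1)
    # Grid coordinates in [-1, 1] along each SOM dimension (assumed range).
    a = np.linspace(-1.0, 1.0, n_rows)
    b = np.linspace(-1.0, 1.0, n_cols)
    # Rows vary along PC1, columns along PC2 (an assumption of this sketch).
    return (mean
            + a[:, None, None] * std[0] * vt[0]
            + b[None, :, None] * std[1] * vt[1])
```

The resulting array could then be passed as starting weights to whichever SOM implementation is in use, provided it accepts user-supplied initial weights.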
One of the critical choices when training a SOM is the selection of the number of nodes (each representing a typical circulation pattern) and the organization of these nodes, or the SOM topology. In synoptic climatology, a two-dimensional planar SOM topology is typically used, with the SOM structured as a grid of x × y nodes, where x and y each give the number of nodes along one SOM dimension. The size of this grid governs the complexity of the information captured by the trained SOM. At present, researchers must often rely on extensive trial-and-error testing to determine an optimal SOM configuration, particularly when working with new datasets or geographic regions.
In previous work using synthetic data, we found that SOM performance depends on a complex interplay between SOM parameters and the structure of the input data—specifically, the ratios of variance explained by leading PCs. We hypothesize that:
(1) leveraging information on data structure from PCA could help determine the optimal ratio of SOM dimensions, improving classification results and minimizing the need for initial testing; and
(2) datasets lacking a clear drop-off between the second and third PCs may be better represented using non-planar SOM topologies, such as those organized on the surface of a torus.
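As a rough illustration of how the two hypotheses might be explored in practice, the sketch below computes the fractions of variance explained by the leading PCs and derives two quantities from them. Everything beyond the PCA itself is an assumption made for the example: the helper names, the rule that the ratio of SOM side lengths follows the ratio of standard deviations along PC1 and PC2, and the threshold of 0.5 used to flag a weak drop-off between the second and third PCs are hypothetical and are not taken from the contribution.

```python
import numpy as np

def explained_variance_ratios(data, n_pcs=3):
    """Fractions of total variance explained by the first n_pcs PCs."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2  # proportional to the PC variances
    return var[:n_pcs] / var.sum()

def suggest_grid(ratios, n_nodes):
    """Hypothetical heuristic: x / y ~ sqrt(var(PC1) / var(PC2)), x * y ~ n_nodes."""
    aspect = np.sqrt(ratios[0] / ratios[1])  # >= 1 because PCs are sorted
    y = max(1, int(round(np.sqrt(n_nodes / aspect))))
    x = max(1, int(round(n_nodes / y)))
    return x, y

def weak_dropoff(ratios, threshold=0.5):
    """Flag datasets where PC3 explains nearly as much variance as PC2.

    The threshold is arbitrary and would need tuning in any real study.
    """
    return bool(ratios[2] / ratios[1] > threshold)
```

For example, with explained-variance fractions of 0.6, 0.15, and 0.05 and a budget of 18 nodes, `suggest_grid` returns a 6 × 3 grid, and `weak_dropoff` returns `False`, so under these assumed rules a planar topology would not be called into question.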
How to cite: Stryhal, J.: On applying PCA to identify the optimal SOM topology for synoptic climatological research, EMS Annual Meeting 2025, Ljubljana, Slovenia, 7–12 Sep 2025, EMS2025-496, https://doi.org/10.5194/ems2025-496, 2025.