- Indian Institute of Technology ISM Dhanbad (aditi.seal.94@gmail.com)
Several machine learning algorithms have been developed for earthquake catalog declustering and have demonstrated high accuracy, particularly for the well-studied Southern California region. This study applies a Probabilistic Random Forest (PRF) approach to earthquake declustering and compares its performance with that of the standard Random Forest (RF) method in Southern California by introducing noise into the dataset. Although the Southern California catalog is of high quality owing to a dense seismic network, inherent observational and instrumental noise can still affect model performance. Five features are considered, each describing a different aspect of the space–time–magnitude interactions inherent in seismicity. The rescaled time (T*) represents the temporal interval between an event and its nearest neighbor, while the rescaled distance (R*) quantifies their spatial separation. The magnitude difference is expressed as Δmj = mi − mj, where i denotes the nearest neighbor of event j; it generally attains larger values when event j is an aftershock of a stronger mainshock. The number of siblings is the count of events that share the same nearest neighbor as event j, with higher values indicating multiple aftershocks associated with a common parent event. The number of offspring is the number of subsequent events that identify event j as their nearest neighbor, thereby reflecting its triggering potential. For training and testing the RF and PRF algorithms, the original catalog was supplied to the epidemic-type aftershock sequence (ETAS) model for parameter estimation using the maximum likelihood method. Based on the estimated parameters, 100 different realizations of the combined background–cluster labeled dataset were generated using the thinning algorithm. Background events were labeled as “0”, whereas clustered events were labeled as “1” in the synthetic dataset.
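The sibling and offspring features above follow directly from the nearest-neighbor assignments. A minimal sketch of how they could be counted, assuming a precomputed array of nearest-neighbor (parent) indices — the function name, array layout, and toy catalog are hypothetical, not the authors' code:

```python
import numpy as np

def family_features(parent):
    """Count siblings and offspring for each event.

    parent : int array; parent[j] is the index of event j's nearest
             neighbor (candidate parent), or -1 if none exists.
    """
    n = len(parent)
    offspring = np.zeros(n, dtype=int)
    for j, p in enumerate(parent):
        if p >= 0:
            offspring[p] += 1  # event j identifies p as its nearest neighbor
    # siblings of j = other events sharing j's parent (excluding j itself)
    siblings = np.array([offspring[p] - 1 if p >= 0 else 0 for p in parent])
    return siblings, offspring

# Toy example: events 1, 2, 3 all select event 0 as their nearest neighbor,
# so event 0 has 3 offspring and events 1-3 each have 2 siblings.
parent = np.array([-1, 0, 0, 0])
sib, off = family_features(parent)
```

In this picture a high offspring count flags a likely mainshock (strong triggering potential), while a high sibling count flags membership in an aftershock cluster.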
Three types of feature noise are introduced to assess model robustness: Type I applies uniform Gaussian noise across all objects and features, Type II assigns different noise levels to randomly grouped objects and features, and Type III applies independent noise levels to the training and testing datasets. Noise magnitudes are controlled by feature-wise standard deviations and an overall noise factor, with noisy values sampled from Gaussian distributions. For the synthetic datasets, the figure illustrates the difference in declustering accuracy between the PRF and standard RF models across the three noise types. For Type I noise, the maximum accuracy improvement is approximately 2%, while Type II noise shows an increase of around 2.5%. Type III noise, which represents a more complex noise scenario, exhibits a moderate accuracy gain of about 1.5%. For the real seismic datasets, the accuracy differences between PRF and RF are generally larger. As shown in the figure, Type I noise leads to an accuracy improvement of nearly 2%, Type II noise likewise shows an enhancement of about 2%, while Type III noise, representing the most complex scenario, exhibits a substantial improvement of nearly 6%. These results demonstrate that as noise complexity increases, particularly when the correlation within the noise weakens, the PRF model consistently outperforms the standard RF classifier.
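The three noise schemes can be sketched as follows — an illustrative reconstruction under the stated assumptions (feature-wise standard deviations times a global noise factor), with a hypothetical toy feature matrix and arbitrary factor values, not the experiment's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(X, sigma, factor):
    """Perturb X with Gaussian noise scaled by feature-wise std devs
    (sigma) and an overall noise factor."""
    return X + rng.normal(0.0, 1.0, X.shape) * sigma * factor

X = rng.normal(size=(6, 5))    # toy matrix: 6 events x 5 features
sigma = X.std(axis=0)          # feature-wise standard deviations

# Type I: one uniform noise level for all objects and features
X_t1 = add_noise(X, sigma, factor=0.5)

# Type II: different noise levels for randomly grouped objects
groups = rng.integers(0, 2, size=X.shape[0])        # two random object groups
factors = np.where(groups == 0, 0.2, 0.8)[:, None]  # per-group noise factor
X_t2 = X + rng.normal(size=X.shape) * sigma * factors

# Type III: independent noise levels for the training and testing splits
X_train, X_test = X[:4], X[4:]
X_train_noisy = add_noise(X_train, sigma, factor=0.3)
X_test_noisy = add_noise(X_test, sigma, factor=0.9)
```

The Type III split is the hardest case for a standard RF because the noise statistics seen at test time differ from those seen during training, which is where propagating per-feature uncertainties through the trees, as PRF does, pays off most.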
How to cite: Seal, A. and Jana, N.: Earthquake Catalog Declustering in Southern California Using a Probabilistic Random Forest Approach, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16116, https://doi.org/10.5194/egusphere-egu26-16116, 2026.