- Naval Research Lab, Ocean Sciences, United States of America (renzaglia3@gmail.com)
Big data has become increasingly important in marine geoscience, where in situ measurements are often limited, leaving large portions of the seafloor unsampled. To address this gap, we present a data-driven approach that leverages non-parametric machine learning algorithms, specifically an ensemble of k-Nearest Neighbors (kNN) and Random Forest regressors, to produce a global geospatial prediction of median grain size (D50) at 2-arc-minute resolution. Our methodology quantifies parametric uncertainty as the distance to the nearest neighbor in feature space, producing spatially explicit uncertainty maps that highlight regions where additional data collection would most effectively improve model predictions. This emphasis on parametric uncertainty serves as a roadmap for data-driven exploration, reducing the time, energy, and cost associated with collecting or curating a comprehensive dataset.
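The distance-to-nearest-neighbor uncertainty metric can be sketched as follows. This is a minimal, dependency-free illustration that assumes z-score-standardized features and Euclidean distance. It implements only the kNN half of the ensemble; the Random Forest regressor, and how the two regressors are combined, are not specified here. All function names, feature values, and targets are hypothetical.

```python
import math

def zscore_params(rows):
    """Per-feature mean and (population) standard deviation of the training rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    return means, stds

def scale(row, means, stds):
    """Standardize one feature vector with the training statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def knn_with_uncertainty(train_x, train_y, query, k=3):
    """Return (mean target of the k nearest samples, distance to the single nearest sample).

    Both are computed in standardized feature space; a large distance flags a query
    that is unlike anything in the training data (high parametric uncertainty)."""
    means, stds = zscore_params(train_x)
    q = scale(query, means, stds)
    ranked = sorted(
        (math.dist(scale(x, means, stds), q), y) for x, y in zip(train_x, train_y)
    )
    k = min(k, len(ranked))
    prediction = sum(y for _, y in ranked[:k]) / k
    return prediction, ranked[0][0]

# Hypothetical predictors (e.g., water depth in m and a second environmental variable)
# paired with hypothetical log-scale D50 targets.
train_x = [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [40.0, 4.0]]
train_y = [0.0, 0.5, 1.0, 1.5]

pred_near, dist_near = knn_with_uncertainty(train_x, train_y, [20.0, 2.0])
pred_far, dist_far = knn_with_uncertainty(train_x, train_y, [200.0, 20.0])
print(pred_near, dist_near)  # 0.5 0.0
print(dist_far > dist_near)  # True
```

A query that coincides with a training sample gets zero distance, while a query far outside the training range gets a large distance even though the kNN regressor still returns a prediction. This is why the distance, rather than the prediction itself, serves as the uncertainty signal.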
We train the model on ~40,000 publicly available seafloor grain-size measurements and iteratively optimize hyperparameters based on prediction error and out-of-sample validation. The final model produces a global prediction of seafloor grain size with a correlation of ~0.65 between observed and predicted values. We also apply a ranked noise grid analysis to select the predictor variables that minimize overall predictive error, ensuring that the feature set is robust and agnostic to human bias.
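The tuning and validation loop can be sketched as below under several assumptions not stated above: k-fold cross-validation as the out-of-sample scheme, a plain kNN regressor standing in for the full ensemble, out-of-sample RMSE as the prediction-error criterion for choosing the neighbor count, Pearson's r as the reported correlation, and features that are already standardized. The data are synthetic and the function names are hypothetical.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def knn_predict(train_x, train_y, query, k):
    """Mean target of the k training samples closest to query (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def out_of_sample(xs, ys, k, n_folds=5, seed=0):
    """K-fold cross-validation: every sample is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    observed, predicted = [], []
    for f in range(n_folds):
        held_out = idx[f::n_folds]
        held_set = set(held_out)
        train = [i for i in idx if i not in held_set]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for i in held_out:
            observed.append(ys[i])
            predicted.append(knn_predict(tx, ty, xs[i], k))
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    return rmse, pearson_r(observed, predicted)

def tune_k(xs, ys, candidates=(1, 3, 5, 10)):
    """Pick the neighbor count with the lowest out-of-sample RMSE; return it with its scores."""
    scores = {k: out_of_sample(xs, ys, k) for k in candidates}
    best = min(scores, key=lambda k: scores[k][0])
    return best, scores[best]

# Hypothetical 1-D feature with a noisy nonlinear target (not real grain-size data).
rng = random.Random(1)
xs = [[i / 20] for i in range(200)]
ys = [math.sin(x[0]) + rng.gauss(0.0, 0.3) for x in xs]

best_k, (rmse, r) = tune_k(xs, ys)
print(best_k, round(rmse, 3), round(r, 3))
```

Selecting on error and then reporting correlation keeps the two roles separate: RMSE drives the hyperparameter choice, and r on held-out samples summarizes how well predictions track observations.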
Regions with sparse data coverage or atypical geological conditions appear as areas of high uncertainty, underscoring the need for targeted sampling. By mapping this uncertainty, our framework guides strategic data acquisition and reduces curation time and cost. We demonstrate that sampling high-uncertainty regions improves predictions not only at the newly sampled location but also in geologically similar regions (close in parameter space) around the globe. This illustrates how combining machine learning with systematic, data-driven exploration can improve the reliability of global seafloor property models. Our predicted grain-size map provides a proxy for further regional and global studies that rely on grain-size measurements, and more broadly highlights the transformative potential of machine learning to refine how we approach data exploration and curation.
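The targeted-sampling argument can be illustrated with a nearest-neighbor uncertainty proxy. In this hypothetical sketch, the most uncertain candidate site (largest distance to any training sample in feature space) is chosen greedily, added to the training set, and the uncertainty is then recomputed at a geographically distant site with similar features. Site names, feature values, and the greedy selection rule are assumptions for illustration, and features are assumed to be already standardized.

```python
import math

def nn_distance(train_features, query):
    """Uncertainty proxy: distance from query to its nearest training sample in feature space."""
    return min(math.dist(t, query) for t in train_features)

def most_uncertain(train_features, candidates):
    """Pick the candidate site whose features are farthest from all training samples."""
    return max(candidates, key=lambda c: nn_distance(train_features, c["features"]))

# Hypothetical standardized features of existing samples (one well-sampled environment).
training = [[0.0, 0.0], [0.3, -0.2], [-0.1, 0.4]]

# Hypothetical candidate sampling sites.
candidates = [
    {"name": "Site A", "features": [5.0, 5.0]},  # atypical environment, unsampled
    {"name": "Site B", "features": [0.5, 0.2]},  # resembles existing samples
]

# A distant location that is geologically similar to Site A (close in feature space).
far_twin = [5.2, 4.9]

before = nn_distance(training, far_twin)
chosen = most_uncertain(training, candidates)
training.append(chosen["features"])  # "collect" a sample at the chosen site
after = nn_distance(training, far_twin)

print(chosen["name"])   # Site A
print(after < before)   # True
```

Because uncertainty is measured in feature space rather than geographic space, one new sample at Site A lowers the uncertainty at every location with similar predictor values, wherever it lies on the globe.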
How to cite: Renzaglia, J., Lee, T., and Le, A.: Global Seafloor Grain-Size Prediction: A Data-Driven Approach, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12879, https://doi.org/10.5194/egusphere-egu25-12879, 2025.