Statistical Upscaling of Point-Based Sediment Observations to Continental-Scale Maps Using 40 Years of Landsat and Explainable Machine Learning

Gültekin Erten

doi:https://doi.org/10.5194/egusphere-egu26-1419

Session BG9.7

EGU26-1419, updated on 13 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Oral | Thursday, 07 May, 15:15–15:25 (CEST)

Room 1.14

Gültekin Erten

General Directorate of Mineral Research and Exploration, Department of Geological Research, Ankara, Türkiye (gultekin.erten@mta.gov.tr)

Large-scale mapping of environmental variables increasingly relies on integrating sparse in-situ measurements with dense remote sensing archives using advanced machine learning approaches. Suspended sediment concentration (SSC), a key driver of water quality, biogeochemical fluxes, and river-delta morphodynamics, is traditionally monitored at point locations and exhibits strong spatial and temporal heterogeneity. This makes SSC an ideal testbed for evaluating methodological challenges in upscaling point observations to continuous environmental surfaces. In this study, a continental-scale SSC mapping framework is developed by combining 40 years of Landsat surface reflectance with 16,311 quality-filtered SSC measurements from 247 U.S. Geological Survey stations across diverse hydroclimatic regions of the United States.

Tree-based ensemble models, including CatBoost, are employed to learn nonlinear relationships between spectral indices (e.g., Red/Green ratio, MNDWI, NIR reflectance), topographic metrics, precipitation records, and spatiotemporal predictors (latitude, longitude, month). During model development, several challenges central to environmental upscaling are addressed: (i) reference data that are not independent and identically distributed, (ii) spatial heterogeneity in sediment-generating processes, (iii) systematic biases introduced by log-transformation and back-transformation, and (iv) the risk of extrapolation artifacts when predictions are generated outside the feature space of the training data. Spatial dependencies in residuals are quantified using Moran’s I, and the performance of direct SSC prediction and ln(SSC)-based models is compared to illustrate how transformation choices influence uncertainty and predictive robustness across sediment regimes. Spatiotemporal predictors primarily encode climatological and regional priors rather than explicit causal processes, and results are therefore interpreted in a large-scale, statistical context.
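Two of the diagnostics named above can be illustrated with a minimal sketch. The abstract does not say which bias correction or spatial weighting scheme was used, so both choices here are assumptions: Duan's smearing estimator stands in for the back-transformation correction, and Moran's I uses binary distance-band weights on projected (planar) coordinates. The function names and the `bandwidth` parameter are hypothetical.

```python
import math


def smearing_factor(log_residuals):
    """Duan's smearing factor: the mean of exp(residual) over training residuals.

    Residuals are ln(observed SSC) - ln(predicted SSC) on the training set.
    """
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def back_transform(log_predictions, log_residuals):
    """Convert ln(SSC) predictions to SSC, correcting the retransformation bias.

    A plain exp() estimates something close to the conditional median rather
    than the mean, so it tends to underestimate mean SSC when residuals are
    roughly symmetric in log space. Assumption: smearing is used here only as
    one common correction, not as the method of the study.
    """
    factor = smearing_factor(log_residuals)
    return [math.exp(p) * factor for p in log_predictions]


def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary weights: w_ij = 1 if distance <= bandwidth.

    coords are assumed to be planar (x, y) pairs in the same units as
    bandwidth; latitude/longitude would need a great-circle distance instead.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # sum of w_ij
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= bandwidth:
                cross_sum += dev[i] * dev[j]
                weight_sum += 1.0
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0.0 or variance_sum == 0.0:
        raise ValueError("Moran's I is undefined: no neighbours or zero variance")
    return (n / weight_sum) * (cross_sum / variance_sum)
```

Applied to model residuals at station locations, values of Moran's I well above zero would suggest spatially clustered errors, which is one indication that the reference data are not independent. The pairwise loop is quadratic in the number of stations, which is acceptable for a few hundred stations but not for pixel-level data.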

To enhance interpretability, an increasingly important component of environmental machine learning, SHAP (SHapley Additive Explanations) values are computed to quantify feature contributions. The SHAP analysis indicates that the model outputs are physically consistent: elevated SSC is associated with high Red/Green ratios and high NIR reflectance, MNDWI helps isolate pure water pixels, and elevation and longitude capture broad geomorphic and climatic gradients at continental scale, including the well-documented east-to-west, aridity-driven increase in sediment yield. These insights allow regions with limited representativeness or increased extrapolation risk to be identified, providing a transparent diagnostic tool that extends beyond traditional accuracy metrics.
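The idea behind SHAP values can be sketched with an exact Shapley computation on a toy function. Everything in this example is hypothetical: `toy_model` is an invented response surface, not the fitted CatBoost model, and the feature values are made up. It also uses a single baseline sample, whereas SHAP implementations for tree ensembles average over a background data set, so this is a simplification meant only to show the additive attribution property.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample relative to a single baseline.

    For every ordering of the features, features are switched one at a time
    from their baseline value to their value in x, and the resulting change
    in model output is credited to the feature that was switched. The cost
    grows factorially with the number of features, so this is for toy use.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = x[name]
            new_output = model(current)
            totals[name] += new_output - previous_output
            previous_output = new_output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Hypothetical SSC response with an interaction term (illustration only)."""
    return (
        10.0
        + 40.0 * features["red_green"]
        + 200.0 * features["nir"]
        + 100.0 * features["red_green"] * features["nir"]
    )


# Made-up feature values: a clear-water baseline and a more turbid sample.
baseline = {"red_green": 0.8, "nir": 0.05, "mndwi": 0.5}
sample = {"red_green": 1.2, "nir": 0.15, "mndwi": 0.4}

phi = shapley_values(toy_model, sample, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, and a feature the model never uses (here `mndwi`) receives a contribution of zero. The interaction term is split equally between the two features involved.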

Generalizability is examined by applying the model to four major U.S. rivers (Mississippi, Colorado, Columbia, and Hudson). The model reproduces the expected spatial and temporal dynamics of each river, including snowmelt-driven sediment pulses in the Colorado River, regulated low-sediment conditions in the Columbia, seasonal fluctuations in the Mississippi, and episodic sediment events in the Hudson. These results indicate that spatially explicit machine learning models, when carefully validated, can upscale sparse in-situ measurements into continuous environmental maps that preserve regionally consistent behaviors and large-scale patterns.

Overall, the study shows that long-term satellite archives, physically informed predictors, and explainable machine learning techniques provide a robust foundation for upscaling environmental variables. By addressing spatial heterogeneity, uncertainty propagation, and interpretability, the framework contributes to the broader effort to generate reliable, large-scale geospatial products from distributed observation networks and can be transferred to other environmental variables requiring point-to-continuous scaling.

How to cite: Erten, G.: Statistical Upscaling of Point-Based Sediment Observations to Continental-Scale Maps Using 40 Years of Landsat and Explainable Machine Learning, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-1419, https://doi.org/10.5194/egusphere-egu26-1419, 2026.