EGU22-11565
https://doi.org/10.5194/egusphere-egu22-11565
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Assessing the effect of spatial autocorrelation in predicting groundwater salinity with Machine Learning

Panagiotis Tziachris, George Arampatzis, Vassilios Aschonitis, Katerina Sachsamanoglou, and Evangelos Tziritis
  • Soil and Water Resources Institute (SWRI), Hellenic Agricultural Organization Demeter (ELGO-DIMITRA) (p.tziachris@swri.gr)

Machine learning (ML) models that are robust, efficient and exhibit sound generalization capabilities rely on the assumption that they are trained with data that are independent and identically distributed (i.i.d.). Violating this assumption may result in overfitting these highly flexible methods to the training data and in underestimating spatial prediction errors. Making models appear more reliable than they are could lead to a biased assessment of the model’s capability to generalize the learned relationship to independent data and, consequently, to models with overall poor prediction accuracy.

Spatial data are a special kind of data for which the i.i.d. assumption often does not hold, because of their spatial autocorrelation. Cross-validation is a very common resampling method both for tuning ML models and for assessing their predictive capabilities. Studies have shown that using random cross-validation with spatial data can produce overoptimistic results, because the i.i.d. assumption is violated. To mitigate this problem, spatial cross-validation has been proposed as an alternative: it splits the data into spatially disjoint subsets, which are subsequently used for cross-validation.
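As an illustration of the splitting step, the following minimal sketch assigns sampling locations to folds by square grid block, so that all points in a block are held out together. It is not the procedure used in this study: the function name `spatial_block_folds`, the block size, the round-robin assignment of blocks to folds and the toy coordinates are all assumptions made for the example.

```python
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds):
    """Assign each point a fold so that all points in one grid block share it."""
    # Group point indices by the square grid block that contains them.
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        block_id = (int(x // block_size), int(y // block_size))
        blocks[block_id].append(idx)

    # Deal whole blocks out to folds in round-robin order
    # (sorted block ids keep the result reproducible).
    fold_of = [None] * len(coords)
    for rank, block_id in enumerate(sorted(blocks)):
        for idx in blocks[block_id]:
            fold_of[idx] = rank % n_folds
    return fold_of


# Hypothetical sampling locations on a 6 x 6 unit grid (assumed, not project data).
coords = [(float(x), float(y)) for x in range(6) for y in range(6)]
folds = spatial_block_folds(coords, block_size=3.0, n_folds=2)
```

Round-robin assignment is only one option; blocks could equally be assigned to folds at random or derived from a clustering of the coordinates, and the block size would normally be chosen with the range of the spatial autocorrelation in mind.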

In the context of the MEDSAL Project (www.medsal.net), data on multiple covariates were collected in order to study groundwater salinization. Machine learning was applied to predict salinity concentration based on these data. This presentation shows some of the results of the ML analysis, along with the effect of spatial autocorrelation on the ML models' prediction capabilities. The effect was assessed by comparing the prediction results of ML models created with random cross-validation against those created with spatial cross-validation. Possible spatial autocorrelation, along with time-series autocorrelation, in water data is an important issue that data analysts should study and address, especially when applying ML analysis and modeling.
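The kind of comparison described above can be sketched on toy data. The example below is an assumption-laden illustration, not the MEDSAL analysis: the 20 x 20 grid, the smooth trend standing in for a salinity surface, the 1-nearest-neighbour predictor, the 5 x 5 blocks and the systematic assignment of blocks to four folds are all invented for the example.

```python
import math
import random


def nearest_neighbour_mae(coords, values, fold_of):
    """Cross-validated mean absolute error of a 1-nearest-neighbour predictor."""
    errors = []
    for fold in sorted(set(fold_of)):
        train = [i for i, f in enumerate(fold_of) if f != fold]
        test = [i for i, f in enumerate(fold_of) if f == fold]
        for i in test:
            # Predict with the value observed at the closest training location.
            nearest = min(train, key=lambda j: math.dist(coords[i], coords[j]))
            errors.append(abs(values[i] - values[nearest]))
    return sum(errors) / len(errors)


def block_fold(x, y, block_size=5):
    """Systematic fold id: neighbouring blocks never share a fold."""
    bx, by = int(x // block_size), int(y // block_size)
    return (bx % 2) + 2 * (by % 2)


# Toy data (assumed): a regular grid with a smooth trend as the target.
coords = [(float(x), float(y)) for x in range(20) for y in range(20)]
values = [x + y for x, y in coords]

# Random CV: four equally sized folds, shuffled independently of location.
random_folds = [i % 4 for i in range(len(coords))]
random.Random(0).shuffle(random_folds)

# Spatial CV: all points in the same 5 x 5 block are held out together.
spatial_folds = [block_fold(x, y) for x, y in coords]

random_mae = nearest_neighbour_mae(coords, values, random_folds)
spatial_mae = nearest_neighbour_mae(coords, values, spatial_folds)
print(f"random CV MAE:  {random_mae:.2f}")
print(f"spatial CV MAE: {spatial_mae:.2f}")
```

On this toy grid the random split reports the smaller error, because almost every held-out point keeps an immediate neighbour in the training set, whereas a held-out block loses its neighbours along with it. How large the gap is in practice depends on the data, the model and the block size.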

How to cite: Tziachris, P., Arampatzis, G., Aschonitis, V., Sachsamanoglou, K., and Tziritis, E.: Assessing the effect of spatial autocorrelation in predicting groundwater salinity with Machine Learning, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11565, https://doi.org/10.5194/egusphere-egu22-11565, 2022.
