Displays

HS3.7

Geostatistics is commonly applied in the Water, Earth and Environmental sciences to quantify spatial variation, produce interpolated maps with quantified uncertainty and optimize spatial sampling designs. Extensions to the space-time domain are also a topic of current interest. Due to technological advances and abundance of new data sources from remote and proximal sensing and a multitude of environmental sensor networks, big data analysis and data fusion techniques have become a major topic of research. Furthermore, methodological advances, such as hierarchical Bayesian modeling and machine learning, have enriched the modelling approaches typically used in geostatistics.

Earth-science data have spatial and temporal features that contain important information about the underlying processes. The development and application of innovative space-time geostatistical methods helps to better understand and quantify the relationship between the magnitude and the probability of occurrence of these events.

This session aims to provide a platform for geostatisticians, soil scientists, hydrologists, earth and environmental scientists to present and discuss innovative geostatistical methods to study and solve major problems in the Water, Earth and Environmental sciences. In addition to methodological innovations, we also encourage contributions on real-world applications of state-of-the-art geostatistical methods.

Given the broad scope of this session, the topics of interest include the following non-exclusive list of subjects:
1. Advanced parametric and non-parametric spatial estimation and prediction techniques
2. Big spatial data: analysis and visualization
3. Optimisation of spatial sampling frameworks and space-time monitoring designs
4. Algorithms and applications on Earth Observation Systems
5. Data Fusion, mining and information analysis
6. Integration of geostatistics with optimization and machine learning approaches
7. Application of covariance functions and copulas in the identification of spatio-temporal relationships
8. Geostatistical characterization of uncertainties and error propagation
9. Bayesian geostatistical analysis and hierarchical modelling
10. Functional data analysis approaches to geostatistics
11. Geostatistical analysis of spatial compositional data
12. Multiple point geostatistics
13. Upscaling and downscaling techniques
14. Ontological framework for characterizing environmental processes

Co-organized by ESSI1/GI6/NH1/SSS10
Convener: Emmanouil Varouchakis | Co-conveners: Gerard Heuvelink, Dionissios Hristopulos, R. Murray Lark, Alessandra Menafoglio (ECS)
Attendance: Wed, 06 May, 08:30–10:15 (CEST)

Chat time: Wednesday, 6 May 2020, 08:30–10:15

Chairperson: Gerard Heuvelink, Dionissios Hristopulos, R. Murray Lark, Alessandra Menafoglio, Emmanouil Varouchakis
D90 |
EGU2020-7958
| solicited
Madlene Nussbaum and Stéphane Burgos

Spatial information on soil is crucial for many applications such as spatial planning, erosion reduction, climate mitigation and forest or natural hazard management. Many countries (e.g. Switzerland, France, Germany, Albania) still use conventional soil mapping approaches, which are often very time-consuming and costly. Methods for producing soil maps with geostatistics, supported by other digital technologies, reached a high level of maturity some time ago. Each individual method has been well studied, and transfer to practice has taken place in some countries. Nevertheless, we are not aware of a large soil mapping endeavor that sampled a considerable amount of new soil data using a practical and geostatistically sound sampling design while integrating digital field tools, centralized soil data management, soil spectroscopy, digital soil mapping and subsequent soil function assessment, all followed by quality assurance measures.

 

In Switzerland, political pressure has recently risen to improve the basis for soil-related decision making. The administration of the Swiss Canton of Berne aims to map agricultural and forest soils of the lowlands (210000 hectares) at high resolution to allow for decisions relevant to landownership. In the mountainous areas (240000 hectares), maps with at least medium detail are necessary, especially for natural hazard management. Currently, the project is in the phase of testing the efficiency of each methodological element and establishing interfaces between them. We present a concept that combines available state-of-the-art technologies and should allow the required detailed soil maps to be created within the next 15 years. Only a few legacy soil data are available, hence we planned for 5200 newly sampled profile pits and about 360000 auger holes. This large sampling effort is hierarchically structured, with field observations based on classical pedological descriptions supported by laboratory and field spectroscopy. Iterative sampling is driven by the uncertainty of the maps up to the point where the required accuracy is reached. Intermediate and final soil maps are created with machine-learning-based digital soil mapping techniques. From the final mapped soil properties, soil functions and application products are derived by digital soil assessment approaches driven by the needs of the end users.

 

Within this phase of the project we exploited the legacy soil maps available for the surroundings of some villages. As soil augerings were not recorded during map production, we generated “virtual soil samples” from the maps and used a machine learning based model averaging approach to predict soil properties for the nearby areas. Class width and multiple assignments of legend units per soil map polygon were considered by a non-parametric bootstrap approach to create predictive distributions and map the uncertainty. To avoid extrapolation into areas with different soil forming factors we have carefully chosen the target area for prediction based on a similarity analysis. The predictions have been successfully validated with legacy soil profiles and new field observations.
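The bootstrap step described above can be illustrated with a minimal sketch. It assumes that each virtual sample is known only up to a class interval, and it uses a simple linear fit as a stand-in for the machine learning model; the covariate, the class bounds and all function names are hypothetical and not taken from the project.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the ML model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:          # degenerate resample: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bootstrap_predictions(samples, x_new, n_boot=500, seed=0):
    """samples: list of (covariate, class_low, class_high) virtual samples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Resample the virtual samples with replacement (non-parametric bootstrap).
        draw = [rng.choice(samples) for _ in samples]
        xs = [s[0] for s in draw]
        # Class width: the value is only known to lie somewhere within the class.
        ys = [rng.uniform(s[1], s[2]) for s in draw]
        a, b = fit_line(xs, ys)
        preds.append(a + b * x_new)
    return sorted(preds)

def quantile(sorted_vals, q):
    """Simple empirical quantile by nearest rank."""
    idx = min(len(sorted_vals) - 1, max(0, round(q * (len(sorted_vals) - 1))))
    return sorted_vals[idx]

# Hypothetical virtual samples: (elevation in m, clay class bounds in %).
virtual = [(400, 10, 20), (450, 10, 20), (500, 20, 30), (550, 20, 30), (600, 30, 40)]
preds = bootstrap_predictions(virtual, x_new=520)
print(quantile(preds, 0.05), quantile(preds, 0.5), quantile(preds, 0.95))
```

The spread between the lower and upper quantiles is one possible way to map prediction uncertainty at each target location.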

How to cite: Nussbaum, M. and Burgos, S.: Detailed soil mapping for large areas in Berne – putting well researched knowledge into practice, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7958, https://doi.org/10.5194/egusphere-egu2020-7958, 2020

D91 |
EGU2020-2051 | ECS (early career scientist)
Federico Gatti, Niccolò Togni, Alessandra Menafoglio, Luca Bonaventura, Monica Papini, and Laura Longoni

SMART-SED is a project aimed at developing an innovative framework for the numerical simulation of sediment motion in river catchments, intended to be used by local territorial management institutions and professionals to design proper strategies for the mitigation of hydrogeological instability. Uncertainty analysis is an intrinsic feature of models simulating natural processes. In order to perform an effective uncertainty quantification, it is necessary to properly identify the variability of the input parameters and to design stochastic simulation methods able to provide realistic realisations, based on the available data. This thesis focuses on the use of digital soil maps for the prediction and stochastic simulation of terrain-related quantities used for the estimation of the input parameters of the SMART-SED model. The digital maps are obtained from SoilGrids, a system for automated soil mapping based on state-of-the-art spatial prediction methods. Innovative approaches are introduced to account for the limitations of SoilGrids data (low resolution, inaccuracy) and for the specificities of the variables under examination. Although the focus is on the SMART-SED project, the methods proposed can be generally used for geostatistical modelling at a local scale using auxiliary coarse information obtained through remote sensing or from previously fitted digital maps.

How to cite: Gatti, F., Togni, N., Menafoglio, A., Bonaventura, L., Papini, M., and Longoni, L.: Geostatistical analysis for Uncertainty Quantification in the SMART-SED model: a downscaling approach based on Digital Soil Mapping data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2051, https://doi.org/10.5194/egusphere-egu2020-2051, 2020

D92 |
EGU2020-5486
Mathieu Le Coz, Léa Pannecoucke, Xavier Freulon, Charlotte Cazala, and Chantal de Fouquet

Characterization of contamination in soils resulting from nuclear or industrial activities is a crucial issue for site remediation. A classical approach consists in delineating the contaminated zones based on a geostatistical estimation calibrated from measured activities, but it results in high uncertainties when the number of measurements is low and/or the spatial variability of the studied variable is governed by complex processes. In order to reduce these uncertainties, a novel approach, called Kriging with Numerical Variogram (KNV), is developed: the variogram is computed from a set of physically-based flow-and-transport simulations rather than from the measurements.
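As a rough illustration of the idea of computing a variogram from simulations rather than from measurements, the sketch below averages squared differences over an ensemble of simulated fields. The `toy_plume` function is a hypothetical stand-in for a physically-based flow-and-transport run, and the 1-D grid and parameter ranges are assumptions made only for this example.

```python
import math
import random

def toy_plume(xs, rng):
    """Hypothetical stand-in for a flow-and-transport run: a Gaussian-shaped
    plume whose centre and width vary between simulations."""
    centre = rng.uniform(4.0, 6.0)
    width = rng.uniform(0.8, 1.5)
    return [math.exp(-0.5 * ((x - centre) / width) ** 2) for x in xs]

def numerical_variogram(sims):
    """gamma[i][j] = 0.5 * mean_k (Z_k(x_i) - Z_k(x_j))^2 over the simulations."""
    n = len(sims[0])
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            g = 0.5 * sum((z[i] - z[j]) ** 2 for z in sims) / len(sims)
            gamma[i][j] = gamma[j][i] = g
    return gamma

rng = random.Random(42)
xs = [float(i) for i in range(11)]             # 1-D grid, positions 0..10
sims = [toy_plume(xs, rng) for _ in range(200)]
gamma = numerical_variogram(sims)
print(round(gamma[0][1], 4), round(gamma[4][5], 4))
```

Because the variogram is indexed by pairs of locations rather than by lag distance only, it can differ between the plume core and the clean fringe, which a single stationary model fitted to sparse measurements would not capture.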

The KNV approach is assessed on a two-dimensional synthetic reference test case reproducing the migration of a tritium plume within an unsaturated soil with hydraulic properties highly variable in space. The results show that the mean absolute error in estimated activities is 50% to 75% lower with KNV compared to classical geostatistical approaches, depending on the sampling scenario. Moreover, KNV leads to a significant reduction of the empirical error standard deviation, which reflects uncertainties on the estimated activities. The performance of KNV regarding the classification into contaminated or not-contaminated zones is, however, sensitive to the contamination threshold.

The KNV approach could thus help to better estimate volumes of soils to be decontaminated in the context of remediation of nuclear or industrial sites. This approach can be transposed to other scales of heterogeneities, such as systems with several geological units, or other pollutants with a more complex chemical behavior, as soon as a numerical code that simulates the phenomenon under study is available.

This study is part of the Kri-Terres project, supported by the French National Radioactive Waste Management Agency (Andra) under the “Investments for the Future” national program.

How to cite: Le Coz, M., Pannecoucke, L., Freulon, X., Cazala, C., and de Fouquet, C.: Combining geostatistics and physically-based simulations to characterize contaminated soils, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5486, https://doi.org/10.5194/egusphere-egu2020-5486, 2020

D93 |
EGU2020-1685 | ECS
Qinzhuo Liao, Gang Lei, and Shirish Patil

We propose an efficient analytical upscaling method to compute the equivalent conductivity tensor for elliptic equations in three-dimensional space. Our approach uses perturbation expansion and Fourier analysis, and considers heterogeneity, anisotropy and geometry of coarse gridblocks. Through low-order approximation, the derived analytical solution accurately approximates the central-difference numerical solution with periodic boundary conditions. Numerical tests are performed to demonstrate the capability and efficiency of this analytical approach in upscaling fluid flow in heterogeneous formations. We test the method in synthetic examples and benchmark cases with both Gaussian random fields and channelized non-Gaussian fields. In addition, we examine the impact of each parameter on the upscaled conductivity, and investigate the sensitivity of the variance and correlation lengths to the coefficients. We also indicate how to extend this approach to multiphase flow problems.

How to cite: Liao, Q., Lei, G., and Patil, S.: Efficient Analytical Upscaling of Conductivity Tensor for Three-dimensional Heterogeneous Anisotropic Formations, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1685, https://doi.org/10.5194/egusphere-egu2020-1685, 2019

D94 |
EGU2020-11781
Ingelin Steinsland, Thea Roksvåg, and Kolbjørn Engeland

We present a new Bayesian geostatistical hierarchical model that is particularly suitable for interpolation of hydrological data when the available dataset has short records, for including overlapping catchments consistently and for combining areal (runoff) and point (precipitation) observations. A key feature of the proposed framework is that several years of runoff are modeled simultaneously with two Gaussian random fields (GRFs): one that is common for all years under study and represents the runoff generation due to long-term climatic conditions, and one that is year specific. The framework is demonstrated by filling in missing values of annual runoff and by predicting mean annual runoff for about 200 catchments in Norway. The predictive performance is compared to Top-Kriging (interpolation method) and simple linear regression (method for exploiting short records). The results show that if the runoff is driven by weather patterns that are repeated over time, the value of including short records is large, and that for partially gauged catchments the model performs better than comparable methods for both annual spatial interpolation and mean annual runoff. We also find that short records, even of length one year, can safely be included in the model.
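One way to write down the two-field structure described above is sketched here. The notation is an assumption made for illustration (u is a location, j a year, A_i the area of catchment i, and the intercepts are hypothetical); it is not quoted from the abstract.

```latex
% Assumed notation: c(u) is the GRF shared by all years (climatic part),
% x_j(u) is the GRF specific to year j, beta_c and beta_j are intercepts.
q_j(u) = \beta_c + c(u) + \beta_j + x_j(u), \qquad j = 1, \dots, r

% Areal (catchment) runoff as the spatial average of the point process,
% so that nested or overlapping catchments share the same underlying field.
Q_{ij} = \frac{1}{|A_i|} \int_{A_i} q_j(u)\, \mathrm{d}u
```

Under this reading, a single year of data from a catchment already informs c(u), which is shared across years; this is consistent with the reported value of short records.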

In a smaller case study of ten years of annual runoff in Voss, Norway, we demonstrate that combining runoff and precipitation data in a model framework that consistently models overlapping catchments performs better on average than using only one of the data sources. Further, the interaction between nested areal data and point data gives a geostatistical model that takes us beyond smoothing: the model can give predictions that are higher (or lower) than any of the observations.

A finding is that in Norway the climatic effects dominate over annual effects for annual runoff. Through a simulation study we demonstrate that in this case systematic under- and overestimation of runoff over time can be expected. On the other hand, a strong climate implies that short records of runoff from an otherwise ungauged catchment can lead to large improvements in the predictability of runoff.

How to cite: Steinsland, I., Roksvåg, T., and Engeland, K.: A new Bayesian hierarchical geostatistical model based on two spatial fields with case studies with short records of annual runoff in Norway, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11781, https://doi.org/10.5194/egusphere-egu2020-11781, 2020

D95 |
EGU2020-5249 | ECS
Fabio Oriani, Simon Stisen, Mehmet C. Demirel, and Gregoire Mariethoz

In the era of big data, missing data imputation remains a delicate topic both for the analysis of natural processes and for providing input data for physical models. We propose here a comparative study of missing data imputation for daily rainfall, a variable that can exhibit a complex structure composed of a dry/wet pattern and anisotropic sharp variations.

The seven algorithms considered can be grouped into two families: geostatistical interpolation techniques based on inverse-distance weighting and Kriging, widely used in gap-filling [1], and data-driven techniques based on the analysis of historical data patterns. This latter family of algorithms has already been applied to rainfall generation [2, 3], but in its original form it is not suitable for historical datasets with many data gaps. This is because such algorithms usually operate in a rigid framework where, when a rainfall value is estimated for a station, the other stations are treated as predictor variables and must all be observed. To overcome this limitation, we propose here i) an adaptation of k-nearest neighbor (KNN) and ii) a new algorithm called Vector Sampling (VS), which combines concepts of multiple-point statistics and resampling. These data-driven algorithms can draw estimations from largely and variably incomplete data patterns, allowing the target dataset to be at the same time the training dataset.
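To make the idea of drawing estimates from incomplete data patterns concrete, here is a minimal sketch of a gap-tolerant nearest-neighbour imputation. It is a generic illustration, not the adaptation proposed by the authors: the distance measure, the averaging of neighbours and the sample data are all assumptions.

```python
import math

def knn_impute(records, day, station, k=3):
    """Estimate records[day][station] from the k most similar days.

    records: list of daily rows; each row is a list with one value per
    station, using None for gaps. Similarity between two days is measured
    only on stations observed on both days, so rows with gaps remain usable.
    """
    target = records[day]
    candidates = []
    for d, row in enumerate(records):
        if d == day or row[station] is None:
            continue
        shared = [s for s in range(len(row))
                  if s != station and row[s] is not None and target[s] is not None]
        if not shared:
            continue
        # Root-mean-square difference over the shared stations.
        dist = math.sqrt(sum((row[s] - target[s]) ** 2 for s in shared) / len(shared))
        candidates.append((dist, row[station]))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(v for _, v in nearest) / len(nearest)

# Hypothetical daily rainfall (mm) at four stations; None marks a gap.
rain = [
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 6.0, None, 4.0],
    [12.0, None, 11.0, 10.0],
    [0.0, 0.2, 0.0, None],
    [None, 7.0, 6.0, 5.0],
    [11.0, 13.0, 12.0, None],   # the last station is the gap to fill
]
print(knn_impute(rain, day=5, station=3, k=2))  # → 7.5
```

Here the same table serves as both target and training data, and days with gaps at other stations still contribute as long as they share at least one observed station with the target day.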

Tested on different case studies from Denmark, Australia, and Switzerland, the algorithms show different performance that seems to be related to the terrain type: on flat terrains with spatially uniform rain events, geostatistical interpolation tends to minimize the error, while in mountainous regions with non-stationary rainfall statistics, data mining can better recover the complex rainfall patterns. The VS algorithm, being faster than KNN and requiring minimal parametrization, turns out to be a convenient option for routine application if a representative historical dataset is available. VS is open-source and freely available at .

 

REFERENCES:

org/

org/

How to cite: Oriani, F., Stisen, S., Demirel, M. C., and Mariethoz, G.: Missing data imputation for multisite rainfall networks: a comparison between geostatistical interpolation and data-mining estimation on different terrain types, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5249, https://doi.org/10.5194/egusphere-egu2020-5249, 2020

D96 |
EGU2020-972 | ECS
Aleksandar Sekulic, Milan Kilibarda, Gerard B.M. Heuvelink, Mladen Nikolić, and Branislav Bajat

Regression kriging is one of the most popular spatial interpolation techniques. Its main strength is that it exploits both spatial autocorrelation and the information contained in environmental covariates. While regression kriging is still dominant, in the past few years machine learning, especially random forest, has increasingly been used for mapping. Machine learning is more flexible than multiple linear regression and can thus make better use of environmental covariates, but it typically ignores spatial autocorrelation. Several attempts have been made to include spatial autocorrelation in random forest by adding distances to observation locations and other geometries to the set of covariates. However, none of these studies has tried the obvious solution of including the nearest observations themselves and the distances to the nearest observations as covariates. In this study we tried this solution by introducing and testing Random Forest for Spatial Interpolation (RFSI). RFSI trains a random forest model on environmental covariates as well as the nearest observations and their distances from the prediction point. We applied and evaluated RFSI for mapping daily precipitation in Catalonia for the 2016-2018 period. We trained four models (random forest, RFsp, pooled regression kriging and RFSI) using 63,927 daily precipitation observations from 87 GHCN stations located in Catalonia. Maximum and minimum daily temperatures and IMERG precipitation estimates (derived from the GPM mission) were used as environmental covariates for all four models. Results based on 5-fold cross-validation showed that RFSI (R-square 69.4%, RMSE 3.8 mm) significantly outperformed random forest (R-square 50.6%, RMSE 4.8 mm), RFsp (R-square 55.5%, RMSE 4.6 mm) and pooled regression kriging (R-square 65.3%, RMSE 4.0 mm). Fine-tuning RFSI could potentially improve prediction accuracy even more. In addition to improved prediction accuracy, RFSI has the advantage that it uses far fewer spatial covariates than RFsp.
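The covariate construction described above (nearest observations and their distances) can be sketched as follows. This is an illustrative feature builder only; the function name, the leave-self-out handling for training rows and the sample data are assumptions, and the resulting rows would still need to be passed to a random forest implementation of choice.

```python
import math

def rfsi_features(obs_xy, obs_val, target_xy, n_nearest=3, exclude=None):
    """Return [value_1, dist_1, ..., value_n, dist_n] for one target location.

    obs_xy: list of (x, y) observation coordinates; obs_val: observed values.
    exclude: index of an observation to skip (used when the target is itself
    a training observation, so it does not become its own predictor).
    """
    pairs = []
    for i, ((x, y), v) in enumerate(zip(obs_xy, obs_val)):
        if i == exclude:
            continue
        pairs.append((math.hypot(x - target_xy[0], y - target_xy[1]), v))
    pairs.sort(key=lambda p: p[0])
    feats = []
    for dist, val in pairs[:n_nearest]:
        feats.extend([val, dist])
    return feats

# Hypothetical station coordinates (km) and daily precipitation (mm).
stations = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (6.0, 8.0)]
precip = [2.0, 5.0, 1.0, 9.0]

# Training rows: each station described by its neighbours (leave-self-out).
X_train = [rfsi_features(stations, precip, xy, 2, exclude=i)
           for i, xy in enumerate(stations)]
y_train = precip

# Feature row for the first station; environmental covariates
# (e.g. temperature, IMERG) would be appended to the same row.
print(rfsi_features(stations, precip, (0.0, 0.0), 2, exclude=0))  # → [5.0, 3.0, 1.0, 4.0]
```

The number of spatial covariates grows with the number of neighbours used, not with the number of stations, which is in line with the stated advantage over distance-to-every-observation approaches.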

How to cite: Sekulic, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M., and Bajat, B.: Spatial interpolation of daily precipitation using random forest, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-972, https://doi.org/10.5194/egusphere-egu2020-972, 2019

D97 |
EGU2020-537 | ECS
Is there a 'right' spatial scale? Improving pedological multi-scale modelling by optimizing input data grain size: A Case Study using Average Local Variance.
(withdrawn)
Christopher Scarpone, Anders Knudby, Stephanie Melles, and Andrew Millward
D98 |
EGU2020-1355 | ECS
Stephanie Thiesen, Diego Vieira, Mirko Mälicke, Florian Wellmann, and Uwe Ehret

Interpolation of spatial data has been considered in many different forms. This study proposes a stochastic, non-parametric, geostatistical estimator that combines measures of information theory with a probability aggregation method. Histogram via entropy reduction (HER) can be used to analyze the spatial correlation of the data and to predict distributions at unobserved locations directly based on empirical probability. The method minimizes estimation uncertainty, relaxes normality assumptions and therefore avoids the risk of adding information not available in the data (or losing available information). In particular, the applied probability aggregation method provides a proper framework for uncertainty estimation that reflects both the spatial configuration of the data and the data values, while allowing physical properties (continuous or discontinuous characteristics) of the field under study to be inferred (or introduced). Three different aggregation methods were explored in terms of uncertainty, resulting in predictions ranging from conservative to more confident ones. We investigate the performance of the framework using four synthetically generated datasets from known Gaussian processes and demonstrate the efficacy of the method in ascertaining the underlying true field with varying sample sizes. By comparing the method's performance to popular benchmark models, namely nearest neighbors (NN), inverse distance weighting (IDW) and ordinary kriging (OK), we were able to obtain competitive results with respect to OK, with the advantage of presenting generalization properties. The novel method brings a new perspective of spatial and uncertainty analysis to geostatistics and statistical learning, using the lens of information theory.
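The two ingredients named above, an entropy measure and a probability aggregation rule, can be illustrated with a small sketch. It shows two common pooling rules (linear and log-linear) applied to discrete distributions; which rules HER actually uses and how the weights are chosen are not specified here, so the functions, weights and example histograms are assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def linear_pool(dists, weights):
    """Weighted arithmetic mean of distributions ('OR'-type aggregation)."""
    total = sum(weights)
    return [sum(w * d[i] for d, w in zip(dists, weights)) / total
            for i in range(len(dists[0]))]

def log_linear_pool(dists, weights):
    """Weighted geometric mean, renormalised ('AND'-type aggregation)."""
    raw = [math.prod(d[i] ** w for d, w in zip(dists, weights))
           for i in range(len(dists[0]))]
    z = sum(raw)
    return [r / z for r in raw]

# Hypothetical histograms (three value bins) contributed by two neighbours.
p1 = [0.6, 0.3, 0.1]
p2 = [0.2, 0.5, 0.3]
w = [0.5, 0.5]

lin = linear_pool([p1, p2], w)
log = log_linear_pool([p1, p2], w)
print(lin, round(entropy(lin), 4))
print([round(q, 4) for q in log], round(entropy(log), 4))
```

In this particular example the log-linear pool has slightly lower entropy than the linear pool, which hints at how different aggregation rules can yield more conservative or more confident predictive distributions.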

How to cite: Thiesen, S., Vieira, D., Mälicke, M., Wellmann, F., and Ehret, U.: HER: an information theoretic alternative for geostatistics, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1355, https://doi.org/10.5194/egusphere-egu2020-1355, 2019

D99 |
EGU2020-6551 | ECS
Sebastian Reuschen, Teng Xu, and Wolfgang Nowak

Geostatistical inversion methods estimate the spatial distribution of heterogeneous soil properties (here: hydraulic conductivity) from indirect information (here: piezometric heads). Bayesian inversion is a specific approach, where prior assumptions (or prior models) are combined with indirect measurements to predict soil parameters and their uncertainty in the form of a posterior parameter distribution. Posterior distributions depend heavily on prior models, as prior models describe the spatial structure of heterogeneity. The most common prior is the stationary multi-Gaussian model, which expresses that close-by points are more correlated than distant points. This is a good assumption for single-facies systems. For multi-facies systems, multiple-point geostatistical (MPS) methods are widely used. However, these typically only distinguish between several facies and do not represent the internal heterogeneity inside each facies.

We combine these two approaches into a joint hierarchical model, which results in a multi-facies system with internal heterogeneity in each facies. Using this model, we propose a tailored Gibbs sampler, a kind of Markov Chain Monte Carlo (MCMC) method, to perform Bayesian inversion and sample from the resulting posterior parameter distribution. We test our method on a synthetic channelized flow scenario for different levels of data availability: a highly informative setting (with many measurements), where we recover the synthetic truth with relatively small uncertainty intervals, and a weakly informative setting (with only a few measurements), where the synthetic truth cannot be recovered as clearly. Instead, we obtain a multi-modal posterior. We investigate the multi-modal posterior using a clustering algorithm. Clustering algorithms are a common machine learning approach to find structures in large data sets. Using this approach, we can split the multi-modal posterior into its modes and assign a probability to each mode. A visualization of this clustering and the corresponding probabilities enables researchers and engineers to intuitively understand complex parameter distributions and their uncertainties.
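The step of splitting a multi-modal posterior into modes and assigning probabilities can be sketched with a plain k-means clustering of posterior samples. The abstract does not name the clustering algorithm, so k-means is an assumption here, and the samples are drawn from a hypothetical two-mode mixture instead of a real MCMC run.

```python
import random

def kmeans_1d(values, k=2, iters=50):
    """Plain k-means on scalar samples; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the sample range.
    centres = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, labels

# Hypothetical posterior samples of one summary quantity (e.g. log-conductivity
# at one location), standing in for the output of the Gibbs sampler.
rng = random.Random(1)
samples = [rng.gauss(-2.0, 0.3) if rng.random() < 0.7 else rng.gauss(1.0, 0.3)
           for _ in range(2000)]

centres, labels = kmeans_1d(samples, k=2)
# Mode probability = share of posterior samples assigned to each cluster.
probs = [labels.count(c) / len(labels) for c in range(2)]
print([round(c, 2) for c in centres], [round(p, 2) for p in probs])
```

For full parameter fields the same idea would apply to vectors rather than scalars, with the cluster shares again serving as estimates of the mode probabilities.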

How to cite: Reuschen, S., Xu, T., and Nowak, W.: Bayesian inversion and visualization of hierarchical geostatistical models, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6551, https://doi.org/10.5194/egusphere-egu2020-6551, 2020

D100 |
EGU2020-2954
Dionissios Hristopulos, Vasiliki Agou, Andreas Pavlides, and Panagiota Gkafa

We present recent advances related to Stochastic Local Interaction (SLI) models. These probabilistic models capture local correlations by means of suitably constructed precision matrices which are inferred from the available data. SLI models share features with Gaussian Markov random fields, and they can be used to complete spatial and spatiotemporal datasets with missing data. SLI models are applicable to data sampled on both regular and irregular space-time grids. The SLI models can also incorporate space-time trend functions. The degree of localization provided by SLI models is determined by means of kernel functions and appropriate bandwidths that adaptively determine local neighborhoods around each point of interest (including points in the sampling set and the map grid). The local neighborhoods lead to sparse precision (inverse covariance) matrices and also to explicit, semi-analytical relations for predictions, which are based on the conditional mean and the conditional variance.

We focus on a simple SLI model whose parameter set involves amplitude and rigidity coefficients as well as a characteristic length scale. The SLI precision matrix is expressed explicitly in terms of the model parameters and the kernel function. The parameter estimation is based on the method of maximum likelihood estimation (MLE). However, covariance matrix inversion is not required, since the precision matrix is known conditionally on the model parameters. In addition, the determinant of the precision matrix can be computed efficiently given the sparsity of the precision matrix. Typical values of the sparsity index obtained by analyzing various environmental datasets are less than 1%.

We discuss the results of SLI predictive performance with both real and simulated data sets. We find that in terms of cross validation measures the performance of the method is similar to ordinary kriging while the computations are faster. Overall, the SLI model takes advantage of sparse precision matrix structure to reduce the computational memory and time required for the processing of large spatiotemporal datasets.
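Why a known, sparse precision matrix makes prediction cheap can be seen in a small sketch. It uses a generic zero-mean Gaussian Markov random field on a 1-D chain, not the SLI energy functional itself; the precision matrix, the site indices and the data values are assumptions chosen for illustration.

```python
import numpy as np

n = 7
# Tridiagonal precision matrix: identity plus a scaled chain-graph Laplacian.
# Each site interacts only with its immediate neighbours, so Q is sparse.
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0
Q = np.eye(n) + 4.0 * L

obs_idx = np.array([0, 2, 3, 6])          # sampled sites
pred_idx = np.array([1, 4, 5])            # sites to predict
x_obs = np.array([1.0, 0.5, 0.2, -0.8])   # hypothetical zero-mean data

Q_pp = Q[np.ix_(pred_idx, pred_idx)]
Q_ps = Q[np.ix_(pred_idx, obs_idx)]

# Conditional mean and variance of a zero-mean Gaussian written in terms of Q:
# only the small block Q_pp is solved; the covariance matrix is never formed.
cond_mean = -np.linalg.solve(Q_pp, Q_ps @ x_obs)
cond_var = np.diag(np.linalg.inv(Q_pp))
print(cond_mean, cond_var)
```

The prediction at each site depends only on its neighbours in Q, which is the property that keeps memory and run time low for large datasets.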

 

References

  1. D. T. Hristopulos. Stochastic local interaction (SLI) model: Bridging machine learning and geostatistics. Computers and Geosciences, 85(Part B):26–37, December 2015. doi:10.1016/j.cageo.2015.05.018.
  2. D. T. Hristopulos and V. D. Agou. Stochastic local interaction model for space-time data. Spatial Statistics, page 100403, 2019. doi:10.1016/j.spasta.2019.100403.
  3. D. T. Hristopulos, A. Pavlides, V. D. Agou, P. Gkafa. Stochastic local interaction model for geostatistical analysis of big spatial datasets, 2019. arXiv:2001.02246

How to cite: Hristopulos, D., Agou, V., Pavlides, A., and Gkafa, P.: Stochastic Local Interaction Models for Processing Spatiotemporal Datasets, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2954, https://doi.org/10.5194/egusphere-egu2020-2954, 2020

D101 |
EGU2020-6665 | ECS
Emmanouil A Varouchakis and George P Karatzas

In geostatistical analysis, a Bayesian approach has advantages over classical methods since it makes it possible to deal with the parameters and the uncertainty in the model. Spatiotemporal geostatistical modelling can be performed by using the Gaussian process regression method under a Bayesian framework. In a Bayesian approach the overall uncertainty can be represented by a probability distribution. In this work the spatiotemporal variability of the groundwater level was assessed based on a ten-year time series of biannual average data from an extensive network of wells on the island of Crete, Greece. The Gaussian process regression method was employed to produce reliable maps of groundwater level variability and to identify groundwater level patterns for the island of Crete. Thus, this work could help to detect areas where groundwater management interventions are necessary, considering the associated uncertainty.
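A minimal sketch of Gaussian process regression is given here to show how the predictive mean comes together with a predictive variance. It is reduced to one input dimension (time) with a squared-exponential kernel and fixed hyperparameters; the kernel choice, the hyperparameter values and the data are assumptions, and a spatiotemporal application would use a kernel over space and time.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, length=2.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_new, noise_var=0.01):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_new, x_train)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(chol, K_s.T)
    var = np.diag(sq_exp_kernel(x_new, x_new)) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical biannual groundwater-level anomalies (m) at one well.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0])
level = np.array([0.3, 0.1, -0.2, -0.4, -0.3, 0.2, 0.4, 0.5])
t_new = np.array([2.5, 6.0, 30.0])
mean, var = gp_predict(t, level, t_new)
print(mean, var)
```

The predictive variance is small inside the observed period and returns to the prior variance far from the data, which is the kind of information that can flag where predictions should not be trusted.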

How to cite: Varouchakis, E. A. and Karatzas, G. P.: Gaussian process regression for spatiotemporal analysis of groundwater level variations, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6665, https://doi.org/10.5194/egusphere-egu2020-6665, 2020

D102 |
EGU2020-695 ECS
Hélio Jhunior, João Martins, and Edson Wendland

Characterizing and quantifying flow inside fractured aquifers remains a major challenge for hydrogeologists, mainly because well-established techniques for measuring fracture geometry, such as resin casting and profilometers, are technologically limited in accuracy. X-ray microtomography (micro-CT) is a non-intrusive alternative that resolves the interior of solid objects in 3D images with an accuracy of about a dozen microns. However, the fractured basaltic rock samples that can be analyzed are only around 2 inches in size. Statistical methods, in particular multiple-point geostatistics (MPS), can increase the representativeness of the data obtained by micro-CT. MPS characterizes and reproduces curvilinear heterogeneity patterns of a physical phenomenon by considering the spatial relation between a finite number of points taken from a conceptual model called the training image (TI). In this research, we evaluated the potential of MPS to characterize and reproduce the random distribution patterns of aperture values in a given fracture plane, using 3D micro-CT images of a basaltic sample as the TI. This evaluation can improve the accuracy and representativeness of models that simulate flow in fractured media.

Two MPS algorithms were used: Direct Sampling (DS), a pixel-based method adapted from Mariethoz (2009), and Multi-Scale Cross Correlation-based Simulation (MS CCSIM), a pattern-based algorithm based on the work of Tahmasebi, Sahimi, and Caers (2014). The TI was obtained from a fracture plane of a basalt sample with dimensions of 2.6 cm in length and 2.2 cm in diameter, taken from an outcrop area of the Guarani aquifer in São Paulo, Brazil. The aperture values range from 0 to 500 μm. Initially, analyses were carried out to identify the importance and sensitivity of the parameters that govern the performance of both algorithms, with 10 repetitions for each combination of parameter values.

For the best configuration of these parameters, the DS results showed better spatial connectivity of the structures and channels in the fracture plane through which flow can occur, while respecting the randomness of the aperture values and the distribution pattern found in the TI. The images reproduced by MS CCSIM, in contrast, tended to copy certain regions of the TI for most of the parameter combinations used. In terms of computational effort, however, DS underperformed MS CCSIM. Compared with the global statistics of the TI, both algorithms reproduced the aperture values with similar representativeness. The DS algorithm is therefore preferred and recommended for TIs with similar characteristics; for images with different features, a sensitivity analysis should be performed. A second quality analysis of the DS reproductions was then performed using conditioning data taken from the TI, in the form of conditioning points and pixel groups. DS showed a strong ability to reconstruct the images from these conditioning data, maintaining the randomness of the aperture values and the connectivity of both global and local structures, without a tendency to copy the TI.
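
The core loop of a Direct Sampling style simulation can be sketched as follows. This is a heavily simplified, unconditional 2D illustration and not the code used in the study: the synthetic training image, the parameter names (`n_neighbors`, `threshold`, `scan_fraction`) and the mean-absolute-difference pattern distance are assumptions. Each node on a random path takes the value of the first training-image location whose neighbourhood matches the already simulated neighbours closely enough.

```python
import numpy as np

def direct_sampling(ti, shape, n_neighbors=6, threshold=0.05,
                    scan_fraction=0.2, seed=0):
    """Unconditional Direct Sampling style simulation from training image ti."""
    rng = np.random.default_rng(seed)
    sim = np.full(shape, np.nan)
    value_range = float(ti.max() - ti.min())
    n_scan = max(1, int(scan_fraction * ti.size))
    for node in rng.permutation(shape[0] * shape[1]):      # random path
        i, j = divmod(int(node), shape[1])
        known = np.argwhere(~np.isnan(sim))
        best_value, best_dist = None, np.inf
        if len(known) > 0:
            # Data event: lags and values of the nearest simulated nodes.
            lags = known - np.array([i, j])
            nearest = np.argsort(np.sum(lags**2, axis=1))[:n_neighbors]
            lags = lags[nearest]
            values = sim[known[nearest, 0], known[nearest, 1]]
            # Scan a random part of the TI for a matching pattern.
            for cand in rng.choice(ti.size, size=n_scan, replace=False):
                ci, cj = divmod(int(cand), ti.shape[1])
                pi, pj = ci + lags[:, 0], cj + lags[:, 1]
                if (pi.min() < 0 or pj.min() < 0
                        or pi.max() >= ti.shape[0] or pj.max() >= ti.shape[1]):
                    continue                               # pattern leaves the TI
                dist = np.mean(np.abs(ti[pi, pj] - values)) / value_range
                if dist < best_dist:
                    best_value, best_dist = ti[ci, cj], dist
                if dist <= threshold:                      # good enough: stop
                    break
        if best_value is None:                             # no usable pattern
            best_value = rng.choice(ti.ravel())
        sim[i, j] = best_value
    return sim

# Synthetic "aperture" training image with values between 0 and 500 (assumed).
x, y = np.meshgrid(np.arange(30), np.arange(30))
ti = 250.0 * (1.0 + np.sin(x / 4.0) * np.cos(y / 5.0))
sim = direct_sampling(ti, shape=(12, 12))
print(sim.round(0))
```

A low `threshold` with a large `scan_fraction` pushes the result toward verbatim copies of the training image, while looser settings add variability at the cost of pattern fidelity. Conditioning data could be honoured by writing them into `sim` before the path starts.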

How to cite: Jhunior, H., Martins, J., and Wendland, E.: Characterization and reproduction of the aperture distribution patterns in a basaltic fracture plane by Multi-point Geostatistics algorithms, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-695, https://doi.org/10.5194/egusphere-egu2020-695, 2019

D103 |
EGU2020-2068
Aojie Shen, Yanchen Bo, and Duoduo Hu

Scientific research on land surface dynamics in heterogeneous landscapes often requires remote sensing data with high resolution in both space and time. However, a single sensor cannot provide data at high resolution in both dimensions. In addition, images are often incomplete because of cloud contamination. Spatiotemporal data fusion methods are a feasible solution to this data problem, but existing methods struggle to construct regular, cloud-free, dense time series of images with high spatial resolution. To address these limitations, we present a novel data fusion method that fuses multi-source satellite data to generate a high-resolution, regular and cloud-free time series of satellite images.

We incorporate geostatistical theory into the fusion method and treat the pixel value as a random variable composed of a trend and a zero-mean, second-order stationary residual. To fuse satellite images, we use the coarse-resolution images, which have a high observation frequency, to capture the trend in time, and use Kriging interpolation to obtain the residual at the fine-resolution scale, which supplies the spatial detail. To avoid the smoothing effect caused by spatial interpolation, Kriging is performed only in the time dimension. For a given region, the temporal correlation between pixels is fixed once the data reach stationarity, so the residuals obtained from the coarse-resolution images can be used to construct the temporal covariance model that yields the weights for temporal Kriging. The predicted fine-resolution image is obtained by adding each pixel's trend value back to its residual until every pixel value has been obtained. The advantage of the algorithm is that it accurately predicts fine-resolution images in heterogeneous areas by integrating all available information in the time series of images with fine spatial resolution.
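
The temporal Kriging step for a single pixel can be sketched as below. Because the residual is assumed to have zero mean, simple Kriging is used here. The exponential covariance model, its sill and range, the acquisition dates and the residual values are all placeholders chosen for illustration; in the method described above the covariance model would instead be fitted to the coarse-resolution residuals.

```python
import numpy as np

def exp_cov(lag_days, sill, corr_range):
    """Exponential temporal covariance (assumed model)."""
    return sill * np.exp(-np.abs(lag_days) / corr_range)

def simple_krige_time(t_obs, resid_obs, t_new, sill, corr_range):
    """Simple Kriging of a zero-mean residual along the time axis."""
    C = exp_cov(t_obs[:, None] - t_obs[None, :], sill, corr_range)
    c0 = exp_cov(t_obs - t_new, sill, corr_range)
    weights = np.linalg.solve(C, c0)
    return weights @ resid_obs, sill - weights @ c0

# Hypothetical pixel: days with a usable fine-resolution image and the
# residuals (observed NDVI minus trend) on those days.
t_fine = np.array([1.0, 17.0, 49.0, 81.0])
resid_fine = np.array([0.03, -0.02, 0.05, 0.01])
trend_new = 0.62                      # trend on day 33 from the coarse series

resid_hat, krig_var = simple_krige_time(
    t_fine, resid_fine, 33.0, sill=0.002, corr_range=30.0)
ndvi_hat = trend_new + resid_hat      # add the trend back to the residual
print(ndvi_hat, krig_var)
```

Predicting on a date that already has a fine-resolution image returns the observed residual with zero Kriging variance, so cloud-free observations are reproduced exactly.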

We tested our method by fusing MODIS and Landsat NDVI in Bahia State, which has a heterogeneous landscape, and generated an 8-day NDVI time series for the whole of 2016 at 30 m resolution. In cross-validation, the average R² and RMSE between NDVI from the fused images and from the observed images reached 95% and 0.0411, respectively. Experiments also showed that the method captures correct texture patterns. These promising results demonstrate that the method is an effective means of constructing regular, cloud-free time series with high spatiotemporal resolution. In theory, the method can predict the fine-resolution data required on any given day. Such a capability is helpful for monitoring near-real-time land surface and ecological dynamics at the high-resolution scales most relevant to human activities.

How to cite: Shen, A., Bo, Y., and Hu, D.: A Spatiotemporal data Fusion method for generating a high-resolution, regular and cloud-free time series of satellite images, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-2068, https://doi.org/10.5194/egusphere-egu2020-2068, 2020

D104 |
EGU2020-6678 ECS
Mirko Mälicke

Geostatistical and spatio-temporal methods and applications have made major advances during the past decades. New data sources became available, and more powerful and more accessible computer systems fostered the development of more sophisticated analysis frameworks. However, the building blocks for these developments, geostatistical packages available in a multitude of programming languages, have not received the same attention. Although there are some examples, like the gstat package for the R programming language, that are used as a de-facto standard for geostatistical analysis, many languages still lack such implementations. During the past decade, the Python programming language has gained a lot of visibility and has become an integral part of many geoscientists' tool belts. Unfortunately, Python is missing a standard library for geostatistics. As a result, almost every new publication that uses Python comes with a new technical implementation of geostatistical methods, so reproducing results and reusing code is often cumbersome and error-prone.

During the past three years I developed scikit-gstat, a scipy-flavored geostatistical toolbox written in Python, to tackle these challenges. Scipy-flavored means that it uses classes, interfaces and implementation rules from the very popular scipy package for scientific Python, so that scikit-gstat fits into existing analysis workflows as seamlessly as possible. Scikit-gstat is open source and hosted on GitHub. It is well documented and well covered by unit tests. The tutorials made available along with the code are styled as lecture notes and are open to everyone. The package is extensible, to make it as easy as possible for other researchers to build new models on top of it, even without experience in Python. Additionally, scikit-gstat has an interface to the scikit-learn package, which makes it usable in existing data analysis workflows that involve machine learning. During the development of scikit-gstat a few other geostatistical packages evolved, namely pykrige for Kriging and gstools, mainly for geostatistical simulations and random field generation. Because of this overlap, and to reduce development effort, I have worked to implement interfaces to these libraries. In this way, scikit-gstat treats other developments not as competing solutions but as parts of an evolving geostatistical framework in Python that should become more streamlined in the future.

How to cite: Mälicke, M.: SciKit-GStat: A scipy flavored geostatistical analysis toolbox written in Python, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-6678, https://doi.org/10.5194/egusphere-egu2020-6678, 2020

D105 |
EGU2020-7830
Carolina Guardiola-Albert, Nuria Naranjo-Fernández, Héctor Aguilera, and Esperanza Montero-González

Time series clustering is increasingly applied in hydrogeological studies. Long groundwater level data series provide a useful record for identifying different hydrological behaviors and for validating the conceptual model of groundwater flow in aquifer systems. Piezometers also register the response to any changes that directly affect the amount of available groundwater resources (recharge or exploitation).

What are the expected variations of groundwater levels in an aquifer under high exploitation pressure? In this work, groundwater level time series from 160 piezometers for the hydrological years 1975 to 2016 were analyzed, and 24 of these piezometers were studied in depth. The data were preprocessed and transformed through selection of points, missing data imputation and data standardization. Visual clustering, k-means clustering and time series clustering were applied to classify the groundwater level hydrographs in the available database. Six and seven groups of piezometers were identified and associated with the different hydrofacies and extraction rates. Time series clustering was found to be the best method for analyzing the studied piezometric database. Moreover, it was possible to characterize the actual hydrodynamics, which will help groundwater managers to make sustainable decisions.
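
The standardization and k-means steps named above can be sketched as follows. This is an illustration on synthetic hydrographs, not the workflow applied to the piezometric database: the two artificial behaviors (seasonal only, and seasonal with a declining trend), the plain Euclidean distance and the farthest-point initialisation are assumptions.

```python
import numpy as np

def standardize(series):
    """Give every hydrograph (one per row) zero mean and unit variance."""
    mean = series.mean(axis=1, keepdims=True)
    std = series.std(axis=1, keepdims=True)
    return (series - mean) / std

def kmeans(data, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [data[0]]
    for _ in range(1, k):
        # Distance of every series to its nearest centre chosen so far.
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            data[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic monthly hydrographs (assumed): 6 seasonal-only series and
# 6 series with the same seasonality plus a steady decline.
rng = np.random.default_rng(1)
months = np.arange(120)
seasonal = 2.0 * np.sin(2.0 * np.pi * months / 12.0)
natural = seasonal + rng.normal(0.0, 0.1, size=(6, 120))
exploited = seasonal - 0.05 * months + rng.normal(0.0, 0.1, size=(6, 120))

hydrographs = standardize(np.vstack([natural, exploited]))
labels, centers = kmeans(hydrographs, k=2)
print(labels)
```

Standardizing first makes the grouping depend on the shape of each hydrograph instead of its absolute level, which is usually what matters when comparing piezometers at different depths.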

How to cite: Guardiola-Albert, C., Naranjo-Fernández, N., Aguilera, H., and Montero-González, E.: Identifying anthropogenic effects into Doñana aquifer (SW Spain) through hydrogram clustering of piezometric database, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7830, https://doi.org/10.5194/egusphere-egu2020-7830, 2020

D106 |
EGU2020-7952 ECS
Nuria Naranjo-Fernández, Carolina Guardiola-Albert, Héctor Aguilera, Ana Fernandez-Ayuso, and Esperanza Montero-González

Groundwater is the main water source for irrigation in arid and semi-arid areas. Unfortunately, it has proven very difficult to prevent unauthorized extractions. The present work studies the application of wavelet analysis to detect and quantify the unfavorable effects of these extractions on piezometric levels.

Wavelets have been widely applied for hydrologic time series analysis since the 1990s, with increasing popularity in recent years. This method can be applied to hydrologic series to reveal complex hydrological processes and evaluate complex latent factors, such as seasonal crop irrigation, controlling groundwater level fluctuations.

Piezometric level records from more than 150 piezometers in the Almonte-Marismas aquifer (SW Spain) were studied for the period 1975 to 2016. The majority of these time series showed periodicities of 11-12 months, corresponding to hydrological cycles of recharge and discharge. However, in some areas close to crop fields, periodicities of 2-3 and 4-6 months were detected. In these cases, wavelet analysis could be used as a tool to prevent damage in areas in need of stricter legal control.
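
How a wavelet transform exposes such periodicities can be sketched with a small Morlet scalogram. The code below is an assumption-laden illustration and not the wavelet model used in the study: it uses a synthetic monthly series with an annual cycle plus a weaker 3-month cycle, a Morlet wavelet with central frequency 6, and a 1/scale normalisation so that power peaks where the wavelet period matches the signal period.

```python
import numpy as np

OMEGA0 = 6.0  # Morlet central frequency (a common choice, assumed here)

def morlet_power(signal, periods, dt=1.0):
    """Time-averaged Morlet wavelet power for each candidate period."""
    signal = signal - signal.mean()
    power = np.empty(len(periods))
    for k, period in enumerate(periods):
        scale = OMEGA0 * period / (2.0 * np.pi)      # scale matching the period
        half = int(np.ceil(4.0 * scale / dt))        # truncate at 4 scales
        u = np.arange(-half, half + 1) * dt / scale
        wavelet = np.pi**-0.25 * np.exp(1j * OMEGA0 * u) * np.exp(-0.5 * u**2)
        # The Morlet wavelet is Hermitian, so convolution equals correlation
        # with its complex conjugate.
        coeffs = np.convolve(signal, wavelet, mode="same") * dt / scale
        power[k] = np.mean(np.abs(coeffs) ** 2)
    return power

# Synthetic monthly piezometric level (assumed): annual recharge cycle plus
# a weaker 3-month cycle standing in for seasonal pumping.
rng = np.random.default_rng(0)
months = np.arange(240)
level = (np.sin(2.0 * np.pi * months / 12.0)
         + 0.4 * np.sin(2.0 * np.pi * months / 3.0)
         + rng.normal(0.0, 0.2, size=240))

periods = np.arange(3, 25)                           # candidate periods [months]
power = morlet_power(level, periods)
dominant = periods[np.argmax(power)]
print(dominant, power.round(3))
```

In this synthetic case the scalogram has its main peak at the annual period and a secondary peak at 3 months. Keeping the coefficients as a function of time, instead of averaging them, would also show when a short-period signal switches on or off.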

How to cite: Naranjo-Fernández, N., Guardiola-Albert, C., Aguilera, H., Fernandez-Ayuso, A., and Montero-González, E.: Detecting groundwater anthropogenic extraction with cyclicity results of wavelet models, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7952, https://doi.org/10.5194/egusphere-egu2020-7952, 2020

D107 |
EGU2020-10104 ECS
Sebastian Müller, Lennart Schüler, Alraune Zech, Sabine Attinger, and Falk Heße

Geo-scientific model development lacks comprehensive open-source tools that provide state-of-the-art geostatistical methods. To bridge this gap, we developed a geostatistical toolbox named GSTools, a Python package that provides an abundance of methods in a modern object-oriented approach. Covered use-cases are:

  • covariance models (many readily provided and even user-defined models with a lot of functionality)

  • random field generation (multigaussian and incompressible vector fields)

  • field transformations (Box-Cox, Zinn and Harvey, log-normal, binary)

  • kriging (simple, ordinary, universal, external drift or detrended)

  • variogram estimation (Cressie and Matheron estimators)

  • I/O routines (interfaces to pyvista and meshio for mesh support)

  • plotting routines (inspect your covariance model or random field on the fly)

GSTools is developed openly within a GitHub organization (https://github.com/GeoStat-Framework). This allows us to respond to the needs of the modeling community and to integrate suggested functionalities and contributions, and it helps to guarantee the stability and reliability of the code base through the continuous-integration features provided by the GitHub infrastructure.

We will present several applications of the mentioned routines to demonstrate the interface and capabilities of GSTools.
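
To make the variogram estimation item in the list above concrete, the two estimators it names can be written out in plain NumPy. This sketch deliberately does not use the GSTools API: the function name `empirical_variogram` and its arguments are hypothetical, and the robust variant follows the Cressie-Hawkins formula with its usual bias-correction term.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges, estimator="matheron"):
    """Binned semivariance with the Matheron or Cressie-Hawkins estimator."""
    i, j = np.triu_indices(len(values), k=1)         # all point pairs once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    diff = np.abs(values[i] - values[j])
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        d = diff[(dist >= bin_edges[b]) & (dist < bin_edges[b + 1])]
        n = len(d)
        if n == 0:
            continue                                 # empty lag class stays NaN
        if estimator == "matheron":
            gamma[b] = 0.5 * np.mean(d**2)
        else:
            # Cressie-Hawkins: fourth power of the mean square-root difference,
            # which is less sensitive to outliers than squared differences.
            gamma[b] = (0.5 * np.mean(np.sqrt(d)) ** 4
                        / (0.457 + 0.494 / n + 0.045 / n**2))
    return gamma

# Tiny hand-checkable example: three points on a line.
coords = np.array([[0.0], [1.0], [2.0]])
values = np.array([0.0, 1.0, 3.0])
edges = np.array([0.5, 1.5, 2.5])
gamma_m = empirical_variogram(coords, values, edges, "matheron")
gamma_c = empirical_variogram(coords, values, edges, "cressie")
print(gamma_m, gamma_c)
```

For the first lag class the pairs differ by 1 and 2, so the Matheron estimate is 0.5 * (1 + 4) / 2 = 1.25; the single pair at lag 2 gives 0.5 * 9 = 4.5.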

How to cite: Müller, S., Schüler, L., Zech, A., Attinger, S., and Heße, F.: GSTools: The Python toolbox for your geo-statistical project!, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10104, https://doi.org/10.5194/egusphere-egu2020-10104, 2020

D108 |
EGU2020-13386
Steven R. Fassnacht, Antonio-Juan Collados-Lara, Eulogio Pardo-Igúzquiza, and David Pulido-Velazquez

Every experimental measurement made by an instrument has an associated spatial (and temporal) support to which it is assigned. In this sense, a logger provides the temperature at a particular spatial location and has point support, while satellite-derived temperatures have an areal support equal to the pixel size of the satellite image (i.e. the spatial resolution of the image). Thus, when combining or merging both types of measurement, their supports must be taken into account. In nature there is a continuous temperature field, but it is only accessible through empirical data with their associated support. In this work, three sources of data were considered to model the variability of temperature at two scales in the Southern Rocky Mountains across the northern Front Range of Colorado (NFRC). The coarse scale uses the NRCS SNOTEL stations across the NFRC, and the fine scale uses iButton sensors at the Colorado State University Mountain Campus (CSUMC), located within the NFRC. The MODIS-based land surface temperature (LST), which has a spatial resolution of about 1 km, was considered for both scales. The SNOTEL stations and the iButton sensors have point support, while the satellite LST has areal support. The main goal of this work is to assess the variability of the temperature field at both scales with a geostatistical approach that takes into account the support effect of each set of experimental data.
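
The support effect can be illustrated numerically: averaging a point-support field over a pixel reduces its variance, and the reduction follows from the point covariance. The sketch below is an illustration only; the exponential covariance model, its sill and correlation range are assumed values, not results from the NFRC data.

```python
import numpy as np

def exp_cov(h, sill, corr_range):
    """Exponential point-support covariance (assumed model)."""
    return sill * np.exp(-h / corr_range)

def block_variance(block_size, sill, corr_range, n_disc=10):
    """Variance of the field averaged over a square block.

    The block is discretised into n_disc x n_disc points and the point
    covariance is averaged over all pairs of those points.
    """
    centres = (np.arange(n_disc) + 0.5) * block_size / n_disc
    gx, gy = np.meshgrid(centres, centres)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    h = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    return exp_cov(h, sill, corr_range).mean()

sill = 4.0           # point-support variance [degC^2], assumed
corr_range = 500.0   # correlation range [m], assumed

point_var = sill
pixel_var = block_variance(1000.0, sill, corr_range)   # 1 km pixel
print(point_var, pixel_var)
```

With these assumed parameters a 1 km pixel shows markedly less variance than a point sensor, so a variogram fitted to pixel values cannot be used directly for point data without accounting for this regularisation.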

This research has been partially supported by the SIGLO-AN project from the Spanish Ministry of Science, Innovation and Universities (Programa Estatal de I+D+I orientada a los Retos de la Sociedad).

How to cite: Fassnacht, S. R., Collados-Lara, A.-J., Pardo-Igúzquiza, E., and Pulido-Velazquez, D.: Accounting for the spatial support-effect on modelling a temperature field from different sources of experimental data, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13386, https://doi.org/10.5194/egusphere-egu2020-13386, 2020

D109 |
EGU2020-13655 ECS
Thea Roksvåg, Ingelin Steinsland, and Kolbjørn Engeland

Conceptual hydrological models are process-based models that are used to simulate flow indices based on physical or empirical relationships and input variables like precipitation, temperature and land use. For many applications the goal is to use the process-based model to construct a gridded map of the flow index of interest, e.g. mean annual runoff. However, one challenge is that the resulting runoff map does not necessarily match the observed streamflow data when the grid nodes are aggregated to catchment areas. A solution to this problem is to correct the gridded hydrological product afterwards relative to the observed streamflow in areas where we have measurements. In this work, we explore different Bayesian geostatistical tools that can contribute to this correction. We suggest a model in which the observed streamflow is used as the response variable and the gridded hydrological product is used as a covariate. In particular, we suggest a geostatistical model with a spatially varying coefficient (SVC), in which the linear relationship between the response and the covariate is allowed to vary over the study area. This is achieved by modeling the regression coefficient as a Gaussian random field (GRF) that defines the spatial pattern of the linear relationship. We also test two simpler geostatistical models and investigate how short records of runoff can be included in the correction procedure.
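
In generic notation, a spatially varying coefficient model of the kind described above can be written as follows. The symbols are introduced here for illustration and are not taken from the abstract: y is the observed runoff, x the gridded product, ω the Gaussian random field that perturbs the slope, and the formulation is point-referenced, whereas the study works with catchment areas.

```latex
\begin{aligned}
y(s_i) &= \beta_0 + \bigl(\beta_1 + \omega(s_i)\bigr)\, x(s_i) + \varepsilon_i, \\
\omega(\cdot) &\sim \mathrm{GRF}\bigl(0,\; C_\theta\bigr), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Setting ω to zero everywhere recovers an ordinary linear regression of the observations on the gridded product, which is one way to picture a simpler model to compare against.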

The geostatistical models are tested by correcting a gridded mean annual runoff product from the HBV model relative to the observed mean annual runoff. We use data from around 400 catchments in Norway for the period 1981 to 2010. The results show that all three geostatistical methods lead to a considerably better fit between the corrected product and the observed streamflow for the gauged catchments, which was our main goal. In addition, we obtain improved predictions for many of the ungauged catchments in Norway.

How to cite: Roksvåg, T., Steinsland, I., and Engeland, K.: Using Bayesian geostatistical models to correct gridded hydrological products relative to the actually observed streamflow, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-13655, https://doi.org/10.5194/egusphere-egu2020-13655, 2020

D110 |
EGU2020-15098 ECS
Leticia Baena-Ruiz, Antonio-Juan Collados-Lara, Eulogio Pardo-Igúzquiza, and David Pulido-Velazquez

Wind plays a key role in different processes of the earth system, such as the earth's energy and water cycles. Using wind to produce clean energy as a substitute for traditional systems may help to reduce emissions and, therefore, to mitigate climate change. Wind is defined by two variables, direction and speed. This work focuses on the assessment of the second one. The aim is to estimate fields of wind speed at ten meters (U10) in the province of Granada (Southern Spain). A grid with a spatial resolution of 300 m and an hourly temporal resolution was adopted for the estimation over the period 1986 to 2016. Different estimation approaches (ordinary kriging, kriging with external drift, regression and regression kriging) were evaluated using a monthly variogram model. Elevation showed a good correlation with wind speed and was used as the secondary variable for the external drift and regression approaches. We also tested mesoscale (U80) and logarithm transformations of U10 for each of the geostatistical techniques. The performance of each transformation and geostatistical approach was assessed in a cross-validation experiment. In general, geostatistical techniques that take elevation into account as secondary information, and approaches without transformation of the data, showed better accuracy. Regression kriging without transformation showed the lowest mean error and mean squared error (0.03 m s⁻¹ and 3.46 m² s⁻², respectively) for the considered period, but other approaches such as kriging with external drift showed similar results (0.04 m s⁻¹ and 3.52 m² s⁻², respectively).
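
Regression kriging and its leave-one-out cross-validation can be sketched in a few lines. This is an illustration under assumptions and not the study's workflow: the station data are synthetic, the residual covariance is an exponential model with fixed sill and range instead of a fitted monthly variogram, and the names `regression_kriging`, `sill` and `corr_range` are hypothetical.

```python
import numpy as np

def exp_cov(h, sill, corr_range):
    """Exponential covariance for the regression residuals (assumed model)."""
    return sill * np.exp(-h / corr_range)

def regression_kriging(coords, elev, speed, coord0, elev0, sill, corr_range):
    """Linear trend on elevation plus ordinary kriging of the residuals."""
    # 1) Ordinary least squares trend: speed = b0 + b1 * elevation.
    A = np.column_stack([np.ones(len(elev)), elev])
    beta, *_ = np.linalg.lstsq(A, speed, rcond=None)
    resid = speed - A @ beta
    # 2) Ordinary kriging system with a Lagrange multiplier (weights sum to 1).
    n = len(speed)
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = exp_cov(h, sill, corr_range)
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_cov(np.linalg.norm(coords - coord0, axis=1), sill, corr_range)
    weights = np.linalg.solve(K, rhs)[:n]
    # 3) Trend at the target plus the kriged residual.
    return beta[0] + beta[1] * elev0 + weights @ resid

# Synthetic stations (assumed): coordinates in km, elevation in m, U10 in m/s.
rng = np.random.default_rng(0)
n = 30
coords = rng.uniform(0.0, 50.0, size=(n, 2))
elev = rng.uniform(0.0, 3000.0, size=n)
speed = 2.0 + 0.001 * elev + rng.normal(0.0, 0.5, size=n)

# Leave-one-out cross-validation: mean error and mean squared error.
errors = []
for k in range(n):
    keep = np.arange(n) != k
    pred = regression_kriging(coords[keep], elev[keep], speed[keep],
                              coords[k], elev[k], sill=1.0, corr_range=10.0)
    errors.append(pred - speed[k])
mean_error = np.mean(errors)
mse = np.mean(np.square(errors))
print(mean_error, mse)
```

Kriging with external drift differs in that the elevation term enters the kriging system itself instead of being fitted beforehand, which is consistent with the two approaches giving similar scores.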

This research has been partially supported by the SIGLO-AN project from the Spanish Ministry of Science, Innovation and Universities (Programa Estatal de I+D+I orientada a los Retos de la Sociedad).

How to cite: Baena-Ruiz, L., Collados-Lara, A.-J., Pardo-Igúzquiza, E., and Pulido-Velazquez, D.: Comparison of different geostatistical approaches to estimate wind speed at hourly scale in the province of Granada (Southern Spain), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-15098, https://doi.org/10.5194/egusphere-egu2020-15098, 2020

D111 |
EGU2020-20008 ECS
Vaida Suslovaite, James Shucksmith, and Vanessa Speight

Diffuse pollution resulting from rainfall runoff processes is known to adversely affect surface water quality, including in areas where surface water is used for drinking water supply. Designing and implementing targeted mitigation measures to reduce peak concentrations of specific contaminants such as pesticides is challenging due to the spatial and temporal variability of rainfall-runoff processes. Receiving water pollutant concentrations are a function of rainfall processes, catchment characteristics, receiving water conditions and the locations of pollution sources (i.e. spatial distribution of ‘high risk’ land use types). Past work has developed a validated, travel time based, physically distributed model used to predict metaldehyde levels after a rainfall event accounting for variations in rainfall and distribution of land use. However, targeted field scale mitigation measures require an understanding of how different land use distributions affect pollutant concentrations in river water over a representative number of rainfall events. 

In this study, an inverse modelling approach is adopted in which the metaldehyde model is used in conjunction with spatial and temporal distributions of rainfall data spanning a number of years. A genetic algorithm (GA) is used to carry out the land use optimisation. This technique can determine distributions of land use that minimise the total number of predicted hours in which metaldehyde levels exceed the EU and UK threshold of 0.1 μg L−1 for pesticides in drinking water. The approach can also show how the removal of specific high-risk fields will affect metaldehyde concentrations, and it can rank and prioritise specific catchment areas. This can inform catchment management groups of the most effective locations for implementing mitigation measures.
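
A toy version of such a genetic algorithm is sketched below. It is not the study's model: the travel-time based metaldehyde model is replaced by a made-up surrogate in which concentration is the unmitigated field load multiplied by a rainfall factor, and the field loads, rainfall factors and mitigation budget are hypothetical values. Each individual is a bit string marking which fields receive mitigation.

```python
import random

def exceedances(mitigated, field_load, rain_events, threshold):
    """Toy surrogate: number of events with concentration above threshold."""
    residual = sum(load for load, m in zip(field_load, mitigated) if not m)
    return sum(1 for rain in rain_events if residual * rain > threshold)

def fitness(ind, field_load, rain_events, threshold, budget):
    """Exceedance count plus a heavy penalty for exceeding the budget."""
    penalty = 1000 * max(0, sum(ind) - budget)
    return exceedances(ind, field_load, rain_events, threshold) + penalty

def genetic_algorithm(field_load, rain_events, threshold, budget,
                      pop_size=30, generations=60, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(field_load)

    def score(ind):
        return fitness(ind, field_load, rain_events, threshold, budget)

    # Start from the "no mitigation" layout plus random layouts.
    population = [[0] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score)
        next_gen = [population[0][:]]                # elitism: keep the best
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=score)   # tournament
            p2 = min(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=score)

# Hypothetical catchment: relative pollutant load per field and a rainfall
# factor per event; at most 3 fields can receive mitigation.
field_load = [0.8, 0.1, 0.5, 0.05, 0.9, 0.3, 0.2, 0.7]
rain_events = [0.02, 0.05, 0.08, 0.03, 0.10, 0.06]
threshold, budget = 0.1, 3

best = genetic_algorithm(field_load, rain_events, threshold, budget)
print(best, fitness(best, field_load, rain_events, threshold, budget))
```

Because the best layout is carried over unchanged in every generation, the result can never be worse than the starting layout with no mitigation. Ranking fields by how often they appear in good solutions is one simple way to turn the output into priorities.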

How to cite: Suslovaite, V., Shucksmith, J., and Speight, V.: Catchment scale land use optimisation using genetic algorithm to mitigate acute diffuse pollution, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-20008, https://doi.org/10.5194/egusphere-egu2020-20008, 2020