ESSI1.3 | Strategies and Applications of AI and ML in a Spatiotemporal Context
EDI PICO
Co-organized by GI3
Convener: Christopher Kadow (ECS) | Co-conveners: Hanna Meyer, Jens Klump, Ge Peng
PICO | Mon, 15 Apr, 16:15–18:00 (CEST) | PICO spot 4
Modern challenges of climate change, disaster management, public health and safety, resource management, and logistics can only be addressed through big data analytics. A variety of modern technologies generate massive volumes of conventional and non-conventional geospatial data at local and global scales. Most of these data include geospatial components and are analysed using spatial algorithms. Ignoring the geospatial component of big data can lead to inappropriate interpretation of the extracted information. Recognition of this gap has led to the development of new spatiotemporally aware strategies and methods.

This session discusses advances in spatiotemporal machine learning methods and the software and infrastructures to support them.


Chairpersons: Christopher Kadow, Jens Klump
16:15–16:20
16:20–16:22 | PICO4.1 | EGU24-1198 | ECS | On-site presentation
Bilal Aslam, Toby Hocking, Pawlok Dass, Anna Kato, and Kevin Gurney

As cities grow and more cars occupy the roads, greenhouse gas emissions and air pollution in urban areas are rising. To better understand emissions and pollution, and to support effective urban environmental mitigation, accurate estimation of traffic volume is crucial. This study explores the application of hybrid machine learning models to estimate and predict traffic volume from satellite data and other datasets in both the USA and Europe. The research investigates the predictive capabilities of machine learning models that use freely accessible global datasets, including Sentinel-2, night-time light data, population, and road density. Neural network, nearest-neighbour, random forest, and XGBoost regression models were employed for traffic volume prediction, and their accuracy was improved through hyperparameter tuning with K-fold cross-validation. Model accuracy, evaluated through mean percentage error (MPE%) and R², showed that the XGBoost regression model achieved an R² of 0.81 and an MPE of 13%. The low error, together with the model's versatility, allows it to be applied worldwide to compute traffic volume from readily available datasets. Machine learning models, particularly the XGBoost regression model, prove valuable for on-road traffic volume prediction, yielding a dataset applicable to town planning, urban transportation, and combating urban air pollution.
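The tuning step can be illustrated with a minimal pure-Python sketch: K-fold cross-validation selects the number of neighbours for a k-nearest-neighbour regressor (one of the model families listed above) by mean out-of-fold R², and a mean percentage error function is included alongside. The abstract does not define MPE precisely, so the signed form 100 × mean((y − ŷ)/y) used here is an assumption, as are all function names and the synthetic data; this is not the authors' XGBoost pipeline.

```python
import random
from statistics import mean

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_percentage_error(y_true, y_pred):
    """Signed mean percentage error in % (assumed definition)."""
    return 100.0 * mean((t - p) / t for t, p in zip(y_true, y_pred))

def knn_predict(train_X, train_y, x, k):
    """Average target of the k training points nearest to x (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    return mean(target for _, target in ranked[:k])

def kfold_indices(n, n_folds, seed=0):
    """Shuffle indices once and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def cv_score(X, y, k, n_folds=5):
    """Mean out-of-fold R^2 of a k-NN regressor."""
    scores = []
    for fold in kfold_indices(len(X), n_folds):
        held_out = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(X)) if i not in held_out]
        preds = [knn_predict(tr_X, tr_y, X[i], k) for i in fold]
        scores.append(r2_score([y[i] for i in fold], preds))
    return mean(scores)

def tune_k(X, y, candidates=(1, 3, 5, 7)):
    """Pick the neighbour count with the best cross-validated R^2."""
    return max(candidates, key=lambda k: cv_score(X, y, k))

X = [[i / 10.0] for i in range(50)]
y = [10.0 + 2.0 * row[0] for row in X]  # synthetic, noise-free target
best_k = tune_k(X, y)
```

The same loop applies to any regressor with a hyperparameter: only `knn_predict` and the candidate grid would change.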

How to cite: Aslam, B., Hocking, T., Dass, P., Kato, A., and Gurney, K.: Satellite-Driven Traffic Volume Estimation: Harnessing Hybrid Machine Learning for Sustainable Urban Planning and Pollution Control, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1198, https://doi.org/10.5194/egusphere-egu24-1198, 2024.

16:22–16:24 | PICO4.2 | EGU24-2021 | ECS | On-site presentation
Zhicheng Zhang, Zhenhua Xiong, Xiaoyu Pan, and Qinchuan Xin

Long-term satellite imagery provides fundamental data support for identifying and analysing land surface dynamics. Although moderate-spatial-resolution data, such as those from the Moderate Resolution Imaging Spectroradiometer (MODIS), have been widely used for large-scale regional studies, their limited availability before 2000 restricts their use in long-term investigations. To reconstruct retrospective MODIS-like data, this study proposes a novel deep learning-based model, the Land-Cover-assisted SpatioTemporal Fusion model (LCSTF). LCSTF leverages medium-grained spatial class features from Landcover300m and seasonal temporal fluctuations from the Global Inventory Modelling and Mapping Studies (GIMMS) NDVI3g time series to generate 500 m MODIS-like data from 1992 to 2010 over the continental United States. The model also implements a Long Short-Term Memory (LSTM) sensor-bias correction to mitigate systematic differences between sensors. Validation against actual MODIS images confirms the model's ability to produce accurate MODIS-like data. Additionally, when assessed against Landsat data from before 2000, the model demonstrates excellent performance in reconstructing retrospective data. The developed model and the reconstructed biweekly MODIS-like dataset offer significant potential for extending the temporal coverage of moderate-spatial-resolution data, enabling comprehensive long-term and large-scale studies of land surface dynamics.
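The abstract corrects systematic differences between sensors with an LSTM. As a much simpler stand-in that only illustrates the underlying idea (learn a cross-sensor mapping on an overlap period, then apply it to the earlier record), the sketch below fits an ordinary least-squares linear correction. The function names and NDVI values are hypothetical, and this is not the LCSTF method.

```python
def fit_linear_bias(source, target):
    """Fit target ~ slope * source + intercept by ordinary least squares.

    source: older-sensor values (e.g. GIMMS NDVI) on the overlap period
    target: co-located reference-sensor values (e.g. MODIS NDVI)
    """
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    var = sum((s - mean_s) ** 2 for s in source)
    slope = cov / var
    return slope, mean_t - slope * mean_s

def apply_bias_correction(series, slope, intercept):
    """Map an older-sensor series onto the reference sensor's scale."""
    return [slope * v + intercept for v in series]

# Hypothetical overlap-period NDVI pairs (older sensor vs. MODIS)
gimms_overlap = [0.20, 0.40, 0.60, 0.80]
modis_overlap = [0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_linear_bias(gimms_overlap, modis_overlap)
# Apply the learned mapping to a retrospective older-sensor series
corrected = apply_bias_correction([0.30, 0.50], slope, intercept)
```

A single global line cannot capture bias that varies with season or land cover; a sequence model such as an LSTM can condition the correction on each pixel's temporal context.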

How to cite: Zhang, Z., Xiong, Z., Pan, X., and Xin, Q.: Developing a land-cover-assisted spatiotemporal fusion model for producing pre-2000 MODIS-like data over the continental United States, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2021, https://doi.org/10.5194/egusphere-egu24-2021, 2024.

16:24–16:26 | PICO4.3 | EGU24-4445 | On-site presentation
Peter Pavlík, Martin Výboh, Anna Bou Ezzeddine, and Viera Rozinajová

The task of precipitation nowcasting is often framed as a computer vision problem. It is analogous to next-frame video prediction, i.e. processing consecutive radar precipitation maps and predicting future ones. This makes convolutional neural networks (CNNs) a natural fit for the task, and in recent years CNNs have become the de facto state-of-the-art models for precipitation nowcasting.

However, a purely data-driven machine learning model has difficulty accurately capturing the underlying patterns in the data. Since the data obey known physical laws, this knowledge can be incorporated to train more accurate and trustworthy models.

We present a double U-Net model that combines a continuity-constrained Lagrangian persistence U-Net with an advection-free U-Net dedicated to capturing precipitation growth and decay. In contrast to previous works, the combined model is fully differentiable, allowing both networks to be fine-tuned jointly in a data-driven way. We examine the learned Lagrangian mappings and carry out a thorough quantitative and qualitative evaluation, the results of which will be provided in the presentation.
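To make the Lagrangian idea concrete, here is a minimal sketch of one backward semi-Lagrangian step: each output pixel takes the rain rate found upstream along a displacement field, using bilinear interpolation, a sampling operation that is differentiable with respect to the displacements almost everywhere (the property that lets such a warp sit inside a trainable network). An additive growth/decay field is then applied. In the abstract both the motion field and the growth/decay term are predicted by U-Nets and the continuity constraint is part of training; here they are simply given as inputs, and all function names are hypothetical.

```python
def bilinear(field, y, x):
    """Bilinearly sample a 2D grid at fractional (row y, column x); 0 outside."""
    h, w = len(field), len(field[0])
    if y < 0 or x < 0 or y > h - 1 or x > w - 1:
        return 0.0
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = field[y0][x0] * (1 - dx) + field[y0][x1] * dx
    bottom = field[y1][x0] * (1 - dx) + field[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def advect(field, u, v):
    """Backward semi-Lagrangian step: pixel (i, j) takes the value at (i - v, j - u).

    u, v: per-pixel displacements (columns, rows) per time step.
    """
    h, w = len(field), len(field[0])
    return [[bilinear(field, i - v[i][j], j - u[i][j]) for j in range(w)] for i in range(h)]

def nowcast_step(field, u, v, growth):
    """Lagrangian persistence plus an additive growth/decay term, clipped at zero rain."""
    advected = advect(field, u, v)
    return [[max(0.0, a + g) for a, g in zip(row_a, row_g)]
            for row_a, row_g in zip(advected, growth)]

rain = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
u = [[1.0] * 3 for _ in range(3)]  # move everything one column to the right
v = [[0.0] * 3 for _ in range(3)]
no_growth = [[0.0] * 3 for _ in range(3)]
nxt = nowcast_step(rain, u, v, no_growth)
```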

How to cite: Pavlík, P., Výboh, M., Bou Ezzeddine, A., and Rozinajová, V.: Fully Differentiable Physics-informed Lagrangian Convolutional Neural Network for Precipitation Nowcasting, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4445, https://doi.org/10.5194/egusphere-egu24-4445, 2024.

16:26–16:28 | PICO4.4 | EGU24-10278 | ECS | On-site presentation
Fabian Schumacher, Christian Knoth, Marvin Ludwig, and Hanna Meyer

Machine learning is frequently used in the earth and environmental sciences to produce spatial or spatiotemporal predictions of environmental variables from limited field samples, increasingly even at a global scale and far beyond the extent of the available training data. Since new geographic space often comes with new environmental conditions, the spatial applicability and transferability of such models is often questionable. Predictions should therefore be constrained to environments whose properties the model has been able to learn.

Meyer and Pebesma (2021) made a first proposal for estimating the area of applicability (AOA) of spatial prediction models. Their method derives a dissimilarity index (DI) from the distance, in predictor space, between a prediction data point and the nearest reference data point. Prediction locations with a DI larger than the DI values observed through cross-validation during model training are considered outside the AOA. Consequently, the AOA is defined as the area where the model has been able to learn about the relationships between predictors and target variables and where, on average, the cross-validation performance applies. The method, however, is based only on the distance, in predictor space, to the nearest reference data point. Hence, a single data point in an environment may be enough to define a model as “applicable” in that environment. Here we suggest extending this approach by considering the density of reference data points in predictor space, as we assume that this density is highly decisive for prediction quality.

We extend the methodology with a newly developed local data point density (LPD) approach, built on the concepts of the original method, to allow a better assessment of a model's applicability. The LPD is a quantitative measure for a new data point that indicates how many similar (in terms of predictor values) reference data points were included in model training, assuming a positive relationship between LPD values and prediction performance. A reference data point is considered similar if it alone would place the new data point within the AOA, i.e. if the model would be considered applicable at the corresponding prediction location. We implemented the LPD approach in the R package CAST. Here we explain the method and demonstrate its applicability in simulation studies as well as real-world applications.
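The DI, AOA and LPD logic can be sketched as follows. This is a minimal pure-Python illustration, not the CAST implementation: it assumes standardised predictors with equal weights (CAST can additionally weight predictors by variable importance), normalises distances by the mean pairwise distance between training points, and assumes the DI threshold is the largest cross-validated training DI that does not exceed the upper boxplot whisker (Q3 + 1.5·IQR). All function names are hypothetical.

```python
import math
from statistics import mean, pstdev, quantiles

def standardise(train, new):
    """Scale every predictor by the training mean and standard deviation."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant predictors
    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    return scale(train), scale(new)

def mean_pairwise_distance(train):
    n = len(train)
    return mean(math.dist(train[i], train[j]) for i in range(n) for j in range(i + 1, n))

def di_threshold(train, folds, d_bar):
    """Cross-validated training DI: distance to the nearest point in another fold."""
    di = []
    for i, x in enumerate(train):
        d_min = min(math.dist(x, z) for z, f in zip(train, folds) if f != folds[i])
        di.append(d_min / d_bar)
    q1, _, q3 = quantiles(di, n=4)
    upper = q3 + 1.5 * (q3 - q1)  # assumed outlier rule (upper whisker)
    return max(d for d in di if d <= upper)

def aoa_and_lpd(train, folds, new):
    """Return (DI, inside_AOA, LPD) for each new point."""
    train_s, new_s = standardise(train, new)
    d_bar = mean_pairwise_distance(train_s)
    thr = di_threshold(train_s, folds, d_bar)
    results = []
    for x in new_s:
        di_each = [math.dist(x, z) / d_bar for z in train_s]
        lpd = sum(d <= thr for d in di_each)  # training points that alone put x in the AOA
        results.append((min(di_each), min(di_each) <= thr, lpd))
    return results

train = [[float(i), float(i % 3)] for i in range(12)]
folds = [i % 4 for i in range(12)]
res = aoa_and_lpd(train, folds, [[5.5, 1.0], [100.0, 50.0]])
```

The AOA answers a yes/no question from the single nearest neighbour, while the LPD counts how many training points would each grant that yes, which is the extra information the abstract argues for.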

Reference:

Meyer, H; Pebesma, E. 2021. ‘Predicting into unknown space? Estimating the area of applicability of spatial prediction models.’ Methods in Ecology and Evolution 12: 1620–1633. doi: 10.1111/2041-210X.13650.

How to cite: Schumacher, F., Knoth, C., Ludwig, M., and Meyer, H.: Assessing the area of applicability of spatial prediction models through a local data point density approach, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10278, https://doi.org/10.5194/egusphere-egu24-10278, 2024.

16:28–16:30 | PICO4.5 | EGU24-13452 | ECS | On-site presentation
Kin To Wong

With the rapid growth of global trade, the demand for efficient route planning and resource utilization in logistics and transportation closely mirrors the Travelling Salesman Problem (TSP). The TSP asks for the shortest possible route that visits each of N destinations exactly once and returns to the starting point. The computational cost of solving the TSP exactly grows exponentially with the number of destinations, so finding an exact solution is impractical for large instances. It has long been a challenging optimization problem, prompting the development of various methods that seek more efficient solutions, with recent research focusing especially on metaheuristics. This research therefore proposes a swarm intelligence-based optimization algorithm that provides an approximate solution to the TSP. The proposed algorithm is evaluated by comparing its solution quality and computation time with those of two well-known optimization methods, the Genetic Algorithm and Ant Colony Optimization. Two experimental datasets were built using 47 cities and 50 landmarks in the U.S. as destinations, respectively, with geospatial data retrieved from the Google Maps Platform API. The experimental results suggest that the proposed algorithm computes a near-optimal solution with the shortest computation time of the three optimization methods. Solving the TSP efficiently contributes significantly to route planning for transportation and logistics. By shortening travelling time, optimizing resource utilization, and minimizing fuel and energy consumption, this research also aligns with the global goal of reducing carbon emissions from transportation and logistics systems.
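The abstract does not specify which swarm intelligence method it proposes. To illustrate the general idea, the sketch below implements Ant Colony Optimization, which the abstract uses as one of its baselines: simulated ants build tours probabilistically, favouring short edges and edges carrying more pheromone; pheromone then evaporates and is reinforced in inverse proportion to tour length. The parameter values (alpha, beta, rho, numbers of ants and iterations) are illustrative assumptions, and the distance matrix is assumed to have no zero off-diagonal entries.

```python
import math
import random

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting point."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def ant_colony_tsp(dist, n_ants=20, n_iter=100, alpha=1.0, beta=3.0,
                   rho=0.5, q=1.0, seed=0):
    """Basic Ant Colony Optimization for a symmetric TSP distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cands = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cands]
                j = rng.choices(cands, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporation, then deposit q / length on every edge of every tour
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Six points on the unit circle: the optimal tour follows the circle
pts = [(math.cos(2 * math.pi * k / 6), math.sin(2 * math.pi * k / 6)) for k in range(6)]
dist = [[math.dist(p, r) for r in pts] for p in pts]
best_tour, best_len = ant_colony_tsp(dist)
```

In a routing application, `dist` would hold road distances or travel times between destinations (for example, values retrieved from a mapping API) instead of straight-line distances.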

How to cite: Wong, K. T.: Solving the Travelling Salesman Problem for Efficient Route Planning through Swarm Intelligence-Based Optimization, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13452, https://doi.org/10.5194/egusphere-egu24-13452, 2024.

16:30–16:32 | PICO4.6 | EGU24-13833 | ECS | On-site presentation
Yuting Gong, Huifang Li, and Jie Li

Land surface temperature (LST) is a critical parameter for understanding the physical properties of the boundary between the Earth's surface and the atmosphere, and it is important in many research areas, including agriculture, climate, hydrology, and the environment. However, thermal infrared remote sensing is often obstructed by clouds and aerosols, leaving gaps in LST data products that hinder their practical application. Reconstruction of cloud-covered thermal infrared LST is therefore vital for measuring land surface physical properties at regional and global scales. In this paper, we propose a novel method for reconstructing Moderate Resolution Imaging Spectroradiometer (MODIS) LST data at 1 km spatial resolution: a spatiotemporal consistency constraint network (STCCN) model that fuses reanalysis and thermal infrared data. First, a new spatiotemporal consistency loss function was developed to minimize the discrepancies between the reconstructed LST and the actual LST, using a non-local reinforced convolutional neural network. Second, ERA5 surface net solar radiation (SSR) data were used as one of the key network inputs, because they characterize the influence of the Sun on surface warming and help correct the LST reconstruction results.
The experimental results show that (1) the STCCN model can precisely reconstruct cloud-covered LST, with a coefficient of determination (R²) of 0.8973 and a mean absolute error (MAE) of 0.8070 K; (2) introducing ERA5 SSR data reduces the MAE of the reconstructed LST by 17.15% while R² remains similar, indicating that it is necessary and beneficial to consider the effects of radiation on LST; (3) analysis of spatial and temporal adaptability indicates that the proposed method is resilient and flexible across different spatial and temporal scales, suggesting its potential for effective and reliable application in different scenarios; (4) against SURFRAD station observations, R² ranges from 0.8 to 0.9 and MAE from 1 to 3 K, demonstrating the effectiveness and validity of the proposed model for reconstructing regional cloud-covered LST.
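The abstract does not give the form of its spatiotemporal consistency loss. Purely as a hypothetical illustration of how such a constraint can be written, the sketch below combines a masked L1 term that fits the reconstruction to clear-sky observations with a weighted L1 term that keeps it close to a temporally adjacent reference image. The weight `lam`, the function names and the flattened-list image representation are all assumptions; this is not the STCCN loss.

```python
def masked_l1(a, b, mask):
    """Mean absolute difference over positions where mask is true (0.0 if none)."""
    diffs = [abs(x - y) for x, y, m in zip(a, b, mask) if m]
    return sum(diffs) / len(diffs) if diffs else 0.0

def consistency_loss(pred, observed, clear, reference, lam=0.1):
    """Hypothetical loss: fit clear-sky pixels, stay close to an adjacent-date image.

    pred, observed, reference: flattened LST images in kelvin
    clear: True where the observation is cloud-free
    """
    spatial_term = masked_l1(pred, observed, clear)
    temporal_term = masked_l1(pred, reference, [True] * len(pred))
    return spatial_term + lam * temporal_term

pred = [300.0, 302.0, 301.0, 299.0]
observed = [301.0, 302.0, 0.0, 0.0]  # last two pixels are cloud-covered
clear = [True, True, False, False]
reference = [300.0, 300.0, 300.0, 300.0]
loss = consistency_loss(pred, observed, clear, reference)
```

Masking matters because cloud-covered pixels carry no valid observation; without the mask the network would be trained to reproduce the gaps.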

How to cite: Gong, Y., Li, H., and Li, J.: STCCN: A spatiotemporal consistency constraint network for all-weather MODIS LST reconstruction by fusing reanalysis and thermal infrared data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13833, https://doi.org/10.5194/egusphere-egu24-13833, 2024.

16:32–16:34 | PICO4.7 | EGU24-19275 | On-site presentation
Étienne Plésiat, Robert Dunn, Markus Donat, Thomas Ludwig, and Christopher Kadow

The year 2023 represents a significant milestone in climate history: the Copernicus Climate Change Service (C3S) confirmed it as the warmest calendar year in global temperature records since 1850. At 1.48 °C above the 1850–1900 pre-industrial level, 2023 clearly surpassed 2016, 2019 and 2020, previously identified as the warmest years on record. As expected, this sustained warmth leads to an increase in the frequency and intensity of extreme events (EE), with dramatic environmental and societal consequences.

To assess the evolution of these EE and establish adaptation and mitigation strategies, it is crucial to evaluate the trends of extreme indices (EI). However, the observational climate data commonly used to calculate these indices frequently contain missing values, resulting in partial and inaccurate EI. The further back in time we go, the more pronounced this issue becomes, owing to the scarcity of historical measurements.

To compensate for the missing information, we use a deep learning technique based on a U-Net made of partial convolutional layers [1]. The models are trained on Earth system model data from CMIP6 and can reconstruct large, irregular regions of missing data using minimal computational resources. By learning intricate patterns in climate data, this approach has been shown to outperform traditional statistical methods such as kriging [2].
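The core operation of such a network is the partial convolution of Liu et al. [1]: the kernel is applied only to valid (unmasked) inputs, the result is rescaled by the ratio of the window size to the number of valid inputs, and the mask is updated so that any window containing at least one valid input becomes valid. Below is a minimal single-channel sketch with no padding and stride 1, written as plain loops for readability; a real U-Net would use multi-channel layers in a deep learning framework, and the function name is hypothetical.

```python
def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution (valid padding, stride 1).

    x, mask: 2D lists of equal shape; mask is 1 where x is observed, 0 where missing.
    Returns the output map and the updated mask.
    """
    kh, kw = len(weight), len(weight[0])
    window = kh * kw
    out, new_mask = [], []
    for i in range(len(x) - kh + 1):
        out_row, mask_row = [], []
        for j in range(len(x[0]) - kw + 1):
            acc, n_valid = 0.0, 0
            for a in range(kh):
                for b in range(kw):
                    if mask[i + a][j + b]:
                        acc += weight[a][b] * x[i + a][j + b]
                        n_valid += 1
            if n_valid:
                # Rescale by window / n_valid to compensate for the missing inputs
                out_row.append(acc * window / n_valid + bias)
                mask_row.append(1)
            else:
                out_row.append(0.0)
                mask_row.append(0)
        out.append(out_row)
        new_mask.append(mask_row)
    return out, new_mask

# 3x3 field of 2.0 with a missing centre, averaged by a 3x3 kernel
field = [[2.0] * 3 for _ in range(3)]
holes = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
out, updated = partial_conv2d(field, holes, kernel)
```

Because the updated mask grows wherever a window touches valid data, stacking such layers lets the network fill progressively larger gaps.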

In this study, we applied our technique to reconstruct gridded land surface EI from an intermediate product of the HadEX3 dataset [3]. This intermediate product is obtained by combining station measurements without interpolation, resulting in numerous missing values that vary in both space and time. These missing values significantly affect the calculation of the long-term linear trend (1901–2018), especially if only grid boxes containing values for the whole period are considered. The left panel of the figure shows the trend calculated over the European continent for the TX90p index, which measures the monthly (or annual) frequency of warm days (defined as the percentage of days on which the daily maximum temperature exceeds the 90th percentile). The grey pixels indicate the resulting missing values. With our AI method, we were able to reconstruct the TX90p values for all time steps and calculate the long-term trend shown in the right panel of the figure. The reconstructed dataset is being prepared for the community within the framework of the H2020 CLINT project [4] for further detection and attribution studies.
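The two quantities discussed above can be sketched in a few lines: TX90p as the percentage of days whose maximum temperature exceeds a threshold, and an ordinary least-squares trend that, as in the complete-coverage case, is only computed when a grid box has no missing years. This is a simplification: the standard TX90p definition uses calendar-day 90th percentiles from a base period, whereas here a single threshold is passed in, and the function names are hypothetical.

```python
def tx90p(tmax, threshold):
    """Percentage of days whose daily maximum temperature exceeds the threshold."""
    return 100.0 * sum(t > threshold for t in tmax) / len(tmax)

def linear_trend(years, values):
    """OLS slope (units per year); None if any value is missing."""
    if any(v is None for v in values):
        return None
    n = len(years)
    mean_year = sum(years) / n
    mean_val = sum(values) / n
    cov = sum((y - mean_year) * (v - mean_val) for y, v in zip(years, values))
    var = sum((y - mean_year) ** 2 for y in years)
    return cov / var

# Annual TX90p values for one grid box; None marks a year without data
complete = linear_trend([2000, 2001, 2002], [10.0, 12.0, 14.0])
gappy = linear_trend([2000, 2001, 2002], [10.0, None, 14.0])
```

Requiring complete coverage discards every grid box with a single gap, which is why reconstructing the missing values enlarges the area over which a trend can be computed.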

[1] Liu G. et al., Lecture Notes in Computer Science, 11215, 19-35 (2018)
[2] Kadow C. et al., Nat. Geosci., 13, 408-413 (2020)
[3] Dunn R. J. H. et al., J. Geophys. Res. Atmos., 125, 1 (2020)
[4] https://climateintelligence.eu/

How to cite: Plésiat, É., Dunn, R., Donat, M., Ludwig, T., and Kadow, C.: Artificial Intelligence Reconstructs Historical Climate Extremes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19275, https://doi.org/10.5194/egusphere-egu24-19275, 2024.

16:34–16:36 | EGU24-19394 | Virtual presentation
Aditya Handur-Kulkarni, Shanay Mehta, Ayush Ghatalia, and Ritu Anilkumar

Every monsoon, the northeastern states of India face disasters related to heavy precipitation, such as floods and landslides. Furthermore, the region's economy depends predominantly on agriculture. Accurate prediction of rainfall therefore plays a vital role in planning and disaster management programmes in the region. Existing sources of rainfall information include Automatic Weather Stations, which provide real-time rainfall measurements at specific locations; however, these are point-based estimates. For spatially distributed measurements, satellite-based estimation can be used. While satellite methods provide vital information on the spatial distribution of precipitation, they only provide real-time estimates, not forecasts. Numerical weather prediction models provide forecasts by simulating the atmosphere's physical processes, assimilating observational data from various sources, including weather stations and satellites. However, these models are extremely complex and computationally demanding, and their accuracy is limited by the available computing infrastructure. Recently, a host of data-driven models, including random forest regression, support vector machine regression and deep learning architectures, have been used to provide spatially distributed rainfall forecasts. However, the relative performance of such models in orographically complex terrain has not been systematically assessed. Through this study, we aim to systematically assess the role of convolutional and recurrent neural network architectures in estimating rainfall. We used rainfall data from the ERA5-Land reanalysis dataset, together with the following additional meteorological variables that can affect rainfall: dew point temperature, skin temperature, amount of solar radiation, wind components, surface pressure and total precipitation.
Data aggregated to a daily scale and spanning three decades were used. We used the following neural network architectures: a U-Net modified for regression, representing convolutional neural networks, and a Long Short-Term Memory (LSTM) network, representing recurrent neural networks. Various settings of each architecture, such as the number of layers, the optimizer and the initialization, were evaluated to assess their performance in rainfall estimation. The resulting rainfall estimation models were validated and evaluated using statistical metrics such as root mean square error (RMSE) and the coefficient of determination (R²). The results of this research are expected to provide valuable insights for local governments, farmers, and other stakeholders in the northeastern states of India. Moreover, the study's methodology can be extended to other regions facing similar climate challenges, contributing to advances in rainfall estimation and climate modelling.
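To show what the recurrent branch of such a comparison computes, here is a minimal sketch of a standard LSTM cell with scalar input and a hidden size of one, run over a sequence of daily values, plus the RMSE metric used for evaluation. The parameter names and the scalar simplification are assumptions for readability; real models use vector-valued states and learned weight matrices, and this is not the authors' configuration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM step with scalar input x, hidden state h and cell state c."""
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell update
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def lstm_predict(sequence, p, w_out, b_out):
    """Run the cell over a sequence of daily values and map the final state to rainfall."""
    h = c = 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return w_out * h + b_out

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# With all parameters zero, every gate is 0.5 and the candidate update is 0
zero = dict.fromkeys(["wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg"], 0.0)
h1, c1 = lstm_step(1.0, 0.0, 1.0, zero)
```

The forget gate `f` is what lets the cell carry information across many days, in contrast to a U-Net, which mixes information across neighbouring pixels within a single time step.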

How to cite: Handur-Kulkarni, A., Mehta, S., Ghatalia, A., and Anilkumar, R.: Comparing the Role of Spatially and Temporally capable Deep Learning Architectures in Rainfall Estimation: A Case Study over North East India, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19394, https://doi.org/10.5194/egusphere-egu24-19394, 2024.

16:36–16:38 | EGU24-19531 | Virtual presentation
Yash Bhisikar, Nirmal Govindaraj, Venkatavihan Devaki, and Ritu Anilkumar

Gradient-Based Optimisers Versus Genetic Algorithms in Deep Learning Architectures: A Case Study on Rainfall Estimation Over Complex Terrain

Yash Bhisikar¹*, Nirmal Govindaraj¹*, Venkatavihan Devaki²*, Ritu Anilkumar³

¹Birla Institute of Technology and Science, Pilani, K K Birla Goa Campus
²Birla Institute of Technology and Science, Pilani, Pilani Campus
³North Eastern Space Applications Centre, Department of Space, Umiam

E-mail: f20210483@goa.bits-pilani.ac.in

*These authors contributed equally to this study.

Rainfall is a crucial factor that affects planning processes at various scales, ranging from agricultural activities at the village or household level to governmental initiatives in water resource management, disaster preparedness, and infrastructure planning. A reliable estimate of rainfall and a systematic assessment of variations in rainfall patterns are therefore urgently needed. Recently, several studies have attempted to predict rainfall over various locations using deep learning architectures, including but not limited to artificial neural networks, convolutional neural networks, recurrent neural networks, or combinations of these. However, a major challenge in rainfall estimation is the chaotic nature of rainfall, especially the interplay of spatiotemporal components over orographically complex terrain. For complex computer vision challenges, studies have suggested that population search-driven optimisation techniques, such as genetic algorithms, may be used as an alternative to traditional gradient-based optimisers such as Adam, Adadelta and SGD. Through this study, we aim to extend this hypothesis to rainfall estimation. We use population search-based techniques, namely genetic algorithms, to optimise a convolutional neural network architecture built using PyTorch. We chose North-East India as the study area because it receives significant monsoon rainfall and its undulating terrain adds complexity to rainfall estimation. We used 30 years of rainfall data for June, July, August and September from the ERA5-Land daily reanalysis dataset, which has a spatial resolution of 11,132 m.
Additionally, the following meteorological variables that can affect rainfall were used as input features: dew point temperature, skin temperature, net incoming short-wave radiation received at the surface, wind components and surface pressure. All datasets were aggregated to daily time steps. Several configurations of the U-Net architecture, varying the number of hidden layers, the initialisation technique and the optimisation algorithm, were tested to identify the best configuration for estimating rainfall over North-East India. Genetic algorithms, implemented with the PyGAD library, were used for initialisation and optimisation to assess the ability of population search heuristics. The resulting rainfall prediction models were validated at different time steps (0-, 1-, 2- and 3-day latency) using a 7:1:2 train/validation/test split and evaluated with metrics such as root mean square error (RMSE) and the coefficient of determination (R²). The evaluation was performed both pixel by pixel and image by image, to take both magnitude and spatial correlation into account. Our study emphasises the importance of considering alternative optimisers and hyperparameter tuning approaches for complex Earth observation challenges such as rainfall prediction.
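The contrast with gradient-based optimisers can be illustrated with a small, self-contained genetic algorithm that searches parameter space using only loss evaluations: tournament selection, uniform crossover, Gaussian mutation and elitism. It does not use PyGAD, and it optimises the two parameters of a toy linear model rather than U-Net weights; the operators, population size and mutation scale are illustrative assumptions, not the authors' settings.

```python
import random

def mse(params, xs, ys):
    """Mean squared error of the toy model y = a * x + b."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def genetic_optimise(loss, n_params, pop_size=40, generations=200,
                     sigma=0.1, n_elite=2, seed=0):
    """Minimise `loss` with a simple real-valued genetic algorithm (no gradients)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=loss)                          # fittest (lowest loss) first
        nxt = [ind[:] for ind in pop[:n_elite]]     # elitism: copy the best unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=loss)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=loss)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [g + rng.gauss(0.0, sigma) for g in child]  # Gaussian mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=loss)

xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]
best = genetic_optimise(lambda p: mse(p, xs, ys), n_params=2)
```

Each generation costs many loss evaluations, which is the main practical trade-off against gradient descent when the parameter vector is as large as a U-Net's.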

How to cite: Bhisikar, Y., Govindaraj, N., Devaki, V., and Anilkumar, R.: Gradient-Based Optimisers Versus Genetic Algorithms in Deep Learning Architectures: A Case Study on Rainfall Estimation Over Complex Terrain, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19531, https://doi.org/10.5194/egusphere-egu24-19531, 2024.

16:38–16:40 | EGU24-20025 | ECS | Virtual presentation
Weiying Zhao, Alexey Unagaev, and Natalia Efremova

This study introduces a vineyard detection method that integrates advanced machine learning techniques with high-resolution satellite imagery, focusing in particular on preprocessed multitemporal Sentinel-2 images combined with a Transformer-based model.

We collected a series of Sentinel-2 images covering an entire seasonal cycle from eight distinct locations in Oregon, United States, all within similar climatic zones. The training and validation sets contain 403,612 and 100,903 samples, respectively. To reduce cloud effects, we used monthly median band values derived from images that had first been cloud-filtered. The multispectral (12 bands) and multiscale (10 m, 20 m and 60 m) time series effectively captured both the phenological patterns of the land covers and the overall management activities.
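The cloud-handling step described above (cloud filtering followed by monthly median compositing) can be sketched per pixel and per band as follows. The input format and function name are hypothetical; months without any cloud-free observation are returned as `None`, the kind of missing value the abstract reports for the cloud-affected test areas.

```python
from collections import defaultdict
from statistics import median

def monthly_median_composite(observations):
    """Monthly median of cloud-free reflectances for one band at one pixel.

    observations: iterable of (month, reflectance, is_cloudy) tuples.
    Returns {month: median, or None if the month has no cloud-free observation}.
    """
    by_month = defaultdict(list)
    for month, value, cloudy in observations:
        if not cloudy:
            by_month[month].append(value)
    return {m: median(by_month[m]) if by_month[m] else None for m in range(1, 13)}

comp = monthly_median_composite([
    (6, 0.1, False), (6, 0.3, False), (6, 0.9, True),  # the cloudy 0.9 is ignored
    (7, 0.2, True),                                     # July has no clear observation
])
```

The median is robust to residual cloud or shadow contamination that slips past the cloud filter, which a monthly mean would not be.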

The Transformer model, best known for its successes in natural language processing, was adapted to our time series identification scenario, and we reformulated vineyard detection as a binary classification task. Our findings demonstrate that the Transformer model significantly outperforms traditional 1D convolutional neural networks (CNNs) in detecting vineyards across 16 new areas within similar climatic zones, achieving an accuracy of 87.77% and an F1 score of 0.876. In most of these new test locations, accuracy exceeded 92%; the exceptions were two areas that experienced significant cloud interference and had numerous missing values in their time series. The model proved capable of differentiating between land covers with similar characteristics at various growth stages throughout the season. Compared with attention LSTM and BiLSTM models, it achieves similar performance with fewer trainable parameters. The model was especially adept at handling temporal variations, capturing the dynamic changes in vineyard phenology over time. This research underscores the potential of combining advanced machine learning techniques with high-resolution satellite imagery for crop type detection and suggests broader applications in land cover classification. Future research will focus more on the missing-value problem.
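The operation that lets a Transformer relate observations from different months is scaled dot-product self-attention, softmax(QKᵀ/√d)V. The sketch below implements it for one head with plain lists, where each row of Q, K and V corresponds to one monthly time step; the learned projections, positional encoding, multiple heads and classification head of a full model are omitted, and the function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention over time steps (rows)."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wt * v[c] for wt, v in zip(w, V)) for c in range(len(V[0]))])
    return out

# Identical keys give uniform weights, so each output is the mean of the value rows
result = attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Every time step attends to every other in one operation, so a distinctive month (for example, around harvest) can influence the representation of the whole season directly rather than through a recurrent chain.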

How to cite: Zhao, W., Unagaev, A., and Efremova, N.: Vineyard detection from multitemporal Sentinel-2 images with a Transformer model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20025, https://doi.org/10.5194/egusphere-egu24-20025, 2024.

16:40–16:42 | EGU24-22165 | Virtual presentation
Rúben Santos and Rui Quartau

The aim of this study, conducted in Tavira, Portugal, is to demonstrate that depths can be determined without relying on in-situ data. To achieve this goal, we used a model previously trained with depth data and multispectral images from 2018. This model enables depth determination for any period for which multispectral images are available.

For this study, we used CubeSat images from the PlanetScope constellation with a spatial resolution of 3.0 m and four spectral bands (blue, green, red, and near-infrared). Tidal-height corrections were obtained from modelled data provided by the Portuguese Hydrographic Institute for the Faro–Olhão tide gauge. In-situ depths were obtained from the Digital Elevation Model of Reference (MDER) of the Coastal Monitoring Program of Continental Portugal of the Portuguese Environmental Agency.

The model used to determine depths was previously obtained using the Random Forest (RF) algorithm, trained with a set of reflectances from 15 images acquired between August and October 2018 by the PlanetScope constellation, and a set of depths from the MDER, referring to October 2018.

This RF model was used to determine depths from a set of 7 images from the same constellation, acquired between August and October 2019. The results were corrected for tidal height so that all values refer to the Hydrographic Zero datum. A Savitzky-Golay filter was applied to smooth the results, and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm was then applied to eliminate outliers. Finally, the median depth value was determined, resulting in a bathymetric surface morphologically similar to the MDER (2019).
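The post-processing chain described above can be sketched for a single pixel's series of per-image depth estimates. The sketch assumes positive-down depths (so the tide height above Hydrographic Zero is subtracted), a 5-point quadratic Savitzky-Golay filter along the image sequence with the first two and last two points left unsmoothed, and a one-dimensional DBSCAN whose noise points are discarded before the median is taken. The abstract does not state the smoothing axis, filter window or DBSCAN parameters, so these choices and all function names are assumptions.

```python
from statistics import median

SG5 = [-3, 12, 17, 12, -3]  # 5-point quadratic Savitzky-Golay weights (divide by 35)

def savgol5(values):
    """Smooth interior points; the first two and last two points are left unchanged."""
    out = list(values)
    for k in range(2, len(values) - 2):
        out[k] = sum(c * v for c, v in zip(SG5, values[k - 2:k + 3])) / 35
    return out

def dbscan_inliers(values, eps, min_samples):
    """1D DBSCAN: keep core points and border points; drop noise."""
    neighbours = [[w for w in values if abs(v - w) <= eps] for v in values]
    core = [len(nb) >= min_samples for nb in neighbours]
    core_values = [v for v, is_core in zip(values, core) if is_core]
    return [v for v, is_core in zip(values, core)
            if is_core or any(abs(v - c) <= eps for c in core_values)]

def estimate_depth(depths, tides, eps=0.5, min_samples=3):
    """Tide-correct, smooth, remove outliers, then take the median depth."""
    corrected = [d - t for d, t in zip(depths, tides)]  # refer depths to Hydrographic Zero
    smoothed = savgol5(corrected)
    return median(dbscan_inliers(smoothed, eps, min_samples))
```

If every value were flagged as noise, `median` would raise `statistics.StatisticsError`, so a production version would need a fallback for such pixels.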

This final surface was compared with the 2019 MDER by computing the differences between the two surfaces (residuals) and their statistics (mean, median, standard deviation, and histogram). A vertical profile between 0.0 and 10.0 m depth was also generated. The differences have a median of 0.5 m, a mean of 0.7 m, and a standard deviation of 1.3 m. The histogram of the differences follows a normal distribution centred on the median value, which is offset from zero.

The results obtained in this study are promising for deriving depths in coastal regions from multispectral images without the need for in-situ data. However, the current model needs to be improved to reduce the median and standard deviation of the differences between the determined depths and the reference. A better model will yield more accurate results, enabling seasonal variations and changes caused by extreme events or climate change to be determined without in-situ data.

How to cite: Santos, R. and Quartau, R.: Predicting bathymetry in shallow regions using a machine learning model and a time series of PlanetScope images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22165, https://doi.org/10.5194/egusphere-egu24-22165, 2024.

16:42–16:44 | PICO4.8 | EGU24-16841 | On-site presentation
Mohammad Alasawedah, Michele Claus, Alexander Jacob, Patrick Griffiths, Jeroen Dries, and Stefaan Lippens

Mapping photovoltaic farms (PV farms) is essential for establishing sound policies on natural resource management and clean energy. As evidenced by the recent COP28 summit, where almost 120 global leaders pledged to triple the world's renewable energy capacity by 2030, it is crucial to make these mapping efforts scalable and reproducible. Recent efforts towards global mapping of PV farms [1] were limited to fixed time periods of the analysed satellite imagery and are not openly reproducible. Building on this effort, we propose using openEO [2] User Defined Processes (UDPs), implemented on openEO Platform, to map solar farms from Sentinel-2 imagery, emphasizing the four foundational FAIR data principles: Findability, Accessibility, Interoperability, and Reusability. The UDPs encapsulate the entire solar farm mapping workflow, from data preprocessing and analysis to model training and prediction. Using openEO UDPs enables easy reuse and parametrization for future PV farm mapping.

Open data is used to construct the training dataset, with PV farm polygons across different countries collected from OpenStreetMap (OSM). Several filters, in particular based on land cover and terrain, are applied when creating the training set. To ensure model robustness, we leveraged the temporal resolution of Sentinel-2 L2A data and used openEO to create a reusable workflow that simplifies data access in the cloud and allows training samples to be collected efficiently across Europe. This workflow includes preprocessing steps such as cloud masking, gap filling and outlier filtering, as well as feature extraction. Considerable effort goes into generating the best possible training samples, ensuring an optimal starting point for the subsequent steps. After compiling the training dataset, we conducted a statistical discrimination analysis of different pixel-level models to determine the most effective one. Our goal is to compare time-series machine learning (ML) models such as InceptionTime, which take 3D data as input, with tree-based models such as Random Forest (RF), which use 2D data combined with feature engineering. An openEO process graph is then constructed to organize and automate the inference phase, encapsulating all necessary processes from preprocessing to prediction. Finally, the process graph is converted into a UDP that others can reuse for replicable PV farm mapping, from a single farm up to country scale. The UDP also makes it possible to rerun the workflow to produce new temporal assessments of PV farm distribution. The PV farm mapping UDP is integrated with the ESA Green Transition Information Factory (GTIF, https://gtif.esa.int/), enabling streamlined, FAIR-compliant updates of related energy infrastructure mapping efforts.
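The preprocessing chain described above (cloud masking, outlier filtering, gap filling, feature extraction) can be sketched for a single pixel's time series in plain Python. This is a minimal sketch under stated assumptions: cloudy observations are identified with the Sentinel-2 L2A Scene Classification (SCL) codes 3, 8, 9 and 10, outliers are values more than three median absolute deviations from the median, and gaps are filled by linear interpolation. The gap-free series, stacked per band and pixel, corresponds to the 3D input of a model such as InceptionTime, while the summary statistics form a 2D feature row for a model such as RF; the specific features chosen here are hypothetical.

```python
import statistics

# Sentinel-2 L2A Scene Classification (SCL) codes treated as cloudy here:
# 3 = cloud shadows, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus.
CLOUDY_SCL = {3, 8, 9, 10}


def cloud_mask(values, scl):
    """Replace observations whose SCL code is cloudy with None."""
    if len(values) != len(scl):
        raise ValueError("values and scl must have the same length")
    return [None if s in CLOUDY_SCL else v for v, s in zip(values, scl)]


def filter_outliers(values, k=3.0):
    """Set valid values more than k median absolute deviations from the median to None."""
    valid = [v for v in values if v is not None]
    if len(valid) < 3:
        return list(values)
    med = statistics.median(valid)
    mad = statistics.median(abs(v - med) for v in valid) or 1e-9
    return [v if v is not None and abs(v - med) <= k * mad else None for v in values]


def gap_fill(values):
    """Fill None gaps by linear interpolation; repeat the nearest value at the edges."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid observations")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def temporal_features(series):
    """Summarise a gap-free series as a fixed-length feature row (2D model input)."""
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "amplitude": max(series) - min(series),
    }


# One pixel's NDVI series with its SCL codes (synthetic values).
ndvi = [0.20, 0.25, 0.90, 0.35, 0.40, 0.38]
scl = [4, 4, 4, 9, 4, 4]
series = gap_fill(filter_outliers(cloud_mask(ndvi, scl)))
features = temporal_features(series)
```

In this example the cloudy fourth observation is masked, the spike at the third date is removed as an outlier, and both gaps are interpolated from their valid neighbours before the feature row is computed.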

[1] Kruitwagen, L., et al. A global inventory of photovoltaic solar energy generating units. Nature 598, 604–610 (2021). https://doi.org/10.1038/s41586-021-03957-7 

[2] Schramm, M., et al. The openEO API–Harmonising the Use of Earth Observation Cloud Services Using Virtual Data Cube Functionalities. Remote Sens. 13, 1125 (2021). https://doi.org/10.3390/rs13061125

How to cite: Alasawedah, M., Claus, M., Jacob, A., Griffiths, P., Dries, J., and Lippens, S.: Photovoltaic Farms Mapping using openEO Platform, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16841, https://doi.org/10.5194/egusphere-egu24-16841, 2024.

16:44–16:46
|
PICO4.9
|
EGU24-17458
|
On-site presentation
Mariana Belgiu, Beatrice Kaijage, and Wietske Bijker

Obtaining enough annotated samples is one of the main challenges for supervised methods that classify crop types from remote sensing images, because generating a large number of annotated samples is time-consuming and expensive. Active Learning (AL) can optimize sample annotation, so that a supervised method is trained efficiently with less effort. Unfortunately, most existing AL methods do not account for the spatial information inherent in remote sensing images. We propose a novel spatially explicit AL method that uses a semi-variogram to identify and discard spatially adjacent, and therefore redundant, samples. It was evaluated with Random Forest (RF) and Sentinel-2 Satellite Image Time Series (SITS) in two study areas, one in the Netherlands and one in Belgium. In the Netherlands, the spatially explicit AL selected 97 samples as relevant for the classification task, which led to an overall accuracy of 80%, whereas the traditional AL method selected 169 samples and achieved an accuracy of 82%. In Belgium, the spatially explicit AL selected 223 samples and obtained an overall accuracy of 60%, compared with 327 samples and an accuracy of 63% for the traditional AL. We conclude that the developed AL method helped RF perform well mainly for classes consisting of individual crops with a relatively distinctive growth pattern, such as sugar beets or cereals. Aggregated classes such as ‘fruits and nuts’, however, remained challenging. The proposed AL method shows that accounting for spatial information is an efficient way to map target crops, since it achieves high accuracy with a small number of samples and, consequently, reduces the computational resources as well as the time and money spent on annotation.
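The abstract does not detail how the semi-variogram is used, so the following Python sketch shows only one plausible reading. It computes an empirical semi-variogram of a per-sample variable (for example an NDVI value; an assumption), takes as the range the first lag at which the semivariance reaches 95% of its maximum (a crude sill proxy, also an assumption), and then discards any AL candidate lying within that range of a sample that has already been selected. All function names are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Matheron estimator: gamma(h) = sum of (z_i - z_j)^2 over pairs in the lag bin / (2 N(h))."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        b = int(math.dist(p, q) // bin_width)
        if b < n_bins:
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    lags = [(b + 0.5) * bin_width for b in range(n_bins)]  # bin centres
    gammas = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return lags, gammas


def estimate_range(lags, gammas, fraction=0.95):
    """First lag whose semivariance reaches `fraction` of the largest value (crude sill proxy)."""
    observed = [(h, g) for h, g in zip(lags, gammas) if g is not None]
    sill = max(g for _, g in observed)
    return next(h for h, g in observed if g >= fraction * sill)


def drop_spatially_redundant(ranked_candidates, selected, range_):
    """Keep candidates, in rank order, lying farther than `range_` from every kept sample."""
    kept = list(selected)
    accepted = []
    for c in ranked_candidates:
        if all(math.dist(c, s) > range_ for s in kept):
            kept.append(c)
            accepted.append(c)
    return accepted


# Synthetic example: five points on a line whose values grow with x.
points = [(float(x), 0.0) for x in range(5)]
values = [0.0, 1.0, 2.0, 3.0, 4.0]
lags, gammas = empirical_semivariogram(points, values, bin_width=1.0, n_bins=5)
r = estimate_range(lags, gammas)
batch = drop_spatially_redundant([(0, 0), (3, 0), (10, 0), (12, 0)], [], range_=5.0)
```

In an AL loop, `ranked_candidates` would be the pool ordered by the usual informativeness criterion (for example RF prediction uncertainty), so the spatial filter only removes near-duplicates of samples that are already chosen.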

How to cite: Belgiu, M., Kaijage, B., and Bijker, W.: Spatially explicit active learning for crop-type mapping from satellite image time series, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-17458, https://doi.org/10.5194/egusphere-egu24-17458, 2024.

16:46–16:48
|
PICO4.10
|
EGU24-22153
|
On-site presentation
Keltoum Khechba, Mariana Belgiu, Ahmed Laamrani, Qi Dong, Alfred Stein, and Abdelghani Chehbouni

The integration of Machine Learning (ML) with remote sensing data has been used successfully to create detailed agricultural yield maps at both local and global scales. Despite this advancement, a critical and often overlooked issue is the spatial autocorrelation in the geospatial data used to train and validate ML models. Random cross-validation (CV) methods, which fail to account for it, are usually employed. This study assessed wheat yield estimations using both random and spatial CV. Whereas random CV splits the data randomly, spatial CV splits the data by spatial location so that spatially close data points are grouped together, entirely in either the training set or the test set but not both. Conducted in Northern Morocco during the 2020–2021 agricultural season, our research uses Sentinel-1 and Sentinel-2 satellite images as input variables, together with 1329 field data locations, to estimate wheat yield. Three ML models were employed: Random Forest, XGBoost, and Multiple Linear Regression. Spatial CV was applied at three spatial scales: provinces (predefined administrative divisions) and two grids of equally sized spatial blocks, grid2 (20 × 20 km) and grid1 (10 × 10 km). Our findings show that with random CV all models achieve higher accuracies (R² = 0.58 and RMSE = 840 kg ha⁻¹ for the XGBoost model) than with spatial CV. Among the spatial strategies, the 10 × 10 km grid led to the highest R² (0.23, with an RMSE of 1140 kg ha⁻¹ for the XGBoost model), followed by the 20 × 20 km grid (R² = 0.11 and RMSE = 1227 kg ha⁻¹ for the XGBoost model). Province-based spatial CV resulted in the lowest accuracy, with an R² of 0.032 and an RMSE of 1282 kg ha⁻¹. These results confirm that spatial CV is essential to prevent overoptimistic estimates of model performance.
The study further highlights the importance of selecting an appropriate CV method to obtain realistic and reliable wheat yield predictions, because spatial autocorrelation can inflate accuracy beyond what the model achieves under real-world conditions.
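The difference between the CV strategies comes down to how folds are formed. The Python sketch below assigns each field location to a square grid block (10 km for grid1, 20 km for grid2, assuming projected coordinates in metres) and then builds folds from whole groups, so that no block contributes to both training and test data. Passing province names instead of block ids as labels gives the province-based variant, whereas random CV would shuffle individual points regardless of location. The function names and the round-robin assignment of shuffled groups to folds are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def grid_blocks(coords, block_size):
    """Label each (x, y) point, in metres, with its (column, row) grid block."""
    return [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords]


def group_folds(labels, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which every group lies wholly on one side.

    `labels` may be grid blocks (grid-based CV) or province names
    (province-based CV); shuffled groups are dealt to folds round-robin.
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    groups = sorted(members, key=repr)  # deterministic order before shuffling
    if len(groups) < n_folds:
        raise ValueError("need at least as many spatial groups as folds")
    random.Random(seed).shuffle(groups)
    for k in range(n_folds):
        test_groups = set(groups[k::n_folds])
        test = sorted(i for g in test_groups for i in members[g])
        train = sorted(i for g in groups if g not in test_groups for i in members[g])
        yield train, test


# Synthetic field locations on a regular 3 km x 7 km lattice.
coords = [(x * 3000.0, y * 7000.0) for x in range(10) for y in range(6)]
labels = grid_blocks(coords, 10_000)  # grid1-style 10 km blocks
for train, test in group_folds(labels, n_folds=5):
    print(len(train), len(test))  # fit on train, evaluate on test
```

Because nearby points share a block, the test fold always lies at some distance from the training data, which is why spatial CV reports lower but more realistic scores than random CV.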

How to cite: Khechba, K., Belgiu, M., Laamrani, A., Dong, Q., Stein, A., and Chehbouni, A.: Spatial cross-validation of wheat yield estimations using remote sensing and machine learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22153, https://doi.org/10.5194/egusphere-egu24-22153, 2024.

16:48–18:00