Session ESSI1.2

ESSI1.2 | Strategies and applications of AI and ML in a spatial and spatio-temporal context

PICO

Tue, 16:15

Convener: Hanna Meyer | Co-conveners: Christopher Kadow (ECS), Jens Klump, Ge Peng, Jeremy Rohmer

PICO | Tue, 29 Apr, 16:15–18:00 (CEST) | PICO spot 2

16:15–16:20

5-minute convener introduction

Applications in Climatology

16:20–16:22

PICO2.1

EGU25-16905

ECS

On-site presentation

An AI-Copilot for JupyterLab for climate data analyses using FrevaGPT

Felix Oertel, Etor Lucio Eceiza, Sebastian Willmann, Bianca Wentzel, Martin Bergemann, and Christopher Kadow

JupyterLab is a web-based interactive development platform that is widely used in the Earth science community. Using Jupyter Notebooks, it is possible to perform data analysis tasks, annotate and visualize results in a way that is easy to reproduce, present and share with others. JupyterLab allows the use of “extensions”, which add functionality to the platform. One of these is Jupyter-AI [1], which allows the use of Large Language Models (LLMs), such as ChatGPT, Claude Sonnet and Ollama, within the JupyterLab environment, through a chat interface or directly within notebooks. By integrating LLMs into JupyterLab, it is possible to leverage their code generation capabilities to assist a user to translate their analysis tasks from an idea to actual executable code in an efficient manner. One drawback of using these LLMs in tasks involving spatio–temporal data is that the models typically do not have access to the data necessary for the analysis task and will often resort to generating fictional data or using placeholders in the code that they create. This requires the user to adapt the provided code to their data, which removes some of the utility provided by the LLM.

In this context we make use of FrevaGPT, an approach for using LLMs in climate data analysis that allows for quick, complex and reproducible analyses of data sets, such as decadal climate model forecasts. Leveraging the LLM’s ability to write code, combined with few-shot prompting (in-context learning), allows the LLM to use Freva [2,3] (Free Evaluation Framework), a data search and analysis platform that provides a standardised interface to spatio-temporal datasets hosted on an HPC cluster [4].

FrevaGPT integrates seamlessly into Jupyter-AI and, by making use of the Freva library, combines the code-generating capabilities of LLMs with contextual understanding of how to access relevant datasets on the HPC cluster. Together with FrevaGPT’s ability to execute generated code in an isolated environment on an HPC node, to annotate and explain intermediate results, and to automatically correct errors encountered along the way, this could serve as a starting point for researchers to efficiently produce new analysis products based on spatio-temporal climate data.

This PICO will include examples of using FrevaGPT within JupyterLab to analyse spatio-temporal datasets from the climate of the past, as well as seasonal to decadal climate predictions.

References:

[1] Jupyter-AI GitHub Repository: https://github.com/jupyterlab/jupyter-ai
[2] Kadow, Christopher, Sebastian Illing, Etor E. Lucio-Eceiza, Martin Bergemann, Mahesh Ramadoss, Philipp S. Sommer, Oliver Kunst, et al. 2021. “Introduction to Freva – A Free Evaluation System Framework for Earth System Modeling”. Journal of Open Research Software 9 (1): 13. https://doi.org/10.5334/jors.253.
[3] Freva GitHub Repository: https://github.com/FREVA-CLINT/freva
[4] Public Freva Instance: https://www.freva.dkrz.de/

How to cite: Oertel, F., Lucio Eceiza, E., Willmann, S., Wentzel, B., Bergemann, M., and Kadow, C.: An AI-Copilot for JupyterLab for climate data analyses using FrevaGPT, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16905, https://doi.org/10.5194/egusphere-egu25-16905, 2025.

16:22–16:24

EGU25-20435

ECS

OSPP

Virtual presentation

Integrating Machine Learning and Time Series Models for Spatiotemporal Temperature Prediction: A Case Study from Apulia, Italy

Nouman Iqbal, Sandra De Iaco, and Monica Palma

Temperature plays a critical role in climate systems and resource management. Understanding the spatiotemporal evolution of temperature is vital for effective climate adaptation and resource management. Traditional models often treat spatial and temporal aspects separately, limiting their ability to capture the full correlation between these dimensions. This study evaluates various time series and machine learning models, including Holt-Winters, SARIMA, TSLM, NNAR, and ANN, using a daily dataset from 30 meteorological stations in the Apulia region (Italy) from 1982 to 2023. These models are assessed using the RMSE and MAE metrics. The best models are then integrated with spatiotemporal kriging of the residuals, and the results show that the hybrid approach outperforms traditional methods. The resulting high-resolution predictive maps provide valuable insights into temperature trends, supporting better decision-making in agriculture, water management, and climate resilience.
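
As a minimal sketch of the evaluation step, the two error metrics named in the abstract can be computed as shown below. The temperature values and the two model names are hypothetical placeholders rather than data from the study, and picking the model with the lowest RMSE is only one assumed way of defining "best".

```python
import math


def _check_lengths(observed, predicted):
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(observed, predicted):
    """Root mean squared error: penalises large errors more strongly."""
    _check_lengths(observed, predicted)
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: average magnitude of the errors."""
    _check_lengths(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical daily temperatures (degrees C) at one station.
observed = [12.0, 14.0, 15.0, 13.0]
# Hypothetical forecasts from two candidate models.
forecasts = {
    "model_a": [11.0, 14.0, 17.0, 13.0],
    "model_b": [12.5, 13.5, 15.5, 12.5],
}

# Score every candidate with both metrics, then pick the lowest RMSE.
scores = {name: (rmse(observed, f), mae(observed, f)) for name, f in forecasts.items()}
best_model = min(scores, key=lambda name: scores[name][0])
```

RMSE and MAE can rank models differently when a few large errors dominate, which is one reason to report both.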

Funding information

Financial support from ICSC–National Research Center in High Performance Computing, Big Data and Quantum Computing, funded by European Union–NextGenerationEU.
Project name: PNRR-HPC; Project code: CN00000013; CUP: C83C22000560007.

How to cite: Iqbal, N., De Iaco, S., and Palma, M.: Integrating Machine Learning and Time Series Models for Spatiotemporal Temperature Prediction: A Case Study from Apulia, Italy, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20435, https://doi.org/10.5194/egusphere-egu25-20435, 2025.

16:24–16:26

PICO2.2

EGU25-19037

ECS

OSPP

On-site presentation

Efficient Large Ensemble Generation of Climate Model Output Using Latent Diffusion and Spatio-Temporal Transformers

Johannes Meuer, Maximilian Witte, Claudia Timmreck, and Christopher Kadow

Estimating uncertainty in climate scenarios often requires generating large ensembles of high-resolution simulations, a task that is both computationally and memory intensive. To overcome these challenges, we propose a deep learning framework that combines a variational autoencoder for dimensionality reduction with a denoising diffusion probabilistic model built on a spatio-temporal transformer architecture. The model is trained on large ensembles of low-resolution climate model outputs to capture internal variability and a single high-resolution climate model output to generate high-resolution simulations. This innovative approach enables the dynamic generation of large ensembles of high-resolution simulations with minimal computational overhead, eliminating the need for storing extensive precomputed data. By facilitating the efficient quantification of uncertainty, this framework provides a powerful tool for exploring a wide range of high-resolution climate outcomes, supporting the development of informed climate policies and adaptation strategies.

How to cite: Meuer, J., Witte, M., Timmreck, C., and Kadow, C.: Efficient Large Ensemble Generation of Climate Model Output Using Latent Diffusion and Spatio-Temporal Transformers, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19037, https://doi.org/10.5194/egusphere-egu25-19037, 2025.

Applications in Land Use/Land Cover

16:26–16:28

PICO2.3

EGU25-1321

ECS

OSPP

On-site presentation

Can Vegetation Breakpoints in Eastern Mongolia grassland be detected using Sentinel-1 coherence time series data?

Shuxin Ji

Mongolian society and food production depend heavily on livestock farming, which is usually practiced in nomadic systems. Consequently, the movement patterns of herders are crucial for finding sufficient forage and for the sustainable use of pastures. In this study, a combination of InSAR, optical and weather time series data has been explored as a tool for spatio-temporal grazing monitoring. To detect movement patterns, a machine learning (ML) based method to detect breakpoints in vegetation condition has been developed and compared to the widely used Breaks For Additive Season and Trend (BFAST) algorithm. The results have been validated using test sites spread across the entire eastern Mongolian steppe ecosystem, covering different grassland use intensities. The results indicate that (1) the ML method outperformed BFAST, detecting 41.5% of breakpoints; (2) breakpoints in summer pastures mainly occurred from April to June, while on winter pastures they emerged in October, November, and the following February and March; and (3) regarding spatial prediction, the model developed in this study predicts breakpoints in a way that distinguishes between areas of summer and winter camps. However, there is insufficient data to conclusively attribute the occurrence of pasture breakpoints to herder movements.

How to cite: ji, S.: Can Vegetation Breakpoints in Eastern Mongolia grassland be detected using Sentinel-1 coherence time series data?, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-1321, https://doi.org/10.5194/egusphere-egu25-1321, 2025.

16:28–16:30

PICO2.4

EGU25-18720

ECS

On-site presentation

Leveraging AI for Material Identification in Unauthorized Dumps for Circular Economy Applications

Adi Mager, Vered Blass, Aryeh Gorun, Yoni Tsur, and Moni Shahar

Aerial imagery has emerged as a powerful tool for environmental analysis and decision-making. We present a comprehensive approach for performing semantic segmentation on aerial images of illegally dumped construction waste. We focus on detecting and analysing the waste content so that it can be used in a circular economy. Leveraging the Segment Anything Model (SAM) developed by Meta, we produced highly accurate masks from aerial drone images. We created a dataset of over 46,000 manually labeled masks, which serve as ground truth for training and evaluation. Then we fine-tuned the ResNet-50 classification model together with the deep learning model. Our methodology combines the predictions of the classification model with these detailed masks to produce the final waste stream map. The map offers a comprehensive understanding of the open area, allowing for further analysis of potential stocks and economic evaluation. Overall, we achieved 86% detection accuracy on our full dataset, with higher accuracy for common classes. The waste identification can inform economic and environmental decision-making on the necessity of cleanup operations. The results also allow better planning of potential untapped stocks and treatment of different waste streams, aiding local circular economy and waste management strategies. Our model development can serve the waste management and recycling sectors as well as municipal and national policy makers.

How to cite: Mager, A., Blass, V., Gorun, A., Tsur, Y., and Shahar, M.: Leveraging AI for Material Identification in Unauthorized Dumps for Circular Economy Applications, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18720, https://doi.org/10.5194/egusphere-egu25-18720, 2025.

16:30–16:32

PICO2.5

EGU25-19281

ECS

On-site presentation

From a regional to field scale - transfer learning for Earth observation based crop yield forecasts

Emanuel Bueechi, Felix Reuß, Miroslav Pikl, Vojtech Lukas, Miroslav Trnka, Lucie Homolova, and Wouter Dorigo

Climate change is threatening food security, necessitating optimized resource management to ensure food availability. Field-scale crop yield forecasts, using machine learning and Earth observation data, have great potential for adaptive farm management, but the development of such models is curbed by the scarcity of field-scale training data. This strongly limits the applicability of traditional machine-learning approaches for field-scale crop yield modeling. However, increasingly popular transfer learning techniques offer a way to improve this, since they can learn from a different domain than the one they are applied to. Here, we explore transfer learning to forecast crop yields on a field scale by training the model on a regional scale (where we have abundant data in Europe). We use Sentinel-1 and Sentinel-2 data with an artificial neural network to forecast maize, winter wheat, and spring barley yields in southern Czechia. We compared four model setups: two classical machine learning approaches, one trained and tested on a regional scale and one trained and tested on a field scale as a baseline, and two transfer learning models that are trained on a regional scale and tested on a field scale, one with and one without fine-tuning the model using field-scale data. Forecasts were calculated at four lead times (1-4 months) before harvest. Transfer learning with fine-tuning performed best, achieving correlations of approximately 0.75 at a one-month lead time for all crops. It outperformed the field scale-trained model by 0.05-0.12. In addition, transfer learning required significantly less field-level data to achieve a performance comparable to the model trained at the field level: 50% of the data for spring barley and maize, and only 25% for winter wheat. Therefore, this transfer learning approach improves the efficiency of crop yield data utilization and enhances field-level crop yield forecasting.

This study was conducted within the project “Yield Prediction and Estimation from Earth Observation” (funded by ESA, Contract No. 4000141154/23/I-EF).

How to cite: Bueechi, E., Reuß, F., Pikl, M., Lukas, V., Trnka, M., Homolova, L., and Dorigo, W.: From a regional to field scale - transfer learning for Earth observation based crop yield forecasts, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19281, https://doi.org/10.5194/egusphere-egu25-19281, 2025.

16:32–16:34

EGU25-12795

ECS

OSPP

Virtual presentation

Optimal Use of Multi-Sensor Data for Precision Agriculture: Sentinel-1 and Sentinel-2 Fusion in Crop Classification

Maryam Choukri, Ahmed Laamrani, and Abdelghani Chehbouni

Effective land monitoring and land use classification are critical for proper management of resources, especially in heterogeneous and climatically diverse areas. This study tests the hypothesis that integrating Sentinel-1 radar and Sentinel-2 optical data improves the discrimination of crops in major farming areas of Morocco from 2020 to 2022. A three-dimensional coordinate system was established which included a series of processing stages that started with cloud masking, scaling of reflectance, and radar-optical integration. At each year’s end, temporal averages and composites were created using selected Sentinel-2 spectral bands (B2, B3, B4, B8, B11, B12) and the Sentinel-1 VV and VH dual-polarization channels. Ground truth samples from four major crop classes (Barley, Crop, D. Wheat and S. Wheat) were used as the training set for a Random Forest classifier. The results for the three agricultural zones indicated overall accuracies greater than 80% for each year, with the combination of radar and optical data contributing greatly to the ability to differentiate crops in cloud-covered and spectrally overlapping areas. Many classes had high user's accuracy (≥70%), yet several crops, such as D. Wheat, had poor producer's accuracy, possibly due to the uneven distribution of ground truth data. Kappa coefficients between 0.50 and 0.60 indicate only moderate agreement with the validation data, so more accurate ground truth and class-targeted feature detection are needed. This study emphasizes the relevance of multi-sensor data fusion for crop monitoring and land cover classification, which contributes to precision farming and resource management. Future work will focus on including temporal characteristics as well as state-of-the-art machine learning techniques to address class imbalance and improve classification performance.
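
The accuracy measures discussed in this abstract can all be derived from a confusion matrix. The following is a minimal sketch, assuming rows hold reference (ground truth) classes and columns hold mapped classes; the two-class matrix is a hypothetical example, not a result from the study. User's accuracy is the measure the literature sometimes calls consumer's accuracy.

```python
def accuracy_metrics(matrix):
    """Return overall accuracy, producer's and user's accuracies, and Cohen's kappa.

    matrix[i][j] is the number of samples of reference class i mapped as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference totals per class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # mapped totals
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    # Producer's accuracy: share of reference samples of a class that were mapped correctly.
    producers = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    # User's accuracy: share of samples mapped as a class that truly belong to it.
    users = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]
    # Kappa corrects the overall accuracy for agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1.0 - expected)
    return overall, producers, users, kappa


# Hypothetical two-class example (e.g. "barley" versus "wheat").
confusion = [
    [40, 10],  # reference barley: 40 mapped as barley, 10 as wheat
    [20, 30],  # reference wheat: 20 mapped as barley, 30 as wheat
]
overall, producers, users, kappa = accuracy_metrics(confusion)
```

In this toy case the overall accuracy is 0.70 while kappa is only 0.40, which illustrates how a seemingly high overall accuracy can coexist with moderate chance-corrected agreement.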

How to cite: choukri, M., laamrani, A., and chehbouni, A.: Optimal Use of Multi-Sensor Data for Precision Agriculture: Sentinel-1 and Sentinel-2 Fusion in Crop Classification, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12795, https://doi.org/10.5194/egusphere-egu25-12795, 2025.

16:34–16:36

EGU25-19971

Virtual presentation

Geo Intelligence: A Key to Sustainable Paddy and Maize Production in Hassan District

Vinay Shivamurthy, Mansoor Palat Ebrahim, and Vinuta M Betegeri

Agriculture, a cornerstone of human civilization, exerts a significant impact on natural resources to fulfill societal food demands. Climate change, exacerbated by anthropogenic activities and environmental consequences, poses a critical threat to agricultural productivity. While modern agronomic practices have enhanced yields, they have also resulted in detrimental consequences such as habitat loss, reduced biodiversity, and resource depletion.

This study investigates crop suitability in Hassan District, India, by integrating Artificial Intelligence (AI) with Geographic Information Systems (GIS). Eight key geo-climatic and pedological factors, relatively stable over time, were considered. Determining optimal land use for targeted crop cultivation is crucial in the face of climate change and global food security concerns.

Geospatial technologies and Sequential AI have demonstrated significant potential in addressing agricultural and environmental challenges through data-driven approaches. This research assesses the suitability of land for paddy, maize, and gram cultivation during the kharif season in Hassan District. A weighted metric approach was employed within a GIS environment, utilizing a Sequential Artificial Neural Network (ANN) model. Initially, an equal-weighted arithmetic mean was used to evaluate seven criteria encompassing soil, climate, and topographic factors. Subsequently, criterion weights were derived from a sequential regression model, reflecting their relative importance in crop suitability prediction. Slope, soil depth, and rainfall emerged as the most influential factors, collectively accounting for 76% of the total weight. The results demonstrated an improvement in site suitability assessment compared to conventional methods, highlighting the efficacy of this integrated AI-GIS approach.
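
The difference between the equal-weighted mean and the model-derived weighting can be illustrated with a small sketch. The criterion names, the scores for the single grid cell, and the individual weights are hypothetical; only the idea that slope, soil depth, and rainfall together carry 76% of the weight is taken from the abstract.

```python
def suitability(scores, weights=None):
    """Weighted arithmetic mean of criterion scores; equal weights if none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical normalised scores (0 = unsuitable, 1 = ideal) for one grid cell.
cell_scores = {"slope": 0.9, "soil_depth": 0.6, "rainfall": 0.3, "soil_ph": 0.2}

# Hypothetical weights: the first three criteria sum to 0.76.
model_weights = {"slope": 0.40, "soil_depth": 0.20, "rainfall": 0.16, "soil_ph": 0.24}

equal_score = suitability(cell_scores)
weighted_score = suitability(cell_scores, model_weights)
```

Here the weighted score is higher than the equal-weighted one because the cell performs well on the criterion that carries the most weight.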

How to cite: Shivamurthy, V., Palat Ebrahim, M., and M Betegeri, V.: Geo Intelligence: A Key to Sustainable Paddy and Maize Production in Hassan District, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19971, https://doi.org/10.5194/egusphere-egu25-19971, 2025.

Applications in Geology

16:36–16:38

PICO2.7

EGU25-11150

ECS

On-site presentation

Tackling Spatial Multiple Features AI/ML Problems in Geology with Hexagons

Marie Katrine Traun, Finn Sandø, and Søren Lund Jensen

As Artificial Intelligence (AI) and Machine Learning (ML) methods evolve at an explosive pace, there is an increased need to handle geological data challenges if we wish to (continue to) ride the AI/ML wave. Most geological data is at its core geospatial data in different shapes and formats. A few examples are polygon-based geological maps, geophysical and remote sensing raster grids and a plethora of sample analyses with coordinate data. Applications of geological data are as varied as the Earth is vast. However, these differing geospatial data formats in geology significantly limit the interoperability of datasets in an analytical ML context. Multivariable analyses of geological data often involve extensive spatial interpolation and projection headaches. Consequently, we must first solve geospatial data challenges to fully tackle inter- and intradisciplinary geoscience problems with ML and AI predictions on multivariable cross-disciplinary geological data. At our company, Scandinavian Highlands, we are building a platform and database structure to break down these geospatial format barriers using a hexagonal discrete global grid system called H3. The H3 grid represents all positions on Earth’s surface by hexagon (and 12 pentagon) cells at different levels of coarseness, i.e. resolutions, down to 1 m² cell area. The resolutions are bound together by a systematic parent-child cell hierarchy. We process different types of geospatial geological data (raster and vector) to an H3 grid representation at the appropriate resolution for the given dataset. In doing this, we create a database structure where different geological data layers can be seamlessly merged into a single feature “stack” table for AI/ML purposes at either local, regional or global scales and across individual dataset resolutions.
In this presentation, we demonstrate the hexagonal multiple feature stack concept in action, from simple grouped/filtered visualisation, regression and descriptive statistics to dimension reduction techniques (e.g. PCA and t-SNE), clustering and other supervised and unsupervised methods. Furthermore, all analysis results can be assessed spatially on the map, grounding them on the Earth’s surface and in real-life decision-making use cases.
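
The feature "stack" idea can be sketched as a join on a shared cell identifier. This is only an illustration under stated assumptions: the cell identifiers are placeholder strings rather than real H3 indexes, no H3 library call is used, and the layer names and values are hypothetical.

```python
def build_feature_stack(layers):
    """Merge {layer_name: {cell_id: value}} into {cell_id: {layer_name: value}}.

    Cells missing from a layer receive None for that layer.
    """
    cell_ids = sorted(set().union(*(layer.keys() for layer in layers.values())))
    return {
        cell: {name: layer.get(cell) for name, layer in layers.items()}
        for cell in cell_ids
    }


def complete_rows(stack):
    """Keep only cells that have a value in every layer (ready for most ML models)."""
    return {
        cell: row
        for cell, row in stack.items()
        if all(value is not None for value in row.values())
    }


# Hypothetical layers already resampled to the same grid resolution.
layers = {
    "lithology": {"cell_a": "granite", "cell_b": "basalt"},          # from a polygon map
    "magnetic_nT": {"cell_a": 52.1, "cell_c": 48.7},                 # from a raster grid
}

stack = build_feature_stack(layers)
ml_ready = complete_rows(stack)
```

Because every layer is keyed by the same cell identifier, adding a new dataset is a matter of adding one more dictionary; no pairwise reprojection between layers is needed.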

How to cite: Traun, M. K., Sandø, F., and Jensen, S. L.: Tackling Spatial Multiple Features AI/ML Problems in Geology with Hexagons, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11150, https://doi.org/10.5194/egusphere-egu25-11150, 2025.

16:38–16:40

EGU25-6404

ECS

OSPP

Virtual presentation

A new data-efficient, deep learning-based methodology for geological subsurface reconstructions

Rodrigo Uribe-Ventura, Yoan Barriga-Berrios, Jorge Barriga-Gamarra, Patrice Baby, and Willem Viveen

Deep learning approaches for geological subsurface reconstruction typically require extensive training datasets, limiting their practical application in geosciences where data acquisition is costly and sparse. We present a methodology using sparse convolutional autoencoders that effectively learns from synthetically generated training data while maintaining strong generalization to real-world scenarios. Our model is trained exclusively on synthetic basin boundary configurations and corresponding forward-modeled Vertical Electrical Sounding (VES) responses, thereby eliminating reliance on extensive real-world training datasets. Through transfer learning, the model achieves high reconstruction accuracy with as few as 1000 synthetic training examples. Systematic tests reveal the model preserves strong performance beyond its training distribution, suggesting it learns robust heuristic approximations and remains effective beyond the training range of 3–50 input points.

The trained model was applied to the Huancayo tectonic basin in the Peruvian Andes. There, the 300 to 350-m deep subsurface geometry of the tectonic basin was successfully modeled on the basis of input data from 41 newly acquired VES logs along two cross sections, 12 and 14 km long. Surprisingly, the reconstruction also revealed previously unidentified fold and thrust systems, for which the model was not explicitly trained, while also maintaining physical consistency with field measurements.

Our results demonstrate that sparse convolutional autoencoders, when trained on synthetic datasets, can effectively bridge the gap between data-hungry deep learning methods and data-sparse geological applications.

How to cite: Uribe-Ventura, R., Barriga-Berrios, Y., Barriga-Gamarra, J., Baby, P., and Viveen, W.: A new data-efficient, deep learning-based methodology for geological subsurface reconstructions, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6404, https://doi.org/10.5194/egusphere-egu25-6404, 2025.

16:40–16:42

PICO2.8

EGU25-20280

On-site presentation

Analysis of Geological and Geophysical Data in the Eastern Russian Arctic Using Machine Learning Techniques

Ivan Lisenkov and Anatoly Soloviev

This study presents a comprehensive exploration of the collection and analysis of diverse geological and geophysical datasets from the eastern sector of the Russian Arctic. By leveraging advanced machine learning (ML) techniques, including convolutional neural networks, decision trees, and classical regression models, we provide insights into both data acquisition—encompassing geological, gravimetric, magnetic, and other parameters—and the subsequent analysis and interpretation of these data.

The research is structured around three primary objectives:

1. Data Collection and Structuring: A systematic approach to the acquisition and organization of information on the geological and geophysical conditions in the eastern Russian Arctic.
2. Application of Machine Learning Techniques: Employing cutting-edge ML methods to analyze and interpret the collected datasets.
3. Findings and Practical Implications: Highlighting key results and conclusions, with an emphasis on their practical applications in Arctic geological and geophysical research.

This work aims to introduce conference participants to innovative ML methodologies in geophysical data analysis and emphasizes the significance of employing diverse approaches to enhance understanding and application. The study also underscores the broader potential of these methods for application in other regions and global-scale research.

How to cite: Lisenkov, I. and Soloviev, A.: Analysis of Geological and Geophysical Data in the Eastern Russian Arctic Using Machine Learning Techniques, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20280, https://doi.org/10.5194/egusphere-egu25-20280, 2025.

16:42–16:44

PICO2.9

EGU25-1121

On-site presentation

A Machine Learning-Based Framework for Mineral Prospectivity Modeling: Predicting Epithermal Gold Mineralization in Northern New Brunswick, Canada

Farzaneh Mami khalifani, David Lentz, and James Walker

Gold deposits in New Brunswick, part of the Canadian Appalachians, formed during various stages of the Appalachian orogeny. Significant regional-scale transcurrent faults that locally control cogenetic magmatism, including the Restigouche, Rocky Brook-Millstream, McCormack-Ramsay Brook, McKenzie Gulch, and Moose Lake faults, played a crucial role in shaping the geological framework and in focusing mineralizing fluids in northern New Brunswick. The mineral systems approach is applied here to link conceptual models of mineralization processes with available exploration data, aiming to achieve effective mineral prospectivity mapping (MPM). This method is designed to streamline exploration efforts, minimizing both time and cost, which are key priorities in the mineral exploration industry. A machine learning-based data-driven approach was utilized to evaluate 18 predictor maps with a pixel size of 200 meters. These MPM maps integrated diverse features, including geochemical indicators for Au, As, Sb, Zn, Pb, Cu, and Mo in till to define geochemical anomalies, airborne radiometric data for K, eU, and eTh, as well as aeromagnetic and LiDAR datasets, to interpret geological characteristics, structural features, faults, intrusive and extrusive units, and lithological contacts. A series of edge enhancement filters, including Reduced to Pole (RTP), first vertical derivative (FVD), tilt derivative (TDR), and analytic signal (AS), were applied to the dataset, followed by a 3D inversion. Our results show that bimodal felsic to mafic intrusive and extrusive igneous systems exhibit a strong magnetic response, a conclusion validated through correlation with drill core assay data. Moreover, this study utilized principal component analysis (PCA) of till data to determine pathfinder and indicator elements associated with gold mineralization.
An MPM model was created for epithermal gold mineralization using a Support Vector Machine (SVM), incorporating the known gold occurrences and deposits of the area. The performance of the resulting MPM maps was evaluated using the area under the receiver operating characteristic curve (AUC-ROC). The study concludes that SVM is a robust tool for mineral exploration, providing a data-driven approach to identifying new mineral deposits with greater accuracy and efficiency.
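
The AUC-ROC used for evaluation has a simple interpretation: it is the probability that a randomly chosen known occurrence receives a higher prospectivity score than a randomly chosen barren location. The sketch below computes it directly from that definition; the labels and scores are hypothetical and do not come from the study, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical pixels: 1 = known gold occurrence, 0 = barren location.
labels = [1, 1, 0, 0, 1, 0]
# Hypothetical prospectivity scores from a classifier.
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]

auc = auc_roc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of occurrences from barren locations.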

How to cite: Mami khalifani, F., Lentz, D., and Walker, J.: A Machine Learning-Based Framework for Mineral Prospectivity Modeling: Predicting Epithermal Gold Mineralization in Northern New Brunswick, Canada, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-1121, https://doi.org/10.5194/egusphere-egu25-1121, 2025.

Applications in Earth and Environmental Sciences

16:44–16:46

EGU25-7462

Virtual presentation

Environmental Data Exploration and Modelling Using Extreme Learning Machine

Mikhail Kanevski

Extreme Learning Machine (ELM) is a fast and efficient learning algorithm designed for single-layer feedforward neural networks. It stands out by randomly initializing input weights and biases, which remain fixed during training, and learning the output weights using a closed-form solution. This approach eliminates the need for iterative optimization, significantly accelerating the training process. ELM is known for its generalization performance and versatility. Kernelized ELM variants enhance its capability to model complex nonlinear systems. However, achieving optimal performance requires careful tuning of hyperparameters, such as the number of hidden neurons and the regularization parameter.
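
A minimal NumPy sketch of the basic ELM described above is given below. It assumes a tanh activation and a ridge-regularised least-squares solution for the output weights; the function names, default hyperparameters, and the sine-curve toy data are illustrative choices, not taken from the research presented here.

```python
import numpy as np


def elm_fit(X, y, n_hidden=50, reg=1e-3, seed=0):
    """Train a single-hidden-layer ELM and return (W, b, beta)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once at random and never updated.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Closed-form ridge solution for the output weights:
    # beta = (H^T H + reg * I)^-1 H^T y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta


# Toy regression problem: learn y = sin(x) on [-3, 3].
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()

W, b, beta = elm_fit(X, y)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2)))
```

The only trained quantity is `beta`, obtained from one linear solve, which is why repeating the fit for many values of `n_hidden` and `reg` is cheap.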

ELM has been widely applied in environmental risk and natural hazard assessments, climate and meteorological modelling, hydrology, renewable energy analysis and time series forecasting. Recent advancements have extended the standard ELM model to include multilayer architectures, deep learning methodologies, unsupervised learning, and multiple kernel ELMs, broadening its applicability to more challenging and diverse problems.

This research investigates the application of ELM for intelligent environmental data exploration and modelling. The study focuses on addressing problems in spatial and spatio-temporal data exploration, analysis, and modelling, including feature engineering and selection, multi-scale analysis, data normalization and anisotropy, nonlinearity, multivariate analysis and uncertainty quantification.

The quality of ELM-based modelling is assessed through the examination of unexplained variability in data and a comprehensive analysis of residuals. Various ELM configurations are applied throughout all phases of the research, enabling a flexible approach. Due to its computational efficiency, ELM facilitates numerous simulations and experiments, providing deeper insights into the data and the resulting models. Both simulated and real-world environmental datasets, including pollution, precipitation, and permafrost data, are utilized. Finally, the performance of ELM is compared with other machine learning algorithms in order to evaluate its effectiveness and reliability.

How to cite: Kanevski, M.: Environmental Data Exploration and Modelling Using Extreme Learning Machine, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7462, https://doi.org/10.5194/egusphere-egu25-7462, 2025.

16:46–16:48

PICO2.11

EGU25-4051

ECS

On-site presentation

Score-based Diffusion Models for the Space-Time Interpolation of Sea Surface Turbidity

Thi-Thuy-Nga Nguyen, Mahima Lakra, Frédéric Jourdin, and Ronan Fablet

This study explores the application of score-based generative diffusion models for mapping sea surface Suspended Particulate Matter (SPM) of the Dutch Wadden Sea using satellite-derived images, focusing on their comparative efficacy against state-of-the-art deterministic methods such as 4DVarNet, UNet, and DInEOF. Although deterministic deep learning approaches provide robust reconstructions, they often struggle to represent probabilistic uncertainty and extreme values in complex real-world scenarios. Our findings indicate that diffusion models, when conditioned with 4DVarNet and DInEOF, offer improved performance over DInEOF and UNet. Although they are slightly less accurate than 4DVarNet, this discrepancy is not a significant concern, as the primary goal extends beyond merely maintaining accurate reconstructions. Instead, our approach aims to provide a comprehensive view of the distribution through the samples. Our results show that diffusion models are able to generate the tail of the distribution, thereby capturing extreme values more effectively. They also help identify areas of high uncertainty, particularly where the samples are inconsistent. Furthermore, unlike typical 2D diffusion models, this study employs a 3D approach, incorporating 2D spatial and 1D temporal dimensions, allowing the model to capture dynamic physical changes over time and enhance the accuracy of probabilistic predictions of the image time series.

How to cite: Nguyen, T.-T.-N., Lakra, M., Jourdin, F., and Fablet, R.: Score-based Diffusion Models for the Space-Time Interpolation of Sea Surface Turbidity, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-4051, https://doi.org/10.5194/egusphere-egu25-4051, 2025.

16:48–16:50

PICO2.12

EGU25-6262

ECS

On-site presentation

Bridging Data Gaps in Water Quality Modelling: A Machine Learning Framework for Absence Point Generation in Geospatial Binary Classifications

Seyed Amir Naghibi, Kourosh Ahmadi, and Ronny Berndtsson

Geospatial monitoring of water quality is essential for managing and protecting groundwater resources, particularly in agricultural regions where nitrate contamination poses significant environmental and public health risks. This study presents a novel methodology for generating absence points in geospatial binary classifications applied to nitrate levels in groundwater across Odense, Denmark. We developed a machine learning framework designed to generate absence points for binary classification using multiple approaches: random, buffer-based, similarity-based, and Maxent-based methods. The integration of maximum entropy into the absence generation workflow allowed us to identify low-susceptibility zones, improving the accuracy of binary classification. The dataset comprised geospatial nitrate concentration levels derived from environmental, hydrological, and anthropogenic variables. Spatial data included high-resolution land-use maps and hydrological parameters. Model evaluation was conducted using Random Forest, with results indicating that the Maxent-based approach consistently outperformed other methods across all metrics, including precision (0.96), AUC (0.96), and TSS (0.91). This method proved particularly effective in handling the challenges associated with presence-only data and produced the most reliable predictions for nitrate contamination in groundwater. The findings underscore the importance of leveraging advanced absence generation techniques to enhance model performance in geospatial classification modelling.
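Two of the ingredients named above can be sketched briefly. The abstract does not define its buffer rule, so the sketch assumes that a buffer-based absence point is a random location at least `buffer` map units from every presence point. TSS is computed with the standard definition, sensitivity + specificity − 1. The function names and all numbers are hypothetical and are not taken from the study.

```python
import numpy as np


def buffer_absences(presences, bounds, n, buffer, seed=0, max_tries=10_000):
    """Rejection-sample n points inside bounds=(xmin, ymin, xmax, ymax)
    that lie at least `buffer` away from all presence points."""
    rng = np.random.default_rng(seed)
    presences = np.asarray(presences, dtype=float)
    xmin, ymin, xmax, ymax = bounds
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        p = rng.uniform([xmin, ymin], [xmax, ymax])
        if np.min(np.linalg.norm(presences - p, axis=1)) >= buffer:
            kept.append(p)
    if len(kept) < n:
        raise ValueError("could not place enough points; reduce buffer or n")
    return np.array(kept)


def binary_scores(y_true, y_pred):
    """Precision and True Skill Statistic from binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {"precision": tp / (tp + fp), "tss": sensitivity + specificity - 1}


absences = buffer_absences([[0.5, 0.5]], (0, 0, 1, 1), n=20, buffer=0.2)
scores = binary_scores([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
```

In the toy labels, three of four presences and three of four absences are classified correctly, so sensitivity and specificity are both 0.75 and TSS is 0.5.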

How to cite: Naghibi, S. A., Ahmadi, K., and Berndtsson, R.: Bridging Data Gaps in Water Quality Modelling: A Machine Learning Framework for Absence Point Generation in Geospatial Binary Classifications, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-6262, https://doi.org/10.5194/egusphere-egu25-6262, 2025.

16:50–16:52

PICO2.13

EGU25-16038

On-site presentation

Spatial autocorrelation in machine learning for modelling soil organic carbon

Alexander Kmoch, Jeonghwan Choi, Clay Taylor Harrison, and Evelyn Uuemaa

Spatial autocorrelation, the relationship between nearby samples of a spatial random variable, is often overlooked in machine learning models, leading to biased results. We investigated various methods to account for, address, and integrate spatial autocorrelation for modelling and prediction of soil organic carbon (SOC) using random forest (RF) models. We created and evaluated five different RF models to incorporate spatial structure through methods like buffer distances, KNN/RFSI coordinates, GWRFR, and kriging/RFRK. These were compared against a baseline model that did not have any added spatial components. Cross-validation showed slight improvements in accuracy for models considering spatial autocorrelation, while Shapley Additive Explanations confirmed the importance of spatial variables. However, no decrease in spatial autocorrelation of residuals was observed. The raster-based models exhibited enhanced prediction detail, but the limited availability of high-resolution validation data prevented thorough validation. The findings emphasize the value of incorporating spatial autocorrelation for improved SOC prediction in machine learning models. We applied the models to predict SOC for the whole of Estonia at 10 m raster resolution. Computational differences provided additional insights into pragmatic choices of models.
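One common way to check residuals for remaining spatial autocorrelation is Moran's I. The abstract does not state which statistic was used, so the following is only an illustrative sketch; it assumes binary k-nearest-neighbour weights, and the function `morans_i` and the toy grid are hypothetical.

```python
import numpy as np


def morans_i(coords, values, k=4):
    """Global Moran's I with binary k-nearest-neighbour weights.

    Values near +1 indicate clustering of similar values, values near 0
    indicate no spatial structure, negative values indicate alternation.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = len(z)
    # Pairwise distances; exclude each point from its own neighbourhood.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return (n / W.sum()) * (z @ W @ z) / (z @ z)


# Toy residuals on a 10 x 10 grid (not data from the study).
xx, yy = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([xx.ravel(), yy.ravel()])
i_trend = morans_i(coords, xx.ravel())               # smooth gradient
i_checker = morans_i(coords, (xx + yy).ravel() % 2)  # alternating pattern
```

Applied to model residuals, a value that stays clearly above zero after adding spatial predictors would correspond to the "no decrease" finding described above.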

How to cite: Kmoch, A., Choi, J., Harrison, C. T., and Uuemaa, E.: Spatial autocorrelation in machine learning for modelling soil organic carbon, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16038, https://doi.org/10.5194/egusphere-egu25-16038, 2025.

16:52–18:00

Interactive presentations at PICO screens