NP4.1
Big Data and AI in the Earth Sciences

Co-sponsored by IEEE GRSS
Convener: Peter Baumann | Co-conveners: Otoniel José Campos Escobar (ECS), Sandro Fiore, Mikhail Kanevski, Kwo-Sen Kuo
vPICO presentations: Thu, 29 Apr, 15:30–17:00 (CEST)

Chairpersons: Peter Baumann, Otoniel José Campos Escobar
15:30–15:32 | EGU21-417 | ECS
Luis Angel Vega Ramirez, Ronald Michael Splez Madero, Juan Contreras Perez, David Caress, David A. Clague, and Jennifer B. Paduan

The mapping of faults and fractures is a problem of high relevance in the Earth sciences. However, identifying them in digital elevation models is time-consuming, given the fractal nature of the resulting networks. The effort is especially challenging in submarine environments, which are inaccessible and where direct observations are difficult to collect. Here, we propose a semi-automated method for detecting faults in high-resolution gridded bathymetry data (~1 m horizontal and ~0.2 m vertical resolution) of the Pescadero Basin in the southern Gulf of California, collected by MBARI's D. Allan B. autonomous underwater vehicle. This problem is well suited to machine-learning and deep-learning methods. The method learns from a model trained to recognize fault-line scarps based on key morphological attributes in the neighboring Alarcón Rise. We use the product of the mass diffusion coefficient and time, the scarp height, and the root-mean-square error as training attributes. The method, based on linear discriminant analysis (LDA), projects the attributes from a three-dimensional space onto a one-dimensional space, in which normal probability density functions are generated to classify faults. Results of the LDA implementation on various cross-sectional profiles along the Pescadero Basin show that the proposed method can detect fault-line scarps of different sizes and degradation stages. Moreover, the method is robust to moderate amounts of noise (i.e., random topography and data collection artifacts) and correctly handles different fault dip angles. Experiments show that both isolated and linked fault configurations are detected and tracked reliably.
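The projection step described above can be illustrated with a minimal two-class Fisher discriminant sketch in plain Python. This is not the authors' implementation: the attribute values are synthetic, the function names (`fit_lda`, `classify`) are hypothetical, and the decision rule simply compares the two fitted normal densities with equal priors. Each sample holds three attributes standing in for the diffusion-coefficient-times-time product, the scarp height, and the RMS error.

```python
import math

# Synthetic training attributes (assumed values, for illustration only).
FAULTS = [(2.0, 5.0, 0.30), (2.5, 6.0, 0.20), (3.0, 5.5, 0.40),
          (2.2, 6.5, 0.35), (2.8, 4.8, 0.25)]
BACKGROUND = [(0.5, 1.0, 1.0), (0.8, 1.5, 1.2), (0.3, 0.8, 0.9),
              (0.6, 1.2, 1.4), (0.9, 0.9, 1.1)]

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    """Within-class scatter matrix: sum of outer products of deviations."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def project(w, x):
    """Map a 3-D attribute vector onto the 1-D discriminant axis."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_lda(faults, background):
    mu_f, mu_b = mean_vec(faults), mean_vec(background)
    d = len(mu_f)
    sf, sb = scatter(faults, mu_f), scatter(background, mu_b)
    sw = [[sf[i][j] + sb[i][j] for j in range(d)] for i in range(d)]
    # Fisher direction: w = Sw^-1 (mu_f - mu_b)
    w = solve(sw, [mu_f[j] - mu_b[j] for j in range(d)])
    stats = []
    for rows in (faults, background):
        z = [project(w, r) for r in rows]
        m = sum(z) / len(z)
        var = sum((v - m) ** 2 for v in z) / (len(z) - 1)
        stats.append((m, math.sqrt(var)))
    return w, stats[0], stats[1]

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def classify(model, x):
    """Pick the class whose 1-D normal density is larger at the projection."""
    w, (mf, sf), (mb, sb) = model
    z = project(w, x)
    return "fault" if normal_pdf(z, mf, sf) > normal_pdf(z, mb, sb) else "background"

model = fit_lda(FAULTS, BACKGROUND)
print(classify(model, (2.5, 5.5, 0.3)))
```

In a real workflow the attributes would be extracted from bathymetric profiles and the class priors would probably not be equal; both are simplified here.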

How to cite: Vega Ramirez, L. A., Splez Madero, R. M., Contreras Perez, J., Caress, D., Clague, D. A., and Paduan, J. B.: A new Method for Fault-Scarp Detection Using Linear Discriminant Analysis (LDA) in High-Resolution Bathymetry Data From the Alarcón Rise and Pescadero Basin, Gulf of California., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-417, https://doi.org/10.5194/egusphere-egu21-417, 2021.

15:32–15:34 | EGU21-501 | Highlight
Antonios Konstantaras, Theofanis Frantzeskakis, Emmanouel Maravelakis, Alexandra Moshou, and Panagiotis Argyrakis

This research aims to depict ontological findings related to topical seismic phenomena within the Hellenic Seismic Arc via deep data mining of the existing big seismological dataset, combining a deep-learning neural network model for pattern recognition with interactive big data visualization enabled by heterogeneous parallel processing. Using software based on the R language, seismic data were plotted in 3D in a Cartesian point cloud viewer for further investigation of the resulting three-dimensional morphology. To mine information from seismic big data, a deep neural network was trained and refined for pattern recognition and occurrence manifestation attributes of seismic events of magnitude greater than Ms 4.0. The deep learning neural network comprises an input layer with six input neurons for the year, month, day, latitude, longitude and depth, followed by six hidden layers with a hundred neurons each, and one output layer for the estimated magnitude level. This approach was conceived to investigate topical patterns in time yielding minor, interim and strong seismic activity, such as the one depicted by the deep learning neural network in the region between Syrna and Kandelioussa over the past ten years. This area lies at approximately 36.4 degrees latitude and 26.7 degrees longitude, where the deep learning neural network achieved low error rates, possibly depicting a pattern in seismic activity.
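The layer layout described above (six inputs, six hidden layers of one hundred neurons, one output) can be sketched in plain Python as follows. The abstract does not state the activation function, initialization, or input scaling, so the ReLU activation, the random initialization, and the scaled example input below are assumptions; the network is untrained and its output is meaningless beyond showing the data flow.

```python
import random

# year, month, day, latitude, longitude, depth -> 6 hidden layers x 100 -> magnitude
LAYER_SIZES = [6] + [100] * 6 + [1]

def count_parameters(sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def init_network(sizes, seed=0):
    """Random weights (assumed scheme), zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, x):
    """Forward pass: ReLU on hidden layers (assumption), linear output."""
    a = list(x)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        a = z if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return a[0]

net = init_network(LAYER_SIZES)
print(count_parameters(LAYER_SIZES))  # 51301
# Hypothetical, already-scaled input vector (year, month, day, lat, lon, depth).
print(forward(net, [0.5, 0.5, 0.4, 0.364, 0.267, 0.1]))
```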

How to cite: Konstantaras, A., Frantzeskakis, T., Maravelakis, E., Moshou, A., and Argyrakis, P.: Heterogeneous Parallel Processing Enabled Deep Learning Pattern Recognition of Seismic Big Data in Syrna and Kandelioussa, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-501, https://doi.org/10.5194/egusphere-egu21-501, 2021.

15:34–15:36 | EGU21-2105
Hristos Tyralis, Georgia Papacharalampous, and Andreas Langousis

Random forests is a supervised machine learning algorithm whose use in water resources has recently increased exponentially. However, existing implementations have been restricted to applications of Breiman’s (2001) original algorithm to regression and classification models, while numerous later developments could also be useful for solving diverse practical problems. Here we popularize random forests for the practicing hydrologist and present alternative random-forests-based algorithms and related concepts and techniques that are underappreciated in hydrology. We review random forests applications in water resources and provide guidelines for fully exploiting the potential of the algorithm and its variants. Relevant implementations of random-forests-related software in the R programming language are also presented.

How to cite: Tyralis, H., Papacharalampous, G., and Langousis, A.: Random forests in water resources, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2105, https://doi.org/10.5194/egusphere-egu21-2105, 2021.

15:36–15:38 | EGU21-6697
Mikhail Kanevski

A wide range of methods and tools for studying and forecasting time series is now available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is treated as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling or forecasting). Such an approach is empirical: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant or irrelevant. This offers rich possibilities for advanced multivariate time series exploration and the development of interpretable predictive models.

Therefore, in the present research, different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real-data case studies deal with air pollution and wind speed time series. Preliminary results are promising, and more research is in progress.
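The embedding idea, in which lagged values of a series form an input feature space for a regression target, can be sketched as follows. The function name `delay_embed` and its parameters are hypothetical; the choice of embedding dimension, lag and forecast horizon is exactly what a feature selection algorithm would be asked to optimize, and is simply fixed by hand here.

```python
def delay_embed(series, dim, lag=1, horizon=1):
    """Build a regression data set from a univariate time series.

    Each row of X holds `dim` values spaced `lag` steps apart, most recent
    first; y holds the value `horizon` steps after the most recent one.
    """
    X, y = [], []
    start = (dim - 1) * lag
    for t in range(start, len(series) - horizon):
        X.append([series[t - k * lag] for k in range(dim)])
        y.append(series[t + horizon])
    return X, y

X, y = delay_embed([0, 1, 2, 3, 4, 5], dim=3)
print(X)  # [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
print(y)  # [3, 4, 5]
```

Each column of `X` is then one candidate feature, so a feature selection method can label individual lags as relevant, redundant or irrelevant for predicting `y`.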

How to cite: Kanevski, M.: Empirical analysis of time series using feature selection algorithms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6697, https://doi.org/10.5194/egusphere-egu21-6697, 2021.

15:38–15:40 | EGU21-7409 | ECS | Highlight
Otoniel José Campos Escobar and Peter Baumann

Multi-dimensional arrays (also known as raster data, gridded data, or datacubes) are key, if not essential, in many science and engineering domains. In the Earth sciences, a significant share of the data produced falls into the category of array data, and the daily volume is huge, which makes it hard for researchers to analyze the data and retrieve valuable insight from it. 1-D sensor data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z subsurface voxel data, and 4-D x/y/z/t atmospheric and ocean data often amount to dozens of terabytes every day, and the rate is only expected to increase. In response, Array Database systems were specifically designed and built to provide modeling, storage, and processing support for multi-dimensional arrays. They offer a declarative query language for flexible data retrieval, and some, e.g., rasdaman, provide federated processing and standards-based query capabilities compliant with OGC standards such as WCS, WCPS, and WMS. However, despite these advances, the gap between efficient information retrieval and the actual application of this data remains very wide, especially in the domains of artificial intelligence (AI) and machine learning (ML).

In this contribution, we present the state of the art in performing ML through Array Databases. First, a motivating example is introduced from the Deep Rain project, which aims at enhancing rainfall prediction accuracy in mountainous areas by implementing ML code on top of an Array Database. Deep Rain also explores novel methods for training prediction models by implementing server-side ML processing inside the database. A brief introduction to the Array Database rasdaman, which is used in this project, is also provided, featuring the standards-based query capabilities and scalable federated processing features that are required for rainfall data processing. Next, the workflow approach for ML and Array Databases employed in the Deep Rain project is described in detail, listing the benefits of using an Array Database with declarative query language capabilities in the machine learning pipeline. A concrete use case illustrates step by step how these tools integrate. Next, an alternative approach is presented in which ML is done inside the Array Database using user-defined functions (UDFs). Finally, a detailed comparison between the UDF and workflow approaches is presented, explaining their challenges and benefits.

How to cite: Campos Escobar, O. J. and Baumann, P.: Towards AI in Array Databases, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7409, https://doi.org/10.5194/egusphere-egu21-7409, 2021.

15:40–15:42 | EGU21-8259 | ECS
Andreas Gerhardus and Jakob Runge

The quest to understand cause and effect relationships is at the basis of the scientific enterprise. In cases where the classical approach of controlled experimentation is not feasible, methods from the modern framework of causal discovery provide an alternative way to learn about cause and effect from observational, i.e., non-experimental, data. Recent years have seen an increasing interest in these methods from various scientific fields, for example in the climate and Earth system sciences (where large-scale experimentation is often infeasible) as well as machine learning and artificial intelligence (where models based on an understanding of cause and effect promise to be more robust under changing conditions).

In this contribution we present the novel LPCMCI algorithm for learning the cause and effect relationships in multivariate time series. The algorithm is specifically adapted to several challenges that are prevalent in time series considered in the climate and Earth system sciences, for example strong autocorrelations, combinations of time lagged and contemporaneous causal relationships, as well as nonlinearities. It moreover allows for the existence of latent confounders, i.e., it allows for unobserved common causes. While this complication is faced in most realistic scenarios, especially when investigating a system as complex as Earth's climate system, it is nevertheless assumed away in many existing algorithms. We demonstrate applications of LPCMCI to examples from a climate context and compare its performance to competing methods.

Related reference:
Gerhardus, Andreas and Runge, Jakob (2020). High-recall causal discovery for autocorrelated time series with latent confounders. In Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020). 

How to cite: Gerhardus, A. and Runge, J.: LPCMCI: Causal Discovery in Time Series with Latent Confounders, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8259, https://doi.org/10.5194/egusphere-egu21-8259, 2021.

15:42–15:44 | EGU21-8584 | ECS
Christoph Käding and Jakob Runge

The Earth’s climate is a highly complex and dynamical system. To better understand and robustly predict it, knowledge about its underlying dynamics and causal dependency structure is required. Since controlled experiments are infeasible in the climate system, observational data-driven approaches are needed. Observational causal inference is a very active research topic and a plethora of methods have been proposed. Each of these approaches comes with inherent strengths, weaknesses, and assumptions about the data generating process as well as further constraints.
In this work, we focus on the fundamental case of bivariate causal discovery, i.e., given two data samples X and Y the task is to detect whether X causes Y or Y causes X. We present a large-scale benchmark that represents combinations of various characteristics of data-generating processes and sample sizes. By comparing most of the current state-of-the-art methods, we aim to shed light onto the real-world performance of evaluated methods. Since we employ synthetic data, we are able to precisely control the data characteristics and can unveil the behavior of methods when their underlying assumptions are met or violated. Further, we give a comparison on a set of real-world data with known causal relations to complete our evaluation.

How to cite: Käding, C. and Runge, J.: A Benchmark for Bivariate Causal Discovery Methods, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8584, https://doi.org/10.5194/egusphere-egu21-8584, 2021.

15:44–15:46 | EGU21-8844 | ECS
Vadim Rezvov, Mikhail Krinitskiy, Alexander Gavrikov, and Sergey Gulev

Surface winds, both wind speed and vector wind components, are fields of fundamental climatic importance. The character of surface winds greatly influences (and is influenced by) surface exchanges of momentum, energy, and matter. These wind fields are of interest in their own right, particularly for characterizing wind power density and wind extremes. Surface winds are influenced by small-scale features such as local topography and thermal contrasts, which is why accurate high-resolution prediction of near-surface wind fields is a topic of central interest in various fields of science and industry. Statistical downscaling is a way of inferring information on physical quantities at a local scale from available low-resolution data, and one of the ways to avoid costly high-resolution simulations. It connects variability at different scales using statistical prediction models. This approach is fundamentally data-driven and can only be applied in locations where observations have been taken for long enough to establish the statistical relationship. Our study considers statistical downscaling of surface winds (both wind speed and vector wind components) in the North Atlantic. Deep learning methods are among the most prominent state-of-the-art machine learning techniques and can approximate sophisticated nonlinear functions. We applied various approaches involving artificial neural networks to the statistical downscaling of near-surface wind vector fields. We used the ERA-Interim reanalysis as low-resolution data and the RAS-NAAD dynamical downscaling product (14 km grid resolution) as the high-resolution target. We compared the downscaling quality of the statistical results with that of bilinear and bicubic interpolation, and investigated how network complexity affects downscaling performance.
We will demonstrate the preliminary results of the comparison and outline further development of our methods.
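The bilinear interpolation baseline mentioned above can be sketched in plain Python as follows. This is only an illustration on a toy grid with an integer refinement factor, not the study's code: the function names are hypothetical, the grid must have at least two rows and columns, and real reanalysis fields would need proper handling of coordinates and map projection.

```python
def bilinear_upsample(grid, factor):
    """Refine a 2-D grid (list of rows) by an integer factor.

    Original grid nodes are preserved; new nodes are bilinear blends of the
    four surrounding coarse values.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows, out_cols = (rows - 1) * factor + 1, (cols - 1) * factor + 1
    out = []
    for i in range(out_rows):
        y = i / factor
        i0 = min(int(y), rows - 2)   # index of the coarse row above
        ty = y - i0
        row = []
        for j in range(out_cols):
            x = j / factor
            j0 = min(int(x), cols - 2)   # index of the coarse column to the left
            tx = x - j0
            top = grid[i0][j0] * (1 - tx) + grid[i0][j0 + 1] * tx
            bottom = grid[i0 + 1][j0] * (1 - tx) + grid[i0 + 1][j0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out

def rmse(a, b):
    """Root-mean-square error between two grids of equal shape."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return (sum(diffs) / len(diffs)) ** 0.5

coarse = [[0, 2], [4, 6]]            # toy low-resolution wind speed field
fine = bilinear_upsample(coarse, 2)
print(fine)  # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

A score such as `rmse(fine, target)` against the high-resolution product then gives the reference value that a neural network downscaling model has to beat.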

This work was undertaken with financial support by the Russian Science Foundation grant № 17-77-20112-P.

How to cite: Rezvov, V., Krinitskiy, M., Gavrikov, A., and Gulev, S.: Comparison of AI-based approaches for statistical downscaling of surface wind fields in the North Atlantic, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8844, https://doi.org/10.5194/egusphere-egu21-8844, 2021.

15:46–15:48 | EGU21-12157 | ECS | Highlight
Benjamin Kellenberger, Thor Veen, Eelke Folmer, and Devis Tuia

Recently, Unmanned Aerial Vehicles (UAVs) equipped with high-resolution imaging sensors have become a viable alternative for ecologists to conduct wildlife censuses, compared to foot surveys. They cause less disturbance by sensing remotely, they provide coverage of otherwise inaccessible areas, and their images can be reviewed and double-checked in controlled screening sessions. However, the amount of data they generate often makes this photo-interpretation stage prohibitively time-consuming.

In this work, we automate the detection process with deep learning [4]. We focus on counting coastal seabirds on sand islands off the West African coast, where species like the African Royal Tern are at the top of the food chain [5]. Monitoring their abundance provides invaluable insights into biodiversity in this area [7]. In a first step, we obtained orthomosaics at 1 cm resolution from nadir-looking UAVs over six sand islands. We then fully labelled one of them with points for four seabird species, which took five annotators three weeks and yielded over 21,000 labelled individuals. Next, we labelled the other five orthomosaics, but only partially: we aimed for a low number of just 200 points per species. These points, together with a few background polygons, served as training data for our ResNet-based [2] detection model. This low number of points required multiple strategies to obtain stable predictions, including curriculum learning [1] and post-processing by a Markov random field [6]. In the end, our model was able to accurately predict the 21,000 birds of the test image with 90% precision at 90% recall (Fig. 1) [3]. Furthermore, this model required a mere 4.5 hours from creating training data to the final prediction, which is a fraction of the three weeks needed for the manual labelling process. Inference time is only a few minutes, which makes the model scale favourably to many more islands. In sum, the combination of UAVs and machine learning-based detectors provides census possibilities with unprecedentedly high accuracy and comparatively minuscule execution time.

Fig. 1: Our model is able to predict over 21,000 birds in high-resolution UAV images in a fraction of time compared to weeks of manual labelling.
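The precision and recall figures quoted above can be computed from point detections roughly as sketched below. The abstract does not describe the evaluation protocol, so the greedy nearest-neighbour matching and the distance threshold `max_dist` are assumptions, and the coordinates are made up.

```python
import math

def match_detections(predicted, reference, max_dist):
    """Greedily match each predicted point to the nearest unused reference
    point within max_dist. Returns (true positives, false positives,
    false negatives)."""
    unmatched = list(reference)
    tp = 0
    for p in predicted:
        best, best_d = None, max_dist
        for idx, r in enumerate(unmatched):
            d = math.dist(p, r)
            if d <= best_d:
                best, best_d = idx, d
        if best is not None:
            unmatched.pop(best)   # each reference bird is counted once
            tp += 1
    return tp, len(predicted) - tp, len(unmatched)

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical pixel coordinates of predicted and labelled birds.
predicted = [(0, 0), (10, 10), (50, 50)]
reference = [(0.5, 0), (10, 10.5), (30, 30)]
tp, fp, fn = match_detections(predicted, reference, max_dist=1.0)
print(tp, fp, fn)  # 2 1 1
print(precision_recall(tp, fp, fn))
```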

 

References

1. Bengio, Yoshua, et al. "Curriculum learning." Proceedings of the 26th annual international conference on machine learning. 2009.

2. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

3. Kellenberger, Benjamin, et al. “21,000 Birds in 4.5 Hours: Efficient Large-scale Seabird Detection with Machine Learning.” Remote Sensing in Ecology and Conservation. Under review.

4. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.

5. Parsons, Matt, et al. "Seabirds as indicators of the marine environment." ICES Journal of Marine Science 65.8 (2008): 1520-1526.

6. Tuia, Devis, Michele Volpi, and Gabriele Moser. "Decision fusion with multiple spatial supports by conditional random fields." IEEE Transactions on Geoscience and Remote Sensing 56.6 (2018): 3277-3289.

7. Veen, Jan, Hanneke Dallmeijer, and Thor Veen. "Selecting piscivorous bird species for monitoring environmental change in the Banc d'Arguin, Mauritania." Ardea 106.1 (2018): 5-18.

How to cite: Kellenberger, B., Veen, T., Folmer, E., and Tuia, D.: Deep Learning Enhances the Detection of Breeding Birds in UAV Images, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12157, https://doi.org/10.5194/egusphere-egu21-12157, 2021.

15:48–15:50 | EGU21-12486 | ECS | Highlight
Martin Sudmanns, Hannah Augustin, Lucas van der Meer, Andrea Baraldi, and Dirk Tiede

Sen2Cube.at is a Sentinel-2 semantic Earth observation (EO) data and information cube that combines an EO data cube with an AI-based inference engine by integrating a computer-vision approach to infer new information. Our approach uses semantic enrichment of optical images and makes the data and information directly available and accessible for further use within an EO data cube. The architecture is based on an expert system, in which domain knowledge can be encoded in semantic models (knowledgebase) and applied to the Sentinel-2 data as well as to semantically enriched, data-derived information (factbase).

The initial semantic enrichment in the Sen2Cube.at system is general-purpose and user- and application-independent; it is derived directly from optical EO images as an initial step towards a scene classification map. These information layers are automatically generated from Sentinel-2 images with the SIAM software (Satellite Image Automated Mapper). SIAM is a knowledge-based and physical-model-based decision tree that produces a set of information layers in a fully automated process that is applicable worldwide and does not require any samples. A graphical inference engine allows application-specific Web-based semantic querying based on the generic information layers as a replicable and explainable approach to producing information. The graphical inference engine is a new browser-based graphical user interface (GUI) developed in-house with a semantic querying language. Users formulate semantic models graphically and can execute them on any area of interest and time interval; the models are evaluated by the core of the inference engine attached to the data cube. This also enables non-expert users to formulate analyses without requiring programming skills.

While the methodology is software-independent, the prototype is based on the Open Data Cube and additional in-house developed components in the Python programming language. Scaling is possible depending on the available infrastructure resources due to the system’s Docker-based container architecture. Through its fully automated semantic enrichment, innovative graphical querying language in the GUI for semantic querying and analysis as well as the implementation as a scalable infrastructure, this approach is suited for big data analysis of Earth observation data. It was successfully scaled to a national data cube for Austria, containing all available Sentinel-2 images from the platforms A and B. 

How to cite: Sudmanns, M., Augustin, H., van der Meer, L., Baraldi, A., and Tiede, D.: The Sen2Cube.at national semantic Earth observation data cube for Austria, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-12486, https://doi.org/10.5194/egusphere-egu21-12486, 2021.

15:50–15:52 | EGU21-13051 | ECS
Mikhail Borisov and Mikhail Krinitskiy

The total cloud score is a characteristic of weather conditions. Existing algorithms estimate cloudiness automatically from a photograph of the sky. However, these algorithms cannot locate the solar disk, which limits their accuracy.

To develop an algorithm that solves this problem, we use sky photographs obtained during marine research cruises and annotated for training a neural network.

The result is a neural-network-based algorithm that determines the size and position of the solar disk from a photograph of the sky; its output can be used by other algorithms that work with images of the visible hemisphere of the sky.

How to cite: Borisov, M. and Krinitskiy, M.: Artificial neural networks in the problem of determining the position and size of the solar disk in wide-angle photographs of the sky, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13051, https://doi.org/10.5194/egusphere-egu21-13051, 2021.

15:52–15:54 | EGU21-14467 | ECS | Highlight
Mariam Hussain and Seon Ki Park

Bangladesh experiences extreme weather events such as heavy monsoon rainfall, tropical cyclones, and thunderstorms, resulting in floods every year. Regular flood events significantly affect agriculture and human lives and cause economic losses. One of the factors sustaining these weather phenomena is latent heat release from the Bay of Bengal (BoB) and the Southeast Tropical Indian Ocean (SETIO). As the country has limited observations from stations and oceans, modeling for numerical weather prediction (NWP) is challenging for local operations. For operational NWP, computational resources and time are also concerns for a developing country like Bangladesh. Meanwhile, recent machine learning (ML) techniques are widely applied to study various meteorological events with efficient results. Therefore, this research aims to estimate the predictability and accuracy of supervised ML for tropical cyclones by assessing air temperature at 2 m (AT) and sea surface temperature (SST). For AT and SST, the study uses monthly data at 0.25° × 0.25° horizontal resolution provided by the ECMWF reanalysis (ERA5). The gridded data are downscaled to the areas of interest, namely the coastal regions, BoB and SETIO, for a study period of 40 years from 1979 to 2018. Furthermore, the Bangladesh Meteorological Department (BMD) provides AT for 36 years from 1979 to 2015. The experiments are divided into two parts: (1) data normalization via linear regression (LR) and multi-linear regression (MLR) and (2) application of supervised ML techniques in MATLAB 2018b. The pre-processed data for LR show that AT from coastal regions such as the Chittagong (CG), Barishal (BR), and Khulna (KL) divisions correlates more strongly with SST in BoB (R = 0.910, 0.850, and 0.846, respectively) than with SST in SETIO (R = 0.698, 0.675, and 0.678, respectively). Moreover, for these three regions, the MLR correlation is 0.916 for BoB and 0.745 for SETIO, with residual standard errors (RSE) of 1.312 and 1.218, respectively.
For the supervised ML application, a coarse decision tree (CDT) predicts SST from AT, with the ERA5 data split into training (80%) and test (20%) sets. The results from the CDT model indicate that SST predictions are possible with 98.5% accuracy based on coastal stations. The trained CDT was also validated using observed AT (BMD observations) to forecast monthly SST, reaching 85% accuracy for the monthly time series. In conclusion, the CDT can predict SST from station data and help assess the possibility of tropical cyclone formation. Future work includes further assessment of various categories of tropical cyclones and prediction of their intensity based on SST. This research aims to contribute to disaster mitigation by improving early warning systems. Knowing the possibility of cyclone formation will support preparedness and help reduce property damage in Bangladesh.
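The correlation coefficient R, the linear regression, and the 80/20 split reported above can be illustrated with a short plain-Python sketch. The study itself used MATLAB; the function names here are hypothetical, the temperature values are invented, and splitting the series in chronological order is an assumption, since the abstract does not say how the split was made.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def train_test_split(pairs, train_fraction=0.8):
    """Chronological split (assumption): first 80% for training, rest for testing."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]

# Invented monthly air temperature (AT) and sea surface temperature (SST), in °C.
at = [19.5, 22.0, 26.1, 28.4, 29.0, 28.7, 28.3, 28.5, 28.2, 27.0]
sst = [25.8, 26.4, 27.6, 28.9, 29.6, 29.3, 28.8, 28.9, 28.7, 28.2]

train, test = train_test_split(list(zip(at, sst)))
slope, intercept = fit_line([a for a, _ in train], [s for _, s in train])
print(len(train), len(test))  # 8 2
print(round(pearson_r(at, sst), 3), slope, intercept)
```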

How to cite: Hussain, M. and Park, S. K.: Supervised Machine Learning Techniques to Assess Tropical Cyclones in Bay of Bengal and Bangladesh, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14467, https://doi.org/10.5194/egusphere-egu21-14467, 2021.

15:54–15:56 | EGU21-14470 | ECS
Nikita Veremev

In meteorology and oceanology, the importance of cloud amount and cloud type cannot be overstated. In describing and studying weather, precipitation and the movement of air masses over the ocean, the amount and type of clouds determine precipitation fluxes and their intensity and help to predict the weather and the content of various impurities in the air. This makes the study of cloud cover properties one of the key aspects of meteorological and oceanological research.

Cloud types are determined by a specialist who visually compares the view of the sky over the ocean with guideline documents, the use of which reduces the influence of the human factor on the determination of these parameters.

An accurate study of the dynamics of climate models and their dependence on cloud types requires long-term, homogeneous measurements made with consistent methods. However, these data are very unevenly distributed over the Earth's surface, and the number of ship observations has greatly decreased.

Thus, given the importance of reliably determining cloud-related data and the problems with their accuracy, the relevance of and need for automating the determination of cloud types are obvious.

As a result of this work, we obtained an algorithm that classifies cloud types from photographs taken during long-term sea expeditions.

How to cite: Veremev, N.: The use of artificial neural networks in the problem of classifying cloud types in wide-angle images of the visible hemisphere of the sky., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14470, https://doi.org/10.5194/egusphere-egu21-14470, 2021.

15:56–15:58 | EGU21-14553 | ECS
Alexis Guillot, Shaodi You, Hans van't Woud, Matthijs Perenboom, Amanda Kruijver, and Bernard Foing

The use of artificial intelligence and specifically deep learning (DL) approaches in the domain of remote sensing is increasing. Such methods provide excellent results and show great potential for future applications. Earth observation sensors are able to deliver data with higher spatial, spectral and temporal resolutions. In this project, we use Sentinel-2 multispectral data and couple this input with a crowd annotated very high resolution (VHR) map which is generated in the video-game Cerberus, developed by the company BlackShore. In Cerberus, players are able to map features, like buildings, forest and specific types of crop fields, that are subsequently used as input for the Machine Learning (ML) pipeline. The ML pipeline is applied to classify crop fields in a larger region.

The main objective of this research is to study the accuracy of a model in detecting and describing the type of crop and whether the addition of a temporal dimension increases the accuracy. We will be experimenting with different methods that take their root in DL. The study region shown to Cerberus-players is Oromia in Ethiopia, south of the capital Addis Ababa. Using Sentinel-2 data, we aim to extend the generated maps to cover Ethiopia.

First, we will implement two baseline methods, Random Forest (RF) and a 3D Convolutional Neural Network (CNN), that do not make use of the temporal dimension, in order to establish the accuracy expected from a single multispectral image. Next, we will investigate four models that make use of time series: 1) a hybrid convolutional neural network-random forest (CNN-RF); 2) a 3D CNN that takes as input the output of a stack of 3D CNNs; 3) a model based on Recurrent Neural Networks (RNNs) performing pixel-based classification; and 4) an innovative method that combines the strengths of RNNs, CNNs and Generative Adversarial Networks.

We are now implementing the methods and will report on results at EGU in April 2021. Future research could study whether the combined approach, in which crowd-annotated training data are extended by classification over larger regions, generalizes to other areas.

How to cite: Guillot, A., You, S., van't Woud, H., Perenboom, M., Kruijver, A., and Foing, B.: Crowd, Crops and Machines: How crowdsourced annotations can help towards crops classification, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14553, https://doi.org/10.5194/egusphere-egu21-14553, 2021.

15:58–16:00
|
EGU21-15729
|
ECS
Miguel-Ángel Fernández-Torres, J. Emmanuel Johnson, María Piles, and Gustau Camps-Valls

Automatic anticipation and detection of extreme events constitute a major challenge in the current context of climate change. Machine learning approaches have recently excelled in the detection of extremes and anomalies in Earth data cubes, but they are typically both computationally costly and supervised, which hampers their wide adoption. As an alternative, we present here an unsupervised, efficient, generative approach for extreme event detection, whose performance is illustrated for drought detection in Europe during the severe Russian heat wave in 2010. The core architecture of the model is generic and could naturally be extended to the detection of other kinds of anomalies. First, it computes hierarchical appearance (spatial) and motion (temporal) representations of several informative Essential Climate Variables (ECVs), including soil moisture and land surface temperature, as well as features describing vegetation health. Then, these representations are combined using Gaussianization Flows that yield a spatio-temporal anomaly score. This allows the proposed model not only to detect drought areas, but also to explain why they were produced, by monitoring the individual contribution of each ECV to the indicator at its output.
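The idea of scoring anomalies in a Gaussianized space, with a separate contribution per variable, can be illustrated with a deliberately reduced sketch. It applies only a marginal (per-variable) Gaussianization through the empirical CDF, with no rotation layers and no spatial or temporal representations, so it is not the authors' model. The function names, the synthetic data and the use of half the squared Gaussianized value as the contribution are assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist

def marginal_gaussianize(train, x):
    """Map each column of x to standard-normal scores via the empirical CDF of train."""
    n, n_vars = train.shape
    inv_cdf = NormalDist().inv_cdf
    z = np.empty(x.shape, dtype=float)
    for j in range(n_vars):
        sorted_col = np.sort(train[:, j])
        # Number of training values less than or equal to each query value
        ranks = np.searchsorted(sorted_col, x[:, j], side="right")
        # Shifted ranks keep u strictly inside (0, 1)
        u = (ranks + 0.5) / (n + 1.0)
        z[:, j] = [inv_cdf(float(v)) for v in u]
    return z

def anomaly_score(train, x):
    """Return (score per sample, contribution per sample and variable)."""
    z = marginal_gaussianize(train, x)
    contributions = 0.5 * z**2          # assumed per-variable contribution
    return contributions.sum(axis=1), contributions

# Synthetic stand-in for three variables (hypothetical data, not ECVs)
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))
x = np.array([[0.0, 0.0, 0.0],    # typical sample
              [0.0, 0.0, 5.0]])   # third variable is extreme
scores, contributions = anomaly_score(train, x)
```

In this toy setting the second sample receives the larger score, and its third column dominates the contributions, which mirrors the kind of per-variable attribution described above.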

How to cite: Fernández-Torres, M.-Á., Johnson, J. E., Piles, M., and Camps-Valls, G.: Spatio-Temporal Gaussianization Flows for Extreme Event Detection, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15729, https://doi.org/10.5194/egusphere-egu21-15729, 2021.

16:00–16:02
|
EGU21-15841
|
ECS
Wei Li, Georg Rümpker, Horst Stöcker, Megha Chakraborty, Darius Fenner, Johannes Faber, Kai Zhou, Jan Steinheimer, and Nishtha Srivastava

This study presents a deep-learning-based algorithm for seismic event detection and simultaneous phase picking in seismic waveforms. U-net-based solutions, which consist of a contracting path (encoder) to capture feature information and a symmetric expanding path (decoder) that enables precise localization, have proven effective in phase picking. The network architecture of these U-net models mainly comprises 1D CNNs, bi- and uni-directional LSTMs, transformers and self-attentive layers. Although these networks have proven to be a good solution, they may not fully harness the information extracted at multiple scales.

In this study, we propose a simple yet powerful deep learning architecture for phase picking, named MCA-Unet, that combines a multi-class formulation with an attention mechanism. Specifically, we treat phase picking as an image segmentation problem and incorporate the attention mechanism into the U-net structure to deal efficiently with the features extracted at different levels, with the goal of improving performance on seismic phase picking. Our neural network is based on an encoder-decoder architecture composed of 1D convolutions, pooling layers, deconvolutions and multi-attention layers. The architecture is applied to a field seismic dataset to test its performance.
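Treating phase picking as segmentation means the network outputs one class probability per waveform sample, and picks are then read off those probability curves. The sketch below shows only that post-processing step on synthetic output. The three-class layout (noise, P, S), the threshold and the function name are assumptions for illustration and are not taken from MCA-Unet.

```python
import numpy as np

def pick_phases(probs, threshold=0.5):
    """Return {phase: sample index} from per-sample class probabilities.

    probs has shape (n_samples, 3) with columns (noise, P, S); this column
    layout is an assumption made for the example.
    """
    picks = {}
    for name, col in (("P", 1), ("S", 2)):
        idx = int(np.argmax(probs[:, col]))
        # Report a pick only if the peak probability is high enough
        if probs[idx, col] >= threshold:
            picks[name] = idx
    return picks

# Synthetic segmentation output: probability bumps around samples 300 and 700
n = 1000
t = np.arange(n)
p = 0.9 * np.exp(-0.5 * ((t - 300) / 10.0) ** 2)
s = 0.8 * np.exp(-0.5 * ((t - 700) / 10.0) ** 2)
probs = np.stack([1.0 - p - s, p, s], axis=1)

picks = pick_phases(probs)
```

A single argmax per phase assumes at most one P and one S arrival per window; windows with several events would need peak detection instead.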

How to cite: Li, W., Rümpker, G., Stöcker, H., Chakraborty, M., Fenner, D., Faber, J., Zhou, K., Steinheimer, J., and Srivastava, N.: MCA-Unet: Multi-class Attention-aware U-net for Seismic Phase Picking, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15841, https://doi.org/10.5194/egusphere-egu21-15841, 2021.

16:02–16:04
|
EGU21-15917
|
ECS
|
Highlight
|
Samir Zamarialai, Thijs Perenboom, Amanda Kruijver, Zenglin Shi, and Bernard Foing

Remote sensing (RS) imagery, generated by e.g. cameras on satellites, airplanes and drones, has been used for a variety of applications such as environmental monitoring, crater detection, and monitoring temporal changes on planetary surfaces.

In recent years, researchers have started applying computer vision methods to RS data. This has led to a steady development of remote sensing classification, with good results on classification and segmentation tasks. However, current approaches still have problems. Firstly, the main focus is on high-resolution RS imagery. Apart from the fact that these data are not accessible to everyone, the models fail to generalize to lower-resolution data. Secondly, the models fail to generalize to more fine-grained classes. For example, models tend to perform very well at detecting buildings in general, but they fail to distinguish whether a building belongs to a fine-grained subclass such as residential or commercial buildings. Fine-grained classes often appear very similar to each other, so models have difficulty distinguishing between them. This problem occurs in both high-resolution and low-resolution RS imagery, but the drop in accuracy is much larger with lower-resolution data.

For these reasons, we propose a Multi-Task Convolutional Neural Network (CNN) with three objective functions for the segmentation of RS imagery. This model should be able to generalize across different resolutions and achieve better accuracy than state-of-the-art approaches, especially on fine-grained classes.

The model consists of two main components. The first component is a CNN that transforms the input image into a segmentation map. This module is optimized with a pixel-wise Cross-Entropy loss function between the segmentation map of the model and the ground truth annotations. If the input image is of lower resolution, this segmentation map will miss part of the structure of the input image. The second component is another CNN that builds a high-resolution image from the low-resolution input image in order to reconstruct fine-grained structural information. This module essentially guides the model to learn more fine-grained feature representations. The transformed image from this module will have much more detail, such as sharper edges and better color. The second CNN module is optimized with a Mean-Squared-Error loss function between the original high-resolution image and the transformed image. Finally, the two images created by the model are evaluated by a third objective function that aims to learn the similarity distance between the segmented input image and the super-high-resolution segmentation. The final objective function is the sum of the three objectives mentioned above. Once training is finished, the second module can be detached, meaning that high-resolution imagery is only needed during the training phase.
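The way the three objectives add up can be sketched with plain NumPy arrays standing in for network outputs. The abstract does not specify the form of the third (similarity) term or any weighting, so the mean-squared difference between the two segmentation maps and the unweighted sum used here are assumptions, as are all function and argument names.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Pixel-wise cross-entropy; probs is (H, W, C) softmax output, labels is (H, W) ints."""
    h, w = labels.shape
    # Probability assigned to the true class at every pixel
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.mean(np.log(picked + eps)))

def mse(a, b):
    """Mean-squared error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def total_loss(seg_input, seg_superres, labels, superres_image, highres_image):
    """Unweighted sum of the three objectives (weights are not given in the text)."""
    l_seg = cross_entropy(seg_input, labels)      # segmentation vs. ground truth
    l_rec = mse(superres_image, highres_image)    # reconstructed vs. original high-res image
    l_sim = mse(seg_input, seg_superres)          # assumed stand-in for the similarity term
    return l_seg + l_rec + l_sim

# Tiny example: 4x4 image, 3 classes, uniform class probabilities everywhere
labels = np.zeros((4, 4), dtype=int)
uniform = np.full((4, 4, 3), 1.0 / 3.0)
image = np.zeros((4, 4, 3))
loss = total_loss(uniform, uniform, labels, image, image)
```

With identical images and identical segmentation maps, only the cross-entropy term remains, which for uniform probabilities over three classes is log 3.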

At the moment we are implementing the model. Afterwards, we will benchmark it against current state-of-the-art approaches. The status will be presented at EGU 2021.

How to cite: Zamarialai, S., Perenboom, T., Kruijver, A., Shi, Z., and Foing, B.: Monitoring Temporal Developments from Remote Sensing Data using AI Fine-Grained Segmentation, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15917, https://doi.org/10.5194/egusphere-egu21-15917, 2021.

16:04–16:06
|
EGU21-15941
|
ECS
Megha Chakraborty, Georg Rümpker, Horst Stöcker, Wei Li, Johannes Faber, Darius Fenner, Kai Zhou, and Nishtha Srivastava

This study attempts to use deep learning architectures to design an efficient real-time magnitude classifier for seismic events. Various combinations of Convolutional Neural Networks (CNNs), bi- and uni-directional Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) are tried and tested to obtain the best-performing model with optimum hyperparameters. In order to extract maximum information from the seismic waveforms, this study uses not only the time series data but also their corresponding Fourier transform (spectrogram) as input. Furthermore, the deep learning architecture is combined with other machine learning algorithms to generate the final magnitude classifications. This study is likely to help seismologists improve Earthquake Early Warning Systems and avoid issuing false warnings, which not only alarm people unnecessarily but can also result in huge financial losses due to, for example, the stoppage of industrial machinery.
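Feeding a network both the raw time series and its spectrogram requires a short-time Fourier transform of each waveform. The following is a minimal NumPy sketch of that preprocessing step on a synthetic signal. The sampling rate, window length, hop size and function name are assumed values chosen for the example, not parameters reported by the authors.

```python
import numpy as np

def spectrogram(x, n_fft=64, hop=32):
    """Magnitude short-time Fourier transform, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 100.0                                  # sampling rate in Hz (assumed)
t = np.arange(1024) / fs
waveform = np.sin(2 * np.pi * 10.0 * t)     # synthetic 10 Hz signal, not real seismic data

spec = spectrogram(waveform)
freqs = np.fft.rfftfreq(64, d=1.0 / fs)
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_freq = float(freqs[peak_bin])

# Two inputs for a hypothetical two-branch network:
# the 1D time series and the 2D time-frequency image
inputs = {"time_series": waveform, "spectrogram": spec}
```

The spectral peak lands in the frequency bin closest to 10 Hz, which is a quick sanity check that the framing and windowing are consistent with the sampling rate.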

How to cite: Chakraborty, M., Rümpker, G., Stöcker, H., Li, W., Faber, J., Fenner, D., Zhou, K., and Srivastava, N.: Real Time Magnitude Classification of Earthquake Waveforms using Deep Learning, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15941, https://doi.org/10.5194/egusphere-egu21-15941, 2021.

16:06–16:08
|
EGU21-15944
|
ECS
|
Laura Mansfield, Peer Nowack, and Apostolos Voulgarakis

In order to make predictions on how the climate would respond to changes in global and regional emissions, we typically run simulations on Global Climate Models (GCMs) with perturbed emissions or concentration fields. These simulations are highly expensive and often require the availability of high-performance computers. Machine Learning (ML) can provide an alternative approach to estimating climate response to various emissions quickly and cheaply. 

We will present a Gaussian process emulator capable of predicting the global map of temperature response to different types of emissions (both greenhouse gases and aerosol pollutants), trained on a carefully designed set of simulations from a GCM. This particular work involves making short-term predictions on 5-year timescales but can be linked to an emulator from previous work that predicts on decadal timescales. We can also examine the uncertainties associated with predictions to find out where the method could benefit from increased training data. This is a particularly useful asset when constructing emulators for complex models, such as GCMs, where obtaining training runs is costly.
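How a Gaussian process reports where more training data would help can be shown with a toy one-dimensional regression. This sketch uses a squared-exponential kernel with fixed hyperparameters and a scalar input and output. The real emulator maps emission perturbations to global temperature maps, so the kernel choice, the hyperparameter values and the synthetic data here are all assumptions.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf(x_train, x_test)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)   # clip tiny negative values from round-off

# Four hypothetical "training runs" and two query points
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_test = np.array([1.5, 6.0])           # inside vs. far outside the training range
mean, var = gp_predict(x_train, y_train, x_test)
```

The predictive variance at the second query point is close to the prior variance because no training run is nearby, which is the signal that an additional simulation in that region would be informative.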

How to cite: Mansfield, L., Nowack, P., and Voulgarakis, A.: Predicting climate model response to changing emissions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15944, https://doi.org/10.5194/egusphere-egu21-15944, 2021.

16:08–16:10
|
EGU21-16410
|
ECS
|
Highlight
|
Niklas Griessbaum, Mike Rilee, James Frew, and Kwo-Sen Kuo

When working with ungridded remote sensing data, such as swath surface reflectance like Moderate Resolution Imaging Spectroradiometer (MODIS) MOD09 or Visible Infrared Imaging Radiometer Suite (VIIRS) VNP09, extracting targeted information of interest from a collection of granules can be a challenging exercise. Given a region of interest (ROI), it is tedious both to determine the subset of granules that intersect the ROI and to identify, within the granules, the individual instantaneous fields of view (IFOVs) contained by the ROI.

The SpatioTemporal Adaptive-Resolution Encoding (STARE) is an indexing scheme that recursively divides the Earth's surface into quadtree hierarchies, allowing triangular elements ("trixels") of varying sizes (resolutions) to be identified with unique index values. STARE is also a software library that operates on STARE indices. It can efficiently determine the spatial relationship between two trixels, by evaluating their index values, if the trixels share a common path in the STARE tree structure. By representing geographical regions as sets of trixels with adaptive resolutions that tessellate them, STARE provides an elegant method to determine the geospatial coincidence of arbitrarily shaped geographic regions, with accuracy up to ~7-8 cm in length.
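The common-path idea can be sketched without the STARE library by writing each trixel as its path through the quadtree. In the sketch below a trixel is a tuple holding a root element followed by child digits 0 to 3. This representation, the function names and the example covers are hypothetical and do not reproduce STARE's actual integer encoding or API.

```python
def relation(a, b):
    """Spatial relation of trixel a to trixel b, each given as a quadtree path tuple."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return "disjoint"            # paths diverge, so the interiors do not overlap
    if len(a) == len(b):
        return "equal"
    # The shorter path is the coarser trixel and contains the finer one
    return "contains" if len(a) < len(b) else "within"

def intersects(cover_a, cover_b):
    """True if any trixel of one cover overlaps any trixel of the other."""
    return any(relation(a, b) != "disjoint" for a in cover_a for b in cover_b)

# Hypothetical covers: a region of interest and two granules
roi = [(3, 0, 2), (3, 0, 3, 1)]
granule_a = [(3, 0, 2, 1, 1)]        # finer trixel inside the first ROI trixel
granule_b = [(5, 1), (5, 2, 0)]      # different root element

hits = [name for name, cover in (("a", granule_a), ("b", granule_b))
        if intersects(roi, cover)]
```

Because the test is a prefix comparison, regions covered at different resolutions can be compared directly, without any geometric computation.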

In this presentation, we introduce STARELite, a SQLite STARE extension, and its use for cataloguing volumes of remote sensing granules that researchers often possess in their local storage. In this application, STARELite is used to determine subsets of granules intersecting arbitrary ROIs. Further, STARELite can be used for the inverse search problem: determining all granules that are spatially coincident with an individual granule. STARELite leverages other components of the STARE ecosystem, namely STARE sidecars, which hold the trixel index values of each IFOV and a set of trixels representing the cover of each granule; STAREMaster, which is used to generate STARE sidecar files; and STAREPandas, a Python Pandas extension used to bootstrap STARELite databases.

Given the limitations of SQLite, STARELite is to be understood as a proof of concept for the integration of STARE into relational databases in general. 

How to cite: Griessbaum, N., Rilee, M., Frew, J., and Kuo, K.-S.: Integrating STARE with relational databases, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16410, https://doi.org/10.5194/egusphere-egu21-16410, 2021.

16:10–17:00