Data Science and machine learning for Cryosphere and Climate 

Understanding and predicting climate variability is vital if we are to properly prepare for the impact of climate change in an increasingly warmer world, including rising sea level as a result of melting ice and iceberg discharge. Fortunately, technological developments mean that 1) our numerical models of the cryospheric and climate systems are increasingly able to capture their inherent complexity, and 2) we are able to acquire much more detailed observations of our polar regions by satellite than ever before. This also brings an important challenge, however: how can we extract the maximum possible meaning from these data while minimizing the increase in uncertainty that added volume, complexity and heterogeneity bring?

In this session we invite submissions on research that applies Data Science techniques to answer research questions in Glaciology and Polar Climate studies. This includes, but is not limited to, studies using machine learning and AI, advanced statistics (e.g. extreme value analysis or changepoint methods), surrogate modelling (emulators), network analysis and innovative software/computing solutions. These could be applied to any, or any combination of, data sources including remote sensing, numerical model output and field/ground/lab observations. We particularly welcome contributors interested in a wider discussion about Data Science and its application in Climate and Cryospheric research, and contributions that reveal new insight that would not be possible using traditional methods.

Convener: Amber Leeson | Co-conveners: Celia A. Baumhoer, James Lea, Michel Tsamados
vPICO presentations | Fri, 30 Apr, 11:00–15:00 (CEST)

Chairpersons: Amber Leeson, Celia A. Baumhoer
Using the coupled machine learning-evolutionary optimization algorithms and climate change projection models to assess the distribution of groundwater-origin aufeis in the North-East of Northern Hemisphere and their dynamic in a changing climate
Olga Makarieva, Aiding Kornejady, Andrey Shikhov, Esmaeil Silakhori, Nataliia Nesterova, Abbas Goli Jirandeh, Andrey Ostashov, Hadi Alizadeh, and Anastasiya Zemlyanskova

Zhongyang Hu, Peter Kuipers Munneke, Stef Lhermitte, Maaike Izeboud, and Michiel van den Broeke

Presently, surface melt over Antarctica is estimated using climate modeling or remote sensing. However, accurately estimating surface melt remains challenging. Both climate modeling and remote sensing have limitations, particularly in the most crucial areas with intense surface melt. The motivation of our study is to investigate the opportunities and challenges in improving the accuracy of surface melt estimation using a deep neural network. The trained deep neural network uses meteorological observations from automatic weather stations (AWS) and surface albedo observations from satellite imagery to improve surface melt simulations from the Regional Atmospheric Climate Model version 2.3p2 (RACMO2). Based on observations from three AWS at the Larsen B and C Ice Shelves, cross-validation shows a high accuracy (root mean square error = 0.898 mm w.e. d⁻¹, mean absolute error = 0.429 mm w.e. d⁻¹, and coefficient of determination = 0.958). The deep neural network also outperforms conventional machine learning models (e.g., random forest regression, XGBoost) and a shallow neural network. To compute surface melt for the entire Larsen Ice Shelf, the deep neural network is applied to RACMO2 simulations. The resulting corrected surface melt shows a better correlation with the AWS observations at AWS 14 and 17, but not at AWS 18. The spatial pattern of the surface melt is also improved compared to the original RACMO2 simulation. A possible explanation for the mismatch at AWS 18 is its complex geophysical setting. Even though our study shows an opportunity to improve surface melt simulations using a deep neural network, further study is needed to refine the method, especially for complicated, heterogeneous terrain.
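The three cross-validation scores quoted above can be computed as in the following minimal sketch. It is an illustration rather than the authors' code: the function name and the sample values are hypothetical, and the coefficient of determination is assumed to follow the common definition 1 - SS_res / SS_tot.

```python
import math


def regression_metrics(observed, predicted):
    """Return (rmse, mae, r2) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    # Total sum of squares about the observed mean (zero if observations are constant).
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical daily melt values in mm w.e. per day (not taken from the study).
observed = [0.0, 1.0, 2.0, 4.0, 3.0]
predicted = [0.5, 1.0, 1.5, 4.0, 3.0]
rmse, mae, r2 = regression_metrics(observed, predicted)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

In a cross-validation setting these scores would be computed on the held-out observations of each fold, not on the data used for training.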

How to cite: Hu, Z., Kuipers Munneke, P., Lhermitte, S., Izeboud, M., and van den Broeke, M.: Estimating Surface Melt on the Larsen Ice Shelf Using a Deep Neural Network: Opportunities and Challenges, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-908, 2021.

Connor Shiggins, James Lea, Dominik Fahrner, and Stephen Brough

High resolution digital elevation models (DEMs) allow for the detection of icebergs and their size distribution, potentially giving insights into spatial and temporal changes in calving dynamics and iceberg cover. Here we present a fully automated tool for iceberg detection in glaciated fjords, utilising timestamped ArcticDEM tile data within the Google Earth Engine cloud computing platform. The tool requires only the definition of a region of interest (ROI), after which it runs the following workflow:

1. Automatically filter timestamped ArcticDEM tiles to obtain only high-quality images with high data coverage within a ROI

2. Apply an elevation correction to account for the geoid and tidal state, ensuring that sea level is equivalent to 0 m elevation

3. Apply an iceberg detection elevation threshold (any object at or above 0.9 m)

4. Automatically delineate icebergs based on elevations above this threshold

5. Append iceberg area, volume (total, below and above surface), freeboard height, mass and the ArcticDEM acquisition date to each iceberg

This workflow allows for rapid, fully automated analysis of all available ArcticDEM tiles within a given ROI. The workflow does not require manual supervision, and its results can be easily related back to the original ArcticDEM data through Google Earth Engine. As an example, we apply our workflow to a 33 km² ROI at Nuup Kangerlua (Godthåbsfjorden), southwest Greenland, detecting a total of 57,735 icebergs from 6 images with an execution time of 19 minutes. This workflow will provide a user-friendly platform for users of any coding ability who require a large data set of icebergs with an area greater than approximately 40 m². Results obtained from these data will be utilised to identify potential seasonal to multi-annual timescale changes in calving behaviour, though this is dependent on ArcticDEM data availability.
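Steps 3 to 5 of the workflow can be sketched on a small in-memory grid as follows. This is a standalone illustration, not the Google Earth Engine implementation: the function name, the 4-connected flood fill, the 2 m pixel size and the densities (917 and 1025 kg m⁻³) are assumptions, and total volume is estimated by assuming each iceberg floats in hydrostatic equilibrium.

```python
from collections import deque


def detect_icebergs(dem, pixel_size_m=2.0, threshold_m=0.9,
                    rho_ice=917.0, rho_water=1025.0):
    """Delineate connected regions at or above threshold_m and attach attributes.

    dem is a list of rows of elevations in metres, already corrected so that
    sea level is 0 m. Densities are assumed values in kg per cubic metre.
    """
    rows, cols = len(dem), len(dem[0])
    pixel_area = pixel_size_m ** 2
    seen = [[False] * cols for _ in range(rows)]
    icebergs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or dem[r0][c0] < threshold_m:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            heights = []
            while queue:
                r, c = queue.popleft()
                heights.append(dem[r][c])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and dem[nr][nc] >= threshold_m):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            volume_above = sum(heights) * pixel_area
            # Hydrostatic equilibrium: rho_ice * V_total = rho_water * V_below.
            volume_total = volume_above * rho_water / (rho_water - rho_ice)
            icebergs.append({
                "area_m2": len(heights) * pixel_area,
                "max_freeboard_m": max(heights),
                "volume_above_m3": volume_above,
                "volume_below_m3": volume_total - volume_above,
                "volume_total_m3": volume_total,
                "mass_kg": volume_total * rho_ice,
            })
    return icebergs


# Hypothetical 4 x 5 elevation grid in metres (not real ArcticDEM data).
dem = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 1.5, 2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
]
icebergs = detect_icebergs(dem)
for berg in icebergs:
    print(berg["area_m2"], berg["max_freeboard_m"], round(berg["mass_kg"]))
```

A minimum-area filter and the acquisition date would be added in the same loop; they are omitted here to keep the sketch short.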

How to cite: Shiggins, C., Lea, J., Fahrner, D., and Brough, S.: Rapidly detecting icebergs using ArcticDEM and Google Earth Engine, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4366, 2021.

Erik Loebel, Mirko Scheinert, Julia Christmann, Konrad Heidler, Martin Horwath, and Angelika Humbert

The calving of tidewater glaciers has a strong impact on the stresses of outlet glaciers and their discharge. However, it is still underrepresented in current ice-sheet models incorporating the dynamics of marine-terminating glaciers. This has an impact on simulation results when projecting future sea-level contributions of the Greenland ice sheet. The increasing availability and quality of remote sensing imagery enable us to realize a continuous and precise mapping of relevant parameters such as calving front locations. However, the huge amount of data also accentuates the necessity for intelligent analysis strategies.

In this contribution, we apply an automated workflow to extract calving front positions from multi-spectral Landsat-8 imagery utilizing deep learning. The core of the proposed workflow comprises a convolutional neural network (CNN) for image segmentation that exploits the full range of Landsat-8 multi-spectral capabilities, a statistical textural feature analysis performed on the high-resolution panchromatic band, and topography model data. The proposed method is evaluated against an independent set of diverse test images as well as by comparison with the already available ESA-CCI, MEaSUREs and PROMICE data products. With an estimated prediction error of less than two pixels (corresponding to 60 m), automatically extracted calving front locations show very small or even indistinguishable differences from manually delineated locations. The importance of the multi-spectral, textural and topographic features used as input for the CNN is estimated by a permute-and-relearn approach, which emphasizes their benefit, especially in challenging ice-melange, cloud and illumination conditions. Together with the proposed methodology we present an exceedingly dense dataset for 20 of the most important Greenlandic outlet glaciers for the period from 2013 to 2021.

Finally, the derived calving front positions are incorporated into the Ice Sheet and Sea-Level System Model (ISSM). For this, we use a level set method, which allows a function that is continuous in time and space to be derived from discrete information at satellite acquisition time steps. As the satellite data are mainly available for fast-flowing outlet glaciers, we use simulated front positions for all remaining ice margins. An alpha-shape method seamlessly links the temporally changing calving fronts to the Greenlandic ice sheet.
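The idea of turning discrete front positions into a field that is continuous in time and space can be illustrated in one dimension along a flowline. The sketch below is a simplified assumption, not the ISSM implementation: the front is represented as the zero crossing of a signed-distance function (negative on the ice side, positive on the ocean side), the field is linearly interpolated between two acquisition dates, and all names and numbers are hypothetical.

```python
def signed_distance(x_nodes, front_x):
    """Level set along a flowline: negative on ice (x < front), positive on ocean."""
    return [x - front_x for x in x_nodes]


def interpolate_level_set(phi_a, phi_b, t_a, t_b, t):
    """Linearly interpolate the level set field between two acquisition times."""
    w = (t - t_a) / (t_b - t_a)
    return [(1.0 - w) * a + w * b for a, b in zip(phi_a, phi_b)]


def zero_crossing(x_nodes, phi):
    """Return the front position, i.e. where phi changes from <= 0 to >= 0."""
    for i in range(len(phi) - 1):
        p0, p1 = phi[i], phi[i + 1]
        if p0 <= 0.0 <= p1 and p0 != p1:
            fraction = -p0 / (p1 - p0)
            return x_nodes[i] + fraction * (x_nodes[i + 1] - x_nodes[i])
    return None  # no front inside the grid


# Hypothetical flowline grid (m) and front positions at two acquisition days.
x_nodes = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
phi_day0 = signed_distance(x_nodes, 1200.0)
phi_day16 = signed_distance(x_nodes, 800.0)

phi_day4 = interpolate_level_set(phi_day0, phi_day16, 0.0, 16.0, 4.0)
front_day4 = zero_crossing(x_nodes, phi_day4)
print(front_day4)
```

In two dimensions the same representation lets fronts change shape and topology between acquisitions, which is the main reason for preferring a level set over tracking front vertices directly.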

How to cite: Loebel, E., Scheinert, M., Christmann, J., Heidler, K., Horwath, M., and Humbert, A.: Automated extraction of calving front locations from multi-spectral satellite imagery using deep learning: methodology and application to Greenland outlet glaciers, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4528, 2021.

Fuming Xie, Shiyin Liu, Yu Zhu, Yongpeng Gao, Kunpeng Wu, and Miaomiao Qi

Heat exchange in glacierized regions is strongly affected by the interaction between solar radiation and the glacier surface, and albedo is an important index for quantitatively describing the energy balance in this interaction. Against the background of global warming, the observation and modeling of albedo are of great significance for identifying snow and ice darkening or pollution, reconstructing glacier mass balance and inverting for supraglacial debris expansion. However, insufficient observations, coupled with the low spatial resolution of satellite-derived products (250-1000 m), make it difficult to analyze spatial changes at the glacier scale. A convolutional neural network (CNN) contains one or more convolution layers, in which the inputs are neighborhoods of pixels, resulting in a network that is not fully connected. CNNs have great potential for image segmentation and are also suited to identifying spatial patterns. Therefore, in this study, a CNN model, U-Net, was trained to improve the spatial resolution of albedo products. In the U-Net, we took the shortwave black-sky albedo derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra/Aqua satellites, with a spatial resolution of 500 m, as the response variable, and raw spectral information, band ratios and color-to-grayscale conversion from Landsat 8 optical satellite imagery, together with topographical components derived from SRTM DEM products, as feature variables. The predicted albedo has been validated using observations from a radiometer mounted on an automatic weather station at Yazgil glacier in Hunza valley, Karakoram. The results show that the accuracy of the U-Net predicted albedo (RMSE = 0.071) is similar to that of the MODIS albedo (RMSE = 0.074), which suggests that U-Net has great application potential.
The high spatial resolution albedo estimated by the model enhances its use in the analysis of spatial changes at the glacier scale, especially for small glaciers, but the optimization of its temporal resolution needs further study.

How to cite: Xie, F., Liu, S., Zhu, Y., Gao, Y., Wu, K., and Qi, M.: Spatial downscaling method of glacier surface albedo based on deep learning, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4625, 2021.

Jeremy Carter, Amber Leeson, Andrew Orr, Christoph Kittel, and Melchior van Wessem

Understanding the surface climatology of the Antarctic ice sheet is essential if we are to adequately predict its response to future climate change. This includes both primary impacts such as increased ice melting and secondary impacts such as ice shelf collapse events. Given the size and inhospitable environment of Antarctica, weather stations there are sparse. Thus, we rely on regional climate models to 1) develop our understanding of how the climate of Antarctica varies in both time and space and 2) provide data to use as context for remote sensing studies and forcing for dynamical process models. Given that there are a number of different regional climate models available that explicitly simulate Antarctic climate, understanding inter- and intra-model variability is important.

Here, inter- and intra-model variability in Antarctic-wide regional climate model output is assessed for snowfall, rainfall, snowmelt and near-surface air temperature within a cloud-based virtual lab framework. State-of-the-art regional climate model runs from the Antarctic-CORDEX project using the RACMO, MAR and MetUM models are used, together with the ERA5 and ERA-Interim reanalysis products. Multiple simulations using the same model and domain boundary but run at either different spatial resolutions or with different driving data are used. Traditional analysis techniques are exploited, and the potential added value of more modern and involved methods, such as Gaussian Processes, is investigated. The advantages of using a virtual lab in a cloud-based environment for increasing transparency and reproducibility are demonstrated, with a view to ultimately making the code and methods used widely available to other research groups.

How to cite: Carter, J., Leeson, A., Orr, A., Kittel, C., and van Wessem, M.: Regional Climate Model Inter-Comparison for Antarctica within a Data Science Framework, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4892, 2021.

Marion Leduc-Leballeur, Catherine Ritz, Giovanni Macelloni, and Ghislain Picard

The actual temperature profile is a determinant of ice rheology, which controls ice deformation and flow, and sliding over the underlying bedrock. Importantly, the ice flow in turn affects its temperature profile through strain heating, which makes observed temperature profiles a powerful input for ice sheet model validation.

Until now, temperature profiles have been available only from a few boreholes or from glaciological models. Recently, Macelloni et al. (2016) opened up new opportunities for probing ice temperature from space with low-frequency passive sensors. Indeed, at L-band frequency, the very low absorption of ice and the low scattering by particles (grain size, bubbles in ice) allow a large penetration into dry snow and ice (several hundreds of meters). Macelloni et al. (2019) performed the first retrieval of the ice sheet temperature in Antarctica by using the European Space Agency (ESA)’s Soil Moisture and Ocean Salinity (SMOS) L-band observations. They minimized the difference between SMOS brightness temperatures and simulations from a microwave emission model that includes a glaciological model.

Here, in the framework of the ESA 4D-Antarctica project, we propose a new method based on a Bayesian approach in order to improve the accuracy of the retrieved ice temperature and to provide an uncertainty estimate along the profiles. As a first step, a one-dimensional ice temperature profile model (Robin 1955) is used, which limits the retrieval to the Antarctic Plateau. Then, the new temperature emulator based on the three-dimensional glaciological model GRISLI (Quiquet et al., 2018) will be used to enable retrievals over the entire continent (cf. Ritz’s presentation in this session for the GRISLI emulator description).

The Bayesian inference takes as free parameters: ice thickness, surface ice temperature, snow accumulation and geothermal heat flux (GHF). Their prior probability distribution is defined as normal, centered around a priori values taken from literature, and truncated to stay in a realistic range. The observed brightness temperature distribution is normal and a normal likelihood function is used to quantify the matching between the observed and simulated brightness temperature. The parameter space investigation is achieved through a Markov Chain Monte Carlo (MCMC) method. Here, the differential evolution adaptive Metropolis (DREAM) algorithm is used, which runs multiple different Markov chains in parallel and uses a discrete proposal distribution to evolve the sampler to the posterior distribution (Laloy and Vrugt, 2012).

For each SMOS brightness temperature observation, 1000 iterations are run on each of 5 parallel chains (5000 iterations in total). The first 2500 iterations are discarded as burn-in and only the last 2500 are used for the final ice temperature profile estimation. The posterior probability distribution captures the most likely parameter set (i.e. a surface temperature, snow accumulation and GHF combination), and so the most likely ice temperature profiles associated with this SMOS observation. It also provides the standard deviation, which is an accurate estimate of the temperature uncertainty along the depth obtained with the method.
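The structure of this inference (truncated normal prior, normal likelihood, several chains, first half discarded as burn-in) can be sketched with a plain random-walk Metropolis sampler. This is a deliberately simplified stand-in and not the DREAM algorithm used in the study: it samples a single parameter, the forward model is a hypothetical linear toy rather than a microwave emission model, and every number below is an illustrative assumption.

```python
import math
import random


def toy_forward_model(surface_temp_k):
    """Hypothetical linear stand-in for the emission model (assumption)."""
    return 0.9 * surface_temp_k + 10.0


def log_prior(theta, mean, sd, lower, upper):
    """Normal prior truncated to [lower, upper], up to an additive constant."""
    if not lower <= theta <= upper:
        return -math.inf
    return -0.5 * ((theta - mean) / sd) ** 2


def log_likelihood(theta, observed_tb, sigma_tb):
    """Normal likelihood of the observed brightness temperature."""
    return -0.5 * ((observed_tb - toy_forward_model(theta)) / sigma_tb) ** 2


def log_posterior(theta):
    # Illustrative values: prior 225 +/- 5 K in [210, 240], observation 217 +/- 1 K.
    return (log_prior(theta, 225.0, 5.0, 210.0, 240.0)
            + log_likelihood(theta, 217.0, 1.0))


def metropolis_chain(log_post, start, step_sd, n_iter, rng):
    """Random-walk Metropolis; start must lie inside the prior bounds."""
    samples = []
    current, current_lp = start, log_post(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_post(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)
n_chains, n_iter = 5, 1000
chains = [metropolis_chain(log_posterior, 225.0, 1.0, n_iter, rng)
          for _ in range(n_chains)]

# Discard the first half of every chain as burn-in: 2500 of 5000 samples remain.
kept = [s for chain in chains for s in chain[n_iter // 2:]]
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in kept) / (len(kept) - 1))
print(f"posterior mean = {post_mean:.2f} K, sd = {post_sd:.2f} K")
```

Unlike this sketch, DREAM lets the parallel chains inform each other's proposals, which matters when the posterior is multi-dimensional and correlated, as it is for thickness, surface temperature, accumulation and GHF together.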

How to cite: Leduc-Leballeur, M., Ritz, C., Macelloni, G., and Picard, G.: A Bayesian approach to infer ice sheet temperature in Antarctica from satellite observations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5000, 2021.

David Ashmore, Douglas Mair, Jonathan Higham, Stephen Brough, James Lea, and Isabel Nias

The increasing volume and spatio-temporal resolution of satellite-derived ice velocity data has created new exploratory opportunities for the quantitative analysis of glacier dynamics. One potential technique, Proper Orthogonal Decomposition (POD), also known as Empirical Orthogonal Functions, has proven to be a powerful and flexible technique for revealing coherent structures in a wide variety of environmental flows: mapping hydraulic vortex shedding patterns, the dynamics of fluidised granular beds, and the magnetohydrodynamics of sunspots.

POD exactly describes a series of snapshots from a flow field with the product of ranked spatially orthogonal Eigenfunctions, or “modes” of spatial weighting, and one-dimensional “temporal” coefficients (Eigenvectors). In many cases the variance of the flow field is well described by just a few dominant modes. The orthogonal nature of each mode, by definition, means that the relative contribution of independent forcing mechanisms on the flow can, in theory, be separated.

In this study we investigate the applicability of POD to freely available TanDEM-X/TerraSAR-X derived ice velocity datasets of Sermeq Kujalleq (Jakobshavn Glacier), Greenland. We outline the POD procedure using the singular value decomposition of a rearranged and resampled velocity matrix and investigate the factors responsible for the dominant modes. We find dominant modes interpreted as relating to the stress-reconfiguration at the glacier terminus and the development of the glacier hydrological system, but also find that the POD is sensitive to data resampling and quality. With the proliferation of publicly available optical and radar derived velocity products (e.g. MEaSUREs/ESA CCI) we suggest POD, and potentially other modal decomposition techniques, will become increasingly useful in future studies of ice dynamics.
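The decomposition described above can be sketched without external libraries for the dominant mode only. The abstract uses a singular value decomposition; this illustration swaps in power iteration on the temporal covariance matrix (the "method of snapshots"), which yields the same leading mode. The data layout (rows are snapshots in time, columns are spatial points), the removal of the temporal mean and the synthetic velocities are assumptions.

```python
import math


def dominant_pod_mode(snapshots, n_iter=200):
    """Return (spatial_mode, temporal_coefficients, variance_fraction).

    snapshots: list of rows, one row per time step, one column per spatial point.
    """
    n_t, n_x = len(snapshots), len(snapshots[0])
    # Remove the temporal mean at every spatial point.
    mean = [sum(row[k] for row in snapshots) / n_t for k in range(n_x)]
    x = [[row[k] - mean[k] for k in range(n_x)] for row in snapshots]
    # Temporal covariance matrix C = X X^T (n_t by n_t).
    c = [[sum(x[i][k] * x[j][k] for k in range(n_x)) for j in range(n_t)]
         for i in range(n_t)]
    total_variance = sum(c[i][i] for i in range(n_t))
    # Power iteration from a generic, non-constant start vector.
    v = [float(i + 1) for i in range(n_t)]
    for _ in range(n_iter):
        w = [sum(c[i][j] * v[j] for j in range(n_t)) for i in range(n_t)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            raise ValueError("snapshots have no variance about their mean")
        v = [wi / norm for wi in w]
    eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(n_t))
                     for i in range(n_t))
    sigma = math.sqrt(eigenvalue)
    # Project the data onto the temporal coefficients to get the spatial mode.
    mode = [sum(x[t][k] * v[t] for t in range(n_t)) / sigma for k in range(n_x)]
    return mode, v, eigenvalue / total_variance


# Synthetic velocities: a fixed mean plus one spatial pattern scaled in time.
pattern = [1.0, 2.0, 2.0]
mean_velocity = [10.0, 20.0, 30.0]
amplitudes = [-1.0, 0.0, 1.0, 0.0]
snapshots = [[m + a * p for m, p in zip(mean_velocity, pattern)]
             for a in amplitudes]

mode, coefficients, variance_fraction = dominant_pod_mode(snapshots)
print([round(abs(value), 3) for value in mode], round(variance_fraction, 3))
```

With real velocity fields, a full SVD (for example `numpy.linalg.svd` applied to the demeaned snapshot matrix) returns all modes at once, and the squared singular values give each mode's share of the variance.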

How to cite: Ashmore, D., Mair, D., Higham, J., Brough, S., Lea, J., and Nias, I.: Eigen-glaciers: elucidating hidden features in the flow of Sermeq Kujalleq (Jakobshavn Glacier), Greenland., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5151, 2021.

Celia A. Baumhoer, Andreas Dietz, Mariel Dirscherl, and Claudia Kuenzer

Antarctica’s coastline is constantly changing as glacier and ice shelf fronts move. The extent of glaciers and ice shelves influences the ice discharge and sea level contribution of the Antarctic Ice Sheet. Therefore, it is crucial to assess where ice shelf areas with strong buttressing forces are lost. So far, those changes have not been assessed for the whole of Antarctica within comparable time frames.

We present a framework for circum-Antarctic coastline extraction based on a U-Net architecture. Antarctic coastal change is calculated by using a deep learning derived coastline for the year 2018 in combination with earlier manually derived coastlines of 1997 and 2009. For the first time, this allows circum-Antarctic changes in glacier and ice shelf front position to be compared over the last two decades. We found that the Antarctic Ice Sheet decreased in extent by 29,618 ± 1,193 km² between 1997 and 2008 and gained an area of 7,108 ± 1,029 km² between 2009 and 2018. Retreat dominated for the Antarctic Peninsula and West Antarctica, and advance for the East Antarctic Ice Sheet, over the entire investigation period. The only exception in East Antarctica was Wilkes Land, which experienced simultaneous calving front retreat of several glaciers between 2009 and 2018. The largest tabular iceberg calving events occurred at the Ronne and Ross Ice Shelves within their natural calving cycle between 1997 and 2008. Future work includes the continuous mapping of Antarctica’s coastal change on a more frequent temporal scale.

How to cite: Baumhoer, C. A., Dietz, A., Dirscherl, M., and Kuenzer, C.: Two decades of Antarctic coastal-change revealed by satellite imagery and deep learning, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5344, 2021.

Catherine Ritz, Christophe Dumas, Marion Leduc-Leballeur, Giovanni Macelloni, Ghislain Picard, and Aurélien Quiquet

Temperature within the ice is a crucial characteristic for understanding the evolution of the Antarctic ice sheet because temperature is coupled to ice flow. Since temperature is measured at only a few locations in deep boreholes, we must rely on numerical modelling to assess ice sheet-wide temperature. However, the design of such models poses a number of challenges. One important difficulty is that the temperature field strongly depends on the geothermal flux, which is still poorly known (see the White paper by Burton-Johnson and others, 2020). Another point is that up to now there is no fully suitable model, especially for inverse approaches: i) analytical solutions are only valid in slowly flowing regions; ii) models solving only the heat equation by prescribing geometry and ice flow do not take into account the past changes in ice thickness and ice flow and do not couple ice flow and temperature. Conversely, 3D thermomechanical models that simulate the evolution of the ice sheet take into account all the relevant processes but are too computationally expensive to be used in inverse approaches. Moreover, they do not provide a perfect fit between observed and simulated geometry (ice thickness, surface elevation) for the present-day ice sheets, and this affects the simulated temperature field.

GRISLI (Quiquet et al. 2018) belongs to this family of thermomechanically coupled ice sheet models. An emulator based on a deep neural network (DNN) has been developed in order to speed up the simulation of present-day ice temperature. We use GRISLI outputs from 4 simulations, each covering 900,000 years (8 glacial-interglacial cycles) to remove the influence of the initial configuration. The simulations differ in the geothermal flux map used as boundary condition. Finally, a database is built in which each ice column of each simulation is a sample used to train the DNN. For each sample, the input layer (precursor) is a vector of the present-day characteristics: ice thickness, surface temperature, geothermal flux, accumulation rate, surface velocity and surface slope. The predicted output (output layer) is the vertical profile of temperature. In the training, the weights of the network are optimized by comparison with the GRISLI temperature.

The first results are very encouraging, with an RMSE of ~0.6 °C (calculated from the difference between the emulated temperatures and GRISLI temperatures over all the samples and all the depths). Once trained, the computational time of GRISLI-DNN for generating the temperature field of the whole of Antarctica (16,000 columns) is about 20 s.

The first application (in the framework of the ESA project 4D-Antarctica, see Leduc-Leballeur presentation in this session) will be to use this emulator associated with SMOS satellite observations to infer the 3D temperature field and improve our knowledge of geothermal flux. Indeed, it has been shown that SMOS data, coupled with glaciological and electromagnetic models, give an indication of temperature in the upper 1000 m of the ice sheet. Our emulator could also be used for initialization of computationally expensive ice sheet models.

How to cite: Ritz, C., Dumas, C., Leduc-Leballeur, M., Macelloni, G., Picard, G., and Quiquet, A.: Developing an emulator to calculate present temperature field in the Antarctic Ice Sheet, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-6029, 2021.

Diarmuid Corr, Amber Leeson, Malcolm McMillan, and Ce Zhang

Mass loss from the Greenland and Antarctic ice sheets is predicted to be the dominant contribution to global sea level rise in coming years. Supraglacial lakes and channels are thought to play a significant role in ice sheet mass balance by causing the speed-up of grounded ice and by weakening floating ice shelves to the point of collapse. Identifying the location, distribution and life cycle of these hydrological features on both the Greenland and Antarctic ice sheets is therefore important in understanding their present and future contribution to global sea level rise. Supraglacial hydrological features can be easily identified by eye in optical satellite imagery. However, given that there are many thousands of these features, and that they appear in many hundreds of satellite images, automated approaches to mapping these features in such images are urgently needed.


Current automated approaches to mapping supraglacial hydrology tend to have high false positive and false negative rates, and are often followed by manual corrections and quality control processes. Given the scale of the data, however, methods that require manual post-processing are not feasible for repeat monitoring of surface hydrology at continental scale. Here, we present initial results from our work conducted as part of the 4D Greenland and 4D Antarctica projects, which increases the accuracy of supraglacial lake and channel delineation using Sentinel-2 and Landsat-7/8 imagery, while reducing the need for manual intervention. We use Machine Learning approaches including a Random Forest algorithm trained to recognise water, ice, cloud, rock, shadow, blue-ice and crevassed regions. Both labelled optical imagery and auxiliary data (e.g. digital elevation models) are used in our approach. Our methods are trained and validated using data covering a range of glaciological and climatological conditions, including images of both ice sheets and images acquired at different points during the melt season. The workflow, developed under Google Cloud Platform, which hosts the entire archive of Sentinel-2 and Landsat-8 data, allows for large-scale application over the Greenland and Antarctic ice sheets, and is intended for repeated use throughout future melt seasons.

How to cite: Corr, D., Leeson, A., McMillan, M., and Zhang, C.: Automated mapping of supraglacial hydrology using Machine Learning, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7866, 2021.

James Lea, Connor Shiggins, Stephen Brough, Stephen Livingstone, and Robert McNabb

ArcticDEM data products include timestamped high spatial resolution (2 and 10 m) digital elevation models (DEMs) covering the period 2009-2017, offering the potential for monitoring ice surface change, structural evolution, and geomorphological and proglacial change. However, their varying quality, spatial and temporal data coverage, large file size and requirement for co-registration pose challenges to user accessibility and interrogation of these datasets. Inclusion of these data in the cloud computing based Google Earth Engine (GEE) platform provides opportunities for rapid analysis, though it poses its own barriers to access through the need for familiarity with either JavaScript or Python coding environments. Here we present tools that allow ArcticDEM data to be rapidly queried by users with no coding background through an intuitive graphical user interface, with the aim of improving the accessibility of these datasets for the glacial and earth surface process communities.


The tools are intended to provide a means for users to perform basic data extraction from available DEMs of a given area. These include the extraction of elevation changes occurring along user-defined transects, and simple DEM differencing of areas of interest. As part of data pre-processing in GEE, tiles are co-registered using dX, dY and dZ corrections provided within the ArcticDEM metadata, while areas of poor data quality are automatically detected and masked out. A full range of metadata associated with each DEM is also appended to outputs, which will allow users to undertake post-processing of results where needed. While provisional results indicate that the tools perform well, due to inaccuracies in co-registration metadata they are not yet suitable for applications where high levels of precision are required (e.g. snow depth) or in areas of very steep terrain (e.g. rock face changes). We hope to address these issues in the future, though it should be noted that such modifications are likely to significantly increase computation time.
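The differencing and transect extraction described above can be sketched on small in-memory grids as follows. This is not the GEE tool: it applies only a vertical (dZ) offset and ignores the horizontal dX and dY shifts, which would require resampling; the sign convention (corrected = raw + dz), the use of `None` for masked cells and all values are assumptions for illustration.

```python
def apply_vertical_offset(dem, dz):
    """Add a vertical co-registration offset, leaving masked cells (None) masked."""
    return [[None if z is None else z + dz for z in row] for row in dem]


def difference_dems(dem_late, dem_early):
    """Elevation change (late minus early); masked wherever either input is masked."""
    return [
        [None if late is None or early is None else late - early
         for late, early in zip(row_late, row_early)]
        for row_late, row_early in zip(dem_late, dem_early)
    ]


def sample_transect(grid, cells):
    """Extract values along a transect given as (row, column) index pairs."""
    return [grid[r][c] for r, c in cells]


# Hypothetical 2 x 2 tiles in metres; None marks a cell masked for poor quality.
dem_early = [[100.0, 101.0], [102.0, None]]
dem_late = [[98.0, 100.0], [101.0, 99.0]]

early_corrected = apply_vertical_offset(dem_early, dz=0.5)
late_corrected = apply_vertical_offset(dem_late, dz=-0.5)
change = difference_dems(late_corrected, early_corrected)
transect_change = sample_transect(change, [(0, 0), (1, 0), (1, 1)])
print(change, transect_change)
```

Propagating the mask through the difference, as done here, keeps poor-quality cells from appearing as spurious elevation change in the output.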

How to cite: Lea, J., Shiggins, C., Brough, S., Livingstone, S., and McNabb, R.: ArcticDEM in Google Earth Engine: tools for rapid analysis of multi-temporal data covering glacial environments, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7958, 2021.

Qiao Li, James Lea, and Stephen Brough

Supraglacial lakes (SGLs) are a major component of Greenland’s surface hydrology and mass balance. Monitoring their evolution at multi-day to sub-daily timescales has traditionally been performed by relatively low-resolution sensors such as MODIS Terra, though opportunities exist for using higher spatial resolution sensors at high latitudes.

In this study, we take advantage of frequent orbital crossovers of Sentinel 2 and Landsat 8 imagery at high latitudes to monitor lakes at multi-day to sub-daily temporal resolution, and at spatial resolutions up to or more than an order of magnitude higher than MODIS Terra (10 m to 30 m, compared to ~250 m for MODIS Terra). By leveraging the cloud computing resources of Google Earth Engine (GEE), we have developed a workflow to track the evolution of lakes for all available Sentinel 2 and Landsat 8 images over a melt season.

Our workflow builds on the approach of Moussavi et al. (2020) that was developed for Antarctica, implementing it within GEE to explore its sensitivity and suitability for application to the catchment of the North East Greenland Ice Stream (NEGIS) for the 2019 melt season. To improve the efficiency of analysis, we analyse 282 large lakes (>0.125 km^2) that were previously identified through analysis of MODIS Terra imagery. All lake outlines are appended with image ID and lake area metadata to facilitate subsequent analysis, and allow each lake outline to be traced back to the original image that it was derived from. Our approach is able to monitor lake growth and drainage at unprecedented spatial and temporal resolutions over a large area, allowing the widespread characterization of seasonal lake evolution.

How to cite: Li, Q., Lea, J., and Brough, S.: Identifying multi-day to sub-daily supraglacial lake change in Greenland from Sentinel 2 and Landsat 8 imagery using Google Earth Engine, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7965, 2021.

Michael Hollaway, Peter Henrys, Rebecca Killick, Amber Leeson, and John Watkins

Numerical models are essential tools for understanding the complex and dynamic nature of the natural environment and how it will respond to a changing climate. With ever increasing volumes of environmental data and increased availability of high powered computing, these models are becoming more complex and detailed in nature. Therefore the ability of these models to represent reality is critical in their use and future development. This has presented a number of challenges, including providing research platforms for collaborating scientists to explore big data, develop and share new methods, and communicate their results to stakeholders and decision makers. This work presents an example of a cloud-based research platform known as DataLabs and how it can be used to simplify access to advanced statistical methods (in this case changepoint analysis) for environmental science applications.

A combination of changepoint analysis and fuzzy logic is used to assess the ability of numerical models to capture local-scale temporal events seen in observations. The fuzzy-union-based metric factors in the uncertainty of the changepoint location to calculate individual similarity scores between the numerical model and reality for each changepoint in the observed record. The application of the method is demonstrated through a case study on a high-resolution model dataset, which captured observed changepoints in temperature records over Greenland with varying degrees of success. The case study is presented using the DataLabs framework, demonstrating how the method can be shared with other users of the platform and the results visualised and communicated to users with different areas of expertise.
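One way such a per-changepoint score could be computed is sketched below. This is an illustration, not the authors' metric: the triangular membership function, the `half_width` tolerance (in time steps) and the function names are all assumptions, and the fuzzy union is taken here as the maximum membership over all model changepoints.

```python
def triangular_membership(t, centre, half_width):
    """Membership in [0, 1]: 1 at the changepoint, falling linearly to 0
    at a distance of half_width time steps (assumed uncertainty shape)."""
    return max(0.0, 1.0 - abs(t - centre) / half_width)


def similarity_scores(observed_cps, model_cps, half_width):
    """One score per observed changepoint.

    For each observed changepoint, take the fuzzy union (maximum) of the
    memberships of all model changepoints in its uncertainty window.
    """
    scores = []
    for obs in observed_cps:
        memberships = (
            triangular_membership(m, obs, half_width) for m in model_cps
        )
        scores.append(max(memberships, default=0.0))
    return scores


# Hypothetical changepoint locations (time-step indices), for illustration only.
print(similarity_scores([100, 250, 400], [104, 260, 700], half_width=20))
```

A score near 1 means the model places a changepoint close to the observed one; a score of 0 means no model changepoint falls within the tolerance window.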

How to cite: Hollaway, M., Henrys, P., Killick, R., Leeson, A., and Watkins, J.: Sharing transferable methods in environmental data science: A Fuzzy changepoint approach to numerical model evaluation over Greenland, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8813, 2021.

Molly Wieringa and Cecilia Bitz

Current sea ice prediction systems exhibit significant room for improvement compared to idealized estimates of sea ice predictability, a gap that could be closed by improving the initial conditions provided to prognostic models. Sea ice volume, the area-weighted integral of sea ice thickness (SIT), in particular, demonstrates long initial value predictability; in other words, accurate forecasting of Arctic sea ice requires highly accurate SIT initial conditions. Continuous records of SIT are, unfortunately, few and far between. To address this conundrum, we have explored applications of the Data Assimilation Research Testbed (DART) to constrain the Los Alamos Sea Ice Model (CICE5) within the Community Earth System Model using satellite-derived SIT observations from 2003 to present day. Our data assimilation system has been fine-tuned using new and highly accurate freeboard measurements from NASA’s ICESat-2 mission. Using SIT information alone, we generate two assimilation products: the first using DART with CICE5 and the second with an offline assimilation method. We compare these products to one another and to the community standard SIT record, PIOMAS. Future work will introduce multivariate assimilation of SIT with other sea ice variables, including sea ice concentration, sea ice skin temperature, and sea surface temperature.

How to cite: Wieringa, M. and Bitz, C.: A data assimilation application for improving estimates of Arctic sea ice thickness variability and change since the turn of the 21st century, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8952, 2021.

Chairpersons: James Lea, Michel Tsamados
Veronica Tollenaar, Harry Zekollari, Stef Lhermitte, David Tax, Vinciane Debaille, Steven Goderis, Philippe Claeys, and Frank Pattyn

Meteorites provide an unparalleled view on the origin and evolution of the solar system. Antarctica is the most productive region for collecting meteorites, as the visually contrasting meteorites are easily detectable and tend to concentrate at specific areas exposing blue ice. Blue ice areas act as meteorite stranding zones if the flow of the ice sheet and specific geographical and climatological settings combine favorably. Previously, possible meteorite stranding zones were identified by chance or through visual examination of remote sensing data, which limits the discovery of new locations for future meteorite searching campaigns.

In this study, various state-of-the-art datasets are combined in a machine learning approach to estimate the likelihood that a blue ice area is a meteorite stranding zone. Input data for a generative classifier consist of ca. 13,000 reprojected meteorite finding locations (positive observations) and 2,000,000 unlabeled observations, for which the presence of meteorites is unknown. Four features have been selected, representing the typical conditions in which meteorites are found: exposure of blue ice (radar backscatter), cold surface conditions and negative surface mass balance (surface temperature and surface slope), and almost stagnant ice flow (surface velocities). With these features, the probability of the presence of meteorites is computed for each unlabeled observation at blue ice areas. These probabilities are computed by evaluating the multidimensional density distributions of the observations on the unlabeled observations and combining these with the prior probabilities of the two classes (positive and unlabeled). As the training data contain only positive and unlabeled observations, the prior probabilities are scaled. The amount of scaling is decided by maximizing the harmonic mean of precision and sensitivity, which are estimated in a cross-validation using negative observations from sites known to lack meteorites. In the post-processing, the pixels that likely contain meteorites are clustered, resulting in several hundred meteorite stranding zones.
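The choice of the scaling factor can be pictured with the minimal sketch below. It assumes, purely for illustration, that each validation site is summarised by a single ratio (density times prior for the positive class over the same product for the unlabeled class) and that a site is predicted positive when the scaled ratio exceeds 1. The function names, the candidate scales and the numbers are hypothetical and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity (recall)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return 2 * precision * sensitivity / (precision + sensitivity)


def best_prior_scale(pos_ratios, neg_ratios, scales):
    """Return (scale, f1) for the candidate scale with the highest F1.

    pos_ratios: ratios at validation sites known to hold meteorites.
    neg_ratios: ratios at validation sites known to lack meteorites.
    """
    best = None
    for s in scales:
        tp = sum(1 for r in pos_ratios if s * r > 1)
        fn = len(pos_ratios) - tp
        fp = sum(1 for r in neg_ratios if s * r > 1)
        score = f1_score(tp, fp, fn)
        if best is None or score > best[1]:
            best = (s, score)
    return best


# Invented validation ratios, for illustration only.
positives = [0.5, 0.8, 1.5, 2.0]
negatives = [0.1, 0.2, 0.6, 0.9]
print(best_prior_scale(positives, negatives, scales=[1, 1.5, 3, 12]))
```

A small scale misses true stranding zones (low sensitivity), while a large one flags sites without meteorites (low precision); the harmonic mean balances the two.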

Results show that the first continent-wide meteorite stranding zone classification is ca. 70-80% accurate (first estimate, based on independent test data). The post-processed results reveal the existence of major unexplored meteorite stranding zones, some of which are in close proximity to existing research stations. The quest to collect the meteorites remaining at the surface of the ice sheet, the number of which is estimated to exceed those already collected to date, will greatly benefit from our newly provided meteorite map.

How to cite: Tollenaar, V., Zekollari, H., Lhermitte, S., Tax, D., Debaille, V., Goderis, S., Claeys, P., and Pattyn, F.: A data-driven approach in the search for Antarctic meteorites, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9530, 2021.

Alexis Caro, Fernando Gimeno, Antoine Rabatel, Thomas Condom, and Jean Carlos Ruiz

This study presents a glacier clustering for the Chilean Andes (17.6-55.4°S) performed with the Partitioning Around Medoids (PAM) algorithm using topographic and climatic variables over the 1980-2019 period. We classified ~24,000 glaciers into thirteen clusters (C1 to C13). These clusters show specific conditions in terms of annual and monthly amounts of precipitation, temperature, and solar radiation. In the northern part of Chile, the Dry Andes (17-36°S) comprise five clusters (C1-C5) that display mean annual precipitation and temperature differences of up to 400 mm/yr and 8°C, respectively, and a mean elevation difference reaching 1800 m between glaciers in clusters C1 and C5. In the Wet Andes (36-56°S), the largest differences were observed at the Southern Patagonia Icefield (50°S), with mean annual precipitation above 3700 mm/yr (C12, maritime conditions) and below 1000 mm/yr in the east of the Southern Patagonia Icefield (C10), a difference in mean annual temperature of nearly 4°C, and a mean elevation contrast of 500 m.
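For readers unfamiliar with PAM, a compact sketch of its SWAP phase follows. It is not the study's implementation: the initial medoids are simply the first k points rather than the BUILD step of the full algorithm, the two features (elevation in km, precipitation in m/yr) and their values are invented, and real inputs would normally be standardised first.

```python
import math


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k, dist=math.dist):
    """Greedy SWAP phase of Partitioning Around Medoids.

    Starts from the first k points (a naive stand-in for BUILD) and keeps
    swapping a medoid with a non-medoid while the total cost decreases.
    Returns the medoid indices and a cluster label for each point.
    """
    medoids = list(range(k))
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                trial_cost = total_cost(points, trial, dist)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Invented glaciers: (mean elevation in km, precipitation in m/yr).
glaciers = [(5.0, 0.2), (5.2, 0.3), (4.9, 0.25),
            (1.0, 3.5), (1.4, 3.5), (0.8, 3.5)]
print(pam_swap(glaciers, k=2))
```

Unlike k-means, each cluster centre is an actual glacier (the medoid), which makes the clusters easier to interpret and less sensitive to outliers.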

This classification confirms that Chilean glaciers cannot be grouped by latitude alone, as has commonly been assumed, and hence contributes to a better understanding of recent glacier volume changes at regional and watershed scales. An example of this was observed in the Maipo watershed (33°S), where the Echaurren Norte glacier is located. This is the reference glacier for Chile and the WGMS because it has the oldest time series of mass balance monitoring in the Andes, followed by the Piloto Este glacier, since the 1970s. Indeed, we identified that the Echaurren Norte glacier only has similarities with 5% of the glacierized surface area of the Maipo watershed. It lies within a glacier cluster that presents warmer and wetter climate conditions (3.1°C, 574 mm/yr) than the watershed average, a cluster that also contains 68% of the glacierized surface composed of rock glaciers.

How to cite: Caro, A., Gimeno, F., Rabatel, A., Condom, T., and Ruiz, J. C.: Glacier Clusters identification across Chilean Andes using Topo-Climatic variables, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10852, 2021.

Nora Gourmelon, Thorsten Seehaus, AmirAbbas Davari, Matthias Braun, Andreas Maier, and Vincent Christlein

The calving fronts of lake or marine terminating glaciers provide information about the state of glaciers. A change in front position can affect the flow of the entire glacier system, and the loss of ice mass as icebergs calve off and discharge into the ocean has a multi-scale impact on the global hydrological cycle. The calving fronts can be manually delineated in Synthetic Aperture Radar (SAR) images. However, this is a time-consuming, tedious and expensive task. As deep learning approaches have achieved tremendous success in various disciplines, such as medical image processing and computer vision, the project Tapping the Potential of Earth Observation (TAPE) is dedicated, among other things, to applying deep learning techniques to calving front detection. So far, all our experiments have employed U-Net based architectures, as the U-Net is state-of-the-art in semantic image segmentation. A major challenge of front detection is the class imbalance: the front has significantly fewer pixels than the remaining parts of the SAR image. Hence, we developed variants of the U-Net specifically addressing this challenge, including an Attention U-Net, a probabilistic Bayesian U-Net, and a U-Net with a distance map-based binary cross-entropy (BCE) loss function and the Matthews correlation coefficient (MCC) as early stopping criterion. In future work, we plan to investigate multi-task learning and a segmentation of the SAR image into different classes (i.e. ocean, glacier and rocks) to enhance the quality and efficiency of the front detection.
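The short sketch below illustrates why the MCC is a sensible monitoring metric under class imbalance, using the standard definition of the coefficient. The pixel labels are invented (1 = front, 0 = background) and the helper names are not taken from the TAPE code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """MCC in [-1, 1]; returns 0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# 20 pixels, only 2 of which belong to the front.
truth = [0] * 18 + [1] * 2
all_background = [0] * 20
print(accuracy(truth, all_background), matthews_corrcoef(truth, all_background))
```

A network that predicts "background" everywhere reaches high accuracy on such data but an MCC of zero, so stopping on MCC avoids rewarding that degenerate solution.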

How to cite: Gourmelon, N., Seehaus, T., Davari, A., Braun, M., Maier, A., and Christlein, V.: Tapping the Potential of Earth Observation - Calving Front Detection in SAR Images using Deep Learning Techniques, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11280, 2021.

William Gregory, Isobel Lawrence, and Michel Tsamados

Observations of sea ice freeboard from satellite radar altimeters are crucial in the derivation of sea ice thickness estimates, which in turn inform sea ice forecasts, volume budgets, and productivity rates. The current spatio-temporal resolution of radar freeboard is limited, as 30 days are required to generate pan-Arctic coverage from CryoSat-2, or 27 days from the Sentinel-3 satellites. This hinders our ability to understand the physical processes that drive sea ice thickness variability on sub-monthly time scales. In this study we exploit the consistency between CryoSat-2, Sentinel-3A and Sentinel-3B radar freeboards to produce daily gridded pan-Arctic freeboard estimates between December 2018 and April 2019. We use the Bayesian inference approach of Gaussian Process Regression to learn functional mappings between radar freeboard observations in space and time, and subsequently to retrieve pan-Arctic freeboard, as well as uncertainty estimates. The estimated daily fields are, on average across the 2018-2019 season, equivalent to CryoSat-2 and Sentinel-3 freeboards to within 2 mm, and cross-validation experiments show that errors in predictions are, on average, within 3 mm across the same period. This method offers a robust framework which can be used to model a wide range of statistical problems, from interpolation of altimetry data sets to time series forecasting.
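The core of Gaussian Process Regression, a predictive mean together with a predictive variance, can be sketched in a few lines. The example below is a one-dimensional toy under stated assumptions: a zero-mean prior, a squared-exponential kernel with fixed hyperparameters (the study learns mappings in space and time), and invented freeboard values in metres. All function names are illustrative.

```python
import math


def rbf(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise_var=1e-4, length_scale=1.0):
    """Posterior mean and variance of the latent function at x_new."""
    k = [[rbf(a, b, length_scale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_new, a, length_scale) for a in x_train]
    alpha = solve(k, y_train)          # (K + noise I)^-1 y
    weights = solve(k, k_star)         # (K + noise I)^-1 k_star
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    var = rbf(x_new, x_new, length_scale) - sum(
        ks * w for ks, w in zip(k_star, weights))
    return mean, var


# Invented along-track positions and freeboards (m), for illustration only.
x_obs = [0.0, 1.0, 2.0, 4.0]
y_obs = [0.10, 0.12, 0.11, 0.15]
for x in (1.0, 3.0, 10.0):
    print(x, gp_predict(x_obs, y_obs, x))
```

The variance is small at observed locations and grows in data gaps, which is what allows uncertainty estimates to accompany the interpolated fields.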

How to cite: Gregory, W., Lawrence, I., and Tsamados, M.: A Bayesian approach towards daily pan-Arctic sea ice freeboard estimates from combined CryoSat-2 and Sentinel-3 satellite observations, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11462, 2021.

Ariane Castagner, Stephan Gruber, and Alexander Brenning

Excess ice can be found in the form of massive ice and within icy sediments, and is an important variable to quantify because it strongly influences the geomorphic response of landscapes to permafrost thaw. The melting of excess ice in the Western Canadian Arctic has led to thaw subsidence and an increase in the number and size of thaw slumps observed across the Northwest Territories, which cause problems for northern infrastructure and affect fluvial and lacustrine watersheds. The Inuvik-Tuktoyaktuk Highway (ITH) is the first all-weather road to reach the Canadian Arctic Coast, and its planning and construction have resulted in a significant cryostratigraphic dataset of 566 boreholes, which forms the basis of this contribution. Although visible ice is often recorded in boreholes, it is not a reliable measure of excess ice content on its own, and there is currently no reliable method to estimate the excess ice content of boreholes based on commonly available geotechnical data. In this study, a 16-borehole subset of the ITH dataset for which samples were processed for volumetric excess ice content is used to train a beta regression model that predicts the excess ice content of stratigraphic intervals in the study area based on interval depth, visible ice content, surficial geology, and material types. The resulting predictions are compared to recorded massive ice intervals in the same boreholes and show that excess ice within icy sediments can significantly contribute to potential thaw strain and should be considered alongside massive ice when making thaw strain estimates.
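Beta regression suits a response such as excess ice content because the response is a fraction strictly between 0 and 1. The sketch below shows the log-likelihood that such a model maximises, assuming the common mean and precision parameterisation with a logit link; the abstract does not state which link or parameterisation was used, and the function names, coefficients and values are hypothetical.

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(features, coefficients, intercept):
    """eta = intercept + sum of coefficient * feature."""
    return intercept + sum(x * c for x, c in zip(features, coefficients))


def beta_log_likelihood(y, eta, phi):
    """Log-density of y in (0, 1) under Beta(mu * phi, (1 - mu) * phi).

    mu = logistic(eta) is the predicted mean fraction and phi > 0 is the
    precision: larger phi means less scatter around the mean.
    """
    mu = logistic(eta)
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


# Hypothetical interval: depth (m) and visible ice fraction as predictors.
eta = linear_predictor([2.0, 0.4], coefficients=[-0.3, 2.5], intercept=-0.5)
print(logistic(eta), beta_log_likelihood(0.35, eta, phi=10.0))
```

Fitting the model amounts to choosing the coefficients and the precision that maximise the sum of this log-likelihood over all sampled intervals.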

How to cite: Castagner, A., Gruber, S., and Brenning, A.: Vertical distribution of excess ice in icy sediments and its statistical estimation from geotechnical data (Tuktoyaktuk Coastlands, Northwest Territories), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14048, 2021.

Bryan Riel, Brent Minchew, and Tobias Bischoff

Reliable projections of sea level rise depend on accurate representations of how fast-flowing glaciers slip along their beds. Specifically, ice sheet models require a quantitative sliding law that relates basal drag to sliding velocity and glacier geometry, yet the proper form of the law remains uncertain. Here, we present a novel deep learning-based framework for learning the time evolution of basal drag from time-dependent ice surface velocity and elevation observations. We train a pair of probabilistic neural networks through a combination of time-dependent surface observations, governing equations for ice flow, and known physical constraints. Neural network outputs are stochastic predictions of time-varying basal drag that do not require any prior assumptions on the form of the sliding law. This training strategy is well-suited to large volumes of remote sensing data while providing a natural way to integrate our existing understanding of the physics of ice flow into the learning process.

We test this framework on 1D and 2D ice flow simulations and demonstrate that, under certain stress conditions, the underlying sliding law parameters and their uncertainties can be recovered from the stochastic predictions of time-varying basal drag. We also apply these methods to Rutford Ice Stream and Pine Island Glacier in Antarctica to investigate subglacial hydrological effects for the former and evidence for regularized Coulomb sliding for the latter.

How to cite: Riel, B., Minchew, B., and Bischoff, T.: Data-Driven Inference of the Mechanics of Slip Along Glacier Beds Using Physics-Informed Neural Networks, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14245, 2021.

Cyril Palerme and Malte Müller

There is a growing demand for accurate sea-ice forecasts in the Arctic due to increasing maritime traffic. Although the capabilities of numerical models steadily improve, sea-ice forecasts produced by numerical prediction systems are affected by biases. In order to reduce forecast errors, statistical methods can be used for calibration.

In this study, two calibration methods have been developed for calibrating sea-ice drift forecasts from an operational prediction system (TOPAZ4) in the Arctic. These methods are based on random forest algorithms, a machine learning technique suitable for assessing non-linear relationships between a set of predictors and a target variable. While all the algorithms developed in this study use the same set of predictors, two sets of algorithms have been developed using either buoy or synthetic-aperture radar (SAR) observations for the target variable. Furthermore, different algorithms have been developed for predicting the direction and the speed of sea-ice drift, as well as for different lead times. The random forest algorithms use predictor variables from sea-ice concentration observations during the initialization of the forecasts, sea-ice forecasts from the TOPAZ4 prediction system, wind forecasts from the European Centre for Medium-Range Weather Forecasts, and some geographical information.

The performance of the calibrated forecasts has been evaluated and compared to that of the TOPAZ4 forecasts using buoy observations from the International Arctic Buoy Programme. Depending on the calibration method, the mean absolute error is reduced, on average, by between 5.9 % and 8.1 % for the direction, and by between 7.1 % and 9.6 % for the speed of sea-ice drift. However, there is large spatial variability in the performance of these algorithms, and the random forest algorithms perform particularly poorly in the Canadian Archipelago, an area characterized by narrow channels and the presence of landfast ice.
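Evaluating drift direction needs some care because angles wrap around at 360°. The sketch below shows one way to compute a mean absolute error for directions and the relative reduction obtained by a calibration. The abstract does not describe how the authors handled the wrap-around, so the function names and the example angles are assumptions made for illustration.

```python
def angular_error(pred_deg, obs_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = (pred_deg - obs_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_angular_error(preds, obs):
    """Mean absolute error of direction forecasts, in degrees."""
    errors = [angular_error(p, o) for p, o in zip(preds, obs)]
    return sum(errors) / len(errors)


def relative_reduction(mae_raw, mae_calibrated):
    """Error reduction of the calibrated forecast, in percent."""
    return 100.0 * (mae_raw - mae_calibrated) / mae_raw


# Invented buoy drift directions and forecasts (degrees).
observed = [10.0, 60.0]
raw_forecast = [350.0, 100.0]
calibrated = [352.0, 97.0]
raw_mae = mean_angular_error(raw_forecast, observed)
cal_mae = mean_angular_error(calibrated, observed)
print(raw_mae, cal_mae, relative_reduction(raw_mae, cal_mae))
```

A naive difference would score a forecast of 350° against an observation of 10° as a 340° error, whereas the drift directions are only 20° apart.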

How to cite: Palerme, C. and Müller, M.: Calibration of sea ice drift forecasts using random forest algorithms, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14317, 2021.

Tamsin Edwards and the ISMIP6 and GlacierMIP projects and friends

The land ice contribution to global mean sea level rise has not yet been predicted with ice sheet and glacier models for the latest set of socio-economic scenarios (SSPs), nor with coordinated exploration of uncertainties arising from the various computer models involved. Two recent international projects (ISMIP6 and GlacierMIP) generated a large suite of projections using multiple models, but mostly used previous generation scenarios and climate models, and could not fully explore known uncertainties.

Here we estimate probability distributions for these projections for the SSPs using Gaussian Process emulation of the ice sheet and glacier model ensembles. We model the sea level contribution as a function of global mean surface air temperature forcing and (for the ice sheets) model parameters, with the 'nugget' allowing for multi-model structural uncertainty. Approximate independence of ice sheet and glacier models is assumed, because a given model responds very differently under different setups (such as initialisation).

We find that limiting global warming to 1.5°C would halve the land ice contribution to 21st century sea level rise, relative to current emissions pledges: the median decreases from 25 to 13 cm sea level equivalent (SLE) by 2100. However, the Antarctic contribution does not show a clear response to emissions scenario, due to competing processes of increasing ice loss and snowfall accumulation in a warming climate.

Under risk-averse (pessimistic) assumptions for climate and Antarctic ice sheet model selection and ice sheet model parameter values, Antarctic ice loss could be five times higher, increasing the median land ice contribution to 42 cm SLE under current policies and pledges, with the 95th percentile exceeding half a metre even under 1.5°C warming.

Gaussian Process emulation can therefore be a powerful tool for estimating probability density functions from multi-model ensembles and testing the sensitivity of the results to assumptions.

How to cite: Edwards, T. and the ISMIP6 and GlacierMIP projects and friends: Gaussian Process emulation of multi-model ice sheet and glacier projections, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14561, 2021.

Dorsa Nasrollahi Shirazi, Michel Tsamados, Isobel Lawrence, Sanggyun Lee, Thomas Johnson, Claude De Rijke-Thomas, Jack Landy, David Brockley, and Ryan Nichol

The Copernicus Sentinel-3A (operational since February 2016) and Sentinel-3B (operational since April 2018) satellites build on the CryoSat-2 legacy with their synthetic aperture radar (SAR) mode altimetry, providing high-resolution radar freeboard elevation data over the polar regions up to 81°N. This technology, combined with the Ocean and Land Colour Instrument (OLCI) imaging spectrometer, offers the first space-time collocated optical imagery and radar altimetry dataset. We use these joint datasets to validate several existing surface classification algorithms based on Sentinel-3 altimeter echo shapes. We also explore the potential of novel AI techniques such as convolutional neural networks (CNN) for winter and summer sea ice surface classification (i.e. melt pond fraction, lead fraction, sea ice roughness). For lead surface classification we analyse the winters of 2018/19 and 2019/20, and for summer sea ice feature classification we focus on the Sentinel-3A and 3B tandem phase of summer 2018. We compare our CNN models with other existing surface classification algorithms.

How to cite: Nasrollahi Shirazi, D., Tsamados, M., Lawrence, I., Lee, S., Johnson, T., De Rijke-Thomas, C., Landy, J., Brockley, D., and Nichol, R.: Collocated OLCI optical imagery and SAR radar altimetry from Sentinel3 for enhanced sea ice surface classification, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14968, 2021.

Daniel Cheng, Eric Larour, and Wayne Hayes

Sea level contributions from the Greenland Ice Sheet are influenced by the rapid changes in glacial terminus positions. While manual delineation is labor intensive, recent developments in the field of automated calving front extraction have allowed for high spatio-temporal resolution analysis of Greenlandic glaciers. Specifically, we analyze new developments and results from the Calving Front Machine (CALFIN). CALFIN uses machine learning in the form of deep neural networks to automatically generate 25,000+ calving front positions from 1972 to 2020 across 80+ Greenlandic basins, using Landsat and Sentinel-1 imagery. With this data, we perform a correlative analysis between area changes, centerline length changes, discharge, thickness, bed topography, and temperature, among others. Trends on the local and regional scales are examined for insights in conjunction with existing studies in the field. Ultimately, the current implementation offers a new opportunity to explore trends on the extent of Greenland's margins, and supplies new constraints for simulations of the evolution of the mass balance of the Greenland Ice Sheet and its contributions to future sea level rise. We welcome any critiques, suggestions, or questions regarding the dataset and/or our methods. This work was conducted as a collaboration between NASA’s Jet Propulsion Laboratory and the University of California, Irvine.

How to cite: Cheng, D., Larour, E., and Hayes, W.: Dense Glacial Termini Time Series Analysis: Insights from Calving Front Machine (CALFIN), EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15039, 2021.

Daniel Clarkson, Emma Eastoe, and Amber Leeson

The Greenland ice sheet has experienced significant melt over the past 6 decades, with extreme melt events covering large areas of the ice sheet. Melt events are typically analysed using summary statistics, but the nature and characteristics of the events themselves are less frequently analysed. Our work aims to examine melt events from a statistical perspective by modelling 20 years of MODIS surface temperature data with a Spatial Conditional Extremes model. We use a Gaussian mixture model for the distribution of temperatures at each location with separate model components for ice and meltwater temperatures. This is used as a marginal model in the full spatial model and gives a more location-specific threshold to define melt at each location. The fitted model allows us to simulate melt events given that we observe an extreme temperature at a particular location, allowing us to analyse the size and magnitude of melt events across the ice sheet.
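The role of the mixture as a location-specific melt threshold can be illustrated as follows. The sketch assumes the two-component mixture has already been fitted at one location, so the component means, standard deviations and weight below are invented rather than estimated from MODIS data, and the helper names are hypothetical. The threshold is taken here as the temperature at which the melt component becomes the more probable one.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def melt_probability(temp, weight_melt, ice, melt):
    """Posterior probability that temp comes from the melt component.

    ice and melt are (mean, sd) tuples in degrees Celsius.
    """
    p_melt = weight_melt * normal_pdf(temp, *melt)
    p_ice = (1.0 - weight_melt) * normal_pdf(temp, *ice)
    return p_melt / (p_melt + p_ice)


def melt_threshold(weight_melt, ice, melt, lo, hi, tol=1e-6):
    """Bisection for the temperature where melt_probability reaches 0.5.

    Assumes the probability rises through 0.5 exactly once on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if melt_probability(mid, weight_melt, ice, melt) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Invented parameters for one location.
ice_component = (-10.0, 2.0)
melt_component = (0.0, 2.0)
print(melt_threshold(0.5, ice_component, melt_component, lo=-10.0, hi=0.0))
```

Because the component parameters differ from place to place, the resulting threshold adapts to each location instead of being a single fixed temperature.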

How to cite: Clarkson, D., Eastoe, E., and Leeson, A.: Statistical modelling of extreme temperatures on the Greenland Ice Sheet, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15046, 2021.

Andreas Stokholm, Leif Pedersen, René Forsberg, and Sine Hvidegaard

In recent years the Arctic has seen renewed political and economic interest, increased maritime traffic and desire for improved sea ice navigational tools. Despite a rise in digital technology, maps of sea ice concentration used for Arctic maritime operations are still today created by humans manually interpreting radar images. This process is slow with low map release frequency, uncertainties up to 20 % and discrepancies up to 60 %. Utilizing emerging AI Convolutional Neural Network (CNN) semantic image segmentation techniques to automate this process is drastically changing navigation in the Arctic seas, with better resolution, accuracy, release frequency and coverage. Automatic Arctic sea ice products may contribute to enabling the disruptive Northern Sea Route connecting North East Asia to Europe via the Arctic oceans.

The AI4Arctic/ASIP V2 data set, which combines 466 Sentinel-1 HH and HV SAR images from Greenland, Passive Microwave Radiometry from the AMSR2 instrument, and an equivalent sea ice concentration chart produced by ice analysts at the Danish Meteorological Institute, has been used to train a CNN U-Net architecture model. The model shows robust capabilities in producing highly detailed sea ice concentration maps with open water, intermediate sea ice concentrations and full sea ice cover, which resemble those created by professional sea ice analysts. An often-cited obstacle for automatic sea ice concentration models is wind-roughened sea, which can resemble sea ice. Final inference scenes show robustness towards such ambiguities.

How to cite: Stokholm, A., Pedersen, L., Forsberg, R., and Hvidegaard, S.: Convolutional Neural Networks for Sea Ice Concentration Charting for Maritime Navigation in the Arctic, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15193, 2021.

Julia Kaltenborn, Viviane Clay, Amy R. Macfarlane, Joshua Michael Lloyd King, and Martin Schneebeli

Snow-layer classification is an essential diagnostic task for a wide variety of cryospheric science and climate research applications. Traditionally, these measurements are made in snow pits, requiring trained operators and a substantial time commitment. The SnowMicroPen (SMP), a portable high-resolution snow penetrometer, has been demonstrated as a capable tool for rapid snow grain classification and layer type segmentation through statistical inversion of its mechanical signal. The manual classification of the SMP profiles requires time and training and becomes infeasible for large datasets.

Here, we introduce a novel set of SMP measurements collected during the MOSAiC expedition and apply Machine Learning (ML) algorithms to automatically classify and segment SMP profiles of snow on Arctic sea ice. To this end, different supervised and unsupervised ML methods, including Random Forests, Support Vector Machines, Artificial Neural Networks, and k-means Clustering, are compared. A subsequent segmentation of the classified data results in distinct layers and snow grain markers for the SMP profiles. The models are trained with the dataset by King et al. (2020) and the MOSAiC SMP dataset. The MOSAiC dataset is a unique and extensive dataset characterizing seasonal and spatial variation of snow on the central Arctic sea-ice.

We will test and compare the different algorithms and evaluate the algorithms’ effectiveness based on the need for initial dataset labeling, execution speed, and ease of implementation. In particular, we will compare supervised to unsupervised methods, which are distinguished by their need for labeled training data.

The implementation of different ML algorithms for SMP profile classification could provide a fast and automatic grain type classification and snow layer segmentation. Based on the knowledge gained from comparing the algorithms, a tool can be built to provide scientists from different fields with an immediate SMP profile classification and segmentation.


King, J., Howell, S., Brady, M., Toose, P., Derksen, C., Haas, C., & Beckers, J. (2020). Local-scale variability of snow density on Arctic sea ice. The Cryosphere, 14(12), 4323-4339.

How to cite: Kaltenborn, J., Clay, V., Macfarlane, A. R., King, J. M. L., and Schneebeli, M.: A Comparison of Machine Learning Algorithms for the Segmentation and Classification of Snow Micro Penetrometer Profiles on Arctic Sea Ice, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15637, 2021.

Thorsten Seehaus, Kamal Gopikrishnan Nambiar, Veniamin Morgenshtern, Philipp Hochreuther, and Matthias Braun

Screening clouds, cloud shadows, and snow is a critical pre-processing step that needs to be performed before any meaningful analysis can be done on satellite image data. The state-of-the-art 'F-Mask' algorithm, which is based on multiple pixel-level threshold tests, segments the image into clear land, cloud, cloud shadow, snow, and water classes. However, we observe that the results of this algorithm are not very accurate in polar and tundra regions. The unavailability of labeled Sentinel-2 training datasets with these classes makes traditional supervised machine learning techniques difficult to implement. Experiments with large, noisy training data on standard deep learning classification tasks like CIFAR-10 and ImageNet have shown that neural networks learn clean labels faster than noisy labels.

We present a multi-level self-learning approach that trains a model to perform semantic segmentation on Sentinel-2 L1C images. We use a large dataset with labels annotated using the F-Mask algorithm for training, and a small human-labeled dataset for validation. The validation dataset contains numerous examples where the F-Mask classification would have given incorrect labels. In the first step, a deep neural network with a modified U-Net architecture is trained using a dataset automatically labeled with the F-Mask algorithm. The performance on the validation dataset is used to select the best model from this step, which is then used to generate more training labels from previously unseen data. In each of the subsequent steps, a new model is trained using the labels generated by the model from the previous step. The amount of data used for training increases with each step, and the application of techniques like data augmentation and dropout improves the generalization of the trained model. We show that the final model from our approach can outperform its teacher, i.e. the F-Mask algorithm.
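The teacher and student loop described above can be pictured with a deliberately tiny stand-in. In the sketch below the "model" is a single brightness threshold rather than a U-Net, the noisy teacher labels play the role of the F-Mask output, and the validation-based model selection, data augmentation and dropout are omitted. All names and numbers are invented for illustration.

```python
def fit_threshold(xs, labels):
    """Toy 'model': the midpoint between the two class means."""
    ones = [x for x, y in zip(xs, labels) if y == 1]
    zeros = [x for x, y in zip(xs, labels) if y == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2


def predict(threshold, xs):
    """Label 1 (e.g. cloud) above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in xs]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def self_train(xs, teacher_labels, unlabelled_batches):
    """Multi-step self-training starting from noisy teacher labels."""
    model = fit_threshold(xs, teacher_labels)       # step 1: learn from teacher
    data = list(xs)
    for batch in unlabelled_batches:
        data = data + list(batch)                   # training set grows
        pseudo_labels = predict(model, data)        # labels from previous model
        model = fit_threshold(data, pseudo_labels)  # train the next model
    return model


# Pixels with brightness in (0, 1); the true class is 1 above 0.5.
pixels = [(i + 0.5) / 20 for i in range(20)]
truth = [1 if x > 0.5 else 0 for x in pixels]
teacher = list(truth)
teacher[3], teacher[16] = 1, 0                      # two teacher mistakes
student = self_train(pixels, teacher, [[0.1, 0.9, 0.45, 0.55], [0.3, 0.7]])
print(accuracy(teacher, truth), accuracy(predict(student, pixels), truth))
```

In this toy case the student averages over the teacher's isolated errors instead of memorising them, which is the effect the approach relies on.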

How to cite: Seehaus, T., Gopikrishnan Nambiar, K., Morgenshtern, V., Hochreuther, P., and Braun, M.: Deep learning based F-Mask alternative for Sentinel-2 images in polar regions, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15914, 2021.

Tom R. Andersson, J. Scott Hosking, Eleanor Krige, Maria Pérez-Ortiz, Brooks Paige, Andrew Elliott, Chris Russell, Stephen Law, Daniel C. Jones, Jeremy Wilkinson, Tony Phillips, Steffen Tietsche, Beena Balan Sarojini, Ed Blanchard-Wrigglesworth, Yevgeny Aksenov, and Rod Downie

Arctic sea ice forecasting is a major scientific effort with fundamental challenges at play. To address such challenges, we have developed a physics-informed, data-driven sea ice forecasting system, IceNet, which outperformed a leading dynamical model (ECMWF SEAS5) in monthly-averaged forecasts of pan-Arctic sea ice concentration. IceNet adopted a U-Net deep learning architecture and was trained on over 2,000 years of CMIP6 climate simulation data. Despite its state-of-the-art seasonal forecasting skill at lead times of 2-6 months, IceNet has two main limitations. First, it could not outperform the dynamical model in short-range (1-month) forecasts. This is partly caused by IceNet operating on monthly averages, which smears the initial conditions and weather phenomena that can dominate predictability at short time scales. Second, IceNet is afflicted by the ‘spring predictability barrier’ that affects all long-range forecasts of summer. This predictability barrier arises primarily due to the importance of melt-season ice thickness conditions on summer sea ice. Here we present our early findings from IceNet2, which attempts to alleviate these issues by operating on daily averages and including sea ice thickness as an input variable. IceNet2 paves the way for our efforts to aid the Arctic conservation community by developing the first public, operational sea ice forecasting AI.

How to cite: Andersson, T. R., Hosking, J. S., Krige, E., Pérez-Ortiz, M., Paige, B., Elliott, A., Russell, C., Law, S., Jones, D. C., Wilkinson, J., Phillips, T., Tietsche, S., Sarojini, B. B., Blanchard-Wrigglesworth, E., Aksenov, Y., and Downie, R.: A daily to seasonal Arctic sea ice forecasting AI, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15981, 2021.