Incorporating Training Data Uncertainty in Machine Learning Models for Satellite Imagery

Hamed Alemohammad

EGU23-10528, updated on 13 Mar 2023

https://doi.org/10.5194/egusphere-egu23-10528

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Clark University, United States of America (halemohammad@clarku.edu)

Supervised machine learning (ML) models rely on labels in the training data to learn the patterns of interest. In Earth science applications, these labels are usually collected by humans either as labels annotated on imagery (such as land cover class) or as in situ measurements (such as soil moisture). Both annotations and in situ measurements contain uncertainties resulting from factors such as class misinterpretation and device error. These training data uncertainties propagate through the ML model training and result in uncertainties in the model outputs. Therefore, it is essential to quantify these uncertainties and incorporate them in the model [1].

In this work, we present results from incorporating semantic segmentation label uncertainties into model training and show that doing so improves model performance. The experiment uses the LandCoverNet training dataset, which contains global land cover labels based on time series of Sentinel-2 multispectral imagery [2]. These labels are human annotations derived with a consensus algorithm from the input labels of three independent annotators. The training dataset contains a consensus label and a consensus score for each labeled pixel, and we treat the score as a measure of the uncertainty of that pixel's label. Our model is a Convolutional Neural Network (CNN) trained on a subset of LandCoverNet, with the rest of the dataset used for validation. We compare the results with those of the same model trained without the uncertainty information and show the improvement in model accuracy.
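
The abstract does not state how the consensus score enters the training objective. One plausible way to use a per-pixel score, offered here only as an illustrative sketch and not as the method used in this work, is to weight each pixel's cross-entropy term by its consensus score so that pixels with low annotator agreement contribute less to the loss. The function name, array shapes, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import numpy as np


def weighted_pixel_cross_entropy(logits, labels, consensus, eps=1e-12):
    """Cross-entropy over pixels, each weighted by its consensus score.

    Hypothetical helper; the shapes and score range below are assumptions.
    logits:    (H, W, C) raw class scores from a segmentation model
    labels:    (H, W) integer consensus labels in [0, C)
    consensus: (H, W) scores in [0, 1]; higher means more annotator agreement
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Negative log-likelihood of the labeled class at every pixel.
    nll = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]

    # Weighted mean: low-consensus pixels are down-weighted, not discarded.
    return float((consensus * nll).sum() / (consensus.sum() + eps))


# Toy 1x2 image with two classes: the second pixel's label disagrees with
# the model and has a low consensus score.
logits = np.array([[[10.0, 0.0], [10.0, 0.0]]])
labels = np.array([[0, 1]])
uniform = weighted_pixel_cross_entropy(logits, labels, np.ones((1, 2)))
weighted = weighted_pixel_cross_entropy(logits, labels, np.array([[1.0, 0.1]]))
```

Setting every score to 1 recovers the ordinary unweighted loss, which corresponds to the baseline trained without uncertainty information.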

[1] Elmes, A., Alemohammad, H., Avery, R., Caylor, K., Eastman, J., Fishgold, L., Friedl, M., Jain, M., Kohli, D., Laso Bayas, J., Lunga, D., McCarty, J., Pontius, R., Reinmann, A., Rogan, J., Song, L., Stoynova, H., Ye, S., Yi, Z.-F., Estes, L. (2020). Accounting for Training Data Error in Machine Learning Applied to Earth Observations. Remote Sensing, 12(6), 1034. https://doi.org/10.3390/rs12061034

[2] Alemohammad, H., Booth, K. (2020). LandCoverNet: A global benchmark land cover classification training dataset. NeurIPS 2020 Workshop on AI for Earth Sciences. http://arxiv.org/abs/2012.03111

How to cite: Alemohammad, H.: Incorporating Training Data Uncertainty in Machine Learning Models for Satellite Imagery, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-10528, https://doi.org/10.5194/egusphere-egu23-10528, 2023.