Deep neural networks for total organic carbon prediction and data-driven sampling
- GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany (egonzalez@geomar.de)
Over the past decade, deep learning has been used to solve a wide array of regression and classification tasks. Compared to classical machine learning approaches (k-Nearest Neighbours, Random Forests, …), deep learning algorithms excel at learning complex, non-linear internal representations, in part due to the highly over-parametrised nature of their underlying models; however, this advantage often comes at the cost of interpretability. In this work we used a deep neural network to construct a global total organic carbon (TOC) seafloor concentration map. By applying Softmax distributions to implicitly continuous data (regression tasks), we were able to obtain probability distributions to assess prediction reliability. A variation of Dropout called Monte Carlo Dropout is also used during the inference step, providing a tool to model prediction uncertainties. We used these techniques to create a model information map, which is a key element in developing new data-driven sampling strategies for data acquisition.
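The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their binning, architecture, or dropout rate, so everything below is an assumption made for illustration. The continuous TOC range is assumed to be discretised into bins so that a Softmax output gives a probability per bin, and Monte Carlo Dropout is approximated by keeping dropout active at inference and repeating the forward pass. The network is a toy one-hidden-layer model with random weights, and all names (`forward`, `mc_dropout_predict`, `bin_centres`) and values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; a flatter distribution gives a larger value."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward(x, w_hidden, w_out, rng, drop_rate):
    """One stochastic pass through a toy network with dropout left ON."""
    keep = 1.0 - drop_rate
    hidden = []
    for row in w_hidden:
        a = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU unit
        # Inverted dropout: keep the unit with probability `keep` and rescale,
        # otherwise zero it.
        hidden.append(a / keep if rng.random() < keep else 0.0)
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)  # probability per assumed TOC bin


def mc_dropout_predict(x, w_hidden, w_out, bin_centres,
                       n_passes=200, drop_rate=0.5, seed=0):
    """Repeat stochastic passes; return mean bin probabilities, the mean
    expected TOC, and the spread of the expected TOC across passes."""
    rng = random.Random(seed)
    passes = [forward(x, w_hidden, w_out, rng, drop_rate)
              for _ in range(n_passes)]
    n_bins = len(bin_centres)
    mean_probs = [sum(p[k] for p in passes) / n_passes for k in range(n_bins)]
    # Expected TOC of each pass: probability-weighted sum of bin centres.
    expected = [sum(pk * c for pk, c in zip(p, bin_centres)) for p in passes]
    mean_toc = sum(expected) / n_passes
    var = sum((e - mean_toc) ** 2 for e in expected) / n_passes
    return mean_probs, mean_toc, math.sqrt(var)


# Hypothetical setup: 4 input features, 8 hidden units, 5 TOC bins.
init = random.Random(42)
bin_centres = [0.25, 0.75, 1.25, 1.75, 2.25]  # assumed bin centres, illustrative units
w_hidden = [[init.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w_out = [[init.gauss(0, 1) for _ in range(8)] for _ in range(len(bin_centres))]
x = [0.2, -1.0, 0.5, 1.3]  # one made-up feature vector

mean_probs, mean_toc, std_toc = mc_dropout_predict(x, w_hidden, w_out, bin_centres)
print("mean bin probabilities:", [round(p, 3) for p in mean_probs])
print("expected TOC:", round(mean_toc, 3), "+/-", round(std_toc, 3))
print("entropy of mean distribution:", round(entropy(mean_probs), 3))
```

In this sketch, the entropy of the averaged Softmax distribution stands in for prediction reliability, and the spread across dropout passes stands in for model uncertainty. Evaluating such quantities at every grid cell is one plausible way to arrive at something like the model information map mentioned above, though the abstract does not specify how that map is computed.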
How to cite: González Ávalos, E. and Burwicz, E.: Deep neural networks for total organic carbon prediction and data-driven sampling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22587, https://doi.org/10.5194/egusphere-egu2020-22587, 2020
Comments on the display
Hola Everardo, looks like a nice presentation. I am looking forward to seeing you chat about it tomorrow.
Hi Everardo, I really enjoyed going through your presentation.
Have you considered benchmarking your uncertainty quantification against that given by "classical" methods? We did that for random forest, which gave us some unexpected results.
Fouedjio, F., & Klump, J. (2019). Exploring prediction uncertainty of spatial data in geostatistical and machine learning approaches. Environmental Earth Sciences, 78(1), 38. https://doi.org/10.1007/s12665-018-8032-z
Hello Jens,
Random Forests and Monte Carlo Dropout inference do have a lot in common. It is certainly a worthwhile benchmark that we could implement once our model performance is satisfactory. Do you have any experience using RF with a large number of inputs (~500 in our case)?
In the study cited above, we used ~600 locations. We then compared RF against Kriging, mainly because Kriging is well understood in mineral exploration and can be seen as a benchmark.