- ¹Embrapa Agricultura Digital, Brazil (vinicius.melicio@colaborador.embrapa.br)
- ²Universidade Federal do ABC (vinicius.melicio@ufabc.edu.br)
Limited data and high sampling costs make soil carbon modeling difficult. Although established generative AI methods, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are commonly used, this study benchmarks the effectiveness of Flow Matching for modeling complex soil data distributions. We introduce an Unconditional Flow Matching framework using the LUCAS soil dataset. Our procedure comprises: (a) training models without labels; (b) generating synthetic data; and (c) applying identical clustering protocols to the observed data used in (a) and the synthetic data generated in (b). Model performance is assessed through statistical divergence and cluster consistency between the observed and synthetic data distributions. The goal is to determine whether Flow Matching provides a more robust and accurate method for generating realistic soil carbon datasets.
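For readers unfamiliar with the method, steps (a) and (b) can be sketched with the standard flow matching objective. The abstract does not state which probability path is used, so the linear interpolation between noise and data below is an assumption, as are the symbols: $x_0$ is a Gaussian noise sample, $x_1$ an observed soil sample, and $v_\theta$ the learned velocity field.

```latex
% Assumption: linear probability path between noise and data.
% The abstract does not specify the path; this is one common choice.
\begin{align}
  % Interpolate between a noise sample x_0 and an observed sample x_1.
  x_t &= (1 - t)\,x_0 + t\,x_1,
      \qquad x_0 \sim \mathcal{N}(0, I),\;
             x_1 \sim p_{\text{data}},\;
             t \sim \mathcal{U}[0, 1], \\
  % Step (a): regress the velocity field onto the path derivative x_1 - x_0.
  % No labels enter the loss, hence "unconditional".
  \mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}
      \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2, \\
  % Step (b): draw x(0) = x_0 and integrate the learned ODE to t = 1.
  \frac{\mathrm{d}x}{\mathrm{d}t} &= v_\theta(x, t),
      \qquad x(0) = x_0, \quad x(1) \text{ taken as the synthetic sample.}
\end{align}
```

Under this assumption the regression target $x_1 - x_0$ is simply the time derivative of $x_t$, and the synthetic samples compared against the observed data in step (c) are the endpoints $x(1)$ of the numerically integrated trajectories.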
How to cite: do Carmo Melicio, V., Mourão, V. H. M., Barioni, L. G., and Gois, J. P.: Unsupervised Manifold Learning: Validating Unconditional Flow Matching for Soil Carbon Data Topology, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21845, https://doi.org/10.5194/egusphere-egu26-21845, 2026.