Advancing Crop Yield Predictions: The Potential of Diffusion Models in Machine Learning for Agriculture

Amit Kumar Srivastava; Krishnagopal Halder; Yue Shi; Liangxiu Han; Radwa EI Shawi; Jan Timko; Wenzhi Zheng; Gang Zhao; Karam Alsafadi; Manmeet Singh; Dominik Behrend; Thomas Gaiser; Frank Ewert

doi:https://doi.org/10.5194/egusphere-egu25-16009

Session HS4.10

EGU25-16009, updated on 15 Mar 2025

EGU General Assembly 2025

© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.

Amit Kumar Srivastava¹,⁸, Krishnagopal Halder¹, Yue Shi², Liangxiu Han², Radwa EI Shawi³, Jan Timko³, Wenzhi Zheng⁴, Gang Zhao⁵, Karam Alsafadi⁶, Manmeet Singh⁷, Dominik Behrend⁸, Thomas Gaiser⁸, and Frank Ewert¹,⁸

¹Leibniz Centre for Agricultural Landscape Research (ZALF), Data Analysis and Simulation - Multiscale Modelling and Forecasting, Muencheberg, Germany (amitkumar.srivastava@zalf.de)
²Department of Computing and Mathematics, Faculty of Science and Engineering, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester, M1 5GD, UK
³Data Systems Group, Institute of Computer Science, University of Tartu, Narva mnt 18, Tartu 51008, Estonia
⁴College of Agricultural Science and Engineering, Hohai University, No. 8 West Focheng Road, Nanjing, Jiangsu Province, China
⁵College of Soil and Water Conservation Science and Engineering, Northwest A&F University, Yangling, Shaanxi, China
⁶College of the Environment and Ecology, Xiamen, Fujian 361102, China
⁷Department of Earth and Planetary Sciences, Jackson School of Geosciences, Austin, USA
⁸Institute of Crop Science and Resource Conservation, Katzenburgweg 5, 53115 University of Bonn, Germany

The dual challenges of climate change and a global population projected to exceed 9 billion by 2030 call for precise regional crop yield prediction models to optimize management, ensure food security, and guide agricultural decisions. Machine learning (ML), leveraging big data and high-performance computing, provides powerful tools for addressing these complexities but faces challenges such as inconsistent data quality and variable algorithm performance. While ML algorithms such as Convolutional Neural Networks (CNNs), Random Forests (RF), and Long Short-Term Memory (LSTM) networks show promise in crop yield prediction, their performance can be hindered by noisy and incomplete data. Diffusion models (a class of probabilistic generative models), with their iterative denoising capabilities, offer resilience to these issues and hold significant potential to improve accuracy and reliability in crop forecasting, though their use in this domain remains largely unexplored.
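
The iterative denoising idea can be illustrated with a minimal sketch. It assumes a standard DDPM-style forward process with a linear noise schedule, which the abstract does not confirm as the basis of the proposed model; the function names, schedule parameters, and sample values are all hypothetical. The sketch shows how a clean sample is noised at a given step and how it is recovered exactly when the added noise is known, which is the quantity a trained denoising network would have to estimate.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * value + math.sqrt(1.0 - alpha_bar) * noise
        for value, noise in zip(x0, eps)
    ]
    return xt, eps


def estimate_x0(xt, eps_hat, alpha_bar):
    """Invert the forward step given an estimate of the added noise."""
    return [
        (value - math.sqrt(1.0 - alpha_bar) * noise) / math.sqrt(alpha_bar)
        for value, noise in zip(xt, eps_hat)
    ]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(100)
x0 = [0.2, 0.5, 0.9]  # illustrative values only, not data from the study
xt, eps = add_noise(x0, alpha_bars[50], rng)
# With the true noise in place of a network prediction, recovery is exact.
x0_hat = estimate_x0(xt, eps, alpha_bars[50])
```

In practice the noise is unknown and a learned model supplies `eps_hat`; the quality of that estimate determines how well the clean signal is recovered.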

This study compared XGBoost (XGB), a state-of-the-art tree-based ML model, with our proposed Diffusion-reg (DR) model. The input data for the models were compiled from multiple sources, including crop calendar data from MIRCA2000, net primary production (NPP) data from WaPOR, soil data from the SoilGrids database, and maize crop yield data from the FAO database. Climate variables such as precipitation, air temperature, and solar radiation were obtained from ERA5, with all data aggregated into decadal periods. Additionally, Leaf Area Index (LAI) and Normalized Difference Vegetation Index (NDVI) data from MODIS were collected at 16-day intervals. Next, country-level maize yield data from the FAO were spatially disaggregated to produce pixel-scale estimates (250 m resolution, aligned with the resolution of the soil input data). This process focused exclusively on cropland areas within the five major maize-producing countries in Sub-Saharan Africa.
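
The abstract does not state how the country-level yields were disaggregated. As one possible illustration only, the sketch below assumes a proportional scheme in which each cropland pixel receives a share of the national yield weighted by its NPP relative to the national cropland mean, with all pixels treated as equal in area. The function name `disaggregate_yield` and the example values are hypothetical.

```python
def disaggregate_yield(country_yield, pixel_npp):
    """Scale a country-level yield to pixels in proportion to NPP.

    Assumption (not from the study): pixel yield = country yield * NPP / mean NPP,
    so the mean over equally sized cropland pixels equals the country-level value.
    """
    if not pixel_npp:
        raise ValueError("pixel_npp must contain at least one cropland pixel")
    mean_npp = sum(pixel_npp) / len(pixel_npp)
    if mean_npp <= 0:
        raise ValueError("mean NPP must be positive")
    return [country_yield * npp / mean_npp for npp in pixel_npp]


# Illustrative numbers only: a national yield of 2.0 spread over three pixels.
pixel_yields = disaggregate_yield(2.0, [1.0, 2.0, 3.0])
mean_pixel_yield = sum(pixel_yields) / len(pixel_yields)
```

A scheme of this kind preserves the reported national figure while letting more productive pixels carry higher yields; other weighting variables or area-weighted means would be equally plausible choices.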

The model performance metrics show that the DR model consistently outperformed XGB across all analyzed countries. The R² values, which measure the proportion of variance explained by the models, indicate higher predictive accuracy for DR in every instance. For example, in Ethiopia, the DR model achieves an R² of 0.98 compared to 0.95 for XGB, while the largest gap is observed in South Africa, with R² values of 0.86 for DR and 0.76 for XGB. These results suggest that the DR model captures complex data patterns effectively, even in regions that are harder to predict.

The RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error) metrics reinforce this finding. Across all countries, DR consistently exhibits lower error values, with Ethiopia showing the best performance (RMSE: 0.02, MAE: 0.01). Although South Africa records the highest RMSE (0.25) and MAE (0.13) for the DR model, these errors are still lower than those of XGB. Similar trends in Uganda and Mozambique, where the DR model achieves substantial reductions in error, further support its robustness and reliability.

In summary, the DR model consistently outperforms XGBoost in diverse regional contexts, highlighting its potential for broader application in predictive tasks requiring high accuracy and resilience.
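
For reference, the three metrics reported above can be computed as in the following minimal sketch. It uses the standard definitions of R², RMSE, and MAE; the helper name `regression_metrics` and the observed and predicted values are made up for illustration and are not data or code from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.8, 3.2, 3.8]
metrics = regression_metrics(observed, predicted)
```

R² is unitless, whereas RMSE and MAE carry the units of the target variable, so error values are only comparable between models evaluated on the same yield scale.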

How to cite: Srivastava, A. K., Halder, K., Shi, Y., Han, L., EI Shawi, R., Timko, J., Zheng, W., Zhao, G., Alsafadi, K., Singh, M., Behrend, D., Gaiser, T., and Ewert, F.: Advancing Crop Yield Predictions: The Potential of Diffusion Models in Machine Learning for Agriculture, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16009, https://doi.org/10.5194/egusphere-egu25-16009, 2025.