- 1 Agroecosystem Sustainability Center, Institute for Sustainability, Energy, and Environment, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- 2 National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- 3 Department of Natural Resources and Environmental Sciences, College of Agricultural, Consumer and Environmental Sciences, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- 4 Land-CRAFT, Department of Agroecology, Aarhus University, 8000 Aarhus C, Denmark
Accurate large-scale crop yield estimation is increasingly critical for agricultural management and for understanding the dynamics of food security under climate change. The complex nature of crop growth, influenced by multiple environmental factors across temporal scales, requires advanced approaches for yield prediction. While recent advances in remote sensing provide diverse data sources for enhanced crop monitoring, effectively integrating heterogeneous data sources at large scales remains challenging for accurate yield prediction. In this study, we developed a temporal multi-modal fusion framework for soft wheat yield prediction at the sub-national level across the European Union from 2001 to 2019. Our framework integrated time-series data from optical remote sensing observations, climate data, and vegetation productivity indicators, along with static soil properties. A Transformer encoder was used to extract temporal patterns of crop growth, and the temporal features were fused with soil features to capture spatial patterns for large-scale wheat yield prediction. The proposed framework achieved substantially better performance (RMSE = 0.75 t·ha⁻¹) than benchmark models, including LSTM (RMSE = 0.82 t·ha⁻¹) and Random Forest (RMSE = 1.09 t·ha⁻¹). The study indicates that late fusion strategies are more effective at preserving modality-specific temporal patterns, improving RMSE by 5.9% compared with early fusion. Ablation studies reveal the incremental benefits of multi-modal data integration, with soil properties notably improving RMSE by 15.0-23.9%. Feature importance analysis through explainable machine learning indicates that remote-sensing-related variables contribute more to yield prediction than climatic variables. The novel multi-modal fusion framework developed in this study for large-scale crop yield prediction provides insights into crop-environment relationships in wheat yield formation.
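To make the late-fusion design described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: per-time-step remote sensing, climate, and vegetation productivity variables pass through a Transformer encoder, the pooled temporal features are concatenated with an embedding of static soil properties, and a small regression head predicts yield. All names, layer sizes, and the mean-pooling choice (e.g., `LateFusionYieldModel`, `d_model=64`) are illustrative assumptions.

```python
# Minimal sketch of a temporal late-fusion yield model (illustrative, not the authors' code).
import torch
import torch.nn as nn


class LateFusionYieldModel(nn.Module):
    def __init__(self, ts_dim, soil_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Project the per-time-step features (optical RS, climate, vegetation
        # productivity) into the Transformer model dimension.
        self.ts_proj = nn.Linear(ts_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Static soil properties get their own small branch.
        self.soil_mlp = nn.Sequential(nn.Linear(soil_dim, d_model), nn.ReLU())
        # Late fusion: temporal and soil features are combined only at the head.
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, ts, soil):
        # ts: (batch, time_steps, ts_dim); soil: (batch, soil_dim)
        h = self.temporal_encoder(self.ts_proj(ts))   # (batch, T, d_model)
        h = h.mean(dim=1)                             # pool over the growing season
        z = torch.cat([h, self.soil_mlp(soil)], dim=-1)
        return self.head(z).squeeze(-1)               # predicted yield (t/ha)


# Example with assumed dimensions: 30 time steps, 10 time-series variables, 8 soil properties.
model = LateFusionYieldModel(ts_dim=10, soil_dim=8)
y_hat = model(torch.randn(4, 30, 10), torch.randn(4, 8))
```

In this sketch, keeping the soil branch out of the Transformer until the final head illustrates the late-fusion idea of preserving modality-specific temporal patterns; an early-fusion variant would instead append the (repeated) soil features to every time step before encoding.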
How to cite: Lin, Z., Guan, K., and Wang, S.: Temporal Multi-modal Fusion Framework for Predicting Wheat Yield across the EU from Multi-source Satellite and Environmental Data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7753, https://doi.org/10.5194/egusphere-egu25-7753, 2025.