- European Centre for Medium-Range Weather Forecasts, Forecast, Reading, United Kingdom of Great Britain – England, Scotland, Wales (fredrik.wetterhall@ecmwf.int)
Machine learning offers a vast range of applications, including weather and hazard forecasting. The ability of these methods to extract information more easily and efficiently from diverse and novel data types enables the transition from forecasting fire weather to predicting actual fire activity. This study demonstrates the feasibility of this transition using an operational forecasting system. Data on human and natural ignitions were integrated along with observed fire activity. This enabled the data-driven models to reduce the persistent overprediction of fire danger in fuel-limited biomes, resulting in fewer false alarms and more informative outputs than traditional methods.
A key factor driving this improvement has been the availability of global datasets for fuel dynamics and fire detection. These datasets were not accessible during the development of earlier physics-based models. Three models of increasing complexity (random forest, XGBoost and neural networks) were used in a set of ablation experiments that progressively incorporated additional data sources during model training, in order to evaluate the importance of data relative to the complexity of the machine learning (ML) architecture. Combining all data sources yields the best fire activity predictions, both globally and regionally. Relative to this ideal scenario, prediction skill degrades by roughly 30% when using only weather or only ignition data, and by 15% when using only fuel data (for XGBoost). Similar decreases are also obtained with the other ML architectures. Fuel data is especially important, as it captures the effects of weather on vegetation. Using any two of the three data sources improves prediction quality compared with using a single source, reducing the degradation to between 13% and 17%.
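The ablation design described above can be illustrated with a minimal sketch. It enumerates every combination of the three data sources, scores each one, and reports the skill loss relative to the all-sources run. The column names, the `toy_score` function, and all helper names are hypothetical placeholders and are not taken from the study; in practice the scoring function would train one of the models (for example XGBoost) on the selected columns and return a skill metric.

```python
from itertools import combinations

# Hypothetical feature groups: the column names are illustrative placeholders,
# not the variables used in the study.
FEATURE_GROUPS = {
    "weather": ["temperature", "precipitation", "wind_speed"],
    "fuel": ["fuel_load", "fuel_moisture"],
    "ignition": ["lightning_density", "population_density"],
}


def source_subsets(groups):
    """Yield every non-empty combination of data sources, smallest first."""
    names = sorted(groups)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            yield combo


def columns_for(combo, groups):
    """Flatten the selected data sources into a single list of columns."""
    return [col for name in combo for col in groups[name]]


def run_ablation(score_fn, groups):
    """Score each subset; score_fn(columns) returns a skill (higher is better)."""
    return {
        combo: score_fn(columns_for(combo, groups))
        for combo in source_subsets(groups)
    }


def relative_degradation(scores, groups):
    """Percentage skill loss of each subset relative to the all-sources run."""
    full = scores[tuple(sorted(groups))]
    return {combo: 100.0 * (full - s) / full for combo, s in scores.items()}


def toy_score(columns):
    # Placeholder standing in for "train a model on these columns and
    # evaluate it on held-out fire activity"; it only counts columns.
    return len(columns) / 7.0


scores = run_ablation(toy_score, FEATURE_GROUPS)
loss = relative_degradation(scores, FEATURE_GROUPS)
for combo in sorted(loss, key=len):
    print("+".join(combo), round(loss[combo], 1))
```

Keeping the scoring function as a parameter means the same loop can be repeated for each architecture, which is how the relative importance of data and model complexity could be compared under these assumptions.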
We found that the enhanced predictive skill of ML models stems largely from the comprehensive characterization of fire processes provided by these datasets, rather than from the complexity of the ML methods themselves. Our findings highlight the critical importance of high-quality training data in improving forecast accuracy. While the rapid advancement of ML techniques generates good and feasible results, there is a risk of undervaluing the essential role of data acquisition and, where necessary, its creation through physical modeling. Our results underscore that investing in robust datasets is indispensable and should not be overlooked in the pursuit of complex algorithms.
How to cite: Wetterhall, F., Di Giuseppe, F., McNorton, J., and Lombardi, A.: Global data-driven prediction of fire activity requires good quality data, EMS Annual Meeting 2025, Ljubljana, Slovenia, 7–12 Sep 2025, EMS2025-687, https://doi.org/10.5194/ems2025-687, 2025.