- ¹ Seminar for Statistics, ETH Zurich, Zurich, Switzerland
- ² RISIS-GSEM, Université de Genève, Geneva, Switzerland
- ³ Institute of Statistics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Receiver Operating Characteristic (ROC) and Precision–Recall (PR) curves are widely used to assess the discrimination ability of forecasts for binary events, such as threshold exceedances or warnings of extreme events. In weather forecasting, forecasts are provided as spatial fields, which yields one ROC curve and one PR curve per location. These location-wise curves are often aggregated to facilitate comparison, yet the effect of the aggregation strategy on performance assessment remains poorly understood.
We investigate how different aggregation strategies for ROC and PR curves affect the assessment of discrimination ability. In particular, we identify conditions under which aggregation strategies satisfy two desirable properties for fair comparison: preservation of dominance between forecasts and preservation of concavity of the curves. We review commonly used aggregation approaches from the literature, analyze their theoretical properties, and highlight potential pitfalls that may lead to misleading interpretations. Based on these findings, we provide practical guidelines for the interpretation of aggregated ROC and PR curves. The proposed framework is illustrated using AI-based global weather forecasts, showing how different aggregation strategies can lead to different rankings.
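To make the notion of an aggregation strategy concrete, the following minimal Python sketch contrasts two illustrative choices for ROC points at a common set of thresholds: pooling the contingency counts of all locations, and averaging the location-wise hit rates and false alarm rates. The abstract does not specify which strategies are analyzed, so both choices, the function names `contingency_counts` and `aggregate_roc`, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def contingency_counts(scores, labels, thresholds):
    """Per-threshold hits and false alarms at one location.

    A warning is issued whenever score >= threshold.
    Returns (hits, false_alarms, n_positives, n_negatives).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.asarray(thresholds, dtype=float)
    warn = scores[None, :] >= thresholds[:, None]  # shape (n_thr, n_obs)
    hits = (warn & labels[None, :]).sum(axis=1)
    false_alarms = (warn & ~labels[None, :]).sum(axis=1)
    return hits, false_alarms, labels.sum(), (~labels).sum()


def aggregate_roc(locations, thresholds):
    """Return ((FAR, HR) pooled, (FAR, HR) averaged), one point per threshold.

    Assumes every location has at least one event and one non-event;
    otherwise the location-wise rates are undefined.
    """
    counts = [contingency_counts(s, y, thresholds) for s, y in locations]
    hits = np.array([c[0] for c in counts], dtype=float)  # (n_loc, n_thr)
    fas = np.array([c[1] for c in counts], dtype=float)
    n_pos = np.array([c[2] for c in counts], dtype=float)[:, None]
    n_neg = np.array([c[3] for c in counts], dtype=float)[:, None]
    # Pooling: sum the counts over locations, then form the rates.
    pooled = (fas.sum(axis=0) / n_neg.sum(), hits.sum(axis=0) / n_pos.sum())
    # Averaging: form the rates per location, then take the plain mean.
    averaged = ((fas / n_neg).mean(axis=0), (hits / n_pos).mean(axis=0))
    return pooled, averaged


# Toy data (hypothetical): two locations with different event frequencies.
location_a = ([0.9, 0.7, 0.4, 0.1], [1, 1, 1, 0])  # 3 events, 1 non-event
location_b = ([0.8, 0.6, 0.3, 0.2], [1, 0, 0, 0])  # 1 event, 3 non-events

pooled, averaged = aggregate_roc([location_a, location_b], thresholds=[0.5])
print("pooled   (FAR, HR):", pooled[0][0], pooled[1][0])
print("averaged (FAR, HR):", averaged[0][0], averaged[1][0])
```

In this toy setting the pooled hit rate is a mean of the location-wise hit rates weighted by the number of events per location, so the two strategies place the same threshold at different ROC points whenever event frequencies differ across locations.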
How to cite: Pic, R., Zhang, Z., Ziegel, J., and Engelke, S.: Spatial aggregation of ROC and PR curves, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2503, https://doi.org/10.5194/egusphere-egu26-2503, 2026.