ICUC12-169, updated on 21 May 2025
https://doi.org/10.5194/icuc12-169
12th International Conference on Urban Climate
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
A Hybrid Machine Learning Framework for Identifying Representative Climate Days in Bimodal Weather Distributions
Naga Venkata Sai Kumar Manapragada and Jonathan Natanian
  • Technion Israel Institute of Technology, Faculty of Architecture and Town Planning, The Environmental Performance and Design Lab (EPDL), Technion City, Haifa 3200003, Israel

Representative climate days are specific days that reflect typical weather conditions for a given location. They are essential for reducing the computational demands of urban building energy and microclimate models, which otherwise typically require annual simulations. Representative climate days are traditionally identified from typical meteorological year (TMY) weather files using unsupervised machine learning (ML) methods. These methods, though effective for high-dimensional data, struggle with bimodal weather distributions, in which two distinct climatic regimes occur within the same period. This study introduces a novel hybrid ML framework that accurately clusters bimodal weather data through a multiphase unsupervised clustering process.

The framework starts by applying principal component analysis (PCA) to the TMY data to reduce dimensionality. Next, misclustered days, identified using the silhouette score, undergo iterative re-clustering using k-Means and Gaussian Mixture Models (GMM) until fewer than 30 remain. Finally, representative climate days are determined from the properly clustered groups using a medoid-based weighted sampling approach. The potential of this hybrid framework is demonstrated using TMY data for Tel Aviv, which exhibit a bimodal distribution. k-Means, GMM, and hierarchical agglomerative clustering all achieved higher silhouette scores with multiphase clustering than with the traditional approach. While k-Means-based multiphase clustering achieved the highest silhouette score, GMM demonstrated superior clustering performance by preserving month-to-month continuity, which is crucial for capturing seasonal variations. By maintaining seasonal continuity in representative days, this approach enhances the reliability of climate-based urban performance simulations, supporting more accurate and computationally efficient modelling.
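The multiphase idea described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: it omits the PCA step and the GMM variant, uses a plain k-Means, and assumes that a day counts as "misclustered" when its silhouette value is negative. The abstract does not state that criterion, nor how re-clustered days are labelled, so the threshold, the label numbering, and all function and parameter names (`multiphase_cluster`, `representative_days`, `min_left`, `max_phases`) are hypothetical. The synthetic two-regime data stands in for PCA-reduced TMY days.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def group(labels):
    """Map each label to the list of point indices carrying it."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    return clusters

def silhouette_samples(points, labels):
    """Per-point silhouette value (b - a) / max(a, b)."""
    clusters = group(labels)
    scores = []
    for i, l in enumerate(labels):
        own = [j for j in clusters[l] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
                for m, idx in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

def multiphase_cluster(points, k=2, min_left=30, max_phases=10):
    """Re-cluster negative-silhouette points until fewer than min_left remain.

    Assumption: silhouette < 0 marks a misclustered day, and each phase
    assigns fresh cluster labels to the days it re-clusters.
    """
    labels = [None] * len(points)
    pending = list(range(len(points)))
    next_label = 0
    for _ in range(max_phases):
        subset = [points[i] for i in pending]
        sub_labels = kmeans(subset, k)
        scores = silhouette_samples(subset, sub_labels)
        for i, l in zip(pending, sub_labels):
            labels[i] = next_label + l
        next_label += k
        pending = [i for i, s in zip(pending, scores) if s < 0]
        if len(pending) < max(min_left, k):
            break
    return labels, pending

def representative_days(points, labels):
    """Medoid index and weight (cluster share of all days) per cluster."""
    reps = {}
    for l, idx in group(labels).items():
        medoid = min(idx, key=lambda i: sum(math.dist(points[i], points[j])
                                            for j in idx))
        reps[l] = (medoid, len(idx) / len(points))
    return reps

# Synthetic stand-in for PCA-reduced daily weather with two regimes.
rng = random.Random(1)
days = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(40)])
labels, leftover = multiphase_cluster(days, k=2)
reps = representative_days(days, labels)
```

Here `reps` maps each cluster label to a pair of (index of the medoid day, fraction of all days in that cluster), and the fraction can serve as the weight when scaling results simulated for the representative day back to the full period. Swapping `kmeans` for a Gaussian mixture model, as the abstract describes, would change only the labelling step inside `multiphase_cluster`.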

How to cite: Manapragada, N. V. S. K. and Natanian, J.: A Hybrid Machine Learning Framework for Identifying Representative Climate Days in Bimodal Weather Distributions, 12th International Conference on Urban Climate, Rotterdam, The Netherlands, 7–11 Jul 2025, ICUC12-169, https://doi.org/10.5194/icuc12-169, 2025.
