An assessment of the quality of physics analysis and forecast products of the European regional seas in 2021 using K-means clustering algorithm
- Department of Marine Systems, Tallinn University of Technology, Tallinn, Estonia (urmas.raudsepp@taltech.ee)
A new approach is proposed for assessing the quality of the physics analysis and forecast products (hereafter, the model) of the Copernicus Marine Service (CMEMS). The method is based on the K-means clustering algorithm from machine learning. Its main goal is to cluster the bivariate errors of the model sea water temperature and salinity. Model errors are defined by subtracting a measured value from the corresponding model value. We use data from the in situ near real-time observations of CMEMS. Sea water temperature and salinity products are evaluated against simultaneously measured temperature and salinity data by forming a two-dimensional error space (model minus measurements) and performing a clustering procedure in it. This method makes it possible to use all available measurements and assigns a quantitative quality measure to each spatial location and time instant at which measurements exist.
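As an illustration of the error definition above, the following minimal Python sketch builds the two-dimensional error space from co-located model and measured values. The function name `error_pairs`, the argument layout and the sample values are hypothetical and are not taken from the study; the only thing carried over from the text is the sign convention (model minus measurement).

```python
from typing import List, Sequence, Tuple


def error_pairs(
    model_t: Sequence[float],
    model_s: Sequence[float],
    obs_t: Sequence[float],
    obs_s: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (dT, dS) pairs, each defined as model minus measurement.

    All four sequences are assumed to be already matched in space and time,
    so that index i refers to the same location and time instant in each.
    """
    if not (len(model_t) == len(model_s) == len(obs_t) == len(obs_s)):
        raise ValueError("all input sequences must have the same length")
    return [
        (mt - ot, ms - os_)
        for mt, ms, ot, os_ in zip(model_t, model_s, obs_t, obs_s)
    ]


# Made-up values for two co-located temperature/salinity pairs.
errors = error_pairs(
    model_t=[10.5, 4.0],
    model_s=[7.0, 35.2],
    obs_t=[10.0, 6.0],
    obs_s=[6.5, 35.0],
)
```

With this convention a positive dT or dS means that the model overestimates the measured value, and a negative one means that it underestimates it.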
The quality assessment of physics analysis and forecast products is performed for the Baltic Sea, the Atlantic - European North West Shelf, the Atlantic - Iberian Biscay Irish region, the Mediterranean Sea and the Black Sea for the year 2021. For each regional sea, there are about 100 000 to 1 000 000 simultaneous temperature and salinity data pairs available for comparison. K-means clustering of model errors was done using five clusters for each region.
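The clustering step can be sketched with a plain implementation of Lloyd's K-means iteration on (dT, dS) error pairs. This is a minimal sketch and not the implementation used in the study: the farthest-point initialisation, the absence of any scaling between temperature and salinity errors, and all function names are assumptions made for illustration. The default `k=5` follows the number of clusters stated above; the toy data use `k=2` only to keep the example small.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (dT, dS) error pair


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance in the two-dimensional error space."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def init_farthest(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic initialisation (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(
    points: Sequence[Point], k: int = 5, max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing or max_iter is reached."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = init_farthest(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each error pair goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Made-up error pairs: one group near zero bias, one with a large bias.
toy_errors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_centroids, toy_labels = kmeans(toy_errors, k=2)
```

Each centroid returned by `kmeans` can be read as the mean temperature and salinity bias of the error pairs assigned to that cluster.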
The error cluster indicating good model quality (the dominant cluster, whose centroid has temperature and salinity biases close to zero) accounted for about 50% of all comparison data pairs in the Baltic Sea, 65% in the Atlantic - European North West Shelf, 70% in the Mediterranean Sea and the Atlantic - Iberian Biscay Irish region, and 90% in the Black Sea. Shallow coastal areas were poorly covered by measurement data, which prevented assessment of model quality there. In the Baltic Sea, the spatial distribution of model errors showed that simulated temperature and salinity fields in the Gulf of Finland were of lower quality than in the other Baltic Sea sub-basins. In the Gulf of Finland, a significant share of model errors belonged to two clusters with overestimated salinity and temperature (dS=1.8, dT=2.0 °C and dS=0.5, dT=0.8 °C). In the Atlantic - European North West Shelf and in the Atlantic - Iberian Biscay Irish region, temperature and salinity were underestimated (dT=-2.7 °C, dS=-0.3 and dT=-1.8 °C, dS=-0.2, respectively) between depths of 1000 m and 1300 m. In the Atlantic - Iberian Biscay Irish region, the Mediterranean Sea and the Black Sea, a separate cluster emerged in each region, indicating a severe mismatch between the model and the measured data. The good quality of the CMEMS physics analysis and forecast products is achieved using data assimilation of measured salinity and temperature profiles, which overlap with the data used in this assessment study.
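The percentage shares quoted above can be reproduced in principle from cluster centroids and labels. The sketch below assumes that the "good quality" cluster is simply the one whose centroid lies nearest to zero bias in the (dT, dS) plane; that selection rule, the function name `good_cluster_share` and the toy centroids and labels are illustrative assumptions and not results from the study.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def good_cluster_share(
    centroids: Sequence[Tuple[float, float]], labels: Sequence[int]
) -> Tuple[int, float]:
    """Return (index, share) of the cluster whose centroid is nearest to
    zero bias, where share is the fraction of all data pairs in it."""
    if not labels:
        raise ValueError("labels must not be empty")
    # Assumed rule: smallest Euclidean distance of the centroid from (0, 0).
    good = min(range(len(centroids)), key=lambda j: math.hypot(*centroids[j]))
    counts = Counter(labels)
    return good, counts[good] / len(labels)


# Made-up centroids (dT, dS) and cluster labels for six data pairs.
demo_centroids: List[Tuple[float, float]] = [(2.0, 1.5), (0.05, -0.02), (-2.5, -0.4)]
demo_labels = [1, 1, 1, 0, 2, 1]
good_index, share = good_cluster_share(demo_centroids, demo_labels)
```

Here `share` is a fraction between 0 and 1; multiplying it by 100 gives a percentage comparable in form to the regional values reported above.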
How to cite: Raudsepp, U., Maljutenko, I., Verjovkina, S., and Lagemaa, P.: An assessment of the quality of physics analysis and forecast products of the European regional seas in 2021 using K-means clustering algorithm, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8218, https://doi.org/10.5194/egusphere-egu22-8218, 2022.