R project for cluster analysis of COVID-19 epidemic dynamics data, using k-means and hierarchical clustering with DTW distances.
You can download the data as a CSV file from the link, selecting the countries you need.
- The Nadaraya–Watson estimator was used for data smoothing
- Min-max scaling was used for data scaling
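
The two preparation steps above can be sketched in base R as follows. This is a minimal illustration, not the project's actual code: the input series is synthetic, the bandwidth is an assumed value, and `min_max` is a hypothetical helper name. `stats::ksmooth()` computes the Nadaraya–Watson kernel regression estimate.

```r
# Synthetic daily-cases series for one country (illustration only)
set.seed(1)
days  <- 1:100
cases <- 30000 * dnorm(days, mean = 50, sd = 12) + rnorm(100, sd = 40)

# Nadaraya-Watson kernel regression with a Gaussian kernel.
# The bandwidth of 10 days is an assumed value, not taken from the project.
smoothed <- ksmooth(days, cases, kernel = "normal", bandwidth = 10,
                    x.points = days)$y

# Min-max scaling: map the smoothed series onto the [0, 1] range
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
scaled  <- min_max(smoothed)

range(scaled)
```

Scaling every country to [0, 1] makes the curves comparable by shape rather than by absolute case counts.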
After preparation, the data looks like this:
Distances between countries were computed with Dynamic Time Warping (DTW). DTW suits time series better than classical pointwise measures because it can align disease peaks that are shifted relative to each other and measure the distance between the aligned points, rather than only between same-index pairs Xi and Yi as classical methods do. This gives more accurate distances between countries.
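
To illustrate the idea, here is a minimal hand-written DTW in base R. It is a sketch only: the project may well use a dedicated package instead, and `dtw_distance` and the two synthetic curves are hypothetical. It uses the absolute difference as the local cost and the standard three-step recursion.

```r
# Dynamic-programming DTW distance between two numeric vectors.
# acc[i + 1, j + 1] holds the cheapest cumulative cost of aligning
# x[1..i] with y[1..j].
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      acc[i + 1, j + 1] <- cost + min(acc[i, j + 1],  # step in x only
                                      acc[i + 1, j],  # step in y only
                                      acc[i, j])      # step in both
    }
  }
  acc[n + 1, m + 1]
}

# Two synthetic epidemic curves with the same peak shape, shifted by 10 days
days <- 1:100
a <- exp(-((days - 40)^2) / 100)
b <- exp(-((days - 50)^2) / 100)

pointwise <- sum(abs(a - b))    # compares a[i] with b[i] only
warped    <- dtw_distance(a, b) # aligns the two peaks first
c(pointwise = pointwise, dtw = warped)
```

For these shifted curves the pointwise distance is large even though the shapes are identical, while the DTW distance is much smaller.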
Example of calculated DTW-distances between two time series (countries):
- Multidimensional scaling + k-means
- Hierarchical clustering, selecting the optimal linkage method and the optimal number of clusters using the gap statistic
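
Both approaches listed above can be sketched as follows. Several things here are assumptions rather than the project's actual choices: the distance matrix is a synthetic stand-in for the DTW distances, the linkage is picked by cophenetic correlation, and the gap statistic is computed on the MDS coordinates with `cluster::clusGap()` (skipped if the `cluster` package is not installed).

```r
set.seed(42)
# Stand-in for the DTW distance matrix: 12 hypothetical countries in two groups
pts <- rbind(matrix(rnorm(12, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(12, mean = 3, sd = 0.3), ncol = 2))
rownames(pts) <- paste0("country_", 1:12)
D <- dist(pts)

# Method 1: classical MDS embeds the distances in 2-D, then k-means clusters the points
coords   <- cmdscale(D, k = 2)
km_label <- kmeans(coords, centers = 2, nstart = 20)$cluster

# Method 2: hierarchical clustering. The linkage is chosen by the highest
# cophenetic correlation (an assumed selection criterion).
linkages <- c("single", "complete", "average", "ward.D2")
coph_cor <- sapply(linkages, function(m) cor(D, cophenetic(hclust(D, method = m))))
best     <- names(which.max(coph_cor))
hc       <- hclust(D, method = best)
hc_label <- cutree(hc, k = 2)

# Gap statistic for the number of clusters, using the chosen linkage
if (requireNamespace("cluster", quietly = TRUE)) {
  hc_fun <- function(x, k) list(cluster = cutree(hclust(dist(x), method = best), k))
  gap    <- cluster::clusGap(coords, FUNcluster = hc_fun, K.max = 5, B = 50,
                             verbose = FALSE)
  k_opt  <- cluster::maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"])
  print(k_opt)
}
```

With real data, `D` would be the matrix of pairwise DTW distances between countries, converted with `as.dist()`.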
Comparison of the clustering methods used (MDS + k-means and hclust) using the Rand index and a contingency table.
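
As a rough sketch of this comparison, the Rand index can be computed in base R as the share of item pairs on which two partitions agree (both put the pair in the same cluster, or both separate it). The function name `rand_index` and the label vectors below are hypothetical, not results from the project.

```r
# Rand index: fraction of item pairs treated the same way by both partitions
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")   # TRUE where a pair shares a cluster in a
  same_b <- outer(b, b, "==")   # TRUE where a pair shares a cluster in b
  pairs  <- upper.tri(same_a)   # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# Hypothetical labels for eight countries from the two methods
kmeans_labels <- c(1, 1, 1, 2, 2, 2, 3, 3)
hclust_labels <- c(2, 2, 2, 1, 1, 3, 3, 3)

# Contingency table: rows are k-means clusters, columns are hclust clusters
table(kmeans = kmeans_labels, hclust = hclust_labels)

rand_index(kmeans_labels, hclust_labels)
```

In this toy example the two partitions agree on 24 of the 28 pairs, so the index is about 0.857. The index does not depend on how clusters are numbered, only on which items are grouped together.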
Example of hierarchical clustering results for 25 European countries (map created with www.mapchart.net/europe.html):