How to perform cluster analysis and select the number of clusters in R?

1 year ago

Olivia Parker

In R, you can run a cluster analysis with the built-in kmeans function and use common packages such as factoextra and NbClust to visualize the result and choose the number of clusters. The simple example below shows both steps.

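# Import the data (k-means needs numeric columns with no missing values)
data <- read.csv("data.csv")

# Run k-means clustering with 3 clusters
set.seed(123)  # k-means starts from random centers; fix the seed for reproducible results
cluster <- kmeans(data, centers = 3)

# Visualize the clustering result
library(factoextra)
fviz_cluster(cluster, data = data)

# Select the optimal number of clusters
library(NbClust)
nb <- NbClust(data, distance = "euclidean", min.nc = 2, max.nc = 10, method = "kmeans")
print(nb$Best.nc)  # number of clusters proposed by each index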

The code above first imports the data, then calls the kmeans function to partition it into 3 clusters. The fviz_cluster function from the factoextra package plots the resulting clusters. Finally, the NbClust function from the NbClust package helps choose the number of clusters: the min.nc and max.nc parameters set the smallest and largest number of clusters to evaluate, and the method parameter sets the clustering method, here kmeans. NbClust evaluates a set of validity indices over that range, and the number of clusters favored by most indices is a reasonable choice.
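As a cross-check on the number of clusters, two widely used criteria can be computed by hand: the total within-cluster sum of squares (the "elbow" method) and the average silhouette width. The following is a minimal sketch rather than part of the original example. It assumes the built-in iris data set as stand-in data, standardizes the columns, and uses the cluster package for silhouette(); the names x, ks, fits, results and best_k are illustrative.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with R

# Stand-in data (an assumption for this sketch): the four numeric iris columns,
# standardized so that each variable contributes equally to the distances.
x <- scale(iris[, 1:4])

set.seed(123)  # k-means uses random starting centers
ks <- 2:10     # candidate numbers of clusters
d <- dist(x)   # Euclidean distances, computed once and reused

# Fit k-means once for every candidate k
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25, iter.max = 50))

# Elbow method: total within-cluster sum of squares for each k
wss <- sapply(fits, function(km) km$tot.withinss)

# Average silhouette width for each k (higher means better-separated clusters)
avg_sil <- sapply(fits, function(km) mean(silhouette(km$cluster, d)[, "sil_width"]))

results <- data.frame(k = ks, wss = wss, avg_sil = avg_sil)
print(results)

# The k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
```

Plotting results$wss against results$k and looking for the point where the curve flattens gives the elbow estimate, while best_k gives the silhouette estimate. The two can disagree, so treat them as guidance alongside the NbClust output. The fviz_nbclust function in factoextra can draw comparable curves if you prefer a ready-made plot.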