R Programming/Clustering
From Wikibooks, open books for an open world
Basic clustering
K-Means Clustering
You can use the kmeans() function.
First create some data:
> dat <- matrix(rnorm(1000), nrow=100, ncol=10) # 100 observations of 10 variables; rnorm(100) would be recycled into 10 identical columns
To apply kmeans(), you need to specify the number of clusters:
> cl <- kmeans(dat, 3) # here 3 is the number of clusters
> table(cl$cluster)
 1  2  3
38 44 18
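Because kmeans() starts from randomly chosen centres and the data here are random, the cluster sizes will differ from run to run. The following is a minimal sketch of one way to make the result reproducible and less dependent on the starting centres. The seed value 42 and nstart = 25 are arbitrary choices made for illustration, not recommendations from this page.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# nstart = 25 runs the algorithm from 25 random starts and keeps the best fit
cl <- kmeans(dat, centers = 3, nstart = 25)

table(cl$cluster)   # cluster sizes
dim(cl$centers)     # one row per cluster, one column per variable
cl$tot.withinss     # total within-cluster sum of squares (lower means tighter clusters)
```

The returned object is a list; cl$cluster holds the cluster index of each row of dat, and cl$centers holds the fitted cluster centres.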
Hierarchical Clustering
The basic hierarchical clustering function is hclust(), which works on a dissimilarity structure as produced by the dist() function:
> hc <- hclust(dist(dat)) # dat matrix from the example above
> plot(hc)
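Both dist() and hclust() accept a method argument, so the distance measure and the linkage rule can be chosen separately. The sketch below spells out the defaults (Euclidean distance, complete linkage) and compares them with average linkage. The object names and the seed are hypothetical choices for illustration.

```r
set.seed(42)                                  # arbitrary seed
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

d <- dist(dat, method = "euclidean")          # the default distance

hc_complete <- hclust(d)                      # default linkage: method = "complete"
hc_average  <- hclust(d, method = "average")  # average linkage on the same distances

hc_complete$method                            # linkage stored in the fitted object
length(hc_average$height)                     # one merge height per merge: n - 1 values
```

Either object can be passed to plot() to draw the corresponding dendrogram.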
The resulting tree can be cut using the cutree() function.
Cutting it at a given height:
> cl <- cutree(hc, h=5.1)
> table(cl)
cl
 1  2  3  4  5
23 33 29  4 11
Cutting it to obtain a given number of clusters:
> cl <- cutree(hc, k=5)
> table(cl)
cl
 1  2  3  4  5
23 33 29  4 11
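The two ways of cutting are related: a cut at height h yields k clusters whenever h lies between two consecutive merge heights of the tree. The sketch below illustrates this under the assumption of complete linkage, for which the merge heights in hc$height are non-decreasing. The names h_mid, cl_k and cl_h are hypothetical.

```r
set.seed(42)                                  # arbitrary seed
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)
hc  <- hclust(dist(dat))

n <- nrow(dat)
k <- 5
cl_k <- cutree(hc, k = k)                     # ask for k clusters directly

# hc$height holds the n - 1 merge heights.
# Any height strictly between merge n - k and merge n - k + 1 leaves k clusters.
h_mid <- mean(hc$height[c(n - k, n - k + 1)])
cl_h  <- cutree(hc, h = h_mid)                # cut at that height instead

all(cl_k == cl_h)                             # the two cuts should agree
table(cl_h)
```

This also explains why the two example calls above can give the same table: the height 5.1 happened to fall in the interval that corresponds to five clusters for that data set.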
Available alternatives
- See packages class, amap and cluster
- See The R bioinformatic page on clustering
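As one illustration of these alternatives, the cluster package provides pam() (partitioning around medoids), which is used much like kmeans() but picks actual observations as cluster centres. This is only a sketch and assumes the cluster package is installed; the seed and the choice of k = 3 are arbitrary.

```r
library(cluster)                 # assumes the 'cluster' package is installed

set.seed(42)                     # arbitrary seed
dat <- matrix(rnorm(1000), nrow = 100, ncol = 10)

pm <- pam(dat, k = 3)            # partitioning around medoids with 3 clusters
table(pm$clustering)             # cluster sizes
pm$id.med                        # row indices of the observations chosen as medoids
```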