Data Mining Algorithms In R/Packages/RWeka/Weka clusterers

From Wikibooks, open books for an open world


R interfaces to Weka clustering algorithms.


Cobweb(x, control = NULL)

FarthestFirst(x, control = NULL)

SimpleKMeans(x, control = NULL)

XMeans(x, control = NULL)

DBScan(x, control = NULL)


x, an R object with the data to be clustered.

control, an object of class Weka_control, or a character vector of control options, or NULL (default).
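Because control accepts either a Weka_control object or a character vector of Weka command-line options, the two calls below should configure SimpleKMeans identically. This is a sketch that assumes RWeka and a working Java installation are available. WOW() is RWeka's Weka Option Wizard, used here to list the options a clusterer understands.

```r
library(RWeka)

## List the command-line options understood by SimpleKMeans
WOW("SimpleKMeans")

## The same setting (3 clusters) given in both accepted forms
cl_a <- SimpleKMeans(iris[, -5], Weka_control(N = 3))
cl_b <- SimpleKMeans(iris[, -5], c("-N", 3))

## Identical options (including the default seed) should give identical ids
all(predict(cl_a) == predict(cl_b))
```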


There is a predict method for predicting class ids or memberships from the fitted clusterers.
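As a sketch of the predict method, the code below fits SimpleKMeans on a random subset of iris and then predicts the held-out rows, first as class ids and then as a membership matrix (one row per instance, one column per cluster). The train/test split is purely illustrative.

```r
library(RWeka)

## Fit on a random subset of the numeric iris columns
set.seed(1)
train <- sample(nrow(iris), 100)
fit <- SimpleKMeans(iris[train, -5], Weka_control(N = 3))

## Cluster ids for the 50 held-out rows
ids <- predict(fit, newdata = iris[-train, -5])

## Membership matrix for the same rows
mem <- predict(fit, newdata = iris[-train, -5], type = "memberships")
dim(mem)
```

For a hard clusterer such as k-means, each membership row is expected to put all its weight on the predicted cluster.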

Cobweb implements the Cobweb (Fisher, 1987) and Classit (Gennari et al., 1989) clustering algorithms.
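Cobweb grows a concept hierarchy incrementally, and for numeric attributes (the Classit variant) two Weka options largely control how finely the data are split: A (acuity, the minimum standard deviation assumed for numeric attributes) and C (cutoff, the category utility threshold used to prune nodes). The values below are illustrative assumptions, not recommended settings.

```r
library(RWeka)

## Cobweb/Classit on the four numeric iris measurements.
## A = acuity, C = cutoff; both values are illustrative only.
cw <- Cobweb(iris[, -5], Weka_control(A = 1, C = 0.4))

## Sizes of the clusters the training rows were assigned to
table(predict(cw))
```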

FarthestFirst provides the “farthest first traversal algorithm” of Hochbaum and Shmoys, a fast, simple, approximate clusterer modeled after simple k-means.
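Farthest-first traversal picks a first center at random and then repeatedly adds the point farthest from all centers chosen so far; every point is finally assigned to its nearest center. A minimal sketch with RWeka, assuming N (number of clusters) and S (random seed) as the Weka options:

```r
library(RWeka)

## Farthest-first traversal with 3 clusters and a fixed random seed
ff <- FarthestFirst(iris[, -5], Weka_control(N = 3, S = 1))

## Cross-tabulate cluster ids against the known species
table(predict(ff), iris$Species)
```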

SimpleKMeans provides clustering with the k-means algorithm.
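Since k-means needs the number of clusters in advance (Weka option N), a common first step is to fit several values of k and inspect the resulting partitions. The sketch below does this for k = 2, 3, 4 and reports the cluster sizes for each fit; the range of k is an arbitrary choice.

```r
library(RWeka)

## Fit SimpleKMeans for several values of k (Weka option N)
ks <- 2:4
fits <- lapply(ks, function(k) SimpleKMeans(iris[, -5], Weka_control(N = k)))

## Cluster sizes for each fit
sizes <- lapply(fits, function(f) table(predict(f)))
names(sizes) <- paste0("k=", ks)
sizes
```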

XMeans provides k-means extended by an “Improve-Structure part”, and determines the number of clusters automatically.
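X-means searches between a lower bound (option L) and an upper bound (option H) on the number of clusters, scoring candidate splits with the Bayesian Information Criterion. The sketch below lets it choose k between 2 and 10 and counts how many clusters were used. It assumes XMeans is available in your Weka installation; in recent Weka versions it may have to be installed as a separate Weka package.

```r
library(RWeka)

## Let X-means choose the number of clusters between 2 and 10
xm <- XMeans(iris[, -5], Weka_control(L = 2, H = 10))

## Number of distinct clusters assigned to the training rows
k_found <- length(unique(predict(xm)))
k_found
```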

DBScan provides the “density-based clustering algorithm” by Ester, Kriegel, Sander, and Xu. Note that noise points receive NA as their class id.
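Because noise points come back as NA, counts and tables of DBScan results should account for missing values explicitly. In this sketch E is the neighbourhood radius (epsilon) and M the minimum number of points in that neighbourhood; the values are illustrative only. It assumes DBScan is available in your Weka installation (in recent Weka versions it may require an additional Weka package).

```r
library(RWeka)

## DBSCAN with illustrative parameters: E = epsilon, M = minimum points
db <- DBScan(iris[, -5], Weka_control(E = 0.9, M = 6))
ids <- predict(db)

## Number of points labelled as noise (NA)
sum(is.na(ids))

## Cluster sizes, with noise shown as its own NA column
table(ids, useNA = "ifany")
```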


A list inheriting from class Weka_clusterers with components including:

clusterer, a reference (of class jobjRef) to a Java object obtained by applying the Weka buildClusterer method to the training instances using the given control options.

class_ids, a vector of integers indicating the class to which each training instance is allocated (the results of calling the Weka clusterInstance method for the built clusterer and each instance).
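The returned components can be used directly. Since clusterer is an rJava reference to the underlying Weka object, its Java methods can be called with rJava's .jcall; the sketch below asks it for numberOfClusters(), a method of Weka's Clusterer interface that returns a Java int (JNI signature "I").

```r
library(RWeka)
library(rJava)

cl <- SimpleKMeans(iris[, -5], Weka_control(N = 3))

## The fitted Java clusterer, wrapped as an rJava reference
class(cl$clusterer)

## Query the Java object directly for the number of clusters it built
n_clusters <- .jcall(cl$clusterer, "I", "numberOfClusters")

## Class ids of the training instances
head(cl$class_ids)
```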


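   library(RWeka)
   ## k-means with 3 clusters on the four numeric iris measurements
   cl1 <- SimpleKMeans(iris[, -5], Weka_control(N = 3))
   table(predict(cl1), iris$Species)
   ## X-means searching between 3 (-L) and 7 (-H) clusters, using a KD-tree
   cl2 <- XMeans(iris[, -5],
                 c("-L", 3, "-H", 7, "-use-kdtree",
                   "-K", "weka.core.neighboursearch.KDTree -P"))
   table(predict(cl2), iris$Species)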