Data Mining Algorithms In R/Packages/RWeka/Weka filters

From Wikibooks, open books for an open world

Description

R interfaces to Weka filters.

Usage

Normalize(formula, data, subset, na.action, control = NULL)

Discretize(formula, data, subset, na.action, control = NULL)

Arguments

formula, a symbolic description of a model. Note that for unsupervised filters the response can be omitted.

data, an optional data frame containing the variables in the model.

subset, an optional vector specifying a subset of observations to be used in the fitting process.

na.action, a function which indicates what should happen when the data contain NAs.

control, an object of class Weka_control, or a character vector of control options, or NULL (default).

Details

Normalize implements an unsupervised filter that normalizes all instances of a dataset to have a given norm. Only numeric values are considered, and the class attribute is ignored.
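To make this description concrete, the base-R sketch below rescales each instance (row) so that the Lp norm of its numeric values equals a target norm, leaving factor columns and an optional class column untouched. `normalize_instances` is a hypothetical helper written for illustration, not RWeka's implementation, and the defaults `norm = 1` and `p = 2` (Euclidean norm) are assumptions.

```r
# Hypothetical helper (not part of RWeka): scale every instance so that the
# Lp norm of its numeric attributes equals `norm`. Factor columns and the
# optional class column are excluded. Defaults norm = 1, p = 2 are assumed.
normalize_instances <- function(df, class_col = NULL, norm = 1, p = 2) {
  num <- vapply(df, is.numeric, logical(1))  # which columns are numeric
  if (!is.null(class_col)) num[class_col] <- FALSE  # ignore the class attribute
  m <- as.matrix(df[num])
  lp <- rowSums(abs(m)^p)^(1 / p)            # current Lp norm of each row
  scale <- ifelse(lp > 0, norm / lp, 1)      # leave all-zero rows unchanged
  df[num] <- m * scale                       # row i is multiplied by scale[i]
  df
}
```

For example, `head(normalize_instances(iris, class_col = "Species"))` rescales the four measurement columns row by row while `Species` stays as it is. Rows whose numeric values are all zero are returned unchanged to avoid dividing by zero.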

Discretize implements a supervised instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani’s MDL method (the default).
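Fayyad & Irani's method chooses, for a numeric attribute, the cut point that maximizes the class-information gain Gain = Ent(S) - (|S1|/N)·Ent(S1) - (|S2|/N)·Ent(S2), then recurses on each side. A cut is kept only if Gain > (log2(N - 1) + Δ)/N, where Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)] and k, k1, k2 are the numbers of classes present in S, S1 and S2. The base-R sketch below is a simplified re-implementation for illustration only; `class_entropy` and `mdl_cuts` are hypothetical helpers, not RWeka functions, and Weka's own code may differ in details such as tie-breaking.

```r
# Shannon entropy (in bits) of a vector of class labels.
class_entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                  # drop empty factor levels
  -sum(p * log2(p))
}

# Recursive entropy-based discretization with the Fayyad & Irani MDL
# stopping rule. Returns the cut points for numeric x given class labels y.
# Assumes x and y contain no missing values.
mdl_cuts <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  n <- length(x)
  cand <- which(diff(x) > 0)     # candidate split after position i
  if (length(cand) == 0) return(numeric(0))

  ent_s <- class_entropy(y)
  best_gain <- -Inf
  best_i <- NA
  for (i in cand) {
    e <- (i * class_entropy(y[1:i]) +
          (n - i) * class_entropy(y[(i + 1):n])) / n
    if (ent_s - e > best_gain) {
      best_gain <- ent_s - e
      best_i <- i
    }
  }

  y1 <- y[1:best_i]
  y2 <- y[(best_i + 1):n]
  k  <- length(unique(y))
  k1 <- length(unique(y1))
  k2 <- length(unique(y2))
  delta <- log2(3^k - 2) -
    (k * ent_s - k1 * class_entropy(y1) - k2 * class_entropy(y2))

  # MDL criterion: keep the cut only if the gain pays for encoding it.
  if (best_gain <= (log2(n - 1) + delta) / n) return(numeric(0))

  cut_point <- (x[best_i] + x[best_i + 1]) / 2
  c(mdl_cuts(x[1:best_i], y1), cut_point,
    mdl_cuts(x[(best_i + 1):n], y2))
}
```

The resulting cut points can be applied with base R, e.g. `cut(x, breaks = c(-Inf, mdl_cuts(x, y), Inf))`, which turns the numeric vector into a factor, the kind of nominal attribute `Discretize` produces.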

Note that these methods ignore nominal attributes, i.e., variables of class factor.

Value

A data frame containing the filtered data.

Example

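   library(RWeka)  # provides read.arff(), Normalize() and Discretize(); needs Java
   w <- read.arff(system.file("arff", "weather.arff", package = "RWeka"))
   m1 <- Normalize(~ ., data = w)        # unsupervised: no response needed
   m2 <- Discretize(play ~ ., data = w)  # supervised: uses the class 'play'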