DChip/Example Method Description

From Wikibooks, open books for an open world

Example method description using dChip

Array normalization, expression value calculation and clustering analysis were performed using DNA-Chip Analyzer (dChip; www.dchip.org; Li & Wong 2001a). The Invariant Set Normalization method (Li & Wong 2001b) was used to normalize arrays at the probe cell level to make them comparable, and the model-based method (Li & Wong 2001b) was used for probe selection and for computing expression values. Each expression value was accompanied by a standard error describing its measurement accuracy. These standard errors were then used to compute 90% confidence intervals of fold changes in two-sample or two-group comparisons (Li & Wong 2001b). The lower confidence bounds of the fold changes served as conservative estimates of the true fold changes. Genes whose expression increased or decreased after treatment by more than 2-fold, judged by the lower confidence bound, were selected for further study.
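As a rough illustration of how a lower confidence bound becomes a conservative fold change, the Python sketch below computes a 90% interval for the ratio of two expression values from their standard errors and applies the 2-fold cutoff. It is not dChip's code: the delta-method approximation on the log scale, the independence and normality assumptions, the function names and the example numbers are all assumptions made for this sketch, and dChip's own interval calculation may differ. It covers the two-sample case only.

```python
import math
from statistics import NormalDist


def fold_change_lower_bound(base, base_se, treated, treated_se, level=0.90):
    """Conservative fold change of `treated` vs `base` (negative = decrease).

    Assumption: the confidence interval is approximated on the log scale with
    the delta method, treating the two estimates as independent and normal.
    dChip's own computation may differ.
    """
    if base <= 0 or treated <= 0:
        raise ValueError("expression values must be positive for a log ratio")
    log_ratio = math.log(treated / base)
    # Delta method: Var(log x) is approximately (se_x / x) ** 2
    se_log = math.sqrt((base_se / base) ** 2 + (treated_se / treated) ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    # Take the interval end nearer to "no change" (floored at a ratio of 1)
    lower_log = max(abs(log_ratio) - z * se_log, 0.0)
    sign = 1 if log_ratio >= 0 else -1
    return sign * math.exp(lower_log)


def select_changed_genes(genes, threshold=2.0):
    """Keep genes whose conservative fold change exceeds `threshold` in either direction."""
    selected = {}
    for name, (base, base_se, treated, treated_se) in genes.items():
        lb = fold_change_lower_bound(base, base_se, treated, treated_se)
        if abs(lb) > threshold:
            selected[name] = lb
    return selected


# Hypothetical values: (baseline, SE, treated, SE) for three genes
genes = {
    "geneA": (100.0, 10.0, 400.0, 40.0),  # 4-fold up, precise
    "geneB": (100.0, 30.0, 250.0, 75.0),  # 2.5-fold up, noisy
    "geneC": (500.0, 25.0, 100.0, 5.0),   # 5-fold down, precise
}
selected = select_changed_genes(genes)
print(selected)
```

With these made-up inputs, geneA (conservative fold change about 3.17) and geneC (about -4.45) pass the cutoff, while geneB is dropped: its point estimate is 2.5-fold, but its larger standard errors pull the lower bound down to about 1.24.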

Hierarchical clustering analysis (Eisen et al. 1998) was used to group genes with similar expression patterns. A gene was selected for clustering if (1) the coefficient of variation (standard deviation / mean) of its expression values across the 20 samples was between 0.5 and 10, and (2) it was called “Present” by the MAS5 (or GCOS or dChip) software in more than 5 samples.

The expression values of each gene across the 20 samples were then standardized by a linear transformation to have mean 0 and standard deviation 1, and the distance between two genes was defined as 1 - r, where r is the standard (Pearson) correlation coefficient between the 20 standardized values of the two genes. The two genes with the smallest distance were merged first into a super-gene, connected by branches whose length represents their distance, and were then removed from further merging. The expression level of the new super-gene in each sample is the average of the standardized expression levels of the two genes (average linkage). The next pair of genes (or super-genes) with the smallest distance was then merged, and the process was repeated until all genes were merged into a single cluster. The dendrogram in Figure ? shows the final clustering tree, in which genes close to each other have similar standardized expression values across the 20 samples.
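The filtering and merging steps described above can be sketched in Python as follows. This is a minimal illustration, not dChip's implementation: the function names, the use of the sample standard deviation (n - 1 denominator), treating a constant profile as uncorrelated, and the 8-sample toy data (instead of 20 samples) are all assumptions made here. The merge rule follows the text: each super-gene's profile is the per-sample average of the two profiles it joins.

```python
import math
import statistics


def passes_filter(values, present_calls, cv_range=(0.5, 10.0), min_present=5):
    """CV (sd / mean) within cv_range and called Present in more than min_present samples.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(values)
    if mean <= 0:
        return False
    cv = statistics.stdev(values) / mean
    return cv_range[0] <= cv <= cv_range[1] and sum(present_calls) > min_present


def standardize(values):
    """Linearly transform a profile to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation; a constant profile is treated as uncorrelated (r = 0)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster(profiles):
    """Merge the closest pair (distance 1 - r) until one cluster remains.

    Returns the merges in order as (left, right, distance); super-genes are
    named by nested tuples of their members.
    """
    nodes = dict(profiles)  # active genes and super-genes
    merges = []
    while len(nodes) > 1:
        names = list(nodes)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                d = 1 - pearson(nodes[names[i]], nodes[names[j]])
                if best is None or d < best[0]:
                    best = (d, names[i], names[j])
        d, a, b = best
        # Remove both members from further merging and add the super-gene,
        # whose profile is the per-sample average of the two profiles
        pa, pb = nodes.pop(a), nodes.pop(b)
        nodes[(a, b)] = [(x + y) / 2 for x, y in zip(pa, pb)]
        merges.append((a, b, d))
    return merges


# Hypothetical data: 8 samples instead of 20 to keep the example short
raw = {
    "g1": [1, 2, 4, 8, 16, 32, 64, 128],
    "g2": [2, 4, 8, 16, 32, 64, 128, 256],         # same pattern as g1
    "g3": [128, 64, 32, 16, 8, 4, 2, 1],           # opposite trend
    "g4": [100, 101, 99, 100, 102, 98, 100, 100],  # CV too low
    "g5": [5, 50, 5, 50, 5, 50, 5, 50],            # Present in only 3 samples
}
present = {g: [True] * 8 for g in raw}
present["g5"] = [True, True, True] + [False] * 5

kept = {g: standardize(v) for g, v in raw.items() if passes_filter(v, present[g])}
merges = cluster(kept)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Because 1 - r depends only on correlation, which is unchanged by shifting or rescaling a profile, averaging two standardized profiles without re-standardizing them does not affect later distances. The all-pairs search is deliberately naive; it is fine for illustration but slow for thousands of genes.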