Designing microarray experiments
To minimize experimental variation, it is desirable to have the same person perform all the experiments in the same microarray core facility. However, if an experiment involves many samples, they have to be processed on different days. Arrays generated on different days may show a “batch difference”, since different reagents are used to amplify and label the samples. Such a difference may be detected by unsupervised clustering. Although the “Array list file” can be used to alleviate the situation, it is better to consider such effects before doing the experiment. For example, if an experiment compares two conditions with multiple samples in each condition, it is undesirable to have all samples of condition A amplified and hybridized on one day and all samples of condition B on another day. In that case, even if batch effects occur, we cannot detect them because they are confounded with the real biological variation that we are interested in. It is therefore more reasonable to use a balanced design in which the samples of every condition are randomly distributed across the sample amplification and hybridization days.
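The balanced design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name balanced_assignment, the sample names, and the choice of two processing days are hypothetical and are not part of any analysis software mentioned here. Within each condition the samples are shuffled and then dealt to the days in turn, so every day receives a near-equal share of every condition.

```python
import random
from collections import Counter


def balanced_assignment(samples_by_condition, n_days, seed=0):
    """Randomly spread the samples of each condition across processing days.

    Returns a dict mapping sample name -> day index (0-based).
    """
    rng = random.Random(seed)
    assignment = {}
    next_day = 0  # carried over between conditions so day totals stay even
    for condition in sorted(samples_by_condition):
        shuffled = list(samples_by_condition[condition])
        rng.shuffle(shuffled)  # random order within the condition
        for sample in shuffled:
            assignment[sample] = next_day % n_days  # deal to days in turn
            next_day += 1
    return assignment


# Hypothetical experiment: two conditions, six samples each, two days.
samples = {
    "A": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "B": ["B1", "B2", "B3", "B4", "B5", "B6"],
}
plan = balanced_assignment(samples, n_days=2, seed=1)

# Count how many samples of each condition fall on each day.
per_day = Counter((name[0], day) for name, day in plan.items())
for day in range(2):
    print(f"day {day}: A={per_day[('A', day)]}, B={per_day[('B', day)]}")
```

With six samples per condition and two days, each day receives three samples of condition A and three of condition B, so a day-to-day batch difference is no longer mixed with the A versus B comparison.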
Another way to reduce experimental variation is to use replicate samples. If variation is introduced in an unbiased manner in the experimental or analysis steps after the replication point, averaging the final expression values gives a better estimate of the “expression level” at the replication point (the variance of the average is inversely proportional to the number of replicates). Replication can be done at any point from early to late along this rough scale: different individuals (cell line strains), independently grown cell lines (pure strain animals), different tissue samples from the same individual, split tissue samples, split mRNA, split IVT (in vitro transcription), and scanning one array multiple times. We have observed that replicates at the split-IVT level usually agree well in their expression values and cluster very tightly. Such replicates therefore may not help us to estimate or reduce the variation introduced before the IVT-splitting point.
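A small simulation can make the two points above concrete: averaging replicates shrinks only the variation introduced after the replication point, and it does so in proportion to the number of replicates. The sketch below assumes a simple two-level model that is not taken from the text: each individual deviates from the true expression level with standard deviation sd_bio, and each array adds independent noise with standard deviation sd_tech. All names and numerical values are hypothetical.

```python
import random
import statistics


def variance_of_estimate(n_bio, n_tech, sd_bio, sd_tech, trials=20000, seed=0):
    """Empirical variance of the averaged expression value.

    n_bio  : number of independent individuals (early replication point)
    n_tech : number of arrays per individual (late replication point)
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        values = []
        for _ in range(n_bio):
            bio_level = rng.gauss(0.0, sd_bio)  # variation before the split
            for _ in range(n_tech):
                # variation added after the split, independent per array
                values.append(bio_level + rng.gauss(0.0, sd_tech))
        estimates.append(statistics.fmean(values))
    return statistics.pvariance(estimates)


def expected_variance(n_bio, n_tech, sd_bio, sd_tech):
    """Variance of the average under the assumed two-level model."""
    return sd_bio ** 2 / n_bio + sd_tech ** 2 / (n_bio * n_tech)


# Same number of arrays (four), replicated at a late point or an early point.
for n_bio, n_tech in [(1, 4), (4, 1)]:
    simulated = variance_of_estimate(n_bio, n_tech, sd_bio=1.0, sd_tech=0.5)
    expected = expected_variance(n_bio, n_tech, sd_bio=1.0, sd_tech=0.5)
    print(f"individuals={n_bio}, arrays each={n_tech}: "
          f"simulated={simulated:.3f}, expected={expected:.3f}")
```

Under this model, four arrays from one individual leave the variation among individuals untouched, whereas four arrays from four different individuals divide it by four. This is consistent with the observation that tightly agreeing split-IVT replicates say little about variation introduced earlier.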
The practical choice of the replication point should suit the purpose of the experiment. For example, when an investigator has only a very small amount of RNA, the choices are to use double-round amplification (which may have a 5' bias) or to pool RNA from different animals in the same litter. Pooling more samples is good as long as the gene expression variation among the pooled animals is expected to be smaller than the gene expression difference among the biological conditions under study, but this choice may be more expensive. Even when the sample amount is sufficient, there is a choice between processing the sample of each animal and hybridizing it to its own array, and pooling the samples of different animals first, splitting the pool into several aliquots, and performing IVT and hybridization separately on replicate arrays. This situation is more complex, and the answer probably depends on the specific experiment (the number of replicates, the points at which to replicate and pool, and the variation of genes at these points).