Recipes for the Design of Experiments/Chapter 0: Preliminaries
Chapter 0: Preliminaries
0.1 Introduction to the Design of Experiments (Yage D, Felipe O)
Every time you make an observation or draw a conclusion, you are getting information from a system. The system can be anything from the chemical composition of a drug and its effects to an old woman feeding bread crumbs to birds at the park. If your interest is greater, you may continue to observe and take note of differences and what causes them. For example, you may notice that a certain ratio of chemicals in the drug has a negative effect, or that the number of birds feeding decreases as winter approaches. In both cases you have observed a change in the system and attributed a cause to it, and you now have a better understanding of the system. The more you understand about what affects the system, the better you understand the system itself.
When you are interested in a particular effect in a system, you may run an experiment. In its simplest form, an experiment consists of changing some aspect of the system and recording the effect of that change. As you learn more about these cause-and-effect relationships, you can begin to form theories and hypotheses and to design experiments that give accurate results. Herein lies the importance of proper experimentation: to get valid results and draw the correct conclusions, the experiment must be designed with a specific goal in mind. The goal of an experiment is to find the effect an independent variable has on a dependent variable or, in other words, to find cause-and-effect relationships and their magnitudes.
We have all been taught the basics of experimentation in the form of the scientific method. That is to say, you have a hypothesis, which becomes a binary condition for the experiment: either the results support the hypothesis, or you are unable to validate it. In either case information is gained from the system. Classically, one of the most common experimental methods is OFAT (one factor at a time): change one variable and measure the effect it had. Historically this method was prevalent, and it was used by many notable researchers such as Edison. Then, in the years following WWI, Ronald Fisher developed a new method of analysis known as ANOVA (Analysis of Variance), a powerful method for analyzing the differences among group means. This tool allows greater flexibility in experiments because it makes it possible to test several null hypotheses at once. It truly opened a new chapter in how experiments were designed and how their results were analyzed. After Fisher came George Box, followed by Taguchi. With Taguchi experimental design, the number of necessary experimental runs can be significantly reduced, depending on the number of factors and levels. The tools and methods developed for experimentation are discussed further in the following chapters.
In general there are three methods of experimentation: optimization (find the best arrangement of independent factors for the desired dependent factor), sensitivity analysis (find the magnitude of the effect one or more factors have on the dependent variable), and surrogate modeling (used when the desired factor cannot be easily measured). In this book we explore the proper methods of statistical experimental design for testing hypotheses. Examples are provided using R, the open-source statistical computing and graphics software, so that readers can use them in their own work.
[SMD: Great start! I like the listing of the 3 types of experimentation. Can you expand this introduction to talk a bit about the history of experimentation from Edison to Fisher to Box? Admittedly, you don't know about Box yet, so just leave a tag for later. I'm interested in seeing what you have to say about Edison and Fisher, with perhaps some input from your readings.]
There are 3 components to an experiment: factors, levels, and responses. We might use these terms in everyday life, but in experimental design they have specific meanings. The New Oxford American Dictionary defines "factor" as "a circumstance, fact, or influence that contributes to a result or outcome", which is exactly what it is in an experiment: a factor changes to affect the outcome. The outcome of an experiment is referred to as the "response", and the values a factor takes are called "levels". As a factor changes from level to level, the experimental unit should generate a particular response to each level of the factor. The "experimental unit" is the object on which the specific experimental conditions are tested.
Let's consider a simple example. A farmer wants to know how much irrigation his new crop type needs to produce the highest yield. He decides to plant multiple acres of this new crop, give a different amount of irrigation to each acre, and measure the yield of each acre once the crop is ripe. In this example, the amount of irrigation changes from acre to acre, so it is the factor, and the different amounts are the levels of this factor. The response, in other words the outcome measured on the experimental unit (each acre of the crop), is of course the yield.
Now that we know the components of an experiment, we can start designing experiments. There are 3 basic principles of experimental design: randomization, replication, and blocking. Randomization in turn covers 3 areas: random selection, random assignment, and random run order. Random selection means that experimental units should be selected randomly from the population. In our example, if the farmer will always use crop seeds from 1 company, then the seeds from that company are the population, and the experimental units should be selected randomly from it, meaning that he should select across batches instead of using seeds from a single batch. Once the experimental units have been randomly selected, treatments should be randomly assigned to them. The new term "treatment" here is just another name for a level of the factor. After random treatment assignment, we can finally run the experiment, again in a random order. Through randomization we minimize systematic bias and protect the validity of the inference we may draw between the factor and the response.
The second basic principle is replication, which is the process of applying the same treatment to multiple samples. In our yield-irrigation example, if the farmer gives the same amount of irrigation to multiple acres of crop, then these acres are called "replicates". Having replicates gives us a more accurate view of the observed data, because we can estimate the experimental error from the differences among replicates, and thus the variation due to error. If the variation due to error is smaller than the variation due to the change in treatment, then we can draw an inference between the factor and the response.
The third principle is blocking. In an experiment there is the controlled factor, which is our factor of interest, and there are other factors that are not of interest but do affect the response of the experimental unit. These are called "nuisance factors". Nuisance factors are often blocked in the experiment, which we achieve by keeping the experimental conditions on each unit the same, other than the factor of interest. By blocking nuisance factors, we can observe a more precise relationship between the factor of interest and the response, free from the interference of the nuisance factors.
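The randomization and replication principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomized_plan`, the irrigation levels, and the number of replicate acres are hypothetical and do not come from the farmer example itself.

```python
import random

def randomized_plan(levels, replicates, seed=None):
    """Randomly assign treatments to units and randomize the run order."""
    rng = random.Random(seed)
    # Replication: every treatment (factor level) appears `replicates` times.
    treatments = [level for level in levels for _ in range(replicates)]
    # Random assignment: after shuffling, unit i receives treatments[i - 1].
    rng.shuffle(treatments)
    assignment = {unit: trt for unit, trt in enumerate(treatments, start=1)}
    # Random run order: visit the units in a shuffled order.
    run_order = list(assignment)
    rng.shuffle(run_order)
    return assignment, run_order

# Hypothetical irrigation levels (mm per week), 3 replicate acres per level.
levels = [10, 20, 30]
assignment, run_order = randomized_plan(levels, replicates=3, seed=1)
for acre in run_order:
    print(f"acre {acre}: irrigate at {assignment[acre]} mm/week")
```

Fixing the seed makes the plan reproducible, which is convenient when the layout has to be documented before the experiment is run.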
With the 3 basic principles ensuring the accuracy and precision of the conclusions we draw from an experiment, we also need a strategy of experimentation. There are 3 common approaches to planning an experiment: the best-guess approach, one factor at a time (OFAT), and factorial design. We discuss these in the following sections of this chapter.
0.2 Best Guess Experimental Design (Diana R, Molly R)
Perhaps the most basic approach to experimental design is the best-guess approach. This strategy is exactly what it sounds like: the experimenters make a ‘best guess’ as to what they think is the optimal combination of experimental factors. While this is an informal strategy, it is used fairly often, and is correct more often than would be expected. This is because the people conducting the experiment often make an educated guess that is grounded in their prior knowledge of the factors being examined.
However, this approach is not without its downsides. When it comes to finding an optimal solution using best guess there are two possible outcomes: either the result is acceptable, or it is not. If the result is obviously not satisfactory, where does the experimenter go from there? Their second-best guess? That could go on for quite a while, and may never result in an optimal solution. But what if the result of the best guess experiment is satisfactory? Does the experimenter stop there, or continue experimenting? There is no guarantee that the satisfactory solution is the optimal solution. The best-guess approach to experimental design is a good starting point, but is clearly not an ideal strategy for more involved experiments.
The coffee industry shows how a best-guess approach is used in a real-world application. Different varieties of coffee may be offered in the market (from different regions), and within these varieties there are different types of roast. There is uncertainty as to which of these combinations will produce the best sales. Suppose the best varieties of coffee in Colombia come from six regions (Antioquia, Armenia, Santander, Huila, Narino and Sierra Nevada) and that there are four roasting types (Light, Medium, Medium Dark and Dark Roast). If the company wants to obtain the maximum sales, the most logical approach would be to try all 24 combinations and produce a batch of each type of coffee. Yet producing and selling all of them may not be economically feasible, so a best guess might be the best approach. According to international ratings of Colombian coffee, particular regions of the country have won first place in coffee tasting competitions all around the world. Huila is one of those regions, and a medium roast is known to be the best type of roasting for its grain. Further market research has indicated that organic coffee is in demand in many European countries, Canada and some regions of the United States. As Sierra Nevada produces one of the best organic coffees in the region, this could also be offered. As this grain requires a fuller body, a Dark roast seems to be the best option. This best-guess approach used "maximum sales" as the response variable and (i) the variety of the coffee and (ii) the roasting type as factors. Again, the best-guess approach can provide a solution to a problem, although it is better to use a more defined strategy to ensure an optimal solution is reached.
0.3 One Factor at a Time (OFAT) Experimental Design (Munira S, Fabiana T)
The one-factor-at-a-time (OFAT) experimental design is an experimental design where only one factor is altered in each experiment, while the other factors are held constant.
- Start with your initial values. One possibility might be starting with a best guess.
- Among the factors considered, choose a factor x.
- Run experiments with different levels of x to optimize the response variable, while holding the other factors constant.
- Set factor x at the level that optimizes the response variable.
- Choose a factor y that isn't correlated with factor x.
- Repeat this method through all factors, every time choosing a factor that isn't correlated with the one previously varied.
As an example, suppose we want to maximize the yield of a plot by changing the type of fertilizer (A or B) and the type of vegetable (carrots or eggplants). Starting with carrots, say we obtain a better yield with fertilizer A. We keep fertilizer A and change the vegetable type, and the eggplant yield turns out lower than the carrot yield. In this case, we would choose the carrot and fertilizer A combination as the better one.
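The OFAT steps and the fertilizer example can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function `ofat`, the helper `measure`, and all yield values are hypothetical and chosen only to reproduce the outcome described in the example.

```python
def ofat(factors, start, measure):
    """One-factor-at-a-time search that maximizes the response.

    factors: dict mapping factor name -> list of levels
    start:   dict giving the initial level of every factor
    measure: function taking a settings dict and returning the response
    """
    current = dict(start)
    observed = {}  # settings already run -> measured response

    def run(settings):
        key = tuple(sorted(settings.items()))
        if key not in observed:  # do not repeat a run already made
            observed[key] = measure(settings)
        return observed[key]

    for name, levels in factors.items():
        # Vary only this factor, holding the others at their current levels,
        # then fix it at the level that gave the best response.
        current[name] = max(levels, key=lambda level: run({**current, name: level}))
    return current, observed

# Hypothetical yields (kg per plot) for each vegetable-fertilizer pair.
yields = {
    ("carrot", "A"): 50, ("carrot", "B"): 42,
    ("eggplant", "A"): 38, ("eggplant", "B"): 45,
}

def measure(settings):
    return yields[(settings["vegetable"], settings["fertilizer"])]

factors = {"fertilizer": ["A", "B"], "vegetable": ["carrot", "eggplant"]}
start = {"vegetable": "carrot", "fertilizer": "A"}
best, observed = ofat(factors, start, measure)
print(best)           # the chosen combination
print(len(observed))  # number of distinct runs actually made
```

With these made-up yields the search settles on carrots with fertilizer A after three distinct runs; the eggplant and fertilizer B combination is never tested.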
Advantages - OFAT is fine in circumstances where data is easy to obtain in terms of cost, the time needed to run the experiments, or the availability of data. It is also useful in situations where the goal is to improve the current system. Finally, because OFAT doesn't incorporate interaction effects between factors, it should be fine in experiments where the factors aren't correlated, that is, where they do not interact.
Disadvantages - When data isn't easy to obtain, for example when experimental runs are particularly expensive or time consuming, it is more efficient to change multiple factors at a time, so OFAT is not a good choice. However, the main limitation of the OFAT method lies in its inability to incorporate interaction effects between factors, so it can easily fail in experiments where such effects exist. An OFAT experiment can also miss the optimal values of the factors, which a factorial design experiment could reveal. For instance, in the cultivation example described previously, if the combination of eggplant and fertilizer B had a higher yield than the one chosen (carrot and fertilizer A), we would have missed the best solution, which a factorial design experiment would have revealed.
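The missed-optimum scenario can be made concrete with a small Python sketch. The yield values below are hypothetical, chosen so that the two factors interact; only the three combinations visited by the OFAT path in the example are treated as observed.

```python
# Hypothetical yields (kg per plot), chosen so that the factors interact.
yields = {
    ("carrot", "A"): 50, ("carrot", "B"): 42,
    ("eggplant", "A"): 38, ("eggplant", "B"): 58,
}

# Combinations visited by the OFAT path in the example:
# carrots with A, carrots with B, then eggplants with A.
ofat_runs = [("carrot", "A"), ("carrot", "B"), ("eggplant", "A")]
ofat_choice = max(ofat_runs, key=yields.get)

# A full factorial experiment tests all four combinations.
factorial_choice = max(yields, key=yields.get)

# Effect of switching from fertilizer A to B, separately for each vegetable.
effect_b_carrot = yields[("carrot", "B")] - yields[("carrot", "A")]
effect_b_eggplant = yields[("eggplant", "B")] - yields[("eggplant", "A")]

print(ofat_choice)       # ('carrot', 'A')
print(factorial_choice)  # ('eggplant', 'B')
print(effect_b_carrot, effect_b_eggplant)  # -8 20
```

Fertilizer B lowers the carrot yield but raises the eggplant yield. That change of sign is the interaction, and it is exactly what OFAT cannot see because it never runs the fourth combination.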
[SMD: You have a nice example, but can you add some other factor levels so that you can show changing levels multiple times? Please think about the grocery shelf example we reviewed on Day 1 of class. Please rethink your disadvantages above. I think a real disadvantage of OFAT is the amount of time or number of experimental runs it takes to complete an experiment and even at that, you may not get to the optimum. ]
0.4 Factorial Design (Kaan U, Michael W)
Factorial design is an experimental method for observing changes in a response (dependent) variable in which all possible combinations of two or more factors are tested in unique randomized runs, instead of varying only one factor at a time in a predetermined order. This method was pioneered by R. A. Fisher in the 1920s and significantly changed the way experimental design was approached.
A factor is an independent variable, usually categorical, and its different values are called levels. Levels can be quantitative or qualitative, although most are simply a single word or number that corresponds to a treatment or parameter involved in the experiment. Continuous variables are not often used as factors, but the response variable is usually continuous.
Compared with the methods explained earlier, factorial design has the following advantages:
- Factorial design allows the experimenter to find the global optimum among the tested combinations instead of a local optimum, which is the primary risk of the previously discussed OFAT and best-guess methods.
- Another advantage over the OFAT experimental design method is that each factor is examined multiple times in each experiment, which saves resources and increases precision of the model.
- Finally, this method of design can avoid experimental results being confounded by what the researcher expects to happen.
0.4.1 Structure / Design Combinations
The total number of experimental runs (also called treatments) in a factorial experiment with an equal number of levels per factor is n^m, where n is the number of levels of each factor and m is the number of factors. When the factors do not all have the same number of levels, the total number of runs is the product n_1 × n_2 × … × n_m, where n_i is the number of levels of factor i.
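The run-count calculation can be checked with a short Python sketch. The helper name `run_count` is hypothetical, and the factor names and levels are borrowed from the fertilizer and vegetable example used later in this section.

```python
from itertools import product
from math import prod

def run_count(levels_per_factor):
    """Number of runs in a full factorial: the product of the level counts."""
    return prod(levels_per_factor)

print(run_count([3, 3]))     # two factors with 3 levels each: 3^2 = 9
print(run_count([2, 3, 4]))  # unequal numbers of levels: 2 * 3 * 4 = 24

# Enumerating the runs themselves for a 3 x 3 design.
factors = {
    "fertilizer": ["A", "B", "C"],
    "vegetable": ["eggplant", "carrot", "pepper"],
}
runs = list(product(*factors.values()))
print(len(runs))  # 9
```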
There are two types of factorial designs, depending on how many of the combinations are covered:
- If the experiment considers all of the possible combinations of all factor levels, it is defined as a full factorial design.
- If, because of high costs or a very large number of combinations, only a fraction of the combinations is tested, it is defined as a fractional factorial design.
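To illustrate the difference between the two types, the Python sketch below builds a full design for three two-level factors and then keeps half of its runs. The half fraction uses one standard construction (the defining relation I = ABC in coded units); this choice is an assumption made for illustration and is not taken from the text above.

```python
from itertools import product

# Full 2^3 design in coded units: -1 is the low level, +1 is the high level.
full = list(product([-1, 1], repeat=3))

# Half fraction with defining relation I = ABC:
# keep only the runs where the product of the three coded levels is +1.
half = [(a, b, c) for a, b, c in full if a * b * c == 1]

print(len(full), len(half))  # 8 4
for run in half:
    print(run)
```

The saving in runs has a price: in this fraction the level of C always equals the product of the levels of A and B, so the main effect of C cannot be separated from the A×B interaction.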
0.4.2 How to interpret the results of the experiment?
The results of a factorial design can be used to compute both the main effects of the factors on the response variable and the effect produced by multiple factors working together, known as an interaction effect. The main effect of a factor is calculated as the difference between the mean responses at the levels of that factor. Interaction effects are calculated by averaging across the factors: in a design with two factors at two levels each, the results are averaged diagonally (Factor A level 1 with Factor B level 1 and Factor A level 2 with Factor B level 2, versus Factor A level 1 with Factor B level 2 and Factor A level 2 with Factor B level 1), and the interaction effect is the difference between the two diagonal averages.
Main effects and interaction effects are independent of each other (orthogonal). That means that in some cases there may be no main effect but some interaction effects or, alternatively, a main effect may exist with no interactions present. Interactions can occur between any combination of factors, so in a factorial design with factors A, B, and C, the possible interactions are A×B, A×C, B×C, and A×B×C.
To determine the statistical significance of the results of a factorial experiment, an ANOVA may be conducted. It gives, for each effect, the probability of observing an effect at least as large as the one measured if chance alone were at work. Researchers then either fail to reject the null hypothesis, that there is no statistically significant main or interaction effect of the factors, or reject the null and conclude that there is a statistically significant main or interaction effect of the factor(s) on the response variable.
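For a design with two factors at two levels each, these calculations can be sketched in Python. The cell means below are hypothetical, and the dictionary keys are (level of Factor A, level of Factor B).

```python
from statistics import mean

# Hypothetical mean responses; keys are (level of A, level of B).
y = {(1, 1): 20.0, (1, 2): 30.0, (2, 1): 40.0, (2, 2): 52.0}

# Main effect of A: mean response at A level 2 minus mean response at A level 1.
main_A = mean([y[2, 1], y[2, 2]]) - mean([y[1, 1], y[1, 2]])

# Main effect of B: mean response at B level 2 minus mean response at B level 1.
main_B = mean([y[1, 2], y[2, 2]]) - mean([y[1, 1], y[2, 1]])

# Interaction: difference between the averages of the two diagonals.
interaction_AB = mean([y[1, 1], y[2, 2]]) - mean([y[1, 2], y[2, 1]])

print(main_A, main_B, interaction_AB)  # 21.0 11.0 1.0
```

Here changing A has roughly twice the effect of changing B, and the small interaction value indicates that the effect of A is almost the same at both levels of B.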
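The list of possible interactions is simply every subset of two or more factors, which a short Python sketch can enumerate for any number of factors. The interactions are written here with the × symbol, and the variable names are illustrative only.

```python
from itertools import combinations

factors = ["A", "B", "C"]

# Every subset of two or more factors is a possible interaction term.
interactions = [
    "×".join(subset)
    for size in range(2, len(factors) + 1)
    for subset in combinations(factors, size)
]
print(interactions)  # ['A×B', 'A×C', 'B×C', 'A×B×C']
```

For m factors there are 2^m - m - 1 such terms, so the list grows quickly as factors are added.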
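As a simplified illustration of what an ANOVA computes, the Python sketch below works out the F statistic for a single factor by hand. It is a one-way sketch only, not the two-factor analysis a factorial experiment would need, and the yield values and the name `one_way_anova_f` are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean (treatment effect).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean (error).
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical yields: three replicate plots for each fertilizer.
yields_by_fertilizer = [
    [20, 22, 24],  # fertilizer A
    [28, 30, 32],  # fertilizer B
    [23, 25, 27],  # fertilizer C
]
f_stat, df_between, df_within = one_way_anova_f(yields_by_fertilizer)
print(round(f_stat, 2), df_between, df_within)  # 12.25 2 6
```

The F statistic is then compared with the F distribution with the same degrees of freedom. With 2 and 6 degrees of freedom the 5% critical value is about 5.14, so for these made-up data the null hypothesis of equal mean yields would be rejected. A library routine such as `scipy.stats.f.sf(f_stat, df_between, df_within)` can supply the corresponding p-value.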
Continuing from the experimental design described in the previous section, let us assume there are three types of fertilizers (A, B, and C) and three types of vegetables (Eggplants, Carrots, and Peppers).
Each of the experimental combinations would be put in plots, allowing for the full comparison of each factor level. The plot of land may be randomly divided in the manner shown below:
| Eggplants + A | Carrots + B | Eggplants + C |
| Carrots + C | Peppers + A | Peppers + B |
| Peppers + C | Eggplants + B | Carrots + A |
The crops will all be treated equally, and then data collected from each of the subplots. Once the data is complete, all factor combinations can be compared in order to determine which has the optimal yield. This process ensures that the optimum point is found by testing all of the factor combinations.
0.5 The Setting, Design, Analysis Paradigm for the Design of Experiments
In coming up with an experimental design, it is helpful to have a framework in which we can reason about the problem at hand. The paradigm adopted herein is the Setting-Design-Analysis paradigm, adopted from [?].
- Setting (of the problem)
- Design (of the experiment)
- Analysis (of the experiment)
  - Exploratory Analysis
  - (Null Hypothesis Statistical) Testing
  - Estimation (of Parameters)
  - Model Adequacy Checking
Step I: The Setting
Requires: A description of the problem being studied, the idea, the thesis
Yields: The question to be asked, a statement of the null hypothesis in non-mathematical terms, and the number and type of factors under test and/or control
Step II: The Design
Requires: The question to be asked, a statement of the null hypothesis in non-mathematical terms, and the number and type of factors under test and/or control
Yields: A design for the experiment
Step III: The Analysis (Recipe Outline)
Requires: An experimental design, the results after running the experiment
Yields: A report, consisting of the following four parts:
- Exploratory Analysis
- (Null Hypothesis Statistical) Testing
- Estimation (of Parameters)
- Model Adequacy Checking