# Recipes for the Design of Experiments/Chapter 8: Fractional Factorial Designs

**Fractional Factorial Designs**

The dataset under analysis includes data on potential medical indicators of type 2 diabetes for 403 patients. Using a fractional factorial design, the experiment tests which of 6 factors suggest a predisposition to the onset of type 2 diabetes, as measured by the response variable, percent glycosylated hemoglobin. The World Health Organization suggests using a glycosylated hemoglobin threshold of at least 6.5% to determine the presence of type 2 diabetes. Because the experiment uses a 2^k design, the 6 predictor variables were converted to factors and reduced to two levels for use in the Analysis of Variance model: values greater than the column mean were coded as “1” and values less than the column mean as “-1”.^{[1]}
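Here is a minimal sketch of the mean-split coding described above. It is written in plain Python, not the R code used in the recipe. The description covers only values above and below the mean, so the rule for a value exactly at the mean is an assumption. The example column is hypothetical.

```python
from statistics import mean

def code_at_mean(values):
    """Code a numeric column as +1 (above its mean) or -1 (otherwise).

    Values exactly equal to the mean are coded -1 here; that tie rule is
    an assumption, since the recipe only describes values above or below it.
    """
    m = mean(values)
    return [1 if v > m else -1 for v in values]

# Hypothetical column of measurements, for illustration only.
column = [4.1, 5.6, 7.2, 5.0, 9.8]
print(code_at_mean(column))
```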

This recipe for the Design of Experiments takes a data set containing 64 total experimental runs and analyzes the variance in the response variable with regard to six factors, each having two levels. A 1/2 fractional factorial design is then created and the data are analyzed again. This demonstrates proper use of fractional factorial designs and of the R package FrF2.^{[2]}
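To show what a half fraction contains, the following plain-Python sketch builds a 2^(6-1) design. It does not use the FrF2 API. It takes the full 2^5 design in factors A to E and sets F = ABCDE. That generator is an assumption: it is the usual highest-resolution choice for a 6-factor half fraction, but the recipe's own design may use different labels or column order.

```python
from itertools import product

def half_fraction_2_6():
    """Return the 32 runs of a 2^(6-1) design in coded units (-1/+1).

    A to E form a full 2^5 design; F uses the assumed generator F = ABCDE,
    i.e. the defining relation I = ABCDEF.
    """
    runs = []
    for a, b, c, d, e in product((-1, 1), repeat=5):
        runs.append((a, b, c, d, e, a * b * c * d * e))
    return runs

design = half_fraction_2_6()
print(len(design), "runs")
```

With the single generator F = ABCDE, the defining relation is I = ABCDEF, so this half fraction has resolution VI. Main effects are aliased only with five-factor interactions, and two-factor interactions only with four-factor interactions.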

The following recipe analyzes the effects of specific vehicle parameters on fuel mileage (mpg). The dataset contains 6 factors, each with two levels, for a total of 64 runs. An ANOVA is first performed to determine whether variation in any of the factors has an effect on the variation in fuel mileage. Afterwards, a fractional factorial design is created from the existing dataset, and a second ANOVA is performed to determine the effect on the statistical results. http://rpubs.com/maxwinkelman/42704

This is a 2^(k-1) design (k=6 in this case), which involves creating a fractional factorial design in which every factor has exactly 2 levels. The data are a subset of a large dataset on strength of materials, and we consider only the composition and strength of concrete mixtures. The independent variables are the constituents (only 6 are considered here) used to make concrete mixtures for a variety of construction applications, such as buildings and bridges. The 'strength' of the concrete mixture is the response variable. http://rpubs.com/Uzma_1004/42659

The data in the following analysis concern house prices and factors that might influence them. We apply a fractional factorial design strategy and test whether it produces the same result as the complete factorial design. It is a 2^(k-1) design with 6 factors (k=6): whether there is a driveway, a recreational room, a fully finished basement, gas for heating, central air conditioning, and whether the house is located in a preferred neighbourhood of the city. Comparing the ANOVA and model adequacy checking results, the fractional factorial design appears to reflect the results of the complete factorial design to some extent in this case. http://rpubs.com/chenh16/42734

The following recipe is an analysis of variance on a data set involving traffic fatality data, demographic information, and driving laws. In addition to the traditional analysis of variance on a general linear model, a fractional factorial model is also created and used. The model is a 2^6 design, of which only 32 runs (2^(6-1)) are analyzed. The null hypothesis is that the response variable, traffic fatality rate, is independent of the factors.^{[3]}

This study uses data from "The impact of unemployment insurance benefit levels on recipiency" (McCall, B.P. 1995). A six-factor, two-level experiment is performed to see whether being white, attending school for more than 12 years, being male or female, being married, having kids, or having previously applied for UI benefits has a statistically significant effect on the state unemployment rate (in %). To assess significance, an ANOVA is performed, Tukey Honest Significant Differences are computed, and a fractional factorial design [2^(6-1)] is generated.^{[4]}

This recipe examines unemployment of blue collar workers using data from the Ecdat package. We look at how 6 different factors, each with 2 levels, affect the replacement rate. - Cheryl Tran, http://rpubs.com/tranc3/42909

The following recipe is an analysis of the Computers dataset from the Ecdat package. This is a 2^(k-1) fractional factorial design where k = 6. The goal is to test whether the analysis produces the same result regardless of fractionalization. ANOVA and model adequacy checking were used to determine the effect of fractionalization on the statistical results. - Matthew Macchi http://rpubs.com/macchm/44747

This recipe uses data from the Ecdat package. The dataset records the number of doctor visits each individual makes in a given year. The factors considered include race, gender, years of schooling, the individual's perceived health, employment status, and marital status. An Analysis of Variance was performed to see whether anything other than randomization could explain the variation in the frequency of doctor visits among individuals. After the ANOVA, a 2^(k-1) fractional factorial design was generated. http://rpubs.com/braunj6/42823

The purpose of this project is to create a two-level half fractional factorial design with 6 factors. The dataset used for this experiment is “Benefit” from the Ecdat package in R, which is used to explore the factors influencing state unemployment rates of blue collar workers in 1972. The results show that the fractional factorial design substantially reduces estimation time and cost while preserving part of the results generated by the full factorial design. They also show that the variation in the state unemployment rate cannot be explained by sample randomization alone. - Wei Zou^{[5]}

The following analysis uses ANOVA on a fractional factorial design to examine how several physicochemical factors affect the perceived quality of red wine.^{[6]}

In the following experiment, the Somerville dataset from the Ecdat package is analyzed. The data are based on observations of individuals who visited Lake Somerville, such as whether they pay an annual fee or participate in skiing. Besides the basic exploratory data analysis, ANOVA model, and model adequacy checking, a new package, FrF2, is used to create a 2^(k-1), or half fractional, design. In this recipe k=6, producing 32 runs. Each factor has 2 levels (1 or -1).^{[7]}

**Fall 2016 Projects**

**Michael W** - The following experiment analyzes the OFP data set from the Ecdat package in R. The purpose of the experiment was to study the impact of 4 factors on the number of visits a subject made to the doctor's office. A 2^(6-3) design was constructed using FrF2. The 6 factors were created by decomposing the two 3-level factors into 2-level factors and leaving the 2-level factors as they were. The design was analyzed, and the main effects and aliasing were calculated. The size of the main effects and an ANOVA were used to decide which factors to include in the final model. The results showed that 2 of the factors were significant while the other 2 were not. They also showed that the model deviated somewhat from normality, indicating that other factors could improve the model.^{[8]}
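The plain-Python sketch below builds an 8-run 2^(6-3) design to show its structure. The generators D = AB, E = AC, F = BC are an assumption, chosen because they are a common choice for six factors in eight runs. FrF2 chooses its own generators from a catalog, so the recipe's design may differ in labeling or column order.

```python
from itertools import product

def fraction_2_6_minus_3():
    """Return the 8 runs of a 2^(6-3) design in coded units (-1/+1).

    A, B and C form a full 2^3 design; the remaining columns use the
    assumed generators D = AB, E = AC and F = BC.
    """
    design = []
    for a, b, c in product((-1, 1), repeat=3):
        design.append((a, b, c, a * b, a * c, b * c))
    return design

design = fraction_2_6_minus_3()
for run in design:
    print(run)  # one run as (A, B, C, D, E, F)
```

Every column is a product of A, B and C, so each pair of columns is orthogonal. That is why eight runs suffice to estimate six main effects. The cost is that those main effects are aliased with two-factor interactions.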

**Bjarke H** - This experiment analyzes the Star data set from the Ecdat package. It focuses on 4 factors, 2 of which have 3 levels. These two 3-level factors are decomposed into four 2-level factors, and FrF2 is used to construct the design matrix for the fractional factorial design as well as the aliasing structure. The main and interaction effects are estimated with analysis of variance (ANOVA). http://rpubs.com/bjarke1/project3
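Several projects split each 3-level factor into two 2-level pseudo-factors. The projects do not spell out the mapping, so the sketch below uses one possible encoding. The level names and the choice of which (X1, X2) pair is left unused are assumptions made for illustration.

```python
# Hypothetical encoding of a 3-level factor as two 2-level pseudo-factors
# (X1, X2). Four sign pairs exist but only three levels, so (-1, +1) is
# left unused in this particular choice.
ENCODING = {"low": (-1, -1), "medium": (1, -1), "high": (1, 1)}

def decompose(levels):
    """Map each observed level of a 3-level factor to its (X1, X2) pair."""
    return [ENCODING[level] for level in levels]

def recompose(pair):
    """Recover the original level; raises KeyError for the unused pair."""
    inverse = {v: k for k, v in ENCODING.items()}
    return inverse[pair]

observed = ["low", "high", "medium", "low"]
print(decompose(observed))
```

Two 3-level factors encoded this way contribute four 2-level columns. Together with the two original 2-level factors, these give the six factors of a 2^6 design.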

**Kristen C** - This experiment analyzes the Housing data set from the Ecdat package. This data set is made up of 546 observations of housing prices and other housing variables in Windsor in 1987. The four independent variables we focus on are bathrooms (3 levels), stories (3 levels), fullbase (2 levels), and driveway (2 levels), with price as the response variable. The experimental design splits each three-level factor into two two-level factors. The main and interaction effects are estimated using ANOVA, and the model is validated by plotting the residuals. http://rpubs.com/Kristencole3/234231

**Molly R** - This experiment examines the Health Insurance data set from the Ecdat package. It analyzes the number of hours a wife works per week as a function of four factors: whether or not she is covered by her husband's health insurance, whether or not she has health insurance through her job, her race (black, white, or other), and her region (south, west, or northcentral). The fractional factorial design decomposes the two three-level factors into four two-level factors, making this a 2^6 design approximated by an 8-run 2^(6-3) fraction. The analysis found that the husband's and wife's health insurance factors were significant, along with whether she was white and whether she lived in the west region. However, the data do not satisfy the normality assumption. Further analysis should be done to confirm these factors and to examine the aliased two-factor interactions.^{[9]}

**Kaan U** - The Housing dataset from the Ecdat package was analyzed with a 2^(6-3) fractional factorial design. Two two-level factors (fully finished basement and whether the house is located in a preferred area) and two three-level factors (lot size of the property and number of bedrooms) were selected for this design. This initial screening analysis found all factors to be significant. However, the model fitted to the full data set with these findings does not fully satisfy the normality assumption and requires further analysis. http://rpubs.com/unnuk/234307

**Joonhyuk Bok** - From the Ecdat R package, we select the “Mathlevel” dataset, which could be useful for predicting SAT math scores. In the Mathlevel data, ‘language’, ‘sex’, ‘physiccourse’ and ‘chemistcourse’ are selected as factors that could explain the SAT math score. They have 2, 2, 3 and 3 levels, respectively, and ‘sat’ is chosen as the response variable. A fractional factorial design is used to reduce the computing power and the number of runs needed to reach an appropriate conclusion. The 3-level factors are decomposed into 2-level factors so that the design can be built from 2-level factors only. The fractional factorial design is then evaluated for aliasing to determine which effects the more limited design can actually estimate. Main and interaction effects are calculated using ANOVA. Finally, we show how to determine the generator I for the experimental design. http://rpubs.com/bokjh3/234088
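The defining relation and alias structure can be derived by multiplying effect "words", where any letter appearing twice cancels. The plain-Python sketch below does this for a 2^(6-3) design under the assumed generators D = AB, E = AC, F = BC, so that I = ABD = ACE = BCF. It illustrates the algebra and does not reproduce FrF2's alias output.

```python
from itertools import combinations

def multiply(w1, w2):
    """Multiply two effect words; repeated letters cancel (e.g. A*AB = B)."""
    return "".join(sorted(set(w1) ^ set(w2)))

def defining_relation(generator_words):
    """All words generated by the given generator words (identity excluded)."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            word = ""
            for g in combo:
                word = multiply(word, g)
            words.add(word)
    return sorted(words, key=lambda w: (len(w), w))

def resolution(words):
    """Design resolution: length of the shortest word in the defining relation."""
    return min(len(w) for w in words)

def aliases(effect, words):
    """Effects aliased with `effect`: its product with every defining word."""
    return sorted((multiply(effect, w) for w in words), key=lambda w: (len(w), w))

# Assumed generators D = AB, E = AC, F = BC, i.e. I = ABD = ACE = BCF.
words = defining_relation(["ABD", "ACE", "BCF"])
print("I =", " = ".join(words))
print("Resolution:", resolution(words))
print("A is aliased with:", aliases("A", words))
```

The shortest word has three letters, so the design is resolution III. Main effect A is aliased with the two-factor interactions BD and CE.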

**Mike D.** - A dataset comprising the top Reddit posts from three popular subreddits (r/science, r/politics, r/news) was analyzed with a 2^(6-3) fractional factorial design using the FrF2 R package. The objective was to identify which factors, if any, led to a significant difference in the number of upvotes on a post as a percentage of the total number of votes. A tree-like sorting algorithm was developed for sampling from the dataset. A linear model was constructed and an ANOVA performed, and none of the main effects were found to be statistically significant. The project explains these results in terms of the model assumptions and the limitations of the fractional factorial design. It also explains the aliasing structure and generators of the design.^{[10]}

**Clare D.** - This experiment examines potential factors that influence the length of the prison sentence an offender receives. Over 3,000 inmates serving time in state and federal prisons provided personal and background information, which was collected along with their sentence length in months. In this experiment we use a 2^(6-3) fractional factorial design to estimate the main effects of four factors on sentence length. Two of these factors are 3-level factors, each decomposed into two 2-level factors. The design has 8 experimental runs, from which we estimated a tentative model for the response variable, sentence length. This model was then checked against an ANOVA on the full dataset.^{[11]}

**Alexis Z.** - This experiment studies the response variable, birth weight, for 1450 babies born in North Carolina in 2001. It considers four factors: gender of the baby (2 levels), whether or not the mother smoked (2 levels), race of the child (3 levels), and weeks of gestation (4 levels). The study begins with an exploratory boxplot analysis. Next, a design in 64 runs was set up to study the data, and this design was further reduced to a 1/8 fractional factorial design (8 runs). Main effects were calculated, and all factors appeared to have significant effects. However, because this is a resolution III design, the main effects are aliased with interactions. A linear model was still fitted, and it appeared adequate. Ultimately, models containing two-factor interactions (2fi) should be considered, and further statistical methods should be employed, since not all of the design runs were represented in the dataset.^{[12]}

**Trilce** In this report we evaluate air quality data from California metropolitan areas to assess the impacts of socioeconomic and geographic characteristics on air pollution in those areas. A fractional factorial design is developed, and linear models are used to estimate the effects of various covariates.^{[13]}

**Yage Ding** This experiment studies how blue collar workers' gender, color, age, and years of tenure in the job lost affect the state unemployment rate. The factors color and gender have 2 levels, while age and years of tenure in the job lost have 3 levels. A fractional factorial design is used for this experiment, and all factors are first transformed into 2-level factors. We then identified the desired effects for the design by computing exploratory main effects and an ANOVA on the original data set. We found that 2 main effects and 1 interaction effect may be significant. However, our limited resources (a total of eight randomly executed experimental runs) do not allow us to estimate all of these effects. We can only estimate the desired main effects, which in this experiment are aliased with two-factor interaction effects. The ANOVA results suggest that none of the main effects explains the variation in the state unemployment rate. The dataset used in the experiment is from the R package "Ecdat".^{[14]}

**Munira S** This experiment shows the effect of a man's health, ethnicity, residence, and marital status on the log of hourly wage. The data set is the Males data set from the Ecdat package in R. A fractional factorial design is used, and all factors are transformed into 2-level factors. The desired effects were identified via a linear model and ANOVA. This experiment found that marital status and residence are significant.^{[15]}

**Dede D** The data analyzed in this recipe provide a comprehensive picture of how Americans use and pay for health services. The dataset is a subsample of individuals aged 66 and over (4406 observations in total), each covered by Medicare. The experiment's goal is to construct a 2^(m-3) fractional factorial design with the highest possible resolution. To reach that goal, the following steps are carried out: (1) a full factorial design of the form 2^2 * 3^2 is created, (2) the 3-level factors are transformed into 2-level factors to form a 2^6 full factorial design, and (3) the 2^6 full factorial design is transformed into a 2^(m-3) design. Using aliases, the aliasing structure of this final fractional factorial design is determined. A linear model is then constructed from the main effects and tested with ANOVA.^{[16]}
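The run counts behind steps (1) to (3) follow from simple arithmetic. Here m = 6, because each 3-level factor has been replaced by two 2-level factors:

$$
2^2 \times 3^2 = 36, \qquad 2^2 \times \left(2^2\right)^2 = 2^6 = 64, \qquad 2^{6-3} = \frac{2^6}{2^3} = 8.
$$

So the final design runs one eighth of the 64 combinations of the 2^6 design.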

**Liang Z** Housing and property values have been analyzed in great detail by researchers over the past decades. Many factors can affect the value of a house, which makes the research complex. In this project, we analyze factors that may affect house values by applying a fractional factorial design (FFD) to the Housing dataset in Ecdat. The variables of study are two 2-level factors and two 3-level factors that may influence the price. We design the experiment by dividing each 3-level factor into two 2-level factors. We then analyze the main effects and compare the results with the full factorial design.^{[17]}

**Benjamin B** The project looks at the cars dataset, which consists of 93 observations. Using a fractional factorial design, it examines which factors affect price. There are two 2-level factors and two 3-level factors, and the 3-level factors are decomposed into 2-level factors. The factors are airbags, drive, man.trans, and origin. The results are then analyzed using the alias structure, main effects (ME), interaction effects (IE), and ANOVA. As expected, drive, man.trans, and origin have the greatest effect on the price of cars.^{[18]}
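Main effects and interaction effects are both differences of means in coded units. The sketch below computes them in plain Python on a full 2^3 design with a hypothetical, noise-free response. The coefficients are invented for illustration and are not taken from the cars data.

```python
from itertools import product

def effect(column, y):
    """Effect estimate: mean response at +1 minus mean response at -1."""
    hi = [yi for xi, yi in zip(column, y) if xi == 1]
    lo = [yi for xi, yi in zip(column, y) if xi == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

# Full 2^3 design in coded units (factors A, B, C).
runs = list(product((-1, 1), repeat=3))
A = [r[0] for r in runs]
B = [r[1] for r in runs]
C = [r[2] for r in runs]
AB = [a * b for a, b in zip(A, B)]  # interaction column = elementwise product

# Hypothetical, noise-free response: y = 20 + 2*A + 1.5*A*B (illustration only).
y = [20 + 2 * a + 1.5 * ab for a, ab in zip(A, AB)]

for name, col in [("A", A), ("B", B), ("C", C), ("AB", AB)]:
    print(name, effect(col, y))
```

A fractional design uses the same calculation, but an aliased pair shares one column. For example, with D = AB the D column equals the AB column, so the computed value estimates D + AB together.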

**Rajani D** This analysis focuses on understanding how different attributes of a student affect his or her school attendance. The data set includes 30 attributes of 649 students from Portugal in 2008, but the experiment focuses on gender, involvement in extracurricular activities, alcohol consumption, and travel time to school. The effect of these four independent variables on the number of absences is analyzed in this study using a 2^(6-3) fractional factorial design. The analysis uses a linear model and ANOVA, and takes the aliasing structure into consideration. http://rpubs.com/rajanideshpande/234410

**Felipe O** A fractional factorial experiment was formed from data originally collected for a full factorial experiment on the factors that affect color change in clothes after washing. The factors are type of soil, fabric, wash temperature, and whether or not the detergent is a surfactant. The 3-level factors were converted into 2-level factors, and results were obtained from the new fractional design. http://rpubs.com/ortizf2/234412

**Shamus W** The data used in the experiment come from the Cars93 dataset from the Ecdat package. This dataset contains 93 observations of 23 variables and is a collection of attributes of vehicles for sale in the United States in 1993. For this experiment, we are interested in the effect of 4 factors, two 2-level and two 3-level, on the price of the vehicle. The factors are airbags, drive train, transmission type, and origin of the vehicle. The experiment was analyzed using the aliasing structure, main and interaction effects, and ANOVA.^{[19]}

**Andreas V** The experimental dataset is 'Fatality' from the Ecdat package, which contains several factors related to traffic fatality rates. We examine 4 factors in a 2^(6-3) fractional factorial design. Two of the factors are 2-level and two are 3-level. The experiment examines aliasing, main and interaction effects, and analysis of variance. The FrF2 package is used for the aliasing structure, and the ANOVA gives insight into main and secondary effects.^{[20]}

**Diana R** The study examined a depression index for the youth population using fractional factorial designs. Four factors were involved in the analysis: gender, daily cigar use, alcohol use level, and drug use level. The resulting fractional factorial was a Resolution III 2^(6-3) design with 8 runs. A regression model was also fitted, and the ANOVA results indicated significant differences by gender and at high levels of addiction.^{[21]}

**Trevor C.** For this recipe, a dataset of automobile design and performance metrics from 1974 Motor Trend is analyzed. MPG is the response variable. The factors are number of cylinders, car weight, engine type (V or straight), and transmission (automatic or manual). There are 32 observations. The data can be found at vincentarelbundock.github.io/Rdatasets/datasets.html. The experiment uses a Resolution III fractional factorial design, and the ANOVA results indicate that certain factors are significant.^{[22]}

**Prasanna Date** This project studied the effect of two 2-level factors and two 3-level factors on the total household expenditure of Vietnamese households. The 2-level factors are sex of the head of household and whether the household is in an urban area. The 3-level factors are age of the head of household and size of the household. The dataset was obtained from the Ecdat package.^{[23]}

1. Trevor Manzanares - http://rpubs.com/manzat/42544
2. http://rpubs.com/adamato/42650
3. http://rpubs.com/Tothk2/Recipe8
4. Brendan Howell - http://rpubs.com/howelb/42796
5. Wei Zou - http://rpubs.com/serena049/46151
6. http://rpubs.com/konraz/43258
7. Ali Svoboda - http://rpubs.com/svoboa/44750
8. Michael W - http://rpubs.com/mtwassick/232905
9. Molly R - http://rpubs.com/molly_ren/doeproject3
10. Mike D. - http://rpubs.com/deagem/reddit
11. Clare D - http://rpubs.com/cadorsey/234395
12. Alexis Z. - http://rpubs.com/ziemba/234397
13. TE - http://rpubs.com/trilcemarie/DoE_P03
14. Yage Ding - http://rpubs.com/dingy2/232289
15. Munira Shahir - http://rpubs.com/mjshahir/234401
16. Dede Dolkar - http://rpubs.com/dededolkar/234405
17. Liang Z - http://rpubs.com/zhaol11/234349
18. Benjamin B - http://rpubs.com/byeonb/doeproject3_version01
19. Shamus W - http://rpubs.com/shamuswheeler/project3
20. Andreas V - http://rpubs.com/byeonb/234436
21. Diana R. - http://rpubs.com/dgramirez82/project3
22. Trevor C. - http://rpubs.com/trevorcorrao/234450
23. Prasanna D. - http://rpubs.com/prasanna_date/234882