# Recipes for the Design of Experiments/Chapter 2: Two or More Factors

2.1 Introduction

In this chapter, we build on what we have done so far. Make sure you are comfortable with the earlier material before continuing; if not, go back and review it. We begin with a brief review of what we have covered.

We started with the elementary techniques of Design of Experiments, focusing on designs such as Best Guess and One Factor at a Time (OFAT). Then, in the previous chapter, we introduced a new concept, factorial design, and discussed topics such as main effects, interaction effects, and 1-way Analysis of Variance, all of which we will now cover in more detail. In practice, however, experiments rarely involve only one factor. So we extend factorial design to two or more factors (or sets of treatments).

Using a factorial design has many advantages. Factorial designs do not restrict the effect of a factor to a single setting of the other factors, which allows experimenters to draw conclusions over a wider range of conditions. If interactions are present, factorial designs help prevent incorrect conclusions. In most cases, factorial designs are also more efficient than OFAT. As a review, a factorial design is an experimental design in which all combinations of the factors’ levels are covered in each replicate. In a design with two or more factors, we obtain information on all the factors by varying them together rather than one at a time. In the general arrangement of a two-factor design, there are Factor A with ‘a’ levels and Factor B with ‘b’ levels.
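
To make the idea of "all combinations of levels in each replicate" concrete, here is a minimal Python sketch that enumerates a full factorial design. The factor names, level labels, and number of replicates are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factor_a = ["low", "high"]   # Factor A with a = 2 levels
factor_b = ["x", "y", "z"]   # Factor B with b = 3 levels
replicates = 2               # number of replicates of the full design

# Every combination of levels appears exactly once in each replicate.
design = [
    (rep, a, b)
    for rep in range(1, replicates + 1)
    for a, b in product(factor_a, factor_b)
]

for run in design:
    print(run)

print(len(design))  # a * b * replicates = 2 * 3 * 2 = 12 runs
```

The number of runs grows as the product of the numbers of levels, which is why designs with many factors are often restricted to a few levels per factor.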

We will now refer to a main effect as me. A main effect belongs to one of the key factors of interest in the experiment. It is defined as the change in the response variable that results from a change in that factor’s level(s). At times, we may also encounter interaction effects. For example, in a three-factor design (ABC), the difference in response between the levels of Factor A may not be the same at every level of Factor B. When the effect of one factor depends on the level of another in this way, we say that the factors interact. In this text, we will refer to a 2-factor interaction as 2fi, a 3-factor interaction as 3fi, and so forth. Get comfortable with the new terms!
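
As a small worked illustration of these definitions, the sketch below computes the two main effects and the 2fi for a 2x2 design. The cell means are made-up numbers, and the -1/+1 coding of the low and high levels is an assumption used for convenience, not something taken from the chapter.

```python
# Hypothetical mean responses for a 2x2 design (illustrative numbers only).
# Keys are (level of A, level of B), coded -1 for low and +1 for high.
y = {(-1, -1): 20.0, (1, -1): 30.0, (-1, 1): 25.0, (1, 1): 55.0}


def main_effect(response, position):
    """Average response at the high level minus average at the low level."""
    high = [v for k, v in response.items() if k[position] == 1]
    low = [v for k, v in response.items() if k[position] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def interaction_effect(response):
    """Half the difference between the effect of A at high B and at low B."""
    effect_a_high_b = response[(1, 1)] - response[(-1, 1)]
    effect_a_low_b = response[(1, -1)] - response[(-1, -1)]
    return (effect_a_high_b - effect_a_low_b) / 2


me_a = main_effect(y, 0)
me_b = main_effect(y, 1)
ab = interaction_effect(y)
print(me_a, me_b, ab)  # 20.0 15.0 10.0
```

Here the effect of A is 30 when B is high but only 10 when B is low, so the two factors interact.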

In the previous chapter we introduced ANOVA designs that involved multiple levels of a single factor. In this chapter we explore a deeper application of ANOVA, with 2 factors. Two-way ANOVA allows us to compare the effects of two factors simultaneously. The main purpose of this comparison is to understand whether the two independent factors interact in their effect on the dependent (response) variable. For example, you could use two-way ANOVA to see whether there is an interaction between the amount of time spent exercising each day and the amount of time spent at work in their effect on the happiness level of an individual. The basis of two-way ANOVA is essentially the same as single-factor ANOVA, except that two-way ANOVA produces three F statistics: one to test each of the two me and a third to test the interaction of the two factors.

To give a better understanding of experiments with two or more factors, the final section of this chapter covers multiple sample recipes of such experiments. These recipes include analyses of datasets such as fuel economy data, storm weather data, and flight departure data. Each analysis starts with an overview of the data using descriptive statistics, moves into visualization of the data using several plotting techniques, and concludes with ANOVA testing and results.

Now, in Chapter 2, we will expand on our knowledge of factorial designs to two or more factors. Let’s get started!

2.2 Main and Interaction Effects

Factorial design is used to gain insight into the effect of various independent variables, or factors, on a dependent, or response, variable. These effects are split into two categories, main effects (me) and interaction effects. A main effect is calculated for each individual factor in an experiment, so for an experiment with n factors there will be n main effects. Interaction effects look at 2 or more factors together to determine whether the combination of factors has an effect on the response variable. Analysis of interaction effects begins with 2-factor interactions (2fi) but can extend as far as the n-factor interaction. However, it is commonly assumed that interaction effects involving three or more factors are negligible, so the focus remains on 2-factor interactions. Still, it follows that in an experiment with n factors, there are nC2 = n(n-1)/2 2-factor interactions, nC3 3-factor interactions, and so on.
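
The counts above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `interaction_counts` is hypothetical.

```python
from math import comb


def interaction_counts(n_factors):
    """Return {k: number of k-factor effects} for k = 1..n_factors.

    k = 1 counts the main effects; k >= 2 counts the k-factor interactions.
    """
    return {k: comb(n_factors, k) for k in range(1, n_factors + 1)}


counts = interaction_counts(4)
print(counts)                # {1: 4, 2: 6, 3: 4, 4: 1}
print(sum(counts.values()))  # 15, that is 2**4 - 1 effects in total
```

With four factors there are already 6 two-factor interactions, which is one reason higher-order interactions are often assumed to be negligible.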

2.3 2-way ANOVA

A 2-way analysis of variance (ANOVA) is a way to compare the effects of two factors on the variation in a response variable observed within a particular dataset, as well as any impact of the interaction between those two factors. As in a 1-way ANOVA, a sum of squares (SS) is calculated for each source of variation (each factor, the interaction, and the error), and dividing each SS by the appropriate degrees of freedom (df) gives the corresponding mean square (MS). Dividing the MS for a given factor (or interaction) by the error MS gives the F-statistic, which is compared to the F-distribution, the reference distribution, to determine whether that factor (or interaction) is statistically significant. If the p-value associated with the F-statistic is below the chosen significance level, then the variation in the response variable is unlikely to be explained by randomness alone.
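
The SS, df, MS, and F calculations can be sketched in plain Python for the simplest case, a balanced design with the same number of replicates in every cell. This is an illustrative sketch under that assumption, not the procedure used in the recipes (which rely on R). The function name `two_way_anova` and the data are hypothetical.

```python
from statistics import fmean


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of replicate observations for level i of
    factor A and level j of factor B (same number of replicates per cell).
    Returns a dict mapping source -> (SS, df, MS, F); F is None for Error.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])

    grand = fmean(y for row in data for cell in row for y in cell)
    a_means = [fmean(y for cell in row for y in cell) for row in data]
    b_means = [fmean(y for row in data for y in row[j]) for j in range(b)]
    cell_means = [[fmean(cell) for cell in row] for row in data]

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_means),
        "B": a * n * sum((m - grand) ** 2 for m in b_means),
        "AB": n * sum(
            (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "Error": sum(
            (y - cell_means[i][j]) ** 2
            for i in range(a) for j in range(b) for y in data[i][j]
        ),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "Error": a * b * (n - 1)}
    ms = {source: ss[source] / df[source] for source in ss}

    # Each effect's F statistic is its mean square over the error mean square.
    table = {s: (ss[s], df[s], ms[s], ms[s] / ms["Error"])
             for s in ("A", "B", "AB")}
    table["Error"] = (ss["Error"], df["Error"], ms["Error"], None)
    return table


# Hypothetical 2x2 data with two replicates per cell.
data = [[[20, 22], [24, 26]],
        [[30, 32], [54, 56]]]
table = two_way_anova(data)
for source, row in table.items():
    print(source, row)
```

A p-value for each F statistic would then come from the F-distribution with (df of the effect, df of the error) degrees of freedom, for example via `scipy.stats.f.sf`, which is deliberately left out here to keep the sketch free of dependencies.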

Randomness of the dataset is a key underlying assumption in the analysis of variance, so it is critical that all experimental trials are performed in random order. This involves random selection of the trial sequence by factor, level, and replicate. If the experiments were performed randomly, it is safe to say that the variation between groups may be compared. Conversely, if the experiments were performed systematically, there could be a systematic bias hidden in the data that could result in errors in the analysis.
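
A randomized run order of this kind can be sketched as follows. The levels, the number of replicates, and the seed are hypothetical; the seed is fixed only so that the example is reproducible.

```python
import random
from itertools import product

# Hypothetical levels; in a real experiment these come from the design.
levels_a = [1, 2, 3]
levels_b = ["low", "high"]
replicates = 2

# Systematic list of all trials: every (A, B) combination, each replicated.
runs = [(a, b, r)
        for a, b in product(levels_a, levels_b)
        for r in range(1, replicates + 1)]

rng = random.Random(42)   # fixed seed, for reproducibility of the sketch only
run_order = runs[:]       # copy so the systematic list is left intact
rng.shuffle(run_order)    # random sequence over factor, level, and replicate

for position, (a, b, r) in enumerate(run_order, start=1):
    print(position, a, b, r)
```

Shuffling the complete list of trials, rather than randomizing within one factor at a time, keeps any drift over the course of the experiment from lining up with a particular factor level.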

2.4 Sample Recipes

Recipes for experiments with two or more factors

The objective of this analysis is to use ANOVA to test factors that may influence fuel consumption, thereby helping consumers make the most economical decision when buying a used vehicle. As an example, the author set ‘Honda’ and ‘Toyota’ as the two levels of the factor ‘make’, and the years ‘2005’ to ‘2011’ as the levels of the factor ‘year’. The response variable in this test is ‘cty’, the city fuel economy in mpg (miles per gallon). The analysis is based on the dataset ‘fueleconomy’. Overall, although it is important to analyze the data set using R, the author also needed to describe what the analysis in R showed. I am not sure that the ‘make’ and ‘year’ variables can be treated as factors: ‘make’ does not clearly represent the levels of a factor, and ‘year’ seems like a continuous variable, so each year may not represent a level of a factor.

Also, the author needs more logical and theoretical support for the hypotheses about main effects and interactions, and for the reasons why these two factors were selected for the response variable. Furthermore, the author needs to explain the theoretical meanings of randomization, replication, and blocking in detail in order to justify why they are not considered in this experiment. In addition, the author should explain what methods were used for model adequacy checking. I also recommend that the author explain the meaning of the research results and findings at the end of the study.

These data were collected by the Social Security Administration from 1880 to 2013, and are called the "babynames" dataset in R. Since the dataset is so large, it was subset for computational purposes to include only babies named "Mary". Surprisingly, there were several male babies named "Mary", so an analysis of variance was conducted to determine whether "gender" and "year" had an effect on the number (n) of babies with this particular name over the years. In this experiment, the main purpose is to test the sensitivity of the number of babies named "Mary" to "year" and "gender". The experimental unit is the number of babies named "Mary", and the factors that the unit is subjected to are the "gender" of the baby and the "year" of birth. There are 2 levels of the factor "gender", male and female, and 30 levels of the factor "year", from 1880 to 2010. "Gender" and "year" are both treated as random factors, because their levels were not intentionally selected and they are taken to represent a general population rather than only the selected levels. Factorial design was chosen as the design method of this experiment. Instead of varying factors in a sequence, or testing the response on only one best-guess treatment, the author treated experimental units with combinations of factors. There are no replicates in this experiment; the number of babies under each condition was not measured across multiple populations. Nuisance factors in this experiment may include fashion trends; for example, there could be a celebrity named "Mary" who was popular for a few years. Since these factors are not of interest in the experiment, they should be blocked. However, a fashion trend is not a controllable factor, so it is hard to block. To draw meaningful conclusions from the data, another important process in experimental design is randomization. First, the data in this experiment were randomly selected; they were not measured only in certain types of family or region.
Although in an ideal experiment treatments should be randomly assigned to samples, in this experiment no treatment was assigned. Therefore, random assignment was not achieved. The last aspect of randomization is random run order. The time frame is already a factor of this experiment and cannot be controlled. Thus, random run order was not achieved either. To analyze these data, the author produced boxplots, a one-way analysis of variance (ANOVA) on each factor, and a 2-way ANOVA on the interaction of the 2 factors. The boxplots suggested that each factor has a main effect on the response. With the one-way ANOVA performed on "year" alone, the author failed to reject the H0 that there is no effect on n. That is, "year" alone showed no significant effect on the number of babies named "Mary". For "gender" alone, the author rejected the H0 that there is no effect on n. That is, "gender" had a significant effect on the number of babies named "Mary". However, when analyzing the interaction of "year" and "gender", the author rejected the H0 that there was no effect on n. In this model, the interaction between the 2 factors had a significant effect on the number of babies named "Mary". Unfortunately, these data are not normally distributed, so the results cannot be considered valid without first exploring data transformations to bring them closer to normality.

The following document details the analysis from a two-factor, multi-level experiment of the data set ‘storms’ from the R package ‘nasaweather.’ The experiment looks at two factors: the month and year that a storm occurred, and one response variable: the pressure of a storm system. The ‘month’ factor had seven levels: June, July, August, September, October, November, December. The ‘year’ factor had six levels: 1995-2000, inclusive. This data was not randomized or replicated, as there is no way to control weather data other than collecting it as it occurs. Possible nuisance factors not examined in this analysis could include the location of the storm, the temperature of the storm system, or possibly the instrumentation used to collect the data (though in data collected by NASA, instrumentation error or bias seems unlikely). A blocking design categorizing storms by location could eliminate the effect of location on storm pressure, and possibly temperature of the system.

An analysis of variance (ANOVA) test was performed to determine the statistical relationship between the pressure means of storms that occurred in different months and different years. ANOVA techniques were also used to determine the relationship between the pressure means and the interaction between the month and year. The analysis concluded that there is a relationship between the variance in pressure values of storms and the month and year. There was also a strong interaction between the month and year that a storm occurred. Normal QQ plots were used to determine whether the data set fits the Normal distribution. The QQ plots suggest that the Normal distribution may not be the best fit for the data, but could be a fair approximation. For a potentially more accurate analysis, a Kruskal-Wallis test should be used, as it does not assume normality the way an analysis of variance does. However, the Kruskal-Wallis test cannot calculate interaction effects, so there may be a trade-off there.
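
To show why the Kruskal-Wallis test handles only one grouping factor at a time, the following minimal sketch computes its H statistic from ranks. The function name `kruskal_wallis_h` and the three groups of values are hypothetical, and the sketch stops at the statistic itself without computing a p-value.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = [y for g in groups for y in g]
    n_total = len(pooled)

    # Average rank for each distinct value (ranks start at 1).
    counts = Counter(pooled)
    rank_of = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]                       # number of tied observations
        rank_of[value] = position + (t + 1) / 2
        position += t

    # H compares the rank sums of the groups with what chance would give.
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[y] for y in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n_total ** 3 - n_total))


# Hypothetical response values for three levels of a single factor.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(round(h, 4))  # 7.2
```

H is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. Because the observations are ranked across the levels of one factor only, the test has to be run separately for each factor and has no term for an interaction.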

In this recipe, fuel economy data collected by the U.S. Environmental Protection Agency was used to demonstrate the proper execution of a two-factor experiment with multiple levels for each factor. Specifically, this experiment analyzed the potential effect of number of cylinders in an engine and the transmission type on the highway gas mileage achieved by the vehicle. An analysis of variance (ANOVA) test was used to examine both main and interaction effects. For each of the main effects the ANOVA test produced the lowest calculable p-value in R. From this we can conclude that the difference in variation between different engine sizes and different transmissions is not due to random chance and that there is a very good possibility that both engine size and transmission type have a significant effect on highway gas mileage. The ANOVA test for the interaction effect between the 2 factors produced the same p-value. However, the F ratio, while still large, was many times smaller than the F ratio for each of the main effects. Still, it can be inferred with a high level of confidence that engine size and transmission type do interact in some way to produce a difference in variation that is too large to be random. It should also be mentioned that since this is an analysis of data that was previously collected, many of the assumptions used in order to draw conclusions from ANOVA testing may not be valid. 

The following experiment is a study on the effects of engine features on a vehicle’s fuel economy. This experiment utilizes the data set in the “fueleconomy” package. This data was collected by the EPA from 1985 to 2015. This uses a subset of the data levels to create a two factor, three level experiment. This experiment examines two factors, “number of cylinders” and “type of transmission”. Number of cylinders has three levels (4, 6, and 8 cylinders) and type of transmission has three levels (auto AV-S6, automatic 4-speed, and manual 6-speed). This experiment only considered cars of “Toyota” make and from the years “2005-2015”. Some levels of the type of transmission were also excluded from this experiment. The possible response variables are highway and city miles per gallon. For this experiment, the response variable is city miles per gallon. This experiment is not well randomized or replicated, but it does use blocking to look at a specific subset of the data. The author uses boxplots to show the potential main effects of number of cylinders and type of transmission on city miles per gallon. Using ANOVA, both of the main effects are significant. The interaction effect between cylinders and transmission on city miles per gallon was not significant. This is a simple example of a two factor, three level experimental design. 

Factorial design experiments study the responses of dependent variables to two or more factors. In a factorial design experiment, the subjects are randomly chosen from the population and assigned to a treatment in a random order, and the experimental runs are executed in a random order. The following R publication details an analysis using two factors selected from fuel economy data of an EPA study. Unfortunately, as the data was collected using unknown methods, it cannot be confirmed that randomization was used in this experiment. Two factors (cylinder number and fuel type) with multiple levels were studied. All levels of the factor cylinder number were used (9), and fuel type data was subsetted to include 3 out of 13 levels. The effect on the response variable, city gas mileage, was studied using an analysis of variance (ANOVA). The main effects, or the effects of each individual factor, on the response variable were computed. The interaction effect of cylinder number and fuel type (when the effects for one factor are different at different levels of another factor) were also computed. After the ANOVA, the adequacy of the model was assessed using qq-plots, interaction plots, and by plotting the fitted model against the residuals of the model. For all effects, the final results lead us to reject the null hypothesis, suggesting that the variation in city gas mileage is greater than variation that would occur as a result of randomization. However, the model adequacy checks suggest the model employed is inadequate to explain the results, so other models should be considered. 

A two-factor experiment was performed using the storms database from the National Hurricane Center. This database is composed of 2747 observations from all storms that occurred in the Atlantic Ocean, Caribbean Sea and Gulf of Mexico from 1995 to 2005. It includes the name of the storm, year, month, date, hour, latitude, longitude, type, air pressure, maximum wind speed, and day of the hurricane season. The objective of the analysis was to observe how air pressure levels are affected by type of storm and wind speed. The Type of Storm factor has four levels: Extratropical, Tropical Depression, Hurricane, and Tropical Storm. The Wind Speed factor ranges from 15 to 155. An ANOVA test was performed to determine whether there is a strong relationship between the Type of Storm and Air Pressure, as well as to observe how Wind Speed relates to Air Pressure. Results indicate that both main effects and the interaction are significant at the 99.9% confidence level, which indicates that the variation in air pressure can be explained by wind speed and type of storm.

The following analysis utilizes baby name data obtained from the SSA from the years 1880 to 2013. Popular male and female names are on occasion used as a unisex name by the less popular sex. This analysis seeks to investigate if both the top 100 male and female names of all time were differentially preferred over time.

This recipe examines the vehicle data from the fueleconomy package. This dataset contains fuel economy data resulting from vehicle testing done at the Environmental Protection Agency’s National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan. This experiment tests the effect of number of cylinders and fuel type on the highway fuel economy of Hondas. Each of these factors had multiple levels, with 3 levels for cylinders and 4 for fuel type, and the response variable was the highway gas mileage of the vehicle. To examine the distribution of the data, boxplots were created, grouped by each of the factors. The cylinder and fuel type factors can be considered fixed effects, whereas a random effect could be the performance of one car as compared to the 'average' car of that type. The experimental design was not randomized or replicated, but a blocking procedure was used in order to test all cars of the same brand at one time. ANOVA conducted on each of the factors, as well as on their interaction, showed a significant main effect of each factor on the highway mileage of the car. The interaction effect of the number of cylinders and the fuel type was also found to be statistically significant. To test the adequacy of the model, a QQ plot of the residuals and a residual plot were used to assess normality. The data were found to be relatively normal, although they deviated at extreme values. As a result, a Kruskal-Wallis test was used to examine the main effects, as it does not assume normality like an ANOVA does. The K-W test found that the main effects were statistically significant even if normality is not assumed. Interaction effects cannot be computed with the K-W test directly.

The following experiment conducts a two-factor analysis on a multi-level experiment examining the data of various storms at different time stamps along their paths. With a dataset of 16 different columns and 2747 observations, some independent variables include the month, the latitude, the type, and the category and how all of these factors relate to the wind speed of the storm, the response variable. A multi-level, multi-factor Analysis of Variance (ANOVA) test was completed to statistically determine the significance of the p-values for each individual factor, along with the p-values of the interactions between the factors. The null hypothesis of this experiment states that the variation of the wind speed cannot be explained by anything other than randomization. Excluding the interactions below,

• Month & category
• Month & Pressure
• Latitude, Category & Pressure
• Month, Category & Pressure
• Latitude, Month, Category, Pressure

the results showed that the variation in wind speed can be explained by month, category, latitude, pressure and many interactions between these factors, because the p-values are less than an alpha of 0.05.

Seasoned airline travelers know that occasional flight delays are inevitable. In 2013, a total of 336,776 domestic flights departed from the three NYC airports, and many of these flights experienced delays in arrival time. This statistical analysis explores two possible contributing factors for these arrival delays: origin airport (JFK, LGA, or EWR) and destination airport (105 airports in the United States and its territories). A random sample of 5000 flights was selected from the dataset for a 2-way ANOVA to test the null hypotheses that the origin airport, destination airport, and interaction between the two factors have no effect on the response variable, arrival delay. In fact, the analysis provides statistically significant reason to reject all three null hypotheses, indicating that origin airport, destination airport, and the interaction between these two factors have some effect on the delay in arrival. An independent repetition of the analysis confirmed that something other than randomness determined the variation associated with origin airport, destination airport, and the interaction (p < 0.024, p < 0.0004, and p < 2.6e-05, respectively).

The following link examines the vehicles dataset within the “fueleconomy” package, which contains fuel economy data from the EPA covering the years between 1985 and 2015. This dataset contains various variables, and does not include vehicles with incomplete data. In this study, a two-factor, multiple-level analysis was performed to examine whether vehicle manufacturer (make) and/or fuel type (fuel) have an effect on the city fuel economy (cty). A subset of 4 levels was used for vehicle manufacturer, and all levels of fuel type were considered. Make and fuel are both qualitative independent variables, and ANOVA was used to analyze their effects on the response variable (city fuel economy). The null hypothesis for the ANOVA was that city fuel economy shows no difference across makes and fuel types. Three ANOVA tables were created, two for the main effects of make and fuel and one for the interaction effect make*fuel; all results returned a very small p-value, which indicates that all effects are significant. However, without confirmation that the dependent variable is normally distributed, we cannot accept the results of the ANOVA as correct. In this study Q-Q plots were drawn, and they show a departure from a normal distribution. (In addition to the existing Q-Q plots, plotting a histogram shows a positively skewed distribution.) As a result, these ANOVA findings should not be used, and other methods should be considered in further studies. (Note: There are some mistakes in the study which should be corrected; for example, the x and y axis labels in the box plots are not correctly defined.)

The following link directs you to a recipe for an experiment on the fuel economy dataset. It analyzes the city gas mileages of vehicles with different characteristics, specifically Audis. In the analysis I take a two-factor, multi-level approach to determine the effect of varying engine displacement and fuel type on the resultant city gas mileage of the Audi. This experiment was a fixed effect experiment; the levels of the variables were specifically chosen and do not represent the entire population (there are other levels of fuel type and displacement not analyzed in the experiment). Exploratory data analysis was done in order to understand the main effects of the experiment. Analyses of variance (ANOVA) were performed to compare means among the factors with regard to their respective levels. The large F-values in this analysis show that the explained (between groups) variance is much larger than the unexplained (within group) variance, which shows that the effects are significant. Another ANOVA was performed to analyze the interaction effect of the two factors on the response variable of highway gas mileage.

This sample recipe looks into a two-factor, multi-level experiment on a data set concerning hurricane storms. An analysis of the air pressure at the storm center is conducted based on the *storms* dataset in the *nasaweather* package. The two multi-level factors are “Year of Occurrence”, with 6 levels covering the years 1995-2000, and “Storm Type”, with 4 levels: Tropical Depression, Tropical Storm, Hurricane, or Extratropical. The response variable is “pressure”, which is the air pressure at the center of the storm. Since the data were collected over given periods, the whole population characteristic was captured, and year may be seen as a random effect rather than a fixed effect. The fixed effect can be the type of storm, because it was selected and does not represent the general population. As for the experimental design, the objective is to find whether there is a trend in air pressure with year of occurrence and/or storm type. The design did not include replicates, repeated measures, or blocking. A statistical analysis was conducted and represented as a box plot. ANOVA was used to determine the statistical effects on pressure of year of occurrence, storm type, and the interaction of year and type. In all three analyses of variance, it is worth noting the small p-values. Looking at the Mean Square and F-value, the F-value was very large, which indicates that year and type explain far more of the variance than would be expected by chance. From this, there is little chance that the variance of pressure is due to randomness alone. The results indicated that the variation in pressure can be attributed to something other than simple randomization. To assess model adequacy, a qqplot and an interaction plot were used. The results show that the year of occurrence, the storm type, and their interaction may all explain the variation in the air pressure at the storm center.

The following analysis of a three factor multiple level experiment uses non-parametric ANOVA to examine the effect of storm type, month, and location latitude on pressure during a storm. Each of these three factors is broken down into several levels; "Latitude" has 6 levels and ranges from 0 to 80, "Type of Storm" has 4 levels, and "Month" has 6 levels ranging from 6 to 12. The response variable, pressure, ranges from 905 to 1020 with a mean of 990. Interaction plots for the factors are used to determine interaction effects, which show that there is an interaction effect between "Month" and "Latitude". This suggests that different locations behave differently during different months in terms of storm pressure, so this interaction needs to be accounted for. In the model, this is done by adding a regressor for "Month * Latitude". To test whether each of the factors are found to have a significant difference in means, ANOVA is used. The small p-values (< 0.001) in the output allow the null hypothesis that randomization alone can account for variation in storm pressure to be rejected. Thus, latitude, month, and type of storm are all statistically significant in having an effect on storm pressure, and there is a statistically significant interaction between the month of the storm and the latitude at which it occurred. A normal Q-Q plot is used to look at the model residuals, which do not seem to be normally distributed on the low tail. This may be because the response variable of pressure is not normally distributed, so the model may not have the best fit. 

Fall 2016 Projects

The wages and education of young males were recorded from 1980 to 1987. The dataset has been obtained from the R package “Ecdat” and contains 4360 observations of 12 variables. The analysis uses a fixed effect model and analysis of variance to determine whether e.g. industry, occupation or being in a union has an influence on hourly wage. https://rpubs.com/bjarke1/isye4330project1

The following reference links to an example examining terror attacks from around the world. The Global Terror Database contains records of terror attacks from 1973-2015. A subset of this data, attacks from 2012-2015, was examined to determine the effect of attack type, target type, weapon type, and location on the number of fatalities that a terror attack results in. ANOVA techniques were used to examine the main effects and the interaction effects, which showed that all four factors have a statistically significant effect on the response variable, and that there are significant interaction effects between all four factors.

The deaths of the citizens of New York City were recorded from 2007 to 2011, as well as other factors relating to their deaths. The total dataset includes 3840 observations and 6 factors. This example explains the preliminaries of data exploration as well as ANOVA implementation. http://rpubs.com/ortizf2/217716

The duration of breastfeeding, in weeks, of over 900 mothers was recorded from 1978 to 1986. The data set includes 9 variables, and the analysis shown in this example explores 4 factors and their effects on the duration of breastfeeding through ANOVA and a mixed effects model. http://rpubs.com/Kristencole3/217721

The project examines which factors affect the air quality in the Californian Metropolitan Area. The data contains a total of N=30 observations, with 5 variables. The response variable is the air quality, and the 4 factors are value added by companies, amount of rainfall, whether or not the location is coastal, and population density.

In this analysis based on a publicly available dataset of 200,000+ Jeopardy questions, the objective is to determine whether the value of the question is related to the obscurity of the answer. To quantify the response variable of "obscurity" in the 500-question sample pulled from this dataset, a function was implemented that returns the number of Google Search hits for a given answer. Through several 2-factor ANOVA analyses, it was found that the question value did indeed show statistically significant variation in the means with p<<0.05. The 2-factor ANOVA between value and round indicated an almost-significant interaction between value and round.

This project analyzes the factors that affect the satisfaction ratings of several departments within a financial services firm. The general concept is to examine the effects of these factors on the overall ratings of the departments (response variable). Factors include attributes such as complaint handling, opportunities for learning, and frequency of performance-based raises. These measures were quantified by a Likert-type multi-level survey, which was distributed to 35 clerical employees throughout 30 different departments. This example involves several statistical models and data exploration tools such as box and interaction plots, as well as the extensive use of ANOVA models. http://rpubs.com/trevorcorrao/217804

In this study, a four-factor, multi-level experiment is performed to test whether the factors "Cyl", "Japanese", "SmartWay", and "Veh.Class" affect the level of CO2 emissions. The data come from a `data.gov` link in the list of 100+ interesting datasets: a dataset of fuel economy for 2017 model year automobiles. This dataset contains the preliminary fuel economy values for 2017 model year vehicles from the Environmental Protection Agency’s National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan. Our main purpose in this study is to test whether these factors, which we chose based on Best Guess, affect the level of CO2 emissions. Using ANOVA, we test the significance of the main effects and interactions. According to the results, all four main effects are significant at the 1% level. Among the interactions, 'Japanese x Veh.Class', 'Japanese x SmartWay', 'Cyl x Veh.Class', 'Cyl x SmartWay' and 'Veh.Class x SmartWay' are statistically significant at the 5% level. The interaction 'Japanese x Cyl' is not significant. http://rpubs.com/zhaol11/217816

This recipe analyzes the factors that affect the extroversion score participants achieved on an online personality test. In order to take the test, participants were required to provide certain information about themselves, including age, gender, race, hand dominance, and native language. The test consisted of 50 Likert questions, each corresponding to a specific personality trait. For the purposes of this recipe, only the questions relating to extroversion were considered. The four factors chosen were age, gender, hand dominance, and native language. A combination of graphical analysis and ANOVA testing was used to analyze this fixed effects model.

This is a study of 4 multi-level factors and their effect on a response variable. The data, courtesy of Forbes, cover the top 2000 companies in terms of financial data. Mechanics ranging from descriptive statistics and various plots to analysis of variance and interaction plots are used. Due to the nature of the dataset, the factors are found to be highly predictive. To review this project, please follow this link! http://rpubs.com/byeonb/andreasv3

This study is a factorial design experiment that uses a fixed effects model to examine disbursements made for the 2016 election. The effects of four factors (candidate, state, type of disbursement, and year) on the response variable, disbursement amount, were studied. Exploratory analysis was first done using boxplots for each factor. The main effect was then calculated for each factor. Analysis of variance was used to determine whether explained variance exceeded unexplained variance for main effects and interaction effects. The main effects for state and type of disbursement, as well as many of the interaction effects, demonstrated statistically significant differences. The study culminated with interaction plots for the two-factor interactions. To confidently make inferences about the effects of these factors, model adequacy checks need to be conducted. (This was revised on 121616 to display the results.)
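The workflow summarized above (compute level means for each factor, then compare explained with unexplained variance) can be sketched for the simplest case, a balanced two-factor design with replicates. This is only an illustrative sketch in Python using the standard library; the projects listed here were carried out in R, and the function name `two_way_anova`, the factor labels, and the toy data are all hypothetical and not taken from any of the studies.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor fixed effects ANOVA (illustrative sketch).

    `cells` maps (level_of_A, level_of_B) -> list of replicate responses.
    Every combination of levels must be present with the same number
    (at least 2) of replicates.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))  # replicates per cell
    if len(cells) != a * b or n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be complete and balanced with n >= 2")

    # Grand mean, marginal (level) means, and cell means.
    grand = mean(y for v in cells.values() for y in v)
    mean_a = {i: mean(y for j in b_levels for y in cells[(i, j)]) for i in a_levels}
    mean_b = {j: mean(y for i in a_levels for y in cells[(i, j)]) for j in b_levels}
    mean_ab = {k: mean(v) for k, v in cells.items()}

    # Sums of squares: explained (A, B, AB) and unexplained (Error).
    ss = {
        "A": b * n * sum((mean_a[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((mean_b[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum(
            (mean_ab[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
            for i, j in product(a_levels, b_levels)
        ),
        "Error": sum((y - mean_ab[k]) ** 2 for k, v in cells.items() for y in v),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}

    # F ratio = mean square of the effect / mean square error.
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return {"ss": ss, "df": df, "f": f, "mean_a": mean_a, "mean_b": mean_b}


# Toy data (made up): two factors at two levels each, two replicates per cell.
toy = {
    ("lo", "lo"): [10, 12],
    ("lo", "hi"): [14, 16],
    ("hi", "lo"): [20, 22],
    ("hi", "hi"): [30, 32],
}
result = two_way_anova(toy)
print("Main effect of A:", result["mean_a"]["hi"] - result["mean_a"]["lo"])
print("Main effect of B:", result["mean_b"]["hi"] - result["mean_b"]["lo"])
for term in ("A", "B", "AB"):
    print(term, "SS =", result["ss"][term], "F =", result["f"][term])
```

For this toy data the main effects are 13 for A and 7 for B, and the F ratios are 169, 49, and 9 for A, B, and AB on (1, 4) degrees of freedom. Each F ratio would still need to be compared with the appropriate F distribution to obtain a p-value, which this sketch deliberately leaves out.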

This not-so-secret recipe is a multi-factor, multi-level factorial design experiment that uses data drawn from 1987 on selected married couples with nonnegative total family income, where the wife was of working age (18-64) and not self-employed. This study analyzes the effect of 4 factors (owned, nonwhite, mortgage, and occupation) on the response variable, hours (labor supply of married females measured in hours per year). An analysis of variance was conducted to assess the significance of the main and interaction effects. http://rpubs.com/dededolkar/217843

This analysis focuses on understanding how different factors affect the number of days a student is absent from school. The data set classifies children from Walgett, New South Wales, Australia by culture, age, sex, and learning status, and records the number of days absent from school in a particular school year as the response. The factors included in this study were ethnicity, age, gender, and learning status, each with two or more levels. Multifactor analysis of variance was used to determine the effects of each of the factors, as well as their interaction effects, on the number of days absent. http://rpubs.com/rajanideshpande/217857

This recipe is a study of 4 factors and their effect on a response variable. The data set is breaches, from the Ecdat package in R. It consists of cyber security breaches of health care information. The factors used were state, location of breach, type of breach, and business associates, if any. An analysis of variance was used to assess the significance of the main and interaction effects. http://rpubs.com/mjshahir/217862

This report contributes to the limited number of investigations into the influence of socioeconomic characteristics on travel patterns for inhabitants of rural areas and small communities. Employing survey data from the National Household Travel Survey, we have evaluated the effect of socioeconomic determinants on daily trips made per household. A randomized multi-factor design, along with analysis of variance, is used to assess the influence of various covariates. http://rpubs.com/trilcemarie/nhts_nys_nonmsa

Negative habits or addictions (i.e., drug use, tobacco and alcohol consumption) are seen to negatively affect the health of Americans. Data on 55,602 respondents from the National Household Survey on Drug Use and Health (NSDUH) were used to analyze the relationship between gender, criminal activity and mental illness in the United States. This database was used to explore the significant differences among the population, based on demographic and addiction-related variables. ANOVA was used to test the statistical significance of these differences, where the response variable was depression. http://rpubs.com/dgramirez82/DoEProject1

The objective of this experiment is to test the variation of car price in response to changes in country of brand, type of car, weight of car, and mileage. Factorial design was chosen as the experimental strategy of this study. The "rpart::car.test.frame" data set, residing in the rpart package of the R programming language, was used. All factors except country are found to have a significant main effect on car price, and interaction effects exist among the factors. However, only the interactions between country and type and between type and weight are significant. ~Yage Ding - http://rpubs.com/dingy2/216646