Recipes for the Design of Experiments/Chapter 6: The Analysis of Covariance


Chapter 6: The Analysis of Covariance

In this recipe, data taken from "iris.csv" will be analyzed. An analysis of covariance (ANCOVA) will evaluate if the means of the dependent variable, petal area (cm^2), are equal for all levels of the independent, categorical variable, species, while controlling for the effects of the continuous explanatory variables, sepal area (cm^2) and petal ratio.
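As a rough illustration of what such a test computes, the sketch below compares two nested least-squares models: one with only the covariates, and one that also includes indicator variables for the factor levels. The F statistic from that comparison is the ANCOVA test for the factor after adjusting for the covariates. This is a minimal sketch under stated assumptions, not the recipe's own code: the function names (`solve`, `rss`, `ancova_f`) and the toy data are hypothetical, and the data are not taken from "iris.csv".

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def ancova_f(y, group, covariates):
    """F test for the factor `group`, adjusting for the rows of `covariates`."""
    levels = sorted(set(group))
    # Reduced model: intercept plus covariates only.
    reduced = [[1.0] + list(row) for row in covariates]
    # Full model: adds one indicator column per level, except the first (baseline).
    full = [
        [1.0] + list(row) + [1.0 if g == lev else 0.0 for lev in levels[1:]]
        for g, row in zip(group, covariates)
    ]
    rss_reduced, rss_full = rss(reduced, y), rss(full, y)
    df1 = len(levels) - 1
    df2 = len(y) - len(full[0])
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f_stat, df1, df2


# Hypothetical toy data: three factor levels, one covariate, four observations each.
species = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
covariates = [[x] for x in (1.0, 2.0, 3.0, 4.0)] * 3
effect = {"a": 0.0, "b": 3.0, "c": 6.0}
noise = [0.1, -0.1, -0.1, 0.1] * 3
response = [
    1.0 + 2.0 * row[0] + effect[g] + e
    for g, row, e in zip(species, covariates, noise)
]

f_stat, df1, df2 = ancova_f(response, species, covariates)
print(f_stat, df1, df2)
```

A large F statistic relative to the F distribution with (df1, df2) degrees of freedom indicates that the adjusted means differ between factor levels. Passing two values per row in `covariates` would correspond to the two-covariate setting described above.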

In this study, we design an experiment to investigate the factors influencing cigarette consumption in the US from 1985 to 1995. To do so, the "Cigarette" dataset from the "Ecdat" package in R was used to examine whether variations in average tax, state personal income, or state have an effect on the variation in cigarette consumption per capita. Wei Zou

For this project, the dataset under analysis includes data on United States crime rates from 1960 to 2012. This experiment seeks to understand which factors have an effect on total crime rates for these years. Analysis of covariance is used to determine interaction effects as well as individual factor effects of population, number of violent crimes, and factor levels of robbery on the response variable "Total Number of Crimes". [1]

This recipe focuses on the application of an analysis of covariance. The data came from the Forbes 500 list, with ~79 companies included. The response variable of interest was the amount of assets a given company has, whereas the continuous independent variables were profits, cash flow, and market value. The "treatment" of interest was the market sector. The data were found not to be normally distributed, so multiple transformations were applied to better satisfy the model assumptions.
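To illustrate why a transformation can help with non-normal data, the sketch below computes the sample skewness of a right-skewed variable before and after a log transformation. This is only an assumption-laden example: the recipe does not state which transformations were used, the log transform is just one common choice, and the `assets` values are hypothetical rather than taken from the Forbes data.

```python
import math


def skewness(values):
    """Sample skewness: third central moment divided by the 1.5 power of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed values standing in for company assets.
assets = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]
log_assets = [math.log(v) for v in assets]

raw_skew = skewness(assets)
log_skew = skewness(log_assets)
print("skewness before:", raw_skew)
print("skewness after log transform:", log_skew)
```

A skewness closer to zero after the transformation suggests a more symmetric distribution, although a formal check (for example a normal Q-Q plot of the model residuals) is still needed.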

The following recipe is an analysis of covariance on a dataset involving cigarette sales in the US. The study collects data from 1985 to 1992 and records demographic statistics for each state, as well as the price and consumer price index for that state and year. This analysis examines the effect of the factor, disposable income per capita, on the response variable, sales in packs per capita by state. Since this is an analysis of covariance, we also use two continuous explanatory variables, the price of cigarettes per pack and the minimum price of cigarettes in adjacent states, to increase the precision of our analysis.

The purpose of this experiment is to investigate how the amount of superplasticizer used in making concrete relates to the overall age (in days) of the concrete mixture. The analysis uses the strength (in MPa) and the amount of water used in the mixture as explanatory variables, which cannot be controlled. Essentially, we compare the different amounts of superplasticizer used (kg per m^3 of mixture) and their effect on the overall age of the mixture. Strength and water are known to be affected by the proportion of superplasticizer used in making the concrete mixture, but here we want to see its effect on the age of the mixture (in days). For this analysis we use the analysis of covariance (ANCOVA) technique. The entire analysis can be accessed at:

The following recipe for experimental design is an example of the proper use of an analysis of covariance (ANCOVA) to monitor the effects of two uncontrollable continuous variables and one categorical variable on a single response variable. The data used in this recipe are a collection of measurements of sepal and petal lengths of different flower species.
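A setup of this kind, with one categorical factor and two continuous covariates, is commonly written as the linear model below. The notation is an assumption introduced here for illustration and does not come from the recipe itself: y_ij is the response for observation j at factor level i, tau_i is the effect of level i, x_1 and x_2 are the two covariates centered at their overall means, and the errors are taken to be independent and normally distributed.

```latex
y_{ij} = \mu + \tau_i
       + \beta_1 \left( x_{1ij} - \bar{x}_{1} \right)
       + \beta_2 \left( x_{2ij} - \bar{x}_{2} \right)
       + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Under this formulation, the ANCOVA null hypothesis for the factor is that all tau_i are zero, that is, the means of the response are equal across factor levels after adjusting for the two covariates.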

The following experiment conducts an analysis of covariance (ANCOVA) on the Computers dataset from the Ecdat package in R. The goal is to examine whether computer speed, hard drive size, and screen size are explanatory variables for computer price. A linear model is created, ANCOVA and Tukey tests are performed, the models are checked for adequacy, and contingencies are discussed for assumptions that are not met.[2]

In this analysis, we looked at the Cigar dataset in the Ecdat package in R. The analysis examined how cigarette sales vary with three explanatory variables: consumer price index (CPI), minimum price per pack of cigarettes in adjoining states, and population above age 16. An ANCOVA was performed, and we rejected the null hypothesis that the three explanatory variables have no significant effect on the response variable.

The following recipe is an example of analysis of covariance. We chose a dataset from the R package "Ecdat" containing statistics on the budget share of food for Spanish households. We investigate how the size of the town people live in affects the percentage of total expenditure spent on food, with the age of the reference person and the total expenditure of the household as explanatory variables. ANCOVA gives an idea of how the explanatory variables and a single factor together affect the response variable. Exploratory data analysis, ANCOVA testing, and model adequacy checking are included.

In this study, a single-factor, multi-level experiment is performed (using Lahman's Baseball Database) to see whether the number of hits earned by a given team in a given season ('H') has a statistically significant effect on the number of losses that the team records in that season ('L'), which is the response variable of the analysis. In addition to the treatment 'H', two explanatory variables are considered: the total number of strikeouts that a team earned in a given season ('SO') and the earned-run average of a team in a given season ('ERA'). To determine significance, an ANCOVA is performed, Tukey Honest Significant Differences are computed, and modeling assumptions are verified. [3]

This recipe examines the Ship Accidents dataset from the Ecdat package. Using this dataset, we test the effect of one factor and two explanatory variables on the single response variable, accidents.

This recipe is a study of the Airq dataset from the Ecdat package. It focuses on the application of analysis of covariance. The response variable examined was the air quality level in metropolitan areas of California. The continuous independent variables were rain, added value, and population density. The factor of interest was the location of the measurements, specifically whether or not they were taken at coastal locations.

The following analysis of covariance uses two-factor ANCOVA to examine the effect of geographic region on wealth of billionaires across the world using age of the billionaire as a covariate. [4]
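The role of a covariate in an analysis like this can be illustrated with covariate-adjusted group means: each group's raw mean is shifted to the value it would be expected to take if the group had the overall average covariate value, using a slope pooled across groups. The sketch below is a simplified single-factor, single-covariate version with hypothetical names (`adjusted_means`) and made-up numbers; it is not the two-factor analysis of the recipe and does not use the billionaires data.

```python
def adjusted_means(y, group, x):
    """Return covariate-adjusted group means and the pooled within-group slope."""
    levels = sorted(set(group))
    grand_x = sum(x) / len(x)
    sxy = 0.0  # pooled within-group sum of cross-products
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    group_means = {}
    for lev in levels:
        xs = [xi for xi, g in zip(x, group) if g == lev]
        ys = [yi for yi, g in zip(y, group) if g == lev]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        sxx += sum((xi - mx) ** 2 for xi in xs)
        group_means[lev] = (mx, my)
    slope = sxy / sxx
    # Shift each raw mean to the grand mean of the covariate.
    adjusted = {
        lev: my - slope * (mx - grand_x) for lev, (mx, my) in group_means.items()
    }
    return adjusted, slope


# Hypothetical data: two regions whose covariate values (e.g. age) differ a lot.
region = ["A", "A", "A", "B", "B", "B"]
age = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wealth = [2.0, 4.0, 6.0, 11.0, 13.0, 15.0]

adjusted, slope = adjusted_means(wealth, region, age)
print(adjusted, slope)
```

In this toy example the raw group means differ by 9, but most of that gap comes from the difference in the covariate; the adjusted means differ by only 3, which is the kind of comparison ANCOVA tests.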