Recipes for the Design of Experiments/Chapter 9: Response Surface Methods

The dataset under analysis includes data on potential medical indicators of type 2 diabetes for 403 patients. Using response surface methods, the experiment tests which of four factors suggest a predisposition to the onset of type 2 diabetes, as measured by the response variable, percent glycosylated hemoglobin. The World Health Organization suggests using a glycosylated hemoglobin threshold of at least 6.5% to determine the presence of type 2 diabetes. Because the experimental design is a 3^k design, each of the four predictor variables had to be reduced to three levels for use in the linear model. This was done column by column: values less than the column's first quartile were coded as “1”, values from the first quartile through the third quartile (inclusive) as “2”, and values greater than the third quartile as “3”. [1]
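The three-level coding described above can be sketched as follows. This is a minimal illustration in Python rather than the recipe's own R code. The function name `code_three_levels` and the sample values are hypothetical, and the "inclusive" quartile method is an assumption, since the recipe does not say how its quartiles were computed.

```python
from statistics import quantiles


def code_three_levels(values):
    """Code each value as level 1, 2, or 3 using the first and third quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    # The "inclusive" method is an assumption made for this sketch.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    levels = []
    for v in values:
        if v < q1:
            levels.append(1)   # below the first quartile
        elif v <= q3:
            levels.append(2)   # from the first through the third quartile
        else:
            levels.append(3)   # above the third quartile
    return levels


# Hypothetical measurements, not taken from the diabetes dataset.
example = [10, 20, 30, 40, 50, 60, 70, 80, 90]
coded = code_three_levels(example)
print(coded)
```

Applying the same function to each predictor column separately gives every factor exactly three levels, which is what a 3^k design requires.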

The following recipe uses data from the R package Ecdat. The dataset has wage information for young American males. Along with wages, the dataset includes various characteristics, such as years of schooling, residence, occupation, and ethnicity. Four of these factors were used in response surface methods to examine which may help to explain the variation in the log of hourly wages. It was found that both ethnicity and years of schooling have significant main effects, and that the interaction between schooling and residence is also significant. Once the stationary points were found, they were classified as minima, maxima, ridges, or saddle points. [2]
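The classification of a stationary point mentioned above can be illustrated with a small sketch. It assumes a fitted second-order model in only two coded factors, y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2, whereas the recipe uses four. The function name, the tolerance, and the coefficient values are hypothetical and are not results from the recipe. The stationary point is x_s = -(1/2) B^{-1} b, where B has the pure quadratic coefficients on its diagonal and half the interaction coefficient off the diagonal, and the signs of the eigenvalues of B determine the type of point.

```python
import math


def classify_stationary_point(b1, b2, b11, b22, b12, tol=1e-9):
    """Return (stationary point, eigenvalues of B, label) for a two-factor
    second-order model. The point is None when B is (near) singular."""
    off = b12 / 2.0                      # off-diagonal entry of B
    mean = (b11 + b22) / 2.0
    spread = math.hypot((b11 - b22) / 2.0, off)
    eigenvalues = (mean - spread, mean + spread)  # symmetric 2x2 closed form

    # An eigenvalue near zero means the surface is flat in one direction.
    if any(abs(ev) <= tol for ev in eigenvalues):
        return None, eigenvalues, "ridge"

    # x_s = -1/2 * B^{-1} b, using the explicit inverse of a 2x2 matrix.
    det = b11 * b22 - off * off
    xs1 = -0.5 * (b22 * b1 - off * b2) / det
    xs2 = -0.5 * (-off * b1 + b11 * b2) / det

    if all(ev > 0 for ev in eigenvalues):
        label = "minimum"
    elif all(ev < 0 for ev in eigenvalues):
        label = "maximum"
    else:
        label = "saddle point"
    return (xs1, xs2), eigenvalues, label


# Hypothetical coefficients chosen so the eigenvalues have mixed signs.
point, eigs, kind = classify_stationary_point(b1=1.0, b2=-2.0,
                                              b11=1.0, b22=-1.0, b12=0.0)
print(point, eigs, kind)
```

With more than two factors the same idea applies, but the eigenvalues and the inverse of B would normally be computed with a linear algebra routine rather than by hand.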

The following recipe contains an experimental design created to determine the effect of four explanatory factors, each with more than two levels, on the amount of sugar cane harvested (tonnes/hectare). A linear model is used to analyze whether variation in District Group, Position, Age, and Harvest Month affects the amount of sugar cane harvested. Response surface methods are used to obtain the coefficient estimates, residuals, and ANOVA table needed to analyze the main effects of the four factors. It was found that only Harvest Month had an effect on the amount of sugar cane harvested. <http://rpubs.com/maxwinkelman/46036>

The following recipe demonstrates the use of response surface methods to analyze the effects of four factors on a single response variable. A 3^k design is used in this experimental analysis, and the R package "rsm" is used to perform the analysis. The data set being analyzed is a set of crime rate statistics in the United States in 2012. [3]

This experiment analyzes the effect of various constituents on the overall strength of a concrete mixture used for construction purposes. We try to determine the optimal setting for this experiment by using response surface models for the analysis. Each factor has more than 4 levels, and the null hypothesis is that variability in these factors has no effect on the strength of the mixture. However, based on the experimental results we reject the null hypothesis because there are some significant main effects and interaction effects. This analysis can be accessed at the following link: http://rpubs.com/Uzma_1004/45921

The following recipe uses data from the R package Ecdat. The dataset has information on the test scores of students in California from 1998-1999, along with many other columns that might be useful for conducting an experiment. Four factors were analyzed with regard to their effect on the scores: number of teachers, number of computers, percentage of students who qualify for reduced-price lunch, and district average income. Response surface methods were used to obtain the coefficient estimates, residuals, and ANOVA table needed to analyze the main effects of the four factors on the response variable, average cumulative test scores. http://rpubs.com/macchm/46054

This recipe uses a response surface method to analyze a data set about drunk driving and the resulting fatality rates. Using 4 factors (beer tax, minimum drinking age, average miles traveled, and unemployment rate) with 3 levels each, a linear model is created, and a response surface approach is used to perform an analysis of variance and to identify a stationary point at which the response variable is optimal (a minimum or maximum). [4]

The following recipe is about computer prices and the factors that might explain their variance. In this analysis we create a linear model and use response surface methods to test how selected factors influence computer price. The selected factors are clock speed in MHz (speed), size of hard drive in MB (hd), size of RAM in MB (ram), and size of screen in inches (screen). These four factors are used to construct a linear model containing second-order effects; 'rsm' is then used for parameter estimation, and the stationary points are analyzed in different response surface plots. http://rpubs.com/chenh16/46081

In this study, we use the response surface method to examine the effect of "carat", "colour", "clarity" and "certification" on diamond prices in Singapore, and to determine the optimum state. The results show that the variation in diamond stone prices is not due to sample randomization alone; the main effects, two-way interaction effects, and pure quadratic effects of the selected variables are analyzed. In the optimum price analysis, the stationary point is a saddle point, and no local maximum or minimum is found. [5]

In this study, a four-factor, multi-level experiment is performed to see whether the number of hits ('H'), home runs ('HR'), strikeouts ('SO'), or walks ('BB') earned by a team in a given season has a statistically significant effect on the number of losses ('L') for that team in that season, which is the response variable. An analysis of variance (ANOVA) is performed to determine the significance of these factors with regard to 'L', and response surface methods are used to determine the optimal operating conditions for 'L' (as modeled by a linear model) by characterizing its second-order response surface and its stationary point. [6]

This recipe examines the "Pricing the C's of Diamond Stones" dataset from the Ecdat package. We look at how four factors, each with two or more levels, affect the price of diamonds in Singapore. Response surface methods are used to model and analyze problems in which a response of interest (price of diamonds) is influenced by several variables (carat, colour, clarity, certification) and the aim is to optimize this response. From the response surface method we can find whether there are any maxima, minima, saddle points, or ridges. - Cheryl Tran http://rpubs.com/tranc3/46123

The following recipe examines the Computers dataset from the Ecdat package. Response surface methods are used to determine the effect that four factors with 3 or more levels (hard drive size, screen size, RAM size, and speed) have on computer price. The factor levels are manipulated so that the response surface method can be used. The goal is to determine, through an ANOVA, whether computer speed and screen, RAM, and hard drive size can explain the variation in price. As usual, statistical analysis is performed, model adequacy is checked, and contingencies are discussed. [7]

2016 Projects

Bjarke H. This experiment analyzes the Star data set from the Ecdat package. The main focus is response surface methods using the model from project 3 as well as adding continuous variables. The response surface is computed using the RSM package. http://rpubs.com/bjarke1/project4

Cole K. This recipe analyzes the Housing data set from the Ecdat package. The Housing data set looks at the sales prices of houses in Windsor. It is a cross-sectional data set from 1987 with 546 observations. The goal of this recipe is to use the RSM package in R to optimize the model created in Chapter 7: Fractional Factorial Methods. This is done visually by using both contour and perspective plots and then checking the model's adequacy. http://rpubs.com/Kristencole3/235877

Deshpande R. This analysis focuses on understanding how different attributes of a student affect his or her school attendance. The data set includes 30 attributes of 649 students from Portugal in 2008, but the experiment focuses on gender, involvement in extracurricular activities, alcohol consumption, and travel time to school. The effect of these four independent variables on the number of absences is analyzed using RSM techniques. The analysis looks at main effects, two-factor interactions, and purely quadratic terms. Contour plots and other visuals are also used to understand the RSM results. http://rpubs.com/rajanideshpande/235909

Trevor C. This recipe analyzes Motor Trend data that highlights performance and design factors of automobiles. The main factors are engine type, transmission type, car weight, and number of cylinders. The effects of these factors are studied using RSM techniques, primarily to analyze main effects and two-factor interaction effects. We use a fractional factorial design and a linear ANOVA design as baseline methods. Contour plots and perspective plots are provided to help readers visualize and understand the data and the analysis. http://rpubs.com/trevorcorrao/235911

Benjamin B. The project looks into the cars dataset, which consists of 93 observations. First, using a fractional factorial design, it looks into which factors affect the price. There are two 2-level factors and two 3-level factors, the latter of which are decomposed into 2-level factors. The factors are airbags, drive, man.trans, and origin. The results are then analyzed in terms of the alias structure, main effects (ME), interaction effects (IE), and ANOVA, and a model is formulated on that basis. Using this model, the analysis is extended to response surface methods. The results were as expected: drive, man.trans, and origin have the greatest effect on the price of cars. The results of the fractional factorial design and the response surface methods were consistent. http://rpubs.com/byeonb/doeproject4_ver1 [8]

1. Trevor Manzanares http://rpubs.com/manzat/45910
2. Jane Braun http://rpubs.com/braunj6/46024