Econometric Theory/Multicollinearity

From Wikibooks, open books for an open world

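When running a regression on multiple explanatory variables, a common problem is that two or more of those variables are highly correlated with each other, so they carry much the same information about the dependent variable. For example, say that the reading level of your father and the reading level of your mother can predict your eventual maximum reading level (YourLevel = a + b*FatherLevel + c*MotherLevel + u). There is a high probability that your parents have similar reading levels, so using both may be redundant. This problem is known as multicollinearity. As long as the two variables are not perfectly correlated, the OLS estimators remain unbiased; the trouble is that their variances become large, so the individual coefficients are estimated imprecisely. Say we can already predict your level from your mother's level alone: YourLevel = a + c*MotherLevel + u. Adding your father's level contributes little new information, and the regression cannot cleanly separate the father's effect from the mother's. The estimates of b and c then have large standard errors and can change sharply from one sample to the next, even though the combined prediction of your level changes very little.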
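The effect on precision can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not part of the original text: the sample size, the means and noise levels, and the "true" coefficients (a = 1, b = c = 0.5) are all assumptions chosen for illustration, and `ols` is a hypothetical helper written for this example.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and standard errors; X must already contain a constant column."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # unbiased estimate of the error variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # standard error of each coefficient
    return beta, se

# Simulated data (all numbers are assumptions made for this illustration).
rng = np.random.default_rng(0)
n = 200
mother = rng.normal(10.0, 2.0, n)
father = mother + rng.normal(0.0, 0.2, n)     # father's level is almost equal to mother's
you = 1.0 + 0.5 * father + 0.5 * mother + rng.normal(0.0, 1.0, n)

const = np.ones(n)
# YourLevel = a + b*FatherLevel + c*MotherLevel + u
beta_both, se_both = ols(np.column_stack([const, father, mother]), you)
# YourLevel = a + c*MotherLevel + u
beta_one, se_one = ols(np.column_stack([const, mother]), you)

print("both parents : b, c =", beta_both[1:], " std. errors =", se_both[1:])
print("mother only  : c    =", beta_one[1], " std. error  =", se_one[1])
```

In this setup the standard error of c should be several times larger when FatherLevel is also included, while the sum b + c stays close to the single coefficient from the mother-only regression. That is the sense in which the data pin down the combined effect but not the separate ones.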

Detecting Multicollinearity

Detecting multicollinearity can be more difficult than in the above example. The first step should be an examination of the theory that is being tested: is it redundant to have both mother's level and father's level? If this does not settle the question, perhaps because the theory is more complex and the multicollinearity is hidden, several econometric techniques can be used to look for problems.

1) Large changes in the estimated regression coefficients when a predictor variable is added or deleted. Running the regression first with FatherLevel and then without it may produce very different estimates of c, indicating that the two predictors may be collinear.

2) Non-significant coefficients in the multiple regression. If, with both FatherLevel and MotherLevel included, neither coefficient is individually significant, even though the regression as a whole fits well or each variable is significant when used alone, then something strange is happening, which signals possible multicollinearity.

3) Estimated regression coefficients have the opposite sign from the one predicted by theory. Suppose that in a regression with both FatherLevel and MotherLevel, b is positive but c is negative. We know from theory that a higher reading level of the mother does not cause the child to be a worse reader, so the negative sign is a possible sign of multicollinearity.

4) Formal detection: tolerance or the variance inflation factor (VIF).

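The tolerance of a predictor is 1 - R^2, where R^2 comes from regressing that predictor on all the other predictors, and the VIF is 1/tolerance. As a common rule of thumb, a tolerance of less than 0.1 (equivalently, a VIF above 10) indicates a multicollinearity problem.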
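The formal check can be sketched in a few lines of Python with NumPy. This is an illustration only, not part of the original text: it assumes the standard definitions tolerance = 1 - R^2 (from regressing each predictor on the others) and VIF = 1/tolerance, the helper names `r_squared` and `tolerance_and_vif` are hypothetical, and the data are simulated with made-up parameters.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on X (X must contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def tolerance_and_vif(predictors):
    """Tolerance and VIF for each column of an (n, k) array of predictors (no constant column)."""
    n, k = predictors.shape
    tol = np.empty(k)
    for j in range(k):
        # Regress predictor j on a constant plus all the other predictors.
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        tol[j] = 1.0 - r_squared(X, predictors[:, j])
    return tol, 1.0 / tol

# Simulated reading levels (all numbers are assumptions made for this illustration).
rng = np.random.default_rng(1)
n = 200
mother = rng.normal(10.0, 2.0, n)
father_similar = mother + rng.normal(0.0, 0.2, n)   # parents with nearly equal levels
father_unrelated = rng.normal(10.0, 2.0, n)         # parents with unrelated levels

tol_collinear, vif_collinear = tolerance_and_vif(np.column_stack([father_similar, mother]))
tol_independent, vif_independent = tolerance_and_vif(np.column_stack([father_unrelated, mother]))

print("similar parents  : tolerance =", tol_collinear, " VIF =", vif_collinear)
print("unrelated parents: tolerance =", tol_independent, " VIF =", vif_independent)
```

With parents whose levels are nearly identical, the tolerances should fall well below the 0.1 threshold; with unrelated levels they should be close to 1, meaning the predictors carry largely separate information.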