Multicollinearity and Endogeneity
Ballentines with 2 Regressors
• size of circles gives variance.
• overlap gives identification of estimator.
• size of overlap gives variance of estimator.
• where’s the error term?
• 3 possible estimators of the coefficients (a sketch contrasting them follows this list):
– use the (x1,y) overlap and the (x2,y) overlap directly (this uses the (x1,x2) overlap twice)
– divide the (x1,x2) overlap between the two coefficients
– discard the (x1,x2) overlap
• What happens if x1 and x2 have higher covariance?
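Options 1 and 3 are easy to see numerically. Below is a minimal Python sketch (the coefficients and the x1-x2 correlation are made-up values): the two simple regressions each absorb the shared variation into their own slope, while the multiple-regression coefficient on x1 matches a simple regression of y on only the part of x1 that x2 cannot explain (the Frisch-Waugh-Lovell result), i.e., OLS effectively discards the overlap.

import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + 0.6 * rng.normal(size=n)   # x1 and x2 overlap (are correlated)
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def slope(x, y):
    """Simple-regression slope of y on x (intercept included)."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

# Option 1: two simple regressions. Each attributes the shared (x1,x2)
# variation to its own coefficient, so both slopes overshoot the true 1.0:
print(slope(x1, y), slope(x2, y))          # both about 1.8 here

# Option 3 (what OLS does): discard the overlap. The multiple-regression
# coefficient on x1 equals the slope of y on the part of x1 that x2 cannot
# explain (the Frisch-Waugh-Lovell result):
b_aux = slope(x2, x1)
x1_unique = x1 - (x1.mean() - b_aux * x2.mean()) - b_aux * x2
b = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]
print(b[1], slope(x1_unique, y))           # these two agree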
Violating Assumption 6:
• Recall we assume that no independent variable is a perfect linear function of any
other independent variable.
– If a variable X1 can be written as a perfect linear function of X2 , X3 , etc., then we say
these variables are perfectly collinear.
– When this is true of more than one independent variable, they are perfectly
multicollinear.
• Perfect multicollinearity presents technical problems for computing the least
squares estimates.
– collinearity between two regressors means Cov(X1, X2) ≠ 0
– Example: suppose we want to estimate the regression:
Yi = β0 + β1X1i + β2X2i + εi where X1i = 2X2i + 5.
That is, X1 and X2 are perfectly collinear. Whenever X2 increases by one unit, we see X1
increase by 2 units, and Y increase by 2β1 + β2 units. It is completely arbitrary whether
we attribute this increase in Y to X1, to X2, or to some combination of them. If X1 is in the
model, then X2 is completely redundant: it contains exactly the same information as X1
(if we know the value of X1, we know the value of X2 exactly, and vice versa). Because
of this, there is no unique solution to the least squares minimization problem. Rather,
there are an infinite number of solutions.
– Another way to think about this example: β1 measures the effect of X1 on Y, holding X2
constant. Because X1 and X2 always vary (exactly) together, there’s no way to estimate
this.
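A small numpy sketch of this same example (the true coefficients and sample size are arbitrary choices) shows both symptoms: the design matrix is rank-deficient, and two different coefficient vectors produce identical fitted values, so least squares has no basis for choosing between them.

import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x1 = 2 * x2 + 5                           # the perfect collinearity from the example
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)   # made-up true coefficients

X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))           # 2, not 3: columns are linearly dependent,
                                          # so X'X is singular and (X'X)^(-1)X'y fails

# Two different coefficient vectors with *identical* fitted values:
# shift the effect of X1 onto X2 and the intercept using X1 = 2*X2 + 5
b  = np.array([1.0, 0.5, 0.3])
b2 = np.array([1.0 + 5 * 0.5, 0.0, 2 * 0.5 + 0.3])   # (3.5, 0, 1.3)
print(np.allclose(X @ b, X @ b2))         # True -> no unique least squares solution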
Imperfect Multicollinearity
• It is quite rare for two independent variables to have an exact linear relationship
– it's usually obvious when this happens: e.g., the "dummy variable trap"
• However, it is very common in economic data for two (or more) independent variables
to be strongly, but not exactly, related
– in economic data, everything affects everything else
• Example (a simulated version appears after this list):
– perfect collinearity: X1i = α0 + α1X2i
– imperfect collinearity: X1i = α0 + α1X2i + ζi where ζi is a stochastic error term
• Examples of economic variables that are strongly (but not exactly) related:
– income, savings, and wealth
– firm size (employment), capital stock, and revenues
– unemployment rate, exchange rate, interest rate, bank deposits
• Thankfully, economic theory (and common sense!) tell us these variables will be
strongly related, so we shouldn’t be surprised to find that they are ...
• But when in doubt, we can look at the sample correlation between independent
variables to detect imperfect multicollinearity
• When the sample correlation is big enough, Assumption 6 is “almost” violated
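A quick numpy sketch of the distinction and of the sample-correlation check (α0 = 1, α1 = 0.5, and the noise scale are made-up numbers):

import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)

x1_perfect = 1.0 + 0.5 * x2                                      # zeta = 0
x1_imperfect = 1.0 + 0.5 * x2 + rng.normal(scale=0.2, size=n)    # stochastic zeta

print(np.corrcoef(x1_perfect, x2)[0, 1])     # 1.0 (up to rounding)
print(np.corrcoef(x1_imperfect, x2)[0, 1])   # high (about 0.93 here) but below 1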
Consequences of Multicollinearity
• Least squares estimates are still unbiased
– recall that only Assumptions 1-3 of the CLRM (correct specification, zero expected
error, exogenous independent variables) are required for the least squares
estimator to be unbiased
– since none of those assumptions are violated, the least squares estimator remains
unbiased
• The least squares estimates will have big standard errors
– this is the main problem with multicollinearity
– we're trying to estimate the marginal effect of each independent variable holding
the other independent variables constant
– but the strong linear relationship among the independent variables makes this
difficult: we always see them move together
– that is, there is very little information in the data about the thing we're trying
to estimate
– consequently, we can't estimate it very precisely: the standard errors are large
(a small simulation illustrating this follows)
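A small Monte Carlo sketch makes the standard-error inflation visible (plain-numpy OLS; the sample size, coefficients, and correlation values are arbitrary):

import numpy as np

def avg_se_on_x1(rho, n=200, n_sim=500, seed=0):
    """Average estimated standard error of the coefficient on x1
    when corr(x1, x2) = rho; plain-numpy OLS on simulated data."""
    rng = np.random.default_rng(seed)
    ses = []
    for _ in range(n_sim):
        cov = np.array([[1.0, rho], [rho, 1.0]])
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + 0.5 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        s2 = resid @ resid / (n - 3)               # estimated error variance
        ses.append(np.sqrt(s2 * XtX_inv[1, 1]))    # SE of the x1 coefficient
    return np.mean(ses)

for rho in [0.0, 0.5, 0.9, 0.99]:
    print(rho, round(avg_se_on_x1(rho), 3))
# the SE grows like 1/sqrt(1 - rho^2): modest at rho = 0.5, more than
# double at rho = 0.9, and roughly 7x the baseline at rho = 0.99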
Detecting Multicollinearity
• It’s important to keep in mind that most economic variables are correlated to some
degree
– that is, we face some multicollinearity in every regression that we run
• The question is how much? And is it a problem?
• We’ve seen one method of detecting collinearity already: look at the sample
correlation between independent variables.
– rule of thumb: sample correlation > 0.8 is evidence of severe collinearity
– problem: if the collinear relationship involves more than 2 independent
variables, you may not detect it this way
• Look at Variance Inflation Factors (VIFs): VIFj = 1 / (1 − Rj²), where Rj² is the R²
from regressing Xj on all the other independent variables
– this catches collinear relationships involving more than 2 variables
– common rule of thumb: VIF > 10 indicates severe collinearity
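As a sketch of both checks on simulated income/wealth/savings data (the data-generating numbers are invented; the VIF computation uses statsmodels' variance_inflation_factor):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
income = rng.normal(50, 10, size=n)
wealth = 5 * income + rng.normal(0, 20, size=n)               # strongly tied to income
savings = 0.1 * income + 0.01 * wealth + rng.normal(0, 2, size=n)

df = pd.DataFrame({"income": income, "wealth": wealth, "savings": savings})

# Check 1: pairwise sample correlations (rule of thumb: |corr| > 0.8)
print(df.corr().round(2))

# Check 2: VIFs -- regress each variable on the others (with a constant)
X = sm.add_constant(df)
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, j), 1))
# income and wealth should show large VIFs; unlike the pairwise-correlation
# check, VIFs can detect a dependence involving all three variables at once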