Least Squares.
Attachment: LeastSquares-pretty.pdf
Introduction
The least squares method, a very popular technique, is used to compute estimates of parameters and to fit data. It is one of the oldest techniques of
modern statistics as it was first published in 1805 by the French mathematician
Legendre in a now classic memoir. But this method is even older because it
turned out that, after the publication of Legendre’s memoir, Gauss, the famous
German mathematician, published another memoir (in 1809) in which he mentioned
that he had previously discovered this method and used it as early as
1795. A somewhat bitter anteriority dispute followed (a bit reminiscent of the
Leibniz-Newton controversy about the invention of Calculus), which, however,
did not diminish the popularity of this technique. Galton used it (in 1886) in his
work on the heritability of size, which laid down the foundations of correlation and (also gave its name to) regression analysis. Both Pearson and Fisher, who did
so much in the early development of statistics, used and developed it in different
contexts (factor analysis for Pearson and experimental design for Fisher).
Functional fit example: regression
The oldest (and still most frequent) use of ordinary least squares (OLS) was linear regression, which corresponds to the problem of finding a line (or curve) that best fits a set of data points. In the standard formulation, a set of N pairs of observations {Yi, Xi} is used to find a function giving the value of the dependent variable (Y) from the values of the independent variable (X). With one variable and a linear function, the prediction is given by the following equation:

Ŷ = a + bX

where Ŷ is the predicted value, a the intercept, and b the slope of the regression line; the least squares estimates of a and b are those that minimize the sum of the squared differences between Y and Ŷ.
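As a minimal illustration (not from the attached paper, and using made-up data), the closed-form slope and intercept for the single-predictor case can be computed with NumPy:

Code:
import numpy as np

# Hypothetical sample data, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Closed-form OLS estimates for one predictor:
# b = cov(X, Y) / var(X),  a = mean(Y) - b * mean(X)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_hat = a + b * X                    # predicted values
residuals = Y - Y_hat                # estimation errors
print(a, b, np.sum(residuals ** 2))  # intercept, slope, residual sum of squares

The quantity printed last is the residual sum of squares, which is exactly what the least squares criterion minimizes.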
The geometry of least squares
OLS can be interpreted in a geometrical framework as an orthogonal projection of the data vector onto the space defined by the independent variables. The projection is orthogonal because the predicted values and the error values (the residuals) are uncorrelated. This is illustrated in Figure 1 of the attached paper, which depicts the case of two independent variables (vectors x1 and x2) and the data vector (y), and shows that the error vector (y − ŷ) is orthogonal to the least squares estimate (ŷ), which lies in the subspace defined by the two independent variables.
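A short numerical sketch of this projection view, assuming NumPy and random example data: ŷ is the orthogonal projection of y onto the column space of the design matrix, so the residual vector is (numerically) orthogonal both to every column of X and to ŷ.

Code:
import numpy as np

# Hypothetical design matrix with an intercept column and two independent variables
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.normal(size=10), rng.normal(size=10)])
y = rng.normal(size=10)

# OLS fit: y_hat is the orthogonal projection of y onto the column space of X
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
residual = y - y_hat

# The residual is orthogonal to the columns of X and to the fitted values
print(X.T @ residual)   # all entries close to zero
print(y_hat @ residual) # close to zero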
Optimality of least squares estimates
OLS estimates have some strong statistical properties. Specifically, when (1) the data obtained constitute a random sample from a well-defined population, (2) the population model is linear, (3) the error has a zero expected value, (4) the independent variables are linearly independent, and (5) the error is normally distributed and uncorrelated with the independent variables (the so-called homoscedasticity assumption), then the OLS estimate is the best linear unbiased estimate, often denoted by the acronym "BLUE" (these five conditions and the accompanying proof are known as the Gauss-Markov conditions and theorem). In addition,
when the Gauss-Markov conditions hold, OLS estimates are also maximum
likelihood estimates.
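The unbiasedness part of this claim is easy to see empirically. The following small simulation (my own sketch, with an assumed true intercept of 1 and slope of 2) repeatedly draws samples that satisfy the conditions above and shows that the OLS slope estimates average out to the true slope:

Code:
import numpy as np

# Hypothetical simulation: under the Gauss-Markov conditions the OLS slope is unbiased
rng = np.random.default_rng(42)
true_a, true_b = 1.0, 2.0
slopes = []
for _ in range(2000):
    X = rng.uniform(0, 10, size=50)
    # linear model with zero-mean, homoscedastic, independent errors
    Y = true_a + true_b * X + rng.normal(0, 1, size=50)
    b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    slopes.append(b)

print(np.mean(slopes))  # close to the true slope of 2.0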
Weighted least squares
The optimality of OLS relies heavily on the homoscedasticity assumption.
When the data come from different sub-populations for which an independent
estimate of the error variance is available, a better estimate than OLS can
be obtained using weighted least squares (WLS), also called generalized least
squares (GLS). The idea is to assign to each observation a weight that reflects
the uncertainty of the measurement. In general, the weight wi, assigned to the
ith observation, will be a function of the variance of this observation, denoted σi². A straightforward weighting scheme is to define wi = 1/σi (but other, more sophisticated weighting schemes can also be proposed).
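A minimal WLS sketch, assuming NumPy and made-up data with known (unequal) error variances: the weighted estimate solves the weighted normal equations (XᵀWX)b = XᵀWy, where W is a diagonal matrix of the weights. Here the weights are taken as the reciprocals of the assumed variances, one common choice; the text above mentions wi = 1/σi as another straightforward scheme.

Code:
import numpy as np

# Hypothetical data: 6 observations, intercept plus one predictor
X = np.column_stack([np.ones(6), np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.3])

# Assumed known error variances for each observation (made up for illustration)
var = np.array([0.1, 0.1, 0.5, 0.5, 1.0, 1.0])
w = 1.0 / var            # one common weighting choice: inverse variance
W = np.diag(w)

# Solve the weighted normal equations (X' W X) b = X' W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)          # weighted intercept and slope

Observations with smaller assumed variance receive larger weights and therefore pull the fitted line more strongly than the noisier observations.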