06-12-2012, 01:17 PM
Regression and Correlation
Scatter Diagram
A scatter diagram is a graphical method to display the relationship between two variables
A scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane
Y is called the dependent variable
X is called the independent variable
Is there a linear relationship between BMI and BW?
Scatter diagrams are important for initial exploration of the relationship between two quantitative variables
In the above example, we may wish to summarize this relationship by a straight line drawn through the scatter of points
Simple Linear Regression
Although we could fit a line "by eye", for example with a transparent ruler, this would be a subjective approach and therefore unsatisfactory.
An objective, and therefore better, way of determining the position of a straight line is to use the method of least squares.
Using this method, we choose a line such that the sum of squares of vertical distances of all points from the line is minimized.
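In symbols, the criterion above and its standard closed-form solution can be sketched as follows. The line is written as y = a + bx; the names a (intercept) and b (slope) are labels chosen here for illustration, not notation taken from the slides.

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2
\qquad\Longrightarrow\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\quad
a = \bar{y} - b\,\bar{x}
```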
Least-squares or regression line
These vertical distances, i.e., the differences between the observed y values and their corresponding estimated values on the line, are called residuals
The line which fits the best is called the regression line or, sometimes, the least-squares line
The line always passes through the point defined by the mean of Y and the mean of X
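The points above can be checked numerically. The sketch below fits a least-squares line with the textbook formulas, computes the residuals, and confirms that the line passes through the point of means. The data are made-up illustrative values, not the BMI/BW data from the slides, and the function name least_squares_line is hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of
    squared vertical distances between the points and the line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative values (assumption), not data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)

# Residual = observed y minus the estimated value on the line.
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# The fitted line evaluated at the mean of X gives the mean of Y.
value_at_mean_x = intercept + slope * mean(xs)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

A side effect of the least-squares fit with an intercept is that the residuals sum to zero (up to floating-point rounding), which is another way of seeing that the line goes through the point of means.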
Correlation Coefficient, R
R is a measure of strength of the linear association between two variables, x and y.
Most statistical packages and some hand calculators can calculate R
For the data in our example, R = 0.94
R has some unique characteristics
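A few well-known properties of the Pearson correlation coefficient can be verified with a short sketch; these may or may not be the exact characteristics the original slide listed. The data are made-up illustrative values and pearson_r is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values (assumption), not data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)

# Swapping the roles of x and y leaves R unchanged.
r_swapped = pearson_r(ys, xs)

# Changing the units of x (positive scale factor plus a shift) leaves R unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)

# Reversing the direction of x flips the sign of R.
r_negated = pearson_r([-x for x in xs], ys)

print(r, r_swapped, r_rescaled, r_negated)
```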
Coefficient of Determination
R² is another important measure of linear association between x and y (0 ≤ R² ≤ 1)
R² measures the proportion of the total variation in y which is explained by x
For example, R² = 0.8751 indicates that 87.51% of the variation in BW is explained by the independent variable x (BMI).
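The "proportion of variation explained" reading can be made concrete. For a simple linear regression, 1 minus (residual sum of squares / total sum of squares) equals the square of the correlation coefficient. The sketch below checks this on made-up illustrative values (not the slide data), and also checks that the square root of 0.8751 rounds to 0.94, on the assumption that both figures quoted in the slides come from the same data.

```python
from math import sqrt
from statistics import mean

# Made-up illustrative values (assumption), not data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line and its fitted values.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

# Total variation in y, and the part left unexplained by the line.
ss_total = syy
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r_squared = 1.0 - ss_residual / ss_total  # proportion of variation explained
r = sxy / sqrt(sxx * syy)                 # correlation coefficient

# Consistency of the two figures quoted in the slides (assumed same data).
r_from_slides = round(sqrt(0.8751), 2)

print(r_squared, r ** 2, r_from_slides)
```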
Limitations of the correlation coefficient
Though R measures how closely the two variables approximate a straight line, it does not validly measure the strength of a nonlinear relationship
When the sample size, n, is small, the estimated correlation is less reliable and should be interpreted with care
Outliers could have a marked effect on R
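Two of these limitations are easy to demonstrate. In the sketch below, a perfect but nonlinear relationship (y = x squared on symmetric x values) yields a correlation of zero, and a single extreme point turns a weak correlation into a very strong one. All values are made up for illustration, and pearson_r is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# 1) Nonlinear relationship: y is completely determined by x,
#    yet the linear correlation is zero.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# 2) Outlier: a weakly correlated cloud of points...
xs_cloud = [1, 2, 3, 4, 5]
ys_cloud = [2, 4, 3, 2, 4]
r_without_outlier = pearson_r(xs_cloud, ys_cloud)

# ...plus one extreme observation far from the rest.
r_with_outlier = pearson_r(xs_cloud + [20], ys_cloud + [20])

print(r_curve, r_without_outlier, r_with_outlier)
```

This is one reason a scatter diagram is worth inspecting before R is reported: both problems are obvious in the plot and invisible in the single number.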
Causal Linear Relationship
Logistic Regression
Logistic Regression is used when the outcome variable is categorical
The independent variables could be either categorical or continuous
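The slope coefficient in the logistic regression model is related to the odds ratio (OR): exponentiating the slope gives the OR associated with a one-unit increase in the independent variable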
A multiple logistic regression model can be used to adjust for the effect of other variables when assessing the association between the exposure (E) and disease (D) variables
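The link between the slope and the OR can be illustrated with a single binary exposure. The sketch below fits a logistic regression by Newton-Raphson on a made-up 2x2 table (the counts are assumptions, not data from the slides) and compares exp(slope) with the odds ratio computed directly from the table. The function name and the grouped-data layout are hypothetical choices for this example; in practice a statistical package would do the fitting.

```python
from math import exp


def fit_logistic_binary_exposure(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    groups: list of (x, cases, total) tuples, one per exposure level.
    Returns (b0, b1), the maximum-likelihood intercept and slope.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, cases, total in groups:
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            g0 += cases - total * p
            g1 += x * (cases - total * p)
            w = total * p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up 2x2 table (assumption): exposure x = 1 or 0, outcome = disease.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

groups = [
    (0, unexposed_cases, unexposed_total),
    (1, exposed_cases, exposed_total),
]
b0, b1 = fit_logistic_binary_exposure(groups)

# Odds ratio computed directly from the table: (a * d) / (b * c).
odds_ratio = (exposed_cases * (unexposed_total - unexposed_cases)) / (
    (exposed_total - exposed_cases) * unexposed_cases
)

print(f"exp(slope) = {exp(b1):.4f}, table OR = {odds_ratio:.4f}")
```

With further independent variables in the model, exp(slope) for the exposure is interpreted as the OR adjusted for those variables; that case is not covered by this two-parameter sketch.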