# QT/U2 Topic 2 Coefficient of Determination and Correlation

### COEFFICIENT OF DETERMINATION

The coefficient of determination (denoted by R²) is a key output of regression analysis. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable.

• The coefficient of determination is the square of the correlation (r) between predicted y scores and actual y scores; thus, it ranges from 0 to 1.
• In simple linear regression (one independent variable), the coefficient of determination is also equal to the square of the correlation between the x and y scores.
• An R² of 0 means that the dependent variable cannot be predicted from the independent variable by the regression model.
• An R² of 1 means the dependent variable can be predicted without error from the independent variable.
• An R² between 0 and 1 indicates the extent to which the dependent variable is predictable. An R² of 0.10 means that 10 percent of the variance in Y is predictable from X, an R² of 0.20 means that 20 percent is predictable, and so on.
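The two equivalences in the bullets above can be checked numerically. The sketch below is a minimal pure-Python illustration, assuming made-up sample data and hypothetical helper names (`pearson_r`, `fit_line`) that are not part of the original notes. It fits a least-squares line, then compares the squared correlation between predicted and actual y with the squared correlation between x and y.

```python
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences (hypothetical helper)."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]  # predicted y scores

r_pred_actual = pearson_r(y_hat, y)  # correlation of predicted vs actual y
r_xy = pearson_r(x, y)               # correlation of x and y

# Both squared correlations give the same R² (up to floating-point rounding)
print(round(r_pred_actual ** 2, 6), round(r_xy ** 2, 6))
```

The agreement is not a coincidence: the predicted values are a linear function of x, so their correlation with y differs from r(x, y) at most in sign, and squaring removes the sign.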

For a linear regression model with one independent variable, the coefficient of determination (R²) is computed as:

$$R^2 = \left\{ \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y} \right\}^2$$

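where $N$ is the number of observations used to fit the model, $\Sigma$ is the summation symbol, $x_i$ is the x value for observation $i$, $\bar{x}$ is the mean x value, $y_i$ is the y value for observation $i$, $\bar{y}$ is the mean y value, $\sigma_x$ is the standard deviation of x, and $\sigma_y$ is the standard deviation of y. Because of the $1/N$ factor, $\sigma_x$ and $\sigma_y$ here are population standard deviations (computed with divisor $N$).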
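The formula above translates almost term by term into code. The following is a minimal sketch in plain Python, assuming a hypothetical function name `r_squared`; it uses `statistics.pstdev` (population standard deviation, divisor N) so that the standard deviations match the 1/N factor in the formula.

```python
from statistics import mean, pstdev

def r_squared(x, y):
    """R² for a one-predictor linear regression, following the formula above.

    Hypothetical helper. Uses population standard deviations (divisor N) to
    match the 1/N factor; x and y must not be constant (zero spread).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    # (1/N) * sum of cross-products of deviations from the means
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    r = cov / (pstdev(x) * pstdev(y))  # correlation coefficient r
    return r ** 2

# A perfect straight line (y = 2x + 1) gives R² = 1, up to rounding
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
# No linear association gives R² = 0
print(r_squared([1, 2, 3], [1, 3, 1]))
```

If sample standard deviations (divisor N − 1) were used instead, the leading factor would have to be 1/(N − 1) for the result to stay the same.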

### COEFFICIENT OF CORRELATION

• The coefficient of determination, r², is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.
• It indicates how much confidence one can place in predictions made from a given regression model: the higher r² is, the more closely the model's predictions track the observed values.
• The coefficient of determination is the ratio of the explained variation to the total variation.
• The coefficient of determination satisfies 0 ≤ r² ≤ 1 and denotes the strength of the linear association between x and y.
• The coefficient of determination represents the proportion of the total variation in y that is explained by the line of best fit. For example, if r = 0.922, then r² ≈ 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.
• The coefficient of determination measures how well the regression line represents the data. If the regression line passes exactly through every point on the scatter plot, it explains all of the variation (r² = 1). The farther the points lie from the line, the less of the variation it explains.
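The "explained variation over total variation" view in the list above can be illustrated with a short computation. This is a minimal sketch, assuming made-up sample data and a hypothetical helper `variation_breakdown`. It splits the total variation in y (SST) into the part explained by the least-squares line (SSR) and the unexplained residual part (SSE), then reproduces the r = 0.922 arithmetic from the worked example.

```python
from statistics import mean

def variation_breakdown(x, y):
    """Return (SST, SSR, SSE) for the least-squares line y_hat = b0 + b1 * x.

    Hypothetical helper for illustration only.
    """
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    y_hat = [b0 + b1 * xi for xi in x]
    total = sum((yi - my) ** 2 for yi in y)                        # SST: total variation
    explained = sum((yh - my) ** 2 for yh in y_hat)                # SSR: explained variation
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # SSE: unexplained variation
    return total, explained, unexplained

# Made-up sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse = variation_breakdown(x, y)
r2 = ssr / sst  # explained variation / total variation
print(f"r^2 = {r2:.4f}: explained {r2:.1%}, unexplained {sse / sst:.1%}")

# Worked example from the text: r = 0.922
r = 0.922
print(round(r ** 2, 3))  # 0.85, i.e. about 85% explained
```

For a least-squares line with an intercept, SST = SSR + SSE, so the explained and unexplained percentages always add up to 100%.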