The Coefficient of Determination

In simple linear regression

$$ y = f(x) = c_0 + c_1 x, \tag{1} $$

several sums of squares are defined as follows:

$$ {\rm SSR} = \sum_{i = 1}^N (\hat{y}_i - \bar{y})^2, \tag{2} $$

for the regression sum of squares (SSR),

$$ {\rm SSE} = \sum_{i = 1}^N (y_i - \hat{y}_i)^2, \tag{3} $$

for the error sum of squares (SSE), and

$$ {\rm SST} = \sum_{i = 1}^N (y_i - \bar{y})^2, \tag{4} $$

for the total sum of squares (SST), where

$$ \hat{y}_i = f(x_i) = c_0 + c_1 x_i \tag{5} $$

is the value predicted by the model and $(x_i, y_i)$ are the observed values. Then the coefficient of determination $R^2$ is defined as

$$ R^2 = \frac{\rm SSR}{\rm SST} = 1 - \frac{\rm SSE}{\rm SST} \tag{6} $$

since

$$ {\rm SST} = {\rm SSR} + {\rm SSE}, \tag{7} $$

where the value of $R^2$ lies between $0$ and $1$. Identity (7) holds for the least-squares fit because the cross term $2 \sum_{i = 1}^N (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$ vanishes. If $R^2 = 1$, all of the data points fall perfectly on the regression line, while if $R^2 = 0$ the estimated regression line is perfectly horizontal. In the former case the predictor $x$ accounts for all of the variation in $y$; in the latter it accounts for none.
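These quantities are easy to verify numerically. Below is a minimal sketch in Python; the data points are made up for illustration, and NumPy's `polyfit` is just one convenient way to obtain the least-squares coefficients $c_0$ and $c_1$:

```python
import numpy as np

# Hypothetical sample data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of y = c0 + c1*x (Eq. 1); polyfit returns the
# coefficients from the highest degree down, i.e. [c1, c0].
c1, c0 = np.polyfit(x, y, 1)
y_hat = c0 + c1 * x  # predicted values, Eq. (5)
y_bar = y.mean()

ssr = np.sum((y_hat - y_bar) ** 2)  # regression sum of squares, Eq. (2)
sse = np.sum((y - y_hat) ** 2)      # error sum of squares, Eq. (3)
sst = np.sum((y - y_bar) ** 2)      # total sum of squares, Eq. (4)

# Both forms of Eq. (6) agree because SST = SSR + SSE, Eq. (7).
r2 = ssr / sst
print(r2, 1.0 - sse / sst)
```

Computing $R^2$ both as ${\rm SSR}/{\rm SST}$ and as $1 - {\rm SSE}/{\rm SST}$ and checking that they match is a quick sanity test of decomposition (7).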

  • John Haubrick, “The Coefficient of Determination, r-squared”, STAT 462: Applied Regression Analysis, Lesson 2, Section 2.5, Pennsylvania State University, 2022. https://online.stat.psu.edu/stat462/node/95/ (accessed 2022-11-25).