The Coefficient of Determination

In simple linear regression

$$\tag{1} y = f(x) = c_0 + c_1 x, $$

the relevant sums of squares are defined as follows:

$$\tag{2} {\rm SSR} = \sum_{i = 1}^N (\hat{y}_i - \bar{y})^2, $$

for regression sum of squares (SSR),

$$\tag{3} {\rm SSE} = \sum_{i = 1}^N (y_i - \hat{y}_i)^2, $$

for error sum of squares (SSE), and

$$\tag{4} {\rm SST} = \sum_{i = 1}^N (y_i - \bar{y})^2, $$

for total sum of squares (SST), where

$$\tag{5} \hat{y}_i = f(x_i) = c_0 + c_1 x_i $$

is the value predicted by the model and $(x_i, y_i)$ are the observed values.
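As a concrete check on these definitions, here is a minimal Python sketch, not part of the original text, that fits the line of Eq. (1) by ordinary least squares with NumPy and evaluates Eqs. (2)–(5); the sample data are invented purely for illustration.

```python
# Minimal sketch (hypothetical data): fit y = c0 + c1 * x by ordinary
# least squares and evaluate the sums of squares in Eqs. (2)-(4).
import numpy as np

# Invented observations (x_i, y_i), used only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit returns coefficients from the highest degree down,
# so a degree-1 fit unpacks as (c1, c0).
c1, c0 = np.polyfit(x, y, deg=1)

y_hat = c0 + c1 * x   # predicted values, Eq. (5)
y_bar = y.mean()      # mean of the observed y_i

ssr = np.sum((y_hat - y_bar) ** 2)  # regression sum of squares, Eq. (2)
sse = np.sum((y - y_hat) ** 2)      # error sum of squares, Eq. (3)
sst = np.sum((y - y_bar) ** 2)      # total sum of squares, Eq. (4)
```

With these pieces in place, the coefficient of determination $R^2$ is defined as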

$$\tag{6} R^2 = \frac{\rm SSR}{\rm SST} = 1 - \frac{\rm SSE}{\rm SST} $$

since, for a least-squares fit with an intercept term,

$$\tag{7} {\rm SST} = {\rm SSR} + {\rm SSE}, $$

where the value of $R^2$ lies between $0$ and $1$. If $R^2 = 1$, all of the data points fall exactly on the regression line, whereas if $R^2 = 0$ the estimated regression line is perfectly horizontal, i.e. $c_1 = 0$ and $\hat{y}_i = \bar{y}$ for every $i$. In the former case the predictor $x$ accounts for all of the variation in $y$, while in the latter it accounts for none.
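The decomposition in Eq. (7) can be verified by a standard argument that the original does not spell out: write $y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$ and expand the square, so that

$$ {\rm SST} = \sum_{i = 1}^N \bigl[ (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y}) \bigr]^2 = {\rm SSE} + {\rm SSR} + 2 \sum_{i = 1}^N (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}), $$

where the cross term vanishes because the least-squares normal equations force the residuals $y_i - \hat{y}_i$ to sum to zero and to be orthogonal to $x_i$, and hence to $\hat{y}_i$. In the sketch above, this means `ssr / sst` and `1.0 - sse / sst` evaluate to the same number (about $0.997$ for the invented data), and `ssr + sse` reproduces `sst` up to floating-point error.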
