The Coefficient of Determination

In simple linear regression

$$ y = f(x) = c_0 + c_1 x, \tag{1} $$

several sums of squares are defined as follows:

$$ {\rm SSR} = \sum_{i = 1}^N (\hat{y}_i - \bar{y})^2, \tag{2} $$

for the regression sum of squares (SSR),

$$ {\rm SSE} = \sum_{i = 1}^N (y_i - \hat{y}_i)^2, \tag{3} $$

for the error sum of squares (SSE), and

$$ {\rm SST} = \sum_{i = 1}^N (y_i - \bar{y})^2, \tag{4} $$

for the total sum of squares (SST), where

$$ \hat{y}_i = f(x_i) = c_0 + c_1 x_i \tag{5} $$

is the value predicted by the model and $(x_i, y_i)$ are the observed values. Then the coefficient of determination $R^2$ is defined as

$$ R^2 = \frac{\rm SSR}{\rm SST} = 1 - \frac{\rm SSE}{\rm SST} \tag{6} $$

since

$$ {\rm SST} = {\rm SSR} + {\rm SSE}, \tag{7} $$

where the value of $R^2$ lies between $0$ and $1$. Identity (7) holds for the least-squares fit because the cross term $2 \sum_{i = 1}^N (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$ vanishes. If $R^2 = 1$, all of the data points fall perfectly on the regression line, while if $R^2 = 0$ the estimated regression line is perfectly horizontal. In the former case the predictor $x$ accounts for all of the variation in $y$; in the latter it accounts for none.
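These quantities are easy to verify numerically. Below is a minimal sketch in Python; the data points are made up for illustration, and NumPy's `polyfit` is just one convenient way to obtain the least-squares coefficients $c_0$ and $c_1$:

```python
import numpy as np

# Hypothetical sample data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of y = c0 + c1*x (Eq. 1); polyfit returns the
# coefficients from the highest degree down, i.e. [c1, c0].
c1, c0 = np.polyfit(x, y, 1)
y_hat = c0 + c1 * x  # predicted values, Eq. (5)
y_bar = y.mean()

ssr = np.sum((y_hat - y_bar) ** 2)  # regression sum of squares, Eq. (2)
sse = np.sum((y - y_hat) ** 2)      # error sum of squares, Eq. (3)
sst = np.sum((y - y_bar) ** 2)      # total sum of squares, Eq. (4)

# Both forms of Eq. (6) agree because SST = SSR + SSE, Eq. (7).
r2 = ssr / sst
print(r2, 1.0 - sse / sst)
```

Computing $R^2$ both as ${\rm SSR}/{\rm SST}$ and as $1 - {\rm SSE}/{\rm SST}$ and checking that they match is a quick sanity test of decomposition (7).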

  • John Haubrick, “The Coefficient of Determination, r-squared”, STAT 462: Applied Regression Analysis, Lesson 2, Section 2.5, Pennsylvania State University, 2022. https://online.stat.psu.edu/stat462/node/95/ (accessed 2022-11-25).