R-squared (Coefficient of Determination)

In regression output...

R² = SSR / SST
R² = 362018 / 458019
R² = 0.79040

Okay... but actually explain it to me now.

R² shows us what proportion of the variation in the response variable can be explained by the predictor variable(s).

Scenario: Crammer Nation University wants to develop a regression equation to predict the "Number of Recruits" a given fraternity will receive this rush season given the "Parties Thrown" by the fraternity the previous year. They take a sample of 6 fraternities on campus, resulting in the following scatterplot with line of best fit.

The purpose of each Sum of Squares is...

  • SSE is the unexplained variation.
  • SSR is the explained variation.
  • SST is the total variation. (unexplained + explained)
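In symbols, the three sums of squares are usually written as follows. The notation is an assumption added here, not taken from the page: y_i is an observed response, ŷ_i is the value the line of best fit predicts for that point, and ȳ is the mean of the observed responses.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{n} (y_i - \bar{y})^2       && \text{(total variation)} \\
SSR &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 && \text{(explained variation)} \\
SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2     && \text{(unexplained variation)}
\end{aligned}
```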

SSR = (-3.8)² + (-3.0)² + (-1.4)² + (+1.1)² + (+2.7)² + (+4.4)²
SSR = (14.44) + (9.00) + (1.96) + (1.21) + (7.29) + (19.36)
SSR = 53.26

SST = (-5)² + (-1)² + (-4)² + (+3)² + (+5)² + (+2)²
SST = (25) + (1) + (16) + (9) + (25) + (4)
SST = 80

R² = SSR / SST
R² = 53.26 / 80
R² = 0.66575

Answer: 66.575% of the variation in "Number of Recruits" can be explained by the "Parties Thrown" by the fraternity the previous year.
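The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: it assumes the numbers in parentheses in the SSR calculation are each fitted value's distance from the mean, and those in the SST calculation are each observed value's distance from the mean. The variable and function names are made up for illustration.

```python
# Distances of each fitted value from the mean (taken from the SSR line above).
explained = [-3.8, -3.0, -1.4, 1.1, 2.7, 4.4]
# Distances of each observed value from the mean (taken from the SST line above).
total = [-5, -1, -4, 3, 5, 2]


def sum_of_squares(deviations):
    """Square each deviation and add the squares together."""
    return sum(d ** 2 for d in deviations)


ssr = sum_of_squares(explained)
sst = sum_of_squares(total)
r_squared = ssr / sst

print(f"SSR = {ssr:.2f}, SST = {sst:.2f}, R^2 = {r_squared:.5f}")
```

The printed values should match the hand calculation: SSR = 53.26, SST = 80.00, R^2 = 0.66575.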

The closer R² is to 1.00, the more of the variation our regression explains, so we want R² to be as close to 1.00 as possible!

Another way to calculate...

SSE = (-1.2)² + (+2.0)² + (-2.6)² + (+1.9)² + (+2.3)² + (-2.4)²
SSE = (1.44) + (4.00) + (6.76) + (3.61) + (5.29) + (5.76)
SSE = 26.86

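R² = 1 - (SSE / SST)
R² = 1 - (26.86 / 80)
R² = 1 - (0.33575)
R² = 0.66425 ≈ 0.66575

Slight variation (≈) because the numbers in this scenario are simplified for ease of learning: SSR + SSE = 53.26 + 26.86 = 80.12, which is not exactly the SST of 80.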
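To see where the slight variation comes from, here is a small Python sketch that runs both formulas on the sums of squares from this scenario. The two routes agree exactly only when SSR + SSE equals SST, which is the case for a true least-squares line with an intercept. The variable names are made up for illustration.

```python
# Sums of squares computed by hand in this scenario.
ssr, sse, sst = 53.26, 26.86, 80.0

r2_from_explained = ssr / sst        # R^2 = SSR / SST
r2_from_unexplained = 1 - sse / sst  # R^2 = 1 - SSE / SST

print(f"SSR / SST     = {r2_from_explained:.5f}")
print(f"1 - SSE / SST = {r2_from_unexplained:.5f}")
print(f"SSR + SSE     = {ssr + sse:.2f} (SST = {sst:.2f})")
```

The gap between the two results (0.66575 vs. 0.66425) is the same 0.12 mismatch between SSR + SSE and SST, divided by SST.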

Visualizing R² on a data point

For this data point...

  • SSR is the variation explained by our regression.
  • SSE is the variation left unexplained (a.k.a. due to "error").
  • SST is the total variation. (unexplained + explained)
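The same split can be checked point by point with the numbers from the SSR, SSE, and SST calculations earlier. This sketch assumes those numbers are listed in the same fraternity order in all three calculations (the page does not say so explicitly), and the list names are made up for illustration.

```python
# Per-point pieces, in the order they appear in the calculations earlier.
explained = [-3.8, -3.0, -1.4, 1.1, 2.7, 4.4]    # fitted value minus mean
unexplained = [-1.2, 2.0, -2.6, 1.9, 2.3, -2.4]  # observed value minus fitted value
total = [-5, -1, -4, 3, 5, 2]                    # observed value minus mean

# For every point, explained + unexplained should rebuild the total deviation.
rebuilt = [r + e for r, e in zip(explained, unexplained)]

for i, (r, e, t) in enumerate(zip(explained, unexplained, total), start=1):
    print(f"point {i}: {r:+.1f} (explained) + {e:+.1f} (unexplained) -> total {t:+d}")

all_match = all(abs(b - t) < 1e-9 for b, t in zip(rebuilt, total))
print("every point splits cleanly:", all_match)
```

Under that ordering assumption, every point's explained and unexplained pieces add up to its total deviation, which is exactly the picture described for a single data point.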

When we do this for all of the data points...

R² = SSR / SST

...and get the same result as above...

R² = 0.66575

For all data points, our regression (SSR) is explaining 66.575% of the total variation (SST) in the response variable.
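For completeness, here is a minimal end-to-end sketch in plain Python: fit a least-squares line, build the three sums of squares, and compute R² both ways. The data below is hypothetical and is NOT the six fraternities from the scenario (the page only shows them in a scatterplot). The function name r_squared_report is also made up for illustration.

```python
def r_squared_report(x, y):
    """Fit y = b0 + b1 * x by least squares; return (SSR, SSE, SST, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Slope and intercept of the least-squares line.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    return ssr, sse, sst, ssr / sst


# Hypothetical data for illustration only.
parties_thrown = [2, 4, 5, 7, 9, 10]
number_of_recruits = [11, 15, 12, 19, 21, 18]

ssr, sse, sst, r2 = r_squared_report(parties_thrown, number_of_recruits)
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"R^2 = SSR / SST     = {r2:.5f}")
print(f"R^2 = 1 - SSE / SST = {1 - sse / sst:.5f}")
```

Because this line really is the least-squares fit, SSR + SSE matches SST (up to floating-point rounding) and both formulas give the same R².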
