Figure 3.3 shows that the deviation of an individual y value from its mean can be partitioned into the deviation of the predicted value from the mean and the deviation of the observed value from the predicted value:

yᵢ − ȳ = (ŷᵢ − ȳ) + (yᵢ − ŷᵢ)

We square each side of this equation, since the sum of the deviations about the mean is equal to zero, and sum the results over all n points. Some of you may note that squaring the right-hand side should include the cross-product of the two terms in addition to their squared quantities; it can be shown that, for a least-squares fit, this cross-product term sums to zero. The resulting equation is

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²
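The claim that the cross-product term vanishes for a least-squares fit can be checked numerically. The sketch below uses small hypothetical data (not the text's income example), fits the line by least squares, and evaluates Σ(ŷᵢ − ȳ)(yᵢ − ŷᵢ):

```python
# Check that the cross-product term vanishes for a least-squares line.
# The data below are hypothetical, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# Cross-product term: sum over i of (yhat_i - ybar)(y_i - yhat_i)
cross = sum((yh - y_bar) * (yi - yh) for yh, yi in zip(y_hat, y))
print(abs(cross) < 1e-9)  # True: zero up to floating-point rounding
```

The term is exactly zero in real arithmetic because the least-squares residuals satisfy Σeᵢ = 0 and Σxᵢeᵢ = 0, so Σŷᵢeᵢ = aΣeᵢ + bΣxᵢeᵢ = 0.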
We see that the total variability, SST, consists of two components: SSR, the amount of variability explained by the regression equation, called the Regression Sum of Squares, and SSE, the random or unexplained deviation of the points from the regression line, called the Error Sum of Squares. Thus

SST = SSR + SSE

Total sum of squares:       SST = Σ(yᵢ − ȳ)²
Regression sum of squares:  SSR = Σ(ŷᵢ − ȳ)²
Error sum of squares:       SSE = Σ(yᵢ − ŷᵢ)²
For a given set of observed values of the dependent variable, y, SST is fixed as the total variability of all observations from the mean. In this partitioning, a larger value of SSR, and hence a smaller value of SSE, indicates a regression equation that "fits", or comes closer to, the observed data. This partitioning is shown graphically in Figure 3.3.
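The partitioning can be illustrated with a short sketch. The data here are hypothetical, chosen only to show that the two components add up to the total:

```python
# Sketch of the SST = SSR + SSE partition for a least-squares line.
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit the least-squares line y_hat = a + b x
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained

print(round(sst, 6), round(ssr + sse, 6))  # both equal 38.9
```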
Let us find SST, SSR and SSE for the data on incomes and food expenditure.
Using the calculations given in Table 3.3, we find the sums of squares. The columns of the table give the predicted values ŷ, the residuals e = y − ŷ, the deviations x − x̄, their squares (x − x̄)², and the squared residuals e²:

      ŷ      e = y − ŷ     x − x̄     (x − x̄)²       e²
  10.3884     −1.3884      4.7143     22.2246     1.9277
  14.0872      0.9128     18.7143    350.225      0.8332
   6.6896      0.3104     −9.2857     86.2242     0.0963
  11.4452     −0.4452      8.7143     75.9390     0.1982
   5.1044     −0.1044    −15.286     233.653      0.0109
   8.5390     −0.5390     −2.2857      5.2244     0.2905
   7.7464      1.2536     −5.2857     27.9386     1.5715
The error sum of squares SSE is given by the sum of the e² column in Table 3.4. Thus,

SSE = Σeᵢ² = Σ(yᵢ − ŷᵢ)² = 4.9283
The regression sum of squares can then be found from SSR = SST − SSE.
The value of SSR can also be computed directly by using the formula SSR = b² Σ(xᵢ − x̄)². (Check!!)
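As a quick check, SSE can be reproduced directly from the residuals e = y − ŷ listed in the table above:

```python
# SSE computed from the residual column e = y - y_hat of the table.
residuals = [-1.3884, 0.9128, 0.3104, -0.4452, -0.1044, -0.5390, 1.2536]

sse = sum(e ** 2 for e in residuals)
print(round(sse, 4))  # 4.9283
```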
The total sum of squares SST is a measure of the total variation in food expenditures; SSR is the portion of the total variation explained by the regression model (or by income); and the error sum of squares SSE is the portion of the total variation not explained by the regression model.
3.6.1. Coefficient of determination
If we divide both sides of the equation

SST = SSR + SSE

by SST, we obtain

1 = SSR/SST + SSE/SST

We have seen that the fit of the regression equation to the data improves as SSR increases and SSE decreases. The ratio SSR/SST provides a descriptive measure of the proportion, or percent, of the total variability that is explained by the regression model. This measure is called the coefficient of determination, or more generally R²:

R² = SSR/SST = 1 − SSE/SST
The coefficient of determination is often interpreted as the percent of variability in y that is explained by the regression equation. We see that R² increases directly with the spread of the independent variable.
R² can vary from 0 to 1, since SST is fixed and 0 ≤ SSE ≤ SST. A larger R² implies a better regression, everything else being equal.
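A minimal helper for computing R² from observed and fitted values follows; the function name and example data are illustrative, not from the text:

```python
# R^2 = SSR/SST = 1 - SSE/SST, computed from observed y and fitted y_hat.
def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    return 1 - sse / sst

# Hypothetical check: a perfect fit gives R^2 = 1
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```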
Interpretation of R²: about 100·R² percent of the sample variation in y (measured by the total sum of squares of the deviations of the sample y values about their mean ȳ) can be explained by using x to predict y in the straight-line model.
Calculate the coefficient of determination for the data on monthly incomes and food expenditures of seven households.
From the earlier calculations of SST, SSR and SSE, we obtain R² = SSR/SST ≈ 0.92.
We can state that 92% of the variability in y is explained by the linear regression, and the linear model seems very satisfactory in this respect. In other words, 92% of the total variation in the food expenditures of the households occurs because of the variation in their incomes, and the remaining 8% is due to other variables, such as differences in household size, preferences, tastes, and so on.