Goodness of Fit – R-squared (R²) and Adjusted R-squared:
In regression analysis, R-squared (R²) and adjusted R-squared are two important measures of goodness of fit that assess how well the regression model explains the variation in the dependent variable.
R-squared (R²):
R-squared is a statistical measure that represents the proportion of the variance in the dependent variable (Y) that is explained by the independent variables (X₁, X₂, …, Xₚ) in the regression model. It is calculated as the ratio of the explained variance to the total variance:
R² = Explained Variance / Total Variance = 1 – (Residual Sum of Squares / Total Sum of Squares)
R² takes values between 0 and 1. A value of 0 indicates that the model explains none of the variation in the dependent variable, while a value of 1 indicates that the model explains all of the variation. A higher R² value indicates a better fit of the model to the data, as it suggests that a larger proportion of the variance in Y is accounted for by the independent variables.
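As a minimal sketch of this calculation (the data and the use of NumPy are illustrative assumptions, not part of the discussion above), R² can be computed by comparing the residual sum of squares of the model's fitted values against the total sum of squares around the mean of Y:

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y - y_hat) ** 2)          # variation left unexplained by the model
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total variation around the mean of Y
    return 1 - ss_res / ss_tot

# Made-up observed values and model predictions, purely for illustration
y = np.array([3.0, 4.5, 6.1, 7.9, 10.2])
y_hat = np.array([3.2, 4.4, 6.0, 8.1, 9.9])
print(r_squared(y, y_hat))  # close to 1, so the model explains most of the variation
```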
However, R² has an important limitation: in ordinary least squares it never decreases when additional independent variables are added, even if those variables do not meaningfully improve the model. Relying on R² alone can therefore encourage overfitting. To address this, the adjusted R-squared is used.
Adjusted R-squared:
Adjusted R-squared (R²_adj) is a modified version of R-squared that penalizes the inclusion of additional independent variables. It is adjusted based on the number of independent variables and the sample size. The formula for adjusted R-squared is:
R²_adj = 1 – [(1 – R²) * (n – 1) / (n – p – 1)]
where n is the sample size and p is the number of independent variables.
The adjusted R-squared provides a more realistic assessment of model fit: it decreases when an added independent variable does not contribute enough explanatory power to justify the lost degree of freedom. It is always less than or equal to R-squared, and the gap widens as more independent variables are included relative to the sample size.
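A short sketch of the adjusted R-squared formula above; the R² values, sample size, and numbers of predictors are hypothetical, chosen only to show how the penalty behaves:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical fits of the same data set with n = 30 observations:
# a 2-predictor model, and a 5-predictor model with a slightly higher R².
print(adjusted_r_squared(r2=0.75, n=30, p=2))  # about 0.731
print(adjusted_r_squared(r2=0.76, n=30, p=5))  # about 0.710, penalized for the extra predictors
```

Even though the larger model has the higher R², its adjusted R² is lower, signaling that the extra predictors do not earn their keep.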
Partial Regression Coefficients:
Partial regression coefficients, also known as partial slopes or partial effects, represent the change in the dependent variable associated with a one-unit change in an independent variable while holding all other independent variables constant. In a multiple regression model, there is a partial regression coefficient for each independent variable.
For example, in a regression model to predict a person’s salary (Y) based on their years of education (X₁) and years of work experience (X₂), the partial regression coefficient for X₁ would indicate how much the salary is expected to change for a one-year increase in education, holding work experience constant. Similarly, the partial regression coefficient for X₂ would indicate how much the salary is expected to change for a one-year increase in work experience, holding education constant.
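To make the salary example concrete, here is a minimal sketch using invented data and an ordinary least-squares fit via NumPy (the numbers, variable names, and library choice are illustrative assumptions); the two fitted slopes are the partial regression coefficients for education and experience:

```python
import numpy as np

# Hypothetical data mirroring the salary example: years of education (X1),
# years of work experience (X2), and salary in thousands (Y).
X1 = np.array([12, 14, 16, 16, 18, 20])
X2 = np.array([10,  8,  5, 12,  3,  7])
Y  = np.array([55, 62, 70, 85, 74, 95])

# Design matrix with an intercept column; solve the least-squares problem.
X = np.column_stack([np.ones_like(X1, dtype=float), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y.astype(float), rcond=None)
intercept, b_education, b_experience = coef

# b_education: expected change in salary for one extra year of education,
# holding experience constant. b_experience: the analogous effect of experience.
print(b_education, b_experience)
```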
Partial regression coefficients allow us to understand the unique contribution of each independent variable to the dependent variable’s variation while controlling for the effects of other variables in the model. They are essential for drawing meaningful conclusions about the relationships between the variables in multiple regression analysis.