Regression: Meaning, Assumptions, Regression Line
Regression is a statistical method used in finance, investing and other disciplines to estimate the form and strength of the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables).
Regression helps investment and financial managers to value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.
The two basic types of regression are simple linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple regression uses two or more independent variables to predict the outcome.
Regression is useful to finance and investment professionals as well as to professionals in other businesses. For example, it can help predict a company's sales based on weather, previous sales, GDP growth or other conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering costs of capital.
The general form of each type of regression is:
- Linear regression: Y = a + bX + u
- Multiple regression: Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u
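Where:
Y = the variable that you are trying to predict (dependent variable).
X = the variable that you are using to predict Y (independent variable); in multiple regression, X1 to Xt are the separate independent variables.
a = the intercept.
b = the slope; in multiple regression, b1 to bt are the slope coefficients of X1 to Xt.
u = the regression residual.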
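As a minimal sketch of how a, b and u in the linear form Y = a + bX + u can be obtained from data, the snippet below uses the standard least-squares formulas. The function name `fit_simple_regression` and the data values are illustrative assumptions, not part of the original text.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: chosen so the line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_regression(x, y)  # a is about 0.05, b is about 1.99
# u: the residual for each observation (actual Y minus predicted Y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b, residuals)
```

The residuals are what remains of each Y value after the fitted line has been subtracted.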
Regression takes a group of variables thought to predict Y and tries to find a mathematical relationship between them and Y. In linear regression, this relationship takes the form of a straight line that best approximates all the individual data points. In multiple regression, the separate independent variables are distinguished by numbered subscripts (X1, X2 and so on).
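For the multiple regression form Y = a + b1X1 + b2X2 + … + u, one common way to find the coefficients is to solve the least-squares normal equations. The sketch below does this in plain Python; the helper names `solve_linear_system` and `fit_multiple_regression` and the data are assumptions made for illustration, and a numerical library would normally be used instead.

```python
def solve_linear_system(A, rhs):
    """Solve A * coef = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are not modified
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (M[i][n] - s) / M[i][i]
    return coef


def fit_multiple_regression(x_rows, y):
    """Return [a, b1, ..., bt] that minimize the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the intercept a
    design = [[1.0] + [float(v) for v in row] for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Illustrative data built exactly from Y = 1 + 2*X1 + 3*X2 (so u = 0)
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]

coefs = fit_multiple_regression(x_rows, y)  # close to [1, 2, 3]
print(coefs)
```

Because the sample data contain no residual term, the fit recovers the intercept and both slopes up to rounding error.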
ASSUMPTIONS IN REGRESSION
- Independence: The residuals are serially independent (no autocorrelation).
- The residuals are not correlated with any of the independent (predictor) variables.
- Linearity: The relationship between the dependent variable and each of the independent variables is linear.
- Mean of Residuals: The mean of the residuals is zero.
- Homogeneity of Variance: The variance of the residuals at all levels of the independent variables is constant.
- Errors in Variables: The independent (predictor) variables are measured without error.
- Model Specification: All relevant variables are included in the model. No irrelevant variables are included in the model.
- Normality: The residuals are normally distributed. This assumption is needed for valid tests of significance but not for estimation of the regression coefficients.
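Some of the assumptions above can be checked informally once residuals are available. The sketch below is a rough illustration, not a set of formal tests: it computes the mean of the residuals, the Durbin-Watson statistic (a standard measure of first-order autocorrelation) and a simple comparison of residual spread between low and high values of X. The function names and the residual values are hypothetical.

```python
def mean_residual(residuals):
    """Mean of the residuals; should be close to zero."""
    return sum(residuals) / len(residuals)


def durbin_watson(residuals):
    """Durbin-Watson statistic. Values near 2 suggest little first-order
    autocorrelation; values toward 0 suggest positive and values toward 4
    suggest negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def variance_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by that in the
    lower half. A ratio far from 1 hints at non-constant variance."""
    ordered = [e for _, e in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    low = sum(e ** 2 for e in ordered[:half]) / half
    high = sum(e ** 2 for e in ordered[-half:]) / half
    return high / low


# Hypothetical residuals from some fitted model, ordered by observation
x_vals = [1, 2, 3, 4, 5, 6]
resid = [0.2, -0.1, 0.3, -0.4, 0.1, -0.1]

print(mean_residual(resid))
print(durbin_watson(resid))
print(variance_ratio(x_vals, resid))
```

Note that a least-squares fit with an intercept makes the in-sample residual mean zero by construction, so that check is mainly useful for residuals produced some other way.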
Definition: The regression line is the line that best fits the data, such that the overall distance from the line to the points (variable values) plotted on a graph is the smallest. More precisely, it is the line that minimizes the sum of the squared deviations between the observed values and the values the line predicts.
There are as many regression lines as there are variables. With two variables, say X and Y, there are two regression lines:
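To illustrate the definition, the sketch below fits a least-squares line to a small made-up data set and then compares its sum of squared deviations with that of slightly shifted lines. The function names and data are illustrative assumptions.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def sum_squared_deviations(a, b, x, y):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(x, y)
best = sum_squared_deviations(a, b, x, y)

# Shift the intercept and/or slope a little; every such line fits worse
nudged = [
    sum_squared_deviations(a + da, b + db, x, y)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(best, min(nudged))
```

Every nudged line has a larger sum of squared deviations than the fitted one, which is the sense in which the regression line "best fits" the data.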
- Regression line of Y on X: This gives the most probable values of Y from the given values of X.
- Regression line of X on Y: This gives the most probable values of X from the given values of Y.
The algebraic expressions of these regression lines are called regression equations. There are two regression equations, one for each regression line.
The correlation between the variables is reflected in how close these two regression lines are to each other, that is, in the angle between them: the nearer the regression lines are to each other, the higher the degree of correlation, and the farther apart they are, the lower the degree of correlation.
The correlation is either perfect positive or perfect negative when the two regression lines coincide, i.e. only one line exists. If the variables are independent, the correlation is zero and the lines of regression are at right angles, i.e. parallel to the X axis and the Y axis.
Note: The regression lines cut each other at the point of averages of X and Y. This means that if a perpendicular is drawn from the point of intersection to the X axis, it meets the axis at the mean value of X. Similarly, if a horizontal line is drawn from the point of intersection to the Y axis, it meets the axis at the mean value of Y.
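The two regression equations can be sketched as follows, assuming the usual least-squares slopes: the slope of Y on X divides the co-variation by the variation of X, and the slope of X on Y divides it by the variation of Y. A well-known consequence is that the product of the two slopes equals the squared correlation coefficient. The function name `regression_slopes` and the data are illustrative.

```python
import math


def regression_slopes(x, y):
    """Return (b_yx, b_xy, r): slope of Y on X, slope of X on Y,
    and the correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b_yx = sxy / sxx  # Y on X: Y - mean_y = b_yx * (X - mean_x)
    b_xy = sxy / syy  # X on Y: X - mean_x = b_xy * (Y - mean_y)
    r = sxy / math.sqrt(sxx * syy)
    return b_yx, b_xy, r


# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

b_yx, b_xy, r = regression_slopes(x, y)
print(b_yx, b_xy, r, b_yx * b_xy)
```

The two slopes generally differ unless X and Y have the same variance, which is why two distinct regression equations exist.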
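The link between correlation and the angle between the two regression lines can be checked numerically. The sketch below computes the acute angle between the lines from their direction vectors in the X-Y plane; the function name and data sets are assumptions made for illustration, and both variables are assumed to have non-zero variance.

```python
import math


def angle_between_regression_lines(x, y):
    """Acute angle, in degrees, between the Y-on-X and X-on-Y lines."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Direction vectors in the (X, Y) plane:
    #   Y on X has dY/dX = sxy / sxx -> direction (sxx, sxy)
    #   X on Y has dX/dY = sxy / syy -> direction (sxy, syy)
    cross = sxx * syy - sxy * sxy
    dot = sxx * sxy + sxy * syy
    # abs() folds the result into the range 0 to 90 degrees
    return math.degrees(math.atan2(abs(cross), abs(dot)))


perfect = angle_between_regression_lines([1, 2, 3], [2, 4, 6])
uncorrelated = angle_between_regression_lines([1, 2, 3, 4], [1, -1, -1, 1])
partial = angle_between_regression_lines([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(perfect, uncorrelated, partial)
```

With perfectly correlated data the angle is zero (the lines coincide), with uncorrelated data it is a right angle, and intermediate correlations fall in between.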
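The note can be verified by writing both regression lines in intercept form and solving them as a pair of simultaneous equations. The sketch below does this for a small made-up data set; the function name is hypothetical, and the calculation assumes the correlation is not perfect (otherwise the lines coincide and there is no single intersection point).

```python
def regression_lines_intersection(x, y):
    """Return (X, Y) where the Y-on-X and X-on-Y regression lines cross."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b_yx = sxy / sxx
    b_xy = sxy / syy
    a_yx = mean_y - b_yx * mean_x  # Y = a_yx + b_yx * X
    a_xy = mean_x - b_xy * mean_y  # X = a_xy + b_xy * Y
    # Substitute the first equation into the second and solve for X
    x_cross = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
    y_cross = a_yx + b_yx * x_cross
    return x_cross, y_cross


# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

x_cross, y_cross = regression_lines_intersection(x, y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
print((x_cross, y_cross), (mean_x, mean_y))
```

Up to rounding error, the intersection point matches the pair of sample means.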