Correlation Coefficient, Assumptions of Correlation Coefficient

The correlation coefficient is a statistical measure of the strength and direction of the relationship between the relative movements of two variables. Its values are bounded by 1.0 in absolute value, so it always lies between -1.0 and 1.0. A computed value greater than 1.0 or less than -1.0 means the calculation contains an error. A correlation of -1.0 indicates a perfect negative correlation, while a correlation of 1.0 indicates a perfect positive correlation. A correlation of 0.0 indicates no linear relationship between the movements of the two variables.

While the correlation coefficient measures the degree of relationship between two variables, it measures only linear relationships. It cannot capture nonlinear relationships between the two variables.

A value of exactly 1.0 means there is a perfect positive linear relationship between the two variables: the data points lie exactly on a line with positive slope, so every increase in one variable is matched by a proportional increase in the other. A value of -1.0 means there is a perfect negative linear relationship: the variables move in opposite directions, and every increase in one variable is matched by a proportional decrease in the other. If the correlation is 0, there is no linear relationship between the two variables.

The strength of the relationship varies with the magnitude of the correlation coefficient. For example, a value of 0.2 indicates a positive but weak relationship between the two variables. Rules of thumb differ by field: some analysts do not treat a correlation as strong until its absolute value exceeds about 0.8, and an absolute value of 0.9 or greater is generally regarded as a very strong relationship. Whether a correlation is statistically significant is a separate question that also depends on the sample size.
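Statistical significance depends on the number of paired observations n as well as on r. A common check for Pearson's r, assuming the data are bivariate normal, is the t-statistic t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below uses a hypothetical helper, `t_statistic`, to evaluate the same weak r = 0.2 at two sample sizes.

```python
from math import sqrt

def t_statistic(r: float, n: int) -> float:
    """t-statistic for testing whether Pearson's r differs from 0 (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

for n in (10, 1000):
    print(f"r = 0.2, n = {n:4d}: t = {t_statistic(0.2, n):.2f}")
```

With 10 pairs, t is about 0.58, well below the two-sided 5% critical value of about 2.31 for 8 degrees of freedom. With 1,000 pairs, t is about 6.45, far above the large-sample critical value of about 1.96. A weak correlation can therefore be statistically significant in a large sample.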

This statistic is useful in finance. For example, it can help determine how closely a mutual fund's returns move with its benchmark index, or with another fund or asset class. Adding a mutual fund with low or negative correlation to an existing portfolio gives the investor diversification benefits.
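The diversification benefit comes from the correlation term in the standard two-asset portfolio variance formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2, where w are the weights, σ the volatilities, and ρ the correlation between the two assets. The sketch below assumes a hypothetical 50/50 portfolio of two funds that each have 20% annual volatility; the helper name `portfolio_volatility` is illustrative.

```python
from math import sqrt

def portfolio_volatility(w1: float, s1: float, w2: float, s2: float, rho: float) -> float:
    """Standard deviation of a two-asset portfolio with weights w1, w2,
    asset volatilities s1, s2, and correlation rho between the assets."""
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * rho * s1 * s2
    return sqrt(max(variance, 0.0))  # guard against a tiny negative rounding error

# Hypothetical inputs: equal weights, 20% volatility for each fund
for rho in (1.0, 0.5, 0.0, -1.0):
    vol = portfolio_volatility(0.5, 0.20, 0.5, 0.20, rho)
    print(f"rho = {rho:+.1f}: portfolio volatility = {vol:.1%}")
```

With perfectly correlated funds (ρ = 1.0) the portfolio is as volatile as either fund alone; as ρ falls, portfolio volatility drops even though each fund's own volatility is unchanged.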

Correlation Coefficient Formulas

One of the most commonly used formulas in stats is Pearson’s correlation coefficient formula. If you’re taking a basic stats class, this is the one you’ll probably use:

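$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Where,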

r = Pearson correlation coefficient
x = Values in first set of data
y = Values in second set of data
n = Number of paired (x, y) observations.

Assumptions of Correlation Coefficient:

  • Linearity:

The primary assumption of Pearson’s correlation is that the relationship between the two variables is linear. This means that the best-fit line through the data points (the regression line) adequately describes the relationship. Non-linear relationships (where changes in one variable do not correspond to proportional changes in the other) may not be accurately captured by Pearson’s correlation.

  • Bivariate Normality:

For significance tests and confidence intervals on the correlation coefficient, both variables should be approximately normally distributed, and their joint distribution should be bivariate normal. Skewed distributions and outliers violate this assumption and can cause the coefficient to misrepresent the strength of the linear relationship.

  • Homoscedasticity:

The scatterplot of the two variables should show a consistent spread of data points around the regression line throughout the range of values. In other words, the variance of one variable is the same at all values of the other variable. Heteroscedasticity, where the spread of data points varies along the range of data, can distort the correlation coefficient.

  • Independence of Observations:

The observations (data points) should be independent of each other. There should be no hidden relationship among observations that could influence the variables being studied. For instance, measurements from the same subject or related subjects may violate this assumption.

  • Interval or Ratio-Level Data:

Pearson’s correlation assumes that the variables are measured on an interval or ratio scale, where the intervals between measurements are equally meaningful. Correlation calculations on ordinal data or data that do not meet these measurement criteria may not be valid.

  • No Outliers:

Outliers can have a disproportionate effect on the correlation coefficient, exaggerating or diminishing the apparent strength of a relationship. Outliers should be investigated: exclude them only if they are errors or clearly do not belong to the population being studied; otherwise, use a robust method, such as Spearman's rank correlation, that is less sensitive to extreme values.
