Categories
1. What is Correlation?

Correlation Example

Years of Education and Age of Entry to Labour Force

Table 1 gives the number of years of formal education (X) and the age of entry into the labour force (Y) for 12 males from the Regina Labour Force Survey. Both variables are measured in years, a ratio scale, which is the highest level of measurement. All of the males are aged close to 30, so most of them are likely to have completed their formal education.

Respondent Number    Years of Education, X    Age of Entry into Labour Force, Y
1                    10                       16
2                    12                       17
3                    15                       18
4                    8                        15
5                    20                       18
6                    17                       22
7                    12                       19
8                    15                       22
9                    12                       18
10                   10                       15
11                   8                        18
12                   10                       16

Table 1. Years of Education and Age of Entry into Labour Force for 12 Regina Males

Since most males enter the labour force soon after they leave formal schooling, a close relationship between these two variables is expected. Looking through the table, those respondents who obtained more years of schooling generally entered the labour force at an older age. The mean years of schooling is X̄ = 12.4 years and the mean age of entry into the labour force is Ȳ = 17.8 years, a difference of 5.4 years.

This difference roughly reflects the age of entry into formal schooling, that is, age five or six. It can be seen, though, that the relationship between years of schooling and age of entry into the labour force is not perfect. Respondent 11, for example, has only 8 years of schooling but did not enter the labour force until the age of 18. In contrast, respondent 5 has 20 years of schooling but also entered the labour force at the age of 18. A scatter diagram provides a quick way of examining the relationship between X and Y.
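The means quoted for Table 1 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the lists simply copy the X and Y columns of the table.

```python
from statistics import mean

# Table 1: years of education (X) and age of entry into the labour force (Y)
education_years = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

mean_x = mean(education_years)  # 149 / 12
mean_y = mean(entry_age)        # 214 / 12
gap = mean_y - mean_x           # average years between the two measures

print(f"mean X = {mean_x:.1f}, mean Y = {mean_y:.1f}, difference = {gap:.1f}")
```

Rounded to one decimal place, this reproduces the 12.4, 17.8 and 5.4 given in the text.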

Linear Correlation Coefficient Formula

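The formula for the linear correlation coefficient is given by:

r = (n Σxy − Σx Σy) / sqrt([n Σx² − (Σx)²] [n Σy² − (Σy)²])

where n is the number of paired observations and each sum runs over all pairs (x, y).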

Sample Correlation Coefficient Formula

The formula is given by:

r_xy = S_xy / (S_x S_y)

where S_x and S_y are the sample standard deviations of x and y, and S_xy is their sample covariance.
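As a rough check, the sketch below computes r for the Table 1 data in two ways. The first uses a common computational form built from raw sums (n Σxy − Σx Σy over the square root of the product of the two corresponding variance terms). The second uses the covariance form above. The variable names are illustrative, and the n − 1 denominators are the usual sample convention, assumed here.

```python
from math import sqrt

# Table 1: years of education (x) and age of entry into the labour force (y)
x = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
y = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]
n = len(x)

# Raw-sums form: only totals of x, y, xy, x^2 and y^2 are needed
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r_sums = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Covariance form: sample covariance over the product of sample standard deviations
mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
r_cov = s_xy / (s_x * s_y)

print(f"r from raw sums:   {r_sums:.3f}")
print(f"r from covariance: {r_cov:.3f}")
```

Both routes give r of about 0.624 for these 12 respondents, a moderately strong positive correlation.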

Negative Correlation

A negative (inverse) correlation occurs when the correlation coefficient is less than 0, which indicates that the two variables tend to move in opposite directions. For investors, any reading between 0 and -1 means that the two securities tend to move in opposite directions. When ρ is -1, the relationship is said to be perfectly negatively correlated.

In a perfectly negative relationship, when one variable increases, the other decreases by a proportional amount (and vice versa). However, the degree to which two securities are negatively correlated can vary over time, and they are almost never perfectly correlated all the time.

Examples of Negative Correlation

For example, suppose a study is conducted to assess the relationship between outside temperature and heating bills. The study concludes that there is a negative correlation between heating bills and the outdoor temperature, with a correlation coefficient of -0.96. This strong negative correlation signifies that as the temperature outside decreases, heating bills increase (and vice versa).

When it comes to investing, a negative correlation does not necessarily mean that the securities should be avoided. The correlation coefficient can help investors diversify their portfolio by including a mix of investments that have a negative, or low, correlation to the stock market. In short, when reducing volatility risk in a portfolio, sometimes opposites do attract.  

For example, assume you have a $100,000 balanced portfolio that is invested 60% in stocks and 40% in bonds. In a year of strong economic performance, the stock component of your portfolio might generate a return of 12% while the bond component may return -2% because interest rates are rising (which means that bond prices are falling).

Thus, the overall return on your portfolio would be 6.4%: (12% x 0.6) + (-2% x 0.4). The following year, as the economy slows markedly and interest rates are lowered, your stock portfolio might generate -5% while your bond portfolio may return 8%, giving you an overall portfolio return of 0.2%: (-5% x 0.6) + (8% x 0.4).

What if, instead of a balanced portfolio, your portfolio were 100% equities? Using the same return assumptions, your all-equity portfolio would have a return of 12% in the first year and -5% in the second year. These figures are clearly more volatile than the balanced portfolio’s returns of 6.4% and 0.2%.
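The arithmetic behind these figures is a weighted average of the component returns. A minimal sketch, using only the weights and returns assumed in the example above (the function name `portfolio_return` is illustrative):

```python
def portfolio_return(weights, returns):
    """Weighted average of component returns; weights are assumed to sum to 1."""
    return sum(w * r for w, r in zip(weights, returns))

balanced = (0.60, 0.40)    # 60% stocks, 40% bonds
all_equity = (1.00, 0.00)  # 100% stocks

# (stock return, bond return) for year 1 and year 2 of the example
yearly_returns = [(0.12, -0.02), (-0.05, 0.08)]

balanced_results = [portfolio_return(balanced, r) for r in yearly_returns]
equity_results = [portfolio_return(all_equity, r) for r in yearly_returns]

for year, (b, e) in enumerate(zip(balanced_results, equity_results), start=1):
    print(f"Year {year}: balanced {b:.1%}, all-equity {e:.1%}")
```

The balanced portfolio swings by about 6 percentage points between the two years, while the all-equity portfolio swings by 17.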

Positive Correlation

A positive correlation—when the correlation coefficient is greater than 0—signifies that both variables move in the same direction. When ρ is +1, it signifies that the two variables being compared have a perfect positive relationship; when one variable moves higher or lower, the other variable moves in the same direction with the same magnitude.

The closer the value of ρ is to +1, the stronger the positive linear relationship. For example, suppose oil prices are directly related to the prices of airplane tickets, with a correlation coefficient of +0.95. Oil prices and airfares then have a very strong positive correlation, since the value is close to +1. So, if the price of oil decreases, airfares tend to decrease as well, and if the price of oil increases, so do the prices of airplane tickets.

As an example, compare one of the largest U.S. banks, JPMorgan Chase & Co. (JPM), with the Financial Select SPDR Exchange Traded Fund (ETF) (XLF). As you can imagine, JPMorgan Chase & Co. should have a positive correlation to the banking industry as a whole. The correlation coefficient between the two is 0.98, which signals a strong positive correlation. Any reading above 0 is positive, but a reading above 0.50 is typically treated as a meaningful positive correlation.

Calculating ρ

The covariance of the two variables in question must be calculated before the correlation can be determined. Next, each variable’s standard deviation is required. The correlation coefficient is determined by dividing the covariance by the product of the two variables’ standard deviations.

Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together. However, its magnitude is unbounded, so it is difficult to interpret. The normalized version of the statistic is calculated by dividing covariance by the product of the two standard deviations. This is the correlation coefficient.
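The sketch below walks through these steps on the Table 1 data and shows why the normalized statistic is easier to interpret. Re-expressing education in months instead of years changes the covariance but leaves the correlation untouched. The helper names are illustrative, and the n − 1 (sample) denominator is an assumption.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_std(x):
    """Sample standard deviation: square root of the covariance of x with itself."""
    return sqrt(sample_covariance(x, x))

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

education_years = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]
education_months = [12 * v for v in education_years]  # same data, different unit

cov_years = sample_covariance(education_years, entry_age)
cov_months = sample_covariance(education_months, entry_age)
r_years = correlation(education_years, entry_age)
r_months = correlation(education_months, entry_age)

print(f"covariance:  {cov_years:.2f} (years) vs {cov_months:.2f} (months)")
print(f"correlation: {r_years:.3f} (years) vs {r_months:.3f} (months)")
```

The covariance grows twelvefold with the change of unit, while the correlation stays the same.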

Understanding Correlation

The correlation coefficient (ρ) is a measure that determines the degree to which the movement of two different variables is associated. The most common correlation coefficient, generated by the Pearson product-moment correlation, is used to measure the linear relationship between two variables. However, in a non-linear relationship, this correlation coefficient may not always be a suitable measure of dependence.

The possible range of values for the correlation coefficient is -1.0 to 1.0. In other words, the values cannot exceed 1.0 or be less than -1.0. A correlation of -1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive correlation. If the correlation coefficient is greater than zero, it is a positive relationship. Conversely, if the value is less than zero, it is a negative relationship. A value of zero indicates that there is no linear relationship between the two variables, although a non-linear relationship may still exist.
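The endpoints of the range, and the caveat about non-linear relationships, can be illustrated with a small sketch. The data are constructed purely for illustration (two exact straight lines and a parabola), and `pearson_r` is a hypothetical helper, not a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]

r_pos = pearson_r(x, [2 * v + 1 for v in x])   # exact rising line
r_neg = pearson_r(x, [-2 * v + 1 for v in x])  # exact falling line
r_curve = pearson_r(x, [v * v for v in x])     # y = x^2: determined by x, but not linear

print(f"rising line:  r = {r_pos:.2f}")
print(f"falling line: r = {r_neg:.2f}")
print(f"parabola:     r = {r_curve:.2f}")
```

The parabola yields a coefficient of zero even though y is completely determined by x. A zero coefficient therefore rules out a linear trend, not every kind of dependence.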

What Do Positive, Negative, and Zero Correlation Coefficients Mean?

Correlation coefficients are indicators of the strength of the linear relationship between two different variables, x and y. A linear correlation coefficient that is greater than zero indicates a positive relationship. A value that is less than zero signifies a negative relationship. Finally, a value of zero indicates no linear relationship between the two variables x and y.

This article explains the significance of the linear correlation coefficient for investors, how to calculate covariance for stocks, and how investors can use correlation to predict the market.

KEY TAKEAWAYS:

  • Correlation coefficients are used to measure the strength of the linear relationship between two variables.
  • A correlation coefficient greater than zero indicates a positive relationship while a value less than zero signifies a negative relationship.
  • A value of zero indicates no linear relationship between the two variables being compared.
  • A negative correlation, or inverse correlation, is a key concept in the creation of diversified portfolios that can better withstand portfolio volatility.
  • Calculating the correlation coefficient is time-consuming, so data are often plugged into a calculator, computer, or statistics program to find the coefficient.

Visualizing correlations with scatterplots

Back to our campsite example: as campsite elevation increases, temperature drops. We can look at this directly with a scatterplot. Imagine that we’ve plotted our campsite data.

Each point in the plot represents one campsite, which we can place on an x- and y-axis by its elevation and summertime high temperature.

The correlation coefficient (r) summarizes what the scatterplot shows. It tells us, in numerical terms, how close the points in the scatterplot come to a linear relationship. Stronger relationships, meaning r values further from zero in either direction, are those where the points lie very close to the line we’ve fit to the data.
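One way to make "close to the fitted line" concrete is to fit a least-squares line and measure how much of the spread in y it leaves unexplained. The sketch below reuses the Table 1 education data as a stand-in, since no campsite measurements are listed here. The variable names are illustrative.

```python
from math import sqrt

# Table 1: years of education (x) and age of entry into the labour force (y)
x = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
y = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Ordinary least-squares line: y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# Squared vertical distances from each point to the fitted line
residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained_share = 1 - residual_ss / syy

print(f"fitted line: y = {intercept:.2f} + {slope:.3f} x")
print(f"r = {r:.3f}, r squared = {r * r:.3f}, explained share = {explained_share:.3f}")
```

The share of variation in y explained by the line equals r squared. As r moves towards -1 or +1, the residuals shrink and the points sit closer to the line.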

What do correlation numbers mean?

We describe correlations with a unit-free measure called the correlation coefficient, which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a p-value. Correlations are therefore typically reported with two key numbers: the r value and the p-value.

The closer r is to zero, the weaker the linear relationship.

Positive r values indicate a positive correlation, where the values of both variables tend to increase together.

Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease.

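The p-value helps us judge whether we can conclude that the population correlation coefficient is different from zero, based on what we observe in the sample. A small p-value means that a sample correlation this far from zero would be unlikely if the population correlation were actually zero.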
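A common way to obtain this p-value is a t test on r with n − 2 degrees of freedom. The sketch below, using only the standard library, stops short of computing the p-value itself. It instead compares the t statistic for the Table 1 data against 2.228, the two-tailed 5% critical value for 10 degrees of freedom. The helper `pearson_r` and the choice of the 5% level are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Table 1: years of education (x) and age of entry into the labour force (y)
x = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
y = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

n = len(x)
r = pearson_r(x, y)

# Test statistic for H0: population correlation = 0, with n - 2 degrees of freedom
t_stat = r * sqrt((n - 2) / (1 - r ** 2))

# Two-tailed 5% critical value of Student's t with 10 degrees of freedom
T_CRITICAL_10DF = 2.228
significant = abs(t_stat) > T_CRITICAL_10DF

print(f"r = {r:.3f}, t = {t_stat:.2f}, significant at 5%: {significant}")
```

Here t is about 2.53, which exceeds the critical value, so the correlation is significantly different from zero at the 5% level. A statistics package would report the corresponding p-value directly.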

“Unit-free measure” means that correlations exist on their own scale: in our example, the number given for r is not on the same scale as either elevation or temperature. This is different from other summary statistics. For instance, the mean of the elevation measurements is on the same scale as its variable.

Once we’ve obtained a significant correlation, we can also look at its strength. A perfect positive correlation has a value of 1, and a perfect negative correlation has a value of -1. But in the real world, we would never expect to see a perfect correlation unless one variable is actually a proxy measure for the other. In fact, seeing a perfect correlation number can alert you to an error in your data! For example, if you accidentally recorded distance from sea level for each campsite instead of temperature, this would correlate perfectly with elevation.

Another useful piece of information is the N, or number of observations. As with most statistical tests, knowing the size of the sample helps us judge the strength of our sample and how well it represents the population. For example, if we only measured elevation and temperature for five campsites, but the park has two thousand campsites, we’d want to add more campsites to our sample.

Correlations describe data moving together

Correlations are useful for describing simple relationships among data. For example, imagine that you are looking at a dataset of campsites in a mountain park. You want to know whether there is a relationship between the elevation of the campsite (how high up the mountain it is), and the average high temperature in the summer.

For each individual campsite, you have two measures: elevation and temperature. When you compare these two variables across your sample with a correlation, you can find a linear relationship: as elevation increases, the temperature drops. They are negatively correlated.