We describe correlations with a unit-free measure called the correlation coefficient, which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a p-value. Correlations are therefore typically reported with two key numbers: r = and p = .
The closer r is to zero, the weaker the linear relationship.
Positive r values indicate a positive correlation, where the values of both variables tend to increase together.
Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease.
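The p-value tells us whether the sample provides enough evidence to conclude that the population correlation coefficient is different from zero. A small p-value means that a correlation as strong as the one we observed would be unlikely if the population correlation were actually zero.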
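The two numbers can be computed along the following lines. This is a minimal sketch, not a definitive implementation: the elevation and temperature values are made up for six hypothetical campsites, and the function names are our own. The coefficient r uses the standard Pearson formula. For the p-value, the sketch swaps in an exact permutation test, which needs only the Python standard library; most statistical software reports a t-based p-value instead, so its number will differ somewhat.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Two-sided exact permutation p-value for 'no association'.

    Returns the share of all reorderings of y whose |r| is at least as
    large as the observed |r|. Only practical for small samples, because
    the number of reorderings grows as n factorial.
    """
    observed = abs(pearson_r(x, y))
    as_extreme = 0
    total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance so floating-point rounding does not drop ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical data: elevation (m) and temperature (deg C) at six campsites.
elevation = [300, 650, 900, 1200, 1500, 1800]
temperature = [21.0, 19.5, 17.0, 16.5, 13.0, 12.0]

r = pearson_r(elevation, temperature)
p = permutation_p_value(elevation, temperature)
print(f"r = {r:.3f}, p = {p:.4f}")
```

With these made-up values, temperature falls as elevation rises, so r is negative and close to -1, and the p-value is small.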
“Unit-free measure” means that correlations exist on their own scale: in our example, the number given for r is not on the same scale as either elevation or temperature. This is different from other summary statistics. For instance, the mean of the elevation measurements is on the same scale as its variable.
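A short sketch can make the unit-free property concrete. The campsite values below are hypothetical, and `pearson_r` is our own helper implementing the standard Pearson formula. Converting elevation from meters to feet and temperature from Celsius to Fahrenheit changes the means, but leaves r unchanged apart from floating-point rounding.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical campsite data in metric units.
elevation_m = [300, 650, 900, 1200, 1500, 1800]
temperature_c = [21.0, 19.5, 17.0, 16.5, 13.0, 12.0]

# The same measurements expressed in feet and degrees Fahrenheit.
elevation_ft = [m * 3.28084 for m in elevation_m]
temperature_f = [c * 9 / 5 + 32 for c in temperature_c]

r_metric = pearson_r(elevation_m, temperature_c)
r_imperial = pearson_r(elevation_ft, temperature_f)

# The mean is on the scale of its variable, so it changes with the units.
mean_m = sum(elevation_m) / len(elevation_m)
mean_ft = sum(elevation_ft) / len(elevation_ft)

print(f"mean elevation: {mean_m:.1f} m = {mean_ft:.1f} ft")
print(f"r (m, deg C):   {r_metric:.6f}")
print(f"r (ft, deg F):  {r_imperial:.6f}")
```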
Once we’ve obtained a significant correlation, we can also look at its strength. A perfect positive correlation has a value of 1, and a perfect negative correlation has a value of -1. But in the real world, we would never expect to see a perfect correlation unless one variable is actually a proxy measure for the other. In fact, seeing a perfect correlation can alert you to an error in your data! For example, if you accidentally recorded distance from sea level for each campsite instead of temperature, those values would correlate perfectly with elevation.
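The data-entry mistake described above can be turned into a simple sanity check. The sketch below is illustrative only: the values are made up, and `looks_like_proxy` and its tolerance are hypothetical choices of ours, not a standard rule.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_like_proxy(r, tol=1e-6):
    """Flag a correlation that is (numerically) perfect: |r| within tol of 1."""
    return math.isclose(abs(r), 1.0, abs_tol=tol)


# Hypothetical elevations (m) for six campsites.
elevation = [300, 650, 900, 1200, 1500, 1800]

# The mistake: distance from sea level was recorded instead of temperature.
# That is the same quantity as elevation, so the two columns match exactly.
recorded_by_mistake = list(elevation)

r = pearson_r(elevation, recorded_by_mistake)
if looks_like_proxy(r):
    print(f"r = {r:.6f}: suspiciously perfect, check how the data were recorded")
```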
Another useful piece of information is N, the number of observations. As with most statistical tests, knowing the sample size helps us judge how strong our evidence is and how well the sample represents the population. For example, if we measured elevation and temperature for only five campsites, but the park has two thousand campsites, we’d want to add more campsites to our sample.