The Correlation Coefficient: Statistics 101

This article is an excerpt from the Shortform book guide to "Naked Statistics" by Charles Wheelan.

What is the correlation coefficient in statistics? What can the correlation coefficient tell us about the relationship between two variables? What is the danger in mistaking correlation for causation?

A figure called the correlation coefficient quantifies the strength and direction of the relationship between two variables. A common mistake in statistics is equating correlation with causation. It can be tempting to read a cause-and-effect story into a correlation coefficient, but correlation alone can’t support causal conclusions.

In this article, we’ll break down the concept of statistical correlation and explain why correlation does not equal causation.

The Correlation Coefficient Explained

In statistics, the correlation coefficient measures how closely a change in one variable tracks a change in another (for example, how well a change in the amount of floral perfume someone wears predicts a change in the number of mosquito bites they get).

Correlation coefficients range from negative one to one. A coefficient of one or negative one signifies a “perfect” correlation between two variables, meaning that a change in one “perfectly” corresponds with a change in the other. A coefficient of zero indicates that the variables have no linear relationship: a change in one doesn’t predict a consistent upward or downward change in the other.
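The article doesn’t name a specific formula, so as an assumption, here is the most commonly used one, Pearson’s sample correlation coefficient, for n paired observations (x_i, y_i) with means x̄ and ȳ:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The numerator is positive when the two variables tend to sit above or below their means together and negative when they tend to sit on opposite sides. The denominator rescales the result so that it always falls between negative one and one.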

The sign of the correlation coefficient communicates the direction of the variables’ relationship. In a positive correlation, the variables move in the same direction: when one goes up, so does the other. For example, you might notice a positive correlation between the amount of floral perfume you wear (a known mosquito attractant) and the number of mosquito bites you get: the more perfume you wear, the more bites you get.

In a negative correlation, the variables move in opposite directions: when one goes up, the other goes down. For example, you might notice a negative correlation between the amount of bug spray you put on and the number of mosquito bites you get: the more bug spray you wear, the fewer bites you get.
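To make the two directions concrete, here is a minimal Python sketch. It assumes Pearson’s formula (the article doesn’t specify one), and the helper name `pearson_r` and all of the numbers are invented for illustration rather than taken from the book.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

# Hypothetical data for seven evenings outdoors (made up for this example).
perfume_sprays = [0, 1, 2, 3, 4, 5, 6]
bites_with_perfume = [1, 2, 2, 4, 5, 5, 7]

bug_spray_sprays = [0, 1, 2, 3, 4, 5, 6]
bites_with_bug_spray = [9, 8, 8, 5, 4, 3, 1]

r_perfume = pearson_r(perfume_sprays, bites_with_perfume)
r_bug_spray = pearson_r(bug_spray_sprays, bites_with_bug_spray)

print(f"perfume vs. bites:   r = {r_perfume:+.2f}")    # positive sign
print(f"bug spray vs. bites: r = {r_bug_spray:+.2f}")  # negative sign
```

With this made-up data, the perfume coefficient comes out close to one and the bug spray coefficient close to negative one. The sign gives the direction of the relationship, and the distance from zero gives its strength.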

Correlation Is Not Causation

Just because two variables are correlated doesn’t mean one is causing the other. Correlation quantifies a relationship between two variables, but it doesn’t explain that relationship. Wheelan notes that this is a crucial distinction to keep in mind, as equating correlation and causation can lead to misinformed decisions.

For example, data might show a positive correlation between owning an expensive car and dying in a plane crash. But if you avoid buying an expensive car because you’re worried that it might somehow cause you to die in a plane crash, you misunderstand the concept of correlation.

There are plenty of reasons why the same people who can afford to purchase an expensive car might be more likely to die in a plane crash—namely because they’re more likely to be on a plane in the first place. Wealthy people may choose to fly rather than drive long distances, take more vacations, or even own and travel on a plane of their own. Therefore, while there may be a relationship between being in a plane crash and owning an expensive car, the relationship is one of correlation, not causation.
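The car and plane example can be sketched as a small simulation. This is an illustration under loud assumptions, not real data: a hypothetical `wealth` score drives both car price and the number of flights taken (used here as a stand-in for exposure to plane crashes), and neither of those two variables affects the other. The helper `pearson_r`, the function `simulate`, and every number are invented for the sketch.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

def simulate(wealth_values):
    """Car price and flights each depend on wealth plus independent noise.

    Neither variable appears in the other's formula, so by construction
    there is no causal link between them.
    """
    car_price = [w + random.gauss(0, 1) for w in wealth_values]
    flights = [w + random.gauss(0, 1) for w in wealth_values]
    return car_price, flights

random.seed(0)  # fixed seed so the run is repeatable
N = 5000

# A population whose wealth varies (arbitrary units from 0 to 10).
mixed_wealth = [random.uniform(0, 10) for _ in range(N)]
r_mixed = pearson_r(*simulate(mixed_wealth))

# A population where everyone has the same wealth, so the shared driver
# is held constant and only the independent noise remains.
same_wealth = [5.0] * N
r_same = pearson_r(*simulate(same_wealth))

print(f"wealth varies:     r = {r_mixed:+.2f}")  # strongly positive
print(f"wealth held fixed: r = {r_same:+.2f}")   # near zero
```

When wealth varies, car price and flights are strongly correlated even though the simulation contains no causal link between them. Once wealth is held fixed, the correlation all but disappears, which is the pattern you would expect when a third factor is responsible for the relationship.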

As we see in the example above, we need to think logically and critically when interpreting statistics.
