What is central tendency in statistics? What are the different ways to measure central tendency?
Central tendency is a descriptive statistic that represents the middle of a data set. There are three main statistical measures of central tendency: the mean, median, and mode. Each of these measures describes a slightly different central position within a data set.
Let’s examine the three statistical measures of central tendency.
Central Tendency: The “Middle” of a Data Set
Some of the most basic descriptive statistics are measures of central tendency, or what Charles Wheelan, in "Naked Statistics," refers to as the “middle” of a data set.
We talk about averages, one measure of central tendency, all the time. But as we’ll see, Wheelan focuses on two main ways to communicate the midpoint of a data set: the mean (what we usually refer to as the average) and the median. As statistics students, we should understand the difference between the two and when to use one over the other.
Another Measure of Central Tendency: The Mode
There is a third measure of central tendency that Wheelan does not discuss: the mode. The mode is the value that occurs most frequently in a dataset.
For example, in the dataset 1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8, the mode is four because it appears more times than any other number.
The mode is useful for identifying patterns in a dataset. For instance, say you’re interested in selling your home and look up what comparable homes in your neighborhood have sold for recently. You collect the following prices:
$180,000, $185,000, $190,000, $192,000, $195,000, $200,000, $200,000, $200,000, $205,000, $208,000
You see that the most common sale price in your neighborhood is $200,000.
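As a minimal sketch, both numeric examples above can be checked with Python's standard library. The variable names are illustrative, and the prices are entered as whole dollars.

```python
from collections import Counter
from statistics import mode

# The small data set from the first example.
values = [1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8]

# Recent sale prices of comparable homes, in dollars.
home_prices = [180_000, 185_000, 190_000, 192_000, 195_000,
               200_000, 200_000, 200_000, 205_000, 208_000]

values_mode = mode(values)        # most frequent value in the list
price_mode = mode(home_prices)    # most frequent sale price
price_count = Counter(home_prices)[price_mode]  # how often it occurs

print(values_mode)                # 4
print(price_mode, price_count)    # 200000 3
```

Neither list has a tie for the most frequent value, so the result does not depend on how ties are broken.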
The mode is also useful for communicating trends in categorical data when there are no numerical values. For example, say you collected data on people’s favorite holiday, with the following results:
Christmas, Christmas, Christmas, Christmas, Christmas, Christmas, Christmas, Christmas, Christmas, Halloween, Halloween, Halloween, Thanksgiving, Thanksgiving, Easter, Easter.
You can’t calculate an average holiday, but you can see that Christmas is the most common choice.
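The same idea works for categorical data. The sketch below, using an illustrative variable name for the survey responses, counts each answer and reports the most common one.

```python
from collections import Counter

# Survey responses from the example: 9 Christmas, 3 Halloween,
# 2 Thanksgiving, 2 Easter.
favorite_holidays = (["Christmas"] * 9 + ["Halloween"] * 3
                     + ["Thanksgiving"] * 2 + ["Easter"] * 2)

counts = Counter(favorite_holidays)
# most_common(1) returns a list holding the single top (value, count) pair.
top_holiday, top_count = counts.most_common(1)[0]

print(top_holiday, top_count)  # Christmas 9
```

There is no arithmetic on the labels here, only counting, which is why the mode applies where the mean and median do not.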
The Mean (Average)
The average, or mean, of a data set is the sum of all of the values in the data set divided by the number of data points.
For example: If you wanted to know the average number of cookies you eat each time you open a package, you would add up the number of cookies you eat at each sitting and divide that total by the number of sittings.
Number of Cookies Eaten per Sitting: 15, 8, 6, 10, 9
The sum of the values in your data set: 15 + 8 + 6 + 10 + 9 = 48
Divided by the number of data points (5): 48 / 5 = 9.6
You average 9.6 cookies per sitting.
Limitations of Using the Mean
Wheelan cautions that the mean can be a misleading figure because it doesn’t convey the influence of outliers in a data set. (An outlier is a data point that is numerically far from others in the same data set.) In other words, a few “extreme” pieces of data can skew the mean in either direction, giving us a warped sense of the average.
For example, a store manager may report that her monthly sales of Easter egg chocolates averaged $300 over the last year. However, her monthly sales data shows that she sold $3,000 worth of chocolate eggs in April, while sales in each of the other 11 months ranged from zero to $25. In this data set, the month of April is an outlier, and the mean of $300 doesn’t give a true picture of typical monthly chocolate egg sales for the store.
The Median
The median is another way to measure central tendency, and it is far less sensitive to outliers than the mean. To find it, we put the data set in ascending order and locate the value that divides it in half: the middle value, or the average of the two middle values if the data set has an even number of data points.
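The procedure just described (sort, then take the middle value or average the two middle values) can be sketched as a small Python function. The function name and the sample lists are illustrative, not from the source.

```python
def median(values):
    """Return the middle of a data set after sorting it."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of data points: a single middle value exists.
        return ordered[middle]
    # Even number of data points: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([3, 1, 2]))         # 2
print(median([1, 2, 3, 4]))      # 2.5
print(median([1, 2, 3, 1000]))   # 2.5, unchanged by the extreme value
```

The last two calls give the same answer even though one list ends in 1000, which is the sense in which the median resists outliers. The standard library's `statistics.median` follows the same rule.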
Back to our chocolate eggs example, our ordered data set might look like this:
[Table: $ Earned From Chocolate Easter Egg Sales]
To calculate the median, we take the average of the two middle values, four and 10, which is seven. So the median monthly chocolate egg sales figure is $7, which is very different from the mean of $300, even though both are measures of central tendency.
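To see how one outlier pulls the mean but not the median, here is a rough sketch with hypothetical monthly figures. Only the $3,000 April value and the two middle values (4 and 10) come from the example; every other number is invented for illustration, so the mean printed here (about $258) does not match the $300 quoted above.

```python
from statistics import mean, median

# Hypothetical monthly sales in dollars, in ascending order.
# Assumption: only 4, 10, and 3000 are taken from the example;
# the remaining values are made up to stay between 0 and 25.
monthly_sales = [0, 0, 0, 2, 3, 4, 10, 12, 15, 20, 25, 3000]

sales_median = median(monthly_sales)   # average of 4 and 10
sales_mean = mean(monthly_sales)       # pulled upward by April

# Drop the April outlier (the last value) to see how each measure reacts.
without_april = monthly_sales[:-1]
median_without = median(without_april)
mean_without = mean(without_april)

print(sales_median, round(sales_mean, 2))       # 7.0 257.58
print(median_without, round(mean_without, 2))   # 4 8.27
```

Removing April moves the median by only $3 but cuts the mean from roughly $258 to about $8, which is the distortion the example describes.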
Communicating Central Tendency
Since the mean and median can communicate different messages about the “middle” of a dataset, it’s important to keep the difference between them in mind when communicating and interpreting statistics. Wheelan explains that it’s common for people to share the mean instead of the median, or vice versa, to suit their goals.
For example, say the beach authorities at a fictional beach were collecting data on the number of jellyfish stings swimmers suffered each week throughout the summer. The data might look something like this:
[Table: Jellyfish Stings per Week per 500 Swimmers]
(Shortform note: In this example, the dataset is naturally ordered, so we don’t need to order it to determine the median.)
The mean number of jellyfish stings is 42. The median number of stings is zero. Beach authorities could either say:
A) “Visit our beach! The mean number of weekly stings per 500 swimmers throughout the summer is only 42!”
B) “Visit our beach! The median number of weekly stings throughout the summer is zero!”
Neither of these statements is incorrect, but they convey different messages to prospective swimmers. The beach authorities are sure to advertise option B over option A because option B makes the beach look more attractive.
This example highlights two of Wheelan’s cautions about descriptive statistics.
First, neither the mean nor the median tells prospective visitors the “story” behind the dataset, which suggests a “jellyfish season” at the beach that might be worth planning a visit around. But again, when we condense the real world into a statistic, this nuance is lost.
Second, this example showcases how statistics make it possible to mislead people without actually lying. Many readers would likely read option B and interpret it as reassurance that they can visit the beach at any time and are highly unlikely to get stung by a jellyfish. As we can see, in mid-September, this is simply not true.
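The gap between the two summary figures can be reproduced with a hypothetical data set. The weekly counts below are invented and chosen only so that the mean is 42 and the median is zero, as in the example; the real weekly figures are not shown here.

```python
from statistics import mean, median

# Hypothetical stings per week per 500 swimmers over a 12-week summer.
# Assumption: these values are made up to match the stated mean and median.
weekly_stings = [0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 200, 300]

stings_mean = mean(weekly_stings)       # 504 stings / 12 weeks
stings_median = median(weekly_stings)   # the two middle weeks are both 0
worst_week = max(weekly_stings)         # the detail both summaries hide

print(stings_mean, stings_median, worst_week)
```

Under these assumed numbers, most weeks have no stings at all, yet the final weeks account for every sting in the season. Neither the mean nor the median reveals that pattern on its own.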
The Utility of Central Tendency
Statistical measures of central tendency are foundational to how we think about and communicate data. But as Wheelan cautions and our jellyfish example highlights, if they aren’t used with care, they can be unhelpful or even dangerous.
A TED Talk entitled “The Myth of Average” highlights how the misapplication of central tendency affected the United States Air Force in the 1950s. Despite having well-trained pilots and the most advanced airplanes to date, the Air Force was dissatisfied with pilots’ performance. Research on the dimensions of thousands of pilots revealed that the cockpits designed for the “average-sized” pilot didn’t fit any pilot well, and the ill-fitting cockpits prevented the pilots from flying their best. In response, the Air Force shifted its design focus from making cockpits that fit the average person to making cockpits that could accommodate the extremes of human dimensions. This shift improved the performance of existing pilots and allowed the Air Force to recruit the most diverse pool of fighter pilots in the world.
The lesson in this example is that a tool designed for the average user isn’t likely to be ideal for anyone. In many cases, such as a pair of scissors, we can easily accept this compromise. However, when it comes to life-altering scenarios such as flying a plane, we may want to rethink designs based on an average.
This article is excerpted from Shortform's summary and analysis of Charles Wheelan's "Naked Statistics."