This article is an excerpt from the Shortform book guide to "How to Lie With Statistics" by Darrell Huff.
What are the five questions you should ask yourself when looking at statistics? How can you tell when a statistic is being manipulated?
It’s important to evaluate any statistic before you trust it. Many statistics are manipulated to serve an agenda, but even a manipulated statistic can give you valuable information.
Continue on to learn how to perform a statistical evaluation.
Statistical Evaluation Techniques
In this article, you’ll learn a five-question checklist you can run through every time you encounter a statistic to assess its legitimacy. The goal is balance: you don’t want to swallow statistics without thinking about them (it’s often worse to know something wrong than to be ignorant), but you also don’t want to be so suspicious that you ignore all statistics and miss out on important information.
Here are the statistical evaluation questions:
Question #1: What Is the Source of the Figure?
The first thing to do when confronted with a statistic is to figure out where it’s coming from. The source may not be obvious, because liars often borrow the numbers of reputable organizations, such as universities or labs, but come to their own conclusions. Then, they try to make it look like their conclusion is the reputable organization’s conclusion. Always be suspicious of the phrase “the survey/study shows”; who says that the survey or study shows this?
- For example, in an article about how women who attend college have a higher likelihood of becoming old maids, the writer cited data from Cornell about how many of its women students were married. Cornell did publish numbers about how many of its students were married, but the school didn’t draw any conclusions. The conclusion in the article—women who go to college are more likely to stay unmarried—came from the article’s writer, but since the data was from Cornell, it almost appeared as if the conclusion had come from Cornell.
Once you’ve determined the source, look for these two types of biases:
- Conscious. If you think the statistic is coming from a liar with an agenda, look for the manipulation techniques covered in the earlier chapters of the guide.
- Unconscious. If the bias is unconscious, there won’t be obvious clues that the figures are inaccurate (such as a vague use of the word “average” that doesn’t say which average is meant). When you see no sign of deliberate lying, consider whether the figures further the source’s agenda and whether that might have blinded the source to certain ideas or to further exploration of the data.
Question #2: What Was the Data Collection Method?
The second statistical evaluation question addresses the data collection method. Any data that’s based on what respondents say can skew the truth, because people aren’t always truthful and may be motivated to answer in a particular way. When confronted with a statistic that was calculated from people’s responses, ask yourself if there’s any reason the respondents might have been motivated to lie.
- For example, one census in China, for military and tax purposes, found the population of one region to be 28 million. The next census, for famine relief purposes, found the population of the same region to be 105 million. The population hadn’t changed much over the five years in between censuses—people were just a lot keener to be counted when it meant famine relief than when it meant getting taxed.
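One way to see that the two counts can’t both be right is to work out the yearly growth they would imply. The following is a minimal Python sketch of that check. The function name implied_annual_growth is illustrative and doesn’t come from the book; the only inputs are the two census figures and the five-year gap quoted above.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must all be positive")
    return (end / start) ** (1 / years) - 1


# Census figures from the example above, in millions of people.
rate = implied_annual_growth(start=28, end=105, years=5)
print(f"Implied growth: {rate:.1%} per year")
```

The implied rate comes out at roughly 30 percent a year, sustained for five years, which is far easier to explain by a change in who wanted to be counted than by a real change in population.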
Question #3: Is Any Relevant Information Omitted?
In the third statistical evaluation question, you’ll consider the context of the statistic. If a figure is cited on its own, ask yourself if any of the following accompanying information exists, and if leaving it out would further anyone’s interests:
1. Statistical qualifiers. See Chapter 2 for a discussion of what numbers need to accompany stats (such as degree of significance) to make them meaningful.
2. Other relevant figures. Consider what additional context statisticians would need to take into account to come up with the most accurate figure possible.
- For example, an environmentalist who wants the government to regulate pollution might cite a high death rate during a spell of pollution-related fog in London and attribute the deaths to the fog. But people die for plenty of reasons that have nothing to do with the weather, so the high death rate could have had another cause. A more informative statistic would pair the death rate with causes of death, which would show how many people actually died because of the fog.
3. Cause. If an explanation for the figure isn’t included, ask yourself what it might be. (If a liar leaves out the real cause, they can imply an effect was prompted by a more desirable cause.)
- For example, one retail company wanted to show that business was improving because this year’s April sales were better than last year’s. A quick check of the calendar showed that Easter had fallen in March the year before and in April that year, so holiday sales were more likely responsible for the boost than an overall improvement in business.
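The Easter example can be sketched with a small calculation. Every sales number below is hypothetical and chosen only to illustrate the effect; the book gives no figures for this retailer. The sketch compares April against April, then compares a March-plus-April window that contains Easter in both years.

```python
# Hypothetical monthly sales (in thousands). Easter falls in March in the
# first year and in April in the second, as in the example above.
sales = {
    "last_year": {"March": 130, "April": 100},
    "this_year": {"March": 105, "April": 128},
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


# Comparing April with April puts Easter in only one of the two periods.
april_only = pct_change(sales["last_year"]["April"], sales["this_year"]["April"])

# Summing March and April puts Easter inside both periods being compared.
last_total = sum(sales["last_year"].values())
this_total = sum(sales["this_year"].values())
both_months = pct_change(last_total, this_total)

print(f"April vs. April:             {april_only:+.1f}%")
print(f"March+April vs. March+April: {both_months:+.1f}%")
```

With these made-up numbers, April alone looks like a 28 percent jump, while the two-month window grows by only about 1 percent. The omitted fact (where the holiday fell) accounts for almost all of the apparent improvement.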
Question #4: Is the Language Surrounding the Figures Misleading?
Statistics are often reported in running text rather than in a table or chart. To answer this fourth question, study the words surrounding the figure and consider how they’re defined. To twist results to suit their argument, liars may not use the most common definition of an everyday word, as you learned with “average.”
- (Shortform example #1: Anything can be the “first” or “biggest” or “best” of its kind, depending on how people define these words. For instance, the “biggest” waterfall in Canada is Niagara Falls (if “big” means the largest volume of water falling) or Della Falls (if “big” means highest).)
- Example #2: Accountants proposed using “retained earnings” or “appreciation of fixed assets” instead of the word “surplus” on corporate balance sheets. Most people know what surplus means, but not what the other words mean, so using these words in the balance sheets could hide how well companies were doing.
- Example #3: In a statistical context, “normal” doesn’t mean “ideal” or “good”; it means “usual.” But since most people do associate “normal” with “good,” seeing the word paired with a statistic can lead them to inaccurate (and worrying) conclusions. For example, if you see a stat that says most children “normally” start talking at a certain age and your child doesn’t talk by that age, you might worry that she’s abnormal or behind, when this isn’t necessarily the case.
Question #5: Does the Figure Make Sense?
To answer this last statistical evaluation question, don’t blindly trust numbers—consider if what they reveal actually makes sense.
There are four ways to assess a statistic against common sense:
1. Simply ask yourself if it seems right.
- For example, according to the Rudolf Flesch readability formula, Plato’s Republic is significantly easier to read than “The Legend of Sleepy Hollow.” Because Republic is a complicated ethical dialogue written around 375 BC and “Sleepy Hollow” is a short story, it doesn’t seem right that the short story would be the harder read.
2. Compare the figure to commonly known and reputable facts.
- For example, one urologist calculated that there were eight million cases of prostate cancer in the US. That would have meant more than one case for every man in the susceptible age group at the time, so the figure couldn’t be accurate.
3. Consider the figure’s precision. If the figure represents something abstract or difficult to measure (such as happiness), then it’s unlikely someone would have actually been able to measure it with a decimal point’s worth of precision.
4. Remember that extrapolation has limits. If a figure is based on extrapolation, be mindful that extrapolations are nothing more than educated guesses. Often, reality turns out to be much different from what the extrapolation predicted, because trends rarely continue exactly as they have in the past.
- For example, based on population data from 1790 to 1860, Abraham Lincoln predicted that the country’s population would be over 150 million by 1930. The prediction turned out to be far too high: the 1930 census counted roughly 123 million people.
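To see how an extrapolation like this can overshoot, here is a minimal Python sketch that projects the 1790 to 1860 growth rate forward another 70 years. It is not Lincoln’s actual calculation. The population values are rounded census counts supplied as assumptions (about 3.9 million in 1790, 31.4 million in 1860, and 123 million in 1930), and the function name extrapolate_constant_growth is illustrative.

```python
def extrapolate_constant_growth(
    start_value: float, end_value: float, span_years: float, extra_years: float
) -> float:
    """Project `end_value` forward, assuming past growth never changes."""
    yearly_factor = (end_value / start_value) ** (1 / span_years)
    return end_value * yearly_factor ** extra_years


# Rounded census counts in millions (assumptions for this sketch).
POP_1790 = 3.9
POP_1860 = 31.4
POP_1930_ACTUAL = 123.0

projected_1930 = extrapolate_constant_growth(
    POP_1790, POP_1860, span_years=70, extra_years=70
)

print(f"Projected 1930 population: {projected_1930:.0f} million")
print(f"Actual 1930 population:    {POP_1930_ACTUAL:.0f} million")
print(f"Overshoot factor:          {projected_1930 / POP_1930_ACTUAL:.1f}x")
```

Under these assumptions, the constant-growth projection lands at roughly double the actual 1930 count. The arithmetic is sound; the error lies in assuming that growth would continue at the same rate for 70 more years.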