How to Get Accurate Statistics: Good Sampling

This article is an excerpt from the Shortform book guide to "How to Lie With Statistics" by Darrell Huff.

How can you get accurate statistics if you can’t possibly count every entity in a population? What criteria make a good sample?

The only way to get a perfectly accurate figure is to count every single entity, but that is usually impractical. That’s why statisticians use samples to represent a population. A sample must meet certain criteria to represent the population as accurately as possible.

Keep reading to learn more about how good sampling leads to accurate statistics.

Accurate Statistics

The only way to get perfectly accurate statistics is to count every entity that makes up the whole. For example, if you want to know how many red beans there are in a jar of red-and-white colored beans, the only way to find out for sure is to count all of the red beans in the jar.

However, in most cases, counting every single entity is impractical and prohibitively expensive. For instance, imagine you wanted to know how many red beans there are in every jar on the planet: you’d have to count all the red beans in the world at any given time.

To get around this problem, statisticians count a sample instead of the whole, assuming the sample’s make-up proportionally represents the whole.

A sample must meet the following two criteria to actually be representative of the whole (and thus, be “good”):

Criteria For Good Statistics

Criterion #1: Large. A large sample reduces the effects of chance. Chance affects every survey, poll, and experiment, but when the sample is large enough, its effects become negligible.

For example, the probability of getting heads when flipping a coin is 50%. In practice, if you flip a coin 10 times, you’re unlikely to get heads exactly five times. You’ll probably get some other number due to chance (say, three). If you stop flipping there, you’re left with the impression that the probability of getting heads is 3/10, or 30%, which is clearly incorrect. You’ll need to flip the coin many more times, say 1,000, to reduce the effects of chance and get a figure closer to the real probability of a half.
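
To make the coin example concrete, here is a minimal Python sketch. The function names (prob_exact_heads, chance_error, share_of_heads) and the random seed are illustrative assumptions, not something from the book. The sketch computes the exact chance of getting five heads in 10 flips, the typical size of the chance error for a given number of flips, and a simulated share of heads.

```python
import math
import random


def prob_exact_heads(flips, heads):
    """Exact probability of `heads` heads in `flips` flips of a fair coin."""
    return math.comb(flips, heads) / 2 ** flips


def chance_error(flips, p=0.5):
    """Typical chance error (standard error) in the observed share of heads."""
    return math.sqrt(p * (1 - p) / flips)


def share_of_heads(flips, rng):
    """Flip a simulated fair coin `flips` times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


rng = random.Random(42)  # arbitrary seed, only so the run can be repeated

print("P(exactly 5 heads in 10 flips):", prob_exact_heads(10, 5))  # about 0.246
for flips in (10, 1000):
    print(
        flips, "flips:",
        "simulated share of heads =", share_of_heads(flips, rng),
        "| typical chance error =", round(chance_error(flips), 3),
    )
```

With only 10 flips, the typical chance error is about 16 percentage points, so a result like 30% heads is unremarkable. With 1,000 flips, it shrinks to under 2 percentage points.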

How big your sample needs to be depends on what you’re studying. For example, if the incidence of polio is one in 500, and you want to test a vaccine, you’ll have to vaccinate far more than 500 people to get any meaningful results about the vaccine’s efficacy. In a group of 500, you would expect only about one case even without the vaccine, so a result of zero cases tells you almost nothing about whether the vaccine works.
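
The arithmetic behind the polio example can be sketched in a few lines of Python. This is a rough illustration that assumes each person's risk is independent and exactly one in 500, which is a simplification for an infectious disease. The function names are hypothetical.

```python
def expected_cases(group_size, incidence):
    """Average number of cases expected in a group with no vaccine."""
    return group_size * incidence


def prob_no_cases(group_size, incidence):
    """Chance that nobody in the group gets sick, even with no vaccine.

    Assumes every person has the same, independent risk (a simplification).
    """
    return (1 - incidence) ** group_size


INCIDENCE = 1 / 500  # the incidence figure used in the example above

for group_size in (500, 5_000, 50_000):
    print(
        group_size, "people:",
        "expected cases =", round(expected_cases(group_size, INCIDENCE), 1),
        "| chance of zero cases =", prob_no_cases(group_size, INCIDENCE),
    )
```

Under these assumptions, a group of 500 unvaccinated people has roughly a 37% chance of showing no cases at all, so zero cases among 500 vaccinated people would not be evidence that the vaccine works. In much larger groups, zero cases without a vaccine becomes extremely unlikely, which is what makes a difference detectable.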

Criterion #2: Random. Every entity in the complete group must have an equal chance of being selected to be part of the sample. Perfectly random sampling is usually too expensive and unwieldy to be practical. (Even if you were only going to randomly select one bean in 1,000 to be part of the sample, you’d first need a list of every bean in the world to even determine where to find each thousandth bean.) Instead, statisticians use stratified random sampling, which works like this:

Statisticians divide the whole into groups: for example, people over the age of forty, people under the age of forty, Black people, white people, and so on.
They select samples from each group. The number taken from each group is proportional to that group’s share of the whole.
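
The two steps above can be sketched in Python as follows. This is a minimal illustration, not a production sampling routine: the function name stratified_sample, the group_of parameter, and the made-up population of 1,000 people are all assumptions for the example.

```python
import random


def stratified_sample(population, group_of, sample_size, rng):
    """Draw a proportional stratified random sample.

    population:  list of entities
    group_of:    function returning the group label of an entity
    sample_size: target total (rounding can shift the total slightly)
    rng:         random.Random instance, so results can be reproduced
    """
    # Step 1: divide the whole into groups.
    groups = {}
    for entity in population:
        groups.setdefault(group_of(entity), []).append(entity)

    # Step 2: randomly select from each group in proportion to its size.
    sample = []
    for members in groups.values():
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical population: 600 people under forty, 400 aged forty or over.
people = [("under 40", i) for i in range(600)]
people += [("40 or over", i) for i in range(400)]

sample = stratified_sample(people, lambda person: person[0], 50, random.Random(0))

counts = {}
for group, _ in sample:
    counts[group] = counts.get(group, 0) + 1
print(counts)  # 30 from "under 40" and 20 from "40 or over"
```

Which individuals are picked within each group is still left to chance, but each group's share of the sample matches its share of the whole.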

Avoiding Bias

Despite statisticians’ best efforts, bias is always present when choosing samples because:

Statisticians might get the proportions wrong and over- or under-represent certain groups.
Statisticians can’t always tell which entity belongs to which group. (Shortform example: You might be able to tell how many people have red hair by looking at them, but how do you know if it’s naturally red?)
When sampling people, interviewers may be biased in their choice of subjects. (For example, if an interviewer has the choice between two people from the same group, she might choose the one who looks friendlier, to make it easier to get her job done.)
Interviewers might bias respondents. For example, one wartime poll asked Black people living in the South which they thought was more important: beating the Nazis or bolstering democracy in the US. Black interviewers found that 39% of respondents prioritized beating the Nazis, while white interviewers found 62%. Black respondents may have been more inclined to give white interviewers the answer they thought the interviewers wanted, rather than what they actually believed, so as to appear more loyal.
