What are inferential statistics? How are inferential statistics applied in real life?

Inferential statistics are a powerful research tool due to a statistics tenet called the central limit theorem. The central limit theorem states that the mean of a large, representative sample will be close to the mean of the larger population. Therefore, we can confidently make inferences about a population from a sample or about a sample from a population, and we can compare samples to each other.

## Making Inferences About a Population From a Sample

We can use samples to glean reliable information about an entire population if we know the sample’s mean and standard deviation for the variable we’re interested in. Since collecting data on an entire population is often not feasible, the central limit theorem allows us to answer research questions that would otherwise be unanswerable.

(Note: Wheelan explains that 30 is the minimum sample size for reliable statistics, although there are mathematical fixes for smaller sample sizes.)

For example, many people believe that participating in gymnastics at a young age stunts girls’ growth. You can’t possibly collect data on every female gymnast to answer this question. Still, with the central limit theorem and inferential statistics, you can use a sample of gymnasts to test the null hypothesis that: “There is no difference in height as an adult between female gymnasts and non-gymnasts.”

## Making Inferences About a Sample From a Population

Since we know that samples look like their underlying population, we can also use the central limit theorem to make inferences about the composition of a sample taken from a given population. For example, say the organizers of a fun-run want to know how long they should give participants to finish their two-mile course. They can’t know the exact pace of each participant who will show up at the race, but they can use average running paces from the general population to assume that the majority of participants will finish the course between the 20- and 30-minute mark.

## Making Inferences to Connect Samples

Since samples look like their population, we would expect two samples taken from the same population to look like each other. When they differ by more than chance alone would explain, it suggests that the two samples have significant underlying differences in terms of the variable we’re studying.

For example, a researcher’s null hypothesis might be: “There is no difference in lung cancer rates between asbestos factory workers and the general population.” The researcher could collect data on 100 asbestos workers and 100 people from the general population and compare cancer rates. If the rates of lung cancer were significantly different between the two samples, it would be a powerful indicator that the two samples came from “different populations” with respect to the variable we’re studying (in other words, as far as lung cancer rates are concerned, asbestos workers are in a different population than the public).
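One way to sketch this comparison in code is a two-proportion z-test, which asks how many standard errors apart the two sample rates are. The text does not name a specific test, so this choice is an assumption, and the case counts below (12 of 100 workers, 3 of 100 members of the public) are invented purely for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(cases_a, n_a, cases_b, n_b):
    """Compare two sample rates under the null hypothesis of no difference.

    Returns the z statistic and a two-sided p-value based on the
    normal approximation with a pooled rate.
    """
    rate_a = cases_a / n_a
    rate_b = cases_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts, not real data.
z, p_value = two_proportion_z_test(cases_a=12, n_a=100, cases_b=3, n_b=100)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

A small p-value means a gap this large would be unusual if both samples really came from the same population. With counts this small the normal approximation is rough, so treat the result as indicative only.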

### Standard Error

We noted above that according to the central limit theorem, the means of large representative samples will be “close” to the mean of the underlying population. More specifically, the central limit theorem states that the means of large samples form an approximately normal distribution around the population mean. In other words, if we took enough large samples, the means of those samples would form a nearly symmetrical bell curve centered on the true mean of the population.

For instance, let’s say you took a random sample of 100 people, measured their heights, and got a mean height of 5 feet 7 inches. Then you repeated the experiment with a different group of 100 people and got a mean height of 5 feet 8 inches. If you repeated the experiment 50 times with 50 different samples, then plotted those 50 sample means on a graph, they would form a roughly normal bell curve centered near the true average height for the entire population.
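The repeated-sampling experiment can be simulated in a few lines. This is a minimal sketch: the population below is made up (100,000 skewed values standing in for heights in inches), and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: deliberately skewed (not normal) to show that
# the sample means still cluster around the population mean.
population_rng = random.Random(42)
population = [60 + population_rng.expovariate(1 / 8) for _ in range(100_000)]

# 50 samples of 100 people each, as in the example above.
sample_means = simulate_sample_means(population, sample_size=100, num_samples=50)

print(f"Population mean:        {statistics.mean(population):.2f}")
print(f"Mean of sample means:   {statistics.mean(sample_means):.2f}")
print(f"Spread of sample means: {statistics.stdev(sample_means):.2f}")
```

Even though the individual values are skewed, the 50 sample means land close to the population mean, and their spread is much narrower than the spread of the individual values.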

Since data is distributed predictably around the mean in a normal distribution, we use the same principles we covered in our discussion of standard deviation to describe how close a particular sample mean is to the population mean. However, when we’re discussing the distribution of sample means around a population mean, we use a statistic called standard error rather than the standard deviation. The standard error is the standard deviation of sample means around a population mean.
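In practice we usually have only one sample, so the standard error is estimated from it as the sample standard deviation divided by the square root of the sample size. The text does not spell out this formula, so treat the sketch below as a standard textbook estimate; the ten height values are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical heights in inches.
heights_in = [66, 70, 68, 71, 65, 69, 67, 72, 68, 64]
print(f"Sample mean:    {statistics.mean(heights_in):.1f} in")
print(f"Standard error: {standard_error(heights_in):.3f} in")
```

Because the sample size sits under a square root in the denominator, quadrupling the sample size only halves the standard error.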

As a reminder, in a normal distribution, 68.2% of data points will fall within one standard deviation of the mean, 95.4% of data points fall within two standard deviations of the mean, and 99.7% of data points fall within three standard deviations of the mean. Therefore, for a group of sample means around a population mean, 68.2% of sample means will fall within one standard error of the population mean and so on.
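These percentages can be checked against the normal curve directly. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The one-standard-deviation figure comes out at about 68.27%, so it is sometimes quoted as 68.2% and sometimes as 68.3% depending on rounding.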

Using our height example, say the mean population height is 5’8”, and we calculate a standard error of 1 inch for our sample means. A sample with a mean height of 5’10” would be two standard errors above the population mean. Because 95.4% of sample means fall within two standard errors of the population mean, and the remainder is split evenly between the two tails, this sample mean is larger than roughly 97.7% of all other sample means.
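The height example can be worked through numerically. This sketch converts the heights to inches and assumes the sample means are normally distributed; it separates two quantities that are easy to confuse, the share of sample means within two standard errors of the population mean and the share that fall below a sample mean two standard errors above it.

```python
from statistics import NormalDist

population_mean_in = 68  # 5'8" in inches, from the example
standard_error_in = 1    # from the example
sample_mean_in = 70      # 5'10" in inches

# How many standard errors the sample mean sits from the population mean.
z = (sample_mean_in - population_mean_in) / standard_error_in

sampling_dist = NormalDist(mu=population_mean_in, sigma=standard_error_in)
share_below = sampling_dist.cdf(sample_mean_in)
share_within_two = sampling_dist.cdf(70) - sampling_dist.cdf(66)

print(f"z-score: {z:.1f}")
print(f"Share of sample means below 5'10\": {share_below:.1%}")
print(f"Share within two standard errors:  {share_within_two:.1%}")
```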

### Polling With the Central Limit Theorem

Another powerful use of inferential statistics involves polling. Since polling involves making inferences about a population from a sample, the central limit theorem allows us to use polls to predict public opinion. In polls, our “sample mean” is the percent of survey respondents who indicate that they’ll vote a certain way, and our “population mean” is the actual percent of people who will vote that same way on election day.

For example, say you conduct a survey to see whether voters will approve a ballot measure to build a local dog park. Your sample is the people who respond to your survey. Your “sample mean” is the percentage of respondents who indicate that they’ll vote “yes” on election day. Your population mean is the percentage of voters who will actually vote “yes” on election day.

The standard error and normal distribution allow us to gauge how confident we can be in our polling results. The standard error of a poll is also a percent. We can use the standard error to establish a range around our polling results within which we expect the true election results to fall, and our normal distribution to express how confident we are in that range. We’ll illustrate this with our dog park example.

Say that 55% of your dog park survey respondents indicate that they’ll vote “yes,” and say that you calculate the standard error of your survey to be 4%. The range of “yes” responses within one standard error of your poll is 51% to 59%. Using the normal distribution, you know that 68.2% of the time your polling results will be within one standard error of the true election results. Since the low end of that range, 51%, is still over half the votes, you can be roughly 68.2% confident that the dog park will pass with between 51% and 59% of the votes.

(Note: If a range of two standard errors from our poll results were still above 50%, we would be able to use the normal distribution to say that we’re 95.4% confident the measure will pass, and if a range of three standard errors from our poll results were still above 50%, we would be able to say that we’re 99.7% confident that the dog park will pass.)
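The dog park ranges can be computed for one, two, and three standard errors. This is a rough sketch: the function names are hypothetical, and the respondent count of 155 is an assumption chosen only because it yields a standard error close to the 4% used in the example (the text does not say how many people were surveyed).

```python
import math

def proportion_standard_error(share, num_respondents):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(share * (1 - share) / num_respondents)

def poll_range(share, standard_error, num_standard_errors=1):
    """Return the (low, high) range around a poll result."""
    margin = num_standard_errors * standard_error
    return share - margin, share + margin

yes_share = 0.55       # from the example
standard_error = 0.04  # from the example

for k, confidence in ((1, 0.682), (2, 0.954), (3, 0.997)):
    low, high = poll_range(yes_share, standard_error, k)
    verdict = "above" if low > 0.50 else "not above"
    print(f"{k} SE ({confidence:.1%}): {low:.0%} to {high:.0%}; low end is {verdict} 50%")

# Assumed sample size of 155 respondents, for illustration only.
print(f"Standard error with 155 respondents: {proportion_standard_error(0.55, 155):.3f}")
```

With these numbers only the one-standard-error range stays above 50%. The two-standard-error range dips to 47%, so this poll would not support a 95.4% confidence statement that the measure will pass.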

#### Like what you just read? Read the rest of the world's best book summary and analysis of Charles Wheelan's "Naked Statistics" at Shortform.

#### Darya Sinusoid

Darya’s love for reading started with fantasy novels (The LOTR trilogy is still her all-time-favorite). Growing up, however, she found herself transitioning to non-fiction, psychological, and self-help books. She has a degree in Psychology and a deep passion for the subject. She likes reading research-informed books that distill the workings of the human brain/mind/consciousness and thinking of ways to apply the insights to her own life. Some of her favorites include Thinking, Fast and Slow, How We Decide, and The Wisdom of the Enneagram.