What is Charles Wheelan’s Naked Statistics about? What statistics does Wheelan explore in the book?
Naked Statistics puts the math behind statistics into digestible terms and explains statistics concepts with relatable, relevant, and even humorous examples. Readers also benefit from additional socio-political insight from the book, as Wheelan uses real-world anecdotes to explore how statistics can inform collective decision-making.
Below is a brief overview of the key themes and concepts from Wheelan’s Naked Statistics.
Naked Statistics: Stripping the Dread from the Data
Wheelan opens Naked Statistics with the admission that he sometimes struggled to see the relevance of what he was learning as a math student. Therefore, he puts the relevance of statistics front and center in the book, building his discussion of each statistics concept around why we should know about it. Better yet, Wheelan proves that statistics don’t need to be intimidating by putting the math behind statistics into digestible terms and explaining concepts with relatable, relevant, and even humorous examples.
This guide largely focuses on two main themes in Charles Wheelan’s Naked Statistics. First, we cover what many common statistics mean, how to interpret them, and why they matter. Like Wheelan, we use real and fictional examples to add context to each statistic covered. Second, we examine Wheelan’s discussion of the consequences of bias and the misapplication and misinterpretation of statistics to make the case that everyone should develop basic statistical literacy.
Statistics Organize Data
We rely on data to make sense of the world, but without statistics, datasets would be largely useless. Imagine asking a car salesperson what kind of mileage a car gets, only to get a 100-page spreadsheet of the individual miles that car has driven and how much gas it used each mile! While the spreadsheet may be comprehensive, it’s also pretty useless if you were hoping for a quick answer. With statistics, we can take unwieldy datasets and transform them into meaningful and actionable values, like average miles per gallon.
Statistics that summarize datasets are called descriptive statistics. Two of the most familiar and commonly used descriptive statistics are the mean (the average) and the median (the middle number when you put all of your data in numerical order). The mean and median are called measures of central tendency, and while they both tell us about the “middle” of a dataset, Wheelan explains that they can convey very different messages. With a basic understanding of statistics, we can learn when to use one over the other and spot when someone might be reporting the mean instead of the median (or vice versa) to further an agenda.
Say the beach authorities at a fictional beach were collecting data on the number of jellyfish stings swimmers suffered each week throughout the summer. The data might look something like this:
|Jellyfish Stings/Week/500 swimmers|
(Shortform note: In this example, the dataset is naturally ordered, so we don’t need to order it to determine the median.)
The mean number of jellyfish stings is 42. The median number of stings is zero. Beach authorities could either say:
A) “Visit our beach! The mean number of weekly stings/500 swimmers throughout the summer is only 42!”
B) “Visit our beach! The median number of weekly stings throughout the summer is zero!”
Neither of these statements is incorrect, but they convey very different messages to prospective swimmers. The beach authorities are sure to advertise option B over option A because option B makes the beach look more attractive. As astute statistics students, we should question which measure of central tendency best captures the “story” of the dataset and be aware that no single statistic can fully convey real-world complexity.
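The gap between the two measures is easy to reproduce. Here is a minimal Python sketch using hypothetical weekly sting counts (the numbers are invented for illustration and are not the book's actual data; they're chosen so that a few bad weeks pull the mean up while the median stays at zero):

```python
import statistics

# Hypothetical sting counts per 500 swimmers for a 10-week summer.
# Most weeks see no stings; a few late-summer weeks see many.
stings = [0, 0, 0, 0, 0, 0, 0, 140, 140, 140]

print(statistics.mean(stings))    # -> 42
print(statistics.median(stings))  # -> 0.0
```

Both statistics summarize the same dataset, yet a swimmer hearing "the median week has zero stings" forms a very different picture than one hearing "42 stings per week on average."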
|The Utility of Central Tendency|
Measures of central tendency are foundational to how we think about and communicate data. But as Wheelan cautions and our jellyfish example highlights, if they aren’t used with care, they can be unhelpful or even dangerous.
A TED Talk entitled “The Myth of Average” highlights how the misapplication of central tendency affected the United States Air Force in the 1950s. Despite having well-trained pilots and the most advanced airplanes to date, the Air Force was dissatisfied with pilots’ performance. Research on the dimensions of thousands of pilots revealed that the cockpits designed for the “average-sized” pilot didn’t fit any pilot well, and the ill-fitting cockpits prevented the pilots from flying their best. In response, the Air Force shifted its design focus from making cockpits that fit the average person to making cockpits that could accommodate the extremes of human dimensions. This shift improved the performance of existing pilots and allowed the Air Force to recruit the most diverse pool of fighter pilots in the world.
The lesson in this example is that a tool designed for the average user isn’t likely to be ideal for anyone. In many cases, such as a pair of scissors, we can easily accept this compromise. However, when it comes to life-altering scenarios such as flying a plane, we may want to rethink designs based on an average.
Statistics Reveal and Describe Relationships
Descriptive statistics can also illuminate and describe relationships between variables in a dataset. As Wheelan explains, analyzing the correlation between two variables can tell us whether a change in one variable corresponds to a change in the other. For example, a nursery owner might find a positive correlation between the hours of sunlight her mums get and the number of blooms on each plant. The number of flowers on each plant increases predictably with sunshine. In contrast, she might find a negative correlation between the number of ladybugs on her plants and the number of aphids. As the number of ladybugs (which eat aphids) increases, the number of aphids decreases.
The correlation coefficient communicates the strength and direction of the relationship between two variables. It ranges from negative one to one: a coefficient of one (or negative one) indicates a perfect positive (or negative) correlation, while a coefficient of zero indicates no meaningful relationship. We can use the value of the correlation coefficient to help guide both our research and our actions. For example, say researchers investigating lead poisoning found a correlation coefficient of 0.8 between the amount of city water children drank and lead levels in their blood. This large positive correlation can’t prove that city water is causing lead poisoning. But these findings would warrant investigating the city’s water quality and might lead parents to purchase bottled water.
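To make the idea concrete, here is a short sketch of how a (Pearson) correlation coefficient is computed, using the nursery scenario from above. The `pearson` helper and all the data are hypothetical, invented purely for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical nursery data: hours of sun vs. blooms per plant,
# and ladybugs per plant vs. aphids per plant.
sun, blooms = [4, 5, 6, 7, 8], [10, 14, 15, 19, 23]
ladybugs, aphids = [1, 2, 3, 4, 5], [48, 41, 29, 22, 11]

print(round(pearson(sun, blooms), 2))       # strong positive, close to 1
print(round(pearson(ladybugs, aphids), 2))  # strong negative, close to -1
```

The sign tells us the direction of the relationship (blooms rise with sun; aphids fall as ladybugs rise), and the magnitude tells us how tightly the two variables track each other.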
|Inferring Causation From Correlation|
Correlation isn’t the same as causation. But it can be easy to equate the two even when we know better. Wheelan explains that when the correlation between two variables is particularly strong, or when changes in variables track each other tightly, it can be difficult not to infer a causal relationship.
Tyler Vigen uses comedy to highlight this natural yet problematic tendency in his book Spurious Correlations and on his website. By sifting through mountains of data, Vigen generates graphs that show uncannily tight correlations between unrelated variables. For example, he highlights a nearly 99% correlation between how much margarine the average American eats and how many couples get divorced in Maine. Additionally, his research shows a nearly 99% correlation between the amount of money arcades make every year and the number of people who earn a doctorate degree in computer science that same year.
Vigen’s work has been highlighted by the Harvard Business Review and received positive reviews from the Boston Globe and Washington Post, among others, in part because these ridiculous examples highlight the fact that equating causation with correlation is incorrect, no matter how close the relationship.
Another statistics technique, regression analysis, goes beyond describing the relationship between two variables and allows us to make mathematical predictions based on those relationships. For example, the nursery owner above could generate an equation with regression analysis to predict how many flowers her plants would have based on the amount of sunlight she gave them.
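Simple linear regression can be sketched in a few lines. The ordinary-least-squares fit below (with invented nursery data; `fit_line` is a hypothetical helper, not something from the book) turns the sunlight-blooms relationship into a prediction equation:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: daily hours of sun vs. blooms per plant.
sun = [4, 5, 6, 7, 8]
blooms = [9, 13, 15, 20, 23]

slope, intercept = fit_line(sun, blooms)
print(f"predicted blooms at 6.5 hours of sun: {slope * 6.5 + intercept:.1f}")
```

Once the line is fitted, the nursery owner can plug in any amount of sunlight, even values she hasn't directly observed, and read off a predicted number of blooms.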
|Regression Analysis for Smoking and Lung Cancer|
As Wheelan explains, regression analysis is a staple in medical and social sciences research. A study in the National Library of Medicine used regression analysis to calculate that for every 1% more collective smoking American adults do, lung cancer rates rise by 164 cases per 1,000 citizens.
Statistics Help Answer Complicated Questions
Probability is one way statistics can help us make more informed decisions. It allows us to manage uncertainty, calculate risks, and put possible outcomes in perspective. Wheelan explains that understanding probability can be especially relevant to our daily lives because we make decisions based on our perception of probability all the time. However, our perception of likely outcomes is often mathematically irrational. For example, the probability of getting in a car accident while driving to a beach is far higher than the probability of being attacked by a shark there, but we often—irrationally—fear the shark risk more.
|Probability Isn’t Intuitive|
There are several reasons for our mathematically irrational perception of probability, including:
Confirmation Bias: When we focus on what we expect and ignore the rest. Using our shark attack example above, we might justify our fear of swimming at the beach with a statement like, “Well, that one guy was bitten by a shark at Cape Cod last year!” while ignoring the tens of thousands of swimmers who weren’t attacked.
Anecdotal Logic: Improbable events are statistically bound to happen, and people notice and talk about them when they do. These stories of improbable occurrences stick in our minds and shape our perception of what is likely. For example, say you have a friend diagnosed with an exceedingly rare cancer. Even though your friend’s diagnosis is an anomaly, the rare form of cancer suddenly feels more prevalent.
Short-Term Thinking: Humans are evolutionarily hardwired to think in the short and medium term, which can make us feel like we’re witnessing statistically improbable events when we’re not (for example, witnessing a 100-year flood) and make us unable to process long-term data (for example, focusing on the cold snap over the last several days and ignoring climate change).
Our brains’ tendency to misunderstand probability makes it a useful subject to study if we want to use statistics to make more informed decisions.
People often use probability to assess risk when making financial decisions. Wheelan explains that a statistic called the “expected value” can help us determine whether we want to take a financial risk when we know the probability of each possible outcome and its respective payoff. Real-estate developers, for instance, can use this tool to check that their portfolio of investments is likely to make money as a whole. Even if one property loses money or underperforms in a given year, as long as the expected value of their portfolio is positive overall, they are likely to make money.
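Expected value is just a probability-weighted sum of payoffs. Here is a minimal sketch with an invented development project (all probabilities and payoffs are hypothetical):

```python
def expected_value(outcomes):
    """Sum of payoff * probability over every possible outcome."""
    return sum(payoff * prob for payoff, prob in outcomes)

# Hypothetical project: a 50% chance of a $250k gain, a 30% chance of a
# modest $50k gain, and a 20% chance of a $100k loss.
project = [(250_000, 0.5), (50_000, 0.3), (-100_000, 0.2)]

print(round(expected_value(project)))  # -> 120000
```

A positive expected value doesn't guarantee any single project succeeds; it says that across many such bets, the developer should come out ahead on average.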
|Probability and Purchasing Stock|
As Wheelan explains, probability is an effective tool for managing risk. Unfortunately, many of us underutilize probability when investing in the stock market. Research shows that we tend to overestimate the probability of rare events and our ability to foresee them. For example, people will often invest in a single stock that they think will be the next Apple instead of spreading their investment across a diverse portfolio. As a result, people tend to under-diversify their stock holdings, costing themselves an average of $2,500 per year.
Using statistics like the expected value is likely a good idea before investing in the stock market as it can temper our “gut feeling” about a stock with math and help us make smarter investments.
In addition to helping us make more informed decisions, statistics can offer insight into questions we couldn’t possibly design an experiment to answer. For example, say we wanted to know whether exposure to a certain chemical (we’ll call it chemical X) corresponds to higher rates of cancer. Ethics precludes purposefully exposing people to chemical X in a laboratory setting for the sake of science. Additionally, so many other variables impact a person’s cancer risk that we can’t possibly know if chemical X was the sole cause of anyone’s cancer diagnosis. Without statistics, complex but important questions like this would remain unanswered.
To answer the question of whether chemical X is associated with higher rates of cancer, researchers could collect a large dataset including people who were and were not exposed to chemical X and record their rates of cancer diagnoses. Then the researchers could use regression analysis to determine the association between exposure to chemical X and a cancer diagnosis, independent of other factors such as smoking, exercise, family history, and so on. Statistics can even tell us what percent of a person’s overall cancer risk is mathematically associated with exposure to chemical X rather than other factors.
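The study described above would use regression analysis, but the core idea of "holding other factors constant" can be sketched with a simpler stratified comparison: compare cancer rates between exposed and unexposed people separately within each smoking group. This is a stand-in for the regression approach, not the book's method, and every record below is invented for illustration:

```python
# Each record: (exposed_to_chemical_x, smoker, diagnosed) -- hypothetical data.
records = [
    (True,  True,  1), (True,  True,  1), (True,  True,  0), (True,  True,  0),
    (True,  False, 1), (True,  False, 0), (True,  False, 0), (True,  False, 0),
    (False, True,  1), (False, True,  0), (False, True,  0), (False, True,  0),
    (False, False, 0), (False, False, 0), (False, False, 0), (False, False, 0),
]

def diagnosis_rate(rows):
    """Fraction of people in these rows with a cancer diagnosis."""
    return sum(diagnosed for _, _, diagnosed in rows) / len(rows)

for smoker in (False, True):
    stratum = [r for r in records if r[1] == smoker]
    exposed = diagnosis_rate([r for r in stratum if r[0]])
    unexposed = diagnosis_rate([r for r in stratum if not r[0]])
    label = "smokers" if smoker else "non-smokers"
    print(f"{label}: exposed {exposed:.2f} vs. unexposed {unexposed:.2f}")
```

In this toy dataset, the exposed group shows a higher diagnosis rate within both strata. A regression coefficient formalizes exactly this kind of within-group comparison, estimating the association with chemical X while accounting for smoking and other variables simultaneously.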
As Wheelan explains, the ability to mathematically separate individual variables (like exposure to a particular chemical) in the complexity of the real world makes statistical analysis an invaluable part of medical and social sciences research.
|Using Statistics to Assess Whether Money Can Buy Happiness|
Researchers have even used statistics to try to answer the age-old question of whether money can buy happiness. In a 2010 study from Princeton University, researchers gathered data from 450,000 responses to a Gallup survey about day-to-day emotions and overall life evaluation (how people rate their life in the “big picture”). Next, researchers used multivariate regression analysis to analyze whether a high income is correlated with increased emotional well-being and better life evaluation.
The results of the study suggest that money can buy happiness to a point. Income was positively associated with life evaluation in general (people had a more favorable opinion of their lives overall if they made more money) and associated with emotional well-being up to an income of $75,000. Beyond $75,000, income no longer predicted emotional well-being.
Learning Statistics Is Empowering
Learning statistics is an exercise in self-empowerment. Wheelan explains that thanks to modern society’s affinity for and reliance on technology, we’re constantly surrounded and impacted by data. This abundance of data is a blessing in that it gives researchers a chance to study society’s most pressing issues, for example, using student outcomes to highlight racial and social inequities in our education system. But, the amount of data we’re bombarded with every day through targeted marketing, political campaigns, and social media can also be a challenge when we don’t know how to gauge its reliability. Studying statistics can give us a better sense of how much trust we should put in different sources of information and can help us interpret published statistics correctly.
Learning statistics to be an informed citizen is part of a larger skill set called “data literacy.” Data literacy refers to the ability to analyze and interpret data correctly. Just as a literate person can understand a story by reading the words on a page, a data-literate person can look at a statistic, chart, graph, and so on and correctly interpret its “story.”
Data literacy is a critical yet neglected skill. Poor data literacy skills hamper our individual and societal ability to make informed decisions. For example, Wheelan cites mainstream confusion about the difference between correlation and causation, combined with a lack of awareness of modern vaccine-safety research, as the cause of the anti-vaccination movement.
The gap between data literacy skills and data literacy needs in the modern workplace is costly. Many jobs require working with and making decisions using data, yet many employees lack the skills to do this effectively. Estimates show that over $109 billion is lost to the US economy every year due to underdeveloped data literacy skills in the workforce. In response, many corporations now treat data literacy as a critical skill.
Studying statistics also makes us less susceptible to being purposefully misled. Unfortunately, Wheelan explains that the purposeful misuse of statistics is more common than we may think. While the statistics values themselves can’t lie, the statistical tests that people choose to use, the data they choose to calculate statistics with, and the choice to include or not include specific statistics from datasets can construct various versions of “the truth.” For example, consider the following statements based on the same hypothetical dataset:
- Vote for Mark Smith! His tax cuts have saved the people in this town an average of $1,000 per year!
- Don’t vote for Mark Smith! His “tax cuts” have saved the wealthiest 1% of town residents tons of money and have saved low-income residents almost nothing!
Neither of these statements is a lie. Instead, different uses of data and statistics construct versions of the truth that best suit differing perspectives. While we can’t expect ourselves to dive into the underlying data for every statistic we read or hear, Wheelan explains that we can better spot incomplete or misleading information with a basic understanding of statistics.
In his 1954 book How to Lie With Statistics (republished in 1993), Darrell Huff explores several ways that statistics are used to deliberately mislead an audience. His examples include using small sample sizes to inflate results, taking biased samples, and omitting values that are critical for context.
As an example of the latter, take the following hypothetical marketing for a weight loss supplement:
Headline: “Supplement Users Lost Twice as Much Weight During Their First Month as Those Taking a Placebo!”
This sounds appealing and might tempt many people to spend big on the supplements. However, it gives no context for what “twice as much” means because no actual weight loss figures are included. Perhaps those on the supplement lost just one pound, while those on the placebo lost just half a pound. While it’s true that one pound is twice as much as half a pound, the actual figures are far less impressive than the headline makes them appear.
Huff cautions that dishonest or incomplete statistics combined with a data-illiterate audience render many published statistics meaningless at best and harmful at worst.
———End of Preview———
Like what you just read? Read the rest of the world's best book summary and analysis of Charles Wheelan's "Naked Statistics" at Shortform.
Here's what you'll find in our full Naked Statistics summary :
- An explanation and breakdown of statistics into digestible terms
- How statistics can inform collective decision-making
- Why learning statistics is an exercise in self-empowerment