PDF Summary: The Signal and the Noise, by Nate Silver
Below is a preview of the Shortform book summary of The Signal and the Noise by Nate Silver.
1-Page PDF Summary of The Signal and the Noise
The early 21st century has already seen numerous catastrophic failures of prediction: From terrorist attacks to financial crises to natural disasters to political upheaval, we routinely seem unable to predict the events that change the world. In The Signal and the Noise, statistician, analyst, and FiveThirtyEight.com founder Nate Silver sets out to explain why—and how we can do better.
Silver argues that our predictions falter because of mental mistakes such as incorrect assumptions, overconfidence, bias, and warped incentives. However, he also suggests that we can mitigate these thinking errors (and thus improve our forecasts) with the help of a method called Bayesian inference. In this guide, we’ll explain how fields ranging from politics to poker to meteorology can teach you how to make better predictions and avoid costly mistakes in judgment. Along the way, we’ll update Silver’s examples with the latest research and place his arguments in conversation with other experts on prediction such as Philip E. Tetlock and Daniel Kahneman.
(continued)...
According to Silver, Kasparov’s overestimation of Deep Blue was costly: He lost his composure and, as a result, the series. To avoid making similar errors, Silver says you should always keep in mind the respective strengths and weaknesses of humans and machines: Computers are good at brute-force calculations, which they perform consistently and accurately, and they excel at solving finite, well-defined, short- to mid-range problems (like a chess game). On the other hand, Silver says, computers aren’t flexible or creative, and they struggle with large, open-ended problems. Conversely, humans can do several things computers can’t: We can think abstractly, form high-level strategies, and see the bigger picture.
Machine Intelligence Revisited
With the proliferation of large language model (LLM) AIs such as ChatGPT and Bing Chat, computing has changed radically since The Signal and the Noise’s publication—and as a result, it’s easier than ever to forget the limitations of computer intelligence. LLMs are capable of complex and flexible behaviors such as holding long conversations, conducting research, and generating text and computer code, all in response to simple prompts given in ordinary language.
In fact, these AIs can seem so lifelike that in 2022, a Google engineer became convinced that an AI he was developing was sentient. Other experts dismissed these claims—though the engineer’s interpretation was perhaps understandable in light of interactions in which AIs have declared their humanity, professed their love for users, threatened users, and fantasized about world domination.
Given these surprisingly human-like behaviors, we might be left wondering, like Kasparov with Deep Blue, whether these programs can actually think. According to AI experts, they can’t: Though LLMs can effectively emulate human language, they do so not through a deep understanding of meaning but through an elaborate statistical matching operation. In essence, when you enter a prompt, the model draws on patterns learned from vast amounts of training text to generate the words and phrases most likely to follow. (Some observers have likened this to the Chinese Room thought experiment, in which a person with no understanding of Chinese conducts a written exchange in that language by copying the appropriate symbols according to instructions.)
That’s why, despite AI’s impressive advances, many experts recommend that the best way forward isn’t to replace people with AI—it’s to combine the two, much as Silver recommends, in order to maximize the strengths of both. Interestingly, Kasparov himself promotes this approach, arguing that the teamwork between specific humans and specific machines is more important than the individual capabilities of each.
Our Predictions Reflect Our Biases
Another challenge of prediction is that many forecasters tend to see what they want or expect to see in a given set of data—and these forecasters are often the most influential ones. Silver draws on research by Philip E. Tetlock (author of Superforecasting) that identifies two opposing types of thinkers: hedgehogs and foxes.
- Hedgehogs see the world through an ideological filter (such as a strong political view), and as they gather information, they interpret it through this filter. They tend to make quick judgments, stick to their first take, and resist changing their minds.
- Foxes start not with a broad worldview, but with specific facts. They deliberately gather information from opposing disciplines and sources. They’re slow to commit to a position and quick to change their minds when the evidence undercuts their opinion.
Silver argues that although foxes make more accurate predictions than hedgehogs, hedgehog predictions get more attention because the hedgehog thinking style is much more media-friendly: Hedgehogs give good sound bites, they’re sure of themselves (which translates as confidence and charisma), and they draw support from partisan audiences who agree with their worldview.
In Defense of Hedgehogs
Although Silver seems to favor fox-style thinking (which makes sense given his focus on making better predictions), other experts point out that both thinking types have their benefits. In Good to Great, for example, Jim Collins argues that hedgehogs make good leaders precisely because of their talent for simplifying reality into a clear, memorable vision and rejecting any distractions from that vision. Plus, in the right role, the same media-friendly charisma that can make hedgehogs unreliable predictors also makes them inspiring leaders.
Furthermore, while Silver and Collins imply that everyone’s either a fox or a hedgehog, it might be possible to be both at different times and to combine both styles. For example, one company founder shares his experience of shifting from fox-type thinking when he worked as a consultant to hedgehog-type thinking when he started his own company, only to find that neither mindset alone was serving his needs. He argues that his company only found success when they combined the hedgehog’s decisive, big-picture thinking with the fox’s cautious realism.
Our Incentives Are Wrong
Furthermore, Silver points to the media’s preference for hedgehog-style predictions as an example of how our predictions can be warped by bad incentives. In this case, for forecasters interested in garnering attention and fame (which means more airtime and more money), bad practices pay off. That’s because all the things that make for good predictions—cautious precision, attentiveness to uncertainty, and a willingness to change your mind—are far less compelling on TV than the qualities that lead to worse predictions: broad, bold claims, certainty, and stubbornness.
(Shortform note: In The Art of Thinking Clearly, Rolf Dobelli explains why we’re drawn to predictors who exude certainty and confidence. For one thing, he says, we dislike ambiguity and uncertainty—in fact, that’s why we try to predict the future in the first place. Meanwhile, we’re quick to assume that experts know what they’re doing (a phenomenon called “authority bias”), and we struggle to understand the probabilities involved in careful forecasts. Together, these mental biases make simplistic, confident predictions appear more convincing than nuanced (but possibly more accurate) ones.)
Similarly, Silver explains that predictions can be compromised when forecasters are concerned about their reputations. For example, he says, economic forecasters from lesser-known firms are more likely to make bold, contrarian predictions in hopes of impressing others by getting a tough call right. Conversely, forecasters at more esteemed firms are more likely to make conservative predictions that stick closely to the consensus view because they don’t want to be embarrassed by getting a call wrong.
(Shortform note: Another reason forecasters might deliberately make bold predictions is to spur audiences into action. For example, in Apocalypse Never, Michael Shellenberger argues that some climate activists employ this tactic by deliberately making alarmist predictions in hopes of prompting policy and behavioral changes. The problem with this approach, Shellenberger says, is that it can backfire: In the case of climate change, if you successfully convince people that we’ve damaged the planet beyond repair, they might simply give up hope rather than enacting the change you’d hoped for.)
Part 3: Better Predictions Through Bayesian Logic
Although prediction is inherently difficult—and made more so by the various thinking errors we’ve outlined—Silver argues that it’s possible to make consistently more accurate predictions by following the principles of a statistical formula known as Bayes’ Theorem. Though Silver briefly discusses the mathematics of the formula, he’s most interested in how the theorem encourages us to think while making predictions. According to Silver, Bayes’ Theorem suggests that we make better predictions when we consider the prior likelihood of an event and update our predictions in response to the latest evidence.
In this section, we’ll briefly describe Bayes’ Theorem, then we’ll explore the broader lessons Silver draws from it and offer concrete advice for improving the accuracy of your predictions.
The Principles of Bayesian Statistics
Bayes’ Theorem—named for Thomas Bayes, the English minister and mathematician who first articulated it—posits that you can calculate the probability of an event A given a specific piece of evidence B. To do so, Silver explains, you need to know (or estimate) three things:
- The prior probability of event A, regardless of whether you discover evidence B—mathematically written as P(A)
- The probability of observing evidence B if event A occurs—written as P(B|A)
- The probability of observing evidence B if event A doesn’t occur—written as P(B|not A)
Bayes’ Theorem uses these values to calculate the probability of A given B—P(A|B)—as follows:

P(A|B) = [P(B|A) × P(A)] ÷ [P(B|A) × P(A) + P(B|not A) × (1 − P(A))]
(Shortform note: This formula may look complicated, but in less mathematical terms, what it’s calculating is [the probability that you observe B and A is true] divided by [the probability that you observe B at all whether or not A is true—or P(B)]. In fact, Silver’s version of the formula (as written above) is a very common special case used when you don’t directly know P(B); that lengthy denominator is actually just a way to calculate P(B) using the information we’ve listed above.)
Principle #1: Consider the Prior Probability
To illustrate how Bayes’ Theorem works in practice, imagine that a stranger walks up to you on the street and correctly guesses your full name and date of birth. What are the chances that this person is psychic? Say that you estimate a 90% chance that if this person is psychic, they’d successfully detect this information, whereas you estimate that a non-psychic person has only a 5% chance of doing the same (perhaps they know about you through a mutual friend). On the face of it, these numbers seem to suggest a pretty high chance (90% versus 5%) that you just met a psychic.
But Bayes’ Theorem reminds us that prior probabilities are just as important as the evidence in front of us. Say that before you met this stranger, you would’ve estimated a one in 1,000 chance that any given person could be psychic. That leaves us with the following values:
- P(A|B) is the chance that a stranger is psychic given that they’ve correctly guessed your full name and date of birth. This is what you want to calculate.
- P(A) is the chance that any random stranger is psychic. We set this at one in 1,000, or 0.001.
- P(B|A) is the chance that a psychic could correctly guess your name and date of birth. We set this at 90%, or 0.9.
- P(B|not A) is the chance that a non-psychic could correctly guess the same information. We set this at 5%, or 0.05.
Bayes’ Theorem yields the following calculation:

P(A|B) = (0.9 × 0.001) ÷ (0.9 × 0.001 + 0.05 × 0.999) = 0.0009 ÷ 0.05085 ≈ 0.0177
That’s an approximately 1.77% chance that the stranger is psychic based on current evidence. In other words, despite the comparatively high chances that a psychic stranger could detect your personal information while a non-psychic stranger couldn’t, the extremely low prior chance of any stranger being psychic means that even in these unusual circumstances, it’s quite unlikely you’re dealing with a psychic.
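To make the arithmetic easy to check, here’s a minimal Python sketch of this calculation; the function name bayes_update is ours for illustration, not something Silver provides.

```python
def bayes_update(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B): the updated probability of A after observing evidence B."""
    numerator = p_b_given_a * prior_a                  # P(B|A) x P(A)
    p_b = numerator + p_b_given_not_a * (1 - prior_a)  # P(B): the chance of observing B at all
    return numerator / p_b

# The psychic example: prior = 0.001, P(B|A) = 0.9, P(B|not A) = 0.05
print(bayes_update(0.001, 0.9, 0.05))  # ~0.0177, or about a 1.77% chance
```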
(Shortform note: In Superforecasting, Tetlock sums all this math up in plain language by explaining that Bayesian thinkers form new beliefs that are a product of their old beliefs and new evidence. He also borrows Kahneman’s description of this process as taking an “outside view,” because when you start from base probability rates (such as the odds that any random stranger is psychic), you put yourself outside of your specific situation and you’re less likely to be unduly swayed by details that feel compelling but have only limited statistical significance (such as the stranger’s unlikely guesses about your personal information).)
Principle #2: Update Your Estimates
Silver further argues that Bayes’ Theorem highlights the importance of updating your estimates in light of new evidence. To do so, simply perform a new calculation whenever you encounter new facts and take the results of the previous calculation as your starting point. That way, your estimates build on each other and, in theory, gradually bring you closer to the truth.
For example, imagine that after guessing your name and birth date, the stranger proceeds to read your thoughts and respond to what you’re thinking before you say anything. Perhaps you’d once again set the chances of a psychic doing so (P(B|A)) at 90%, versus 5% for a non-psychic (P(B|not A))—allowing for the possibility that the stranger is simply a very lucky guesser. But this time, instead of setting the prior chance of the stranger being psychic (P(A)) at one in 1,000, you’d set it at your previously calculated value of 0.017699—after all, this isn’t any random stranger; this is a random stranger who already successfully guessed your name and birth date. Given this prior evidence and the new evidence of possible mind-reading, the new calculation yields:

P(A|B) = (0.9 × 0.017699) ÷ (0.9 × 0.017699 + 0.05 × 0.982301) ≈ 0.2449
Now you estimate an approximately 24.49% chance that you’re dealing with a psychic. In non-mathematical terms, you updated your previously low estimate of psychic likelihood to account for further evidence of potential psychic ability—and logically enough, even an unlikely conclusion becomes likelier as more evidence accumulates in its favor.
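Continuing the sketch above (with the same illustrative bayes_update function), updating is just a matter of feeding each result back in as the next prior:

```python
prior = 0.001                            # initial estimate: 1 in 1,000 strangers is psychic
prior = bayes_update(prior, 0.9, 0.05)   # after the name-and-birthday guess: ~0.0177
prior = bayes_update(prior, 0.9, 0.05)   # after the apparent mind reading: ~0.2449
print(prior)                             # roughly a 24.49% chance the stranger is psychic
```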
(Shortform note: In other words, one of the benefits of Bayesian thinking is that it accounts for your prior assumptions while also letting you know when those assumptions might be wrong. In Smarter Faster Better, Charles Duhigg explains how Annie Duke (Thinking in Bets) applied this kind of thinking during her professional poker career. Duke could often size up opponents at a glance by observing, for example, that 40-year-old businessmen often played recklessly. But to keep her edge, she had to stay alert to new information—such as a 40-year-old businessman who plays cautiously and rarely bluffs. Otherwise, her initial assumptions might lead her to bad decisions.)
Bayesian Lesson #1: Consider All Possibilities
Now that we’ve explored the mathematics of Bayes’ Theorem, let’s look at the broader implications of its underlying logic. First, building on the principle of considering prior probabilities, Silver argues that it’s important to be open to a wide range of possibilities, especially when you’re dealing with noisy data. Otherwise, you might develop blind spots that hinder your ability to predict accurately.
To illustrate this point, Silver argues that the US military’s failure to predict the Japanese attack on Pearl Harbor in 1941 shows that it’s dangerous to commit too strongly to a specific theory when there’s scant evidence for any particular theory. He explains that in the weeks before the attack, the US military noticed a sudden drop-off in intercepted radio traffic from the Japanese fleet. According to Silver, most analysts concluded that the fleet had gone silent because it was out of range of US military installations—they didn’t consider the possibility of an impending attack because they believed the main threat to the US Navy was domestic sabotage by Japanese Americans.
(Shortform note: More recent historical analysis challenges Silver’s account of US intelligence ahead of the Pearl Harbor attack. According to the National Archives, the US suspected a Japanese attack was imminent, but they misidentified the likely target. That’s because Japanese radio traffic in October and November suggested a concentration of forces in the Marshall Islands, which the US assumed meant that Japan would target the Philippines, not Hawaii. They even ordered a reconnaissance mission to gather more information about the suspected attack. As it happened, the mission was to leave from Pearl Harbor, and the aircraft chosen for the mission was still awaiting preparations when it was destroyed in the Japanese attack.)
Silver explains that one reason we sometimes fail to see all the options is that it’s common to mistake a lack of precedent for a lack of possibility. In other words, when an event is extremely uncommon or unlikely, we might conclude that it will never happen, even when logic and evidence dictate that, given enough time, it will. For example, Silver points out that before the attack on Pearl Harbor, the most recent foreign attack on US territory had come in the early 19th century—a fact that made it easy to forget that such an attack was even possible.
How to Estimate Prior Probabilities More Accurately
Silver’s Pearl Harbor example also points out the importance of generating the best possible prior estimate: Even though Bayesian logic will in theory lead you to the correct answer eventually, on a practical level, you might not find the evidence you need to get there until it’s too late. Therefore, the closer your prior estimate is to the truth, the better off you’ll be.
In Algorithms to Live By, Brian Christian and Tom Griffiths offer one way to come up with better prior estimates: Base your predictions on the likely distribution of the event in question. For example, if the event typically follows a bell curve distribution (in which most values cluster around a central average), you should start with the average and then adjust. So if you’re trying to predict a student’s grade in a class, you should start with the average grade (a C) and then adjust based on available evidence such as the student’s study habits and prior grades.
Basing your prior estimates on distributions also gives you a better chance of avoiding the precedent-possibility conflation Silver warns against. For example, if you’re trying to guess a stranger’s net worth, knowing that wealth follows a power law distribution (in which most values cluster at one extreme, with a few values at the opposite extreme) will help you remember that even if you’ve never met a billionaire before, it’s possible (though unlikely) that the stranger is one. In fact, as Christian and Griffiths explain, you’ll need to rely more heavily than usual on the available evidence when dealing with power law distributions—so if the stranger drives up in a million-dollar supercar, revise your initial estimate quickly.
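As a rough, self-contained illustration of this difference (the numbers here, such as an average score of 75 and a Pareto shape of 1.5, are our own assumptions rather than anything from Christian and Griffiths), a quick simulation shows why the average is a good starting prior for a bell curve but a poor one for a power law:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bell-curve case (e.g., exam scores): values cluster around the average,
# so the average is a sensible prior guess for any one student.
scores = rng.normal(loc=75, scale=10, size=100_000)
print(round(scores.mean(), 1), np.percentile(scores, [5, 95]).round(1))

# Power-law case (e.g., net worth): the median is modest, the mean is pulled
# up by a long tail, and the maximum is enormous, so a single strong piece of
# evidence (the million-dollar supercar) should move your estimate a lot.
wealth = (rng.pareto(a=1.5, size=100_000) + 1) * 50_000
print(round(np.median(wealth)), round(wealth.mean()), round(wealth.max()))
```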
Bayesian Lesson #2: Follow the Evidence, Not Emotions or Trends
In addition to emphasizing the importance of considering all possibilities, Bayesian logic also highlights the need to stay focused on the evidence rather than getting sidetracked by other factors such as your emotional responses or the trends you observe.
For one thing, Silver argues that when you give in to strong emotions, the quality of your predictions suffers. He gives the example of poker players going on tilt—that is, losing their cool due to a run of bad luck or some other stressor (such as fatigue or another player’s behavior). Poker depends on being able to accurately predict what cards your opponent might have—but Silver argues that when players are on tilt, they begin taking ill-considered risks (such as betting big on a weak hand) based more on anger and frustration than on solid predictions.
(Shortform note: To help keep your emotions in check, remember that getting a prediction wrong (such as by losing to an unlikely poker hand) doesn’t mean you did anything wrong—it’s not uncommon for the best possible predictions to fail simply due to luck. In fact, in Fooled By Randomness, Nassim Nicholas Taleb argues that luck is more important than skill in determining outcomes. He says that this is especially true because in many situations, early success leads to future success, such as when a startup experiences some early good fortune that allows it to survive long enough to benefit from more good fortune down the road.)
Furthermore, Silver says, it’s important not to be swayed by trends because psychological factors such as herd mentality distort behaviors in unpredictable ways. For instance, the stock market sometimes experiences bubbles that artificially inflate prices because of a runaway feedback loop: Investors see prices going up and want to jump on a hot market, so they buy, which drives prices up even further and convinces more investors to do the same—even when there’s no rational reason for prices to be spiking in the first place.
(Shortform note: The authors of Noise explain that these kinds of feedback loops result from a psychological phenomenon known as an information cascade in which real or perceived popularity influences how people interpret information. Moreover, the authors argue that information cascades often amount to little more than luck—whichever idea receives initial support (which typically correlates with which idea is presented first) tends to be the idea that wins out regardless of its merits.)
Part 4: Two Challenges for Today’s Forecasters
Although Silver maintains that his suggestions can improve the quality of our predictions, he cautions that it’s more difficult than ever to translate better prediction theory into better predictions in practice. We’ve already discussed one reason: Contemporary technology forces us to wade through more noise to find meaningful signal. But Silver also argues that since the late 2010s, unprecedented political and social fragmentation has made prediction even harder. In this section, we’ll discuss how this fragmentation has complicated forecasters’ jobs by decreasing the diversity of thought and eroding public trust in expert advice.
Challenge #1: Increasingly Insular Groups
One problem for contemporary forecasters, according to Silver, is that today’s news and social media encourage people to sort themselves into like-minded subgroups, which harms predictions by encouraging herd mentality and quashing opposing views. According to Silver, the best predictions often come when we combine diverse, independent viewpoints in order to consider a problem from all angles. Conversely, when you only listen to people who think the way you do, you’re likely to simply reinforce what you already believe—which isn’t a recipe for good predictions.
(Shortform note: In Think Again, Adam Grant suggests countering this insularity through reconsideration—a process of deliberately questioning and challenging your beliefs. According to Grant, one of the keys to reconsideration is to actively seek out opposing views and genuinely attempt to understand them. Doing so, he says, makes your understanding of the world more complex and helps you avoid the kinds of stereotypes that so easily develop in the “us versus them” atmosphere of the like-minded groups Silver describes.)
Challenge #2: The Erosion of Trust
Moreover, Silver argues, the same factors that have led to more insular, polarized groups have also led many people to dismiss opinions that come from outside of the group’s narrow consensus. One effect of this trend is that people trust public institutions and expert opinions less than ever before, which creates a climate in which people tend to dismiss accurate—and urgent—predictions, sometimes without clear reasons. He gives the example of the 2020 outbreak of Covid-19, when a significant number of people dismissed expert predictions about the virus’s likely spread and impact and, as a result, ignored or pushed back against the recommended public health protocols.
(Shortform note: The reasons that people ignore expert opinions may go deeper than recent political polarization. For example, some studies suggest that inherent distrust of experts isn’t necessarily linked to belonging to a specific ideological group as Silver suggests. Nor is it a new problem: Historian Richard Hofstadter wrote about ambivalence toward expertise in US culture as early as 1963. Moreover, it’s possible that humans are simply prone to doing the opposite of what they’re told—behavioral scientists point out that we’re susceptible to a type of behavior called reactance, which involves acting contrarily when we perceive threats to our individual freedom—such as when we’re asked to wear masks or avoid public gatherings.)