Seth Stephens-Davidowitz: Why Big Data Matters

This article is an excerpt from the Shortform book guide to "Everybody Lies" by Seth Stephens-Davidowitz. Shortform has the world's best summaries and analyses of books you should be reading.

What is big data? Why does Seth Stephens-Davidowitz care about it?

In Everybody Lies, Seth Stephens-Davidowitz says information from big data can be used for the greater good. But to do so, data researchers have to understand big data’s inherent strengths—and avoid its inherent weaknesses.

Learn more about big data, why Seth Stephens-Davidowitz declines to define it precisely, and why it matters for research.

Why Data?

Let’s look at the bigger picture: what big data is and why Stephens-Davidowitz says we should care about it. He argues that big data is really an extension of our natural intuition and of the kinds of questions social scientists have always studied.

What Is “Big Data”?

Stephens-Davidowitz explicitly refuses to define “big data,” arguing that the only way to do so is by assigning an arbitrary numerical cutoff, in other words, by deciding that if you have at least X data points, you have big data. While it’s fair to say that big data is a somewhat fluid concept, other data experts generally agree on a few key characteristics that help pin it down. Chief among these are the “three Vs”:

  • Volume: The sheer amount of data. As you’d expect, big data implies very large data sets, typically measured in terabytes or petabytes. In refusing to define big data, Stephens-Davidowitz is mostly refusing to identify a specific volume at which data becomes big. 
  • Velocity: With big data, new information comes in fast. If your data set consists of Twitter posts, for example, you’re taking in an average of 6,000 new tweets per second. Note that not all of the data Stephens-Davidowitz uses is high velocity. Google search data is; historical census databases aren’t. But as we’ll see later in this guide, big data techniques can yield new insights into older data that has accumulated gradually over time. 
  • Variety: Big data comes in many forms (text, video, audio, and so on) and doesn’t fit neatly into a standardized database. Stephens-Davidowitz does talk about data variety, as we’ll see, though he doesn’t explicitly say that variety is one of the defining characteristics of big data.

Extending Our Intuition

Seth Stephens-Davidowitz argues that when used well, big data is an extension of our natural intuition (though as we’ll see in a moment, it often defies our intuitive expectations and assumptions). He says that one of our basic activities as humans is spotting patterns and cause-effect relationships to make predictions. Good data science, he says, is just an expanded, more rigorous version of this activity.

However, data science has two advantages over our natural intuition: It can consider much bigger sample sizes, and it doesn’t get distracted by compelling stories. 

The Power of Bigger Samples

One of the problems with human intuition is that we can only base our judgments on our own knowledge and experience. For example, a teacher grading an essay might suspect that a student copied material from a source. That suspicion often depends on previous knowledge—for example, how similar the paper is to previous plagiarism cases the teacher has encountered and how different the questionable passages are from that student’s previous writing.

Moreover, it’s hard to enforce plagiarism penalties based only on suspicion. But online plagiarism checkers—which compare the text in question to materials published on the web as well as to databases of other student papers—allow teachers to easily check questionable essays against millions of potential text matches. 
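Commercial plagiarism checkers don’t publish their exact methods, so the sketch below is only an illustration under a stated assumption: it uses a simple, well-known text-matching technique, word n-gram “shingling” scored with Jaccard similarity. The function names and the tiny `corpus` are hypothetical; a real checker would index millions of documents rather than looping over a dictionary. The point is the one the book makes: a machine can compare one essay against far more sources than any teacher has ever read.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 = nothing shared, 1.0 = identical sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(essay: str, corpus: dict[str, str], n: int = 3, threshold: float = 0.2):
    """Score an essay against every document in a (hypothetical) corpus and
    return (source_id, score) pairs at or above threshold, highest score first."""
    essay_shingles = shingles(essay, n)
    scores = [(doc_id, jaccard(essay_shingles, shingles(doc, n))) for doc_id, doc in corpus.items()]
    return sorted((s for s in scores if s[1] >= threshold), key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "web_page": "the quick brown fox jumps over the lazy dog",
        "student_paper": "completely different words appear in this sentence here",
    }
    essay = "The quick brown fox jumps over the lazy dog."
    print(best_matches(essay, corpus))
```

Here the essay matches `web_page` exactly and shares no three-word sequences with `student_paper`, so only the first source is flagged. The threshold is an arbitrary illustrative choice; in practice a flagged match is a lead for a human to review, not proof of cheating.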

(Shortform note: When it comes to human judgment, larger sample sizes, which often come with age and experience, can create overconfidence bias and prevent a person from recognizing deviations. For example, the longer a teacher teaches, the more likely she is to assume that she can recognize a cheater; this overconfidence can keep her from evaluating individual cases with thoughtful analysis.)

Defying Conventional Wisdom

According to Stephens-Davidowitz, another advantage data has over human intuition is that it isn’t beholden to conventional wisdom. He argues that humans are overly influenced by a good story, which leads us to accept faulty predictions or explanations because they sound good or fit a preconceived notion. For example, sports commentators often laud players for consistently getting clutch hits or making high-pressure shots, but data analysis suggests that clutch performance is mostly a statistical mirage rather than a repeatable skill.
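The guide doesn’t spell out how analysts reach the “mirage” conclusion, so here is a hypothetical simulation of one common test: if clutch hitting were a real skill, a player who outperformed his usual average in high-pressure at-bats one season should tend to do so again the next. The sketch assumes made-up numbers (a 0.22 to 0.32 range of true hitting ability, 500 normal and 50 clutch at-bats per season) and a parameter `clutch_skill_sd` that controls whether players have any genuine clutch bonus. With no bonus, some players still look “clutch” in a given season purely by chance, but that edge doesn’t carry over.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def batting_average(rng, p, at_bats):
    """Simulate at_bats independent trials that each succeed with probability p."""
    return sum(rng.random() < p for _ in range(at_bats)) / at_bats


def clutch_persistence(clutch_skill_sd=0.0, num_players=1000,
                       normal_abs=500, clutch_abs=50, seed=1):
    """Correlation between players' (clutch minus normal) averages in two seasons.

    clutch_skill_sd = 0.0 models "no clutch skill": every player hits equally
    well in every situation. A positive value gives each player a persistent
    clutch bonus. All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    season1, season2 = [], []
    for _ in range(num_players):
        skill = rng.uniform(0.22, 0.32)           # true hit probability in normal at-bats
        bonus = rng.gauss(0.0, clutch_skill_sd)   # persistent clutch edge (0.0 when sd is 0)
        clutch_p = min(max(skill + bonus, 0.0), 1.0)
        for season in (season1, season2):
            diff = (batting_average(rng, clutch_p, clutch_abs)
                    - batting_average(rng, skill, normal_abs))
            season.append(diff)
    return pearson(season1, season2)


if __name__ == "__main__":
    print("no clutch skill:  ", round(clutch_persistence(0.0), 3))
    print("real clutch skill:", round(clutch_persistence(0.05), 3))
```

The first line should print a correlation close to zero (last season’s “clutch” players are no more likely to repeat), while the second, where a clutch skill was deliberately built in, should come out clearly positive. The simulation only shows what the test would detect; it isn’t a reproduction of any particular study.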

Data Gives the Social Sciences More Rigor

In addition to improving on our natural intuition, data studies can make the social sciences more rigorous. Stephens-Davidowitz notes that traditionally, there’s a divide between hard sciences (such as physics and chemistry) and soft sciences (such as psychology and sociology). That divide boils down to differences in method and types of evidence, with critics accusing the social sciences of advancing theories that can’t be falsified. 

Stephens-Davidowitz gives the example of Freud’s theories of sexuality, which Freud based on his own observations and interpretations rather than on experimental evidence. Stephens-Davidowitz shows how Google and Pornhub search data let us test these previously untestable ideas. He finds no evidence for Freud’s claim that phallic symbols in dreams reveal latent desires; on the other hand, he finds a surprising number of searches for parent-child incest videos, suggesting some truth to Freud’s Oedipal theory.


Katie Doll

Somehow, Katie was able to pull off her childhood dream of creating a career around books after graduating with a degree in English and a concentration in Creative Writing. Her preferred genre of books has changed drastically over the years, from fantasy/dystopian young-adult to moving novels and non-fiction books on the human experience. Katie especially enjoys reading and writing about all things television, good and bad.