Everybody Lies, by Seth Stephens-Davidowitz, is about big data’s potential to revolutionize social science research. The book’s central premise is that people reveal more about themselves in their web searches than they would ever reveal in public or in a traditional survey. Stephens-Davidowitz argues that by harnessing search data and similar sources, scientists gain access to entirely new insights into issues like sexuality, racism, and health. He suggests that these insights can inform better social policies, improve institutions like education and health care, promote social equity, and bring hidden injustice to light.
Stephens-Davidowitz has a Ph.D. in economics and has worked as a...
Before we get into the specifics of how to use big data well, let’s look at the bigger picture of what big data is and why Stephens-Davidowitz says we should care about it. Though data science might seem arcane, Stephens-Davidowitz argues that it’s really an extension of our natural intuition and of the kinds of studies social scientists have always been interested in.
What Is “Big Data”?
Stephens-Davidowitz explicitly refuses to define “big data,” arguing that the only way to do so is by assigning an arbitrary numerical cutoff: in other words, by deciding that if you have at least X data points, you have big data. While it’s fair to say that big data is a somewhat fluid concept, many data experts agree on a few key characteristics that help pin it down. Chief among these characteristics are the “three Vs”:
- Volume: The sheer amount of data. As you’d expect, big data implies very large data sets, typically measured in terabytes or petabytes. In refusing to define big data, Stephens-Davidowitz is mostly refusing to identify a specific volume at which data becomes big.
Despite all of these potential advantages, Stephens-Davidowitz acknowledges that it’s easy to use big data ineffectively—for example, by obsessing over the sheer size of your dataset without thinking about what that data can actually do for you. To get the most out of big data, Stephens-Davidowitz says you should focus on its four main benefits: new types of information, unprecedented honesty, high resolution, and easy cause-effect analysis. In this section, we’ll explore each of these benefits in turn.
Stephens-Davidowitz says that one of the benefits of big data is that it opens our eyes to new types of information that weren’t previously available or that we might not previously have been able to study. (Shortform note: This benefit essentially expounds on big data’s variety—one of the three Vs mentioned earlier.)
There’s nothing new about using words as data. As Stephens-Davidowitz points out, linguists, social scientists, historians, and others have studied words and word usage for a long time. But big data dramatically enhances the type and volume of words researchers can study. (Shortform...
Even though Stephens-Davidowitz is openly enthusiastic about data studies, he’s aware that data has drawbacks and limitations and can lead to great harm if used unethically. In this section, we’ll look at some of the drawbacks and dangers Stephens-Davidowitz identifies and explore some cases where these dangers have come to pass since the book’s publication.
Stephens-Davidowitz warns that good data science isn’t just a matter of amassing a giant data set. When working with data, he says it’s important to keep data’s shortcomings in mind and not lose sight of the bigger picture.
Stephens-Davidowitz says that when a dataset is too detailed, it can lead to predictive errors. The problem, he says, is the curse of dimensionality—a phenomenon whereby the more details a dataset contains, the more likely it is to suggest false positives when you look for predictive correlations.
Stephens-Davidowitz gives the example of flipping coins to try to predict the stock market. Say you flip a coin every day, record whether it was heads or tails, and then record whether the stock market went up or down...
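The coin-flipping idea can be sketched as a small simulation. This is a minimal illustration under our own assumptions, not Stephens-Davidowitz’s exact setup. The market’s daily up/down moves are simulated as fair coin flips rather than taken from real data. Each extra “coin” stands in for one more variable in a highly detailed dataset. The function name `best_coin_vs_market` and the numbers of days are hypothetical choices made for illustration.

```python
import random


def best_coin_vs_market(num_coins, train_days=50, test_days=1000, seed=0):
    """Search many random coins for one that seems to 'predict' a random market.

    Assumption: the market is simulated as pure chance, not real stock data.
    Returns (best coin's hit rate on past days, same coin's hit rate on new days).
    """
    rng = random.Random(seed)

    def flips(n):
        # True = heads (for a coin) or "market went up" (for the market)
        return [rng.random() < 0.5 for _ in range(n)]

    def hit_rate(a, b):
        # Fraction of days on which the coin matched the market's direction
        return sum(x == y for x, y in zip(a, b)) / len(a)

    past_market = flips(train_days)
    future_market = flips(test_days)

    best_past, best_future = -1.0, None
    for _ in range(num_coins):
        past_coin = flips(train_days)   # the record we "mine" for a pattern
        future_coin = flips(test_days)  # the same coin's later, independent flips
        rate = hit_rate(past_coin, past_market)
        if rate > best_past:
            best_past = rate
            best_future = hit_rate(future_coin, future_market)
    return best_past, best_future


for n in (1, 10, 1000):
    print(n, best_coin_vs_market(n))
```

Because every coin in this sketch is pure chance, any coin that looks predictive on past days is a false positive. Searching more coins (more variables) can only improve the best past record, yet the winning coin’s hit rate on the new days stays near the 50% you’d expect from guessing.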
Stephens-Davidowitz says that good data studies are intuitive—they answer relevant and interesting questions by making the kinds of connections we make all the time. Let’s see how intuitive this kind of work is by thinking through a problem the way a data scientist might.
What is a question you’d like answered or a problem you’d like to solve that big data could help with?