Language Processing in the Brain: How We Know What We Hear

This article is an excerpt from the Shortform book guide to "The Language Instinct" by Steven Pinker. Shortform has the world's best summaries and analyses of books you should be reading.


When you listen to someone, how does your brain find meaning in what they’re saying? How does your memory play a role?

In The Language Instinct, experimental psychologist Steven Pinker explores many facets of human language. This includes language processing in the brain, which is quite a remarkable undertaking.

Continue reading to learn how our brains process speech.

Language Processing in the Brain

In his exploration of language processing in the brain, Pinker points out that there’s no distinct gap between each word when we speak. So, as someone interprets speech, their brain is constantly parsing the audio input, separating it into discrete words, and processing the meaning of words based on memory and context.

Pinker explains that, to process speech, humans not only sort out the individual words but also parse the words into noun phrases, verb phrases, and prepositional phrases. We logically link the phrases, use our short-term memory to keep track of multiple phrases, and interpret the most likely meaning of each word as we go along. 

For example, in the sentence, “The proportion of advanced students in my science class increased this year,” the brain has to recognize that the phrases “of advanced students” and “in my science class” are both nested inside the noun phrase headed by “proportion,” which serves as the subject of the sentence. Then, when we finally get to the verb phrase “increased this year,” we have to remember that it refers to “proportion” from the beginning of the sentence. 
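(Shortform note: To make the nesting concrete, here's an illustrative sketch, not from Pinker's book, of the sentence's phrase structure written as a small tree of nested phrases. The labels S, NP, VP, and PP are standard linguistic abbreviations for sentence, noun phrase, verb phrase, and prepositional phrase; the tree shape and the helper function are our own illustration.)

```python
# A toy phrase-structure tree for the example sentence, written as
# nested (label, children) tuples. This is an illustration of the
# nesting a listener has to track, not a real parser.
tree = (
    "S",
    ("NP",
        ("N", "proportion"),             # head noun of the subject
        ("PP", "of advanced students"),  # nested inside the subject NP
        ("PP", "in my science class"),   # also nested inside the subject NP
    ),
    ("VP", "increased this year"),       # refers back to "proportion"
)

def head_noun(node):
    """Walk the tree and return the first head noun (N) found."""
    label, *children = node
    if label == "N":
        return children[0]
    for child in children:
        if isinstance(child, tuple):
            found = head_noun(child)
            if found:
                return found
    return None

print(head_noun(tree))  # -> proportion
```

Notice that both prepositional phrases sit *inside* the subject noun phrase, which is why the verb phrase at the end must be linked all the way back to “proportion” rather than to the nearer nouns “students” or “class.”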

(Shortform note: Pinker uses English to explain the principles of language interpretation, but it’s unclear if the cognitive process of parsing different types of phrases applies to other languages with very different syntax. For example, as we mentioned in the previous section, many languages create entire sentences by adding morphemes onto a single word, so they don’t have different phrases to group together.)

Combining Word-By-Word Interpretation and Cultural Nuance

Understanding sentences is partly a modular process because we group words into phrases, but we also interpret the most likely meaning of each word as we go along. Sometimes, if we initially interpret the wrong meaning of a word, we have to backtrack and try interpreting the sentence again with a different plausible word meaning.

For example, when people read the sentence “The man who whistles tunes pianos,” they likely assume that “tunes” is a noun (the thing that the man whistles) since “whistles tunes” is a common word combination. In this case, the word “pianos” afterward doesn’t make any sense. Re-reading it, they can then see that “who whistles” describes the man, and he performs the action of tuning pianos. These ambiguous sentence structures that tend to lead you to a false interpretation are called “garden path” sentences.

Pinker points out that, in addition to quickly choosing a word meaning based on the context, people rely on subtext, humor, sarcasm, and metaphor to understand what other people are really saying. This is partly due to our desire to adhere to social norms, like being polite. For example, if someone says, “Do you think you could use your headphones to listen to music?”, this might be a polite way of saying, “Your music is bothering me.”

Processing Written and Spoken Language

Pinker’s examples of garden path sentences and contextual nuance demonstrate the differences in how spoken language is processed compared to written language. For example, the garden path phenomenon mainly applies to written text because speakers provide vocal cues about a sentence’s meaning by changing their tone and inflection: a quality called “prosody.”

Thus, written language relies partly on punctuation to help the reader visually group words, while spoken language relies on prosody and visual cues in the environment. For example, sarcasm and humor are easier to interpret when listeners can hear the speaker’s tone and observe their facial expressions. When listening to speech, people might also resolve a pronoun’s referent, or a vague word such as “that,” by following the speaker’s line of sight. Despite these differences, research suggests that both written and spoken language are mostly processed incrementally as we read or hear each word.

Pinker asserts that the combination of these skills—grouping types of phrases, identifying a word’s meaning based on the context, and incorporating cultural nuance—is what makes the human approach to interpreting language highly sophisticated and difficult to replicate. For example, artificial intelligence (AI) language models exclusively use word-by-word probability algorithms to interpret language, and they also lack the cultural knowledge that helps people deduce meaning. Pinker claims that without these uniquely human advantages, AI will never come close to interpreting language with the same accuracy as humans.

The Evolution of AI Language Models

Since the book was published, AI language models have advanced far beyond what Pinker imagined was possible. Older models—called probabilistic or n-gram models—relied on a relatively small amount of input data to create strings of text by identifying the word that’s most likely to follow the previous few words. These types of models had limited ability to understand context and implied subtext in sentences like, “Why do you think my wife left me?” In this case, the AI model wouldn’t understand the implication that the speaker’s wife ended their relationship, and it might interpret the sentence literally instead.
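(Shortform note: A minimal sketch of the bigram flavor of these older models appears below. The tiny corpus is invented for illustration; real n-gram models were trained on much larger text collections, and often conditioned on more than one preceding word.)

```python
from collections import Counter, defaultdict

# Invented toy corpus; real n-gram models used far more text.
corpus = "the man who whistles tunes pianos . the man whistles tunes .".split()

# Count how often each word follows each preceding word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(prev):
    """Return the word most likely to follow `prev` in the corpus."""
    return follows[prev].most_common(1)[0][0]

print(predict("whistles"))  # -> tunes
```

Note that a model like this predicts “tunes” after “whistles” no matter what the rest of the sentence says—the same local-probability shortcut that leads human readers down the garden path in “The man who whistles tunes pianos.”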

However, modern neural network-based language models are trained on vast amounts of data, allowing them to incorporate many layers of contextual information and interpret language more accurately. These advances have brought AI language models closer to the human level of language competence, but some argue that AI models will never match human social competence and creativity.

———End of Preview———

Like what you just read? Read the rest of the world's best book summary and analysis of Steven Pinker's "The Language Instinct" at Shortform.

Here's what you'll find in our full The Language Instinct summary:

  • How language is an innate ability—not an element of culture
  • A look at unique qualities of human language
  • How slang enhances a language, rather than diminishing it

Elizabeth Whitworth

Elizabeth has a lifelong love of books. She devours nonfiction, especially in the areas of history, theology, and philosophy. A switch to audiobooks has kindled her enjoyment of well-narrated fiction, particularly Victorian and early 20th-century works. She appreciates idea-driven books—and a classic murder mystery now and then. Elizabeth has a blog and is writing a book about the beginning and the end of suffering.
