Aggregation Statistics: The Power of Collective Data

This article is an excerpt from the Shortform book guide to "Superforecasting" by Philip E. Tetlock.

What is the function of aggregation statistics? How does aggregating data from multiple sources influence forecast accuracy?

Aggregation statistics combine data from multiple sources. Aggregate numbers are often used in forecasting to obtain a more accurate prediction.

Read on to learn how aggregation statistics relate to forecasting.

What Are Aggregation Statistics?

In the context of forecasting, aggregation statistics are a powerful tool, because the aggregated collective judgment of a group of people is usually more accurate than the judgment of an average member of the group.

Even if some individual judgments are more accurate than the aggregate number, the collective judgment is still more likely to be accurate than if you were to select a group member at random and use their individual answer. This phenomenon is often called “the wisdom of crowds,” named for the 2004 book that popularized the idea.
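
A little arithmetic makes the comparison concrete. The sketch below is only an illustration, not something taken from the book: the guesses and the true weight are made-up numbers, and `crowd_vs_typical_member` is a hypothetical helper name. It compares the error of the group's average guess with the error you would expect, on average, from a randomly chosen member.

```python
from statistics import fmean


def crowd_vs_typical_member(guesses, truth):
    """Return (error of the crowd average, average error of individual guessers)."""
    crowd_error = abs(fmean(guesses) - truth)
    typical_member_error = fmean(abs(g - truth) for g in guesses)
    return crowd_error, typical_member_error


# Hypothetical guesses (in pounds) for an object that truly weighs 1,200 lb.
guesses = [1100, 1250, 1300, 1199, 1131]
crowd_error, typical_member_error = crowd_vs_typical_member(guesses, truth=1200)

print(crowd_error)           # 4.0
print(typical_member_error)  # 64.0
```

In this made-up data, one guesser (1,199 lb) beats the crowd's average, but a member picked at random is off by 64 lb on average, against 4 lb for the crowd. Measured by absolute error, the crowd average can never do worse than the average member, because guesses on opposite sides of the truth partly offset each other.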

(A classic example of the wisdom of crowds effect comes from a 1906 country fair, where British scientist Sir Francis Galton observed a crowd of people attempt to guess the weight of an ox. The average of all their guesses was only one pound different from the true answer.)

The reason this works is that when a group of people faces a single problem, each person has a small piece of the puzzle. In the country fair example, the crowd may have included a butcher, a farmer, and someone who had been part of the same guessing game at last year’s fair. Each person contributed a piece of information—adding those pieces together gave a much more accurate picture of the situation than any individual person in the crowd possessed.

This effect holds even when some members of the crowd are outright wrong. In the fair example, there was only one right answer but countless possible wrong ones. The few people with valid information would make similar predictions clustered near the true weight, while people who guessed randomly were equally likely to guess too high or too low. In a big enough group, those random errors would largely cancel each other out, and the group average would still be roughly accurate.

However, the makeup of the crowd is important: If no one has any valid information, the average answer won’t be magically accurate. The more experts in the crowd, the more accurate the collective judgment is likely to be.
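
The cancellation argument, and its limit, can be sketched with a small simulation. Everything here is an assumption chosen for illustration: the function name `simulate_crowd`, the true weight of 1,200 lb, the tight spread for informed guessers, the wide uniform spread for uninformed ones, and the optional `bias` that pushes uninformed guesses in one direction.

```python
import random
from statistics import fmean


def simulate_crowd(truth, n_informed, n_random, bias=0.0, seed=0):
    """Simulate one crowd and return the absolute error of its average guess.

    All distributions here are illustrative assumptions, not data from the fair.
    """
    rng = random.Random(seed)
    # Informed guessers cluster tightly around the true value.
    informed = [rng.gauss(truth, 20) for _ in range(n_informed)]
    # Uninformed guessers are spread widely and symmetrically around truth + bias.
    uninformed = [
        rng.uniform(truth + bias - 500, truth + bias + 500)
        for _ in range(n_random)
    ]
    return abs(fmean(informed + uninformed) - truth)


# Random errors on both sides of the truth largely cancel in a large crowd.
balanced_error = simulate_crowd(truth=1200, n_informed=20, n_random=10_000)

# With no informed guessers and every guess leaning the same way,
# averaging cannot remove the shared bias.
biased_error = simulate_crowd(truth=1200, n_informed=0, n_random=10_000, bias=300)

print(f"balanced crowd error: {balanced_error:.1f}")
print(f"biased crowd error:   {biased_error:.1f}")
```

Under these assumptions, the balanced crowd's average should land within a few pounds of the true value, while the biased crowd's average stays roughly 300 lb off however many people are added. Averaging removes errors that point in different directions, not errors that everyone shares.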

For even more accurate results, it’s also possible to aggregate the aggregates. In practice, this means surveying a large group of people to get an average answer, repeating the process with new groups of people, and then taking the average of those averages. The resulting answer is likely to be far more accurate than any individual judgment.
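
The procedure described above can be sketched in a few lines. This is a minimal illustration with made-up survey answers; `average_of_averages` is a hypothetical helper, not a method named in the book.

```python
from statistics import fmean


def average_of_averages(groups):
    """Average each group's answers, then average those group averages."""
    group_averages = [fmean(group) for group in groups]
    return fmean(group_averages)


# Hypothetical survey answers from two separate groups.
groups = [
    [10, 20, 30],  # group average: 20
    [40, 50],      # group average: 45
]

pooled = fmean(x for group in groups for x in group)

print(average_of_averages(groups))  # 32.5
print(pooled)                       # 30.0
```

Because each group counts equally, the average of averages (32.5) differs from the pooled average of all answers (30.0) whenever the groups differ in size. Weighting each group average by its group's size recovers the pooled figure.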
