
What are the main challenges in data science? How can mathematical models be dangerous?

A mathematical model is a simulation of a real-world process expressed in math. Three characteristics make models dangerous: they're opaque, they don't incorporate feedback, and they operate on a large scale.

Keep reading to learn more about the challenges in data science and math models.

1. Dangerous Models Are Opaque

According to Cathy O'Neil, one of the challenges in data science is that dangerous mathematical models don't make their methods known to the public. Often, the inner workings of these models are considered valuable proprietary information, and the companies that own them guard it closely. O'Neil argues that this lack of transparency makes it impossible for outsiders to critique a particular model, since they aren't permitted to know how it works.

Lack of transparency also makes it impossible for individuals to protest the way models judge them. For example, if a bank uses a mathematical model to deny you a loan and refuses to explain how the model reached that decision, you have no recourse except to look elsewhere.

(Shortform note: O’Neil’s argument about the lack of transparency in mathematical models echoes the ethos of the open source movement. This movement promotes the transparent production and distribution of software, giving anyone access to the source code and allowing them to spot and fix bugs. However, some argue that transparency alone isn’t enough to improve a product—without the proper context and expertise, users won’t be able to accurately interpret information. Similarly, making mathematical models transparent won’t guarantee that consumers will be able to improve them; they’ll also need to know how to make sense of what they’re seeing.)

By contrast, good mathematical models make their methods transparent, says O’Neil. By publicizing the data and methods they use to make their judgments, they open themselves to critique—if a method is obviously flawed or unfair, transparency allows outsiders to point out the flaw so it can be corrected.

Transparency is especially important in models that judge individuals, such as credit scores. Transparency allows consumers to understand the criteria their score is based on and the steps they can take to improve their score.

(Shortform note: While O’Neil argues that transparent mathematical models allow for flaws to be spotted and corrected, this doesn’t ensure that organizations will actually correct those flaws. The credit scoring system is an example—despite studies citing the ways in which the system is flawed and consumer complaints about inaccuracies, credit bureaus haven’t made significant changes. One reason may be that correcting flaws isn’t simple or straightforward and may have some unintended consequences. For example, using alternative data such as rent payments for credit scores may backfire on renters in the event of a recession—they may struggle to make their rent payments on time, which would then have a negative impact on their credit scores.)

2. Dangerous Models Don’t Incorporate Feedback

The next defining characteristic of dangerous mathematical models is a failure to incorporate feedback. According to O'Neil, good mathematical models constantly incorporate feedback based on the accuracy of their predictions. When a model's predictions are accurate, it remains unchanged; when errors are detected, it's adjusted. By incorporating feedback, good models slowly become more accurate over time, as their equations and algorithms are tweaked to avoid repeating old errors.

By contrast, dangerous models don't collect or incorporate feedback, so they remain oblivious to their own errors. If the original model includes bias, it will keep replicating that bias without its designers ever realizing what's happening.

For example, suppose a news channel in Stockholm debuts a new weather forecasting model that boldly predicts 90-degree highs in the middle of winter. If it’s a good model, it’ll be tweaked over the next week as each of its predictions misses the mark. However, a dangerous model would simply be allowed to continue incorrectly predicting unseasonable highs and lows, without incorporating the results of those predictions into its algorithms.
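To make the feedback loop concrete, here's a minimal sketch in Python. The forecast numbers, the learning rate, and the update rule (simple error correction) are all invented for illustration; this isn't O'Neil's method, just the general idea of a model that adjusts after each miss.

```python
def update_forecast(forecast: float, observed: float, learning_rate: float = 0.3) -> float:
    """Nudge the forecast toward the observed value, in proportion to the error."""
    error = observed - forecast               # how far the prediction missed
    return forecast + learning_rate * error   # correct a fraction of the error

forecast = 90.0                               # the model's unseasonable 90-degree prediction
winter_highs = [28.0, 31.0, 25.0, 30.0, 27.0] # hypothetical observed winter highs

for observed in winter_highs:
    print(f"predicted {forecast:5.1f}, observed {observed:5.1f}")
    forecast = update_forecast(forecast, observed)  # feedback: adjust after each miss

# A dangerous model skips the update step entirely: it keeps predicting 90.0
# no matter how many times reality contradicts it.
```

Each pass through the loop pulls the forecast closer to reality; deleting the update line is exactly the failure O'Neil describes.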

(Shortform note: The process of incorporating feedback is much like the Japanese concept of kaizen—continuous, incremental improvement—that’s typically applied to manufacturing. In The Toyota Way, Jeffrey K. Liker details how automotive company Toyota practices kaizen by constantly seeking to improve the manufacturing process: Workers stop immediately once they detect a problem, zero in on the problem, determine its root cause, and then fix it on the spot. By doing so, Toyota prevents other problems that may pile up as a result of the initial problem. Applied to mathematical models, kaizen can enable data scientists to detect errors in their early stages, address them, and prevent bigger problems from arising.)

3. Dangerous Models Operate on a Large Scale

The final characteristic of dangerous mathematical models is excessive scale. Good mathematical models are used at a reasonable scale, only in the contexts they were designed to simulate. When used within their designated contexts, good mathematical models often make accurate predictions, and when they fail, the harm they do is limited.

For example, imagine a mathematical model designed to predict your favorite color based on your shopping habits. Within a reasonable context, this model might present you with ads for clothing in your favorite color, which would be helpful to both you and the advertiser. And if the model got your favorite color wrong, the only consequence would be an ad for a blue dress instead of a red one.

By contrast, dangerous mathematical models are deployed at massive scale, often far beyond the contexts they were originally designed for. O’Neil says that when used in such broad contexts, even low rates of inaccuracy cause harm to many people, as even small fractions of massive numbers often represent sizable groups of people.

For example, imagine that researchers discover a slight correlation between favorite color and credit—people who prefer blue are found to be slightly more likely to pay their bills on time than those who prefer red. In response, banks start using the favorite color model to set interest rates for mortgages. Nationwide, people start paying different amounts for the same services, all because their favorite color supposedly corresponds to their reliability. As a result of being deployed at scale, far beyond its intended context, the favorite color model becomes dangerous. 
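The arithmetic behind O'Neil's scale argument is easy to check. Here's a quick sketch with assumed, hypothetical numbers (the error rate and applicant counts are made up):

```python
error_rate = 0.005  # assume just 0.5% of the model's judgments are wrong

# The same error rate harms vastly more people as deployment scales up.
for applicants in (1_000, 100_000, 10_000_000):
    misjudged = int(applicants * error_rate)
    print(f"{applicants:>10,} people scored -> {misjudged:>7,} misjudged")
```

At small scale, a 0.5% error rate misjudges a handful of people; deployed nationwide, the same rate misjudges tens of thousands.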

Making Mathematical Models Antifragile

Mathematical models that become harmful when deployed at scale are what Nassim Nicholas Taleb might call fragile systems: predictive models that fall apart under big changes such as scaling. In Antifragile, he writes that one way to minimize the harmful effects of models used at a massive scale is to perform an acceleration-of-harm test: Take the model and ask, "What if it's wrong?" Change key assumptions incrementally and assess how each change affects the results. If negative changes outpace positive ones, the model shouldn't be used at scale. In other words, stress-test a model by pushing its assumptions toward extreme or unexpected scenarios to see whether the model breaks down or leads to harmful outcomes.

For example, if a bank were to use mathematical models for loan approvals, it could test three assumptions: 1) the borrower’s income remains stable over time, 2) the economy remains stable and unemployment rates remain low, 3) credit scores accurately reflect a borrower’s creditworthiness. The bank would take each of these assumptions and ask “What if it’s wrong?” (For instance, what if the borrower’s income doesn’t remain stable over time?) By stress-testing each of these assumptions, the bank would be able to identify the vulnerabilities and limitations of the model and determine whether it’s a good model to use at scale. 
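As a rough illustration, here's how that stress test might look in code. This is a minimal sketch assuming an invented toy loss function and made-up shock sizes, not any real bank's model; the point is the pattern of perturbing one assumption incrementally and watching whether harm accelerates.

```python
def harm(income_drop: float) -> float:
    """Toy model: expected portfolio loss as borrower incomes fall.
    Chosen to be convex for illustration; real loss curves vary."""
    return 1000 * income_drop ** 2

# Perturb the "borrower income remains stable" assumption step by step.
baseline = harm(0.0)
for shock in (0.05, 0.10, 0.20, 0.40):
    print(f"income drop {shock:.0%}: extra loss {harm(shock) - baseline:,.0f}")

# Fragility check: if doubling the shock more than doubles the extra harm,
# negative changes outpace positive ones -- a red flag for use at scale.
small = harm(0.10) - baseline
big = harm(0.20) - baseline
if big > 2 * small:
    print("Harm accelerates with shock size: model looks fragile at scale")
```

The same loop can be repeated for each assumption the bank identified (economic stability, the reliability of credit scores), comparing how quickly losses grow as each assumption fails.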

