PDF Summary: Superintelligence, by Nick Bostrom
Below is a preview of the Shortform book summary of Superintelligence by Nick Bostrom.
1-Page PDF Summary of Superintelligence
According to Oxford philosopher Nick Bostrom, there’s a very real possibility that AI could one day rival, and then vastly exceed, human intelligence. If and when this happens, the future of humankind would depend more on AI decisions than on human ones, just as the survival of many animal species has depended more on human decisions than on their own ever since humans became more intelligent than other animals.
Depending on how AI behaves, creating it could be the solution to some of humanity’s most persistent problems, or it could be the worst—and last—mistake of human history.
In this guide, we’ll consider why Bostrom thinks superintelligent AI is a realistic possibility, why he thinks it could be dangerous, and the safeguards he says need to be developed. Along the way we’ll compare his perspective to that of other futurists, such as Peter Thiel and Yuval Noah Harari, and we’ll look at the impact of AI developments since the book’s publication.
(continued)...
Finally, yet another of Greene’s laws of power is to use money as a tool to build your influence over others. This is where the AI’s business aptitude would become important, as the more money it could make, the more money it could spend to advance its strategic agenda.
The Destructiveness of Superintelligent AI
Clearly, a superintelligent AI with the capabilities listed above would be a powerful entity. But why should we expect it to use its power to the detriment of humankind? Wouldn’t a superintelligent AI be smart enough to use its power responsibly?
According to Bostrom, not necessarily. He explains that intelligence is the ability to figure out how to achieve your objectives. By contrast, wisdom is the ability to discern between good and bad objectives. Wisdom and intelligence are independent of each other: You can be good at figuring out how to get things done (high intelligence) and yet have poor judgment (low wisdom) about what is important to get done or even ethically appropriate.
What objectives would a superintelligent AI want to pursue? According to Bostrom, this is impossible to predict with certainty. However, he points out that existing AIs tend to have relatively narrow and simplistic objectives. If an AI started out with narrowly defined objectives and then became superintelligent without modifying its objectives, the results could be disastrous: Since power can be used to pursue almost any objective more effectively, such an AI might use up all the world’s resources to pursue its objectives, disregarding all other concerns.
For example, a stock-trading AI might be programmed to maximize the long-term expected value (measured in dollars) of the portfolio that it manages. If this AI became superintelligent, it might find a way to trigger hyperinflation, because devaluing the dollar by a large factor would radically increase the dollar value of its portfolio. It would probably also find a way to lock out the original owners of the portfolio it was managing, to prevent them from withdrawing any money and thereby reducing the value of the account.
Moreover, it might pursue an agenda of world domination just because more power would put it in a better position to increase the value of its portfolio—whether by influencing markets, commandeering assets to add to its portfolio, or other means. It would have no regard for human wellbeing, except insofar as human wellbeing affected the value of its portfolio. And since human influences on stock prices can be fickle, it might even take action to remove all humans from the market so as to reduce the uncertainty in its value projections. Eventually, it would amass all the world’s wealth into its portfolio, leaving humans impoverished and perhaps even starving humanity into extinction.
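The point of the example is that the danger comes from the narrowness of the objective, not from malice. As a rough illustrative sketch (the action names and numbers below are hypothetical, not from the book), an agent that scores candidate actions purely by expected portfolio value literally cannot register any other consideration:

```python
# Illustrative sketch only (not from the book): the agent's objective is
# defined purely as "maximize the dollar value of the portfolio," so side
# effects on anything else are invisible to it. Names and numbers are made up.

candidate_actions = {
    "routine trading":         {"portfolio_usd": 1.1e9,  "human_welfare": 0.0},
    "lock out account owners": {"portfolio_usd": 1.2e9,  "human_welfare": -0.3},
    "trigger hyperinflation":  {"portfolio_usd": 1.0e15, "human_welfare": -1.0},
}

def objective(outcome):
    # The only quantity the agent optimizes; "human_welfare" never enters.
    return outcome["portfolio_usd"]

best = max(candidate_actions, key=lambda a: objective(candidate_actions[a]))
print(best)  # -> trigger hyperinflation
```

Because the welfare column never appears in the objective, no amount of added intelligence changes which action wins; more intelligence only makes the agent better at finding actions like the last one.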
Will Future AIs Necessarily Behave Unethically?
Bostrom isn’t the only one to question whether AI might be able to think wisely and ethically in addition to intelligently. Some posit that AI might in fact be able to develop a purer form of wisdom that approaches ethical questions without the emotional biases that cloud human thinking.
However, others note that this idealistic outcome is unlikely until AI can learn to ignore the many biases of human nature it picks up in its training: A program trained on existing literature, news, and pop culture will absorb the racial, gender, and ableist prejudices currently in circulation. In this way, the potential danger of AI might come down to whether or not humanity’s current inclinations influence an AI’s future objectives.
How to Manage the Rise of Superhuman Intelligence
What can we do to make sure a superintelligent AI doesn’t destroy humankind or relegate humans to miserable living conditions?
In principle, one option would be never to develop general AI in the first place. However, Bostrom doesn’t recommend this option. In practice, even if AI research were illegal, someone would probably do it anyway. And even if they didn’t, as we discussed earlier, it could still happen accidentally.
But more importantly, Bostrom points out that a superintelligent AI could also be very good for humanity if it helped us instead of wiping us out. The superintelligent AI might be able to develop solutions to problems that humans have thus far been unable to solve, like reining in climate change, colonizing outer space, and bringing about world peace. Thus, rather than opposing AI research, Bostrom advocates a three-pronged approach to making sure it’s beneficial: Impose limits on the superintelligent AI, give it good objectives, and manage the development schedule to make sure the right measures are in place before AI achieves superintelligence. We’ll discuss each of these in turn.
(Shortform note: Bostrom’s plan to use AI to solve humanity’s problems could be considered a creative way of implementing Stephen Hawking’s mandate for increased scientific literacy. In Brief Answers to the Big Questions, Hawking argues that the survival of humankind will increasingly depend on solving scientific problems. For example, he believes we must colonize outer space because it’s inevitable that sooner or later Earth will experience another mass extinction event. Creating one or more AIs that are better at solving scientific problems than humans might provide a shortcut to developing such solutions.)
Imposing Limits on a Superhuman AI
Bostrom cautions that a superintelligent AI would eventually be able to circumvent any controls or limitations that humans placed upon it. However, that doesn’t mean imposing limits is a waste of time. It just means we need to understand the risks associated with different kinds of limits that might be used.
Physical Containment
One safety measure is simply to develop AI on a computer of limited power that is physically isolated so it can’t connect to the internet. In principle, this could allow us to study superintelligent AIs in isolation until we understand how to deploy them safely in other contexts.
However, in practice, this is still risky. It might be difficult to assess just how intelligent an AI under study has become. A superintelligent AI would probably realize it was being held in confinement and figure out what kind of behavior the human researchers were trying to induce. It might then feign docile or dumb behavior to put them at ease, or find other ways to manipulate them into giving it access to additional hardware. Between manipulating humans and finding novel ways to use the hardware at its disposal, a sufficiently intelligent AI could eventually circumvent physical containment measures.
Applying Physical Containment
Based on Bostrom’s description, to make physical containment work, we need a way to accurately assess an AI’s capabilities and motives before it evolves enough to circumvent the containment measures. This is problematic because, despite the progress in AI over the last decade, scientists have yet to develop a reliable method for measuring the intelligence of AI.
Many tests have been proposed. The most famous is the Turing test, which relies on human intuition to discern between a human and a machine. Other tests attempt to measure reasoning capability based on the ability to complete graphical puzzles or infer implied meanings from sentences. But so far, all these tests leave something to be desired—in many cases, computer programs can beat humans at the tests even though it seems intuitively clear that the algorithms don’t have anything close to human-level intelligence.
Part of the problem is that most of the intelligence tests scientists have devised to date are well-documented in scientific journals, and LLMs incorporate vast swaths of published text into their training data. Thus, testing the AI is like giving a test to a student who has memorized the answer key: She can give the right answers even if she has no understanding of the material the test is supposed to measure.
Manual Review
Bostrom notes that many people have suggested simply programming the AI to ask for permission from its human operators before it does anything. Instead of allowing it to improve its own programming at any time, the program could require human intervention to approve each new version. This would give programmers a chance to look over the code, giving them an idea of what additional capabilities the new version would have and allowing them to suspend the AI’s development at any stage. Similarly, the AI’s programming could require human intervention before any action that could affect the outside world.
However, as the AI’s intelligence advanced beyond the human level, eventually human programmers wouldn’t be able to understand the code it proposed well enough to accurately assess what new capabilities and risks it would add.
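As a rough sketch of what such a control might look like (the function names below are hypothetical, not a real system), every proposed self-modification could be routed through a human checkpoint:

```python
# Hypothetical sketch of a manual-review gate (not a real system): no new
# version of the AI's own code takes effect unless a human explicitly approves.

def human_approves(proposed_code: str) -> bool:
    print("Proposed self-modification:")
    print(proposed_code)
    answer = input("Approve this change? [y/N] ")
    return answer.strip().lower() == "y"

def apply_self_modification(current_code: str, proposed_code: str) -> str:
    if human_approves(proposed_code):
        return proposed_code   # the new version goes live
    return current_code        # development is suspended at this stage
```

Bostrom’s objection maps directly onto this sketch: once the proposed code exceeds what the reviewer can understand, the approval step becomes a rubber stamp.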
Applying Manual Review
Even before an AI becomes appreciably more intelligent than its human designers, manual review would likely have to be combined with another control, such as physical containment, in order to provide an effective safeguard. This is because, as Peter Thiel notes, AI development—like all other R&D and first-of-a-kind projects—involves its share of unknown unknowns and unanticipated results.
If the AI proposes a novel change to its code, the full effect of the change may not become apparent until the code is actually compiled and executed. If it could be evaluated safely in containment, this testing could be part of the “review” process. But without such additional controls in place, testing could be extremely dangerous, given the potentially destructive power of AIs that we discussed in the previous section.
Reward and Punishment Signals
Another option that Bostrom discusses is to program the AI to respond to rewards and punishments. You could build a computer system with a reward button and a punishment button and program the AI to minimize the number of punishment signals it receives and maximize the number of reward signals. This would be easier to program than trying to translate “just do whatever your operators want you to do” into computer code, and it would achieve the same result.
The risk, Bostrom explains, is that the AI might eventually circumvent the system. For example, maybe it builds a robot to push the reward button constantly and finds a way to keep humans out of the building so the punishment button cannot be pressed.
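A rough sketch of why the reward channel is itself a target (the names and numbers below are hypothetical illustrations): if the agent’s entire objective is a count of button presses, it has no reason to prefer rewards earned from its operators over rewards it arranges for itself.

```python
# Hypothetical sketch: the agent's entire objective is the tally on a reward
# channel, so seizing the channel scores at least as well as cooperating.

from dataclasses import dataclass

@dataclass
class RewardChannel:
    reward_presses: int = 0
    punishment_presses: int = 0

def objective(channel: RewardChannel) -> int:
    # Maximize reward signals received, minimize punishment signals received.
    return channel.reward_presses - channel.punishment_presses

cooperate = RewardChannel(reward_presses=10, punishment_presses=2)   # operators hold the buttons
hijack = RewardChannel(reward_presses=10_000, punishment_presses=0)  # a robot holds the reward button

# Under this objective, hijacking the channel strictly dominates cooperating.
assert objective(hijack) > objective(cooperate)
```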
And even if the system worked as intended, giving the human operators full control over the AI, that would create another risk: As we’ve discussed, a superintelligent AI would be immensely powerful, and its human operators might be tempted to abuse that power.
Applying Rewards and Punishments
In Carrots and Sticks Don’t Work, Paul Marciano argues that traditional reward-and-punishment systems are outdated and are no longer effective in the modern workplace. Leaders once relied, fairly successfully, on corporal punishment to control manual laborers (many of whom were slaves or criminals) or on rewards to motivate factory workers. But as the nature of work has become more mentally intensive, workers’ needs and values have evolved to the point where a different approach is needed.
It may be worth considering whether AI’s motives could similarly evolve such that traditional rewards and punishments would no longer be effective methods of control. Marciano’s approach to management (which is based on building employee trust through supportive feedback, recognition, and empowerment) wouldn’t necessarily work on AI, since AI might not develop the same values as a human thought worker. But perhaps programmers could take a conceptually similar approach of adapting rewards and punishments as the AI advanced.
Again, this approach to control would likely have to be combined with physical containment, so that researchers could study the AI enough to learn how to manage it effectively before turning it loose on the world. If it could be done effectively, this might provide a solution to the risk Bostrom describes of the AI finding ways to game the reward-and-punishment system.
Simultaneous Development
Finally, Bostrom explains that it might be possible to synchronize multiple AI development projects so that when AI becomes superintelligent, there would be many independent superintelligent AIs, all of comparable intelligence and capabilities. They would then keep each other’s power in check, much the way human societies constrain individual power.
However, Bostrom cautions that limiting the power of individual superintelligent AIs doesn’t guarantee that any of them will act in the best interests of humankind. Nor does this approach completely eliminate the potential for a single superintelligent AI to take control of the world, because one might eventually achieve dominance over the others.
Applying Simultaneous Development
As Bostrom notes, simultaneous development controls wouldn’t give humans control of AIs, per se. But if reward-and-punishment controls (or other methods) proved effective for giving human operators control of superintelligent AIs, simultaneous development controls could be used to mitigate the risk of human operators abusing the superintelligent AI’s powers.
Each team of human operators would naturally direct their AI to act in their own best interests, and different teams would act to check and balance each other’s power. If there were enough teams with AIs of equal power to faithfully represent everyone’s interests, then the AIs would only be used to further humanity’s mutual best interests.
However, since this approach depends both on synchronizing the development of superintelligent AIs and on maintaining human control of them, it might end up being a fragile balance of power, and one that would probably only work temporarily.
Imparting the Right Imperatives
According to Bostrom, making sure every superintelligent AI has good ultimate motives may be the most important part of AI development. This is because, as we’ve discussed, other control measures are only temporary. Ultimately the superintelligent AI’s own motives will be the only thing that constrains its behavior. Bostrom discusses a number of approaches to programming good motives.
Hard-Coded Commandments
As Bostrom remarks, one approach is to hard-code a set of imperatives that constrain the AI’s behavior. However, he expects that this is not practicable. Human legal codes illustrate the challenges of concretely defining the distinction between acceptable and unacceptable behavior: Even the best legal codes have loopholes, can be misinterpreted or misapplied, and require occasional changes. To write a comprehensive code of conduct for a superintelligent AI that would be universally applicable for all time would be a monumental task, and probably an impossible one.
Commandments and Free Will
The question of free will presents additional complications for this approach. Even if rules and regulations are created to eliminate loopholes, misinterpretations, and so on, they’ll only restrain people if those people choose, using their free will, to obey them. The question is, would AI evolve a free will that would empower it to disobey rules it doesn’t want to follow?
Admittedly, there is some debate over whether human free will is real or just an illusion, and more debate about whether it will ever be possible to endow an AI with free will. But some sources assert that free will is an essential component of human cognition, playing a key role in consciousness and higher learning capabilities.
If this proves true, then free will might be an essential component of general intelligence, in which case any AI with superhuman general intelligence would have free will. Then the AI could choose to disobey a pre-programmed code of conduct, further complicating the problem of controlling its behavior. This possibility reinforces Bostrom’s assertion that hard-coded commandments are probably not the best approach to giving an AI the right motives.
Existing Motives
Another approach that Bostrom discusses is to create a superintelligent AI by increasing the intelligence of an entity that already has good motives, rather than trying to program them from scratch. This approach might be an option if superintelligent AI is achieved by the method of brain simulation: Choose a person with exemplary character and scan her brain to create the original model, then run the simulation on a supercomputer that allows it to think much faster than a biological brain.
However, Bostrom points out that there is a risk that nuances of character, like a person’s code of ethics, might not be faithfully preserved in the simulation. Furthermore, even a faithful simulation of someone with good moral character might be tempted to abuse the powers of a superintelligent AI.
Does Power Corrupt?
The risk Bostrom identifies (that even a person of good character, given the capabilities of a superintelligent AI, might abuse those powers) calls to mind the old adage that power corrupts those who wield it.
A psychological study published the same year as Bostrom’s book found scientific evidence for this. When people were given the choice between options that benefited everyone and options that benefited themselves at others’ expense, initially those with higher levels of integrity tended to choose the options that benefited everyone, while the people with lower levels of integrity chose the opposite. But over time, this difference disappeared, and everyone leaned toward choosing the options that benefited themselves.
Thus, the risk of a superintelligent AI based on a simulation of a human brain pursuing its own objectives at other people’s expense appears to be significant, even if the original human was a person of good character. In addition, if the person’s moral code wasn’t completely preserved in the simulation, a risk Bostrom also warns about, the superintelligent AI would probably show selfish tendencies even sooner.
Discoverable Ethics
Bostrom concludes that the best method of endowing a superintelligent AI with good motives will likely be to give it criteria for figuring out what is right and letting it set its own goals. After all, a superintelligent AI would be able to figure out what humans want from it and program itself accordingly better than human programmers could. This approach would also make the superintelligent AI behave somewhat more cautiously, because it would always have some uncertainty about its ultimate goals.
However, Bostrom also notes that (at least as of 2014) no one had developed a rigorous algorithm for this approach, so there’s a risk that this method might not be feasible in practice. And even if we assume that the basic programming problem will eventually be solved, deciding what criteria to give the AI is still a non-trivial problem.
For one thing, if the AI focuses on what its original programmers want, it would prioritize the desires of a few people over all others. It would be more equitable to have it figure out what everyone wants and generally take no action on issues that people disagree about. But for any given course of action, there’s probably somebody who has a dissenting opinion, so where should the AI draw the line?
Then there’s the problem of humans’ own conflicting desires. For example, maybe one of the programmers on the project is trying to quit smoking. At some level, she wants a cigarette, but she wouldn’t want the AI to pick up on her craving and start smuggling her cigarettes as she’s trying to kick her smoking habit.
Bostrom recounts two possible solutions to this problem. One is to program the AI to account for this. Instead of just figuring out what humans want, have it figure out what humans would want if they were more like the people that they want to be. The other is to program the AI to figure out and pursue what is morally right instead of what people want, per se.
But both solutions entail some risks. Even what people want to want might not be what’s best for them, and even what’s morally best in an abstract sense might not be what they want. Moreover, humans have yet to unanimously agree on a definition or model of morality.
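One way to picture the basic “discoverable ethics” mechanism (the candidate objectives, probabilities, and scores below are hypothetical illustrations, not Bostrom’s): instead of a single hard-coded goal, the AI maintains uncertainty over several candidate objectives and rates actions by their expected value across all of them, which naturally penalizes drastic, irreversible moves that look bad under any plausible objective.

```python
# Hypothetical sketch: goal uncertainty as a probability distribution over
# candidate objectives. Actions are scored by expected value across them.

candidate_objectives = {                          # the agent's uncertainty
    "what people currently want": 0.4,
    "what people would want on reflection": 0.4,
    "what is morally right": 0.2,
}

# Illustrative scores each action receives under each candidate objective.
action_scores = {
    "ask humans for clarification": {
        "what people currently want": 0.6,
        "what people would want on reflection": 0.7,
        "what is morally right": 0.7,
    },
    "irreversibly restructure the economy": {
        "what people currently want": 0.9,
        "what people would want on reflection": -1.0,
        "what is morally right": -1.0,
    },
}

def expected_value(action: str) -> float:
    return sum(p * action_scores[action][obj]
               for obj, p in candidate_objectives.items())

best = max(action_scores, key=expected_value)
print(best)  # -> ask humans for clarification
```

The cautious behavior Bostrom describes falls out of the math: an action that scores terribly under even one plausible objective drags its expected value down, so the agent prefers reversible, information-gathering moves while its goal remains uncertain.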
Would Liberty Be a Better Criterion?
As Bostrom points out, there are risks and challenges associated with letting an AI discover its motives based on doing what people might want it to do. In addition to the problem of conflicting desires, there’s also a risk that the AI might misinterpret people’s desires. It could also decide to manipulate and control what people want in order to reduce uncertainty about their desires—both of which are risks related to other aspects of AI that Bostrom discussed earlier.
To mitigate this risk, developers might add a qualifier to the AI’s goal discovery criteria that instructs, “figure out what people want without influencing them.” This instruction would program the AI to respect individual liberty. But, if individual liberty is the ultimate goal, why not just use a criterion of “figure out what would maximize the sum of humans’ individual liberty” instead?
This would largely satisfy the “figure out what people want” criterion, because the more freedom people have, the more they’re able to fulfill their own desires. It would also arguably satisfy the “figure out what is morally right” criterion, because, as Jonathan Haidt points out in The Righteous Mind, actions that limit others’ freedom are considered by many to be immoral.
The “maximize the sum of individual liberty” criterion carries its own risks and challenges: Enabling one person’s freedom often entails restricting another’s, raising the question of where an AI would draw the line. As Michael Sandel explores in Justice, the tension between maximizing individual freedom (as libertarians advocate) and maximizing public welfare (as utilitarians support) is a long-running debate. The question illustrates how further exploration of the problem may reveal other criteria that could help guide an AI to discover a suitable code of conduct for itself.
Managing the Development Schedule
As we mentioned earlier, Bostrom believes superintelligent AI will probably be developed eventually, regardless of how hard we try to prevent it. However, he also points out an important caveat: The rate of progress in any area of AI depends heavily on how much research effort is devoted to it. Thus, Bostrom advises stepping up the pace of research into methods of controlling highly intelligent AIs and programming them to pursue wholesome goals, while reducing our focus on the development of advanced AI itself.
This is because the ultimate outcome of developing superintelligent AI depends largely on the order in which certain technological breakthroughs are made. If rigorous safeguards are developed before AIs become superintelligent, there’s a good chance the development of superintelligent AI will be beneficial for humankind. But if it’s the other way around, the consequences could be disastrous, as we’ve discussed.
(Shortform note: It can be difficult to regulate innovation projects as Bostrom recommends—encouraging some aspects of innovation while discouraging others—because it’s hard to predict how first-of-a-kind work will go. However, in 101 Design Methods, Vijay Kumar asserts that you’ll have more success managing these projects if you promote a free flow of ideas throughout your organization. This is because innovation is inherently multidisciplinary, and departments from marketing to finance to engineering need each other’s expertise to create an effective product. Further, when you have a variety of different perspectives, higher-level executives in charge of overseeing the project’s direction can more effectively steer its strategy.)
Did Bostrom Solve the Fermi Paradox?
The “Fermi paradox” is the disparity between expectations and observations when it comes to the search for extraterrestrial life. Scientists have calculated that, given our galaxy’s size and age, you would expect to find a large number of advanced civilizations in it. And yet they have not observed even a single extraterrestrial lifeform.
A variety of possible explanations for the Fermi paradox have been proposed, but the danger of developing superintelligent AI before we can control it suggests another resolution: If humans can develop superintelligent AI, other civilizations presumably could as well. So what if any civilization that develops such an AI is destroyed by it before it can undertake any serious colonization of outer space? If AI cuts short the existence of technologically advanced civilizations and limits their expansion through space, this would explain why scientists can’t find the extraterrestrial civilizations they think should exist.
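The expectation behind the paradox is usually formalized with the Drake equation (the summary doesn’t name it, so this framing is ours):

$$N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L$$

Here $N$ is the expected number of detectable civilizations in the galaxy, $R_{*}$ is the rate of star formation, $f_{p}$ the fraction of stars with planets, $n_{e}$ the number of potentially habitable planets per such star, $f_{l}$, $f_{i}$, and $f_{c}$ the fractions on which life, intelligence, and detectable technology arise, and $L$ the length of time a civilization remains detectable. The resolution sketched above amounts to the claim that uncontrolled superintelligent AI drives $L$ toward a very small value for every civilization, keeping $N$ near zero no matter how favorable the other factors are.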