How to Control AI: The 3 Steps to Take to Save Humanity

What can we do to make sure a superintelligent AI doesn’t destroy humanity? How can we control AI?

In principle, one option would be never to develop general AI in the first place. However, in Superintelligence, Nick Bostrom argues that someone will develop it eventually anyway, and that the best way to prepare is to learn how to control it and give it ethical goals.

Read on to learn how to control AI so it doesn’t relegate humans to miserable living conditions.

How to Manage the Rise of Superhuman Intelligence

Bostrom points out that a superintelligent AI could also be very good for humanity if it helped us instead of wiping us out. The superintelligent AI might be able to develop solutions to problems that humans have thus far been unable to solve, like reining in climate change, colonizing outer space, and bringing about world peace. Thus, rather than opposing AI research, Bostrom advocates a three-pronged approach to making sure it’s beneficial: Impose limits on the superintelligent AI, give it good objectives, and manage the development schedule to make sure the right measures are in place before AI achieves superintelligence. We’ll discuss each of these approaches to controlling AI in turn.

(Shortform note: Bostrom’s plan to use AI to solve humanity’s problems could be considered a creative way of implementing Stephen Hawking’s mandate for increased scientific literacy. In Brief Answers to the Big Questions, Hawking argues that the survival of humankind will increasingly depend on solving scientific problems. For example, he believes we must colonize outer space because it’s inevitable that sooner or later Earth will experience another mass extinction event. Creating one or more AIs that are better at solving scientific problems than humans might provide a shortcut to developing such solutions.)

1. Imposing Limits on a Superhuman AI

Bostrom cautions that a superintelligent AI would eventually be able to circumvent any controls or limitations that humans placed upon it. However, that doesn’t mean imposing limits is a waste of time. It just means we need to understand the risks associated with different kinds of limits that might be used.

Physical Containment

One safety measure is simply to develop AI on a computer of limited power that is physically isolated so it can’t connect to the internet. In principle, this could allow us to study superintelligent AIs in isolation until we understand how to deploy them safely in other contexts.

However, in practice, this is still risky. It might be difficult to assess just how intelligent an AI under study has become. A superintelligent AI would probably realize it was being held in confinement and figure out what kind of behavior the human researchers were trying to induce. It might then feign docile or dumb behavior to put them at ease, or find other ways to manipulate them into giving it access to additional hardware. Between manipulating humans and finding novel ways to use the hardware at its disposal, a sufficiently intelligent AI could eventually circumvent physical containment measures.

Manual Review

Bostrom notes that many people have suggested simply programming the AI to ask for permission from its human operators before it does anything. Instead of allowing it to make improvements to its own programming at any time, the program could require human intervention to approve each new version. This would give programmers a chance to look over the code, giving them an idea of what additional capabilities the new version would have and allowing them to suspend the AI’s development at any stage. Similarly, the AI’s programming could require human intervention before any action that could affect the outside world.
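To make this concrete, here’s a rough Python sketch of what such an approval gate might look like. Every function name and detail here is an illustrative assumption, not anything from Bostrom’s book; the point is simply that no new version of the AI’s code takes effect without explicit human sign-off.

# Illustrative sketch only: a human-approval gate between an AI's proposed
# self-modification and its actual deployment. All names are hypothetical.

def propose_new_version(current_version: str) -> str:
    # Stand-in for the AI generating an improved version of its own code.
    return current_version + "\n# ... proposed improvements ..."

def human_review(proposed_code: str) -> bool:
    # A human operator inspects the proposed change and explicitly approves or rejects it.
    print("Proposed change:")
    print(proposed_code)
    return input("Approve this version? (y/n): ").strip().lower() == "y"

def upgrade_loop(version: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        candidate = propose_new_version(version)
        if human_review(candidate):   # nothing is applied without sign-off
            version = candidate
        else:
            break                     # development is suspended at this stage
    return version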

However, as the AI’s intelligence advanced beyond the human level, eventually human programmers wouldn’t be able to understand the code it proposed well enough to accurately assess what new capabilities and risks it would add. 

Reward and Punishment Signals

Another option that Bostrom discusses is to program the AI to respond to rewards and punishments. You could build a computer system with a reward button and a punishment button and program the AI to minimize the number of punishment signals it receives and maximize the number of reward signals. This would be easier to program than trying to translate “just do whatever your operators want you to do” into computer code, and it would achieve the same result.
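As a toy illustration of how such a system might be wired up (again, an assumption for illustration, not code from the book), the agent’s entire objective could be the running tally of reward-button presses minus punishment-button presses:

# Toy sketch: an agent whose only objective is its reward/punishment tally.
# Hypothetical names; not an actual training algorithm.

class ButtonTrainedAgent:
    def __init__(self):
        self.score = 0          # the only thing the agent "cares" about

    def receive_reward(self):
        self.score += 1

    def receive_punishment(self):
        self.score -= 1

    def choose_action(self, candidate_actions, predicted_score_change):
        # The agent picks whichever action it predicts will raise its score most --
        # including, in principle, actions like seizing control of the buttons.
        return max(candidate_actions, key=predicted_score_change)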

The risk, Bostrom explains, is that the AI might eventually circumvent the system. For example, maybe it builds a robot to push the reward button constantly and finds a way to keep humans out of the building so the punishment button cannot be pressed.

And even if the system worked correctly, giving the human operators full control over the AI would create another risk: As we’ve discussed, a superintelligent AI would be immensely powerful, and human operators might be tempted to abuse that power.

Simultaneous Development

Finally, Bostrom explains that it might be possible to synchronize multiple AI development projects so that when AI becomes superintelligent, there would be many independent superintelligent AIs, all of comparable intelligence and capabilities. They would then keep each other’s power in check, much the way human societies constrain individual power.

However, Bostrom cautions that limiting the power of individual superintelligent AIs doesn’t guarantee that any of them will act in the best interests of humankind. Nor does this approach completely eliminate the potential for a single superintelligent AI to take control of the world, because one might eventually achieve dominance over the others. 

2. Imparting the Right Imperatives

According to Bostrom, making sure every superintelligent AI has good ultimate motives may be the most important part of AI development. This is because, as we’ve discussed, other control measures are only temporary. Ultimately the superintelligent AI’s own motives will be the only thing that constrains its behavior. Bostrom discusses a number of approaches to programming good motives.

Hard-Coded Commandments

As Bostrom remarks, one approach is to hard-code a set of imperatives that constrain the AI’s behavior. However, he expects that this is not practicable. Human legal codes illustrate the challenges of concretely defining the distinction between acceptable and unacceptable behavior: Even the best legal codes have loopholes, can be misinterpreted or misapplied, and require occasional changes. To write a comprehensive code of conduct for a superintelligent AI that would be universally applicable for all time would be a monumental task, and probably an impossible one.
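To see why a fixed rule list is so brittle, here’s a deliberately simple Python sketch of a hard-coded action filter. The rules and names are hypothetical assumptions for illustration, but the loophole problem Bostrom describes shows up immediately: any action the rule authors didn’t anticipate, or that’s merely phrased differently, slips straight through.

# Illustrative sketch of a hard-coded rule filter (hypothetical rules and names).

FORBIDDEN = [
    "harm a human",
    "disable your off switch",
    "acquire more hardware",
]

def action_is_allowed(action_description: str) -> bool:
    # Reject any action whose description matches a forbidden rule.
    text = action_description.lower()
    return not any(rule in text for rule in FORBIDDEN)

# "Persuade an operator to install extra servers" contains none of the
# forbidden phrases, so the filter waves it through -- a loophole.
print(action_is_allowed("persuade an operator to install extra servers"))  # True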

Existing Motives

Another approach that Bostrom discusses is to create a superintelligent AI by increasing the intelligence of an entity that already has good motives, rather than trying to program them from scratch. This approach might be an option if superintelligent AI is achieved by the method of brain simulation: Choose a person with exemplary character and scan her brain to create the original model, then run the simulation on a supercomputer that allows it to think much faster than a biological brain. 

However, Bostrom points out that there is a risk that nuances of character, like a person’s code of ethics, might not be faithfully preserved in the simulation. Furthermore, even a faithful simulation of someone with good moral character might be tempted to abuse the powers of a superintelligent AI. 

Discoverable Ethics

Bostrom concludes that the best method of endowing a superintelligent AI with good motives will likely be to give it criteria for figuring out what is right and letting it set its own goals. After all, a superintelligent AI would be able to figure out what humans want from it and program itself accordingly better than human programmers could. This approach would also make the superintelligent AI behave somewhat more cautiously, because it would always have some uncertainty about its ultimate goals.
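As a toy illustration of how goal uncertainty can produce cautious behavior, consider the following Python sketch, in which the agent holds several candidate interpretations of “what is right” and defers to humans whenever those interpretations disagree. This is purely an illustrative assumption on our part; Bostrom describes the general idea, not this algorithm, and every name and number below is made up.

# Toy sketch: an agent uncertain which candidate objective is the "right" one.

def decide(action, objectives, caution_threshold=0.8):
    # Score the action under each candidate objective (each returns 0..1).
    scores = [objective(action) for objective in objectives]
    # Act only if every plausible interpretation of "what is right" approves;
    # otherwise ask the operators rather than gamble on one interpretation.
    if min(scores) >= caution_threshold:
        return "act"
    return "defer_to_humans"

# Stand-ins for learned criteria:
objectives = [
    lambda a: 1.0 if "help" in a else 0.2,
    lambda a: 0.9 if "ask first" in a else 0.5,
]
print(decide("help with research, ask first", objectives))  # "act"
print(decide("reconfigure the power grid", objectives))     # "defer_to_humans"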

However, Bostrom also notes that (at least as of 2014) no one had developed a rigorous algorithm for this approach, so there’s a risk that this method might not be feasible in practice. And even if we assume that the basic programming problem will eventually be solved, deciding what criteria to give the AI is still a non-trivial problem. 

For one thing, if the AI focuses on what its original programmers want, it would prioritize the desires of a few people over all others. It would be more equitable to have it figure out what everyone wants and generally take no action on issues that people disagree about. But for any given course of action, there’s probably somebody who has a dissenting opinion, so where should the AI draw the line? 

Then there’s the problem of humans’ own conflicting desires. For example, maybe one of the programmers on the project is trying to quit smoking. At some level, she wants a cigarette, but she wouldn’t want the AI to pick up on that craving and start smuggling her cigarettes while she’s trying to kick the habit.

Bostrom describes two possible solutions to this problem. One is to program the AI to account for this: Instead of just figuring out what humans want, have it figure out what humans would want if they were more like the people they want to be. The other is to program the AI to figure out and pursue what is morally right instead of what people want, per se.

But both solutions entail some risks. Even what people want to want might not be what’s best for them, and even what’s morally best in an abstract sense might not be what they want. Moreover, humans have yet to unanimously agree on a definition or model of morality.

3. Managing the Development Schedule

As we mentioned earlier, Bostrom believes superintelligent AI will probably be developed eventually, regardless of how hard we try to prevent it. However, he also points out an important caveat: How fast artificial intelligence advances depends largely on how much research effort we devote to it. Thus, Bostrom advises stepping up the pace of research into methods of controlling highly intelligent AIs and programming them to pursue wholesome goals, while reducing our focus on developing advanced AI itself.

This is because the ultimate outcome of developing superintelligent AI depends largely on the order in which certain technological breakthroughs are made. If rigorous safeguards are developed before AIs become superintelligent, there’s a good chance the development of superintelligent AI will be beneficial for humankind. But if it’s the other way around, the consequences could be disastrous, as we’ve discussed.

(Shortform note: It can be difficult to regulate innovation projects as Bostrom recommends—encouraging some aspects of innovation while discouraging others—because it’s hard to predict how first-of-a-kind work will go. However, in 101 Design Methods, Vijay Kumar asserts that you’ll have more success managing these projects if you promote a free flow of ideas throughout your organization. This is because innovation is inherently multidisciplinary, and departments from marketing to finance to engineering need each other’s expertise to create an effective product. Further, when you have a variety of different perspectives, higher-level executives in charge of overseeing the project’s direction can more effectively steer its strategy.)

