Regulation of AI: 4 Ways to Impose Limits on Superintelligence

This article is an excerpt from the Shortform book guide to "Superintelligence" by Nick Bostrom. Shortform has the world's best summaries and analyses of books you should be reading.

Like this article? Sign up for a free trial here.

What type of regulation should be put on AI? Is it a waste of time to put limitations on AI?

In Superintelligence, Nick Bostrom cautions that a superintelligent AI would eventually be able to circumvent any controls or limitations that humans placed upon it. However, that doesn’t mean imposing limits is a waste of time.

Here’s how to conduct proper regulation of AI.

1. Physical Containment

One regulation of AI is simply to develop AI on a computer of limited power that is physically isolated so it can’t connect to the internet. In principle, this could allow us to study superintelligent AIs in isolation until we understand how to deploy them safely in other contexts.

However, in practice, this is still risky. It might be difficult to assess just how intelligent an AI under study has become. A superintelligent AI would probably realize it was being held in confinement and figure out what kind of behavior the human researchers were trying to induce. It might then feign docile or dumb behavior to put them at ease, or find other ways to manipulate them into giving it access to additional hardware. Between manipulating humans and finding novel ways to use the hardware at its disposal, a sufficiently intelligent AI could eventually circumvent physical containment measures.

Applying Physical Containment

Based on Bostrom’s description, to make physical containment work, we need a way to accurately assess an AI’s capabilities and motives before it evolves enough to circumvent the containment measures. This is problematic because, despite the progress in AI over the last decade, scientists have yet to develop a reliable method for measuring the intelligence of AI.

Many tests have been proposed. The most famous is the Turing test, which relies on human intuition to discern between a human and a machine. Other tests attempt to measure reasoning capability based on the ability to complete graphical puzzles or infer implied meanings from sentences. But so far, all these tests leave something to be desired—in many cases, computer programs can beat humans at the tests even though it seems intuitively clear that the algorithms don’t have anything close to human-level intelligence.

Part of the problem is that most of the intelligence tests scientists have devised to date are well-documented in scientific journals, and LLMs incorporate essentially everything ever written into their training data. Thus, testing the AI is like giving a test to a student who has memorized the answer key: She can give the right answers even if she has no understanding of the material they’re supposed to test.

2. Manual Review

Bostrom notes that many people have suggested simply programming the AI to ask for permission from its human operators before it does anything. Instead of allowing it to make improvements to its own programming any time, the program could require human intervention to approve each new version. This would give programmers a chance to look over the code, giving them an idea of what additional capabilities the new version would have and allowing them to suspend the AI’s development at any stage. Similarly, the AI’s programming could require human intervention before any action that could affect the outside world.

However, as the AI’s intelligence advanced beyond the human level, eventually human programmers wouldn’t be able to understand the code it proposed well enough to accurately assess what new capabilities and risks it would add.

Applying Manual Review

Even before an AI becomes appreciably more intelligent than its human designers, manual review would likely have to be combined with another control, such as physical containment, in order to provide an effective safeguard. This is because, as Peter Thiel, notes, AI development—like all other R&D and first-of-a-kind projects—involves its share of unknown unknowns and unanticipated results.

If the AI proposes a novel change to its code, the full effect of the change may not become apparent until the code is actually compiled and executed. If it could be evaluated safely in containment, this testing could be part of the “review” process. But without such additional controls in place, testing could be extremely dangerous, given the potentially destructive power of AIs that we discussed in the previous section.

3. Reward and Punishment Signals

Another option that Bostrom discusses is to program the AI to respond to rewards and punishments. You could build a computer system with a reward button and a punishment button and program the AI to minimize the number of punishment signals it receives and maximize the number of reward signals. This would be easier to program than trying to translate “just do whatever your operators want you to do” into computer code, and it would achieve the same result.

The risk, Bostrom explains, is that the AI might eventually circumvent the system. For example, maybe it builds a robot to push the reward button constantly and finds a way to keep humans out of the building so the punishment button cannot be pressed.

And if it worked correctly, giving the human operators full control over the AI, that would create another risk: As we’ve discussed, a superintelligent AI would be immensely powerful. Human operators might be tempted to abuse that power.

Applying Rewards and Punishments

In Carrots and Sticks Don’t Work, Paul Marciano argues that traditional reward-and-punishment systems are outdated and are no longer effective in the modern workplace. Leaders once relied, fairly successfully, on corporal punishment to control manual laborers (many of whom were slaves or criminals) or on rewards to motivate factory workers. But as the nature of work has become more mentally intensive, workers’ needs and values have evolved to the point where a different approach is needed.

It may be worth considering whether AI’s motives could similarly evolve such that traditional rewards and punishments would no longer be effective methods of control. Marciano’s approach to management (which is based on building employee trust through supportive feedback, recognition, and empowerment) wouldn’t necessarily work on AI, since AI might not develop the same values as a human thought worker. But perhaps programmers could take a conceptually similar approach of adapting rewards and punishments as the AI advanced.

Again, this approach to control would likely have to be combined with physical containment, so that researchers could study the AI enough to learn how to manage it effectively before turning it loose on the world. If it could be done effectively, this might provide a solution to the risk Bostrom describes of the AI finding ways to game the reward-and-punishment system.

4. Simultaneous Development

Finally, Bostrom explains it might be possible to synchronize multiple AI development projects so that when AI becomes superintelligent, there would be many independent superintelligent AIs, all of comparable intelligence and capabilities. They would then keep each other’s power in check, much the way human societies constrain individual power.

However, Bostrom cautions that limiting the power of individual superintelligent AIs doesn’t guarantee that any of them will act in the best interests of humankind. Nor does this approach completely eliminate the potential for a single superintelligent AI to take control of the world, because one might eventually achieve dominance over the others.

Applying Simultaneous Development

As Bostrom notes, simultaneous development controls wouldn’t give humans control of AIs, per se. But if reward-and-punishment controls (or other methods) proved effective for giving human operators control of superintelligent AIs, simultaneous development controls could be used to mitigate the risk of human operators abusing the superintelligent AI’s powers.

Each team of human operators would naturally direct their AI to act in their own best interests, and different teams would act to check and balance each others’ power. If there were enough teams with AIs of equal power to faithfully represent everyone’s interests, then the AIs would only be used to further humanity’s mutual best interests.

However, since this approach depends both on synchronizing the development of superintelligent AIs and on maintaining human control of them, it might end up being a fragile balance of power, and one that would probably only work temporarily.

Regulation of AI: 4 Ways to Impose Limits on Superintelligence