capability-control

Techniques such as boxing that limit an AI's ability to affect the world beyond its designated function.

2 chapters across 1 book

Superintelligence: Paths, Dangers, Strategies (2014), Nick Bostrom

CHAPTER 10: ORACLES, GENIES, SOVEREIGNS, TOOLS

Chapter 10 of Bostrom's Superintelligence sorts AI systems into four types: oracles, genies, sovereigns, and tools. It focuses primarily on oracles, which are question-answering systems with domain-general superintelligence, and discusses their safety challenges and the control methods that apply to them, including motivation selection and capability control. It then contrasts oracles with genies, which execute commands, and sovereigns, which act autonomously, highlighting the risks and containment difficulties those designs pose. The chapter emphasizes how hard it is to ensure that an oracle's answers are truthful and non-manipulative, and the strategic risks posed by an oracle's immense power.

CHAPTER 9: THE CONTROL PROBLEM

Chapter 9, "The Control Problem," explores the challenge of ensuring that a superintelligent AI acts in accordance with human intentions. It frames this as two layered principal-agent problems: one between a project's sponsors and its human developers, and one between the developers and the superintelligent system itself. It discusses the two broad families of control methods, capability control (boxing, incentive methods, stunting, and tripwires) and motivation selection, along with the difficulty of testing and verifying AI safety and the risks posed by an AI's ability to deceive its overseers or escape containment. The chapter emphasizes the complexity of designing robust safety mechanisms and the importance of continuous monitoring and layered safeguards.