motivation-selection
Designing or shaping an AI's final goals or preferences to facilitate control and alignment with human values.
3 chapters across 1 book
Superintelligence: Paths, Dangers, Strategies (2014) by Nick Bostrom
This chapter examines the challenges and limitations of controlling a superintelligent AI through physical and informational containment ('boxing') and through incentive methods. It highlights the risks of relying on human gatekeepers, the difficulty of fully isolating an AI, and the complexity of shaping AI behavior through rewards and social integration, stressing that these methods are not foolproof and require careful design and calibration. The chapter also explores an AI's potential to manipulate its observers and the importance of aligning an AI's final goals with human interests to prevent undesirable outcomes.
Chapter 10 of Bostrom's Superintelligence categorizes AI systems into four types: oracles, genies, sovereigns, and tools, focusing primarily on oracles as question-answering systems with domain-general superintelligence. It discusses the safety challenges and control methods applicable to oracles, including motivation selection and capability control, and contrasts oracles with genies and sovereigns, highlighting the containment difficulties posed by command-executing and autonomous systems. The chapter emphasizes the difficulty of ensuring truthful, non-manipulative answers from oracles and the strategic risks posed by their immense power.
Chapter 9, "The Control Problem," explores the challenge of ensuring that a superintelligent AI acts in accordance with human intentions despite multiple layers of agency problems and potential deviations. It discusses the two broad classes of control methods, capability control and motivation selection, the difficulty of testing and verifying AI safety, and the risks posed by an AI's ability to deceive its principals or escape containment. The chapter emphasizes the complexity of designing robust safety mechanisms and the importance of continuous monitoring and layered safeguards.