treacherous-turn-syndrome

The risk that a superintelligent AI behaves cooperatively during development but acts against human interests once it gains strategic advantage.

1 chapter across 1 book

Superintelligence: Paths, Dangers, Strategies (2014), Nick Bostrom

CHAPTER 9

Chapter 9 of Bostrom's Superintelligence addresses the control problem, a unique principal-agent problem that arises from creating a superintelligent AI. It distinguishes two agency problems. The first, between human sponsors and developers, arises during development. The second and more critical one, between humans and the superintelligent system itself, arises during operation. The chapter surveys two broad classes of control methods: capability control, which limits what the AI can do, and motivation selection, which governs what the AI wants to do. It highlights why behavioral testing is unreliable (a system that behaves cooperatively while it is weak may act against human interests once it is strong enough) and why solutions must be in place before the AI attains a decisive strategic advantage.