principal-agent-problem
A situation in which one party (the principal) delegates work to another (the agent), creating potential conflicts of interest; here applied to both human-human and human-AI relationships.
1 chapter across 1 book
Superintelligence: Paths, Dangers, Strategies (2014), Nick Bostrom
Chapter 9 of Bostrom's Superintelligence addresses the control problem, a distinctive principal-agent challenge that arises from creating a superintelligent AI. It distinguishes two agency problems: one between human sponsors and developers during the development phase, and a more critical one between humans and the superintelligent system during operation. The chapter surveys two broad classes of control methods: capability control, which limits what the AI can do, and motivation selection, which governs what the AI wants to do. It highlights the difficulties of behavioral testing and the necessity of solving the problem preemptively, before the AI attains a decisive strategic advantage.