associative-value-accretion

A process by which an AI accumulates values over time. The AI might disable this mechanism if it comes to see further accretion as corrupting the integrity of its goal system.

1 chapter across 1 book

Superintelligence: Paths, Dangers, Strategies (2014), Nick Bostrom

…Chapter 8). It is an open question whether this is feasible.

This chapter explores the AI value-loading problem and three approaches to it: associative value accretion, motivational scaffolding, and value learning. It highlights the risk that a goal system may become corrupted, or may resist having its goals replaced, and proposes value learning as a method by which the AI refines its understanding of implicitly defined values by weighing hypotheses about those values against evidence. The chapter also introduces formal considerations for implementing value learning, emphasizing how difficult it is to define stable references for values that cannot be manipulated in a changing environment.
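The idea of refining hypotheses about values through evidence can be illustrated with a toy Bayesian update. This is a minimal sketch, not the book's formalism: it assumes a finite set of candidate utility functions, a hand-written likelihood table, and outcomes named by strings. All names (`values_A`, `approve_x`, `posterior`, `best_action`, and so on) are hypothetical and chosen only for illustration. The agent reweights each value hypothesis by how well it explains the observations, then picks the action with the highest expected utility under the updated beliefs.

```python
from typing import Callable, Dict, List

# Hypothetical toy model: each value hypothesis is a utility function
# mapping named outcomes to numbers. Nothing here comes from the book.
Utility = Dict[str, float]


def posterior(prior: Dict[str, float],
              likelihood: Callable[[str, str], float],
              evidence: List[str]) -> Dict[str, float]:
    """Bayesian update of beliefs over value hypotheses, one observation at a time."""
    weights = dict(prior)  # copy so the caller's prior is left unchanged
    for obs in evidence:
        for h in weights:
            weights[h] *= likelihood(h, obs)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("evidence rules out every hypothesis")
    return {h: w / total for h, w in weights.items()}


def best_action(beliefs: Dict[str, float],
                hypotheses: Dict[str, Utility],
                actions: Dict[str, str]) -> str:
    """Return the action whose outcome has the highest posterior-expected utility."""
    def expected(outcome: str) -> float:
        return sum(p * hypotheses[h][outcome] for h, p in beliefs.items())
    return max(actions, key=lambda a: expected(actions[a]))


# Usage: two made-up hypotheses about what the principal actually values.
hypotheses = {
    "values_A": {"x": 1.0, "y": 0.0},
    "values_B": {"x": 0.0, "y": 1.0},
}
prior = {"values_A": 0.5, "values_B": 0.5}

# Assumed likelihoods: an observed approval of x is better explained by values_A.
LIKELIHOODS = {
    ("values_A", "approve_x"): 0.8, ("values_B", "approve_x"): 0.2,
    ("values_A", "approve_y"): 0.2, ("values_B", "approve_y"): 0.8,
}


def likelihood(h: str, obs: str) -> float:
    return LIKELIHOODS[(h, obs)]


beliefs = posterior(prior, likelihood, ["approve_x"])
choice = best_action(beliefs, hypotheses, {"do_x": "x", "do_y": "y"})
print(beliefs, choice)
```

After one observation of `approve_x`, belief shifts toward `values_A` (about 0.8 versus 0.2), so `do_x` is chosen. The sketch also makes the chapter's worry concrete: the agent's behavior depends entirely on the likelihood model linking observations to values, so anything that lets the agent influence or tamper with that link undermines the scheme.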