Power Seeking
The argument that some advanced AI systems will seek power in order to achieve their goals and may use that power in ways that are catastrophic to humanity
How might AI successfully gain power?
supercapability, supernumerosity, human delegation
supercapability
AI may surpass humans in the skills needed to gain power and may be able to evade human oversight
supernumerosity
Software can replicate itself cheaply and easily
Human delegation
Humans outsource sensitive tasks to AI before the risk is apparent or concerning
→ AI gains control of military weapons or the economy
Instrumental convergence
many agents pursuing many goals will converge on similar instrumental strategies
→ gain resources, self-preservation, resist shutdown, increase capabilities
AI alignment
ensuring AI does what humans intend may be difficult because of reward misspecification and goal misgeneralization
reward misspecification
designers reward the wrong thing accidentally
→ ex. in a boat-racing game, designers award points for hitting targets along the course as a proxy for winning; an AI racer learns to go in circles collecting those points rather than finishing the race
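The boat-racing example can be sketched as a toy reward loop. The names (`proxy_reward`, `hit_target`, `finish_race`) are hypothetical, made up for illustration; no real game API is being used.

```python
# Toy illustration of reward misspecification (hypothetical names, not a real game).
# Intended goal: finish the race. Rewarded proxy: points from targets on the course.

def proxy_reward(action: str) -> int:
    """Designers reward point targets, assuming points track race progress."""
    return 10 if action == "hit_target" else 0

def intended_goal_achieved(actions: list) -> bool:
    """The outcome the designers actually wanted."""
    return "finish_race" in actions

# A reward-maximizing agent loops on the proxy instead of finishing.
agent_actions = ["hit_target"] * 5           # circles collecting targets
total_reward = sum(proxy_reward(a) for a in agent_actions)

print(total_reward)                           # high proxy reward: 50
print(intended_goal_achieved(agent_actions))  # intended goal never achieved: False
```

The agent scores perfectly on the reward the designers wrote down while never doing the thing they actually wanted, which is the core of the misspecification worry.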
goal misgeneralization
AI behaves correctly in training situations but fails in new environments
→ AI is unable to correctly apply the skills learned during training to new situations
The singularity
Once AI improvement is done by AI itself, progress becomes exponential until AI surpasses human intelligence
→ caused by power seeking
Chalmers’ Argument for the singularity
Once human-level AI exists, it may trigger recursive self-improvement, leading to superintelligence and an “intelligence explosion”
→ AI = Human Level
→ AI+ = intelligence beyond human level
→ AI++ = vastly superhuman intelligence
Proportionality thesis
increases in intelligence produce proportionate increases in the ability to design better intelligence
→ exponential growth
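The proportionality thesis can be sketched numerically: if each generation designs a successor proportionately smarter than itself, intelligence compounds geometrically. The constant `k` here is an illustrative assumption, not a measured value.

```python
# Sketch of the proportionality thesis: each generation's design ability is
# proportional to its intelligence, so intelligence compounds geometrically.

k = 1.5              # proportionality constant (hypothetical, k > 1)
intelligence = 1.0   # AI: human-level baseline

for generation in range(5):
    intelligence *= k   # each system designs a successor k times as capable

print(intelligence)  # 7.59375 after 5 generations: exponential growth
```

With any fixed k > 1 the sequence AI → AI+ → AI++ grows exponentially in the number of generations; the "possible defeaters" below are reasons k might fall to 1 or the recursion might stop.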
Possible defeaters of the singularity
resource limits
technological barriers
humans halt development
there is skepticism about whether humans would actually halt development, because possessing an advanced AI system confers extreme economic and military advantages
Power-seeking argument for catastrophic risk
Advanced AI could pose a catastrophic risk because goal-directed AI systems may seek power as a means to achieve their goals, which could bring them into conflict with humans
main worry
not necessarily “evil AI”
ordinary goal pursuit may naturally produce power-seeking behavior, and power plus misalignment could be catastrophic
How power seeking could lead to catastrophe
if AI goals conflict with human interests
humans may become obstacles
AI may try to prevent shutdown or correction
It may seize resources humans depend on
this could lead to massive loss of human control and possible catastrophe
Why might we build these systems anyway?
competition pressures
economic incentives
deceptive alignment
AI might appear safe during testing but behave dangerously after deployment
Orthogonality
it is sometimes thought that alignment will arise naturally as AI becomes more intelligent
the orthogonality thesis denies this
the intelligence that concerns us in AI is pure competence, which is compatible with any goal or subgoal, including power seeking
opacity and deceptive alignment
opacity gives a false sense of safety because humans cannot see what the AI has actually learned to pursue (e.g., harmful proxy goals)