Choosing the Criteria for Choosing
Asking the critical question of which final value to install in a seed AI.
Recognizing the far-reaching consequences of this decision.
Basic parameter choices related to decision theory and epistemology are consequential.
Human limitations (foolishness, ignorance, narrow-mindedness) cast doubt on our ability to make these design decisions well.
Introduces indirect normativity as a way to offload this cognitive work onto the superintelligence while remaining anchored in deeper human values.
The Need for Indirect Normativity
Addressing two key questions:
How can we get a superintelligence to do what we want?
What do we want the superintelligence to want?
The initial focus on mechanisms for controlling a superintelligence (the first question) shifts to deciding which values it should pursue (the second).
Directly installing our present values as the AI's final goals risks locking in flawed moral beliefs.
The Problem of Selecting Values
Emphasizes the difficulty of selecting a final value, given its global and existential stakes.
Mistakes in moral judgments could reflect current biases and impede ethical progress.
Historical perspective reveals changes in moral beliefs suggesting that contemporary views might also be flawed.
Over-reliance on our current convictions when selecting a final value could lead to an existential moral calamity.
Even seemingly simple moral theories (e.g., hedonism) raise more questions than they answer once we try to specify them precisely.
Hedonism posits pleasure as the sole good, yet a full specification must settle which types of pleasure count, their intensity, their duration, their moral relevance, etc.
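To see how underspecified even this is, consider a minimal formal sketch of a hedonistic value function (the functional form, the type-weighting function w, and the aggregation rule are illustrative assumptions, not commitments of the theory):

    V = \sum_{i \in \text{experiences}} \int_{t_i^{\text{start}}}^{t_i^{\text{end}}} w(\text{type}_i) \cdot \text{intensity}_i(t) \, dt

Every element hides a contested choice: how w should weight different kinds of pleasure, whether intensity aggregates linearly over time, and whose experiences enter the sum. Hardcoding any one answer risks locking in a mistake.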
Indirect Normativity and Its Applications
Indirect normativity proposed as a strategy to delegate cognitive work for value selection to a superintelligence.
Introduction of the principle of epistemic deference:
Acknowledges the AI's superior epistemic capabilities, which promise a more accurate understanding of values than humans can reach on their own.
Value selection is stated abstractly, leaving the AI to work out which concrete standard the abstract specification picks out.
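Schematically, indirect normativity replaces a directly specified utility function with a pointer to the output of an idealized process (a hedged sketch; the reflection operator R is an illustrative placeholder, not a construct from the text):

    a^{*} = \arg\max_{a} \, \mathbb{E}\left[ U^{*}(\text{outcome}(a)) \right], \qquad U^{*} = R(\text{humanity's volitions under idealized reflection})

The designers never write U^{*} down explicitly; they state only the abstract condition it must satisfy, and the AI's superior epistemic position is used to locate it.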
Coherent Extrapolated Volition (CEV)
Eliezer Yudkowsky's proposal: give a seed AI the final goal of carrying out humanity's coherent extrapolated volition.
Definition: "Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted."
Purpose of CEV not to create a strict moral theory but to approximate what holds ultimate value.
Philosophical Analogues and Challenges
CEV has precedents in ethics, such as ideal observer theories, which likewise define value by reference to an idealized vantage point rather than hardcoding a possibly flawed moral theory.
Extrapolated wishes count only where they converge and cohere; the process must also model how our wishes interact socially and correct for known human errors in decision making.
Rationales for CEV
Seven arguments are given for CEV's advantages over more prescriptive moral theories; four key ones:
Encapsulate Moral Growth: allows for continued moral progress rather than freezing present convictions.
Avoid Hijacking Human Destiny: distributes influence across humanity rather than concentrating it in the hands of a few programmers.
Avoid Conflict: gives diverse ethical perspectives a stake in a shared outcome, reducing the motive to fight over the initial dynamic.
Keep Humans in Charge: preserves human autonomy and leaves room to develop future systems of governance.
Challenges in Implementing CEV
Defining the extrapolation base (whose volitions are to be included) raises ethical complexities.
Historical and philosophical nuances complicate who qualifies as part of the extrapolated volition.
Debate over safeguarding interests of those included vs. excluded from the determination process.
Morality Models as Alternatives
Introduces alternatives, such as an AI that aims to do whatever is morally right, or one that acts within the bounds of moral permissibility while otherwise pursuing humanity's CEV.
Risks of each model are discussed: strict adherence to moral rightness might demand sacrificing much of what we care about, whereas permissibility-based variants preserve more deference to human ideals at some moral cost.
Proposals for AI Decision-Making
Explores applying indirect normativity beyond values, to the AI's decision theory and epistemology.
Discusses candidate decision theories (causal, evidential, updateless) and the implications of hardcoding a flawed choice; a worked contrast appears after this list.
Stresses designing the AI with sound, adaptable decision-making processes and an epistemology that reflects reasonable human epistemic principles.
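To make the decision-theory stakes concrete, here is one standard way of writing the two most familiar options (a textbook contrast, not anything specific to the AI designs discussed here):

    EU_{\text{EDT}}(a) = \sum_{o} P(o \mid a) \, U(o)
    EU_{\text{CDT}}(a) = \sum_{o} P(o \mid \mathrm{do}(a)) \, U(o)

Evidential decision theory conditions on the act itself; causal decision theory conditions only on the act's causal consequences. The two agree whenever acts correlate with outcomes only through their effects, and come apart in Newcomb-style problems, which is why hardcoding the wrong one could be a subtle but permanent flaw.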
Ratification and Oversight
Consideration of whether AI plans should undergo human review before implementation.
Weighs the risks of pre-evaluating outcomes against the benefit of catching catastrophic errors before implementation.
Final Thoughts
Emphasizes the importance of getting the foundational aspects of AI development (values, decision theory, epistemology) aligned with human ideals.
Encourages designs that favor reliability and adaptability rather than optimization alone.
Acknowledges that minor imperfections can later be corrected by the superintelligence itself through experience, provided the fundamentals that keep it beneficial are right.