Choosing the Criteria for Choosing

  • Asking the critical question of which final value to install in a seed AI.

  • Recognizing the far-reaching consequences of this decision.

  • Basic parameter choices related to decision theory and epistemology are consequential.

  • Human limitations (foolishness, ignorance, narrow-mindedness) cast doubt on our fitness to make such design decisions.

  • Introducing indirect normativity as a way to offload the cognitive work of value selection onto the superintelligence while remaining anchored in deeper human values.

The Need for Indirect Normativity

  • Addressing two key questions:

    1. How can we get a superintelligence to do what we want?

    2. What do we want the superintelligence to want?

  • Initial focus on mechanisms to control superintelligence (first question) shifts to deciding values to pursue (second question).

  • Directly installing our current values as the AI's final goals risks locking in flawed moral beliefs.

The Problem of Selecting Values

  • Emphasizing the difficulty in selecting a final value, considering the global existential impacts.

  • Mistakes in moral judgments could reflect current biases and impede ethical progress.

  • Historical perspective reveals changes in moral beliefs suggesting that contemporary views might also be flawed.

  • Over-reliance on current convictions in selecting a final value can lead to existential moral calamity.

  • Even seemingly simple moral theories (e.g., hedonism) conceal great complexity, raising more questions than they answer.

    • Hedonism posits pleasure as the sole value, yet assessing types of pleasure, their intensity, duration, moral relevance, etc., is far from straightforward.

Indirect Normativity and Its Applications

  • Indirect normativity proposed as a strategy to delegate cognitive work for value selection to a superintelligence.

  • Introduction of the principle of epistemic deference:

    • Acknowledging the AI’s superior epistemic capabilities, leading to a more accurate understanding of values compared to humans.

  • The value is specified abstractly, leaving the AI to interpret and apply the underlying standard better than we could ourselves.

Coherent Extrapolated Volition (CEV)

  • Proposal by Eliezer Yudkowsky focusing on enabling a seed AI to implement humanity's CEV.

  • Definition: "Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted."

  • Purpose of CEV not to create a strict moral theory but to approximate what holds ultimate value.

Empirical Challenges and Analogues

  • Philosophical precedents, such as ideal observer theories, show how to appeal to idealized judgment without hardcoding a flawed moral theory.

  • Wishes are weighed by how far they converge and cohere across individuals, allowing for social interaction during the extrapolation and for characteristic human errors in decision-making.

Rationales for CEV

  • Arguments supporting CEV's advantages over more prescriptive moral theories include:

    • Encapsulate Moral Growth: Accepts the possibility of moral evolution.

    • Avoiding Hijacking Human Destiny: Distributes influence among humanity rather than a select few.

    • Avoiding Conflicts: Ensures broader support for shared visions among diverse ethical perspectives.

    • Keeping Humans in Charge: Facilitates human autonomy, allows flexible development of future governance systems.

Challenges in Implementing CEV

  • Defining the extrapolation base (who's included in humanity) and its ethical complexities.

  • Historical and philosophical nuances complicate who qualifies as part of the extrapolated volition.

  • Debate over safeguarding interests of those included vs. excluded from the determination process.

Morality Models as Alternatives

  • Alternative approaches are introduced: an AI that aims at moral rightness directly, or one that pursues humanity's CEV within the bounds of moral permissibility.

  • Each model carries risks: strict adherence to morality might demand great sacrifices of human interests, while CEV-style approaches remain hostage to the flaws of human ideals.

Proposals for AI Decision-Making

  • Exploration of applying indirect normativity to define decision theories and epistemology for AI systems.

  • Discussing various decision theories (causal, evidential, updateless) and the implications of flawed choices.

  • Importance of giving the AI adaptive, sound decision-making processes, together with an epistemology grounded in principles we would endorse on reflection.
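The split between causal and evidential decision theory mentioned above can be made concrete with Newcomb's problem, the standard test case where the two theories diverge. The sketch below uses the conventional illustrative payoffs and an assumed predictor accuracy; none of these numbers come from the source text.

```python
# Newcomb's problem: a predictor fills an opaque box with $1,000,000
# only if it predicts you will take just that box; a transparent box
# always holds $1,000. Evidential decision theory (EDT) treats the
# action as evidence about the prediction; causal decision theory
# (CDT) treats the boxes' contents as already fixed.

ACCURACY = 0.99  # assumed predictor accuracy (illustrative)

def edt_value(action: str) -> float:
    """Expected payoff, conditioning on the action as evidence."""
    if action == "one-box":
        return ACCURACY * 1_000_000 + (1 - ACCURACY) * 0
    # Two-boxing is evidence the predictor left the opaque box empty.
    return ACCURACY * 1_000 + (1 - ACCURACY) * 1_001_000

def cdt_value(action: str, p_full: float) -> float:
    """Expected payoff with box contents causally independent of the
    choice; p_full is the prior that the opaque box is full."""
    base = p_full * 1_000_000
    return base + (1_000 if action == "two-box" else 0)

# EDT recommends one-boxing; CDT recommends two-boxing for any prior.
assert edt_value("one-box") > edt_value("two-box")
assert cdt_value("two-box", 0.5) > cdt_value("one-box", 0.5)
```

The point of the bullet above is that these theories give opposed recommendations in such cases, so hardcoding the wrong one into a superintelligence could be costly; indirect normativity would delegate this choice too.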

Ratification and Oversight

  • Consideration of whether AI plans should undergo human review before implementation.

  • Weighing the risks of previewing outcomes against the benefit of catching catastrophic errors before they occur.

Final Thoughts

  • Emphasizing the importance of getting foundational AI development aspects closely aligned with human ideals.

  • Encouraging designs to embrace reliability and adaptability rather than solely optimization.

  • Acknowledging that minor imperfections in the design can be corrected by the superintelligence itself as it learns, reinforcing its beneficial trajectory over the long term.