Rationality (III): Reward & Reinforcement Learning

Tuesday, November 11

Course Announcements & Reminders

  • Homework #4 has been released.

  • Topic: Q-learning/reinforcement learning, using the algorithm learned today.

  • Due Date: Sunday, November 23.

  • Students have two attempts; the highest score will be recorded.

  • Today’s lecture involves hands-on practice to help prepare for the homework.

  • To manage time, there will be no in-class practice with iClicker quiz questions today.

  • Quiz #17 is available on Canvas and must be completed by the end of today, November 11.

Event Announcement

  • Tonight's event: Backpacking with CogSci

  • Location: Jerusalem Garden, 955 Weiser Hall

  • Time: November 11, from 6 PM to 8 PM

  • Purpose: Plan your next semester with the advisors!

  • Food will be served!

Functional Problem in Cognitive Science

  • The functional problem this capacity must solve:

    • Mapping from states to actions so as to maximize long-term discounted expected utility, given the current state.

  • Referencing David Marr’s three levels of explanation in cognitive science:

    1. Functional Level: Defines problems to be solved.

    2. Algorithmic Level: Describes procedures that enable the problems to be solved.

    3. Physical Level: Involves the neural/chemical substrates in which the procedures are implemented.

  • The Q-learning algorithm has been linked to neural substrates including:

    • Ventromedial prefrontal cortex (vmPFC)

    • Striatum

    • Ventral tegmental area (VTA)

Are People Actually Rational?

  • Heuristics and Biases Research Program:

    • (Often) No, especially at the automatic, intuitive level.

  • Evolutionary Psychology Research Program:

    • (Often) Yes, if the problem is posed in the right format.

  • Neuroeconomics Research Program:

    • Yes, especially at the automatic, intuitive (emotional/affective) level.

  • Key Question: Why are these findings significant?

    • Affective systems may be implementing specialized algorithms that learn from experience.

Insights on Decision Making Under Uncertainty

  • Kahneman and Tversky's perspective:

    • In making predictions and judgments under uncertainty, people often do not follow statistical theory; instead, they rely on a limited number of heuristics, categorized as:

    1. Representativeness Heuristic: Judgments of probability are based on similarity to a prototype.

    2. Availability Heuristic: Judgments of frequency are based on how easily examples come to mind.

    3. Affect Heuristic: Judgments are influenced by gut affective reactions.

    4. Framing Effects: Choices are perceived differently based on the presentation format.

  • Example Illustrations:

    1. Linda Problem: Bank teller stereotype.

    2. Words with 'n': First versus third position.

    3. Stock Purchase Scenario: Choosing to buy Ford.

    4. Disease Outbreak: Perceptions of risk.

Evolutionary Psychology's Perspective

  • Massively Modular Mind: Proponents suggest that when problems are presented in a format familiar to our evolved minds (e.g., using frequencies or social contract rules), people typically provide rationally correct answers.

    • Characteristics of modules include being domain-specific and adapted to ancestral environments.

Child vs. Adult Probabilistic Reasoning

  • Children vs. Adults: A puzzle arises because infant cognition studies suggest that babies and toddlers excel at probabilistic reasoning, while Heuristics and Biases findings suggest that adults struggle with it.

  • Suggested areas for revisiting: task difficulty and the format in which information is presented.

Defining Neuroeconomics

  • Definition of Neuroeconomics (as per Glimcher & Rustichini):

    • “Economics, psychology, and neuroscience are converging into a unified discipline—neuroeconomics—with the goal of providing a general theory of human behavior.”

  • Key Concept: Reinforcement Learning (RL)

    • A specific algorithm for computing a value function.

Understanding Reinforcement Learning

  1. Definition: Reinforcement learning (RL) is the problem of making decisions to maximize long-term, discounted expected rewards.

  2. Methods: RL comprises various methods or algorithms to solve the RL problem in different settings.

  3. Field: It is a branch of machine learning and artificial intelligence.

  4. Intersecting Field: Also intersects with psychology and cognitive neuroscience.

  • Recent Innovations:

    • DeepMind's breakthrough in reinforcement learning led to significant AI advancements, including the game of Go.

Exploring the Q-learning Algorithm

  • Q-value update formula:

    $Q(S_t, a_t) \leftarrow Q(S_t, a_t) + \eta \times \left( R_t + \theta \times \max_{a_{t+1}} Q(S_{t+1}, a_{t+1}) - Q(S_t, a_t) \right)$

    • Where:

    • $R_t$: Reward received at time $t$.

    • $\eta$: Learning rate (between 0 and 1).

    • $\theta$: Discount factor for future rewards.

  • The emphasis is on exploring the environment and learning functions, or policies, that link states to actions; a minimal code sketch follows.
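As a hands-on illustration of the update rule above, here is a minimal Python sketch of one tabular Q-learning update. The state and action names and the values $\eta = 0.1$ and $\theta = 0.9$ are illustrative choices, not values from the lecture:

```python
# Minimal tabular Q-learning update (sketch; parameter values are illustrative).
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] defaults to 0.0
eta = 0.1                # learning rate (between 0 and 1)
theta = 0.9              # discount factor for future rewards

def q_update(s, a, reward, s_next, actions):
    """One update: Q(s,a) += eta * (R + theta * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += eta * (reward + theta * best_next - Q[(s, a)])

# Example: after moving right from state "A" to state "B" and receiving reward 5:
q_update("A", "right", 5.0, "B", actions=["left", "right"])
print(Q[("A", "right")])  # 0.5 on the first update: 0 + 0.1 * (5 + 0.9*0 - 0)
```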

Reward Signals in Reinforcement Learning

  • Reward function: A mapping from states to quantities that indicates desirability:

    • Example:

    • $R(\text{“safe in my hole”}) = 23$

    • $R(\text{“tasting cheese”}) = 37$

    • $R(\text{“going on a hike”}) = 40$
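A reward function of this kind is simply a lookup table from states to numbers. A one-line Python sketch, using the example values from the slide above:

```python
# Reward function as a mapping from states to desirability (values from the slide).
R = {
    "safe in my hole": 23,
    "tasting cheese": 37,
    "going on a hike": 40,
}
print(R["tasting cheese"])  # 37
```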

Action Choices and Eventual Learning

  • Best Action: The action that maximizes expected cumulative future reward. This is defined as value or utility in sequential decision-making contexts.

  • The credit assignment problem: Understanding which past actions resulted in rewards, especially in scenarios where reward is delayed.

  • Temporal Difference Q-learning Algorithm: Can assign credit to past actions even when rewards are delayed.
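To see how temporal-difference updates push credit backward to earlier actions, consider a hypothetical toy corridor of three states where only the final step is rewarded. With only one action available, the Q-update reduces to a simple TD value update; after repeated episodes, discounted value propagates back to the earliest state:

```python
# Toy demo of credit assignment: a 3-state corridor where only the last state pays off.
# Repeated TD updates propagate value backward from the rewarded final step.
eta, theta = 0.5, 0.9
Q = {s: 0.0 for s in [0, 1, 2]}  # one action ("forward"), so Q is indexed by state only

for episode in range(50):
    for s in [0, 1, 2]:
        reward = 10.0 if s == 2 else 0.0         # reward is delayed until the last state
        next_value = Q[s + 1] if s < 2 else 0.0  # episode terminates after state 2
        Q[s] += eta * (reward + theta * next_value - Q[s])

print(Q)  # converges to {0: 8.1, 1: 9.0, 2: 10.0}: earlier actions get discounted credit
```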

Incentive Receiver Example: Monsters and Kit-Kats

  • Scenario setting: An agent operates in a 2-D world encountering various terrains and objectives (i.e., Kit-Kats) while avoiding negative stimuli (i.e., monsters).

  • The agent must navigate based on environmental conditions, energy levels, and probabilities of success.

  • The agent’s possible actions trade off maximizing rewards against minimizing risks (e.g., avoiding monsters, managing energy levels).

State Representation in Q-learning Problems

  • The total number of potential states is the product of the possibilities for agent location, energy level, monster locations, and objective (Kit-Kat) locations, yielding a combinatorially large state space.

  • Formula for total states:

    $\text{Total States} = 240 \times 3 \times 55 \times 55 = 2{,}178{,}000$
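The arithmetic behind this count, as a short Python check. The factor values come from the slide; the mapping of each factor to a state variable is inferred from the order of the list above:

```python
# Total number of distinct states = product of the possibilities for each state variable.
# (Mapping of factors to variables is inferred from the slide's list order.)
agent_locations   = 240  # possible agent positions
energy_levels     = 3    # possible energy levels
monster_locations = 55   # possible monster positions
kitkat_locations  = 55   # possible Kit-Kat positions

total_states = agent_locations * energy_levels * monster_locations * kitkat_locations
print(total_states)  # 2178000
```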

Final Remarks on Q-learning and Homework Context

  • Revisit the Q-learning framework to build a robust understanding of action selection, value maximization, and the recursive structure of rewards.

  • Q-value updates after each action must be applied consistently so that learning converges to an optimal policy over many iterations.

  • Students should be prepared to apply these principles to the homework scenarios.