Lecture 31 Making Rational Decisions

Basics of Decision Theory

  • Decision theory uses utility functions to model how rational decisions are made. It assumes that, when faced with choices, people act to maximize their personal satisfaction, or 'utility'.

  • This lecture introduces some theoretical concepts that are essential for understanding sequential decision making, where decisions are not made in isolation but as part of a series of choices over time.

  • Sequential decision making involves optimizing utility across a sequence of decisions, often using Markov decision processes (MDPs) to model the problem.

  • Markov decision processes are mathematical frameworks used to model decision-making in situations where outcomes are uncertain and only partly under the control of a decision-maker. The uncertainty of the environment is expressed with probabilities.

  • Solving Markov decision processes means identifying the best sequence of actions that will maximize the expected utility over time, considering the uncertainties of the environment.

  • The main conceptual difference between basic search algorithms (like A* search) and Markov decision processes is that MDPs incorporate uncertainty into the decision-making process.

  • Reinforcement Learning: Imagine you strip away the known elements from a Markov decision process, specifically the probabilities of different outcomes (transition model) and the immediate rewards gained from actions (reward function). What you're left with is a reinforcement learning problem, where an agent learns to make decisions through trial and error, receiving feedback on its actions but without a clear model of the environment.

  • The theoretical justification for using utility functions was partly developed by John von Neumann, a brilliant mathematician and physicist who also designed the Von Neumann architecture, which is the foundation of modern computers.

Making Rational Decisions

  • When decisions are made in uncertain situations, each choice can be seen as a lottery where the outcomes are not guaranteed.

  • Rational decisions can be systematically approached using the concept of lotteries combined with a set of logical rules or axioms.

  • The mathematical groundwork for this approach is provided by the von Neumann-Morgenstern utility theorem, which explains how rational individuals make decisions when faced with risky choices.

  • This theorem leads to the creation of utility functions, which assign a numerical value to different outcomes, and the maximum expected utility principle, which guides decision-making.

  • Maximum Expected Utility Principle: The core idea is that the most rational choice is the one that maximizes expected utility. This is calculated by considering all possible outcomes, their associated probabilities, and their utility values, then choosing the option with the highest overall expected value.

  • This lecture will also explore methods for determining or 'eliciting' utility functions, particularly in situations where making decisions is complex, such as in medical treatment choices.

  • Understanding normalized lotteries can simplify the process of eliciting utility functions from people, making it easier to quantify preferences.

Lotteries and Representation

  • Each action taken can be thought of as a lottery with various possible outcomes. For instance, if there are two outcomes, A and B, with chances (probabilities) of p and (1-p), the lottery is written as [p: A, (1-p): B].

  • When there are multiple potential outcomes, the representation expands to [p1: A, p2: B, …], ensuring all probabilities add up to 1.
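The lottery notation maps naturally onto a list of (probability, outcome) pairs. The following is a minimal sketch of that representation; the function name `make_lottery` and the tolerance used for the sum check are illustrative assumptions, not part of the lecture.

```python
import math

def make_lottery(outcomes):
    """Build a lottery [p1: A, p2: B, ...] from (probability, outcome) pairs.

    Hypothetical helper: checks that probabilities are non-negative
    and add up to 1 (within a small floating-point tolerance).
    """
    lottery = [(float(p), outcome) for p, outcome in outcomes]
    if any(p < 0 for p, _ in lottery):
        raise ValueError("probabilities must be non-negative")
    total = sum(p for p, _ in lottery)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return lottery

# Two-outcome lottery [p: A, (1-p): B]
p = 0.3
two_outcome = make_lottery([(p, "A"), (1 - p, "B")])

# Multi-outcome lottery [p1: A, p2: B, p3: C]
three_outcome = make_lottery([(0.5, "A"), (0.3, "B"), (0.2, "C")])
```

Passing probabilities that do not add up to 1, such as `[(0.5, "A"), (0.6, "B")]`, raises a `ValueError`.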

Preferences of an Agent

  • To use utility functions to model rational behavior, it's essential to understand and represent an agent's preferences.

  • Notation:

    • A \succ B: The agent prefers outcome A over outcome B.

    • A \sim B: The agent is indifferent between outcomes A and B; they provide equal satisfaction.

    • B \prec A: Equivalent to A \succ B; the agent prefers outcome A over outcome B.

    • A \succeq B: The agent prefers A to B or is indifferent between them; in other words, A is at least as good as B.

Axioms of Utility Theory

  • Axioms are fundamental rules that describe rational behavior. The von Neumann-Morgenstern utility theory relies on these axioms to ensure consistency in decision-making.

  • Six key axioms:

    1. Completeness: When faced with any two outcomes, A and B, an agent must either prefer A over B, prefer B over A, or be indifferent between them. Exactly one of these must hold: A \succ B, B \succ A, or A \sim B.

    2. Transitivity: If an agent prefers A over B and B over C, then they must prefer A over C. Written as: if A \succ B and B \succ C, then A \succ C.

    3. Continuity: If A is preferred over B, and B is preferred over C, there exists a probability p such that the agent is indifferent between receiving B for sure and participating in a lottery where outcomes are A (with probability p) or C (with probability (1-p)). Expressed as: if A \succ B \succ C, then there exists a p such that B \sim [p: A, (1-p): C].

    4. Substitutability: If an agent is indifferent between outcomes A and B, then A can replace B in any lottery without affecting the agent's preferences. If A \sim B, the agent will have the same preferences between lotteries containing either A or B.

    5. Monotonicity: If A is preferred over B, then a lottery with a higher chance of winning A is preferred over one with a lower chance of winning A. If A \succ B, then p > q implies [p: A, (1-p): B] \succ [q: A, (1-q): B], and vice versa.

    6. Decomposability: A lottery that includes another lottery as one of its outcomes can be simplified into a single lottery by combining the probabilities.
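Decomposability can be made concrete with a small sketch. Assuming lotteries are stored as lists of (probability, outcome) pairs and a nested lottery is simply a list appearing as an outcome (both are illustrative conventions, not from the lecture), a compound lottery reduces to a simple one by multiplying probabilities along each branch:

```python
def flatten(lottery):
    """Reduce a compound lottery to a simple lottery.

    A nested lottery is detected as an outcome that is itself a list.
    Probabilities along each branch are multiplied, and branches that
    end in the same outcome are added together.
    """
    combined = {}
    for p, outcome in lottery:
        if isinstance(outcome, list):  # outcome is itself a lottery
            for q, inner in flatten(outcome):
                combined[inner] = combined.get(inner, 0.0) + p * q
        else:
            combined[outcome] = combined.get(outcome, 0.0) + p
    return [(prob, outcome) for outcome, prob in combined.items()]

# [0.5: A, 0.5: [0.4: B, 0.6: C]] reduces to [0.5: A, 0.2: B, 0.3: C]
compound = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
simple = flatten(compound)
```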

Violation of Axioms

  • Not following these axioms can lead to choices that are not rational. For example, violating transitivity can result in a 'money pump' situation where an agent continuously loses money.

  • Example: Suppose an agent prefers A to B, B to C, and C to A. They would pay to switch from A to C, then from C to B, and finally from B back to A, losing money with each transaction.
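The money pump can be simulated in a few lines. This is a sketch under stated assumptions: preferences are stored as a set of (preferred, other) pairs, the agent pays a fixed fee of 1 per swap, and the function name `run_money_pump` is hypothetical.

```python
def run_money_pump(prefers, start, offers, fee=1.0):
    """Offer the agent a sequence of swaps.

    `prefers` is a set of pairs (x, y) meaning "x is strictly preferred to y".
    The agent pays `fee` whenever it strictly prefers the offered item
    to the one it currently holds.
    """
    holding, paid = start, 0.0
    for offered in offers:
        if (offered, holding) in prefers:
            holding = offered
            paid += fee
    return holding, paid

# Cyclic (intransitive) preferences: A over B, B over C, C over A
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
cyclic_result = run_money_pump(cyclic, "A", ["C", "B", "A"] * 3)

# Transitive preferences: A over B, B over C, A over C
transitive = {("A", "B"), ("B", "C"), ("A", "C")}
transitive_result = run_money_pump(transitive, "A", ["C", "B", "A"] * 3)
```

With the cyclic preferences the agent accepts all nine offers, ends up holding A again, and has paid 9. With the transitive preferences it already holds its favorite outcome and pays nothing.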

Relationship to Utility Functions

  • If an agent's decisions align with the axioms of utility theory, it implies that a utility function exists that can accurately represent the agent's rational behavior.

  • If A \succ B, then U(A) > U(B), and if A \sim B, then U(A) = U(B).

  • The utility of a lottery with N outcomes is the sum of each outcome's utility multiplied by its probability: U(Lottery) = \sum_{i=1}^{N} P(O_i) * U(O_i).

  • In this equation, O_i represents each possible outcome, and U(O_i) is the utility function value associated with that outcome.
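The sum above translates directly into code. In this sketch the utility values and the example lottery are made up for illustration, and lotteries are again assumed to be lists of (probability, outcome) pairs.

```python
def lottery_utility(lottery, utility):
    """U(Lottery) = sum over i of P(O_i) * U(O_i)."""
    return sum(p * utility[outcome] for p, outcome in lottery)

# Hypothetical utilities for three outcomes
utility = {"A": 1.0, "B": 0.4, "C": 0.0}

# Lottery [0.5: A, 0.3: B, 0.2: C]
lottery = [(0.5, "A"), (0.3, "B"), (0.2, "C")]

# 0.5 * 1.0 + 0.3 * 0.4 + 0.2 * 0.0 = 0.62 (up to floating-point rounding)
value = lottery_utility(lottery, utility)
```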

Utility Function Properties

  • Utility functions are not unique; if U is a valid utility function, then U' = aU + b is also valid, where a and b are constants and a > 0. This means you can scale (by a positive factor) and shift a utility function without changing the behavior it predicts. A negative a would reverse every preference.

  • It’s possible for behavior to be rational even if a person doesn't consciously use utility functions.

  • Utility functions can be used as a tool to describe and predict rational behavior.
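A quick numerical check of the non-uniqueness property: the sketch below compares which lottery is chosen under U and under a transformed U' = aU + b. The outcomes, utilities, and lottery names are illustrative assumptions.

```python
def expected_utility(lottery, u):
    """Expected utility of a list of (probability, outcome) pairs."""
    return sum(p * u[outcome] for p, outcome in lottery)

def best_lottery(lotteries, u):
    """Name of the lottery with the highest expected utility."""
    return max(lotteries, key=lambda name: expected_utility(lotteries[name], u))

def affine(u, a, b):
    """Return the transformed utility table U' = a * U + b."""
    return {outcome: a * value + b for outcome, value in u.items()}

u = {"A": 1.0, "B": 0.6, "C": 0.0}
lotteries = {
    "safe": [(1.0, "B")],               # B for certain
    "risky": [(0.5, "A"), (0.5, "C")],  # coin flip between A and C
}

choice_original = best_lottery(lotteries, u)                   # "safe"
choice_scaled = best_lottery(lotteries, affine(u, 3.0, -7.0))  # a > 0: still "safe"
choice_flipped = best_lottery(lotteries, affine(u, -1.0, 0.0)) # a < 0: "risky"
```

The choice is unchanged for a positive scale factor and flips when the scale factor is negative, because a negative factor reverses the ordering of expected utilities.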

Decision Making with Utility Functions

  • To make decisions, calculate the expected utility for each possible action a using the formula: EU(a) = \sum_{s} P(s|a) * U(s), where P(s|a) is the probability of outcome s given action a, and U(s) is the utility of outcome s.

  • Select the action that gives the highest expected utility: a^* = \text{argmax}_a EU(a).

  • This method requires a utility function that respects the axioms of utility theory and a reliable model that provides accurate probabilities.
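The two steps above (compute EU(a) for each action, then take the argmax) can be sketched as follows. The umbrella scenario, its probabilities, and its utilities are invented for illustration; only the formulas come from the lecture.

```python
def expected_utility(action, transition, utility):
    """EU(a) = sum over s of P(s|a) * U(s)."""
    return sum(p * utility[s] for s, p in transition[action].items())

def best_action(transition, utility):
    """a* = argmax over a of EU(a)."""
    return max(transition, key=lambda a: expected_utility(a, transition, utility))

# Hypothetical model: P(s|a) for each action a
transition = {
    "take_umbrella": {"dry_but_burdened": 1.0},
    "leave_umbrella": {"dry_and_free": 0.7, "wet": 0.3},
}

# Hypothetical utilities, normalized to [0, 1]
utility = {"dry_and_free": 1.0, "dry_but_burdened": 0.8, "wet": 0.0}

eu_take = expected_utility("take_umbrella", transition, utility)    # 0.8
eu_leave = expected_utility("leave_umbrella", transition, utility)  # 0.7
choice = best_action(transition, utility)                           # "take_umbrella"
```

If the probability of rain were lower, say 0.1, EU(leave_umbrella) would rise to 0.9 and the decision would change, which shows how strongly the result depends on the accuracy of the probability model.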

Eliciting a Utility Function

  • Utility functions are typically derived from individuals or experts by asking them questions about their preferences.

  • To simplify the process, utility values are often normalized to range between 0 and 1. This can be done without loss of generality because utility functions are not unique and can be linearly transformed.

  • The formula for linear transformation is: U_{\text{normalized}} = \frac{U - U_{\text{min}}}{U_{\text{max}} - U_{\text{min}}}.

  • Start by identifying the worst possible outcome (S_{\text{worst}}) and assign it a utility of 0, then find the best possible outcome (S_{\text{best}}) and give it a utility of 1.

  • For any outcome s for which you want to determine the utility, create a standard lottery that involves only S_{\text{best}} and S_{\text{worst}}. Adjust the probability p of achieving S_{\text{best}} until the individual is indifferent between receiving s for sure and participating in the lottery.

  • At this point, the utility of s is equal to p.

  • If the person is indifferent between getting s or entering the lottery, then U(s) = U(Lottery).

  • Using the probabilities and utilities: U(Lottery) = p * U(S_{\text{best}}) + (1-p) * U(S_{\text{worst}}).

  • Substituting the values: U(Lottery) = p * 1 + (1-p) * 0.

  • Simplifying, we find: U(Lottery) = p.

  • Therefore, U(s) = p.
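One way to "adjust p until indifference" is bisection, which is an assumption of this sketch rather than a procedure stated in the lecture. The callback `prefers_lottery` is a hypothetical stand-in for asking a person whether they would rather play the standard lottery [p: S_best, (1-p): S_worst] than receive s for sure. The search relies on the monotonicity axiom: once the person prefers the lottery at some p, they also prefer it at any higher p.

```python
def elicit_utility(prefers_lottery, tol=1e-3):
    """Estimate U(s) as the indifference probability p of the standard lottery.

    `prefers_lottery(p)` returns True if the respondent prefers the lottery
    [p: S_best, (1-p): S_worst] to receiving s for certain.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p  # lottery is too attractive, indifference point is lower
        else:
            lo = p  # sure outcome still wins, indifference point is higher
    return (lo + hi) / 2

# Simulated respondent whose (hidden) normalized utility for s is 0.35
hidden_utility = 0.35

def simulated_respondent(p):
    return p > hidden_utility

estimate = elicit_utility(simulated_respondent)
```

With the default tolerance the loop asks ten questions, and the estimate lands within 0.001 of the hidden value. Real respondents answer less consistently, so in practice a coarser tolerance is likely more realistic.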

Utility of Money

  • The actual monetary value of something often does not serve as a good utility function because people's subjective satisfaction does not increase linearly with money.

  • The expected monetary value (EMV) of a lottery is calculated as: EMV = p * \text{MonetaryValue}(A) + (1-p) * \text{MonetaryValue}(B).

  • Often, the utility of participating in a lottery is less than the utility of receiving the expected monetary value for certain, reflecting risk aversion.

  • Utility functions for money usually show a non-linear relationship, indicating that the psychological value of money changes as the amount increases.

  • Generally, when dealing with large sums of money, people tend to prefer a guaranteed smaller amount over a risky chance at a larger amount, demonstrating risk aversion.

  • When it comes to losses, people may become less sensitive to increasingly large losses after a certain threshold, showing that the impact of money is not uniform across positive and negative values.
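Risk aversion for gains can be illustrated with a concave utility function. The choice U(x) = log(1 + x) below is a common textbook assumption, not a function given in the lecture, and the lottery amounts are made up. The sketch does not model the behavior for losses described in the last point.

```python
import math

def utility(amount):
    """Hypothetical concave utility of money: U(x) = log(1 + x)."""
    return math.log1p(amount)

def inverse_utility(u):
    """Amount of money whose utility is u, i.e. x = exp(u) - 1."""
    return math.expm1(u)

# Lottery [0.5: $1,000,000, 0.5: $0]
lottery = [(0.5, 1_000_000.0), (0.5, 0.0)]

emv = sum(p * x for p, x in lottery)                       # expected monetary value
expected_utility = sum(p * utility(x) for p, x in lottery) # utility of playing
certainty_equivalent = inverse_utility(expected_utility)   # sure amount with equal utility

# Risk aversion: the EMV for certain is worth more than the lottery itself
is_risk_averse = utility(emv) > expected_utility
```

Under this assumed utility function the EMV is $500,000, but the certainty equivalent is only about $999: an agent with this utility would accept any guaranteed amount above roughly $1,000 in place of the lottery. The logarithm is a deliberately strong form of risk aversion; a less sharply curved function would give a higher certainty equivalent.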