Search with Other Agents

Minimax

Deterministic: ex. checkers

Stochastic: ex. backgammon (dice)

Adversarial Search:

Single-Agent Trees
- Value of a State
  - Terminal States: V(s) = known
  - Non-Terminal States: V(s) = maxOfChildren * V(s’)

Minimax Search:

Minimax Values
- States under opponents control: V(s’) = minOfChildren * V(s)
- States under agents control: V(s) = maxOfChildren * V(s’)
A state-space search tree
players alternate turns
compute each node’s minimax value
def value(state):
- if the state is a terminal state: return the state’s utility
- if the next agent is MAX: return max-value(state)
  - def max-value(state): initialize v = -∞; for each successor of state: v = max(v, min-value(successor)); return v
- if the next agent is MIN: return min-value(state)
  - def min-value(state): initialize v = ∞; for each successor of state: v = min(v, max-value(successor)); return v

Minimax Efficiency:

Space: O(bm)

Alpha-Beta Pruning:

Depth-Limited Search:

search only to a limited depth in the tree
replace terminal utilities with an evaluation function for non-terminal positions
guarantee of optimal play is gone
evaluation functions:
- score non-terminals in depth-limited search
- returns the actual minimax value of problem
depth matters:
- evaluation functions are always imperfect
- the deeper in the tree the evaluation function is buried, the less the quality matters
- amount of pruning depends on expansion ordering

Worst-Case vs. Average-Case

Expectimax Search

Why wouldn’t we know what the results of an action will be?
- randomness: ie. rolling dice
- unpredictable opponents
- actions can fail
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
compute avg score under optimal play
- max nodes as a minimax search
- chance nodes are like min nodes but the outcome is uncertain
- calculate their expected utilities

Expectimax Pseudocode

Probabilities:

random variable: represents an event whose outcome is unknown
probability distribution: an assignment of weights to outcomes
in expectimax search, we have a probabilistic model of how the opponent will behave in any state
- model could be a simple uniform distribution or sophisticated,

Expectations:

the expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes

Expectiminimax:

environment is an extra “random agent” player that moves after each min/max agent
each node computes the appropriate combination of it’s children
ex. backgammon

Multi-Agent Utilities

each player has their own value for a terminal node and maximizes its own component

Maximum Expected Utility

a rational agent should choose the action that maximizes its expected utility, given its knowledge

Insensitivity to Monotonic Transformations

For worst-case minimax reasoning, terminal function scale doesn’t matter
- we just want better states to have higher evaluations (get the ordering right)

Utilities

functions from outcomes (states of the world) to real numbers that describe an agent’s preferences
summarize the agent’s goal
an agent must have preferences among:
- prizes: A, B, etc.
- lotteries: situations with uncertain prizes
Rational preferences:
- we want some constraints on preferences before we call them rational
- Ex. an agent with intransitive preferences can be induced to give away all of its money

MEU Principle:

Human Utilities

utility scales:
- normalized utilities: u = 1, 0, u = 0, 0
- micromorts
- QALYs
Money does not behave as a utility function
- expected monetary value: EMV is p X + (1-p) * Y