Search with Other Agents
Minimax
Deterministic: ex. checkers
States: s
Players: P = {1…N}
Solution for a player is a Policy:
Policy: S → A
Actions: A
Transition Function: SxA → S
Terminal Test: S → {t, f}
Terminal Utilities: SxP → R
Stochastic: ex. backgammon (dice)
Adversarial Search:
Single-Agent Trees
Value of a State
Terminal States: V(s) = known
Non-Terminal States: V(s) = maxOfChildren * V(s’)
Minimax Search:
Minimax Values
States under opponents control: V(s’) = minOfChildren * V(s)
States under agents control: V(s) = maxOfChildren * V(s’)
A state-space search tree
players alternate turns
compute each node’s minimax value
def value(state):
if the state is a terminal state: return the state’s utility
if the next agent is MAX: return max-value(state)
def max-value(state): initialize v = -∞; for each successor of state: v = max(v, min-value(successor)); return v
if the next agent is MIN: return min-value(state)
def min-value(state): initialize v = ∞; for each successor of state: v = min(v, max-value(successor)); return v
Minimax Efficiency:
Time: O(bm)
Space: O(bm)
Alpha-Beta Pruning:
General configuration
computing the Min-value at some node n
looping over n’s children
n’s estimate of the childrens min is dropping
MAX cares about n’s value
let a be the best value that max can get at any choice point along the current path from the root
if n becomes worse than a, max will avoid it, so we can stop considering n’s other children
Max version is symmetric
def max-value(state, α, β):
initialize v = -∞
for each successor of state:
v = max(v, value(successor, α, β))
if v ≥ β: return v
α = max(α, v)
return v
def min-value(state, α, β):
initialize v = ∞
for each successor of state:
v = min(v, value(successor, α, β))
if v ≤ α: return v
β = min(β, v)
return v
Has no effect on minimax value computed at the root
Value of intermediate nodes might be wrong
Depth-Limited Search:
search only to a limited depth in the tree
replace terminal utilities with an evaluation function for non-terminal positions
guarantee of optimal play is gone
evaluation functions:
score non-terminals in depth-limited search
returns the actual minimax value of problem
depth matters:
evaluation functions are always imperfect
the deeper in the tree the evaluation function is buried, the less the quality matters
amount of pruning depends on expansion ordering
Expectimax
Worst-Case vs. Average-Case
uncertain outcomes controlled by chance
Expectimax Search
Why wouldn’t we know what the results of an action will be?
randomness: ie. rolling dice
unpredictable opponents
actions can fail
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
compute avg score under optimal play
max nodes as a minimax search
chance nodes are like min nodes but the outcome is uncertain
calculate their expected utilities
Expectimax Pseudocode
def value(state)
if the state is a terminal state: return the state’s utility
if the next agent is MAX: return max-value(state):
def max-value(state)
initialize v = -∞
for each successor of state
v = max(v, min-value(successor))
return v
if the next agent is EXP: return exp-value(state)
def exp-value(state)
initialize v = 0
for each successor of state
p = probability(successor)
v += p * value(successor)
return v
Probabilities:
random variable: represents an event whose outcome is unknown
probability distribution: an assignment of weights to outcomes
in expectimax search, we have a probabilistic model of how the opponent will behave in any state
model could be a simple uniform distribution or sophisticated,
Expectations:
the expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
Expectiminimax:
environment is an extra “random agent” player that moves after each min/max agent
each node computes the appropriate combination of it’s children
ex. backgammon
Utilities
Multi-Agent Utilities
each player has their own value for a terminal node and maximizes its own component
Maximum Expected Utility
a rational agent should choose the action that maximizes its expected utility, given its knowledge
Insensitivity to Monotonic Transformations
For worst-case minimax reasoning, terminal function scale doesn’t matter
we just want better states to have higher evaluations (get the ordering right)
Utilities
functions from outcomes (states of the world) to real numbers that describe an agent’s preferences
summarize the agent’s goal
an agent must have preferences among:
prizes: A, B, etc.
lotteries: situations with uncertain prizes
Rational preferences:
we want some constraints on preferences before we call them rational
Ex. an agent with intransitive preferences can be induced to give away all of its money

MEU Principle:
choose the action that maximizes the expected utility
formula?
Human Utilities
utility scales:
normalized utilities: u = 1, 0, u = 0, 0
micromorts
QALYs
Money does not behave as a utility function
expected monetary value: EMV is p X + (1-p) * Y