Search with Other Agents

Minimax

Deterministic: ex. checkers

  • States: s

  • Players: P = {1…N}

    • Solution for a player is a Policy:

      • Policy: S → A

  • Actions: A

  • Transition Function: SxA → S

  • Terminal Test: S → {t, f}

  • Terminal Utilities: SxP → R

Stochastic: ex. backgammon (dice)

Adversarial Search:

  • Single-Agent Trees

    • Value of a State

      • Terminal States: V(s) = known

      • Non-Terminal States: V(s) = maxOfChildren * V(s’)

Minimax Search:

  • Minimax Values

    • States under opponents control: V(s’) = minOfChildren * V(s)

    • States under agents control: V(s) = maxOfChildren * V(s’)

  • A state-space search tree

  • players alternate turns

  • compute each node’s minimax value

  • def value(state):

    • if the state is a terminal state: return the state’s utility

    • if the next agent is MAX: return max-value(state)

      • def max-value(state): initialize v = -∞; for each successor of state: v = max(v, min-value(successor)); return v

    • if the next agent is MIN: return min-value(state)

      • def min-value(state): initialize v = ∞; for each successor of state: v = min(v, max-value(successor)); return v

Minimax Efficiency:

  • Time: O(bm)

Space: O(bm)

Alpha-Beta Pruning:

  • General configuration

    • computing the Min-value at some node n

    • looping over n’s children

    • n’s estimate of the childrens min is dropping

    • MAX cares about n’s value

    • let a be the best value that max can get at any choice point along the current path from the root

    • if n becomes worse than a, max will avoid it, so we can stop considering n’s other children

  • Max version is symmetric

  • def max-value(state, α, β):

    • initialize v = -∞

    • for each successor of state:

      • v = max(v, value(successor, α, β))

      • if v β: return v

      • α = max(α, v)

    • return v

  • def min-value(state, α, β):

    • initialize v = ∞

    • for each successor of state:

      • v = min(v, value(successor, α, β))

      • if v α: return v

      • β = min(β, v)

    • return v

  • Has no effect on minimax value computed at the root

  • Value of intermediate nodes might be wrong

Depth-Limited Search:

  • search only to a limited depth in the tree

  • replace terminal utilities with an evaluation function for non-terminal positions

  • guarantee of optimal play is gone

  • evaluation functions:

    • score non-terminals in depth-limited search

    • returns the actual minimax value of problem

  • depth matters:

    • evaluation functions are always imperfect

    • the deeper in the tree the evaluation function is buried, the less the quality matters

    • amount of pruning depends on expansion ordering

Expectimax

Worst-Case vs. Average-Case

  • uncertain outcomes controlled by chance

Expectimax Search

  • Why wouldn’t we know what the results of an action will be?

    • randomness: ie. rolling dice

    • unpredictable opponents

    • actions can fail

  • Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

  • compute avg score under optimal play

    • max nodes as a minimax search

    • chance nodes are like min nodes but the outcome is uncertain

    • calculate their expected utilities

Expectimax Pseudocode

  • def value(state)

    • if the state is a terminal state: return the state’s utility

    • if the next agent is MAX: return max-value(state):

      • def max-value(state)

        • initialize v = -∞

        • for each successor of state

          • v = max(v, min-value(successor))

        • return v

    • if the next agent is EXP: return exp-value(state)

      • def exp-value(state)

        • initialize v = 0

        • for each successor of state

          • p = probability(successor)

          • v += p * value(successor)

        • return v

Probabilities:

  • random variable: represents an event whose outcome is unknown

  • probability distribution: an assignment of weights to outcomes

  • in expectimax search, we have a probabilistic model of how the opponent will behave in any state

    • model could be a simple uniform distribution or sophisticated,

Expectations:

  • the expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes

Expectiminimax:

  • environment is an extra “random agent” player that moves after each min/max agent

  • each node computes the appropriate combination of it’s children

  • ex. backgammon

Utilities

Multi-Agent Utilities

  • each player has their own value for a terminal node and maximizes its own component

Maximum Expected Utility

  • a rational agent should choose the action that maximizes its expected utility, given its knowledge

Insensitivity to Monotonic Transformations

  • For worst-case minimax reasoning, terminal function scale doesn’t matter

    • we just want better states to have higher evaluations (get the ordering right)

Utilities

  • functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

  • summarize the agent’s goal

  • an agent must have preferences among:

    • prizes: A, B, etc.

    • lotteries: situations with uncertain prizes

  • Rational preferences:

    • we want some constraints on preferences before we call them rational

    • Ex. an agent with intransitive preferences can be induced to give away all of its money

MEU Principle:

  • choose the action that maximizes the expected utility

  • formula?

Human Utilities

  • utility scales:

    • normalized utilities: u = 1, 0, u = 0, 0

    • micromorts

    • QALYs

  • Money does not behave as a utility function

    • expected monetary value: EMV is p X + (1-p) * Y