Module 9 - Adversarial Search

Adversarial Search and Games

What kind of games are there?

Golf, chess, poker

Games most commonly studied in AI are:

  • Deterministic

  • Two-Player

  • Turn-taking

  • Perfect information

  • Zero-sum

Approaches to modelling adversarial agents

1. Consider agents together as an economy.
No need to predict the action of individual agents.
Can capture large characteristics of the system, such as laws of supply and demand.
2. Consider adversarial agents as part of the environment.
Models the probabilistic behavior of agents as a dynamic system.
Does not explicitly take into account that agents may have conflicting goals.
3. Model agents using adversarial game-tree search.
Explicitly models players as adversarial agents.
Only suitable for specific games.

Demo: grid
Rules

‣ Two players take turns moving the robot.

‣ The robot can only move horizontally or vertically.

‣ The robot cannot move back to an earlier state.

‣ First player to reach the cup wins.

‣ If there is no legal move left, the game is a draw.

To solve the demo, work backward from the winning state: label each losing state along a line with −1 until you reach the start state A, which is evaluated last and takes the maximum.

Two-player zero-sum games

Formalization

‣ S0: Initial state of the game

‣ To-Move(s): player whose turn it is to move in state s

‣ Actions(s): set of legal moves in state s

‣ Result(s, a): transition model, defining the state resulting from taking action a in state s.

‣ Is-Terminal(s): True, when the game is over, false otherwise.

‣ Utility(s, p): numeric value to player p when the game ends in state s
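The formalization above can be sketched in code for a toy game. A minimal sketch, assuming a small Nim-style game (two players alternately take 1 or 2 sticks; whoever takes the last stick wins) — the game itself is an illustrative assumption, but the function names mirror the slide:

```python
# State: (sticks_left, player_to_move); players are "MAX" and "MIN".
# Toy Nim game: take 1 or 2 sticks, taking the last stick wins.

S0 = (5, "MAX")                       # S0: initial state, 5 sticks, MAX to move

def to_move(s):
    return s[1]                       # To-Move(s): player whose turn it is

def actions(s):
    return [a for a in (1, 2) if a <= s[0]]   # Actions(s): legal moves in s

def result(s, a):
    other = "MIN" if s[1] == "MAX" else "MAX"
    return (s[0] - a, other)          # Result(s, a): transition model

def is_terminal(s):
    return s[0] == 0                  # Is-Terminal(s): game over?

def utility(s, p):
    # In a terminal state the player to move lost (opponent took the last stick).
    winner = "MIN" if s[1] == "MAX" else "MAX"
    return 1 if winner == p else -1   # Utility(s, p): value of s to player p
```

Any two-player zero-sum game fits this interface; only the six functions change.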

State space graph

‣ Initial state + Actions + Result → state space graph

‣ Vertices are states, edges are moves.

‣ Some states may be reached by multiple paths

‣ Game Tree: Full representation of state-space graph

‣ Tree that follows every sequence of moves all the way to a terminal state.

‣ May be infinite (in case of repeatable states)

‣ Search Tree

‣ Partial representation of state-space graph

‣ Used to determine what move to make

Tic-Tac-Toe

Ordinary search vs Adversarial Search

‣ In a normal search such as for the 8-puzzle, we could end the game by finding a path to a good end position.

‣ However, in adversarial search, the other player co-determines the path

Minimax search

‣ Two players, MAX and MIN, take turns in the game.

‣ MAX must plan ahead against each of MIN’s possible moves*.

* A move by a player is also called a ply.

MINIMAX(s) =

‣ If we are at a terminal node → utility of terminal node

‣ If it’s MAX’s turn to move → maximum of descendants’ utilities (picture from AIMA, 4th edition)

‣ If it’s MIN’s turn to move → minimum of descendants’ utilities

MINIMAX search Algorithm

‣ Depth-first exploration of the tree.

‣ Recursively descends each branch of the tree.

‣ Computes utility for terminal nodes.

‣ Goes back up, assigning minimax value to each node.
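The recursive depth-first scheme above fits in a few lines. A minimal sketch, assuming a toy Nim game (take 1 or 2 sticks; whoever takes the last stick wins) as an illustrative stand-in for the real game:

```python
# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s):     # value from MAX's perspective
    return -1 if s[1] == "MAX" else 1   # player to move in a terminal state lost

def minimax(s):
    if is_terminal(s):
        return utility(s)               # utility of terminal node
    # Recursively descend each branch, then back the values up.
    values = [minimax(result(s, a)) for a in actions(s)]
    return max(values) if s[1] == "MAX" else min(values)

# From 5 sticks, MAX can force a win: take 2, leaving 3, then mirror MIN.
```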

Complexity of MINIMAX

‣ Time complexity of MINIMAX is exponential: O(b^m)

‣ b = (average) branching factor

‣ m = maximum depth of the tree

‣ More efficient way to search the game tree → Alpha-Beta pruning

Alpha-Beta pruning

‣ α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX. Think: α = “at least.”


‣ β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN. Think: β = “at most.”
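A minimal sketch of alpha-beta pruning, again on an assumed toy Nim game (take 1 or 2 sticks; taking the last stick wins): α is the "at least" bound for MAX and β the "at most" bound for MIN, and a branch is cut off as soon as α ≥ β.

```python
import math

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s):     return -1 if s[1] == "MAX" else 1

def alphabeta(s, alpha=-math.inf, beta=math.inf):
    if is_terminal(s):
        return utility(s)
    if s[1] == "MAX":
        v = -math.inf
        for a in actions(s):
            v = max(v, alphabeta(result(s, a), alpha, beta))
            alpha = max(alpha, v)       # raise "at least" bound
            if alpha >= beta:
                break                   # MIN would never allow this branch
        return v
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, alphabeta(result(s, a), alpha, beta))
            beta = min(beta, v)         # lower "at most" bound
            if alpha >= beta:
                break                   # MAX already has a better option
        return v
```

Pruning never changes the result: alphabeta returns the same value as plain MINIMAX, only faster.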

Move ordering

‣ If we have information on which moves are generally better than others, we can improve alpha-beta pruning by first evaluating the utility of nodes which are considered good moves.

‣ For instance, in chess: capture > threat > forward move > backward move
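The idea can be sketched as a simple sort over candidate moves before they are searched; the move tuples and priority values here are illustrative assumptions, not a real chess move generator:

```python
# Cheap static priorities: higher-priority moves are searched first,
# which tends to tighten the alpha-beta window early.
MOVE_PRIORITY = {"capture": 3, "threat": 2, "forward": 1, "backward": 0}

def ordered(moves):
    # moves are (kind, destination) pairs; best kinds first
    return sorted(moves, key=lambda m: MOVE_PRIORITY[m[0]], reverse=True)

moves = [("backward", "b2"), ("capture", "e5"), ("forward", "d4")]
# search order after sorting: capture, forward move, backward move
```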

Transposition tables

‣ In games like chess, the same position can be reached through different sequences of moves → this is called a transposition.

‣ Exploring the search tree again from such a position would duplicate work.

‣ Results of search for positions can be stored in a transposition table.

‣ Lookup from transposition table instead of search.

‣ Chess positions can be converted into unique indexes using special hashing techniques so that lookup has O(1) time complexity.
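In its simplest form a transposition table is just memoized minimax, keyed by state. A minimal sketch, assuming the same toy Nim game (take 1 or 2 sticks, last stick wins); a real chess engine would use Zobrist hashing for the O(1) lookup instead of a Python dict of tuples:

```python
table = {}   # state -> minimax value (the transposition table)

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):   return [a for a in (1, 2) if a <= s[0]]
def result(s, a): return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def minimax_tt(s):
    if s in table:                       # lookup instead of re-searching
        return table[s]
    if s[0] == 0:                        # terminal: player to move lost
        v = -1 if s[1] == "MAX" else 1
    else:
        vals = [minimax_tt(result(s, a)) for a in actions(s)]
        v = max(vals) if s[1] == "MAX" else min(vals)
    table[s] = v                         # store result for later transpositions
    return v
```

For example, (3, "MIN") is reached both via taking 2 first and via two different single-stick lines; it is searched only once.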

Heuristic strategies

Shannon (1950)

‣ Type A strategy (historically used for chess)

Consider wide but shallow part of the tree and estimate the utility at that point.

‣ Type B strategy (historically used for Go)

Consider promising parts of the tree deeply and ignore unpromising paths.

Heuristic Alpha-Beta Tree Search

‣ Can treat non-terminal nodes as if they were terminal

‣ Utility function, which is certain, is replaced by an evaluation function, which provides an estimate.

• e.g. queen=9, knight=3, bishop=3, rook=5, pawn=1, …

• Typically a weighted linear function of the values

• … but can be any function of the features

‣ H-MINIMAX (s, d)

• If cut-off reached → compute expected utility of node (=true utility for terminal nodes)

• If it’s MAX’s turn to move → maximum of descendant’s expected utilities

• If it’s MIN’s turn to move → minimum of descendant’s expected utilities
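The H-MINIMAX scheme above can be sketched with a depth cutoff and an evaluation function. Both the toy game and the evaluation function are illustrative assumptions: states are (sticks, player) in Nim (take 1 or 2 sticks, last stick wins), and the estimate uses the known fact that a pile that is a multiple of 3 is lost for the player to move:

```python
# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0

def evaluate(s):
    # Evaluation function: true utility at terminal nodes,
    # a heuristic estimate everywhere else.
    if is_terminal(s):
        return -1 if s[1] == "MAX" else 1
    losing = s[0] % 3 == 0               # multiple of 3 loses for mover
    return (-1 if losing else 1) * (1 if s[1] == "MAX" else -1)

def h_minimax(s, d):
    if d == 0 or is_terminal(s):         # cutoff reached
        return evaluate(s)
    vals = [h_minimax(result(s, a), d - 1) for a in actions(s)]
    return max(vals) if s[1] == "MAX" else min(vals)
```

With a good evaluation function, a shallow search (small d) approximates the result of the full game tree.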

Forward Pruning

‣ Prune moves that appear to be bad (based on experience)

‣ Type B strategy

‣ PROBCUT: Forward pruning version of alpha-beta search that prunes nodes that are probably outside the α-β window.

‣ Late move reduction: reduces depth to which ordered moves are searched. Backs up to full search if higher α value is found.

Monte Carlo Tree Search

Complexity of games like Go is far greater than that of chess:

‣ Go starts with a branching factor of 361 and continues with an average branching factor of 150.

‣ Alpha-beta search is of little help because it cannot look far enough ahead.

‣ Instead, multiple simulations of complete games are played-out, starting from a given position.

‣ Expected utility of a move is percentage of play-outs with a win given that move.

‣ Usually combined with exploitation of past experiences (lookup).

‣ Combined with reinforcement learning → neural-network-based game programs that learn by playing against themselves (e.g., AlphaZero)
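The core play-out idea can be sketched without the full MCTS selection/expansion/backup machinery: from a given position, play many random games to the end and score each candidate move by its win percentage. The toy Nim game (take 1 or 2 sticks, last stick wins) is again an illustrative assumption:

```python
import random

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):   return [a for a in (1, 2) if a <= s[0]]
def result(s, a): return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def playout(s):
    while s[0] > 0:                        # play random moves to the end
        s = result(s, random.choice(actions(s)))
    return 1 if s[1] == "MIN" else 0       # 1 iff MAX took the last stick

def win_rate(s, a, n=2000):
    # Expected utility of move a: fraction of winning play-outs after a.
    return sum(playout(result(s, a)) for _ in range(n)) / n

# From 4 sticks, taking 1 (leaving MIN a losing pile of 3) should score
# higher for MAX than taking 2.
```

Even purely random play-outs separate good moves from bad ones here; real MCTS improves on this by biasing play-outs toward promising branches.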

Summary

‣ Games can be formalized by their initial state, the legal actions, the result of each action, a terminal test, and a utility function.

‣ The MINIMAX algorithm can determine the optimal moves for two-player, discrete, deterministic, turn-taking, zero-sum games with perfect information.

‣ Alpha-beta pruning can remove subtrees that are provably irrelevant.

‣ Heuristic evaluation functions must be used when the entire game-tree cannot be explored (i.e., when the utility of the terminal nodes can’t be computed).

‣ Monte Carlo tree search is an alternative which plays out entire games repeatedly and chooses the next move based on the proportion of winning play-outs.