Module 9 - Adversarial Search

Adversarial Search and Games

What kind of games are there?

Golf, chess, poker

Games most commonly studied in AI are:

  • Deterministic

  • Two-Player

  • Turn-taking

  • Perfect information

  • Zero-sum

Approaches to modelling adversarial agents

1. Consider agents together as an economy.
No need to predict the action of individual agents.
Can capture large characteristics of the system, such as laws of supply and demand.
2. Consider adversarial agents as part of the environment.
Models the probabilistic behavior of agents as a dynamic system.
Does not explicitly take into account that agents may have conflicting goals.
3. Model agents using adversarial game-tree search.
Explicitly models players as adversarial agents.
Only suitable for specific games.

Demo: grid
Rules

‣ Two players take turns moving the robot.

‣ The robot can only move horizontally or vertically.

‣ The robot cannot move back to an earlier state.

‣ First player to reach the cup wins.

‣ If there is no legal move left, the game is a draw.

To solve the demo, work backward from the winning state: label each losing state along a line with −1 until you reach the start state A, which is evaluated last and takes the maximum.

Two-player zero-sum games

Formalization

‣ S0: Initial state of the game

‣ To-Move(s): player whose turn it is to move in state s

‣ Actions(s): set of legal moves in state s

‣ Result(s, a): transition model, defining the state resulting from taking action a in state s.

‣ Is-Terminal(s): True, when the game is over, false otherwise.

‣ Utility(s, p): numeric value to player p when the game ends in state s
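The formalization above can be sketched in code for a toy game. A minimal sketch, assuming a small Nim-style game (two players alternately take 1 or 2 sticks; whoever takes the last stick wins) — the game itself is an illustrative assumption, but the function names mirror the slide:

```python
# State: (sticks_left, player_to_move); players are "MAX" and "MIN".
# Toy Nim game: take 1 or 2 sticks, taking the last stick wins.

S0 = (5, "MAX")                       # S0: initial state, 5 sticks, MAX to move

def to_move(s):
    return s[1]                       # To-Move(s): player whose turn it is

def actions(s):
    return [a for a in (1, 2) if a <= s[0]]   # Actions(s): legal moves in s

def result(s, a):
    other = "MIN" if s[1] == "MAX" else "MAX"
    return (s[0] - a, other)          # Result(s, a): transition model

def is_terminal(s):
    return s[0] == 0                  # Is-Terminal(s): game over?

def utility(s, p):
    # In a terminal state the player to move lost (opponent took the last stick).
    winner = "MIN" if s[1] == "MAX" else "MAX"
    return 1 if winner == p else -1   # Utility(s, p): value of s to player p
```

Any two-player zero-sum game fits this interface; only the six functions change.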

State space graph

‣ Initial state + Actions + Result → state space graph

‣ Vertices are states, edges are moves.

‣ Some states may be reached by multiple paths

‣ Game Tree: Full representation of state-space graph

‣ Tree that follows every sequence of moves all the way to a terminal state.

‣ May be infinite (in case of repeatable states)

‣ Search Tree

‣ Partial representation of state-space graph

‣ Used to determine what move to make

Tic-Tac-Toe

Ordinary search vs Adversarial Search

‣ In a normal search such as for the 8-puzzle, we could end the game by finding a path to a good end position.

‣ However, in adversarial search, the other player co-determines the path

Minimax search

‣ Two players, MAX and MIN, take turns in the game.

‣ MAX must plan ahead against each of MIN’s possible moves*.

* A move by a player is also called a ply.

MINIMAX(s) =

‣ If we are at a terminal node → utility of terminal node

‣ If it’s MAX’s turn to move → maximum of descendants’ utilities (picture from AIMA, 4th edition)

‣ If it’s MIN’s turn to move → minimum of descendants’ utilities

MINIMAX search Algorithm

‣ Depth-first exploration of the tree.

‣ Recursively descends each branch of the tree.

‣ Computes utility for terminal nodes.

‣ Goes back up, assigning minimax value to each node.
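The recursive depth-first scheme above fits in a few lines. A minimal sketch, assuming a toy Nim game (take 1 or 2 sticks; whoever takes the last stick wins) as an illustrative stand-in for the real game:

```python
# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s):     # value from MAX's perspective
    return -1 if s[1] == "MAX" else 1   # player to move in a terminal state lost

def minimax(s):
    if is_terminal(s):
        return utility(s)               # utility of terminal node
    # Recursively descend each branch, then back the values up.
    values = [minimax(result(s, a)) for a in actions(s)]
    return max(values) if s[1] == "MAX" else min(values)

# From 5 sticks, MAX can force a win: take 2, leaving 3, then mirror MIN.
```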

Complexity of MINIMAX

‣ Time complexity of MINIMAX is exponential: O(b^m)

‣ b = (average) branching factor

‣ m = maximum depth of the tree

‣ More efficient way to search the game tree → Alpha-Beta pruning

Alpha-Beta pruning

‣ α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX. Think: α = “at least.”


‣ β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN. Think: β = “at most.”
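A minimal sketch of alpha-beta pruning, again on an assumed toy Nim game (take 1 or 2 sticks; taking the last stick wins): α is the "at least" bound for MAX and β the "at most" bound for MIN, and a branch is cut off as soon as α ≥ β.

```python
import math

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s):     return -1 if s[1] == "MAX" else 1

def alphabeta(s, alpha=-math.inf, beta=math.inf):
    if is_terminal(s):
        return utility(s)
    if s[1] == "MAX":
        v = -math.inf
        for a in actions(s):
            v = max(v, alphabeta(result(s, a), alpha, beta))
            alpha = max(alpha, v)       # raise "at least" bound
            if alpha >= beta:
                break                   # MIN would never allow this branch
        return v
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, alphabeta(result(s, a), alpha, beta))
            beta = min(beta, v)         # lower "at most" bound
            if alpha >= beta:
                break                   # MAX already has a better option
        return v
```

Pruning never changes the result: alphabeta returns the same value as plain MINIMAX, only faster.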

Move ordering

‣ If we have information on which moves are generally better than others, we can improve alpha-beta pruning by first evaluating the utility of nodes which are considered good moves.

‣ For instance, in chess: capture > threat > forward move > backward move
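The idea can be sketched as a simple sort over candidate moves before they are searched; the move tuples and priority values here are illustrative assumptions, not a real chess move generator:

```python
# Cheap static priorities: higher-priority moves are searched first,
# which tends to tighten the alpha-beta window early.
MOVE_PRIORITY = {"capture": 3, "threat": 2, "forward": 1, "backward": 0}

def ordered(moves):
    # moves are (kind, destination) pairs; best kinds first
    return sorted(moves, key=lambda m: MOVE_PRIORITY[m[0]], reverse=True)

moves = [("backward", "b2"), ("capture", "e5"), ("forward", "d4")]
# search order after sorting: capture, forward move, backward move
```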

Transposition tables

‣ In games like chess, the same position can be reached through different sequences of moves → this is called a transposition.

‣ Exploring the search tree again from such a position would duplicate work.

‣ Results of search for positions can be stored in a transposition table.

‣ Lookup from transposition table instead of search.

‣ Chess positions can be converted into unique indexes using special hashing techniques so that lookup has O(1) time complexity.
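In its simplest form a transposition table is just memoized minimax, keyed by state. A minimal sketch, assuming the same toy Nim game (take 1 or 2 sticks, last stick wins); a real chess engine would use Zobrist hashing for the O(1) lookup instead of a Python dict of tuples:

```python
table = {}   # state -> minimax value (the transposition table)

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):   return [a for a in (1, 2) if a <= s[0]]
def result(s, a): return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def minimax_tt(s):
    if s in table:                       # lookup instead of re-searching
        return table[s]
    if s[0] == 0:                        # terminal: player to move lost
        v = -1 if s[1] == "MAX" else 1
    else:
        vals = [minimax_tt(result(s, a)) for a in actions(s)]
        v = max(vals) if s[1] == "MAX" else min(vals)
    table[s] = v                         # store result for later transpositions
    return v
```

For example, (3, "MIN") is reached both via taking 2 first and via two different single-stick lines; it is searched only once.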

Heuristic strategies

Shannon (1950)

‣ Type A strategy (historically used for chess)

Consider wide but shallow part of the tree and estimate the utility at that point.

‣ Type B strategy (historically used for Go)

Consider promising parts of the tree deeply and ignore unpromising paths.

Heuristic Alpha-Beta Tree Search

‣ Can treat non-terminal nodes as if they were terminal

‣ Utility function, which is certain, is replaced by an evaluation function, which provides an estimate.

• e.g. queen=9, knight=3, bishop=3, rook=5, pawn=1, …

• Typically a weighted linear function of the values

• … but can be any function of the features

‣ H-MINIMAX (s, d)

• If cut-off reached → compute expected utility of node (=true utility for terminal nodes)

• If it’s MAX’s turn to move → maximum of descendant’s expected utilities

• If it’s MIN’s turn to move → minimum of descendant’s expected utilities
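The H-MINIMAX scheme above can be sketched with a depth cutoff and an evaluation function. Both the toy game and the evaluation function are illustrative assumptions: states are (sticks, player) in Nim (take 1 or 2 sticks, last stick wins), and the estimate uses the known fact that a pile that is a multiple of 3 is lost for the player to move:

```python
# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):     return [a for a in (1, 2) if a <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0

def evaluate(s):
    # Evaluation function: true utility at terminal nodes,
    # a heuristic estimate everywhere else.
    if is_terminal(s):
        return -1 if s[1] == "MAX" else 1
    losing = s[0] % 3 == 0               # multiple of 3 loses for mover
    return (-1 if losing else 1) * (1 if s[1] == "MAX" else -1)

def h_minimax(s, d):
    if d == 0 or is_terminal(s):         # cutoff reached
        return evaluate(s)
    vals = [h_minimax(result(s, a), d - 1) for a in actions(s)]
    return max(vals) if s[1] == "MAX" else min(vals)
```

With a good evaluation function, a shallow search (small d) approximates the result of the full game tree.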

Forward Pruning

‣ Prune moves that appear to be bad (based on experience)

‣ Type B strategy

‣ PROBCUT: Forward pruning version of alpha-beta search that prunes nodes that are probably outside the α-β window.

‣ Late move reduction: reduces depth to which ordered moves are searched. Backs up to full search if higher α value is found.

Monte Carlo Tree Search

Complexity of games like Go is far greater than that of chess:

‣ Go starts with a branching factor of 361 and continues with an average branching factor of 150.

‣ Alpha-beta search is of little help because it cannot look far enough ahead.

‣ Instead, multiple simulations of complete games are played-out, starting from a given position.

‣ Expected utility of a move is percentage of play-outs with a win given that move.

‣ Usually combined with exploitation of past experiences (lookup).

‣ Combined with reinforcement learning → neural-network-based game programs that learn by playing against themselves (e.g., AlphaZero)
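The core play-out idea can be sketched without the full MCTS selection/expansion/backup machinery: from a given position, play many random games to the end and score each candidate move by its win percentage. The toy Nim game (take 1 or 2 sticks, last stick wins) is again an illustrative assumption:

```python
import random

# Toy Nim game: state is (sticks_left, player_to_move).
def actions(s):   return [a for a in (1, 2) if a <= s[0]]
def result(s, a): return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")

def playout(s):
    while s[0] > 0:                        # play random moves to the end
        s = result(s, random.choice(actions(s)))
    return 1 if s[1] == "MIN" else 0       # 1 iff MAX took the last stick

def win_rate(s, a, n=2000):
    # Expected utility of move a: fraction of winning play-outs after a.
    return sum(playout(result(s, a)) for _ in range(n)) / n

# From 4 sticks, taking 1 (leaving MIN a losing pile of 3) should score
# higher for MAX than taking 2.
```

Even purely random play-outs separate good moves from bad ones here; real MCTS improves on this by biasing play-outs toward promising branches.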

Summary

‣ Games can be formalized by their initial state, the legal actions, the result of each action, a terminal test, and a utility function.

‣ The MINIMAX algorithm can determine the optimal moves for two-player, discrete, deterministic, turn-taking, zero-sum games with perfect information.

‣ Alpha-beta pruning can remove subtrees that are provably irrelevant.

‣ Heuristic evaluation functions must be used when the entire game-tree cannot be explored (i.e., when the utility of the terminal nodes can’t be computed).

‣ Monte Carlo tree search is an alternative which plays out entire games repeatedly and chooses the next move based on the proportion of winning play-outs.