Notes on Irrevocable and Adversarial Search (FIT3080)

  • Topic focus: Irrevocable search algorithms and adversarial search in AI, with practical examples and complexity considerations.

Learning objectives (overview)

  • Problem formulation and problem-solving agents

  • Search algorithms and strategies

    • Tentative algorithms: Backtrack (Ch. 5.1, 5.3, 5.4)

    • Tree and graph search strategies

    • Uninformed: BFS, UCS, DFS, DLS, IDS (Ch. 3.1–3.4.4, 3.4.6)

    • Informed: Greedy best-first search, Algorithm A, A* (Ch. 3.5.1–3.5.4)

    • Irrevocable algorithms: Informed variants like Hill climbing, Local beam search, Simulated annealing, Genetic algorithms (Ch. 4.1)

  • Adversarial search algorithms (Ch. 6.1–6.3)

    • Optimal decisions via Minimax and α-β pruning


Irrevocable Search Algorithms

  • Local search overview

    • Keep a single current state and attempt to improve it

    • Often used for optimization problems, where only the final configuration matters and the path to it is irrelevant

    • State space: set of complete configurations

    • Example application: constraint satisfaction (e.g., n-queens)

  • Local search examples and concepts

    • n-Queens problem: place n queens on an n×n board so that no two attack each other (row/column/diagonal).

    • Hill climbing and Local Beam Search are described as alternative irrevocable methods.

  • Hill climbing and its procedure (Hill Climbing(current-state))

    • 1. If current-state = goal-state, return it.

    • 2. Otherwise, repeat until a solution is found or no operators remain:

    • a) Select an operator not yet applied to current-state and apply it to generate new-state

    • b) Evaluate new-state:

      • i) If new-state = goal-state, return it

      • ii) Else if new-state is better than current-state, set current-state ← new-state

    • Variant noted: Steepest ascent hill-climbing selects the best operator.

    • Key limitation: can get stuck in local maxima.
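The procedure above can be collapsed into a short Python sketch of the steepest-ascent variant (the toy objective and neighbour functions below are illustrative, not from the slides):

```python
def hill_climb(state, neighbours, value, max_steps=1000):
    """Steepest-ascent hill climbing: move to the best neighbour
    until no neighbour improves on the current state."""
    for _ in range(max_steps):
        best = max(neighbours(state), key=value, default=state)
        if value(best) <= value(state):  # no improving move: local maximum
            return state
        state = best
    return state

# Toy example: maximise f(x) = -(x - 3)^2 over the integers.
f = lambda x: -(x - 3) ** 2
nbrs = lambda x: [x - 1, x + 1]
print(hill_climb(0, nbrs, f))  # climbs 0 -> 1 -> 2 -> 3 and stops at 3
```

Note that the loop stops as soon as no neighbour improves the current value, which is exactly how the algorithm gets trapped at a local maximum.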

  • Hill climbing example: 8-puzzle visual intuition

    • Objective: minimize a heuristic such as the number of tiles out of place.

    • Example heuristic values shown as f = - (number of tiles out of place).

    • Demonstrates progression from initial state toward improving states, with potential stagnation at local maxima.
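The tiles-out-of-place heuristic in the example can be computed as below (the board encoding, with 0 as the blank, is an illustrative choice):

```python
def tiles_out_of_place(state, goal):
    """8-puzzle heuristic: number of non-blank tiles not in their goal cell."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the blank
start = (1, 2, 3, 4, 5, 6, 0, 7, 8)
print(-tiles_out_of_place(start, goal))  # f = -2: tiles 7 and 8 are misplaced
```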

  • Local Beam Search

    • Maintains k states instead of a single current state.

    • Initialize with k randomly generated states.

    • Each iteration: generate all successors of all k states; if any successor is a goal, return it.

    • Else, select the k best successors from the complete list and repeat.
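The iteration above can be sketched directly (the integer toy problem is illustrative):

```python
import random

def local_beam_search(k, random_state, successors, value, is_goal, max_iters=100):
    """Keep k states; expand them all, then keep the k best successors."""
    states = [random_state() for _ in range(k)]
    for _ in range(max_iters):
        pool = [s2 for s in states for s2 in successors(s)]
        for s in pool:
            if is_goal(s):
                return s
        states = sorted(pool, key=value, reverse=True)[:k]
    return max(states, key=value)

# Toy example: reach x = 10 by stepping integers up or down.
random.seed(0)
result = local_beam_search(
    k=3,
    random_state=lambda: random.randint(0, 5),
    successors=lambda x: [x - 1, x + 1],
    value=lambda x: -abs(x - 10),
    is_goal=lambda x: x == 10,
)
print(result)  # 10
```

Unlike running k independent hill climbs, the k best successors are chosen from the *combined* pool, so information flows between the parallel searches.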

  • Simulated Annealing (SA)

    • Inspiration from physical annealing processes.

    • Idea: escape local maxima/minima by allowing worse moves with decreasing frequency over time (temperature T).

    • Annealing schedule: how T is lowered over time.

    • SA procedure (current-state):

    • 1. If current-state = goal-state, return it

    • 2. BestSoFar ← current-state

    • 3. Initialize T according to the schedule

    • 4. Until no more operators can be applied do:

      • a) Generate new-state from an untried operator

      • b) Compute ΔE = Value(current-state) − Value(new-state)

      • i) If new-state = goal-state, return it

      • ii) Else if ΔE < 0 (new-state better than current-state), set current-state ← new-state; if new-state better than BestSoFar, update BestSoFar

      • iii) Else, with probability Pr = e^{−ΔE / T}, set current-state ← new-state

      • c) Revise T according to schedule

      • d) If T = 0, return BestSoFar

    • As written, the procedure targets maximization problems (higher Value is better).
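A runnable sketch of the SA loop above, using a geometric annealing schedule (the schedule parameters and toy objective are illustrative choices, not from the slides):

```python
import math
import random

def simulated_annealing(state, neighbour, value, t0=10.0, cooling=0.95, t_min=1e-3):
    """SA for maximisation: accept worse moves with probability e^(-dE/T)."""
    best, t = state, t0
    while t > t_min:
        new = neighbour(state)
        d_e = value(state) - value(new)      # dE = Value(current) - Value(new)
        if d_e < 0 or random.random() < math.exp(-d_e / t):
            state = new                       # accept (always if better)
            if value(state) > value(best):
                best = state                  # track BestSoFar
        t *= cooling                          # revise T per the schedule
    return best

# Toy example: maximise f(x) = -(x - 7)^2 over the integers via a random walk.
random.seed(1)
f = lambda x: -(x - 7) ** 2
result = simulated_annealing(0, lambda x: x + random.choice([-1, 1]), f)
print(result, f(result))
```

Early on (high T) almost any move is accepted; as T falls, the acceptance probability e^{−ΔE/T} for worse moves shrinks toward zero and the search behaves like hill climbing.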

  • Properties of Simulated Annealing

    • If T decreases slowly enough, SA can find a global optimum with probability approaching 1.

    • Widely used in domains like VLSI layout and airline scheduling.

  • Genetic Algorithms (GAs)

    • Start with a population of k randomly generated states (chromosomes).

    • A chromosome is a string over a finite alphabet (often binary 0/1).

    • A successor is generated by combining two parent states (crossover).

    • Evaluation function (fitness): higher values indicate better states.

    • GA loop: produce next generation via selection, crossover, and mutation.

  • GA example: 8-Queens Problem (representation and fitness)

    • Chromosome: one gene per column (8 genes per chromosome)

    • Gene: row number (1–8) of the queen in that column

    • Fitness: number of non-attacking queen pairs (min = 0, max = 28)

    • Maximum possible: 8 × 7 / 2 = 28 pairs

    • Example selection probabilities (illustrative): ~31%, ~29%, ~26%, ~14% for different chromosomes (as shown in slides).

    • Demonstrates initial population, selection, crossover, and evolution toward higher fitness.
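A compact sketch of the GA loop for 8-queens under the representation above. The population size, mutation rate, and elitism (keeping the two best) are illustrative choices, not prescribed by the slides:

```python
import random

def fitness(chrom):
    """Number of non-attacking queen pairs (max 28 for 8 queens)."""
    n = len(chrom)
    attacks = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if chrom[i] == chrom[j] or abs(chrom[i] - chrom[j]) == j - i
    )
    return n * (n - 1) // 2 - attacks

def genetic_8queens(pop_size=50, mut_rate=0.1, max_gens=500):
    pop = [[random.randint(1, 8) for _ in range(8)] for _ in range(pop_size)]
    for _ in range(max_gens):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 28:
            return pop[0]                      # perfect solution found
        new_pop = pop[:2]                      # elitism: keep the two best
        weights = [fitness(c) for c in pop]    # fitness-proportional selection
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=weights, k=2)
            cut = random.randint(1, 7)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < mut_rate:     # mutate one random gene
                child[random.randrange(8)] = random.randint(1, 8)
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

random.seed(0)
solution = genetic_8queens()
print(solution, fitness(solution))
```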

  • Summary of graph-search informedness (brief)

    • Informedness in graph-search uses f(n) = g(n) + h(n).

    • For A*: f(n) = g(n) + h(n) with constraints:

    • g(n) ≥ g*(n) (g, the cost of the path found to n, can only overestimate g*, the cheapest cost from the start to n)

    • 0 ≤ h(n) ≤ h*(n)

    • Uninformed graph-search: BFS, UCS, DFS, DLS, IDS correspond to specific choices of g and h (e.g., h = 0 for BFS/UCS).

    • Informed graph-search: Greedy best-first search uses g(n) = 0 and h(n) ≥ 0, so the ordering is driven purely by the heuristic (hence "greedy").

    • A* uses a balance between g and h to achieve optimality under admissible heuristics.
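These strategies can all be seen as one generic best-first search whose frontier-ordering function f is plugged in (the toy graph below and the function names are illustrative):

```python
import heapq

def best_first_search(start, successors, is_goal, f):
    """Generic best-first graph search; f(g, n) orders the frontier.

    f(g, n) = g          -> uniform-cost search
    f(g, n) = h(n)       -> greedy best-first search
    f(g, n) = g + h(n)   -> A*
    """
    frontier = [(f(0, start), 0, start, [start])]
    explored = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path
        if node in explored:
            continue
        explored.add(node)
        for succ, cost in successors(node):
            if succ not in explored:
                g2 = g + cost
                heapq.heappush(frontier, (f(g2, succ), g2, succ, path + [succ]))
    return None

# Toy graph: cheapest path A -> D; with h = 0 the search reduces to UCS.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('D', 5)], 'C': [('D', 1)], 'D': []}
path = best_first_search('A', lambda n: graph[n], lambda n: n == 'D',
                         f=lambda g, n: g)     # h = 0
print(path)  # ['A', 'C', 'D'] with total cost 5
```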


Adversarial Search Algorithms

  • Real-world examples of adversarial play

    • Chess: Kasparov vs Deep Blue (1997) – the slides cite a state space of roughly 10^40 states in the context of processing power; Deep Blue searched roughly 30 billion positions per move, sometimes extending promising lines to deeper plies.

    • Go: AlphaGo (2016) – Go's branching factor is enormous (up to 361 legal moves per ply on the 19×19 board), requiring advanced methods beyond brute-force search.

  • Types of games (context for AI search)

    • Perfect information vs imperfect information (deterministic and fully observable vs stochastic/partially observable)

    • 2-player games; sequential; turn-based vs real-time; zero-sum vs non-zero-sum

    • The focus is on 2-player, sequential, perfect information, zero-sum games (e.g., chess, tic-tac-toe).

  • Searching game trees: conventions

    • Players are MAX and MIN; a MAX-favorable position has a positive value, MIN-favorable is negative.

    • Goal: find a winning strategy for MAX (MAX aims to maximize utility; MIN aims to minimize).

  • Example: Tic-tac-toe game tree (illustrative)

    • Root: initial board; moves lead to child nodes alternating MAX and MIN turns; leaves are terminal with utilities like +1 (win for MAX), 0 (draw), -1 (loss for MAX).

  • Game-tree concepts

    • Root: initial configuration

    • Ply: one level of moves (one move by one player)

    • Leaves/terminal: win, loss, draw or utility value

    • Frontier: current boundary of explored nodes

    • Each node has a minimax value or utility

  • Solving game trees

    • A solution labels the root with an outcome (win, loss, draw, or a utility value)

    • Each opponent move is a subproblem; all subproblems must be solved to solve the parent node

  • Winning strategy criteria (for MAX)

    • At a MIN node, every move MIN can make must lead to a position from which MAX can still force a win

    • At a MAX node, at least one MAX move must lead to a position from which MAX can force a win

  • Games vs search problems

    • Opponent is unpredictable: must account for many possible opponent replies

    • Time limits: not all games can be fully searched; select a good first move instead

  • Minimax search

    • Core idea: recursively evaluate game tree using backward induction

    • Key definitions:

    • S0: initial state

    • ACTIONS(s): set of moves from state s

    • RESULT(s, a): state after applying action a in s

    • IS-TERMINAL(s): true if s is a terminal (game-over) state

    • UTILITY(s): numeric payoff of terminal state s (from MAX's perspective in a zero-sum game)

    • Minimax search pseudocode (conceptual):

    • MAX-VALUE(state): if IS-TERMINAL(state) return UTILITY(state)

      • for each a ∈ ACTIONS(state): v ← MIN-VALUE(RESULT(state, a)); track max v and best move

    • MIN-VALUE(state): similarly, but minimizes over MAX-VALUE(RESULT(state, a))

    • Depth-first exploration: complete to depth m; time complexity O(b^m); space complexity O(bm) for DFS, or O(m) if successors are generated one at a time as in backtracking

    • Illustrative 2-ply example shows how values propagate up the tree.
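The MAX-VALUE/MIN-VALUE pseudocode collapses to one short recursive function. The 2-ply tree below (three MIN subtrees with leaf utilities) is the standard textbook example:

```python
def minimax(state, actions, result, is_terminal, utility, maximizing=True):
    """Backward-induction minimax value of state (MAX to move if maximizing)."""
    if is_terminal(state):
        return utility(state)
    values = [
        minimax(result(state, a), actions, result,
                is_terminal, utility, not maximizing)
        for a in actions(state)
    ]
    return max(values) if maximizing else min(values)

# 2-ply game tree as nested lists; leaves are utilities for MAX.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(
    tree,
    actions=lambda s: range(len(s)),
    result=lambda s, a: s[a],
    is_terminal=lambda s: not isinstance(s, list),
    utility=lambda s: s,
)
print(value)  # 3: MIN takes the minimum of each subtree (3, 2, 2); MAX takes the max
```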

  • Minimax example: Tic-tac-toe evaluation

    • Evaluation function in the example: difference in available lines for MAX vs MIN

    • Example values illustrate how moves influence future opportunities (e.g., center control is strong, etc.)

  • Alpha-Beta pruning (α-β pruning)

    • Definitions:

    • α-value for a MAX node: best value found so far for MAX (lower bound on MAX’s value)

    • β-value for a MIN node: best value found so far for MIN (upper bound on MIN’s value)

    • Pruning rules:

    • α-cutoff: can prune below a MIN node if its β-value ≤ α-value of some MAX ancestor; set the MIN node’s value to its β-value

    • β-cutoff: can prune below a MAX node if its α-value ≥ β-value of some MIN ancestor; set the MAX node’s value to its α-value

    • Termination condition: once all successors of the start node have final backed-up values, the best first move is the successor with the highest value

    • Practice examples and figures show α and β propagation and when cutoffs occur
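The cutoff rules can be added to the minimax recursion with two bookkeeping values. On the same 2-ply example tree, the pruned search returns the same root value while skipping leaves that cannot affect it:

```python
import math

def alphabeta(state, actions, result, is_terminal, utility,
              maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value with alpha-beta pruning (same answer, fewer nodes)."""
    if is_terminal(state):
        return utility(state)
    if maximizing:
        v = -math.inf
        for a in actions(state):
            v = max(v, alphabeta(result(state, a), actions, result,
                                 is_terminal, utility, False, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:    # beta-cutoff: a MIN ancestor will avoid this line
                break
        return v
    v = math.inf
    for a in actions(state):
        v = min(v, alphabeta(result(state, a), actions, result,
                             is_terminal, utility, True, alpha, beta))
        beta = min(beta, v)
        if beta <= alpha:        # alpha-cutoff: a MAX ancestor will avoid this line
            break
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = alphabeta(tree, lambda s: range(len(s)), lambda s, a: s[a],
                  lambda s: not isinstance(s, list), lambda s: s)
print(value)  # 3, identical to plain minimax
```

In the second subtree, for instance, the first leaf (2) drops β to 2 ≤ α = 3, so the remaining leaves 4 and 6 are never examined.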

  • Move ordering and search efficiency

    • α-β effectiveness depends on ordering of examined states

    • With perfect ordering, time complexity can be O(b^(m/2)) – the depth of search effectively doubles

    • Dynamic ordering schemes improve pruning efficiency toward the theoretical limit

  • Resource limits and practical considerations

    • Time per move is limited; e.g., 100 seconds per move, exploring ~10^6 nodes per move

    • If terminal states are not reached within time, treat the frontier as a terminal node and rely on an evaluation function to estimate utility

  • Evaluation functions (for non-terminal searches)

    • From experience (e.g., chess):

    • Piece material values (pawn = 1, …, queen = 9)

    • Linear weighted sum of features: Eval = w1 f1 + w2 f2 + … + wn fn

    • Example features: difference in piece counts, positional heuristics, king safety, mobility, etc.

    • From preprocessing (e.g., endgame databases in chess): use pre-computed endgame data for the last few plies
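The linear weighted sum is a one-liner in code; the feature vector and weights below are illustrative, using the conventional 1/3/3/5/9 material values:

```python
def linear_eval(features, weights):
    """Eval = w1*f1 + w2*f2 + ... + wn*fn (positive favours MAX)."""
    return sum(w * f for w, f in zip(weights, features))

# Example feature vector: material difference from MAX's perspective,
# ordered (pawns, knights, bishops, rooks, queens).
material_diff = [1, 0, -1, 0, 0]   # one pawn up, one bishop down
weights = [1, 3, 3, 5, 9]
print(linear_eval(material_diff, weights))  # 1*1 + 3*(-1) = -2
```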

  • Search strategies (high-level naming and roles)

    • Depth-limited search with enhancements: quiescence search, singular extension

    • Iterative deepening search

    • Beam search (prune to n-best moves)

    • Two broad types:

    • Type A: wide but shallow

    • Type B: deep but narrow

  • Deterministic games in practice (highlights and landmarks)

    • Checkers (Chinook): defeated world champion in 1990 using α-β search with a huge endgame database (~39 trillion endgame positions)

    • Chess: Deep Blue (1997) searched ~30 billion positions per move, depth often 14, extended up to 40 for promising lines; heuristic reductions reduced effective branching factor (EBF) to ~3 in practice

    • Othello: computers defeated world champions (1997) 6-0

    • Go: branching factor too large for brute-force α-β; 2016 AlphaGo used deep learning and search synergistically to beat the world champion 4-1

  • Summary of insights from adversarial search

    • Perfection is unattainable; we must approximate optimal play

    • These problems force us to think about what to think about (which nodes to keep or discard and to what depth)


Reading and references

  • Primary text: Russell, S. and Norvig, P., Artificial Intelligence – A Modern Approach (4th ed), Prentice Hall

    • Chapter 4, Section 4.1 – Irrevocable search algorithms

    • Chapter 6, Sections 6.1–6.3 – Adversarial search algorithms


Next lecture topic

  • Bayesian Networks


Quick glossary of key equations and concepts

  • Heuristic search scoring: f(n) = g(n) + h(n)

    • g(n) ≥ g*(n)

    • 0 ≤ h(n) ≤ h*(n)

  • A* optimality requires admissible heuristics (h(n) never overestimates the true cost)

  • Simulated annealing acceptance probability: P(accept) = e^{−ΔE / T}

  • Genetic algorithm fitness: higher fitness means better candidate; selection/crossover/mutation drive evolution toward high-fitness solutions

  • n-Queens fitness example: max fitness for 8-queens is 28 = 8 × 7 / 2

  • Minimax complexity: O(b^m) time, O(bm) space for classical DFS exploration; α-β pruning aims to reduce the number of explored nodes

  • Notation recap

    • MAX and MIN denote the two players

    • Terminal leaves correspond to win/loss/draw or utility values

    • Depth (ply) indicates how many moves ahead are considered

    • Branching factor b, depth m, leading to rough complexity estimates
