Week 4/5 - Adversarial Search and Machine Learning

26 Terms

1
New cards

How is a two-player deterministic game represented as a search problem?

By defining the initial state, the legal actions, a result (transition) function, a terminal test, and a utility function on terminal states. In zero-sum games, the two players' utilities are equal and opposite.

2
New cards

What is the minimax algorithm?

An algorithm that chooses moves to maximize the player's minimum guaranteed payoff, assuming the opponent plays optimally.
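
A minimal sketch of minimax under illustrative assumptions: a hypothetical game object exposing actions(state), result(state, action), is_terminal(state), and utility(state), with utility reported from MAX's point of view.

```python
def minimax_value(state, game, maximizing):
    """Minimax value of `state`, assuming both players play optimally."""
    if game.is_terminal(state):
        return game.utility(state)            # utility from MAX's point of view
    values = (minimax_value(game.result(state, a), game, not maximizing)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)


def minimax_decision(state, game):
    """Pick the move that maximizes MAX's minimum guaranteed payoff."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game.result(state, a), game, False))
```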

3
New cards

What are the properties of minimax?

Complete (if the game tree is finite), optimal against an optimal opponent, time complexity O(b^m), and space complexity O(bm) for depth-first exploration.

4
New cards

What is a cutoff test and evaluation function in game search?

A cutoff test ends the search early (e.g. depth limit), and an evaluation function estimates utility at cutoff states using heuristics.
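
A sketch of how a cutoff test and evaluation function plug into depth-limited minimax, reusing the assumed interface from the minimax sketch above plus a hypothetical heuristic game.eval(state).

```python
def h_minimax(state, game, depth, maximizing, cutoff_depth=4):
    """Depth-limited minimax: stop early and fall back to a heuristic estimate."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth >= cutoff_depth:                 # cutoff test (here: a simple depth limit)
        return game.eval(state)               # heuristic estimate of the true utility
    values = (h_minimax(game.result(state, a), game, depth + 1,
                        not maximizing, cutoff_depth)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)
```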

5
New cards

Why are monotonic transformations acceptable in deterministic games?

They preserve the order of utility values, so the relative preference between moves is unchanged.

6
New cards

What is alpha-beta pruning?

An optimization of minimax that prunes branches which cannot affect the final decision, maintaining α (the best value found so far for MAX along the current path) and β (the best value found so far for MIN).
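
A sketch of minimax with alpha-beta pruning, using the same assumed game interface as the earlier sketches; the top-level call uses alpha = -inf and beta = +inf.

```python
def alphabeta(state, game, alpha, beta, maximizing):
    """Minimax value of `state`, pruning branches that cannot change the decision."""
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = float("-inf")
        for a in game.actions(state):
            value = max(value, alphabeta(game.result(state, a), game, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                 # MIN already has a better option elsewhere
                break
        return value
    else:
        value = float("inf")
        for a in game.actions(state):
            value = min(value, alphabeta(game.result(state, a), game, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:                 # MAX already has a better option elsewhere
                break
        return value
```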

7
New cards

What is the time complexity of minimax with alpha-beta pruning under perfect ordering?

O(b^(m/2)), effectively doubling the depth that can be searched.

8
New cards

What strategy was used by Chinook in checkers?

Chinook used alpha-beta pruning with an endgame database for perfect play with ≤8 pieces, covering ~400 billion positions.

9
New cards

How did Deep Blue outperform Kasparov in chess?

Deep Blue used alpha-beta pruning, advanced evaluation functions, and could search up to 40-ply with massive parallel processing.

10
New cards

What made Go especially difficult for traditional AI?

Its large branching factor (b > 300) and pattern-recognition complexity made conventional search ineffective.

11
New cards

What is the expectiminimax algorithm?

An extension of minimax for games with chance nodes, computing the expected utility over probabilistic outcomes.
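
A sketch of expectiminimax under illustrative assumptions: game.to_move(state) returns "MAX", "MIN", or "CHANCE", and game.chance_outcomes(state) yields (probability, next_state) pairs; the rest of the interface matches the earlier sketches.

```python
def expectiminimax(state, game):
    """Expected utility of `state` for MAX in a game with chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == "CHANCE":
        # Chance node: weight each stochastic outcome by its probability
        return sum(p * expectiminimax(s, game)
                   for p, s in game.chance_outcomes(state))
    values = (expectiminimax(game.result(state, a), game)
              for a in game.actions(state))
    return max(values) if player == "MAX" else min(values)
```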

12
New cards

Why can't expectiminimax use arbitrary utility transformations?

Exact values matter for averaging; only positive linear transformations preserve behavior, unlike monotonic ones in minimax.
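
A small worked example (numbers chosen purely for illustration): a chance node over utilities 0 and 10 with probability 0.5 each has expected value 5, so a sure utility of 6 is preferred. After the monotonic but non-linear transform x → x², the chance node is worth (0 + 100)/2 = 50 while the sure outcome is worth 36, so the preference flips; a positive linear transform such as x → 2x + 1 cannot cause such a flip.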

13
New cards

How did AlphaGo defeat human champions?

By combining deep neural networks (a policy network for move selection and a value network for position evaluation), reinforcement learning from self-play, and Monte Carlo tree search.

14
New cards

What is TD-Gammon and how did it perform?

A backgammon program that combined shallow depth-2 search with an evaluation function learned by temporal-difference reinforcement learning from self-play; it played at near world-champion level.

15
New cards

What is the main challenge with hand-crafted evaluation functions in complex games like Go?

Large branching factors and complex positional dynamics make it hard to design heuristics that reliably predict outcomes.

16
New cards

What is book learning in game AI?

Learning sequences of strong opening or endgame moves by memorizing outcomes of previously seen positions.

17
New cards

What is search control learning?

Learning how to adjust search parameters, such as move ordering and cutoff depth, to make search more efficient.

18
New cards

What is Monte Carlo Tree Search (MCTS)?

MCTS is a search method that estimates the utility of a game state by performing many random or guided playouts from that state to terminal outcomes.

19
New cards

What are the four main steps in MCTS?

1. Selection
2. Expansion
3. Simulation (playout)
4. Backpropagation
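
A compact sketch of all four steps under illustrative assumptions: the game object exposes actions(state), result(state, action), is_terminal(state), and utility(state) as in the earlier sketches, and for brevity playout rewards are credited from a single player's perspective rather than alternating between MAX and MIN.

```python
import math
import random


class Node:
    """One MCTS tree node: N playouts have passed through it with total utility U."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.N = 0
        self.U = 0.0


def ucb1(node, C=1.4):
    """Upper confidence bound used during selection (see the UCB1 card below)."""
    if node.N == 0:
        return float("inf")                   # always try unvisited children first
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)


def mcts(root_state, game, n_playouts=1000):
    root = Node(root_state)
    for _ in range(n_playouts):
        # 1. Selection: follow UCB1 while the current node is fully expanded
        node = root
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one untried action as a new child (unless terminal)
        if not game.is_terminal(node.state):
            tried = {child.action for child in node.children}
            action = random.choice([a for a in game.actions(node.state) if a not in tried])
            child = Node(game.result(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation (playout): play random moves until a terminal state is reached
        state = node.state
        while not game.is_terminal(state):
            state = game.result(state, random.choice(game.actions(state)))
        reward = game.utility(state)
        # 4. Backpropagation: update visit counts and utilities back up to the root
        while node is not None:
            node.N += 1
            node.U += reward
            node = node.parent
    # Final decision: the most-visited root move (see the later card on move choice)
    return max(root.children, key=lambda child: child.N).action
```
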
20
New cards

How does MCTS estimate the value of a game state?

By averaging the outcomes (win/loss/draw) of multiple playouts that start from that state and reach a terminal state.

21
New cards

What is the benefit of simulation-based evaluation over heuristic functions?

Simulations provide real outcome-based evidence, while heuristics rely on potentially inaccurate approximations.

22
New cards

What types of policies can be used during MCTS playouts?

1. Random moves (slow learning)

2. Game-specific heuristics

3. Learned evaluation policies (e.g. from self-play)

23
New cards

What is the UCB1 formula used in MCTS selection?

UCB1(n) = U(n)/N(n) + C * sqrt(ln(N(Parent(n))) / N(n)), where U(n) is the total utility of playouts through n, N(n) is its visit count, and C balances exploration and exploitation.
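
A purely illustrative calculation with invented numbers: for U(n) = 60, N(n) = 79, N(Parent(n)) = 400, and C = 1.4, UCB1(n) = 60/79 + 1.4 * sqrt(ln(400)/79) ≈ 0.76 + 0.39 ≈ 1.15.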

24
New cards

What does the UCB1 formula aim to balance?

It balances exploitation (choosing nodes with high win rate) and exploration (choosing nodes with fewer visits).

25
New cards

After all playouts, how does MCTS decide which move to make?

The move with the most simulations (visits) is chosen, not necessarily the one with the highest win rate.

26
New cards

How does self-play improve MCTS agents?

It allows the agent to refine playout strategies and learn better evaluation functions from experience, as done in AlphaGo.