Week 4/5 - Adversarial Search and Machine Learning

26 Terms

1
New cards

How is a two-player deterministic game represented as a search problem?

By defining the initial state, the legal actions, a result (transition) function, a terminal test, and a utility function on terminal states. In zero-sum games, the two players' utilities are equal and opposite.

2
New cards

What is the minimax algorithm?

An algorithm that chooses moves to maximize the player's minimum guaranteed payoff, assuming the opponent plays optimally.
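
A minimal sketch of minimax under illustrative assumptions: a hypothetical game object exposing actions(state), result(state, action), is_terminal(state), and utility(state), with utility reported from MAX's point of view.

```python
def minimax_value(state, game, maximizing):
    """Minimax value of `state`, assuming both players play optimally."""
    if game.is_terminal(state):
        return game.utility(state)            # utility from MAX's point of view
    values = (minimax_value(game.result(state, a), game, not maximizing)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)


def minimax_decision(state, game):
    """Pick the move that maximizes MAX's minimum guaranteed payoff."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game.result(state, a), game, False))
```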

3
New cards

What are the properties of minimax?

Complete (if the game tree is finite), optimal against an optimal opponent, time complexity O(b^m), and space complexity O(bm) for depth-first exploration.

4
New cards

What is a cutoff test and evaluation function in game search?

A cutoff test ends the search early (e.g. depth limit), and an evaluation function estimates utility at cutoff states using heuristics.
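
A sketch of how a cutoff test and evaluation function plug into depth-limited minimax, reusing the assumed interface from the minimax sketch above plus a hypothetical heuristic game.eval(state).

```python
def h_minimax(state, game, depth, maximizing, cutoff_depth=4):
    """Depth-limited minimax: stop early and fall back to a heuristic estimate."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth >= cutoff_depth:                 # cutoff test (here: a simple depth limit)
        return game.eval(state)               # heuristic estimate of the true utility
    values = (h_minimax(game.result(state, a), game, depth + 1,
                        not maximizing, cutoff_depth)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)
```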

5
New cards

Why are monotonic transformations acceptable in deterministic games?

They preserve the order of utility values, so the relative preference between moves is unchanged.

6
New cards

What is alpha-beta pruning?

An optimization of minimax that prunes branches which cannot affect the final decision, maintaining α (the best value found so far for MAX along the current path) and β (the best value found so far for MIN).
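
A sketch of minimax with alpha-beta pruning, using the same assumed game interface as the earlier sketches; the top-level call uses alpha = -inf and beta = +inf.

```python
def alphabeta(state, game, alpha, beta, maximizing):
    """Minimax value of `state`, pruning branches that cannot change the decision."""
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = float("-inf")
        for a in game.actions(state):
            value = max(value, alphabeta(game.result(state, a), game, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                 # MIN already has a better option elsewhere
                break
        return value
    else:
        value = float("inf")
        for a in game.actions(state):
            value = min(value, alphabeta(game.result(state, a), game, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:                 # MAX already has a better option elsewhere
                break
        return value
```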

7
New cards

What is the time complexity of minimax with alpha-beta pruning under perfect ordering?

O(b^(m/2)), effectively doubling the depth that can be searched.

8
New cards

What strategy was used by Chinook in checkers?

Chinook used alpha-beta pruning with an endgame database for perfect play with ≤8 pieces, covering ~400 billion positions.

9
New cards

How did Deep Blue outperform Kasparov in chess?

Deep Blue used alpha-beta pruning, advanced evaluation functions, and could search up to 40-ply with massive parallel processing.

10
New cards

What made Go especially difficult for traditional AI?

Its large branching factor (b > 300) and pattern-recognition complexity made conventional search ineffective.

11
New cards

What is the expectiminimax algorithm?

An extension of minimax for games with chance nodes, computing the expected utility over probabilistic outcomes.
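
A sketch of expectiminimax under illustrative assumptions: game.to_move(state) returns "MAX", "MIN", or "CHANCE", and game.chance_outcomes(state) yields (probability, next_state) pairs; the rest of the interface matches the earlier sketches.

```python
def expectiminimax(state, game):
    """Expected utility of `state` for MAX in a game with chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == "CHANCE":
        # Chance node: weight each stochastic outcome by its probability
        return sum(p * expectiminimax(s, game)
                   for p, s in game.chance_outcomes(state))
    values = (expectiminimax(game.result(state, a), game)
              for a in game.actions(state))
    return max(values) if player == "MAX" else min(values)
```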

12
New cards

Why can't expectiminimax use arbitrary utility transformations?

Exact values matter for averaging; only positive linear transformations preserve behavior, unlike monotonic ones in minimax.
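
A small worked example (numbers chosen purely for illustration): a chance node over utilities 0 and 10 with probability 0.5 each has expected value 5, so a sure utility of 6 is preferred. After the monotonic but non-linear transform x → x², the chance node is worth (0 + 100)/2 = 50 while the sure outcome is worth 36, so the preference flips; a positive linear transform such as x → 2x + 1 cannot cause such a flip.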

13
New cards

How did AlphaGo defeat human champions?

By combining deep neural networks (a policy network for move selection and a value network for position evaluation), reinforcement learning from self-play, and Monte Carlo tree search.

14
New cards

What is TD-Gammon and how did it perform?

A backgammon program that combined shallow depth-2 search with an evaluation function learned by temporal-difference reinforcement learning from self-play; it played at near world-champion level.

15
New cards

What is the main challenge with hand-crafted evaluation functions in complex games like Go?

Large branching factors and complex positional dynamics make it hard to design heuristics that reliably predict outcomes.

16
New cards

What is book learning in game AI?

Learning sequences of strong opening or endgame moves by memorizing outcomes of previously seen positions.

17
New cards

What is search control learning?

Learning how to adjust search parameters, such as move ordering and cutoff depth, to make search more efficient.

18
New cards

What is Monte Carlo Tree Search (MCTS)?

MCTS is a search method that estimates the utility of a game state by performing many random or guided playouts from that state to terminal outcomes.

19
New cards

What are the four main steps in MCTS?

1. Selection
2. Expansion
3. Simulation (playout)
4. Backpropagation
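
A compact sketch of all four steps under illustrative assumptions: the game object exposes actions(state), result(state, action), is_terminal(state), and utility(state) as in the earlier sketches, and for brevity playout rewards are credited from a single player's perspective rather than alternating between MAX and MIN.

```python
import math
import random


class Node:
    """One MCTS tree node: N playouts have passed through it with total utility U."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.N = 0
        self.U = 0.0


def ucb1(node, C=1.4):
    """Upper confidence bound used during selection (see the UCB1 card below)."""
    if node.N == 0:
        return float("inf")                   # always try unvisited children first
    return node.U / node.N + C * math.sqrt(math.log(node.parent.N) / node.N)


def mcts(root_state, game, n_playouts=1000):
    root = Node(root_state)
    for _ in range(n_playouts):
        # 1. Selection: follow UCB1 while the current node is fully expanded
        node = root
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one untried action as a new child (unless terminal)
        if not game.is_terminal(node.state):
            tried = {child.action for child in node.children}
            action = random.choice([a for a in game.actions(node.state) if a not in tried])
            child = Node(game.result(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation (playout): play random moves until a terminal state is reached
        state = node.state
        while not game.is_terminal(state):
            state = game.result(state, random.choice(game.actions(state)))
        reward = game.utility(state)
        # 4. Backpropagation: update visit counts and utilities back up to the root
        while node is not None:
            node.N += 1
            node.U += reward
            node = node.parent
    # Final decision: the most-visited root move (see the later card on move choice)
    return max(root.children, key=lambda child: child.N).action
```
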
20
New cards

How does MCTS estimate the value of a game state?

By averaging the outcomes (win/loss/draw) of multiple playouts that start from that state and reach a terminal state.

21
New cards

What is the benefit of simulation-based evaluation over heuristic functions?

Simulations provide real outcome-based evidence, while heuristics rely on potentially inaccurate approximations.

22
New cards

What types of policies can be used during MCTS playouts?

1. Random moves (slow learning)

2. Game-specific heuristics

3. Learned evaluation policies (e.g. from self-play)

23
New cards

What is the UCB1 formula used in MCTS selection?

UCB1(n) = U(n)/N(n) + C * sqrt(ln(N(Parent(n))) / N(n)), where U(n) is the total utility of playouts through n, N(n) is its visit count, and C balances exploration and exploitation.
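
A purely illustrative calculation with invented numbers: for U(n) = 60, N(n) = 79, N(Parent(n)) = 400, and C = 1.4, UCB1(n) = 60/79 + 1.4 * sqrt(ln(400)/79) ≈ 0.76 + 0.39 ≈ 1.15.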

24
New cards

What does the UCB1 formula aim to balance?

It balances exploitation (choosing nodes with high win rate) and exploration (choosing nodes with fewer visits).

25
New cards

After all playouts, how does MCTS decide which move to make?

The move with the most simulations (visits) is chosen, not necessarily the one with the highest win rate.

26
New cards

How does self-play improve MCTS agents?

It allows the agent to refine playout strategies and learn better evaluation functions from experience, as done in AlphaGo.