RL Chapter 3

Last updated 2:55 AM on 6/15/26

6 Terms

1

MDP

Markov decision process

2

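Markov decision process

A framework for sequential decision making in which the agent accounts for total reward: an action affects not only the immediate reward but also the next state and, through it, future rewards. In a bandit problem, by contrast, only the reward of the current action matters.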
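
To make the contrast with bandits concrete, here is a minimal sketch, assuming a tiny deterministic problem invented for illustration (the states, actions, rewards, and function names are all hypothetical). It compares picking the action with the best immediate reward against picking the plan with the best total reward.

```python
from itertools import product

# Hypothetical deterministic problem (illustrative numbers only):
# step[(state, action)] = (immediate reward, next state)
ACTIONS = ["grab", "invest"]
step = {
    ("start", "grab"): (1.0, "dead_end"),
    ("start", "invest"): (0.0, "rich"),
    ("dead_end", "grab"): (0.0, "dead_end"),
    ("dead_end", "invest"): (0.0, "dead_end"),
    ("rich", "grab"): (5.0, "rich"),
    ("rich", "invest"): (0.0, "rich"),
}

def total_reward(state, actions):
    """Sum the rewards collected by following a sequence of actions."""
    total = 0.0
    for action in actions:
        reward, state = step[(state, action)]
        total += reward
    return total

def myopic_action(state):
    """Bandit-style choice: look only at the immediate reward."""
    return max(ACTIONS, key=lambda a: step[(state, a)][0])

def best_plan(state, horizon):
    """MDP-style choice: pick the action sequence with the highest total reward."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda plan: total_reward(state, plan))

print(myopic_action("start"))   # action with the best immediate reward
print(best_plan("start", 2))    # plan with the best total reward over 2 steps
```

In this sketch the myopic choice from "start" is "grab", while the best two-step plan begins with "invest", because the lower immediate reward leads to a state with larger future rewards.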

3

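dynamics function: $\sum_{s' \in S} \sum_{r \in R} p(s', r \mid s, a) = 1$ for every state s and action a

p(s', r | s, a): the probability of reaching next state s' and receiving reward r after taking action a in state s

s: the current state

a: the action taken

s': a possible next state

r: a possible reward

S: the set of all states (the state space); s' ∈ S means "sum over every possible next state"

R: the set of all possible rewards; r ∈ R means "sum over every possible reward"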
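
The normalization condition can be checked mechanically. Below is a minimal sketch, assuming a small two-state problem whose states, actions, and probabilities are made up for illustration; the dictionary layout and the `is_valid_dynamics` helper are hypothetical, not part of any library.

```python
# Hypothetical dynamics table (illustrative numbers only):
# dynamics[(s, a)] maps each outcome (s_next, r) to p(s_next, r | s, a)
dynamics = {
    ("low", "wait"): {("low", 0.0): 1.0},
    ("low", "recharge"): {("high", 0.0): 1.0},
    ("high", "search"): {("high", 1.0): 0.7, ("low", 1.0): 0.3},
    ("high", "wait"): {("high", 0.5): 1.0},
}

def is_valid_dynamics(dynamics, tol=1e-9):
    """Return True if, for every (s, a), the probabilities over all
    (s', r) outcomes are non-negative and sum to 1."""
    for outcomes in dynamics.values():
        if any(p < 0 for p in outcomes.values()):
            return False
        if abs(sum(outcomes.values()) - 1.0) > tol:
            return False
    return True

print(is_valid_dynamics(dynamics))
```

Each inner dictionary is one probability distribution over (s', r) pairs, so the double sum in the card corresponds to summing the values of one inner dictionary.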

4

What is part of the agent?

Anything the agent can change arbitrarily. Whatever the agent cannot change arbitrarily is considered part of the environment.

5

agent-environment boundary

The boundary that marks where the agent's absolute control ends and the environment, which the agent cannot change arbitrarily, begins. It is the limit of the agent's control, not of its knowledge.

6