RL Chapter 3

Last updated 2:55 AM on 6/15/26

New cards

MDP

Markov decision process

New cards

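Markov decision process

A framework for sequential decision making in which an action affects not only the immediate reward but also the next state and, through it, future rewards. The agent therefore has to weigh total reward over time, whereas in bandit problems only the reward of the current action matters.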
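The contrast with bandits can be made concrete with a small sketch. The action names and reward sequences below are made up for illustration, and the totals are plain undiscounted sums, which is an assumption of this example rather than something the card specifies.

```python
# Hypothetical reward sequences: the first entry is the immediate reward,
# the remaining entries are rewards collected on later steps.
rewards = {
    "grab": [5, 0, 0],
    "invest": [1, 4, 4],
}


def best_by_immediate(rewards):
    """Bandit-style choice: compare only the reward of the current action."""
    return max(rewards, key=lambda action: rewards[action][0])


def best_by_total(rewards):
    """MDP-style choice: compare the total (undiscounted) reward over time."""
    return max(rewards, key=lambda action: sum(rewards[action]))


print(best_by_immediate(rewards))
print(best_by_total(rewards))
```

Judged by immediate reward alone, "grab" looks better; judged by total reward, "invest" wins because its later rewards outweigh the smaller first one.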

New cards

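Dynamics function p(s′, r ∣ s, a): the probability of reaching next state s′ and receiving reward r after taking action a in state s. For every state-action pair it is a probability distribution, so Σ(s′∈S) Σ(r∈R) p(s′, r ∣ s, a) = 1.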

s: the current state

a: the action you took

s′: a possible next state

r: a possible reward

S: the set of all states (the state space); s′∈S means "sum over every possible next state"

R: the set of all possible rewards; r∈R means "sum over every possible reward"
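The normalization condition on this card can be checked mechanically. The following is a minimal sketch, assuming a tiny made-up MDP stored as a dictionary; the state names, action names, rewards, and probabilities are all hypothetical and chosen only so the sums are easy to follow.

```python
# Hypothetical dynamics table: dynamics[(s, a)] maps each (s', r) outcome
# to its probability p(s', r | s, a). All values are made up for illustration.
dynamics = {
    ("low", "search"): {("low", 1.0): 0.5, ("high", 1.0): 0.25, ("low", -3.0): 0.25},
    ("low", "wait"): {("low", 0.0): 1.0},
    ("high", "search"): {("high", 1.0): 0.75, ("low", 1.0): 0.25},
}


def total_probability(dynamics, state, action):
    """Sum p(s', r | s, a) over every next state s' and every reward r."""
    return sum(dynamics[(state, action)].values())


def is_normalized(dynamics, tol=1e-9):
    """Check that every state-action pair defines a probability distribution."""
    return all(
        abs(sum(outcomes.values()) - 1.0) <= tol
        for outcomes in dynamics.values()
    )


print(is_normalized(dynamics))
```

If any state-action pair had outcomes summing to something other than 1, `is_normalized` would return `False`, which signals that the table is not a valid dynamics function.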

New cards

What is part of the agent?

Only what the agent can change arbitrarily. Anything it cannot change arbitrarily is treated as part of the environment.

New cards

Agent-environment boundary

The line that marks the limit of the agent's absolute control: everything the agent can change arbitrarily lies inside it, and everything it cannot change arbitrarily belongs to the environment.

New cards