MDP
Markov decision process
Markov decision process
Accounts for total reward, including the rewards that follow from future actions, rather than only the immediate reward of the current action, as in bandit problems.
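To illustrate the difference, here is a minimal sketch: given the rewards collected along one trajectory, an MDP agent cares about the sum over the whole sequence, not just the first term. The discount factor `gamma`, the function name `discounted_return`, and the reward values are assumptions added for illustration; they do not appear in the card above.

```python
def discounted_return(rewards, gamma=1.0):
    """Total reward over a trajectory: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    # Work backwards from the last reward so each step is total = r + gamma * total.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical trajectories: action A pays off immediately, action B pays off later.
rewards_after_a = [1.0, 0.0, 0.0]
rewards_after_b = [0.0, 0.0, 5.0]
```

Comparing only immediate rewards, as a bandit would, prefers A (1.0 vs 0.0). Comparing total reward with `gamma = 1` prefers B (5.0 vs 1.0).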
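dynamics function
p(s′, r ∣ s, a) is the probability of reaching next state s′ with reward r, given current state s and action a. For every s and a it is a probability distribution: Σ_{s′∈S} Σ_{r∈R} p(s′, r ∣ s, a) = 1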
s: the current state
a: the action taken in state s
s′: a possible next state
r: a possible reward
S: the set of all states (the state space); s′∈S under the sum means "sum over every possible next state"
R: the set of all possible rewards; r∈R under the sum means "sum over every possible reward"
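As a minimal sketch of the normalization condition above, the dynamics function of a small made-up MDP can be stored as a table and checked numerically. The states, actions, rewards, and probabilities below are hypothetical values chosen for illustration, and the names `dynamics`, `is_normalized`, and `expected_reward` are not from any library.

```python
import math

# Hypothetical two-state MDP.
# Keys are (s, a); values map (s', r) -> p(s', r | s, a).
dynamics = {
    ("low", "wait"):    {("low", 0.0): 1.0},
    ("low", "search"):  {("low", 1.0): 0.6, ("high", -3.0): 0.4},
    ("high", "wait"):   {("high", 0.0): 1.0},
    ("high", "search"): {("high", 1.0): 0.7, ("low", 1.0): 0.3},
}

def is_normalized(p, tol=1e-9):
    """True if, for every (s, a), the probabilities over all (s', r) sum to 1."""
    return all(
        math.isclose(sum(outcomes.values()), 1.0, abs_tol=tol)
        for outcomes in p.values()
    )

def expected_reward(p, s, a):
    """Expected immediate reward: sum over (s', r) of r * p(s', r | s, a)."""
    return sum(r * prob for (_next_state, r), prob in p[(s, a)].items())
```

Calling `is_normalized(dynamics)` checks the double sum over s′ and r for each state-action pair. `expected_reward` shows one quantity that can be derived from the same table.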
What is part of the agent?
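Anything the agent can change arbitrarily, meaning it is under the agent's direct control. Anything the agent cannot change arbitrarily is considered part of the environment.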
agent-environment boundary
The limit of the agent's absolute control: what the agent can change arbitrarily lies inside the boundary, and everything it can only influence through its actions, but not change arbitrarily, belongs to the environment.