Basic RL notation
What is a state?
State: s ∈ S
The state s represents a “snapshot” of the environment; it contains the information the agent needs to choose an action. S denotes the set of all states.
For example, s could represent a position in a maze.
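As a quick illustration, here is a minimal Python sketch of that maze example; the 3x3 grid and the (row, col) encoding are assumptions made for this sketch, not part of the definition.

```python
# A 3x3 grid maze, with each state encoded as the agent's (row, col)
# position; grid size and encoding are assumptions for illustration.
GRID_SIZE = 3

# The state space S: every cell of the grid.
S = [(row, col) for row in range(GRID_SIZE) for col in range(GRID_SIZE)]

s = (0, 0)  # one particular state: the agent sits in the top-left cell
print(s in S)  # True: s is an element of the state space S
```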

What is an action?
Action: a ∈ A(s)
The action a is a choice the agent can make in the environment at a given state; A(s) denotes the set of all actions available in state s.
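A minimal sketch of A(s) for the grid maze above; the rule that legal moves are steps staying inside the grid is an assumption for illustration.

```python
GRID_SIZE = 3

def available_actions(s):
    """Return A(s): all actions the agent can take in state s."""
    row, col = s
    actions = []
    if row > 0:
        actions.append("up")
    if row < GRID_SIZE - 1:
        actions.append("down")
    if col > 0:
        actions.append("left")
    if col < GRID_SIZE - 1:
        actions.append("right")
    return actions

print(available_actions((0, 0)))  # corner state: ['down', 'right']
```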

What is a policy?
Policy: a = π(s)
This indicates that the agent takes action a according to the policy π at state s. A deterministic policy always returns the same action a for a given state; a stochastic policy samples a from a probability distribution π(a|s).
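A sketch of the two policy types; the specific rules and probabilities here are made up for illustration.

```python
import random

def deterministic_policy(s):
    """Always returns the same action for a given state."""
    return "right" if s[1] < 2 else "down"

def stochastic_policy(s):
    """Samples an action from a probability distribution π(a|s)."""
    actions = ["up", "down", "left", "right"]
    probs = [0.1, 0.4, 0.1, 0.4]  # assumed π(a|s), same for every s here
    return random.choices(actions, weights=probs)[0]

print(deterministic_policy((0, 0)))  # always 'right'
print(stochastic_policy((0, 0)))     # usually 'down' or 'right'
```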

What is the reward?
Reward: r = r(s, a), also written rₜ
This is a feedback value returned to the agent after taking action a in state s. It indicates how good the taken action was with respect to the goal.
Think of something like +1 for moving towards the goal and -1 for moving away.
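A sketch of a reward function r(s, a) matching the +1 / -1 scheme above, on the 3x3 maze; the goal location and the transition rule are assumptions for this sketch.

```python
GOAL = (2, 2)  # assumed: goal in the bottom-right cell
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a):
    """Hypothetical transition: apply the move to get the next state."""
    return (s[0] + MOVES[a][0], s[1] + MOVES[a][1])

def manhattan(s):
    """Distance from s to the goal, measured in grid steps."""
    return abs(GOAL[0] - s[0]) + abs(GOAL[1] - s[1])

def reward(s, a):
    """+1 if action a moves the agent towards the goal, -1 otherwise."""
    return 1.0 if manhattan(step(s, a)) < manhattan(s) else -1.0

print(reward((0, 0), "right"))  # 1.0: moving right gets closer to (2, 2)
print(reward((0, 1), "left"))   # -1.0: moving left moves away
```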

What is a value function?
Value function: V(s)
The value function V(s) represents the expected cumulative reward the agent can obtain starting from state s and following a certain policy π.
V(s) indicates how good it is to be in state s in terms of future rewards (e.g., reaching the goal).
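One simple way to make this concrete is a Monte Carlo sketch: estimate V(s) as the average discounted return of rollouts from s under a policy. The toy chain environment, its rewards, and the random policy here are all assumptions for illustration.

```python
import random

GAMMA = 0.9
N_STATES = 5  # chain of states 0..4; state 4 is the goal

def rollout(s):
    """Run one episode from s under a random policy; return its return G."""
    G, discount = 0.0, 1.0
    while s != N_STATES - 1:
        a = random.choice([-1, 1])            # step left or right
        s = min(max(s + a, 0), N_STATES - 1)  # stay inside the chain
        r = 1.0 if s == N_STATES - 1 else -0.1  # assumed reward scheme
        G += discount * r
        discount *= GAMMA
    return G

def estimate_V(s, n=1000):
    """V(s) ≈ average return over n rollouts from s."""
    return sum(rollout(s) for _ in range(n)) / n

print(estimate_V(0))  # how good it is to start at the left end of the chain
```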

What is an action-value function?
Action-value function (Q-value): Q(s, a)
The action-value function Q(s, a) represents the expected cumulative reward from taking action a in state s and then following policy π.
In Q-learning, Q(s, a) assigns a value to each state-action pair, guiding the agent's decisions.
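A sketch of the tabular Q-learning update, with made-up numbers to show a single step; α (the learning rate) and γ are assumed hyperparameters.

```python
ALPHA, GAMMA = 0.1, 0.9

Q = {}  # maps (state, action) pairs to values; missing pairs count as 0

def q_update(s, a, r, s_next, next_actions):
    """Q(s,a) <- Q(s,a) + α·[r + γ·max_a' Q(s',a') - Q(s,a)]"""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions),
                    default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

q_update(s=(0, 0), a="right", r=-1.0, s_next=(0, 1),
         next_actions=["left", "right", "down"])
print(Q[((0, 0), "right")])  # -0.1: the value moved towards the TD target
```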

What is the discount factor?
Discount factor: γ
The discount factor is a number between 0 and 1 that determines the importance of future rewards relative to immediate rewards. A γ close to 1 means the agent values future rewards nearly as much as immediate ones.
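A tiny numeric illustration of this: a reward k steps in the future is weighted by γ**k, so small γ shrinks distant rewards quickly.

```python
# How γ scales future rewards: reward k steps ahead is weighted by γ**k.
for gamma in (0.5, 0.9, 0.99):
    weights = [round(gamma ** k, 3) for k in range(5)]
    print(f"γ = {gamma}: {weights}")
# Small γ makes the agent short-sighted; γ near 1 keeps distant rewards relevant.
```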

What is the return?
Return: Gₜ
This is the total accumulated reward from time step t onward, typically discounted.
Gₜ represents the discounted sum of rewards the agent expects to receive from time step t onward: Gₜ = rₜ + γ·rₜ₊₁ + γ²·rₜ₊₂ + …
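A sketch of computing Gₜ from a list of rewards starting at time step t; the reward values and γ are made up for illustration.

```python
GAMMA = 0.9

def discounted_return(rewards, gamma=GAMMA):
    """Gₜ = rₜ + γ·rₜ₊₁ + γ²·rₜ₊₂ + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([-1, -1, -1, 10]))  # small penalties, then the goal reward
```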

What is the Bellman equation?
Bellman equation: V(s) = E[r(s, a) + γ·V(s')]
This is a recursive equation that helps compute the value of a state. It expresses a state's value as the expected immediate reward plus the discounted value of the next state.
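A sketch of turning the Bellman equation into an update rule (value iteration) on the toy 5-state chain from the value-function sketch above. Note this uses the Bellman optimality variant, taking the max over actions; the rewards and γ are assumed.

```python
GAMMA = 0.9
N_STATES = 5  # states 0..4; state 4 is the terminal goal
V = [0.0] * N_STATES

def bellman_backup(s):
    """Best action's value: max over a of r(s, a) + γ·V(s')."""
    values = []
    for a in (-1, 1):  # step left or right along the chain
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else -0.1  # assumed rewards
        values.append(r + GAMMA * V[s_next])
    return max(values)

for _ in range(50):  # sweep until the values stop changing noticeably
    for s in range(N_STATES - 1):  # terminal state keeps V = 0
        V[s] = bellman_backup(s)

print([round(v, 2) for v in V])  # values grow as states get closer to the goal
```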
When is a method shallow?
When it only uses information from a single transition.
When is a method wide?
When it considers all actions that can be taken in a state.