Basic RL notation
What is a state?
State = s in S
The state s represents a "snapshot" of the environment: it contains the information the agent needs to choose an action, and S is the set of all possible states.
For example, s could represent the agent's position in a maze.
What is an action?
action = a in A(s)
The action a is a choice the agent can make in the environment at a given state; A(s) denotes the set of all actions available in state s.
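As a tiny illustration of both cards above (the 4x4 maze, the coordinate states, and the move actions are assumptions made up for this sketch, not something from the notes):

# Hypothetical maze: states s are (row, col) cells, and A(s) is the set of
# moves that keep the agent inside a 4x4 grid.
GRID_SIZE = 4

def available_actions(state):
    # Return A(s) for a given state s.
    row, col = state
    actions = []
    if row > 0:
        actions.append("up")
    if row < GRID_SIZE - 1:
        actions.append("down")
    if col > 0:
        actions.append("left")
    if col < GRID_SIZE - 1:
        actions.append("right")
    return actions

print(available_actions((0, 0)))  # corner state -> ['down', 'right']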
What is a policy?
Policy a = π(s)
This indicates that the agent takes action a according to the policy π at state s. With a deterministic policy this is always the same action a; with a stochastic policy, the action is sampled from a probability distribution π(a|s).
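A minimal sketch of the deterministic vs. stochastic distinction (the states, actions, and probabilities below are invented for illustration):

import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "down"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {"s0": {"right": 0.8, "down": 0.2}}

def act(state):
    # a = pi(s) for the deterministic policy: always the same action.
    return deterministic_policy[state]

def act_stochastic(state):
    # a ~ pi(a|s) for the stochastic policy: sampled according to the probabilities.
    actions, probs = zip(*stochastic_policy[state].items())
    return random.choices(actions, weights=probs)[0]

print(act("s0"))             # always 'right'
print(act_stochastic("s0"))  # 'right' about 80% of the time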
What is the reward?
Reward (r) = r(s,a) or rt
This is a feedback value returned to the agent after taking action a in state s; it indicates how good that action was with respect to the goal.
Think of something like +1 for moving towards the goal and -1 for moving away from it.
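A toy reward function along those lines (the goal cell and the Manhattan-distance shaping are assumptions made for this sketch):

GOAL = (3, 3)  # hypothetical goal cell in the maze

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reward(state, next_state):
    # r(s, a): +1 if the move brought the agent closer to the goal, -1 otherwise.
    return 1.0 if manhattan(next_state, GOAL) < manhattan(state, GOAL) else -1.0

print(reward((0, 0), (0, 1)))  # moved closer -> 1.0
print(reward((0, 1), (0, 0)))  # moved away   -> -1.0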
What is a value function?
Value function = V(s)
The value function V(s) represents the expected cumulative reward the agent obtains by starting in state s and then following a given policy π.
V(s) thus indicates how good it is to be in state s in terms of future rewards (e.g. reaching the goal).
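One hedged way to read V(s) in code: average the returns observed when starting from s and following π (the sampled returns below are made up):

def estimate_value(sampled_returns):
    # Monte Carlo estimate of V(s): the mean return observed from state s under pi.
    return sum(sampled_returns) / len(sampled_returns)

# Returns G collected from several episodes that all started in the same state s.
print(estimate_value([10.0, 8.0, 12.0]))  # V(s) is estimated as 10.0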
What is an action-value function?
Action-value function (Q value) = Q(s,a)
The action-value function Q(s,a) represents the expected cumulative reward when taking action a in state s and then following policy π.
In Q-learning, Q(s, a) assigns a value to each state-action pair, guiding the agent's decisions.
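A minimal sketch of a tabular Q-learning update (the learning rate, discount factor, and the transition fed in are assumptions for illustration):

from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)   # Q(s, a); unseen pairs default to 0
alpha = 0.1              # assumed learning rate
gamma = 0.9              # assumed discount factor

def q_learning_update(s, a, r, s_next):
    # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_learning_update((0, 0), "right", -1.0, (0, 1))
print(Q[((0, 0), "right")])  # -0.1 after this single update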
What is the discount factor?
Discount factor (γ)
The discount factor γ is a number between 0 and 1 that determines the importance of future rewards relative to immediate rewards. A γ close to 1 means the agent values future rewards nearly as much as immediate ones; a γ close to 0 makes it short-sighted.
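A quick numeric illustration (the reward value and horizon are arbitrary): a reward received k steps in the future is worth gamma**k as much as the same reward received now.

for gamma in (0.9, 0.5):
    print(gamma, gamma ** 3)  # present value of a reward of 1 arriving 3 steps away
# gamma = 0.9 -> 0.729 (far-sighted), gamma = 0.5 -> 0.125 (short-sighted)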
What is the return?
Return (G) = Gt
This is the total accumulated reward from timestep t onwards, typically with discounting applied.
Gt represents the sum of (discounted) rewards the agent expects to receive from time step t onward.
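Written out (using the common convention that the first reward after timestep t is r_{t+1}): Gt = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... A short computation with made-up rewards:

gamma = 0.9                      # assumed discount factor
rewards = [1.0, 0.0, -1.0, 2.0]  # invented rewards received from timestep t onward

G_t = sum(gamma ** k * r for k, r in enumerate(rewards))
print(G_t)  # 1.0 + 0.0 - 0.81 + 1.458 = 1.648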
What is the Bellman equation?
Bellman equation: V(s) = E[r(s,a) + γ V(s')]
This is a recursive equation that helps compute the value of a state. It expresses the state value as the expected reward plus the discounted value of the next state
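A minimal sketch of how the equation gets used: repeatedly replace each V(s) with the reward plus γ times the value of the successor state (the two-state chain, its transitions, and its rewards are invented for illustration):

gamma = 0.9
V = {"A": 0.0, "B": 0.0}
# Under the current policy: from A the agent moves to B with reward 0,
# from B it stays in B with reward 1.
transitions = {"A": ("B", 0.0), "B": ("B", 1.0)}

for _ in range(100):  # repeated Bellman backups converge toward the true values
    for s, (s_next, r) in transitions.items():
        V[s] = r + gamma * V[s_next]

print(V)  # approaches {'A': 9.0, 'B': 10.0}, since V(B) -> 1 / (1 - 0.9) = 10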
When is a method shallow?
When it only uses information from a single transition
When is a method wide?
When it considers all actions that can be taken in a state.
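A hedged contrast between the two update widths (the Q-values are invented, and the max over actions is just one way to use all of them, as in Q-learning): a narrow/sample update backs up through the one action that was actually taken, while a wide (full) backup considers every action available in the state.

ACTIONS = ["up", "down", "left"]
Q = {("s", "up"): 1.0, ("s", "down"): 3.0, ("s", "left"): 2.0}

# Narrow / sample backup: only the sampled action contributes to the target.
sampled_action = "up"
narrow_target = Q[("s", sampled_action)]

# Wide / full backup: all actions available in state s are considered.
wide_target = max(Q[("s", a)] for a in ACTIONS)

print(narrow_target, wide_target)  # 1.0 vs 3.0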