Basic Notation



Basic RL notation


11 Terms

1

What is a state?

State = s ∈ S

  • the state s represents a “snapshot” of the environment; it includes the information an agent needs to take an action in that state, while S denotes the set of all states

  • s could for example represent a position in a maze

2

What is an action?

Action = a ∈ A(s)

  • the action a is a choice the agent can make in the environment at a given state; A(s) denotes the set of all actions available in state s

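  • For example, in a simple grid-world maze the available actions might be A(s) = {up, down, left, right}, with fewer choices in states next to a wall
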
3

What is a policy?

Policy a = π(s)

  • This indicates that the agent takes action a according to the policy π at state s; with a deterministic policy this is always the same action a, while with a stochastic policy the action a is chosen according to a probability distribution

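  • For example, for a given maze cell s a deterministic policy might give π(s) = right, while a stochastic policy might assign probabilities such as π(right | s) = 0.8 and π(down | s) = 0.2
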
4

What is the reward?

Reward (r) = r(s,a) or rt

  • this is a feedback value that is returned to the agent after taking action a in state s; it indicates how good the chosen action was with respect to the goal

    • Think of something like +1 for moving towards the goal and -1 for moving away from it

5

What is a value function?

Value function = V(s)

  • The value function V(s) represents the expected cumulative reward the agent can obtain starting from state s and following a certain policy π

    • V(s) indicates how good it is to be in state s in terms of future rewards (e.g. reaching the goal)

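  • Using the return Gt (defined in a later card), one common way to write this is Vπ(s) = E[Gt | st = s], i.e. the expected discounted sum of future rewards when starting in s and following π
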
6

What is an action-value function?

Action-value function (Q value) = Q(s,a)

  • The action-value function Q(s,a) represents the expected cumulative reward when taking action a in state s and then following policy π

    • In Q-learning, Q(s, a) gives a value for each state-action pair, guiding the agent's decisions

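  • In the same style as V(s), this can be written as Qπ(s,a) = E[Gt | st = s, at = a]; a greedy agent then simply picks the action with the highest value, a = argmax over a of Q(s,a)
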
7

What is the discount factor?

Discount factor γ

  • The discount factor is a number between 0 and 1 that determines the importance of future rewards relative to immediate rewards. A γ close to 1 means the agent values future rewards nearly as much as immediate rewards

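  • As a quick worked example, assuming γ = 0.9: a reward received k steps in the future is weighted by 0.9^k, so a reward of +10 three steps away contributes about 0.9³ × 10 ≈ 7.3 to the return
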
8

What is the return?

Return (G) = Gt

  • This is the total accumulated reward from time step t onwards, typically discounted

    • Gt represents the sum of rewards the agent expects to receive from time step t onward

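  • Written out with the discount factor γ, one common convention is Gt = rt+1 + γ·rt+2 + γ²·rt+3 + …, where rt+1 is the reward received at time step t+1 (some texts start the sum at rt instead)
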
9

What is the Bellman equation?

Bellman equation = V(s) = E[r(s,a) + γ V(s')]

  • This is a recursive equation that helps compute the value of a state. It expresses the state value as the expected reward plus the discounted value of the next state

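  • As a minimal numerical sketch, assuming a deterministic transition (so the expectation drops out): if the action taken in s gives r = 1 and leads to s' with V(s') = 5, and γ = 0.9, then V(s) = 1 + 0.9 × 5 = 5.5
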
10

When is a method shallow?

When it only uses information from a single transition

11

When is a method wide?

When it considers all actions that can be taken in a state