Reinforcement Learning

10 Terms

1
New cards

Explain the key differences between Reinforcement Learning, Supervised Learning, and Unsupervised Learning. Provide an example of a task well-suited for each learning paradigm.

- Reinforcement Learning: an agent interacts with an environment, taking actions and receiving rewards or penalties as feedback

ex: Bot in video games

- Supervised Learning: the model is trained on labeled data

ex: Face Recognition

- Unsupervised Learning: the model is trained on unlabeled data

ex: Abnormal Behaviour Detection, Clustering

2
New cards

Describe the Reinforcement Learning cycle, detailing the interaction between the agent and the environment, and define the core components involved (state, action, policy, reward).

- The agent observes the current state (St)

- Then the agent selects an action (At) based on its policy (π)

- The agent executes the action within the environment

- The environment transitions to a new state (St+1), and the agent receives a reward (Rt+1)

- The agent updates its policy based on the experienced transition (see the sketch below)
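For concreteness, here is a minimal sketch of this cycle in code. It assumes a Gymnasium-style env.reset()/env.step() interface; select_action and update_policy are hypothetical placeholders for the agent's policy and learning rule, not part of the card.

```python
# Minimal agent-environment interaction loop (Gymnasium-style interface assumed).
import gymnasium as gym

env = gym.make("CartPole-v1")

def select_action(state):
    # Placeholder policy: sample a random action from the action space.
    return env.action_space.sample()

def update_policy(state, action, reward, next_state):
    # Placeholder for the agent's learning rule (e.g., a Q-learning update).
    pass

state, _ = env.reset()                                   # observe S_t
for t in range(500):
    action = select_action(state)                        # choose A_t from the policy
    next_state, reward, terminated, truncated, _ = env.step(action)  # R_{t+1}, S_{t+1}
    update_policy(state, action, reward, next_state)     # learn from the transition
    state = next_state
    if terminated or truncated:
        state, _ = env.reset()
env.close()
```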

3
New cards

Define the State-Value Function (V(s)) and the Action-Value Function (Q(s,a)). Explain the Bellman Expectation Equations for both functions and discuss their significance in Reinforcement Learning.

- The state-value function, denoted Vπ(s), represents the expected return when starting in state s and subsequently following policy π: it quantifies the expected cumulative reward from state s under policy π.

- The action-value function, denoted Qπ(s,a), represents the expected return when starting in state s, taking action a, and subsequently following policy π: it quantifies the expected cumulative reward from state s, taking action a and then following policy π.

- Bellman Expectation Equations (written out below):

+ Value: this equation expresses the value of a state s as the expected immediate reward plus the discounted expected value of the subsequent state St+1.

+ Q: this equation expresses the value of taking action a in state s as the expected immediate reward plus the discounted expected value of the subsequent state St+1.

- Significance: these recursive relations are the foundation of policy evaluation and of most value-based RL algorithms.
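For reference, the two Bellman expectation equations described above are commonly written as follows (standard textbook notation, with discount factor γ; not taken verbatim from the card):

```latex
% Bellman expectation equations for the state-value and action-value functions
\begin{align*}
  V^{\pi}(s)   &= \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma\, V^{\pi}(S_{t+1}) \mid S_t = s \right] \\
  Q^{\pi}(s,a) &= \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma\, Q^{\pi}(S_{t+1}, A_{t+1}) \mid S_t = s,\ A_t = a \right]
\end{align*}
```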

4
New cards

Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples of algorithms that fall under each category and discuss the trade-offs between these approaches in terms of sample efficiency and planning capabilities.

- Model-Free RL learns directly from experience; it is simpler but less sample-efficient.

- Model-Based RL learns a model of the environment and uses it for planning. It is far more sample-efficient but depends on the model's accuracy.

- Trade-off: Model-Free is more robust and simpler but requires more data, while Model-Based is far more sample-efficient and has better planning capabilities (see the sketch below).
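As a rough illustration of this contrast, the tabular sketch below pairs a plain Q-learning update (model-free) with a Dyna-Q-style planning loop (model-based). The hyperparameters, data structures, and the deterministic-model assumption are illustrative, not prescribed by the card.

```python
# Tabular sketch: model-free Q-learning update vs. model-based Dyna-Q planning.
import random
from collections import defaultdict

alpha, gamma = 0.1, 0.99
Q = defaultdict(float)   # Q[(s, a)] -> value estimate
model = {}               # learned model: (s, a) -> (r, s'), used only for planning

def q_learning_update(s, a, r, s_next, actions):
    # Model-free: update directly from the experienced transition.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_q_planning(actions, n_planning_steps=10):
    # Model-based: replay simulated transitions drawn from the learned model.
    for _ in range(n_planning_steps):
        (s, a), (r, s_next) = random.choice(list(model.items()))
        q_learning_update(s, a, r, s_next, actions)

def dyna_q_step(s, a, r, s_next, actions):
    # Dyna-Q = direct RL update + model learning + planning from the model.
    q_learning_update(s, a, r, s_next, actions)
    model[(s, a)] = (r, s_next)      # assumes a deterministic environment
    dyna_q_planning(actions)
```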

5
New cards

Discuss the concept of Deep Reinforcement Learning. Explain how deep neural networks are integrated into RL algorithms and highlight the advantages of Deep RL, particularly in handling high-dimensional state and action spaces.

- Deep RL is the combination of RL and Deep Learning.

- Deep RL uses deep neural networks to approximate value functions, models, and policies (see the sketch below).

Advantages:

- Handling high-dimensional spaces: deep neural networks can extract meaningful representations from raw sensory inputs, such as images or high-dimensional state vectors.

- Enables RL to solve complex tasks.

- Automatic feature learning: the networks can automatically learn relevant features from the input data, reducing the need for manual feature engineering.
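A minimal sketch of this function-approximation idea, assuming PyTorch is available; the architecture, layer sizes, and dimensions are illustrative choices, not part of the card.

```python
# DQN-style value approximation: a network replaces the Q-table.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Usage: greedy action selection for a 4-dimensional state and 2 actions.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1).item()
```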

6
New cards

Discuss the key factors used to classify different Reinforcement Learning algorithms. Provide specific examples of algorithms that fall into different categories according to these factors.

- Key factors include:

+ Policy-based vs. value-based: (e.g., Policy Gradient vs. Q-learning)

+ Model-free vs. model-based: (e.g., Q-learning vs. Dyna-Q)

+ On-policy vs. off-policy: (e.g., SARSA (on-policy) vs. Q-learning (off-policy))

+ Discrete vs. continuous actions: (e.g., DQN for discrete; DDPG for continuous)

7
New cards

In the context of reinforcement learning for complex decision-making tasks, discuss the potential benefits and drawbacks of using auxiliary networks to assist the primary learning agent. How might these auxiliary networks contribute to or detract from overall performance, and what are some strategies to optimize their integration?

Auxiliary networks are additional networks that support the primary RL agent (e.g., target networks, replay buffers, curiosity modules).

Benefits:

+ Improve learning stability (e.g., target network in DQN)

+ Enhance exploration (e.g., intrinsic motivation/curiosity networks)

+ Allow for multi-task or auxiliary task learning, which can speed up or regularize the main learning process

Drawbacks:

+ Increase computational and memory cost

+ Can complicate training and introduce instability if not integrated carefully

Strategies:

+ Carefully schedule updates (e.g., delayed update for target networks)

+ Use regularization and careful design to avoid interference

+ Monitor the contribution of auxiliary losses to ensure they benefit the main task (see the target-network sketch below)
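As one concrete example of the "carefully schedule updates" strategy, the sketch below shows a hard (delayed) and a soft (Polyak-averaged) target-network update, assuming PyTorch; the network shape and the hyperparameters are illustrative assumptions.

```python
# Target-network updates: the auxiliary network lags the online network
# to provide stable TD targets.
import copy
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # online network (illustrative)
target_net = copy.deepcopy(q_net)   # auxiliary network: a frozen copy of the online network

def hard_update(step: int, period: int = 1000):
    # Delayed update: copy the weights only every `period` steps (DQN-style).
    if step % period == 0:
        target_net.load_state_dict(q_net.state_dict())

def soft_update(tau: float = 0.005):
    # Polyak averaging: the target slowly tracks the online network (DDPG/SAC-style).
    with torch.no_grad():
        for p_target, p_online in zip(target_net.parameters(), q_net.parameters()):
            p_target.mul_(1 - tau).add_(tau * p_online)
```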

8
New cards

Name four significant challenges in reinforcement learning that distinguish it from supervised learning, and briefly explain why each poses a unique difficulty for learning agents.

- Delayed Reward: in RL, rewards are often received long after an action was taken, making it hard for the agent to determine which actions led to good or bad outcomes.

- Exploration vs Exploitation: the agent must balance trying new actions against exploiting known good ones, to avoid local optima and missing better strategies (see the sketch below).

- Non-Stationary / Unknown Environment: the agent must learn while interacting; there is no fixed data distribution.

- Partial Observability: Agent may lack all necessary information
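As a small illustration of the exploration-vs-exploitation balance, here is an epsilon-greedy selection sketch; the Q-table contents and action names are hypothetical.

```python
# Epsilon-greedy: explore with probability epsilon, otherwise exploit.
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon pick a random action; otherwise pick the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)                          # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit current estimates

# Usage: pick an action for state "s0" from a small discrete action set.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
action = epsilon_greedy(Q, "s0", ["left", "right"])
```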

9
New cards

Describe the Reinforcement Learning cycle and define the core components: state, action, policy, and reward.

The RL cycle involves the agent observing a state, selecting an action based on a policy, receiving a reward, and transitioning to a new state.

State (s): Current situation of the environment.

Action (a): Decision made by the agent.

Policy (π): Strategy mapping states to actions.

Reward (r): Feedback signal indicating the value of an action

10
New cards

Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples and discuss trade-offs.

Model-Free RL learns directly from experience (e.g., Q-learning, SARSA). It is simpler but less sample-efficient.

Model-Based RL learns a model of the environment and uses it for planning (e.g., MCTS, Dyna-Q). It is more sample-efficient but depends on model accuracy.

Trade-offs: Model-based methods can plan ahead but may suffer from model errors; model-free methods are robust but require more data.