Reinforcement Learning

8 Terms

Card 1

Explain the key differences between Reinforcement Learning, Supervised Learning, and Unsupervised Learning. Provide an example of a task well-suited for each learning paradigm.

- Reinforcement Learning: an agent interacts with an environment, takes actions, and receives rewards or penalties as feedback

ex: Bot in video games

- Supervised Learning: the model is trained on labeled data

ex: Face Recognition

- Unsupervised Learning: the model is trained on unlabeled data

ex: Abnormal Behaviour Detection, Clustering
- Semi-Supervised Learning: Combining labeled and unlabeled data for training.

ex: web or document classification when only a small fraction of the data is labeled (labeled data is scarce or expensive to obtain)

Card 2

Describe the Reinforcement Learning cycle, detailing the interaction between the agent and the environment, and define the core components involved (state, action, policy, reward).

- The agent observes the current State (St)

- Then the agent selects an action (At) based on its policy (π)

- The agent executes the action within the environment

- The environment transitions to a new state (St+1) and emits a reward (Rt+1), which the agent receives

- The agent updates its policy based on the experienced transition (St, At, Rt+1, St+1); see the minimal loop sketch below
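
A minimal Python sketch of one episode of this cycle (a sketch only: it assumes a Gymnasium-style environment with reset/step and a hypothetical agent object exposing select_action and update methods):

def run_episode(env, agent):
    # One pass through the RL cycle: observe the state, act, receive a reward, learn.
    state, _ = env.reset()                    # agent observes the initial state S_t
    done = False
    total_reward = 0.0
    while not done:
        action = agent.select_action(state)   # A_t chosen according to the policy pi
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # learn from the experienced transition (S_t, A_t, R_{t+1}, S_{t+1})
        agent.update(state, action, reward, next_state, done)
        state = next_state                    # the environment is now in state S_{t+1}
        total_reward += reward
    return total_reward

The reward and next state returned by step() correspond to Rt+1 and St+1 in the cycle above.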

Card 3

Define the State-Value Function (V(s)) and the Action-Value Function (Q(s,a)). Explain the Bellman Expectation Equations for both functions and discuss their significance in Reinforcement Learning.

- The state-value function, denoted Vπ(s), is the expected return when starting in state s and subsequently following policy π; it quantifies the expected cumulative reward from state s under policy π.

- The action-value function, denoted Qπ(s,a), is the expected return when starting in state s, taking action a, and subsequently following policy π; it quantifies the expected cumulative reward from state s when taking action a and then following policy π.

- Bellman Expectation Equations (written out below):

+ Value: expresses the value of a state s as the expected immediate reward plus the discounted expected value of the subsequent state St+1.

+ Q: expresses the value of taking action a in state s as the expected immediate reward plus the discounted expected action-value of the subsequent state-action pair (St+1, At+1).

- Significance: these recursive relations decompose long-term return into immediate reward plus discounted successor value, and they underpin policy evaluation and algorithms such as TD learning and Q-learning.
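
Written out in the notation above (γ is the discount factor; expectations are taken under policy π and the environment dynamics):

Vπ(s)   = Eπ[ Rt+1 + γ · Vπ(St+1) | St = s ]
Qπ(s,a) = Eπ[ Rt+1 + γ · Qπ(St+1, At+1) | St = s, At = a ]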

Card 4

Discuss the concept of Deep Reinforcement Learning. Explain how deep neural networks are integrated into RL algorithms and highlight the advantages of Deep RL, particularly in handling high-dimensional state and action spaces.

- Deep RL is the combination of RL and Deep Learning

- Deep RL uses deep neural networks to approximate value functions, models, and policies

Advantages:

- Handling High-Dimensional Space: Deep neural networks can extract meaningful representations from raw sensory inputs, such as images or high-dimensional state vectors.

- Enables RL to solve complicated tasks.

- Automatic Feature Learning: The networks can automatically learn relevant features from the input data, reducing the need for manual feature engineering.
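
A minimal sketch of such a function approximator, assuming PyTorch (the layer sizes and placeholder dimensions are arbitrary): a deep Q-network mapping a raw state vector to one Q-value per discrete action.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a raw state vector to one Q-value per discrete action.
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),   # hidden layers learn features automatically
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # output: Q(s, a) for every action
        )

    def forward(self, state):
        return self.net(state)

# Greedy action selection from the approximated Q-values.
q_net = QNetwork(state_dim=8, n_actions=4)   # dimensions are placeholders
state = torch.randn(1, 8)                    # stand-in for an observed state
action = q_net(state).argmax(dim=1).item()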

Card 5

Discuss the key factors used to classify different Reinforcement Learning algorithms. Provide specific examples of algorithms that fall into different categories according to these factors

- Key factors include:

+ Policy-based vs. value-based: (e.g., Policy Gradient vs. Q-learning)

+ Model-free vs. model-based: (e.g., Q-learning vs. Dyna-Q)

+ On-policy vs. off-policy: (e.g., SARSA (on-policy) vs. Q-learning (off-policy); their update rules are contrasted below)

+ Discrete vs. continuous actions: (e.g., DQN for discrete; DDPG for continuous)
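
To make the on-policy vs. off-policy distinction concrete, a sketch of the two tabular update rules in Python (assuming Q is a NumPy Q-table indexed by state and action; alpha and gamma are illustrative values):

import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the current policy actually takes next.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action, regardless of the behavior policy.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

Because it bootstraps from the greedy action, Q-learning can learn the greedy policy while following a more exploratory one; SARSA evaluates the policy it actually follows.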

Card 6

In the context of reinforcement learning for complex decision-making tasks, discuss the potential benefits and drawbacks of using auxiliary networks to assist the primary learning agent. How might these auxiliary networks contribute to or detract from overall performance, and what are some strategies to optimize their integration?

Auxiliary networks are additional networks that support the primary RL agent (e.g., target networks, curiosity/intrinsic-reward modules, auxiliary-task prediction heads).

Benefits:

+ Improve learning stability (e.g., target network in DQN)

+ Enhance exploration (e.g., intrinsic motivation/curiosity networks)

+ Allow for multi-task or auxiliary task learning, which can speed up or regularize the main learning process

Drawbacks:

+ Increase computational and memory cost

+ Can complicate training and introduce instability if not integrated carefully

Strategies:

+ Carefully schedule updates (e.g., delayed or soft updates for target networks; see the sketch below)

+ Use regularization and careful design to avoid interference

+ Monitor the contribution of auxiliary losses to ensure they benefit the main task
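
A minimal sketch of the target-network update strategies mentioned above, assuming PyTorch (the tiny stand-in network and the tau value are placeholders, not a specific algorithm's settings):

import copy
import torch
import torch.nn as nn

online_net = nn.Linear(4, 2)              # stand-in for the agent's Q-network
target_net = copy.deepcopy(online_net)    # frozen copy used to compute bootstrap targets

def hard_update(target_net, online_net):
    # Delayed update: copy the online weights into the target every N environment steps.
    target_net.load_state_dict(online_net.state_dict())

def soft_update(target_net, online_net, tau=0.005):
    # Polyak averaging: the target slowly tracks the online network, keeping targets stable.
    with torch.no_grad():
        for t_p, o_p in zip(target_net.parameters(), online_net.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)

In practice, hard updates at a fixed interval are the DQN-style choice, while per-step soft updates are common in actor-critic methods such as DDPG.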

Card 7

Name four significant challenges in reinforcement learning that distinguish it from supervised learning, and briefly explain why each poses a unique difficulty for learning agents.

- Delayed Reward: rewards often arrive long after the actions that caused them, making it hard for the agent to determine which actions led to good or bad outcomes (the credit assignment problem)

- Exploration vs. Exploitation: the agent must balance trying new actions against exploiting known good ones, or it risks getting stuck in local optima and missing better strategies

- Non-Stationary / Unknown Environment: the agent must learn while interacting; there is no fixed data distribution as in supervised learning

- Partial Observability: the agent may not observe all the information needed to determine the true state of the environment

Card 8

Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples and discuss the trade-offs.

Model-Free RL learns directly from experience (e.g., Q-learning, SARSA). It is simpler but less sample-efficient.

Model-Based RL learns a model of the environment and uses it for planning (e.g., MCTS, Dyna-Q). It is more sample-efficient but depends on model accuracy.

Trade-offs: Model-based methods can plan ahead but may suffer from model errors; model-free methods are robust but require more data.
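
A minimal Dyna-Q-style sketch in Python of how a learned model supplements the model-free update (the dict-based model and the number of planning steps are illustrative assumptions):

import random
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next, alpha=0.1, gamma=0.99, planning_steps=10):
    # 1. Model-free (direct RL) update from the real transition.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    # 2. Model learning: remember what the environment returned for (s, a).
    model[(s, a)] = (r, s_next)
    # 3. Planning: replay simulated transitions sampled from the learned model.
    for _ in range(planning_steps):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])

The planning loop is where the sample-efficiency gain comes from; if the learned model is inaccurate, those simulated updates can push Q toward the model's errors.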