Explain the key differences between Reinforcement Learning, Supervised Learning, and Unsupervised Learning. Provide an example of a task well-suited for each learning paradigm.
- Reinforcement Learning: an agent interacts with an environment, takes actions, and receives rewards or penalties as feedback
ex: a bot playing video games
- Supervised Learning: the model is trained on labeled data
ex: Face Recognition
- Unsupervised Learning: the model is trained on unlabeled data
ex: Abnormal Behaviour Detection, Clustering
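A minimal sketch of the contrast, assuming scikit-learn and a synthetic dataset (both purely illustrative); the RL case has no fixed dataset, so it is only noted in a comment here and sketched in full under the next question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # supervised: needs labels
from sklearn.cluster import KMeans                   # unsupervised: no labels

X = np.random.randn(100, 4)          # illustrative feature matrix
y = (X[:, 0] > 0).astype(int)        # illustrative labels

# Supervised learning: fit a model on (features, labels) pairs.
clf = LogisticRegression().fit(X, y)

# Unsupervised learning: find structure in the features alone.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Reinforcement learning has no fixed dataset at all: the agent generates
# its own data by acting and receiving rewards (see the RL-cycle sketch below).
```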
Describe the Reinforcement Learning cycle, detailing the interaction between the agent and the environment, and define the core components involved (state, action, policy, reward).
- The agent observes the current state (St)
- The agent selects an action (At) based on its policy (π)
- The agent executes the action within the environment
- The environment transitions to a new state (St+1) and returns a reward (Rt+1)
- The agent updates its policy based on the experienced transition (see the sketch below)
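A minimal sketch of this cycle, assuming a hypothetical `env` object with `reset()`/`step()` methods and a hypothetical `policy` callable (both are placeholders, not a specific library API):

```python
# Hypothetical env and policy; interfaces are illustrative only.
def run_episode(env, policy, gamma=0.99):
    state = env.reset()                # agent observes the initial state S_t
    done, total_return, t = False, 0.0, 0
    while not done:
        action = policy(state)                        # select A_t from the policy pi(a|s)
        next_state, reward, done = env.step(action)   # environment returns S_{t+1} and R_{t+1}
        total_return += (gamma ** t) * reward         # accumulate the discounted return
        # A learning agent would update its policy/value estimates here
        # from the transition (state, action, reward, next_state).
        state = next_state
        t += 1
    return total_return
```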
Define the State-Value Function (V(s)) and the Action-Value Function (Q(s,a)). Explain the Bellman Expectation Equations for both functions and discuss their significance in Reinforcement Learning.
- The state-value function, denoted Vπ(s), represents the expected return when starting in state s and subsequently following policy π; it quantifies the expected cumulative reward from state s under policy π.
- The action-value function, denoted Qπ(s,a), represents the expected return when starting in state s, taking action a, and subsequently following policy π; it quantifies the expected cumulative reward of taking action a in state s and then following policy π.
- Bellman Expectation Equations (written out below):
+ For V: expresses the value of a state s as the expected immediate reward plus the discounted expected value of the successor state St+1.
+ For Q: expresses the value of taking action a in state s as the expected immediate reward plus the discounted expected action-value of the successor state-action pair (St+1, At+1).
+ Significance: they give a recursive decomposition of the value functions, which underlies policy evaluation, dynamic programming, and temporal-difference methods.
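Written out in standard textbook form (assuming the usual MDP setup with discount factor γ; nothing here is specific to these notes):

```latex
V^{\pi}(s)   = \mathbb{E}_{\pi}\!\left[\, R_{t+1} + \gamma\, V^{\pi}(S_{t+1}) \mid S_t = s \,\right]
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\, R_{t+1} + \gamma\, Q^{\pi}(S_{t+1}, A_{t+1}) \mid S_t = s,\ A_t = a \,\right]
```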
Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples of algorithms that fall under each category and discuss the trade-offs between these approaches in terms of sample efficiency and planning capabilities.
- Model-Free RL learns directly from experience (e.g., Q-learning, SARSA); it is simpler but less sample-efficient.
- Model-Based RL learns a model of the environment and uses it for planning (e.g., Dyna-Q, MCTS). It is far more sample-efficient but depends on the model's accuracy.
- Trade-off: Model-Free methods are more robust and simpler but require more data, while Model-Based methods are far more sample-efficient and have better planning capabilities (see the sketch below).
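A minimal tabular sketch of both ideas, assuming a small discrete environment (all names and constants are illustrative): the model-free update learns from a single real transition, while the Dyna-Q-style planning loop reuses a learned model to squeeze extra updates out of remembered transitions.

```python
import random
from collections import defaultdict

Q = defaultdict(float)     # action-value table, keyed by (state, action)
model = {}                 # learned model of the environment: (s, a) -> (reward, next_state)
alpha, gamma, n_planning = 0.1, 0.99, 5

def q_update(s, a, r, s_next, actions):
    """Model-free Q-learning update from a single transition."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s_next, actions):
    """Dyna-Q-style step: learn from the real transition, then plan with the model."""
    q_update(s, a, r, s_next, actions)        # model-free update from real experience
    model[(s, a)] = (r, s_next)               # update the (deterministic) learned model
    for _ in range(n_planning):               # extra simulated updates: better sample efficiency
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        q_update(ps, pa, pr, ps_next, actions)
```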
Discuss the concept of Deep Reinforcement Learning. Explain how deep
neural networks are integrated into RL algorithms and highlight the advantages
of Deep RL, particularly in handling high-dimensional state and action spaces.
- Deep RL is the combination of RL and Deep Learning.
- Deep RL uses deep neural networks to approximate value functions, policies, and environment models.
Advantages:
- Handling High-Dimensional Spaces: Deep neural networks can extract meaningful representations from raw sensory inputs, such as images or high-dimensional state vectors.
- Enables RL to solve complicated tasks.
- Automatic Feature Learning: The networks can automatically learn relevant
features from the input data, reducing the need for manual feature engineering.
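A minimal sketch of the idea, assuming PyTorch; the network stands in for a Q-table by mapping a state vector to one Q-value per action (the dimensions and layer sizes are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

n_obs, n_actions = 8, 4          # illustrative state and action dimensions

# Q-network: maps a state vector to one Q-value per discrete action,
# replacing a lookup table that could not cover a high-dimensional space.
q_net = nn.Sequential(
    nn.Linear(n_obs, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_actions),
)

state = torch.randn(1, n_obs)                 # illustrative observation
with torch.no_grad():
    q_values = q_net(state)                   # approximate Q(s, a) for every action a
    greedy_action = q_values.argmax(dim=1)    # act greedily w.r.t. the network's estimates
```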
Discuss the key factors used to classify different Reinforcement Learning
algorithms. Provide specific examples of algorithms that fall into different
categories according to these factors
- Key factors include:
+ Policy-based vs. value-based: (e.g., Policy Gradient vs. Q-learning)
+ Model-free vs. model-based: (e.g., Q-learning vs. Dyna-Q)
+ On-policy vs. off-policy: (e.g., SARSA (on-policy) vs. Q-learning (off-policy))
+ Discrete vs. continuous actions: (e.g., DQN for discrete; DDPG for continuous)
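The on-policy vs. off-policy distinction from the list above shows up directly in the update target; a minimal tabular sketch (the table `Q` and all parameters are illustrative):

```python
from collections import defaultdict

Q = defaultdict(float)   # illustrative action-value table, keyed by (state, action)

# SARSA (on-policy): the target uses the action a_next that the behaviour
# policy actually selected in s_next.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

# Q-learning (off-policy): the target uses the greedy action in s_next,
# regardless of what the behaviour policy will actually do.
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```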
In the context of reinforcement learning for complex decision-making
tasks, discuss the potential benefits and drawbacks of using auxiliary
networks to assist the primary learning agent. How might these auxiliary
networks contribute to or detract from overall performance, and what are some
strategies to optimize their integration?
Auxiliary networks are additional networks that support the primary RL agent (e.g., target networks, curiosity modules, auxiliary-task prediction heads), often used alongside mechanisms such as replay buffers.
Benefits:
+ Improve learning stability (e.g., target network in DQN)
+ Enhance exploration (e.g., intrinsic motivation/curiosity networks)
+ Allow for multi-task or auxiliary task learning, which can speed up or regularize the
main learning process
Drawbacks:
+ Increase computational and memory cost
+ Can complicate training and introduce instability if not integrated carefully
Strategies:
+ Carefully schedule updates (e.g., delayed updates for target networks; see the sketch below)
+ Use regularization and careful design to avoid interference
+ Monitor the contribution of auxiliary losses to ensure they benefit the main task
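A minimal sketch of the delayed-update strategy for a target network, assuming PyTorch (the architecture, sync interval, and loop are illustrative placeholders, not a specific implementation):

```python
import copy
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))  # primary network
target_net = copy.deepcopy(q_net)      # auxiliary network: a frozen copy used for stable TD targets
target_net.requires_grad_(False)

SYNC_EVERY = 1000                      # delayed update: sync weights only every SYNC_EVERY steps

for step in range(10_000):
    # ... compute TD targets with target_net and take a gradient step on q_net ...
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())   # scheduled, delayed weight copy
```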
Name four significant challenges in reinforcement learning that distinguish it
from supervised learning, and briefly explain why each poses a unique difficulty
for learning agents.
- Delayed Reward: rewards are often received long after the actions that caused them, making it hard for the agent to determine which actions led to good or bad outcomes (the credit-assignment problem).
- Exploration vs. Exploitation: the agent must balance trying new actions against exploiting known good ones to avoid getting stuck in local optima or missing better strategies.
- Non-Stationary / Unknown Environment: the agent must learn while interacting; there is no fixed data distribution as in supervised learning.
- Partial Observability: the agent may not have access to all the information needed to act optimally.
Describe the Reinforcement Learning cycle and define the core components: state, action, policy, and reward.
The RL cycle involves the agent observing a state, selecting an action based on a policy,
receiving a reward, and transitioning to a new state.
State (s): Current situation of the environment.
Action (a): Decision made by the agent.
Policy (π): Strategy mapping states to actions.
Reward (r): Feedback signal indicating the value of an action
Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples and
discuss trade-offs
Model-Free RL learns directly from experience (e.g., Q-learning, SARSA). It is simpler
but less sample-efficient.
Model-Based RL learns a model of the environment and uses it for planning (e.g., MCTS,
Dyna-Q). It is more sample-efficient but depends on model accuracy.
Trade-offs: Model-based methods can plan ahead but may suffer from model errors;
model-free methods are robust but require more data.