Explain the key differences between Reinforcement Learning, Supervised Learning, and Unsupervised Learning. Provide an example of a task well-suited for each learning paradigm.
- Reinforcement Learning: an agent interacts with an environment, takes actions, and receives rewards or penalties as feedback
ex: Bots playing video games
- Supervised Learning: the model is trained on labeled data
ex: Face recognition
- Unsupervised Learning: the model is trained on unlabeled data
ex: Abnormal behaviour detection, clustering
- Semi-Supervised Learning: combines labeled and unlabeled data for training; utilized when labeled data is scarce or expensive to obtain
Describe the Reinforcement Learning cycle, detailing the interaction between the agent and the environment, and define the core components involved (state, action, policy, reward).
- The agent observes the current State (St)
- Then the agent selects an action (At) based on its policy (π)
- The agent executes the action within the environment
- The environment transitions to a new state (St+1) and returns a reward (Rt+1) to the agent
- The agent updates its policy based on the experienced transition (St, At, Rt+1, St+1), as sketched below
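A minimal sketch of this cycle in Python, assuming a Gymnasium-style environment API (env.reset/env.step) and a placeholder agent whose select_action/update methods stand in for a real learning algorithm:

```python
import gymnasium as gym


class RandomAgent:
    """Placeholder agent: samples actions uniformly and ignores updates."""

    def __init__(self, action_space):
        self.action_space = action_space

    def select_action(self, state):
        return self.action_space.sample()  # A_t drawn from a (here uniform) policy

    def update(self, state, action, reward, next_state):
        pass  # a real agent would improve its policy from this transition


env = gym.make("CartPole-v1")
agent = RandomAgent(env.action_space)
state, _ = env.reset()

for t in range(500):
    action = agent.select_action(state)                               # agent picks A_t from its policy
    next_state, reward, terminated, truncated, _ = env.step(action)   # environment returns S_t+1 and R_t+1
    agent.update(state, action, reward, next_state)                   # agent learns from the transition
    state = next_state
    if terminated or truncated:                                       # episode ended; start a new one
        state, _ = env.reset()
```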
Define the State-Value Function (V(s)) and the Action-Value Function (Q(s,a)). Explain the Bellman Expectation Equations for both functions and discuss their significance in Reinforcement Learning.
- The state-value function, denoted Vπ(s), represents the expected return when starting in state s and subsequently following policy π: it quantifies the expected cumulative reward from state s under policy π
- The action-value function, denoted Qπ(s,a), represents the expected return when starting in state s, taking action a, and subsequently following policy π: it quantifies the expected cumulative reward of taking action a in state s and then following policy π
- Bellman Expectation Equations:
+ For Vπ: expresses the value of a state s as the expected immediate reward plus the discounted expected value of the subsequent state St+1
+ For Qπ: expresses the value of taking action a in state s as the expected immediate reward plus the discounted expected value of the subsequent state–action pair (St+1, At+1)
- Significance: these recursive relationships underpin dynamic programming (policy evaluation and iteration) and temporal-difference methods such as SARSA and Q-learning, which use sampled versions of them as update targets
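Written out in standard notation (with γ the discount factor), the Bellman expectation equations are:

```latex
V^{\pi}(s)   = \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma\, V^{\pi}(S_{t+1}) \mid S_t = s \right]
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[ R_{t+1} + \gamma\, Q^{\pi}(S_{t+1}, A_{t+1}) \mid S_t = s,\ A_t = a \right]
```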
Discuss the concept of Deep Reinforcement Learning. Explain how deep neural networks are integrated into RL algorithms and highlight the advantages of Deep RL, particularly in handling high-dimensional state and action spaces.
- Deep RL is the combination of RL and deep learning
- Deep RL uses deep neural networks to approximate value functions, models, and policies (see the sketch below)
Advantages:
- Handling High-Dimensional Spaces: deep neural networks can extract meaningful representations from raw sensory inputs, such as images or high-dimensional state vectors.
- Enables RL to solve complex tasks.
- Automatic Feature Learning: The networks can automatically learn relevant features from the input data, reducing the need for manual feature engineering.
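A minimal sketch of such a function approximator, assuming PyTorch and illustrative layer sizes: the network maps a raw state vector to one Q-value per discrete action, so no hand-crafted features are needed.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Approximates Q(s, a) for all discrete actions from a raw state vector."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),    # learns features from the raw state
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


# Greedy action selection: pick the action with the highest predicted Q-value.
q_net = QNetwork(state_dim=4, num_actions=2)   # e.g., a CartPole-like 4-dimensional state
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1)
```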
Discuss the key factors used to classify different Reinforcement Learning algorithms. Provide specific examples of algorithms that fall into different categories according to these factors
- Key factors include:
+ Policy-based vs. value-based: (e.g., Policy Gradient vs. Q-learning)
+ Model-free vs. model-based: (e.g., Q-learning vs. Dyna-Q)
+ On-policy vs. off-policy: (e.g., SARSA (on-policy) vs. Q-learning (off-policy); see the update-rule sketch below)
+ Discrete vs. continuous actions: (e.g., DQN for discrete; DDPG for continuous)
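The on-policy vs. off-policy distinction shows up directly in the update rule. A tabular sketch, where Q is an illustrative dict of (state, action) values and alpha/gamma are the learning rate and discount factor:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstraps from the action a_next actually taken by the behaviour policy."""
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy: bootstraps from the greedy action in s_next, regardless of what was actually taken."""
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```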
In the context of reinforcement learning for complex decision-making tasks, discuss the potential benefits and drawbacks of using auxiliary networks to assist the primary learning agent. How might these auxiliary networks contribute to or detract from overall performance, and what are some strategies to optimize their integration?
Auxiliary networks are additional networks that support the primary RL agent (e.g., target networks, curiosity/intrinsic-reward modules, auxiliary-task prediction heads).
Benefits:
+ Improve learning stability (e.g., target network in DQN)
+ Enhance exploration (e.g., intrinsic motivation/curiosity networks)
+ Allow for multi-task or auxiliary-task learning, which can speed up or regularize the main learning process
Drawbacks:
+ Increase computational and memory cost
+ Can complicate training and introduce instability if not integrated carefully
Strategies:
+ Carefully schedule updates (e.g., delayed or soft updates for target networks; see the sketch below)
+ Use regularization and careful design to avoid interference
+ Monitor the contribution of auxiliary losses to ensure they benefit the main task
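As an illustration of the update-scheduling strategy, a sketch of the hard (periodic copy) and soft (Polyak averaging) target-network updates, assuming two PyTorch modules of identical architecture:

```python
import copy
import torch.nn as nn


def hard_update(target: nn.Module, online: nn.Module) -> None:
    """Periodically copy the online network's weights into the target network (DQN-style)."""
    target.load_state_dict(online.state_dict())


def soft_update(target: nn.Module, online: nn.Module, tau: float = 0.005) -> None:
    """Polyak averaging: target <- tau * online + (1 - tau) * target (DDPG/SAC-style)."""
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.data.mul_(1.0 - tau).add_(tau * o_param.data)


# Usage: create the target as a frozen copy of the online network, then call
# hard_update every N steps or soft_update after every gradient step.
online_net = nn.Linear(4, 2)            # stand-in for a real Q-network
target_net = copy.deepcopy(online_net)
soft_update(target_net, online_net)
```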
Name four significant challenges in reinforcement learning that distinguish it from supervised learning, and briefly explain why each poses a unique difficulty for learning agents.
- Delayed Reward: rewards often arrive long after the actions that caused them, so it is hard for the agent to determine which actions led to good or bad outcomes (the credit-assignment problem)
- Exploration vs. Exploitation: the agent must balance trying new actions against exploiting known good ones, to avoid getting stuck in local optima or missing better strategies
- Non-Stationary/Unknown Environment: the agent must learn while interacting; there is no fixed data distribution as in supervised learning
- Partial Observability: the agent may not have access to all the information needed to determine the true state of the environment
Contrast Model-Free and Model-Based Reinforcement Learning. Provide examples and discuss trade-offs
Model-Free RL learns directly from experience (e.g., Q-learning, SARSA). It is simpler but less sample-efficient.
Model-Based RL learns a model of the environment and uses it for planning (e.g., MCTS, Dyna-Q). It is more sample-efficient but depends on model accuracy.
Trade-offs: Model-based methods can plan ahead but may suffer from model errors; model-free methods avoid model error but require more data (see the sketch below).
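A compact sketch of the contrast, assuming a tabular setting: Dyna-Q keeps the model-free Q-learning update and adds planning updates replayed from a learned transition table (names are illustrative).

```python
import random


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Model-free step: ordinary Q-learning update from a single transition."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))


def dyna_q_step(Q, model, s, a, r, s_next, actions, n_planning=10):
    """Model-based addition: learn a transition model and replay simulated experience."""
    q_update(Q, s, a, r, s_next, actions)          # learn from real experience
    model[(s, a)] = (r, s_next)                    # update the (deterministic) learned model
    for _ in range(n_planning):                    # planning: learn from simulated experience
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        q_update(Q, ps, pa, pr, ps_next, actions)
```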