Flashcards for Reinforcement Learning Lecture
Reinforcement Learning
Goal-directed learning from interaction: a system adapts to new situations and learns from past experience through action and feedback in order to reach a desired state.
RL Problem
Formalizing reinforcement learning, analyzing its bounds, and mapping concrete applications to the abstract reinforcement learning problem.
RL Solutions
Algorithms to solve defined reinforcement learning problems and methods to approximate these solutions.
RL Field of Study
Everything surrounding reinforcement learning, including problem definition, solutions, preprocessing, and related aspects.
Trial-and-error search
A key aspect of RL: good actions must be discovered through a loop of action and feedback, because gradient descent cannot be applied directly and the space of possible solutions is too large to enumerate.
Delayed Reward
A key aspect of RL: the value of an action may only become apparent later, which requires planning and non-greedy behavior.
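One standard way to make "delayed reward" precise, not stated on the card itself but following Sutton & Barto's notation (G_t for the return, R_t for the reward, gamma for the discount factor), is the discounted return:

```latex
% Discounted return: the reward arriving k+1 steps after time t is
% weighted by gamma^k, so an action is credited for rewards that
% arrive long after it was taken.
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma < 1
```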
Supervised Learning vs RL
Supervised learning relies on correct actions and labels given a priori, a fixed set of examples, separate training and testing phases, and passive involvement. RL, in contrast, evaluates actions during deployment, gathers new examples as it goes, interacts directly with the environment, and focuses on causal action.
Unsupervised Learning vs RL
Unsupervised learning aims to discover hidden structure in data, again with separate phases and passive involvement. RL instead aims to maximize reward, interacts directly with the environment, and can use unsupervised learning as a subtask.
Exploitation-Exploration Tradeoff
The balance between exploiting actions already known to yield high reward and exploring new actions that might yield even more. Unlike supervised and unsupervised learning, which can serve as subproblems within RL, RL tackles the whole real-world problem and must manage this tradeoff (see the sketch below).
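A minimal sketch of how this tradeoff is often handled in practice, using epsilon-greedy action selection; the function name, the NumPy usage, and the toy values are illustrative assumptions, not from the lecture:

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """Explore with probability epsilon, otherwise exploit.

    q_values holds the current estimate of each action's value.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: random action
    return int(np.argmax(q_values))               # exploit: best-known action

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])
action = epsilon_greedy(q, epsilon=0.1, rng=rng)  # usually picks action 1
```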
Agent (in RL)
The component in RL that senses its environment, observes its state, and takes actions.
Environment (in RL)
The component in RL that provides feedback based on an agent’s actions, changes over time, and is affected by both external events and the agent’s actions.
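One hypothetical way to express the agent-environment loop as code; the `RandomAgent`/`CoinEnv` classes and their method names are assumptions for illustration, not an API from the lecture:

```python
import random

class RandomAgent:
    """Toy agent: ignores the state and acts uniformly at random."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, state):
        return random.randrange(self.n_actions)

    def observe(self, state, reward):
        pass  # a learning agent would update its policy/values here

class CoinEnv:
    """Toy environment: action 1 pays reward 1, action 0 pays 0; ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0  # initial state

    def step(self, action):
        self.t += 1
        return 0, float(action == 1), self.t >= 10  # (next_state, reward, done)

def run_episode(agent, env):
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent.act(state)               # agent senses state, takes action
        state, reward, done = env.step(action)  # environment gives feedback
        agent.observe(state, reward)            # feedback closes the loop
        total += reward
    return total

print(run_episode(RandomAgent(2), CoinEnv()))   # about 5.0 on average
```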
Policy
Maps from environment state to action, perhaps stochastically.
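A minimal sketch of one way a stochastic policy can be realized, as a softmax over hypothetical per-state action preferences; all names and values here are illustrative assumptions:

```python
import numpy as np

def stochastic_policy(state: int, prefs: np.ndarray,
                      rng: np.random.Generator) -> int:
    """Sample an action from a softmax over per-state preferences."""
    logits = prefs[state]
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
prefs = np.array([[2.0, 0.0],   # hypothetical preferences in state 0
                  [0.0, 2.0]])  # ... and in state 1
print(stochastic_policy(0, prefs, rng))  # mostly 0, occasionally 1
```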
Reward
Encodes the long-term goal via short-term sensations (rewards); relatively easy to define and to observe or estimate.
Value
Represents the long-term value (expected cumulative reward) of an environment state or action; hard to define and hard to estimate.
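One common way such value estimates are learned is a tabular TD(0) update; the card does not name this method, so treat the following as a standard illustration rather than the lecture's algorithm:

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular TD(0): nudge V[state] toward reward + gamma * V[next_state].

    V is a dict mapping states to estimated long-term value (illustrative).
    alpha is the step size, gamma the discount factor.
    """
    target = reward + gamma * V.get(next_state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))
    return V

V = {}
V = td0_update(V, state="s0", reward=1.0, next_state="s1")
print(V)  # {'s0': 0.1}
```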
Model of the Environment
Enables the agent to hypothesize about future states of the environment (e.g., planning) and can be a physics simulation environment or an ML prediction model.
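A minimal sketch of using such a model for planning via one-step lookahead; the `model(state, action)` signature and the toy two-state world are assumptions for illustration:

```python
def plan_one_step(model, state, actions, V, gamma=0.9):
    """Pick the action whose model-predicted outcome scores best.

    model(state, action) -> (predicted_next_state, predicted_reward)
    V: dict of estimated state values (empty here, so reward decides).
    """
    def score(action):
        next_state, reward = model(state, action)  # hypothesize, don't act
        return reward + gamma * V.get(next_state, 0.0)
    return max(actions, key=score)

# Toy deterministic model: moving to "goal" pays 1, anything else pays 0.
model = lambda s, a: (a, 1.0 if a == "goal" else 0.0)
print(plan_one_step(model, "start", ["goal", "other"], V={}))  # -> 'goal'
```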
Interaction in RL
The key difference from other learning paradigms: the agent's actions both change the environment (causal effects) and determine which data is collected during learning.