These flashcards cover key concepts and terminology related to Reinforcement Learning, as outlined in the lecture notes.
Reinforcement Learning
A type of machine learning where an agent learns to take actions in an environment to maximize cumulative rewards.
Agent
The learner or decision maker in a reinforcement learning problem that interacts with the environment.
Environment
The external system with which the agent interacts and within which it operates.
Reward Signal
A numeric signal received by the agent from the environment, used to evaluate the success of its actions.
Markov Decision Process (MDP)
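A mathematical framework for modeling sequential decision-making, defined by a set of states, a set of actions, a reward function, and state-transition probabilities, in which the next state depends only on the current state and action.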
Optimal Policy
A policy (a mapping from states to actions) that maximizes the expected cumulative reward over time.
Exploration vs. Exploitation
The trade-off in reinforcement learning between exploring new actions to find better rewards and exploiting known actions that yield good rewards.
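One common way to balance this trade-off is an epsilon-greedy rule: with a small probability the agent explores a random action, otherwise it exploits the best-known one. The sketch below is illustrative only; the function name `epsilon_greedy` and the parameter `epsilon` are assumptions and do not come from the lecture notes.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    q_values: estimated value of each action in the current state.
    epsilon:  probability of exploring (hypothetical parameter name).
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the agent always exploits the best-known action.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A larger `epsilon` means more exploration; many setups start high and decay it as the value estimates improve.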
Q-Learning
A model-free, value-based, off-policy reinforcement learning algorithm that learns the value of each action in each state (the Q-values) and chooses actions based on those values.
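As a rough illustration, the standard tabular Q-learning update moves the current estimate toward the observed reward plus the discounted best estimate for the next state. The code below is a minimal sketch under that assumption; the function name, the dictionary-based table, and the state and action labels are hypothetical.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update and return the new Q(s, a).

    Q is a dict keyed by (state, action); missing entries default to 0.0.
    """
    # Value of the best action in the next state (0 if the episode ended).
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    # Target: immediate reward plus discounted future value.
    td_target = r + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
new_value = q_learning_update(
    Q, s="s0", a="right", r=1.0, s_next="s1", actions=["left", "right"]
)
```

Because the table starts at zero, this first update gives `Q[("s0", "right")]` a value of `alpha * r`.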
Learning Rate (α)
A parameter that determines how much of the newly acquired information overrides old information in the learning process.
Discount Factor (γ)
A parameter between 0 and 1 that weights future rewards: values near 0 make the agent favor immediate rewards, while values near 1 give long-term rewards almost as much weight as immediate ones.
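To make the effect of γ concrete, the sketch below computes a discounted return, where each reward is multiplied by γ raised to the number of steps before it arrives. The function name `discounted_return` and the sample rewards are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step t by gamma ** t."""
    g = 0.0
    # Work backwards so each step is a single multiply-and-add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
short_sighted = discounted_return(rewards, gamma=0.0)  # only the first reward counts
balanced = discounted_return(rewards, gamma=0.5)       # 1 + 0.5 + 0.25
far_sighted = discounted_return(rewards, gamma=1.0)    # all rewards count equally
```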
State (s)
The current situation in which the agent finds itself within the environment.
Action (a)
A decision made by the agent that affects the state of the environment.
Exploit
To make use of known good actions or strategies to maximize immediate rewards.
Explore
To try new actions or strategies to gather more information that may lead to better long-term rewards.
Model-based RL
Reinforcement learning that uses a model of the environment to make decisions.
Model-free RL
Reinforcement learning where the agent learns directly from its experiences without a model of the environment.
Value Function
A function that estimates the expected cumulative reward from a given state following a certain policy.
Q-value function
A function that estimates the expected cumulative reward for taking a specific action in a specific state and following a policy thereafter.
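In the usual notation, the two functions can be written as follows. This is a sketch under common conventions: the policy symbol π, the reward indexing r_{t+1}, and the infinite-horizon sum are assumptions and may differ from the lecture notes.

```latex
% State-value function: expected discounted return from state s under policy \pi
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s \right]

% Action-value (Q-value) function: same, but the first action a is fixed
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a \right]

% The two are linked by averaging Q over the policy's action choices
V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)
```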
Blame Attribution Problem
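The challenge of determining which earlier actions in a sequence were responsible for a received reward or punishment, especially when that feedback is delayed. More commonly called the credit assignment problem.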
Trajectory
The sequence of states, actions, and rewards produced by following a policy over time.
Episode
One complete run of agent-environment interaction, from an initial state until a terminal state is reached, such as achieving the goal or failing.