Reinforcement Learning, Rewards, and Human Behavior

Last updated 10:06 PM on 6/17/26

17 Terms

Reward

A scalar feedback signal that tells a reinforcement learning (RL) agent how good an action or outcome was.

Dense rewards

Frequent feedback given throughout the task.

Sparse rewards

Rare feedback, such as only win/loss at the end of a game.

Intermediate rewards

Rewards given at steps before the final outcome to guide learning.
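
The difference between sparse and dense (intermediate) rewards can be sketched on a toy one-dimensional corridor. This is only an illustration: the goal position, the path, and both reward functions are assumptions made up for the example, not part of the card set.

```python
GOAL = 4  # hypothetical goal position in a 1-D corridor

def sparse_reward(position):
    """Reward only when the goal is reached (feedback at the end)."""
    return 1.0 if position == GOAL else 0.0

def dense_reward(old_position, new_position):
    """Reward every step by how much closer the agent moved to the goal."""
    return float(abs(GOAL - old_position) - abs(GOAL - new_position))

path = [0, 1, 2, 3, 4]  # the agent walks straight to the goal
sparse = [sparse_reward(p) for p in path[1:]]
dense = [dense_reward(a, b) for a, b in zip(path, path[1:])]

print(sparse)  # feedback arrives only on the final step
print(dense)   # feedback arrives on every step
```

With the sparse signal the agent gets no guidance until it happens to reach the goal, while the dense signal rewards each step of progress.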

Continuous rewards

Rewards that take values from a continuous range, or that arrive continually as an ongoing process unfolds.

Model-based reinforcement learning

Uses or learns a model of the environment to plan actions.
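
As a minimal sketch of planning with a model, the example below runs value iteration on a tiny, fully known environment. The states, actions, rewards, and the `MODEL` dictionary are hypothetical, and value iteration is just one common planning method, chosen here for brevity.

```python
# Hypothetical deterministic model: state -> action -> (next_state, reward)
MODEL = {
    "A": {"stay": ("A", 0.0), "go": ("B", 0.0)},
    "B": {"stay": ("B", 0.0), "go": ("T", 1.0)},
}
TERMINAL = "T"
GAMMA = 0.9  # discount factor (assumed)

def value_iteration(model, gamma, sweeps=100):
    """Compute state values by repeatedly backing up through the model."""
    values = {state: 0.0 for state in model}
    values[TERMINAL] = 0.0
    for _ in range(sweeps):
        for state, actions in model.items():
            values[state] = max(r + gamma * values[nxt] for nxt, r in actions.values())
    return values

def greedy_policy(model, values, gamma):
    """Pick, in each state, the action with the best one-step lookahead."""
    return {
        state: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for state, actions in model.items()
    }

values = value_iteration(MODEL, GAMMA)
policy = greedy_policy(MODEL, values, GAMMA)
print(values, policy)
```

No interaction with the environment is needed here: the plan comes entirely from the model's predicted transitions and rewards.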

Model-free reinforcement learning

Learns from experience without explicitly modeling the environment.

Q-learning

Model-free method that learns action values Q(s, a): the expected discounted future reward of taking action a in state s and acting optimally afterward.
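
A single tabular Q-learning update might look like the sketch below. The state and action names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)           # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
q[("s1", "right")] = 1.0         # pretend this value was learned earlier

new_value = q_update(q, "s0", "right", reward=0.0, next_state="s1", actions=actions)
print(new_value)  # 0.0 + 0.5 * (0.0 + 0.9 * 1.0 - 0.0) = 0.45
```

The update uses the best action in the next state regardless of which action the agent actually takes next, which is what makes Q-learning an off-policy method.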

Policy search

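Searches directly over policies (rules for choosing actions) to find one with high expected reward, rather than deriving the policy from learned values.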

Active reinforcement learning

The agent chooses its own actions, exploring to discover their effects and updating its values or policy as it goes.

Passive reinforcement learning

Learns values while following a fixed policy.
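
One way to picture passive learning is temporal-difference, TD(0), evaluation of a fixed policy. The three-state chain, the "always move right" policy, and the step size below are assumptions chosen to keep the sketch small.

```python
def td0_evaluate(episodes=50, alpha=0.5, gamma=1.0):
    """Estimate state values under a fixed policy with TD(0)."""
    values = {0: 0.0, 1: 0.0, 2: 0.0}  # state 2 is terminal
    for _ in range(episodes):
        state = 0
        while state != 2:
            next_state = state + 1  # fixed policy: always move right
            reward = 1.0 if next_state == 2 else 0.0
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values

values = td0_evaluate()
print(values)  # values of states 0 and 1 approach 1.0
```

The agent never decides anything here; it only watches the outcomes of the given policy and refines its value estimates.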

Inverse reinforcement learning

Infers a human’s goals/preferences/reward function by observing choices.

Taxi route prediction from GPS

Example of inferring destination/route preferences from observed human behavior.
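
A heavily simplified flavor of this idea: given observed route choices, pick the candidate reward function under which those choices look best. Everything below (the route features, the numbers, and the candidate weights) is invented for illustration, and real inverse reinforcement learning methods are considerably more involved.

```python
# Each observation: (features of chosen route, features of rejected route).
# Features are (minutes, toll_dollars); all numbers are made up.
OBSERVATIONS = [
    ((20, 5), (30, 0)),
    ((15, 3), (25, 0)),
    ((40, 0), (38, 6)),
]

# Hypothetical candidate reward functions, as weights on (minutes, toll).
CANDIDATE_WEIGHTS = {
    "time_only": (-1.0, 0.0),
    "toll_only": (0.0, -1.0),
    "balanced": (-1.0, -1.0),
}

def reward(weights, features):
    """Linear reward: weighted sum of route features."""
    return sum(w * f for w, f in zip(weights, features))

def explained(weights, observations):
    """Count choices where the chosen route has the higher reward."""
    return sum(reward(weights, c) > reward(weights, r) for c, r in observations)

scores = {name: explained(w, OBSERVATIONS) for name, w in CANDIDATE_WEIGHTS.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The candidate that explains the most observed choices is taken as the best guess at what the driver cares about.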

Inverted pendulum

Control/RL benchmark problem, not the IRL human-behavior example here.

Chess neural network

Game-playing ML example, not the taxi-driver IRL example.

Condition-action rule for braking

Rule learned from sensor patterns and human braking behavior near stop signs.
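
A condition-action rule of this kind could be written as a simple predicate. The sketch below hand-codes the rule with assumed thresholds (deceleration and safety margin) instead of learning them from sensor data and human braking, so treat the numbers as placeholders.

```python
def should_brake(stop_sign_detected, distance_m, speed_mps,
                 decel_mps2=3.0, margin_m=5.0):
    """Condition-action rule: brake if a stop sign is within stopping range.

    decel_mps2 and margin_m are assumed values, not learned parameters.
    """
    if not stop_sign_detected:
        return False
    # Distance needed to stop at constant deceleration, plus a safety margin.
    stopping_distance = speed_mps ** 2 / (2 * decel_mps2) + margin_m
    return distance_m <= stopping_distance

print(should_brake(True, distance_m=10, speed_mps=10))   # sign is close
print(should_brake(True, distance_m=100, speed_mps=10))  # sign is far away
print(should_brake(False, distance_m=5, speed_mps=10))   # no sign detected
```

In a learned version, the condition would come from patterns in sensor readings that coincide with human drivers braking, rather than from a fixed formula.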

Self-driving car agent

Autonomous agent that uses sensors, models/rules, and decisions in a safety-critical environment.