Vocabulary flashcards covering key concepts, terms, and examples from the reinforcement learning for trading lecture.
Reinforcement Learning (RL)
An AI method where an agent learns to make decisions by interacting with an environment, receiving rewards, and aiming to maximize cumulative return without supervised labels.
Deep Reinforcement Learning
RL that uses neural networks to approximate value or policy functions, replacing explicit Q tables with deep learning models.
Q-table
A table that stores the expected future reward (Q-value) for each state-action pair in traditional Q-learning.
Bellman Equation
The core RL relation that defines the optimal Q-value as the immediate reward plus the discounted best future value.
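Written out, with r the immediate reward, gamma the discount factor, and s' the next state, the optimal Q-value satisfies (the deterministic one-step form, a common simplification):

```latex
Q^*(s, a) = r(s, a) + \gamma \max_{a'} Q^*(s', a')
```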
Gamma (discount factor)
A factor between 0 and 1 that weighs future rewards; gamma near 1 values the full future horizon, while gamma = 0 considers only the immediate reward.
Greedy
Choosing the action with the highest estimated value based on current information, favoring immediate reward.
State
The current description of the environment, e.g., market conditions such as volatility, momentum, and liquidity.
Action
The decision the agent can take, such as go long, go short, or hold in trading.
Reward
Feedback received after taking an action, used to guide learning; in trading it can be profit or a virtual score.
Environment
The external system the agent interacts with; in trading, the market constitutes the environment.
Backward Induction
Solving the Bellman equation by starting from the end of the horizon and iterating backward to compute values.
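A minimal sketch on a toy problem with made-up rewards, deterministic transitions, and a short horizon (all assumptions for illustration); values are filled from the final step backward using the Bellman relation:

```python
GAMMA = 0.9
T = 3                                                   # horizon length (assumed)
states, actions = range(2), range(2)
reward = {(s, a): float(s + a) for s in states for a in actions}      # toy rewards
next_state = {(s, a): (s + a) % 2 for s in states for a in actions}   # toy dynamics

V = {(T, s): 0.0 for s in states}                       # terminal values at the horizon
for t in range(T - 1, -1, -1):                          # walk backward in time
    for s in states:
        V[(t, s)] = max(
            reward[(s, a)] + GAMMA * V[(t + 1, next_state[(s, a)])]
            for a in actions
        )
print(V[(0, 0)])                                        # value of starting in state 0
```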
Deep Q-Network (DQN)
A neural network that approximates the Q-table, enabling deep reinforcement learning in complex environments.
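A minimal sketch, assuming PyTorch, an illustrative 8-feature market state, and three actions (long, short, hold); the network outputs one Q-value per action, replacing a row lookup in the Q-table:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a market-state vector to one Q-value per action."""
    def __init__(self, n_features: int = 8, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),   # Q-values for long, short, hold
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork()
state = torch.randn(1, 8)               # placeholder market-state features
action = q_net(state).argmax(dim=1)     # greedy action from estimated Q-values
```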
Activation Function
A nonlinear function applied to a neuron's weighted input sum to produce its output, enabling the network to represent complex relationships.
Sigmoid
An S-shaped activation function mapping inputs to the 0–1 range, commonly used in neural networks.
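Its standard form maps any real input into the open interval (0, 1):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(x) \in (0, 1)
```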
Hidden Layer
A layer of neurons between input and output that transforms inputs into more abstract features.
Weights
Learnable parameters that scale inputs in a neural network, determining the network output.
Bias
A learnable constant added to a neuron's weighted sum, shifting the activation to improve learning flexibility and stability.
Loss Function
A measure of the difference between the network output and the true target; lower loss means better predictions.
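One common choice for continuous targets is the mean squared error over N samples (shown as an illustration, not the only option):

```latex
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```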
Optimizer
Algorithm used to adjust weights to minimize the loss; examples include SGD and Adam.
Adam Optimizer
An adaptive learning rate optimizer that adjusts step sizes during training for faster convergence.
Overfitting
When a model captures noise rather than the underlying pattern, especially problematic in finance and hard to cure.
Supervised Learning
Learning from labeled data where the model predicts known targets; includes regression and classification.
Unsupervised Learning
Learning from data without labels to discover structure, such as PCA and clustering.
Regression
A supervised learning task where the target is continuous, e.g., predicting a price.
Classification
A supervised learning task where the target is discrete classes, e.g., buy/hold/sell decisions.
Principal Component Analysis (PCA)
An unsupervised dimensionality reduction method that finds orthogonal components capturing variance.
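A minimal sketch, assuming scikit-learn and an illustrative random feature matrix; the components are orthogonal and ordered by the share of variance they explain:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.normal(size=(500, 10))        # placeholder feature matrix
pca = PCA(n_components=3)
components = pca.fit_transform(X)           # data projected onto 3 components
print(pca.explained_variance_ratio_)        # variance captured by each component
```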
Autocorrelation
The correlation of a time series with its past values at a given lag, indicating repeating patterns.
Autocorrelation Function (ACF)
A function that quantifies the correlation of a series with its lagged versions across lags.
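A minimal sketch, assuming NumPy and an illustrative return series; each lag's autocorrelation is the correlation of the series with a shifted copy of itself:

```python
import numpy as np

def acf(x, max_lag=5):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return [np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)]

returns = np.random.normal(0, 0.01, 500)    # placeholder return series
print(acf(returns))                         # near zero for uncorrelated returns
```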
Walk-Forward Optimization
Backtesting by training on a moving window and testing on the next period to avoid look-ahead bias.
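A minimal sketch, assuming a chronologically ordered dataset and illustrative window sizes; each model is trained on past data only and evaluated on the period that immediately follows:

```python
def walk_forward(data, train_size=250, test_size=50):
    """Yield (train, test) windows that roll forward through time."""
    start = 0
    while start + train_size + test_size <= len(data):
        yield (data[start:start + train_size],
               data[start + train_size:start + train_size + test_size])
        start += test_size                  # advance by one test period

for train, test in walk_forward(list(range(1000))):
    pass  # fit on `train`, evaluate on `test`, then roll forward
```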
Look-Ahead Bias
Unintentionally using future information to train or test a model, inflating apparent performance.
Stationarity
A property where a time series has constant mean and variance over time; returns are often stationary while prices are not.
Lévy Stable / Cauchy Distributions
Heavy-tailed distributions used to describe non-Gaussian market returns whose variance can be infinite or undefined.
Marshmallow Experiment
A metaphor for delayed gratification in RL, where waiting yields larger future rewards.
Keynesian Beauty Contest
Predicting what others think will win, illustrating markets as a contest of predicting others' expectations.
Reward Function
The RL signal that assigns value to actions to shape behavior; it can incorporate profits, risk, and holding time.
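A minimal sketch of one possible trading reward; the penalty terms and weights below are illustrative assumptions, not a prescribed formula:

```python
def trade_reward(pnl, drawdown, holding_steps,
                 risk_weight=0.5, time_weight=0.01):
    """Profit minus penalties for risk taken and for holding too long (assumed weights)."""
    return pnl - risk_weight * drawdown - time_weight * holding_steps

print(trade_reward(pnl=1.5, drawdown=0.4, holding_steps=20))
```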
Gamify Trades
Treating each trade as a game with states, actions, and scores to train RL agents.
Sine Wave Testing
Using a simple sine wave as a proxy price to validate RL learning before real markets.
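A minimal sketch, assuming NumPy and an arbitrary 50-step cycle; the resulting series can stand in for a price when sanity-checking whether an agent learns at all:

```python
import numpy as np

t = np.arange(1000)
price = 100 + 10 * np.sin(2 * np.pi * t / 50)       # clean 50-step cycle (assumed)
price = price + np.random.normal(0, 0.5, t.shape)   # small optional noise
```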
Features
Input variables describing market state, such as OHLCV, indicators, time features, and position size.
Market State
The current set of features describing market conditions used as input to RL.
Action Space
The set of possible actions the agent can take, such as long, short, or hold.
Returns vs Prices
In ML for finance, returns are preferred as inputs because they are more stationary than prices.
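A minimal sketch of the usual conversion from prices to simple and log returns:

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.5, 100.5])
simple_returns = prices[1:] / prices[:-1] - 1   # percentage change per step
log_returns = np.diff(np.log(prices))           # log-return alternative
```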
Nonlinear Correlations
Relationships in data that are not captured by linear autocorrelation but can be learned by deep models.
Q-Learning vs Deep Q-Learning
Q-Learning uses a Q-table; Deep Q-Learning replaces the table with a neural network.
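A minimal sketch of the tabular Q-learning update rule; the dictionary stands in for the Q-table, and the learning-rate and discount values are illustrative assumptions:

```python
ALPHA, GAMMA = 0.1, 0.9                       # learning rate and discount (assumed)
ACTIONS = ("long", "short", "hold")
Q = {}                                        # the Q-table: (state, action) -> value

def q_update(state, action, reward, next_state):
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, action), 0.0)
    # Move the estimate toward reward + discounted best future value.
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

q_update("uptrend", "long", 1.0, "uptrend")
```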
Pong (RL Example)
A classic reinforcement learning demonstration where an agent learns to play a simple game.
AlphaGo
A landmark reinforcement learning system that mastered the game of Go through self-play and learning.