Time Series Analytics


Last updated 6:57 PM on 7/9/26


33 Terms

New cards

Definition: Model Formulation

Intuition: How do different variables relate to an outcome

Definition: The core assumption is a linear relationship between a dependent variable Y and one or more independent variables X_i

Components:

Y: Dependent variable
X_i: Independent variables (predictors)
B_0, B_i, e: Intercept, coefficients, error term (residual)

Used: The basis for models like CAPM, factor models, and many trading strategies
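
In equation form, the model on this card can be written as below. The observation index j, the observation count m, and the predictor count p are notation assumed here for illustration; $\beta$ and $\epsilon$ stand for the card's B and e.

$$
Y_j = \beta_0 + \sum_{i=1}^{p} \beta_i X_{ij} + \epsilon_j, \qquad j = 1, \dots, m
$$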

New cards

Definition: Ordinary Least Squares (OLS) Estimation - Matrix Form

Intuition: Picks the coefficients (B) that make the total squared error as small as possible

Definition: Finds the coefficients B that minimize the Residual Sum of Squares

Components:

Data matrix X
Response vector y
Closed-form solution B = (X^T X)^{-1} X^T y, computed with a transpose and a matrix inverse

Uses: Produces the coefficient estimates B needed in the linear regression model
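
A minimal sketch of the matrix-form estimate, assuming NumPy is available. The toy data are made up so that y = 1 + 2x holds exactly, which means the estimate should recover an intercept of 1 and a slope of 2.

```python
import numpy as np

# Toy data (made up for illustration): y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Data matrix X with a leading column of ones for the intercept B_0.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) B = X^T y rather than forming the inverse.
B = np.linalg.solve(X.T @ X, X.T @ y)

# Residual Sum of Squares at the fitted coefficients.
residuals = y - X @ B
rss = float(residuals @ residuals)
print(B, rss)
```

Solving the linear system is usually preferred over computing the inverse explicitly because it is more stable numerically, but both express the same formula.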

New cards

Definition: OLS Assumption - Linearity

Intuition: The true relationship is linear in the coefficients (the outcome is a weighted sum of the predictors plus noise)

Definition: The coefficients ( $\beta$ ’s) can only appear as first degree multipliers. They can’t be squared, exponentiated, etc.

Components:

Coefficients B
Unobservable random error $\epsilon$

Uses: Make sure we don’t have a non-linear relationship

New cards

Definition: OLS Assumption - No multicollinearity

Intuition: No predictor can be written as an exact linear combination of the other predictors

Definition: X^TX is invertible. There is no perfect linear relationship between predictors.

Components:

Data matrix X
Data matrix X transposed (^T)
Matrix multiplication between the two is invertible

Uses: Prevents inflated standard errors and unstable coefficient estimates

New cards

Definition: OLS Assumption - Homoscedasticity

Intuition: The spread of the errors around the line is the same at every value of the predictors

Definition: The error variance is constant across all observations

Components:

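Conditional variance of each error term given X: Var( $\epsilon_i$ | X)
This variance equals the same constant $\sigma^2$ for every observation i

Uses: Often violated in finance, e.g. high-return periods that also have high volatility. OLS stays unbiased under this violation, but the usual standard errors are no longer valid.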

New cards

Definition: OLS Assumption - No autocorrelation

Intuition: Knowing one error term (residual) tells you nothing about any other error term

Definition: Errors are uncorrelated across observations

Components:

Covariance between two error terms given X: Cov( $\epsilon_i$ , $\epsilon_j$ | X)
This covariance is equal to zero
The two error terms come from different observations (i ≠ j)

Uses/Importance: Common in time series data (e.g., momentum strategies). If we have any clue what the next error may look like, there’s still some structure in the data that our model failed to capture.

New cards

Definition: OLS Assumption - Normality (Maximum Likelihood Estimator)

Intuition: If the errors follow a bell curve centered at 0, the OLS estimator is also the MLE

Definition: If we add the assumption that $\epsilon$ ~ N(0, $\sigma^2$ ), the OLS estimator is also the MLE

Components:

First five assumptions (linearity, strict exogeneity, no multicollinearity, homoscedasticity, no autocorrelation)
Error term is normally distributed

Uses: Lets us use t-tests and F-tests in finite samples (in large samples, the CLT can justify them even without normality)

New cards

Definition: OLS Assumption - Strict Exogeneity

Intuition: The error term always has an expected value of 0, no matter the values of the independent variables

Definition: The error term has conditional mean zero given the predictors, which implies it is uncorrelated with them

Components:

The expectation of each error term given the data set X is equal to zero

Uses: Frequently violated in finance; a violation can lead to biased estimators

New cards

Definition: $R^2$

Intuition: How much variance in the result is explained by the data matrix

Definition: Proportion of the variance in Y that is predictable from X

Components:

1 - (RSS / TSS)

RSS: Residual Sum of Squares (variation in error between observed data and modeled values)
TSS: Total Sum of Squares (variation in the observed data)

Uses: Compare models with same number of predictors

New cards

Definition: Adjusted $R^2$

Intuition: How much variance in the result is explained by the data matrix, with a penalty for adding irrelevant predictors

Definition: Proportion of the variance in Y that is predictable from X, adjusted for the number of predictors

Components:

1 - ( (RSS / (m - p - 1)) / ( TSS / (m - 1) ) )

RSS and TSS
Number of observations (m)
Number of predictors (p)

Uses: Compare models with different numbers of predictors, higher is better
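
A small sketch of the two formulas from the $R^2$ and adjusted $R^2$ cards. The RSS, TSS, m, and p values are hypothetical numbers chosen for illustration, not results from real data.

```python
def r_squared(rss, tss):
    """R^2 = 1 - RSS / TSS."""
    return 1 - rss / tss


def adjusted_r_squared(rss, tss, m, p):
    """Adjusted R^2 = 1 - (RSS / (m - p - 1)) / (TSS / (m - 1))."""
    return 1 - (rss / (m - p - 1)) / (tss / (m - 1))


# Hypothetical fit: same RSS and TSS, but a different number of predictors.
rss, tss, m = 20.0, 100.0, 50
r2 = r_squared(rss, tss)
adj_few = adjusted_r_squared(rss, tss, m, p=3)
adj_many = adjusted_r_squared(rss, tss, m, p=10)
print(r2, adj_few, adj_many)
```

With the fit held fixed, the model that uses more predictors gets the lower adjusted value, which is the penalty the card describes.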

New cards

Definition: Standard Error (SE) of $\beta_{i}$

Intuition: How much would this estimate jump around if it got lucky/unlucky with which data I happened to sample?

Definition: Estimated standard deviation of a parameter estimate

Components:

Square root of variance of coefficient

Uses: Construct confidence intervals and perform hypothesis tests on individual coefficients

New cards

Definition: t-statistic $\beta_{i}$

Intuition: If the true $\beta_{i}$ were actually 0, how far from 0 is our estimate of $\beta_{i}$, measured in standard errors? That distance is the t-statistic, and a large absolute value means such an estimate would be unlikely if $\beta_{i}$ were actually 0. Tests each coefficient in isolation.

Definition: The ratio of the difference between a coefficient’s estimated value and its assumed value (0) to its standard error

Components:

t = ( $\hat{\beta}_{i}$ - 0 ) / SE( $\hat{\beta}_{i}$ )

Uses: test null hypothesis H0: $\beta_{i}$ = 0. Follows a t-distribution with m-p-1 degrees of freedom
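
The sketch below computes standard errors and t-statistics for a one-predictor regression, assuming NumPy is available. It uses the textbook covariance formula $\hat{\sigma}^2 (X^T X)^{-1}$, which relies on the homoscedasticity and no-autocorrelation assumptions; the data are made up.

```python
import numpy as np

# Toy data, made up for illustration (roughly y = x plus small noise).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.1])

X = np.column_stack([np.ones_like(x), x])  # intercept column plus one predictor
m = X.shape[0]                             # number of observations
p = X.shape[1] - 1                         # number of predictors

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

# Error-variance estimate with m - p - 1 degrees of freedom.
sigma2_hat = residuals @ residuals / (m - p - 1)

# Standard errors are the square roots of the diagonal of the covariance matrix.
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))

# t-statistic for H0: beta_i = 0.
t_stats = beta_hat / se
print(beta_hat, se, t_stats)
```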

New cards

Definition: F-statistic

Intuition: Are all of the $\beta_{i}$ jointly equal to zero?

Definition: Ratio that compares explained variance per parameter to unexplained variance per remaining degree of freedom

Components:

Numerator: Explained variation per predictor, (TSS - RSS) / p
Denominator: Unexplained variation per remaining degree of freedom, RSS / (m - p - 1), which accounts for sampling variability

Uses: Test joint significance, which is especially useful under multicollinearity. If $x_1$ and $x_2$ are highly correlated, the model can’t tell whose “credit” the effect on y belongs to, so each SE inflates, each t-stat shrinks, and neither $\beta_1$ nor $\beta_2$ looks significant alone. But together they explain a lot of variation in y, so the F-test rejects.
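
A short sketch of the overall F-statistic as the ratio described in the definition above. The inputs are hypothetical numbers, with m observations and p predictors; the second function shows the equivalent form written in terms of $R^2$.

```python
def f_statistic(rss, tss, m, p):
    """Explained variation per predictor over unexplained variation per degree of freedom."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (m - p - 1)
    return explained_per_predictor / unexplained_per_df


def f_from_r_squared(r2, m, p):
    """Same statistic expressed through R^2 = 1 - RSS / TSS."""
    return (r2 / p) / ((1 - r2) / (m - p - 1))


# Hypothetical fit for illustration.
f_value = f_statistic(rss=20.0, tss=100.0, m=50, p=3)
f_check = f_from_r_squared(r2=0.8, m=50, p=3)
print(f_value, f_check)
```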

New cards

Definition: Ridge Regression

Intuition: Improves prediction by shrinking coefficient magnitudes to reduce variance at the cost of introducing some bias

Definition: Regularized linear regression that minimizes squared errors plus an L2 penalty on the coefficients

Components:

Loss function: RSS measuring fit to the data
L2 penalty: Squared magnitude of coefficients that discourages large weights
Regularization parameter (lambda): Controls strength of coefficient shrinkage

Uses: Handles multicollinearity and improves out-of-sample performance in high-dimensional regressions
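
A minimal sketch of ridge in closed form, assuming NumPy is available and assuming centered data with no intercept (in practice the intercept is usually left unpenalized). The two predictors are deliberately identical, so X^T X is singular and plain OLS has no unique solution, while the ridge system is still solvable.

```python
import numpy as np


def ridge(X, y, lam):
    """Minimize RSS + lam * sum(B_i^2) by solving (X^T X + lam * I) B = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Made-up, centered data with two perfectly collinear predictors.
x = np.array([-1.5, -0.5, 0.5, 1.5])
X = np.column_stack([x, x])
y = 2.0 * x

B_small = ridge(X, y, lam=1.0)    # mild shrinkage
B_large = ridge(X, y, lam=10.0)   # stronger shrinkage
print(B_small, B_large)
```

The penalty splits the effect evenly between the duplicated predictors, and a larger lambda pulls both coefficients closer to zero.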

New cards

Definition: Lasso Regression

Intuition: performs both shrinkage and variable selection by forcing some coefficients exactly to zero

Definition: regularized linear regression that minimizes squared errors plus an L1 penalty on the coefficients

Components:

Loss function: Residual sum of squares (RSS), capturing model fit
L1 penalty: Absolute values of coefficients that promote sparsity
Regularization parameter (lambda): Determines shrinkage and variable elimination

Uses: Feature selection when many predictors may be irrelevant
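
To see why the L1 penalty produces exact zeros, consider the special case of an orthonormal design (X^T X = I) with the objective written as (1/2) RSS + lambda * sum |B_i|. In that case the lasso solution is the OLS coefficient passed through a soft-threshold. The sketch below assumes this special case, and the OLS coefficients are made-up numbers.

```python
def soft_threshold(b, lam):
    """Shrink b toward zero by lam, and set it to exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical OLS coefficients under an orthonormal design.
ols_coefficients = [3.0, -0.4, 0.1, -2.0]
lasso_coefficients = [soft_threshold(b, 0.5) for b in ols_coefficients]
print(lasso_coefficients)
```

The two small coefficients are eliminated entirely, while the two large ones are only shrunk. For general design matrices there is no closed form, and iterative solvers are used instead.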

New cards

Definition: Bias

Intuition: Error from approximating a real-world function with a simpler model (underfitting)

Definition: Error when the expected value of an estimator does not equal true parameter value

Components:

True parameter: The actual coefficient values generating the data
Estimator expectation: Average value of the estimated coefficients across samples
Model constraints: Assumptions or regularization that distort the estimator toward simpler models

Uses: Understanding the bias-variance tradeoff and justifying regularization methods like ridge and lasso

New cards

Definition: Variance

Intuition: Error from model being too sensitive to training data (overfitting)

Definition: Expected squared deviation of a model’s prediction from its own average prediction across different training datasets

Components:

Training sample randomness: Different datasets drawn from the same process lead to different fitted models.
Estimator instability: Sensitivity of coefficients or prediction to changes in the data

Uses: Understanding overfitting risk and motivating regularization methods that stabilize model estimates

New cards

Definition: Bias-Variance Tradeoff

Intuition: Finding the middle ground for how complex or simple a model should be

Definition: Finding the model complexity that best balances bias against variance

Components:

Complex model (high degree polynomial): low bias but high variance (overfitting)
Simpler model (OLS): high bias but low variance (underfitting)

Uses: Minimize expected prediction error

New cards

Definition: Decision Tree (Regression)

Intuition: A sequence of if-else rules that split the data into groups, where each group predicts the average outcome of the observations inside it.

Definition: Partitions the feature space into a set of non-overlapping regions. For any given observation, the model predicts the mean of the response values of the training points that fall into the same region.

Components:

Splits: At each node, the algorithm chooses a feature and a split point. The goal is to make the response values in the resulting child nodes as similar as possible.
Splitting criterion: For regression, splits are chosen to minimize squared error, usually measured by RSS or MSE. In short, we split it where it reduces variability the most.
Leaves: Each terminal node outputs a constant prediction, equal to the mean of the response values in that node.
Training: The tree is built greedily, one split at a time, choosing the locally best split at each step

Uses: Nonlinear regression, baseline model before moving onto more complex ones
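
The splitting criterion can be sketched for a single feature and a single split. The helper names and the toy data below are made up for illustration, and the code assumes at least two distinct feature values. It tries every midpoint between neighboring values and keeps the one with the smallest combined RSS in the two child nodes.

```python
def mean(values):
    return sum(values) / len(values)


def rss(values):
    """Sum of squared deviations from the node mean."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing the children's total RSS."""
    candidates = sorted(set(xs))
    best = None
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = rss(left) + rss(right)
        if best is None or total < best[0]:
            best = (total, threshold, mean(left), mean(right))
    return best[1:]


# Toy data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
print(best_split(xs, ys))
```

A full tree repeats this search greedily inside each child node, and each leaf predicts the mean of its own observations.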

New cards

Decision Tree Pros

Easy to interpret (white-box model)
- Start at root, ask a series of yes/no questions, end in a leaf with a fixed numerical prediction (the mean)
- You can interpret locally and don’t need to understand entire model, just path your observation took
Can handle non-linear relationships
- Represents the response function as a piece-wise constant function over a partitioned feature space
  - Instead of fitting one smooth curve everywhere, the tree says “in this part of the space, behave this way; in another part, behave differently.”

New cards

Decision Tree Cons

High variance (small changes in data can lead to a very different tree)
Prone to overfitting
Generally lower predictive accuracy than ensemble methods

New cards

Ensemble Methods

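Intuition: Many trees combined predict better than one. Averaging trees reduces variance (bagging, random forests), while building them sequentially reduces bias (boosting)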

Definition: Combines multiple individual decision trees

Types:

Bagging (bootstrap aggregating)
Random forests (improve bagging)
Boosting (sequentially building trees)

Uses: Improve overall predictive performance and robustness

New cards

Bagging (Bootstrap Aggregating)

Intuition: Decision trees are noisy - small changes in the data can change them a lot. Bagging reduces that noise by training many trees on slightly different versions of the data and then averaging their predictions

Definition: A technique that reduces the variance of a learning algorithm by repeatedly resampling the training data with replacement, fitting the model on each resample, and aggregating the predictions

Components:

Bootstrapped samples: Each model is trained on a dataset created by sampling with replacement from the original data
Base learner (decision trees): Uses full, unpruned decision trees, which are high-variance learners
Aggregation: predictions are averaged (regression)
Out-of-bag error: Each tree sees about 2/3 of the distinct observations (roughly 63%); the remaining 1/3 or so can be used to estimate test error without cross-validation

Uses: Used primarily to reduce variance in unstable models like decision trees
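
The in-bag fraction behind the out-of-bag estimate can be checked with a quick simulation. This sketch draws one bootstrap sample of size n and counts how many distinct observations it contains; the sample size and seed are arbitrary choices for illustration.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable
n = 10_000

# One bootstrap sample: n draws with replacement from the indices 0..n-1.
bootstrap = [random.randrange(n) for _ in range(n)]

# Share of distinct observations that made it into the sample.
in_bag_fraction = len(set(bootstrap)) / n

# Theoretical value: 1 - (1 - 1/n)^n, which tends to 1 - 1/e as n grows.
expected_fraction = 1 - (1 - 1 / n) ** n
print(in_bag_fraction, expected_fraction)
```

The expected fraction is close to 63%, so a little over a third of the observations are left out of each tree and can serve as its out-of-bag test set.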

New cards

Random Forests

Intuition: Improve on bagging by making individual trees less similar. They do this by randomly limiting which features a tree is allowed to consider at each split, so different trees learn different structures.

Definition: An ensemble of decision trees trained on bootstrapped samples, where each split considers only a random subset of predictors. The final prediction is the average of the individual tree predictions.

Components:

Bootstrap sampling: same as bagging - each tree is trained on a resampled dataset
Random feature selection: At each split, only m out of p predictors are considered
Hyperparameter m: controls how aggressive the decorrelation is
- Smaller m → more randomness → lower correlation → lower variance
Aggregation: Predictions are averaged over trees
Feature importance: Random forests naturally produce variable importance measures by tracking how much splits reduce errors across trees

Uses: When you need a strong, general-purpose regression model that works well when relationships are nonlinear, interactions are important, and interpretability is secondary to performance

New cards

Boosting

Intuition: Reduces bias by fitting weak learners sequentially, where each learner is trained to correct the residual errors of the current ensemble

Definition: Builds models sequentially by fitting new trees to the residuals of the current model, with each tree’s contribution scaled by a learning rate to control overfitting.

Components:

Residuals: The errors made by the current ensemble
Weak learners: Typically small, shallow trees
Learning rate (gamma): controls how much each new tree contributes to the ensemble
- Smaller learning rate → slower learning → often better generalization (but more trees are needed)
Number of trees: controls model complexity

Uses:

Often used when maximum predictive accuracy is required and interpretability is secondary
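
A minimal sketch of boosting for squared error, using one-split trees (stumps) on a single feature as the weak learners. The function names, toy data, and parameter values are made up for illustration; real implementations add many refinements.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree and return it as a function x -> prediction."""
    best = None
    points = sorted(set(xs))
    for low, high in zip(points, points[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.1):
    """Return training predictions and the training MSE after each round."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the overall mean
    history = [sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)]
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)   # weak learner fitted to current errors
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return preds, history


# Made-up step-like data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
preds, history = boost(xs, ys)
print(history[0], history[-1])
```

Training error only tells part of the story: the number of trees and the learning rate would normally be chosen on held-out data, since training error keeps falling even when the model starts to overfit.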

New cards

Definition: p-value

Intuition: If the true slope were actually zero, how surprising would this result be?

Definition: The probability of observing a t-statistic at least as extreme as the one computed, assuming the true coefficient is zero.

Components:

Null Hypothesis $H_0$ : Typically $\beta_1$ = 0, meaning no linear relationship between predictor and response
Test statistic (t-statistic): how many standard errors away from zero is the estimate

Uses: If the p-value is less than a chosen significance level (commonly 0.05), the observed effect would be unlikely if the coefficient were 0, so we reject the null hypothesis

New cards

Definition: Stationarity

Intuition: A time series is stationary if its statistical behavior does not change over time.

Definition: A series is (weakly) stationary if its mean, variance, and autocovariance structure do not depend on time.

Components:

Constant mean: no drift/trend
Constant volatility: variability stays the same and won’t explode
Autocovariance depends only on lag: Dependence between today and tomorrow is the same regardless of when today occurs (yesterday affects today the same way in 2010 as in 2025)

Uses: ARMA models assume this (ARIMA first differences the series to reach it), and forecasting (stable prediction)
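
A small simulation can contrast a stationary series with a non-stationary one. The sketch compares white noise with a random walk built from the same shocks; the number of paths, the horizon, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable
n_paths, n_steps = 2000, 100
checkpoints = (10, 100)

noise_values = {t: [] for t in checkpoints}
walk_values = {t: [] for t in checkpoints}

for _ in range(n_paths):
    level = 0.0
    for t in range(1, n_steps + 1):
        shock = random.gauss(0.0, 1.0)  # white noise: independent N(0, 1) shocks
        level += shock                  # random walk: running sum of the shocks
        if t in noise_values:
            noise_values[t].append(shock)
            walk_values[t].append(level)

# Variance across paths at each checkpoint.
noise_var = {t: statistics.pvariance(noise_values[t]) for t in checkpoints}
walk_var = {t: statistics.pvariance(walk_values[t]) for t in checkpoints}
print(noise_var, walk_var)
```

The white-noise variance stays near 1 at both dates, while the random-walk variance grows roughly in proportion to t, so the random walk violates the constant-variance condition.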

New cards

Confidence Interval

Intuition: We have an estimate from a sample, but samples are noisy. Therefore we need a range of plausible values for the true parameter in the form of a confidence interval.

Definition: Over repeated sampling, a confidence interval contains the true parameter 100(1 - $\alpha$ )% of the time.

Components:

Point estimate: $\overline{X}$ or $\beta$
SE: standard error
Critical value (z or t), set by the confidence level

Uses:

Quantify uncertainty around an estimate
Hypothesis testing (e.g. check if 0 is outside coefficient’s CI)
Compare precision of different estimates
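
A sketch of a confidence interval for a mean, built as estimate plus or minus critical value times SE. It assumes the normal (z) critical value for simplicity; with a sample this small, the t critical value would give a somewhat wider interval. The sample values are made up.

```python
import math
import statistics


def confidence_interval(sample, level=0.95):
    """Return (lower, upper) for the mean using a normal critical value."""
    n = len(sample)
    estimate = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)           # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return estimate - z * se, estimate + z * se


# Made-up sample for illustration.
sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1]
low_95, high_95 = confidence_interval(sample, level=0.95)
low_99, high_99 = confidence_interval(sample, level=0.99)
print((low_95, high_95), (low_99, high_99))
```

Raising the confidence level widens the interval, which is the price of covering the true parameter more often.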

New cards

Method of Moments

Intuition: Assume the data follow a well-known distribution (normal, Poisson, binomial, etc). Estimate parameter values by matching what we observe in samples to what the distribution predicts.

Definition: Estimates unknown parameters by setting sample moments equal to population moments and solving for the parameters

Parts:

Common moments are mean, variance, skewness, and kurtosis

Uses:

Estimate unknown distribution parameters
Simple alternative to MLE
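
As a worked example, assume the data come from a gamma distribution with shape k and scale theta, whose mean is k * theta and whose variance is k * theta^2. Setting these equal to the sample mean and sample variance and solving gives the estimates below. The choice of distribution and the sample values are assumptions made for illustration.

```python
import statistics


def gamma_method_of_moments(sample):
    """Match the sample mean and variance to k * theta and k * theta^2."""
    sample_mean = statistics.fmean(sample)
    sample_var = statistics.pvariance(sample)  # second central sample moment
    scale = sample_var / sample_mean           # theta = variance / mean
    shape = sample_mean ** 2 / sample_var      # k = mean^2 / variance
    return shape, scale


# Made-up sample for illustration.
shape, scale = gamma_method_of_moments([2.0, 4.0, 6.0])
print(shape, scale)
```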

New cards

Hypothesis Testing

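Intuition: Want to decide whether the data provide enough evidence against a default claim (e.g., that a coefficient is zero)

Components:

Null hypothesis: The default assumption, presumed true until the data say otherwise (ex. $\mu=0$ )
Alternative hypothesis: The competing claim (ex. $\mu \neq 0$ )

Uses:

We never prove the alternative hypothesis; we either reject the null or fail to reject it
Approach: Compute the probability of observing a sample at least as extreme as ours, assuming the null hypothesis is true
If sufficiently low: reject null hypothesis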

New cards

Type 1 and Type 2 Errors

Type 1: Rejecting a true null hypothesis

Type 2: Not rejecting a false null hypothesis

New cards

Residual Sum of Squares (RSS)

Intuition: The sum of the squared distances between actual and predicted values.

Components:

The summation of:
- (Actual - predicted)²

Uses:

This is the quantity OLS minimizes, either in closed form or numerically (e.g., with gradient descent)

New cards

MSE (mean squared error)

Definition: RSS divided by number of observations

Components:

RSS equation
Divided by N

Uses:

While RSS is what gets minimized, MSE is on a per-observation scale, which makes it more meaningful to compare across models and datasets
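
Both quantities take only a few lines to compute. The actual and predicted values below are made up for illustration.

```python
def residual_sum_of_squares(actual, predicted):
    """Sum of (actual - predicted)^2 over all observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_squared_error(actual, predicted):
    """RSS divided by the number of observations N."""
    return residual_sum_of_squares(actual, predicted) / len(actual)


# Made-up values for illustration.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
rss = residual_sum_of_squares(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(rss, mse)
```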