What is time-series data?
Data indexed in temporal order where dependence exists across observations.
What assumption distinguishes time-series from iid data?
Observations are not independent; temporal dependence exists.
What is the forecasting constraint in time series?
Only past and present information can be used to predict future values.
What is an AR(L) model?
A linear model where the current value is a linear combination of the previous L lagged values.
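As a sketch, a one-step AR(L) prediction with illustrative (not fitted) coefficients:

```python
import numpy as np

# One-step AR(L) prediction: a linear combination of the previous L
# lagged values plus an intercept. The coefficients are illustrative,
# not estimated from data.
def ar_predict(history, coeffs, intercept=0.0):
    L = len(coeffs)
    lags = history[-L:][::-1]              # most recent observation first
    return intercept + float(np.dot(coeffs, lags))

series = [1.0, 2.0, 3.0, 4.0]
pred = ar_predict(series, coeffs=[0.5, 0.25])   # 0.5*4.0 + 0.25*3.0 = 2.75
```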
What is the role of lag length L?
Defines the memory window of the model.
What is the error term in AR models?
A stochastic disturbance capturing unexplained variation.
What type of relationships do AR models capture?
Only linear dependencies.
Why do AR models have high bias?
They impose a strict linear structure on the data.
Why does polynomial expansion in AR increase complexity?
It introduces interaction and nonlinear terms, increasing feature dimensionality exponentially.
What is the dimensionality issue with large L?
Number of predictors grows rapidly with lag length and interactions.
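The growth can be counted directly: with L lags and all polynomial and interaction terms up to total degree d, the number of predictors is C(L + d, d) (a standard combinatorial count; the specific L and d below are illustrative):

```python
from math import comb

# Number of polynomial terms (intercept, lags, interactions) of total
# degree <= d built from L lagged inputs: C(L + d, d).
def n_poly_terms(L, d):
    return comb(L + d, d)

n_poly_terms(5, 2)    # 21 predictors
n_poly_terms(20, 2)   # 231
n_poly_terms(20, 3)   # 1771 -- rapid growth in both L and d
```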
What is a feed-forward neural network in time series?
A mapping from fixed lag inputs to output using nonlinear transformations.
What is the structural limitation of feed-forward models for sequences?
They do not preserve temporal order beyond fixed inputs.
Why do feed-forward networks require fixed lag length?
Input dimension must be predefined.
Why do feed-forward models scale poorly with large L?
Parameter count increases with number of lagged inputs.
Why can feed-forward networks overfit in time series?
High parameterization increases estimation variance.
What is a Recurrent Neural Network (RNN)?
A neural network that processes sequential data using a recursive hidden state.
What is the recurrence relation in RNNs?
Current hidden state depends on current input and previous hidden state.
What is the hidden state?
A latent vector summarizing past sequence information.
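A minimal sketch of the recurrence h_t = tanh(W_h h_{t-1} + W_x x_t + b), with small random weights standing in for learned parameters:

```python
import numpy as np

# One step of a vanilla RNN: the new hidden state depends on the current
# input and the previous hidden state through shared weights.
def rnn_step(h_prev, x, W_h, W_x, b):
    return np.tanh(W_h @ h_prev + W_x @ x + b)

rng = np.random.default_rng(0)
K, d = 4, 3                              # hidden size, input features
W_h = rng.normal(size=(K, K)) * 0.1      # illustrative, untrained weights
W_x = rng.normal(size=(K, d)) * 0.1
b = np.zeros(K)

h = np.zeros(K)
for x in rng.normal(size=(10, d)):       # a 10-step input sequence
    h = rnn_step(h, x, W_h, W_x, b)
# h is a fixed-size (K,) summary of the whole sequence, regardless of length
```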
What is the key advantage of weight sharing in RNNs?
Reduces parameter count and improves generalization.
What is the dimensionality benefit of RNNs?
Compresses long sequences into fixed-size hidden states.
How do RNNs differ from AR models?
They are nonlinear and sequential rather than fixed linear combinations.
How do RNNs differ from feed-forward networks?
They maintain memory across time steps.
What is the role of activation functions in RNNs?
Introduce nonlinearity and control numerical stability.
Why is tanh commonly used?
It bounds outputs between −1 and 1.
What is the effect of tanh on gradients?
Its derivative is less than or equal to 1, contributing to gradient shrinkage.
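This bound is easy to check numerically, since tanh'(z) = 1 − tanh(z)²:

```python
import numpy as np

# The derivative of tanh is 1 - tanh(z)^2, which never exceeds 1 and
# attains 1 only at z = 0.
z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
dtanh = 1.0 - np.tanh(z) ** 2
print(dtanh.max())   # 1.0, at z = 0
```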
What is Backpropagation Through Time (BPTT)?
Gradient computation method that unfolds the RNN across time and applies chain rule.
Why must gradients be propagated through time?
Each hidden state depends recursively on previous states.
What is the computational cost of BPTT?
Time and memory scale linearly with sequence length.
What are vanishing gradients?
Gradients decay exponentially toward zero as they propagate backward.
What are exploding gradients?
Gradients grow exponentially and destabilize training.
What mathematical cause leads to vanishing gradients?
Repeated multiplication by values less than 1 (e.g., derivatives of tanh).
What mathematical cause leads to exploding gradients?
Repeated multiplication by values greater than 1.
What is the effect of vanishing gradients on learning?
Early time steps receive negligible updates.
What is the effect of exploding gradients on optimization?
Parameter updates become unstable and diverge.
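A toy illustration of both failure modes, using a single scalar factor in place of the Jacobians multiplied during BPTT:

```python
# Repeatedly multiplying a gradient by a factor |a| < 1 shrinks it toward
# zero (vanishing); |a| > 1 blows it up (exploding). The factors and step
# count are illustrative.
def backprop_factor(a, steps):
    g = 1.0
    for _ in range(steps):
        g *= a
    return g

vanish = backprop_factor(0.5, 50)    # ~1e-15: early steps get no signal
explode = backprop_factor(1.5, 50)   # ~1e8: updates diverge
```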
What is the long-term dependency problem?
Inability to capture relationships between distant time steps.
Why do standard RNNs fail on long sequences?
Gradient signal deteriorates over time.
What is a Long Short-Term Memory (LSTM) network?
An RNN variant designed to preserve long-term dependencies using gated memory.
What is the cell state in LSTM?
A persistent memory vector that carries information across time.
Why is the cell state effective?
It enables near-linear gradient flow.
What are gates in LSTM?
Sigmoid-based mechanisms controlling information flow.
What is the range of gate outputs?
Values between 0 and 1.
What is the forget gate?
Controls how much past information is retained.
What is the input gate?
Controls how much new information is written.
What is the candidate state?
Proposed new content for the cell state using tanh transformation.
What is the update rule for cell state?
Combination of retained past and new candidate information.
Why does additive updating help gradients?
Avoids repeated multiplication, preventing decay.
What is the output gate?
Controls how much of the cell state is exposed as hidden state.
What is the relationship between hidden state and cell state?
Hidden state is a filtered version of the cell state.
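The gated updates above can be sketched with the gate activations supplied directly (in a real LSTM, f, i, g, and o come from learned sigmoid/tanh transforms of the input and previous hidden state):

```python
import numpy as np

# LSTM state update: additive write into the cell state, then a filtered
# read-out as the hidden state.
def lstm_update(c_prev, f, i, g, o):
    c = f * c_prev + i * g          # forget old content, add new candidate
    h = o * np.tanh(c)              # hidden state = filtered cell state
    return c, h

c_prev = np.array([1.0, -1.0])
f = np.array([1.0, 0.0])            # keep slot 0, forget slot 1
i = np.array([0.0, 1.0])            # write only into slot 1
g = np.array([0.5, 0.5])            # candidate content
o = np.array([1.0, 1.0])
c, h = lstm_update(c_prev, f, i, g, o)
# c = [1.0, 0.5]: with f = 1 and i = 0, slot 0 passes through unchanged,
# which is the near-linear path that preserves gradients.
```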
What is a GRU (Gated Recurrent Unit)?
A simplified gated RNN that merges cell and hidden states.
What is the update gate in GRU?
Controls interpolation between previous state and candidate state.
What is the reset gate in GRU?
Controls how much past information contributes to candidate computation.
Why does GRU have fewer parameters than LSTM?
It merges the forget and input gates into a single update gate and removes the separate cell state.
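The update gate's interpolation can be sketched with the gate values given directly (in a real GRU, z and the candidate come from learned transforms):

```python
import numpy as np

# GRU state update: the update gate z interpolates elementwise between
# the previous state and the candidate state.
def gru_update(h_prev, z, h_cand):
    return (1.0 - z) * h_prev + z * h_cand

h_prev = np.array([0.0, 1.0])
z = np.array([0.25, 0.75])          # how much of the candidate to take
h_cand = np.array([1.0, 0.0])
h = gru_update(h_prev, z, h_cand)   # [0.25, 0.25]
```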
How do RNNs handle multiple predictors?
Inputs at each time step are vectors of features.
What happens to parameter count when sequence length increases in RNNs?
It remains constant due to weight sharing.
Why do feed-forward models have higher variance than RNNs for long sequences?
They require separate parameters for each lag.
What is autocorrelation?
Correlation between a variable and its lagged values.
Why is autocorrelation useful?
It indicates predictive structure in time series.
What is a lag window?
A fixed-length subsequence used as model input.
How are training samples constructed for RNNs?
By sliding a window over the time series to form input-output pairs.
Why can a single time series produce many training samples?
Each overlapping subsequence is treated as a separate observation.
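The sliding-window construction in one line (window length L = 2 here is illustrative):

```python
# Build (lag window, next value) training pairs by sliding a window of
# length L over the series; each overlapping window becomes one sample.
def make_samples(series, L):
    return [(series[i:i + L], series[i + L])
            for i in range(len(series) - L)]

samples = make_samples([1, 2, 3, 4, 5], L=2)
# [([1, 2], 3), ([2, 3], 4), ([3, 4], 5)]
```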
What is sequence-to-one mapping?
A sequence input produces a single output.
What is sequence-to-sequence mapping?
A sequence input produces a sequence output.
How is text modeled in RNNs?
As sequences of word embeddings.
Why is padding required in text models?
To ensure uniform input length.
What is masking in sequence models?
Ignoring padded elements during forward and backward passes.
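A minimal sketch of padding with a parallel mask (1 marks real tokens, 0 marks padding to be ignored in the loss):

```python
# Pad variable-length sequences to a common length and build a mask so
# padded positions can be skipped during forward and backward passes.
def pad_and_mask(seqs, pad_value=0):
    T = max(len(s) for s in seqs)
    padded = [s + [pad_value] * (T - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (T - len(s)) for s in seqs]
    return padded, mask

padded, mask = pad_and_mask([[5, 6, 7], [8]])
# padded = [[5, 6, 7], [8, 0, 0]]; mask = [[1, 1, 1], [1, 0, 0]]
```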
Bias-variance tradeoff in AR models?
High bias, low variance due to simplicity.
Bias-variance tradeoff in neural networks?
Low bias, high variance due to flexibility.
How do RNNs improve bias?
They model nonlinear temporal dependencies.
How do RNNs control variance?
Through parameter sharing across time steps.
Why can RNNs still overfit?
Large hidden state size increases model complexity.
What determines RNN model capacity?
Number of hidden units K and depth of sequence processing.
What is the tradeoff in choosing hidden size K?
Larger K reduces bias but increases variance.
What is the role of loss function in RNN training?
Measures prediction error across sequences.
Why is squared error commonly used?
It penalizes large deviations and is differentiable.
What is the training objective in RNNs?
Minimize loss over all sequences using gradient descent.
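A minimal sketch of the squared-error objective evaluated over a sequence of predictions:

```python
import numpy as np

# Mean squared error: differentiable everywhere and penalizes large
# deviations quadratically.
def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

loss = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
# (0.0 + 0.25 + 1.0) / 3 = 0.41666...
```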