What is time-series data?
Data indexed in temporal order where dependence exists across observations.
What assumption distinguishes time-series from iid data?
Observations are not independent; temporal dependence exists.
What is the forecasting constraint in time series?
Only past and present information can be used to predict future values.
What is an AR(L) model?
A linear model where the current value is a linear combination of the previous L lagged values.
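As a sketch, a one-step AR(L) prediction with illustrative (not fitted) coefficients:

```python
import numpy as np

# One-step AR(L) prediction: a linear combination of the previous L
# lagged values plus an intercept. The coefficients are illustrative,
# not estimated from data.
def ar_predict(history, coeffs, intercept=0.0):
    L = len(coeffs)
    lags = history[-L:][::-1]              # most recent observation first
    return intercept + float(np.dot(coeffs, lags))

series = [1.0, 2.0, 3.0, 4.0]
pred = ar_predict(series, coeffs=[0.5, 0.25])   # 0.5*4.0 + 0.25*3.0 = 2.75
```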
What is the role of lag length L?
Defines the memory window of the model.
What is the error term in AR models?
A stochastic disturbance capturing unexplained variation.
What type of relationships do AR models capture?
Only linear dependencies.
Why do AR models have high bias?
They impose a strict linear structure on the data.
Why does polynomial expansion in AR increase complexity?
It introduces interaction and nonlinear terms, increasing feature dimensionality exponentially.
What is the dimensionality issue with large L?
Number of predictors grows rapidly with lag length and interactions.
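The growth can be counted directly: with L lags and all polynomial and interaction terms up to total degree d, the number of predictors is C(L + d, d) (a standard combinatorial count; the specific L and d below are illustrative):

```python
from math import comb

# Number of polynomial terms (intercept, lags, interactions) of total
# degree <= d built from L lagged inputs: C(L + d, d).
def n_poly_terms(L, d):
    return comb(L + d, d)

n_poly_terms(5, 2)    # 21 predictors
n_poly_terms(20, 2)   # 231
n_poly_terms(20, 3)   # 1771 -- rapid growth in both L and d
```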
What is a feed-forward neural network in time series?
A mapping from fixed lag inputs to output using nonlinear transformations.
What is the structural limitation of feed-forward models for sequences?
They do not preserve temporal order beyond fixed inputs.
Why do feed-forward networks require fixed lag length?
Input dimension must be predefined.
Why do feed-forward models scale poorly with large L?
Parameter count increases with number of lagged inputs.
Why can feed-forward networks overfit in time series?
High parameterization increases estimation variance.
What is a Recurrent Neural Network (RNN)?
A neural network that processes sequential data using a recursive hidden state.
What is the recurrence relation in RNNs?
Current hidden state depends on current input and previous hidden state.
What is the hidden state?
A latent vector summarizing past sequence information.
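A minimal sketch of the recurrence h_t = tanh(W_h h_{t-1} + W_x x_t + b), with small random weights standing in for learned parameters:

```python
import numpy as np

# One step of a vanilla RNN: the new hidden state depends on the current
# input and the previous hidden state through shared weights.
def rnn_step(h_prev, x, W_h, W_x, b):
    return np.tanh(W_h @ h_prev + W_x @ x + b)

rng = np.random.default_rng(0)
K, d = 4, 3                              # hidden size, input features
W_h = rng.normal(size=(K, K)) * 0.1      # illustrative, untrained weights
W_x = rng.normal(size=(K, d)) * 0.1
b = np.zeros(K)

h = np.zeros(K)
for x in rng.normal(size=(10, d)):       # a 10-step input sequence
    h = rnn_step(h, x, W_h, W_x, b)
# h is a fixed-size (K,) summary of the whole sequence, regardless of length
```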
What is the key advantage of weight sharing in RNNs?
Reduces parameter count and improves generalization.
What is the dimensionality benefit of RNNs?
Compresses long sequences into fixed-size hidden states.
How do RNNs differ from AR models?
They are nonlinear and sequential rather than fixed linear combinations.
How do RNNs differ from feed-forward networks?
They maintain memory across time steps.
What is the role of activation functions in RNNs?
Introduce nonlinearity and control numerical stability.
Why is tanh commonly used?
It bounds outputs between −1 and 1.
What is the effect of tanh on gradients?
Its derivative is less than or equal to 1, contributing to gradient shrinkage.
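This bound is easy to check numerically, since tanh'(z) = 1 − tanh(z)²:

```python
import numpy as np

# The derivative of tanh is 1 - tanh(z)^2, which never exceeds 1 and
# attains 1 only at z = 0.
z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
dtanh = 1.0 - np.tanh(z) ** 2
print(dtanh.max())   # 1.0, at z = 0
```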
What is Backpropagation Through Time (BPTT)?
Gradient computation method that unfolds the RNN across time and applies chain rule.
Why must gradients be propagated through time?
Each hidden state depends recursively on previous states.
What is the computational cost of BPTT?
Time and memory scale linearly with sequence length.
What are vanishing gradients?
Gradients decay exponentially toward zero as they propagate backward.
What are exploding gradients?
Gradients grow exponentially and destabilize training.
What mathematical cause leads to vanishing gradients?
Repeated multiplication by values less than 1 (e.g., derivatives of tanh).
What mathematical cause leads to exploding gradients?
Repeated multiplication by values greater than 1.
What is the effect of vanishing gradients on learning?
Early time steps receive negligible updates.
What is the effect of exploding gradients on optimization?
Parameter updates become unstable and diverge.
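A toy illustration of both failure modes, using a single scalar factor in place of the Jacobians multiplied during BPTT:

```python
# Repeatedly multiplying a gradient by a factor |a| < 1 shrinks it toward
# zero (vanishing); |a| > 1 blows it up (exploding). The factors and step
# count are illustrative.
def backprop_factor(a, steps):
    g = 1.0
    for _ in range(steps):
        g *= a
    return g

vanish = backprop_factor(0.5, 50)    # ~1e-15: early steps get no signal
explode = backprop_factor(1.5, 50)   # ~1e8: updates diverge
```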
What is the long-term dependency problem?
Inability to capture relationships between distant time steps.
Why do standard RNNs fail on long sequences?
Gradient signal deteriorates over time.
What is a Long Short-Term Memory (LSTM) network?
An RNN variant designed to preserve long-term dependencies using gated memory.
What is the cell state in LSTM?
A persistent memory vector that carries information across time.
Why is the cell state effective?
It enables near-linear gradient flow.
What are gates in LSTM?
Sigmoid-based mechanisms controlling information flow.
What is the range of gate outputs?
Values between 0 and 1.
What is the forget gate?
Controls how much past information is retained.
What is the input gate?
Controls how much new information is written.
What is the candidate state?
Proposed new content for the cell state using tanh transformation.
What is the update rule for cell state?
Combination of retained past and new candidate information.
Why does additive updating help gradients?
Avoids repeated multiplication, preventing decay.
What is the output gate?
Controls how much of the cell state is exposed as hidden state.
What is the relationship between hidden state and cell state?
Hidden state is a filtered version of the cell state.
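The gated updates above can be sketched with the gate activations supplied directly (in a real LSTM, f, i, g, and o come from learned sigmoid/tanh transforms of the input and previous hidden state):

```python
import numpy as np

# LSTM state update: additive write into the cell state, then a filtered
# read-out as the hidden state.
def lstm_update(c_prev, f, i, g, o):
    c = f * c_prev + i * g          # forget old content, add new candidate
    h = o * np.tanh(c)              # hidden state = filtered cell state
    return c, h

c_prev = np.array([1.0, -1.0])
f = np.array([1.0, 0.0])            # keep slot 0, forget slot 1
i = np.array([0.0, 1.0])            # write only into slot 1
g = np.array([0.5, 0.5])            # candidate content
o = np.array([1.0, 1.0])
c, h = lstm_update(c_prev, f, i, g, o)
# c = [1.0, 0.5]: with f = 1 and i = 0, slot 0 passes through unchanged,
# which is the near-linear path that preserves gradients.
```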
What is a GRU (Gated Recurrent Unit)?
A simplified gated RNN that merges cell and hidden states.
What is the update gate in GRU?
Controls interpolation between previous state and candidate state.
What is the reset gate in GRU?
Controls how much past information contributes to candidate computation.
Why does GRU have fewer parameters than LSTM?
It merges the forget and input gates into a single update gate and removes the separate cell state.
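The update gate's interpolation can be sketched with the gate values given directly (in a real GRU, z and the candidate come from learned transforms):

```python
import numpy as np

# GRU state update: the update gate z interpolates elementwise between
# the previous state and the candidate state.
def gru_update(h_prev, z, h_cand):
    return (1.0 - z) * h_prev + z * h_cand

h_prev = np.array([0.0, 1.0])
z = np.array([0.25, 0.75])          # how much of the candidate to take
h_cand = np.array([1.0, 0.0])
h = gru_update(h_prev, z, h_cand)   # [0.25, 0.25]
```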
How do RNNs handle multiple predictors?
Inputs at each time step are vectors of features.
What happens to parameter count when sequence length increases in RNNs?
It remains constant due to weight sharing.
Why do feed-forward models have higher variance than RNNs for long sequences?
They require separate parameters for each lag.
What is autocorrelation?
Correlation between a variable and its lagged values.
Why is autocorrelation useful?
It indicates predictive structure in time series.
What is a lag window?
A fixed-length subsequence used as model input.
How are training samples constructed for RNNs?
By sliding a window over the time series to form input-output pairs.
Why can a single time series produce many training samples?
Each overlapping subsequence is treated as a separate observation.
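The sliding-window construction in one line (window length L = 2 here is illustrative):

```python
# Build (lag window, next value) training pairs by sliding a window of
# length L over the series; each overlapping window becomes one sample.
def make_samples(series, L):
    return [(series[i:i + L], series[i + L])
            for i in range(len(series) - L)]

samples = make_samples([1, 2, 3, 4, 5], L=2)
# [([1, 2], 3), ([2, 3], 4), ([3, 4], 5)]
```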
What is sequence-to-one mapping?
A sequence input produces a single output.
What is sequence-to-sequence mapping?
A sequence input produces a sequence output.
How is text modeled in RNNs?
As sequences of word embeddings.
Why is padding required in text models?
To ensure uniform input length.
What is masking in sequence models?
Ignoring padded elements during forward and backward passes.
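A minimal sketch of padding with a parallel mask (1 marks real tokens, 0 marks padding to be ignored in the loss):

```python
# Pad variable-length sequences to a common length and build a mask so
# padded positions can be skipped during forward and backward passes.
def pad_and_mask(seqs, pad_value=0):
    T = max(len(s) for s in seqs)
    padded = [s + [pad_value] * (T - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (T - len(s)) for s in seqs]
    return padded, mask

padded, mask = pad_and_mask([[5, 6, 7], [8]])
# padded = [[5, 6, 7], [8, 0, 0]]; mask = [[1, 1, 1], [1, 0, 0]]
```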
Bias-variance tradeoff in AR models?
High bias, low variance due to simplicity.
Bias-variance tradeoff in neural networks?
Low bias, high variance due to flexibility.
How do RNNs improve bias?
They model nonlinear temporal dependencies.
How do RNNs control variance?
Through parameter sharing across time steps.
Why can RNNs still overfit?
Large hidden state size increases model complexity.
What determines RNN model capacity?
Number of hidden units K and depth of sequence processing.
What is the tradeoff in choosing hidden size K?
Larger K reduces bias but increases variance.
What is the role of loss function in RNN training?
Measures prediction error across sequences.
Why is squared error commonly used?
It penalizes large deviations and is differentiable.
What is the training objective in RNNs?
Minimize loss over all sequences using gradient descent.
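A minimal sketch of the squared-error objective evaluated over a sequence of predictions:

```python
import numpy as np

# Mean squared error: differentiable everywhere and penalizes large
# deviations quadratically.
def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

loss = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
# (0.0 + 0.25 + 1.0) / 3 = 0.41666...
```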