Ch5: Sequential Models

Last updated 7:58 PM on 6/9/26

34 Terms

1

What is a sequence model?

A machine learning model that inputs or outputs sequences of data.

2

Give 3 examples of sequential data.

Text streams, audio clips, time-series data, video clips, weather data.

3

How is RNN different from a traditional neural network?

Its output depends on prior elements in the sequence (it has memory). A traditional NN treats each input/output as independent.

4

Does RNN share parameters across layers?

Yes. The same weights are reused at every time step; the "layers" of an unrolled RNN are copies of one cell.
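
To make the parameter sharing concrete, here is a minimal sketch of a scalar vanilla RNN. The names `rnn_step` and `run_rnn` and the weight values are illustrative assumptions, not from the cards; real RNNs use weight matrices rather than single numbers.

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One step of a scalar RNN: h_t = tanh(w_xh*x_t + w_hh*h_prev + b_h)."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b_h)

def run_rnn(xs, w_xh=0.5, w_hh=0.8, b_h=0.0):
    """Apply the same three parameters at every time step (assumed values)."""
    h = 0.0  # initial hidden state
    hidden_states = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)  # same weights on every iteration
        hidden_states.append(h)
    return hidden_states

# Only the first input is non-zero, yet later states are non-zero too:
# the hidden state carries information forward (the "memory").
states = run_rnn([1.0, 0.0, 0.0])
```

The loop never creates new weights per step, which is what "shared parameters" means here.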

5

Name the 4 types of RNN architectures.

One to One, One to Many, Many to One, Many to Many

6

What type of RNN is used for image captioning?

One to Many (single image input → many words output)

7

What type of RNN is used for sentiment analysis?

Many to One (many words input → single sentiment output)

8

What type of RNN is used for machine translation?

Many to Many (many input words → many output words)

9

What type is used for music generation?

One to Many
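
The architecture types above differ mainly in how many inputs are read and how many outputs are emitted. A rough sketch with a scalar RNN follows; all function names and weights are hypothetical. The `many_to_many` shown here is the aligned variant (one output per input); machine translation usually uses an encoder-decoder form where input and output lengths can differ.

```python
import math

def hidden_states(xs, w_xh=0.5, w_hh=0.8):
    """Run a scalar RNN over xs and return the hidden state at every step."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h)
        out.append(h)
    return out

def many_to_one(xs):
    """Read the whole sequence, keep only the final state (e.g. sentiment)."""
    return hidden_states(xs)[-1]

def many_to_many(xs):
    """Emit one output per input step (aligned variant)."""
    return hidden_states(xs)

def one_to_many(x, steps, w_xh=0.5, w_hh=0.8):
    """Read a single input, then keep stepping on the hidden state alone."""
    h = math.tanh(w_xh * x)
    out = [h]
    for _ in range(steps - 1):
        h = math.tanh(w_hh * h)  # no new input after the first step
        out.append(h)
    return out
```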

10

What are the 3 main drawbacks of RNN?

Slow computation, short-term memory only, vanishing/exploding gradients

11

What is the vanishing gradient problem?

Gradients become very small during backpropagation through time, so early time steps receive almost no weight update and the model stops learning long-range dependencies.

12

What causes vanishing gradients?

Per-step factors with magnitude < 1 are multiplied repeatedly in the chain rule → the gradient shrinks exponentially with the number of time steps.

13

What is the exploding gradient problem?

Gradients become very large, causing unstable weight updates (values can overflow to NaN) so training diverges.

14

What happens when gradients are > 1?

They get exponentially larger and eventually blow up the model.
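
A quick numeric illustration of both problems, assuming for simplicity that every time step contributes the same factor to the chain-rule product (real per-step factors vary). The function name `gradient_scale` is made up for this sketch.

```python
def gradient_scale(factor, steps):
    """Product of `steps` identical per-step factors, as in a chain-rule product."""
    scale = 1.0
    for _ in range(steps):
        scale *= factor
    return scale

vanishing = gradient_scale(0.9, 100)  # about 2.7e-05: the signal is almost gone
exploding = gradient_scale(1.1, 100)  # about 1.4e+04: the signal blows up
```

Factors only slightly below or above 1 are enough to cause trouble once the sequence is long.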

15

Why does LSTM exist?

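To mitigate the vanishing gradient problem so the network can learn long-term dependencies. (Exploding gradients are usually handled separately, e.g. with gradient clipping.)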

16

What is the key difference between RNN and LSTM?

LSTM has a memory cell that can hold information for extended periods.

17

How many interacting layers does LSTM have?

4 (three sigmoid gate layers and one tanh layer)

18

Name the 3 gates in an LSTM cell.

Input gate, Forget gate, Output gate

19

What does the forget gate do?

Controls what information is removed from the memory cell.

20

What does the input gate do?

Controls what information is added to the memory cell.

21

What does the output gate do?

Controls what information is output from the memory cell.
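
The three gates can be seen working together in a single LSTM step. This is a scalar sketch under assumed parameter names (`wf`, `uf`, `bf`, and so on) with arbitrary values; library implementations use matrices and vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step. `p` maps hypothetical names like 'wf' to floats."""
    f = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x_t + p["ug"] * h_prev + p["bg"])  # candidate values
    c_t = f * c_prev + i * g   # drop part of the old memory, add new content
    h_t = o * math.tanh(c_t)   # expose part of the memory cell as output
    return h_t, c_t

names = ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in names}  # arbitrary values for illustration
h, c = lstm_step(1.0, 0.0, 0.0, params)
```

The four lines computing `f`, `i`, `o`, and `g` correspond to the four interacting layers: three sigmoid gates plus one tanh candidate layer.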

22

Name 3 applications of LSTM.

Language translation, speech recognition, time series prediction, music generation, sentiment analysis.

23

How many gates does GRU have and what are they?

2 gates; update gate and reset gate.

24

Does GRU have a separate cell state (C)?

No. A GRU has only a hidden state (h).
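
For comparison with the LSTM, a scalar GRU step might look like the sketch below. Parameter names are hypothetical and biases are omitted for brevity. Note that sources differ on which side of the final blend the update gate `z` multiplies; this sketch uses one common convention.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, p):
    """One scalar GRU step. There is no separate cell state, only h."""
    z = sigmoid(p["wz"] * x_t + p["uz"] * h_prev)  # update gate
    r = sigmoid(p["wr"] * x_t + p["ur"] * h_prev)  # reset gate
    # The reset gate scales how much of the old state feeds the candidate.
    h_cand = math.tanh(p["wh"] * x_t + p["uh"] * (r * h_prev))
    # The update gate blends the old state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand

gru_params = {name: 0.5 for name in ("wz", "uz", "wr", "ur", "wh", "uh")}
h_new = gru_step(1.0, 0.0, gru_params)
```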

25

What is the advantage of GRU over LSTM?

Simpler architecture → faster training time.

26

Does GRU solve the vanishing gradient problem?

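Yes, it mitigates it: like LSTM, its gates let gradients flow across many time steps, though the problem is reduced rather than eliminated.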

27

How does a bidirectional RNN differ from a standard RNN?

Standard RNN reads left to right; bidirectional reads left to right AND right to left.

28

When is bidirectional RNN useful?

When context from both directions is needed (e.g., sentiment analysis, speech recognition).
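
A bidirectional pass can be sketched as two runs of the same recurrence, one over the reversed input. The helper names are assumptions, and for brevity both directions share one set of weights here; in practice each direction usually has its own.

```python
import math

def run_rnn(xs, w_xh=0.5, w_hh=0.8):
    """Scalar RNN returning the hidden state at every step."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h)
        out.append(h)
    return out

def bidirectional(xs):
    forward = run_rnn(xs)               # left to right
    backward = run_rnn(xs[::-1])[::-1]  # right to left, re-aligned to input order
    return list(zip(forward, backward)) # one (forward, backward) pair per step

pairs = bidirectional([1.0, 0.0, 0.0])
```

At the last position the forward state still remembers the first input, while the backward state has seen only zeros, so each position gets context from both sides.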

29

What two types of neural networks are used in image captioning?

CNN (encoder for images) + RNN/LSTM/Transformer (decoder for text)

30

What is the role of the encoder in image captioning?

Extracts visual features from the input image (using CNN like ResNet).

31

What is the role of the decoder in image captioning?

Generates caption text word by word (using RNN/LSTM/Transformer).

32

How does an LSTM generate new text?

Takes a seed sequence, predicts next character/word, appends it, repeats.

33

What loss function is commonly used for text generation training?

Categorical cross-entropy.
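
For a single prediction with a one-hot target, categorical cross-entropy reduces to the negative log of the probability the model gave the correct class. A minimal sketch, with made-up probabilities:

```python
import math

def categorical_cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

# The correct next token is class 1 in both cases.
loss_good = categorical_cross_entropy([0.1, 0.8, 0.1], 1)  # confident and right
loss_bad = categorical_cross_entropy([0.7, 0.2, 0.1], 1)   # confident and wrong
```

The loss is small when the model puts high probability on the correct token and grows as that probability falls.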

34

After training, what is fed back into the model during generation?

The predicted character/word becomes part of the input for the next prediction.
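
The seed, predict, append, repeat loop can be sketched as follows. `predict_next` is a hypothetical stand-in for a trained LSTM: here it is just a lookup on the last character, whereas a real model would output a probability distribution over the vocabulary.

```python
def predict_next(text, table):
    """Hypothetical stand-in for a trained model: look up the last character."""
    return table.get(text[-1], " ")

def generate(seed, table, length):
    text = seed
    for _ in range(length):
        next_char = predict_next(text, table)  # predict the next character
        text += next_char                      # append; it is input next time
    return text

table = {"a": "b", "b": "c", "c": "a"}  # toy "model" for illustration
result = generate("a", table, 5)        # "abcabc"
```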