Flashcards to help review key concepts, facts, and details from a lecture on Unit-IV Recurrent Neural Networks.
Recurrent Neural Network (RNN)
A type of neural network that saves the output of a layer and feeds it back to the input to help predict the layer's next output; it excels at tasks where the order of the sequence matters because of its memory of previous inputs.
Sequence Importance
Tasks in which the order of the sequence is critical: in language, the arrangement of words defines their meaning; in time series data, time defines when events occur.
Recurrent Neural Network
A network with access to prior knowledge about data, designed to understand data where sequence matters.
Traditional Neural Networks vs. RNNs
In traditional neural networks, inputs and outputs are independent of each other (a deeper network simply has more hidden layers); in RNNs, by contrast, the output depends on previous elements of the sequence.
Feed-Forward Network
Information flows only in the forward direction, from input nodes to output nodes, with the help of hidden layer nodes; there are no cycles/loops in the network.
Limitations of Feed-Forward Networks
Cannot be used to handle sequential data, considers only the current state for prediction, cannot memorize previous inputs, and has no memory.
Recurrent Neural Networks (RNNs)
Perform the same task for every element of a sequence, with the output being dependent on previous computations; have a memory that stores information about what has been calculated so far.
Parameter Sharing in RNNs
Sharing the same weights across time steps reduces the number of parameters; when unrolled, the number of layers equals the number of words in the sequence, and the output from the previous step is used as input to the current step.
RNNs for Sequence Learning
Ensures that the output of the next state depends on the previous state, handles variable-length inputs, and applies the same function at each time step.
Parameter Sharing in RNN
Used to show the relation between x_t, h_{t-1} and h_t, with shared weights W_{hh}, W_{hx}, W_{hy} and bias b.
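In the notation these symbols suggest (a minimal sketch of the standard recurrence; the exact symbols may differ from the lecture slides), the shared parameters give

    h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t + b)
    y_t = W_{hy} h_t

and the same W_{hh}, W_{hx}, W_{hy}, b are reused at every time step.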
Tasks of RNNs
For each time step of the input sequence x, predict an output y synchronously, or predict a single scalar y at the end of the sequence; RNNs can take one or more input vectors and produce one or more output vectors.
Output Calculation in RNNs
Outputs are calculated not only by weights applied on inputs like in a regular NN, but also by a vector representing the content based on prior inputs/outputs.
Training Through RNNs
First, words are transformed into machine-readable vectors; then the RNN processes the sequence of vectors one by one, and the current hidden state becomes h_{t-1} for the next time step.
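A minimal NumPy sketch of this forward pass (the dimensions, initialization, and tanh activation are illustrative assumptions, not taken from the lecture):

```python
import numpy as np

# Hypothetical sizes: 4-dimensional word vectors, 8-dimensional hidden state.
W_hh = np.random.randn(8, 8) * 0.1   # hidden -> hidden, shared across time steps
W_hx = np.random.randn(8, 4) * 0.1   # input  -> hidden, shared across time steps
b = np.zeros(8)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_hh @ h_prev + W_hx @ x_t + b)

sequence = [np.random.randn(4) for _ in range(5)]  # five word vectors
h = np.zeros(8)                                    # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)   # the current state becomes h_{t-1} for the next step
```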
RNN Applications
Sentiment classification, video classification, part-of-speech tagging, image captioning, machine translation.
Back Propagation Through Time (BPTT)
Applies the backpropagation training algorithm to RNNs on sequence data, such as a time series, to obtain parameters that optimize the cost function.
Vanishing and Exploding Gradients
The gradient is a product of many terms; if all terms are very small, the gradient will vanish; if all terms are very large, the gradient will explode.
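In the usual BPTT notation (a sketch using the generic hidden-state recurrence, not the lecture's exact symbols), the gradient flowing from step t back to step k contains the product

    \frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}

If each factor has norm below 1, the product shrinks toward zero (vanishing); if above 1, it grows without bound (exploding).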
Limitations of RNNs
Due to the vanishing gradient problem, RNNs are limited, and there is no fine control over which part of the context needs to be carried forward or forgotten.
Short-Term Memory
RNNs suffer from short-term memory: if a sequence is long enough, they fail to carry information from earlier time steps to later ones.
Long Short-Term Memory (LSTM) Networks
Special kinds of RNNs capable of learning long-term dependencies.
LSTM Structure
An LSTM is built from memory blocks called cells, each containing several interacting layers (the gates) and a cell state that carries information along the sequence.
Gates in LSTM
Input gate, forget gate, output gate - neural networks that decide which information is allowed on the cell state; they learn what information is relevant to keep or forget.
Forget Gate Layer
Involves deciding what information to throw away from the cell state using a sigmoid layer (forget gate layer).
Input Gate Layer
Decides what new information to store in the cell state, done by the input gate (a sigmoid layer) together with a tanh layer that creates candidate values.
Updating Cell State
Multiply C_{t-1} by f_t (forgetting the things we decided to forget), then add i_t * C̃_t, the new candidate values scaled by how much we decided to update each state value.
Calculation of Output
Based on the cell state; a sigmoid layer (output gate layer) decides what parts of the cell state go to the output.
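Putting the gates together, the standard LSTM equations are usually written as follows (a summary in conventional notation, not necessarily the lecture's exact formulas):

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)           (forget gate)
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)           (input gate)
    \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)    (candidate values)
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t  (cell state update)
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)           (output gate)
    h_t = o_t \odot \tanh(C_t)                       (hidden state / output)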
Gated Recurrent Unit (GRU)
Similar to LSTM but without the memory cell state; uses a hidden state to transfer information and has only two gates: a reset gate and an update gate.
Update Gate
Acts similarly to the forget and input gates of LSTM; decides what information to throw away and what new information to add.
Reset Gate
Decides how much past information to forget.
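For comparison, the standard GRU equations (conventional notation; a summary, not the lecture's exact formulas):

    z_t = \sigma(W_z [h_{t-1}, x_t])                  (update gate)
    r_t = \sigma(W_r [h_{t-1}, x_t])                  (reset gate)
    \tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t])   (candidate hidden state)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t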
Machine Translation
Includes automated translation software that translates text from one natural language to another.
Models Used for Machine Translation
Sequence-to-sequence models such as Encoder-Decoder and Attention Models.
Encoder Network
Built as an RNN (GRU or LSTM) to process an input sequence, and outputs a vector that represents the input sequence.
Decoder Network
Trained to output the translation one word at a time until it outputs the whole output sequence.
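A minimal PyTorch sketch of such an encoder-decoder pair (the class names, layer sizes, and the choice of GRU cells are illustrative assumptions, not the lecture's implementation):

```python
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_tokens):
        _, h = self.rnn(self.embed(src_tokens))
        return h  # context vector: final hidden state summarizing the input sentence

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, prev_tokens, h):
        o, h = self.rnn(self.embed(prev_tokens), h)
        return self.out(o), h  # scores over the target vocabulary, one word at a time
```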
Conditional Language Model
Used for a system that translates English to Hindi; instead of modeling the probability of any sentence, it models the probability of the output sentence conditioned on an input sentence.
Beam Search
An algorithm that selects multiple alternatives for an input sequence at each time step based on conditional probability; the number of alternatives depends on a parameter called the beam width B.
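A minimal Python sketch of beam search over a step-wise probability model (the step_log_probs callable and the end-of-sentence token are hypothetical placeholders, not part of the lecture):

```python
def beam_search(step_log_probs, start_token, eos_token, beam_width=3, max_len=20):
    """step_log_probs(prefix) -> dict mapping next token to its log probability."""
    beams = [([start_token], 0.0)]            # (token sequence, cumulative log prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos_token:          # finished hypotheses are kept as-is
                candidates.append((seq, score))
                continue
            for tok, logp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + logp))
        # keep only the B most probable hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]
```

A larger beam width explores more hypotheses and usually yields a better translation, at the cost of more memory and computation.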
BLEU Score
A score for comparing a candidate translation of text to one or more reference translations; a metric ranging from 0 to 1, where a perfect match results in a score of 1.0.
Benefits of BLEU Score
Quick, inexpensive to calculate, easy to understand, language-independent, and correlates highly with human evaluation.
BLEU Score - Text String Matches
Based on 'text string matches', i.e., counting n-grams of the candidate that also appear in the reference translations.
Overcoming High Precision
Clipped counts and modified n-gram precision are used to overcome the problem of artificially high precision (e.g., when a candidate repeats a word from the reference).
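With clipped counts, the modified n-gram precision and the overall BLEU score are usually written as (a standard formulation, not necessarily the lecture's exact notation):

    p_n = \frac{\sum \text{Count}_{clip}(\text{n-gram})}{\sum \text{Count}(\text{n-gram})}
    \text{BLEU} = \text{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right)

where BP is a brevity penalty that punishes candidates shorter than the reference.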
Beam Search Summary
Considers multiple best options based on beam width using conditional probability; higher beam width gives better translation but uses more memory and computational power.
Attention
Implies directing focus at something or concentrating on one or a few things while ignoring others.
Encoder - Decoder RNN/LSTM
Processes the entire input sentence and encodes it into a context vector, which is the last hidden state of the LSTM/RNN.
Attention Mechanism
To solve the problem of compressing a long input sentence into a single fixed-length context vector, the attention mechanism is used with the encoder-decoder RNN/LSTM.
Context Vector
A single state that is expected to contain a good summary of the input sentence; it is used as the decoder's initial hidden state.
Good RNNs
RNNs that are good at understanding long sentences; a single fixed-length context vector makes this difficult.
Attention Proposed as a Solution
Attention is proposed as a solution to the limitation of encoding the input sequence into one fixed-length vector from which to decode each output time step; with the attention mechanism, the encoder-decoder model lets the decoder focus on the relevant parts of the input at each output time step.
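In the usual notation (a summary of the standard formulation, not necessarily the lecture's symbols), the decoder at output step t builds a context vector as a weighted sum of all encoder hidden states:

    c_t = \sum_{s} \alpha_{t,s} h_s, \quad
    \alpha_{t,s} = \frac{\exp(\text{score}(s_{t-1}, h_s))}{\sum_{s'} \exp(\text{score}(s_{t-1}, h_{s'}))}

The attention weights \alpha_{t,s} decide which input positions the decoder focuses on at each output step.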
Reinforcement Learning
Computer/software agent learns to perform a task through trial and error interactions with a dynamic environment.
Agent-Environment Interaction
The agent observes one of many states of the environment and chooses one of many actions to switch from one state to another.
Goal of Reinforcement learning
To take smart actions that maximize the cumulative reward.
Reinforcement Learning Used in
Video games, computer games.
Action
The move that an agent makes in a given state of the environment.
Policy
The strategy the agent employs to choose its actions based on the current state.
MDP (Markov Decision Process)
A mathematical framework that can be used to solve most reinforcement learning problems with discrete actions.
MDP Agent Chooses Action
The process is in some state S_t; the agent may choose any action a_t ∈ A available in that state.
Markov Process / Discrete Weather Forecast
A Markov process, also known as a discrete-time Markov chain; a discrete weather forecast is a common example.
Rewards
The reward function rewards an agent for taking the right actions and punishes it (with negative rewards) for wrong actions.
Discount Factor
Helps to weigh expected future rewards when evaluating the advantages/disadvantages of each state.
Bellman Equation
Helps to solve a Markov decision process; in other words, it helps in finding the optimal policy and value function.
Bellman Equation / Value Functions
Each state s is associated with a value function V(s), which is equal to the expected return E[R_t | S_t = s].
Bellman Equation Goal
Helps to predict the value of a given state, i.e., the expected long-term reward obtainable from it.
Another View of the Bellman Equation
It says that the value of a state is the immediate reward plus the discounted long-term reward of the subsequent state.
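In standard notation (a common form of the equation; the lecture may use slightly different symbols), the Bellman optimality equation for the state value is

    V(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \right]

where \gamma is the discount factor and P(s' | s, a) is the transition probability.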
Value Iteration
Update the value of each state repeatedly until the values become stable.
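A minimal Python sketch of value iteration on a small MDP (the transition table P, reward table R, and convergence threshold are hypothetical):

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
    """P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best action value under the current estimate V
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:        # values have become stable
            return V
```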
Iterative Policy (Policy Iteration)
First take a random policy, then evaluate it and improve it.
Value Iteration vs. Policy Iteration (Comparison)
Value iteration updates the value function iteratively; policy iteration alternates between policy evaluation and policy improvement.
Actor and Critic Functions
Both the actor (policy) function and the critic (value) function are parameterized with neural networks.
Q-Learning
An off-policy reinforcement learning algorithm that seeks to find the best action to take given the current state.
Q-Learning Considered Off-Policy
The Q-function learns from actions that are outside the current policy, such as random actions.
Main Goal of the Q-Table / Q-Function
To obtain the best (maximum) Q-values, i.e., to maximize the expected future reward.
Q-Table
Also called the action-value function; Q-learning iteratively improves the action values stored in it.
Approximating Action Values (TD)
The basic idea is temporal difference (TD) learning: action values are updated step by step from observed rewards, and learning these values helps to improve the learning agent.
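The TD update that Q-learning applies at each step is usually written as (standard form with conventional symbols, not necessarily the lecture's notation):

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

The max over next actions is what makes the update off-policy.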
SARSA (State-Action-Reward-State-Action) Learning
A modification of Q-learning; the update is based on the tuple (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}).
SARSA: On-Policy
Updates are based on the current state and action, the reward, and the new state and the action actually taken in it.
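For comparison, the SARSA update uses the action a_{t+1} actually chosen by the current policy (standard form with conventional symbols):

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]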