These notes cover core NLP transformer concepts.
Transformers
The original transformer architecture combines an encoder and a decoder.
Encoder
Encoder-only models are used for text classification and sentiment analysis.
Ex: BERT - Bidirectional Encoder Representations from Transformers.
They cannot generate text.
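A minimal sketch of using an encoder-only model for classification, assuming the Hugging Face `transformers` library is installed (the library and its default checkpoint are assumptions of this example, not part of the notes):

```python
# Minimal sketch: an encoder-only model used for sentiment analysis.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The movie was surprisingly good."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```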
Decoder only
Used for text generation (predicting the next token).
Ex: GPT - Generative Pre-trained Transformer.
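A minimal sketch of decoder-only generation, again assuming Hugging Face `transformers`; the `gpt2` checkpoint and the prompt are illustrative choices:

```python
# Minimal sketch: a decoder-only model (GPT-2) continuing a prompt token by token.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=20)[0]["generated_text"])
```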
What is a sequence-to-sequence (seq2seq) model?
Encoder + decoder
Used for language translation.
Ex: BART - Bidirectional and Auto-Regressive Transformers.
Also good for summarization.
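A minimal seq2seq sketch, assuming Hugging Face `transformers`; `facebook/bart-large-cnn` is a public BART checkpoint fine-tuned for summarization (an assumption of this example, not something stated in the notes):

```python
# Minimal sketch: an encoder-decoder (seq2seq) model summarizing a passage.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = ("Transformers use attention to model long-range dependencies, "
        "which makes them effective for translation and summarization. ") * 5
print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])
```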
What is the key difference between the encoder and the decoder?
The decoder works from a prompt and predicts tokens left to right (unidirectional).
The encoder produces contextual embeddings, using words on both sides (bidirectional).
Self-Attention vs Cross-Attention
Feature | Self-Attention | Cross-Attention |
---|---|---|
Definition | Focuses on different parts of the same input sequence. | Focuses on different parts of another sequence. |
Usage | Commonly used in encoder and decoder layers of transformers. | Typically used in decoder layers of transformers. |
Input | Single sequence (e.g., a sentence). | Two sequences (e.g., a query and a context). |
Purpose | Captures relationships within the same sequence. | Captures relationships between two different sequences. |
Example | Understanding word dependencies in a sentence. | Aligning a question with a relevant context passage. |
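A rough PyTorch sketch of the difference in the table above (PyTorch and `nn.MultiheadAttention` are assumptions of this example, not part of the notes): self-attention feeds one sequence as Q, K, and V, while cross-attention takes Q from one sequence and K, V from another.

```python
# Rough sketch: self-attention vs cross-attention with PyTorch's MultiheadAttention.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

x = torch.randn(1, 5, 16)   # e.g. decoder states: (batch, seq_len, embed_dim)
y = torch.randn(1, 7, 16)   # e.g. encoder outputs

self_out, _ = attn(x, x, x)    # self-attention: Q, K, V all come from x
cross_out, _ = attn(x, y, y)   # cross-attention: Q from x, K and V from y
print(self_out.shape, cross_out.shape)   # both (1, 5, 16): one output per query token
```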
What is the decoder used for?
Predicting the next token.
Ex: "Akhil went to MYR" ← here "MYR" is the predicted token.
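A rough sketch of next-token prediction with a decoder-only model, assuming `transformers` and `torch` are installed; the prompt and the `gpt2` checkpoint are illustrative only:

```python
# Rough sketch: ask a decoder-only model for the most likely next token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Akhil went to", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # (batch, seq_len, vocab_size)
next_id = logits[0, -1].argmax().item()    # highest-scoring next token
print(tok.decode(next_id))
```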
BERT
Bidirectional encoder-only model.
Developed by Google.
Used for text classification, masked-word prediction, and Q&A.
Ex: "Akhil loves the Amazon forest." vs "Anesh works at Amazon."
The word "Amazon" gets a different contextual embedding in each sentence.
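A rough sketch of the point above, assuming `transformers` and `torch` are installed and that "amazon" is a single token in the `bert-base-uncased` vocabulary (both are assumptions of this example):

```python
# Rough sketch: compare BERT's contextual embedding of "amazon" in two sentences.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence, word):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (seq_len, hidden_size)
    tokens = tok.convert_ids_to_tokens(inputs.input_ids[0].tolist())
    return hidden[tokens.index(word)]    # assumes `word` is a single wordpiece

a = embedding_of("Akhil loves the Amazon forest.", "amazon")
b = embedding_of("Anesh works at Amazon.", "amazon")
print(torch.cosine_similarity(a, b, dim=0))   # < 1.0: context changes the vector
```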
Self-attention mechanism
Ex: "Akhil is working in CTS. He is working as a DS."
"He" refers to Akhil ← self-attention resolves this link.
BERT fine-tuning
Further training a pre-trained model on task-specific data.
BERT is trained on
Masked Language Modeling (MLM)
Next Sentence Prediction (NSP)
Masked Language Modeling (MLM) in BERT
A small fraction of tokens (about 15%) is masked at random and the model learns to predict them.
A standard (autoregressive) language model reads in one direction, but masked language modeling lets the transformer use context from both directions.
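A minimal sketch of MLM at inference time, assuming Hugging Face `transformers`; `bert-base-uncased` and the sentence are illustrative choices:

```python
# Minimal sketch: BERT's masked-language-model head fills in a [MASK] token
# using context from both the left and the right.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Akhil went to the [MASK] to buy milk.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```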
Language models are classified into:
Auto-Regressive (ARLM) - unidirectional, e.g., OpenAI GPT; used for generation and summarization.
Auto-Encoding (AELM) - bidirectional, e.g., BERT-style transformer encoders.
Next Sentence Prediction (NSP) in BERT
BERT is given pairs of sentences. For each pair, it predicts whether the second sentence follows the first sentence in the original text. This helps BERT understand the relationship between sentences, which is crucial for tasks like question answering and natural language inference.
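A rough sketch of NSP scoring, assuming `transformers` and `torch` are installed; the sentence pair is illustrative:

```python
# Rough sketch: score whether sentence B follows sentence A.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

a = "Akhil opened the fridge."
b = "He took out a bottle of milk."
inputs = tok(a, b, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits    # index 0: B follows A, index 1: B is random
print(torch.softmax(logits, dim=-1))
```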
RoBERTa - Robustly Optimized BERT Approach
Best for large datasets; performance is higher than BERT.
Training time is longer and the vocabulary is larger.
DistilBERT
Smaller model size and higher speed.
Accuracy is slightly lower than BERT.
Same general architecture as BERT but with fewer encoder layers.
ALBERT - A Lite BERT
Has fewer parameters and somewhat lower performance/accuracy than BERT.
Training is about 1.7x faster than BERT.
Cross-layer parameter sharing
Reduces the number of parameters by sharing them across layers.
Types of cross-layer parameter sharing
Feed-forward sharing: only the feed-forward parameters are shared.
Attention sharing: only the multi-head attention parameters are shared.
All-shared: all parameters are shared.
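A conceptual PyTorch sketch of cross-layer sharing (not ALBERT's actual code; the library and sizes are assumptions): one feed-forward block is reused by every layer, so the parameter count does not grow with depth.

```python
# Conceptual sketch: the same feed-forward block reused across all layers.
import torch
import torch.nn as nn

class SharedFFNEncoder(nn.Module):
    def __init__(self, d_model=128, n_layers=12):
        super().__init__()
        # One feed-forward block shared by every layer (cross-layer sharing).
        self.shared_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.n_layers = n_layers

    def forward(self, x):
        for _ in range(self.n_layers):
            x = x + self.shared_ffn(x)   # same weights applied at every layer
        return x

model = SharedFFNEncoder()
print(sum(p.numel() for p in model.parameters()))   # independent of n_layers
print(model(torch.randn(2, 4, 128)).shape)
```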
Self-attention is also called
Intra-attention
It allows the model to focus on the most relevant parts of the input.
What is the most commonly used activation function in transformer blocks (in the feed-forward sub-layer)?
ReLU
How do we reduce vanishing gradients in multi-head attention blocks?
Layer normalization (together with residual connections).
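A small PyTorch sketch of the "Add & Norm" pattern behind this answer (the library and shapes are assumptions of this example): each sub-layer's output is added back to its input and then layer-normalized, which keeps gradients from vanishing.

```python
# Small sketch: residual connection + layer normalization around a sub-layer.
import torch
import torch.nn as nn

d_model = 16
norm = nn.LayerNorm(d_model)
sublayer = nn.Linear(d_model, d_model)   # stand-in for attention or feed-forward

x = torch.randn(2, 5, d_model)           # (batch, seq_len, d_model)
out = norm(x + sublayer(x))              # "Add & Norm" from the transformer block
print(out.shape)
```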
Attention mechanism
It lets the model focus on particular parts of the input sequence.
Attention weights come from the dot product of Q and K, scaled by the square root of the key dimension, and passed through a softmax; the resulting weights are applied to V.
Q, K, V full forms
Query, Key, Value
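A minimal NumPy sketch of the formula above, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V (NumPy and the toy shapes are assumptions of this example):

```python
# Minimal sketch: scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                         # weighted sum of values

Q = np.random.randn(5, 8)   # 5 tokens, d_k = 8
K = np.random.randn(5, 8)
V = np.random.randn(5, 8)
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 8)
```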
Contextual windowing vs positional encoding
Feature | Contextual Windowing | Positional Encoding |
---|---|---|
Definition | Divides input into smaller, manageable windows or chunks. | Adds positional information to each token in the sequence. |
Purpose | Helps manage long sequences by focusing on smaller parts. | Helps the model understand the order of tokens in a sequence. |
Usage | Often used in models dealing with long texts or sequences. | Used in transformer models to retain sequence order. |
Input Handling | Processes chunks independently or with limited overlap. | Processes the entire sequence with positional context. |
Example | Splitting a long document into paragraphs for analysis. | Encoding positions of words in a sentence to maintain order. |
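A minimal NumPy sketch of the sinusoidal positional encoding used by the original transformer (NumPy and the sizes are assumptions of this example): each position gets a unique vector that is added to the token embedding so the model can tell token order.

```python
# Minimal sketch: sinusoidal positional encoding.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=50, d_model=16).shape)   # (50, 16)
```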