LLMs are a subset of AI that builds on Natural Language Processing (NLP). True or false?
true
Which architectural model was introduced in the research paper “Attention is all you need”?
Transformer model
What does fine-tuning mean?
Training a model with labeled data for specific tasks after pre-training.
What is a transformer?
A type of neural network that can handle sequences of data, such as a sentence in human language.
Where are the 6 components of a transformer embedded?
Within the encoder and decoder. The encoder is the part of the transformer that processes the information the user provides as input; the decoder is responsible for producing the output.
What is the self-attention mechanism of a transformer?
The transformer looks at all the words and assigns an importance or attention score to them.
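The scoring idea can be sketched with scaled dot-product attention, the mechanism used in transformers. This is a minimal illustration using tiny hand-made vectors in place of learned query/key/value projections:

```python
import math

def softmax(xs):
    # Convert raw scores into attention weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    # Scaled dot-product attention: each word scores every word,
    # then takes a weighted average of the value vectors.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # importance of every word for this word
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three "words", each a 2-dimensional vector (hand-made, not learned).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(x, x, x)
```

Each output vector is a blend of all the input vectors, weighted by how much attention each word receives.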
Explain positional encoding of a transformer.
Positional encoding is added to each word's embedding so that the model knows where the word appeared within the sentence.
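The original transformer paper uses sinusoidal positional encodings, which can be computed directly; this sketch assumes a small 4-dimensional model for readability:

```python
import math

def positional_encoding(position, d_model):
    # Sinusoidal encoding: even dimensions use sine, odd use cosine,
    # at wavelengths that vary across the dimensions. Each position
    # in the sentence gets a distinct vector.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

first = positional_encoding(0, 4)   # vector for the first word
second = positional_encoding(1, 4)  # a different vector for the second
```

These vectors are added element-wise to the word embeddings, so two occurrences of the same word at different positions end up with different inputs.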
What is the Multi-head Attention component of a transformer supposed to do?
Multi-head attention runs several attention mechanisms, or heads, in parallel, with each head focusing on a different aspect of the sentence; their results are then combined.
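A structural sketch of the split-and-recombine bookkeeping (the per-head attention computation itself is omitted here):

```python
def multi_head(x, num_heads=2):
    # Split each vector into num_heads chunks; in a real model each
    # head would run its own attention over its chunk in parallel.
    d = len(x[0]) // num_heads
    heads = [[v[h * d:(h + 1) * d] for v in x] for h in range(num_heads)]
    # Concatenate the head outputs back into full-width vectors.
    return [[val for head in heads for val in head[i]]
            for i in range(len(x))]

# One 4-dimensional "word" split across 2 heads and reassembled.
out = multi_head([[1.0, 2.0, 3.0, 4.0]])
```

Because each head sees only a slice of the representation, the heads can specialize in different relationships before their results are concatenated.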
What does the feed-forward neural network of a transformer do?
After attention has established the context of each word, each token's representation is fed forward through a small neural network to be further processed and refined before moving on.
What does the Layer Normalization component of a transformer propose?
Proposes normalizing the values flowing through each layer to a stable range (roughly mean 0 and variance 1), so that the network trains on the data in a more efficient and stable manner.
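The normalization step itself is simple; a minimal sketch (omitting the learned scale and shift parameters a real layer-norm also carries):

```python
import math

def layer_norm(x, eps=1e-5):
    # Rescale a layer's activations to mean 0 and variance 1,
    # keeping values in a stable range during training.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

normed = layer_norm([2.0, 4.0, 6.0, 8.0])  # mean becomes ~0
```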
What is the purpose of the residual connection component of a transformer?
The layer's input is added back to its output, creating a shortcut: if a layer contributes little, the signal can effectively bypass it and continue moving through the network.
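The "shortcut" is just an addition of the layer's input to its output; a minimal sketch with a toy sub-layer standing in for attention or the feed-forward network:

```python
def residual(layer, x):
    # Add the layer's input back to its output, so the signal can
    # "skip" the layer if the layer contributes little.
    y = layer(x)
    return [xi + yi for xi, yi in zip(x, y)]

# A toy sub-layer that halves every value (stand-in for attention/FFN).
halve = lambda x: [v * 0.5 for v in x]
out = residual(halve, [2.0, 4.0])  # -> [3.0, 6.0]
```

Note that even if the sub-layer outputs all zeros, the input still passes through unchanged, which is what makes deep stacks of layers trainable.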
What are the steps to prepare the input for processing?
Take the text input and break it down into units the transformer can work with, called tokens. Each word (or piece of a word) is converted into a token, which captures information about the sentence.
Each token is mapped to a vector of numbers through a process called embedding.
Positional encoding is added to the embeddings to give the model information about the position of each word in the sequence.
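The three preparation steps above can be sketched end to end. This is a toy pipeline: it splits on whitespace rather than using a real subword tokenizer, and uses a hash-based stand-in for a learned embedding table:

```python
import math

def prepare_input(sentence, d_model=4):
    # Step 1: tokenize (word-level here; real tokenizers use subwords).
    tokens = sentence.lower().split()
    # Step 2: embed each token as a vector (toy hash-based embedding).
    embeddings = [[((hash(t) >> i) % 7) / 7.0 for i in range(d_model)]
                  for t in tokens]
    # Step 3: add positional encoding so word order is preserved.
    prepared = []
    for pos, emb in enumerate(embeddings):
        pe = [math.sin(pos / 10000 ** (2 * (i // 2) / d_model)) if i % 2 == 0
              else math.cos(pos / 10000 ** (2 * (i // 2) / d_model))
              for i in range(d_model)]
        prepared.append([e + p for e, p in zip(emb, pe)])
    return tokens, prepared

tokens, vectors = prepare_input("Transformers process whole sentences")
```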
After the input is prepared for processing, what happens next?
The data now enters the encoder, which determines what to focus on by calculating attention scores for each word in the sentence.
Each token's representation is then recalculated within the multi-head attention component, using the importance scores determined by the self-attention mechanism.
Then, based upon the attention outputs, the transformer applies a residual connection (a shortcut that lets the data bypass a layer) and layer normalization (which keeps each layer's outputs in a consistent range).
Finally, send the calculations to the neural network for processing the query from the user.
After encoding the tokens and sending the calculations to the neural network, what happens next?
The query enters the neural network and a match between the question and the data of the LLM is found.
Output goes to the decoder, which follows the same steps the query went through in the encoder.
Output is given to the user. The decoder's output is passed through a softmax layer that generates a probability for each word in the vocabulary, and the word with the highest probability is chosen as the output word.
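The final selection step can be sketched as greedy decoding over a tiny made-up vocabulary (real models sample over tens of thousands of tokens and often use temperature or top-k sampling instead):

```python
import math

def next_word(logits, vocab):
    # Softmax turns the decoder's raw scores (logits) into
    # probabilities; greedy decoding emits the most likely word.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]

word, p = next_word([2.1, 0.3, 1.2], ["cat", "dog", "fish"])  # picks "cat"
```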
What are some examples of LLMs?
ChatGPT, Gemini, and Llama
What is a general definition of an LLM?
A type of artificial intelligence that uses machine learning techniques to understand, generate, and process human language.
LLMs use a combination of supervised and unsupervised training data. True or false?
true
What are some benefits of LLMs?
The ability to understand and generate text through the use of transformers.
More versatile and adaptable
More scalable and efficient: can handle massive amounts of text data, enabling them to learn complex language patterns and relationships.
Improved performance: shown promising results in tasks requiring common sense reasoning, such as understanding and responding to complex queries.
Driven new innovations: accelerating research in natural language understanding.
What are the limitations of LLMs?
Inconsistent responses in quality, relevance and accuracy.
Lack of task alignment: responses don't align with the intended task or objective and fail to meet specific task requirements.
Contextual misunderstanding
Irrelevant or off-topic responses
Lack of control: models may lack control over generating specific types of responses, which results in undesired or inappropriate outputs.
Biased or inaccurate output
Inability to handle complex tasks
Over-reliance on training data
Lack of user customization