What is an LLM?
LLM stands for "Large Language Model". Language models are probabilistic models of text: given an input, they produce a probability distribution over the words in their vocabulary for the next word. "Large" refers to the number of parameters; it is subjective, with no agreed-upon threshold.
What is encoding?
Embedding text. Encoders are models that convert a sequence of words to an embedding (vector representation).
What are some examples of encoders?
MiniLM, Embed-light, BERT, RoBERTa, DistilBERT, SBERT
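As an illustration, a minimal sketch of embedding text with an encoder, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (any of the encoders above could stand in):

```python
# Minimal embedding sketch (assumes sentence-transformers is installed).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # a small MiniLM-based encoder

sentences = ["LLMs are probabilistic models of text.",
             "Encoders map a sequence of words to a vector."]
embeddings = encoder.encode(sentences)   # one fixed-length vector per sentence

print(embeddings.shape)  # (2, 384) for this checkpoint
```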
What is decoding?
Text generation. Decoders are models that take in a sequence of text and output the next word, one word at a time. At each step the model emits one word from its vocabulary distribution; that word is appended to the input and the process repeats.
What are some examples of decoders?
GPT-4, Llama, BLOOM, Falcon
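A minimal sketch of the iterative decoding loop described above, assuming the Hugging Face transformers library and the decoder-only GPT-2 checkpoint as a stand-in for the decoders listed:

```python
# One-token-at-a-time generation: emit a word, append it, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):                                         # generate 5 tokens
    logits = model(ids).logits[:, -1, :]                   # distribution over the vocabulary
    next_id = torch.argmax(logits, dim=-1, keepdim=True)   # pick the most likely next token
    ids = torch.cat([ids, next_id], dim=-1)                # append and repeat

print(tokenizer.decode(ids[0]))
```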
What is an encoder-decoder?
Models that take in a sequence of words, encode the sequence of words, and use the encoding to output the next word.
What are some examples of encoder-decoders?
T5, UL2, BART
Which model(s) is used for embedding text?
Encoders
Which model(s) is used for abstractive QA (generating answers by paraphrasing and summarizing information)?
Decoders and encoder-decoders.
Which model(s) is used for extractive QA (generating answers directly from text extracts)?
Encoder-decoders and maybe decoders
Which model(s) is used for translation?
Encoder-decoders and maybe decoders
Which model(s) is used for creative writing?
Decoders
Which model(s) is used for abstractive summarization (summarizing the main ideas of a text using words/phrases not in the original text)?
Decoders and encoder-decoders
Which model(s) is used for extractive summarization (selecting important sentences/phrases from the original text to summarize it)?
Encoders, encoder-decoders, and maybe decoders
Which model(s) is used for chatting (like ChatGPT)?
Decoders
Which model(s) is used for forecasting (making predictions based on historical data & trends)?
None
Which model(s) is used for generating code?
Decoders and encoder-decoders
What is prompting?
The practice of changing the prompt given to a model in order to affect the probability distribution over the words in its vocabulary. The simplest method of influencing a model's output.
What is a prompt?
The text provided to an LLM as input, sometimes containing instructions/examples.
What is prompt engineering?
The process of iteratively refining a prompt to elicit a particular style of response. It is not guaranteed to work, but it has been fairly successful in practice.
What is in-context learning?
Conditioning (prompting) an LLM with instructions and demonstrations of the task it's meant to complete.
What is K-shot prompting?
Explicitly providing k examples of the intended task in the prompt.
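For example, a 2-shot (k = 2) prompt for sentiment classification; the reviews and labels are illustrative, not from any real dataset:

```python
# A k-shot prompt: k worked examples followed by the new input.
k_shot_prompt = """Classify each review as Positive or Negative.

Review: "The battery lasts all day." Sentiment: Positive
Review: "The screen cracked within a week." Sentiment: Negative
Review: "Setup was quick and painless." Sentiment:"""

print(k_shot_prompt)  # this text is sent to the LLM as its input
```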
What is chain-of-thought prompting?
Prompting the LLM to show the intermediate reasoning steps that led it to its answer.
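For example, a chain-of-thought prompt where the worked example spells out its intermediate arithmetic, cueing the model to do the same (contents are illustrative):

```python
# The demonstration answer shows its reasoning before the final answer.
cot_prompt = """Q: A pack has 12 pencils. Ana buys 3 packs and gives away 5 pencils. How many are left?
A: 3 packs * 12 pencils = 36 pencils. 36 - 5 = 31. The answer is 31.

Q: A tray holds 8 cups. Ben fills 4 trays and drops 6 cups. How many cups remain?
A:"""
```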
What is least-to-most prompting?
Prompt the LLM to decompose the problem into subproblems, solve easy first, then build up.
What is step-back prompting?
Prompt the LLM to identify the high-level concepts relevant to the problem/task.
What is prompt injection?
Deliberately providing an LLM with malicious input that causes it to ignore instructions, cause harm, or behave contrary to deployment expectations.
What is memorization?
A prompt-injection risk: after answering, the model can be asked to repeat the original prompt, exposing content it has memorized.
What are leaked prompts and leaked training data?
Prompt-injection outcomes in which the model reveals its (supposedly private) prompt or private information from its training data.
When may prompting alone be inappropriate?
If training data exists or domain adaptation is required.
What is domain adaption?
Adapting a model (via training) to enhance its performance outside of the domain/subject area it was originally trained on.
What is fine-tuning training style?
We take a pre-trained model (like BERT) and a labeled dataset for a task that we care about and train the model to perform the task by altering all of its parameters. (Very expensive)
What is parameter efficient fine-tuning training style?
Isolate a very small subset of the model’s parameters to train or add a handful of new parameters to the model.
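A minimal PyTorch sketch of the idea, freezing a (toy) pre-trained backbone and training only a small added head; this stands in for adapter/LoRA-style methods rather than reproducing any specific one:

```python
# Freeze the pre-trained parameters; train only a handful of new ones.
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))  # toy "pre-trained" model
for p in backbone.parameters():
    p.requires_grad = False            # isolate the original weights: not trained

task_head = nn.Linear(128, 2)          # the few new parameters that are trained

trainable = sum(p.numel() for p in task_head.parameters() if p.requires_grad)
print(trainable, "trainable parameters")
```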
What is soft prompting training style?
Add parameters to the prompt: very specialized "words" that are input to the model to cue it to perform specific tasks. Soft prompts are learned; the specialized words are initialized randomly and iteratively fine-tuned during training.
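A minimal PyTorch sketch of soft prompting: a few learnable "virtual token" embeddings are prepended to the input embeddings of a frozen model (the dimensions and the stand-in embedding layer are assumptions):

```python
# Only the soft prompt vectors are trained; the base model stays frozen.
import torch
import torch.nn as nn

vocab_size, d_model, n_soft_tokens = 1000, 64, 8

base_embed = nn.Embedding(vocab_size, d_model)       # stand-in for frozen pre-trained embeddings
for p in base_embed.parameters():
    p.requires_grad = False

soft_prompt = nn.Parameter(torch.randn(n_soft_tokens, d_model))  # randomly initialized, learned during training

input_ids = torch.randint(0, vocab_size, (1, 10))     # 10 real tokens
token_embeds = base_embed(input_ids)                  # (1, 10, d_model)
prompted = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)  # (1, 18, d_model)
# `prompted` is what the frozen transformer would consume during training.
```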
What is continual pre-training training style?
Similar to fine-tuning in that all parameters of the model change (expensive), but it does not require labeled data. Instead of predicting specific labels, we feed the model any kind of data we have and ask it to continually predict the next word.
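A minimal PyTorch sketch of the continual pre-training objective: no labels, just next-word prediction on whatever domain text is available (the toy model and random "text" are assumptions; a real run would update all parameters of a pre-trained LLM):

```python
# Unlabeled data, next-token prediction, all parameters trainable.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # every parameter is updated

tokens = torch.randint(0, vocab_size, (1, 32))    # a chunk of unlabeled domain text
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token

logits = model(inputs)                            # (1, 31, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```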
For the four training styles, specify: how many parameters are modified, whether the training data is labeled/task-specific, and a summary of what each does.

| Training style | Parameters modified | Training data | Summary |
|---|---|---|---|
| Fine-tuning | All parameters | Labeled, task-specific | Classic ML training |
| Parameter-efficient fine-tuning | Few, new parameters | Labeled, task-specific | New parameters are learned rather than provided |
| Soft prompting | Few, new parameters | Labeled, task-specific | Learnable prompt parameters; prompts learned rather than provided |
| Continual pre-training | All parameters | Unlabeled | Same as LLM pre-training |
Greedy decoding
At each step of picking a word, select the word in the vocabulary with the highest probability
Non-deterministic decoding
Pick randomly among high-probability candidates at each step.
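A minimal NumPy sketch contrasting the two strategies above on a toy four-word vocabulary (the numbers are illustrative):

```python
import numpy as np

vocab = ["cat", "dog", "car", "tree"]
probs = np.array([0.55, 0.30, 0.10, 0.05])       # model's distribution for the next word

greedy_pick = vocab[int(np.argmax(probs))]       # greedy: always the most likely word
sampled_pick = np.random.choice(vocab, p=probs)  # non-deterministic: weighted random pick

print(greedy_pick, sampled_pick)
```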
Temperature
Hyperparameter that modulates the distribution over the vocabulary. Increasing the temperature flattens the distribution, causing the model to deviate from greedy decoding; decreasing it concentrates the probability mass around the most likely words.
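A minimal NumPy sketch of temperature scaling: logits are divided by the temperature before the softmax, so higher temperatures flatten the distribution and lower temperatures sharpen it (toy numbers for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - scaled.max())          # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]                         # scores for three candidate words
print(softmax_with_temperature(logits, 0.5))     # peaked: close to greedy decoding
print(softmax_with_temperature(logits, 1.0))     # the unmodified distribution
print(softmax_with_temperature(logits, 2.0))     # flatter: more random choices
```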
Hallucination
Generated text that is non-factual/ungrounded. The text sounds logical or sensible but it’s actually wrong.
Can we keep an LLM from hallucinating?
Methods like retrieval augmentation claim to reduce hallucination, but there is no proven method to reliably prevent LLMs from hallucinating.
Groundedness
Generated text is grounded in a document if the document supports the text.
Retrieval-Augmented Generation
Commonly used in QA, where the model retrieves information from supporting documents to answer a query. Claimed to reduce hallucination. Used in dialogue, QA, fact-checking, slot-filling, and entity linking. Can be trained end-to-end.
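A minimal sketch of the retrieve-then-generate flow: fetch supporting documents for the query, then place them in the prompt. The tiny in-memory corpus, the word-overlap scoring, and the `ask_llm` call are hypothetical stand-ins:

```python
# Retrieve supporting documents, then ground the prompt in them.
def retrieve(query, corpus, k=2):
    # toy relevance score: word overlap between the query and each document
    scores = [(len(set(query.lower().split()) & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for _, doc in sorted(scores, reverse=True)[:k]]

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Python was created by Guido van Rossum.",
    "The Colosseum is located in Rome.",
]

query = "Who created Python?"
context = "\n".join(retrieve(query, corpus))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
# response = ask_llm(prompt)   # hypothetical call to whatever LLM is deployed
print(prompt)
```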
Code Models
Models trained on code and comments rather than general written language (Copilot, Codex, Code Llama). There is a great fit between the training data (code + comments) and the test-time tasks (writing code + comments). Code is also highly structured, which makes it easier to learn.
Multi-modal
Models trained on multiple modalities (e.g., language and images). Can be autoregressive (DALL-E) or diffusion-based (Stable Diffusion).
Language Agents
LLM-based agents: create plans and "reason", take actions in response to plans and the environment. Capable of using tools.
ReAct: Iterative framework where the LLM emits a thought, then acts, and observes the result
Toolformer: Pre-training technique where strings are replaced with calls to tools that yield results
Bootstrapped reasoning: Prompt the LLM to emit rationales for intermediate steps; use them as fine-tuning data
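A minimal sketch of a ReAct-style loop, where the model alternates thought, action, and observation until it emits a final answer; `ask_llm` (the model call) and the tool registry are hypothetical stand-ins:

```python
import re

def react_loop(question, ask_llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = ask_llm(transcript)                    # e.g. "Thought: ...\nAction: lookup[Python]"
        transcript += step + "\n"
        if "Answer:" in step:                         # the model decided it is done
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:                                     # act, then record the observation
            tool_name, tool_input = match.group(1), match.group(2)
            observation = tools[tool_name](tool_input)
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```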