What is an LLM?
LLM stands for "Large Language Model". Language models are probabilistic models of text: given an input, they produce a probability distribution over the words in their vocabulary for the next word. "Large" refers to the number of parameters; it is subjective, with no agreed-upon threshold.
What is encoding?
Embedding text. Encoders are models that convert a sequence of words to an embedding (vector representation).
What are some examples of encoders?
MiniLM, Embed-light, BERT, RoBERTa, DistilBERT, SBERT
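As an illustration, a minimal sketch of embedding text with an encoder, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint (any of the encoders above could stand in):

```python
# Minimal embedding sketch (assumes sentence-transformers is installed).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # a small MiniLM-based encoder

sentences = ["LLMs are probabilistic models of text.",
             "Encoders map a sequence of words to a vector."]
embeddings = encoder.encode(sentences)   # one fixed-length vector per sentence

print(embeddings.shape)  # (2, 384) for this checkpoint
```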
What is decoding?
Text generation. Decoders are models that take in a sequence of text and output the next word, one word at a time. At each step the model emits one word from its vocabulary distribution; that word is appended to the input and the process repeats.
What are some examples of decoders?
GPT-4, Llama, BLOOM, Falcon
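A minimal sketch of the iterative decoding loop described above, assuming the Hugging Face transformers library and the decoder-only GPT-2 checkpoint as a stand-in for the decoders listed:

```python
# One-token-at-a-time generation: emit a word, append it, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):                                         # generate 5 tokens
    logits = model(ids).logits[:, -1, :]                   # distribution over the vocabulary
    next_id = torch.argmax(logits, dim=-1, keepdim=True)   # pick the most likely next token
    ids = torch.cat([ids, next_id], dim=-1)                # append and repeat

print(tokenizer.decode(ids[0]))
```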
What is an encoder-decoder?
Models that take in a sequence of words, encode the sequence of words, and use the encoding to output the next word.
What are some examples of encoder-decoders?
T5, UL2, BART
Which model(s) is used for embedding text?
Encoders
Which model(s) is used for abstractive QA (generating answers by paraphrasing and summarizing information)?
Decoders and encoder-decoders.
Which model(s) is used for extractive QA (generating answers directly from text extracts)?
Encoder-decoders and maybe decoders
Which model(s) is used for translation?
Encoder-decoders and maybe decoders
Which model(s) is used for creative writing?
Decoders
Which model(s) is used for abstractive summarization (summarizing the main ideas of a text using words/phrases not in the original text)?
Decoders and encoder-decoders
Which model(s) is used for extractive summarization (selecting important sentences/phrases from the original text to summarize it)?
Encoders, encoder-decoders, and maybe decoders
Which model(s) is used for chatting (like ChatGPT)?
Decoders
Which model(s) is used for forecasting (making predictions based on historical data & trends)?
None
Which model(s) is used for generating code?
Decoders and encoder-decoders
What is prompting?
The practice of changing the prompt given to a model in order to affect the probability distribution over the words in its vocabulary. The simplest method of influencing a model's output.
What is a prompt?
The text provided to an LLM as input, sometimes containing instructions/examples.
What is prompt engineering?
The process of iteratively refining a prompt to elicit a particular style of response. It is not guaranteed to work, but it has been fairly successful in practice.
What is in-context learning?
Conditioning (prompting) an LLM with instructions and demonstrations of the task it's meant to complete.
What is K-shot prompting?
Explicitly providing k examples of the intended task in the prompt.
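For example, a 2-shot (k = 2) prompt for sentiment classification; the reviews and labels are illustrative, not from any real dataset:

```python
# A k-shot prompt: k worked examples followed by the new input.
k_shot_prompt = """Classify each review as Positive or Negative.

Review: "The battery lasts all day." Sentiment: Positive
Review: "The screen cracked within a week." Sentiment: Negative
Review: "Setup was quick and painless." Sentiment:"""

print(k_shot_prompt)  # this text is sent to the LLM as its input
```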
What is chain-of-thought prompting?
Prompting the LLM to show the intermediate reasoning steps that led it to its answer.
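For example, a chain-of-thought prompt where the worked example spells out its intermediate arithmetic, cueing the model to do the same (contents are illustrative):

```python
# The demonstration answer shows its reasoning before the final answer.
cot_prompt = """Q: A pack has 12 pencils. Ana buys 3 packs and gives away 5 pencils. How many are left?
A: 3 packs * 12 pencils = 36 pencils. 36 - 5 = 31. The answer is 31.

Q: A tray holds 8 cups. Ben fills 4 trays and drops 6 cups. How many cups remain?
A:"""
```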
What is least-to-most prompting?
Prompt the LLM to decompose the problem into subproblems, solve easy first, then build up.
What is step-back prompting?
Prompt the LLM to identify the high-level concepts relevant to the problem/task.
What is prompt injection?
Deliberately providing an LLM with malicious input that causes it to ignore instructions, cause harm, or behave contrary to deployment expectations.
What is memorization?
A prompt-injection risk: after answering, the model can be asked to repeat the original prompt, exposing content it has memorized.
What are leaked prompts and leaked training data?
Prompt-injection outcomes in which the model reveals its (supposedly private) prompt or private information from its training data.
When may prompting alone be inappropriate?
If training data exists or domain adaptation is required.
What is domain adaption?
Adapting a model (via training) to enhance its performance outside of the domain/subject area it was originally trained on.
What is fine-tuning training style?
We take a pre-trained model (like BERT) and a labeled dataset for a task that we care about and train the model to perform the task by altering all of its parameters. (Very expensive)
What is parameter efficient fine-tuning training style?
Isolate a very small subset of the model’s parameters to train or add a handful of new parameters to the model.
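A minimal PyTorch sketch of the idea, freezing a (toy) pre-trained backbone and training only a small added head; this stands in for adapter/LoRA-style methods rather than reproducing any specific one:

```python
# Freeze the pre-trained parameters; train only a handful of new ones.
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))  # toy "pre-trained" model
for p in backbone.parameters():
    p.requires_grad = False            # isolate the original weights: not trained

task_head = nn.Linear(128, 2)          # the few new parameters that are trained

trainable = sum(p.numel() for p in task_head.parameters() if p.requires_grad)
print(trainable, "trainable parameters")
```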
What is soft prompting training style?
Add parameters to the prompt: very specialized "words" that are input to the model to cue it to perform specific tasks. Soft prompts are learned; the specialized words are initialized randomly and iteratively fine-tuned during training.
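A minimal PyTorch sketch of soft prompting: a few learnable "virtual token" embeddings are prepended to the input embeddings of a frozen model (the dimensions and the stand-in embedding layer are assumptions):

```python
# Only the soft prompt vectors are trained; the base model stays frozen.
import torch
import torch.nn as nn

vocab_size, d_model, n_soft_tokens = 1000, 64, 8

base_embed = nn.Embedding(vocab_size, d_model)       # stand-in for frozen pre-trained embeddings
for p in base_embed.parameters():
    p.requires_grad = False

soft_prompt = nn.Parameter(torch.randn(n_soft_tokens, d_model))  # randomly initialized, learned during training

input_ids = torch.randint(0, vocab_size, (1, 10))     # 10 real tokens
token_embeds = base_embed(input_ids)                  # (1, 10, d_model)
prompted = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)  # (1, 18, d_model)
# `prompted` is what the frozen transformer would consume during training.
```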
What is continual pre-training training style?
Similar to fine-tuning in that all parameters of the model change (expensive), but it does not require labeled data. Instead of predicting specific labels, we feed the model any kind of data we have and ask it to continually predict the next word.
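A minimal PyTorch sketch of the continual pre-training objective: no labels, just next-word prediction on whatever domain text is available (the toy model and random "text" are assumptions; a real run would update all parameters of a pre-trained LLM):

```python
# Unlabeled data, next-token prediction, all parameters trainable.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # every parameter is updated

tokens = torch.randint(0, vocab_size, (1, 32))    # a chunk of unlabeled domain text
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token

logits = model(inputs)                            # (1, 31, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```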
For the four training styles, specify: how many parameters are modified, whether the training data is labeled/task-specific, and a summary of what each does.

| Training style | Parameters modified | Training data | Summary |
|---|---|---|---|
| Fine-tuning | All parameters | Labeled, task-specific | Classic ML training |
| Parameter-efficient fine-tuning | Few, new parameters | Labeled, task-specific | New parameters are learned rather than provided |
| Soft prompting | Few, new parameters | Labeled, task-specific | Learnable prompt parameters; prompts learned rather than provided |
| Continual pre-training | All parameters | Unlabeled | Same as LLM pre-training |
Greedy decoding
At each step of picking a word, select the word in the vocabulary with the highest probability
Non-deterministic decoding
Pick randomly among high-probability candidates at each step.
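A minimal NumPy sketch contrasting the two strategies above on a toy four-word vocabulary (the numbers are illustrative):

```python
import numpy as np

vocab = ["cat", "dog", "car", "tree"]
probs = np.array([0.55, 0.30, 0.10, 0.05])       # model's distribution for the next word

greedy_pick = vocab[int(np.argmax(probs))]       # greedy: always the most likely word
sampled_pick = np.random.choice(vocab, p=probs)  # non-deterministic: weighted random pick

print(greedy_pick, sampled_pick)
```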
Temperature
Hyperparameter that modulates the distribution over the vocabulary. Increasing the temperature flattens the distribution, causing the model to deviate from greedy decoding; decreasing it concentrates the probability mass around the most likely words.
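A minimal NumPy sketch of temperature scaling: logits are divided by the temperature before the softmax, so higher temperatures flatten the distribution and lower temperatures sharpen it (toy numbers for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - scaled.max())          # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]                         # scores for three candidate words
print(softmax_with_temperature(logits, 0.5))     # peaked: close to greedy decoding
print(softmax_with_temperature(logits, 1.0))     # the unmodified distribution
print(softmax_with_temperature(logits, 2.0))     # flatter: more random choices
```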
Hallucination
Generated text that is non-factual/ungrounded. The text sounds logical or sensible but it’s actually wrong.
Can we keep an LLM from hallucinating?
Methods like retrieval augmentation claim to reduce hallucination, but there is no proven method to reliably prevent LLMs from hallucinating.
Groundedness
Generated text is grounded in a document if the document supports the text.
Retrieval-Augmented Generation
Commonly used in QA, where the model retrieves information from supporting documents to answer a query. Claimed to reduce hallucination. Used in dialogue, QA, fact-checking, slot-filling, and entity linking. Can be trained end-to-end.
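A minimal sketch of the retrieve-then-generate flow: fetch supporting documents for the query, then place them in the prompt. The tiny in-memory corpus, the word-overlap scoring, and the `ask_llm` call are hypothetical stand-ins:

```python
# Retrieve supporting documents, then ground the prompt in them.
def retrieve(query, corpus, k=2):
    # toy relevance score: word overlap between the query and each document
    scores = [(len(set(query.lower().split()) & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for _, doc in sorted(scores, reverse=True)[:k]]

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Python was created by Guido van Rossum.",
    "The Colosseum is located in Rome.",
]

query = "Who created Python?"
context = "\n".join(retrieve(query, corpus))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
# response = ask_llm(prompt)   # hypothetical call to whatever LLM is deployed
print(prompt)
```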
Code Models
Models trained on code and comments rather than general written language (Copilot, Codex, Code Llama). There is a great fit between the training data (code + comments) and the test-time tasks (writing code + comments). Code is also highly structured, which makes it easier to learn.
Multi-modal
Models trained on multiple modalities (e.g., language and images). Can be autoregressive (DALL-E) or diffusion-based (Stable Diffusion).
Language Agents
LLM-based agents: create plans and "reason", take actions in response to plans and the environment. Capable of using tools.
ReAct: Iterative framework where the LLM emits a thought, then acts, and observes the result
Toolformer: Pre-training technique where strings are replaced with calls to tools that yield results
Bootstrapped reasoning: Prompt the LLM to emit rationales for intermediate steps; use them as fine-tuning data
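A minimal sketch of a ReAct-style loop, where the model alternates thought, action, and observation until it emits a final answer; `ask_llm` (the model call) and the tool registry are hypothetical stand-ins:

```python
import re

def react_loop(question, ask_llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = ask_llm(transcript)                    # e.g. "Thought: ...\nAction: lookup[Python]"
        transcript += step + "\n"
        if "Answer:" in step:                         # the model decided it is done
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:                                     # act, then record the observation
            tool_name, tool_input = match.group(1), match.group(2)
            observation = tools[tool_name](tool_input)
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```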