LLM Reasoning and Prompting
Prompt Engineering Techniques for Large Language Models
Introduction
- Improving large language models (LLMs) without updating weights using prompt engineering.
- Techniques aim to enhance model reasoning and performance.
Comparing Human and Artificial Intelligence
- Human Intelligence:
- Learns from few examples.
- Reasons and generalizes well.
- Explainable rationale.
- Traditional Artificial Intelligence:
- Needs large, labeled datasets.
- Black box approach (hard to explain).
- Struggles with generalization and reasoning.
- Reasoning is a difficult task and efforts are made to fill the gaps using Bayesian machine learning, transfer learning, and domain adaptation.
Types of Reasoning Problems
- Mathematics and Symbolic Reasoning:
- Example: Solving math problems.
- Common Sense Reasoning:
- Example: Combining character traits to understand a situation.
- Logical Reasoning:
- Example: Deducing conclusions from a series of if-then statements.
- If A, then B. If B, then C. Therefore, if A, then C.
- Example from the transcript: "The more ifs, greater mind states, outbreak of war. Emily is a war. What is? And we have freedom."
Challenges with Large Language Models and Reasoning
- LLMs show some reasoning abilities but struggle with true semantic understanding.
- Mathematics remains a significant challenge, even for large models (e.g., 540 billion parameter models).
Train of Thought Prompting
- Using train of thought prompting can significantly improve results.
Prompt Engineering Techniques
- Using natural language to instruct LLMs.
- Improving models without updating weights.
- Five methods:
- Adding a magic word.
- Adding more context.
- Multi-round iterations.
- Moderating model responses.
- Combining multiple models.
Method 1: Magic Words
- Adding specific words to encourage reasoning.
- Example: "One last thing. Step by step."
- LLMs are directional reasoners; magic words prompt deeper reasoning.
- Zero-shot train of thought involves using these magic words to guide the model step by step without prior examples.
- Experimented with other prompts such as:
- "Let's think about this water quality."
- "Let's solve this program in a few steps."
- Asking models to analyze why something is funny.
- Example: "Tell why it is funny."
- Using phrases to denote importance or invoke emotion.
- Example: "This is very important to my career."
- Emotional prompts can improve performance.
Guidelines for Using Magic Words
- Avoid politeness.
- Use direct, actionable language.
- Avoid negative phrasing.
- Avoid relying on stereotypes.
- Find magic words through reinforcement learning.
- Using two LLMs, a target and a generator, and evaluate the performance to find the best magic words.
Finding Magic Words
- External Method (Reinforcement Learning):
- Using a generator model to create magic words and a target model to evaluate their effectiveness.
- Internal Method (Prompting):
- Prompting the model to output a possible good prompt.
- Reverse engineering prompts from input-output pairs.
- Iteratively refining prompts based on previous responses.
- Example: Iteratively improve the prompt through multiple rounds, refining it until the desired response is achieved.
- Baselines: "Let's think step by step," "Therefore, it is probably a step by step"
Iterative Prompt Optimization
- Iteratively optimizing prompts can significantly improve model performance.
- Example prompts:
- "Palm to L IDC had produced back home to take a deep breath and work on this problem step by step to get the most accuracy."
- Magic words may not always help with newer, larger models.
- Newer models have improved reasoning capabilities and may not require magic words.
Method 2: Adding Context
- Context significantly improves LLM performance.
- Role-playing: Assigning a specific role to the LLM.
- The outputs of LLMs will have a consistent style and reasoning based on the role.
- Example: Asking the LLM to act like Shakespeare.
- Modern commercial services use role-playing by default.
In-Context Learning
- Providing a few examples or demonstrations to the LLM.
- The larger models learn from the context without any weight updating.
- Examples of positive, negative, and neutral phrases for sentiment analysis.
- Future prompting involves providing several question-answer pairs.
Few-Shot Learning
- Learning from examples in the prompt (in-context learning).
- Emergent capabilities: LLMs can learn from examples in the prompt without explicit training data.
- Few-shot train of thought involves providing several example triplets (question, reasoning, and answer).
Understanding Examples
- LLMs adapt to the formatting and patterns of the demonstrations.
- Changing the labels of examples does not significantly affect performance.
Scaling Laws
- Larger models understand examples to some extent.
- In-context learning for translation tasks shows significant improvement with larger models.
- Gemini model can reach near-human performance by reading sample textbook.
Providing More Context
- Manually searching online and providing information to the LLM.
- Using a system to automatically retrieve information from a database (Retrieval Augmented Generation - RAG).
- Asking the LLM to generate knowledge first and then answer the question (Generated Knowledge Prompting).
Method 3: Multi-Round Iterations
- Decomposing complex tasks into separate rounds.
- Solving sub-problems and addressing trade-offs.
- Breaking down the process of writing an academic paper into multiple steps, such as creating an outline, writing each section based on the outline, and revising the content.
- Self-verification: LLMs check their own work.
- Using Python as a calculator.
Decomposed Question Answering
- Breaking down questions into sub-questions.
- Example: Asking the LLM to divide a complex task into several sub-questions and then answer each one.
- Self-prompting: Decomposing the reasoning process into sub-questions and answers.
Generated Knowledge Prompting
- Asking the LLM to generate knowledge and then answer the question based on that knowledge.
- Combining sub-questions with generated knowledge.
Self-Reflection
- LLMs evaluate their own answers and improve them.
- Self-refinement: Feeding the initial output back to the model to improve it.
- Self-consistency: Generating multiple responses and selecting the most common answer.
Tree of Thoughts
- Similar to self-consistency but explores different reasoning paths in a tree-like structure.
- Searching for counter passes and jumping between different reasoning paths.
Constitutional AI
- Asking LLMs to think twice before responding.
- Using a set of principles to refine the API and prevent harmful content.
- Model does self-critique and re-answers the question based on feedback.
- It checks the answers of every step and not only the final answer.
Homework and Research Opportunities
- Design prompts to elicit specific types of output from LLMs.
- Explore research opportunities in prompt engineering.