LLM Reasoning and Prompting

Prompt Engineering Techniques for Large Language Models

Introduction

  • Improving large language models (LLMs) without updating weights using prompt engineering.
  • Techniques aim to enhance model reasoning and performance.

Comparing Human and Artificial Intelligence

  • Human Intelligence:
    • Learns from few examples.
    • Reasons and generalizes well.
    • Explainable rationale.
  • Traditional Artificial Intelligence:
    • Needs large, labeled datasets.
    • Black box approach (hard to explain).
    • Struggles with generalization and reasoning.
  • Reasoning is a difficult task and efforts are made to fill the gaps using Bayesian machine learning, transfer learning, and domain adaptation.

Types of Reasoning Problems

  • Mathematics and Symbolic Reasoning:
    • Example: Solving math problems.
  • Common Sense Reasoning:
    • Example: Combining character traits to understand a situation.
  • Logical Reasoning:
    • Example: Deducing conclusions from a series of if-then statements.
      • If A, then B. If B, then C. Therefore, if A, then C.
      • Example from the transcript: "The more ifs, greater mind states, outbreak of war. Emily is a war. What is? And we have freedom."

Challenges with Large Language Models and Reasoning

  • LLMs show some reasoning abilities but struggle with true semantic understanding.
  • Mathematics remains a significant challenge, even for large models (e.g., 540 billion parameter models).

Train of Thought Prompting

  • Using train of thought prompting can significantly improve results.

Prompt Engineering Techniques

  • Using natural language to instruct LLMs.
  • Improving models without updating weights.
  • Five methods:
    • Adding a magic word.
    • Adding more context.
    • Multi-round iterations.
    • Moderating model responses.
    • Combining multiple models.

Method 1: Magic Words

  • Adding specific words to encourage reasoning.
  • Example: "One last thing. Step by step."
  • LLMs are directional reasoners; magic words prompt deeper reasoning.
  • Zero-shot train of thought involves using these magic words to guide the model step by step without prior examples.
  • Experimented with other prompts such as:
    • "Let's think about this water quality."
    • "Let's solve this program in a few steps."

Magic Words for Multimodal Models

  • Asking models to analyze why something is funny.
    • Example: "Tell why it is funny."
  • Using phrases to denote importance or invoke emotion.
    • Example: "This is very important to my career."
  • Emotional prompts can improve performance.

Guidelines for Using Magic Words

  • Avoid politeness.
  • Use direct, actionable language.
  • Avoid negative phrasing.
  • Avoid relying on stereotypes.
  • Find magic words through reinforcement learning.
    • Using two LLMs, a target and a generator, and evaluate the performance to find the best magic words.

Finding Magic Words

  • External Method (Reinforcement Learning):
    • Using a generator model to create magic words and a target model to evaluate their effectiveness.
  • Internal Method (Prompting):
    • Prompting the model to output a possible good prompt.
    • Reverse engineering prompts from input-output pairs.
    • Iteratively refining prompts based on previous responses.
    • Example: Iteratively improve the prompt through multiple rounds, refining it until the desired response is achieved.
      • Baselines: "Let's think step by step," "Therefore, it is probably a step by step"

Iterative Prompt Optimization

  • Iteratively optimizing prompts can significantly improve model performance.
  • Example prompts:
    • "Palm to L IDC had produced back home to take a deep breath and work on this problem step by step to get the most accuracy."
  • Magic words may not always help with newer, larger models.
    • Newer models have improved reasoning capabilities and may not require magic words.

Method 2: Adding Context

  • Context significantly improves LLM performance.
  • Role-playing: Assigning a specific role to the LLM.
    • The outputs of LLMs will have a consistent style and reasoning based on the role.
    • Example: Asking the LLM to act like Shakespeare.
    • Modern commercial services use role-playing by default.

In-Context Learning

  • Providing a few examples or demonstrations to the LLM.
  • The larger models learn from the context without any weight updating.
  • Examples of positive, negative, and neutral phrases for sentiment analysis.
  • Future prompting involves providing several question-answer pairs.

Few-Shot Learning

  • Learning from examples in the prompt (in-context learning).
  • Emergent capabilities: LLMs can learn from examples in the prompt without explicit training data.
  • Few-shot train of thought involves providing several example triplets (question, reasoning, and answer).

Understanding Examples

  • LLMs adapt to the formatting and patterns of the demonstrations.
  • Changing the labels of examples does not significantly affect performance.

Scaling Laws

  • Larger models understand examples to some extent.
  • In-context learning for translation tasks shows significant improvement with larger models.
  • Gemini model can reach near-human performance by reading sample textbook.

Providing More Context

  • Manually searching online and providing information to the LLM.
  • Using a system to automatically retrieve information from a database (Retrieval Augmented Generation - RAG).
  • Asking the LLM to generate knowledge first and then answer the question (Generated Knowledge Prompting).

Method 3: Multi-Round Iterations

  • Decomposing complex tasks into separate rounds.
  • Solving sub-problems and addressing trade-offs.
  • Breaking down the process of writing an academic paper into multiple steps, such as creating an outline, writing each section based on the outline, and revising the content.
  • Self-verification: LLMs check their own work.
    • Using Python as a calculator.

Decomposed Question Answering

  • Breaking down questions into sub-questions.
  • Example: Asking the LLM to divide a complex task into several sub-questions and then answer each one.
  • Self-prompting: Decomposing the reasoning process into sub-questions and answers.

Generated Knowledge Prompting

  • Asking the LLM to generate knowledge and then answer the question based on that knowledge.
  • Combining sub-questions with generated knowledge.

Self-Reflection

  • LLMs evaluate their own answers and improve them.
  • Self-refinement: Feeding the initial output back to the model to improve it.
  • Self-consistency: Generating multiple responses and selecting the most common answer.

Tree of Thoughts

  • Similar to self-consistency but explores different reasoning paths in a tree-like structure.
  • Searching for counter passes and jumping between different reasoning paths.

Constitutional AI

  • Asking LLMs to think twice before responding.
  • Using a set of principles to refine the API and prevent harmful content.
  • Model does self-critique and re-answers the question based on feedback.
  • It checks the answers of every step and not only the final answer.

Homework and Research Opportunities

  • Design prompts to elicit specific types of output from LLMs.
  • Explore research opportunities in prompt engineering.