Advanced Topics in Computer Science: Advances in AI

Advanced Topics in Computer Science/Business Computing (CS3001/3606)

Advances in AI: Large Language Models (LLMs) for Question Answering (QA) & Text Generation

David Bell

Task 1
Quizzes & Discussion

Introduction

Overview of Topics

  • Language Models

  • Large Language Models

  • Fine-Tuning Large Language Models

  • Prompt Engineering

  • Retrieval Augmented Generation

  • LLMs in Fintech & Healthcare

  • Evaluation


Concepts

  • Language Models: Statistical models predicting the next word in a sentence based on prior words.

  • Large Language Models (LLMs): Language models with millions to billions of parameters, capable of understanding and generating human-like text.

  • Embeddings: Continuous vector representations of words or phrases that capture semantic meaning.

  • Fine-tuning: Adaptation of pre-trained models to specific tasks by continued training on smaller, task-specific datasets.

  • Prompt Engineering: Designing inputs to effectively guide the response generation of LLMs.

  • Retrieval Augmented Generation (RAG): Technique that enhances LLM outputs by integrating external factual sources.

  • Zero-/Single-/n-shot Prompting: Approaches to prompt responses without examples (zero-shot), with one example (single-shot), and with multiple examples (n-shot).

  • Chaining: A technique that connects multiple prompts/results for complex tasks.

  • Transformers: Architecture for LLMs leveraging self-attention to process input data.

  • Attention: Mechanism within Transformers that weights each part of the input sequence by its relevance to the token currently being processed.

  • Quantization: Reducing a model's precision to lower resource usage while maintaining acceptable performance.

  • LLM Evaluation: Methods for assessing the effectiveness of LLM outputs, e.g., ROUGE, BLEU.

  • Vector Databases: Specialized databases optimized for storing and querying vector representations.

  • Distillation: A process of creating smaller models that retain the knowledge of a larger model.
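The zero-/single-/n-shot distinction from the concepts above comes down to how many worked examples are placed in the prompt. A minimal sketch, assuming a simple "Input:/Output:" text layout; the function name `build_prompt`, the task wording, and the example reviews are all hypothetical and only illustrate the idea:

```python
def build_prompt(task, examples, query):
    """Assemble a zero-, single-, or n-shot prompt as plain text.

    `examples` is a list of (input, output) pairs: an empty list gives a
    zero-shot prompt, one pair a single-shot prompt, several an n-shot prompt.
    """
    lines = [task]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
    # The final input is left unanswered for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical task and examples, for illustration only.
task = "Classify the sentiment of each review as positive or negative."
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]

zero_shot = build_prompt(task, [], "Delivery was quick.")
single_shot = build_prompt(task, examples[:1], "Delivery was quick.")
n_shot = build_prompt(task, examples, "Delivery was quick.")
print(n_shot)
```

The resulting string would be sent to whichever LLM is in use; chaining would feed the model's answer into the next call to `build_prompt`.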

Learning Outcomes

  • Explanation of terms: Large Language Model (LLM), Prompt Engineering, Fine-Tuning

  • Design LLM systems using advanced methods and RAG

  • Understand LLM concepts and make design choices

  • Assess LLM system outputs


GenAI and LLMs

Example Prompt
  • Task: Create a report on fine-tuning LLMs

  • Response from ChatGPT:

    1. Introduction: LLMs revolutionize natural language processing. Fine-tuning enhances their performance on specific tasks.

    2. Overview of Large Language Models:

    • Definition and Characteristics: Capable of performing multiple NLU tasks such as translation and summarization.

    • Pre-training Phase: Uses transfer learning with transformer architectures and self-attention mechanisms.

    • Notable LLMs: GPT series, BERT, XLNet, T5.

    3. Fine-Tuning LLMs: Overview of methodologies and techniques from literature.


What is a Language Model?

  • Definition: Probability model denoted as P(Text | Preceding Text)

  • Concept: Models the likelihood of a word following one or more preceding words based on statistical patterns in training data.

Conditional Probability Formula

  • Formula: $P(W_1, W_2, W_3, W_4, \ldots, W_n) = P(W_1) \times P(W_2 \mid W_1) \times P(W_3 \mid W_1, W_2) \times \cdots \times P(W_n \mid W_1, W_2, \ldots, W_{n-1})$

  • Explanation: The chain rule captures the order of the sequence, because each word is conditioned on all the words that precede it.
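The chain rule can be illustrated with a tiny count-based model. This is a sketch under a simplifying assumption: each word is conditioned only on the single preceding word (a bigram approximation), not on the full history used in the formula. The three-sentence corpus and the `<s>` start marker are made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; a real language model is trained on far more text.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
]

context_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()  # <s> marks the start of a sentence
    context_counts.update(tokens[:-1])   # every token that has a successor
    bigram_counts.update(zip(tokens, tokens[1:]))


def conditional(word, previous):
    """Estimate P(word | previous) as a ratio of counts."""
    if context_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, word)] / context_counts[previous]


def sentence_probability(sentence):
    """Multiply the conditional probabilities, as in the chain rule."""
    tokens = ["<s>"] + sentence.split()
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        probability *= conditional(word, previous)
    return probability


print(sentence_probability("the cat sat"))
print(sentence_probability("the dog ran"))
```

A word pair never seen in training, such as "dog ran" here, drives the whole product to zero, which is one reason practical models smooth their estimates or use neural networks instead of raw counts.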


Large Language Models

  • Examples of LLMs:

    • OpenAI's GPT

    • DeepMind's Chinchilla

    • BloombergGPT

    • Google's Med-Gemini

    • Meta's Llama

    • T5/Gemma

    • Mistral

Characteristics of LLMs
  • Utilize transformer-based architectures.

  • Handle long-range dependencies by applying self-attention across the whole sequence of tokens.
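A minimal sketch of scaled dot-product self-attention shows why distance in the sequence does not matter. It makes a strong simplifying assumption: the queries, keys, and values are the input vectors themselves, whereas a real Transformer first applies learned projection matrices. The 2-dimensional token vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dimension = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every position, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dimension)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        output = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dimension)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings: the first and last tokens are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

In this example the first token attends to the last token exactly as strongly as to itself, even though another token sits between them.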


Typical LLM Projects in Computer Science

  • Text-oriented tasks:

    • Email writing

    • Code generation

    • Document analysis

  • Training and Fine-tuning:

    • Utilization of datasets for specialized tasks

  • Replacing humans with chatbots: Performing repetitive text generation tasks.


Size of LLMs

  • LLM capability is illustrated by performance on benchmarks such as MMLU, with human-expert level shown for comparison.

  • Key models and their sizes include:

    • Gopher

    • U-PaLM

    • GPT-4 Classic

    • ChatGPT (GPT-3.5 turbo)

    • Llama (65B parameters)

    • Chinchilla and others.


Architectural Designs

  • Fine-tuning and Prompt Engineering:

    • Techniques and frameworks for optimizing LLMs for specific applications.


Advanced Prompt Engineering Techniques

  • Retrieval Augmented Generation (RAG):

    • Combines prompt engineering with external knowledge sources for more accurate LLM outputs.

    • Components:

      • Prompt embedding

      • Query Data

      • Improved Prompt for the LLM
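The three components can be sketched end to end as follows. This is a toy illustration, not a production pipeline: the "embedding" is a plain word-count vector rather than a learned one, the in-memory list stands in for a vector database, and the three finance sentences are hypothetical sample data.

```python
import math
from collections import Counter

# Hypothetical knowledge source; a real system would query a vector database.
documents = [
    "Credit risk is the risk that a borrower fails to repay a loan.",
    "Market risk is the risk of losses from changes in market prices.",
    "Liquidity risk is the risk of being unable to meet short-term obligations.",
]


def embed(text):
    """Toy embedding: lower-cased word counts (stand-in for a learned model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, k=1):
    """Steps 1 and 2: embed the prompt, then query the data for the closest text."""
    question_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(question_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, k=1):
    """Step 3: build the improved prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_rag_prompt("What is liquidity risk?"))
```

The LLM call itself is left out on purpose; the point is that the model receives retrieved facts alongside the question instead of relying only on what it memorised during training.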

Example Data (from GPT-4)
  • Generate question-answer pairs related to financial risks.


Fine-Tuning vs Quantization

  • Fine-Tuning: Modifying a pre-trained LLM for specific tasks using domain-specific data. This is far cheaper than training from scratch: the typical cost of training a model like GPT-4 is approximately $100 million.

  • Quantization: Reduces resource requirements by lowering precision levels (e.g., from float32 to float16 or int8).
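A minimal sketch of the precision reduction, assuming symmetric linear quantization to int8 with one scale factor for the whole list. Real toolkits typically work per layer or per channel and use more elaborate schemes; the weight values here are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.40]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)
print(max_error)
```

Each weight now needs one byte instead of four (float32), and the rounding error per weight is bounded by half the scale factor, which is the "acceptable performance" trade-off in miniature.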


Distillation

  • Process involving training smaller models to emulate larger models' behavior, thereby retaining knowledge while improving efficiency.

  • Key roles of Knowledge Distillation (KD) in LLMs:

    1. Enhances capabilities.

    2. Provides model compression.

    3. Encourages self-improvement.
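One common way to make a small model emulate a larger one is to train the student to match the teacher's output distribution. The sketch below assumes that formulation: a temperature-softened softmax and a KL-divergence loss. The source does not specify a loss, so this is an assumption, and the logits are made-up numbers.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher values give a softer distribution."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Hypothetical logits over three candidate next tokens.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher_logits, close_student))
print(distillation_loss(teacher_logits, far_student))
```

During training the student's parameters would be updated to reduce this loss, so that a student whose predictions resemble the teacher's is penalised less than one that disagrees.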


LLMs in Business

Applications in Fintech

  • BloombergGPT: A large language model tailored for finance, with superior performance metrics on financial tasks compared to open models.

  • LLM-based Trade Document Analysis: Leveraging LLMs to ensure accuracy and reliability in trade documents.

Applications in Healthcare

  • Utilization of LLMs to manage and summarize scientific knowledge and translate complex medical content.


Evaluation Metrics

ROUGE

  • An evaluation metric that measures summary quality by n-gram overlap with reference texts, reporting recall, precision, and their F1 score.

  • Recall and Precision:

    • Recall = matching n-grams / total n-grams in the reference

    • Precision = matching n-grams / total n-grams in the output

Application Example

  • Example: Model Output = 'the house on the hill', Reference = 'walk up the hill to the house'.

    • 1-gram overlap evaluation.
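The 1-gram evaluation of this example can be worked through in a few lines. This is a simplified sketch of ROUGE-N, assuming whitespace tokenisation and lower-casing; published implementations add stemming and other options. The function names are illustrative.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams of a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(output, reference, n=1):
    """Return (recall, precision, f1) based on n-gram overlap."""
    output_counts = ngram_counts(output, n)
    reference_counts = ngram_counts(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated word is credited only as often as the reference has it.
    matches = sum((output_counts & reference_counts).values())
    output_total = sum(output_counts.values())
    reference_total = sum(reference_counts.values())
    recall = matches / reference_total if reference_total else 0.0
    precision = matches / output_total if output_total else 0.0
    if recall + precision == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


recall, precision, f1 = rouge_n(
    "the house on the hill", "walk up the hill to the house"
)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
# prints: recall=0.571 precision=0.800 f1=0.667
```

Four unigrams match ("the" twice, "house", "hill"), giving recall 4/7 against the seven reference words and precision 4/5 against the five output words.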

Other Evaluation Metrics

  • BLEU, Perplexity, Human evaluation, etc.
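Of these metrics, perplexity is the most direct to compute: it is the exponential of the average negative log-probability that the model assigned to each token that actually occurred. A minimal sketch, assuming those per-token probabilities are already available; the numbers below are made up.

```python
import math


def perplexity(token_probabilities):
    """exp of the average negative log-probability of the observed tokens."""
    if not token_probabilities:
        raise ValueError("at least one token probability is required")
    total_log = sum(math.log(p) for p in token_probabilities)
    return math.exp(-total_log / len(token_probabilities))


# Hypothetical probabilities a model assigned to four observed tokens.
uncertain_model = [0.25, 0.25, 0.25, 0.25]
confident_model = [0.9, 0.8, 0.95, 0.85]

print(perplexity(uncertain_model))
print(perplexity(confident_model))
```

A model that spreads its probability evenly over four options at every step has a perplexity of 4; lower values mean the model was less "surprised" by the text.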


Future Lectures

Expectations

  • Preparation for exam structure focusing on LLMs, their applications, and integration methods.

  • Discussion questions regarding RAG, output comparisons, and potential LLM applications in specific domains such as healthcare and finance.


Bibliography

  • Includes multiple referenced works discussing advancements in and applications of LLMs.