Large Language Models

Large Language Model (LLM) refers to a type of artificial intelligence model designed to understand and generate human-like text based on the data it has been trained on. These models are characterized by their vast size, typically consisting of billions or even trillions of parameters, which are the adjustable weights that help the model make predictions or generate text.

LLM architectures

Encoder: Encoder models are designed to encode text, that is, to produce embeddings, and are based on the transformer architecture. Embedding text generally means converting a sequence of words into a single vector or a sequence of vectors. An embedding is a numeric representation of the text that typically tries to capture its semantics, or meaning.

Decoder: Decoder models are designed to decode, or generate, text. The input to a text generation model is a sequence of tokens, and the output is a generated sequence of tokens. A decoder produces only a single token at a time, so longer outputs are built by repeatedly appending each new token to the input.

Encoder-Decoder: The encoder processes the input sequence and encodes it into a context representation that captures the essential information (a single fixed-size vector in classic recurrent sequence-to-sequence models, or a sequence of vectors in transformers). The decoder then uses this representation to generate the output sequence step by step. Encoders and decoders come in many sizes, where size refers to the number of trainable parameters in the model.

Prompting

Prompting, in the context of LLMs like GPT-4, refers to the process of providing an initial input or “prompt” to the model to generate a response or complete a task. The prompt acts as a starting point or instruction, guiding the model to produce relevant and coherent text based on its training data. Prompt engineering is the process of iteratively refining the model input in an attempt to induce a probability distribution over the vocabulary that suits a particular task. This is done by modifying the inputs of the model to get closer and closer to the response that we want.

Types of prompting

In-context learning: Constructing a prompt that has demonstrations of the task that the model is meant to complete.

K-shot prompting: Including k examples of the task that you want the model to complete in the prompt. In zero-shot prompting (k = 0), no examples are provided within the prompt.

Chain-of-thought prompting: Prompt the model to break down the steps of solving the problem into small chunks.

Least to most prompting: Solve simpler problems first, and use the solutions to the simple problems to solve more difficult problems.

Step-back prompting: Identify high-level concepts pertinent to a task.
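
The k-shot prompting described above can be sketched as plain string construction. This is a minimal illustration, not a prescribed format: the function name build_k_shot_prompt, the "Review:" and "Sentiment:" labels, and the example data are all assumptions made for the sketch.

```python
def build_k_shot_prompt(instruction, examples, query):
    """Build a prompt with k worked examples (k = len(examples)).

    With an empty examples list this becomes a zero-shot prompt.
    The "Review:" / "Sentiment:" labels are a hypothetical format.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final, unanswered item is the one the model should complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical demonstrations for a 2-shot prompt.
examples = [("Great film", "positive"), ("Dull plot", "negative")]
two_shot = build_k_shot_prompt("Classify the sentiment.", examples, "Loved it")
zero_shot = build_k_shot_prompt("Classify the sentiment.", [], "Loved it")
```

Passing an empty list of examples gives the zero-shot case, so the same helper covers both variants.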

Issues with prompting

Prompting can be used to elicit unintended or even harmful behavior from a model. In prompt injection, the prompt is crafted to elicit a response that the developer did not intend. In prompt leaking (a leaked prompt), the model is coaxed into revealing information it was meant to keep hidden, such as its underlying instructions, data it has been trained on, or other sensitive information.

Hallucination refers to a phenomenon where a model generates content that is not grounded in reality or lacks factual accuracy. The threat of hallucination is one of the biggest challenges to safely deploying LLMs.

Training

Sometimes prompting is insufficient. For example, when a model has been trained on data from one domain and you want to use it in a new domain, prompting alone may not work. Training helps in such cases because it changes the parameters of the model.

Fine-tuning: In fine-tuning, a pre-trained model (for example, BERT) is trained on a labeled dataset to perform a task, altering all of its parameters.

Parameter-efficient fine-tuning (PEFT): In PEFT, a very small subset of the model’s parameters is selected for training, or a handful of new parameters are added to the model. For example, in LoRA (Low-Rank Adaptation) the original parameters of the model are not altered; instead, additional parameters are added and trained.

Soft prompting: In soft prompting, specialized trainable parameters are added to the prompt, and these act as input that cues the model to perform specific tasks. Only the added parameters are trained, which makes this another economical training option.

Continual pretraining: This is similar to fine-tuning, where all the parameters of the model are changed. However, continual pretraining is used for unlabeled data.
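
To make the LoRA idea above concrete, here is a minimal sketch in plain Python. It assumes the usual low-rank formulation, in which a frozen weight matrix W is combined with the product of two small trainable matrices B and A. The helper names (matmul, lora_forward, trainable_params) and the tiny matrices are illustrative assumptions, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, scale=1.0):
    """Compute x @ (W + scale * B @ A) without modifying the frozen W.

    Shapes (assumed): x is 1 x d_in, W is d_in x d_out,
    B is d_in x r, A is r x d_out, with rank r much smaller than d_in.
    """
    base = matmul(x, W)                # frozen pre-trained path
    update = matmul(matmul(x, B), A)   # trainable low-rank path
    return [[b + scale * u for b, u in zip(br, ur)] for br, ur in zip(base, update)]


def trainable_params(d_in, d_out, r):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_in * d_out, r * (d_in + d_out)


x = [[1.0, 2.0, 3.0, 4.0]]
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen weights
A = [[0.5, 0.5, 0.5, 0.5]]              # r x d_out, with r = 1
B_zero = [[0.0], [0.0], [0.0], [0.0]]   # zero B: adapter starts as a no-op
B_trained = [[1.0], [0.0], [0.0], [0.0]]  # hypothetical values after training

before = lora_forward(x, W, A, B_zero)
after = lora_forward(x, W, A, B_trained)
full, lora = trainable_params(1024, 1024, 8)
```

With B set to zero the adapted model reproduces the frozen model exactly. For a single 1024 x 1024 matrix and rank 8, the sketch counts 16,384 trainable values instead of 1,048,576.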

Decoding

There are various decoding techniques:

Greedy decoding: In this approach, at each step of the sequence generation process, the model selects the token (a word, subword, or character) with the highest probability as its next output. This process continues until an end-of-sequence token is produced, or the sequence reaches a predefined maximum length.

Nucleus sampling: Also known as top-p sampling, this decoding strategy samples the next token from a dynamic subset of the vocabulary: the smallest set of most probable tokens whose cumulative probability reaches a threshold p. Unlike greedy decoding, which always selects the most probable token, it allows for more varied text generation.

Beam search: An extension of greedy decoding that aims to improve the quality of generated sequences by keeping a set of candidate sequences at each step instead of just the single most probable one.
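
The three decoding techniques above can be compared on a toy example. The sketch below assumes a hypothetical "model" that is just a lookup table from the previous token to a next-token distribution. TOY_MODEL, the token names, and the probabilities are invented for illustration; a real LLM conditions on the whole sequence.

```python
import math
import random

# Hypothetical next-token distributions, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"cat": 0.9, "dog": 0.05, "</s>": 0.05},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def greedy_decode(max_len=10):
    """Always pick the single most probable next token."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        dist = TOY_MODEL[tokens[-1]]
        tokens.append(max(dist, key=dist.get))
    return tokens


def nucleus(dist, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize so the kept probabilities sum to 1."""
    kept, total = {}, 0.0
    for tok, prob in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        total += prob
        if total >= p:
            break
    return {tok: prob / total for tok, prob in kept.items()}


def top_p_decode(p=0.8, max_len=10, seed=0):
    """Sample each next token from the nucleus of its distribution."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        dist = nucleus(TOY_MODEL[tokens[-1]], p)
        tokens.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return tokens


def beam_search(beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, toks in beams:
            if toks[-1] == "</s>":          # finished sequences carry over
                candidates.append((score, toks))
                continue
            for tok, prob in TOY_MODEL[toks[-1]].items():
                candidates.append((score + math.log(prob), toks + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(toks[-1] == "</s>" for _, toks in beams):
            break
    return beams[0][1]
```

On this toy table, greedy decoding commits to "the" first and ends with a sequence of probability 0.5 x 0.4 = 0.2, while beam search with width 2 keeps "a" alive and finds "a cat" with probability 0.4 x 0.9 = 0.36. This is the kind of case beam search is meant to help with.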

Retrieval Augmented Generation (RAG)

RAG is an approach in natural language processing (NLP) that combines elements of both retrieval-based and generative models to produce high-quality, contextually relevant text. In this approach, a generative model (such as a language model) is augmented with a retrieval mechanism that retrieves relevant information from a large external knowledge source, such as a database or a corpus of text. This retrieved information is then used to guide or enhance the generation process of the model, resulting in more informed and contextually rich outputs.

There are two ways to implement RAG: the sequence model and the token model. The RAG sequence model retrieves documents once and uses the same retrieved documents to generate the entire output sequence. The RAG token model, on the other hand, operates at the token level and can draw on different retrieved documents for each generated token, which suits tasks where fine-grained control over individual tokens is required.
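
The retrieve-then-generate flow described above can be sketched without a real model. The example below is only an illustration under stated assumptions: retrieval is done by simple word overlap as a stand-in for a real retriever, the CORPUS sentences and function names are made up, and the final generation step is left out because it would require an actual LLM.

```python
def tokenize(text):
    """Lowercase and split into a set of words, dropping basic punctuation."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(query, corpus, k=1):
    """Return the k documents sharing the most words with the query.

    Word overlap is a stand-in for a real retriever (assumption).
    """
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query, corpus, k=1):
    """Augment the user question with retrieved context before generation."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical external knowledge source.
CORPUS = [
    "Beam search keeps several candidate sequences at each step.",
    "LoRA adds small trainable matrices to a frozen model.",
    "Vector databases store high-dimensional embeddings.",
]

question = "What does LoRA add to a frozen model?"
prompt = build_rag_prompt(question, CORPUS, k=1)
```

The resulting prompt would then be passed to a generative model, so the answer can draw on the retrieved text rather than only on what the model memorized during training.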

Vector Databases

Vector databases, also known as vector stores, are specialized databases designed to efficiently store, manage, and query high-dimensional vector data. These databases are particularly well-suited for applications involving machine learning, natural language processing, computer vision, recommendation systems, and other domains where data is represented as vectors. Many vector databases use a distributed architecture to handle the storage and computational demands of large-scale, high-dimensional data, which allows horizontal scaling and improves performance and storage capacity.

Semantic Search

Semantic search is an advanced information retrieval technique that aims to improve the accuracy and relevance of search results by understanding the meaning (semantics) behind the search query and the documents being searched. Unlike traditional keyword-based search, which relies solely on matching keywords, semantic search takes into account the intent, context, and semantics of both the query and the documents to return more precise and contextually relevant results.
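
As a rough sketch of how semantic search over stored vectors can work, the example below ranks documents by cosine similarity between a query vector and document vectors. The three-dimensional embeddings in INDEX are hand-made, hypothetical values. In practice they would come from an encoder model and be held in a vector database, which typically uses approximate rather than brute-force search.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: the first two documents are about account access,
# the third is unrelated, even though none of them share keywords.
INDEX = {
    "How to reset my password": [0.9, 0.1, 0.0],
    "Forgot login credentials": [0.8, 0.2, 0.1],
    "Best pizza in town": [0.0, 0.1, 0.9],
}


def search(query_vec, index, k=2):
    """Brute-force nearest-neighbor search by cosine similarity."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, index[doc]), reverse=True)
    return ranked[:k]


# Hypothetical embedding of a query such as "I can't sign in to my account".
query_vec = [0.85, 0.15, 0.05]
results = search(query_vec, INDEX, k=2)
```

The query shares no keywords with "Forgot login credentials", yet that document ranks highly because its vector points in a similar direction. A keyword-only search would miss it.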

OCI Generative AI Service

The Oracle Cloud Infrastructure (OCI) Generative AI service is a fully managed platform that allows you to use generative AI models for various text-based tasks. Its key features are:

Pre-trained Models: OCI Generative AI provides access to state-of-the-art large language models (LLMs) from Cohere and Meta. You can use these for tasks like summarization, text generation, translation and information extraction.

Fine-Tuning Capabilities: The service allows you to fine-tune these pre-trained models on your own data. This customization can significantly improve the model’s performance on specific tasks relevant to your business needs.

Dedicated Resources: OCI Generative AI utilizes isolated AI clusters for both fine-tuning and hosting custom models. This ensures security and optimal performance for your workloads.

Flexibility and Control: The service offers control over your models. You can create endpoints, update them, or even delete them as needed. Additionally, you can manage the compute resources allocated to your custom models.

Artificial Intelligence

AI refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The goal of AI is to develop systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.

AI systems use various techniques, including machine learning, natural language processing, and computer vision, to analyze and interpret data, make decisions, and improve their performance over time.

Machine Learning

Machine learning (ML), a subset of AI, involves training algorithms on large datasets so that they can make predictions or decisions without being explicitly programmed.

ML algorithms can be categorized into supervised learning, unsupervised learning, and reinforcement learning. They are trained on data to recognize patterns and make predictions or decisions. For example, a spam filter learns to classify emails as spam or not spam based on user feedback.

Deep Learning

Deep learning (DL) is a specialized field within ML that involves neural networks with many layers (deep neural networks). It aims to model high-level abstractions in data using multiple processing layers.

DL has been particularly successful in tasks such as image and speech recognition, natural language processing, and playing games. For example, Convolutional Neural Networks (CNNs) are used in image recognition, and Recurrent Neural Networks (RNNs) are used in natural language processing.

Generative AI

Generative AI refers to a form of artificial intelligence that creates new content, such as text, images, audio, or code, rather than only analyzing or classifying existing data. It should not be confused with artificial general intelligence (AGI), the more advanced and theoretical concept of a system that can understand, learn, and apply knowledge across diverse tasks the way humans do.

Generative AI models learn patterns in a given dataset and use that knowledge to create new data. Such models can generate both visual and text content.

Models Used in Generative AI

GAN (Generative Adversarial Network)

A GAN is a type of generative model in machine learning where two neural networks, a generator and a discriminator, are trained simultaneously through adversarial training. The generator creates new data instances, and the discriminator evaluates whether they look real or generated. The goal is for the generator to produce data that is indistinguishable from real data.

GANs are widely used for image and video generation, style transfer, and other tasks where the generation of realistic data is desired.

LLM (Large Language Model)

LLM generally refers to a language model that is large in scale, often in terms of the number of parameters in the model. These models are trained on vast amounts of text data and can generate human-like text, understand context, and perform various natural language processing tasks.

Large Language Models, such as OpenAI’s GPT (Generative Pre-trained Transformer) series, are used for tasks like language translation, text completion, question answering, and more.

Transformer

The Transformer is a type of neural network architecture introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al. Transformers use a mechanism called self-attention, which relates every position in the input to every other position and lets the whole input be processed in parallel, making them highly effective for tasks that involve sequential data such as text.

Transformers have become a fundamental architecture in natural language processing and have been used in various models, including BERT (Bidirectional Encoder Representations from Transformers) for language understanding, GPT for language generation, and more.
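
The self-attention mechanism mentioned above can be illustrated with a stripped-down version of scaled dot-product attention. This sketch makes a simplifying assumption: queries, keys, and values are all the raw input vectors, whereas a real Transformer first applies learned projection matrices and uses multiple attention heads. The function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    X is a list of token vectors. Each output vector is a weighted
    average of all input vectors, weighted by similarity to the query.
    """
    d = len(X[0])
    outputs = []
    for q in X:  # each position attends to every position, itself included
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return outputs


# Two hypothetical token vectors.
X = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(X)
```

Each row of the output depends on all rows of the input, and the rows can be computed independently of one another, which is what allows the computation to run in parallel.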

Oracle Cloud Infrastructure (OCI) AI services

OCI offers a variety of pre-trained AI services that allow developers to easily add AI capabilities to their applications without the need for deep expertise in machine learning. These include services for speech recognition, language understanding, image analysis, and more.

The OCI AI services are:

OCI Language

Language allows you to perform sophisticated text analysis at scale. Using the pretrained and custom models, you can process unstructured text to extract insights without data science expertise. Pretrained models include sentiment analysis, key phrase extraction, text classification, and named entity recognition. Additionally, you can translate text across numerous languages.

OCI Speech

Speech can transcribe customer service calls, automate subtitling, and generate metadata for media assets. Speech harnesses the power of spoken language, enabling you to easily convert media files containing human speech into highly accurate text transcriptions.

OCI Vision

Vision is a serverless, multi-tenant service, accessible using the OCI Cloud Console, or over REST APIs. You can upload images to detect and classify objects in them. If you have many images, then you can process them in batch using asynchronous API endpoints.

OCI Document Understanding

Document Understanding allows you to extract text, tables, and other key data from document files through APIs and CLI tools. With Document Understanding, you can automate tedious business processing tasks with prebuilt AI models, and customize document extraction to fit your industry-specific needs. You can upload documents to detect and classify text and objects in them.

OCI Anomaly Detection

Anomaly Detection provides you with a set of tools to identify undesirable events or observations in business data in real time so that you can act to avoid business disruptions. This multi-tenant service analyzes large volumes of multivariate or univariate time series data.
