Advancements in AI

1
New cards

what is a language model

A type of machine learning model that predicts the probability of a word based on one or more preceding words (or characters), using conditional probability

2
New cards

what is the conditional probability that language models use

P(Text | Preceding Text)

the conditioning on preceding text gives an implicit (left-to-right) order
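For intuition, a toy sketch of estimating this conditional probability from bigram counts (the corpus is made up for illustration):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

# Count bigrams and the unigrams that precede them.
bigrams = Counter(zip(corpus, corpus[1:]))
prev_counts = Counter(corpus[:-1])

def cond_prob(prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / prev_counts[prev]

print(cond_prob("the", "cat"))  # 2/3 - "the" is followed by "cat" 2 times out of 3
```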

3
New cards

what are QA systems

Computer programs that use natural language processing (NLP) techniques to understand and answer questions posed by users.

4
New cards

what are the different types of QA systems

  • open vs. closed domain

  • open vs. closed book

  • conversational vs. mathematical vs. visual

5
New cards

what are open vs. closed domain QA systems

questions on any topic vs. specialised to one topic area

6
New cards

what are open vs. closed book QA systems

access to external data sources vs. rely on training knowledge

7
New cards

what are conversational vs. mathematical vs. visual QA systems

maintain context and continuity over long conversations vs. answer Qs about mathematical logic and reasoning vs. combine NLP with computer vision, e.g. describing an image or improving image accessibility

8
New cards

why are QA systems difficult to create

- need to handle complex language structures

- questions can be phrased in various ways so need advanced NLP understanding

- need to retrieve relevant information from vast amounts of data

- answers need to be verified for reliability

9
New cards

how can LLMs improve QA systems

they allow for information retrieval

10
New cards

what are large language models

models that have been trained on a large volume of data (text) in order to predict the next tokens

11
New cards

what are LLMs based on

a [[Transformers]]-based deep learning architecture - great for natural language processing and can maintain long-range dependencies and contexts

12
New cards

what type of media can LLMs accept

LLM prompts are not restricted to text only; they can include other media like images or audio. For [[QA Systems]], prompts are primarily text and require an instruction or a question (optionally with input data and examples).

13
New cards

what are the two types of LLMs

closed: a monetised business model requiring a licence and payment

open: open to programming community for fine tuning

14
New cards

where can LLMs be used

using LLMs for different industries requires different techniques, or LLMs tuned on industry-specific data

15
New cards

what are the cons of LLMs

  • poor adaptability - training data becomes out-dated

  • too generalist, hard to adapt them to specific use cases and enterprise applications

  • stochastic responses - randomly determined to some extent even when re-using the same prompt - can be dealt with by tweaking the temperature parameter (sketched below), but this alone is not enough to prevent unwanted variability in responses
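A minimal sketch of tweaking temperature, assuming the OpenAI Python client and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Lower temperature -> more deterministic; higher -> more varied output.
for temperature in (0.0, 1.0):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Name a colour."}],
        temperature=temperature,
    )
    print(temperature, response.choices[0].message.content)
```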

16
New cards

what optimisation methods exist to improve LLM behaviour

  • fine tuning

  • prompt engineering

  • retrieval augmented generation

17
New cards

what is fine tuning

Adapting a model for a specific task or use case, using the broader knowledge gained from pre-training and honing it towards specific concepts

18
New cards

what are the two types of fine tuning

  • fully fine tuned

  • parameter efficient fine tuning

19
New cards

what is full fine tuning

updating all parameters

20
New cards

what is parameter-efficient fine tuning

  • updates only the most relevant parameters

  • allows the fine-tuning to occur on lower-spec hardware (see the LoRA sketch below)
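A minimal sketch of parameter-efficient fine-tuning using LoRA via Hugging Face's peft library (the model and target-module names are illustrative choices for GPT-2):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small pre-trained model (illustrative choice).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA trains small low-rank adapter matrices instead of all weights.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(model, config)

model.print_trainable_parameters()  # only a small fraction is trainable
```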

21
New cards

what are the pipeline steps for fine tuning

  1. pre-training

  2. fine-tuning

  3. quantizing

can also be done using reinforcement learning with human feedback

22
New cards

what is the difference between pre-training and fine-tuning

Pre-training uses large datasets while fine tuning uses smaller, specialised datasets

23
New cards

what is the goal of fine tuning

adapts pre-trained LLM models with new content to perform better and better suit a particular task or application domain

24
New cards

what is the scope of knowledge in fine tuning

limited to knowledge in base model

25
New cards

how up to date is fine tuning

static - expensive to extend

26
New cards

how transparent is fine tuning

less transparent

27
New cards

what are the pros of fine tuning

  • gives us greater influence over how the model behaves and reacts, since the context is baked into the weights instead of being supplemented on top as an extra technique

  • better for speed when running the model - smaller prompt context windows to still get the best responses for specific use cases

  • good for when datasets are available, specific industries that have easily identifiable patterns (specific jargon)

  • produced the highest accuracy in [[Chaubey et al.'s]] comparison paper, outperforming the base models

28
New cards

what are the cons of fine tuning

  • information cut-off (outdated)

  • upfront resource cost - data preparation and training are compute-intensive and time-consuming -> powerful GPUs running in parallel and large memory to store the fine-tuned LLM

  • poor generalisation - can't be used in multiple applications, only the specific application it was fine-tuned for

  • poor datasets will impact the effectiveness of the final model

29
New cards

what is prompt engineering

the practice of designing inputs for generative AI tools that will result in better outputs without altering the model's parameters or expanding its knowledge base

30
New cards

what does prompt engineering require an understanding of to do

  • the domain, to incorporate the goal into the prompt, e.g. what good or bad responses look like

  • AI models as different models will respond differently to the same prompt

  • best to create a prompt template that can be programmatically modified according to some dataset or context

31
New cards

what are the types of prompt engineering

  • shot based prompting

  • prompt chaining

  • chain of thought prompting

  • prompt structuring

32
New cards

what are the pros of prompt engineering

  • cost-effective -> least time consuming or resource intensive and basic prompt engineering can be done manually

  • flexible to modify

  • quick to develop - e.g. for rapid deployment scenarios

  • most flexible and effective for open-ended situations where there are a diverse array of potential outputs like image and video generation

33
New cards

what are the cons of prompt engineering

less effective in providing coherent and accurate results compared to more advanced techniques like RAG or fine-tuning

34
New cards

what is shot based prompting

technique that uses a specific number of examples in the prompt to guide the model's output

35
New cards

what are the types of shot based prompting

  • zero shot

  • one shot

  • n shot

36
New cards

what is zero-shot prompting

No examples included in the prompt

37
New cards

what is one-shot prompting

1 example included in the prompt

38
New cards

what is n-shot prompting

n examples included in the prompt
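The three variants differ only in how many worked examples the prompt contains, e.g. for sentiment classification:

```python
zero_shot = "Classify the sentiment of: 'The food was cold.'"

one_shot = (
    "Review: 'Great service!' -> positive\n"
    "Review: 'The food was cold.' ->"
)

n_shot = (  # here n = 3
    "Review: 'Great service!' -> positive\n"
    "Review: 'Never coming back.' -> negative\n"
    "Review: 'Loved the dessert.' -> positive\n"
    "Review: 'The food was cold.' ->"
)
```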

39
New cards

what is prompt chaining

technique of providing a series of prompts to an NLP model to guide it to produce the desired response - where the output of one LLM call becomes the input to another

40
New cards

what are simple prompts

prompt that contains a single question or instruction

41
New cards

what are complex prompts

contain multiple instructions or questions where the model needs to perform a series of actions to provide a detailed response.

42
New cards

what is prompt chaining used for

simplify complex prompts

  • instead of giving one complex prompt, split up the complex prompt into multiple single prompts and chain them together

    • e.g. instead of one "extract and categorise" prompt -> one output:

    • extract first -> get output

    • then categorise that output -> get the final output (see the sketch below)
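A sketch of that decomposition, where call_llm is a hypothetical helper wrapping whatever LLM API is in use:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper wrapping the LLM API in use."""
    ...

document = "..."  # the source text to process

# Step 1: extract - the first prompt's output...
entities = call_llm(f"Extract all product names from:\n{document}")

# Step 2: categorise - ...becomes the second prompt's input.
categories = call_llm(f"Categorise each of these products:\n{entities}")
```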

43
New cards

what are the pros of prompt chaining

  • traceability - allows users to understand the steps required to complete a request more easily than with a large monolithic prompt

  • accuracy and clarity - reduces the risk of errors or misunderstandings; chains can handle larger contexts better and outperform monolithic prompts

  • troubleshooting - easier to identify which point in the chain needs adjusting to produce optimal responses; more granular control

  • consistency - forcing the model to follow a series of prompts maintains consistency in tone or style or format which might be needed

  • widely applicable - most tasks can be functionally decomposed (although not all), making chaining a good solution

    • multiple instructions

    • multiple output transformations

    • when models seem to lose focus or forget context

44
New cards

what are the cons of prompt chaining

  • management difficulty - interrelated prompts can be hard to manage when the chain becomes long or intricate; more room for error, especially when using different AI models

  • hyper-dependency on the first prompt - if the initial prompt is bad then the failures will cascade throughout the rest of the chain

  • time consuming, the quality of each chain link needs to be (iteratively) tested and reviewed

  • longer processing time - multiple LLM calls instead of 1

  • higher costs - each part of the chain requires an LLM call which leads to higher costs for tokens for API usage (especially since prompt chaining produces longer outputs than one big prompt)

45
New cards

what are the applications of prompt chaining

  • structured prompt chains used for customer service chatbots, where prompts are linked together so the system can build upon previous responses and maintain context

  • any sort of multi step tasks like programming development, personalised recommendations, content creation from outline to editing

46
New cards

how can non-ai experts build prompt chains

[[key paper]] - using a visual interface

47
New cards

what is chain of thought prompting

the prompt guides the model to lay out its reasoning process, often resulting in more accurate and interpretable outcomes
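A common zero-shot phrasing simply appends a reasoning cue to the question, for example:

```python
prompt = (
    "A shop sells pens at 3 for £2. How much do 12 pens cost?\n"
    "Let's think step by step."  # cue the model to lay out its reasoning
)
```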

48
New cards

what are the different ways to structure a prompt

  • COSTAR framework

  • delimiters to help LLM differentiate between information

  • direct language for more accurate responses

  • system prompts that persist before every response, rather than adding them to each request (which the LLM can 'forget' over time), with guardrails to define what the LLM should not do

49
New cards

what is retrieval-augmented generation (RAG)

an advanced, specific data architecture framework / technique for retrieving facts from an external knowledge base to optimise LLM responses on the most accurate and up-to-date information.

50
New cards

what are the three parts of RAG

  1. knowledge base

  2. retriever

  3. LLM

51
New cards

what does a basic RAG architecture look like

a basic RAG architecture can be built using LangChain - see the sketch below
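A minimal sketch, assuming the langchain-openai and langchain-community packages (LangChain's API changes frequently, so treat the imports as indicative):

```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

# 1. knowledge base: embed documents into a vector store
docs = ["Our returns policy lasts 30 days.", "Support is open 9-5, Mon-Fri."]
store = FAISS.from_texts(docs, OpenAIEmbeddings())

# 2. retriever: fetch the chunks most similar to the query
query = "How long do I have to return an item?"
context = "\n".join(d.page_content for d in store.as_retriever().invoke(query))

# 3. LLM: answer from the augmented query (context + question)
answer = ChatOpenAI().invoke(f"Answer using this context:\n{context}\n\nQuestion: {query}")
print(answer.content)
```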

52
New cards

how does RAG work

  1. query: the user submits a query (without RAG, the prompt would go straight to the LLM)

  2. information retrieval: instead, an algorithm or API is used to first retrieve relevant information from the [[Vector Databases]]

  3. integration: the retrieved data is combined with the user's query to produce an augmented query, which is then given to the LLM

  4. response: the LLM generates an accurate response based on the retrieved data and its own training knowledge; this allows the LLM to give evidence for why it gave a certain response

53
New cards

what are the different types of RAG

  • naive

  • modular

  • advanced

54
New cards

what is the aim of RAG

to improve retrieval and generation and guide the LLM to produce relevant and accurate output

55
New cards

how up to date is RAG

integrates latest knowledge

56
New cards

how transparent is RAG

uses sourced data

57
New cards

what is the resource use of RAG

user-driven resource cost - datasets and the pipelines connecting the LLM to the dataset need to be constructed

58
New cards

what common issues does RAG solve over base LLMs

  • regular LLMs have no source to derive their responses from, so information can be inaccurate - RAGs are directed to focus on primary source data, so less likely to leak data or hallucinate

  • information given can be out of date, instead of having to retrain the entire model for new information, augment the data store with updated information. much less computational effort is required.

    • any info can be part of the corpus, documents, PDFs, anything relevant to the business operations

  • hallucination from not knowing the answer and giving something random - encourages the LLM to admit when it doesn't know things based on what is or is not available in the vector database (but this can be a negative effect if the initial user prompt doesn't get an answer despite being answerable)

59
New cards

what are the pros of RAG

  • best for dynamic data sources like databases

  • best for trust essential systems (because of sources and less hallucinations) and for systems where accurate, relevant, and current information is essential

    • for ethical concerns

  • good use cases include documentation chatbots

  • allows non-public data to be leveraged

60
New cards

what are the cons of RAG

  • determining and selecting which information should be considered - a poor calculation of similarity between the prompt and sources will produce inaccurate responses

  • not enhancing the base model itself

  • requires extensive data architecture construction and maintenance to build the pipelines between the organisation's data and the LLM

  • need to have precise prompt engineering to locate the right data and ensure the LLM can handle it properly

  • not as robust to unexpected queries

61
New cards

what is a vector database

A specific type of database that stores multi-dimensional data points instead of scalar values, the vectors represent data in numerous dimensions

62
New cards

how many dimensions do vectors in a vector database have

from a few to a thousand

63
New cards

what type of content can a vector database store

this content store can be open (internet) or closed (classified documents) depending on the need.

64
New cards

how is a vector database queried

quickly and precisely locate data based on vector proximity or resemblance

  • searches don't just rely on exact matches but can match based on semantic or contextual relevance

  • using measures of similarity to find the closest matches - ANN, Approximate Nearest Neighbour (see the toy example below)
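A toy brute-force version of similarity search using cosine similarity (real vector databases replace the linear scan with an ANN index):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "database" of embeddings (one row per item) and a query vector.
db = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.8, 0.2, 0.1]])
query = np.array([1.0, 0.0, 0.1])

scores = [cosine_similarity(row, query) for row in db]
print(int(np.argmax(scores)))  # index of the closest match
```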

65
New cards

what are examples of vector databases

  • Chroma

  • Pinecone

  • Weaviate

  • Faiss

  • Qdrant

  • Milvus

  • PGVector

66
New cards

what features does Chroma have

open source, easy to implement with langchain and python

67
New cards

what features does Pinecone have

better indexing and search capabilities for high-dimensional data; low-latency search, highly scalable, fully managed; best for real-time searching

68
New cards

what features does Weaviate have

open-source, can search nearest neighbour of millions of objects in milliseconds, also includes recommendations, summarisations

69
New cards

how to create a vector database

  • need to create [[Embeddings]] for unstructured data - this allows all media types, including images, text, and audio, to go into the vector database

  • neural networks can be used to convert each text / data chunk into a numerical embedding (see the sketch below)
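A minimal sketch of turning text chunks into embeddings, assuming the sentence-transformers library (the GloVe/Word2Vec models mentioned later work similarly for words):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

chunks = ["Refunds take 5 working days.", "Contact support via email."]
embeddings = model.encode(chunks)  # one numerical vector per chunk

print(embeddings.shape)  # (2, 384) - 384 dimensions for this model
```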

70
New cards

what are the pros of vector databases

  • scalability and adaptability - as data grows it can scale across multiple nodes, the system can be tuned to include any new data as needed

  • multi-user support and data privacy - data isolation by ensuring changes made to one data collection are unseen unless explicitly shared

  • flexible for development if they offer a comprehensive set of APIs and SDKs

71
New cards

what is an embedding

a vector (multi-numerical) representation of words/phrases that captures the semantic meaning in a high-dimensional space

72
New cards

how is the relatedness of vectors measured

based on how close the embeddings are in the dimensional space -> closer together = semantically similar / related and vice versa

  • aka each word is represented by a set of numbers, e.g. a 3D embedding uses 3 numbers per piece of text - real applications use many more axes than 3

73
New cards

what existing embedding libraries are there

existing models include GloVe, Word2Vec to generate the embeddings, and then store the embeddings in the DB

  • OpenAI Embeddings could be best for RAG, since it’s optimised for GPT models and works with Pinecone - API interface, 3 different models to choose from depending on price and embedding length

74
New cards

what is distillation

Training smaller, more efficient (student) models to mimic the behaviour and reasoning patterns of a larger model by using it as a teacher - transferring the knowledge and capabilities of the large-parameter model into more compact architectures

75
New cards

how does distillation work

Given some unlabelled data, the teacher LLM is asked to label it. This synthetically labelled data is used to train the student model (sketched below).
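A sketch of that loop; teacher_label and the student training call are hypothetical placeholders:

```python
unlabelled = ["I'd buy this again", "Arrived broken", "Does the job"]

def teacher_label(text: str) -> str:
    """Hypothetical call asking the (large) teacher LLM to label one example."""
    ...

# Synthetically labelled data: (input, teacher's label) pairs.
synthetic = [(text, teacher_label(text)) for text in unlabelled]

# The (small) student model is then trained on the teacher's labels, e.g.:
# student.fit(synthetic)
```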

76
New cards

what are the pros of distillation

  • generates predictions much faster, good for real-time applications or resource-constrained devices

  • reduced parameters = requires fewer computational and environmental resources (energy consumption), easier to store and manage

  • less expensive to host an LLM with fewer parameters via API

77
New cards

what are the cons of distillation

  • predictions are usually not as good as the original LLM's

  • the student is limited to the teacher, e.g. generalised LLMs don't do well with specialised tasks

  • still requires lots of unlabelled data

  • the data labelled by the original LLM might be sensitive and not usable

  • knowledge loss

78
New cards

what are the key roles of distillation for LLMs

  1. enhancing capabilities

  2. traditional compression for efficiency

  3. emerging trend of self-improvement via self-generated knowledge

79
New cards

what can be done to mitigate knowledge loss in distillation

  • augment the data generated by the teacher model to provide broader range of training examples

  • iterative distillation - can capture more of the teacher's knowledge

  • adjust hyperparameters like temperature and learning rate

    • temperature - controls the smoothness of the probability distribution and the creativity of responses

    • learning rate - balance speed and stability of training process so the model can converge without under or over fitting

80
New cards

what is quantization

compressing an LLM by converting weights and activation values from high-precision data types, usually 32-bit floating point (FP32) or 16-bit floating point (FP16), to a lower-precision data type, like 8-bit integer (INT8).

81
New cards

what are the most common quantization cases

The two most common quantization cases are float32 -> float16 and float32 -> int8.

82
New cards

what are the types of quantization

  • linear quantization

  • block-wise quantization

83
New cards

what is linear quantization

mapping the range of floating-point values of the original weights evenly onto a range of fixed-point values, then using the lower-precision data type for inference (see the sketch below)
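A minimal numpy sketch of symmetric linear quantization from float32 to int8:

```python
import numpy as np

weights = np.random.randn(5).astype(np.float32)

# Map the range [-max|w|, +max|w|] evenly onto int8's [-127, 127].
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize to inspect the (small) precision loss from rounding.
restored = quantized.astype(np.float32) * scale
print(np.abs(weights - restored).max())  # worst-case rounding error
```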

84
New cards

what is block-wise quantization

quantizing weights in smaller blocks rather than across the entire range, better handles variations within different parts of the model

85
New cards

when can quantization be performed

  • post-training quantization

  • quantization-aware training

86
New cards

what is post-training quantization

quantizes after the LLM has been trained

87
New cards

what is quantization-aware training

  • fine-tunes while simulating quantization effects, e.g. rounding or clipping, during training so that quantization can be done easily afterwards

  • both the quantized and original weights are stored; the quantized version is used for inference but the unquantized version is updated during backpropagation

  • usually higher accuracy than PTQ

88
New cards

what are the pros of quantization

  • faster inference - quicker calculations reduce latency (at some cost to accuracy)

  • efficiency - reduced computational and memory costs of an inference

  • allows models to run on embedded devices that only support int types

  • better compatibility with integer operations to support older computers that don’t support floating point operations

89
New cards

what are the cons of quantization

  • loss of accuracy (often negligible though)

90
New cards

why is the loss of accuracy in quantization okay

LLMs require a lot of processing power and memory so we can sacrifice some (negligible) precision in favour of less processing requirements - balance

91
New cards

how can we evaluate QA systems

  • Benchmarking with metrics

  • LLM performance comparison e.g. latency

  • Expert reviews of output depending on the context

  • Comparing results to existing documents e.g. financial contracts

  • Repeat the same prompt multiple times to assess clarity, readability, repeatability - and identify any bias or hallucinations

  • AI judges acting as experts e.g. ChatBot Arena, LLMCompare

92
New cards

what metrics can be used

  • ROUGE metric

  • BLEU metric

  • accuracy

  • perplexity

93
New cards

what is BLEU score

(BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. A number between zero and one that measures the similarity of the machine-translated text to a set of high quality reference translations
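As a sketch, NLTK provides an implementation (the tokenised reference and candidate are toy examples):

```python
from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # list of references
candidate = ["the", "cat", "sat", "on", "the", "rug"]

# Closer to 1 = more similar to the reference translation(s).
print(sentence_bleu(reference, candidate))
```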

94
New cards

what is perplexity

captures the degree of 'uncertainty' a model has in predicting (i.e. assigning probabilities to) text

  • higher perplexity means more uncertainty and more incoherent, inaccurate text
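Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each token; a toy calculation:

```python
import math

# Probabilities the model assigned to each token it predicted (toy values).
token_probs = [0.25, 0.5, 0.1, 0.4]

nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(perplexity)  # lower = the model was less 'surprised' by the text
```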

95
New cards

why can’t metrics alone be used

metrics cannot capture complexity and understanding of responses so human evaluation is needed to…

  • determine the reasoning and synthesis quality

  • review responses that may be subjective

  • identify out-of-vocabulary words

96
New cards

what can poor performance help us do

areas where the model underperforms signal that more development is needed

97
New cards

what is ROUGE metric

the F1 score (overlap) for N-gram precision and recall

98
New cards

what is an n-gram

sequence of N words

  • e.g. can be ROUGE-1 or ROUGE-2 for N word sequences

  • for "I love Machine Learning", the unigrams are: (I), (love), (Machine), (Learning)

  • and the bigrams are: (I love), (love Machine), (Machine Learning)
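A toy sketch of ROUGE-1 from unigram overlap (real ROUGE counts duplicate n-grams rather than using sets):

```python
def rouge1(reference: str, candidate: str):
    ref, cand = reference.split(), candidate.split()
    overlap = len(set(ref) & set(cand))   # shared unigrams (toy set version)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    recall = overlap / len(set(ref))      # how much of the reference is covered
    precision = overlap / len(set(cand))  # how much of the candidate is relevant
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

print(rouge1("I love Machine Learning", "I love Learning"))  # (0.75, 1.0, ~0.86)
```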

99
New cards

what is recall

the proportion of the reference's n-grams that appear in the generated text

100
New cards

how do you calculate recall