CSIT205 - Generative AI


Last updated 11:15 AM on 3/19/26

92 Terms

1
New cards

What are the practical applications of generative AI?

  • Generating faces and videos

  • Lung cancer detection and prediction

  • Summarising texts

  • Assisted coding

2
New cards

Image generation

Generating faces and videos, and lung cancer prediction

3
New cards

Image classification

Lung cancer detection

4
New cards

Text summarisation

Summarising texts

5
New cards

Text generation

AI-assisted coding (writing code)

6
New cards

Text evaluation

AI-assisted coding (critiquing)

7
New cards

Artificial Intelligence

  • Refers to systems designed to perform tasks that normally require human intelligence.

  • They can be rule-based or learning-based

  • Rule-based systems follow predefined instructions rather than learning patterns from data

  • AI doesn’t necessarily learn from data

8
New cards

Machine Learning (subset of AI)

  • Focuses on pattern learning from data

  • Identifies feature-based patterns to make predictions or decisions

  • Does learn from data (where learning begins in AI)

9
New cards

Deep Learning (subset of ML)

  • Uses layer-based models

  • Multiple layers allow the system to learn increasingly complex patterns

  • Learn from data

  • Essentially ML using layered structures

10
New cards

Generative AI/GenAI (subset of DL)

  • Based on transformer models or diffusion models

  • Generates new outputs rather than only analysing data

  • Most specialised

11
New cards

AI Hierarchy

AI —> ML —> DL —> GenAI

12
New cards

What are the general areas generative AI is applied to?

  • Deep reasoning & general knowledge

  • Coding & software development

  • Multimodal (text + image + video)

  • Large-context

13
New cards

Supervised Learning

  • ML method where a model is trained using labelled data.

  • Each data example has a correct label (e.g. dog image —> Dog and cat image —> Cat)

  • Model learns the pattern between the data and its label

  • Goal: train a model to classify data

  • After learning from labelled examples (e.g. dogs and cats), the model can use learnt patterns to connect new data to the correct label

14
New cards

Unsupervised Learning

  • ML method where the model is trained using data with no labels

  • Model learns to distinguish groups in the data based on patterns

  • Groups similar data together into clusters without predefined labels

  • When a new input is given, the model places it into the most similar group

15
New cards

What is Perceptron?

  • Model that uses a linear function to produce binary outcomes i.e. Yes/No, True/False and 0/1

  • Used for binary classification

  • Learns by updating its weights and bias during training

  • Trained using supervised learning

16
New cards

What components make up a Perceptron model?

  • Inputs x — several input values

  • Weights w — importance assigned to each input

  • Bias b — an additional adjustment term

  • Output o — the final prediction

17
New cards

What is the activation function that Perceptron uses?

  • Step function

  • Performs binary classification [0, 1]

    • Output 0 if the linear combination is below the threshold

    • Output 1 if it’s above the threshold

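The forward pass above can be sketched in a few lines (the input, weight, and bias values are illustrative, not from the notes):

```python
def step(z):
    """Step activation: output 1 if z is above the threshold (0), else 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    """Compute the linear combination z = x.w + b, then apply the step function."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return step(z)

# Example: z = 1.0*0.4 + 0.5*0.6 - 0.5 = 0.2 > 0, so the output is 1
print(perceptron([1.0, 0.5], [0.4, 0.6], -0.5))
```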
18
New cards

How does a Perceptron learn?

Weights are updated using the learning rule

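The slide image isn't reproduced here, but the classic perceptron learning rule is w_i ← w_i + η(y − ŷ)x_i, with the bias updated the same way. A hedged sketch (eta and the variable names are illustrative):

```python
def perceptron_update(w, b, x, y, y_hat, eta=0.1):
    """Nudge weights and bias in the direction that reduces the error (y - y_hat)."""
    error = y - y_hat
    w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    b = b + eta * error
    return w, b

# If the prediction was 1 but the true label is 0, the weights shrink slightly:
w, b = perceptron_update([0.4, 0.6], -0.5, x=[1.0, 0.5], y=0, y_hat=1)
```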
19
New cards

What are the limitations of Perceptron?

  • Performs only binary classification (0 or 1)

  • Uses a single layer — too simple to represent complex problems

  • Limited learning capability

20
New cards

What is a Neural Network?

  • Multi-layer perceptron that learns weights between neurons

  • Uses stochastic gradient descent to learn

  • Used for tasks such as optical character recognition (OCR)

21
New cards

What is the structure of a neural network?

  • Same components as Perceptron

    • x = input values

    • w = importance assigned to each input

    • b = additional adjustment term

    • o = final prediction

  • 1 input layer

  • Multiple hidden layers

  • 1 output layer

  • Connections between layers have weights and biases, which are updated during learning

  • Information moves from left to right through the network (feed-forward)

22
New cards

What is the activation function that Neural Networks use?

  • ReLU Function (Rectified Linear Unit)

  • Allows more gradual changes in the network

  • Output 0 if z <= 0

  • Output z if z > 0

  • where z = xw+b
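That piecewise definition can be written as a one-line sketch:

```python
def relu(z):
    """ReLU: output 0 for z <= 0, otherwise output z itself (where z = x.w + b)."""
    return max(0.0, z)

print(relu(-2.0), relu(3.5))  # 0.0 3.5
```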

23
New cards

How do neural networks learn?

Learning is based on minimising prediction error

24
New cards

Neural Networks Learning Rule: Loss Function

  • C(w) e.g. RMSE

  • Measures the error between predictions and actual values

25
New cards

Neural Networks Learning Rule: Back-propagation

  • Calculates the gradient (how the cost changes when weights change)

26
New cards

Neural Networks Learning Rule: Stochastic Gradient Descent (SGD)

Updates the weights by moving down the gradient to minimise the cost
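The three pieces above fit together as: compute the cost, get its gradient (via back-propagation), and step the weights downhill. A toy sketch on a one-weight cost C(w) = (w − 3)², whose hand-written gradient 2(w − 3) stands in for what back-propagation would supply:

```python
def sgd_step(w, grad, lr=0.1):
    """Move the weight down the gradient to reduce the cost."""
    return w - lr * grad

w = 0.0
for _ in range(100):
    w = sgd_step(w, grad=2 * (w - 3))  # gradient of C(w) = (w - 3)^2

print(round(w, 4))  # converges toward 3.0, the minimum of the cost
```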

27
New cards

What are the limitations of neural networks?

  • Vanishing/exploding gradient — weights can become extremely small or large during backpropagation

  • Cannot handle sequence data (e.g. sentences like “The quick brown fox jumps over the lazy dog“).

28
New cards

Why do some neural networks tasks require memory?

  • Some problems involve sequences of information, where earlier inputs affect later outputs

29
New cards

What are Recurrent Neural Networks (RNNs)?

  • Contain loops within their layers, allowing information to persist

  • Keeps a running memory called the hidden state

  • Allows the model to process sequential data i.e. sentences, speech, time series, video etc

30
New cards

What types of input-output structures can RNNs handle?

  • One-to-one

  • One-to-many

  • Many-to-one

  • Many-to-many

  • Network is unfolded across time to show information flows step by step

31
New cards

What are the limitations of RNNs?

  • Vanishing gradient problem

  • Long-term dependency problem

  • Long sequences are difficult because the model must retain information from earlier inputs

32
New cards

Recurrent Neural Networks: Long Short-Term Memory (LSTM)

  • Special Type of RNN designed to remember information for a long time

  • Created to fix the memory loss problem in RNNs

  • Applications include language understanding e.g. Google Bert and time-series forecasting

33
New cards

How does LSTM improve on RNNs?

Instead of a single hidden state, LSTM uses:

  • Memory block that stores long-term information

  • Gates that control the flow of information

These help address:

  • Vanishing gradients

  • Long-term dependency problems (not completely)

34
New cards

What are Convolutional Neural Networks (CNNs) used for?

Designed to analyse grid-like data i.e. images

35
New cards

What is the basic architecture of a CNN?

  • Convolutional layers that apply kernels/filters to detect features in the data

  • Pooling layers that reduce dimensionality (downsampling)

36
New cards

Why are GPUs important for training CNNs?

  • GPUs were originally designed for computer graphics but can now be used for general-purpose processing

  • They enable faster computation, which makes training faster

37
New cards

What architecture did many language generation systems use before transformers?

  • Encoder-decoder architecture

  • Also called sequence-to-sequence learning because it converts one input sequence into another output sequence

  • e.g. an English sentence translated into another language

38
New cards

What does the encoder do in the encoder-decoder architecture?

  • Reads the input sentence

  • Compresses the information into a fixed-size representation called a context vector e.g. 128 dimensions

39
New cards

What does the decoder do in the encoder-decoder architecture?

  • Takes the context vector

  • Generates the output sequence word-by-word

40
New cards

Why do sequence-to-sequence models need to handle different input and output lengths?

  • Input and output sequences can have different lengths, e.g. an input English sentence may be shorter than its French translation, so models must handle variable-length sequences

  • Needs to understand a full input sentence before generating an output sentence word by word.

Solution

  • Encodes the entire input into a context vector

  • Lets the decoder generate the output sentence

41
New cards

What is the limitation of encoder-decoder models that use only LSTMs?

  • Suffers from the long-term dependency problem, which occurs when the input sequence becomes very long

  • All the input information must be compressed into a single fixed-sized context vector.

  • So some important details may be lost

  • The model may struggle to understand long inputs

42
New cards

Why do LSTM encoder-decoder models struggle with relevance of words?

  • LSTMs consider the entire input sequence, but not all input words are equally important when generating each output, and a standard LSTM autoencoder does not know which input parts are most relevant

  • So some irrelevant words may influence the output

  • Model cannot easily focus on the most important parts of the input

43
New cards

What is attention in machine learning models?

  • Allows a model to focus on specific parts of the input data instead of treating all parts equally

  • Model can look at the most relevant words or features when producing an output

  • Improves model’s performance

44
New cards

How does attention help when processing sequences of words?

  • Allows a model to attend to different parts of the input sequence at the same time

  • Model doesn’t treat all previous words as equally important

  • Model gives more weight to the words that matter most for the current prediction

  • Assigns different importance values (weights) to different words in the sequence

45
New cards

Word embedding

  • Way of representing the meaning of words using numbers

  • Each word is converted into a vector (a list of numbers), i.e. fox = [0.01, 0.43…, 0.3]

  • Vectors are learned from large amounts of text which allow the model to capture relationships between words

46
New cards

How is similarity between words measured in word embeddings?

  • Words are represented as vectors in an n-dimensional space

  • The angle or dot product between vectors (measured using cosine similarity) shows how similar the words are

  • If 2 vectors point in similar directions, words are considered more similar in meaning.
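A minimal cosine-similarity sketch (the 2-D "embeddings" are invented for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|); closer to 1 means more similar directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine_similarity([1.0, 0.9], [0.9, 1.0]))  # close to 1: similar directions
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0: unrelated directions
```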

47
New cards

What does self-attention do in the encoder?

  • Allows the model to compute attention weights for each word (token) in the input sequence

  • Adjusts the meaning of a word depending on the surrounding words in the sentence

  • Weights show how important each word is relative to other words in the same input sentence

  • Model then uses these weights to combine information from different words when creating the representation of each word (vector)

48
New cards

What does self-attention do in the decoder?

  • Computes attention weights for each token in the input sequence

  • These weights show which input words are most important when generating the next word in the output sequence

  • Uses these values to decide which parts of the input sentence to focus on for predicting the next word.

49
New cards

How does self-attention compute a token’s representation?

  1. Applies the attention weights to the input words

  2. Calculates a weighted sum of the input word vectors (features)

This means each word’s representation (vector) can include information from other relevant words in the sentence

  • e.g. the representation of fox may include context from “quick“ and “brown“
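The two steps above can be sketched with toy dot-product attention (the word vectors are invented; real models use learned, higher-dimensional embeddings and separate query/key/value projections):

```python
import math

def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented 2-D vectors for three words of the example sentence:
vectors = {"quick": [1.0, 0.0], "brown": [0.8, 0.2], "fox": [0.5, 0.5]}

def attend(query, table):
    """Score each word against the query (dot product), softmax into attention
    weights, then build the representation as a weighted sum of word vectors."""
    words = list(table)
    scores = [sum(q * k for q, k in zip(query, table[w])) for w in words]
    weights = softmax(scores)
    rep = [sum(weights[i] * table[words[i]][d] for i in range(len(words)))
           for d in range(len(query))]
    return weights, rep

# The representation of "fox" now mixes in context from "quick" and "brown":
weights, rep = attend(vectors["fox"], vectors)
```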

50
New cards

What does the self-attention output represent?

  • Produces a set of values called attention weights

  • Each value represents how important a specific input word (token) is when interpreting another token

  • These values help the model determine which words in the input sequence are most relevant

51
New cards

How does the decoder use attention during translation?

  • It starts with a start token (<EOS>) and asks: “Which input words are most important for generating the next word?“

  • Calculates attention scores for each input word, then forms a context vector as a weighted sum of the encoder outputs

  • Context helps the model predict the next word, and the process repeats for the following words.

52
New cards

What are the limitations of single-head attention?

  • Computation time increases as the input sequence becomes longer because attention weights must be calculated for more words (tokens)

  • Limited ability to capture complex relationships between words, since a single attention head can only focus on one type of relationship at a time.

53
New cards

Why do LSTMs struggle with long input sequences?

  • They consider the entire input sequence but not all words are equally important for producing each output word

  • Model may treat irrelevant words as important, making it harder to focus on the most relevant parts of the input

54
New cards

What architecture does a transformer typically use?

  • Encoder: processes the input and maps it into a context representation

  • Decoder: uses the encoder’s representation to generate the output sequence

Both use a multi-layer architecture for the encoder and decoder

55
New cards

What is multi-head attention in transformers?

  • Allows the model to use multiple attention heads at the same time

  • Each head can focus on different relationships between words in the text

  • Captures more detailed and nuanced relationships in language

56
New cards

Why do transformers allow better parallelisation?

  • Uses multi-head attention, allowing the model to compute multiple attention weight calculations in parallel

  • Helps the model capture different relation types between words at the same time

57
New cards

What components are included in the transformer encoder and decoder layers?

  • Multi-head attention

  • Feed-forward neural networks

  • Add & norm layers for stability

  • Positional encoding to help the model understand word order

58
New cards

What are advantages of transformers?

  • Improved performance compared to traditional recurrent architectures

  • Can handle sequential data with variable-length inputs

  • Can be parallelised easily (e.g. using multiple GPUs and attention heads).

59
New cards

What are the main types of transformer models?

  • Encoder-only models e.g. BERT

  • Encoder-decoder models e.g. T5, BART

  • Decoder-only models e.g. GPT

60
New cards

What is an encoder-only (transformer) model?

  • Simpler architecture

  • Use fixed input length

  • Used for sentiment analysis

  • e.g. BERT

61
New cards

What is an encoder-decoder (transformer) model?

  • Can handle longer sequences and context

  • Used for translation

62
New cards

What is a decoder-only (transformer) model?

  • Simpler architecture

  • Used for text generation and translation

63
New cards

What is a Variational Autoencoder (VAE)?

  • A neural network designed to encode data and reconstruct it as accurately as possible

  • Encoder converts the input e.g. an image into a compressed numerical representation called an embedding vector (Z)

  • This vector captures important features of the input i.e. shapes or textures

  • The decoder then uses this representation to reconstruct the original image.

  • Goal: output image to be as close as possible to the original input

64
New cards

How are autoencoders trained?

  1. Split the dataset into training and test sets

  2. Build the encoder and decoder parts of the autoencoder

  3. Train the model using the training data, where the input and output are the same data

During training:

  • Model tries to reconstruct the same image it receives as input

  • Calculates the difference between the original and reconstructed image (reconstruction loss)

  • Model updates its parameters using backpropagation to reduce this error

65
New cards

How can the decoder in VAE generate new outputs after training?

After training:

  1. A vector (embedding) is defined e.g. x = [2.2, 2.5]

  2. The decoder predicts an output using this vector: decoder.predict(x)

  • The decoder then generates an image based on the vector representation

  • Different points in the embedding space correspond to different image features

66
New cards

What is a limitation of standard autoencoders?

  • When an image is encoded into a vector in the embedding space, the model does not guarantee that nearby vectors represent similar images

  • This means that even if 2 vectors are close in the embedding space, the generated images may still be very different

This happens because

  • Standard autoencoders are trained only to reconstruct each individual input

  • They do not learn relationships between different embedding vectors

67
New cards

How do Variational Autoencoders address its limitation?

  • Instead of representing an image as a single point (vector), they model it as a normal (Gaussian) probability distribution

The encoder outputs

  • A mean vector (centre of the distribution)

  • A variance vector (how spread out the distribution is)

The model then samples a point from this distribution.

This helps create a more structured embedding space for generating outputs
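Sampling from that distribution is often written z = μ + σ·ε, with ε drawn from a standard normal. A hedged sketch (the two-dimensional latent and the variable names are illustrative):

```python
import math
import random

def sample_latent(mean, log_var):
    """Draw z = mean + std * eps, where std = exp(0.5 * log_var) and eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

# A sampled point lands near the mean, with spread controlled by the variance:
z = sample_latent(mean=[2.2, 2.5], log_var=[-2.0, -2.0])
```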

68
New cards

What is a Generative Adversarial Network (GAN)?

  • Consists of 2 neural networks: generator and discriminator

  • They compete which helps the generator produce more realistic outputs

  • Goal: generate new data samples that look similar to real data

  • Often used in unsupervised learning tasks

69
New cards

What is noise?

A vector of random numbers with no meaning

70
New cards

What is the generator’s role in a GAN?

  • Responsible for creating new data samples that look like real data

How it works:

  • Starts with random noise as input

  • Generator transforms this noise into synthetic data

Training objective:

  • Improves by minimising its loss function

  • Goal: fool the discriminator into thinking the generated data is real

71
New cards

What is the discriminator’s role in a GAN?

  • Determines whether a sample is real data from a training dataset or fake data produced by the generator

  • It receives both real and generated samples and outputs a probability showing how likely the input is real

Training objective

  • Improves by maximising the loss function

  • Tries to correctly detect fake images produced by the generator

72
New cards

How is a GAN trained?

Alternates between updating the generator and the discriminator

  • Generator tries to make a generated sample look real by pushing the function D(G(z)) toward 1

  • The discriminator tries to correctly classify generated samples as fake by pushing the function toward 0

Function = D(G(z))

  • z = latent vector used to generate new data
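Those opposing pressures on D(G(z)) can be sketched with the usual log-losses (toy probabilities; real training computes these over batches of samples):

```python
import math

def generator_loss(d_of_gz):
    """Generator wants D(G(z)) -> 1, i.e. fakes classified as real."""
    return -math.log(d_of_gz)

def discriminator_loss_on_fake(d_of_gz):
    """Discriminator wants D(G(z)) -> 0, i.e. fakes detected as fake."""
    return -math.log(1.0 - d_of_gz)

# When the discriminator is fooled (D(G(z)) near 1), the generator's loss is low:
print(generator_loss(0.9) < generator_loss(0.1))  # True
```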

73
New cards

What are some application of GANs?

  • Image generation (e.g. faces/objects)

  • Data augmentation (for tasks like image classification)

  • Anomaly detection (finding unusual data points)

  • Style transfer and colourisation

  • Text-to-image synthesis

74
New cards

What are some limitation of GANs?

  • Mode collapse: generator produces limited variations of the same output

  • Vanishing gradients: generator’s weights become too small during training, making learning difficult

  • Unstable training: discriminator may become too powerful, causing the generator to converge to a poor solution

75
New cards

What is a diffusion model and what can it learn?

  • An unsupervised learning framework

  • Learns to reconstruct images using diffusion and reverse diffusion

  • Models complex distributions, even with noise and uncertainty

  • Main components: encoder, decoder, skip connections

76
New cards

What does the Diffusion model actually learn during training (reverse diffusion)?

  • It learns to predict the noise at each step

  • Then subtracts that noise gradually

  • This repeated refinement produces high-quality and diverse images

77
New cards

What is the role of U-net in Diffusion Models?

  • The main architecture used

  • Does not generate images directly

  • Predicts the noise in a noisy image so the model can remove it

78
New cards

What are the main parts of U-Net (network)?

  1. Noise-level embedding

  2. Encoder

  3. Decoder

  4. Skip connections

79
New cards

What inputs does the U-Net receive?

  • Noise variance (timestep t): tells how much noise is added

  • Noisy image: image at step t

Noise embedding spreads this timestep info across all pixels

80
New cards

What does the U-Net encoder do?

Downsampling Path

  • Takes combined features (images + noise info)

  • Gradually transforms: x0 —> x1 —> … —> xt

  • Reduces spatial size but increases features

  • Captures high-level semantic information

81
New cards

What does the U-Net decoder do?

  • Reverses the process: xt —> xt-1 … —> x0

  • Predicts noise and removes it step-by-step

  • Cannot jump directly from noisy to clean image

82
New cards

What are skip connections in U-Net and why are they important?

  • Direct links between encoder and decoder layers

  • Restore lost details like edges, textures, boundaries

  • Help decoder recover spatial information

  • Reduce issues like vanishing gradients

83
New cards

What are the three key components of a Diffusion model?

  • Noise schedule: controls how much noise is added at each step

  • Diffusion process: gradually adds noise to the image over many steps

  • Reverse diffusion: predicts and removes noise step-by-step to recover or generate images

84
New cards

What does the Noise schedule do in the Diffusion model?

  • Controls how quickly noise is added

  • Small noise is added gradually over many steps

  • Determines how fast the image becomes corrupted

85
New cards

What happens during the Diffusion process?

  • Noise is iteratively added to the image

  • Image transforms: x0 —> x1 —> x2 —> … —> xt

  • After many steps, the image becomes almost random noise

86
New cards

What happens during Reverse Diffusion?

  • Model learns how noise was added

  • Then predicts and removes noise step-by-step

  • Gradually reconstructs a clean image from noise

87
New cards

What are the main steps in Diffusion model training?

  1. Add noise to images (using noise rate and signal rate)

  2. Train model to predict noise and signal

  3. Remove noise from the image (recover clean version)

  4. Calculate loss (compare prediction vs true values)

  5. Update weights (where learning happens)

88
New cards

What is Noise rate in the Diffusion model?

How much noise is added

89
New cards

What is Signal rate in the Diffusion model?

How much of the original image remains
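Combining the two rates, one noising step is often written x_t = signal_rate · x_0 + noise_rate · ε with ε drawn from a standard normal (a hedged sketch; the pixel values are toy data):

```python
import random

def add_noise(x0, signal_rate, noise_rate):
    """Mix the original image with Gaussian noise according to the two rates."""
    return [signal_rate * p + noise_rate * random.gauss(0.0, 1.0) for p in x0]

image = [0.2, 0.5, 0.8]  # toy "image" of three pixel values
noisy = add_noise(image, signal_rate=0.9, noise_rate=0.1)
```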

90
New cards

What does the Diffusion model learn during training?

  • It estimates how much noise is present

  • Predicts noise and signal components

  • Uses this to recover the clean image

91
New cards

What are the main advantages of Diffusion models?
