AI


51 Terms

1

Generative Models

type of machine learning model that is used to generate new data samples based on a training set

2

Discriminative Models

A type of machine learning model that separates data points into different classes by learning the boundaries between them.

3

Language model

A type of machine learning model trained to estimate a probability distribution over words. The model tries to predict the most likely next word, or the word that best fills a blank in a sentence or phrase, based on the context of the given text.
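
As a rough illustration of "a probability distribution over words", here is a minimal bigram sketch that predicts the next word from the previous word only. The toy corpus and the function names are made up for this example; real language models use far more context and far more data.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev_word, next_word in zip(words, words[1:]):
            follow_counts[prev_word][next_word] += 1
    return follow_counts


def next_word_distribution(model, prev_word):
    """Turn raw follow counts into probabilities that sum to 1."""
    counts = model[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

model = train_bigram_model(corpus)
dist = next_word_distribution(model, "the")
best = max(dist, key=dist.get)  # most probable word after "the"
print(dist)
print("most likely next word:", best)
```

In this corpus "cat" follows "the" in 2 of 6 cases, so it is the predicted next word.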

4

Metaverse

A virtual-reality space where users can interact with a computer-generated environment and with other users. It relies on two technologies: virtual reality (VR) and augmented reality (AR).

5

Purpose of Generative Models

Model data distribution

6

Purpose of Discriminative Models

Model conditional probability of labels given data

7

Use Cases of Generative Models

Data generation, denoising, unsupervised learning

8

Use Cases of Discriminative Models

Classification, supervised learning tasks

9

Training Focus of Generative Models

Maximize probability of observed data, Capture data structure

10

Training Focus of Discriminative Models

Learn decision boundary, Differentiate between classes

11

Linear regression

is a statistical technique used to model the relationship between a dependent variable and one or more independent variables

12

Implementing linear regression using the scikit-learn library

Step 1: Importing the libraries/dataset

Step 2: Data pre-processing

Step 3: Splitting the dataset into training, validation, and test data

Step 4: Train model

Step 5: Evaluate the model
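
The five steps above can be sketched with scikit-learn roughly as follows. The dataset here is synthetic (generated with NumPy) and the 60/20/20 split is an assumption made for illustration. The split is done before the scaling step so that the scaler is fitted on training data only, which is a design choice rather than something the card prescribes.

```python
# Step 1: import the libraries and build a dataset (synthetic, for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))  # two made-up features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, size=200)

# Step 3: split into 60% training, 20% validation, 20% test data.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Step 2: pre-process (standardize features using training statistics only).
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

# Step 4: train the model.
model = LinearRegression().fit(X_train_s, y_train)

# Step 5: evaluate on validation data, then on held-out test data.
val_mse = mean_squared_error(y_val, model.predict(X_val_s))
test_mse = mean_squared_error(y_test, model.predict(X_test_s))
test_r2 = r2_score(y_test, model.predict(X_test_s))
print(f"validation MSE={val_mse:.3f}  test MSE={test_mse:.3f}  test R^2={test_r2:.3f}")
```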

13

k-NN

A simple machine learning technique used for classification and regression tasks. When making a prediction for a new data point, it looks at the k closest data points in the training dataset and predicts their majority label (classification) or their average value (regression).
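
A minimal sketch of k-NN classification in plain Python, assuming Euclidean distance and majority voting. The points, labels, and the function name knn_predict are invented for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small, made-up clusters labelled "A" and "B".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

pred_a = knn_predict(points, labels, (1.8, 1.6), k=3)
pred_b = knn_predict(points, labels, (6.2, 6.4), k=3)
print(pred_a, pred_b)
```

Note that there is no training step: the stored data is searched at prediction time.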

14

Pros of k-NN

  • Simple algorithm: needs only a value of k (usually an odd number, to avoid tied votes in two-class problems) and a distance function (for example, Euclidean distance).

  • Efficient method for small datasets.

  • Uses “lazy learning”: the training dataset is simply stored and is used only when making predictions. With no explicit training phase, it is quicker to train than Support Vector Machines (SVMs) and Linear Regression, although the work is shifted to prediction time.

15

Cons of k-NN

Large datasets take longer to process.

Requires feature scaling.

Failing to do so can result in wrong predictions, because features with larger numeric ranges dominate the distance.

Noisy data can result in overfitting or underfitting of data.

16

Classification

a supervised machine learning method where the model tries to predict the correct label of a given input data

17

Regression

a supervised machine learning technique which is used to predict continuous values

a method for understanding the relationship between independent variables and a dependent variable

18

Types of linear regression

Simple linear regression

Multiple linear regression

19

Difference between classification and regression

Classification Task: predict label

Regression Task: predict specific value

20

Overfitting

An undesirable machine learning behavior that occurs when the model gives accurate predictions for training data but not for new data

21

How to solve overfitting

Increase training data, simplify model architecture, regularize model parameters.

22

How to solve underfitting

Increase model complexity.

Increase the number of features by performing feature engineering.

Remove noise from the data.

Increase the number of epochs or increase the duration of training to get better results.

23

Underfitting

An undesirable machine learning behavior that occurs when a model is too simple to capture the complexity of the data. The model cannot learn the training data effectively, resulting in poor performance on both the training and testing data.

24

Difference between overfitting & underfitting

Overfitting:
Model is too complex, needs to reduce complexity
Perform well on training data and poorly on unseen data
Training accuracy is good, but validation accuracy is poor
Happens when we train the model on very noisy data
Low bias, high variance

Underfitting:
Model is too simple, needs to increase complexity
Perform poorly on both training data and unseen data
Both training accuracy and validation accuracy are poor
Happens when we have a very small amount of data
High bias, low variance
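
One way to see the contrast in practice is a small k-NN regression experiment, sketched below under assumptions chosen for illustration: noisy points around the line y = 2x, with k as the complexity knob. With k = 1 the model memorizes the training data (training error is zero, validation error is not), and with k equal to the training set size it always predicts the overall mean, which typically gives poor error on both sets.

```python
import random


def knn_regress(train_x, train_y, query_x, k):
    """Predict the average target of the k training points nearest to query_x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query_x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train_x, train_y, eval_x, eval_y, k):
    """Mean squared error of the k-NN regressor on (eval_x, eval_y)."""
    errors = [
        (knn_regress(train_x, train_y, x, k) - y) ** 2
        for x, y in zip(eval_x, eval_y)
    ]
    return sum(errors) / len(errors)


def noisy_line(n):
    """Made-up data: y = 2x plus Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 2) for x in xs]
    return xs, ys


random.seed(0)
train_x, train_y = noisy_line(40)
val_x, val_y = noisy_line(40)

results = {}
for k in (1, 5, 40):  # 1 = very flexible, 40 = as rigid as possible
    train_error = mse(train_x, train_y, train_x, train_y, k)
    val_error = mse(train_x, train_y, val_x, val_y, k)
    results[k] = (train_error, val_error)
    print(f"k={k:2d}  training MSE={train_error:7.2f}  validation MSE={val_error:7.2f}")
```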

25

Reasons for Overfitting

High variance and low bias

The model is too complex

The size of the training data is not enough

26

Reasons for underfitting

High bias and low variance

The model is too simple.

Training data is not cleaned and also contains noise in it.

27

neural network

A computational model inspired by the human brain, consisting of interconnected nodes called neurons. It learns from data through a process called training, adjusting the strength of connections between neurons to improve performance. It is used for tasks such as pattern recognition, classification, and regression.

28

Examples of neural network applications

speech and image recognition, spam email filtering, finance, and medical diagnosis

29

Advantages of neural networks

  • Parallel processing: Can handle multiple tasks at one time

  • Adaptability: Can learn and improve from experience

  • Non-linearity: Can model complex relationships

  • Fault tolerance: Can still work even with damaged nodes

  • Real-time processing: Can make quick, real-time predictions

30

multilayer perceptron (MLP)

A type of artificial neural network that consists of multiple layers of interconnected nodes, known as neurons. It is commonly used for classification and regression tasks.

31

Activation function

A function that computes the output of a neuron and so decides whether, and how strongly, the neuron is activated.

It is applied to the weighted sum of the neuron's inputs (plus a bias) to produce the output signal.

Its role is to turn the set of input values fed to a node into that node's output.

Ex: sigmoid, ReLU, and tanh.

32

Sigmoid

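A mathematical function with a characteristic S-shaped (sigmoid) curve. As an activation function it is sigmoid(x) = 1 / (1 + e^(-x)), which maps any real input to a value between 0 and 1.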

33

ReLU

A non-linear activation function that outputs the input directly if it is positive and outputs zero otherwise.

34

Tanh

similar to the sigmoid activation function and has the same S-shape. This function takes any real value as input and outputs values in the range -1 to 1
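
The three activation functions named on these cards can be written in a few lines of plain Python. This is a sketch for scalar inputs only; the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    """S-shaped curve with outputs between 0 and 1.

    Fine for moderate x; a very large negative x would overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive inputs through unchanged, output zero otherwise."""
    return max(0.0, x)


def tanh(x):
    """S-shaped curve with outputs between -1 and 1."""
    return math.tanh(x)


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.1f}  tanh={tanh(x):+.3f}")
```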

35

Loss function

a function that calculates the error between the actual output and the desired output in the neural network

36

Gradient descent

an optimization algorithm for finding a local minimum of a differentiable function.

It is used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible, by repeatedly stepping in the direction opposite to the gradient.
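
A minimal sketch of gradient descent on a one-variable function. The function f(x) = (x - 3)^2 + 1, the learning rate, and the step count are assumptions chosen so the minimum (x = 3) is easy to check by hand.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 + 1 has derivative f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(f"found minimum near x = {minimum:.6f}")
```

A learning rate that is too large can overshoot and diverge, while one that is too small converges slowly.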

37

Backpropagation learning algorithm

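Calculates the output by forward calculations given the input,

then calculates the error between the actual output and the desired output,

then propagates that error backward through the network to compute how much each weight contributed to it, and updates the weights accordingly.

The aim is to minimize the error, for example the mean squared error (MSE).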
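
To make the forward and backward passes concrete, here is a sketch for a deliberately tiny network: one input, one sigmoid hidden neuron, and one linear output neuron, trained on a single example with squared error. The weights, input, target, and learning rate are arbitrary values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> sigmoid hidden neuron -> linear output."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def loss(x, target, w1, w2):
    """Squared error between the network output and the desired output."""
    _, y_hat = forward(x, w1, w2)
    return (y_hat - target) ** 2


def backward(x, target, w1, w2):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(x, w1, w2)
    d_y_hat = 2.0 * (y_hat - target)   # dLoss/dy_hat
    grad_w2 = d_y_hat * h              # dLoss/dw2
    d_h = d_y_hat * w2                 # error passed back to the hidden neuron
    grad_w1 = d_h * h * (1.0 - h) * x  # sigmoid'(z) = h * (1 - h)
    return grad_w1, grad_w2


x, target = 1.5, 0.8
w1, w2 = 0.5, -0.3
initial_loss = loss(x, target, w1, w2)

for _ in range(200):
    grad_w1, grad_w2 = backward(x, target, w1, w2)
    w1 -= 0.1 * grad_w1  # gradient descent update
    w2 -= 0.1 * grad_w2

final_loss = loss(x, target, w1, w2)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.2e}")
```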

38

GPT1

A language model introduced by OpenAI in 2018. It uses unsupervised learning to pre-train on a large corpus of text data and can generate coherent and contextually relevant text. It has 117 million parameters and can perform tasks like text completion and text generation.

39

GPT2

A large language model developed by OpenAI and released in 2019 (it is a language model, not a chatbot). It is a transformer-based model with 1.5 billion parameters, trained on a large dataset of web text. It can generate text and, without task-specific training, attempt tasks such as translation, summarization, and question answering.

40

GPT3

A language model developed by OpenAI and introduced in 2020. It is known for its ability to generate human-like text. It was trained on a massive amount of internet text data and can perform a wide range of language-related tasks, including translation, question answering, and text generation. It has 175 billion parameters, which made it one of the largest language models at the time of its release.

41

Main difference between GPT-1, GPT-2, and GPT-3

Scaling. GPT-1 had 117M parameters, GPT-2 had 1.5B parameters, and GPT-3 has 175B parameters. The increase in parameters (together with more training data) allows for more complex and better language generation, making GPT-3 the most capable of the three.
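
The size of each jump follows directly from the parameter counts on this card; a quick arithmetic sketch:

```python
# Parameter counts as stated on the card.
params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

gpt2_vs_gpt1 = params["GPT-2"] / params["GPT-1"]  # about 12.8x
gpt3_vs_gpt2 = params["GPT-3"] / params["GPT-2"]  # about 116.7x
gpt3_vs_gpt1 = params["GPT-3"] / params["GPT-1"]  # about 1496x

print(f"GPT-2 is {gpt2_vs_gpt1:.1f}x the size of GPT-1")
print(f"GPT-3 is {gpt3_vs_gpt2:.1f}x the size of GPT-2")
print(f"GPT-3 is {gpt3_vs_gpt1:.0f}x the size of GPT-1")
```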

42

Difference between GPT, ChatGPT, and OpenAI

GPT is a type of large language model developed by OpenAI. It can be used for tasks like generating text, translating languages, writing many kinds of content, and answering questions.

ChatGPT is an AI chatbot developed by OpenAI. It is a variant of GPT and is designed to be more conversational than other LLMs.

OpenAI is an AI research organization that develops AI models and publishes research on AI.

43

Full Batch Learning

Training a machine learning model using the entire dataset in each iteration
Calculate the loss function's gradient by considering all the training examples at once
Expensive for large datasets
Provides accurate parameter updates

44

Mini-Batch Learning

A training technique in machine learning where the dataset is divided into smaller subsets called mini-batches.
Instead of updating the model after each individual data point, the model is updated after processing each mini-batch.
This approach balances computational efficiency and model optimization, making it suitable for large datasets.
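
The difference between full-batch and mini-batch updates can be sketched with one training loop where only the batch size changes. The model (y = w * x), the noise-free data, the learning rate, and the function names are assumptions chosen for illustration.

```python
import random


def make_batches(data, batch_size, rng):
    """Shuffle the data and cut it into consecutive batches."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def train(data, batch_size, epochs=50, learning_rate=0.01, seed=0):
    """Fit y = w * x by gradient descent, updating w once per batch."""
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    for _ in range(epochs):
        for batch in make_batches(data, batch_size, rng):
            # Gradient of the batch's mean squared error with respect to w.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
            updates += 1
    return w, updates


data = [(x, 2.0 * x) for x in range(1, 11)]  # made-up data with true w = 2

w_full, updates_full = train(data, batch_size=len(data))  # one update per epoch
w_mini, updates_mini = train(data, batch_size=2)          # five updates per epoch
print(f"full batch: w={w_full:.4f} after {updates_full} updates")
print(f"mini-batch: w={w_mini:.4f} after {updates_mini} updates")
```

Both runs see the same data the same number of times, but the mini-batch run makes five times as many parameter updates, each computed from fewer examples.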

45

Optimizer

An algorithm that updates a model's parameters (weights) during training in order to minimize the loss function. Examples: stochastic gradient descent (SGD), Adam, RMSprop.

46

Hyperparameter

A parameter whose value is set before training and controls the learning process, and so influences the values of the model parameters that the learning algorithm ends up learning. Examples: learning rate, batch size, number of epochs.

47

Mean Squared Error (MSE)

Measures the average squared difference between predicted and actual values.

48

Binary Cross-Entropy

Used in binary classification problems to measure the dissimilarity between predicted and actual class probabilities.

49

Categorical Cross-Entropy

Used in multi-class classification problems to measure the dissimilarity between predicted and actual class probabilities.

50

Mean Absolute Error (MAE)

Measures the average absolute difference between predicted and actual values.

51

List some loss functions

Mean Squared Error (MSE), Binary Cross-Entropy,
Categorical Cross-Entropy, Mean Absolute Error (MAE), Hinge Loss, Log Loss, Kullback-Leibler Divergence
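
The first four losses in this list can be sketched in plain Python as below. The function names, the small clipping constant eps, and the sample values are choices made for this example; libraries differ in details such as clipping and averaging.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds 0/1 labels; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true rows are one-hot labels; p_pred rows are class probabilities."""
    total = 0.0
    for true_row, pred_row in zip(y_true, p_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return total / len(y_true)


mse_value = mean_squared_error([3.0, 5.0], [2.0, 7.0])             # (1 + 4) / 2
mae_value = mean_absolute_error([3.0, 5.0], [2.0, 7.0])            # (1 + 2) / 2
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])               # ln 2
cce_value = categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.5, 0.3]])  # ln 2
print(mse_value, mae_value, bce_value, cce_value)
```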
