Generative Models
A type of machine learning model that learns the distribution of its training data so that it can generate new data samples resembling that data.
Discriminative Models
A type of machine learning model that separates data points into different classes by learning the boundaries between them.
Language model
A type of machine learning model trained to produce a probability distribution over words. Given the context of the surrounding text, the model tries to predict the most likely next word, or the word that best fills a blank in a sentence or phrase.
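As a rough illustration of "a probability distribution over words", the sketch below builds a tiny bigram model: it counts which word follows which, then turns the counts into probabilities. The corpus and the function names (next_word_distribution, predict_next) are made up for this example; real language models are far larger and use neural networks rather than raw counts.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each preceding word (bigram counts).
follow_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[prev_word][next_word] += 1


def next_word_distribution(prev_word):
    """Return P(next word | prev_word) as a dict of probabilities."""
    counts = follow_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def predict_next(prev_word):
    """Return the most probable next word (assumes prev_word was seen in the corpus)."""
    dist = next_word_distribution(prev_word)
    return max(dist, key=dist.get)


print(next_word_distribution("the"))  # "cat" gets 2/3, "mat" gets 1/3
print(predict_next("the"))            # cat
```

Here the context is only the single previous word; larger models condition on much longer contexts, but the output is still a probability for each candidate next word.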
Metaverse
A virtual-reality space where users can interact with a computer-generated environment and with other users. It draws on two technologies: virtual reality (VR) and augmented reality (AR).
Purpose of Generative Models
Model data distribution
Purpose of Discriminative Models
Model conditional probability of labels given data
Use Cases of Generative Models
Data generation, denoising, unsupervised learning
Use Cases of Discriminative Models
Classification, supervised learning tasks
Training Focus of Generative Models
Maximize the probability of the observed data; capture the structure of the data
Training Focus of Discriminative Models
Learn the decision boundary; differentiate between classes
Linear regression
is a statistical technique used to model the relationship between a dependent variable and one or more independent variables
Implementing linear regression using the scikit-learn library
Step 1: Importing the libraries/dataset
Step 2: Data pre-processing
Step 3: Splitting the dataset into training, validation, and test data
Step 4: Train model
Step 5: Evaluate the model
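The five steps above can be sketched with scikit-learn roughly as follows. This is a minimal example under several assumptions that are not in the notes: the data is synthetic (make_regression), the split is 60/20/20, and pre-processing is plain standardization. The scaler is fitted after the split, on the training data only, so that no information from the validation or test sets leaks into training.

```python
# Step 1: import the libraries and load a dataset (synthetic here, as an assumption).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# Step 3: split into training (60%), validation (20%), and test (20%) data.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Step 2: pre-process. Fit the scaler on the training split only, then apply it everywhere.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

# Step 4: train the model.
model = LinearRegression().fit(X_train_s, y_train)

# Step 5: evaluate on data the model has not seen.
val_r2 = r2_score(y_val, model.predict(X_val_s))
test_mse = mean_squared_error(y_test, model.predict(X_test_s))
print(f"validation R^2: {val_r2:.3f}")
print(f"test MSE: {test_mse:.1f}")
```

The validation score is the one to look at while adjusting the model; the test set is kept for a single final check.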
k-NN
A simple machine learning technique used for classification and regression tasks. When making a prediction for a new data point, it looks at the k closest data points in the training dataset and returns their majority label (classification) or their average value (regression).
Pros of k-NN
Simple algorithm: it needs only a value of k (often an odd number, to avoid ties in binary classification) and a distance function (for example, Euclidean distance).
Efficient method for small datasets.
Uses "lazy learning": the training dataset is simply stored and is used only when making predictions. There is effectively no training step, which makes k-NN quicker to set up than Support Vector Machines (SVMs) or linear regression, although each prediction is slower.
Cons of k-NN
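Prediction is slow on large datasets, because every query is compared against the stored training points.
Requires feature scaling; without it, features with large ranges dominate the distance and predictions can be wrong.
Noisy data can distort predictions: a small k tends to overfit the noise, while a very large k tends to underfit.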
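To make the k-NN prediction step concrete, here is a minimal classification sketch in plain Python. The sample points, labels, and the function name knn_predict are invented for illustration; in practice a library implementation would normally be used.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance; all the work happens at prediction time.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]


# Two small, well-separated clusters (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))  # A
print(knn_predict(points, labels, (6.0, 6.5), k=3))  # B
```

Because the distance is computed on raw feature values, a feature measured on a much larger scale would dominate the result, which is why feature scaling matters for k-NN.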
Classification
a supervised machine learning method where the model tries to predict the correct label of a given input data
Regression
a supervised machine learning technique which is used to predict continuous values
a method for understanding the relationship between independent variables and a dependent variable
Types of linear regression
Simple linear regression (one independent variable)
Multiple linear regression (two or more independent variables)
Difference between classification and regression
Classification task: predict a discrete label (category)
Regression task: predict a continuous numeric value
Overfitting
An undesirable machine learning behavior that occurs when the model gives accurate predictions for training data but not for new data
How to solve overfitting
Increase training data, simplify model architecture, regularize model parameters.
How to solve underfitting
Increase model complexity.
Increase the number of features by performing feature engineering.
Remove noise from the data.
Increase the number of epochs, or otherwise train for longer.
Underfitting
An undesirable machine learning behavior that occurs when a model is too simple to capture the complexity of the data. The model fails to learn the training data effectively, resulting in poor performance on both the training and the testing data.
Difference between overfitting & underfitting
Overfitting:
Model is too complex; its complexity needs to be reduced
Performs well on training data and poorly on unseen data
Training accuracy is good, but validation accuracy is poor
Can happen when the model is trained on noisy data or on too little data
Low bias, high variance
Underfitting:
Model is too simple; its complexity needs to be increased
Performs poorly on both training data and unseen data
Both training accuracy and validation accuracy are poor
Can happen when the model is trained for too few epochs or on features that do not capture the pattern
High bias, low variance
Reasons for Overfitting
High variance and low bias
The model is too complex
The training dataset is too small
Reasons for underfitting
High bias and low variance
The model is too simple.
The training data has not been cleaned and contains noise.
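The contrast between underfitting and overfitting can be seen by fitting polynomials of different degrees to the same noisy data and comparing training error with validation error. The sketch below assumes NumPy and uses a made-up dataset (a noisy sine curve); the degrees 1, 3, and 9 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small noisy dataset: y = sin(2*pi*x) + noise (invented for illustration).
x_train = np.linspace(0.0, 1.0, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_val = np.linspace(0.04, 0.96, 12)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0.0, 0.2, x_val.size)


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_mse = float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))
    return train_mse, val_mse


# Degree 1 is too simple for a sine curve, degree 9 is very flexible.
results = {degree: fit_and_score(degree) for degree in (1, 3, 9)}
for degree, (train_mse, val_mse) in results.items():
    print(f"degree {degree}: train MSE = {train_mse:.4f}, validation MSE = {val_mse:.4f}")
```

The straight line (degree 1) has high error on both sets, which is the underfitting pattern. Training error keeps falling as the degree grows; when validation error stops falling with it, or rises, the extra flexibility is fitting noise, which is the overfitting pattern.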
Neural network
A computational model inspired by the human brain, consisting of interconnected nodes called neurons. It learns from data through a process called training, adjusting the strength of connections between neurons to improve performance. It is used for tasks such as pattern recognition, classification, and regression.
Examples of neural network applications
Speech and image recognition, spam email filtering, financial forecasting, and medical diagnosis
Advantages of neural networks
Parallel processing: Can handle multiple tasks at one time
Adaptability: Can learn and improve from experience
Non-linearity: Can model complex relationships
Fault tolerance: Can still work even with damaged nodes
Real-time processing: Can make quick, real-time predictions
Multilayer perceptron (MLP)
A type of artificial neural network that consists of multiple layers of interconnected nodes, known as neurons. It is commonly used for classification and regression tasks.
Activation function
A function that calculates the output of a neuron and decides whether, and how strongly, the neuron is activated.
It is applied to the weighted sum of the neuron's inputs (plus a bias) to produce the output signal.
Its role is to turn a set of input values into an output that is fed to the next node, and to introduce non-linearity into the network.
Examples: sigmoid, ReLU, and tanh.
Sigmoid
A mathematical function with a characteristic S-shaped (sigmoid) curve. The logistic sigmoid, 1 / (1 + e^(-x)), maps any real input to a value between 0 and 1.
ReLU
A non-linear activation function that outputs the input directly if it is positive and outputs zero otherwise.
Tanh
similar to the sigmoid activation function and has the same S-shape. This function takes any real value as input and outputs values in the range -1 to 1
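The three activation functions above are short enough to write out directly. The sketch below uses only the Python standard library; neuron_output is a hypothetical helper added to show where the activation sits relative to the weighted sum.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real x into (0, 1)."""
    # Two branches avoid overflow in exp() for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """ReLU: passes positive inputs through, outputs zero otherwise."""
    return max(0.0, x)


def tanh(x):
    """Hyperbolic tangent: S-shaped like sigmoid, but with outputs in (-1, 1)."""
    return math.tanh(x)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


for value in (-2.0, 0.0, 2.0):
    print(value, sigmoid(value), relu(value), tanh(value))
```

For example, neuron_output([1.0, 2.0], [0.5, 0.25], -1.0, sigmoid) has a weighted sum of exactly zero, so the sigmoid returns 0.5.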
Loss function
a function that calculates the error between the actual output and the desired output in the neural network
Gradient descent
an optimization algorithm for finding a local minimum of a differentiable function.
It is used to find the values of a function's parameters (coefficients) that minimize a cost function, by repeatedly stepping in the direction opposite to the gradient.
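A minimal sketch of gradient descent on a one-variable function follows. The function f(x) = (x - 3)^2, the learning rate, and the step count are arbitrary choices for illustration; the gradient f'(x) = 2(x - 3) is supplied by hand.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has its minimum at x = 3; its gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)  # very close to 3.0
```

With a learning rate that is too large the steps overshoot and can diverge; with one that is too small, convergence is slow. For this function any rate between 0 and 1 converges.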
Backpropagation learning algorithm
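First calculates the output by forward calculation from the input,
then calculates the error between the actual output and the desired output,
and finally propagates that error backward through the network, using the chain rule to obtain the gradient for each weight so the weights can be updated.
The aim is to minimize a loss such as the mean squared error (MSE).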
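The forward pass, error calculation, and backward pass can be sketched for a very small network: one input, one tanh hidden neuron, and one linear output, trained on a single example with squared error. The network shape, starting weights, and training example are all assumptions chosen to keep the arithmetic readable.

```python
import math


def forward(params, x):
    """Forward pass: input -> tanh hidden neuron -> linear output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the desired output."""
    _, y = forward(params, x)
    return (y - target) ** 2


def backward(params, x, target):
    """Backward pass: chain rule from the loss back to every parameter."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = 2.0 * (y - target)       # dLoss/dy
    dw2 = dy * h                  # output weight
    db2 = dy                      # output bias
    dh = dy * w2                  # error sent back to the hidden neuron
    dz = dh * (1.0 - h ** 2)      # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dw1 = dz * x                  # hidden weight
    db1 = dz                      # hidden bias
    return [dw1, db1, dw2, db2]


def train(params, x, target, learning_rate=0.1, steps=200):
    """Gradient descent using the gradients computed by backpropagation."""
    params = list(params)
    for _ in range(steps):
        grads = backward(params, x, target)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


initial_params = [0.3, 0.1, -0.2, 0.0]  # w1, b1, w2, b2 (arbitrary starting values)
x, target = 0.5, 0.8                    # a single made-up training example
trained_params = train(initial_params, x, target)
print("loss before:", loss(initial_params, x, target))
print("loss after: ", loss(trained_params, x, target))
```

A common sanity check for a hand-written backward pass is to compare its gradients with finite-difference estimates of the loss; the two should agree to several decimal places.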
GPT-1
A language model introduced by OpenAI in 2018. It uses unsupervised learning to pre-train on a large corpus of text and can generate coherent, contextually relevant text. It has 117 million parameters and can perform tasks such as text completion and text generation.
GPT-2
A large language model developed by OpenAI and released in 2019. It is a transformer-based model with 1.5 billion parameters, trained on a large dataset of web text. It is a text-generation model rather than a chatbot: it can generate text and, to a limited degree, translate, summarize, and answer questions without task-specific training.
GPT-3
A language model developed by OpenAI and released in 2020, known for its ability to generate human-like text. It was trained on a very large amount of internet text and can perform a wide range of language tasks, including translation, question answering, and text generation. It has 175 billion parameters, which made it one of the largest language models at the time of its release.
Main difference between GPT-1, GPT-2, and GPT-3
Scale. GPT-1 has 117M parameters, GPT-2 has 1.5B parameters, and GPT-3 has 175B parameters. The increase in parameters, together with more training data, allows more complex and higher-quality language generation, making GPT-3 the most capable of the three.
Difference between GPT, ChatGPT, and OpenAI
GPT is a family of large language models developed by OpenAI. It can be used for tasks like generating text, translating languages, writing many kinds of content, and answering questions.
ChatGPT is an AI chatbot developed by OpenAI. It is a variant of GPT that has been tuned to be more conversational.
OpenAI is the AI research organization that develops these models and publishes research on AI.
Full Batch Learning
Training a machine learning model using the entire dataset in each iteration
Calculates the gradient of the loss function from all the training examples at once
Computationally expensive for large datasets
Gives the exact gradient over the training set, so parameter updates are accurate and stable
Mini-Batch Learning
A training technique in machine learning where the dataset is divided into smaller subsets called mini-batches.
Instead of updating the model after each individual data point, the model is updated after processing each mini-batch.
This approach balances computational efficiency and model optimization, making it suitable for large datasets.
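The difference between full-batch and mini-batch learning comes down to how many examples feed each parameter update. In the sketch below, the same hypothetical function train_linear fits y = w*x + b by gradient descent on MSE; setting batch_size to the dataset size gives full-batch learning, while a smaller value gives mini-batch learning. The data, learning rate, and epoch count are assumptions chosen so the example converges.

```python
import random


def train_linear(data, batch_size, learning_rate=0.3, epochs=300, seed=0):
    """Fit y = w*x + b with gradient descent on MSE, one update per batch."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new batch composition each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Noise-free points on the line y = 2x + 1 (made-up data).
data = [(i / 19, 2 * (i / 19) + 1) for i in range(20)]

w_mini, b_mini = train_linear(data, batch_size=4)          # 5 updates per epoch
w_full, b_full = train_linear(data, batch_size=len(data))  # 1 update per epoch
print("mini-batch:", w_mini, b_mini)
print("full batch:", w_full, b_full)
```

Both runs approach w = 2 and b = 1. The mini-batch run makes five updates per pass over the data instead of one, which is where much of its efficiency on large datasets comes from.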
Optimizer
An algorithm that updates a model's parameters during training in order to minimize the loss function. Examples include gradient descent, stochastic gradient descent (SGD), and Adam.
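Hyperparameter
A setting chosen before training, such as the learning rate, the batch size, the number of epochs, or k in k-NN. Hyperparameters control the learning process and so influence the values of the model parameters that the learning algorithm ends up with.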
Mean Squared Error (MSE)
Measures the average squared difference between predicted and actual values.
Binary Cross-Entropy
Used in binary classification problems to measure the dissimilarity between predicted and actual class probabilities.
Categorical Cross-Entropy
Used in multi-class classification problems to measure the dissimilarity between predicted and actual class probabilities.
Mean Absolute Error (MAE)
Measures the average absolute difference between predicted and actual values.
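The four loss functions defined above can be written in a few lines each. This is a plain-Python sketch with hypothetical function names; the eps clipping in the cross-entropy functions is an added safeguard against log(0), and categorical cross-entropy assumes one-hot true labels.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between predicted and actual values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds 0/1 labels; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """y_true holds one-hot rows; p_pred holds rows of class probabilities."""
    total = 0.0
    for true_row, pred_row in zip(y_true, p_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return total / len(y_true)


print(mean_squared_error([1, 2, 3], [1, 2, 5]))   # 4/3
print(mean_absolute_error([1, 2, 3], [1, 2, 5]))  # 2/3
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.1, 0.8, 0.1]]))
```

MSE penalizes large errors more heavily than MAE because the differences are squared; the two cross-entropy losses grow as the probability assigned to the true class shrinks.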
List some loss functions
Mean Squared Error (MSE), Mean Absolute Error (MAE), Binary Cross-Entropy, Categorical Cross-Entropy, Log Loss (another name for cross-entropy loss), Hinge Loss, Kullback-Leibler Divergence