A collection of flashcards covering key concepts from a lecture on matrix calculus and supervised learning, including definitions and brief explanations.
Homework Due Date
Homework one is due on September 21.
Supervised Learning
A machine learning task where a model is trained on labeled data, i.e., input-output pairs.
Training Stimuli
Ordered pairs in supervised learning, written (s_i, y_i), representing input patterns and their corresponding desired responses.
Binary Classification
A classification task where the output is one of two values, typically 0 or 1.
Likelihood Function
A function that measures the probability of obtaining the observed data under a specific statistical model.
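One standard way to write it, using the (s_i, y_i) notation from the Training Stimuli card and assuming the training pairs are independent (notation assumed, not taken from the cards):

$$L(\theta) = \prod_{i=1}^{n} p(y_i \mid s_i;\, \theta)$$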
Maximum Likelihood Estimation
A method of estimating the parameters of a statistical model by choosing the parameter values that maximize the likelihood function.
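In practice the log-likelihood is usually maximized instead, since the logarithm turns the product into a sum without changing the maximizer (a standard identity, not lecture-specific):

$$\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid s_i;\, \theta)$$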
Loss Function
A function that measures the cost associated with a model's prediction errors.
Gradient Descent
An iterative optimization algorithm used to minimize a function by adjusting parameters in the opposite direction of the gradient.
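A minimal NumPy sketch of the update rule (the function and parameter names are illustrative, not from the lecture):

```python
import numpy as np

def gradient_descent(grad_fn, theta0, lr=0.1, n_steps=100):
    """Repeatedly step opposite the gradient to minimize a function."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - lr * grad_fn(theta)  # move against the gradient
    return theta

# Toy check: f(t) = (t - 3)^2 has gradient 2 * (t - 3), minimum at t = 3.
print(gradient_descent(lambda t: 2 * (t - 3.0), theta0=[0.0]))  # ~[3.]
```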
Adaptive Gradient Descent
A variant of gradient descent that adjusts the learning rate based on historical gradients.
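A sketch of an AdaGrad-style update, one common adaptive scheme (assumed here as the intended variant; names are illustrative):

```python
import numpy as np

def adagrad(grad_fn, theta0, lr=0.5, n_steps=200, eps=1e-8):
    """Per-parameter steps shrink as squared gradients accumulate."""
    theta = np.asarray(theta0, dtype=float)
    hist = np.zeros_like(theta)                  # historical squared gradients
    for _ in range(n_steps):
        g = grad_fn(theta)
        hist += g ** 2                           # accumulate history
        theta -= lr * g / (np.sqrt(hist) + eps)  # scale step per parameter
    return theta

# Same toy problem as above: converges toward the minimum at t = 3.
print(adagrad(lambda t: 2 * (t - 3.0), theta0=[0.0]))  # ~[3.]
```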
Softmax Function
A function that converts raw output scores (logits) into probabilities that sum to 1, often used in multi-class classification problems.
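A minimal NumPy version (subtracting the maximum logit is a common numerical-stability trick, assumed here):

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    z = logits - np.max(logits)  # stability: avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```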
Cross Entropy Function
A loss function commonly used in classification tasks, measuring the difference between two probability distributions.
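A sketch for discrete distributions (the eps guard is an implementation convenience, not part of the definition):

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_k p_k * log(q_k)."""
    return -np.sum(np.asarray(p_true) * np.log(np.asarray(q_pred) + eps))

# With a one-hot target this reduces to -log of the probability
# assigned to the correct class:
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))  # ~0.357 = -log(0.7)
```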
Sigmoidal Function
An S-shaped function that maps any real-valued input into the interval (0, 1), often used to turn predictions into probabilities; the logistic function is the standard example.
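The logistic form, as a one-liner:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: 1 / (1 + exp(-x)), output strictly in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.119, 0.5, 0.881]
```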
Empirical Risk Function
The average loss of a model's predictions over the training data, used as a computable stand-in for the true (expected) risk.
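One standard way to write it, assuming a per-example loss ℓ and a model f_θ (symbols assumed, not taken from the cards):

$$\hat{R}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(s_i),\, y_i\big)$$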
Gradient of the Loss Function
The vector of partial derivatives of the loss with respect to the parameters; it points in the direction of steepest increase of the loss, so optimization steps move against it.
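Written out for parameters θ = (θ_1, …, θ_d), using standard notation:

$$\nabla_\theta \ell = \left( \frac{\partial \ell}{\partial \theta_1}, \ldots, \frac{\partial \ell}{\partial \theta_d} \right)^{\top}$$

Gradient descent (above) steps against this vector.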
Mini-batch Learning
A training approach that splits data into small batches, combining benefits of both batch and stochastic gradient descent.
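A sketch of the batching step only, shuffling and then slicing example indices (names are illustrative):

```python
import numpy as np

def minibatches(n_examples, batch_size, seed=0):
    """Shuffle example indices, then yield them one small batch at a time."""
    order = np.random.default_rng(seed).permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]

for idx in minibatches(n_examples=10, batch_size=4):
    print(idx)  # three batches of sizes 4, 4, and 2
```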
One-hot Encoding
A method of representing categorical variables as binary vectors, where each vector has a 1 for the active category and 0s elsewhere.
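A minimal NumPy implementation:

```python
import numpy as np

def one_hot(labels, n_classes):
    """Each row gets a single 1 at the position of its class label."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([0, 2, 1], n_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```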
Output Unit in Neural Networks
A unit in the final layer of a neural network; the activations of the output units together form the network's prediction.
Gaussian Distribution
A continuous probability distribution characterized by a bell-shaped curve, defined by its mean and variance.
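For reference, the one-dimensional density with mean μ and variance σ²:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$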
Mean Squared Error
A common loss function used for regression tasks, measuring the average of the squares of the errors.
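A direct NumPy translation of the definition:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_i - yhat_i)^2)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.417
```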