These 72 vocabulary flashcards cover essential terms and definitions from the UT Dallas lecture on machine-learning algorithms, focusing on neural networks, activation functions, loss metrics, optimization, and convolutional architectures.
Artificial Intelligence
Any technique that enables computers to mimic human behavior or perform tasks that normally require human intelligence.
Machine Learning
The ability of a computer system to learn from data without being explicitly programmed for each rule.
Deep Learning
A subset of machine learning that extracts patterns directly from raw data using multi-layer neural networks.
Big Data
Extremely large data sets whose size and complexity enable modern learning algorithms to perform well.
TensorFlow
Google’s open-source software library for numerical computation and large-scale machine learning.
PyTorch
Facebook’s open-source deep-learning framework that offers dynamic computation graphs and GPU acceleration.
Perceptron
The simplest feed-forward neural unit that combines inputs with weights, adds a bias, then applies a non-linear activation.
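A minimal NumPy sketch of the computation this card describes: weighted inputs plus a bias, passed through a non-linear activation (the inputs, weights, and choice of sigmoid here are illustrative assumptions, not values from the lecture):

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, followed by a non-linear activation (sigmoid here)."""
    z = np.dot(w, x) + b              # linear combination of inputs and weights, shifted by the bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid squashes z into (0, 1)

# Example: two inputs with arbitrary weights and bias
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
print(perceptron(x, w, b=0.1))
```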
Forward Propagation
The process of computing outputs by passing inputs through the network’s layers from input to output.
Bias (Neural Networks)
An additional learnable parameter that shifts the activation function, improving model flexibility.
Weight (Neural Networks)
A learnable coefficient that scales input values; the core parameters adjusted during training.
Activation Function
A non-linear function (e.g., ReLU, sigmoid) applied to neurons to enable complex pattern learning.
Sigmoid Function
An S-shaped activation g(z)=1/(1+e^{-z}) that outputs values between 0 and 1, useful for binary classification.
Hyperbolic Tangent (tanh)
Activation g(z)=tanh(z) producing outputs between −1 and 1; zero-centered alternative to sigmoid.
Rectified Linear Unit (ReLU)
Activation g(z)=max(0,z) that keeps positive inputs and sets negatives to zero, speeding convergence.
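The sigmoid, tanh, and ReLU cards above translate directly into NumPy; a small sketch with an arbitrary sample input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # S-shaped, outputs in (0, 1)

def tanh(z):
    return np.tanh(z)                  # zero-centered, outputs in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # keeps positive inputs, zeroes out negatives

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z))
```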
Maxout
An activation that outputs the maximum of a set of linear functions, generalizing ReLU and alleviating saturation.
Multilayer Perceptron (MLP)
A feed-forward neural network with one or more hidden layers between input and output.
Input Layer
The first layer of a network that receives raw feature vectors from the data set.
Hidden Layer
Intermediate layer(s) of neurons that learn internal feature representations.
Output Layer
The final layer that produces predictions such as probabilities or continuous values.
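Taken together, the forward-propagation, MLP, and layer cards describe a single computation; a minimal sketch of one forward pass through an input, hidden, and output layer (the shapes and random weights are placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Input layer: a single example with 4 raw features
x = np.array([0.2, -0.5, 1.0, 0.3])

# Hidden layer: 8 neurons learning an internal representation
# (weights are random placeholders for illustration)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
h = relu(x @ W1 + b1)

# Output layer: one neuron producing the final prediction
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
y_hat = h @ W2 + b2
print(y_hat)
```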
Loss Function
A metric that quantifies prediction error; minimized during training to improve performance.
Mean Squared Error (MSE)
Loss defined as the average of squared differences between targets and predictions: (1/n)Σ(y−ŷ)².
Mean Absolute Error (MAE)
Loss defined as the average of absolute differences between targets and predictions: (1/n)Σ|y−ŷ|.
Binary Cross-Entropy (Log Loss)
Loss for binary classification: −[y log ŷ + (1−y) log(1−ŷ)].
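The MSE, MAE, and binary cross-entropy cards can be coded straight from their formulas; a small NumPy sketch in which the targets and predictions are made-up values and each loss is averaged over the batch:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)          # (1/n) * sum of squared errors

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))         # (1/n) * sum of absolute errors

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)      # clip to avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.6])
print(mse(y, y_hat), mae(y, y_hat), binary_cross_entropy(y, y_hat))
```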
Categorical Cross-Entropy
Loss that measures prediction error over multiple mutually exclusive classes.
Sparse Categorical Cross-Entropy
Cross-entropy formulation that expects integer class labels instead of one-hot vectors.
Gradient Descent
An optimization algorithm that updates parameters in the direction of the negative gradient of the loss.
Learning Rate
A hyperparameter controlling the step size during gradient descent updates.
Weight Update Rule
Parameter adjustment formula w ← w − η ∂L/∂w, where η is the learning rate.
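A toy gradient-descent loop applying the update rule w ← w − η ∂L/∂w to a one-parameter model; the data, initial weight, and learning rate are assumptions chosen for illustration:

```python
import numpy as np

# Toy setup: fit y = w * x with a mean-squared-error loss
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])    # true relationship has w = 2

w = 0.0          # initial weight
eta = 0.1        # learning rate (step size)

for step in range(50):
    y_hat = w * x
    grad = np.mean(2 * (y_hat - y) * x)   # dL/dw for the MSE loss
    w = w - eta * grad                    # weight update rule: w <- w - eta * dL/dw

print(w)  # converges toward 2.0
```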
Derivative
The instantaneous rate of change of a function; foundation for optimization in neural networks.
Partial Derivative
The derivative of a multivariable function with respect to one variable while keeping others constant.
Cost Function
Overall measure of model error, often the average loss across the entire training set.
Overfitting
When a model learns noise and specific details of training data, harming generalization to new data.
Convolutional Neural Network (CNN)
A neural architecture that employs convolutional layers to automatically learn spatial hierarchies of features from images.
Convolution Operation
Sliding a filter over input data, performing element-wise multiplication and summing to extract local patterns.
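A minimal sketch of the sliding-window operation this card describes (stride 1, no padding); the 4×4 input and 3×3 filter are arbitrary examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the input, multiplying element-wise and summing at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1       # output height (no padding, stride 1)
    ow = image.shape[1] - kw + 1       # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image  = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)        # simple vertical-edge style filter
print(conv2d(image, kernel))                     # 2x2 feature map
```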
Filter (Kernel)
A small matrix of weights applied during convolution to detect specific features such as edges or textures.
Feature Map
The output matrix produced after applying a filter over the input through convolution.
Parameter Sharing
Reusing the same filter weights across different spatial locations, greatly reducing model parameters.
Local Connectivity
Each neuron connects only to a local region of the previous layer, capturing spatially local patterns.
Pooling Layer
A layer that down-samples feature maps, reducing dimensionality and computation.
Max Pooling
Pooling method that keeps the maximum value within each sub-region of the feature map.
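A sketch of 2×2 max pooling with stride 2 over one feature map (the values are made up):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value inside each size x size window."""
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 2],
               [0, 2, 5, 7],
               [1, 1, 3, 4]], dtype=float)
print(max_pool(fm))   # [[6, 2], [2, 7]]
```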
Spatial Invariance
Model property enabling recognition of objects regardless of their position or small deformations in the image.
Representation Learning
The automatic discovery of useful feature hierarchies directly from raw data.
Fully Connected Layer
A dense layer where every neuron is connected to all activations from the previous layer.
Softmax Function
Activation that converts logits into a probability distribution over multiple classes.
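The softmax card as a short NumPy function; the logits below are arbitrary, and the maximum is subtracted only for numerical stability:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into a probability distribution over classes."""
    z = logits - np.max(logits)        # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())              # probabilities sum to 1
```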
Flatten Layer
Operation that reshapes multidimensional feature maps into a one-dimensional vector for dense layers.
Image as Matrix
Concept that digital images are arrays of integers (0-255) representing pixel intensities.
Low-Level Features
Basic patterns like edges or corners learned in early convolutional layers.
Mid-Level Features
Intermediate patterns such as eyes, noses, wheels, or windows learned by deeper layers.
High-Level Features
Complex, task-specific concepts like full faces or objects formed in the deepest layers.
Downsampling
Process of reducing spatial resolution, typically via pooling, to decrease computation and encourage invariance.
Upsampling (Transposed Convolution)
Operation that increases spatial resolution, used to reconstruct high-resolution outputs such as segmentation maps.
Fully Convolutional Network (FCN)
A network composed only of convolutional and upsampling layers, used for tasks like semantic segmentation.
Object Detection
Task of identifying and localizing multiple objects within an image by drawing bounding boxes and classifying them.
Region Proposal
A candidate bounding box likely to contain an object, used as input for detection pipelines.
R-CNN
Region-based CNN that classifies region proposals but suffers from slow inference due to separate CNN passes.
Faster R-CNN
Improved detection model that learns region proposals with an internal network and runs a single CNN pass per image.
Region Proposal Network (RPN)
Sub-network in Faster R-CNN that generates object region candidates directly from convolutional feature maps.
Semantic Segmentation
Pixel-wise classification task assigning every image pixel to a semantic class label.
Receptive Field
The region of the input image that influences the activation of a particular neuron.
Stride
The number of pixels a filter moves at each step during convolution or pooling operations.
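Stride, filter size, and padding together fix the output size of a convolution or pooling layer; a tiny sketch of the standard formula out = (in − kernel + 2·padding) / stride + 1 (the sizes below are illustrative):

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial size of the feature map produced by a convolution or pooling layer."""
    return (in_size - kernel + 2 * padding) // stride + 1

print(conv_output_size(32, kernel=3, stride=1, padding=0))  # 30
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # 16
```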
Slope
In calculus, the change in y divided by the change in x; generalized in ML as the derivative of a function at a point.
Gradient
Vector of partial derivatives indicating the direction and magnitude of the steepest loss increase.
Sparse Connections
Network property where each neuron connects to only a subset of the previous layer's activations, reducing parameters.
Pooling Benefits
Dimensionality reduction, decreased overfitting, and tolerance to small spatial distortions.
Channel (Feature Depth)
The third dimension in image tensors representing color channels or multiple feature maps.
Classification
Predictive task where the output variable represents discrete class labels.
Regression
Predictive task where the output variable is continuous, such as a real-valued number.
Probability Output
Model output between 0 and 1 indicating confidence in a prediction, often produced by sigmoid or softmax.
Non-Linear Activation
Function that introduces non-linearity, enabling networks to learn complex, non-linear mappings.
Hyperparameter
A configuration variable (e.g., learning rate, filter size) set before training and not learned from data.
Hardware Acceleration (GPU)
Use of specialized hardware to perform parallel computations, drastically speeding up deep-learning training.
Backpropagation
Algorithm that computes gradients of the loss with respect to every parameter by applying the chain rule backward through the network, layer by layer.
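A compact sketch of backpropagation for a one-hidden-layer network with sigmoid activations and a squared-error loss; the architecture, data, and learning rate are all assumptions for illustration rather than the lecture's example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                 # 4 samples, 3 features (toy data)
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # toy targets

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # hidden -> output
eta = 0.5                                       # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backward pass: apply the chain rule from the loss toward the input
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # gradient at the output layer (squared error + sigmoid)
    d_h = (d_out @ W2.T) * h * (1 - h)          # gradient pushed back to the hidden layer

    # Gradient-descent updates for every parameter
    W2 -= eta * h.T @ d_out;  b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;    b1 -= eta * d_h.sum(axis=0)

print(np.round(y_hat, 2))   # predictions move toward the targets
```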