Dimensionality Reduction
The process of reducing the number of dimensions (attributes) of a dataset to improve analysis and visualization.
Principal Component Analysis (PCA)
A dimensionality reduction technique that projects data onto a lower-dimensional space while maximizing the variance of the projected data.
Eigenvalues
Values that represent the variance captured by each principal component in PCA.
Eigenvectors
Vectors that define the directions of the axes in the PCA transformed space.
Covariance Matrix
A matrix that indicates the extent to which two variables change together, used in PCA to find eigenvalues and eigenvectors.
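To tie these four ideas together, here is a minimal PCA sketch using only NumPy; the data matrix `X` and component count `k` are hypothetical inputs.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)            # center each feature at zero
    cov = np.cov(X_centered, rowvar=False)     # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue = most variance captured
    components = eigvecs[:, order[:k]]         # eigenvectors define the new axes
    return X_centered @ components             # coordinates in the reduced space
```

The sorted eigenvalues give the variance captured by each component, so the ratio `eigvals / eigvals.sum()` is often inspected when choosing k.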
Singular Value Decomposition (SVD)
A method of decomposing a matrix into three other matrices, used as an alternative to eigendecomposition for dimensionality reduction.
Bag-of-Words Model
A method of transforming text into numerical form by counting occurrences of words.
Tokenization
The process of breaking down text into individual words or tokens.
Stopwords
Commonly used words in a language that carry little semantic meaning and are often removed in text preprocessing.
Latent Semantic Analysis (LSA)
A technique that uses SVD to reduce dimensions in text data and uncover semantic structures.
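A toy end-to-end sketch of these text ideas, whitespace tokenization, a bag-of-words document-term matrix, and SVD-based LSA; the corpus is made up:

```python
import numpy as np

docs = ["the cat sat", "the dog sat", "the dog barked"]    # toy corpus

# tokenize by whitespace and build a bag-of-words document-term matrix:
# rows are documents, columns are vocabulary words, entries are counts
vocab = sorted({word for doc in docs for word in doc.split()})
dtm = np.array([[doc.split().count(word) for word in vocab] for doc in docs],
               dtype=float)

# LSA: truncated SVD of the document-term matrix uncovers a latent semantic space
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)
doc_coords = U[:, :2] * s[:2]    # each document as a point in a 2-D latent space
```

In practice, stopwords such as "the" would be removed and the counts reweighted (for example with tf-idf) before the decomposition.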
Polysemy
The phenomenon where a word has multiple meanings depending on context.
Distributional Semantics
The theory that words that appear in similar contexts tend to have similar meanings.
Topic Models
Algorithms that cluster words and documents into groups (or topics) based on their distributions.
Latent Dirichlet Allocation (LDA)
A generative probabilistic model for collections of discrete data such as text, used for topic modeling.
Jensen-Shannon Divergence
A method of measuring the similarity between two probability distributions over the same variable.
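A small sketch, assuming `p` and `q` are discrete distributions over the same support:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs, so the result lies in [0, 1])."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                  # the mixture distribution

    def kl(a, b):                      # Kullback-Leibler divergence D(a || b)
        mask = a > 0                   # by convention, 0 * log 0 = 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```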
Term Frequency-Inverse Document Frequency (tf-idf)
A numerical statistic that reflects how important a word is to a document in a collection of documents.
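One common variant of the weighting (real implementations often add smoothing); `term_counts` and `doc_freq` are hypothetical inputs:

```python
import math

def tf_idf(term_counts, doc_freq, n_docs):
    """tf-idf for one document: term frequency times inverse document frequency."""
    total = sum(term_counts.values())
    return {term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()}
```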
Document-Term Matrix
A matrix representation of document data, where rows represent documents and columns represent terms; entries denote term occurrences.
Unsupervised learning
A type of machine learning where the model learns patterns from unlabelled data without target variables.
Clustering
An unsupervised learning technique that involves partitioning data into distinct groups based on similarity.
Latent variables
Unobserved or hidden variables that can be inferred from observed data and are used to identify structures in a dataset.
k-means algorithm
A clustering method that assigns data points to one of k clusters by minimizing the distances from points to cluster centroids.
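A minimal sketch of Lloyd's algorithm with NumPy (it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between assigning points to centroids and recomputing centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```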
Euclidean distance
A commonly used distance metric that measures the straight-line distance between two points in Euclidean space.
Centroid
The mean point of a cluster in clustering algorithms, representing the center of that cluster.
Hard clustering
A type of clustering where each data point is assigned to exactly one cluster.
Soft clustering
A type of clustering where a data point can belong to multiple clusters with varying membership degrees.
Expectation-Maximization (EM) algorithm
An iterative method to find maximum likelihood estimates for models with latent variables.
Gaussian mixture model
A probabilistic model that assumes all data points are generated from a mixture of several Gaussian distributions.
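A one-dimensional sketch of EM for a Gaussian mixture; this is soft clustering, since the responsibilities give each point a degree of membership in every component:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # initial means
    var = np.full(k, x.var())                   # initial variances
    pi = np.full(k, 1.0 / k)                    # initial mixing weights
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point (soft assignment)
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood updates weighted by the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var
```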
Marginal probability
The probability of an event for a single random variable, irrespective of the values of any other random variables.
Joint probability
The probability of two random variables occurring simultaneously.
Conditional probability
The probability of one event occurring given that another event has occurred.
Bayes' theorem
A mathematical formula that expresses the probability of an event based on prior knowledge of conditions related to the event.
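A worked example with made-up numbers: a diagnostic test applied to a rare condition.

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_d = 0.01                       # prior: 1% prevalence (hypothetical)
p_pos_given_d = 0.99             # sensitivity (hypothetical)
p_pos_given_not_d = 0.05         # false-positive rate (hypothetical)

# total probability of a positive result
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))   # 0.167: most positives are false alarms
```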
Image segmentation
The process of partitioning an image into multiple segments or regions, often using clustering techniques.
Log-likelihood
A measure of how well a statistical model describes the observed data, usually expressed on a logarithmic scale.
Convolutional Neural Networks (CNNs)
A type of neural network that utilizes spatially local connections and replicated patterns of weights across units.
Image Classification
The process of taking an image as input and outputting what is depicted in the image.
Viewpoint Variation
A challenge in image classification where the same object may appear differently based on its orientation relative to the camera.
Deformation
A challenge where many objects may be presented in various configurations, affecting recognition accuracy.
Occlusion
A situation when objects are partially hidden behind other objects, complicating their identification.
Pooling
A technique in CNNs that summarizes and condenses a region of feature maps, typically through operations like max-pooling or average-pooling.
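A sketch of max-pooling on a single 2-D feature map with NumPy; ragged edges are simply dropped.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max-pooling: keep the largest value in each size x size window."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size              # trim rows/cols that don't fit
    windows = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))                # condense each window to one value
```

Replacing `max` with `mean` gives average-pooling.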
Recurrent Neural Networks (RNNs)
A type of neural network designed to process sequences of data, allowing cycles in computation to account for temporal dependencies.
Long Short-Term Memory (LSTM)
A specialized form of RNN that includes gating mechanisms to maintain long-term memory over time.
Autoencoders
An unsupervised artificial neural network architecture used to learn efficient representations of data, consisting of an encoder and a decoder.
Generative Adversarial Networks (GANs)
An architecture comprising two neural networks, the generator and the discriminator, that compete against each other to improve the quality of generated outputs.
Hyperparameter
A parameter that is not learned during model training and is set before the learning process begins, acting like a knob to adjust the model.
Validation Set
An additional set of data used to evaluate how well a model performs after training, helping to prevent overfitting.
Overfitting
A modeling error that occurs when a model learns the details and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
k-fold Cross-Validation
A method where the training data is split into k equally-sized subsets, with each subset used as a validation set once while the others are used for training.
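A sketch that yields the index splits; the shuffling seed and the choice of k are up to the caller:

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set exactly once."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Setting k = n recovers leave-one-out cross-validation.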
Leave-One-Out Cross-Validation (LOOCV)
A special case of cross-validation where each training example is used as a single validation set while the rest serve as the training set.
Grid Search
A systematic method for selecting hyperparameter combinations by evaluating all possible combinations within a specified parameter grid.
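A minimal sketch; the grid and the `evaluate` callback (training plus validation scoring) are hypothetical:

```python
from itertools import product

def grid_search(grid, evaluate):
    """Try every combination in the grid and keep the best validation score."""
    best_score, best_params = float("-inf"), None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)          # hypothetical: returns validation performance
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# example grid: 2 learning rates x 3 depths = 6 evaluations
grid = {"learning_rate": [0.01, 0.1], "depth": [3, 5, 7]}
```

Random sampling replaces the exhaustive product with a fixed number of random draws from the same ranges.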
Random Sampling
A method of selecting hyperparameter combinations at random rather than systematically, useful when there is little intuition about parameter settings.
Bayesian Optimization
A method that treats hyperparameter tuning itself as a learning problem, using a probabilistic model of previously evaluated configurations to choose promising new ones.
Training Set
The portion of the dataset used to train the model, allowing it to learn patterns and relationships.
Testing Set
The data used to evaluate the model's performance after it has been trained, measuring how well it generalizes to unseen data.
k-nearest neighbors (k-NN)
A nonparametric method used for classification or regression by finding the k nearest examples in the training data.
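A minimal classifier sketch; `X_train` is an array of training examples and `y_train` their labels:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every example
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```

The scan over all N training examples is what makes a brute-force query O(N).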
Parametric models
Models that summarize training data with a fixed set of parameters, independent of the number of training examples.
Nonparametric models
Models that rely on the data themselves and cannot be characterized by a bounded set of parameters.
Euclidean distance
The straight-line distance between two points in Euclidean space; it works best when attributes are measured on comparable scales.
Manhattan distance
Also known as city-block distance; it measures distance as the sum of absolute coordinate differences, like travel along a rectangular grid of streets.
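Both metrics as short NumPy functions, for vectors `a` and `b`:

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(a - b))
```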
Curse of Dimensionality
A phenomenon where distances between points grow and become nearly uniform in high-dimensional spaces, making nearest neighbors less meaningful.
k-dimensional tree (k-d tree)
A balanced binary tree structure that organizes data points in k dimensions, facilitating faster nearest neighbor searches.
Normalization
The process of scaling data to have a mean of zero and a standard deviation of one, often done using z-scores.
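A z-score sketch; each column of `X` is one attribute:

```python
import numpy as np

def z_score(X):
    """Scale every feature (column) to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```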
Instance-based learning
A type of learning where the model relies on specific instances of the training data rather than general parameters.
Time complexity of k-NN
The computational cost of a brute-force nearest-neighbor query, which is O(N) per query for a dataset with N examples.
Logistic Regression
A statistical method for predicting binary classes by using a logistic function.
Linear Classification
A classification approach that models the relationship between input features and classes using a linear decision boundary.
Decision Boundary
A line or surface that separates different classes in a classification problem.
Linearly Separable
A condition where classes can be separated by a linear decision boundary.
Threshold Function
A function that determines the output of a model based on whether a linear function exceeds a certain threshold.
Minimizing Loss
The process of adjusting model parameters to reduce the difference between predicted and actual outcomes.
Perceptron Learning Rule
An algorithm for updating weights in binary classification problems based on prediction errors.
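One update step as a sketch, assuming labels y in {0, 1} and learning rate `alpha`:

```python
import numpy as np

def perceptron_update(w, x, y, alpha=0.1):
    """Nudge the weights by the prediction error on a single example (x, y)."""
    y_hat = 1 if np.dot(w, x) >= 0 else 0     # threshold function output
    return w + alpha * (y - y_hat) * x        # no change when the prediction is correct
```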
Logistic Function
A sigmoid function that produces outputs between 0 and 1, representing probabilities.
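In code, the logistic function is a one-liner:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```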
Probabilistic Interpretation
Understanding model outputs as probabilities indicating the likelihood of a class assignment.
One-vs-the-Rest Classifier
A method where multiple binary classifiers distinguish one class against all others.
Confusion Matrix
A table used to evaluate the performance of a classification model by showing true vs predicted classifications.
Sensitivity
The ratio of true positives to the sum of true positives and false negatives, indicating the ability to detect positive instances.
Specificity
The ratio of true negatives to the sum of true negatives and false positives, indicating the ability to identify negative instances.
Precision
The ratio of true positives to the sum of true positives and false positives, indicating the accuracy of positive predictions.
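With hypothetical confusion-matrix counts, the three metrics are simple ratios:

```python
# hypothetical counts: true/false positives and negatives from a confusion matrix
tp, fp, tn, fn = 80, 10, 95, 15

sensitivity = tp / (tp + fn)   # ability to detect positives (also called recall)
specificity = tn / (tn + fp)   # ability to identify negatives
precision   = tp / (tp + fp)   # accuracy of positive predictions
print(sensitivity, specificity, precision)
```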
Simple linear regression
A method to model the relationship between one independent variable (x) and a dependent variable (y) by fitting a linear equation.
Loss function
A function that measures the difference between predicted values (ŷ) and actual values (y).
L1 loss
Absolute-value loss defined as L1(y, ŷ) = |y - ŷ|, indicating the magnitude of prediction errors.
L2 loss
Squared-error loss defined as L2(y, ŷ) = (y - ŷ)², which emphasizes larger errors more than smaller ones.
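Both losses as plain functions:

```python
def l1_loss(y, y_hat):
    """Absolute-value loss: every unit of error counts equally."""
    return abs(y - y_hat)

def l2_loss(y, y_hat):
    """Squared-error loss: large errors are penalized disproportionately."""
    return (y - y_hat) ** 2
```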
Least squares
A method used in regression analysis that minimizes the sum of the squares of the residuals to find the best-fitting line.
Gradient descent
An iterative optimization algorithm used to minimize the loss function by updating weights incrementally based on the gradient.
Learning rate (α)
A hyperparameter that determines the size of the steps taken towards the minimum of the loss function during optimization.
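A sketch of batch gradient descent for simple linear regression under the L2 loss; `x` and `y` are NumPy arrays and `alpha` is the learning rate:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=1000):
    """Fit y ~ w*x + b by stepping against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        y_hat = w * x + b
        grad_w = (2 / n) * np.sum((y_hat - y) * x)   # d(loss)/dw
        grad_b = (2 / n) * np.sum(y_hat - y)         # d(loss)/db
        w -= alpha * grad_w                          # step size controlled by alpha
        b -= alpha * grad_b
    return w, b
```

Stochastic gradient descent computes the same gradients on a randomly chosen subset of the examples at each step.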
Stochastic gradient descent (SGD)
A variant of gradient descent where the weights are updated using a randomly selected subset of training examples.
Multivariable linear regression
A type of regression analysis where two or more predictor variables are used to predict the outcome of a response variable.
Regularization
A technique used to prevent overfitting by adding a penalty to large coefficients in the loss function.
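An L2 (ridge) penalty as a sketch; `lam` is a hypothetical regularization strength:

```python
import numpy as np

def ridge_loss(y, y_hat, w, lam=0.1):
    """Squared loss plus a penalty that discourages large coefficients."""
    return np.mean((y - y_hat) ** 2) + lam * np.sum(w ** 2)
```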
Inputs
Also known as features or attributes; typically represented as a vector of variables such as house size or abalone weight.
Objective function
A function that measures the performance of the model.
Ground truth
The actual labels or outputs (yi) in a supervised learning task.
Hypothesis
A function h that approximates the true function f in supervised learning.
Classification
A type of supervised learning where the output is categorical.
Regression
A type of supervised learning where the output is a continuous number.
Training set
A set of input-output pairs used to train a model.
Test set
A separate set of (x, y) pairs used to evaluate the performance of a model after training.
Bias
The difference between the model prediction and the actual observed value; high bias can cause underfitting.
Variance
The amount of change in the model due to fluctuations in the training data; high variance can cause overfitting.
Bias-Variance Tradeoff
The balance between bias and variance to minimize total error in predictive models.
Features
Attributes or inputs used in a machine learning model.
Model class
A set of possible models defined by a common structure.