Machine Learning Cumulative

1. Dimensionality Reduction

The process of reducing the number of dimensions (attributes) of a dataset to improve analysis and visualization.

2. Principal Component Analysis (PCA)

A dimensionality reduction technique that projects data onto a lower-dimensional space while maximizing the variance of the projected data.

3. Eigenvalues

Values that represent the variance captured by each principal component in PCA.

4. Eigenvectors

Vectors that define the directions of the axes in the PCA transformed space.

5. Covariance Matrix

A matrix that indicates the extent to which two variables change together, used in PCA to find eigenvalues and eigenvectors.
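
The ideas on the last few cards fit together in a few lines of NumPy. Below is a minimal sketch with synthetic data and illustrative names: center the data, build the covariance matrix, eigendecompose it, and project onto the top two principal components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 attributes

# Center the data, then compute the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)          # 5 x 5 covariance matrix

# Eigenvalues give the variance captured along each principal component;
# eigenvectors give the directions of the new axes.
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]       # sort components by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top 2 components and project the data onto them.
X_reduced = Xc @ eigvecs[:, :2]         # shape (100, 2)
print(eigvals[:2] / eigvals.sum())      # fraction of total variance retained
```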

6. Singular Value Decomposition (SVD)

A method of decomposing a matrix into three other matrices, used as an alternative to eigendecomposition for dimensionality reduction.
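
A minimal sketch of the SVD route to the same kind of projection, assuming the same synthetic setup as the PCA sketch above; singular values relate to covariance eigenvalues by eigval = s**2 / (n - 1).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

# Decompose the centered data matrix: Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions, so this matches the
# eigendecomposition-based projection (up to sign).
X_reduced = Xc @ Vt[:2].T               # shape (100, 2)
print(S[:2] ** 2 / (len(X) - 1))        # variances captured by the top 2 components
```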

7. Bag-of-Words Model

A method of transforming text into numerical form by counting occurrences of words.

8. Tokenization

The process of breaking down text into individual words or tokens.

9. Stopwords

Commonly used words in a language that carry little semantic meaning and are often removed in text preprocessing.
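
A small pure-Python sketch combining the last three cards: tokenize, drop stopwords, count. The tiny stopword list is illustrative, not any standard list.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "of", "and", "on"}   # illustrative subset

def bag_of_words(text: str) -> Counter:
    # Tokenization: lowercase the text and split it into words.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    # Drop stopwords, then count occurrences of the remaining tokens.
    return Counter(t for t in tokens if t and t not in STOPWORDS)

print(bag_of_words("The cat sat on the mat, and the cat slept."))
# Counter({'cat': 2, 'sat': 1, 'mat': 1, 'slept': 1})
```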

10. Latent Semantic Analysis (LSA)

A technique that uses SVD to reduce dimensions in text data and uncover semantic structures.

11. Polysemy

The phenomenon where a word has multiple meanings depending on context.

12. Distributional Semantics

The theory that words that appear in similar contexts tend to have similar meanings.

13. Topic Models

Algorithms that cluster words and documents into groups (or topics) based on their distributions.

14. Latent Dirichlet Allocation (LDA)

A generative probabilistic model for collections of discrete data such as text, used for topic modeling.

15. Jensen-Shannon Divergence

A method of measuring the similarity between two probability distributions over the same variable.
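
A possible NumPy implementation, assuming both inputs are valid probability distributions over the same outcomes; with base-2 logarithms the result lies in [0, 1].

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q), skipping zero-probability terms.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def jsd(p, q):
    # Jensen-Shannon divergence: average the KL of each distribution
    # against the midpoint distribution m.
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jsd([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))   # 0.5: substantial divergence
```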

16. Term Frequency-Inverse Document Frequency (tf-idf)

A numerical statistic that reflects how important a word is to a document in a collection of documents.
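
A minimal sketch of one common tf-idf variant (several weighting schemes exist); docs is a hypothetical collection of pre-tokenized documents.

```python
import math

docs = [["cat", "sat", "mat"], ["cat", "ate", "fish"], ["dog", "sat"]]

def tf_idf(term, doc, docs):
    # Term frequency: how often the term occurs in this document.
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms rare across the collection score higher.
    df = sum(term in d for d in docs)
    idf = math.log(len(docs) / df)
    return tf * idf

print(tf_idf("cat", docs[0], docs))   # in 2 of 3 docs -> modest weight (~0.14)
print(tf_idf("mat", docs[0], docs))   # in 1 of 3 docs -> higher weight (~0.37)
```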

17. Document-Term Matrix

A matrix representation of document data, where rows represent documents and columns represent terms; entries denote term occurrences.

18. Unsupervised learning

A type of machine learning where the model learns patterns from unlabelled data without target variables.

19. Clustering

An unsupervised learning technique that involves partitioning data into distinct groups based on similarity.

20. Latent variables

Unobserved or hidden variables that can be inferred from observed data and are used to identify structures in a dataset.

21. k-means algorithm

A clustering method that assigns data points to one of k clusters by minimizing the distances from points to cluster centroids.
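
A compact NumPy sketch of the standard alternating loop (Lloyd's algorithm); it assumes no cluster empties out during iteration.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    # Initialize centroids at k randomly chosen data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)     # two well-separated blobs
```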

22. Euclidean distance

A commonly used distance metric that measures the straight-line distance between two points in Euclidean space.

23. Centroid

The mean point of a cluster in clustering algorithms, representing the center of that cluster.

24. Hard clustering

A type of clustering where each data point is assigned to exactly one cluster.

25. Soft clustering

A type of clustering where a data point can belong to multiple clusters with varying membership degrees.

26. Expectation-Maximization (EM) algorithm

An iterative method to find maximum likelihood estimates for models with latent variables.

27. Gaussian mixture model

A probabilistic model that assumes all data points are generated from a mixture of several Gaussian distributions.
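
A minimal EM sketch for a two-component 1-D Gaussian mixture; initialization and iteration count are illustrative. The E-step computes soft responsibilities, the M-step re-estimates means, spreads, and mixing weights.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    # Illustrative initialization: means at the data extremes.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([1.0, 1.0])
    weight = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = weight * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of the parameters.
        n_k = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
        weight = n_k / len(x)
    return mu, sigma, weight

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 300)])
print(em_gmm_1d(x))   # means near -2 and 3, weights near 0.4 and 0.6
```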

28. Marginal probability

The probability distribution of a single random variable, obtained without regard to the values of any other random variables.

29. Joint probability

The probability of two events (for example, particular values of two random variables) occurring together.

30. Conditional probability

The probability of one event occurring given that another event has occurred.

31. Bayes' theorem

A mathematical formula that expresses the probability of an event based on prior knowledge of conditions related to the event.
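
A worked example with made-up numbers: the posterior probability of disease given a positive test, from the prior and the test's error rates.

```python
p_d = 0.01         # prior: 1% of the population has the disease
p_pos_d = 0.95     # P(positive | disease)
p_pos_nd = 0.05    # P(positive | no disease)

# Marginal probability of testing positive, then Bayes' theorem.
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos
print(round(p_d_pos, 3))   # 0.161: a positive test is far from conclusive
```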

32. Image segmentation

The process of partitioning an image into multiple segments or regions, often using clustering techniques.

33. Log-likelihood

A measure of how well a statistical model describes the observed data, usually expressed on a logarithmic scale.

34. Convolutional Neural Networks (CNNs)

A type of neural network that utilizes spatially local connections and replicated patterns of weights across units.

35. Image Classification

The process of taking an image as input and outputting what is depicted in the image.

36. Viewpoint Variation

A challenge in image classification where the same object may appear differently based on its orientation relative to the camera.

37. Deformation

A challenge where many objects may be presented in various configurations, affecting recognition accuracy.

38. Occlusion

A situation when objects are partially hidden behind other objects, complicating their identification.

39. Pooling

A technique in CNNs that summarizes and condenses a region of feature maps, typically through operations like max-pooling or average-pooling.
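
A minimal NumPy sketch of non-overlapping 2x2 max-pooling on a single feature map.

```python
import numpy as np

def max_pool_2x2(fmap):
    # Condense each non-overlapping 2x2 block to its maximum value.
    h, w = fmap.shape
    blocks = fmap[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
print(max_pool_2x2(fmap))
# [[4 2]
#  [2 8]]
```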

40. Recurrent Neural Networks (RNNs)

A type of neural network designed to process sequences of data, allowing cycles in computation to account for temporal dependencies.

41. Long Short-Term Memory (LSTM)

A specialized form of RNN that includes gating mechanisms to maintain long-term memory over time.

42. Autoencoders

An unsupervised artificial neural network architecture used to learn efficient representations of data, consisting of an encoder and a decoder.

43. Generative Adversarial Networks (GANs)

An architecture comprising two neural networks, a generator and a discriminator, that compete against each other to improve the quality of generated outputs.

44. Hyperparameter

A parameter that is not learned during model training and is set before the learning process begins, acting like a knob to adjust the model.

45. Validation Set

A set of data held out from training and used to evaluate the model during development, for example when tuning hyperparameters, helping to detect and prevent overfitting.

46. Overfitting

A modeling error that occurs when a model learns the details and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

47. k-fold Cross-Validation

A method where the training data is split into k equally-sized subsets, with each subset used as a validation set once while the others are used for training.
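
A minimal index-splitting sketch; each fold plays the validation role exactly once.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    # Shuffle the example indices, then cut them into k nearly equal folds.
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

for train, val in kfold_indices(n=10, k=5):
    print(len(train), len(val))   # prints "8 2" five times
```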

48. Leave-One-Out Cross-Validation (LOOCV)

A special case of cross-validation where each training example is used as a single validation set while the rest serve as the training set.

49. Grid Search

A systematic method for selecting hyperparameter combinations by evaluating all possible combinations within a specified parameter grid.
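
A small sketch using itertools.product; evaluate is a hypothetical stand-in for training a model with the given settings and returning its validation score.

```python
from itertools import product

grid = {"learning_rate": [0.01, 0.1, 1.0], "k": [3, 5, 7]}   # illustrative grid

def evaluate(params):
    # Placeholder: score peaks at learning_rate=0.1, k=5.
    return -abs(params["learning_rate"] - 0.1) - abs(params["k"] - 5)

# Try every combination in the grid and keep the best-scoring one.
best = max(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=evaluate,
)
print(best)   # {'learning_rate': 0.1, 'k': 5}
```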

50. Random Sampling

A method of selecting hyperparameter combinations at random rather than systematically, useful when there is little intuition about parameter settings.

51. Bayesian Optimization

A method that treats hyperparameter tuning itself as a machine learning problem, using the results of earlier evaluations to choose promising new hyperparameter configurations.

52. Training Set

The portion of the dataset used to train the model, allowing it to learn patterns and relationships.

53. Testing Set

The data used to evaluate the model's performance after it has been trained, measuring how well it generalizes to unseen data.

54. k-nearest neighbors (k-NN)

A nonparametric method used for classification or regression by finding the k nearest examples in the training data.
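
A minimal NumPy sketch of k-NN classification with Euclidean distance and a majority vote among the neighbors.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Distances from the query point to every training example.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest examples, then a majority vote on their labels.
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))   # 'a'
```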

55. Parametric models

Models that summarize training data with a fixed set of parameters, independent of the number of training examples.

56. Nonparametric models

Models that rely on the data themselves and cannot be characterized by a bounded set of parameters.

57. Euclidean distance

The straight-line distance between two points in Euclidean space; most appropriate when attributes are measured on comparable scales.

58. Manhattan distance

Also known as city block distance; it measures distance as the sum of the absolute differences along each dimension, like travel along a street grid.

59. Curse of Dimensionality

A phenomenon where distances between points grow and become nearly uniform as the number of dimensions increases, making nearest neighbors less meaningful.

60. k-dimensional tree (k-d tree)

A balanced binary tree structure that organizes data points in k dimensions, facilitating faster nearest neighbor searches.

61. Normalization

The process of scaling data to have a mean of zero and a standard deviation of one, often done using z-scores.
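
A one-step z-score sketch in NumPy, standardizing each column of a small made-up dataset.

```python
import numpy as np

X = np.array([[150.0, 3.2], [180.0, 1.1], [165.0, 2.0]])

# z-score each column: subtract its mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0))   # ~[0, 0]
print(Z.std(axis=0))    # [1, 1]
```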

62. Instance-based learning

A type of learning where the model relies on specific instances of the training data rather than general parameters.

63. Time complexity of k-NN

The computational complexity of finding nearest neighbors, which is O(N) per query for a dataset of N examples when done naively; tree structures such as k-d trees can reduce this.

64. Logistic Regression

A statistical method for predicting binary classes by using a logistic function.

65. Linear Classification

A classification approach that models the relationship between input features and classes using a linear boundary.

66. Decision Boundary

A line or surface that separates different classes in a classification problem.

67. Linearly Separable

A condition where classes can be separated by a linear decision boundary.

68. Threshold Function

A function that determines the output of a model based on whether a linear function exceeds a certain threshold.

69. Minimizing Loss

The process of adjusting model parameters to reduce the difference between predicted and actual outcomes.

70. Perceptron Learning Rule

An algorithm for updating weights in binary classification problems based on prediction errors.
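
A minimal sketch of the rule with labels in {-1, +1}: the weights move toward a misclassified example and are left alone otherwise. The data here are made up but linearly separable.

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    # Append a constant feature so the bias is learned as a weight.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            # Update only on a misclassification: nudge w toward the example.
            if yi * (w @ xi) <= 0:
                w += lr * yi * xi
    return w

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])          # AND-like, linearly separable labels
w = perceptron_train(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))   # [-1. -1. -1.  1.]
```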

71. Logistic Function

A sigmoid function that produces outputs between 0 and 1, representing probabilities.

72. Probabilistic Interpretation

Understanding model outputs as probabilities indicating the likelihood of a class assignment.

73. One-vs-the-Rest Classifier

A method where multiple binary classifiers distinguish one class against all others.

74. Confusion Matrix

A table used to evaluate the performance of a classification model by showing true vs predicted classifications.

75. Sensitivity

The ratio of true positives to the sum of true positives and false negatives, indicating the ability to detect positive instances; also known as recall or the true positive rate.

76. Specificity

The ratio of true negatives to the sum of true negatives and false positives, indicating the ability to identify negative instances.

77. Precision

The ratio of true positives to the sum of true positives and false positives, indicating the accuracy of positive predictions.
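
The three metrics from the last few cards, computed from a hypothetical binary confusion matrix.

```python
# Counts from a made-up binary confusion matrix.
tp, fn = 40, 10    # actual positives: predicted positive / predicted negative
fp, tn = 5, 45     # actual negatives: predicted positive / predicted negative

sensitivity = tp / (tp + fn)   # 0.8  : share of positives that were caught
specificity = tn / (tn + fp)   # 0.9  : share of negatives correctly rejected
precision = tp / (tp + fp)     # ~0.89: share of positive calls that were right
print(sensitivity, specificity, precision)
```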

78. Simple linear regression

A method to model the relationship between one independent variable (x) and a dependent variable (y) by fitting a linear equation.

79. Loss function

A function that measures the difference between predicted values (ŷ) and actual values (y).

80. L1 loss

Absolute-value loss defined as L1(y, ŷ) = |y - ŷ|, indicating the magnitude of prediction errors.

81. L2 loss

Squared-error loss defined as L2(y, ŷ) = (y - ŷ)², which emphasizes larger errors more than smaller ones.

82. Least squares

A method used in regression analysis that minimizes the sum of the squares of the residuals to find the best-fitting line.

83. Gradient descent

An iterative optimization algorithm used to minimize the loss function by updating weights incrementally based on the gradient.
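
A minimal sketch fitting a simple linear regression by gradient descent on the L2 loss, with synthetic data and an illustrative learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)   # synthetic data: true weights (2, 3)

w0, w1, alpha = 0.0, 0.0, 0.01              # alpha is the learning rate
for _ in range(2000):
    y_hat = w0 + w1 * x
    # Gradient of the mean squared error with respect to each weight.
    g0 = 2 * np.mean(y_hat - y)
    g1 = 2 * np.mean((y_hat - y) * x)
    # Step downhill, scaled by the learning rate.
    w0 -= alpha * g0
    w1 -= alpha * g1

print(w0, w1)   # approaches the least-squares fit, near 2 and 3
```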

84. Learning rate (α)

A hyperparameter that determines the size of the steps taken towards the minimum of the loss function during optimization.

85. Stochastic gradient descent (SGD)

A variant of gradient descent where the weights are updated using a randomly selected subset of training examples.

86. Multivariable linear regression

A type of regression analysis where two or more predictor variables are used to predict the outcome of a response variable.

87. Regularization

A technique used to prevent overfitting by adding a penalty to large coefficients in the loss function.

88. Inputs

Also known as features or attributes, typically represented as a vector; examples include variables such as house size or abalone weight.

89. Objective function

A function that measures the performance of the model.

90. Ground truth

The actual labels or outputs (yi) in a supervised learning task.

91. Hypothesis

A function h that approximates the true function f in supervised learning.

92. Classification

A type of supervised learning where the output is categorical.

93. Regression

A type of supervised learning where the output is a continuous number.

94. Training set

A set of input-output pairs used to train a model.

95. Test set

A separate set of (x, y) pairs used to evaluate the performance of a model after training.

96. Bias

The difference between the model prediction and the actual observed value; high bias can cause underfitting.

97. Variance

The amount of change in the model due to fluctuations in the training data; high variance can cause overfitting.

98. Bias-Variance Tradeoff

The balance between bias and variance to minimize total error in predictive models.

99. Features

Attributes or inputs used in a machine learning model.

100. Model class

A set of possible models defined by a common structure.