Machine Learning

1. Machine Learning

A field of Artificial Intelligence where systems learn from data to make predictions or decisions without being explicitly programmed. It involves the study of algorithms that improve performance at a task through experience.

2. Supervised Learning

A type of training that uses a series of labeled examples with direct feedback, where the training data includes the desired outputs.

3. Unsupervised Learning

A type of learning with no feedback, where the training data does not include desired outputs. It involves learning "what normally happens" or grouping similar instances.

4. Reinforcement Learning

A type of learning involving indirect feedback after many examples, where rewards are received from a sequence of actions. It focuses on learning a policy (a sequence of outputs).

5. Regression

A task where the goal is to predict a continuous numeric value based on input features (e.g., predicting house prices or temperature).

6. Classification

A task where the goal is to predict categories or nominal outputs.

7. Linear Regression

A regression algorithm that fits data with a hyperplane (or line in 2D). It is the simplest model for function approximation.
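
A minimal sketch of fitting one with scikit-learn, assuming made-up toy data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one feature per row (toy data)
    y = np.array([2.1, 4.0, 6.2, 7.9])          # roughly y = 2x

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)        # learned slope and intercept
    print(model.predict([[5.0]]))               # prediction for a new point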

8. Logistic Regression

A common algorithm used when the dependent variable is binary (e.g., disease vs. no disease). It fits data with a sigmoidal or logistic curve rather than a line to output a probability approximation.

9. Delta Rule (Least Mean Squares Rule)

An update rule used in supervised learning (specifically for neural networks) to minimize error by adjusting weights based on the difference between actual and predicted outputs.
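
A minimal numpy sketch of the idea for a single linear unit, assuming toy data and a hand-picked learning rate eta; each update moves the weights in proportion to (target - prediction) times the input:

    import numpy as np

    X = np.array([[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]])  # toy inputs
    t = np.array([1.0, 0.0, 1.0])                       # desired outputs
    w = np.zeros(2)                                     # weights to learn
    eta = 0.1                                           # learning rate

    for _ in range(100):                  # sweep the training set repeatedly
        for x_i, t_i in zip(X, t):
            y_i = w @ x_i                 # predicted output
            w += eta * (t_i - y_i) * x_i  # delta rule update
    print(w)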

10. Sum of Squared Error (SSE)

An objective function used in simple linear regression that sums the squared differences between predicted and actual values. It creates a parabolic error surface ideal for gradient descent.

11. Mean Absolute Error (MAE)

The average absolute difference between predicted and actual values.

12. Mean Squared Error (MSE)

The average squared difference between predicted and actual values; it penalizes larger errors more than MAE.

13. R² Score (Coefficient of Determination)

A metric measuring the proportion of variance in the data that the model explains; a value of 1 indicates a perfect fit.
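
A short sketch computing all three metrics with sklearn.metrics, assuming toy true and predicted values:

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = [3.0, 5.0, 2.5, 7.0]  # toy targets
    y_pred = [2.8, 5.4, 2.9, 6.6]  # toy predictions

    print(mean_absolute_error(y_true, y_pred))  # MAE
    print(mean_squared_error(y_true, y_pred))   # MSE
    print(r2_score(y_true, y_pred))             # R²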

14. Binary Classification

A type of classification involving exactly two classes (e.g., Pass/Fail, Yes/No).

15. Multiclass Classification

Classification involving more than two classes (e.g., Cat, Dog, Bird).

16. Multilabel Classification

A scenario where each instance can belong to multiple classes simultaneously.

17. Threshold

The decision boundary (e.g., 0.5 or 0.9) that converts a model’s probability output into a specific class label.
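
A minimal sketch, assuming toy data, of turning predict_proba outputs into labels with a stricter threshold than the default 0.5:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0.5], [1.5], [2.5], [3.5]])  # toy feature
    y = np.array([0, 0, 1, 1])                  # toy binary labels

    clf = LogisticRegression().fit(X, y)
    probs = clf.predict_proba(X)[:, 1]          # P(class = 1) per instance
    labels = (probs >= 0.9).astype(int)         # apply a 0.9 threshold
    print(probs, labels)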

19. Decision Tree

A hierarchical structure that makes decisions based on feature values, used for classification and regression.

20. Random Forest

An ensemble method that trains many decision trees and combines their predictions by majority vote (classification) or averaging (regression).

21. K-Nearest Neighbors (KNN)

An algorithm that classifies an instance by a majority vote among its k nearest data points.

22. Support Vector Machine (SVM)

An algorithm that finds the boundary (hyperplane) separating classes with the maximum margin.

23. Naïve Bayes

A probabilistic classifier based on Bayes' theorem, with the "naïve" assumption that features are conditionally independent given the class.
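
A hedged sketch of how the five classifiers above share one scikit-learn interface, fitted on the built-in iris data (training accuracy only, purely for illustration):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    for clf in [DecisionTreeClassifier(), RandomForestClassifier(),
                KNeighborsClassifier(), SVC(), GaussianNB()]:
        print(type(clf).__name__, clf.fit(X, y).score(X, y))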

24. Confusion Matrix
A table comparing the actual (true) labels with the labels predicted by the classification model.

25. Precision
The number of correctly classified positive examples divided by the total number of examples classified as positive.

26. Recall (Sensitivity)

The number of correctly classified positive examples divided by the total number of actual positive examples in the test set.

27. Specificity
The number of correctly classified negative examples divided by the total number of actual negative examples in the test set; also known as the True Negative Rate (TNR).
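
A sketch with toy labels; scikit-learn has no direct specificity function, so it is read off the confusion matrix:

    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (toy)
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # predicted labels (toy)

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(precision_score(y_true, y_pred))  # TP / (TP + FP)
    print(recall_score(y_true, y_pred))     # TP / (TP + FN)
    print(tn / (tn + fp))                   # specificity = TN / (TN + FP)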

28. Cross-validation
A method where data is partitioned into n subsets to train and test the model n times to estimate accuracy.

29. Leave-one-out Cross-validation
A special case of cross-validation for small datasets where each fold has only a single test example.
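
A sketch of both variants on the built-in iris data; LeaveOneOut is just n-fold cross-validation with a single test example per fold:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000)

    print(cross_val_score(clf, X, y, cv=5).mean())             # 5-fold accuracy
    print(cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()) # leave-one-out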

30. Scoring
Assigning a probability estimate (PE) to an instance rather than a definite class label.

31. ROC Curve (Receiver Operating Characteristic)
A plot of the True Positive Rate (TPR) against the False Positive Rate (FPR).

32. AUC (Area Under the Curve)
A performance measure where a value of 1 represents a perfect classifier and 0.5 represents random guessing.
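
A sketch computing both from toy scores:

    from sklearn.metrics import roc_auc_score, roc_curve

    y_true = [0, 0, 1, 1]            # actual labels (toy)
    y_score = [0.1, 0.4, 0.35, 0.8]  # probability estimates (toy)

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print(fpr, tpr)                        # points along the ROC curve
    print(roc_auc_score(y_true, y_score))  # 1.0 = perfect, 0.5 = random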

33. Lift Analysis
An analysis performed by ranking examples by their score and dividing them into bins to observe the distribution of positive examples.

34. Overfitting
A situation where a model fits the training data well but performs poorly on test data; in decision trees, often characterized by a tree that is too deep or has too many branches.

35. Pre-pruning
A method to avoid overfitting by halting the construction of the tree early.

36. Post-pruning
A method to avoid overfitting by removing branches or sub-trees from a "fully grown" tree.

37. Association Rule Mining
A task focused on finding relationships between items, such as identifying that customers who buy coffee likely buy bread.

38. Dimensionality Reduction
The process of simplifying large datasets into fewer variables while retaining most of the important information.

39. Clustering
The process of grouping data based on similarity or distance (e.g., Euclidean distance) without knowing labels ahead of time.

40. K-Means
An algorithm that partitions data into k clusters by minimizing the distance between data points and cluster centers (centroids).

41. Hierarchical Clustering
An algorithm that builds a tree-like structure of clusters, often visualized as a dendrogram.

42. DBSCAN
A density-based algorithm that groups points lying close together and marks outliers as noise.

43. Elbow Method
A technique used to select the optimal number of clusters (k) by plotting the Sum of Squared Errors (SSE) and finding the "elbow point" where the error curve starts to flatten.
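
A sketch of the elbow method on toy 2-D points; inertia_ is scikit-learn's name for the SSE to the nearest centroid:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.RandomState(0).rand(100, 2)  # toy 2-D points

    for k in range(1, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, km.inertia_)  # plot SSE vs. k and look for the elbow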

44. PCA (Principal Component Analysis)
A dimensionality reduction technique that transforms data into a new coordinate system using principal components to simplify the dataset while keeping the maximum variance.

45. Principal Components
New variables created by PCA that are linear combinations of original features, designed to capture the most variance in the data.

46. Eigenvalues and Eigenvectors
Mathematical properties used in PCA to determine the directions (components) with the most variance.
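
A sketch tying the last three cards together: projecting the iris data onto two principal components and checking how much variance each keeps:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)           # project onto 2 components

    print(X_2d.shape)                     # (150, 2)
    print(pca.explained_variance_ratio_)  # variance captured per component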

47. Silhouette Score
A metric (ranging from –1 to +1) that measures how well data points match their own cluster compared to others; values closer to +1 indicate good clustering.

48. Davies–Bouldin Index (DBI)
A metric measuring cluster separation and compactness, where values closer to 0 indicate better clustering.

49. Calinski–Harabasz Index (CH Index)
A metric where higher values indicate better clustering, signifying that between-cluster variance is much greater than within-cluster variance.
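
A sketch computing all three scores for one K-Means clustering of toy blobs:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                                 silhouette_score)

    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)  # toy clusters
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    print(silhouette_score(X, labels))         # closer to +1 is better
    print(davies_bouldin_score(X, labels))     # closer to 0 is better
    print(calinski_harabasz_score(X, labels))  # higher is better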

50. fit()
A method used to learn or train parameters from the data.

51. transform()
A method used to apply learned parameters to data, typically used for test or new data.

52. fit_transform()
A method that learns parameters and applies the transformation in a single step, commonly used with preprocessing transformers.
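
The usual pattern, sketched with StandardScaler: fit (or fit_transform) on training data only, then transform test data using the training statistics:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1.0], [2.0], [3.0]])  # toy training data
    X_test = np.array([[2.5]])                 # toy test data

    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)  # learn mean/std, then scale
    X_test_s = scaler.transform(X_test)        # reuse the training mean/std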

53. Scalers
Tools used to normalize features, which is essential for distance-based algorithms.

54. StandardScaler
Standardizes features to have a mean of 0 and a standard deviation of 1.

55. MinMaxScaler
Scales data to a specific range, usually [0, 1].

56. RobustScaler
Scales data using the Interquartile Range (IQR), making it resistant to outliers.

57. MaxAbsScaler
Scales data to the range [-1, 1], often used for sparse data.

58. QuantileTransformer
Transforms data to follow a uniform or Gaussian distribution.

59. Imputers
Tools for handling missing values.

60. SimpleImputer
Fills missing values with basic statistics like mean, median, or mode.

61. KNNImputer
Fills values based on the similarity of nearest neighbors.

62. IterativeImputer
A regression-based method that models each feature with missing values as a function of the other features, refining the imputations iteratively.

63. MissingIndicator
Adds a binary indicator to denote where values were missing.
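
A sketch of two imputers filling the same gap (IterativeImputer also exists but needs an explicit experimental import in scikit-learn):

    import numpy as np
    from sklearn.impute import KNNImputer, SimpleImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])  # toy data with a gap

    print(SimpleImputer(strategy="mean").fit_transform(X))  # column mean
    print(KNNImputer(n_neighbors=2).fit_transform(X))       # nearest neighbors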

64. Encoders
Techniques to convert categorical data into numbers.

65. One-Hot Encoding
Converts categories into binary columns.

66. Label Encoding
Assigns each category a unique integer label.

67. Ordinal Encoding
Converts categories into integers based on a specific order.

68. Binary Encoding
Converts categories into binary digits.

69. Target Encoding
Replaces categories with the mean of the target variable.

70. Hashing Encoding
Maps categories to fixed-length hash values.
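
A sketch of one-hot versus ordinal encoding on a toy category column; sparse_output=False assumes a recent scikit-learn (older versions use sparse=False):

    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    X = [["red"], ["green"], ["blue"], ["green"]]  # toy categorical column

    print(OneHotEncoder(sparse_output=False).fit_transform(X))  # binary columns
    print(OrdinalEncoder().fit_transform(X))                    # integer codes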

71. Text Vectorizers
Tools that convert text into numeric vectors, such as CountVectorizer and TfidfVectorizer.
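
A sketch of both vectorizers on two toy documents:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["the cat sat", "the dog sat"]  # toy documents

    print(CountVectorizer().fit_transform(docs).toarray())  # raw word counts
    print(TfidfVectorizer().fit_transform(docs).toarray())  # tf-idf weights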

72. Regularization
Techniques to prevent overfitting in linear models by penalizing complexity.

73. Ridge
Shrinks coefficients but keeps all features.

74. Lasso
Shrinks coefficients and sets some to zero, performing feature selection.

75. Elastic Net
Combines Ridge and Lasso penalties.
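
A sketch comparing the three penalties on the same toy data, where only the first feature actually matters; Lasso typically zeroes the irrelevant coefficients:

    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso, Ridge

    rng = np.random.RandomState(0)
    X = rng.rand(50, 5)                   # toy features
    y = 3 * X[:, 0] + 0.1 * rng.rand(50)  # only feature 0 matters

    print(Ridge(alpha=1.0).fit(X, y).coef_)       # shrunk, all nonzero
    print(Lasso(alpha=0.1).fit(X, y).coef_)       # some exactly zero
    print(ElasticNet(alpha=0.1).fit(X, y).coef_)  # mix of both penalties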

76. Gini Impurity
Measures the impurity of a class distribution (used to select splits).

77. Entropy
A measure of uncertainty or disorder used in splitting.

78. Gaussian NB
Assumes continuous features are normally distributed.

79. Multinomial NB
Used for discrete counts (e.g., text classification).

80. Bernoulli NB
Used for binary features.

81. Kernel Trick
A method to transform data into higher dimensions without calculating the transformation explicitly.

82. Kernel Types
Linear, Polynomial, RBF (Radial Basis Function), and Sigmoid.
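
A sketch trying each kernel on toy non-linear data (training accuracy only, purely for illustration):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(noise=0.1, random_state=0)  # toy non-linear data

    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        clf = SVC(kernel=kernel).fit(X, y)
        print(kernel, clf.score(X, y))  # RBF usually fits the moons best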

83. Activation Functions
Functions that determine a neuron's output from its weighted input, introducing non-linearity into the network.

84. ReLU (Rectified Linear Unit)
Common for hidden layers; outputs the input if it is positive, otherwise 0.

85. Softmax
Converts a vector of scores into a probability distribution; used in the output layer for multi-class classification.

86. Optimizers
Algorithms that adjust a model's weights to minimize the loss function (e.g., gradient descent, Adam).

87. MLP Regressor
A Multi-Layer Perceptron specifically for predicting continuous values.

88. Manhattan Distance
The distance measured along grid-like paths: the sum of absolute differences between coordinates.

89. Minkowski Distance
A generalization of the Euclidean and Manhattan distances, parameterized by an order p (p = 1 gives Manhattan, p = 2 gives Euclidean).

90. Hamming Distance
The number of positions at which two binary vectors or strings of equal length differ.

91. Cosine Similarity
Measures the cosine of the angle between two vectors; commonly used for text similarity.
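
A sketch computing the four measures for two toy vectors with scipy.spatial.distance (cosine similarity is 1 minus the cosine distance):

    from scipy.spatial.distance import cityblock, cosine, hamming, minkowski

    u, v = [1, 0, 1, 1], [1, 1, 0, 1]  # toy vectors

    print(cityblock(u, v))       # Manhattan distance
    print(minkowski(u, v, p=3))  # Minkowski with order p = 3
    print(hamming(u, v))         # fraction of differing positions
    print(1 - cosine(u, v))      # cosine similarity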

92. ARIMA
AutoRegressive Integrated Moving Average, a model for time-series forecasting.

93. SARIMA
Seasonal ARIMA, which includes seasonal components for data with cyclic patterns.
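
A sketch with statsmodels, assuming it is installed and using made-up data; the same ARIMA class takes a seasonal_order term for SARIMA:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    y = np.sin(np.arange(60) / 5) + 0.1 * np.random.RandomState(0).rand(60)

    res = ARIMA(y, order=(1, 1, 1)).fit()  # (p, d, q)
    print(res.forecast(steps=5))           # next 5 values

    # SARIMA: add a seasonal (P, D, Q, s) component
    sar = ARIMA(y, order=(1, 1, 1), seasonal_order=(1, 0, 1, 12)).fit()
    print(sar.forecast(steps=5))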