An ___ feature has values that are unaffected by other features.
Input
An ___ feature has values affected by other features.
Output
Residual Error
The difference between the observed and predicted value.
Extrapolation
A prediction that is far beyond the range of the original data.
Simple Linear Regression =
b_0 + b_1 x
Sum of Squared Errors (SSE)
The sum of the squares of all residuals.
Least Squares Regression Line
The simple linear regression formula that minimizes SSE.
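A minimal sketch of the least squares fit by hand; the data and variable names below are made up for illustration:

```python
# Fit y = b0 + b1*x by minimizing the sum of squared errors (SSE).
# Closed-form least squares: b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_mean, y_mean = x.mean(), y.mean()
b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
b0 = y_mean - b1 * x_mean

residuals = y - (b0 + b1 * x)   # observed minus predicted
sse = (residuals ** 2).sum()    # the quantity the least squares line minimizes
print(f"b0={b0:.3f}, b1={b1:.3f}, SSE={sse:.3f}")
```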
Correlation Coefficient
Measures the direction and strength of a linear relationship as a value between -1 and 1.
Fitted vs. Residuals Plots
Displays the predicted values against the residuals.
Normal Q-Q Plot
Displays the sample quantiles against the theoretical quantiles.
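A sketch of both diagnostic plots, assuming NumPy, matplotlib, and SciPy are available; the simulated data is made up:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up data: a linear trend plus normal noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.size)

b1, b0 = np.polyfit(x, y, 1)        # slope, intercept of a least squares fit
fitted = b0 + b1 * x                # predicted values
residuals = y - fitted              # observed minus predicted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(fitted, residuals)      # fitted vs. residuals plot
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residuals")
stats.probplot(residuals, dist="norm", plot=ax2)   # normal Q-Q plot
plt.tight_layout()
plt.show()
```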
Multiple Linear Regression =
b_0 + b_1 x_1 + … + b_k x_k
Simple Polynomial Regression =
b_0 x^{0} + b_1 x^{1} + … + b_k x^{k}
Polynomial Regression Model
A regression model that displays a polynomial relationship between two features.
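A quadratic fit as a minimal instance of simple polynomial regression, using NumPy's polynomial fitting; the data is made up:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 9.1, 19.5, 33.8])   # roughly 2x^2 + 1

coeffs = np.polyfit(x, y, deg=2)   # returns [b2, b1, b0], highest power first
y_hat = np.polyval(coeffs, x)      # evaluates b0*x^0 + b1*x^1 + b2*x^2 at x
print("coefficients (high to low power):", coeffs)
```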
Interaction Term
A term in a regression model that contains multiple input features.
Logistic Regression =
\frac{e^{b_0 + b_1 x}}{1+e^{b_0 + b_1 x}}
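A minimal sketch of the logistic curve, together with the log-odds identity from the card below; b_0 and b_1 here are made-up coefficients, not fitted values:

```python
import math

def logistic(x, b0=-2.0, b1=0.8):
    """p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), a probability in (0, 1)."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

p = logistic(3.0)
log_odds = math.log(p / (1 - p))   # ln(p / (1-p)) recovers b0 + b1*x
print(f"p={p:.3f}, log-odds={log_odds:.3f}")   # log-odds == -2.0 + 0.8*3.0
```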
One-Hot Encoding
Transforming a categorical feature into a numeric feature.
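A minimal one-hot encoding by hand; libraries like pandas and scikit-learn provide this directly, but the idea is just indicator columns (the color data is made up):

```python
colors = ["red", "green", "blue", "green"]

categories = sorted(set(colors))   # fixed column order: one column per category
encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(categories)   # ['blue', 'green', 'red']
for row in encoded:
    print(row)      # e.g. 'red' -> [0, 0, 1]
```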
Log-Odds = ln(\frac{p}{1-p}) =
b_0 + b_1 x
Odds Ratio
Compares the relative odds of an outcome given a feature.
A model is ___ if it is too simple to fit the given data.
Underfit
A model is ___ if it is too complex to fit the given data.
Overfit
Ideally, a model ___ pass through every point on a graph.
Shouldn’t
The ___ complex model is preferred over the ___ complex model.
Least, More
Total Error
How much the observed values differ from predicted values.
Bias
How much a model’s prediction differs from the observed values.
Variance
How spread out a model’s predictions are.
Irreducible Error
Error inherent to the situation, unaffected by the model.
A complex model will have more ___ than ___.
Variance, Bias
A simple model will have more ___ than ___.
Bias, Variance
Machine Learning Algorithm
Uses data to build a model that makes predictions.
Regression
A machine learning model used to predict numerical values.
Classification
A machine learning model used to predict categorical values.
Model Training
The process of estimating model parameters used to make a prediction.
___ data is used to fit a model.
Training
___ data is used to evaluate a model’s performance while working on the model.
Validation
___ data is used to evaluate the final model’s performance compared to other models.
Test
Loss Function
Quantifies the difference between a model’s predictions and the observed values.
Regression Metric
The value returned by a loss function.
The lower the regression metric, the ___ the model is.
Better
Mean Squared Error =
\frac{1}{n} \sum (y_i - \hat{y}_{i})^{2}
Mean Squared Error
A direct measure of a model’s variance.
Mean Absolute Error =
\frac{1}{n} \sum |y_i - \hat{y}_{i}|
Mean Absolute Error
Like Mean Squared Error, but is less influenced by outliers.
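Both metrics computed directly from their formulas, as a sketch with made-up numbers:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])       # observed
y_hat = np.array([2.5, 5.5, 6.0, 11.0])  # predicted

mse = np.mean((y - y_hat) ** 2)   # squares each residual, so outliers dominate
mae = np.mean(np.abs(y - y_hat))  # no squaring, so less influenced by outliers
print(f"MSE={mse:.3f}, MAE={mae:.3f}")
```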
Absolute Loss
Quantifies how far a predicted probability is from the observed class.
L_{abs}(y,\hat{p})=|y-\hat{p}| where y is the ___ and \hat{p} is the ___.
Observed class, Predicted probability
An instance is ___ if the output feature’s value is known for that instance.
Labeled
Supervised Learning
Training a model to predict a labeled output feature.
A model is ___ if the relationship between input and output features in the model are easy to explain.
Interpretable
A model is ___ if the outputs produced by the model match the actual outputs with new data.
Predictive
K-Nearest Neighbors
A supervised learning algorithm that predicts the output of a new instance using instances with similar inputs.
Metric
A method of determining the distance between two instances.
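A bare-bones kNN sketch using Euclidean distance as the metric and a majority vote; the training points are made up:

```python
from collections import Counter
import math

# Tiny labeled training set: (input point, class label).
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]

def knn_predict(x, k=3):
    # Metric: Euclidean distance, but any instance-to-instance metric works.
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((2.0, 2.0)))   # 'A': its nearest neighbors are mostly A
```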
Confusion Matrix
A table that summarizes the combinations of predicted and actual values.
Accuracy =
\frac{\text{TP} + \text{TN}}{\text{TP}+\text{FP}+\text{TN}+\text{FN}}
Precision =
\frac{\text{TP}}{\text{TP} + \text{FP}}
Recall =
\frac{\text{TP}}{\text{TP}+\text{FN}}
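The three metrics computed from made-up confusion-matrix counts, as a sketch:

```python
TP, FP, TN, FN = 40, 10, 35, 15   # made-up confusion-matrix counts

accuracy  = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)   # of predicted positives, how many were right
recall    = TP / (TP + FN)   # of actual positives, how many were found
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```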
Receiver Operating Characteristic Curve (ROC Curve)
Measures how well a classification model distinguishes between classes at various probability thresholds.
Area Under The ROC Curve (AUC)
A metric used to compare the performance between two classification models.
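A sketch of comparing two models by AUC, assuming scikit-learn is available; the labels and scores are made up:

```python
from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 1, 1, 0, 1, 0]
scores_a = [0.1, 0.3, 0.8, 0.7, 0.9, 0.4, 0.6, 0.2]   # model A's probabilities
scores_b = [0.5, 0.6, 0.7, 0.4, 0.8, 0.5, 0.5, 0.3]   # model B's probabilities

print("AUC A:", roc_auc_score(y_true, scores_a))   # closer to 1 is better
print("AUC B:", roc_auc_score(y_true, scores_b))
```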
Naive Bayes Classification
A supervised learning classifier that uses the number of times a category occurs in a class to estimate the likelihood of an instance being in that class.
P(\text{class}|\text{data}) indicates the probability that ___.
An instance is in \text{class}, given \text{data}.
Laplace Smoothing
Adds one fictional instance to a category count if none exist, so the estimated probability is never zero.
Naive Bayes Classification assumes all categories are ___.
Equally important
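A sketch of per-class category counts with Laplace smoothing; the tiny weather dataset is made up:

```python
from collections import Counter

# Made-up (category, class) pairs, e.g. weather -> play.
data = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"), ("sunny", "no")]
categories = {"sunny", "rainy", "overcast"}   # 'overcast' never occurs

counts = {"yes": Counter(), "no": Counter()}
for category, label in data:
    counts[label][category] += 1

def likelihood(category, label):
    # Laplace smoothing: one fictional instance per category,
    # so unseen categories don't get probability zero.
    total = sum(counts[label].values()) + len(categories)
    return (counts[label][category] + 1) / total

print(likelihood("overcast", "yes"))   # > 0 thanks to smoothing
```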
Support Vector Machine
A supervised learning algorithm that uses hyperplanes to divide data into different classes.
Hyperplane
A flat surface that is one dimension lower than the input feature space.
A dataset is ___ if a hyperplane can divide the dataset so that all instances of one class fall on one side and everything else falls on the other.
Well-Separated
Margin
The space between a hyperplane and its supporting vectors.
Support Vectors
The closest instances to a hyperplane.
Vectors on the wrong side of a hyperplane are often given a ___.
Penalty
Hinge Function
Takes the distance from the margin as input; returns 0 if the vector is on the correct side and a linear penalty if it is on the wrong side.
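One common form of the hinge function as a sketch; here score stands in for the signed distance from the hyperplane, and the numbers are made up:

```python
def hinge(y, score):
    """y is +1 or -1; score is the signed distance from the hyperplane."""
    return max(0.0, 1.0 - y * score)

print(hinge(+1, 2.5))   # 0.0 -> correct side, outside the margin
print(hinge(+1, -0.5))  # 1.5 -> wrong side, penalized linearly
```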
Sensitivity/Recall
The True-Positive rate.
Specificity
The True-Negative rate.
Accuracy
The ratio of the number of correct labels to the total labels.
Misclassification Rate
The ratio of the number of incorrect labels to the total labels.
Misclassification Rate =
1 - \text{Accuracy}
F1 Score
A number between 0 and 1 that represents the harmonic mean of precision and recall.
F1 Score =
2 \frac{\text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}}
Sensitivity =
\frac{\text{TP}}{\text{TP}+\text{FN}}
Specificity =
\frac{\text{TN}}{\text{TN}+\text{FP}}
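These metrics computed from the same made-up confusion-matrix counts as above, as a sketch:

```python
TP, FP, TN, FN = 40, 10, 35, 15   # made-up confusion-matrix counts

sensitivity = TP / (TP + FN)   # true-positive rate (= recall)
specificity = TN / (TN + FP)   # true-negative rate
precision   = TP / (TP + FP)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
misclassification = 1 - (TP + TN) / (TP + FP + TN + FN)   # 1 - accuracy
print(f"sens={sensitivity:.2f}, spec={specificity:.2f}, "
      f"F1={f1:.2f}, misclass={misclassification:.2f}")
```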
Entropy
Measures the uncertainty in a dataset: the number of ways a situation could diverge.
Steps to make a decision tree:
Calculate the entropy of the decision, split on each candidate attribute into subtables and calculate their entropy, choose the attribute with the largest information gain (the biggest drop in entropy), then repeat the process on each branch.
Information Gain
Entropy before split compared to entropy after split.
Heuristic
A rule of thumb; in decision trees, choosing the attribute that produces the purest nodes.
Entropy / Expected Information needed to classify tuple D=
\text{Info}(D)=-\sum_{i=1}^{m}p_{i}\log_{2}(p_{i})
Information needed to classify D after using A to split D into v partitions=
\text{Info}_{A}(D)=\sum_{j=1}^{v}\frac{|D_{j}|}{|D|}*\text{Info}(D_{j})
Information gained by branching on attribute A=
\text{Gain}(A)=\text{Info}(D)-\text{Info}_{A}(D)
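A sketch of Info(D), Info_A(D), and Gain(A) with made-up class counts:

```python
import math

def entropy(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [9, 5]                   # e.g. 9 "yes", 5 "no" before the split
partitions = [[6, 1], [3, 4]]     # class counts in each partition after splitting on A

info_d = entropy(parent)
info_a = sum(sum(p) / sum(parent) * entropy(p) for p in partitions)
print(f"Gain(A) = {info_d - info_a:.3f}")   # Info(D) - Info_A(D)
```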
When picking a distance metric for kNN, the metric doesn’t have to be the ___ on a graph.
Physical distance
The ___ set is used to train the model before testing it.
Training
The ___ set is used to test the model’s abilities after training it.
Testing
Picking an ___ is the 3rd step in creating a kNN model.
Evaluation Metric
The k in kNN represents the ___.
Number of nearest neighbors considered
Unsupervised Learning
Teaching a model to categorize data where no labels are available.
kMeans
An unsupervised learning technique that groups different tuples together based on known attributes.
Centroids
The center points in a cluster for kMeans.
Each cluster in kMeans represents an individual ___.
Class
Step 3 of kMeans is to ___.
Move the centroids to the average location of the data points
kMeans should repeat until ___.
The centroids move either very little or not at all.
kMeans has the potential to fall into an ___ or give a ___ answer.
Infinite loop, Useless
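A sketch of the kMeans loop on made-up 1-D points with k = 2, capped so it cannot loop forever:

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]   # made-up data
centroids = [1.0, 9.0]                    # made-up starting centroids

for _ in range(100):                      # cap iterations: avoid an infinite loop
    # Step: assign each point to its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Step 3: move each centroid to the average location of its points.
    new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    if max(abs(a - b) for a, b in zip(new, centroids)) < 1e-9:
        break                             # centroids moved very little: stop
    centroids = new

print(centroids)   # roughly [1.5, 8.5]
```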