Data Science Fundamentals 2


1
New cards

An ___ feature has values that are unaffected by other features.

Input

2
New cards

An ___ feature has values affected by other features.

Output

3
New cards

Residual Error

The difference between the observed and predicted value.

4
New cards

Extrapolation

A prediction that is far beyond the range of the original data.

5
New cards

Simple Linear Regression =

b_0 + b_1 x

6
New cards

Sum of Squared Errors (SSE)

The sum of the squares of all residuals.

7
New cards

Least Squares Regression Line

The simple linear regression formula that minimizes SSE.
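
A minimal Python sketch of cards 3–7: fit the least squares line b_0 + b_1 x on made-up data, then compute the residuals and the SSE directly from their definitions.

```python
import numpy as np

# Made-up data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x            # predicted values
residuals = y - y_hat          # observed minus predicted (residual error)
sse = np.sum(residuals ** 2)   # Sum of Squared Errors
print(f"b0={b0:.3f}, b1={b1:.3f}, SSE={sse:.3f}")
```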

8
New cards

Correlation Coefficient

Measures the direction and strength of a linear relationship as a value between -1 and 1.

9
New cards

Fitted vs. Residuals Plots

Displays the predicted values against the residuals.

10
New cards

Normal Q-Q Plot

Displays the sample quantiles against the theoretical quantiles.
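
A hedged sketch of the two diagnostic plots in cards 9–10, drawn with Matplotlib and SciPy on small made-up fitted values and residuals.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up fitted values and residuals (e.g., from a fitted regression)
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
residuals = np.array([0.1, -0.1, 0.2, 0.1, -0.2])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Fitted vs. residuals plot: predicted values against the residuals
ax1.scatter(y_hat, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")

# Normal Q-Q plot: sample quantiles against theoretical normal quantiles
stats.probplot(residuals, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()
```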

11
New cards

Multiple Linear Regression =

b_0 + b_1 x_1 + … + b_k x_k

12
New cards

Simple Polynomial Regression =

b_0 x^{0} + b_1 x^{1} + … + b_k x^{k}

13
New cards

Polynomial Regression Model

A regression model that displays a polynomial relationship between two features.

14
New cards

Interaction Term

A term in a regression model that contains multiple input features.

15
New cards

Logistic Regression =

\frac{e^{b_0 + b_1 x}}{1+e^{b_0 + b_1 x}}
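
A small sketch of the logistic formula with hypothetical coefficients b_0 and b_1; it also checks that ln(p / (1 - p)) recovers b_0 + b_1 x, which is the log odds from card 17.

```python
import numpy as np

def logistic(x, b0, b1):
    """Predicted probability e^(b0 + b1 x) / (1 + e^(b0 + b1 x))."""
    z = b0 + b1 * x
    return np.exp(z) / (1 + np.exp(z))

b0, b1 = -1.5, 0.8                 # hypothetical coefficients
x = 2.0
p = logistic(x, b0, b1)

log_odds = np.log(p / (1 - p))     # should equal b0 + b1 * x
print(p, log_odds, b0 + b1 * x)
```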

16
New cards

One-Hot Encoding

Transforming a categorical feature into one or more binary (0/1) numeric features.
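
One way this could look in pandas, with a made-up categorical column; pd.get_dummies creates one binary indicator column per category.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})   # made-up categorical feature
encoded = pd.get_dummies(df, columns=["color"], dtype=int)        # one 0/1 column per category
print(encoded)   # columns: color_blue, color_green, color_red
```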

17
New cards

Log Odds = ln(\frac{p}{1-p}) =

b_0 + b_1 x

18
New cards

Odds Ratio

Compares the relative odds of an outcome given a feature.

19
New cards

A model is ___ if it is too simple to fit the given data.

Underfit

20
New cards

A model is ___ if it is too complex to fit the given data.

Overfit

21
New cards

Ideally, a model ___ pass through every point on a graph.

Shouldn’t

22
New cards

The ___ complex model is preferred over the ___ complex model.

Least, More

23
New cards

Total Error

How much the observed values differ from predicted values.

24
New cards

Bias

How much a model’s prediction differs from the observed values.

25
New cards

Variance

How spread out a model’s predictions are.

26
New cards

Irreducible Error

Error inherent to the situation, unaffected by the model.

27
New cards

A complex model will have more ___ than ___.

Variance, Bias

28
New cards

A simple model will have more ___ than ___.

Bias, Variance

29
New cards

Machine Learning Algorithm

Uses data to build a model that makes predictions.

30
New cards

Regression

A machine learning model used to predict numerical values.

31
New cards

Classification

A machine learning model used to predict categorical values.

32
New cards

Model Training

The process of estimating model parameters used to make a prediction.

33
New cards

___ data is used to fit a model.

Training

34
New cards

___ data is used to evaluate a model’s performance while working on the model.

Validation

35
New cards

___ data is used to evaluate the final model’s performance compared to other models.

Test

36
New cards

Loss Function

Quantifies the difference between a model’s predictions and the observed values.

37
New cards

Regression Metric

The value returned by a loss function.

38
New cards

The lower the regression metric, the ___ the model is.

Better

39
New cards

Mean Squared Error =

\frac{1}{n} \sum (y_i - \hat{y}_{i})^{2}

40
New cards

Mean Squared Error

A direct measure of a model’s variance.

41
New cards

Mean Absolute Error =

\frac{1}{n} \sum |y_i - \hat{y}_{i}|

42
New cards

Mean Absolute Error

Like Mean Squared Error, but is less influenced by outliers.
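
A quick sketch of cards 39–42 on made-up values: MSE and MAE computed straight from their formulas, with an outlier included to show why MAE is less influenced by it.

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 20.0])        # observed values (the last one is an outlier)
y_hat = np.array([2.5, 5.5, 6.5, 9.0])     # predicted values

mse = np.mean((y - y_hat) ** 2)            # squaring magnifies the outlier's effect
mae = np.mean(np.abs(y - y_hat))           # less influenced by the outlier
print(f"MSE={mse:.2f}, MAE={mae:.2f}")
```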

43
New cards

Absolute Loss

Quantifies the loss due to uncertainty.

44
New cards

L_{abs}(y,\hat{p})=|y-\hat{p}| where y is the ___ and \hat{p} is the ___.

Observed class, Predicted probability

45
New cards

An instance is ___ if the output feature’s value is known for that instance.

Labeled

46
New cards

Supervised Learning

Training a model to predict a labeled output feature.

47
New cards

A model is ___ if the relationship between input and output features in the model are easy to explain.

Interpretable

48
New cards

A model is ___ if the outputs produced by the model match the actual outputs with new data.

Predictive

49
New cards

K-Nearest Neighbors

A supervised learning algorithm that predicts the output of a new instance using instances with similar inputs.

50
New cards

Metric

A method of determining the distance between two instances.
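
A bare-bones sketch of kNN classification: predict the majority label among the k nearest neighbors, here using Euclidean distance as the metric; the training data and k are made up.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance as the metric; any distance metric could be substituted
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]             # indices of the k closest instances
    votes = Counter(y_train[i] for i in nearest)    # majority vote among their labels
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, np.array([4.9, 5.1]), k=3))   # -> "B"
```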

51
New cards

Confusion Matrix

A table that summarizes the combinations of predicted and actual values.

52
New cards

Accuracy =

\frac{\text{TP} + \text{TN}}{\text{TP}+\text{FP}+\text{TN}+\text{FN}}

53
New cards

Precision =

\frac{\text{TP}}{\text{TP} + \text{FP}}

54
New cards

Recall =

\frac{\text{TP}}{\text{TP}+\text{FN}}
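
A tiny worked example of cards 51–54, using hypothetical confusion matrix counts.

```python
# Hypothetical confusion matrix counts
TP, FP, TN, FN = 40, 10, 45, 5

accuracy  = (TP + TN) / (TP + FP + TN + FN)   # 85 / 100 = 0.85
precision = TP / (TP + FP)                    # 40 / 50  = 0.80
recall    = TP / (TP + FN)                    # 40 / 45  ≈ 0.89
print(accuracy, precision, recall)
```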

55
New cards

Receiver Operating Characteristic Curve (ROC Curve)

Measures how well a classification model distinguishes between classes at various probability thresholds.

56
New cards

Area Under The ROC Curve (AUC)

A metric used to compare the performance between two classification models.

57
New cards

Naive Bayes Classification

A supervised learning classifier that uses the number of times a category occurs in a class to estimate the likelihood of an instance being in that class.

58
New cards

P(\text{class}|\text{data}) indicates ___.

the probability of an instance being in \text{class} given \text{data}

59
New cards

Laplace Smoothing

Adds one fictional instance to a class when none exist, so no estimated probability is zero.
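
A hedged sketch of the counting idea behind cards 57–59: per-class category counts with one fictional instance added to every count (Laplace smoothing) so no likelihood is zero. The tiny dataset is made up.

```python
from collections import Counter, defaultdict

# Made-up training data: (category value, class label)
data = [("sunny", "play"), ("sunny", "play"), ("rain", "stay"),
        ("rain", "stay"), ("sunny", "stay")]

classes = {label for _, label in data}
values = {value for value, _ in data}
counts = defaultdict(Counter)                 # counts[class][category value]
class_totals = Counter()
for value, label in data:
    counts[label][value] += 1
    class_totals[label] += 1

def likelihood(value, label):
    # Laplace smoothing: add 1 to every count so unseen categories never give zero
    return (counts[label][value] + 1) / (class_totals[label] + len(values))

def posterior_scores(value):
    n = sum(class_totals.values())
    # P(class | data) is proportional to P(data | class) * P(class)
    return {label: likelihood(value, label) * class_totals[label] / n
            for label in classes}

print(posterior_scores("rain"))   # "stay" scores higher than "play"
```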

60
New cards

Naive Bayes Classification assumes all categories are ___.

Equally important

61
New cards

Support Vector Machine

A supervised learning algorithm that uses hyperplanes to divide data into different classes.

62
New cards

Hyperplane

A flat surface that is one dimension lower than the input feature space.

63
New cards

A dataset is ___ if a hyperplane can divide the dataset so that all instances of one class fall on one side and everything else falls on the other.

Well-Separated

64
New cards

Margin

The space between a hyperplane and its support vectors.

65
New cards

Support Vectors

The closest instances to a hyperplane.

66
New cards

Vectors on the wrong side of a hyperplane are often given a ___.

Penalty

67
New cards

Hinge Function

Takes the distance from the margin as input; returns 0 if the vector is on the correct side and a linearly growing penalty if it is on the wrong side.
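
A minimal sketch of the hinge penalty, assuming labels coded as +1/-1 and a signed distance f(x) from the hyperplane.

```python
def hinge_loss(y, fx):
    """y is the true label (+1 or -1); fx is the signed distance from the hyperplane.
    Returns 0 when the point is on the correct side of the margin and a penalty
    that grows linearly the further it sits on the wrong side."""
    return max(0.0, 1.0 - y * fx)

print(hinge_loss(+1, 2.5))    # 0.0 -> correct side, outside the margin
print(hinge_loss(+1, 0.3))    # 0.7 -> correct side but inside the margin
print(hinge_loss(+1, -1.0))   # 2.0 -> wrong side, linear penalty
```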

68
New cards

Sensitivity/Recall

The True-Positive rate.

69
New cards

Specificity

The True-Negative rate.

70
New cards

Accuracy

The ratio of the number of correct labels to the total labels.

71
New cards

Misclassification Rate

The ratio of the number of incorrect labels to the total labels.

72
New cards

Misclassification Rate =

1 - \text{Accuracy}

73
New cards

F1 Score

A number between 0 and 1 that represents the harmonic mean of precision and recall.

74
New cards

F1 Score =

2 \frac{\text{Precision} * \text{Recall}}{\text{Precision} + \text{Recall}}

75
New cards

Sensitivity =

\frac{\text{TP}}{\text{TP}+\text{FN}}

76
New cards

Specificity =

\frac{\text{TN}}{\text{TN}+\text{FP}}
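
Reusing the hypothetical counts from the earlier confusion matrix example, cards 68–76 can be checked in a few lines.

```python
TP, FP, TN, FN = 40, 10, 45, 5                  # same hypothetical counts as before

accuracy    = (TP + TN) / (TP + FP + TN + FN)
misclass    = 1 - accuracy                      # misclassification rate
sensitivity = TP / (TP + FN)                    # true-positive rate (recall)
specificity = TN / (TN + FP)                    # true-negative rate
precision   = TP / (TP + FP)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(misclass, sensitivity, specificity, f1)
```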

77
New cards

Entropy

Describes the number of ways a situation could diverge.

78
New cards

Steps to make a decision tree:

Calculate the entropy of the decision, split the decision’s attributes into subtables and calculate their entropy, choose the attribute with the largest information gain (the lowest entropy after the split), then repeat the process.

79
New cards

Information Gain

The reduction in entropy from a split: entropy before the split minus entropy after the split.

80
New cards

Heuristic

The rule of thumb for choosing a split: pick the attribute that produces the purest nodes.

81
New cards

Entropy / Expected Information needed to classify tuple D=

\text{Info}(D)=-\sum_{i=1}^{m}p_{i}\log_{2}(p_{i})

82
New cards

Information needed to classify D after using A to split D into v partitions=

\text{Info}_{A}(D)=\sum_{j=1}^{v}\frac{|D_{j}|}{|D|}\times\text{Info}(D_{j})

83
New cards

Information gained by branching on attribute A=

\text{Gain}(A)=\text{Info}(D)-\text{Info}_{A}(D)
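
A worked sketch of cards 77–83 on a made-up attribute: entropy before the split, the weighted entropy after splitting on the attribute, and the resulting information gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# Made-up dataset: (value of attribute A, class label)
data = [("low", "yes"), ("low", "yes"), ("low", "no"),
        ("high", "no"), ("high", "no"), ("high", "yes")]

labels = [label for _, label in data]
info_d = entropy(labels)                           # entropy before the split

# Info_A(D): weighted entropy of each partition D_j produced by splitting on A
partitions = {}
for value, label in data:
    partitions.setdefault(value, []).append(label)
info_a = sum(len(p) / len(data) * entropy(p) for p in partitions.values())

gain = info_d - info_a                             # Gain(A) = Info(D) - Info_A(D)
print(f"Info(D)={info_d:.3f}, Info_A(D)={info_a:.3f}, Gain(A)={gain:.3f}")
```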

84
New cards

When picking a distance metric for kNN, the metric doesn’t have to be the ___ on a graph.

Physical distance

85
New cards

The ___ set is used to train the model before testing it.

Training

86
New cards

The ___ set is used to test the model’s abilities after training it.

Testing

87
New cards

Picking an ___ is the 3rd step in creating a kNN model.

Evaluation Metric

88
New cards

The k in kNN represents the ___.

Number of nearest neighbors considered

89
New cards

Unsupervised Learning

Teaching a model to categorize data where no labels are available.

90
New cards

kMeans

An unsupervised learning technique that groups different tuples together based on known attributes.

91
New cards

Centroid

The center of a cluster.

92
New cards

Each cluster in kMeans represents an individual ___.

Attribute

93
New cards

Step 3 of kMeans is to ___.

Move the centroids to the average location of the data points

94
New cards

kMeans should repeat until ___.

The centroids move either very little or not at all.

95
New cards

kMeans has the possibility to fall into an ___ or give a ___ answer.

Infinite loop, Useless
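
A compact sketch of the kMeans loop from cards 90–95, on made-up 2-D points with k = 2: assign points to the nearest centroid, move each centroid to the average location of its points, and stop once the centroids barely move.

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
k = 2
rng = np.random.default_rng(0)
centroids = points[rng.choice(len(points), size=k, replace=False)]   # random starting centroids

for _ in range(100):
    # Assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Move each centroid to the average location of its assigned points
    new_centroids = np.array([points[assignments == j].mean(axis=0) for j in range(k)])
    # Stop when the centroids move very little or not at all
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
    # (a poor starting position can still yield a useless clustering, as card 95 notes)

print(centroids)
```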

96
New cards

Bootstrapping

The process of generating simulated samples by repeatedly drawing from existing samples.
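
A short sketch of bootstrapping: draw repeated samples with replacement from a made-up observed sample and look at the spread of a statistic (here the mean).

```python
import numpy as np

rng = np.random.default_rng(42)
sample = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4])   # made-up observed sample

# Each draw is a simulated sample of the same size, drawn with replacement
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(10_000)]

# The spread of the bootstrapped means approximates the standard error of the mean
print(np.mean(boot_means), np.std(boot_means))
```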

97
New cards

Clustering Algorithm

Groups instances with similar features.

98
New cards

Outlier Detection Algorithm

Identifies deviations within the data.

99
New cards

Latent Variable Model

Relates observable features to unobservable (latent) features.

100
New cards

Cluster

A set of instances with similar characteristics.