Data Science Fundamentals 2-2


54 Terms

1

___ learning should be used for an ML model predicting a known (labeled) target value.

Supervised

2

___ learning should be used for an ML model that has no labeled target value to predict.

Unsupervised

3

A supervised learning model finding a discrete value is performing ___.

Classification

4

A supervised learning model finding a continuous value is performing ___.

Regression

5

An unsupervised model trying to fit data into discrete groups is performing ___.

Clustering

6

An unsupervised model making a numeric estimate of how the data is distributed is performing ___.

Density Estimation

7

Supervised Learning

Learning from data with known outcomes.

8

Unsupervised Learning

Learning from data with unknown outcomes.

9

Step 1 of making a decision tree:

Calculate entropy of the target variable.

10

Step 2 of making a decision tree:

Split the dataset on each attribute and calculate the entropy of each subset. Combine the subset entropies as a weighted sum (each weighted by its share of the rows) and compare the result to the original entropy.

11

Step 3 of making a decision tree:

Choose the attribute whose split leaves the smallest weighted entropy (the largest information gain) as the decision node. Repeat the steps on each branch.
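
The three steps can be sketched for a single level of the tree as follows. This is a minimal illustration, not a full tree builder: the toy rows, the attribute names, and the helper names `entropy`, `split_entropy`, and `best_attribute` are all hypothetical, and the subset entropies are combined as a size-weighted sum.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split_entropy(rows, attribute, target):
    """Size-weighted entropy of the target after splitting rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def best_attribute(rows, attributes, target):
    """Attribute whose split leaves the lowest weighted entropy."""
    return min(attributes, key=lambda a: split_entropy(rows, a, target))

# Hypothetical toy data: does someone play outside?
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

original = entropy([row["play"] for row in rows])        # step 1
root = best_attribute(rows, ["outlook", "windy"], "play")  # steps 2 and 3
```

Here splitting on `outlook` yields two pure subsets, so it is chosen as the decision node. Building the rest of the tree would mean calling `best_attribute` again on each subset.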

12

Entropy/Uncertainty

A measure of how mixed (uncertain) the outcomes in a dataset are, i.e. how much information is still missing to make a decision.

13

The best attribute to split a decision tree with is the one that produces the ___ tree.

Smallest

14

Decision Node Purity

The number of outcomes a decision can possibly have. A pure node has only one possible outcome.

15

Is this decision pure?

Yes

16

Is this decision pure?

No

17

The attribute with the ___ entropy should be selected.

Lowest

18

Information Gain =

Info(D)-Info_A(D) where Info(D) is the entropy before the split and Info_A(D) is the weighted entropy after splitting on attribute A.
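
Written out in full, the two terms are usually defined as below. The symbols p_i (share of rows in class i), m (number of classes), v (number of values attribute A takes), D_j (the subset for value j), and the name Gain(A) are notation assumed here for illustration; they do not appear on the card.

```latex
\begin{align*}
\text{Info}(D)   &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\text{Info}_A(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\text{Info}(D_j) \\
\text{Gain}(A)   &= \text{Info}(D) - \text{Info}_A(D)
\end{align*}
```

For example, a dataset with two "yes" rows and two "no" rows has Info(D) = 1 bit. If attribute A splits it into two pure subsets, Info_A(D) = 0, so the gain is 1 bit.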

19

k-Nearest Neighbors

A supervised learning algorithm that uses a distance metric to classify a tuple based on the labels of its nearest neighbors.

20

The k in k-NN stands for:

The number of nearest neighbors consulted when predicting each point.

21

Step 1 of k-NN:

Decide on the similarity metric, then split the dataset into training and testing data. Pick an evaluation metric.

22

Step 2 of k-NN:

Run k-NN a few times, changing k each time.

23

Step 3 of k-NN:

Choose the “best” k, i.e. the one that scores best on the evaluation metric.
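
The three k-NN steps can be sketched in plain Python. This is only an illustration under stated assumptions: Euclidean distance is the similarity metric, accuracy is the evaluation metric, and the training and testing points, along with the helper names `knn_predict` and `accuracy`, are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, point, k):
    """Majority label among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Step 1: hypothetical data, already split into training and testing sets.
train = [((1, 1), "a"), ((1, 2), "a"),
         ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b"), ((7, 7), "b")]
test = [((2, 2), "a"), ((6, 5), "b")]

# Step 2: run k-NN a few times, changing k each time.
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}

# Step 3: keep the k with the highest score (ties go to the smallest k tried).
best_k = max(scores, key=scores.get)
```

With this data, k = 5 pulls in more "b" neighbors than there are "a" points in total, so the "a" test point is misclassified and the smaller values of k score higher.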

24

k-Means

An unsupervised learning algorithm that clusters similar objects.

25

Step 1 of k-Means:

Pick k random points to be the initial centroids.

26

Step 2 of k-Means:

Assign each data point to the centroid closest to it.

27

Step 3 of k-Means:

Move each centroid to the average location of all the data points in its cluster. Repeat steps 2 and 3 until the centroids barely move or stop moving.
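
A minimal sketch of the three k-Means steps follows. The function name `k_means`, its parameters, and the sample points are hypothetical. Two choices are assumptions added here rather than part of the cards: the initial centroids are drawn from the data points themselves, and a `max_iters` cap guarantees the loop ends even if the centroids never settle.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, k, max_iters=100, tol=1e-6, seed=0):
    """Plain k-means on tuples of numbers; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: k random starting centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):  # the cap is a safeguard against endless looping
        # Step 2: assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centroids moved little to none: stop
            break
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
```

On data this cleanly separated the result does not depend on the starting centroids. On messier data different seeds can give different clusters, so it is common to run the algorithm several times.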

28

It is possible for k-Means to fall into an ___.

Infinite Loop

29

Random Forest

A collection of decision trees, each trained on a random sample of the data, whose predictions are combined.

30

Social Network

A collection of actors and relations.

31

Social Actor

A single unit in a social network.

32

Social Dyad

A pair of actors.

33

Social Triad

A triplet of actors.

34

Social Subgroup

A subset of a social network.

35

Social Relation

A relational tie between actors.

36

Social Ego Network

The “part of the network surrounding a single actor.”
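
One way to picture these terms is to store a network as a set of ties and pull out an ego network from it. The sketch below is an assumption-laden illustration: the actors and ties are made up, ties are treated as undirected, and "surrounding" is taken to mean the actor, its direct neighbors, and the ties among them.

```python
# Hypothetical friendship network: each tuple is a relation between two actors.
ties = {
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("dan", "eve"),
}

def ego_network(ties, ego):
    """Actors and ties within one step of `ego` (assumed definition)."""
    # Direct neighbors of the ego: the other end of every tie that includes it.
    alters = {b if a == ego else a for a, b in ties if ego in (a, b)}
    actors = alters | {ego}
    # Keep only ties whose two ends are both inside the ego network.
    kept = {t for t in ties if t[0] in actors and t[1] in actors}
    return actors, kept

actors, kept = ego_network(ties, "ann")
```

For `ann` this returns the subgroup of `ann`, `bob`, and `cat` with the three ties among them, which also happens to form a triad.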

37

When presenting to a project sponsor, the presentation should be:

Short, technically simple, and have the results introduced early into the presentation.

38

When presenting to an end user, the presentation should be:

Focused on how the model improves their day-to-day lives and how to use the model.

39

When presenting to other data scientists, the presentation should be:

Technically complex and brutally honest about the limitations and assumptions of the model.

40

True Positive (TP)

Predicted Positive, Actually Positive

41

True Negative (TN)

Predicted Negative, Actually Negative

42

False Positive (FP)

Predicted Positive, Actually Negative

43

False Negative (FN)

Predicted Negative, Actually Positive

44

F_1-Score =

2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}

45

Precision =

\frac{TP}{TP+FP}

46

Recall =

\frac{TP}{TP+FN}

47

Accuracy =

\frac{TP+TN}{TP+TN+FP+FN}
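
The four confusion-matrix counts and the four formulas can be computed together, as in this sketch. The label lists and the helper names `confusion_counts` and `metrics` are hypothetical, and the code assumes there is at least one predicted positive and one actual positive so that no denominator is zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1 from the four counts."""
    precision = tp / (tp + fp)  # assumes tp + fp > 0
    recall = tp / (tp + fn)     # assumes tp + fn > 0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Hypothetical labels: 1 is the positive class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)  # (TP, TN, FP, FN)
scores = metrics(*counts)
```

In this example there are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, giving a precision and recall of 0.75 each and an accuracy of 0.8.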

48

Linear Regression =

y = \beta_0 + \beta_1 x + \epsilon where \beta_0 is the intercept, \beta_1 is the slope, and \epsilon is the error term.
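
For a single independent variable, the slope and intercept of the best-fitting line can be found with the ordinary least squares formulas. The sketch below is an illustration only: `fit_line` and the sample data are hypothetical, and the data was chosen to lie exactly on a line so the error term is zero.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of a line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted y for x = 5
```

With real, noisy data the fitted line would not pass through every point, and the leftover differences are what the error term accounts for.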

49

Multiple Linear Regression =

y = \beta_0 + \beta_1 x_1 + … + \beta_n x_n + \epsilon where \beta_0 is the intercept, each \beta_j is the slope for independent variable x_j, and \epsilon is the error term.

50

Underfitting

When the model’s predictions don’t come close to matching the actual data, even the data it was trained on.

51

Overfitting

When the model’s predictions match the training data too well, so the model underperforms on new data in the real world.

52

Data Leakage

Information in the training data that wouldn’t be available to the model at prediction time in the real world.

53

Support Vector Machines

A supervised classification model that separates data into two groups by finding the line (or hyperplane) with the maximum margin, i.e. the largest distance to the nearest points of each group.

54

Residuals =

x - \hat{x} where x is the original (observed) value and \hat{x} is the predicted value
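
As a small worked example, residuals can be computed point by point from a list of original values and a list of predictions. Both lists here are made-up numbers, and summing the squared residuals is just one common way to condense them into a single error figure.

```python
# Hypothetical original values and model predictions for the same four points.
original = [3.0, 5.0, 7.5, 9.0]
predicted = [3.0, 5.0, 7.0, 9.0]

# Residual = original minus prediction, one per data point.
residuals = [x - x_hat for x, x_hat in zip(original, predicted)]

# Squaring before summing stops positive and negative residuals cancelling out.
sum_squared = sum(r ** 2 for r in residuals)
```

Only the third point is off, by 0.5, so every other residual is zero.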
