Exam 2

IT Exam 2 all Powerpoints!

57 Terms

1

How to plot in Pandas

<dataframe name>.plot(x="column", y="column")

plt.show()

2

How to set chart type

<dataframe name>.plot(x="column", y="column", kind="<chart type>", title="<title>")

plt.show()
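A minimal sketch of the two plotting cards above. In pandas, the chart type is selected with the kind argument of plot, and title only sets the heading. The DataFrame, its column names, and its values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: monthly unit sales, made up for this example
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "units": [120, 135, 150, 110],
})

# With no kind argument, pandas draws a line graph
line_ax = sales.plot(x="month", y="units", title="Units sold (line)")

# kind switches the chart type, here to a vertical bar chart
bar_ax = sales.plot(x="month", y="units", kind="bar", title="Units sold (bar)")

plt.show()
```

Swapping "bar" for any of the other kind values listed in the following cards ("barh", "area", and so on) changes the chart type in the same way.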

3

“area” is for:

area plot

4

“bar” is for:

vertical bar charts

5

“barh” is for:

horizontal bar charts

6

“box” is for:

box plots

7

“hexbin” is for:

hexbin plots

8

“hist” is for:

histograms

9

“kde” is for:

kernel density estimate charts

10

“density” is:

an alias for “kde”

11

“line” is for:

line graphs

12

“pie” is for:

pie charts

13

“scatter” is for:

scatter plots

14

Plotting categorical data

call value_counts() on the categorical column, then plot the resulting counts (for example as a bar chart)
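A short sketch of counting and plotting a categorical column. The DataFrame and its "animal" column are made up for illustration, and the bar chart is only one reasonable choice of chart type.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical data
pets = pd.DataFrame({"animal": ["dog", "cat", "dog", "bird", "dog", "cat"]})

# value_counts returns one row per category, most frequent first
counts = pets["animal"].value_counts()
print(counts)

# The counts are a Series, so they can be plotted directly
ax = counts.plot(kind="bar", title="Animals by count")
plt.show()
```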

15

Box Plots

Used to visualize a distribution of data

  • no x-axis parameter in the plot function

  • bottom of the box is the 25% threshold (first quartile)

  • median value is the line in the box

  • top of the box is the 75% threshold (third quartile)

  • dots are outliers
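The numbers behind a box plot can be computed by hand, which may help when reading one. This is a sketch with invented values. The 1.5 × IQR rule for flagging outliers is a common convention and is an assumption here, since the cards do not say how outliers are chosen.

```python
import statistics

# Hypothetical sample with one unusually large value
values = [12, 15, 14, 10, 18, 16, 13, 11, 17, 45]

# The three cut points are the 25% threshold, the median, and the 75% threshold
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, median, q3, outliers)
```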

16

Skewed Right

(image: a distribution whose longer tail extends to the right)
17

Symmetric

(image: a distribution whose left and right halves mirror each other)
18

Skewed Left

(image: a distribution whose longer tail extends to the left)
19

Exploratory Analytics

Looking at patterns and trends in the data to explain what has already happened

20

Predictive Analytics

Looking at patterns and trends in the data to predict what will happen (i.e., forecasting)

21

Steps for Predictive Models

  1. Learn the relationship between predictors and target

  2. Test if the model has learned the relationship

22

Training Set

used to learn the relationship

23

Validation Set

  • used to test the model

  • randomly sampled from original data

  • we can compare target estimates with the known target values to calculate the accuracy or error of the model

24

Regression

determining the relationship between a variable and one or more other variables

25

Linear Regression

given a set of observations, determine the equation of a line that can be used to describe the dataset

26

Exploratory Modeling

obtain the best fit model from all observations

27

Predictive Modeling

split observations into a training and validation set

28

General rule for splitting observations into sets

80% used for training, 20% used for validation
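A minimal sketch of an 80/20 random split using only the standard library. The function name split_observations and the list of 100 integers standing in for rows are hypothetical. In scikit-learn, train_test_split with test_size=0.2 does the same job.

```python
import random

def split_observations(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]

observations = list(range(100))  # stand-in for 100 rows of data
training, validation = split_observations(observations)
print(len(training), len(validation))
```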

29

Error

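the absolute value of the difference between the expected (actual) value and the estimated (predicted) value: |expected - estimated|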

30

Mean Error (ME)

the average of the errors

31

Mean Squared Error (MSE)

the sum of the squared errors (square each error first, then add), divided by the number of errors

32

Root Mean Square Error (RMSE)

the square root of MSE
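The error cards above can be worked through on a tiny example. The expected and estimated values are made up, and the error is taken as an absolute difference, following the Error card.

```python
import math

expected = [10.0, 12.0, 15.0, 20.0]   # known target values (invented)
estimated = [11.0, 12.0, 13.0, 24.0]  # model estimates (invented)

# Error for each observation: absolute difference
errors = [abs(e - p) for e, p in zip(expected, estimated)]

me = sum(errors) / len(errors)                      # Mean Error
mse = sum(err ** 2 for err in errors) / len(errors)  # Mean Squared Error
rmse = math.sqrt(mse)                               # Root Mean Square Error

print(errors, me, mse, rmse)
```

RMSE is in the same units as the target, which tends to make it easier to interpret than MSE.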

33

What does fitting a model mean for multiple linear regression (MLR)?

solving for the best values of the coefficients

34

If we have n predictors, how many coefficients is MLR solving for?

n+1
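To see where n+1 comes from, the model can be written out. The symbol names below are labels assumed for this note, not taken from the cards: one coefficient per predictor (b_1 through b_n) plus one intercept (b_0).

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
```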

35

predicted value of y is…

an estimate of y

36

If we have the actual y, we can…

compare it to the predicted y to understand the accuracy of our model

37

How do we select the best predictors?

  • brute force method

  • plotting the relationships

38

Brute force method

  • use all predictors, measure the error

  • take out one predictor, measure the error

  • continue until you have the combination that has the smallest error
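One way to sketch the brute force idea is to try every combination of predictors and keep the one with the smallest validation error. This is an interpretation, not necessarily the exact procedure from the course: the card describes removing predictors one at a time, while this version checks all subsets. The data is synthetic and the predictor names x1, x2, x3 are hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_rows = 200

# Synthetic data: y depends on x1 and x2 only; x3 is unrelated noise
X = {name: rng.normal(size=n_rows) for name in ("x1", "x2", "x3")}
y = 3 * X["x1"] - 2 * X["x2"] + rng.normal(scale=0.1, size=n_rows)

train, valid = slice(0, 160), slice(160, n_rows)  # 80/20 split

results = {}
for size in range(1, len(X) + 1):
    for subset in combinations(X, size):
        features = np.column_stack([X[name] for name in subset])
        model = LinearRegression().fit(features[train], y[train])
        predictions = model.predict(features[valid])
        # Record the validation MSE for this combination of predictors
        results[subset] = mean_squared_error(y[valid], predictions)

best_subset = min(results, key=results.get)
print(best_subset, results[best_subset])
```

With many predictors the number of combinations grows quickly, which is one reason plotting the relationships first can help narrow the candidates.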

39

Steps for using multiple linear regression with Python

  • import libraries (pandas, scikit-learn)

  • import data (read from csv)

  • split data into training and validation sets

  • fit a linear model to the training data

  • use the model with the validation set

  • evaluate the model
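The steps above can be sketched end to end with scikit-learn. To keep the example self-contained, synthetic data stands in for the CSV file (in practice this would be something like pd.read_csv on your own file), and the column names area_m2, rooms, and price are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in for reading a CSV: hypothetical housing data
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "area_m2": rng.uniform(50, 200, 100),
    "rooms": rng.integers(1, 6, 100),
})
data["price"] = 1000 * data["area_m2"] + 5000 * data["rooms"] + rng.normal(0, 2000, 100)

predictors = data[["area_m2", "rooms"]]
target = data["price"]

# Split: 80% training, 20% validation
train_x, validation_x, train_y, validation_y = train_test_split(
    predictors, target, test_size=0.2, random_state=0
)

# Fit: solve for the intercept and one coefficient per predictor
model = LinearRegression()
model.fit(train_x, train_y)

# Use the model with the validation set
predicted_y = model.predict(validation_x)

# Evaluate: compare predictions with the known validation targets
rmse = mean_squared_error(validation_y, predicted_y) ** 0.5
r_squared = r2_score(validation_y, predicted_y)
print(model.intercept_, model.coef_, rmse, r_squared)
```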

40

scikit-learn

a popular library in Python for data analytics, data science, and machine learning

41

Using the predict function of the model, giving it the x-values will give us…

predictions that we can compare against the y-values from the validation set

42

R squared

a metric for how well the model fits the data (usually from 0 to 1; closer to 1 means a better fit)

aka: how much of the variation in y can be explained by the predictors
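A small worked example of R squared, computed as 1 minus the ratio of unexplained variation to total variation. The actual and predicted values are made up.

```python
actual = [3.0, 5.0, 7.0, 9.0]      # known y values (invented)
predicted = [2.5, 5.5, 7.0, 9.5]   # model estimates (invented)

mean_actual = sum(actual) / len(actual)

# Variation the model fails to explain
ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Total variation of y around its mean
ss_total = sum((a - mean_actual) ** 2 for a in actual)

r_squared = 1 - ss_residual / ss_total
print(r_squared)
```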

43

Classifiers…

take a set of features and give us back a class label

44

Classification is…

the process of identifying a label (class) for the data points

45

What is the difference between regression models and classification models?

the target, or what we are trying to estimate

46

The premise of the K-Nearest Neighbor (KNN) is…

the most likely class of a data point is the class of its closest neighbors in the training set

47

How do we know which data points are closest?

  • need to use a distance metric

  • by measuring the distance between data points, we are actually measuring the similarity of the data points

    • smaller the distance, the more similar

    • larger the distance, the more dissimilar

48

Euclidean Distance

the straight-line distance between two points: the square root of the sum of the squared differences between their coordinates
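A minimal sketch of the Euclidean distance calculation. The function name euclidean_distance is chosen for this example. Python's standard library offers the same calculation as math.dist.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5
print(euclidean_distance((0, 0), (3, 4)))
```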

49

Why do we have to scale the data?

so each feature has equal influence on the final decision

50

Data normalization is necessary with KNN…

because the Euclidean distance squares the differences in features, so without scaling, a feature with a large numeric range would dominate the distance
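A sketch of why scaling matters, using invented ages and incomes for three people. Min-max scaling (mapping each feature to the range 0 to 1) is one common normalization and is an assumption here, since the cards do not name a specific method.

```python
import math

def min_max_scale(column):
    """Rescale values to the range 0..1 (assumes the values are not all equal)."""
    low, high = min(column), max(column)
    return [(v - low) / (high - low) for v in column]

ages = [25, 30, 60]                 # small numeric range
incomes = [50_000, 51_000, 50_500]  # large numeric range

raw_points = list(zip(ages, incomes))
scaled_points = list(zip(min_max_scale(ages), min_max_scale(incomes)))

# Distances from person 0 to person 1 and to person 2
raw_d01 = math.dist(raw_points[0], raw_points[1])
raw_d02 = math.dist(raw_points[0], raw_points[2])
scaled_d01 = math.dist(scaled_points[0], scaled_points[1])
scaled_d02 = math.dist(scaled_points[0], scaled_points[2])

print(raw_d01, raw_d02)        # income dominates: person 2 looks closer
print(scaled_d01, scaled_d02)  # after scaling, the 35-year age gap counts too
```

On the raw values the nearest neighbor of person 0 is person 2, despite the large age gap. After scaling it is person 1.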

51

When a data point’s features are given but it does not have a label…

the model computes the distance between the data point’s features and every data point in the training set

52

From the K nearest training data points…

the predicted class label is identified as the class label that occurs most frequently in the K datapoints
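The two KNN cards above can be sketched from scratch: compute the distance to every training point, keep the K nearest, and take a majority vote. The function name knn_predict and the toy points and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training_points, training_labels, new_point, k=3):
    """Return the most frequent label among the k nearest training points."""
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(training_points, training_labels)
    ]
    distances.sort(key=lambda pair: pair[0])  # closest first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two well-separated clusters, invented for illustration
training_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
training_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(training_points, training_labels, (2, 2)))
print(knn_predict(training_points, training_labels, (7, 9)))
```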

53

accuracy function

accuracy = accuracy_score(validation_y, predicted_regions)

54

You have trained your model and are happy with the accuracy. How do you use it?

  • use the predict function of the model

  • normalize the new data first, the same way the training data was normalized
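A sketch of the full KNN workflow in scikit-learn: scale, fit, check accuracy on the validation set, then predict for new data. The feature values and region labels are invented, and MinMaxScaler is an assumed choice of normalization. The variable names validation_y and predicted_regions follow the accuracy card.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two numeric features per row, one region label per row
train_x = [[150, 50], [155, 52], [160, 55], [180, 85], [185, 90], [190, 95]]
train_y = ["north", "north", "north", "south", "south", "south"]
validation_x = [[152, 51], [188, 92]]
validation_y = ["north", "south"]

# Learn the scaling from the training data only, then reuse it everywhere
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_x)
validation_scaled = scaler.transform(validation_x)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(train_scaled, train_y)

predicted_regions = model.predict(validation_scaled)
accuracy = accuracy_score(validation_y, predicted_regions)
print(accuracy)

# Using the trained model: normalize the new data first, then predict
new_data = [[158, 54]]
new_prediction = model.predict(scaler.transform(new_data))
print(new_prediction)
```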

55

KNN classifiers require that all features be…

numeric

56

Encoding

turning categorical features into numeric features

57

One-hot encoding

a column with a categorical variable is expanded into several columns of 1s and 0s, one per unique value

the number of columns created equals the number of unique values of the variable
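A short sketch of one-hot encoding with pandas get_dummies. The DataFrame and its "color" and "doors" columns are made up. The dtype=int argument is used so the new columns hold 1s and 0s rather than True/False.

```python
import pandas as pd

# Hypothetical data with one categorical column
cars = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],
    "doors": [2, 4, 4, 2],
})

# Expand "color" into one 1/0 column per unique value
encoded = pd.get_dummies(cars, columns=["color"], dtype=int)
print(encoded)
```

Here "color" has three unique values, so three new columns (color_blue, color_green, color_red) replace the original column.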