Exam 2

IT Exam 2 all Powerpoints!

57 Terms

New cards

How to plot in pandas

<dataframe name>.plot(x="column", y="column")

plt.show()
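
A minimal sketch of this call. The DataFrame `sales` and its columns are made up for illustration, and the `Agg` backend line is only there so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this example
sales = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [10, 12, 9, 15]})

# With no chart type given, .plot() draws a line graph
ax = sales.plot(x="month", y="revenue")
plt.show()
```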

New cards

How to set chart type

<dataframe name>.plot(x="column", y="column", kind="<chart type>", title=<title>)

plt.show()
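
In pandas the chart type is chosen with the `kind` argument of `.plot()`, and `title` only sets the heading. A small sketch, assuming a made-up DataFrame named `sales`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for this example
sales = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [10, 12, 9, 15]})

# kind picks the chart type ("bar", "barh", "hist", "scatter", ...)
ax = sales.plot(x="month", y="revenue", kind="bar", title="Revenue by month")
plt.show()
```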

New cards

“area” is for:

area plot

New cards

“bar” is for:

vertical bar charts

New cards

“barh” is for:

horizontal bar charts

New cards

“box” is for:

box plots

New cards

“hexbin” is for:

hexbin plots

New cards

“hist” is for:

histograms

New cards

“kde” is for:

kernel density estimate charts

New cards

“density” is:

an alias for “kde”

New cards

“line” is for:

line graphs

New cards

“pie” is for:

pie charts

New cards

“scatter” is for:

scatter plots

New cards

Plotting categorical data

call value_counts() on the categorical column, then plot the resulting counts
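
A short sketch of that idea. The `pets` DataFrame is invented for illustration, and a bar chart is only one reasonable choice for showing the counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical data
pets = pd.DataFrame({"animal": ["cat", "dog", "cat", "bird", "dog", "cat"]})

# value_counts() turns the categories into numbers: cat 3, dog 2, bird 1
counts = pets["animal"].value_counts()

# The counts are numeric, so they can be plotted directly
ax = counts.plot(kind="bar")
plt.show()
```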

New cards

Box Plots

Used to visualize the distribution of data

no x-axis parameter in the plot function
bottom of the box is the 25th percentile (first quartile)
the line inside the box is the median
top of the box is the 75th percentile (third quartile)
dots are outliers
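
The parts of the box can be checked against the numbers pandas computes. This is a sketch on made-up scores; the value 40 is deliberately far from the rest so that it shows up as an outlier dot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores, invented for this example
scores = pd.DataFrame({"score": [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]})

q1 = scores["score"].quantile(0.25)   # bottom of the box: 3.25
median = scores["score"].median()     # line inside the box: 5.5
q3 = scores["score"].quantile(0.75)   # top of the box: 7.75

# No x argument: a box plot summarizes each numeric column on its own
ax = scores.plot(kind="box")
plt.show()
```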

New cards

Skewed Right

New cards

Symmetric

New cards

Skewed Left

New cards

Exploratory Analytics

Looking at patterns and trends in the data to explain what has already happened

New cards

Predictive Analytics

Looking at patterns and trends in the data to predict what will happen (i.e., forecasting)

New cards

Steps for Predictive Models

Learn the relationship between predictors and target
Test if the model has learned the relationship

New cards

Training Set

used to learn the relationship

New cards

Validation Set

used to test the model
randomly sampled from original data
we can compare target estimates with the known target values to calculate the accuracy or error of the model

New cards

Regression

determining the relationship between a variable and one or more other variables

New cards

Linear Regression

given a set of observations, determine the equation of a line that can be used to describe the dataset
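
As an illustration, the line y = slope * x + intercept can be found with the least-squares formulas. Least squares is the usual way to pick the line, but the card does not name a method, so treat that as an assumption. The observations are made up.

```python
# Hypothetical observations that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope: how x and y vary together, divided by how x varies
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
# The fitted line always passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

print(f"y = {slope} * x + {intercept}")
```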

New cards

Exploratory Modeling

obtain the best fit model from all observations

New cards

Predictive Modeling

split observations into a training and validation set

New cards

General rule for splitting observations into sets

80% used for training, 20% used for validation
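
One way to make the 80/20 split is scikit-learn's `train_test_split`. The DataFrame below is invented, and `random_state=0` is only there to make the random sample repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data set with 10 observations
data = pd.DataFrame({"x": range(10), "y": [2 * v + 1 for v in range(10)]})

# test_size=0.2 holds back a random 20% of the rows for validation
train, validation = train_test_split(data, test_size=0.2, random_state=0)

print(len(train), len(validation))
```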

New cards

Error

the absolute value of (expected - estimated)

New cards

Mean Error (ME)

the average of the errors

New cards

Mean Squared Error (MSE)

the sum of the squared errors, divided by the number of errors

New cards

Root Mean Square Error (RMSE)

the square root of MSE
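
A worked example of the three metrics, following these cards' definition of error as an absolute value. The expected and estimated values are made up.

```python
import math

# Hypothetical known targets and model estimates
expected = [3.0, 5.0, 2.0, 7.0]
estimated = [2.0, 5.0, 4.0, 4.0]

# Error for each observation: |expected - estimated| gives [1, 0, 2, 3]
errors = [abs(e - p) for e, p in zip(expected, estimated)]

me = sum(errors) / len(errors)                    # (1 + 0 + 2 + 3) / 4 = 1.5
mse = sum(err ** 2 for err in errors) / len(errors)  # (1 + 0 + 4 + 9) / 4 = 3.5
rmse = math.sqrt(mse)                             # square root of 3.5, about 1.87

print(me, mse, rmse)
```

Squaring makes large errors count for more, and taking the square root brings the result back to the same units as the target.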

New cards

What does fitting a model mean for multiple linear regression (MLR)?

solving for the best values of the coefficients

New cards

If we have n predictors, how many coefficients is MLR solving for?

n + 1 (one coefficient per predictor, plus the intercept)
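
A quick check of the count with scikit-learn's `LinearRegression`. The data is made up so that y = 1 + 2*x1 + 3*x2 exactly: two predictors, so three coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with n = 2 predictors (the two columns of X)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression().fit(X, y)

# scikit-learn stores one slope per predictor in coef_ and the intercept separately
n_coefficients = len(model.coef_) + 1

print(model.intercept_, model.coef_, n_coefficients)
```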

New cards

predicted value of y is…

an estimate of y

New cards

If we have the actual y, we can…

compare it to the predicted y to understand the accuracy of our model

New cards

How do we select the best predictors?

brute force method
plotting the relationships

New cards

Brute force method

use all predictors, measure the error
take out one predictor, measure the error
continue until you have the combination that has the smallest error
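
A sketch of the brute force idea, read here as trying every combination of predictors and keeping the one with the smallest validation error. That reading, the use of RMSE as the error, and the made-up data (where `c` is pure noise) are all assumptions for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: y depends on a and b, while c is unrelated noise
rng = np.random.default_rng(0)
data = pd.DataFrame({name: rng.normal(size=60) for name in ["a", "b", "c"]})
data["y"] = 3 * data["a"] - 2 * data["b"] + rng.normal(scale=0.1, size=60)

train, validation = train_test_split(data, test_size=0.2, random_state=0)
predictors = ["a", "b", "c"]

# Start with all predictors, then try every smaller combination
results = {}
for size in range(len(predictors), 0, -1):
    for subset in combinations(predictors, size):
        cols = list(subset)
        model = LinearRegression().fit(train[cols], train["y"])
        predictions = model.predict(validation[cols])
        results[subset] = np.sqrt(mean_squared_error(validation["y"], predictions))

# Keep the combination with the smallest validation error
best_subset = min(results, key=results.get)
print(best_subset, results[best_subset])
```

With n predictors there are 2^n - 1 combinations to try, so this only stays practical for a small number of predictors.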

New cards

Steps for using multiple linear regression with Python

import libraries (pandas, scikit-learn)
import data (read from csv)
split data into training and validation sets
fit training data into a linear model
use the model with the validation set
evaluate the model
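
The steps above can be sketched end to end as follows. The CSV text is made up (price is exactly 20 + 10*size + 5*rooms) and is read from a string so the example runs without a file; with a real file, `pd.read_csv("<file name>.csv")` would take its place. Using RMSE for the evaluation step is one choice among several.

```python
# 1. import libraries
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 2. import data (hypothetical CSV content standing in for a file)
csv_text = """size,rooms,price
8,2,110
10,3,135
12,2,150
15,4,190
9,3,125
20,5,245
11,2,140
14,3,175
18,4,220
16,5,205
"""
data = pd.read_csv(io.StringIO(csv_text))

# 3. split data into training and validation sets (80% / 20%)
train, validation = train_test_split(data, test_size=0.2, random_state=0)

# 4. fit the training data to a linear model
predictors = ["size", "rooms"]
model = LinearRegression()
model.fit(train[predictors], train["price"])

# 5. use the model with the validation set
predictions = model.predict(validation[predictors])

# 6. evaluate the model by comparing predictions with the known prices
rmse = np.sqrt(mean_squared_error(validation["price"], predictions))
print(rmse)
```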

New cards

scikit-learn

a popular Python library for data analytics, data science, and machine learning

New cards

Using the predict function of the model, giving it the x-values will give us…

predictions that we can compare against the y-values from the validation set

New cards

R squared

a metric for how well the model fits the data (ranges from 0 to 1)

in other words, how much of the variation in y can be explained by the predictors
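
R squared can be computed by hand as 1 minus (unexplained variation / total variation). A worked sketch on made-up numbers:

```python
# Hypothetical actual y-values and model predictions
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

mean_actual = sum(actual) / len(actual)  # 5.0

# Total variation in y around its mean: 9 + 1 + 1 + 9 = 20
ss_total = sum((y - mean_actual) ** 2 for y in actual)

# Variation the model failed to explain: 4 * 0.25 = 1
ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))

r_squared = 1 - ss_residual / ss_total  # 1 - 1/20 = 0.95
print(r_squared)
```

Here the predictors explain 95% of the variation in y. In scikit-learn, `r2_score` from `sklearn.metrics` computes the same quantity.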

New cards

Classifiers…

take a set of features and give us back a class label

New cards

Classification is…

the process of identifying a label (class) for the data points

New cards

What is the difference between regression models and classification models?

the target, or what we are trying to estimate

New cards

The premise of the K-Nearest Neighbor (KNN) is…

the most similar class of a data point is the class of its closest neighbors from the training set

New cards

How do we know which data points are closest?

need to use a distance metric
by measuring the distance between data points, we are actually measuring the similarity of the data points
- smaller the distance, the more similar
- larger the distance, the more dissimilar

New cards

Euclidean Distance

the straight-line distance between two points: the square root of the sum of the squared differences between their features
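
A small worked example. The function name `euclidean_distance` and the two points are made up for illustration.

```python
import math


def euclidean_distance(a, b):
    """Square each feature difference, add them up, take the square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Differences are 3 and 4, so the distance is sqrt(9 + 16) = 5.0
distance = euclidean_distance((1, 2), (4, 6))
print(distance)
```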

New cards

Why do we have to scale the data?

so each feature has equal influence on the final decision

New cards

Data normalization is necessary with KNN…

because Euclidean distance squares the differences between features, so a feature with a large numeric range would otherwise dominate the distance
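
A sketch of why this matters, using made-up (age, income) points. Min-max scaling to the range 0 to 1 is used here as one common way to normalize; the cards do not say which method the course uses.

```python
import math

# Hypothetical people as (age, income)
query = (25, 50_000)
b = (60, 51_000)   # very different age, similar income
c = (26, 58_000)   # similar age, somewhat different income
d = (40, 90_000)   # included only to give income a wide range
points = [query, b, c, d]


def min_max_scale(rows):
    """Rescale every column so its smallest value is 0 and its largest is 1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Unscaled: income differences swamp age differences, so b looks closest
raw_b, raw_c = math.dist(query, b), math.dist(query, c)

# Scaled: both features count equally, so c is closest
scaled = min_max_scale(points)
scaled_b = math.dist(scaled[0], scaled[1])
scaled_c = math.dist(scaled[0], scaled[2])

print(raw_b < raw_c, scaled_c < scaled_b)
```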

New cards

When a datapoint’s features are given but it does not have a label…

the model computes the distance between the datapoint’s features and all training data
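
A from-scratch sketch of that process: measure the distance to every training point, keep the k closest, and take the most common label among them. The training points, the function name `knn_predict`, and k = 3 are made up for illustration; scikit-learn provides the same idea as `KNeighborsClassifier`.

```python
import math
from collections import Counter

# Hypothetical training set: (features, label)
training = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"),
    ((7.0, 5.5), "blue"),
    ((6.5, 7.0), "blue"),
]


def knn_predict(features, training_data, k=3):
    """Label a point with the majority class of its k nearest training points."""
    # Distance from the new point to every training point, smallest first
    distances = sorted(
        (math.dist(features, point), label) for point, label in training_data
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict((2.0, 2.0), training))  # near the red cluster
print(knn_predict((6.0, 5.0), training))  # near the blue cluster
```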