Classification
Assigns a class label to each sample (qualitative, categorical)
Logistic Regression
Models the probability that Y belongs to a particular category
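A compact way to write this, using the standard single-predictor form rather than anything specific to these notes:

```latex
% Logistic regression: probability that Y = 1 given X, kept between 0 and 1
p(X) = \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
```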
Maximum Likelihood Estimation
Selecting 𝛽0 and 𝛽1 such that the predicted probability of (default = yes) is as close as possible to each individual's observed default status
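The fitting criterion can be written as a likelihood; 𝛽0 and 𝛽1 are chosen to maximize it (standard form for a binary yes/no response):

```latex
% Likelihood for a binary response; maximized over beta_0 and beta_1
\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)
```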
Multinomial Logistic Regression
Allows for K > 2
Assumes that the odds for one class do not depend on the other classes
Predictors do not necessarily need to be explicitly independent, however low correlation between variables is preferred
Runs K − 1 independent binary models, one per non-baseline class; for example, with Apple as the baseline (see the sketch after these examples):
Banana vs Apple → binary model
Cherry vs Apple → binary model
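A minimal scikit-learn sketch for K > 2 classes; the fruit features and numbers below are invented for illustration, and the default multinomial (softmax) fit is used rather than literal baseline comparisons:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two invented features (weight in grams, sweetness score)
# for three fruit classes.
X = np.array([
    [120, 6.0], [130, 6.5], [150, 7.0],   # apples
    [110, 8.0], [115, 8.5], [118, 8.2],   # bananas
    [8, 9.0], [9, 9.5], [7, 9.2],         # cherries
])
y = np.array(["apple"] * 3 + ["banana"] * 3 + ["cherry"] * 3)

# With the default lbfgs solver, scikit-learn fits a multinomial
# (softmax) model when there are more than two classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.classes_)                       # class order of the columns below
print(clf.predict_proba([[125, 6.8]]))    # probabilities for a new fruit
```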
Generative Models
If classes are well separated, the parameter estimates for logistic regression become unstable, so a generative model is preferred
If X is approximately normal in each class and the sample size is small, generative models can be more accurate
Easier to extend to more than two response classes
Bayes Theorem
πk = P(Y = k) = prior probability that a randomly chosen observation belongs to class k (the full posterior formula appears after the example below).
Example:
πApple = 0.5 → 50% of all fruits are Apples.
πBanana = 0.3 → 30% are Bananas.
πCherry = 0.2 → 20% are Cherries.
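Putting priors and class densities together gives the posterior used by the generative classifiers below (standard Bayes' theorem for classification):

```latex
% Posterior probability of class k given x; f_k is the density of X in class k
\Pr(Y = k \mid X = x) = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)}
```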
Linear Discriminant Analysis
Prior probability = your initial belief about a class.
Likelihood = how well the observed data fits that class.
Posterior probability = updated belief after seeing the data.
For each observation, calculate the discriminant for each class, assign to the class with the highest discriminant
To find the LDA decision boundary between two classes, find the point where their discriminants are equal (the single-predictor discriminant is sketched below)
Performs well when there are relatively few observations, since reducing variance is important
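For a single predictor with a shared variance σ², the discriminant being compared is (standard LDA form):

```latex
% LDA discriminant for class k; assign x to the class with the largest delta_k(x)
\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)
```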
Quadratic Discriminant Analysis
Like LDA, but with a class-specific variance rather than a common one (how far observations in each class spread around their mean), which allows quadratic-shaped decision boundaries (see the sketch below)
Performs best with large amounts of observations
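A minimal scikit-learn sketch contrasting LDA's shared covariance with QDA's class-specific covariance; the data is made up so that the two classes spread out very differently:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)

# Made-up data: class 0 is tight, class 1 is spread out, so QDA's
# class-specific covariance should help.
X0 = rng.normal(loc=[0, 0], scale=0.5, size=(200, 2))
X1 = rng.normal(loc=[2, 2], scale=2.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis().fit(X, y)     # one covariance for all classes
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # one covariance per class

print("LDA training accuracy:", lda.score(X, y))
print("QDA training accuracy:", qda.score(X, y))
```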
Naive Bayes Classifier
Does not model how the predictors vary jointly within each class; instead, assumes the predictors are independent within each class
If Xj is quantitative:
○ Can assume that, within each class, the jth predictor comes from a normal distribution (as in the Gaussian naive Bayes sketch below)
○ In a line-up of apples, count every 5th apple as data
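A minimal Gaussian naive Bayes sketch, with each predictor modeled as a normal distribution within each class; the data is made up for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Made-up data: two predictors treated as independent normals within each class.
X_class0 = rng.normal(loc=[1.0, 5.0], scale=1.0, size=(100, 2))
X_class1 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X_class0, X_class1])
y = np.array([0] * 100 + [1] * 100)

# GaussianNB estimates a per-class mean and variance for each predictor
# separately (the independence assumption), then combines them via Bayes' theorem.
nb = GaussianNB().fit(X, y)
print(nb.predict([[2.0, 4.0]]))
print(nb.predict_proba([[2.0, 4.0]]))
```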
K-Nearest Neighbors
Non-parametric
Performs best with many observations
Select a value of K and plot all training observations; for each test observation, find the K nearest training observations and predict from those neighbors: the most common class for a qualitative response, or the average response value for a quantitative one (a minimal sketch follows)
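A minimal scikit-learn sketch of that procedure; K and the data are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)

# Made-up training data: two predictors, two classes.
X_train = rng.normal(size=(150, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# K = 5: each test point is assigned the majority class among its
# 5 nearest training observations.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

X_test = rng.normal(size=(5, 2))
print(knn.predict(X_test))
```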
Generalized Linear Models
Useful when the response is neither strictly qualitative nor strictly quantitative
○ Value more closely represents counts of a unit
○ Ex: CaBi ride share data, predicting the number of riders (see the Poisson regression sketch below)
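A minimal Poisson regression sketch for count data like the ride-share example; the feature (temperature) and all numbers are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)

# Invented example: predict a count (number of riders) from temperature.
temperature = rng.uniform(0, 35, size=200).reshape(-1, 1)
expected_riders = np.exp(1.0 + 0.05 * temperature[:, 0])  # log-linear mean
riders = rng.poisson(expected_riders)

# A Poisson GLM keeps predictions non-negative and suited to counts.
glm = PoissonRegressor().fit(temperature, riders)
print(glm.predict([[25.0]]))  # expected rider count at 25 degrees
```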
LDA & Logistic Regression
Best when the true decision boundary is approximately linear
Logistic regression is a natural fit for a binary yes/no response
QDA and Naive Bayes
Moderately non-linear boundaries
QDA allows each class to have its own covariance (curved boundaries).
Naive Bayes can capture more complex shapes depending on predictor distributions.
Non-parametric approach like KNN
Very non-linear boundaries
KNN does not assume any formula for the boundary. It decides based on local neighborhoods (a side-by-side sketch of these methods follows).
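A minimal sketch of the comparison above: the same curved-boundary data fit with logistic regression, LDA, QDA, naive Bayes, and KNN (the dataset and scores here are purely illustrative, using training accuracy for simplicity):

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Two-class data with a curved boundary, so linear methods should lag
# behind the more flexible ones.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

models = {
    "Logistic regression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))
```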