Data Mining Quiz 2


15 Terms

1

Classification

Assigns class information to each sample (qualitative, categorical response)

2

Logistic Regression

Models the probability that Y belongs to a particular category
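As a sketch, the modeled probability is the logistic function of a linear predictor. The coefficients below are made up for illustration, not fitted to any data:

```python
import math

def logistic_prob(x, b0=-10.65, b1=0.0055):
    """P(Y = 1 | X = x) under a simple logistic model.
    The coefficients are hypothetical, not estimated."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

p_low = logistic_prob(1000)    # low balance -> small probability
p_high = logistic_prob(2500)   # higher balance -> larger probability
```

Unlike linear regression, the output is always between 0 and 1, so it can be read directly as a class probability.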

3

Maximum Likelihood Estimation

Selecting 𝛽0 and 𝛽1 such that the predicted probability of (default = yes) is as close as possible to each individual’s observed default status
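A crude illustration of the idea, replacing a proper optimizer with a grid search over (𝛽0, 𝛽1) on made-up data: the chosen pair is the one that makes the observed 0/1 outcomes most probable.

```python
import math

# Toy data: x is a balance-like predictor, y is default (1) or not (0).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    """Bernoulli log-likelihood of the data under a logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        p = min(max(p, 1e-12), 1 - 1e-12)   # guard against log(0)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Crude grid search standing in for a real optimizer:
best = max(((b0 / 10, b1 / 10)
            for b0 in range(-100, 101, 5)
            for b1 in range(-100, 101, 5)),
           key=lambda b: log_likelihood(*b))
```

Since defaults become more common as x grows, the maximizing slope comes out positive.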

4

Multinomial Logistic Regression

Allows for K > 2 classes

Assumes that the odds for one class do not depend on the others

Predictors do not necessarily need to be explicitly independent; however, low correlation between variables is preferred
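A small check of the odds assumption: under the softmax form of multinomial logistic regression, the odds between two classes depend only on those classes' own scores. All numbers below are made up:

```python
import math

def softmax(scores):
    """Turn one linear score per class into class probabilities."""
    m = max(scores.values())                       # for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Odds of Banana vs Apple before and after changing Cherry's score:
p1 = softmax({"Apple": 2.0, "Banana": 1.0, "Cherry": 0.1})
p2 = softmax({"Apple": 2.0, "Banana": 1.0, "Cherry": 5.0})
odds1 = p1["Banana"] / p1["Apple"]
odds2 = p2["Banana"] / p2["Apple"]   # unchanged, despite Cherry moving
```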

5

K - 1

Runs K - 1 independent binary models, each comparing one class against a baseline (here, Apple):

  • Banana vs Apple → binary model

  • Cherry vs Apple → binary model
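A sketch of recovering all K class probabilities from the K - 1 baseline comparisons; the coefficients of each binary model are made up:

```python
import math

def baseline_log_odds(x):
    """Two hypothetical binary models, each vs. the baseline Apple:
    log-odds(class vs Apple) = b0 + b1 * x, with made-up coefficients."""
    return {"Banana": -2.0 + 0.5 * x,
            "Cherry": -3.0 + 0.4 * x}

def class_probs(x):
    """Combine the K - 1 binary models into K class probabilities."""
    exps = {k: math.exp(v) for k, v in baseline_log_odds(x).items()}
    denom = 1.0 + sum(exps.values())
    probs = {k: v / denom for k, v in exps.items()}
    probs["Apple"] = 1.0 / denom          # baseline gets the remainder
    return probs

p = class_probs(2.0)
```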

6

Generative Models

If classes are well separated, the parameter estimates for logistic regression become unstable; generative models do not suffer from this

If X is approximately normal in each class and the sample size is small, generative models can be more accurate

Easier to extend to more than two response classes

7

Bayes Theorem

πk = P(Y = k) = probability that a randomly chosen observation belongs to class k.

  • Example:

    • πApple = 0.5 → 50% of all fruits are Apples.

    • πBanana = 0.3 → 30% are Bananas.

    • πCherry = 0.2 → 20% are Cherries.
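The update itself is one line: posterior ∝ prior × likelihood, normalized over classes. The priors are the ones on this card; the likelihood values are made up for illustration:

```python
# Priors from the card: 50% Apples, 30% Bananas, 20% Cherries.
priors = {"Apple": 0.5, "Banana": 0.3, "Cherry": 0.2}

# Hypothetical likelihoods f_k(x): how well the observed fruit's
# features fit each class (made-up numbers).
likelihoods = {"Apple": 0.1, "Banana": 0.6, "Cherry": 0.2}

# Bayes' theorem: posterior is prior times likelihood, normalized.
unnorm = {k: priors[k] * likelihoods[k] for k in priors}
total = sum(unnorm.values())
posteriors = {k: v / total for k, v in unnorm.items()}
```

Even though Apples have the highest prior, a likelihood strongly favoring Banana flips the posterior toward Banana.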

8

Linear Discriminant Analysis

  • Prior probability = your initial belief about a class.

  • Likelihood = how well the observed data fits that class.

  • Posterior probability = updated belief after seeing the data.

  • For each observation, calculate the discriminant for each class and assign it to the class with the highest discriminant

  • To find the LDA decision boundary, find the point where two classes’ discriminants are equal

  • Performs best when there are few observations, where reducing variance is important
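A minimal one-predictor sketch of the discriminant rule, using hypothetical class means, a shared variance, and the Apple/Banana priors from the Bayes card:

```python
import math

def lda_discriminant(x, mu_k, sigma2, pi_k):
    """One-predictor LDA discriminant: the variance sigma2 is
    shared across classes, so the rule is linear in x."""
    return x * mu_k / sigma2 - mu_k ** 2 / (2 * sigma2) + math.log(pi_k)

# Hypothetical (mean, prior) per class, with a common variance:
classes = {"Apple": (5.0, 0.5), "Banana": (8.0, 0.3)}
sigma2 = 1.0

def classify(x):
    """Assign x to the class with the highest discriminant."""
    return max(classes,
               key=lambda k: lda_discriminant(x, classes[k][0],
                                              sigma2, classes[k][1]))
```

An observation near each class mean lands in that class; the boundary sits where the two discriminants are equal.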

9

Quadratic Discriminant Analysis

Like LDA, but with a class-specific variance rather than a common variance (how far observations in a class differ from that class’s mean); this allows for quadratic-shaped decision boundaries

Performs best with large amounts of observations
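The same kind of one-predictor sketch, but with a class-specific variance, which makes the discriminant quadratic in x (all parameters hypothetical):

```python
import math

def qda_discriminant(x, mu_k, sigma2_k, pi_k):
    """One-predictor QDA discriminant: each class keeps its own
    variance sigma2_k, so the rule is quadratic in x."""
    return (-(x - mu_k) ** 2 / (2 * sigma2_k)
            - 0.5 * math.log(sigma2_k)
            + math.log(pi_k))

# Hypothetical (mean, variance, prior) per class:
classes = {"Apple": (5.0, 1.0, 0.5), "Banana": (8.0, 4.0, 0.3)}

def classify(x):
    """Assign x to the class with the highest discriminant."""
    return max(classes, key=lambda k: qda_discriminant(x, *classes[k]))
```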

10

Naive Bayes Classifier

Does not model the association between predictors within each class; instead assumes the predictors are independent within each class

If Xj is quantitative:

○ Can assume that within each class, the jth predictor comes from a normal distribution

In a line-up of apples, count every 5th apple as data
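A sketch of the independence assumption in action: each class score is the prior times a product of one-dimensional normal densities, one per predictor. All parameters are made up:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu, variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical per-class (mean, variance) for two predictors:
params = {
    "Apple":  {"prior": 0.5, "weight": (150, 400), "diameter": (8, 1)},
    "Banana": {"prior": 0.3, "weight": (120, 300), "diameter": (4, 1)},
}

def nb_score(obs, k):
    """Prior times the product of 1-D densities: the naive
    independence assumption lets us multiply per-predictor terms."""
    s = params[k]["prior"]
    for feat, x in obs.items():
        mu, sigma2 = params[k][feat]
        s *= normal_pdf(x, mu, sigma2)
    return s

obs = {"weight": 140, "diameter": 7.5}
pred = max(params, key=lambda k: nb_score(obs, k))
```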

11

K-Nearest Neighbors

  • Non-parametric

  • Works best with many observations

  • Select a value of K; plot all training observations; for each test observation, find the K nearest training observations and assign the predicted class by majority vote among them (for a quantitative response, use the average of their known response values)
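The steps above can be sketched as a tiny 1-D classifier on made-up fruit data:

```python
from collections import Counter

def knn_predict(train, x_new, k=3):
    """train: list of (x, label) pairs. Classify x_new by majority
    vote among the k nearest training points (1-D distance)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x_new))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(1, "Apple"), (2, "Apple"), (3, "Apple"),
         (8, "Banana"), (9, "Banana"), (10, "Banana")]
```

No formula for the boundary is ever written down; the prediction is driven entirely by the local neighborhood of the test point.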

12

Generalized Linear Models

Useful when the response fits neither the standard qualitative nor the standard quantitative setting

○ Values more closely represent counts of a unit

○ Ex: CaBi ride-share data, predicting the number of riders
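A sketch of a Poisson GLM with a log link, which keeps predicted counts non-negative; the coefficients are hypothetical, not fitted to real CaBi data:

```python
import math

def poisson_rate(x, b0=1.0, b1=0.3):
    """Poisson GLM with a log link: E[riders | x] = exp(b0 + b1 * x).
    Coefficients are made up for illustration."""
    return math.exp(b0 + b1 * x)

rate_low = poisson_rate(0.0)    # expected riders at x = 0
rate_high = poisson_rate(10.0)  # expected riders at x = 10
```

A plain linear model could predict negative rider counts; exponentiating the linear predictor rules that out by construction.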

13

LDA & Logistic Regression

  • Use when the decision boundary is approximately linear

  • Yes/No (binary) situations

14

QDA and Naive Bayes

  • Moderately non-linear boundaries

  • QDA allows each class to have its own covariance (curved boundaries).

  • Naive Bayes can capture more complex shapes depending on predictor distributions.

15

Non-parametric approach like KNN

  • Very non-linear boundaries

  • KNN does not assume any formula for the boundary. It decides based on local neighborhoods.