Chapter 4: Classification

22 Terms

1
Classification Task
[Classification] The goal is to build a classifier C(X) that takes a feature vector X as input and predicts a value for the qualitative response Y, where Y takes values in a set C of unordered categories. Rather than outputting only a class label, it is often more valuable to estimate the probabilities that X belongs to each category in C.
2
Linear Regression for Classification
[Classification] Linear regression can sometimes do a good job as a classifier, particularly in the case of a binary outcome. With a binary outcome, linear regression is equivalent to linear discriminant analysis. However, linear regression can produce probabilities less than zero or greater than one, making logistic regression more appropriate. For response variables with more than two values, multiclass logistic regression or discriminant analysis are more appropriate than linear regression.
3
Logistic Regression
[Classification] Logistic regression is more appropriate than linear regression because it ensures that the estimate for p(X) lies between 0 and 1. Logistic regression uses the form p(X) = e^(β0+β1X) / (1 + e^(β0+β1X)). The logit transformation is log(p(X) / (1− p(X))) = β0 + β1X. The parameters are estimated using maximum likelihood.
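As a quick sketch of this functional form, the Python below evaluates p(X) for hypothetical coefficients b0 and b1 (illustrative values, not maximum likelihood estimates from any data set):

```python
import math

# Sketch of the logistic model p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)).
# b0 and b1 are hypothetical illustrative values, not fitted estimates.
def logistic_p(x, b0=-10.65, b1=0.0055):
    z = b0 + b1 * x
    # always lies strictly between 0 and 1, unlike a linear fit
    return math.exp(z) / (1.0 + math.exp(z))
```

By construction, the logit log(p(X) / (1 − p(X))) recovers the linear predictor b0 + b1*x.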
4
Logistic Regression with Several Variables
[Classification] Logistic regression extends directly to multiple predictors: p(X) = e^(β0 + β1X1 + ··· + βpXp) / (1 + e^(β0 + β1X1 + ··· + βpXp)).
5
Confounding
[Classification] Confounding can occur when the coefficient for a variable changes sign once other variables are added to the model.
6
Example
South African Heart Disease
7
Logistic Regression with More Than Two Classes
[Classification] Logistic regression can be generalized to more than two classes. Multiclass logistic regression is also referred to as multinomial regression.
8
Discriminant Analysis
[Classification] Discriminant analysis models the distribution of X in each of the classes separately and uses Bayes' theorem to obtain Pr(Y|X). Using normal (Gaussian) densities for each class leads to linear or quadratic discriminant analysis.
9
Bayes Theorem for Classification
[Classification] Bayes' theorem gives Pr(Y = k|X = x) = πk fk(x) / ∑(l=1 to K) πl fl(x), where πk is the prior probability of class k and fk(x) is the density of X within class k. The Bayes classifier assigns x to the class with the largest posterior probability.
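A minimal sketch of this rule in Python, assuming one-dimensional Gaussian class densities; all means, standard deviations, and priors are hypothetical:

```python
import math

def gaussian_density(x, mu, sigma):
    # one-dimensional normal density f(x)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, mus, sigmas):
    # Bayes' theorem: Pr(Y = k | X = x) = pi_k * f_k(x) / sum_l pi_l * f_l(x)
    weighted = [pi * gaussian_density(x, mu, s) for pi, mu, s in zip(priors, mus, sigmas)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

The Bayes classifier then assigns x to the index of the largest posterior.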
10
Why Discriminant Analysis?
[Classification] When classes are well-separated or n is small and the predictors X are approximately normal in each of the classes, linear discriminant analysis is more stable than logistic regression. Linear discriminant analysis also provides low-dimensional views of the data and is popular when there are more than two response classes.
11
Linear Discriminant Analysis when p = 1
[Classification] Assuming the class densities are Gaussian with a shared variance σ², plugging them into Bayes' formula and taking logs reduces the task to assigning x to the class with the largest discriminant score δk(x) = x·µk/σ² − µk²/(2σ²) + log πk, which is linear in x.
12
Estimating Parameters
[Classification] The parameters are estimated from the training data: π̂k = nk/n, µ̂k is the sample mean of the observations in class k, and σ̂² is the pooled sample variance across the K classes.
13
Linear Discriminant Analysis when p > 1
[Classification] The discriminant function is δk(x) = x^T Σ^(−1) µk − (1/2) µk^T Σ^(−1) µk + log πk.
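The rule can be sketched in dependency-free Python; to keep it self-contained, the inverse covariance matrix Sinv is passed in directly rather than computed, and every parameter value used below is hypothetical:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def lda_score(x, mu, Sinv, pi):
    # delta_k(x) = x^T Sinv mu_k - (1/2) mu_k^T Sinv mu_k + log pi_k
    Sinv_mu = matvec(Sinv, mu)
    return dot(x, Sinv_mu) - 0.5 * dot(mu, Sinv_mu) + math.log(pi)

def lda_classify(x, mus, Sinv, priors):
    # assign x to the class with the largest discriminant score
    scores = [lda_score(x, mu, Sinv, pi) for mu, pi in zip(mus, priors)]
    return scores.index(max(scores))
```

Because all classes share the same Σ, the quadratic term in x cancels and the decision boundaries between classes are linear.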
14
Fisher's Iris Data
[Classification] LDA classifies all but 3 of the 150 training samples correctly. With K classes, linear discriminant analysis can be viewed exactly in a K − 1 dimensional plot because it classifies to the closest centroid.
15
From δk(x) to Probabilities
[Classification] Estimates δ̂k(x) can be turned into estimates for class probabilities using P̂r(Y = k|X = x) = e^(δ̂k(x)) / ∑(l=1 to K) e^(δ̂l(x)).
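This conversion is a softmax over the discriminant scores; a small sketch (subtracting the maximum score first is a standard overflow guard that leaves the result unchanged):

```python
import math

def scores_to_probs(deltas):
    # softmax: Pr(Y = k | X = x) = e^(delta_k) / sum_l e^(delta_l)
    m = max(deltas)
    exps = [math.exp(d - m) for d in deltas]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

The probabilities sum to one, and the class with the largest score gets the largest probability, so classifying by score or by probability gives the same answer.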
16
LDA on Credit Data
[Classification] An example of LDA on credit data gives a 2.75% misclassification rate. However, the trivial rule of classifying every observation to the majority (prior) class would already achieve a 3.33% error rate, so the overall error rate alone can overstate how well LDA is doing.
17
Types of Errors
[Classification] False positive and false negative rates change in opposite directions as the classification threshold is varied. An ROC curve plots both simultaneously across all thresholds, and the AUC (area under the curve) summarizes overall performance in a single number.
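A dependency-free sketch of both ideas, using toy scores and labels: error rates at a given threshold, and AUC computed via its rank interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative):

```python
def rates_at_threshold(scores, labels, t):
    # predict positive when score >= t; report (true positive rate, false positive rate)
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def auc(scores, labels):
    # AUC as a rank statistic: fraction of positive/negative pairs ranked correctly,
    # with ties counting one half
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sweeping t through `rates_at_threshold` traces out the ROC curve; `auc` is the area under that curve.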
18
Other Forms of Discriminant Analysis
[Classification] Altering the forms assumed for fk(x) yields different classifiers: Gaussians with a class-specific covariance Σk give quadratic discriminant analysis, while assuming conditional independence of the features gives the naive Bayes classifier.
19
Quadratic Discriminant Analysis
[Classification] Because each class has its own covariance matrix Σk, the quadratic terms in x no longer cancel, so the decision boundaries are quadratic rather than linear.
20
Naive Bayes
[Classification] Naive Bayes assumes the features are independent within each class, and is especially useful when p is large.
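A sketch of a Gaussian naive Bayes classifier: within each class, the independence assumption lets the joint density factor into a product of one-dimensional Gaussians. All means, variances, and priors used below are hypothetical:

```python
import math

def gauss(x, mu, var):
    # one-dimensional normal density
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def nb_predict(x, priors, means, variances):
    # means[k][j], variances[k][j]: per-class, per-feature Gaussian parameters
    best, best_score = None, -math.inf
    for k, pi in enumerate(priors):
        # log posterior up to a constant: log pi_k + sum_j log f_kj(x_j)
        score = math.log(pi) + sum(math.log(gauss(xj, m, v))
                                   for xj, m, v in zip(x, means[k], variances[k]))
        if score > best_score:
            best, best_score = k, score
    return best
```

Working in log space turns the product of p densities into a sum, which is why the method scales gracefully as p grows.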
21
Logistic Regression versus LDA
[Classification] For a two-class problem, LDA produces a log odds that, like logistic regression, is linear in x. The difference lies in how the parameters are estimated: logistic regression maximizes the conditional likelihood based on Pr(Y|X), while LDA maximizes the full likelihood based on Pr(X, Y).
22
Summary
[Classification] Logistic regression is popular for classification, especially when K = 2. LDA is useful when n is small, or the classes are well separated and Gaussian assumptions are reasonable, and when K > 2. Naive Bayes is useful when p is very large.