Chapter 4: Classification

22 Terms

1
Classification Task
[Classification] The goal is to build a classifier C(X) that takes a feature vector X as input and predicts a value for the qualitative response Y, where Y takes values in a set C of unordered categories. Rather than outputting only a class label, it is often more valuable to estimate the probabilities that X belongs to each category in C.
2
Linear Regression for Classification
[Classification] Linear regression can sometimes do a good job as a classifier, particularly in the case of a binary outcome. With a binary outcome, linear regression is equivalent to linear discriminant analysis. However, linear regression can produce probabilities less than zero or greater than one, making logistic regression more appropriate. For response variables with more than two values, multiclass logistic regression or discriminant analysis are more appropriate than linear regression.
3
Logistic Regression
[Classification] Logistic regression is more appropriate than linear regression because it ensures that the estimate for p(X) lies between 0 and 1. Logistic regression uses the form p(X) = e^(β0+β1X) / (1 + e^(β0+β1X)). The logit transformation is log(p(X) / (1− p(X))) = β0 + β1X. The parameters are estimated using maximum likelihood.
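As a quick sketch of this functional form, the Python below evaluates p(X) for hypothetical coefficients b0 and b1 (illustrative values, not maximum likelihood estimates from any data set):

```python
import math

# Sketch of the logistic model p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)).
# b0 and b1 are hypothetical illustrative values, not fitted estimates.
def logistic_p(x, b0=-10.65, b1=0.0055):
    z = b0 + b1 * x
    # always lies strictly between 0 and 1, unlike a linear fit
    return math.exp(z) / (1.0 + math.exp(z))
```

By construction, the logit log(p(X) / (1 − p(X))) recovers the linear predictor b0 + b1*x.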
4
Logistic Regression with Several Variables
[Classification] Logistic regression extends directly to multiple predictors: p(X) = e^(β0 + β1X1 + ··· + βpXp) / (1 + e^(β0 + β1X1 + ··· + βpXp)).
5
Confounding
[Classification] Confounding can occur when the coefficient for a variable changes sign once other variables are added to the model.
6
Example
South African Heart Disease
7
Logistic Regression with More Than Two Classes
[Classification] Logistic regression can be generalized to more than two classes. Multiclass logistic regression is also referred to as multinomial regression.
8
Discriminant Analysis
[Classification] Discriminant analysis models the distribution of X in each of the classes separately and uses Bayes' theorem to obtain Pr(Y|X). Using normal (Gaussian) densities for each class leads to linear or quadratic discriminant analysis.
9
Bayes Theorem for Classification
[Classification] Bayes' theorem gives Pr(Y = k|X = x) = πk fk(x) / ∑(l=1 to K) πl fl(x), where πk is the prior probability of class k and fk(x) is the density of X within class k. The Bayes classifier assigns x to the class with the largest posterior probability.
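A minimal sketch of this rule in Python, assuming one-dimensional Gaussian class densities; all means, standard deviations, and priors are hypothetical:

```python
import math

def gaussian_density(x, mu, sigma):
    # one-dimensional normal density f(x)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, mus, sigmas):
    # Bayes' theorem: Pr(Y = k | X = x) = pi_k * f_k(x) / sum_l pi_l * f_l(x)
    weighted = [pi * gaussian_density(x, mu, s) for pi, mu, s in zip(priors, mus, sigmas)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

The Bayes classifier then assigns x to the index of the largest posterior.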
10
Why Discriminant Analysis?
[Classification] When classes are well-separated or n is small and the predictors X are approximately normal in each of the classes, linear discriminant analysis is more stable than logistic regression. Linear discriminant analysis also provides low-dimensional views of the data and is popular when there are more than two response classes.
11
Linear Discriminant Analysis when p = 1
[Classification] Assuming the class densities are Gaussian with a shared variance σ², plugging them into Bayes' formula and taking logs reduces the task to assigning x to the class with the largest discriminant score δk(x) = x·µk/σ² − µk²/(2σ²) + log πk, which is linear in x.
12
Estimating Parameters
[Classification] The parameters are estimated from the training data: π̂k = nk/n, µ̂k is the sample mean of the observations in class k, and σ̂² is the pooled sample variance across the K classes.
13
Linear Discriminant Analysis when p > 1
[Classification] The discriminant function is δk(x) = x^T Σ^(−1) µk − (1/2) µk^T Σ^(−1) µk + log πk.
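The rule can be sketched in dependency-free Python; to keep it self-contained, the inverse covariance matrix Sinv is passed in directly rather than computed, and every parameter value used below is hypothetical:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def lda_score(x, mu, Sinv, pi):
    # delta_k(x) = x^T Sinv mu_k - (1/2) mu_k^T Sinv mu_k + log pi_k
    Sinv_mu = matvec(Sinv, mu)
    return dot(x, Sinv_mu) - 0.5 * dot(mu, Sinv_mu) + math.log(pi)

def lda_classify(x, mus, Sinv, priors):
    # assign x to the class with the largest discriminant score
    scores = [lda_score(x, mu, Sinv, pi) for mu, pi in zip(mus, priors)]
    return scores.index(max(scores))
```

Because all classes share the same Σ, the quadratic term in x cancels and the decision boundaries between classes are linear.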
14
Fisher's Iris Data
[Classification] LDA classifies all but 3 of the 150 training samples correctly. With K classes, linear discriminant analysis can be viewed exactly in a K − 1 dimensional plot because it classifies to the closest centroid.
15
From δk(x) to Probabilities
[Classification] Estimates δ̂k(x) can be turned into estimates for class probabilities using P̂r(Y = k|X = x) = e^(δ̂k(x)) / ∑(l=1 to K) e^(δ̂l(x)).
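This conversion is a softmax over the discriminant scores; a small sketch (subtracting the maximum score first is a standard overflow guard that leaves the result unchanged):

```python
import math

def scores_to_probs(deltas):
    # softmax: Pr(Y = k | X = x) = e^(delta_k) / sum_l e^(delta_l)
    m = max(deltas)
    exps = [math.exp(d - m) for d in deltas]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

The probabilities sum to one, and the class with the largest score gets the largest probability, so classifying by score or by probability gives the same answer.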
16
LDA on Credit Data
[Classification] An example of LDA on credit data gives a 2.75% misclassification rate. However, the trivial rule of classifying every observation to the majority (prior) class would already achieve a 3.33% error rate, so the overall error rate alone can overstate how well LDA is doing.
17
Types of Errors
[Classification] False positive and false negative rates change in opposite directions as the classification threshold is varied. An ROC curve plots both simultaneously across all thresholds, and the AUC (area under the curve) summarizes overall performance in a single number.
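A dependency-free sketch of both ideas, using toy scores and labels: error rates at a given threshold, and AUC computed via its rank interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative):

```python
def rates_at_threshold(scores, labels, t):
    # predict positive when score >= t; report (true positive rate, false positive rate)
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def auc(scores, labels):
    # AUC as a rank statistic: fraction of positive/negative pairs ranked correctly,
    # with ties counting one half
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sweeping t through `rates_at_threshold` traces out the ROC curve; `auc` is the area under that curve.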
18
Other Forms of Discriminant Analysis
[Classification] Altering the forms assumed for fk(x) yields different classifiers: Gaussians with a class-specific covariance Σk give quadratic discriminant analysis, while assuming conditional independence of the features gives the naive Bayes classifier.
19
Quadratic Discriminant Analysis
[Classification] Because each class has its own covariance matrix Σk, the quadratic terms in x no longer cancel, so the decision boundaries are quadratic rather than linear.
20
Naive Bayes
[Classification] Naive Bayes assumes the features are independent within each class, and is especially useful when p is large.
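A sketch of a Gaussian naive Bayes classifier: within each class, the independence assumption lets the joint density factor into a product of one-dimensional Gaussians. All means, variances, and priors used below are hypothetical:

```python
import math

def gauss(x, mu, var):
    # one-dimensional normal density
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def nb_predict(x, priors, means, variances):
    # means[k][j], variances[k][j]: per-class, per-feature Gaussian parameters
    best, best_score = None, -math.inf
    for k, pi in enumerate(priors):
        # log posterior up to a constant: log pi_k + sum_j log f_kj(x_j)
        score = math.log(pi) + sum(math.log(gauss(xj, m, v))
                                   for xj, m, v in zip(x, means[k], variances[k]))
        if score > best_score:
            best, best_score = k, score
    return best
```

Working in log space turns the product of p densities into a sum, which is why the method scales gracefully as p grows.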
21
Logistic Regression versus LDA
[Classification] For a two-class problem, LDA produces a log odds that, like logistic regression, is linear in x. The difference lies in how the parameters are estimated: logistic regression maximizes the conditional likelihood based on Pr(Y|X), while LDA maximizes the full likelihood based on Pr(X, Y).
22
Summary
[Classification] Logistic regression is popular for classification, especially when K = 2. LDA is useful when n is small, or the classes are well separated and Gaussian assumptions are reasonable, and when K > 2. Naive Bayes is useful when p is very large.