classification problems in marketing
• Who are the target segments?
• Who are the profitable consumers?
• Will this person like this movie (movie recommendation, e.g., Netflix)?
• Others: Is this email spam? Which potential customers will open a bank account?
• Commonly used models: Linear Discriminant Analysis, Naïve Bayes, Decision Trees, Random Forest, Neural Networks, Support Vector Machines, etc.
machine learning
explores the construction of algorithms that can learn from and make predictions on data
i.e., how the response variable varies with the values of the given predictors
supervised learning
machine learning task of inferring a function from LABELED training data
classification: inputs are divided into two or more categories, and the learner assigns unseen inputs to a category using a model. This is typically tackled in a supervised way. Example: spam filtering, where the inputs are emails and the categories are "spam" and "not spam".
supervised learning classification methods
Decision Trees, Ensembles (Bagging, Boosting, Random Forest), Logistic Regression, Support Vector Machine.
unsupervised learning
NO LABELS are given to the learning algorithm, leaving it on its own to find structure in its input, i.e., discovering hidden patterns (latent structure) in the data (wiki).
clustering
clustering
unsupervised learning
a set of inputs is to be divided into groups (or segments). Unlike classification, the groups are not known beforehand (i.e., no label information), making this typically an unsupervised task: searching for hidden structure.
Hierarchical Clustering, K-means method, Model-based Clustering, etc.
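A minimal R sketch of two of these methods on made-up unlabeled data (the data and the choice k = 3 are purely illustrative):

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)   # hypothetical unlabeled inputs

km <- kmeans(X, centers = 3)        # K-means: partition into 3 segments (k chosen in advance)
km$cluster                          # segment assignment for each row

hc <- hclust(dist(X))               # hierarchical clustering on Euclidean distances
cutree(hc, k = 3)                   # cut the dendrogram into 3 groups
```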
Classification and Regression Tree (CART) Advantages
a single decision tree
• Computationally simple and quick to fit, even for large problems.
• Automatic variable selection.
• Very easy to interpret (if the tree is small).
• Tree picture provides valuable insights (intuitive).
• Terminal nodes suggest a clustering of the data.
Classification and Regression Tree (CART) Disadvantages
• Accuracy – relatively lower than ensemble tree methods (e.g., Random Forest).
• Instability – if the data change a little, the tree picture can change a lot (especially if the first split changes).
• Thus, in practice, boosting or ensemble models are used more often.
CART
There is a target Y (the DV to be classified) and related predictors X used as classifiers.
• We denote the feature space by X.
• Tree structured classifiers are constructed by repeated splits of the space X into smaller and smaller subsets, beginning with X itself.
• Definitions: parent node, child node, terminal (leaf) node – see next slides.
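A minimal rpart sketch of this repeated-splitting construction (the data frame df, target Y, and predictors are hypothetical placeholders):

```r
library(rpart)

# Y is the target to classify; the other columns of df form the feature space X
fit <- rpart(Y ~ ., data = df, method = "class")

plot(fit)
text(fit)   # tree picture: parent/child splits down to the terminal (leaf) nodes
```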
impurity
Impurity measures how mixed a node is.
Pure node → all observations in one class
Impure node → mixed classes
Goal: maximize purity after each split

entropy
common way to measure impurity
Key cases:
Entropy = 0 → perfectly pure
Entropy = 1 → 50/50 split (max impurity for two classes)
Used to evaluate how good a split is
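A small R sketch of the entropy computation and the two key cases above (the helper function is illustrative):

```r
# entropy of a node: -sum(p_i * log2(p_i)) over the class proportions p_i
entropy <- function(p) {
  p <- p[p > 0]            # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

entropy(c(1, 0))      # 0 -> perfectly pure node
entropy(c(0.5, 0.5))  # 1 -> 50/50 split, maximum impurity (two classes)
```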

information gain
used to determine which attribute is most useful for discriminating between the classes to be learned
tells us how important a given attribute of the feature vectors is
information gain formula
entropy(parent) - [weighted average entropy of the children]
![entropy(parent) - weighted average entropy of the children](https://knowt-user-attachments.s3.amazonaws.com/1fa74742-3b31-4570-9450-11f08475baf4.png)
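A worked sketch in R with made-up counts: a 5/5 parent split into a 5/1 child and a 0/4 child (all numbers hypothetical):

```r
entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

parent <- entropy(c(5, 5) / 10)   # 1 (50/50 parent)
child1 <- entropy(c(5, 1) / 6)    # ~0.650 (6 observations)
child2 <- entropy(c(0, 4) / 4)    # 0 (4 observations, pure)

# average entropy of the children, weighted by child size
gain <- parent - (6/10 * child1 + 4/10 * child2)
gain                              # ~0.61 -> a fairly informative split
```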
CART selects what variables?
CART automatically selects variables that:
Reduce impurity the most (highest information gain)
Improve classification accuracy
ex: duration, previous
prediction hit ratio in CART
Look at the root node error.
ex: root node error = 3488/30891 = 0.11291, i.e., error rate = 0.11291
HIT RATIO (accuracy) = 1 - error rate = 1 - 0.11291 = 0.88709
→ 88.7% prediction hit ratio
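A hedged rpart sketch of reading the hit ratio this way (the data frame bank and target deposit are hypothetical names; the numbers are the ones from the example above):

```r
library(rpart)

fit <- rpart(deposit ~ ., data = bank, method = "class")   # deposit: binary factor target
printcp(fit)           # output includes a line like "Root node error: 3488/30891 = 0.11291"

err <- 3488 / 30891    # root node error read off the printcp output
1 - err                # 0.88709 -> 88.7% prediction hit ratio (accuracy)
```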

best attribute CART
• The one which will result in the smallest tree
• Heuristic: choose the attribute that produces the “purest” nodes
linear discriminant analysis
[a classic method] As a possible example, a predictor (X1) such as a "Business Application" attribute might be able to classify students between the "Business Analytics" group and the "Statistics/Engineering" group
LDA goal
to find the discriminant function Z (e.g., a linear combination of the predictors) that leads to an optimal division of the groups
LDA model assumptions
• Predictor variables are normally distributed (i.e., Multivariate Normal Distribution) – if not, we can consider other methods (e.g., logistic regression, tree models).
• That is, predictors for LDA should be continuous variables; usually people transform/standardize the data (e.g., the scale function in R).
LDA function
derives the linear combination of 2 (or more) independent variables that will discriminate between discrete groups
The linear combination (a.k.a. discriminant function or axis) takes the following form:
Z = b1X1 + b2X2 + ... + bkXk

linear combination formula
The discriminant weights (bi) are chosen to maximize the ratio of the between-group variance to the within-group variance of Z.

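A minimal MASS::lda sketch (the data frame students, its group label, and predictors X1, X2 are hypothetical):

```r
library(MASS)

fit <- lda(group ~ X1 + X2, data = students)
fit$scaling              # the discriminant weights b_i of Z = b1*X1 + b2*X2

pred <- predict(fit)
head(pred$class)         # predicted group membership for each student
```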
quadratic discriminant analysis
provides a non-linear quadratic decision boundary. When the decision boundary is moderately non-linear, QDA may give better results.
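A short variation on the LDA sketch above, again with hypothetical names (MASS::qda fits the quadratic boundary):

```r
library(MASS)

qfit <- qda(group ~ X1 + X2, data = students)   # quadratic decision boundary
predict(qfit)$class                             # predicted groups
```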