L4 - Heterogeneity and Probability Models

Last updated 4:52 PM on 12/6/25

87 Terms

New cards

This week is about heterogeneity. Give three ways in which people can differ.

New cards

What is the result of heterogeneity for brands and strategies?

New cards

services

New cards

We have two types of heterogeneity, namely …

observed and unobserved

New cards

Give two examples of observed heterogeneity.

Household size

Income

New cards

Give examples of unobserved heterogeneity.

heavy vs. light user
Coca Cola vs. Pepsi
price sensitive vs. not price sensitive

New cards

What has changed over the years leading to a new approach?

We now have detailed data on customer (purchase) behavior

New cards

What is the goal of basic clustering?

New cards

What are these observations?

New cards

There are two broad classes of algorithms for basic clustering. What are those?

New cards

What is the best known example of a non-parametric method for basic clustering?

k-means

New cards

What are the advantages of the k-means algorithm for basic clustering?

Simple, no distributional assumptions needed
Relatively fast
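
As a rough illustration of why k-means is simple and assumption-free, here is a minimal sketch of the usual assign-then-update loop (Lloyd's algorithm) on scalar data. The function name `kmeans_1d`, the made-up heights and the starting centroids are assumptions for illustration only, not taken from the course.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalar data; `centers` are the starting centroids."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        labels = [min(range(len(centers)), key=lambda k: (v - centers[k]) ** 2)
                  for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:  # no centroid moved, so the assignment is stable
            break
        centers = new_centers
    return centers, labels


heights = [158.0, 160.0, 162.0, 178.0, 180.0, 182.0]  # made-up data
centers, labels = kmeans_1d(heights, centers=[150.0, 190.0])
```

Note that no distribution is specified anywhere: only distances to centroids are used.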

New cards

What are the disadvantages of the k-means algorithm for basic clustering?

New cards

Parametric methods alleviate some of the disadvantages (of course they have their own...). Which parametric method for clustering do we discuss in this course?

Mixture models

New cards

Consider this mixture model. What is the probability that a person is male given his height?
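
The model on the card is not reproduced here, so the following is only a sketch under assumptions: a two-component normal mixture (male, female) with hypothetical parameter values. Bayes' rule then gives the membership probability as the male component's weighted density divided by the total mixture density.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def prob_male_given_height(height, pi_male, mu_m, sd_m, mu_f, sd_f):
    """P(male | height) by Bayes' rule in a two-component normal mixture."""
    male = pi_male * normal_pdf(height, mu_m, sd_m)
    female = (1.0 - pi_male) * normal_pdf(height, mu_f, sd_f)
    return male / (male + female)


# Hypothetical parameters: equal shares, means 181 and 167 cm, sd 7 cm for both.
p_mid = prob_male_given_height(174.0, 0.5, 181.0, 7.0, 167.0, 7.0)
p_tall = prob_male_given_height(190.0, 0.5, 181.0, 7.0, 167.0, 7.0)
```

With these symmetric values a height exactly halfway between the two means is uninformative, while a tall person is assigned to the male component with high probability.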

New cards

How do we estimate the mixture model?

EM (Expectation Maximization) algorithm. Direct maximum likelihood (ML) is not used because the likelihood function is difficult to maximize.
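
To see where the difficulty comes from, the log-likelihood of a normal mixture can be written as below. This is the standard textbook form, assuming S normal components and N independent observations; the symbols N, S and φ (the normal density) are notation chosen here, not taken from the cards.

```latex
\ell(\theta) = \sum_{i=1}^{N} \log \left( \sum_{s=1}^{S} \pi_s \, \phi\!\left(y_i ; \mu_s, \sigma_s^2\right) \right)
```

The logarithm of a sum does not separate into per-state terms, so there are no closed-form maximizers.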

New cards

What is the likelihood function here?

New cards

What is the idea of the EM Algorithm? What does this mean for the likelihood function?

New cards

Give the log complete data likelihood.

New cards

The EM algorithm consists of two steps. What is the first step?

New cards

The EM algorithm consists of two steps. What is the second step?

New cards

So what is the idea of the EM algorithm and what is the algorithm?

The idea of the EM algorithm is to proceed as if the states were known.

New cards

What is the expected complete data likelihood for the general case of k states?

New cards

What is this thing?

New cards

What is smart to do next?

Divide it into two parts. This makes the M-step easier: each part can be maximized separately (if parameters do not appear in multiple segments), and no products appear anymore.

New cards

Which parameters are in this theta?

π_s, μ_s, σ_s²

New cards

Which result does maximizing the first half give us?

New cards

Which result does maximizing the second half give us?

New cards

So the EM Algorithm is only two steps?

No. After the M-step, use the updated parameters to do the E-step again. Iterate until convergence.
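
The iteration can be sketched as follows for a mixture of univariate normals. This is a minimal illustration rather than the course's implementation: the function name `em_normal_mixture`, the simulated data and the starting values are all assumptions.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_normal_mixture(data, pi, mu, var, max_iter=500, tol=1e-10):
    """EM for a mixture of univariate normals; pi, mu, var are starting values."""
    pi, mu, var = list(pi), list(mu), list(var)
    n, k = len(data), len(pi)
    old_loglik = -math.inf
    loglik = old_loglik
    for _ in range(max_iter):
        # E-step: posterior state probabilities given the current parameters.
        post = []
        loglik = 0.0
        for y in data:
            joint = [pi[s] * normal_pdf(y, mu[s], var[s]) for s in range(k)]
            total = sum(joint)
            loglik += math.log(total)  # log-likelihood at the current parameters
            post.append([j / total for j in joint])
        # M-step: probability-weighted share, mean and variance per state.
        for s in range(k):
            weight = sum(p[s] for p in post)
            pi[s] = weight / n
            mu[s] = sum(p[s] * y for p, y in zip(post, data)) / weight
            var[s] = sum(p[s] * (y - mu[s]) ** 2 for p, y in zip(post, data)) / weight
        # Stop once the log-likelihood no longer improves.
        if loglik - old_loglik < tol:
            break
        old_loglik = loglik
    return pi, mu, var, loglik


# Simulated data from two hypothetical groups.
random.seed(0)
data = ([random.gauss(160.0, 5.0) for _ in range(200)]
        + [random.gauss(185.0, 5.0) for _ in range(200)])
pi_hat, mu_hat, var_hat, loglik = em_normal_mixture(
    data, pi=[0.5, 0.5], mu=[150.0, 190.0], var=[100.0, 100.0])
```

The returned log-likelihood is the value computed in the last E-step, that is, before the final parameter update.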

New cards

What can you say about convergence of EM?

EM usually converges quickly to the neighborhood of a maximum of the likelihood function, but final convergence can be slow

New cards

Why should we use multiple starting values when using EM?

To avoid the risk of finding only a local maximum

New cards

What is the easiest way to obtain standard errors when using the EM algorithm?

Use the second-order derivatives of the (standard) likelihood function.
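
A one-parameter sketch of the idea, assuming the usual practice of working with the log-likelihood: take a numerical second derivative at the estimate and use the square root of the inverse of its negative as the standard error. The example uses the mean of a normal sample with known standard deviation rather than a mixture, and all names and data are illustrative; with several parameters the same idea requires the full matrix of second derivatives.

```python
import math


def loglik_mean(mu, data, sigma):
    """Log-likelihood of a normal sample with known sigma, as a function of mu."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - 0.5 * ((y - mu) / sigma) ** 2 for y in data)


def numeric_standard_error(loglik, estimate, step=1e-3):
    """Standard error from a central-difference second derivative of loglik."""
    second = (loglik(estimate + step) - 2.0 * loglik(estimate)
              + loglik(estimate - step)) / step ** 2
    return math.sqrt(-1.0 / second)


data = [4.0, 5.0, 6.0, 7.0]  # made-up sample
sigma = 2.0                  # assumed known
mu_hat = sum(data) / len(data)
se = numeric_standard_error(lambda m: loglik_mean(m, data, sigma), mu_hat)
```

Here the result can be checked against the analytic value sigma / sqrt(n).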

New cards

We can extend the EM algorithm to multivariate normal distributions. What changes in that case?

New cards

What is cool about multivariate mixture of normals in terms of prediction?

We can predict y₁ given y₂
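
A sketch of such a prediction for a bivariate mixture, using standard properties of the normal distribution: the state probabilities are updated using the marginal density of y₂, and within each state the conditional mean of y₁ is linear in y₂. The dictionary layout, the key names and the parameter values are hypothetical.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predict_y1_given_y2(y2, states):
    """E[y1 | y2] in a mixture of bivariate normals.

    Each state is a dict with keys (hypothetical layout):
    pi (share), m1 and m2 (means), v2 (variance of y2), c12 (covariance).
    """
    # Posterior state probabilities given y2 only (marginal density of y2).
    weights = [s["pi"] * normal_pdf(y2, s["m2"], s["v2"]) for s in states]
    total = sum(weights)
    prediction = 0.0
    for w, s in zip(weights, states):
        # Conditional mean of y1 given y2 within this state.
        cond_mean = s["m1"] + s["c12"] / s["v2"] * (y2 - s["m2"])
        prediction += (w / total) * cond_mean
    return prediction


# Hypothetical two-state example.
states = [
    {"pi": 0.4, "m1": 10.0, "m2": 0.0, "v2": 1.0, "c12": 0.5},
    {"pi": 0.6, "m1": 20.0, "m2": 5.0, "v2": 1.0, "c12": 0.5},
]
prediction = predict_y1_given_y2(4.5, states)
```

Because y₂ = 4.5 is far more likely under the second state, the prediction lies close to that state's conditional mean.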

New cards

What is the first thing we do to predict y₁ given y₂?

New cards

After splitting everything in two, what is the next thing we do to predict y₁ given y₂?

We construct

New cards

How do we rewrite this thing in order to predict y₁ given y₂?

New cards

Okay but what is now the first step to turn this into an actual prediction?

New cards

What can we substitute for this?

New cards

What can we substitute for this?

New cards

Now give the complete rewriting yourself

New cards

What are the advantages of clustering using mixtures?

New cards

What are the disadvantages of clustering using mixtures?

New cards

What are the challenges we face when clustering using mixtures?

New cards

Name some ways in which we can generalize the mixture model. In what context is this especially powerful?

New cards

What is this model called?

Latent class model

New cards

Can people switch segments? And according to which process?

No. People stay in the same segment over time.

New cards

What do you notice when looking at the parameters?

Not all parameters are (or have to be) segment specific.

New cards

How do we estimate a latent class model?

New cards

Why can we no longer split the maximization over the segments?

Due to γ, a parameter that is not segment specific.

New cards

Explain what we mean by learning in this course.

New cards

If no observed decisions are available something weird happens in this model. What is it and how could we solve it?

New cards

How?

New cards

Can we still learn by observing behaviour?

New cards

So how many different values of the parameters does this allow?

New cards

How can we generalize this to a continuous distribution?

New cards

What is a well-known application of this idea?

Mixed logit

New cards

Explain the mixed logit model in detail.

New cards

What are advantages of the mixed logit model?

New cards

Express this in terms of the distribution of the data conditional on theta and the distribution of theta.

New cards

Can we give this expectation now?

This expression only contains known (estimated) density functions. However calculating this is not easy (neither is estimation).
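
One common way around the difficult integral is simulation. The sketch below assumes the expectation in question is the mean of θ given the data, and it uses a deliberately simple toy setup that is not from the cards: θ ~ N(0, 1) and one observation y | θ ~ N(θ, 1). It draws θ from its distribution and weights each draw by the density of the data conditional on θ.

```python
import math
import random


def posterior_mean_theta(y, n_draws=20000, seed=0):
    """Simulated E[theta | y] for the toy model theta ~ N(0, 1), y | theta ~ N(theta, 1)."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)              # draw from the distribution of theta
        lik = math.exp(-0.5 * (y - theta) ** 2)  # density of y given theta, up to a constant
        numerator += theta * lik
        denominator += lik
    # Ratio of two simulated integrals; the normalizing constant cancels.
    return numerator / denominator


estimate = posterior_mean_theta(1.0)
```

In this toy model the exact answer is y / 2, which gives a check on the simulation; in a mixed logit no such closed form is available, which is why simulation is used.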

New cards

Should we use continuous or discrete heterogeneity if we are interested in segmentation?

Discrete

New cards

Should we use continuous or discrete heterogeneity if we are interested in forecasting?

The forecasting performance of the two is comparable.

New cards

Should we use discrete or continuous heterogeneity if we do not want to assume a particular distribution?

If you do not want to assume a particular distribution, use the latent class approach with many classes (the mixture approximates the true distribution as the number of classes increases).

New cards

A combination of discrete and continuous heterogeneity is also possible. Give an example of when it would make sense to use this.