L4 - Heterogeneity and Probability Models

87 Terms

1

This week is about heterogeneity. Give three ways in which people can differ.

knowt flashcard image
2

What is the result of heterogeneity for brands and strategies?

knowt flashcard image
3
term image

services

4

We have two types of heterogeneity, namely …

observed and unobserved

5

Give two examples of observed heterogeneity.

Household size

Income

6

Give examples of unobserved heterogeneity.

  • heavy vs. light user

  • Coca Cola vs. Pepsi

  • price sensitive vs. not price sensitive

7

What has changed over the years leading to a new approach?

We now have detailed data on customer (purchase) behavior

8
term image
knowt flashcard image
9

What is the goal of basic clustering?

knowt flashcard image
10

What are these observations?

knowt flashcard image
11

There are two broad classes of algorithms for basic clustering. What are those?

knowt flashcard image
12

What is the best known example of a non-parametric method for basic clustering?

k-means

13

What are the advantages of the k-means algorithm for basic clustering?

  • Simple, no distributional assumptions needed

  • Relatively fast
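
The k-means idea can be sketched in a few lines. The following is a minimal illustration of the usual assign-and-update loop (Lloyd's algorithm) on one-dimensional data, not the course's implementation; the function name `kmeans_1d` and the `spend` values are made up for the example.

```python
import random

def kmeans_1d(values, k, n_iter=100, seed=0):
    """Illustrative k-means for one-dimensional data."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # start at k randomly chosen observations
    for _ in range(n_iter):
        # Assignment step: each observation goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so we are done
            break
        centers = new_centers
    return sorted(centers)

# Hypothetical spending data with a light-user and a heavy-user group.
spend = [1.0, 1.2, 0.8, 1.1, 9.8, 10.1, 10.3, 9.9]
centers = kmeans_1d(spend, k=2)
```

No distribution is assumed anywhere in the loop, which is the "simple" advantage named above; the price is that each observation gets a hard assignment rather than a probability.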

14

What are the disadvantages of the k-means algorithm for basic clustering?

knowt flashcard image
15

Parametric methods alleviate some of the disadvantages (of course they have their own...). Which parametric method for clustering do we discuss in this course?

Mixture models

16

Consider this mixture model. What is the probability that a person is male given his height?

knowt flashcard image
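
The answer on this card is an image, but for a two-component normal mixture the posterior probability follows from Bayes' rule: weight each component density by its prior share and normalize. A minimal sketch, assuming normal height distributions per gender; all parameter names and numbers below are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_male_given_height(height, pi_male, mu_m, sigma_m, mu_f, sigma_f):
    """P(male | height) = pi_m * f_m(h) / (pi_m * f_m(h) + pi_f * f_f(h))."""
    male_part = pi_male * normal_pdf(height, mu_m, sigma_m)
    female_part = (1 - pi_male) * normal_pdf(height, mu_f, sigma_f)
    return male_part / (male_part + female_part)

# Hypothetical parameters: equal shares, means 180 and 166 cm, same spread.
p_mid = prob_male_given_height(173, 0.5, 180, 7, 166, 7)
p_tall = prob_male_given_height(190, 0.5, 180, 7, 166, 7)
```

With equal shares and equal spreads, a height exactly halfway between the two means is uninformative, and taller heights shift the probability toward the male component.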
17

How do we estimate the mixture model?

With the EM (Expectation-Maximization) algorithm rather than direct maximum likelihood (ML), because the likelihood function is difficult to maximize directly.

18

What is the likelihood function here?

knowt flashcard image
19

What is the idea of the EM Algorithm? What does this mean for the likelihood function?

knowt flashcard image
20

Give the log complete data likelihood.

knowt flashcard image
21

The EM algorithm consists of two steps. What is the first step?

knowt flashcard image
22

The EM algorithm consists of two steps. What is the second step?

knowt flashcard image
23

So what is the idea of the EM algorithm and what is the algorithm?

The idea of the EM algorithm is to act as if the states are known.

24

What is the expected complete data likelihood for the general case of k states?

knowt flashcard image
25

What is this thing?

knowt flashcard image
26

What is smart to do next?

Divide it into two parts because this makes the M-step easier as each part can be considered separately (if parameters do not appear in multiple segments) and no products appear anymore.

27

Which parameters are in this theta?

π_s, μ_s, and σ_s² for each segment s

28

Which result does maximizing the first half give us?

knowt flashcard image
29

Which result does maximizing the second half give us?

knowt flashcard image
30

So the EM Algorithm is only two steps?

No. After the M-step, use the updated parameters to do the E-step again. Iterate until convergence.
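
The E-step/M-step loop can be sketched for a mixture of univariate normals as follows. This is a minimal illustration under the assumption of segment-specific shares, means, and variances; the function name `em_mixture`, the stopping rule, and the simulated data are choices made for this example, not taken from the course.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_mixture(y, pi, mu, var, n_iter=500, tol=1e-10):
    """EM for a k-component normal mixture; pi, mu, var are starting values."""
    pi, mu, var = list(pi), list(mu), list(var)
    k, prev_ll = len(pi), -math.inf
    for _ in range(n_iter):
        # E-step: posterior segment probabilities under the current parameters.
        weights, ll = [], 0.0
        for yi in y:
            joint = [pi[s] * normal_pdf(yi, mu[s], var[s]) for s in range(k)]
            total = sum(joint)
            ll += math.log(total)
            weights.append([j / total for j in joint])
        # Stop once the log-likelihood no longer improves.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: weighted shares, means, and variances per segment.
        for s in range(k):
            n_s = sum(w[s] for w in weights)
            pi[s] = n_s / len(y)
            mu[s] = sum(w[s] * yi for w, yi in zip(weights, y)) / n_s
            var[s] = sum(w[s] * (yi - mu[s]) ** 2 for w, yi in zip(weights, y)) / n_s
    return pi, mu, var, ll  # ll is the log-likelihood from the last E-step

# Simulated data: two well-separated segments of 300 observations each.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(300)]
y += [rng.gauss(10.0, 1.0) for _ in range(300)]
pi_hat, mu_hat, var_hat, loglik = em_mixture(
    y, pi=[0.5, 0.5], mu=[1.0, 8.0], var=[4.0, 4.0]
)
```

In practice one would rerun the function from several starting values and keep the run with the highest log-likelihood, since EM only guarantees a local maximum.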

31

What can you say about convergence of EM?

EM usually converges quickly to the neighborhood of a maximum of the likelihood function, but final convergence can be slow

32

Why should we use multiple starting values when using EM?

To avoid the risk of finding only a local maximum

33

What is the easiest way to obtain standard errors when using the EM algorithm?

Use the second-order derivatives of the (standard) log-likelihood function, evaluated at the final estimates.
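
To show the mechanics, here is a minimal sketch that computes a standard error from a finite-difference second derivative of a log-likelihood. For brevity it uses a one-parameter exponential model instead of a mixture (an assumption made for this example only); with several parameters one would invert the full matrix of second derivatives.

```python
import math

def log_likelihood(lam, y):
    """Log-likelihood of i.i.d. exponential data with rate lam (toy model)."""
    return len(y) * math.log(lam) - lam * sum(y)

def standard_error(loglik, theta_hat, h=1e-4):
    """SE = sqrt(-1 / second derivative of the log-likelihood at theta_hat)."""
    second = (
        loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)
    ) / h ** 2
    return math.sqrt(-1.0 / second)

y = [0.5, 1.5, 1.0, 2.0, 1.0]     # hypothetical durations
lam_hat = len(y) / sum(y)         # closed-form ML estimate of the rate
se = standard_error(lambda lam: log_likelihood(lam, y), lam_hat)
```

For this toy model the analytical answer is lam_hat / sqrt(n), which the numerical result should match closely.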

34

We can extend the EM algorithm to multivariate normal distributions. What changes in that case?

knowt flashcard image
35

What is cool about multivariate mixture of normals in terms of prediction?

We can predict y1 given y2
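
A minimal sketch of such a prediction for a bivariate mixture of normals, using the standard result that the prediction is a weighted average of the within-segment conditional means, with weights equal to the segment probabilities given y2. The dictionary keys and all numbers are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def predict_y1_given_y2(y2, segments):
    """E[y1 | y2] for a bivariate normal mixture.

    Each segment is a dict with keys pi, mu1, mu2, var2, cov12
    (names chosen for this example).
    """
    # Segment probabilities given y2 use only the marginal density of y2.
    weights = [s["pi"] * normal_pdf(y2, s["mu2"], s["var2"]) for s in segments]
    total = sum(weights)
    prediction = 0.0
    for w, s in zip(weights, segments):
        # Conditional mean of y1 given y2 within one segment.
        cond_mean = s["mu1"] + s["cov12"] / s["var2"] * (y2 - s["mu2"])
        prediction += (w / total) * cond_mean
    return prediction

one_segment = [{"pi": 1.0, "mu1": 2.0, "mu2": 3.0, "var2": 4.0, "cov12": 2.0}]
pred_single = predict_y1_given_y2(5.0, one_segment)

two_segments = [
    {"pi": 0.5, "mu1": 0.0, "mu2": -1.0, "var2": 1.0, "cov12": 0.0},
    {"pi": 0.5, "mu1": 4.0, "mu2": 1.0, "var2": 1.0, "cov12": 0.0},
]
pred_mixed = predict_y1_given_y2(0.0, two_segments)
```

Even when y1 and y2 are uncorrelated within each segment, as in the second example, y2 still helps predict y1 because it is informative about segment membership.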

36

What is the first thing we do to predict y1 given y2?

knowt flashcard image
37

After splitting everything in two, what is the next thing we do to predict y1 given y2?

We construct

38

How do we rewrite this thing in order to predict y1 given y2?

knowt flashcard image
39
term image
knowt flashcard image
40
term image
knowt flashcard image
41

Okay but what is now the first step to turn this into an actual prediction?

knowt flashcard image
42

What can we substitute for this?

knowt flashcard image
43

What can we substitute for this?

knowt flashcard image
44

Now give the complete rewriting yourself

knowt flashcard image
45

What are the advantages of clustering using mixtures?

knowt flashcard image
46

What are the disadvantages of clustering using mixtures?

knowt flashcard image
47

What are the challenges we face when clustering using mixtures?

knowt flashcard image
48

Name some ways in which we can generalize the mixture model. In what context is this especially powerful?

knowt flashcard image
49

What is this model called?

Latent class model

50

Can people switch segments? And according to which process?

No. People stay in the same segment over time.

51

What do you notice when looking at the parameters?

Not all parameters are (or have to be) segment specific.

52

How do we estimate a latent class model?

knowt flashcard image
53

Why can we no longer split the maximization over the segments?

Due to γ, a parameter that is not segment-specific.

54

Explain what we mean by learning in this course.

knowt flashcard image
55

If no observed decisions are available something weird happens in this model. What is it and how could we solve it?

knowt flashcard image
56

How?

knowt flashcard image
57

Can we still learn by observing behaviour?

knowt flashcard image
58

So how many different values of the parameters does this allow?

knowt flashcard image
59

How can we generalize this to a continuous distribution?

knowt flashcard image
60

What is a well-known application of this idea?

Mixed logit
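
In a mixed logit, the choice probability is a logit probability averaged over a continuous distribution of the coefficients. That integral has no closed form, so a common approach is to approximate it by simulation. A minimal sketch for a binary choice with one normally distributed coefficient; the function names, the normal assumption, and the numbers are choices made for this example.

```python
import math
import random

def logit_prob(beta, x):
    """Binary logit probability for a fixed coefficient beta."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def simulated_mixed_logit_prob(x, b, s, n_draws=5000, seed=0):
    """Average the logit probability over draws beta ~ N(b, s^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = rng.gauss(b, s)  # one individual's coefficient
        total += logit_prob(beta, x)
    return total / n_draws

p_plain = logit_prob(1.0, 1.0)                               # no heterogeneity
p_mixed = simulated_mixed_logit_prob(x=1.0, b=1.0, s=2.0)    # heterogeneous
```

Setting `s` to zero recovers the ordinary logit, so the standard model is a special case of this sketch.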

61

Explain the mixed logit model in detail.

knowt flashcard image
62

What are advantages of the mixed logit model?

knowt flashcard image
63

Express this in terms of the distribution of the data conditional on theta and the distribution of theta.

knowt flashcard image
64

Can we give this expectation now?

This expression contains only known (estimated) density functions. However, calculating it is not easy (and neither is estimation).

65

Should we use continuous or discrete heterogeneity if we are interested in segmentation?

Discrete

66

Should we use continuous or discrete heterogeneity if we are interested in forecasting?

The forecasting performance of the two is comparable.

67

Should we use discrete or continuous heterogeneity if we do not want to assume a particular distribution?

If you do not want to assume a particular distribution, use the latent class approach with many classes (the mixture approximates the true distribution as the number of classes increases).

68

A combination of discrete and continuous heterogeneity is also possible. Give an example of when it would make sense to use this.

knowt flashcard image
69

Give the mathematical specification for the segments and parameters. Also tell which estimation method is used.

knowt flashcard image
70

Some models completely rely on heterogeneity. What are those models called?

Probability models

71

Fill in the gaps

knowt flashcard image
72

Give three examples of probability models.

knowt flashcard image
73

What are the two questions that we want to answer?

knowt flashcard image
74

What do these symbols denote?

knowt flashcard image
75

How is the (unobserved) defection time denoted in the BTYD model and what is its distribution?

knowt flashcard image
76

What are the expected value and variance of the (unobserved) defection time?

knowt flashcard image
77

How are purchases while alive distributed in the BTYD model?

knowt flashcard image
78
term image
knowt flashcard image
79

What can we say about these?

knowt flashcard image
80

So what is the distribution of the number of purchases given alive?

Negative Binomial Distribution
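
The negative binomial result can be checked by simulation. The sketch below assumes the usual Pareto/NBD setup (the cards giving the individual-level assumptions are images): each customer buys according to a Poisson process with rate lambda while alive, and lambda varies across customers following a gamma distribution with shape `r` and rate `alpha`. All names and numbers are hypothetical.

```python
import random

def simulate_purchases(r, alpha, t, n_customers=20000, seed=0):
    """Purchase counts in (0, t] for customers with gamma-distributed rates."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        lam = rng.gammavariate(r, 1.0 / alpha)  # shape r, scale 1/alpha
        # Poisson process: count exponential inter-purchase times up to t.
        n, clock = 0, rng.expovariate(lam)
        while clock <= t:
            n += 1
            clock += rng.expovariate(lam)
        counts.append(n)
    return counts

def nbd_prob_zero(r, alpha, t):
    """P(no purchases in (0, t]) under the negative binomial distribution."""
    return (alpha / (alpha + t)) ** r

counts = simulate_purchases(r=2.0, alpha=4.0, t=2.0)
share_zero = counts.count(0) / len(counts)
mean_count = sum(counts) / len(counts)
```

The simulated share of zero-purchase customers should be close to `nbd_prob_zero(2.0, 4.0, 2.0)`, and the average count close to the NBD mean r * t / alpha.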

81

So what is the distribution of time until defection?

Pareto distribution
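
A short derivation of why the Pareto distribution appears, assuming the usual Pareto/NBD setup: an exponential defection time with individual rate μ, and gamma heterogeneity in μ with shape s and rate β. The symbols τ, μ, s, and β are assumed here, since the cards defining the notation are images.

```latex
P(\tau > t \mid s, \beta)
  = \int_0^\infty e^{-\mu t}\,
    \frac{\beta^{s}\,\mu^{s-1} e^{-\beta\mu}}{\Gamma(s)}\, d\mu
  = \frac{\beta^{s}}{\Gamma(s)} \cdot \frac{\Gamma(s)}{(\beta + t)^{s}}
  = \left( \frac{\beta}{\beta + t} \right)^{s}
```

This survival function is that of a Pareto distribution of the second kind, which gives the model the first half of its name.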

82

How can we do inference in the Pareto-NBD?

knowt flashcard image
83

What if we ignore unobserved heterogeneity?

knowt flashcard image
84

In a choice model: ignoring heterogeneity leads to overestimation of …

state dependence
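
A small calculation shows where the overestimation comes from. The sketch assumes two hypothetical segments with different but constant purchase probabilities, so there is no state dependence at all within a segment; pooling the segments still makes last period's purchase look predictive.

```python
def pooled_repeat_probabilities(segments):
    """Pooled P(buy now | bought before) and P(buy now | did not buy before).

    segments: list of (share, purchase_probability) pairs. Within a segment,
    purchases are independent over time (no true state dependence).
    """
    p_buy = sum(share * p for share, p in segments)
    p_buy_then_buy = sum(share * p * p for share, p in segments)
    p_skip_then_buy = sum(share * (1 - p) * p for share, p in segments)
    return p_buy_then_buy / p_buy, p_skip_then_buy / (1 - p_buy)

# Hypothetical heavy users (80% purchase probability) and light users (20%).
p_after_buy, p_after_no_buy = pooled_repeat_probabilities(
    [(0.5, 0.8), (0.5, 0.2)]
)
```

The gap between the two pooled probabilities is produced entirely by heterogeneity: a previous purchase signals that the customer is probably a heavy user. A model without heterogeneity would attribute this gap to state dependence.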

85

What is state dependence, and how does it differ from loyalty?

86

Loyalty and state dependence give very similar choice patterns; ignoring one leads to …

overestimation of the other.

87
term image
knowt flashcard image