Quizzes from Categorical Data Analysis


21 Terms

1

For a dataset with variables relating to heart disease, you have a saturated model with a maximized log-likelihood of 187 and df = 10. A reduced model, in which the 3 dummy variables that categorized each individual's blood pressure level (blood pressure has 4 levels) were removed, has a maximized log-likelihood of 182. Using an alpha level of 0.05, what can we conclude about blood pressure levels?

  • The variable blood pressure levels improves the model.

  • At least one of the betas for the dummy variables related to blood pressure levels is likely not equal to zero.

2

For a dataset with variables related to heart disease, you have a saturated model with a maximized log-likelihood of 200 and df = 10. A reduced model, in which the 4 dummy variables that categorized each individual's smoking status (smoking status has 5 levels) were removed, has a maximized log-likelihood of 193. Using an alpha level of 0.05, what can we conclude about smoking status?

  • At least one of the betas for the dummy variables relating to smoking status is likely not equal to zero.

  • The variable smoking status improves the model.
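
The likelihood ratio tests in this card and the previous one can be checked numerically. The following is a minimal sketch in plain Python, not the course's own procedure: the helper names `chi2_sf` and `lr_test` are made up for illustration, and it assumes the test's degrees of freedom equal the number of dummy variables removed (3 for blood pressure, 4 for smoking status).

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: start from the df = 1 tail, then add one term per extra 2 df.
    total = erfc(sqrt(half))
    term = sqrt(2.0 * x / pi) * exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def lr_test(loglik_full: float, loglik_reduced: float, df: int):
    """Return the likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Blood pressure card: log-likelihoods 187 vs 182, 3 dummies removed.
stat_bp, p_bp = lr_test(187, 182, df=3)
# Smoking status card: log-likelihoods 200 vs 193, 4 dummies removed.
stat_smoke, p_smoke = lr_test(200, 193, df=4)

print(stat_bp, round(p_bp, 4))        # statistic 10.0, p about 0.0186
print(stat_smoke, round(p_smoke, 4))  # statistic 14.0, p about 0.0073
```

Both p-values fall below 0.05, which is consistent with concluding that at least one dummy coefficient is nonzero in each case.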

3

If you are using a model with multiple interactions between explanatory variables when all the literature suggests no interactions are necessary, then based on this information alone the model is likely what?

The model is too complex.

4

Why are unstable estimates a problem?

They cause non statistically significant variables to always appear statistically significant.

5

You can use a likelihood ratio test to compare the following two models.

Model 1: logit(P(Y=1)) = intercept + B1*Weight + B2*Income + B3*Age

Model 2: logit(P(Y=1)) = intercept + B1*Weight + B2*Income + B3*Gender + B4*Weight*Income

False.

6

You can use a likelihood ratio test to compare the following two models

Model 1: logit(P(Y=1)) = intercept + B1*Age + B2*Gender + B3*Race

Model 2: logit(P(Y=1)) = intercept + B1*Age + B2*Gender + B3*Salary

False
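
A likelihood ratio test requires nested models: every term in the smaller model must also appear in the larger one. The sketch below illustrates that check for this card and the previous one by treating each model as a set of term names; the helper `is_nested` is hypothetical and only compares term lists, which is a simplification of the general definition of nesting.

```python
def is_nested(reduced_terms: set, full_terms: set) -> bool:
    """True when every term of the reduced model also appears in the full model."""
    return reduced_terms <= full_terms


def comparable_by_lrt(model_a: set, model_b: set) -> bool:
    """Two models can be compared by an LRT only if one is nested in the other."""
    return is_nested(model_a, model_b) or is_nested(model_b, model_a)


# Previous card: Age appears only in Model 1, Gender and the interaction only in Model 2.
card_5 = comparable_by_lrt(
    {"Weight", "Income", "Age"},
    {"Weight", "Income", "Gender", "Weight*Income"},
)
# This card: Race appears only in Model 1, Salary only in Model 2.
card_6 = comparable_by_lrt(
    {"Age", "Gender", "Race"},
    {"Age", "Gender", "Salary"},
)
print(card_5, card_6)  # False False
```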

7

Only statistically significant variables should be included in the model.

False.

8

Based on the information provided in the data?

Model 7- 302.16 since it is the smallest value

gender

9

Based on the information provided, which model best fits the data?

Model 2 (320.17), since it has the smallest value.

10

If we are conducting a study with a single binary predictor and want to determine whether the intervention will increase the proportion of students who want to go to college, what sample size do we need given the following parameters?

Alpha = .05

Power = .85

Control Group Proportion of students who want to go to college = 70%

Treatment Group Proportion of students who want to go to college = 80%

670
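
The card does not say which software or formula produced 670, so the following is only a sketch under stated assumptions: a two-sided test, equal group sizes, and the common normal-approximation formula that uses the pooled proportion under the null hypothesis. The function name `n_per_group` is made up for this example.

```python
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float, power: float) -> float:
    """Approximate per-group sample size for comparing two proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided critical value (assumption)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    sd_null = (2 * p_bar * (1 - p_bar)) ** 0.5           # pooled, under H0
    sd_alt = (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5      # unpooled, under H1
    return ((z_alpha * sd_null + z_beta * sd_alt) / (p1 - p2)) ** 2


n = n_per_group(0.70, 0.80, alpha=0.05, power=0.85)
print(round(n, 1))   # about 335.1 per group
print(2 * round(n))  # 670 in total
```

Rounding each group up to 336 instead would give 672 in total, and a fully unpooled formula gives roughly 333 per group, so 670 is best read as about 335 per group.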

11

If we are conducting a study with a single binary predictor and want to determine whether the intervention will reduce teen pregnancy, what power do we have given the following parameters?

Alpha = .1

Control Group Pregnancy Rate = 5%

Treatment Group Pregnancy Rate = 2.5%

N per group = 450

Round your answer to 3 decimal places.

0.631
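
One way to arrive at this value is the normal approximation for a two-sided test of two proportions with an unpooled standard error. That choice is an assumption, since the card does not name the method, and the function name `power_two_proportions` is made up for this sketch. It also ignores the negligible probability of rejecting in the wrong direction.

```python
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided two-proportion z-test with equal groups."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Unpooled standard error of the difference in sample proportions.
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return nd.cdf(abs(p1 - p2) / se - z_alpha)


power = power_two_proportions(0.05, 0.025, n_per_group=450, alpha=0.10)
print(round(power, 3))  # 0.631
```

A pooled-variance version of the same approximation gives a slightly smaller value (about 0.629), so the exact third decimal depends on the formula used.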

12

The table below is SAS output from a multinomial logistic regression on the outcome Deathcause, with Cholesterol as the single explanatory variable.

Deathcause has the following values: Cancer, Cerebral Vascular Disease, and Coronary Heart Disease. Cancer was used as the reference category for the multinomial logistic regression.

Cholesterol is a continuous measure in units of milligrams per deciliter (mg/dL).

Analysis of Maximum Likelihood Estimates

Parameter    DeathCause                 DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept    Cerebral Vascular Disease   1   -1.3068          0.3720          12.3431      0.0004
Intercept    Coronary Heart Disease      1   -2.0951          0.3359          38.9038      <.0001
Cholesterol  Cerebral Vascular Disease   1   0.00410         0.00158           6.7747      0.0092
Cholesterol  Coronary Heart Disease      1   0.00932         0.00140          44.2122      <.0001

Using the table above, select the appropriate values that create the prediction equation comparing the outcome values of Cancer to Coronary Heart Disease.

 

Pi1 means probability of Cerebral Vascular Disease

Pi2 means probability of Coronary Heart Disease

Pi3 means probability of Cancer

log(pi3/pi2) = 2.0951 - 0.00932*Cholesterol

13

The table below is SAS output from a multinomial logistic regression on the outcome Deathcause, with Cholesterol as the single explanatory variable.

Deathcause has the following values: Cancer, Cerebral Vascular Disease, and Coronary Heart Disease. Cancer was used as the reference category for the multinomial logistic regression.

Cholesterol is a continuous measure in units of milligrams per deciliter (mg/dL).

Analysis of Maximum Likelihood Estimates

Parameter    DeathCause                 DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept    Cerebral Vascular Disease   1   -1.3068          0.3720          12.3431      0.0004
Intercept    Coronary Heart Disease      1   -2.0951          0.3359          38.9038      <.0001
Cholesterol  Cerebral Vascular Disease   1   0.00410         0.00158           6.7747      0.0092
Cholesterol  Coronary Heart Disease      1   0.00932         0.00140          44.2122      <.0001

Using the table above determine the value of Cholesterol when Pi1 = Pi2

 

Pi1 means probability of Cerebral Vascular Disease

Pi2 means probability of Coronary Heart Disease

Pi3 means probability of Cancer

151
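
Pi1 equals Pi2 exactly when the two linear predictors (each relative to Cancer) are equal, so the answer comes from solving one linear equation. A short sketch using the estimates from the table; the variable names are illustrative only.

```python
# Estimates from the table (Cancer is the reference category).
b0_cvd, b1_cvd = -1.3068, 0.00410   # Cerebral Vascular Disease vs Cancer
b0_chd, b1_chd = -2.0951, 0.00932   # Coronary Heart Disease vs Cancer

# Pi1 = Pi2  <=>  b0_cvd + b1_cvd * x = b0_chd + b1_chd * x
cholesterol = (b0_cvd - b0_chd) / (b1_chd - b1_cvd)
print(round(cholesterol))  # 151
```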

14

The table below is SAS output from a multinomial logistic regression on the outcome Deathcause, with Cholesterol as the single explanatory variable.

Deathcause has the following values: Cancer, Cerebral Vascular Disease, and Coronary Heart Disease. Cancer was used as the reference category for the multinomial logistic regression.

Cholesterol is a continuous measure in units of milligrams per deciliter (mg/dL).

Analysis of Maximum Likelihood Estimates

Parameter    DeathCause                 DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept    Cerebral Vascular Disease   1   -1.3068          0.3720          12.3431      0.0004
Intercept    Coronary Heart Disease      1   -2.0951          0.3359          38.9038      <.0001
Cholesterol  Cerebral Vascular Disease   1   0.00410         0.00158           6.7747      0.0092
Cholesterol  Coronary Heart Disease      1   0.00932         0.00140          44.2122      <.0001

Using the table above determine the probability of someone dying of Cancer when Cholesterol equals 450.

 

Do not round the values in the table during your calculations; report your final answer to 4 decimal places.

0.092
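
With Cancer as the reference category, its probability is 1 divided by 1 plus the exponentiated linear predictors of the other two categories. The sketch below applies that formula to the table's estimates; the function name `death_cause_probs` is made up for illustration.

```python
from math import exp


def death_cause_probs(cholesterol: float) -> dict:
    """Category probabilities from the baseline-category logit model in the table."""
    eta_cvd = -1.3068 + 0.00410 * cholesterol   # log(Pi1 / Pi3)
    eta_chd = -2.0951 + 0.00932 * cholesterol   # log(Pi2 / Pi3)
    denom = 1.0 + exp(eta_cvd) + exp(eta_chd)
    return {
        "Cerebral Vascular Disease": exp(eta_cvd) / denom,
        "Coronary Heart Disease": exp(eta_chd) / denom,
        "Cancer": 1.0 / denom,
    }


probs = death_cause_probs(450)
print(round(probs["Cancer"], 4))  # 0.092
```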

15

The table below is SAS output from a multinomial logistic regression on the outcome Deathcause, with Cholesterol as the single explanatory variable.

Deathcause has the following values: Cancer, Cerebral Vascular Disease, and Coronary Heart Disease. Cancer was used as the reference category for the multinomial logistic regression.

Cholesterol is a continuous measure in units of milligrams per deciliter (mg/dL).

Analysis of Maximum Likelihood Estimates

Parameter    DeathCause                 DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept    Cerebral Vascular Disease   1   -1.3068          0.3720          12.3431      0.0004
Intercept    Coronary Heart Disease      1   -2.0951          0.3359          38.9038      <.0001
Cholesterol  Cerebral Vascular Disease   1   0.00410         0.00158           6.7747      0.0092
Cholesterol  Coronary Heart Disease      1   0.00932         0.00140          44.2122      <.0001

Using the table above select the appropriate interpretation for the Odds Ratio related to Cholesterol where Cerebral Vascular Disease is the outcome of interest compared to Coronary Heart Disease.

As cholesterol increases by 1 mg/dL, the odds that someone dies from Cerebral Vascular Disease rather than Coronary Heart Disease decrease by 0.5%.
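
Neither category in this comparison is the reference, so the relevant log odds ratio is the difference between the two Cholesterol coefficients, each of which is relative to Cancer. A brief sketch of that arithmetic, with illustrative variable names:

```python
from math import exp

b_cvd = 0.00410   # Cholesterol, Cerebral Vascular Disease vs Cancer
b_chd = 0.00932   # Cholesterol, Coronary Heart Disease vs Cancer

# Odds ratio for Cerebral Vascular Disease vs Coronary Heart Disease per 1 mg/dL.
odds_ratio = exp(b_cvd - b_chd)
percent_change = (odds_ratio - 1.0) * 100.0
print(round(odds_ratio, 4), round(percent_change, 2))  # 0.9948 -0.52
```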

16

The table below is SAS output from an ordinal logistic regression on the outcome Smoking Status (number of cigarettes smoked per day), with Sex and Weight as the explanatory variables.

Smoking Status had the following values:

0 = Non-smoker

1 = Light (1-5)

2 = Moderate (6-15)

3 = Heavy (16-25)

4 = Very Heavy (26+)

The regression modeled logit(P(Smoking Status <= j)) where j equals the values of the variable smoking status

 

Sex had the values of Male and Female (Female was used as the reference category)

Weight is a continuous measure in units of pounds (lbs)

 

Analysis of Maximum Likelihood Estimates

Parameter          DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  0        1   -1.7137          0.3178          29.0714      <.0001
Intercept  1        1    -1.298          0.3165          16.8223      <.0001
Intercept  2        1    -0.788          0.3153           6.2452      0.0125
Intercept  3        1    0.6098          0.3172           3.6966      0.0545
Weight              1   0.00520         0.00184           8.0234      0.0046
Sex        Male     1   -1.5177          0.1118         184.2406      <.0001

The table above has multiple intercepts but only a single estimate for each explanatory variable. Select the answer below that explains why.

The proportional odds assumption is being applied.

17

The table below is SAS output from an ordinal logistic regression on the outcome Smoking Status (number of cigarettes smoked per day), with Sex and Weight as the explanatory variables.

Smoking Status had the following values:

0 = Non-smoker

1 = Light (1-5)

2 = Moderate (6-15)

3 = Heavy (16-25)

4 = Very Heavy (26+)

The regression modeled logit(P(Smoking Status <= j)) where j equals the values of the variable smoking status

 

Sex had the values of Male and Female (Female was used as the reference category)

Weight is a continuous measure in units of pounds (lbs)

 

Analysis of Maximum Likelihood Estimates

Parameter          DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  0        1   -1.7137          0.3178          29.0714      <.0001
Intercept  1        1    -1.298          0.3165          16.8223      <.0001
Intercept  2        1    -0.788          0.3153           6.2452      0.0125
Intercept  3        1    0.6098          0.3172           3.6966      0.0545
Weight              1   0.00520         0.00184           8.0234      0.0046
Sex        Male     1   -1.5177          0.1118         184.2406      <.0001

What is the prediction equation for someone smoking 1-5 cigarettes per day or fewer?

logit(P(Smoking Status <= 1)) = -1.298 + (0.0052*Weight) + (-1.5177*Sex), where Sex = 1 for Male and 0 for Female

18

The table below is SAS output from an ordinal logistic regression on the outcome Smoking Status (number of cigarettes smoked per day), with Sex and Weight as the explanatory variables.

Smoking Status had the following values:

0 = Non-smoker

1 = Light (1-5)

2 = Moderate (6-15)

3 = Heavy (16-25)

4 = Very Heavy (26+)

The regression modeled logit(P(Smoking Status <= j)) where j equals the values of the variable smoking status

 

Sex had the values of Male and Female (Female was used as the reference category)

Weight is a continuous measure in units of pounds (lbs)

 

Analysis of Maximum Likelihood Estimates

Parameter          DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  0        1   -1.7137          0.3178          29.0714      <.0001
Intercept  1        1    -1.298          0.3165          16.8223      <.0001
Intercept  2        1    -0.788          0.3153           6.2452      0.0125
Intercept  3        1    0.6098          0.3172           3.6966      0.0545
Weight              1   0.00520         0.00184           8.0234      0.0046
Sex        Male     1   -1.5177          0.1118         184.2406      <.0001

Using the table above select the appropriate interpretation for the Odds Ratio associated with Weight.

As weight increases by 1 lb, the odds that someone will smoke less rather than more increase by 0.5%.
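
Because the model is for logit(P(Smoking Status <= j)), exponentiating the Weight estimate gives the multiplicative change in the odds of being in a lower smoking category per additional pound. A brief sketch of the arithmetic, with illustrative variable names:

```python
from math import exp

b_weight = 0.00520  # Weight estimate from the table

# Odds ratio for being at or below any given smoking category, per 1 lb.
odds_ratio = exp(b_weight)
percent_change = (odds_ratio - 1.0) * 100.0
print(round(odds_ratio, 4), round(percent_change, 2))  # 1.0052 0.52
```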

19

The table below is SAS output from an ordinal logistic regression on the outcome Smoking Status (number of cigarettes smoked per day), with Race and Sex as the explanatory variables.

Smoking Status had the following values:

0 = Non-smoker

1 = Light (1-5)

2 = Moderate (6-15)

3 = Heavy (16-25)

4 = Very Heavy (26+)

The regression modeled logit(P(Smoking Status <= j)) where j equals the values of the variable smoking status

 

Race has the values of White, African American, and Other (where White was used as the reference category).

Sex has the values of Female and Male (Male was used as the reference category)

 

Analysis of Maximum Likelihood Estimates

Parameter          DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  0        1    0.5081          0.1105          21.1564      <.0001
Intercept  1        1    1.0219          0.1133          81.3440      <.0001
Intercept  2        1    1.6273          0.1176         191.4537      <.0001
Intercept  3        1    3.1698          0.1349         552.1106      <.0001
Race       Black    1   -1.8215          0.1396         170.1347      <.0001
Race       Other    1   -2.2165          0.1336         275.3643      <.0001
Sex        Female   1    1.3100          0.1100         141.7405      <.0001

Use the table above to calculate P(Smoking Status = 3) where Race = Black and Sex = Female.

Do not round the values in the table above; round your final answer to 4 decimal places.

 

Hint: The table above gives you values to create a prediction equation for logit(P(Y<=j)), but the question is asking for P(Y=j).

0.1813
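
Following the hint, P(Y = 3) is the difference between two cumulative probabilities, P(Y <= 3) and P(Y <= 2), each obtained by applying the inverse logit to its prediction equation. A minimal sketch using the table's estimates; the function names and the 0/1 indicator coding are assumptions for illustration.

```python
from math import exp

# Estimates from the table (White and Male are the reference categories).
intercepts = {0: 0.5081, 1: 1.0219, 2: 1.6273, 3: 3.1698}
b_black, b_other, b_female = -1.8215, -2.2165, 1.3100


def cumulative_prob(j: int, black: int = 0, other: int = 0, female: int = 0) -> float:
    """P(Smoking Status <= j) under the proportional odds model."""
    eta = intercepts[j] + b_black * black + b_other * other + b_female * female
    return 1.0 / (1.0 + exp(-eta))


p_le_3 = cumulative_prob(3, black=1, female=1)
p_le_2 = cumulative_prob(2, black=1, female=1)
p_eq_3 = p_le_3 - p_le_2
print(round(p_eq_3, 4))  # 0.1813
```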

20

                                Do you get a yearly flu shot
Do you get a yearly physical    Yes     No
Yes                             325     56
No                               36    100

Using the table above calculate the Test statistic for the McNemar Test.

4.348
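
The McNemar statistic uses only the two discordant cells of the table, here 56 (physical yes, flu shot no) and 36 (physical no, flu shot yes). A short sketch without a continuity correction, which is what reproduces the card's value; the function name is made up for this example.

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """McNemar chi-square statistic (no continuity correction) from discordant counts."""
    return (b - c) ** 2 / (b + c)


stat = mcnemar_statistic(56, 36)
print(round(stat, 3))  # 4.348
```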

21

                                Do you get a yearly flu shot
Do you get a yearly physical    Yes     No
Yes                             125     15
No                               30     75

Select the appropriate conclusion about people who get yearly physicals and people who get yearly flu shots using the McNemar Test with an alpha of .05.

There is evidence to suggest that people are more willing to get a yearly flu shot than a yearly physical.
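
This conclusion can be checked by computing the McNemar statistic from the discordant cells, its p-value on 1 degree of freedom, and the two marginal totals. The sketch below assumes no continuity correction and uses the identity that a chi-square tail probability with 1 df equals erfc(sqrt(x / 2)); the variable names are illustrative.

```python
from math import erfc, sqrt

# Cell counts from the table: rows = yearly physical, columns = yearly flu shot.
both, physical_only = 125, 15
flu_only, neither = 30, 75

stat = (physical_only - flu_only) ** 2 / (physical_only + flu_only)
p_value = erfc(sqrt(stat / 2.0))   # chi-square upper tail with 1 df

flu_total = both + flu_only             # people who get a flu shot
physical_total = both + physical_only   # people who get a physical

print(stat, round(p_value, 4))    # statistic 5.0, p about 0.0253
print(flu_total, physical_total)  # 155 140
```

The p-value is below .05 and more people report a flu shot (155) than a physical (140), which matches the direction stated on the card.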