STAT 100: EXAM 3

Last updated 1:02 AM on 5/1/26

103 Terms

1

Posterior Probability Distribution (ch. 19 - pg. 200)

a special type of distribution that specifies how likely DIFFERENT potential values of a parameter are, given the sampling strategy, the data, and the estimate that was collected

2

What question does the Posterior Probability Distribution answer? (ch. 19 - pg. 200)

WOULD WE GET THE SAME ESTIMATE IF WE REPEATED THE STUDY?

3

Posterior Probability Distribution: Point Estimate

estimate of the population parameter, WITHOUT incorporating "ish"-ness

4

Posterior Probability Distribution: Interval estimate

instead of using a single number as an estimate, we can use the ENTIRE MIDDLE 95% of the posterior distribution to provide an ENTIRE RANGE OF NUMBERS AS OUR ESTIMATE

5

Posterior Probability Distribution: Interval Estimate - VALUE IS INSIDE THE INTERVAL ESTIMATE MEANS WHAT?

values we think could PROBABLY be the true value

6

Posterior Probability Distribution: Interval Estimate - VALUE IS OUTSIDE THE INTERVAL ESTIMATE MEANS WHAT?

values we think are PROBABLY NOT the true value

7

Posterior Probability Distribution: how to create an INTERVAL ESTIMATE using equations?

1. compute the point estimate from data in the sample

2. conduct a simulation to obtain the estimation error (standard error)

3. compute the margin of error (margin of error = 2 TIMES estimation error)

4. figure out the LOWER limit of the interval estimate (point estimate - margin of error = interval estimate lower limit)

5. figure out the UPPER limit of the interval estimate (point estimate + margin of error = interval estimate upper limit)

6. report the lower limit and upper limit together
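
The six steps above can be sketched in Python. This is only an illustration under stated assumptions: the sample values are made up, and the "simulation" in step 2 is assumed to be a bootstrap (resampling the data with replacement), which the cards do not specify. The function names are hypothetical.

```python
import random
import statistics

def bootstrap_estimation_error(sample, n_resamples=2000, seed=0):
    """Step 2 (assumed method): approximate the estimation error of the
    sample mean by resampling the data with replacement many times."""
    rng = random.Random(seed)
    n = len(sample)
    resampled_means = [
        statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    ]
    # The spread of the resampled means serves as the estimation error.
    return statistics.stdev(resampled_means)

def interval_estimate(point_estimate, estimation_error):
    """Steps 3-5: margin of error = 2 x estimation error, then both limits."""
    margin_of_error = 2 * estimation_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Hypothetical sample, made up for illustration only.
sample = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]

point = statistics.mean(sample)                 # step 1: point estimate
error = bootstrap_estimation_error(sample)      # step 2: estimation error
lower, upper = interval_estimate(point, error)  # steps 3-5
print(f"interval estimate: ({lower:.2f}, {upper:.2f})")  # step 6: report both
```

The interval is always centered on the point estimate, so a larger estimation error widens it equally on both sides.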

8

Posterior Probability Distribution: interval estimate - equation for margin of error

margin of error = 2 TIMES ESTIMATION ERROR

9

Posterior Probability Distribution: interval estimate - Equation for LOWER limit

point estimate - margin of error

10

Posterior Probability Distribution: interval estimate - equation for UPPER limit

point estimate + margin of error

11

Bias (Ch. 20 - pg. 210)

ANY systematic manner in which the data that has been collected has fundamental problems that will prevent the statistical methods from producing a correct estimate

12

Measurement Bias (Ch. 20 - pg. 210)

any bias in the data due to problems in the measurement process

13

Sampling Bias (ch. 20 - pg. 211)

any bias in the data due to problems in the representativeness of a sample

14

Causal Bias (Ch. 20 - pg. 212)

any bias in the data due to differences between the controls and the observations in the treatment group

15

Estimate's Precision (ch. 20 - pg. 213)

how much uncertainty we have, and therefore HOW MUCH ROOM for ERROR we give, when we make an estimate of a parameter

16

One common way statisticians and data scientists think about bias and precision IS ??? (ch. 20 - pg. 213)

comparing the process of estimation to the process of shooting an arrow at a target

17

What causes uncertainty in GENERALIZATION INFERENCES? (ch. 19 slides)

We only have data from SOME observations, but are trying to describe an entire population of interest

18

What causes uncertainty in CAUSAL INFERENCES? (ch. 19 slides)

We have only controlled for SOME factors, but are trying to identify a single causal factor's ATE

19

What do we do about uncertainty? (ch. 19 slides)

QUANTIFY uncertainty by studying patterns in variability

20

IF a study has a ____ ____, we can conduct a simulation to compute the estimation error

RANDOM COMPONENT

(even if you don't use a random component, you could still compute it, but there could be bias)

21

what are the TWO factors that affect the precision of an estimate?

1. how much information you have = sample size (more info, the better)

2. how different the observations are from each other = standard deviation
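
A quick numeric check of how these two factors move the estimation error. This sketch assumes the estimate is a sample mean and uses the standard textbook formula for its standard error (SD divided by the square root of the sample size), which is not stated on the cards. The function name is hypothetical.

```python
import math

def estimation_error_of_mean(standard_deviation, sample_size):
    """Standard error of a sample mean: SD / sqrt(n) (assumed formula)."""
    return standard_deviation / math.sqrt(sample_size)

# Same SD, larger sample size: smaller estimation error (better precision).
print(estimation_error_of_mean(10, 25))   # 2.0
print(estimation_error_of_mean(10, 100))  # 1.0

# Same sample size, larger SD: larger estimation error (worse precision).
print(estimation_error_of_mean(20, 100))  # 2.0
```

Note that quadrupling the sample size only halves the estimation error, because the sample size sits under a square root.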

22

TRUE OR FALSE: The LARGER the SAMPLE SIZE, the BETTER the precision.

TRUE

23

TRUE OR FALSE: The LARGER the SAMPLE SIZE, the smaller the estimation error.

TRUE

24

TRUE OR FALSE: The LARGER the SD, the BETTER the precision

FALSE - The LARGER the SD --> the WORSE the precision

25

TRUE OR FALSE: the LARGER the SD, the SMALLER the estimation error

FALSE - the larger the SD, the LARGER the estimation error

26

Can think of "Precision" as ______ AND "Bias" as _____/_____

Precision: consistency

Bias: aim/accuracy

27

What are the 3 sources of bias? (reliability, internal validity, and external validity)

1. LOW reliability --> measurement bias

2. LOW internal validity --> causal bias

3. LOW external validity --> sampling bias

28

STEPS FOR BIAS/PRECISION (Ch. 20)

1. Always think about the potential biases FIRST

2. quantify precision with estimation error, BUT DON'T forget to think about MEANINGFUL DIFFERENCES (AD/RD and ES are STILL HELPFUL)

3. accept uncertainty (think of interval estimates in terms of a probability distribution)

29

What are the two general approaches to statistical testing? (ch. 21 - pg. 222)

1. Evidence-Based Testing

2. Hypothesis-Based Testing

30

Similarity between Evidence-based Testing and Hypothesis-Based Testing (ch. 21 - pg. 222)

incorporate "ish-ness" (ex: variability and uncertainty) by creating a probability distribution

31

KEY difference between Evidence-Based Testing and Hypothesis-Based Testing

how the probability distribution is created

32

Evidence-Based Testing + Hypothesis-Based Testing: BOTH involve what? (6) (ch. 21 - pg. 222)

1. Probability Distributions

2. One or more hypotheses

3. Collecting data

4. a sample statistic serving as an estimate of a parameter

5. estimation error

6. directly comparing hypotheses and evidence to each other

33

Evidence-Based Testing (ch. 21 - pg. 222)

1. Collect Data

2. Build a likelihood function (AKA posterior probability)

3. Compare a hypothesis to the likelihood function

DATA-FIRST APPROACH

34

Hypothesis-Based Testing (ch. 21 - pg. 22)

1. Specify a hypothesis

2. Build a null model

3. Compare evidence to the null model

HYPOTHESIS-FIRST approach

35

Null Model (ch. 21 - pg. 224)

probability distribution based on a hypothesis within the context of a statistical test

36

What does the NULL MODEL represent? (ch. 21 - pg. 224)

represents YOUR EXPECTATIONS for what the data will be

37

Null Model: What do we need to FIRST consider? (ch. 21 - pg. 224)

FIRST consider the hypothesized value of the parameter

38

IF the HYPOTHESIS is CORRECT, the P-VALUE is _________. (ch. 21 - pg. 225)

probability of seeing something even further from what was expected than the observed data

39

IF P-Value is VERY LARGE --> Data is _______ with what we expected.

data is CONSISTENT with what we expected

40

If P-value is VERY SMALL --> Data is _____ with what we expected.

data is NOT CONSISTENT with what we expected

41

IT IS important to remember to think about comparing evidence and hypotheses NOT JUST in terms of "are they consistent, yes or no?", BUT RATHER in terms of a scale with ______ ___ ________.

DEGREES OF CONSISTENCY

42

When evaluating potential bias present in a study, make sure to FIRST identify WHAT? (CH. 20 - CA tips)

what kind of CLAIM the study is making

43

What do statistical tests compare?

Comparison between some hypotheses AND some evidence that we collected

44

When thinking statistically... what is the two-part process?

1. Evaluate the validity of the data in terms of any biases

2. Make inferences based on data (IF IT'S GOOD) and the incorporation of uncertainty

45

statistical model (ch. 22)

equations that describe the relationship between attributes

- explicitly incorporate some notion of variability into the model

46

Statistical Models: usually we focus on a single attribute --> this term is called _____ or ______

response variable OR the outcome variable

47

goal of statistical model (ch. 22)

try to find patterns and relationships between the outcome variable and other attributes

48

what is the FIRST STEP when studying patterns in the outcome variable? (ch. 22 pg. 238)

compute the summary statistics and examine the distribution

49

MODEL based on a sample's mean can be written as... (ch. 22 pg. 239)

RESPONSE/OUTCOME VARIABLE = MEAN VALUE + ERROR

50

Statistical Model: Explanatory variables

attributes that can affect the response variable

51

FIRST step when exploring the relationship between an EXPLANATORY VARIABLE AND A RESPONSE VARIABLE

compute the summary statistics and side-by-side plots

52

Statistical Model: Indicators

CATEGORICAL MEASURES --- placeholder in a model that indicates what to do for observations that belong to a specific group or category, relative to a reference group

53

Model based on a sample's mean PLUS an indicator for a CATEGORICAL explanatory variable can be written as ?????

RESPONSE VARIABLE = MEAN VALUE + (BONUS x INDICATOR) + ERROR
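
A small sketch of this model in Python, assuming two groups where the reference group has indicator = 0 and the other group has indicator = 1. The data are made up, and reading "mean value" as the reference group's mean and "bonus" as the difference in group means is an assumption about how the model is fitted.

```python
import statistics

# Hypothetical outcome values, made up for illustration only.
control = [70, 72, 68, 74, 71]    # reference group, indicator = 0
treatment = [78, 80, 75, 82, 80]  # comparison group, indicator = 1

mean_value = statistics.mean(control)            # mean of the reference group
bonus = statistics.mean(treatment) - mean_value  # difference in group means

def predict(indicator):
    """RESPONSE VARIABLE = MEAN VALUE + (BONUS x INDICATOR), before error."""
    return mean_value + bonus * indicator

# ERROR is whatever is left over for each observation.
errors = [y - predict(0) for y in control] + [y - predict(1) for y in treatment]

print(mean_value, bonus)       # 71 8
print(predict(0), predict(1))  # 71 79
```

With this setup the bonus is also the estimated group difference, and the errors average out to 0 within each group.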

54

statistical models provide us with ______ and _____?

1. estimated ATE

2. estimation error

55

when an EXPLANATORY VARIABLE is a QUANT attribute, instead of just ONE bonus, we can think about ????

HOW MANY bonuses an observation gets based on the value of the QUANT EXPLANATORY VARIABLE

56

we ALWAYS map the RESPONSE VARIABLE to the ________ AXIS

VERTICAL

57

LINEAR Models

multiplicative "bonus" for QUANT attribute forms a line

58

Linear Models: the SLOPE COEFFICIENT can be interpreted as??

the ATE of the QUANT explanatory variable on the response variable

59

statistical models ALLOW US to (3)

1. predict

2. explain

3. control

60

Main effects

effects on a response variable that a single explanatory variable has

61

Interaction Plot

shows the means of each group, often with the interval estimate for each mean also graphically depicted, and allows us to visualize each main effect

62

Interaction Effects and Interaction Plots: What indicates that the ONLY effects on the response variable are MAIN EFFECTS

factors in a model are NOT moderated by any other factor OR no interaction effect --> THE LINES IN AN INTERACTION PLOT HAVE SIMILAR SLOPES

63

Interaction Effects

effects that change based on a moderating factor

SLOPES ARE DIFFERENT

64

If there is NO interaction effect, what would be the estimate of the interaction term? AND what does it mean?

around 0 --> main causal factor's effect on the outcome IS NOT MODERATED BY THE SECOND FACTOR

65

How can you check that the interaction effect is 0?

check by creating an interval estimate for the interaction effect and seeing if 0 is inside of the interval

66

When our main causal factor is a QUANT attribute, what type of graph do we create?

scatter plot

67

When an indicator interacts with a QUANT variable, the SLOPE for the relationship between the QUANT variable and the response variable will be ????

different in each group

68

FOR indicators, main effects can be interpreted as what?

estimates of group difference

69

How can you determine if there is an interaction?

1. fit an interaction model

2. create an interval estimate for the interaction effect (INTERACTION effect = difference in the ATEs)

3. Check whether 0 is a likely value for the interaction effect
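
Steps 2 and 3 can be sketched as follows, assuming the two ATEs and the estimation error of the interaction effect have already come out of a fitted interaction model (step 1 is not shown). The numbers and function names are hypothetical; the margin of error follows the "2 TIMES estimation error" rule from the interval estimate cards.

```python
def interaction_interval(ate_group_a, ate_group_b, estimation_error):
    """Interval estimate for the interaction effect (difference in the ATEs)."""
    interaction_effect = ate_group_b - ate_group_a
    margin_of_error = 2 * estimation_error
    return interaction_effect - margin_of_error, interaction_effect + margin_of_error

def zero_is_plausible(interval):
    """True when 0 falls inside the interval estimate."""
    lower, upper = interval
    return lower <= 0 <= upper

# Hypothetical ATEs: the effect is 5.0 in one group and 9.0 in the other.
interval = interaction_interval(5.0, 9.0, estimation_error=1.5)
print(interval)                     # (1.0, 7.0)
print(zero_is_plausible(interval))  # False: 0 is outside, so there is a moderator

# Hypothetical ATEs that are nearly equal across groups.
interval = interaction_interval(5.0, 6.0, estimation_error=1.5)
print(interval)                     # (-2.0, 4.0)
print(zero_is_plausible(interval))  # True: 0 is inside, so no moderator is indicated
```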

70

What is the purpose of using regression?

figure out the best estimate for the coefficients

71

definition of regression

method that is based on keeping the distance from all points to the line as SMALL as possible
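
One common way to make "as SMALL as possible" concrete is ordinary least squares, which minimizes the sum of squared vertical distances from the points to the line. The sketch below assumes that is the method meant here; the data and the function name are made up for illustration.

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for a single quantitative predictor."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Any other line through these points would have a larger total squared prediction error than the one returned by `fit_line`.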

72

prediction error

The difference between the actual value for the response variable for an observation and the predicted value of the response variable for an observation

73

Scatter plot:

1. Actual value of the RESPONSE variable for an observation will be its position relative to the ____ ____

2. observation's predicted value is the _______ __ __ ______

1. vertical axis

2. position of the model

74

The AVERAGE prediction error for a GOOD model should be equal to __ and the PATTERN should generally make the SHAPE of a _____ ____

1. equal to 0

2. shape of a NORMAL CURVE

75

residual standard error

describes the TYPICAL prediction error

- quantify the uncertainty in our predictions

76

the BEST linear model for the data is the model with the _______ residual standard error (RSE)

SMALLEST

77

what does the residual standard error (RSE) help us create?

prediction interval

78

prediction interval

estimate of what the response variable's value will be for an observation, given its value of the explanatory variable

- allows us to incorporate "ish"-ness into our model-based predictions

79

equation for prediction interval margin of error

2 TIMES residual standard error

80

equation for prediction interval

predicted value ADD/SUBTRACT prediction interval margin of error
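
A minimal sketch of the RSE and the prediction interval, assuming a linear model with two coefficients (intercept and slope), so the RSE divides by n - 2. That divisor is the usual convention, not something stated on the cards. The actual and predicted values are made up, and the function names are hypothetical.

```python
import math

def residual_standard_error(actual, predicted, n_coefficients=2):
    """Typical prediction error: sqrt(sum of squared errors / (n - coefficients))."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / (len(actual) - n_coefficients))

def prediction_interval(predicted_value, rse):
    """Predicted value ADD/SUBTRACT (2 x residual standard error)."""
    margin_of_error = 2 * rse
    return predicted_value - margin_of_error, predicted_value + margin_of_error

# Hypothetical actual values and the predictions of some fitted line.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

rse = residual_standard_error(actual, predicted)
low, high = prediction_interval(4.0, rse)
print(f"RSE = {rse:.3f}, prediction interval = ({low:.2f}, {high:.2f})")
```

The interval applies to a single new observation whose predicted value is 4.0, not to the model's coefficients.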

81

prediction intervals will be ______ than interval estimates for parameters

WIDER

82

equation for prediction error

ACTUAL VALUE - PREDICTED VALUE

83

Prediction intervals are for ______ observations.

SINGLE OBSERVATIONS

84

Model Error/Prediction Intervals: THE (Y) value of the LINE represents the _______ value of the RESPONSE variable for an observation with some known (x) value

PREDICTED VALUE

85

how do we quantify different levels of surprise?

by computing a standardized difference in the form of a z-score

86

2 equations to compute the amount of prediction error for a certain observation (SINGLE OBSERVATION)

1. actual value = predicted value + prediction error

2. actual value - predicted value = prediction error

87

z-score

a measure of how many standard deviations you are away from the norm (average or mean)

88

prediction z score equation

prediction z-score = (prediction error)/RSE

prediction error = actual value - predicted value

89

PREDICTION Z-SCORES = BELOW -2 MEANING

The actual value is SURPRISINGLY LOWER THAN PREDICTED

90

z-scores = above +2

The actual value is SURPRISINGLY higher than predicted

91

Z-scores: any time we're more than ______ off our normal amount, we consider the actual value ______

1. TWICE

2. Surprising
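
The prediction z-score and the "more than 2" rule from the cards above, as a short sketch. The numbers are made up and the function names are hypothetical.

```python
def prediction_z_score(actual_value, predicted_value, rse):
    """prediction z-score = (actual value - predicted value) / RSE."""
    prediction_error = actual_value - predicted_value
    return prediction_error / rse

def describe_surprise(z):
    """Apply the below -2 / above +2 rule."""
    if z < -2:
        return "surprisingly lower than predicted"
    if z > 2:
        return "surprisingly higher than predicted"
    return "not surprising"

# Hypothetical observation: actual 50, predicted 62, RSE of 5.
z = prediction_z_score(50, 62, 5)
print(z)                     # -2.4
print(describe_surprise(z))  # surprisingly lower than predicted
```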

92

Equation of Estimate Z-Score

(estimate - hypothesized value) / estimation error

93

P-Value Indicators: ABOVE 0.1

CONSISTENT with the null model (data and hypothesis)

94

P-Value Indicators: BELOW 0.02

INCONSISTENT with null model (data and hypothesis)

95

P-Value Indicators: BETWEEN 0.02-0.1

“GREY AREA” - Study result is inconclusive
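
The estimate z-score and the three p-value ranges can be tied together in a short sketch. It assumes the null model is approximately a normal curve and that the p-value is two-sided, neither of which the cards state. The numbers are made up and the function names are hypothetical.

```python
from statistics import NormalDist

def estimate_z_score(estimate, hypothesized_value, estimation_error):
    """(estimate - hypothesized value) / estimation error."""
    return (estimate - hypothesized_value) / estimation_error

def two_sided_p_value(z):
    """Probability of a result at least this far from the hypothesis,
    in either direction, under an assumed normal null model."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def interpret_p_value(p):
    """Apply the cutoffs from the cards: above 0.1, below 0.02, or in between."""
    if p > 0.1:
        return "consistent with the null model"
    if p < 0.02:
        return "inconsistent with the null model"
    return "grey area: inconclusive"

# Hypothetical study: estimate of 58, hypothesized value of 50, estimation error of 4.
z = estimate_z_score(58, 50, 4)
p = two_sided_p_value(z)
print(z, round(p, 4), interpret_p_value(p))
```

Treat the cutoffs as points on a scale of degrees of consistency rather than as a yes/no switch.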

96

IF 0 DOES NOT lie in the interval estimate for the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?

THERE IS A MODERATOR

97

IF 0 DOES lie in the interval estimate for the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?

THERE IS NOT A MODERATOR

98

Model R-Squared

the PERCENTAGE of variability in the response variable that the model helps to explain
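
A sketch of how this percentage can be computed, assuming the usual definition: 1 minus the ratio of the variability left in the prediction errors to the total variability in the response variable. The values are made up and the function name is hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the response variable's variability that the model explains."""
    mean_actual = sum(actual) / len(actual)
    # Total variability around the mean of the response variable.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variability left over in the prediction errors.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical actual values and model predictions.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```

A model whose predictions match every actual value exactly would give an R-squared of 1.0.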

99

we interpret model R-squared in terms of a model’s _______

utility

100

Useful models have an r-squared ABOVE WHAT

0.3, BUT ideally we want it to be 1.0 (perfect utility)