STAT 100: EXAM 3

Last updated 1:02 AM on 5/1/26

New cards

Posterior Probability Distribution (ch. 19 - pg. 200)

special type of distribution that specifies how likely DIFFERENT potential values of a parameter are given the sampling strategy, data, and estimate that was collected

New cards

What question does the Posterior Probability Distribution answer? (ch. 19 - pg. 200)

WOULD WE GET THE SAME ESTIMATE IF WE REPEATED THE STUDY?

New cards

Posterior Probability Distribution: Point Estimate

single-number estimate of the population parameter, WITHOUT incorporating "ish"-ness

New cards

Posterior Probability Distribution: Interval estimate

instead of using a single number as an estimate, we can use the ENTIRE MIDDLE 95% of the posterior distribution to provide an ENTIRE RANGE OF NUMBERS AS OUR ESTIMATE

New cards

Posterior Probability Distribution: Interval Estimate - VALUE IS INSIDE THE INTERVAL ESTIMATE MEANS WHAT?

values we think could PROBABLY be the true value

New cards

Posterior Probability Distribution: Interval Estimate - VALUE IS OUTSIDE THE INTERVAL ESTIMATE MEANS WHAT?

values we think are PROBABLY NOT the true value

New cards

Posterior Probability Distribution: how to create an INTERVAL ESTIMATE using equations?

1. compute the point estimate from data in the sample

2. conduct a simulation to obtain the estimation error (standard error)

3. compute the margin of error (margin of error = 2 TIMES estimation error)

4. figure out the LOWER limit of the interval estimate (point estimate - margin of error = interval estimate lower limit)

5. figure out the UPPER limit of the interval estimate (point estimate + margin of error = interval estimate upper limit)

6. report the lower limit and upper limit together
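The six steps above can be sketched in Python for a sample mean. This is a minimal sketch with two assumptions. First, step 2's simulation is a bootstrap, which resamples the observed data with replacement; the course's own simulation may differ. Second, the sample values are hypothetical.

```python
import random
import statistics

def interval_estimate(sample, n_sims=2000, seed=1):
    """Follow the six steps for a sample mean (bootstrap simulation is an assumption)."""
    rng = random.Random(seed)
    # Step 1: point estimate computed from the sample
    point_estimate = statistics.mean(sample)
    # Step 2: simulate many re-collected samples and measure how much the estimate varies
    simulated = [
        statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(n_sims)
    ]
    estimation_error = statistics.stdev(simulated)
    # Step 3: margin of error = 2 x estimation error
    margin_of_error = 2 * estimation_error
    # Steps 4-5: lower and upper limits
    lower = point_estimate - margin_of_error
    upper = point_estimate + margin_of_error
    # Step 6: report both limits together
    return lower, upper

# Hypothetical sample of 10 measurements
sample = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9, 11.7, 9.5, 12.4, 10.6]
low, high = interval_estimate(sample)
print(f"point estimate: {statistics.mean(sample):.2f}")
print(f"interval estimate: {low:.2f} to {high:.2f}")
```

The interval is symmetric around the point estimate because the same margin of error is subtracted and added.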

New cards

Posterior Probability Distribution: interval estimate - equation for margin of error

margin of error = 2 TIMES ESTIMATION ERROR

New cards

Posterior Probability Distribution: interval estimate - Equation for LOWER limit

point estimate - margin of error

New cards

Posterior Probability Distribution: interval estimate - equation for UPPER limit

point estimate + margin of error

New cards

Bias (Ch. 20 - pg. 210)

ANY systematic problem with how the data were collected that prevents the statistical methods from producing a correct estimate

New cards

Measurement Bias (Ch. 20 - pg. 210)

any bias in the data due to problems in the measurement process

New cards

Sampling Bias (ch. 20 - pg. 211)

any bias in the data due to problems in the representativeness of a sample

New cards

Causal Bias (Ch. 20 - pg. 212)

any bias in the data due to differences between the controls and the observations in the treatment group

New cards

Estimate's Precision (ch. 20 - pg. 213)

how much uncertainty we have, and therefore HOW MUCH ROOM for ERROR we give, when we make an estimate of a parameter

New cards

One common way stats and data scientists think about bias and precision IS ??? (ch. 20 - pg. 213)

comparing the process of estimation to the process of shooting an arrow at a target

New cards

What causes uncertainty in GENERALIZATION INFERENCES? (ch. 19 slides)

We only have data from SOME observations, but are trying to describe an entire population of interest

New cards

What causes uncertainty in CAUSAL INFERENCES? (ch. 19 slides)

We only controlled for SOME factors, but are trying to identify a single causal factor's ATE

New cards

What do we do about uncertainty? (ch. 19 slides)

QUANTIFY uncertainty by studying patterns in variability

New cards

IF a study has a ____ ____, we can conduct a simulation to compute the estimation error

RANDOM COMPONENT

(even without a random component you could still compute it, but the result could be biased)

New cards

what are the TWO factors that affect the precision of an estimate?

1. how much information you have = sample size (more information --> better precision)

2. how different the observations are from each other = standard deviation

New cards

TRUE OR FALSE: The LARGER the SAMPLE SIZE, the BETTER the precision.

TRUE

New cards

TRUE OR FALSE: The LARGER the SAMPLE SIZE, the smaller the estimation error.

TRUE

New cards

TRUE OR FALSE: The LARGER the SD, the BETTER the precision

FALSE - The LARGER the SD --> the WORSE the precision

New cards

TRUE OR FALSE: the LARGER the SD, the SMALLER the estimation error

FALSE - the larger the SD, the LARGER the estimation error
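A quick simulation can check the four true/false cards above. This sketch assumes a hypothetical normal population with mean 50. It draws many random samples and uses the spread of the sample means as the estimation error. For a sample mean, the simulated values should land near SD / sqrt(n), which is roughly 2, 0.5, and 8 for the three settings here.

```python
import random
import statistics

def simulated_estimation_error(population_sd, sample_size, n_sims=1000, seed=7):
    """Standard deviation of sample means across many simulated random samples.

    Assumption: a hypothetical normal population with mean 50.
    """
    rng = random.Random(seed)
    sample_means = []
    for _ in range(n_sims):
        sample = [rng.gauss(50, population_sd) for _ in range(sample_size)]
        sample_means.append(statistics.mean(sample))
    return statistics.stdev(sample_means)

small_n = simulated_estimation_error(population_sd=10, sample_size=25)
large_n = simulated_estimation_error(population_sd=10, sample_size=400)
large_sd = simulated_estimation_error(population_sd=40, sample_size=25)

print(f"SD 10, n 25:  {small_n:.2f}")
print(f"SD 10, n 400: {large_n:.2f}  (larger n -> smaller estimation error)")
print(f"SD 40, n 25:  {large_sd:.2f}  (larger SD -> larger estimation error)")
```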

New cards

Can think of "Precision" as ______ AND "Bias" as _____/_____

Precision: consistency

Bias: aim/accuracy

New cards

What are the 3 sources of bias? (reliability, internal validity, and external validity)

1. LOW reliability --> measurement bias

2. LOW internal validity --> causal bias

3. LOW external validity --> sampling bias

New cards

STEPS FOR BIAS/PRECISION (Ch. 20)

1. Always think about the potential biases FIRST

2. quantify precision with estimation error, BUT DON'T forget to think about MEANINGFUL DIFFERENCES (AD/RD and ES are STILL HELPFUL)

3. accept uncertainty (think of interval estimates in terms of a probability distribution)

New cards

What are the two general approaches to statistical testing? (ch. 21 - pg. 222)

1. Evidence-Based Testing

2. Hypothesis-Based Testing

New cards

Similarity between Evidence-based Testing and Hypothesis-Based Testing (ch. 21 - pg. 222)

incorporate "ish-ness" (ex: variability and uncertainty) by creating a probability distribution

New cards

KEY difference between Evidence-Based Testing and Hypothesis-Based Testing

how the probability distribution is created

New cards

Evidence-Based Testing + Hypothesis-Based Testing: BOTH involve what? (6) (ch. 21 - pg. 222)

1. Probability Distributions

2. One or more hypotheses

3. Collecting data

4. a sample statistic serving as an estimate of a parameter

5. estimation error

6. directly comparing hypotheses and evidence to each other

New cards

Evidence-Based Testing (ch. 21 - pg. 222)

1. Collect Data

2. Build a likelihood function (AKA posterior probability)

3. Compare a hypothesis to the likelihood function

DATA-FIRST APPROACH

New cards

Hypothesis-Based Testing (ch. 21 - pg. 22)

1. Specify a hypothesis

2. Build a null model

3. Compare evidence to the null model

HYPOTHESIS-FIRST approach

New cards

Null Model (ch. 21 - pg. 224)

probability distribution based on a hypothesis within the context of a statistical test

New cards

What does the NULL MODEL represent? (ch. 21 - pg. 224)

represents YOUR EXPECTATIONS for what the data will be

New cards

Null Model: What do we need to FIRST consider? (ch. 21 - pg. 224)

FIRST consider the hypothesized value of the parameter

New cards

IF the HYPOTHESIS is CORRECT, the P-VALUE is _________. (ch. 21 - pg. 225)

probability of seeing something even further from what was expected than the observed data
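The hypothesis-first approach can be sketched as a simulated null model. The sketch specifies a hypothesized proportion, simulates many studies in a world where that hypothesis is correct, and reports the fraction of simulated results that land at least as far from the expected value as the observed data. The study numbers (62 "yes" answers out of 100, hypothesized proportion 0.5) are hypothetical. Counting ties as "at least as far" is a common convention and is assumed here.

```python
import random

def simulated_p_value(observed_count, n, hypothesized_p, n_sims=5000, seed=3):
    """Hypothesis-first test of a proportion using a simulated null model."""
    rng = random.Random(seed)
    expected_count = n * hypothesized_p  # what the hypothesis says to expect
    observed_distance = abs(observed_count - expected_count)
    at_least_as_far = 0
    for _ in range(n_sims):
        # One simulated study in a world where the hypothesis is correct
        count = sum(rng.random() < hypothesized_p for _ in range(n))
        # Ties count as "at least as far" (assumed convention)
        if abs(count - expected_count) >= observed_distance:
            at_least_as_far += 1
    return at_least_as_far / n_sims

# Hypothetical study: 62 "yes" answers out of 100, hypothesized proportion 0.5
p_value = simulated_p_value(observed_count=62, n=100, hypothesized_p=0.5)
print(f"simulated p-value: {p_value:.3f}")
```

Working with counts rather than proportions keeps the distance comparison exact.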

New cards

IF P-Value is VERY LARGE --> Data is _______ with what we expected.

data is CONSISTENT with what we expected

New cards

If P-value is VERY SMALL --> Data is _____ with what we expected.

data is NOT CONSISTENT with what we expected

New cards

IT IS important to remember to think about comparing evidence and hypotheses NOT JUST in terms of "are they consistent, yes or no?", BUT RATHER in terms of a scale with ______ ___ ________.

DEGREES OF CONSISTENCY

New cards

When evaluating potential bias present in a study, make sure to FIRST identify WHAT? (CH. 20 - CA tips)

what kind of CLAIM the study is making

New cards

What do statistical tests compare?

Comparison between some hypotheses AND some evidence that we collected

New cards

When thinking statistically... what is the two-part process?

1. Evaluate the validity of the data in terms of any biases

2. Make inferences based on data (IF IT'S GOOD) and the incorporation of uncertainty

New cards

statistical model (ch. 22)

equations that describe the relationship between attributes

- explicitly incorporate some notion of variability into the model

New cards

Statistical Models: usually we focus on a single attribute --> this term is called _____ or ______

response variable OR the outcome variable

New cards

goal of statistical model (ch. 22)

try to find patterns and relationships between the outcome variable and other attributes

New cards

what is the FIRST STEP when studying patterns in the outcome variable? (ch. 22 pg. 238)

compute the summary statistics and examine the distribution

New cards

MODEL based on a sample's mean can be written as... (ch. 22 pg. 239)

RESPONSE/OUTCOME VARIABLE = MEAN VALUE + ERROR
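A small worked example of the mean-only model follows. Each observation is written as the sample mean plus its own error. The data (hours of sleep for 8 students) are hypothetical. With this model, the errors always add up to 0.

```python
import statistics

# Hypothetical response variable: hours of sleep for 8 students
hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 8.5, 7.0]
mean_value = statistics.mean(hours)
errors = [y - mean_value for y in hours]

for y, error in zip(hours, errors):
    # RESPONSE = MEAN VALUE + ERROR
    print(f"{y} = {mean_value} + ({error:+.1f})")
print("sum of errors:", sum(errors))
```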

New cards

Statistical Model: Explanatory variables

attributes that can affect the response variable

New cards

FIRST step when exploring the relationship between an EXPLANATORY VARIABLE AND A RESPONSE VARIABLE

compute the summary statistics and side-by-side plots

New cards

Statistical Model: Indicators

CATEGORICAL MEASURES --- placeholder in a model that indicates what to do for observations that belong to a specific group or category, relative to a reference group

New cards

Model based on a sample's mean PLUS an indicator for a CATEGORICAL explanatory variable can be written as ?????

RESPONSE VARIABLE = MEAN VALUE + (BONUS x INDICATOR) + ERROR
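The indicator model can be sketched with a hypothetical two-group data set. Here the indicator is 1 for the treatment group and 0 for the reference group. With a single indicator, a least-squares fit gives exactly these values: the MEAN VALUE is the reference-group mean, and the BONUS is the difference between the two group means.

```python
import statistics

# Hypothetical data: (response, indicator); indicator = 1 for the treatment group,
# 0 for the reference group
data = [(7.0, 0), (6.5, 0), (7.5, 0), (6.0, 0), (8.0, 1), (8.5, 1), (7.5, 1), (9.0, 1)]

reference = [y for y, indicator in data if indicator == 0]
treatment = [y for y, indicator in data if indicator == 1]

mean_value = statistics.mean(reference)          # baseline for the reference group
bonus = statistics.mean(treatment) - mean_value  # extra amount when indicator = 1

def predict(indicator):
    # RESPONSE = MEAN VALUE + (BONUS x INDICATOR); the error is what is left over
    return mean_value + bonus * indicator

errors = [y - predict(indicator) for y, indicator in data]
print(f"MEAN VALUE = {mean_value}, BONUS = {bonus}")
```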

New cards

statistical models provide us with ______ and _____?

1. estimated ATE

2. estimation error

New cards

when an EXPLANATORY VARIABLE is a QUANT attribute, instead of just ONE bonus, we can think about ????

HOW MANY bonuses an observation gets based on the value of QUANT EXPLANATORY VARIABLE

New cards

we ALWAYS map the RESPONSE VARIABLE to the ________ AXIS

VERTICAL

New cards

LINEAR Models

multiplicative "bonus" for QUANT attribute forms a line

New cards

Linear Models: the SLOPE COEFFICIENT can be interpreted as??

the ATE of the QUANT explanatory variable on the response variable (change in the response for a one-unit increase in the explanatory variable)

New cards

statistical models ALLOW US to (3)

1. predict

2. explain

3. control

New cards

Main effects

the effect that a single explanatory variable has on a response variable

New cards

Interaction Plot

shows the means of each group, often with the interval estimate for each mean also graphically depicted, and allows us to visualize each main effect

New cards

Interaction Effects and Interaction Plots: What indicates that the ONLY effects on the response variable are MAIN EFFECTS

factors in the model are NOT moderated by any other factor (i.e., there is no interaction effect) --> THE LINES IN AN INTERACTION PLOT HAVE SIMILAR SLOPES

New cards

Interaction Effects

effects that change based on a moderating factor

SLOPES ARE DIFFERENT

New cards

If there is NO interaction effect, what would be the estimate of the interaction term? AND what does it mean?

around 0 --> main causal factor's effect on the outcome IS NOT MODERATED BY THE SECOND FACTOR

New cards

How can you check that the interaction effect is 0?

check by creating an interval estimate for the interaction effect and seeing if 0 is inside of the interval

New cards

When our main causal factor is a QUANT attribute, what type of graph do we create?

scatter plot

New cards

When an indicator interacts with a QUANT variable, the SLOPE for the relationship between the QUANT variable and the response variable will be ????

different in each group

New cards

FOR indicators, main effects can be interpreted as what?

estimates of group difference

New cards

How can you determine if there is an interaction?

1. fit an interaction model

2. create an interval estimate for the interaction effect (INTERACTION effect = difference in the ATEs)

3. Check whether 0 is a likely value for the interaction effect
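Steps 2 and 3 can be sketched for a 2x2 design. The main causal factor and the second factor are each coded 0/1. The interaction effect is the difference between the ATE within each level of the second factor, which is what the interaction term in a full interaction model estimates for this design. The group means are hypothetical. The estimation error is also assumed to be given; in practice it would come from a simulation or model output.

```python
def interaction_check(group_means, estimation_error):
    """group_means maps (treatment, second_factor) -> mean response, each coded 0/1."""
    # ATE of the main causal factor within each level of the second factor
    ate_without = group_means[(1, 0)] - group_means[(0, 0)]
    ate_with = group_means[(1, 1)] - group_means[(0, 1)]
    # Step 2: interaction effect = difference in the ATEs, plus its interval estimate
    interaction = ate_with - ate_without
    margin_of_error = 2 * estimation_error
    interval = (interaction - margin_of_error, interaction + margin_of_error)
    # Step 3: is 0 a likely value for the interaction effect?
    zero_is_likely = interval[0] <= 0 <= interval[1]
    return interaction, interval, zero_is_likely

# Hypothetical group means from a 2x2 study
means = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 10.5, (1, 1): 20.5}
print(interaction_check(means, estimation_error=1.5))  # (6.0, (3.0, 9.0), False)
print(interaction_check(means, estimation_error=4.0))  # (6.0, (-2.0, 14.0), True)
```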

New cards

What is the purpose of using regression?

figure out the best estimate for the coefficients

New cards

definition of regression

method that keeps the (vertical) distances from all points to the line as SMALL as possible (least squares: minimizes the sum of squared prediction errors)
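Least-squares regression for one QUANT explanatory variable can be sketched with the standard closed-form formulas. The slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept places the line through the point of means. The data are hypothetical. The check at the end shows that nudging either coefficient makes the sum of squared prediction errors larger.

```python
def least_squares_line(xs, ys):
    """Intercept and slope that minimize the sum of squared prediction errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope

def sum_squared_errors(intercept, slope, xs, ys):
    # prediction error = actual value - predicted value
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(xs, ys)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("best SSE:", round(sum_squared_errors(b0, b1, xs, ys), 4))
print("nudged slope SSE:", round(sum_squared_errors(b0, b1 + 0.1, xs, ys), 4))
```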

New cards

prediction error

The difference between the actual value for the response variable for an observation and the predicted value of the response variable for an observation

New cards

Scatter plot:

1. Actual value of the RESPONSE variable for an observation will be its position relative to the ____ ____

2. observation's predicted value is the _______ __ __ ______

1. vertical axis

2. position of the model

New cards

The AVERAGE prediction error for a GOOD model should be equal to __ and the PATTERN should generally make the SHAPE of a _____ ____

1. equal to 0

2. shape of a NORMAL CURVE

New cards

residual standard error

describes the TYPICAL prediction error

- quantify the uncertainty in our predictions

New cards

the BEST linear model for the data is the model with the _______ residual standard error (RSE)

SMALLEST

New cards

what does the residual standard error (RSE) help us create?

prediction interval

New cards

prediction interval

estimate of what the response variable value will be for an observation with a given value of the explanatory variable

- allows us to incorporate "ish"-ness into our model-based predictions

New cards

equation for prediction interval margin of error

2 TIMES residual standard error

New cards

equation for prediction interval

predicted value ADD/SUBTRACT prediction interval margin of error
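The RSE and prediction-interval cards can be combined in one sketch. It fits a line, computes the RSE, and then builds predicted value plus/minus 2 x RSE for a new x. Two assumptions apply: the data are hypothetical, and the RSE uses the usual n - 2 divisor for a line with an intercept and a slope (the value R's `summary.lm` reports as "Residual standard error"). The course's exact formula may be presented differently.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residual_standard_error(xs, ys, intercept, slope):
    """Typical prediction error: sqrt(sum of squared errors / (n - 2))."""
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    # n - 2 because two coefficients (intercept and slope) were estimated
    return math.sqrt(sum(squared_errors) / (len(xs) - 2))

def prediction_interval(x_new, intercept, slope, rse):
    predicted = intercept + slope * x_new
    margin_of_error = 2 * rse  # prediction interval margin of error
    return predicted, predicted - margin_of_error, predicted + margin_of_error

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5, 6]
scores = [62, 68, 71, 75, 83, 85]
b0, b1 = fit_line(hours, scores)
rse = residual_standard_error(hours, scores, b0, b1)
predicted, lower, upper = prediction_interval(3.5, b0, b1, rse)
print(f"RSE = {rse:.2f}")
print(f"prediction for 3.5 hours: {predicted:.1f} (interval {lower:.1f} to {upper:.1f})")
```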

New cards

prediction intervals will be ______ than interval estimate for parameters

WIDER

New cards

equation for prediction error

ACTUAL VALUE - PREDICTED VALUE

New cards

Prediction intervals are for ______ observations.

SINGLE OBSERVATIONS

New cards

Model Error/Prediction Intervals: THE (Y) value of the LINE represents the _______ value of the RESPONSE variable for an observation with some known (x) value

PREDICTED VALUE

New cards

how do we quantify different levels of surprise?

by computing a standardized difference in the form of a z-score

New cards

2 equations to compute the amount of prediction error for a certain observation (SINGLE OBSERVATION)

1. actual value = predicted value + prediction error

2. actual value - predicted value = prediction error

New cards

z-score

a measure of how many standard deviations a value is away from the norm (average or mean)

New cards

prediction z score equation

prediction z-score = (prediction error)/RSE

prediction error = actual value - predicted value

New cards

PREDICTION Z-SCORE BELOW -2: MEANING?

The actual value is SURPRISINGLY LOWER THAN PREDICTED

New cards

PREDICTION Z-SCORE ABOVE +2: MEANING?

The actual value is SURPRISINGLY higher than predicted

New cards

Z-scores: any time we're more than ______ off our normal amount, we consider the actual value ______

1. TWICE

2. Surprising
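The prediction z-score cards above can be sketched as code. The sketch computes prediction error / RSE and labels a value as surprising when it is more than 2 RSEs from the prediction. The predicted value (75) and RSE (4) are hypothetical.

```python
def prediction_z_score(actual, predicted, rse):
    prediction_error = actual - predicted  # actual value - predicted value
    return prediction_error / rse

def describe(z):
    # "Surprising" means more than 2 RSEs away from the prediction
    if z < -2:
        return "surprisingly LOWER than predicted"
    if z > 2:
        return "surprisingly HIGHER than predicted"
    return "not surprising"

# Hypothetical: the model predicts 75 and the RSE is 4
for actual in (85, 70, 66):
    z = prediction_z_score(actual, predicted=75, rse=4)
    print(actual, z, describe(z))
```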

New cards

Equation of Estimate Z-Score

(estimate - hypothesized value) / estimation error
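Here is a one-line worked example of the estimate z-score. All three inputs (estimate 0.62, hypothesized value 0.50, estimation error 0.05) are hypothetical.

```python
def estimate_z_score(estimate, hypothesized_value, estimation_error):
    """How many estimation errors the estimate sits from the hypothesized value."""
    return (estimate - hypothesized_value) / estimation_error

# Hypothetical: estimate 0.62, hypothesized value 0.50, estimation error 0.05
z = estimate_z_score(0.62, 0.50, 0.05)
print(round(z, 2))  # 2.4
```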

New cards

P-Value Indicators: ABOVE 0.1

CONSISTENT with the null model (data and hypothesis)

New cards

P-Value Indicators: BELOW 0.02

INCONSISTENT with null model (data and hypothesis)

New cards

P-Value Indicators: BETWEEN 0.02-0.1

“GREY AREA” - Study result is inconclusive
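The three p-value cards can be folded into one helper. The cards do not say where exactly 0.02 and 0.1 fall. This sketch assumes that values exactly equal to either cut-off belong to the grey area.

```python
def describe_p_value(p):
    """Label a p-value with the card cut-offs (boundary values treated as grey area)."""
    if p > 0.1:
        return "consistent with the null model"
    if p < 0.02:
        return "inconsistent with the null model"
    return "grey area: inconclusive"

for p in (0.45, 0.06, 0.004):
    print(p, describe_p_value(p))
```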

New cards

IF 0 DOES NOT lie in the interval estimate for the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?

THERE IS A MODERATOR

New cards

IF 0 DOES lie in the interval estimate for the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?

THERE IS NO EVIDENCE OF A MODERATOR (0 is a plausible value for the interaction effect)

New cards

Model R-Squared

the PROPORTION (often reported as a percentage) of variability in the response variable that the model helps to explain
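A common way to compute R-squared is 1 minus (unexplained variability / total variability). Unexplained variability is the sum of squared prediction errors. Total variability is the sum of squared distances from the mean. For a least-squares model with an intercept, this equals the proportion of variability explained. The values in the sketch are hypothetical.

```python
def r_squared(actual, predicted):
    """Proportion of variability in the response that the predictions explain."""
    mean_y = sum(actual) / len(actual)
    total_variability = sum((y - mean_y) ** 2 for y in actual)
    unexplained = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - unexplained / total_variability

# Hypothetical actual and predicted values
actual = [2, 4, 6, 8]
predicted = [3, 4, 5, 8]
print(r_squared(actual, predicted))      # 0.9
print(r_squared(actual, [5, 5, 5, 5]))   # 0.0: predicting the mean explains nothing
```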

New cards

we interpret model R-squared in terms of a model’s _______

utility

New cards

Useful models have an R-squared ABOVE WHAT?

0.3, BUT ideally as close to 1.0 (perfect utility) as possible