Posterior Probability Distribution (ch. 19 - pg. 200)
special type of distribution that specifies how likely DIFFERENT potential values of a parameter are given the sampling strategy, data, and estimate that was collected
What question does the Posterior Probability Distribution answer? (ch. 19 - pg. 200)
WOULD WE GET THE SAME ESTIMATE IF WE REPEATED THE STUDY?
Posterior Probability Distribution: Point Estimate
estimate of the population parameter, WITHOUT incorporating "ish"-ness
Posterior Probability Distribution: Interval estimate
instead of using a single number as an estimate, we can use the ENTIRE MIDDLE 95% of the posterior distribution to provide an ENTIRE RANGE OF NUMBERS AS OUR ESTIMATE
Posterior Probability Distribution: Interval Estimate - VALUE IS INSIDE THE INTERVAL ESTIMATE MEANS WHAT?
values we think could PROBABLY be the true value
Posterior Probability Distribution: Interval Estimate - VALUE IS OUTSIDE THE INTERVAL ESTIMATE MEANS WHAT?
values we think are PROBABLY NOT the true value
Posterior Probability Distribution: how to create an INTERVAL ESTIMATE using equations?
1. compute the point estimate from data in the sample
2. conduct a simulation to obtain the estimation error (standard error)
3. compute the margin of error (margin of error = 2 TIMES estimation error)
4. figure out the LOWER limit of the interval estimate (point estimate - margin of error = interval estimate lower limit)
5. figure out the UPPER limit of the interval estimate (point estimate + margin of error = interval estimate upper limit)
6. report the lower limit and upper limit together
Posterior Probability Distribution: interval estimate - equation for margin of error
margin of error = 2 TIMES ESTIMATION ERROR
Posterior Probability Distribution: interval estimate - Equation for LOWER limit
point estimate - margin of error
Posterior Probability Distribution: interval estimate - equation for UPPER limit
point estimate + margin of error
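Worked sketch of the six interval-estimate steps. Assumptions not on the cards: the point estimate is a sample mean, the "simulation" is a bootstrap (resampling the sample with replacement), and the data and the function name `interval_estimate` are made up for illustration.

```python
import random
import statistics


def interval_estimate(sample, n_sims=2000, seed=0):
    """Return (lower, upper, point, estimation_error) for the sample mean."""
    rng = random.Random(seed)
    # Step 1: point estimate from the data in the sample.
    point = statistics.mean(sample)
    # Step 2: simulate repeated studies (assumed here: bootstrap resampling)
    # and take the spread of the simulated means as the estimation error.
    sim_means = [
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_sims)
    ]
    estimation_error = statistics.stdev(sim_means)
    # Step 3: margin of error = 2 TIMES estimation error.
    margin = 2 * estimation_error
    # Steps 4-6: lower limit, upper limit, reported together.
    return point - margin, point + margin, point, estimation_error


sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical data
lower, upper, point, est_err = interval_estimate(sample)
print(f"point estimate: {point}")
print(f"interval estimate: {lower:.2f} to {upper:.2f}")
```

Values inside the printed range are the ones we think could probably be the true value; values outside are probably not.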
Bias (Ch. 20 - pg. 210)
ANY systematic manner in which the data that has been collected has fundamental problems that will prevent the statistical methods from producing a correct estimate
Measurement Bias (Ch. 20 - pg. 210)
any bias in the data due to problems in the measurement process
Sampling Bias (ch. 20 - pg. 211)
any bias in the data due to problems in the representativeness of a sample
Causal Bias (Ch. 20 - pg. 212)
any bias in the data due to differences between the controls and the observations in the treatment group
Estimate's Precision (ch. 20 - pg. 213)
how much uncertainty we have, and therefore HOW MUCH ROOM for ERROR we give, when we make an estimate of a parameter
One common way stats and data scientists think about bias and precision IS ??? (ch. 20 - pg. 213)
comparing the process of estimation to the process of shooting an arrow at a target
What causes uncertainty in GENERALIZATION INFERENCES? (ch. 19 slides)
We only have data from SOME observations, but trying to describe an entire population of interest
What causes uncertainty in CAUSAL INFERENCES? (ch. 19 slides)
only controlled for SOME factors, but are trying to identify a single causal factor's ATE
What do we do about uncertainty? (ch. 19 slides)
QUANTIFY uncertainty by studying patterns in variability
IF a study has a ____ ____, we can conduct simulation to compute the estimation error
RANDOM COMPONENT
(even if you don't use a random component, you could still compute it, but there could be bias)
what are the TWO factors that affect the precision of an estimate?
1. how much information you have = sample size (more info, the better)
2. how different each of the observations are from each other = standard deviation
TRUE OR FALSE: The LARGER the SAMPLE SIZE, the BETTER the precision.
TRUE
TRUE OR FALSE: The LARGER the SAMPLE SIZE, the smaller the estimation error.
TRUE
TRUE OR FALSE: The LARGER the SD, the BETTER the precision
FALSE - The LARGER the SD --> the WORSE the precision
TRUE OR FALSE: the LARGER the SD, the SMALLER the estimation error
FALSE - the larger the SD, the LARGER the estimation error
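Quick numeric check of the four TRUE/FALSE cards above. Assumption not on the cards: the estimate is a sample mean, so the estimation error can be approximated with the standard formula SD / sqrt(sample size). The function name is hypothetical.

```python
import math


def estimation_error_of_mean(sd, n):
    # Standard formula for the estimation error of a sample mean
    # (brought in for illustration; the cards use simulation instead).
    return sd / math.sqrt(n)


small_sample = estimation_error_of_mean(sd=10, n=25)
large_sample = estimation_error_of_mean(sd=10, n=100)
large_sd = estimation_error_of_mean(sd=20, n=100)

print(small_sample)  # SD 10, n 25
print(large_sample)  # SD 10, n 100: larger sample, smaller error
print(large_sd)      # SD 20, n 100: larger SD, larger error
```

Smaller estimation error means better precision, so the larger sample is more precise and the larger SD is less precise.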
Can think of "Precision" as ______ AND "Bias" as _____/_____
Precision: consistency
Bias: aim/accuracy
What are the 3 sources of bias? (reliability, internal validity, and external validity)
1. LOW reliability --> measurement bias
2. LOW internal validity --> causal bias
3. LOW external validity --> sampling bias
STEPS FOR BIAS/PRECISION (Ch. 20)
1. Always think about the potential biases FIRST
2. quantify precision with estimation error, BUT DON'T forget to think about MEANINGFUL DIFFERENCES (AD/RD and ES = STILL HELPFUL)
3. accept uncertainty (think of interval estimates in terms of a probability distribution)
What are the two general approaches to statistical testing? (ch. 21 - pg. 222)
1. Evidence-Based Testing
2. Hypothesis-Based Testing
Similarity between Evidence-based Testing and Hypothesis-Based Testing (ch. 21 - pg. 222)
incorporate "ish-ness" (ex: variability and uncertainty) by creating a probability distribution
KEY difference between Evidence-Based Testing and Hypothesis-Based Testing
how the probability distribution is created
Evidence-Based Testing + Hypothesis-Based Testing: BOTH involve what? (6) (ch. 21 - pg. 222)
1. Probability Distributions
2. One or more hypotheses
3. Collecting data
4. a sample statistic serving as an estimate of a parameter
5. estimation error
6. directly comparing hypotheses and evidence to each other
Evidence-Based Testing (ch. 21 - pg. 222)
1. Collect Data
2. Build a likelihood function (AKA posterior probability)
3. Compare a hypothesis to the likelihood function
DATA-FIRST APPROACH
Hypothesis-Based Testing (ch. 21 - pg. 22)
1. Specify a hypothesis
2. Build a null model
3. Compare evidence to the null model
HYPOTHESIS-FIRST approach
Null Model (ch. 21 - pg. 224)
probability distribution based on a hypothesis within the context of a statistical test
What does the NULL MODEL represent? (ch. 21 - pg. 224)
represents YOUR EXPECTATIONS for what the data will be
Null Model: What do we need to FIRST consider? (ch. 21 - pg. 224)
FIRST consider the hypothesized value of the parameter
IF the HYPOTHESIS is CORRECT, the P-VALUE is _________. (ch. 21 - pg. 225)
probability of seeing something even further from what was expected than the observed data
IF P-Value is VERY LARGE --> Data is _______ with what we expected.
data is CONSISTENT with what we expected
If P-value is VERY SMALL --> Data is _____ with what we expected.
data is NOT CONSISTENT with what we expected
IT IS important to remember to think about comparing evidence and hypotheses NOT JUST in terms of "are they consistent, yes or no?", BUT RATHER in terms of a scale with ______ ___ ________.
DEGREES OF CONSISTENCY
When evaluating potential bias present in a study, make sure to FIRST identify WHAT? (CH. 20 - CA tips)
what kind of CLAIM the study is making
What do statistical tests compare?
Comparison between some hypotheses AND some evidence that we collected
When thinking statistically... what is the two-part process?
1. Evaluate the validity of the data in terms of any biases
2. Make inferences based on data (IF IT'S GOOD) and the incorporation of uncertainty
statistical model (ch. 22)
equations that describe the relationship between attributes
- explicitly incorporate some notion of variability into the model
Statistical Models: usually we focus on a single attribute --> this term is called _____ or ______
response variable OR the outcome variable
goal of statistical model (ch. 22)
try to find patterns and relationships between the outcome variable and other attributes
what is the FIRST STEP when studying patterns in the outcome variable? (ch. 22 pg. 238)
compute the summary statistics and examine the distribution
MODEL based on a sample's mean can be written as... (ch. 22 pg. 239)
RESPONSE/OUTCOME VARIABLE = MEAN VALUE + ERROR
Statistical Model: Explanatory variables
attributes that can affect the response variable
FIRST step when exploring the relationship between an EXPLANATORY VARIABLE AND A RESPONSE VARIABLE
compute the summary statistics and side-by-side plots
Statistical Model: Indicators
CATEGORICAL MEASURES --- placeholder in a model that indicates what to do for observations that belong to a specific group or category, relative to a reference group
Model based on a sample's mean PLUS an indicator for a CATEGORICAL explanatory variable can be written as ?????
RESPONSE VARIABLE = MEAN VALUE + (BONUS x INDICATOR) + ERROR
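Small sketch of the indicator model. Assumptions not on the cards: "MEAN VALUE" is taken to be the mean of the reference group, the "BONUS" is the difference between the group means, and the two groups of scores are invented.

```python
import statistics

# Hypothetical data: indicator = 0 for the reference group, 1 for the other.
reference_group = [70, 72, 68, 71, 69]
other_group = [75, 77, 74, 76, 73]

mean_value = statistics.mean(reference_group)
bonus = statistics.mean(other_group) - mean_value


def predicted_response(indicator):
    # RESPONSE VARIABLE = MEAN VALUE + (BONUS x INDICATOR) + ERROR,
    # with the ERROR term left out because it differs per observation.
    return mean_value + bonus * indicator


# ERROR for each observation is what the model does not capture.
errors = [y - predicted_response(1) for y in other_group]

print(mean_value, bonus)
print(predicted_response(0), predicted_response(1))
print(errors)
```

With the indicator set to 0 the bonus drops out, which is why the indicator is described as a placeholder relative to a reference group.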
statistical models provide us with ______ and _____?
1. estimated ATE
2. estimation error
when an EXPLANATORY VARIABLE is a QUANT attribute, instead of just ONE bonus, we can think about ????
HOW MANY bonuses an observation gets based on the value of QUANT EXPLANATORY VARIABLE
we ALWAYS map the RESPONSE VARIABLE to the ________ AXIS
VERTICAL
LINEAR Models
multiplicative "bonus" for QUANT attribute forms a line
Linear Models: the SLOPE COEFFICIENT can be interpreted as??
ATE for QUANT explanatory variables' effect on the response variable
statistical models ALLOW US to (3)
1. predict
2. explain
3. control
Main effects
effects on a response variable that a single explanatory variable has
Interaction Plot
shows the means of each group, often with the interval estimate for each mean also graphically depicted, and allows us to visualize each main effect
Interaction Effects and Interaction Plots: What indicates that the ONLY effects on the response variable are MAIN EFFECTS
factors in a model are NOT moderated by any other factor OR no interaction effect --> THE LINES IN AN INTERACTION PLOT HAVE SIMILAR SLOPES
Interaction Effects
effects that change based on a moderating factor
SLOPES ARE DIFFERENT
If there is NO interaction effect, what would be the estimate of the interaction term? AND what does it mean?
around 0 --> main causal factor's effect on the outcome IS NOT MODERATED BY THE SECOND FACTOR
How can you check that the interaction effect is 0?
check by creating an interval estimate for the interaction effect and seeing if 0 is inside of the interval
When our main causal factor is a QUANT attribute, what type of graph do we create?
scatter plot
When an indicator interacts with a QUANT variable, the SLOPE for the relationship between the QUANT variable and the response variable will be ????
different in each group
FOR indicators, main effects can be interpreted as what?
estimates of group difference
How can you determine if there is an interaction?
1. fit an interaction model
2. create an interval estimate for the interaction effect (INTERACTION effect = difference in the ATES)
3. Check whether 0 is a likely value for the interaction effect
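Sketch of steps 2 and 3, assuming the interaction model has already been fit and has given an estimate and an estimation error for the interaction effect. The numbers and function names are hypothetical; the margin of error uses the same 2 TIMES estimation error rule as the other interval estimates.

```python
def interaction_interval(estimate, estimation_error):
    # Interval estimate = estimate -/+ (2 x estimation error).
    margin = 2 * estimation_error
    return estimate - margin, estimate + margin


def has_moderator(estimate, estimation_error):
    lower, upper = interaction_interval(estimate, estimation_error)
    # If 0 is inside the interval, 0 is a likely value for the
    # interaction effect, so we conclude there is NOT a moderator.
    zero_inside = lower <= 0 <= upper
    return not zero_inside


print(interaction_interval(3.0, 1.0), has_moderator(3.0, 1.0))
print(interaction_interval(1.0, 1.0), has_moderator(1.0, 1.0))
```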
What is the purpose of using regression?
figure out the best estimate for the coefficients
definition of regression
method that is based on keeping the distance from all points to the line as SMALL as possible
prediction error
The difference between the actual value for the response variable for an observation and the predicted value of the response variable for an observation
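Sketch of regression and prediction error on made-up data. Assumption not on the cards: "keeping the distance as small as possible" is implemented as ordinary least squares, which minimizes the squared vertical distances from the points to the line. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4, 5]                 # explanatory variable (hypothetical)
ys = [2.1, 3.9, 6.2, 7.8, 10.0]      # response variable (hypothetical)

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
# prediction error = ACTUAL VALUE - PREDICTED VALUE
errors = [y - p for y, p in zip(ys, predicted)]

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print([round(e, 2) for e in errors])
print(f"average prediction error: {sum(errors) / len(errors):.6f}")
```

The prediction errors of a least-squares line average out to 0, up to floating-point rounding.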
Scatter plot:
1. Actual value of the RESPONSE variable for an observation will be its position relative to the ____ ____
2. observation's predicted value is the _______ __ __ ______
1. vertical axis
2. position of the model
The AVERAGE prediction error for a GOOD model should be equal to __ and the PATTERN should generally make the SHAPE of a _____ ____
1. equal to 0
2. shape of a NORMAL CURVE
residual standard error
describes the TYPICAL prediction error
- quantify the uncertainty in our predictions
the BEST linear model for the data is the model with the _______ residual standard error (RSE)
SMALLEST
what does the residual standard error (RSE) help us create?
prediction interval
prediction interval
estimate of what the response variable value will be, given the value of the explanatory variable
- allows us to incorporate "ish"-ness into our model-based predictions
equation for prediction interval margin of error
2 TIMES residual standard error
equation for prediction interval
predicted value ADD/SUBTRACT prediction interval margin of error
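The two equations above as a tiny function. The predicted value and RSE are invented numbers; in practice both would come from a fitted model.

```python
def prediction_interval(predicted_value, rse):
    # prediction interval margin of error = 2 TIMES residual standard error
    margin = 2 * rse
    # predicted value ADD/SUBTRACT the margin of error
    return predicted_value - margin, predicted_value + margin


lower, upper = prediction_interval(predicted_value=50, rse=4)
print(lower, upper)
```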
prediction intervals will be ______ than interval estimate for parameters
WIDER
equation for prediction error
ACTUAL VALUE - PREDICTED VALUE
Prediction intervals are for ______ observations.
SINGLE OBSERVATIONS
Model Error/Prediction Intervals: THE (Y) value of the LINE represents the _______ value of the RESPONSE variable for an observation with some known (x) value
PREDICTED VALUE
how do we quantify different levels of surprise?
by computing a standardized difference in the form of a z-score
2 equations to compute the amount of prediction error for a certain observation (SINGLE OBSERVATION)
1. actual value = predicted value + prediction error
2. actual value - predicted value = prediction error
z-score
a measure of how many standard deviations you are away from the norm (average or mean)
prediction z score equation
prediction z-score = (prediction error)/RSE
prediction error = actual value - predicted value
PREDICTION Z-SCORES = BELOW -2 MEANING
The actual value is SURPRISINGLY LOWER THAN PREDICTED
z-scores = above +2
The actual value is SURPRISINGLY higher than predicted
Z-scores: any time we're more than ______ off our normal amount, we consider the actual value ______
1. TWICE
2. Surprising
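Sketch of the prediction z-score and the "more than twice off" rule. The actual value, predicted value, and RSE are hypothetical, and the label text is just one way to word the card answers.

```python
def prediction_z_score(actual, predicted, rse):
    prediction_error = actual - predicted
    # prediction z-score = (prediction error) / RSE
    return prediction_error / rse


def surprise_label(z):
    if z > 2:
        return "surprisingly higher than predicted"
    if z < -2:
        return "surprisingly lower than predicted"
    return "not surprising"


z_high = prediction_z_score(actual=60, predicted=50, rse=4)
z_low = prediction_z_score(actual=41, predicted=50, rse=4)
z_ok = prediction_z_score(actual=53, predicted=50, rse=4)

for z in (z_high, z_low, z_ok):
    print(z, surprise_label(z))
```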
Equation of Estimate Z-Score
(estimate - hypothesized value) / estimation error
P-Value Indicators: ABOVE 0.1
CONSISTENT with the null model (data and hypothesis)
P-Value Indicators: BELOW 0.02
INCONSISTENT with null model (data and hypothesis)
P-Value Indicators: BETWEEN 0.02-0.1
“GREY AREA” - Study result is inconclusive
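Sketch tying the estimate z-score to the p-value indicators. Assumptions not on the cards: the null model is shaped like a normal curve and the p-value is two-sided, so it can be computed from the z-score with `math.erfc`. The estimate, hypothesized value, and estimation error are invented.

```python
import math


def estimate_z_score(estimate, hypothesized_value, estimation_error):
    # (estimate - hypothesized value) / estimation error
    return (estimate - hypothesized_value) / estimation_error


def two_sided_p_value(z):
    # Assumed: normal-shaped null model, two-sided p-value.
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_indicator(p):
    if p > 0.1:
        return "consistent with the null model"
    if p < 0.02:
        return "inconsistent with the null model"
    return "grey area (inconclusive)"


z = estimate_z_score(estimate=5.0, hypothesized_value=0.0, estimation_error=2.5)
p = two_sided_p_value(z)
print(z, round(p, 4), p_value_indicator(p))
```

The three labels are still points on a scale with degrees of consistency, not hard yes/no cutoffs.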
IF 0 DOES NOT lie in the interval estimate within the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?
THERE IS A MODERATOR
IF 0 DOES lie in the interval estimate within the INTERACTION EFFECT, does that mean there IS or IS NOT a moderator?
THERE IS NOT A MODERATOR
Model R-Squared
the PERCENTAGE of variability in the response variable that the model helps to explain
we interpret model R-squared in terms of a model’s _______
utility
Useful models have an r-squared ABOVE WHAT
0.3 — BUT we want it to have 1.0 (perfect utility)
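Sketch of computing model R-squared. Assumption not on the cards: R-squared is computed with the standard formula 1 - (unexplained variability / total variability), using squared differences. The actual and predicted values are made up.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    # Variability left over after using the model's predictions.
    unexplained = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variability of the response variable around its mean.
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - unexplained / total


actual = [2, 4, 6, 8, 10]                # hypothetical response values
predicted = [2.5, 3.5, 6, 8.5, 9.5]      # hypothetical model predictions

r2 = r_squared(actual, predicted)
print(f"R-squared: {r2:.3f}")
print("useful model" if r2 > 0.3 else "not a useful model")
```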