27 question-and-answer flashcards covering correlation, regression, odds, risk measures, probability rules, expected value, sampling distributions, and Simpson’s Paradox.
What does correlation measure and what can it NOT establish?
The strength and direction of a linear relationship between two variables; it cannot establish causation.
How can outliers influence the correlation coefficient?
They can substantially inflate or deflate the coefficient, giving a misleading impression of the strength or direction of the relationship.
Why should correlation not be used for non-linear relationships?
Because correlation only captures linear associations; a strong non-linear pattern can produce a low correlation value.
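As a small illustration of the point above, the sketch below computes Pearson's r by hand for a perfectly quadratic pattern and for a perfectly linear one. The helper name `pearson_r` and the data values are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]   # strong pattern, but not linear
ys_linear = [2 + 3 * x for x in xs]   # exactly linear

r_quadratic = pearson_r(xs, ys_quadratic)  # essentially 0
r_linear = pearson_r(xs, ys_linear)        # essentially 1

print(f"quadratic pattern: r = {r_quadratic:.3f}")
print(f"linear pattern:    r = {r_linear:.3f}")
```

The quadratic data follow y = x² exactly, yet r comes out at zero because the rising and falling halves cancel in the linear measure.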
What is extrapolation in regression and why is it risky?
Using a regression line to predict values outside the observed data range; the linear relationship may not hold there.
Write the simple linear regression equation and label its components.
y = a + bx, where a is the y-intercept and b is the slope.
How is the slope (b) of a regression line interpreted?
It represents the predicted change in y for each one-unit increase in x.
What does the y-intercept (a) represent in a regression model?
The predicted value of y when x = 0 (assuming x = 0 is meaningful for the data).
On a scatterplot, which axis normally shows the explanatory (predictor) variable?
The horizontal x-axis.
What does a positive slope indicate about two variables?
As the explanatory variable increases, the response variable also increases (positive association).
What does a negative slope indicate?
As the explanatory variable increases, the response variable decreases (negative association).
Using the equation y = 2 + 3x, what is the predicted y when x = 4?
y = 2 + 3(4) = 14.
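The prediction above can be sketched in a few lines of Python. The function names and the observed x-range of 0 to 10 are assumptions made for illustration; the range check simply flags the extrapolation risk described in an earlier card.

```python
def predict(a, b, x):
    """Predicted y from the regression line y = a + b*x."""
    return a + b * x

def is_extrapolation(x, x_min, x_max):
    """True when x lies outside the observed range [x_min, x_max]."""
    return x < x_min or x > x_max

# The card's example: y = 2 + 3x evaluated at x = 4.
y_hat = predict(2, 3, 4)  # 2 + 12 = 14

# Hypothetical observed x-range of 0 to 10, used only for illustration.
flag_at_4 = is_extrapolation(4, 0, 10)    # False: inside the observed range
flag_at_25 = is_extrapolation(25, 0, 10)  # True: prediction would be an extrapolation

print(y_hat, flag_at_4, flag_at_25)
```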
How are the odds of an event calculated from its probability p?
Odds = p / (1 − p).
How is an odds ratio computed?
Divide the odds of the event in one group by the odds in another group.
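A minimal sketch of the two odds calculations above. The probabilities 0.75 and 0.5 are made-up inputs, and the function names are chosen for this example.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def odds_ratio(p_group1, p_group2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(p_group1) / odds(p_group2)

# Illustrative probabilities for two groups.
odds_a = odds(0.75)          # 0.75 / 0.25 = 3.0, i.e. "3 to 1"
odds_b = odds(0.5)           # 0.5 / 0.5 = 1.0, i.e. "even odds"
ratio = odds_ratio(0.75, 0.5)

print(odds_a, odds_b, ratio)
```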
Define relative risk and give its formula.
Relative risk is the ratio of the risk in the exposed group to the risk in the baseline group: RR = risk_exposed / risk_baseline.
What is “increased risk” in percentage terms?
(risk_exposed − risk_baseline) ÷ risk_baseline × 100%.
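The two risk measures above can be sketched as follows. The counts (30 of 200 exposed, 15 of 200 baseline) are hypothetical and serve only to show the arithmetic.

```python
def relative_risk(risk_exposed, risk_baseline):
    """Ratio of the exposed-group risk to the baseline-group risk."""
    return risk_exposed / risk_baseline

def increased_risk_percent(risk_exposed, risk_baseline):
    """Percentage increase in risk relative to the baseline risk."""
    return (risk_exposed - risk_baseline) / risk_baseline * 100

# Hypothetical counts: 30 of 200 exposed people and 15 of 200
# baseline people experience the outcome.
risk_exposed = 30 / 200    # 0.15
risk_baseline = 15 / 200   # 0.075

rr = relative_risk(risk_exposed, risk_baseline)
increase = increased_risk_percent(risk_exposed, risk_baseline)

print(f"relative risk = {rr:.2f}, increased risk = {increase:.0f}%")
```

With these numbers the two measures are linked: the increased risk in percent equals (RR − 1) × 100.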
When determining a proportion from a two-way table, what two counts are needed?
A numerator (number in the category of interest) and a denominator (relevant total).
If two events are mutually exclusive, how do you find P(A or B)?
Add their probabilities: P(A) + P(B).
If two events are independent, how do you calculate P(A and B)?
Multiply their probabilities: P(A) × P(B).
What is the probability of the coin sequence THTH?
Assuming a fair coin and independent flips: (1/2)^4 = 1/16 = 6.25%.
For independent events A and B, what is P(B | A)?
It equals P(B); event A does not affect the probability of B.
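The probability rules in the last few cards can be checked with exact fractions. This is a sketch under the cards' own assumptions (mutually exclusive events for addition, independent events for multiplication); the die and coin examples and the function names are illustrative.

```python
from fractions import Fraction

def p_or_mutually_exclusive(p_a, p_b):
    """P(A or B) when A and B cannot both occur."""
    return p_a + p_b

def p_and_independent(p_a, p_b):
    """P(A and B) when A and B are independent."""
    return p_a * p_b

# Fair die: rolling a 1 or rolling a 2 are mutually exclusive.
p_one_or_two = p_or_mutually_exclusive(Fraction(1, 6), Fraction(1, 6))

# Fair coin: each flip in T, H, T, H has probability 1/2.
half = Fraction(1, 2)
p_thth = half ** 4

# For independent events, P(B | A) = P(A and B) / P(A) reduces to P(B).
p_a, p_b = Fraction(1, 2), Fraction(1, 6)
p_b_given_a = p_and_independent(p_a, p_b) / p_a

print(p_one_or_two, p_thth, float(p_thth), p_b_given_a)
```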
Give the formula for expected value of a discrete variable.
Expected value = Σ(value × probability) over all outcomes.
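As a worked instance of the formula above, consider a hypothetical game that pays 10 with probability 1/10 and loses 1 with probability 9/10. The game and the function name are invented for illustration.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)

# Hypothetical game: win 10 with probability 1/10, lose 1 otherwise.
game = [(10, Fraction(1, 10)), (-1, Fraction(9, 10))]
ev = expected_value(game)  # 10*(1/10) + (-1)*(9/10) = 1/10

print(ev, float(ev))
```

The expected value of 1/10 is a long-run average per play, not an amount that any single play can pay.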
Provide the standard error of a sample proportion (SEP).
SEP = √[ p(1 − p) / n ].
Provide the standard error of a sample mean (SEM).
SEM = σ / √n, where σ is the population standard deviation.
When is the sampling distribution of sample proportions approximately normal?
When np ≥ 5 and n(1 − p) ≥ 5 (a common rule of thumb; some texts require 10 instead of 5).
Give a rule of thumb for normal approximation of sample means when the population shape is unknown.
n > 30 usually suffices unless the population is strongly skewed; larger n is needed for more skewness.
How does increasing sample size affect a sampling distribution?
It becomes more nearly normal and its standard error decreases.
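The sampling-distribution cards above can be tried out numerically. The sketch below assumes the threshold of 5 from the earlier card as a default parameter; the function names and the inputs (p = 0.5, σ = 15, and the sample sizes) are illustrative.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def proportion_normal_ok(p, n, threshold=5):
    """Rule-of-thumb check: n*p and n*(1 - p) both at least threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

sep_100 = se_proportion(0.5, 100)    # about 0.05
sem_25 = se_mean(15, 25)             # 15 / 5 = 3.0
sem_100 = se_mean(15, 100)           # 15 / 10 = 1.5, smaller with larger n

ok_balanced = proportion_normal_ok(0.5, 100)   # n*p = 50, n*(1-p) = 50
ok_rare = proportion_normal_ok(0.01, 100)      # n*p = 1, fails the check

print(sep_100, sem_25, sem_100, ok_balanced, ok_rare)
```

Quadrupling n from 25 to 100 halves the standard error of the mean, because n enters through its square root.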
Explain Simpson’s Paradox.
A trend present in separate groups can disappear or reverse when the data are combined because of a lurking (third) variable.
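A small numeric sketch of the reversal described above. All counts are hypothetical: treatment A has the higher success rate within each group, yet B looks better once the groups are pooled, because A was mostly used in the group with low success rates.

```python
# Hypothetical (successes, trials) counts per group and treatment.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Success proportion."""
    return successes / trials

# Success rate of each treatment within each group.
within = {
    group: {t: rate(*counts) for t, counts in arms.items()}
    for group, arms in data.items()
}

def combined_rate(treatment):
    """Success rate for one treatment after pooling all groups."""
    successes = sum(data[g][treatment][0] for g in data)
    trials = sum(data[g][treatment][1] for g in data)
    return successes / trials

combined = {t: combined_rate(t) for t in ("A", "B")}

for group, rates in within.items():
    print(group, {t: round(r, 3) for t, r in rates.items()})
print("combined", {t: round(r, 3) for t, r in combined.items()})
```

Here group membership plays the role of the lurking variable: it is related both to which treatment was given and to the chance of success.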