Statistics Review: Correlation, Regression, Probability, and Sampling

Description and Tags

27 question-and-answer flashcards covering correlation, regression, odds, risk measures, probability rules, expected value, sampling distributions, and Simpson’s Paradox.

27 Terms

New cards

What does correlation measure and what can it NOT establish?

The strength and direction of a linear relationship between two variables; it cannot establish causation.

How can outliers influence the correlation coefficient?

They can substantially inflate or deflate the coefficient, giving a misleading impression of the strength or direction of the relationship.
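
A minimal sketch of this effect, using made-up data: five points that lie exactly on a line, then the same points plus a single invented outlier. The helper `pearson_r` is a hypothetical name written for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points lying exactly on the line y = 2x.
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 6, 8, 10]

# The same data plus one made-up outlier at (6, 0).
x_outlier = x_clean + [6]
y_outlier = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_outlier, y_outlier)
print(round(r_clean, 3), round(r_outlier, 3))
```

With these illustrative numbers, one added point pulls r from 1 down to about 0.14.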

Why should correlation not be used for non-linear relationships?

Because correlation only captures linear associations; a strong non-linear pattern can produce a low correlation value.
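
As an illustration, the sketch below computes the correlation for a perfectly quadratic pattern on made-up, symmetric x-values. The helper `pearson_r` is a hypothetical name written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y is determined exactly by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)
```

Here y depends completely on x, yet the correlation is zero because the left and right halves of the parabola cancel out.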

What is extrapolation in regression and why is it risky?

Using a regression line to predict values outside the observed data range; the linear relationship may not hold there.

Write the simple linear regression equation and label its components.

y = a + bx, where y is the predicted response, x is the explanatory variable, a is the y-intercept, and b is the slope.

How is the slope (b) of a regression line interpreted?

It represents the predicted change in y for each one-unit increase in x.

What does the y-intercept (a) represent in a regression model?

The predicted value of y when x = 0 (assuming x = 0 is meaningful for the data).

On a scatterplot, which axis normally shows the explanatory (predictor) variable?

The horizontal x-axis.

What does a positive slope indicate about two variables?

As the explanatory variable increases, the response variable also increases (positive association).

What does a negative slope indicate?

As the explanatory variable increases, the response variable decreases (negative association).

Using the equation y = 2 + 3x, what is the predicted y when x = 4?

y = 2 + 3(4) = 14.
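
The same calculation can be sketched in code. The function name `predict` is an assumption made for this example; the second prediction shows the slope interpretation, since moving x from 4 to 5 changes the prediction by exactly b.

```python
def predict(a, b, x):
    """Predicted y from the line y = a + b*x."""
    return a + b * x

a, b = 2, 3  # intercept and slope from the card's equation

y_at_4 = predict(a, b, 4)
y_at_5 = predict(a, b, 5)

# A one-unit increase in x changes the prediction by the slope b.
change_per_unit = y_at_5 - y_at_4
print(y_at_4, y_at_5, change_per_unit)
```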

How are the odds of an event calculated from its probability p?

Odds = p / (1 − p).

How is an odds ratio computed?

Divide the odds of the event in one group by the odds in another group.
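
A small sketch of the odds and odds-ratio calculations, using invented counts (30 of 100 in one group, 20 of 100 in the other). The function names are assumptions for this example, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def odds_ratio(p_group1, p_group2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(p_group1) / odds(p_group2)

# Hypothetical proportions: 30 of 100 exposed, 20 of 100 baseline.
p_exposed = Fraction(30, 100)
p_baseline = Fraction(20, 100)

odds_exposed = odds(p_exposed)      # 3/7
odds_baseline = odds(p_baseline)    # 1/4
ratio = odds_ratio(p_exposed, p_baseline)  # 12/7, about 1.71
print(odds_exposed, odds_baseline, ratio)
```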

Define relative risk and give its formula.

Relative risk is the ratio of the risk in the exposed group to the risk in the baseline group: RR = Risk_exposed / Risk_baseline.

What is “increased risk” in percentage terms?

(Risk_exposed − Risk_baseline) ÷ Risk_baseline × 100%.
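
To make the two risk measures concrete, here is a sketch with invented risks of 30% (exposed) and 20% (baseline). The function names are assumptions for this example.

```python
from fractions import Fraction

def relative_risk(risk_exposed, risk_baseline):
    """Risk in the exposed group divided by risk in the baseline group."""
    return risk_exposed / risk_baseline

def increased_risk_percent(risk_exposed, risk_baseline):
    """Percentage increase in risk relative to the baseline group."""
    return (risk_exposed - risk_baseline) / risk_baseline * 100

# Hypothetical risks: 30 of 100 exposed, 20 of 100 baseline.
risk_exposed = Fraction(30, 100)
risk_baseline = Fraction(20, 100)

rr = relative_risk(risk_exposed, risk_baseline)                 # 3/2
increase = increased_risk_percent(risk_exposed, risk_baseline)  # 50
print(float(rr), float(increase))
```

With these numbers the two measures describe the same comparison: a relative risk of 1.5 corresponds to a 50% increase in risk.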

When determining a proportion from a two-way table, what two counts are needed?

A numerator (number in the category of interest) and a denominator (relevant total).

If two events are mutually exclusive, how do you find P(A or B)?

Add their probabilities: P(A) + P(B).

If two events are independent, how do you calculate P(A and B)?

Multiply their probabilities: P(A) × P(B).
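
Both rules can be checked by brute force. The sketch below assumes two fair six-sided dice (an example chosen here, not taken from the cards) and counts outcomes directly.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def sum_is_2(o):
    return o[0] + o[1] == 2

def sum_is_12(o):
    return o[0] + o[1] == 12

def first_is_6(o):
    return o[0] == 6

def second_is_6(o):
    return o[1] == 6

# Mutually exclusive events: the sum cannot be both 2 and 12.
p_either = prob(lambda o: sum_is_2(o) or sum_is_12(o))
addition_holds = p_either == prob(sum_is_2) + prob(sum_is_12)

# Independent events: one die does not affect the other.
p_both = prob(lambda o: first_is_6(o) and second_is_6(o))
multiplication_holds = p_both == prob(first_is_6) * prob(second_is_6)
print(p_either, addition_holds, p_both, multiplication_holds)
```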

What is the probability of the sequence THTH in four flips of a fair coin?

(1/2)^4 = 1/16 = 6.25%.
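
As a check, the sketch below lists every equally likely sequence of four flips of a fair coin and counts how many match THTH.

```python
from fractions import Fraction
from itertools import product

# All 2**4 = 16 sequences of four flips, e.g. "HHHH", "HHHT", ...
sequences = ["".join(flips) for flips in product("HT", repeat=4)]

# Exactly one of the 16 equally likely sequences is THTH.
p_thth = Fraction(sequences.count("THTH"), len(sequences))
print(p_thth, float(p_thth))
```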

For independent events A and B, what is P(B | A)?

It equals P(B); event A does not affect the probability of B.

Give the formula for expected value of a discrete variable.

Expected value = Σ(value × probability) over all outcomes.
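
A short sketch of the formula, applied to a fair six-sided die (an example chosen here for illustration). The function name `expected_value` is an assumption.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * p for value, p in distribution)

# Fair die: each face 1..6 has probability 1/6.
fair_die = [(face, Fraction(1, 6)) for face in range(1, 7)]

ev = expected_value(fair_die)
print(ev, float(ev))
```

The result, 3.5, is not a face of the die: the expected value is a long-run average, not necessarily a possible outcome.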

Provide the standard error of a sample proportion (SEP).

SEP = √[ p(1 − p) / n ], where p is the population proportion and n is the sample size.

Provide the standard error of a sample mean (SEM).

SEM = σ / √n, where σ is the population standard deviation and n is the sample size.
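
The standard-error formulas for a proportion and for a mean can be sketched as two small functions. The function names and the input values (p = 0.5 with n = 100; a population standard deviation of 15 with n = 25) are assumptions chosen for illustration.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion."""
    return sqrt(p * (1 - p) / n)

def se_mean(sigma, n):
    """Standard error of a sample mean, given the population SD."""
    return sigma / sqrt(n)

sep = se_proportion(0.5, 100)  # about 0.05
sem = se_mean(15, 25)          # 15 / 5 = 3
print(round(sep, 4), sem)
```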

When is the sampling distribution of sample proportions approximately normal?

When np ≥ 5 and n(1 − p) ≥ 5 (some textbooks use the stricter cutoff of 10).
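
The condition can be written as a simple check. The function name and the `threshold` parameter are assumptions for this sketch; the default of 5 follows the rule on this card.

```python
def normal_approx_ok(n, p, threshold=5):
    """True when both expected counts, n*p and n*(1 - p), meet the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical cases with p = 0.1.
ok_large = normal_approx_ok(100, 0.1)  # expected counts 10 and 90
ok_small = normal_approx_ok(20, 0.1)   # expected counts 2 and 18
print(ok_large, ok_small)
```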

Give a rule of thumb for normal approximation of sample means when the population shape is unknown.

n > 30 usually suffices unless the population is strongly skewed; larger n is needed for more skewness.

How does increasing sample size affect a sampling distribution?

It becomes more nearly normal and its standard error decreases.
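
A rough simulation sketch of the standard-error half of this claim. It assumes a uniform population on [0, 1), 2000 repeated samples, and a fixed seed, all chosen here for illustration, and compares the spread of sample means at n = 25 and n = 100.

```python
import random
import statistics

def sd_of_sample_means(n, reps, rng):
    """Standard deviation of `reps` sample means, each from a sample of size n."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable

sd_n25 = sd_of_sample_means(25, 2000, rng)
sd_n100 = sd_of_sample_means(100, 2000, rng)
print(round(sd_n25, 4), round(sd_n100, 4))
```

Because the standard error is proportional to 1/√n, quadrupling the sample size should roughly halve the spread of the sample means.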

Explain Simpson’s Paradox.

A trend present in separate groups can disappear or reverse when the data are combined because of a lurking (third) variable.
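
A sketch with invented counts shows how the reversal can happen. Treatments "A" and "B" and the two groups are hypothetical; the lurking variable here is group membership, because A is mostly used in the group with low success rates.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts for two treatments in two groups.
data = {
    "A": {"group 1": (8, 10), "group 2": (20, 100)},
    "B": {"group 1": (70, 100), "group 2": (1, 10)},
}

def rate(successes, trials):
    """Success rate as an exact fraction."""
    return Fraction(successes, trials)

def combined_rate(groups):
    """Success rate after pooling all groups for one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return Fraction(total_successes, total_trials)

# Within each group, A has the higher success rate (0.8 vs 0.7, 0.2 vs 0.1).
a_wins_each_group = all(
    rate(*data["A"][g]) > rate(*data["B"][g]) for g in ("group 1", "group 2")
)

# After pooling, B has the higher rate (71/110 vs 28/110).
b_wins_combined = combined_rate(data["B"]) > combined_rate(data["A"])
print(a_wins_each_group, b_wins_combined)
```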