8- Correlation Doesn't Equal Causation: Crash Course Statistics #8


Description and Tags

Source: https://www.youtube.com/watch?v=GtV-VYdNt_g&list=PL8dPuuaLjXtNM_Y-bUAhblSAdWRnmBUcr&index=9

Last updated 6:05 PM on 2/24/26


26 Terms

New cards

What does a regression line in simple linear regression technically represent?

A) The line that visually looks closest to the data
B) The line that minimizes the sum of absolute errors
C) The line that minimizes the sum of squared residuals
D) The line with the steepest possible slope

Correct Answer: C) The line that minimizes the sum of squared residuals

Explanation:
The regression line is calculated using ordinary least squares (OLS), which minimizes the sum of squared residuals, not just visual closeness (A) or absolute errors (B).
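
The least-squares idea above can be sketched in a few lines of plain Python. The data are made up for illustration, and the helper names `fit_ols` and `sum_squared_residuals` are hypothetical. The sketch fits the line with the closed-form OLS formulas and then checks that nudging the slope or intercept makes the sum of squared residuals larger.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the video)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_ols(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Moving either parameter away from the OLS solution increases the error
worse_slope = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
worse_intercept = sum_squared_residuals(xs, ys, slope, intercept + 0.1)

print(slope, intercept, best, worse_slope, worse_intercept)
```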

New cards

If the slope of a regression line predicting a son’s height from his father’s height is 0.5 (both measured in inches), what does this mean?

A) The son is always exactly half as tall as the father
B) For each additional inch of father’s height, son’s height increases by 0.5 inches on average
C) The variables are strongly correlated
D) The relationship is causal

Correct Answer: B) For each additional inch of father’s height, son’s height increases by 0.5 inches on average

Explanation:
The slope represents the average change in Y for a 1-unit increase in X. It does not imply perfect prediction (A), strength of correlation (C), or causation (D).

New cards

What happens to the slope if we change the units of one variable (e.g., measure X in meters instead of inches while Y stays in inches)?

A) It stays exactly the same
B) It becomes zero
C) It changes because slope depends on units
D) The correlation changes dramatically

Correct Answer: C) It changes because slope depends on units

Explanation:
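The slope is expressed in units of Y per unit of X, so rescaling one variable rescales the slope. (If X and Y are both converted by the same factor, as when both go from inches to meters, the slope is unchanged.) Correlation does not change either way, because it is standardized.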
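
A small sketch of this unit dependence, using made-up father and son heights and a hypothetical helper `slope_and_r`. It converts only X to meters, then both variables, and compares the slope and the correlation in each case.

```python
import math


def slope_and_r(xs, ys):
    """Return (OLS slope, Pearson r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


INCHES_TO_METERS = 0.0254

# Made-up heights in inches (illustrative only)
fathers_in = [65, 67, 68, 70, 72]
sons_in = [66, 68, 67, 70, 71]
fathers_m = [x * INCHES_TO_METERS for x in fathers_in]
sons_m = [y * INCHES_TO_METERS for y in sons_in]

slope_in, r_in = slope_and_r(fathers_in, sons_in)        # inches per inch
slope_mixed, r_mixed = slope_and_r(fathers_m, sons_in)   # inches per meter
slope_m, r_m = slope_and_r(fathers_m, sons_m)            # meters per meter

print(slope_in, slope_mixed, slope_m)
print(r_in, r_mixed, r_m)
```

Converting only X divides the slope by 0.0254, converting both leaves it as it was, and r is the same in all three cases.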

New cards

If the correlation coefficient (r) equals 0, what does this imply?

A) There is no relationship at all
B) There is no linear relationship
C) The slope must be zero in the population
D) The variables are independent

Correct Answer: B) There is no linear relationship

Explanation:
r = 0 means no linear relationship, but a nonlinear relationship may still exist. It does not automatically mean independence.
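
To illustrate, here is a minimal sketch with constructed data in which Y is completely determined by X (y = x²) and yet the Pearson correlation is zero. The helper `pearson_r` is a hand-written illustration, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Constructed data: a perfect parabola, symmetric around x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # zero linear association, even though y depends entirely on x
```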

New cards

Which of the following best describes what correlation measures?

A) Causation between two variables
B) The steepness of a regression line
C) The direction and strength of a linear relationship
D) The prediction accuracy of any model

Correct Answer: C) The direction and strength of a linear relationship

Explanation:
Correlation measures direction (positive/negative) and strength of a linear relationship. It does not measure causation or slope steepness.

New cards

Which statement about correlation is TRUE?

A) Correlation changes if units change
B) Correlation ranges from 0 to 1
C) Correlation is unit-free
D) A steep slope guarantees strong correlation

Correct Answer: C) Correlation is unit-free

Explanation:
Correlation is standardized using standard deviations, making it unit-free. It ranges from -1 to 1.

New cards

If r = -0.9, this indicates:

A) A strong positive linear relationship
B) A strong negative linear relationship
C) No relationship
D) A weak relationship

Correct Answer: B) A strong negative linear relationship

Explanation:
The sign indicates direction (negative = the variables move in opposite directions), and a magnitude close to 1 indicates a strong linear relationship.

New cards

What does an R² value of 0.7 mean?

A) 70% of Y is caused by X
B) 70% of the variance in Y is explained by the linear model with X
C) The model predicts perfectly
D) 70% of observations lie exactly on the regression line

Correct Answer: B) 70% of the variance in Y is explained by the linear model with X

Explanation:
R² represents the proportion of variance explained by the linear model, not causation (A) or perfect prediction (C).
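
As a rough sketch, R² can be computed as 1 minus the ratio of residual variation to total variation. In simple linear regression it also equals the square of r. The data below are made up and the variable names are illustrative.

```python
import math

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y

# Least-squares line
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left over after fitting the line
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_residual / syy
r = sxy / math.sqrt(sxx * syy)

print(r_squared, r ** 2)  # the two values agree up to rounding
```

For these numbers the line explains 60% of the variance in Y, and 40% remains as residual scatter.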

New cards

Why does correlation not imply causation?

A) Because correlation is always weak
B) Because a third variable may explain both variables
C) Because regression lines are unreliable
D) Because scatterplots are misleading

Correct Answer: B) Because a third variable may explain both variables

Explanation:
Correlation may occur due to:

A causing B
B causing A
A third variable causing both
Coincidence

New cards

If sample data shows a non-zero slope, what must we do before concluding a real relationship exists in the population?

A) Nothing; slope proves relationship
B) Check the steepness visually
C) Perform statistical significance testing
D) Change measurement units

Correct Answer: C) Perform statistical significance testing

Explanation:
A non-zero sample slope can arise from random sampling variation alone. A significance test (e.g., a t-test for the slope) is needed before inferring a relationship in the population.
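
A minimal sketch of such a test, assuming simple linear regression, where the t statistic for the slope can be written as r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The data are made up, and the critical value 3.182 (two-sided, 5% level, 3 degrees of freedom) is taken from a standard t table rather than computed here.

```python
import math

# Made-up illustrative data: the fitted slope is 0.6, clearly non-zero
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope = sxy / sxx
r = sxy / math.sqrt(sxx * syy)

# t statistic for testing "population slope = 0", with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided 5% critical value for df = 3, looked up in a standard t table
T_CRITICAL_DF3 = 3.182
significant = abs(t_stat) > T_CRITICAL_DF3

print(slope, t_stat, significant)
```

With only five points, the slope of 0.6 does not reach significance at the 5% level, so this sample alone would not justify a claim about the population.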

New cards

Why is it important to look at a scatterplot even if you know r?

A) Because r tells you nothing
B) Because different datasets can have the same r but very different shapes
C) Because scatterplots change correlation
D) Because R² cannot be calculated without it

Correct Answer: B) Because different datasets can have the same r but very different shapes

Explanation:
The “Datasaurus Dozen” demonstrates that datasets can share nearly identical summary statistics, including correlation, yet show very different patterns when plotted. Visual inspection is crucial.

New cards

Which statement about prediction and R² is most accurate?

A) High R² guarantees perfect prediction
B) High R² means no residual error
C) Higher R² generally indicates better predictive fit, but residual variability still exists
D) R² proves causation

Correct Answer: C) Higher R² generally indicates better predictive fit, but residual variability still exists

Explanation:
Even with R² = 0.7, 30% of variance remains unexplained. Prediction is improved but not perfect.

New cards

When a scatterplot shows two distinct “blobs” or clusters of points, this most likely suggests:

A) A strong linear relationship
B) The presence of subgroups within the data
C) Perfect correlation
D) Measurement error

Correct Answer: B) The presence of subgroups within the data

Explanation:
Clusters often indicate different underlying groups or processes, as in the Old Faithful eruption example (short vs long eruptions). This may suggest the need for separate analyses.

New cards

Which of the following is TRUE regarding nonlinear relationships?

A) If r = 0, there is no relationship of any kind
B) Correlation only measures linear relationships
C) Nonlinear relationships always produce strong r values
D) Regression cannot model any nonlinear pattern

Correct Answer: B) Correlation only measures linear relationships

Explanation:
The Pearson correlation coefficient measures linear association only. A nonlinear relationship can exist even if r ≈ 0.

New cards

In a positively correlated scatterplot, most data points tend to fall in which quadrants (when divided by the means)?

A) Upper left and lower right
B) Upper right and lower left
C) Only upper right
D) Evenly in all quadrants

Correct Answer: B) Upper right and lower left

Explanation:
Positive correlation means the two variables tend to move together.

Upper right: both above average
Lower left: both below average
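
The quadrant idea can be sketched by splitting a scatterplot at the two means and counting the points in each region. The heights below are made up for illustration.

```python
# Made-up father and son heights in inches (illustrative only)
fathers = [65, 67, 68, 70, 72]
sons = [66, 68, 67, 70, 71]

mean_x = sum(fathers) / len(fathers)
mean_y = sum(sons) / len(sons)

counts = {"upper_right": 0, "upper_left": 0, "lower_left": 0, "lower_right": 0}
for x, y in zip(fathers, sons):
    vertical = "upper" if y > mean_y else "lower"
    horizontal = "right" if x > mean_x else "left"
    counts[f"{vertical}_{horizontal}"] += 1

print(counts)
```

Every point in this sample lands in the upper right or lower left, which is why the products of the deviations from the means, and therefore r, come out positive.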

New cards

The relationship between hours asleep and hours awake (in a 24-hour day) is an example of:

A) Weak positive correlation
B) Perfect positive correlation
C) Perfect negative correlation
D) No correlation

Correct Answer: C) Perfect negative correlation

Explanation:
Because total time is fixed (24 hours), knowing one value allows exact prediction of the other. This creates perfect negative linear correlation (r = -1).

New cards

When researchers test many different subsets of data until they find a significant relationship, this increases the risk of:

A) Strong causation
B) Reduced variability
C) Spurious correlation due to multiple comparisons
D) Perfect prediction

Correct Answer: C) Spurious correlation due to multiple comparisons

Explanation:
Searching through many comparisons increases the probability of finding relationships purely by chance. This is sometimes called data dredging or p-hacking.
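
A rough simulation of this effect, under stated assumptions: 200 pairs of completely independent random variables, 10 observations each, and a cutoff of |r| > 0.632, which is approximately the two-sided 5% threshold for n = 10. The seed, sample sizes, and helper name are arbitrary choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # arbitrary seed so the run is repeatable
N_POINTS = 10
N_TESTS = 200
R_CRITICAL = 0.632  # approximate two-sided 5% cutoff for |r| when n = 10

abs_rs = []
for _ in range(N_TESTS):
    # X and Y are generated independently, so any correlation is pure chance
    xs = [random.gauss(0, 1) for _ in range(N_POINTS)]
    ys = [random.gauss(0, 1) for _ in range(N_POINTS)]
    abs_rs.append(abs(pearson_r(xs, ys)))

false_positives = sum(1 for value in abs_rs if value > R_CRITICAL)
print(false_positives, "of", N_TESTS, "unrelated pairs look significant")
print("largest |r| found:", max(abs_rs))
```

At a 5% threshold, roughly 1 in 20 unrelated comparisons is expected to look significant, so reporting only the strongest result from many attempts is misleading.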

New cards

If a scatterplot shows a random cloud of points evenly distributed across quadrants, this most likely indicates:

A) Strong nonlinear relationship
B) Strong negative correlation
C) No linear relationship
D) Perfect prediction

Correct Answer: C) No linear relationship

Explanation:
When points are evenly scattered, correlation will be near zero, indicating no linear association.

New cards

The relationship between Celsius and Fahrenheit temperatures results in R² = 1 because:

A) They measure unrelated concepts
B) The relationship is nonlinear
C) One is a perfect linear transformation of the other
D) They are measured in the same units

Correct Answer: C) One is a perfect linear transformation of the other

Explanation:
Fahrenheit is a perfect linear transformation of Celsius. Therefore, 100% of the variance is explained, leading to R² = 1. This does not imply causation — it reflects mathematical conversion.
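
A short sketch of this, using a few arbitrary Celsius values and the standard conversion F = C × 9/5 + 32. The hand-written helper `pearson_r` is illustrative. For comparison, it also checks a fixed-total case (hours awake = 24 − hours asleep), where the line slopes downward and r is −1 while R² is still 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Arbitrary temperatures, converted with the standard formula
celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
r_temp = pearson_r(celsius, fahrenheit)

# Arbitrary sleep durations; hours awake is fixed by the 24-hour total
hours_asleep = [5, 6, 7, 8, 9]
hours_awake = [24 - h for h in hours_asleep]
r_sleep = pearson_r(hours_asleep, hours_awake)

print(r_temp, r_temp ** 2)
print(r_sleep, r_sleep ** 2)
```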

New cards

A non-zero regression coefficient (slope) indicates:

A) A strong relationship between variables
B) A causal relationship
C) Some degree of linear association in the sample
D) Perfect prediction

Correct Answer: C) Some degree of linear association in the sample

Explanation:
A non-zero slope suggests some linear association in the sample, but it does not indicate strength (that’s correlation), causation, or statistical significance.

New cards

Which statement best distinguishes slope (m) from correlation (r)?

A) Slope measures strength; correlation measures units
B) Slope depends on units; correlation does not
C) Slope ranges from -1 to 1; correlation does not
D) They always have the same numerical value

Correct Answer: B) Slope depends on units; correlation does not

Explanation:
Slope depends on the measurement units of X and Y.
Correlation is standardized and unit-free.

New cards

A stronger linear relationship is indicated when:

A) The regression line is steeper
B) Data points are tightly clustered around the regression line
C) The slope is positive
D) The intercept is large

Correct Answer: B) Data points are tightly clustered around the regression line

Explanation:
Strength of linear relationship depends on how closely points cluster around the line — not on steepness or intercept size.

New cards

Which of the following best describes R² as discussed conceptually?

A) The percentage of data points on the regression line
B) The proportion of variance explained by the linear model
C) The probability that X causes Y
D) The slope squared

Correct Answer: B) The proportion of variance explained by the linear model

Explanation:
R² represents the proportion of variance in the outcome explained by the predictor in a linear model.

New cards

If two datasets have identical slopes but one has much more scatter around the line, the dataset with more scatter will likely have:

A) Higher correlation
B) Lower correlation
C) Identical correlation
D) Perfect correlation

Correct Answer: B) Lower correlation

Explanation:
More scatter means weaker clustering around the line, leading to a lower correlation coefficient.
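
This can be checked with constructed data: two datasets built around the same line (y = 2x) with the same noise pattern, scaled small in one and large in the other. The noise values are chosen to sum to zero and to be uncorrelated with X, so both fitted slopes stay at 2. All names and numbers are illustrative.

```python
import math


def slope_and_r(xs, ys):
    """Return (OLS slope, Pearson r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
# Noise pattern that sums to zero and is uncorrelated with xs
noise = [1, -2, 1, 1, -2, 1]

ys_tight = [2 * x + 0.5 * e for x, e in zip(xs, noise)]  # little scatter
ys_loose = [2 * x + 3.0 * e for x, e in zip(xs, noise)]  # much more scatter

slope_tight, r_tight = slope_and_r(xs, ys_tight)
slope_loose, r_loose = slope_and_r(xs, ys_loose)

print(slope_tight, r_tight)
print(slope_loose, r_loose)
```

Both fits return a slope of 2, but the correlation is noticeably lower for the dataset with more scatter.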

New cards

The relationship between car weight and gas efficiency is typically an example of:

A) Strong positive linear relationship
B) Strong negative linear relationship
C) Perfect correlation
D) No relationship

Correct Answer: B) Strong negative linear relationship

Explanation:
Heavier cars tend to have lower gas efficiency, representing a negative linear association.