BRM Chapter 15 - Describing Relationships: Regression, Prediction, and Causation

41 Terms

1
Regression - overall idea

Statistical methods that fit a model to data in order to predict a response variable from one or more explanatory variables.​

2
Regression line - definition

A straight line that describes how a response variable y changes as an explanatory variable x changes; used to predict y for a given x.​

3
When to use a regression line

When a scatterplot shows an approximately straight‑line relationship and one variable is used to explain or predict the other.​

4
Least-squares regression line - definition

The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.​
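
A minimal sketch of the least-squares idea, assuming made-up data points (they are not from the chapter) and hypothetical helper names. It computes the slope and intercept from the usual sums of deviations, then checks that nudging the line away from the fitted values makes the sum of squared vertical distances larger.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the
    sum of squared vertical distances from the points to the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from each point to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not taken from the chapter).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)

# Any other line should do worse on the same criterion.
shifted_intercept = sum_squared_residuals(xs, ys, a + 0.1, b)
shifted_slope = sum_squared_residuals(xs, ys, a, b + 0.05)
print(a, b, best, shifted_intercept, shifted_slope)
```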

5
Equation of a regression line

Written in the form y = a + bx, where a is the intercept and b is the slope.

6
Slope of regression line - interpretation

The slope b is the amount by which the predicted y changes when x increases by 1 unit.​

7
Intercept of regression line - interpretation

The intercept a is the predicted value of y when x = 0 (though x = 0 may be outside the meaningful data range).​

8
Using the regression equation for prediction

Substitute a given x‑value into the equation y = a + bx to compute the predicted y‑value.

9
Example: fossil bones - pattern

Femur and humerus lengths of Archaeopteryx fossils follow a strong straight‑line pattern, so predictions from a regression line are accurate.

10
Example: fossil bones - equation

For the fossils, the least-squares line is: humerus length = −3.66 + (1.197 × femur length).​
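
As a quick illustration of prediction with this equation, the sketch below plugs a femur length into the fossil line. The intercept and slope come from the card above; the femur length of 50 is a hypothetical input chosen only for illustration, in whatever units the fossil data use, and the function name is made up.

```python
def predict_humerus(femur_length):
    """Predicted humerus length from the fossil least-squares line:
    humerus = -3.66 + 1.197 * femur."""
    return -3.66 + 1.197 * femur_length


femur = 50  # hypothetical femur length (assumption, not a data point from the chapter)
predicted = predict_humerus(femur)
print(round(predicted, 2))

# The slope is the change in predicted humerus length per 1-unit increase in femur length.
change_per_unit = predict_humerus(femur + 1) - predict_humerus(femur)
print(round(change_per_unit, 3))
```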

11
Understanding prediction - model idea

Prediction is based on fitting a "model" (such as a straight line) to the data; the better the fit, the more reliable the predictions.

12
Prediction works best when

The model fits the data closely and there is a clear, strong pattern in the relationship.​

13
Extrapolation - definition

Using a regression line to predict values of y for x‑values outside the range of the data; this is risky and often unreliable.​

14
Warning about extrapolation

Patterns may change outside the observed x‑range, so predictions far beyond the data can be seriously misleading.​

15
Example of extrapolation error

Using a child's linear growth from ages 3 to 8 to predict adult height at age 25 would give an unrealistic height (for example, about 8 feet).
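
The arithmetic behind this kind of error can be sketched as follows. The intercept (31 inches) and slope (2.5 inches per year) are hypothetical values invented for illustration, not figures from the chapter; the point is only that a line that looks reasonable inside ages 3 to 8 gives an absurd answer at age 25.

```python
def predicted_height_inches(age_years):
    """Hypothetical growth line assumed to be fitted on ages 3 to 8 only:
    height = 31 + 2.5 * age (inches). Both numbers are made up."""
    return 31.0 + 2.5 * age_years


inside = predicted_height_inches(6)    # age within the observed range
outside = predicted_height_inches(25)  # age far outside the observed range
outside_feet = outside / 12

print(inside)                   # plausible height for a 6-year-old
print(round(outside_feet, 2))   # close to 8 feet, which is not a realistic adult height
```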

16
Correlation vs regression - key difference

Correlation measures direction and strength of a linear relationship; regression fits a specific line and requires choosing an explanatory and response variable.​
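
One way to see the difference is to swap the roles of x and y. In the sketch below (made-up data, hypothetical helper names), the correlation is the same either way, but the regression of y on x and the regression of x on y have different slopes.

```python
from math import sqrt


def deviations_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of products and squares of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def correlation(xs, ys):
    sxy, sxx, syy = deviations_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def slope(explanatory, response):
    """Least-squares slope for predicting `response` from `explanatory`."""
    sxy, sxx, _ = deviations_sums(explanatory, response)
    return sxy / sxx


# Made-up illustrative data (assumption).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)       # same value: correlation ignores the roles
slope_y_on_x = slope(xs, ys)     # predict y from x
slope_x_on_y = slope(ys, xs)     # predict x from y: a different line
print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```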

17
Effect of outliers on correlation and regression

Both correlation and regression are strongly affected by outliers; a single extreme point can change r and the regression line substantially.​
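
A small sketch of this effect, using made-up data: five points that lie exactly on a line, plus one hypothetical extreme point. Adding the single outlier changes r from 1 to a negative value and flips the sign of the fitted slope.

```python
from math import sqrt


def fit_and_correlate(xs, ys):
    """Return (slope, r) for the least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up data (assumption): a perfect straight-line pattern.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_clean, r_clean = fit_and_correlate(xs, ys)

# Same data with one hypothetical extreme point added at (10, 0).
slope_out, r_out = fit_and_correlate(xs + [10], ys + [0])

print(slope_clean, r_clean)
print(slope_out, r_out)
```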

18
Coefficient of determination r² - definition

The square of the correlation r; r² is the proportion of the variation in y explained by the least-squares regression of y on x.​
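
To make "proportion of variation explained" concrete, the sketch below (made-up data, hypothetical variable names) computes r² two ways: by squaring the correlation, and as 1 minus the ratio of leftover variation around the least-squares line to the total variation in y. The two agree for a least-squares line.

```python
from math import sqrt

# Made-up illustrative data (assumption).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)   # total variation in y

# Least-squares line y = a + b*x.
b = sxy / sxx
a = my - b * mx

# Variation left over after using the line to predict y.
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)
r_squared = r ** 2
explained_fraction = 1 - unexplained / syy
print(r_squared, explained_fraction)
```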

19
Interpreting r² - example

If r = 0.994, then r² = 0.988, meaning 98.8% of the variation in y is explained by the straight‑line relationship with x.​

20
Example: fossil bones - r²

For the fossil data, r = 0.994 and r² = 0.988, so femur length explains 98.8% of the variation in humerus length.​

21
Prediction vs causation - distinction

A relationship can be used to make predictions even when there is no evidence that changes in one variable cause changes in the other.​

22
Statistics and causation - key warning

A strong relationship between two variables does not necessarily mean that changes in one variable cause changes in the other.​

23
Lurking variable - definition (causation context)

A variable not included in the analysis that has an important effect on the relationship between x and y, potentially creating a misleading association.

24
Common response - definition

A lurking-variable situation in which a third variable influences both x and y, producing a correlation even if x and y do not directly affect each other.
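
Common response can be illustrated with a small simulation. Everything here is an assumption made for illustration: a hypothetical lurking variable z drives both x and y, and x and y have no direct effect on each other. They still come out strongly correlated, and the correlation nearly vanishes once z's contribution is subtracted.

```python
import random
from math import sqrt


def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical lurking variable z; x and y each equal z plus independent noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)  # strong, although x never influences y

# Remove z's contribution: what is left of x and y is unrelated noise.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_after_removing_z = correlation(x_without_z, y_without_z)

print(round(r_xy, 2), round(r_after_removing_z, 2))
```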

25
Confounding - definition

When the effects of two or more variables on a response are mixed together, making it difficult to distinguish their separate influences.​

26
Best evidence for causation

Comes from randomized comparative experiments, which control for lurking variables by random assignment.​

27
Using associations for prediction without causation

An observed relationship can still be used for prediction as long as the past pattern continues, even without knowing the causal mechanism.​

28
Smoking and lung cancer - causation example

The nonexperimental evidence that smoking causes lung cancer is very strong, based on many consistent observational studies.

29
Criteria for causation without experiments - strength

The association is strong: smokers have much higher lung cancer rates than similar nonsmokers.​

30
Criteria for causation without experiments - consistency

The association appears in many different studies, groups, and countries, reducing the chance a specific lurking variable explains it.​

31
Criteria for causation - dose-response

Higher doses are associated with stronger responses: heavier and longer-term smoking leads to higher lung cancer risk; quitting reduces risk.​

32
Criteria for causation - time order

The alleged cause precedes the effect: increases in smoking were followed about 30 years later by rises in lung cancer deaths.​

33
Criteria for causation - plausibility

Experiments with animals show that tars from cigarette smoke cause cancer, making the causal mechanism biologically plausible.​

34
Big data - definition/idea

Massive databases (often petabytes in size) of information from sources like web searches, social media, and credit card records used to find patterns and correlations.​

35
Google Flu Trends - example

Google used correlations between flu‑related search terms and flu cases to track influenza spread faster than the CDC, until it later over‑predicted cases.​

36
Limitations of big data - sampling

Big data often come from large but biased convenience samples (like Twitter users), not from representative samples of the whole population.​

37
Limitations of big data - extrapolation risk

Without understanding why a correlation exists, predictions can fail badly when the situation changes or when extrapolating to new conditions.​

38
Big data and theory

Claims that "the numbers speak for themselves" are misleading; statistical theory is still needed to avoid bias, misinterpretation, and extrapolation errors.​

39
Statistics in summary - regression

Regression fits models (often straight lines) to data to predict y from x; least-squares is the standard method for fitting a line.​

40
Statistics in summary - r² and extrapolation

r² tells what fraction of variation in y is explained by the linear model; extrapolation beyond the data range remains risky and must be treated with caution.​

41
Statistics in summary - causation

Strong association does not prove causation; lurking variables, common response, and confounding can explain observed relationships, especially without experiments.