BRM Chapter 15 - Describing Relationships: Regression, Prediction, and Causation

41 Terms

1

Regression - overall idea

Statistical methods that fit a model to data in order to predict a response variable from one or more explanatory variables.​

2

Regression line - definition

A straight line that describes how a response variable y changes as an explanatory variable x changes; used to predict y for a given x.​

3

When to use a regression line

When a scatterplot shows an approximately straight‑line relationship and one variable is used to explain or predict the other.​

4

Least-squares regression line - definition

The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.​
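
As a rough illustration of the least-squares idea, the sketch below computes the intercept and slope from the usual closed-form formulas and then checks that nudging the line only makes the sum of squared vertical distances larger. The data values and the function names (`least_squares_line`, `sum_squared_errors`) are hypothetical, chosen only for this example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x with the smallest
    sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Any other line should do no better than the least-squares line.
shifted = sum_squared_errors(xs, ys, a + 0.1, b)
tilted = sum_squared_errors(xs, ys, a, b + 0.05)
print(a, b, best, shifted, tilted)
```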

5

Equation of a regression line

Written in the form y = a + bx, where a is the intercept and b is the slope.

6

Slope of regression line - interpretation

The slope b is the amount by which the predicted y changes when x increases by 1 unit.​

7

Intercept of regression line - interpretation

The intercept a is the predicted value of y when x = 0 (though x = 0 may be outside the meaningful data range).​

8

Using the regression equation for prediction

Substitute a given x‑value into the equation y = a + bx to compute the predicted y‑value.

9

Example: fossil bones - pattern

Femur and humerus lengths of Archaeopteryx fossils follow a strong straight‑line pattern, so predictions from the regression line are accurate.

10

Example: fossil bones - equation

For the fossils, the least-squares line is: humerus length = −3.66 + (1.197 × femur length).​
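
The equation on this card can be used directly for prediction, as sketched below. The five femur/humerus pairs are an assumption: they are the values commonly quoted for this example and are not given in the cards, so treat them as illustrative. The function names are hypothetical.

```python
# Assumed data (not given in the cards): five femur/humerus pairs.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_humerus(femur_length):
    """Prediction from the rounded equation quoted on the card."""
    return -3.66 + 1.197 * femur_length


a, b = fit_line(femur, humerus)
print(round(a, 2), round(b, 3))   # fitted intercept and slope
print(predict_humerus(50))        # predicted humerus for a femur of 50
```

A femur length of 50 lies inside the range of the assumed data (38 to 74), so this prediction is not an extrapolation.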

11

Understanding prediction - model idea

Prediction is based on fitting some "model" (such as a straight line) to the data; better fit leads to more reliable predictions.​

12

Prediction works best when

The model fits the data closely and there is a clear, strong pattern in the relationship.​

13

Extrapolation - definition

Using a regression line to predict values of y for x‑values outside the range of the data; this is risky and often unreliable.​

14

Warning about extrapolation

Patterns may change outside the observed x‑range, so predictions far beyond the data can be seriously misleading.​

15

Example of extrapolation error

Using a child's roughly linear growth from ages 3 to 8 to predict adult height at age 25 would give an unrealistic height (about 8 feet).
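
A small sketch of how this goes wrong: fit a line to heights at ages 3 to 8, then plug in age 25. The heights below are hypothetical numbers invented for illustration, not measurements from the source.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical heights in inches for one child, ages 3 through 8.
ages = [3, 4, 5, 6, 7, 8]
heights_in = [37, 40, 43, 45.5, 48, 50.5]

a, b = fit_line(ages, heights_in)

# Age 25 is far outside the observed range of 3 to 8: an extrapolation.
predicted_in = a + b * 25
predicted_ft = predicted_in / 12
print(round(predicted_ft, 2))
```

The line fits the childhood data well, but growth slows and stops after adolescence, so the straight-line pattern does not hold at age 25.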

16

Correlation vs regression - key difference

Correlation measures direction and strength of a linear relationship; regression fits a specific line and requires choosing an explanatory and response variable.​

17

Effect of outliers on correlation and regression

Both correlation and regression are strongly affected by outliers; a single extreme point can change r and the regression line substantially.​
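
To see how much one point can matter, the sketch below computes r and the least-squares slope for a small data set, then again after adding a single extreme point. All values are hypothetical and the helper name `r_and_slope` is made up for this example.

```python
import math


def r_and_slope(xs, ys):
    """Return (correlation r, least-squares slope b) for the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical data with a clear upward straight-line pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 5, 4, 6, 7, 8, 9]
r_clean, slope_clean = r_and_slope(xs, ys)

# Add one outlier at (9, 1), far below the pattern.
r_outlier, slope_outlier = r_and_slope(xs + [9], ys + [1])

print(round(r_clean, 3), round(slope_clean, 3))
print(round(r_outlier, 3), round(slope_outlier, 3))
```

With these made-up numbers, a single added point cuts both r and the slope to well under half of their original values.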

18

Coefficient of determination r² - definition

The square of the correlation r; r² is the proportion of the variation in y explained by the least-squares regression of y on x.​

19

Interpreting r² - example

If r = 0.994, then r² = 0.988, meaning 98.8% of the variation in y is explained by the straight‑line relationship with x.​

20

Example: fossil bones - r²

For the fossil data, r = 0.994 and r² = 0.988, so femur length explains 98.8% of the variation in humerus length.​
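
The "proportion of variation explained" reading of r² can be checked numerically: compute 1 minus the ratio of leftover variation around the line to total variation in y, and compare it with the square of r. The data values below are an assumption (the commonly quoted five femur/humerus pairs, not listed in the cards), and the variable names are hypothetical.

```python
import math

# Assumed data (not given in the cards): five femur/humerus pairs.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

n = len(femur)
mean_x = sum(femur) / n
mean_y = sum(humerus) / n
sxx = sum((x - mean_x) ** 2 for x in femur)
syy = sum((y - mean_y) ** 2 for y in humerus)   # total variation in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))

r = sxy / math.sqrt(sxx * syy)

# Least-squares line and the variation left over around it.
b = sxy / sxx
a = mean_y - b * mean_x
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(femur, humerus))

explained_fraction = 1 - leftover / syy
print(round(r, 3), round(r ** 2, 3), round(explained_fraction, 3))
```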

21

Prediction vs causation - distinction

A relationship can be used to make predictions even when there is no evidence that changes in one variable cause changes in the other.​

22

Statistics and causation - key warning

A strong relationship between two variables does not necessarily mean that changes in one variable cause changes in the other.​

23

Lurking variable - definition (causation context)

A variable not included in the analysis that affects the relationship between x and y, potentially creating a misleading association.

24

Common response - definition

A lurking-variable situation in which a third variable influences both x and y, producing a correlation even if x and y do not directly affect each other.
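
Common response can be illustrated with a small simulation, sketched here under made-up assumptions: a variable z drives both x and y, and x and y have no direct link. The noise levels, sample size, and seed are arbitrary choices for this example.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000

# z is the third variable; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)

# Remove z's contribution: what remains of x and y is unrelated noise.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(round(r_xy, 2), round(r_rest, 2))
```

In this setup x and y are strongly correlated only because both respond to z; once z's contribution is removed, the correlation is close to zero.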

25

Confounding - definition

When the effects of two or more variables on a response are mixed together, making it difficult to distinguish their separate influences.​

26

Best evidence for causation

Comes from randomized comparative experiments, which control for lurking variables by random assignment.​

27

Using associations for prediction without causation

An observed relationship can still be used for prediction as long as the past pattern continues, even without knowing the causal mechanism.​

28

Smoking and lung cancer - causation example

Nonexperimental evidence for smoking causing lung cancer is very strong based on multiple consistent observational studies.​

29

Criteria for causation without experiments - strength

The association is strong: smokers have much higher lung cancer rates than similar nonsmokers.​

30

Criteria for causation without experiments - consistency

The association appears in many different studies, groups, and countries, reducing the chance a specific lurking variable explains it.​

31

Criteria for causation - dose-response

Higher doses are associated with stronger responses: heavier and longer-term smoking leads to higher lung cancer risk; quitting reduces risk.​

32

Criteria for causation - time order

The alleged cause precedes the effect: increases in smoking were followed about 30 years later by rises in lung cancer deaths.​

33

Criteria for causation - plausibility

Experiments with animals show that tars from cigarette smoke cause cancer, making the causal mechanism biologically plausible.​

34

Big data - definition/idea

Massive databases (often petabytes in size) of information from sources like web searches, social media, and credit card records used to find patterns and correlations.​

35

Google Flu Trends - example

Google used correlations between flu‑related search terms and flu cases to track influenza spread faster than the CDC, until it later over‑predicted cases.​

36

Limitations of big data - sampling

Big data often come from large but biased convenience samples (like Twitter users), not from representative samples of the whole population.​

37

Limitations of big data - extrapolation risk

Without understanding why a correlation exists, predictions can fail badly when the situation changes or when extrapolating to new conditions.​

38

Big data and theory

Claims that "the numbers speak for themselves" are misleading; statistical theory is still needed to avoid bias, misinterpretation, and extrapolation errors.​

39

Statistics in summary - regression

Regression fits models (often straight lines) to data to predict y from x; least-squares is the standard method for fitting a line.​

40

Statistics in summary - r² and extrapolation

r² tells what fraction of variation in y is explained by the linear model; extrapolation beyond the data range remains risky and must be treated with caution.​

41

Statistics in summary - causation

Strong association does not prove causation; lurking variables, common response, and confounding can explain observed relationships, especially without experiments.