Unit 2: Exploring Two-Variable Data

Description and Tags

Notes from every College Board daily video, turned into flashcards

Statistics

Linear Regression and Correlation

AP Statistics

29 Terms

Segmented bar chart characteristics

Bars separated by spaces, with relative frequency measured on the y-axis. Each bar is divided into segments identified by a key.

Mosaic Plot

Same as a segmented bar chart, but the bars touch and the width of each bar is proportional to the number of observations in that group.

Scatterplot

Displays the relationship between 2 quantitative variables. Both axes need scales shown with tick marks, and the graph must have a title.

Side-by-side bar graph

Bar heights show relative frequencies (percents taken from a 2-way table). For each category of the x-variable, the bars are drawn side by side and touching, with a gap separating them from the next x-variable category's group of bars.

Variable Association

Two variables are associated if the distributions of conditional relative frequencies differ from group to group.

Joint relative frequency

Cell frequency divided by total for entire table

Marginal relative frequency

A row total or a column total in a 2-way table divided by the table total

Conditional relative frequency

Relative frequency within one specific row or column of the 2-way table: cell frequency divided by that row's or column's total
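
The three relative frequencies are easy to mix up, so here is a minimal sketch that computes each one from a small 2-way table. The table (grade level by preferred snack) and all of its counts are made up for illustration.

```python
# Hypothetical 2-way table of counts: rows = grade level, columns = preferred snack.
table = {
    "Juniors": {"Chips": 30, "Fruit": 20},
    "Seniors": {"Chips": 10, "Fruit": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency: one cell divided by the total for the entire table.
joint_juniors_chips = table["Juniors"]["Chips"] / grand_total

# Marginal relative frequency: a row (or column) total divided by the table total.
juniors_total = sum(table["Juniors"].values())
seniors_total = sum(table["Seniors"].values())
marginal_juniors = juniors_total / grand_total

# Conditional relative frequency: one cell divided by its own row (or column) total.
chips_given_juniors = table["Juniors"]["Chips"] / juniors_total
chips_given_seniors = table["Seniors"]["Chips"] / seniors_total

print(joint_juniors_chips, marginal_juniors)
print(chips_given_juniors, chips_given_seniors)
```

In this made-up table the conditional relative frequency of "Chips" differs between juniors and seniors, which is the kind of difference that suggests the two variables are associated.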

x-variable name

Explanatory (x can explain y)

y-variable name

Response

Positive/Negative association

Positive: as x increases, y tends to increase

Negative: as x increases, y tends to decrease

Forms

Linear or non-linear

Unusual Features

Clusters and apparent outliers

Strong/Weak strength

Strong: Data closely follows pattern

Weak: Data doesn’t closely follow the pattern, but there is still a pattern

Strength indicates how good a predictor x is of y

Correlation Coefficient - r

Gives the direction (sign) and strength of a linear relationship. The closer r is to 1 or -1, the stronger the correlation.

On its own, r doesn’t support claims about form or unusual features (a scatterplot is needed for that)
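
A small sketch of why r alone says nothing about form. The `correlation` helper and both data sets are made up for illustration; the helper uses the usual sample formula (average product of z-scores, dividing by n - 1).

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of z-scores, then divide by n - 1.
    z_products = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                     for x, y in zip(xs, ys))
    return z_products / (n - 1)

xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]      # points fall exactly on a line
curved = [(x - 3) ** 2 for x in xs]   # clear U-shaped (non-linear) pattern

r_linear = correlation(xs, linear)    # very close to 1
r_curved = correlation(xs, curved)    # very close to 0 despite the obvious pattern
print(r_linear, r_curved)
```

The U-shaped data has a clear pattern yet an r of about 0, so a value of r near 0 only rules out a linear relationship, not a relationship in general.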

Causation Fallacies

Correlation doesn’t prove causation: other variables may not have been considered, or the correlation may be completely coincidental

Linear Regression Model

y-hat = a + bx

y-hat = predicted value, a = y-intercept, b = slope

Extrapolation

A prediction made for an x-value outside the interval of the current data. Unreliable, because the current trend is not guaranteed to continue there.

Residuals

residual = observed y - predicted y

residual = y - y-hat

[positive residual = the model underestimated the observed value]
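
A quick sketch of the residual calculation. The model y-hat = 2 + 3x and the three observed points are hypothetical, chosen only to show a positive, a negative, and a zero residual.

```python
def predict(x, a=2.0, b=3.0):
    """Predicted value from a hypothetical model y-hat = a + bx."""
    return a + b * x

# (x, observed y) pairs, made up for illustration.
observed = [(1, 6.0), (2, 7.5), (3, 11.0)]

# residual = observed y - predicted y
residuals = [y - predict(x) for x, y in observed]
print(residuals)  # [1.0, -0.5, 0.0]
```

The first point has a positive residual (the model underestimates it), the second a negative residual (the model overestimates it), and the third lies exactly on the line.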

Residual plot (scatterplot)

y-axis plots residual values

Good fit = apparent randomness centered at zero with no clear patterns

Bad fit = curved pattern (not random)

Least Squares Regression Line (LSRL)

Linear model that minimizes the sum of the squared residuals

LSRL Properties

The point (x-bar, y-bar) always lies on the LSRL

Slope and correlation: b = r(s_y/s_x) [b = slope, r = correlation coefficient, s_y and s_x = standard deviations of y and x]
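
A minimal sketch of both properties, assuming a small made-up data set. The helper name `lsrl` is hypothetical. The intercept is found from a = y-bar - b(x-bar), which follows from the line passing through the point of means (x-bar, y-bar).

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope from correlation and standard deviations
    a = y_bar - b * x_bar      # forces the line through (x-bar, y-bar)
    return a, b, r

xs = [1, 2, 3, 4, 5]           # made-up data; x-bar = 3
ys = [2, 4, 5, 4, 5]           # y-bar = 4
a, b, r = lsrl(xs, ys)

# Plugging x-bar into the model should give back y-bar.
prediction_at_mean = a + b * 3
print(a, b, r, prediction_at_mean)
```

For this data the slope comes out to about 0.6 and the intercept to about 2.2, and the prediction at x-bar = 3 is y-bar = 4 (up to floating-point rounding).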

Coefficient of Determination (r²)

Proportion of variation in the y-variable that is explained by the x-variable (how good the model is)
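
One way to see "proportion of variation explained" is to compare the leftover variation around the line with the total variation around y-bar. The sketch below assumes a small made-up data set and computes r² as 1 - (sum of squared residuals) / (total sum of squares); the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Variation left over after using the line (squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Total variation in y around its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(r_squared)
```

Here r² works out to about 0.6, so roughly 60% of the variation in y is explained by the linear model on x for this made-up data.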

r and r² ranges

-1 ≤ r ≤ 1

0 ≤ r² ≤ 1

Effect of removing low/high-leverage points

Low: LSRL slope/y-intercept doesn’t change much

High: Substantial shift from original LSRL

High-leverage points

Points whose x-values are unusually far above or below x-bar (the mean of x). A point is not high-leverage if its x-value is close to x-bar.

Low-leverage points

Close to (x-bar, y-bar), and especially close to x-bar

Influential points

Points that, if removed, would result in a big change in slope, y-intercept, and/or correlation (can be outliers, high-leverage points, or both)
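
A rough sketch of how leverage affects the LSRL, using made-up points. The helper `lsrl_slope` is hypothetical. One extra point sits at (x-bar, y-bar) and the other sits far from x-bar and off the trend.

```python
def lsrl_slope(points):
    """Least squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

base = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]   # x-bar = 3, y-bar = 4

slope_base = lsrl_slope(base)
# Low-leverage point: located at (x-bar, y-bar).
slope_low = lsrl_slope(base + [(3, 4)])
# High-leverage point: x = 20 is far from x-bar and the point is off the trend.
slope_high = lsrl_slope(base + [(20, 2)])

print(slope_base, slope_low, slope_high)
```

With these numbers the slope stays at about 0.6 when the low-leverage point is added, but adding the high-leverage point drags the slope below zero, so removing that point would change the line substantially.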

Log transformations

Taking the log of a variable makes high values less extreme and reduces skew (for x-variable logs). It is used to turn non-linear data into linear data so that a linear model can be fit.
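
A small sketch of a log transformation straightening curved data. The data here is made up and follows exact exponential growth, and the example takes the log of the y-variable, which is one common choice and an assumption of this sketch. The `correlation` helper is illustrative.

```python
from math import log10, sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]
ys = [3 * 2 ** x for x in xs]        # exponential growth: 6, 12, 24, 48, 96, 192
log_ys = [log10(y) for y in ys]      # log10(y) = log10(3) + x * log10(2), a line in x

r_raw = correlation(xs, ys)          # noticeably below 1: the pattern is curved
r_log = correlation(xs, log_ys)      # essentially 1: the transformed data is linear
print(r_raw, r_log)
```

After the transformation, a linear model of log(y) on x fits the made-up data almost perfectly, while the untransformed data shows a curved pattern.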