Unit 2: Exploring Two-Variable Data

Notes from every College Board daily video, turned into flashcards

29 Terms

1

Segmented bar chart characteristics

Bars separated by spaces, with each bar's length measured in relative frequency (0 to 1, or 0% to 100%). Each bar is divided into segments identified by a key.

2

Mosaic Plot

Same as a segmented bar chart, but the bars touch and the width of each bar is proportional to the number of observations in that category.

3

Scatterplot

Represents 2 quantitative variables. The scale is shown with tick marks, and there must be a title.

4

Side-by-side bar graph

Bar heights are relative frequencies (percentages calculated from a two-way table). For each category of the x-variable, the bars for the other variable's categories are drawn side by side and touching, with a gap separating them from the next x-variable category.

5

Variable Association

Two variables are associated if the distributions of conditional relative frequencies differ from group to group.

6

Joint relative frequency

Cell frequency divided by total for entire table
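A minimal Python sketch of the calculation, using a made-up two-way table (the "Junior"/"Senior" rows, the "Cat"/"Dog" columns, and all counts are hypothetical). Each cell count is divided by the grand total, so the joint relative frequencies add up to 1.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = preferred pet.
table = {
    "Junior": {"Cat": 20, "Dog": 30},
    "Senior": {"Cat": 10, "Dog": 40},
}

# Grand total: the sum of every cell in the table.
grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency = cell count / grand total.
joint = {
    (r, c): count / grand_total
    for r, row in table.items()
    for c, count in row.items()
}

print(joint[("Junior", "Dog")])  # 0.3  (30 / 100)
```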

7

Marginal relative frequency

A row total or a column total in a 2-way table divided by the table total
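A similar sketch for marginal relative frequencies, again on a hypothetical table: each row total and each column total is divided by the grand total.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = preferred pet.
table = {
    "Junior": {"Cat": 20, "Dog": 30},
    "Senior": {"Cat": 10, "Dog": 40},
}
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal relative frequency of a row: that row's total / grand total.
row_marginal = {r: sum(row.values()) / grand_total for r, row in table.items()}

# Marginal relative frequency of a column: that column's total / grand total.
columns = next(iter(table.values())).keys()
col_marginal = {
    c: sum(row[c] for row in table.values()) / grand_total for c in columns
}

print(row_marginal)  # {'Junior': 0.5, 'Senior': 0.5}
print(col_marginal)  # {'Cat': 0.3, 'Dog': 0.7}
```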

8

Conditional relative frequency

Relative frequency within one group of the 2-way table: a cell frequency divided by the total of its row (or column), not by the table total
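A sketch of conditional relative frequencies by row, on a hypothetical table: each cell is divided by its own row total. Comparing the resulting distributions across rows is also how association (card 5) is checked.

```python
# Hypothetical two-way table of counts: rows = grade level, columns = preferred pet.
table = {
    "Junior": {"Cat": 20, "Dog": 30},
    "Senior": {"Cat": 10, "Dog": 40},
}

def conditional_by_row(table):
    """For each row, divide each cell by that row's total (not the grand total)."""
    result = {}
    for r, row in table.items():
        row_total = sum(row.values())
        result[r] = {c: count / row_total for c, count in row.items()}
    return result

cond = conditional_by_row(table)
print(cond["Junior"])  # {'Cat': 0.4, 'Dog': 0.6}
print(cond["Senior"])  # {'Cat': 0.2, 'Dog': 0.8}
```

In these made-up counts, 60% of juniors but 80% of seniors prefer dogs. Because the conditional distributions differ, the data would suggest an association between grade level and pet preference.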

9

x-variable name

Explanatory (x can explain y)

10

y-variable name

Response

11

Positive/Negative association

Positive: as x increases, y tends to increase

Negative: as x increases, y tends to decrease

12

Forms

Linear or non-linear

13

Unusual Features

Clusters and apparent outliers

14

Strength: strong/weak

Strong: points closely follow the pattern

Weak: points don't closely follow the pattern, but a pattern is still visible

Strength indicates how good a predictor x is of y

15

Correlation Coefficient - r

Gives the direction and strength of a linear relationship; the closer r is to 1 or -1, the stronger the linear association

On its own, r gives no information about form or unusual features (a scatterplot is needed for that)
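A sketch of the usual formula r = (1/(n - 1)) Σ z_x z_y in Python, assuming sample standard deviations (`statistics.stdev`). The hypothetical data follow an exact curve, y = x², yet r comes out close to 1, which is why r alone cannot establish form.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data that follow a perfectly curved pattern (y = x squared).
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(round(correlation(xs, ys), 3))  # 0.981
```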

16

Causation Fallacies

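Correlation alone doesn't prove causation: a variable that wasn't considered (a lurking variable) may explain the association, or the correlation may be completely coincidental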

17

Linear Regression Model

y-hat = a + bx

y-hat = predicted value of y; a = y-intercept; b = slope
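A minimal sketch of using a fitted model to make a prediction. The intercept 12 and slope 0.5 are hypothetical values chosen only for illustration.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b * x (a = y-intercept, b = slope)."""
    return a + b * x

# Hypothetical model: y-hat = 12 + 0.5x
print(predict(12, 0.5, 10))  # 17.0
```

The slope b is the predicted change in y for each 1-unit increase in x, so here each extra unit of x adds 0.5 to the prediction.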

18

Extrapolation

A prediction made for an x-value outside the interval of the current data. Unreliable, because the current trend is not guaranteed to continue beyond that interval
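One way to flag extrapolation in code, as a sketch: compare the new x-value with the smallest and largest x-values the model was fit on. The `observed_xs` list is hypothetical.

```python
def is_extrapolation(x_new, observed_xs):
    """True if x_new lies outside the range of x-values used to build the model."""
    return x_new < min(observed_xs) or x_new > max(observed_xs)

# Hypothetical x-values the model was fit on.
observed_xs = [2, 4, 5, 7, 9]
print(is_extrapolation(6, observed_xs))   # False: inside the data's interval
print(is_extrapolation(15, observed_xs))  # True: extrapolation
```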

19

Residuals

Residual = observed y - predicted y

Residual = y - y-hat

[positive residual = the model underestimated; negative residual = the model overestimated]
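A sketch of the residual calculation, using a hypothetical model y-hat = 12 + 0.5x and a made-up observed point (10, 20).

```python
def residual(y_observed, y_hat):
    """Residual = observed y - predicted y (y - y-hat)."""
    return y_observed - y_hat

# Hypothetical model y-hat = 12 + 0.5x evaluated at x = 10.
y_hat = 12 + 0.5 * 10   # predicted value: 17.0
res = residual(20, y_hat)
print(res)  # 3.0 -> positive, so the model underestimated this point
```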

20

Residual plot (scatterplot)

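Residual values are plotted on the y-axis (against x or the predicted values)

Good fit = apparent randomness centered at zero with no clear patterns

Bad fit = a clear pattern such as a curve (not random), so a linear model is not appropriate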
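A sketch of where a residual plot's points come from, assuming hypothetical data that follow the curve y = x². It fits the least-squares line and then computes each residual. Plotting each x against its residual would give the residual plot.

```python
from statistics import mean

def fit_lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]   # hypothetical curved data
a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals go positive, then negative, then positive again (a U shape). That is a clear curved pattern, so a linear model is not appropriate for these data.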

21

Least Squares Regression Line (LSRL)

Linear model that minimizes the sum of the squared residuals, Σ(y - y-hat)²
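A sketch that checks the least-squares property numerically on hypothetical data. The line from the standard slope and intercept formulas has a smaller sum of squared residuals than nearby lines. This is a numerical check, not a proof.

```python
from statistics import mean

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares slope and intercept (standard formulas).
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
best = sum_squared_residuals(a, b, xs, ys)
print(round(a, 3), round(b, 3), round(best, 3))  # 2.2 0.6 2.4

# Nudging the intercept or the slope in either direction makes the sum larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(sum_squared_residuals(a + da, b + db, xs, ys) > best)  # True each time
```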

22

LSRL Properties

(x-bar, y-bar) always lies on the LSRL

Slope and correlation: b = r(Sy/Sx) [b = slope; Sy, Sx = standard deviations of y and x]
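A sketch, on hypothetical data, that computes the slope from b = r(Sy/Sx) and the intercept from a = y-bar - b(x-bar), then confirms the line passes through (x-bar, y-bar). The intercept formula is the standard companion to the slope formula; it is not stated on the card.

```python
import math
from statistics import mean, stdev

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation r: sum of z-score products divided by n - 1.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((len(xs) - 1) * s_x * s_y)

b = r * s_y / s_x      # slope from b = r(Sy/Sx)
a = y_bar - b * x_bar  # intercept chosen so the line passes through (x-bar, y-bar)

print(round(r, 3), round(b, 3), round(a, 3))  # 0.775 0.6 2.2
print(math.isclose(a + b * x_bar, y_bar))     # True
```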

23

Coefficient of Determination (r²)

Proportion of the variation in the y-variable that is explained by the linear model with the x-variable (how well the model fits)
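A sketch computing r² as 1 - (sum of squared residuals)/(total sum of squares), a standard identity for the least-squares line, on hypothetical data.

```python
from statistics import mean

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for these data.
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y around y-bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left over in the residuals
r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.6
```

For these made-up data, about 60% of the variation in y is explained by the linear model with x.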

24

r and r² ranges

-1 ≤ r ≤ 1

0 ≤ r² ≤ 1

25

Effect of removing low/high-leverage points

Low: the LSRL slope and y-intercept don't change much

High: can substantially shift the LSRL away from the original

26

High-leverage points

x-values that are unusually large or small compared with x-bar (the mean of x). A point whose x-value is close to x-bar is not high-leverage

27

Low-leverage points

Close to (x-bar, y-bar), and especially close to x-bar

28

Influential points

Points whose removal would substantially change the slope, y-intercept, and/or correlation (outliers, high-leverage points, or points that are both)
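A sketch of checking whether a point is influential. The hypothetical data have a last point (15, 2) whose x-value is far from the others and which does not follow their pattern. The sketch fits the LSRL with and without that point and compares the slopes.

```python
from statistics import mean

def fit_lsrl(xs, ys):
    """Least-squares intercept a and slope b."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

# Hypothetical data: the last point is high-leverage and off the pattern.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 3, 4, 5, 6, 2]

a_all, b_all = fit_lsrl(xs, ys)               # with the point
a_drop, b_drop = fit_lsrl(xs[:-1], ys[:-1])   # without the point
print(round(b_all, 3), round(b_drop, 3))      # -0.077 1.0
```

Removing that single point changes the slope from about -0.077 to 1.0 (the sign even flips), so in these made-up data it is influential.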

29

Log transformations

Taking the log of a variable makes high values less extreme and reduces skew, and it is used to turn a non-linear relationship into a linear one
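A sketch of the straightening effect, assuming hypothetical exponential data y = 3·2^x. The correlation of x with log(y) is essentially 1, because log(y) = log(3) + x·log(2) is linear in x.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of z-score products divided by n - 1 (sample standard deviations)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3 * 2 ** x for x in xs]         # hypothetical exponential growth (curved)
log_ys = [math.log10(y) for y in ys]  # log(y) = log(3) + x*log(2): a straight line in x

print(round(correlation(xs, ys), 3))      # strong, but the pattern is curved
print(round(correlation(xs, log_ys), 3))  # 1.0
```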