Notes from every Collegeboard daily video turned into flashcards
Segmented bar chart characteristics
Bars separated by spaces, each scaled by relative frequency (each bar totals 100%). Each bar is divided into segments identified by a key.
Mosaic Plot
Same as a segmented bar chart, but the bars touch and the width of each bar is proportional to the number of observations in that category.
Scatterplot
Represents 2 quantitative variables. The scale is shown with tick marks, and there must be a title.
Side-by-side bar graph
Values are relative frequencies (percents calculated from a 2-way table). The bars for the categories within one x-variable group are drawn side by side and touching, with a gap separating them from the next group.
Variable Association
Two variables are associated if the distributions of conditional relative frequencies differ between groups.
Joint relative frequency
Cell frequency divided by total for entire table
Marginal relative frequency
A row total or column total in a 2-way table divided by the table total
Conditional relative frequency
A cell frequency divided by its row or column total (the relative frequency within one specific group of the 2-way table)
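A small sketch of the three kinds of relative frequency, plus the association check from the Variable Association card. The 2-way table (grade level vs. preferred snack) and all counts are hypothetical, made up only for illustration.

```python
# Hypothetical 2-way table: rows = grade level, columns = preferred snack.
table = {
    "Freshman": {"Fruit": 30, "Chips": 20},
    "Senior": {"Fruit": 10, "Chips": 40},
}

col_names = ["Fruit", "Chips"]
row_totals = {r: sum(table[r].values()) for r in table}
col_totals = {c: sum(table[r][c] for r in table) for c in col_names}
grand_total = sum(row_totals.values())

# Joint relative frequency: one cell count / table total
joint = {(r, c): table[r][c] / grand_total for r in table for c in col_names}

# Marginal relative frequency: a row (or column) total / table total
marginal_rows = {r: row_totals[r] / grand_total for r in table}
marginal_cols = {c: col_totals[c] / grand_total for c in col_names}

# Conditional relative frequency: cell count / its row total
# (the distribution of snack choice within each grade level)
conditional = {r: {c: table[r][c] / row_totals[r] for c in col_names} for r in table}

# Association: do the conditional distributions differ between groups?
# (Exact inequality is a toy check; in practice, judge whether the
# difference is meaningful.)
distributions_differ = conditional["Freshman"] != conditional["Senior"]

print(joint[("Freshman", "Fruit")])  # 0.3
print(marginal_rows["Senior"])       # 0.5
print(conditional["Senior"])         # {'Fruit': 0.2, 'Chips': 0.8}
print(distributions_differ)          # True
```

Here 60% of the hypothetical freshmen prefer fruit but only 20% of the seniors do, so these made-up data would suggest an association between grade level and snack choice.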
x-variable name
Explanatory (x can explain y)
y-variable name
Response
Positive/Negative association
Positive: as x increases, y tends to increase
Negative: as x increases, y tends to decrease
Forms
Linear or non-linear
Unusual Features
Clusters and apparent outliers
Strong/Weak strength
Strong: Data closely follows pattern
Weak: data doesn’t closely follow the pattern, but a pattern is still visible
Strength indicates how well x predicts y
Correlation Coefficient - r
Gives the direction and strength of a linear relationship; values closer to 1 or -1 indicate a stronger correlation
On its own, r gives no information about form or unusual features (a scatterplot is needed for that)
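One common way to compute r is as the average product of z-scores, r = (1/(n - 1)) Σ z_x·z_y. The sketch below uses that formula on a small hypothetical data set (hours studied vs. quiz score). The helper names are illustrative, not from any particular library.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation: divide by n - 1
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    # r = (1 / (n - 1)) * sum(z_x * z_y), using the z-score of each value
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
scores = [60, 65, 71, 74, 80]
r = correlation(hours, scores)
print(round(r, 3))  # 0.996: strong, positive linear association
```

The value alone still says nothing about curvature or outliers, which is why the scatterplot is checked as well.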
Causation Fallacies
Correlation does not imply causation: a lurking variable that wasn’t considered may affect both variables, or the correlation may be completely coincidental
Linear Regression Model
y-hat = a + bx
y-hat = predicted value of y, a = y-intercept, b = slope
Extrapolation
A prediction made outside the interval of x-values in the current data. Unreliable, because the current trend is not guaranteed to continue.
Residuals
Residual = observed y - predicted y
Residual = y - y-hat
[positive residual = the model underestimated; negative residual = the model overestimated]
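A minimal sketch of computing residuals and reading their signs. The fitted line (a = 55.3, b = 4.9) and the (x, y) pairs are hypothetical values chosen for illustration.

```python
# Hypothetical fitted line y-hat = a + b*x (a and b are illustrative values)
a, b = 55.3, 4.9


def predict(x):
    return a + b * x


data = [(1, 60), (2, 65), (3, 71), (4, 74), (5, 80)]  # hypothetical (x, y) pairs
residuals = [y - predict(x) for x, y in data]  # residual = y - y-hat

for (x, y), res in zip(data, residuals):
    if res > 0:
        verdict = "model underestimated"
    elif res < 0:
        verdict = "model overestimated"
    else:
        verdict = "exact"
    print(f"x={x}: y={y}, y-hat={predict(x):.1f}, residual={res:+.1f} ({verdict})")
```

For x = 3 the line predicts 70.0 but the observed value is 71, so the residual is +1.0 and the model underestimated.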
Residual plot (scatterplot)
x-axis plots the explanatory variable (or predicted values); y-axis plots residual values
Good fit = apparent randomness centered at zero with no clear patterns
Bad fit = curved pattern (not random)
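To see why a curved residual pattern signals a bad fit, the sketch below fits a line to hypothetical data that actually follow y = x² and prints the sign of each residual in x order, a text stand-in for the residual plot.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical curved data: y = x squared
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Text version of the residual plot: sign of each residual in x order
signs = "".join("+" if res > 0 else "-" for res in residuals)
print(signs)  # +----+ : positive, negative, then positive again (a curve)
```

The residuals still sum to zero, but their systematic up-down-up pattern (rather than random scatter around zero) shows the linear model misses the curvature.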
Least Squares Regression Line (LSRL)
Linear model that minimizes the sum of the squared residuals
LSRL Properties
The LSRL always passes through (x-bar, y-bar), so the y-intercept is a = y-bar - b(x-bar)
Slope and correlation: b = r(s_y/s_x) [b = slope; s_y and s_x = standard deviations of y and x]
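A sketch of building the LSRL from summary statistics, b = r(s_y/s_x) and a = y-bar - b(x-bar), using the same hypothetical hours/score data as above. As a sanity check, it nudges the slope and intercept and confirms the sum of squared residuals only goes up.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def sample_sd(v):
    m = mean(v)
    return sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))


def correlation(x, y):
    mx, my, sx, sy = mean(x), mean(y), sample_sd(x), sample_sd(y)
    return sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]       # hypothetical hours studied
y = [60, 65, 71, 74, 80]  # hypothetical quiz scores

r = correlation(x, y)
b = r * sample_sd(y) / sample_sd(x)  # slope: b = r(s_y / s_x)
a = mean(y) - b * mean(x)            # line passes through (x-bar, y-bar)


def sum_sq_residuals(a, b):
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


best = sum_sq_residuals(a, b)
# Moving the intercept or slope away from the LSRL should increase the sum
nudges = [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1)]
is_minimum = all(sum_sq_residuals(a + da, b + db) > best for da, db in nudges)

print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 55.3 + 4.9x
print(is_minimum)                      # True
```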
Coefficient of Determination (r²)
Proportion of variation in the y-variable that is explained by the x-variable (how good the model is)
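One way to see r² as a proportion of variation: compare the total variation in y (squared deviations from y-bar) with the variation the line leaves unexplained (squared residuals). For an LSRL, 1 - (unexplained / total) equals the square of r. The data below are hypothetical.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]       # hypothetical explanatory values
y = [60, 65, 71, 74, 80]  # hypothetical response values

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

b = sxy / sxx
a = my - b * mx

total_ss = syy  # total variation in y around y-bar
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
r_squared = 1 - residual_ss / total_ss

r = sxy / sqrt(sxx * syy)
print(round(r_squared, 4), round(r ** 2, 4))  # 0.9921 0.9921
```

Read as: about 99.2% of the variation in these hypothetical scores is explained by the linear model with hours studied.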
r and r² ranges
-1 ≤ r ≤ 1
0 ≤ r² ≤ 1
Effect of removing low/high-leverage points
Low: LSRL slope/y-intercept doesn’t change much
High: Can cause a substantial shift from the original LSRL
High-leverage points
Points with x-values unusually far above or below x-bar (the mean of x). A point close to x-bar is not high-leverage.
Low-leverage points
Close to (x-bar, y-bar), and especially close to x-bar
Influential points
Points whose removal would substantially change the slope, y-intercept, and/or correlation (they can be outliers, high-leverage points, or both)
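A sketch contrasting a low-leverage and a high-leverage point. The base data (roughly y = 2x) and both added points are hypothetical; the line is refit with each extra point so the slopes can be compared.

```python
def fit_line(points):
    """Least-squares slope b and intercept a for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return b, my - b * mx


base = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # hypothetical
low_leverage = (3, 8.0)     # x equals x-bar (3), so it cannot change the slope
high_leverage = (20, 10.0)  # x far from x-bar and well off the trend

cases = [
    ("base", base),
    ("with low-leverage point", base + [low_leverage]),
    ("with high-leverage point", base + [high_leverage]),
]
for label, pts in cases:
    b, a = fit_line(pts)
    print(f"{label}: slope = {b:.2f}, intercept = {a:.2f}")
```

With these made-up numbers, the low-leverage point leaves the slope at about 1.99 and only nudges the intercept, while the high-leverage point drags the slope down to about 0.30, so it is influential.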
Log transformations
Taking the log of a variable makes large values less extreme and reduces right skew. Applying it to x, y, or both can turn a non-linear pattern into a linear one so a linear model fits.
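A sketch of straightening exponential data with a log transformation. The data are hypothetical (y = 3·2^x). Taking the natural log of y gives log(y) = log(3) + x·log(2), which is exactly linear in x.

```python
from math import log, sqrt


def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = list(range(7))
y = [3 * 2 ** xi for xi in x]  # hypothetical exponential growth: y = 3 * 2^x
log_y = [log(v) for v in y]    # natural log of the response

print(round(correlation(x, y), 3))      # noticeably below 1: the raw pattern curves
print(round(correlation(x, log_y), 3))  # 1.0: the transformed data are exactly linear
```

Real data would not land exactly on a line after the transformation; the point is only that the log pulls in the large values so the relationship becomes close to linear.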