univariate data
one set of data
ex: boxplot, ogive, histogram, timeplot, dotplot, ribbon chart, pie chart
describe w/ SOCS (shape, outliers, center, spread)
bivariate data
two quantitative variables measured on the same individuals
always graph data on scatterplots!
ex: tables, scatterplots, correlation, LSRL
describe w/ FODS (form, outliers, direction, strength)
scatterplot
shows relationship between two QUANTITATIVE variables that were measured on the same individual
has a horizontal & vertical axis
each individual = one point
describe distribution (BIVARIATE)
form, strength, direction, outliers/deviations
IN CONTEXT!!!!
form
general shape of the scatterplot
linear or nonlinear
nonlinear = curved, exponential, etc.; also note clusters or multiple clusters
strength
describes the association between the two variables; how closely related are the two variables
ex: strong, moderate, weak
ALWAYS use the r-value
direction
the type of association; the overall trend the points appear to follow as x increases
can be:
positive - increases in explanatory variable = increases in response variable
negative - increases in explanatory variable = decreases in response variable
none/no - increases in explanatory variable = no consistent change in the response variable
outliers/deviations
any points that don’t fit the overall pattern or have large residuals
judged approximately, by eye
formula for describing distribution
there is a strong/moderate/weak (r = __) positive/negative linear/nonlinear association between (variable x) and (variable y)
correlation coefficient (r)
measure of the direction and strength of the association
only used for LINEAR relationships
does not depend on units of measurement; unchanged if x and y are interchanged
between -1 and 1
sensitive to outliers
does NOT imply causation and does NOT tell you the form
calculate r-value
sum of the products of the z-scores (zx · zy for each point) divided by n − 1; see the sketch below
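A minimal Python sketch of this calculation, using made-up data; it follows the AP formula r = Σ(zx · zy)/(n − 1) with sample standard deviations (divisor n − 1):

```python
# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sx = (sum((xi - mean_x) ** 2 for xi in x) / (n - 1)) ** 0.5
sy = (sum((yi - mean_y) ** 2 for yi in y) / (n - 1)) ** 0.5

# z-score: how many standard deviations a value sits from its mean
zx = [(xi - mean_x) / sx for xi in x]
zy = [(yi - mean_y) / sy for yi in y]

r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 4))  # ~0.9988: strong positive linear association
```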
regression line
summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other
ŷ = a + bx
ŷ = y-hat, the predicted value of the response variable
x = the explanatory variable
a
y-intercept
b
slope, can be calculated w/
r × (Sy/Sx), where Sx and Sy are the standard deviations of x and y
least-squares regression line/LSRL
the line of best fit; the regression line that makes the sum of the squared residuals as small as possible
NOT the same as a regression line; it is a very specific type of regression line
how to find LSRL
ŷ = a + bx
(x̄, ȳ) is always on the line
slope = r × (Sy/Sx)
plug and chug
plug the mean coordinates (x̄, ȳ) into your general equation
solve for a
write out equation and define variables
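A minimal "plug and chug" sketch; the summary statistics (r, Sx, Sy, x̄, ȳ) are made up:

```python
r = 0.9                       # hypothetical correlation
sx, sy = 2.0, 5.0             # hypothetical standard deviations of x and y
mean_x, mean_y = 10.0, 40.0   # hypothetical means; (x-bar, y-bar) is on the LSRL

b = r * (sy / sx)             # slope = r * (Sy / Sx)
a = mean_y - b * mean_x       # solve y-bar = a + b * x-bar for a

print(f"yhat = {a} + {b}x")   # yhat = 17.5 + 2.25x
```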
residuals
the leftovers or prediction errors in the vertical direction; residual = y − ŷ = actual − predicted
positive —> y > ŷ
actual is higher than predicted
negative —> y < ŷ
actual is lower than predicted
none —> y = ŷ
actual is the same as predicted
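A minimal sketch of checking a residual's sign, reusing the hypothetical line ŷ = 17.5 + 2.25x from above:

```python
def predict(x):
    return 17.5 + 2.25 * x    # hypothetical LSRL

y_actual, x_value = 45.0, 11.0
residual = y_actual - predict(x_value)   # y - yhat = actual - predicted
print(residual)  # 2.75 > 0, so the actual value is higher than predicted
```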
slope in context formula
On average, for every increase of 1 (unit of explanatory/x variable), the predicted (response variable) increases/decreases by slope (units of response/y variable)
y-intercept in context formula
When the (explanatory/x variable) is at 0 (unit of explanatory variable), the predicted (response/y variable) is at “a” (unit of response variable)
extrapolation
the use of the regression line for a prediction far outside of the interval of the x-values used to create the line
often not accurate
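One way to guard against this, sketched with the same hypothetical line and a made-up x-range:

```python
x_min, x_max = 5.0, 15.0      # hypothetical range of x-values used to fit the line

def predict(x):
    if not (x_min <= x <= x_max):
        print(f"warning: x = {x} is outside [{x_min}, {x_max}] (extrapolation)")
    return 17.5 + 2.25 * x    # hypothetical LSRL

predict(50.0)  # warns: predictions this far out are often not accurate
```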
residual plot
a scatterplot that displays the residuals on the vertical axis and explanatory variable on the horizontal axis
linear model = appropriate if
no obvious patterns
residuals are relatively small in size
even scatter above and below the residual = 0 line (the horizontal axis)
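A minimal residual-plot sketch (assumes matplotlib is installed; the data and LSRL are the hypothetical ones from the earlier sketches):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
residuals = [yi - (0.14 + 1.96 * xi) for xi, yi in zip(x, y)]  # y - yhat

plt.scatter(x, residuals)     # look for: no pattern, small, even scatter
plt.axhline(0)                # the residual = 0 reference line
plt.xlabel("explanatory variable (x)")
plt.ylabel("residual (y - yhat)")
plt.show()
```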
standard error (Se)
the average size of a residual; measures how far, on average, each actual y-value differs from the predicted value
Se in context formula
On average, the predicted (response variable) differs from the actual (response variable) by about Se (units of response variable)
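A minimal sketch using the same hypothetical data; the usual formula is Se = √(Σ(y − ŷ)² / (n − 2)):

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
yhat = [0.14 + 1.96 * xi for xi in x]   # LSRL for this data

n = len(x)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # sum of squared residuals
se = (sse / (n - 2)) ** 0.5
print(round(se, 4))  # ~0.175: typical prediction miss, in units of y
```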
Total Sum of Squares of Errors (SST) or Sum of Squares Total
the overall measure of variation in the y-values
uses ȳ (y-bar), not ŷ (y-hat)
SST = Σ(y − ȳ)²: sum of the squared differences between each actual value and the average
on a scatterplot —> draw the average ȳ as a horizontal line and find each difference
standard deviation from average (Sy)
measures how far, on average, each value differs from the mean
Sy = square root of variance
Sum of the Squares of Errors (SSE) or Residuals
the amount of variation left over in the residuals; SSE = Σ(y − ŷ)²
uses ŷ (y-hat), not ȳ (y-bar)
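A minimal sketch contrasting SST (built from ȳ) with SSE (built from ŷ), same hypothetical data:

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
ybar = sum(y) / len(y)
yhat = [0.14 + 1.96 * xi for xi in x]   # hypothetical LSRL

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation in y
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # variation left in residuals
print(round(sst, 3), round(sse, 3))  # ~38.508 vs ~0.092: the line helps a lot
```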
R-squared/Coefficient of Determination
measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, rather than the mean value of y
the percent of the variability in the response variable that is accounted for by the LSRL
can also be found by squaring the correlation coefficient (R² = r²)
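A minimal sketch computing R² both ways, using the hypothetical SST, SSE, and r from the earlier sketches:

```python
sst, sse = 38.508, 0.092   # hypothetical values from above
r = 0.9988                 # hypothetical correlation from above

r2_from_sums = (sst - sse) / sst   # percent reduction in squared error vs y-bar
r2_from_r = r ** 2                 # correlation coefficient squared
print(round(r2_from_sums, 4), round(r2_from_r, 4))  # both ~0.9976
```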
r-squared in context formula
the amount of variation that has been explained/accounted for by the linear relationship between (response and explanatory variables) is __%
learn how to read computer regression output
be able to find
the slope b
the y intercept a
the value of s (the standard error, Se)
the value of r²
is the linear model appropriate
residual plot
r²
sum of squared residuals
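A sketch of how a typical output table maps to these quantities; the layout mimics common regression output, and the numbers are the hypothetical ones computed in the earlier sketches:

```python
output = """
Predictor    Coef     SE Coef
Constant     0.140    0.184     <- a, the y-intercept
x            1.960    0.055     <- b, the slope
S = 0.1751   R-Sq = 99.8%       <- S is Se; R-Sq is r-squared
"""
print(output)
# yhat = 0.140 + 1.960x
# r = sqrt(0.998), with the same sign as the slope (positive here)
```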
high leverage
points that are extreme in the x direction
influential
points that, if removed, substantially change the regression line
changes in the slope, y-intercept, correlation, or coefficient of determination, or increases in the standard error (Se)
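A minimal sketch to test influence: refit the line with and without a high-leverage point (all data made up):

```python
def lsrl(x, y):
    """Least-squares slope and intercept from raw data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b   # (a, b)

x = [1, 2, 3, 4, 12]          # x = 12 is extreme in the x direction (high leverage)
y = [2.0, 4.1, 5.9, 8.2, 9.0]

print(lsrl(x, y))             # with the point:    a ~ 3.55, b ~ 0.52
print(lsrl(x[:-1], y[:-1]))   # without the point: a ~ -0.05, b ~ 2.04 -> influential
```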