bivariate
two variables
what are we interested in when exploring bivariate data
whether or not (and how) changes in one variable allow us to predict changes in the other variable
scatterplot
2-dimensional graph of ordered pairs
what do we say about positively associated variables
higher than average values of one variable TEND TO BE PAIRED WITH higher than average values of the other variable
correlation coefficient (r) is a measure of what?
the strength of the LINEAR relationship between 2 variables, as well as the direction of this relationship (+ or -)
if abs(r) is equal to 1…
all points lie on a line
if abs(r) is greater than .8…
the LINEAR correlation is generally regarded to be strong
if abs(r) is between .5 & .8…
the LINEAR correlation is generally regarded to be moderate
if abs(r) is less than .5…
the LINEAR correlation is generally regarded to be weak
if r = 0, there is…
no LINEAR correlation (may be nonlinear correlation)
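a quick sketch of the r cards above, in plain Python. The helper names (pearson_r, describe_strength) and the data are made up for illustration; the cutoffs are the rule-of-thumb ones from these cards, not a universal standard.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Label abs(r) using the rule-of-thumb cutoffs from the cards."""
    size = abs(r)
    if size > 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"

# Made-up data: points exactly on a line, and a symmetric parabola.
x = [-2, -1, 0, 1, 2]
r_line = pearson_r(x, [2 * v + 1 for v in x])   # all points lie on a line
r_curve = pearson_r(x, [v ** 2 for v in x])     # clear pattern, but not linear

print(r_line, describe_strength(r_line))
print(r_curve, describe_strength(r_curve))
```

the parabola shows why r = 0 only rules out a LINEAR relationship: y is completely determined by x here, yet the linear correlation vanishes.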
if you swap which variable is explanatory and which is the response, what effect will this change have on r?
none!
if you change the units of measurement of one of your variables (e.g. ft to yd), what effect will this change have on r?
none!
is r resistant to extreme values? why/why not?
no; r is based on the MEAN (and standard deviation) & is affected by extreme values
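the three properties above (swapping variables, changing units, sensitivity to extreme values) can be checked numerically. A minimal sketch with made-up data; pearson_r is a hypothetical helper written out here so the block runs on its own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: lengths in feet and some response y.
feet = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(feet, y)
r_swapped = pearson_r(y, feet)        # swap the roles of the two variables
yards = [f / 3 for f in feet]         # change units: feet -> yards
r_yards = pearson_r(yards, y)

# Add one extreme point that breaks the pattern and recompute r.
r_with_outlier = pearson_r(feet + [6.0], y + [0.0])

print(r_original, r_swapped, r_yards, r_with_outlier)
```

swapping and rescaling leave r unchanged (up to floating-point rounding), while the single added point pulls r down sharply in this example.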
least squares regression line does what?
minimizes the sum of squared errors/distances of points from our line
residual
difference between the observed values of the response variable (y) and the predicted values (ŷ) from the model
a negative residual means that ŷ was too…
large
a positive residual means that ŷ was too…
small
a pattern of residuals that doesn’t appear to be randomly distributed about 0 indicates…
a regression line that isn’t a good model of our data
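a small worked example of the least squares line and its residuals. The data and the helper name fit_line are assumptions for illustration; the slope and intercept come from the standard least squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]
# residual = observed y minus predicted y-hat
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
sse = sum(e ** 2 for e in residuals)

# Nudging the slope away from the least squares value increases the SSE.
sse_other = sum((y - (a + (b + 0.1) * x)) ** 2 for x, y in zip(xs, ys))

print(a, b)
print(residuals)
print(sse, sse_other)
```

the first residual is negative because ŷ overshoots the observed y (ŷ too large); the third is positive because ŷ undershoots (ŷ too small). The residuals of a least squares line with an intercept sum to zero, so it is the pattern around 0 that matters, not the total.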
interpolation
trying to predict a value of y from a value of x which is WITHIN the range of x-values we have (your traditional prediction, think internal)
extrapolation
trying to predict a value of y from a value of x which ISN’T within the range of x-values we have (ill-advised; we can rarely be confident in the result; think external)
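the distinction comes down to a range check on x. A tiny sketch; classify_prediction is a hypothetical helper and the x-values are made up.

```python
def classify_prediction(x_new, xs):
    """Say whether predicting at x_new interpolates or extrapolates the observed xs."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

observed_x = [1, 2, 3, 4, 5]
inside = classify_prediction(3.5, observed_x)   # within the range 1..5
outside = classify_prediction(12, observed_x)   # beyond the largest observed x

print(inside, outside)
```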
coefficient of determination
r^2; the proportion of the total variability in y which is explained by the regression of y on x
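one way to see the "proportion explained" reading: compute r^2 as 1 - SSE/SST and compare it with the square of r. A sketch on made-up data; the variable names are illustrative only.

```python
import math

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
sst = sum((y - my) ** 2 for y in ys)      # total variability in y

r = sxy / math.sqrt(sxx * sst)            # correlation coefficient

slope = sxy / sxx                         # least squares line
intercept = my - slope * mx
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst                 # share of variability explained

print(r ** 2, r_squared)
```

here both routes give 0.6 (up to rounding), i.e. the regression of y on x accounts for about 60% of the variability in y for this data.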
outlier when dealing with bivariate data…
lies outside the general pattern of data (for regression, datapoint has a LARGE RESIDUAL)
influential observation
an observation that has a strong influence on the regression model; most influential points tend to be extreme in the x direction (high LEVERAGE points)
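a sketch of how one high-leverage point can change the fitted line. The data is made up (five points near y = x plus one point far out in x), and fit_line is a hypothetical helper using the standard least squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Five made-up points close to the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
slope_without = fit_line(xs, ys)[1]

# Add one point that is extreme in the x direction and off the pattern.
xs_all = xs + [20]
ys_all = ys + [5.0]
intercept_with, slope_with = fit_line(xs_all, ys_all)

# Residuals under the refitted line.
residuals = [y - (intercept_with + slope_with * x) for x, y in zip(xs_all, ys_all)]
residual_of_extreme = residuals[-1]
largest_other = max(abs(e) for e in residuals[:-1])

print(slope_without, slope_with)
print(residual_of_extreme, largest_other)
```

in this example the extreme point drags the slope from roughly 0.97 down to roughly 0.15, yet its own residual is smaller than that of some ordinary points, because the line has been pulled toward it. So an influential point need not show up as a large-residual outlier.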