Covariance: measure of how two random variables vary together; its sign gives the direction of their linear relationship
Scatter plot: used to determine whether a linear correlation exists between two variables
Covariance formula: captures degree to which pairs of points systematically vary around their respective means
High positive covariance: occurs when paired x and y values tend to be on the same side of their respective means at the same time; when one is above its mean the other tends to be above too, and when one is below the other tends to be below
High negative covariance: occurs when paired x and y values tend to be on opposite sides of their respective means; when one is above its mean the other tends to be below
Zero covariance: no systematic linear tendency of any sort between paired x and y values
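The three covariance cases above can be checked with a small calculation. This is a minimal sketch using the sample covariance (dividing by n - 1), which is an assumption since the notes do not say whether the sample or population form is meant; the function name and the data are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data (hypothetical, not from the notes)
x = [1, 2, 3, 4, 5]
y_same_side = [2, 4, 6, 8, 10]   # above/below its mean together with x
y_opposite = [10, 8, 6, 4, 2]    # on the opposite side of its mean from x
y_no_trend = [2, 5, 1, 5, 2]     # deviations cancel out

print(sample_covariance(x, y_same_side))  # positive
print(sample_covariance(x, y_opposite))   # negative
print(sample_covariance(x, y_no_trend))   # zero
```

The size of a covariance depends on the units of x and y, so only the sign is easy to interpret directly.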
Pearson’s correlation coefficient (r): measure of the strength and direction of a linear relationship between two variables, ranging from -1 to 1; -1 represents a perfect negative linear correlation and 1 a perfect positive linear correlation; r is close to zero when there is no linear correlation
Caveats of correlation: only works with interval or ratio data; only demonstrates linear relationship; correlation doesn’t imply causation
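The "only demonstrates linear relationship" caveat can be seen numerically. The sketch below computes r from its standard definition (covariation divided by the product of the spreads). The helper name and the data are hypothetical. In the second data set y is fully determined by x (y = x squared), yet r comes out as zero because the relationship is not linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of paired deviation products over the root of the squared-deviation sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfect positive linear relationship: r is 1
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfect but nonlinear relationship (y = x**2 on symmetric x): r is 0
xs = [-2, -1, 0, 1, 2]
r_curved = pearson_r(xs, [x ** 2 for x in xs])

print(round(r_linear, 6), round(r_curved, 6))
```

A scatter plot of the second data set would show the curve immediately, which is why plotting usually comes before computing r.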
Regression: quantifies the relationship between variables where a direction of influence is assumed (the model by itself does not prove causation); explanatory variables are termed independent and explained variables are termed dependent
Regression models: estimate the nature of the relationship between independent and dependent variables; in simple linear regression the relationship is expressed as a linear function in slope-intercept form, y = a + bx, where a is the intercept and b is the slope
Least squares regression: mathematical procedure for finding the best-fitting line or curve to a given set of points by minimizing the sum of the squared differences between actual y values and predicted y values
Residual: difference between the observed and predicted values for y; residual = observed y - predicted y
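A simple linear least squares fit and its residuals can be sketched as follows. This assumes one independent variable and uses the standard closed-form solution (slope = covariation of x and y divided by variation of x); the function name and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)          # about 0.6 and 2.2 for this data
predicted = [intercept + slope * xi for xi in x]

# residual = observed y - predicted y
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(round(slope, 3), round(intercept, 3))
print([round(r, 3) for r in residuals])
```

With an intercept in the model the residuals sum to zero (up to rounding), so the fit is judged on the sum of their squares, not on their plain sum.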
R2: statistical measure of how close the data are to the fitted regression line; percentage of the response variable variation explained by the linear model; between 0 and 100% for a least squares fit that includes an intercept; a higher R2 means the fitted line lies closer to the data
R2 formula: explained variation / total variation
0% R2: indicates that model explains none of the variability of the response data around its mean
100% R2: model explains all the variability of the response data
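The R2 formula above (explained variation / total variation) can be worked through on small data sets. This is a sketch for simple linear regression with an intercept; the function name and data are hypothetical, and it assumes the y values are not all identical (otherwise total variation is zero).

```python
def r_squared(xs, ys):
    """R2 of a least squares line: explained variation / total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    explained = sum((p - mean_y) ** 2 for p in predicted)  # variation captured by the line
    total = sum((y - mean_y) ** 2 for y in ys)             # variation of y around its mean
    return explained / total

r2_partial = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # about 0.6, i.e. 60%
r2_perfect = r_squared([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # 100%: every point on the line
r2_none = r_squared([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # 0%: the best line is flat

print(round(r2_partial, 3), round(r2_perfect, 3), round(r2_none, 3))
```

The last case shows that 0% R2 means the line explains nothing, not that x and y are unrelated: y is exactly x squared there.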
Assumptions of regression: residuals (errors) should be approximately normally distributed; predictors should not be strongly correlated with each other; observations should be independent of each other; as a rule of thumb, at least 10 observations per independent variable
Difference between correlation and regression: regression assumes a direction of dependence (it does not by itself establish causality) and produces an entire equation that can be used for prediction, while correlation is a single statistic
Spatial regression: attempts to account for variation across a landscape with one equation
Geographically weighted regression: coefficients are allowed to vary between areas across the landscape