Flashcards covering outliers, correlation, regression, and linear models.
Outliers
Points that lie far from the typical variation in your data; can be identified as values more than 1.5 × the interquartile range (IQR) below Q1 or above Q3, by a z-score cutoff (beyond ±2.5, or more conservatively ±3), or as impossible values.
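The two numeric rules on this card can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names, the default cutoffs, and the toy data are assumptions made for the example, and the quartiles come from the standard library's default ("exclusive") method, which can differ slightly from other software.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def z_outliers(values, cutoff=2.5):
    """Values whose z-score exceeds the cutoff in absolute value (hypothetical helper)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > cutoff]

# Made-up data: eleven typical values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print(iqr_outliers(data))
print(z_outliers(data))
```

Note that in small samples a single extreme point inflates the standard deviation, so the z-score rule can miss outliers that the IQR rule catches.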
Correlation
A standardized measure of covariation: the covariance divided by the product of the two standard deviations, which removes the units. It ranges from -1 to 1; the sample statistic r estimates the population parameter ρ (rho).
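To see what "gets around the unit issue" means, the sketch below computes covariance and correlation for the same made-up heights expressed in centimetres and in metres. The function names and the data are assumptions for illustration only.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson r: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Made-up data: the same heights in two different units.
height_cm = [150, 160, 170, 180, 190]
height_m = [h / 100 for h in height_cm]
weight_kg = [52, 58, 69, 77, 90]

cov_cm = covariance(height_cm, weight_kg)
cov_m = covariance(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)

print(cov_cm, cov_m)  # covariance changes with the unit of measurement
print(r_cm, r_m)      # correlation does not
```

Changing the unit rescales the covariance by a factor of 100 here, while r stays the same.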
Correlation
Tells us how closely our data follow a straight trendline, and in which direction.
P-value
Tells us how likely it is that our data would lie at least this close to a trendline by random chance alone, if there were truly no relationship (ρ = 0).
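One way to make "by random chance" concrete is a permutation test: shuffle Y many times, which breaks any real link to X, and count how often the shuffled data produce a correlation at least as strong as the observed one. This is a sketch of that technique, not the t-based test that statistical software usually reports for r; the function names, the number of shuffles, the seed, and the data are all assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the correlation (hypothetical helper)."""
    rng = random.Random(seed)
    r_observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # simulate "no relationship"
        if abs(pearson_r(x, y_shuffled)) >= r_observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up data that follow a nearly straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.0]

p = permutation_p_value(x, y)
print(p)
```

A small p-value here means that shuffled data almost never line up as well as the real data do.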
Correlation - When to Use
Summarizes the strength and direction of the linear relationship between two variables.
Regression - When to Use
Predicts or explains a numeric response variable from one or more predictors.
Correlation - Able to quantify the direction of the relationship?
Yes
Regression - Able to quantify the direction of the relationship?
Yes
Correlation - Able to quantify the strength of the relationship?
Yes
Regression - Able to quantify the strength of the relationship?
Yes
Correlation - Able to show cause and effect?
No
Regression - Able to show cause and effect?
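Only with a suitable study design (for example, a controlled experiment); fitting a regression does not by itself establish cause and effect.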
Correlation - Able to predict and optimize?
No
Regression - Able to predict and optimize?
Yes
Correlation - X and Y are interchangeable?
Yes
Regression - X and Y are interchangeable?
No
Correlation - Uses a mathematical equation?
No
Regression - Uses a mathematical equation?
Yes: y = a + bx, where a is the intercept and b is the slope.
General Linear Models
Models how a dependent variable (Y) changes as a linear function of an independent variable (X).
Slope
The “m” in the equation Y = mX + b; the change in Y for a one-unit increase in X.
Y intercept
The “b” in the equation Y = mX + b; the value of Y when X = 0.
Ordinary Least Squares
Estimates β1 (slope) and β0 (y-intercept) by minimizing the residual sum of squares; the estimates are b1 = cov(X,Y) / var(X) and b0 = Ȳ - b1 · X̄.
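The slope formula on this card translates directly into code. The sketch below also computes the intercept with the standard companion formula b0 = Ȳ - b1 · X̄; the function name and the toy data, which lie exactly on the line y = 1 + 2x, are assumptions for illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1) for simple linear regression by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    b1 = cov_xy / var_x        # slope: cov(X, Y) / var(X)
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean X, mean Y)
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0, b1 = ols_fit(x, y)
print(b0, b1)
```

Because the points fall exactly on a line, the fit recovers an intercept of 1 and a slope of 2, up to floating-point rounding.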
Fitted values
Predicted values; calculated as Ŷi = b1 · Xi + b0
Residual sum of squares SS Error
The sum of squared deviations of the observed values from the fitted values; calculated as ∑(Yi - Ŷi)^2
Model sum of squares SS Regression
The sum of squared deviations of the fitted values from the mean of Y; calculated as ∑(Ŷi - Ȳ)^2
Total sum of squares SS Total
The sum of squared deviations of the observed values from the mean of Y; calculated as ∑(Yi - Ȳ)^2
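The three sums of squares are linked: for a least-squares line with an intercept, SS Total = SS Regression + SS Error. The sketch below checks this on a small made-up data set; the variable names and data are assumptions for illustration.

```python
# Made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
      / sum((a - mean_x) ** 2 for a in x))
b0 = mean_y - b1 * mean_x

# Fitted values: Y-hat_i = b1 * X_i + b0
y_hat = [b1 * a + b0 for a in x]

ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # observed vs fitted
ss_regression = sum((fi - mean_y) ** 2 for fi in y_hat)     # fitted vs mean
ss_total = sum((yi - mean_y) ** 2 for yi in y)              # observed vs mean

print(ss_error, ss_regression, ss_total)
```

For these data the values are approximately 2.4, 3.6, and 6.0, so the error and regression parts add up to the total.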
Coefficient of determination R^2
Measures the proportion of variance in the dependent variable that can be predicted from the independent variable(s); calculated as R^2 = ∑(Ŷi - Ȳ)^2 / ∑(Yi - Ȳ)^2 or R^2 = 1 - ∑(Yi - Ŷi)^2 / ∑(Yi - Ȳ)^2
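The two formulas for R^2 give the same number when the fitted values come from a least-squares line with an intercept. The sketch below checks this; the function names are assumptions, and the fitted values are hard-coded as the least-squares fit (intercept 2.2, slope 0.6) to the made-up data.

```python
def r_squared_from_regression(y, y_hat):
    """R^2 = SS Regression / SS Total."""
    mean_y = sum(y) / len(y)
    ss_regression = sum((f - mean_y) ** 2 for f in y_hat)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return ss_regression / ss_total

def r_squared_from_error(y, y_hat):
    """R^2 = 1 - SS Error / SS Total."""
    mean_y = sum(y) / len(y)
    ss_error = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1 - ss_error / ss_total

# Made-up observations and their least-squares fitted values
# (assumed fit: Y-hat = 2.2 + 0.6 * X for X = 1..5).
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2_a = r_squared_from_regression(y, y_hat)
r2_b = r_squared_from_error(y, y_hat)
print(r2_a, r2_b)
```

Both come out at about 0.6, meaning the line accounts for roughly 60% of the variation in Y. With a single predictor, R^2 also equals the square of the correlation r.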