correlation coefficient
measurement of the strength and direction of a linear relationship between two variables
interpolation
predicting y values for values that are between the x values in the dataset
extrapolation
predicting y values for x values that are beyond the range of x values in the dataset
coefficient of determination / R²
the proportion of the variation in the response variable that is explained by the regression model
MSE
what can you use as an estimate of the error variance for significance testing?
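t test
tests whether an individual regression coefficient (slope) is significantly different from zero
f test
tests the overall regression model for statistical significance (whether at least one predictor has a nonzero coefficient)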
adjusted R²
proportion of explained variation in the regression model, adjusted for the number of predictor variables in the model
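a minimal sketch of R² and adjusted R² computed by hand, assuming the built-in mtcars data and an illustrative choice of predictors (wt, hp) that is not from the original deck:

```r
# Illustrative model on built-in data; the predictors are an assumption.
reg <- lm(mpg ~ wt + hp, data = mtcars)

sse <- sum(residuals(reg)^2)                    # unexplained variation
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variation
n <- nrow(mtcars)                               # number of observations
k <- 2                                          # number of predictors

r2 <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Both should agree with the values reported by summary()
all.equal(r2, summary(reg)$r.squared)
all.equal(adj_r2, summary(reg)$adj.r.squared)
```

adjusted R² is never larger than R², because the adjustment penalizes each added predictor.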
multicollinearity
when two or more predictor variables have a high linear correlation with each other
VIF > 10
how do you determine serious multicollinearity?
VIF > 4
how do you determine potential multicollinearity?
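a rough sketch of where a VIF comes from, assuming the built-in mtcars data and illustrative predictors (wt, hp, disp). the VIF for one predictor is 1 / (1 - R²), where that R² comes from regressing the predictor on the other predictors:

```r
# Illustrative only: VIF for wt in a model with predictors wt, hp, and disp.
r2_wt <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)

vif_wt
vif_wt > 4    # potential multicollinearity under the rule of thumb
vif_wt > 10   # serious multicollinearity under the rule of thumb
```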
confidence intervals around a mean response
interval estimating the mean response of the population at a given value of X, constructed around the predicted value of Y
prediction intervals for an individual response
measures the accuracy of a single individual's predicted response
larger
is the standard error of a prediction interval for an individual response larger or smaller than the standard error of E[y|x]?
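a small sketch comparing the two intervals with predict(), assuming the built-in mtcars data and a hypothetical new value wt = 3:

```r
reg <- lm(mpg ~ wt, data = mtcars)
new_x <- data.frame(wt = 3)   # hypothetical x value (wt is in 1000 lbs)

# Interval for the mean response E[y|x]
ci_mean <- predict(reg, newdata = new_x, interval = "confidence")
# Interval for a single individual response
pi_ind <- predict(reg, newdata = new_x, interval = "prediction")

ci_width <- ci_mean[1, "upr"] - ci_mean[1, "lwr"]
pi_width <- pi_ind[1, "upr"] - pi_ind[1, "lwr"]
pi_width > ci_width   # the prediction interval is the wider one
```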
analysis of variance
collection of statistical models, and their associated estimation procedures, used to analyze the differences among group means
dummy variables / indicator variables
variables representing a categorical (non-numerical) variable by encoding its levels as 0s and 1s
K - 1 rule
the number of dummy variables needed to represent a categorical variable with K levels
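a quick sketch of the K - 1 rule, assuming the built-in mtcars data, where cyl has K = 3 levels (4, 6, 8):

```r
cyl_f <- factor(mtcars$cyl)    # categorical variable with 3 levels
nlevels(cyl_f)

# model.matrix() shows the columns lm() would build for this factor
X <- model.matrix(~ cyl_f)
colnames(X)   # intercept plus K - 1 = 2 dummy columns; level 4 is the reference
```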
that level is not significantly different from the reference level
what does a high p-value mean when using dummy variables?
mean centering
subtracting each predictor's mean from its values, so the intercept represents the predicted response at the mean of the predictors instead of at predictors = 0
interaction terms
product terms used to test whether a variable has the same or a different effect (slope) across the levels of a dummy variable
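a minimal sketch of an interaction term, assuming the built-in mtcars data with transmission type (am) as the dummy variable. the names df and am_f are hypothetical:

```r
df <- mtcars
df$am_f <- factor(df$am, labels = c("automatic", "manual"))

# wt * am_f expands to wt + am_f + wt:am_f
reg_int <- lm(mpg ~ wt * am_f, data = df)

# The wt:am_fmanual row tests whether the slope of wt differs between groups
summary(reg_int)$coefficients
```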
curvilinear relationship
relationship between variables where the effect is not a straight line, requiring the use of a polynomial term in the regression equation
residuals plot is non-linear/curved
when is there a curvilinear relationship?
simple first order model with one predictor variable
regression modeling a linear relationship with one predictor variable
second order model with one predictor variable
regression modeling a curvilinear relationship with one predictor variable
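a sketch of first order versus second order models in R, assuming the built-in mtcars data with hp as an illustrative predictor:

```r
first_order  <- lm(mpg ~ hp, data = mtcars)             # straight line
second_order <- lm(mpg ~ hp + I(hp^2), data = mtcars)   # adds a squared term

# I() makes ^ mean arithmetic squaring inside a formula.
# The F test below checks whether the squared term improves the fit.
anova(first_order, second_order)
```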
linearity, independence, normality, constant variance
what are the 4 core assumptions of a valid regression model?
independence
assumption that each residual error term is independent of the others
normally distributed, randomly scattered around 0, constant spread from left to right, no obvious patterns
what are the 4 factors considered when checking residuals?
homoscedasticity
residuals display a constant, even band of variance across all fitted values
heteroscedasticity
residuals fan out, funnel, contract, or otherwise show a shape, indicating non-constant error variance
logarithmic transformation, power transformation
what are the two transformations used on the response variable to fix heteroscedasticity?
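a rough sketch of both transformations applied to the response, assuming the built-in mtcars data. the square root is used here as one example of a power transformation:

```r
reg_log  <- lm(log(mpg) ~ wt, data = mtcars)    # logarithmic transformation
reg_sqrt <- lm(sqrt(mpg) ~ wt, data = mtcars)   # power transformation (power = 1/2)

# Re-check the residuals after transforming: look for an even band around 0
plot(fitted(reg_log), residuals(reg_log),
     xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 2)
```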
time ordered, clustered data, repeated measures
what 3 conditions indicate that data points are not independent?
time ordered
observations of independent data should not be sequential over time
clustered data
observations of independent data should not be grouped geographically or organizationally (for example, within the same company)
repeated measures
observations of independent data should not be repeated measurements of the same subject
influential observation / outlier
observation that has a disproportionate effect on the slope and Y-intercept in regression
leverage
how much potential an observation has to influence the model, based on how far its predictor values are from those of the other observations
standardized residuals
residuals scaled by the overall model's variability so you can compare residuals across observations
studentized residuals
residuals scaled using the error estimate excluding that particular observation, which is better for detecting outliers
Cook's distance
measure of how much the regression model will change if you remove one observation
collect more data near that point, use transformations, remove the point
what 3 methods can you use to handle outliers?
D > 4/n
when do you consider removing an influential point from the data? (D is Cook's distance, n is the number of observations)
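a minimal sketch of the influence diagnostics above using base R functions, assuming the built-in mtcars data and an illustrative model:

```r
reg <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)

lev   <- hatvalues(reg)        # leverage of each observation
r_std <- rstandard(reg)        # standardized residuals
r_stu <- rstudent(reg)         # studentized residuals (leave-one-out error estimate)
d     <- cooks.distance(reg)   # Cook's distance

# Observations flagged by the D > 4/n rule of thumb
flagged <- which(d > 4 / n)
flagged
```

a flagged point is a candidate for a closer look, not an automatic deletion.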
reg = lm(y ~ x, data = df)
what is the R syntax for linear regression?
abline(reg, col = "red")
what is the R syntax for adding a regression line to a graph?
res = reg$residuals
what is the R syntax for finding residuals?
str(variable)
what is the R syntax for checking variable types?
variable = factor(variable)
what is the R syntax for creating dummy variables?
qq plot
plot of the residual quantiles against theoretical normal quantiles, telling us whether the residuals are normally distributed. the points should fall along a relatively straight, diagonal line
subtract the variable's mean from each data point
how do you mean center a variable?
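a short sketch of mean centering and its effect on the intercept, assuming the built-in mtcars data. wt_c is a hypothetical name:

```r
# Center the predictor by subtracting its mean from every value
wt_c <- mtcars$wt - mean(mtcars$wt)

reg_raw <- lm(mpg ~ wt, data = mtcars)     # intercept = predicted mpg at wt = 0
reg_c   <- lm(mpg ~ wt_c, data = mtcars)   # intercept = predicted mpg at mean wt

coef(reg_raw)
coef(reg_c)   # same slope as reg_raw; intercept equals mean(mtcars$mpg)
```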
parsimonious model
relatively simple regression models with few predictor variables and relatively high R²
forward selection
model selection method adding one predictor at a time to an intercept only model as long as there is a significant reduction in the residual SSE. variables cannot be removed once added
backward elimination
model selection method removing one predictor at a time from a full model based on P-values. variables cannot be added again once removed.
best subsets regression
model selection method that evaluates every combination of predictors and selects the best model based on a chosen criterion, like BIC or R². gives the lowest error among the candidate models
plot()
what is the R script for generating four diagnostic plots?
vif()
what is the R script for the variance inflation factor? (vif() comes from the car package)
stepwise regression
hybrid model selection method that can both add and remove variables at each step. starts with an intercept only model
heuristics
procedures that add or remove one variable at a time, such as stepwise regression, forward selection, or backward elimination
Mallows' Cp
model selection criterion that compares a reduced model to the full multiple regression model to see if the model has enough predictors without overfitting
lowest Cp
how do you select a model using Mallows' Cp?
akaike information criterion / AIC
model selection criterion that balances model fit and complexity by judging the amount of information lost by a given model while penalizing for the number of predictors. focuses on best fit
Bayesian information criterion / BIC
model selection criterion that evaluates a model's fit while adjusting for the number of predictors, used to prevent overfitting. tends to pick simplest model
lowest BIC
how do you select a model using BIC?
lowest AIC
how do you select a model using AIC?
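a minimal sketch of comparing candidate models by AIC and BIC, assuming the built-in mtcars data. the three candidate models are illustrative only:

```r
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Each call returns one row per model; pick the row with the lowest value
aic_table <- AIC(m1, m2, m3)
bic_table <- BIC(m1, m2, m3)
aic_table
bic_table
```

BIC penalizes each extra predictor more heavily than AIC once the sample has more than about 7 observations, which is why it tends to pick the simpler model.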
step()
what is the R script for stepwise regression?
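a rough sketch of a step() call that starts from an intercept only model, assuming the built-in mtcars data and an illustrative set of candidate predictors. note that step() selects by AIC by default:

```r
null_model <- lm(mpg ~ 1, data = mtcars)   # intercept only starting point

# scope lists the predictors step() may add; direction = "both" lets it add and remove
stepped <- step(null_model,
                scope = ~ wt + hp + qsec,
                direction = "both",
                trace = 0)                 # trace = 0 hides the step-by-step log

formula(stepped)   # the model chosen by the search
```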
regsubsets()
what is the R script for best subsets regression?
qqnorm(reg$residuals); qqline(reg$residuals)
what is the R script for a qq plot of the residuals?