regression analysis
simple method for investigating functional relationships among variables
the relationship is expressed in the form of
an equation or a model connecting the response or dependent variable and one or more explanatory or predictor variables
response variable denotation
Y
predictor variable denotation
X1, X2, ..., Xp
other names for independent variables
covariates, regressors, factors, carriers
how does regression analysis usually start?
formulation of a problem
what happens if a question is not carefully formulated?
it can lead to the wrong choice of model
what to do after forming a question?
select variables that could explain or predict the response variable
what to do after selecting variables that could predict the response variable?
collect the data from the environment
the analysis of variance
if all predictor variables are qualitative
analysis of covariance
if some predictor variables are qualitative and others are quantitative
types of functional form
linear and nonlinear
linear function
all parameters enter the function linearly
nonlinear function
a function that is not linear in its parameters; it is either linearizable or intrinsically nonlinear
linearizable functions
nonlinear functions that can be transformed into linear functions
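A standard illustration (an example chosen here, not taken from the source) is the exponential model $Y = a e^{bX}$: taking natural logs gives $\ln Y = \ln a + bX$, which is linear in the parameters $\ln a$ and $b$. The Python sketch below assumes noise-free illustrative data and fits an ordinary least squares line on the log scale; `fit_line` and `fit_exponential` are hypothetical helper names.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for ys = b0 + b1 * xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fit_exponential(xs, ys):
    """Fit Y = a * exp(b * X) by regressing ln(Y) on X (requires every Y > 0)."""
    log_ys = [math.log(y) for y in ys]
    ln_a, b = fit_line(xs, log_ys)
    return math.exp(ln_a), b

# Illustrative noise-free data generated from a = 2, b = 0.5
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(round(a_hat, 6), round(b_hat, 6))  # 2.0 0.5
```

An intrinsically nonlinear function has no such transformation, so its parameters must be estimated with an iterative nonlinear fitting method instead.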
intrinsically nonlinear functions
nonlinear functions that are not linearizable
simple regression equation
regression equation containing only one predictor variable
multiple regression equation
equation containing more than one predictor variable
univariate regression
one quantitative response variable
multivariate regression
two or more quantitative response variables
simple regression
only one predictor variable
multiple regression
two or more predictor variables
linear regression
all parameters enter the equation linearly
nonlinear regression
relationship between response and predictors is nonlinear
analysis of variance
all predictors are qualitative
analysis of covariance
some predictors are quantitative and others are qualitative
logistic regression
response variable is qualitative
simple and multiple regressions should not be confused with
univariate and multivariate regressions
what to do after the model has been defined and the data have been collected?
estimate parameters of the model based on collected data
Y hat
fitted value
regression equation
Y = a + bX
Y = a + bX symbols
Y = dependent variable
X = independent variable
a = intercept
b = slope
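A tiny sketch of evaluating $Y = a + bX$ in Python, with hypothetical values $a = 3$ and $b = 2$ chosen only for illustration (`predict` is a hypothetical helper name):

```python
def predict(a: float, b: float, x: float) -> float:
    """Evaluate the regression equation Y = a + bX."""
    return a + b * x

# Hypothetical intercept and slope, for illustration only
a, b = 3, 2
print(predict(a, b, 0))                     # 3: at X = 0, Y equals the intercept a
print(predict(a, b, 5) - predict(a, b, 4))  # 2: a one-unit increase in X changes Y by the slope b
print(predict(a, b, 4))                     # 11
```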
inputs in regression
subject matter theories, model, data, statistical techniques, auxiliary assumptions
outputs in regression
parameter estimates, confidence regions, test statistics, graphical displays
objective of regression analysis
understand interrelationship between variables
process of regression
formulate the problem, fit the model, and validate the assumptions. Are the assumptions satisfied? If yes, evaluate the fitted model; if no, go back to the start. Is the fitted model adequate? If yes, you’re done; if no, go back to the start.
formulate the problem
choose a set of variables, choose form of model, choose method of fitting, and specify assumptions
fit the model
use method of fitting
validate assumptions
residual plots, outlier detection, sensitivity analysis
evaluate the fitted model
goodness-of-fit tests
covariance
indicates the direction of the linear relationship between y and x
covariance in R
cov()
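R's `cov()` returns the sample covariance, which divides the sum of cross-products of deviations by $n - 1$. A standard-library Python sketch of the same quantity on illustrative data (`sample_covariance` is a hypothetical name; on Python 3.10+ `statistics.covariance` computes the same $n - 1$ version):

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations divided by n - 1."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data, not from the source
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sample_covariance(xs, ys))  # 1.5
```

A positive value says Y tends to rise with X; the magnitude depends on the units of both variables, which is why the unitless correlation coefficient is usually reported alongside it.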
correlation coefficient
covariance between the standardized x and y
Cor
correlation coefficient
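Following the definition above, the correlation coefficient can be sketched in Python as the sample covariance of z-scores (each variable minus its mean, divided by its sample standard deviation). The data are illustrative, and `standardize` and `correlation` are hypothetical helper names; R's `cor()` computes the same Pearson coefficient by default.

```python
import math
import statistics

def standardize(values):
    """Convert values to z-scores using the sample mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Cor(Y, X): the sample covariance of the standardized variables (their means are 0)."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Illustrative data, not from the source
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```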
properties of correlation coefficient
sign indicates direction (+ or -)
between -1 and 1
unitless
not affected by a change in center or (positive) scale of either variable
correlation of x with y is the same as y with x
sensitive to outliers
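Several of these properties can be checked numerically. A Python sketch on illustrative data (the `correlation` helper is hypothetical and uses the usual Pearson formula):

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)

# Symmetric: Cor(X, Y) equals Cor(Y, X)
print(math.isclose(r, correlation(ys, xs)))
# Unitless and unaffected by shifting the center or rescaling by a positive factor
xs_rescaled = [100 * x + 7 for x in xs]
print(math.isclose(r, correlation(xs_rescaled, ys)))
# Sensitive to outliers: altering a single point changes r drastically
ys_outlier = ys[:-1] + [-20]
print(round(r, 3), round(correlation(xs, ys_outlier), 3))
```

With the single altered point, the correlation changes sign, which is one reason to inspect a scatterplot before relying on the coefficient.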
least squares regression line
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
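The least squares estimates have the closed forms $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal Python sketch of these formulas; the data values are illustrative, not from the source, and `least_squares` is a hypothetical helper name:

```python
def least_squares(xs, ys):
    """Return (b0_hat, b1_hat) for the least squares line y_hat = b0_hat + b1_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / sxx                 # slope: cross-products over squared x deviations
    b0_hat = y_bar - b1_hat * x_bar    # intercept: line passes through (x_bar, y_bar)
    return b0_hat, b1_hat

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0_hat, b1_hat = least_squares(xs, ys)
print(round(b0_hat, 4), round(b1_hat, 4))  # 2.2 0.6
```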
residuals
difference between observed (y) and predicted (y hat) values of response variable
residuals equation
$e_i = y_i - \hat{y}_i$
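Residuals follow directly from the fitted values $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. A Python sketch using illustrative data whose least squares estimates are $\hat{\beta}_0 = 2.2$ and $\hat{\beta}_1 = 0.6$ (the `residuals` helper is hypothetical):

```python
def residuals(xs, ys, b0_hat, b1_hat):
    """e_i = y_i - y_hat_i, where y_hat_i = b0_hat + b1_hat * x_i is the fitted value."""
    return [y - (b0_hat + b1_hat * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
# Least squares estimates for these illustrative data
e = residuals(xs, ys, 2.2, 0.6)
print([round(r, 2) for r in e])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(e)) < 1e-9)        # True
```

For a least squares line with an intercept, the residuals always sum to zero (up to floating-point rounding).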
least squares line
line that minimizes the sum of squared residuals
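The "least squares" property can be checked directly: the fitted line has a smaller sum of squared residuals (SSE) than any other line. A Python sketch on the same illustrative data (estimates $\hat{\beta}_0 = 2.2$, $\hat{\beta}_1 = 0.6$), comparing against a hypothetical grid of nearby intercepts and slopes:

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
best = sse(xs, ys, 2.2, 0.6)   # least squares estimates for these data
print(round(best, 4))          # 2.4

# Search a grid of nearby lines; none beats the least squares line
candidates = [(2.2 + d0, 0.6 + d1)
              for d0 in (-0.5, -0.1, 0.0, 0.1, 0.5)
              for d1 in (-0.2, -0.05, 0.0, 0.05, 0.2)
              if (d0, d1) != (0.0, 0.0)]
print(all(sse(xs, ys, b0, b1) > best for b0, b1 in candidates))  # True
```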
assumptions of least squares line
linearity, nearly normal residuals, constant variability
how to check linearity
scatterplot
if scatter points resemble a straight line
we assume linearity
degrees of freedom
number of observations minus number of estimated regression coefficients
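In simple regression two coefficients ($\hat{\beta}_0$ and $\hat{\beta}_1$) are estimated, so the degrees of freedom are $n - 2$; these degrees of freedom are the divisor when estimating the error variance, $\hat{\sigma}^2 = \text{SSE} / (n - 2)$. A Python sketch on illustrative data (variable names are hypothetical):

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0_hat, b1_hat = 2.2, 0.6   # least squares estimates for these illustrative data

n = len(xs)
p = 1                                   # number of predictor variables
df = n - (p + 1)                        # observations minus estimated coefficients (intercept + slope)
sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
sigma_hat = math.sqrt(sse / df)         # residual standard error
print(df, round(sigma_hat, 4))          # 3 0.8944
```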
t-test
determine whether there is a significant difference between the means of two groups
what do we get from a t-test?
p-values
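In regression, the same t machinery is commonly used to test whether the slope is zero: $t = \hat{\beta}_1 / \text{s.e.}(\hat{\beta}_1)$ with $\text{s.e.}(\hat{\beta}_1) = \hat{\sigma} / \sqrt{\sum (x_i - \bar{x})^2}$, compared against a t distribution with $n - 2$ degrees of freedom. Python's standard library has no t distribution, so this sketch stops at the statistic; a library such as SciPy (`scipy.stats.t.sf`) could supply the p-value, which is an assumption outside the source. Data and the `slope_t_statistic` name are illustrative.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))    # residual standard error, n - 2 df
    se_b1 = sigma_hat / math.sqrt(sxx)      # standard error of the slope
    return b1_hat / se_b1, n - 2

t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t, 4), df)  # 2.1213 3
```

The two-sided p-value is the probability that a t variable with `df` degrees of freedom exceeds `|t|` in either direction.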
to construct confidence intervals for regression parameters, we need to assume that the errors (ε)
have a normal distribution
95% confidence means
if sampling were repeated many times, about 95% of the resulting intervals would be expected to contain the true value of the slope
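The slope interval is $\hat{\beta}_1 \pm t^{*} \cdot \text{s.e.}(\hat{\beta}_1)$, where $t^{*}$ is the 97.5th percentile of the t distribution with $n - 2$ degrees of freedom. The repeated-sampling interpretation can be checked by simulation. This Python sketch assumes normal errors and a hypothetical true model $Y = 1 + 2X + \varepsilon$ with $\sigma = 1$ and $n = 10$ (so 8 degrees of freedom and $t^{*} \approx 2.306$ from a t table); none of these values come from the source.

```python
import math
import random

def slope_ci(xs, ys, t_crit):
    """95% confidence interval for the slope: b1_hat +/- t_crit * se(b1_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0_hat = y_bar - b1_hat * x_bar
    sse = sum((y - (b0_hat + b1_hat * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1

# Hypothetical true model used only for this simulation: Y = 1 + 2X + e, e ~ N(0, 1)
TRUE_B0, TRUE_B1, SIGMA = 1.0, 2.0, 1.0
XS = list(range(1, 11))   # n = 10, so df = 8
T_CRIT = 2.306            # 97.5th percentile of t with 8 df (from a t table)

rng = random.Random(42)
reps = 4000
hits = 0
for _ in range(reps):
    ys = [TRUE_B0 + TRUE_B1 * x + rng.gauss(0, SIGMA) for x in XS]
    lo, hi = slope_ci(XS, ys, T_CRIT)
    hits += lo <= TRUE_B1 <= hi   # count intervals that contain the true slope
coverage = hits / reps
print(coverage)  # fraction of intervals containing the true slope; should be near 0.95
```

Any single interval either contains the true slope or it does not; the 95% describes the long-run behavior of the procedure.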