focus on 3 confounding effects
What is the definition of homoscedasticity?
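the variance of the error term is constant across all values of the independent variable
AKA the vertical spread of the data around the trend line stays the same from left to right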
How does an investigator deal with bias?
variable control
In observational studies it is hard to account for _______, but we can still do what?
In observational studies it is hard to account for bias, but we can still identify sources of bias and adjust for them.
What is the definition of bias?
anything that can non-randomly skew my results
Statistical methods used to account for sources of bias are called ________________________.
regression analysis
What is a confounder?
any factor that prevents appropriate statistical interpretation of results within practical context of a study
What is an omitted variable?
a specific and observable factor that is omitted from the analysis
T/F: All omitted variables are confounders, but not all confounders are omitted variables.
T
What does the magnitude of bias refer to?
difference between the average confounded sample estimate and the non-confounded population parameter.
If I ran one regression analysis that did not account for a source of bias, then ran a second one that did account for it, the results of the FIRST regression analysis would be considered _______________.
inefficient
What are 3 types of confounders?
mediating effect
modifying effect
confounding effect
What is the mediating effect?
a mediator is an intermediate between the independent and dependent variable
What is the modifying effect?
alters the degree (strength) and/or direction of the relationship between the independent and dependent variable
What is this equation used for? What does each of the variables mean?
This is the equation for linear regression: Yi = b0 + b1*Xi + ui
Yi= dependent variable
b0= intercept
b1= slope coefficient
Xi= independent variable
ui= error term
Will “u” or the error term be the same for each data point?
no
The error term is also referred to as the ________________.
residual
The sum of all residuals is equal to ___.
0
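A minimal sketch of the cards above, assuming made-up data and a hypothetical helper named fit_simple_ols (neither comes from the course). It fits Yi = b0 + b1*Xi + ui by least squares, then shows that the residuals differ from point to point but add up to 0 when the model has an intercept.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)

# Residual for each point = observed y minus the value on the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

print("b0 =", b0, "b1 =", b1)
print("residuals:", residuals)
print("sum of residuals:", residual_sum)
```

The individual residuals are not equal to each other, but their sum is zero up to floating-point rounding.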
Simplest regression analysis is ___________________________________(____).
Ordinary Least Squares (OLS)
For OLS the dependent variable must be _____________________. (continuous or discrete)
continuous
What are the assumptions for OLS?
random sampling
normal distribution
In analyzing OLS, what is the null and alternative hypothesis? What do each of these mean in terms of a linear relationship?
null= no linear relationship
alternative= there is a linear relationship
What parameter assesses how well the trend line fits the data?
R2
An R2 of 0.8 would indicate what?
the line fits the data fairly well: about 80% of the variation in the dependent variable is explained by the model
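A rough sketch of how R2 can be computed, assuming made-up data and the usual definition R2 = 1 - (sum of squared residuals / total sum of squares). The function name r_squared is hypothetical.

```python
def r_squared(x, y):
    """Fit a simple OLS line and return R2 = 1 - SSR / SST."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # SSR: variation left over around the fitted line
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: total variation of y around its mean
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ssr / sst


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r2 = r_squared(x, y)
print("R2 =", r2)
```

For this data R2 comes out to about 0.64, meaning roughly 64% of the variation in y is explained by the line.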
Unlike linear regression, multiple regression accounts for _______.
bias
What is this equation used for? What does Z and ε stand for?
This equation is used for multiple regression.
Z stands for the confounder
ε is the error term/residual
Will ε be larger or smaller than u? Why?
ε will be smaller
why? less bias, less error in multiple regression
What is the ONE EXCEPTION to when ε is NOT smaller than u?
If there are superfluous regressors ε=u
What is a superfluous regressor?
a variable that has no real relationship with the dependent variable, so it adds no explanatory power
example: how many clouds were in the sky on the day you measured the patient’s weight
By looking at _____ values for multiple and linear regression you can see the magnitude of bias/ how much bias omitting that variable caused.
b1
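A minimal sketch of comparing b1 across the two models, under these assumptions: the data are made up, the multiple regression is written as Yi = b0 + b1*Xi + b2*Zi + ei (the name b2 is my label, not from the course), and y is built with no noise so the numbers are easy to check. Because Z is correlated with X, leaving Z out pushes part of its effect into b1.

```python
def centered_sum(a, b):
    """Sum of (a - mean(a)) * (b - mean(b)) over all points."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


# Hypothetical data: z (the confounder) is correlated with x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 1.0, 2.0, 3.0, 3.0]
# y is built as 1 + 2*x + 3*z, so the true coefficient on x is 2
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

sxx, szz = centered_sum(x, x), centered_sum(z, z)
sxz, sxy, szy = centered_sum(x, z), centered_sum(x, y), centered_sum(z, y)

# Linear regression (z omitted): b1 absorbs part of z's effect
b1_simple = sxy / sxx

# Multiple regression (z included): solve the two normal equations
det = sxx * szz - sxz ** 2
b1_multiple = (szz * sxy - sxz * szy) / det
b2_multiple = (sxx * szy - sxz * sxy) / det

bias_magnitude = b1_simple - b1_multiple
print("b1 without z:", b1_simple)
print("b1 with z:   ", b1_multiple)
print("magnitude of bias from omitting z:", bias_magnitude)
```

Here b1 is about 3.8 when z is omitted and 2 when z is included, so omitting z inflated the estimate by about 1.8.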
What is the marginal effect?
How y is affected by a small change in x
In multiple regression can we use R2? Why or Why not?
No, we must use adjusted R2
this is because R2 can be affected by superfluous regressors (adding any regressor, even a meaningless one, never lowers R2)
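A small sketch of the standard adjusted R2 formula, 1 - (1 - R2) * (n - 1) / (n - k - 1), where n is the number of observations and k is the number of regressors. The R2 values below are hypothetical, picked only to show the penalty for adding a regressor that barely improves the fit.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k regressors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: one meaningful regressor, R2 = 0.64 on 5 observations
adj_one = adjusted_r_squared(0.64, n=5, k=1)

# Hypothetical: a superfluous regressor nudges R2 up to 0.65
adj_two = adjusted_r_squared(0.65, n=5, k=2)

print("adjusted R2 with 1 regressor: ", adj_one)
print("adjusted R2 with 2 regressors:", adj_two)
```

Plain R2 went up from 0.64 to 0.65, but adjusted R2 went down (about 0.52 to about 0.30), which flags the extra regressor as not pulling its weight.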
When is a dummy variable used?
used to include discrete (dichotomous) variables in our regression analysis
Dummy variables use the ______ coding scheme and can ONLY be _____ or ______.
Dummy variables use the a-1 coding scheme and can ONLY be 0 or 1.
Dummy variables cannot represent what?
marginal effect
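A minimal sketch of why a dummy coefficient is not read as a marginal effect, assuming made-up data and a hypothetical 0/1 variable named treated. With a single dummy regressor, the OLS slope equals the difference between the two group means: it describes a jump from group 0 to group 1, not the effect of a small change in x.

```python
# Hypothetical data: treated is a dummy (0 = no, 1 = yes)
treated = [0, 0, 1, 1]
outcome = [10.0, 12.0, 15.0, 17.0]

n = len(treated)
mean_x = sum(treated) / n
mean_y = sum(outcome) / n

# Simple OLS slope and intercept
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(treated, outcome))
sxx = sum((xi - mean_x) ** 2 for xi in treated)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Group means computed directly, for comparison
group0 = [yi for xi, yi in zip(treated, outcome) if xi == 0]
group1 = [yi for xi, yi in zip(treated, outcome) if xi == 1]
mean0 = sum(group0) / len(group0)
mean1 = sum(group1) / len(group1)

print("b0 =", b0, "(mean of group 0 =", mean0, ")")
print("b1 =", b1, "(difference in group means =", mean1 - mean0, ")")
```

The intercept matches the mean of the 0 group and the slope matches the gap between the groups.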
In unstandardized multiple regression we cannot do what? What is the solution to this problem?
we cannot say which variable had the most impact because there are a bunch of different units (ex: you can’t compare age in years to height)
solution: standardization of units (turn variables into z-scores)
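A short sketch of turning a variable into z-scores, assuming made-up ages and the sample standard deviation (the notes do not say whether sample or population SD is intended). The helper name to_z_scores is hypothetical.

```python
import statistics


def to_z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical ages in years
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
age_z = to_z_scores(ages)

print("z-scores:", age_z)
print("mean of z-scores:", statistics.mean(age_z))
print("SD of z-scores:  ", statistics.stdev(age_z))
```

After standardization every variable has mean 0 and standard deviation 1, so coefficients are all in "per standard deviation" terms and can be compared with each other.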
Standardized multiple regression is ONLY for ________________, not marginal effect.
comparing variables