ANOVA
Analysis of variance. A statistical procedure examining variations between two or more sets of interval or ratio data.
for an ANOVA test, what is the numerator of the variance ratio
the between-group variance, the basis of the test
with each additional ANOVA test
our probability of making a type 1 error increases
type one error
rejecting null hypothesis when it is actually true - a false positive
type 2 error
failing to reject a false null hypothesis
TS > CV
reject null
TS < CV
fail to reject null
assumptions of ANOVA Test
the null is true
at least interval level data
the Central Limit Theorem is Satisfied
random independent samples
the variances are equal
ANOVA factor/classification/treatment
overarching title on how our data is organized
ANOVA categories
nominal level data that identifies a group
criterion variable
the actual numerical values
ANOVA data variation
within and between
ANOVA Data Variation : Within
due to chance, randomness and error
ANOVA Data Variation: Between
due to factor/classification/treatment
ANOVA: Distance from x to x double bar is
sum of squares total or total variation
ANOVA: distance from x to x bar is
sum of squares within or variation due to chance
ANOVA: distance from x bar to x double bar
sum of squares between or variation due to factor
Underlying theory of ANOVA Test
total variation can be partitioned into 2 parts, within and between, and those two components can be compared to determine which is affecting the data to a greater degree
ANOVA formula for test stat
Between Term/ Within Term
ANOVA: large test stat means
more likely to reject the null because the factor is affecting the data
ANOVA: small test stat means
more likely to fail to reject because the variation is due more to chance than the factor
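The within/between partition and the F ratio above can be checked by hand. A minimal sketch in plain Python; the three groups and all of their values are invented for illustration:

```python
# One-way ANOVA computed by hand; the three groups and their values are invented.
groups = {
    "A": [4, 5, 6],
    "B": [7, 8, 9],
    "C": [10, 11, 12],
}
all_vals = [x for g in groups.values() for x in g]
n_t = len(all_vals)                  # total number of observations
k = len(groups)                      # number of categories
grand_mean = sum(all_vals) / n_t     # x double bar

# Between: group means (x bar) compared to the grand mean, weighted by group size
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within: each x compared to its own group mean
ss_within = sum(
    (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
)
ss_total = sum((x - grand_mean) ** 2 for x in all_vals)  # equals between + within

ms_between = ss_between / (k - 1)    # df between = k - 1
ms_within = ss_within / (n_t - k)    # df within  = n_t - k
f_stat = ms_between / ms_within      # large F -> factor dominates chance
```

With these made-up numbers F comes out large (27), so against any reasonable critical value we would reject the null: the factor, not chance, is driving the variation.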
bivariate data
Data with two variables
multivariate data
More than two variables are measured on a single experimental unit.
time series
a time-ordered sequence of observations taken at regular intervals
cross section
variation across units of observation during one point of time
panel data
information collected from a group of consumers, organized into panels, over time
r
sample correlation coefficient
The value of r is always between
-1 and 1
the size of the correlation r indicates...
the strength of the linear relationship of x1 and x2
values of r close to 1 or -1 indicate
strong correlation
if r is 0
there is no correlation
if r is 1
there is a perfect positive correlation: when x1 increases, x2 increases
if r is -1
there is a perfect negative correlation: when x1 increases, x2 decreases
for correlation, data is compared to ______ then to ______
compared to their means then to each other
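The two comparison steps in the card above can be traced directly in code. A hand computation of Pearson's r; the two variables' values are invented:

```python
from math import sqrt

# Pearson's r computed step by step; the data values are invented.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

mean1 = sum(x1) / len(x1)
mean2 = sum(x2) / len(x2)

# Step 1: compare each value to its own mean.
dev1 = [x - mean1 for x in x1]
dev2 = [x - mean2 for x in x2]

# Step 2: compare the deviations to each other.
num = sum(a * b for a, b in zip(dev1, dev2))
den = sqrt(sum(a * a for a in dev1)) * sqrt(sum(b * b for b in dev2))
r = num / den  # always lands between -1 and 1
```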
sample size needed for correlation
10 data points for x and 10 for y
relationships change
over time, outside the data, and across space
correlation leads to
suspected causation, and then to questions of liability, opportunity, or beneficiary
a high r does
not always mean we need to reject the null
sample size effect
as sample size increases, the r value needed for significance goes down
if r is significant and strong correlation
it is useful
if r is not significant, and weak correlation
it is not useful
if r is significant but weak correlation
it is not useful
if r is significant and moderate correlated
it could be useful
simple linear regression
regression analysis involving one independent variable and one dependent variable in which the relationship between the variables is approximated by a straight line
error for SLR is
actual value - predicted value
total variation in y can be broken into two components
regression term and residual term
regression term
y's relationship with the x variable
residual term
random factors not in the model (error)
four concepts of slr
1. the coefficient of determination
2. isolating the slope: effect of marginal inputs on the outputs
3. over and underperforming the model (y above the line, positive e, over performing; y below the line, negative e, under performing)
4. Restricted model: using only some of the predictor variables
alpha > p value
reject the null
alpha < p value
fail to reject null
line of best fit
the regression line that best fits the observed data and minimizes the error in prediction
properties of residual error
if r = +/-1, then e = 0; otherwise, least squares regression minimizes the total squared error
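The line of best fit and its residuals can be sketched by hand in a few lines. The x and y data here are invented; note the least-squares residuals sum to zero:

```python
# Least-squares line of best fit computed by hand; the x and y data are invented.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope and intercept that minimize the sum of squared errors
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]          # predicted values
e = [yi - yh for yi, yh in zip(y, y_hat)]   # error = actual - predicted
# Positive e -> point above the line (over performing);
# negative e -> point below the line (under performing).
```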
df
degrees of freedom
SS
sum of squares
MS
mean square
F
test stat
F=
MS regression/ MS residual
as the number of predictor variables increases
the adjusted r squared is penalized, so you can't just keep adding predictor variables to drive up multiple R; adjusted r squared will drop unless the new variables add real explanatory power
if the general rule regarding sample size is not met...
adjusted r squared is a better measurement of regression strength
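The penalty can be seen by plugging values into the standard adjusted R squared formula, 1 - (1 - R^2)(n - 1)/(n - k - 1). The R squared, n, and k values here are invented:

```python
# Adjusted R squared penalizes each added predictor; all numbers here are invented.
def adjusted_r2(r2, n, k):
    """n = number of observations, k = number of predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same multiple R squared, but more predictors -> lower adjusted R squared
few_predictors = adjusted_r2(0.80, n=20, k=2)
many_predictors = adjusted_r2(0.80, n=20, k=8)
```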
as shared variation between the x variable and the y variable increases
r approaches its upper or lower limit, respectively
significance f =
p-value
dummy variable
A variable for which all cases falling into a specific category assume the value of 1, and all cases not falling into that category assume a value of 0.
collinearity check
if r is greater than or equal to .6, be concerned
if r is greater than or equal to .8, there is a collinearity error
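A quick collinearity screen using the .6 and .8 thresholds from the card; the two predictor variables below are invented (and deliberately nearly proportional):

```python
from math import sqrt

# Collinearity screen: flag predictor pairs with high pairwise |r|.
# The predictor data are invented and nearly proportional on purpose.
def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]     # tracks x1 almost exactly

r = abs(pearson_r(x1, x2))
concern = r >= 0.6        # be concerned
error = r >= 0.8          # treat as a collinearity error
```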
for a multivariate regression
R is always positive and does not indicate the direction of the relationship; R is greater than or equal to any single X-Y correlation; R is a single value representing the strength of the simultaneous relationship between the x variables and y
Multiple Regression population notation
β0
Multiple Regression partial correlation coefficient sample notation
b1
Collinearity
when 2 or more x variables are highly correlated with each other
Y hat
predicted value
y bar
chance model
DFregression
k-1
DFresidual
nT - k
DFtotal
nT - 1
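The df formulas and the F ratio fit together as simple table math. The sums of squares and counts below are invented for illustration:

```python
# Assembling the ANOVA table from sums of squares; all numbers are invented.
ss_regression = 54.0   # variation explained by the model (between)
ss_residual = 6.0      # leftover variation (within / error)
n_t = 9                # total observations
k = 3                  # number of groups

df_regression = k - 1
df_residual = n_t - k
df_total = n_t - 1     # equals df_regression + df_residual

ms_regression = ss_regression / df_regression  # MS = SS / df
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual           # F = MS regression / MS residual
```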
high MS regression / low MS residual
F is high, so we will likely reject the null
low MS regression / high MS residual
F is low, so we will likely fail to reject the null
ANOVA table tells us
whether or not the model is a good one: reject = a good model, FTR = a bad model
regression stats tell us
multiple R, which tells us the level of correlation between the x variables and y
coefficient table
tells us the significance of each component: FTR = the x variable is not a good predictor, reject = the x variable is a good predictor
the ANOVA table is a ____ question
stat question
the coefficient table is a _____ question
stat question
as collinearity decreases, there is an increase in
each predictor variable's unique portion of the variability within the y variable
y below the line
negative e, under performing
y above the line
positive e, over performing
when p is low
reject Ho
regression stat table is a _____ question
judgement
when n goes up
ms residual goes down
when ms residual goes down
the test stat goes up