normal quantile plots
straight line if perfectly normal
shape bends away from a straight line when the data are skewed or have a different amount of probability in the tails than a perfect normal
leptokurtic
too peaked (heavier tails than normal)
platykurtic
too flat (lighter tails than normal)
left skew
tail on left
right skew
tail on right
rule of thumb of normality
if the absolute value of the skewness or kurtosis statistic is > 2 × the standard error of that statistic, the data deviate significantly from normal
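A minimal sketch of this rule of thumb, assuming the common approximate standard-error formulas √(6/n) for skewness and √(24/n) for excess kurtosis; the data below are simulated, not from the source:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100)  # simulated right-skewed data

n = len(x)
skew = stats.skew(x)        # sample skewness
kurt = stats.kurtosis(x)    # excess kurtosis (0 for a perfect normal)
se_skew = (6 / n) ** 0.5    # approximate SE of skewness (assumption)
se_kurt = (24 / n) ** 0.5   # approximate SE of kurtosis (assumption)

# rule of thumb: flag non-normality when |statistic| > 2 * SE
print(abs(skew) > 2 * se_skew)
```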
shapiro-wilk test
use when n < 50 (smaller sample sizes)
kolmogorov-smirnov test
use when n > 50 (larger sample sizes)
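Both tests are available in scipy; a sketch of picking one by sample size, using simulated data (note the KS caveat in the comment is an added assumption, not from the source):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
small = rng.normal(loc=10, scale=2, size=30)   # n < 50 -> shapiro-wilk
large = rng.normal(loc=10, scale=2, size=200)  # n > 50 -> kolmogorov-smirnov

w_stat, p_small = stats.shapiro(small)

# KS against a normal: standardize first. Estimating the mean/SD from the
# same data makes this p-value approximate (the Lilliefors issue).
z = (large - large.mean()) / large.std(ddof=1)
d_stat, p_large = stats.kstest(z, "norm")
```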
low p value means
reject the null hypothesis that the data are normally distributed
what should we do when assumptions are strongly violated
evaluate outliers
transform the data to better approximate normality
use non-parametric tests
evaluating outliers
(is there a legitimate reason to remove the outlier? if no… do you get a different statistical test result with and without the outlier? if the outlier is the only thing causing a result, it should be removed)
common transformations
log10 and ln (similar effect)
square root
square
inverse
fixing positive skew
if slightly skewed, use square root transformation
if moderately skewed use log transformation
if extremely skewed use inverse
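A sketch of the three positive-skew transformations on invented positive values (add a constant first if your data include zeros or negatives, per the caveats below):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 5.0, 9.0, 20.0, 60.0])  # invented skewed values

mild = np.sqrt(y)       # slight positive skew
moderate = np.log10(y)  # moderate positive skew
severe = 1.0 / y        # extreme positive skew
```

Note that the inverse reverses the ordering of the values, which matters when interpreting back-transformed results.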
problem with square root transformation
negative numbers… add a constant to make all values greater than 0
numbers between 0 and 1 increase while numbers > 1 decrease… add a constant to make all values greater than 1
problem with log transformation
negative numbers and values between 0 and 1… add a constant to make all values greater than 1
fixing negative skew
first try square transformation… if that fails… 2 step transformation
square transformation caveat
all values need to be same sign
2 step transformation
negative skew can first be reflected to be positively skewed and then use the previous transformations
multiply by -1, then add a constant to bring all values above 1
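The two reflection steps can be sketched like this (the data and the choice of constant are invented for illustration):

```python
import numpy as np

y = np.array([2.0, 8.0, 9.0, 9.5, 10.0])  # invented negatively skewed data

reflected = -1 * y                   # step 1: reflect -> positive skew
shifted = reflected + (y.max() + 1)  # step 2: constant puts the minimum at 1
fixed = np.log10(shifted)            # then apply a positive-skew transform
print(shifted.min())  # 1.0
```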
what if your groups are skewed in diff directions?
can't use a different transformation on each group
back transforming data
square root → square
log10 → antilog10
ln → e^x
square → square root
1/Y → 1/Y (apply the inverse again)
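Each back-transform undoes its transform, which can be checked numerically on invented values:

```python
import numpy as np

y = np.array([1.5, 4.0, 9.0])  # invented positive values

assert np.allclose(np.sqrt(y) ** 2, y)    # square root -> square
assert np.allclose(10 ** np.log10(y), y)  # log10 -> antilog10
assert np.allclose(np.exp(np.log(y)), y)  # ln -> e^x
assert np.allclose(1.0 / (1.0 / y), y)    # inverse -> inverse again
```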
what parameters should you back transform
mean
standard error/95% confidence intervals
what does back transforming results look like
slightly different means and asymmetric (often narrower) confidence intervals
when transformations can't fix it…
consider non-parametric tests
non-parametric
fewer assumptions about shape and spread of data
lower statistical power
2 sample t test non-parametric test
mann-whitney U test
mann-whitney U Test
converts data into ranks
tests for difference b/w medians
mann whitney u test assumption
similar shape and variance
is mann whitney u test good?
fairly powerful test at large sample sizes
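A sketch of the test in scipy on two simulated independent groups (the data are invented; the function is scipy's real `mannwhitneyu`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.exponential(scale=1.0, size=40)  # simulated skewed group 1
b = rng.exponential(scale=2.0, size=40)  # simulated skewed group 2

# ranks the pooled data, then compares the groups' rank sums
u_stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")
```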
paired and 1 sample t test non-parametric test
wilcoxon signed rank test or sign test
wilcoxon signed rank test
tests difference b/w sample median and hypothesized median
turns difference data into ranks
wilcoxon signed rank test assumptions
differences are symmetric around the median
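A paired-data sketch using scipy's real `wilcoxon` function (the before/after measurements are invented):

```python
import numpy as np
from scipy import stats

# invented paired measurements
before = np.array([12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.5, 15.5])
after = np.array([13.0, 17.0, 12.0, 16.0, 13.5, 18.0, 13.0, 17.0])

# ranks the absolute paired differences, then compares signed rank sums
w_stat, p = stats.wilcoxon(before, after)
```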
sign test
tests difference b/w sample median and hypothesized median
turns differences into +1 and -1
sign test assumptions
none other than an unbiased/random sample
is the sign test good?
low statistical power; throws out a lot of information
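Since the sign test reduces each difference to its sign and counts them against a Binomial(n, 0.5) null, it can be sketched with scipy's `binomtest` (scipy ≥ 1.7 assumed; the differences are invented):

```python
import numpy as np
from scipy import stats

diffs = np.array([0.5, 1.2, -0.3, 0.8, 0.9, 1.1, -0.2, 0.7])  # invented

n_pos = int((diffs > 0).sum())  # differences above the hypothesized median
n = int((diffs != 0).sum())     # zero differences (ties) are dropped

# under the null (median difference = 0), signs follow Binomial(n, 0.5)
result = stats.binomtest(n_pos, n, p=0.5)
print(n_pos, n)  # 6 8
```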
type 1 errors
incorrectly rejecting null hypothesis
p-value
is an estimate of your likelihood of committing a type 1 error
warnings of type 1 error
when assumptions of a statistical test are not met, the likelihood of a type 1 error may be greater or less than reported by the p-value
but p-values are fairly robust, so modest deviations from the assumptions won't change the test statistic greatly
type 2 errors
fail to reject the null hypothesis even though the null hypothesis is false
difficult to quantify
why are type 2 errors hard to quantify
caused by small sample size or high variance
statistical power
the probability of correctly rejecting the null hypothesis when there is a real difference
statistical power of mann whitney u test and wilcoxon signed rank test when parametric assumptions are met
can be ~95% as powerful at large sample sizes
much less powerful at small sample sizes
statistical power when parametric assumptions are met for sign test
only 64% as powerful at large sample sizes
MUCH less powerful at small sample sizes
assumptions for 2 sample t test
independent
random
normal
equal variances
similar sample sizes
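A closing sketch of the test these assumptions apply to, using scipy's `ttest_ind` on simulated groups; the Welch variant (an added note, not from the source) is the usual fallback when the equal-variance assumption looks doubtful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(loc=10, scale=2, size=30)  # simulated group 1
g2 = rng.normal(loc=11, scale=2, size=30)  # simulated group 2

t_pooled, p_pooled = stats.ttest_ind(g1, g2, equal_var=True)
# Welch's t test drops the equal-variance assumption:
t_welch, p_welch = stats.ttest_ind(g1, g2, equal_var=False)
```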