parameter
quantity describing entire population
estimate
inferring unknown parameter from sample data
3 major goals of stats
estimate characteristics of pops
objectively answer scientific questions
describe degrees of uncertainty in scientific findings
variable
characteristic measured on an individual drawn from a population
properties of good sample
independent
random
sufficiently large
random sample
each member of the population has an equal and independent chance of being selected
sample is representative of the population
effects of sample size increasing
SD estimate becomes more accurate but does not systematically change in either direction
SE will become smaller/narrower
1 numerical graph
histogram
2 numerical graph
scatter plot
1 categorical graph
bar graph
2 categorical graph
grouped bar graph
1 numerical 1 categorical graph
multiple histograms, dot/box plot
when is mode useful
in voting/surveys
when to use mean or median as measure of average
usually use mean but if outliers exist, use median
variance
average of the squared differences from the mean
SD
square root of variance
measure of inherent variability among individuals
coefficient of variation
(SD / mean) × 100%
SE
variability between samples
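For reference, the four spread measures above as formulas (sample versions; note the sample variance divides by n − 1, so "average of the squared differences" is approximate):

```latex
s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}, \qquad
s = \sqrt{s^2}, \qquad
\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}
```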
most dangerous eq
SE
because SE grows as sample size shrinks, small samples produce highly variable means; the extreme values they generate are easily misinterpreted as real effects
confidence interval rule of thumb
mean ± 2SE provides a rough estimate of the 95% CI
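A minimal sketch of the rule of thumb in Python (the sample data are hypothetical, made up for illustration):

```python
import numpy as np

# Hypothetical sample of 25 measurements
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=25)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # SE = SD / sqrt(n)

# Rule of thumb: mean +/- 2 SE gives a rough 95% CI
print(f"rough 95% CI: ({mean - 2 * se:.2f}, {mean + 2 * se:.2f})")
```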
what does 95% CI mean
we are 95% confident that the true population mean lies within the 95% confidence interval
what happens to confidence interval as sample size inc
gets narrower
pseudoreplication
error that occurs when samples that are not independent are treated as though they are
characteristics of normal dist
symmetric around mean
about 2/3 of observations fall within 1 SD of the mean
about 95% of observations fall within 2 SD of the mean
mean=median=mode
bell shaped
standard normal distribution characteristics
mean=0
SD=1
a value's distance from the mean in SD units is a standard normal deviate (Z score)
CLT
with large samples, the distribution of sample means approaches a normal distribution regardless of whether the population's distribution is normal
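A quick simulation sketch of the CLT (all values hypothetical):

```python
import numpy as np

# The population here (exponential) is strongly right-skewed, yet the
# means of many random samples of n = 50 form a near-normal bell curve.
rng = np.random.default_rng(1)
samples = rng.exponential(scale=2.0, size=(10_000, 50))
sample_means = samples.mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.2f}")  # ~2.0 (population mean)
print(f"SD of sample means:   {sample_means.std():.2f}")   # ~2.0/sqrt(50) = 0.28
```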
how is t distribution different than z distribution
probabilities are based on a sample, so the test must account for greater uncertainty
confidence intervals are wider and more probability sits in the tails b/c we only have estimates of the mean and SD
DF
number of observations − number of estimated parameters
1 sample t test
compares mean of random sample w population mean
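A sketch with SciPy's ttest_1samp (hypothetical data; the hypothesized population mean of 5.0 is made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=5.3, scale=1.0, size=20)  # hypothetical measurements

# H0: the population mean is 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # df = n - 1 = 19
```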
1 sample t test assumptions
random sample
independent measurements
variable is normally distributed
how is 1 sample t test robust
CLT
paired t test
1 sample t test on differences between pairs
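A sketch showing that equivalence: SciPy's ttest_rel gives the same result as a 1-sample t test on the pairwise differences (before/after values are hypothetical):

```python
import numpy as np
from scipy import stats

before = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5])  # hypothetical pairs
after = np.array([11.6, 11.0, 12.4, 12.1, 11.8, 11.9])

# These two calls give identical t and p values:
print(stats.ttest_rel(before, after))                # paired t test
print(stats.ttest_1samp(before - after, popmean=0))  # 1-sample t on differences
```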
why is paired t test good
allows you to account for extraneous variation; greater statistical power
paired t test assumptions
random sample
each pair of data is independent
the diff between the pairs is normally distributed
paired t test robust
CLT
paired t test DF
#pairs - 1
2 sample t test
compares means of 2 samples
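A minimal sketch with SciPy's ttest_ind (both groups hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(10.0, 2.0, size=30)  # hypothetical groups
group_b = rng.normal(11.5, 2.0, size=30)

# Student's 2-sample t test (default: assumes equal variances)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")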
2 sample t test assumptions
random
independent
normal distribution
equal variance
equal sample size
2 sample t test robust
performs adequately if the difference in SDs is 3-fold or less, the sample sizes are moderately large, and the sample sizes are similar in both groups
unequal variance/welch test
adjusts for very unequal variances
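In SciPy the Welch test is the same call as the 2-sample t test with equal_var=False (data hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group_a = rng.normal(10.0, 1.0, size=25)  # hypothetical groups with very
group_b = rng.normal(11.0, 4.0, size=40)  # unequal SDs and sizes

# equal_var=False turns Student's t test into Welch's test
print(stats.ttest_ind(group_a, group_b, equal_var=False))
```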
assumptions for welch
same as 2 sample but no equal variance
why not just always use welch instead of 2 sample t test
less statistically powerful
robust definition
test performs adequately even if assumptions aren’t met exactly
Q-Q plot
points falling along a straight line indicate the data are approximately normal
informal normality checks
histogram and Q-Q
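A sketch of both informal checks side by side (hypothetical data; assumes matplotlib is available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(0, 1, size=100)  # hypothetical data to check

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=15)                      # look for a rough bell shape
stats.probplot(data, dist="norm", plot=ax2)  # points near the line => ~normal
plt.show()
```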
leptokurtic
too pointed
platykurtic
too flat
formal tests for normality
Shapiro-Wilk (n < 50)
Kolmogorov-Smirnov (n > 50)
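Both tests are available in SciPy; a sketch with hypothetical data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
data = rng.normal(0, 1, size=40)  # hypothetical data

# Shapiro-Wilk: suited to smaller samples (n < 50)
print(stats.shapiro(data))

# Kolmogorov-Smirnov against a normal with mean/SD taken from the data;
# note the p value is only approximate when parameters are estimated this way
print(stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1))))
```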
steps if parametric assumptions are violated
evaluate outliers
transform data
non-parametric test
positive skew transformation
slight: square root
moderate: ln or log
extreme: inverse
transform negative skew
first try squaring the data; if that fails,
reflect the data so it is positively skewed, then apply the positive-skew transformations
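A sketch of the transformation ladder in NumPy (all data hypothetical):

```python
import numpy as np

x = np.array([1.2, 1.5, 2.0, 3.1, 4.8, 9.5, 20.0])  # hypothetical positive skew
sqrt_x = np.sqrt(x)  # slight skew
log_x = np.log(x)    # moderate skew (requires x > 0)
inv_x = 1.0 / x      # extreme skew

y = np.array([3.0, 7.0, 8.5, 9.0, 9.4, 9.7, 9.9])  # hypothetical negative skew
squared = y ** 2                   # first try squaring
reflected = (y.max() + 1) - y      # if that fails, reflect so skew is positive...
log_reflected = np.log(reflected)  # ...then use the positive-skew transforms
```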
backtransforming
backtransform important parameters like the mean and SE/95% CI
DON'T BACKTRANSFORM SD
when transformations don’t work
use non-parametric tests
non-parametric tests have…
fewer assumptions about the shape and spread of data but are less statistically powerful
non-parametric version of 2 sample t test
mann-whitney u test
mann-whitney u test
converts data into ranks and tests for difference between medians
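A minimal sketch with SciPy's mannwhitneyu (both groups hypothetical):

```python
from scipy import stats

group_a = [3.1, 4.2, 2.8, 5.0, 3.7]  # hypothetical non-normal data
group_b = [5.9, 6.4, 4.8, 7.1, 6.0]

# Rank-based alternative to the 2-sample t test
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))
```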
mann-whitney u test assumptions
similar shape and variance
non parametric test for paired and 1 sample
wilcoxon signed rank test and sign test
wilcoxon signed rank test
tests the difference between the sample median and a hypothesized median
turns the differences into ranks
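A sketch with SciPy's wilcoxon on hypothetical paired data (it works on the differences internally):

```python
import numpy as np
from scipy import stats

before = np.array([8.2, 7.9, 9.1, 8.5, 8.8, 9.4, 7.6])  # hypothetical pairs
after = np.array([7.8, 7.5, 9.0, 8.0, 8.1, 9.1, 7.7])

# Rank-based alternative to the paired t test
print(stats.wilcoxon(before, after))
```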
wilcoxon signed rank test assumptions
the distribution of the data (differences) is symmetric around the median
sign test
tests the difference between the sample median and a hypothesized median
turns the differences into +1s and -1s (signs only)
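SciPy has no dedicated sign-test function, but the test reduces to a binomial test on the signs; a sketch with hypothetical differences:

```python
import numpy as np
from scipy import stats

diffs = np.array([0.4, 0.4, 0.1, 0.5, 0.7, 0.3, -0.1])  # hypothetical differences
diffs = diffs[diffs != 0]  # zero differences (ties) are dropped

# Under H0 (median difference = 0) the signs behave like fair coin flips
n_positive = int((diffs > 0).sum())
result = stats.binomtest(n_positive, n=len(diffs), p=0.5)
print(result.pvalue)
```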
sign test assumptions
none
low statistical power
type 1 error
incorrectly rejecting the null hypothesis
the populations are not different, but we conclude they are
p value
estimate of likelihood of committing type 1 error
type 2 error
failure to reject the null hypothesis even though it is false
the populations are different, but we conclude they aren't
ANOVA
1 categorical variable (2+ groups) and 1 numerical variable
example of ANOVA
effect of 3 drugs and a placebo on blood pressure
what does anova compare
mean of 2+ groups
null hypothesis of ANOVA
all means are the same; there is no difference
alternate hypothesis for anova
there is at least one difference
anova assumptions
same as 2 sample t test
anova robust
same as 2 sample t test
CLT and 3 fold
anova f statistic
F = s²(between groups) / s²(within groups)
when F=1
groups come from same population
when F>1
groups come from diff populations
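A sketch of a one-way ANOVA with SciPy's f_oneway, using hypothetical blood-pressure changes matching the drugs-and-placebo example above:

```python
from scipy import stats

# Hypothetical blood-pressure changes for 3 drugs and a placebo
placebo = [0.2, -1.0, 0.5, 1.1, -0.3]
drug_a = [-4.8, -5.5, -3.9, -6.1, -4.2]
drug_b = [-2.1, -1.8, -3.0, -2.5, -2.2]
drug_c = [-0.5, -1.2, 0.1, -0.9, -0.4]

# F = variance between groups / variance within groups
f_stat, p_value = stats.f_oneway(placebo, drug_a, drug_b, drug_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # df1 = 4 - 1, df2 = 20 - 4
```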
degrees of freedom for ANOVA
1: k − 1 = # groups − 1
2: n − k = # observations − # groups
why not just do multiple t tests instead of ANOVA
p value no longer representative of type 1 error
large probability of erroneous results
multiple comparisons would lead to too many type 1 errors
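The inflation is easy to quantify if you assume the tests are independent at α = 0.05:

```latex
P(\text{at least one type I error}) = 1 - (1 - \alpha)^k
```

With 4 groups there are k = 6 pairwise comparisons, so the family-wise error rate is 1 − 0.95⁶ ≈ 0.26, far above the nominal 0.05; ANOVA keeps it at α.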
what to do after anova shows there is a diff
tukey test
why do we need tukey test
need to see WHICH groups are different from each other
tukey test
compares all groups and determines which pairs are different
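Recent SciPy versions (1.8+) include tukey_hsd; a sketch reusing the hypothetical groups from the ANOVA example:

```python
from scipy import stats

# Same hypothetical groups as the ANOVA sketch above
placebo = [0.2, -1.0, 0.5, 1.1, -0.3]
drug_a = [-4.8, -5.5, -3.9, -6.1, -4.2]
drug_b = [-2.1, -1.8, -3.0, -2.5, -2.2]

# All pairwise comparisons with family-wise error control
result = stats.tukey_hsd(placebo, drug_a, drug_b)
print(result)  # table of pairwise differences, p values, and CIs
```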
HSD
honestly significant difference
why is tukey test important
protects us from making false conclusions due to many comparisons
tukey test assumptions
same as 2 sample t test