Two types of statistics: descriptive & inferential
scales of measurement:
nominal (categorical) - categorizing birthdays by month
ex: Alzheimer’s or not, pleasant or unpleasant temp.
comparing categories
Chi-Square
ordinal (order & categorical) - first born, second born, third born
ex: morning, midday, afternoon, night
does not capture how far apart the ranks are (unlike interval data, the gaps between categories are not equal or known)
Wilcoxon Signed Rank
interval (continuous) - year 1, year 2, year 3 (+ or -)
has no meaningful zero (unlike ratio data), so ratios cannot be made (no * or /)
ex: BC → 0 → AD, °F or °C, military time
correlation, T-tests, ANOVA, Mann-Whitney U, Kruskal-Wallis, Welch
ratio (continuous) - “meaningful zero”: can’t go lower than 0 (+,-,*, or /)
age in days: 0 → death
t = 0
ex: number of bites, time after 0 secs
correlation, T-tests, ANOVA, Mann-Whitney U, Kruskal-Wallis, Welch
Ex: temperature: nominal (unpleasant/pleasant), ordinal (hot, middle temp, cold), interval (°C or °F), ratio (kelvin, K)
descriptive stats: data distribution (uses interval and ratio)
shape - shown via histogram (or a smoothed shorthand curve)
modality - bimodal vs unimodal
symmetric or asymmetric (skewed)
positive skew - right skewed: long tail toward higher (more positive) values
negative skew - left skewed: long tail toward lower (more negative) values
central tendency
mean - not good if the data is skewed, because it is pulled toward the tail (ex: housing prices in Atlanta)
median
mode - most frequent value (tallest bar of the histogram)
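A minimal Python sketch of why the mean is a poor summary for skewed data. The prices are made up for illustration: one very expensive house pulls the mean far above the median, while the median barely moves.

```python
import statistics

# Hypothetical home prices in $1000s: mostly moderate, plus one very
# expensive house that creates a long right tail (positive skew).
prices = [180, 200, 210, 220, 230, 240, 250, 2500]

mean_price = statistics.mean(prices)      # pulled toward the long tail
median_price = statistics.median(prices)  # middle value, barely affected

# Mode: the most frequent value (the tallest histogram bar).
ratings = [3, 4, 4, 5, 5, 5, 6]
mode_rating = statistics.mode(ratings)

print(f"mean = {mean_price:.2f}, median = {median_price:.2f}, mode = {mode_rating}")
```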
spread (variability)
uses box-and-whisker plot (centered on the median), with its outliers
outlier rule: points more than 1.5 × IQR below Q1 or above Q3 (IQR = Q3 − Q1)
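A sketch of the 1.5 × IQR outlier rule with Python's standard library. Assumption: quartiles come from `statistics.quantiles` with its default method; other software computes quartiles slightly differently, so the fences can shift a little.

```python
import statistics

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # Q1, Q2, Q3
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data with one extreme value.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 30]
low, high, flagged = iqr_outliers(sample)
print(f"fences: {low:.2f} to {high:.2f}; outliers: {flagged}")
```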
uses standard deviation (mean)
know equation meanings
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Σ (capital sigma) = sum of
x = individual observation
x bar = mean of observations
n = number of observations
n - 1 = degrees of freedom
the number of values free to vary once the mean is known (one is used up by the mean, hence −1)
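The standard deviation formula can be written out term by term. This is a minimal sketch with hypothetical data; the standard library's `statistics.stdev` uses the same n − 1 denominator, so the two results should agree.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation: sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(xs)
    x_bar = sum(xs) / n                     # mean of observations
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))          # divide by degrees of freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(sample_sd(data))
print(statistics.stdev(data))    # standard-library version, same formula
```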
standard deviation > standard error of the mean (SEM)
$SE = SD / \sqrt{n}$ (n = number of samples)
for a normal distribution: ±1 SD ≈ 68.3%; ±2 SD ≈ 95.4%; ±3 SD ≈ 99.7% (of data)
±1 SD spans mean − SD to mean + SD
error bars often show mean ± SD (or ± SEM)
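A short sketch tying SD, SEM, and error bars together (hypothetical data), plus the share of a normal distribution within ±k SD computed from the error function. The printed coverages are about 68.3%, 95.4%, and 99.7%.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
mean = statistics.mean(data)
sd = statistics.stdev(data)
sem = sd / math.sqrt(len(data))  # SE = SD / sqrt(n)

# A +/-1 SD error bar spans mean - SD to mean + SD.
error_bar = (mean - sd, mean + sd)

def normal_coverage(k):
    """Fraction of a normal distribution lying within +/-k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {normal_coverage(k):.1%}")
```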
Certain stats tests require a normal (Gaussian) distribution
In neuroscience, many measures are log-normally distributed; taking the log of each data point turns a log-normal distribution into a normal one
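A small simulation of the log transform, assuming log-normal data with hypothetical parameters mu = 1.0 and sigma = 0.5. The raw values are right-skewed (mean above median); after taking logs, the values are approximately normal with mean mu and SD sigma.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the simulation is repeatable
mu, sigma = 1.0, 0.5  # assumed parameters on the log scale

# Simulated log-normal data: right-skewed, so the mean exceeds the median.
raw = [random.lognormvariate(mu, sigma) for _ in range(10_000)]

# Log of each point: approximately normal with mean mu and SD sigma.
logged = [math.log(x) for x in raw]

print(statistics.mean(raw), statistics.median(raw))
print(statistics.mean(logged), statistics.stdev(logged))
```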
inferential stats: infers what is happening in a population using a small subset (sample) - uses deduction and probability (includes parametric and nonparametric tests)
sampling error - samples will not always give the best estimate of the larger population
random sampling avoids sampling bias (but not sampling error)
Confidence Intervals: 95% CI or 99% CI (estimates possible sampling error)
range of plausible values for the population mean (the narrower the interval, the more precise the estimate of the mean)
the spread of the samples around the sample mean (SD) and the sample size (n) determine the width of the CI: less spread and larger n give a narrower CI
CIs of multiple groups that are far apart (not overlapping) suggest a larger effect size
ex: 0—6 vs. 20—40
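A sketch of how CI width depends on spread and sample size. Assumption: it uses the normal approximation (z ≈ 1.96 for 95%), which is reasonable for larger samples; small samples should use a t critical value, which gives a somewhat wider interval. The data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def approx_ci(xs, level=0.95):
    """Normal-approximation CI for the population mean (large-sample assumption)."""
    n = len(xs)
    mean = statistics.mean(xs)
    sem = statistics.stdev(xs) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return mean - z * sem, mean + z * sem

base = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]  # hypothetical measurements, mean 5
small = base * 3                        # n = 30
large = base * 12                       # n = 120, same spread

for name, xs in (("n=30", small), ("n=120", large)):
    lo, hi = approx_ci(xs)
    print(f"{name}: 95% CI {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")
```

With the same spread, the larger sample gives the narrower interval; asking for 99% instead of 95% widens it.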
Errors:
sign error (Type S) - error in direction
estimating an increase when it is actually a decrease, or a decrease when it is actually an increase
magnitude error (Type M) - error in effect size
estimating a huge effect when it is small, or a small effect when it is large
Null hypothesis - hypothesis that there is no difference between groups in the experiment (seek to disprove in an experiment—deduction)
reject null hypothesis → statistically significant difference
ex: no effect from chemicals on cell death - null hypothesis
a hypothesized value is plausible if it falls within the CI
type 1 error - rejecting the null hypothesis, even though the null hypothesis was true
can never be 100% certain whether a type 1 error was made
type 2 error - failing to reject (accepting) the null hypothesis, even though the null hypothesis was false
lower alpha increases chances of type II error and decreases chances of type I error
alpha = 0.05 (the chance of a type 1 error when the null hypothesis is true)
i.e., declaring a difference that is actually due to random chance
harder to find significant differences with lower alpha, causes an increase in beta
more data lowers beta (raises power) at a fixed alpha; alpha itself is chosen, not changed by the data
set before the experiment
alpha = 0.05 → 95% CI
beta (the chance that a type 2 error will occur)
power (the chance of finding a real difference) of a statistical test = 1 − beta
increasing sample size → increase of power (without changing alpha)
decreases chances of type II error
expensive and time-consuming
p-value < 0.05
the p-value is compared to alpha: p < alpha = statistical significance
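A sketch of how power (1 − beta) grows with sample size at a fixed alpha, and shrinks when alpha is lowered. Assumption: it uses a two-sided one-sample z-test with known SD (simpler than a t-test, so only the standard library is needed); the effect size of 0.5 SD is hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known SD.
    effect_size: true difference from the null value, in SD units."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # cutoff for significance
    shift = effect_size * n ** 0.5               # where the test statistic is centred
    # Probability the statistic lands beyond either cutoff when the effect is real.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 30, 100):
    power = z_test_power(0.5, n)
    print(f"n={n}: power={power:.2f}, beta={1 - power:.2f}")
```

With no true effect, the "power" equals alpha: the only rejections are type 1 errors.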
Parametric:
assumes randomly chosen samples, independent samples, normal (Gaussian) distribution (interval or ratio data), large enough sample, and homogeneity of variance (similar spread in each group, e.g., similar error bar sizes)
t-tests
compare the means of two groups
use the t statistic: t = (difference between the group means) / (standard error of that difference)
the further apart the means (or the smaller the variability), the larger t, and vice versa
reject the null hypothesis when: calculated |t| > critical t-value
directional hypothesis: predicts a difference in one specific direction
one-tailed t-tests: all of alpha is placed in one tail (one side) of the distribution
comparison: significantly higher/lower or no
two-tailed t-tests: split alpha between both tails (alpha/2 each), so significance can be detected in either direction, but each tail's cutoff is stricter
checks whether the t statistic falls inside or outside the rejection (alpha) regions
shows significantly higher, significantly lower, or not significant
t(dof) = t stat, p value, d [CI]
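Cohen's d statistic (effect size) = difference between the means in SD units: lower when the groups are closer, higher when they are far apart
d = 2 → the means are 2 SDs apart
[CI] = confidence interval for the difference between the groups (range)
dof + # groups = total samples (independent t-test: dof = n1 + n2 − 2)
total samples / # groups = # in each group (if groups are equal in size)
ex: t(18) = 1.5; alpha = 0.05; 1.5 is not > 2.101 (two-tailed critical t for dof = 18), so no significant difference & p > 0.05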
graphs would not be that different in height w/ no stats dif.
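A minimal sketch of the independent-samples (Student's) t statistic with pooled variance, its degrees of freedom, and Cohen's d. The two groups are hypothetical, with 10 values each so that dof = 18. They happen to reproduce the t(18) = 1.5 example, compared against the table value 2.101.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t for two independent groups (assumes equal variances).
    Returns (t, degrees of freedom, Cohen's d)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    df = n1 + n2 - 2                                      # dof + 2 groups = total samples
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df     # pooled variance
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of the difference in means
    t = (m1 - m2) / se_diff
    d = (m1 - m2) / math.sqrt(pooled_var)                 # effect size in SD units
    return t, df, d

T_CRIT = 2.101  # two-tailed critical t for alpha = 0.05, df = 18 (from a t table)

# Hypothetical data: 10 per group, so df = 18.
group_a = [5, 6, 7, 8, 9, 5, 6, 7, 8, 9]
group_b = [4, 5, 6, 7, 8, 4, 5, 6, 7, 8]
t, df, d = independent_t(group_a, group_b)
print(f"t({df}) = {t:.2f}, d = {d:.2f}, significant: {abs(t) > T_CRIT}")
```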
ANOVA
3 or more groups for mean data
ANOVA alone cannot tell you which group is different (needs a post hoc test)
assumptions: independent samples (uncorrelated model errors: data = model + errors) & normal distribution (model errors normally distributed)
statistic: F-ratio (between-groups variability / within-groups variability)
between - difference between means
within - spread from mean (on both sides) via error bars (or curve width)
more spread makes f-ratio smaller (possibly smaller than table value)
smaller error bars = less within-groups variability → larger f-ratio; larger mean variation = greater between groups variability → larger f-ratio (if within-groups is smaller) = graph 2
F-ratio reported as: F(dof for groups, dof for samples) = F value, p value (vs. 0.05), ω² (omega squared) = effect size
dof for groups = between (# groups − 1); dof for samples = within (total samples − # groups)
ex: F(3 − 1, 60 − 3) = F(2, 57) = 15.91, p < 0.05, ω² = 0.26
ex: F(2,57) = 5.61, p<0.05
3 groups, 20 in each group
p < 0.05: significant difference among the group means
ex: F(3,76) = 7.96, p> 0.05
4 groups, 20 in each group
p > 0.05: no significant difference among the group means
F(3,76) = 32.68, p<0.05
2-Factor/Way ANOVA
Has 3 F-ratios:
Main effect of A
Main effect of B
Interaction of A and B
Post-Hoc Tests for ANOVA: (every extra comparison increases the chance of a type I error; the tests differ in how strongly they correct for this)
Fisher LSD test (1 vs. 2; 1 vs. 3)
Scheffé's test - less power, but can make complex comparisons (1 vs. 2+3; 1+2 vs. 3)
Tukey's HSD test (honestly significant difference)
Tukey-Kramer
If n sizes aren’t equal
lowering alpha helps limit type I errors, but decreases power
Bonferroni correction: alpha / n (n = number of comparisons)
ex: alpha 0.05 / 5 different comparisons = 0.01 per comparison
increases chances of type II errors and decreases chances of finding significance
Holm-Bonferroni method or False Discovery Rate (e.g., Benjamini-Hochberg) control errors with less loss of power than Bonferroni, balancing both error types
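A sketch of why corrections are needed and how Bonferroni and Holm-Bonferroni differ. The familywise formula assumes the tests are independent; the p-values are hypothetical. A False Discovery Rate procedure is not shown here.

```python
def familywise_error(alpha, m):
    """Chance of at least one type I error across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: compare the smallest p-value with alpha/m,
    the next with alpha/(m - 1), and so on, stopping at the first non-rejection."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also not rejected
    return reject

p_values = [0.004, 0.011, 0.012, 0.04, 0.3]  # hypothetical p-values from 5 comparisons
print(familywise_error(0.05, 5))
print(bonferroni(p_values))
print(holm(p_values))
```

Here Bonferroni rejects only the smallest p-value, while Holm rejects three. That is the extra power the step-down procedure buys while still controlling the familywise error rate.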
Nonparametric:
assumes independent samples (does not require a normal distribution)
Chi-Square Test - χ² (nominal data)
$\chi^2 = \sum \frac{(o - e)^2}{e}$
o = observed count
e = expected count
degrees of freedom = (# of rows − 1) × (# of columns − 1); with a single row of categories, dof = # of categories − 1
the larger χ² is, the more likely it exceeds the critical (table) value, showing a significant difference (rejecting the null hypothesis)
only used on actual counts (not %, proportions, means, etc.)
χ² should not be calculated if the expected value in any category is < 5 (each must be ≥ 5)
no statistical difference if the calculated χ² is smaller than the critical (table) value; significant difference if it is larger
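A sketch of a chi-square test of independence on a contingency table of counts. It computes expected counts from row and column totals, sums (o − e)²/e, derives dof = (rows − 1) × (columns − 1), and reports the smallest expected count so the ≥ 5 rule can be checked. The table is hypothetical, and no Yates continuity correction is applied (some software applies one to 2×2 tables by default).

```python
def chi_square_independence(table):
    """Chi-square test of independence for a table of observed counts.
    Returns (chi2, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # e under the null
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected     # (o - e)^2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected

CHI2_CRIT_DF1 = 3.841  # critical value for alpha = 0.05, df = 1 (from a chi-square table)

# Hypothetical 2x2 table of counts (rows: treatment/control; columns: outcome yes/no).
observed = [[30, 10], [15, 25]]
chi2, df, min_e = chi_square_independence(observed)
print(f"chi2({df}) = {chi2:.2f}; smallest expected = {min_e}; significant: {chi2 > CHI2_CRIT_DF1}")
```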
Mann-Whitney U Test
for two independent groups (rank-based; often described as comparing medians)
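A sketch of how the Mann-Whitney U statistic is built from ranks: pool both groups, rank them (ties get the average rank), and convert group a's rank sum to U. A smaller U means less overlap between the groups; it would be compared with a critical value from a U table. No p-value is computed here, and the data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks

def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])                  # rank sum of group a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # complete separation gives U = 0
print(mann_whitney_u([1, 2, 2], [2, 3]))     # includes tied values
```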
Kruskal-Wallis Test
for three or more groups using medians
follow up with post hoc test like Dunn’s test
Wilcoxon Signed Rank
for paired median differences, like matched sample or repeated measures
Welch or Brown-Forsythe Test
for situations with heterogeneity of variance
Welch test has more power and lower chances of Type I error
Brown-Forsythe test if data are also skewed
Follow with Games-Howell post hoc test
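A sketch of the two-group version of the Welch approach (Welch's t-test), which does not pool the variances. It uses the Welch-Satterthwaite degrees of freedom, which are usually not a whole number and shrink when the groups' spreads differ. The multi-group Welch ANOVA and Brown-Forsythe test are not shown, and the data are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    Does not assume equal variances in the two groups."""
    n1, n2 = len(a), len(b)
    s1 = statistics.variance(a) / n1  # squared SE of mean a
    s2 = statistics.variance(b) / n2  # squared SE of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s1 + s2)
    df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with very different spreads.
tight = [10, 11, 12, 13, 14]
wide = [2, 15, 28, 41, 54, 67, 80]
t, df = welch_t(tight, wide)
print(f"t = {t:.2f}, df = {df:.1f}")
```

When both groups have equal sizes and equal variances, Welch's t and df match the ordinary pooled t-test.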
Publication Bias
Bias for large differences between groups
magnified in the press
Bias against negative results
remedy: publishing all results, including negative (null) results
p-hacking
Types:
running different stats tests until a significant p is found (not mentioning the prior tests)
running stats, then adding more samples and rerunning until p < 0.05
This is problematic because it violates statistical assumptions and inflates the type I error rate
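A small simulation of the "running stats, then adding more" form of p-hacking. The null hypothesis is true in every simulated experiment, yet testing after every batch and stopping at the first p < 0.05 yields far more than 5% false positives. Assumptions (for simplicity): normally distributed data with known SD = 1, a two-sided z-test, and hypothetical batch sizes.

```python
import random
from statistics import NormalDist

def optional_stopping_rate(n_sims=2000, batch=10, max_batches=10, alpha=0.05, seed=1):
    """Fraction of null experiments declared 'significant' when the data are
    tested after every batch and collection stops at the first p < alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(max_batches):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))  # true mean is 0
            n += batch
            z = total / n ** 0.5  # z = sample mean * sqrt(n), since SD = 1
            if abs(z) > z_crit:
                false_positives += 1
                break             # stop as soon as the result looks significant
    return false_positives / n_sims

print(optional_stopping_rate(max_batches=1))   # single planned test: about alpha
print(optional_stopping_rate(max_batches=10))  # peeking 10 times: much higher
```

A pre-planned sequential design with stopping rules adjusts the cutoffs at each look so that the overall type I error rate stays at alpha.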
Correct for Multiple Comparisons:
widen CIs for estimation and lower alpha for testing
if you are going to run stats and add more samples, you can set up sequential analysis with stopping rules when Type I and II error rates are met (must be pre-planned)
Importance of Replication
publicly available data that many can analyze
replication can address sampling errors that are due to random chance