Statistics

  • Two types: descriptive & inferential

  • scales of measurement:

    • nominal (categorical) - categorizing birthdays by month

      • ex: Alzheimer’s or not, pleasant or unpleasant temp.

      • comparing categories

      • Chi-Square

    • ordinal (order & categorical) - first born, second born, third born

      • ex: morning, midday, afternoon, night

      • does not account for how far apart in time the categories are, unlike interval data

      • Wilcoxon Signed Rank

    • interval (continuous) - year 1, year 2, year 3 (+ or -)

      • does not have a meaningful zero like ratio data, so ratios cannot be made (* and /)

      • ex: BC → 0 → AD, °F or °C, military time

      • correlation, T-tests, ANOVA, Mann-Whitney U, Kruskal-Wallis, Welch

    • ratio (continuous) - “meaningful zero”: can’t go lower than 0 (+,-,*, or /)

      • age in days: 0 → death

      • t = 0

      • ex: number of bites, time after 0 secs

      • correlation, T-tests, ANOVA, Mann-Whitney U, Kruskal-Wallis, Welch

    • Ex: temperature: nominal (unpleasant/pleasant), ordinal (hot, middle temp, cold), interval (degrees C or F), ratio (degrees K)

  • descriptive stats: data distribution (uses interval and ratio)

    • shape - shown via histogram (or a smooth curve as shorthand)

      • modality - bimodal vs unimodal

      • symmetric or asymmetric (skewed)

        • positive skew - tail stretches to the right, toward larger (more positive) values

        • negative skew - tail stretches to the left, toward smaller (more negative) values

    • central tendency

      • mean - not good if the data are skewed (ex: housing prices in Atlanta; see the sketch after this list)

      • median

      • mode (uses histogram)
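
      • a minimal Python sketch of the mean vs. median point above (the numbers are hypothetical, not from the notes):

        import numpy as np

        # hypothetical, right-skewed "housing price" sample (one extreme value, in $1000s)
        prices = np.array([150, 160, 170, 180, 190, 200, 210, 2000])

        print("mean:  ", np.mean(prices))    # 407.5 - pulled far upward by the outlier
        print("median:", np.median(prices))  # 185.0 - stays near the typical value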

    • spread (variability)

      • uses box and whiskers plot (median) - with its outliers

        • outliers: points more than 1.5 × IQR beyond the quartiles (outlier rule = IQR × 1.5)

      • uses standard deviation (mean)

        • know equation meanings

          • s = sqrt[ Σ(x − x̄)² / (n − 1) ]

            • sigma = sum of

            • x = individual observation

            • x bar = mean of observations

            • n = number of observations

            • n - 1 = degrees of freedom

              • once the mean is known, only n − 1 of the data points are free to vary (hence the −1)

        • standard deviation > standard error of mean

          • SE = SD / sqrt(n), where n = number of samples

        • 1 std = 68.3%; 2 std = 95.4%; 3 std = 99.7% (of data)

          • range from mean − stdev to mean + stdev (and likewise for 2 and 3 stdev)

          • stdev is what gets plotted as the error bar (worked sketch below)
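
        • worked Python sketch of the spread equations above (the data are hypothetical):

          import numpy as np

          rng = np.random.default_rng(0)
          x = rng.normal(loc=10, scale=2, size=1000)  # hypothetical normal data: mean 10, SD 2

          n = len(x)
          s = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))  # s = sqrt[ Σ(x − x̄)² / (n − 1) ]
          se = s / np.sqrt(n)                                  # SE = SD / sqrt(n)
          print(s, np.std(x, ddof=1))                          # same value (ddof=1 uses n − 1)
          print(se)

          # empirical rule: fraction of points within 1, 2, 3 SD of the mean (~68.3%, 95.4%, 99.7%)
          for k in (1, 2, 3):
              print(k, np.mean(np.abs(x - x.mean()) <= k * s))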

    • Certain stats tests require a normal (Gaussian) distribution

      • In neuroscience, data are often log-normally distributed; taking the log of the values gives an approximately normal distribution (sketch below)
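
        • small sketch of this (hypothetical data; scipy's Shapiro-Wilk test is an assumption, it is not named in the notes):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)
          raw = rng.lognormal(mean=1.0, sigma=0.5, size=200)  # skewed, log-normally distributed values
          logged = np.log(raw)                                # approximately normal after the log

          # Shapiro-Wilk normality test: p < 0.05 suggests a departure from normality
          print("raw   :", stats.shapiro(raw).pvalue)
          print("logged:", stats.shapiro(logged).pvalue)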

  • inferential stats: infers what is happening in a pop. using a small subset (sample) - uses deduction and probability (includes parametric and nonparametric stats)

    • sampling error - a sample will not exactly match the larger population it was drawn from, so estimates vary from sample to sample

    • random sampling avoids sampling bias

    • Confidence Intervals: 95% CI or 99% CI (estimates possible sampling error)

      • a range around the sample mean (the narrower the interval, the more precise the estimate of the mean; worked CI sketch after this block)

      • the spread of the samples around the sample mean, along with the % of the total pop. that was sampled, determines the width (strength) of the CI

        • CI's of multiple groups that are well separated (non-overlapping) suggest a larger effect size

          • ex: 0—6 vs. 20—40

      • Errors:

        • sign error (type s) - error in direction

          • estimate an increase when it is a decrease or a decrease when it is an increase

        • magnitude error (type m) - error in effect size

          • estimating a huge effect when it is small, or a small effect when it is large
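
      • a minimal sketch of a 95% CI for a sample mean (hypothetical data; uses the t distribution via scipy, which the notes do not name):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        sample = rng.normal(loc=50, scale=10, size=30)  # hypothetical sample, n = 30

        mean = sample.mean()
        se = stats.sem(sample)  # SD / sqrt(n)
        low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=se)
        print(f"mean = {mean:.1f}, 95% CI = ({low:.1f}, {high:.1f})")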

    • Null hypothesis - hypothesis that there is no difference between groups in the experiment (seek to disprove in an experiment—deduction)

      • reject null hypothesis → statistically significant difference

      • ex: no effect from chemicals on cell death - null hypothesis

        • the null hypothesis stays plausible if its value (e.g., zero effect) falls within the CI

      • type 1 error - rejecting the null hypothesis, even though the null hypothesis was true

        • can never be 100% certain that a type 1 error was made

      • type 2 error - accepting the null hypothesis, even though the null hypothesis was false

        • lower alpha increases chances of type II error and decreases chances of type I error

      • alpha = 0.05 (the chance that a type 1 error will occur)

        • difference due to random chance

          • harder to find significant differences with lower alpha, causes an increase in beta

            • more data lets both alpha and beta stay low (mainly by lowering beta at any given alpha)

        • set before the experiment

          • alpha = 0.05 → 95% CI

      • beta (the chance that a type 2 error will occur)

        • power (finding the difference) of a statistical test (1 - beta)

        • increasing sample size → more power (lower beta) without raising alpha (power sketch after this list)

          • decreases the chance of type II error (and allows a lower alpha if desired)

          • expensive and time-consuming
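
        • sketch of the power / sample-size trade-off using statsmodels (the library and the effect size of 0.5 are assumptions, not from the notes):

          from statsmodels.stats.power import TTestIndPower

          analysis = TTestIndPower()

          # per-group n needed for power = 0.8 at alpha = 0.05, medium effect (d = 0.5)
          n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
          print(round(n_needed))  # ~64 per group

          # power achieved with only 20 per group at the same effect size and alpha
          print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20))  # well below 0.8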

      • p-value < 0.05

        • compared to alpha; if p < alpha, the result is statistically significant

    • Parametric:

      • assumes randomly chosen samples, independent samples, normal (gaussian) distribution (interval or ratio data), large enough sample, and homogeneity of variance (same error bar size)

        • T tests

          • two groups for mean data

          • uses the t statistic: t = (difference between the group means) / (standard error of that difference)

          • if the means are further apart the t is larger and vice versa

            • reject the null hypothesis when: calculated t-value > critical t-value

          • directional hypothesis: looking to see if there is a difference in one direction or not

            • one-tailed t tests: comparing one side of the data (using alpha)

              • comparison: significantly higher/lower or no

          • two-tailed t tests: splits the alpha value between the two tails, so significance can be found in either direction, but each tail uses a smaller cutoff

            • uses the location of the mean within or not within the alpha range

            • shows significantly higher, significantly lower, or not significant

            • t(dof) = t stat, p value, d [CI]

              • Cohen's d statistic (effect size): smaller d = means closer together, larger d = means farther apart (in units of stdev)

                • 2 = 2 stdev apart

              • [CI] between the groups (range)

            • dof + # groups = total samples

              • total samples/groups = # in each group

            • ex: t(18) = 1.5; alpha = 0.05; 1.5 not > 2.101, so no significant difference & p>0.05

              • bar graphs would not differ much in height when there is no statistical difference (worked t-test sketch follows this section)
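
          • minimal sketch of the t-test report format above (data and group sizes are hypothetical; scipy is an assumption):

            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(3)
            group_a = rng.normal(10, 2, size=10)  # hypothetical groups, n = 10 each
            group_b = rng.normal(12, 2, size=10)

            t, p = stats.ttest_ind(group_a, group_b)  # assumes equal variances (default)
            dof = len(group_a) + len(group_b) - 2     # dof + # groups = total samples

            # Cohen's d: mean difference divided by the pooled SD (equal-n form)
            pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
            d = (group_a.mean() - group_b.mean()) / pooled_sd

            print(f"t({dof}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}")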

        • ANOVA

          • 3 or more groups for mean data

            • can not tell you which one is different via the ANOVA

          • assumptions: independent samples (uncorrelated model errors: data = model + errors) & normal distribution (model errors normally distributed)

          • statistic: f-ratio (between groups variability/within-groups variability)

            • between - difference between means

            • within - spread from mean (on both sides) via error bars (or curve width)

              • more spread makes f-ratio smaller (possibly smaller than table value)

            • smaller error bars = less within-groups variability → larger f-ratio; larger mean variation = greater between groups variability → larger f-ratio (if within-groups is smaller) = graph 2

          • f-ratio reporting: F(dof for groups, dof for samples) = value, p value (vs. 0.05), ω² = effect size (worked ANOVA sketch after this block)

            • dof for groups = between; dof for samples = within

            • ex: F([3-1], [60-3]) = 15.91, p<0.05, ω² = 0.26

            • ex: F(2,57) = 5.61, p<0.05

              • 3 groups, 20 in each group

              • p< 0.05 significant variance

            • ex: F(3,76) = 7.96, p> 0.05

              • 4 groups, 20 in each group

              • p> 0.05 no significant variance

            • F(3,76) = 32.68, p<0.05
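
          • minimal sketch of a one-way ANOVA with the F(dof between, dof within) report above (data are hypothetical; scipy is an assumption):

            import numpy as np
            from scipy import stats

            rng = np.random.default_rng(4)
            g1 = rng.normal(10, 2, size=20)  # 3 hypothetical groups, 20 samples each
            g2 = rng.normal(12, 2, size=20)
            g3 = rng.normal(15, 2, size=20)

            f, p = stats.f_oneway(g1, g2, g3)
            dof_between = 3 - 1   # groups − 1
            dof_within = 60 - 3   # total samples − groups
            print(f"F({dof_between}, {dof_within}) = {f:.2f}, p = {p:.4f}")
            # a significant F only says the means differ somewhere;
            # which groups differ requires a post-hoc test (next section)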

        • 2-Factor/Way ANOVA

          • Has 3 F-ratios:

            • Main effect of A

            • Main effect of B

            • Interaction of A and B

        • Post-Hoc Tests for ANOVA: (increases chances of type I error)

          • Fisher LSD test (1 vs. 2; 1 vs. 3)

          • Scheffé's test - less power, but can make complex comparisons (1 vs. 2+3; 1+2 vs. 3)

          • Tukey Test

          • Tukey-Kramer

            • If n sizes aren’t equal

          • lowering alpha helps limit type I errors, but decreases power

            • Bonferroni correction — alpha/n

              • ex: alpha 0.05/ 5 different comparisons

              • increases chances of type II errors and decreases chances of finding significance

            • Holm-Bonferroni method or False Discovery Rate control helps limit both error types (sketch below)
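
            • small sketch of the Bonferroni and Holm-Bonferroni corrections using statsmodels (library and p-values are hypothetical assumptions):

              from statsmodels.stats.multitest import multipletests

              # hypothetical uncorrected p-values from 5 pairwise post-hoc comparisons
              pvals = [0.008, 0.011, 0.013, 0.04, 0.30]

              # Bonferroni: equivalent to comparing each p to alpha/n
              reject_bonf, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

              # Holm-Bonferroni: sequential version, same type I control with more power
              reject_holm, _, _, _ = multipletests(pvals, alpha=0.05, method="holm")

              print(reject_bonf.tolist())  # [True, False, False, False, False]
              print(reject_holm.tolist())  # [True, True, True, False, False]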

    • Nonparametric:

      • assumes independent samples

        • Chi-Square Test - X² (nominal data)

          • X² = sum[(o - e)²/e]

            • o = observed terms

            • e = expected

          • degrees of freedom: (# of columns − 1) × (# of rows − 1) for a table of groups; # of categories − 1 for a single row of categories (sketch below)

          • if X² is large it is more likely to be higher than the table (critical) value and show a significant difference (rejecting the null hypothesis)

          • only used on actual numbers (not %, proportions, means, etc.)

          • X² should not be calculated if the expected value in any category is < 5 (must be > 5)

          • no significant difference if the calculated X² is below the critical (table) value; a significant difference if the calculated X² is above it (equivalently, p < alpha)
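
          • minimal sketch of the X² formula above, by hand and with scipy (the counts are hypothetical):

            import numpy as np
            from scipy import stats

            observed = np.array([30, 14, 6])   # hypothetical counts in 3 categories
            expected = np.array([25, 15, 10])  # expected counts (all ≥ 5, same total)

            chi2_hand = np.sum((observed - expected) ** 2 / expected)  # X² = Σ (o − e)² / e
            chi2, p = stats.chisquare(observed, f_exp=expected)        # df = 3 categories − 1 = 2
            print(round(chi2_hand, 2), round(chi2, 2), round(p, 3))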

        • Mann-Whitney U Test

          • for two groups using medians

        • Kruskal-Wallis Test

          • for three or more groups using medians

          • follow up with post hoc test like Dunn’s test

        • Wilcoxon Signed Rank

          • for paired median differences, like matched sample or repeated measures

        • Welch or Brown Forsythe Test

          • for situations with heterogeneity of variance

          • Welch test has more power and lower chances of Type I error

          • Brown Forsythe test if data are also skewed

          • Follow with Games-Howell post hoc test (scipy calls for the tests above are sketched below)
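
        • small sketch of calling these tests in scipy (data are hypothetical; scipy is an assumption, the notes do not name a library):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(5)
          a = rng.normal(10, 2, size=15)  # hypothetical groups
          b = rng.normal(12, 5, size=15)  # larger spread (heterogeneous variance)
          c = rng.normal(11, 2, size=15)

          print(stats.mannwhitneyu(a, b))                # 2 groups, medians
          print(stats.kruskal(a, b, c))                  # 3+ groups, medians
          print(stats.wilcoxon(a, c))                    # paired / repeated measures
          print(stats.ttest_ind(a, b, equal_var=False))  # Welch's t-test (unequal variances)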

    • Publication Bias

      • Bias for large differences between groups

        • magnified in the press

      • Bias against negative results

        • remedy: making results publicly available even when they are negative

    • pHacking

      • Types:

        • running different stats tests until a significant p is found (not mentioning the prior tests)

        • running stats, then adding more samples and re-running until significance appears

      • This is problematic because it violates statistical assumptions

      • Correct for Multiple Comparisons:

        • expand CI for estimation and decrease alpha for testing

        • if you are going to run stats and add more samples, you can set up sequential analysis with stopping rules when Type I and II error rates are met (must be pre-planned)

    • Importance of Replication

      • publicly available data that many can analyze

      • replication can address sampling errors that are due to random chance
