biostats - unit 2

65 Terms

1
New cards

inferential statistics

determine how close our sample estimates are to our population parameters

2
New cards

null hypothesis

a specific statement about a population made for the sake of argument

  • typically says there’s no difference between the true value of a population and hypothesized value or no difference between 2+ samples drawn from a population

3
New cards

alternative hypothesis

states that the population parameter differs from the value given by the null hypothesis

  • hypothesis that parameters are influenced by non-random cause

4
New cards

two-tailed test

no direction of alternative hypothesis, either outcome is tested/possible

5
New cards

one-tailed test

direction of hypothesis is predicted (need good reason for one-tailed test)

6
New cards

binomial exact test

use to determine if observed proportion of successes in a sample differs significantly from hypothesized population proportion

7
New cards

test-statistic

single, standardized number calculated from the sample data

  • allows us to evaluate how comparable our sample is to what we would expect under the null hypothesis

8
New cards

null distribution

sampling distribution of the test statistic if the null hypothesis were true

9
New cards

why is the null distribution important?

tells you whether something interesting is going on or not!!

key to understanding all statistical tests and p-values

  • if test-statistic falls in the middle of null distribution, you can assume nothing special is happening (fail to reject null hypothesis)

10
New cards

p-value

probability of obtaining the observed test-statistic, or a more extreme value, if the null hypothesis is true

11
New cards

interpretation if p-value is 0.031

3.1% of the time, if p = 0.5 in a population of toads, we would expect to obtain 14 or more righties or 14 or more lefties in a sample of 18 toads -> therefore, our sample seems unlikely if Ho is true
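
The toad numbers on this card can be checked with R's exact binomial test. This is a minimal sketch: the counts (14 of 18) and the null proportion 0.5 come from the card, and the object name `toad_test` is only illustrative.

```r
# Exact binomial test: 14 "successes" (e.g. right-handed toads) out of 18,
# tested against the null proportion p = 0.5 (two-tailed by default).
toad_test <- binom.test(x = 14, n = 18, p = 0.5)

# P-value: probability of a result at least this extreme if Ho is true
toad_test$p.value
```

The two-tailed P-value rounds to 0.031, matching the card.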

12
New cards

significance level

probability used as the threshold for rejecting the null hypothesis

  • if p-value is ≤ threshold value, we reject Ho, find support for Ha, and can say the result is “statistically significant”

13
New cards

critical value

value of a test-statistic required to achieve P ≤ threshold value

14
New cards

type I error

mistakenly rejecting a true null, false positive

  • occurs with probability equal to the threshold value (significance level)

15
New cards

why can’t we just lower our threshold value to prevent type I errors?

with a very low threshold, almost no sample could lead to rejection of the null, so we would routinely fail to reject false null hypotheses

16
New cards

type II error

false negative, mistakenly failing to reject a false null

  • keeps us from setting threshold value too low

    • depends on sample size, effect size, and precision -> more difficult to estimate

17
New cards

ecological/biological hypothesis

what is the mechanism we think underlies the difference we expect to see

18
New cards

exploratory analysis

goal is to find the story of the data

19
New cards

explanatory analysis

goal is to share the story of the data

20
New cards

goal of two variable comparisons

visualize the association between variables

21
New cards

numeric vs numeric

scatterplot

22
New cards

categorical vs numeric

mean and error bars or boxplots

23
New cards

contingency table

frequency of occurrence of all combinations of 2 or more variables

24
New cards

categorical vs categorical

  • contingency table

  • grouped bar graph

  • mosaic plot

25
New cards

correlation

correlation coefficient (r) measures the strength and direction of the association between 2 numerical variables

  • no implications of causality (no explanatory or response variables)

26
New cards

regression

models the functional relationship between two variables: how the response changes as the explanatory variable changes (causal only if the study design supports it)

  • gives us fit line (incorporates correlation and slope)

27
New cards

regression - big slope but small correlation

when x changes, y changes a lot BUT pattern isn’t reliable (very noisy)

28
New cards

finding line of best fit for regression

least squares: the line that minimizes the sum of squared residuals (think: variance)

29
New cards

regression - confidence bands

measure precision of the predicted mean Y for each value of X

30
New cards

regression - prediction intervals

measures the precision of a single predicted y-value for each X

31
New cards

what we report for regression

r², F-statistic, and p-value
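
As a rough sketch of where these reported values come from, the code below fits a simple regression to R's built-in `cars` data (an arbitrary example dataset, not one from this unit) and extracts r², the F-statistic, and the p-value. It also computes confidence bands and prediction intervals at a few illustrative x-values.

```r
# Built-in 'cars' data: stopping distance (dist) vs speed
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)

# Values to report
r_squared <- fit_summary$r.squared    # r²
f_stat    <- fit_summary$fstatistic   # named vector: value, numdf, dendf
p_value   <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                lower.tail = FALSE)

# Illustrative x-values at which to predict
new_x <- data.frame(speed = c(10, 15, 20))

# Confidence band: precision of the predicted mean Y at each X
conf_band <- predict(fit, newdata = new_x, interval = "confidence")

# Prediction interval: precision of a single predicted Y at each X
pred_int <- predict(fit, newdata = new_x, interval = "prediction")
```

Prediction intervals are always wider than confidence bands at the same X, because a single new observation varies around the line in addition to the uncertainty in the line itself.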

32
New cards

t-tests and ANOVAs

both compare difference between groups, so predictor is categorical and each category is a different group

33
New cards

t-test

is our sample mean more than about two (1.96) SE away from the mean of the null distribution? if so, we would expect to see data that extreme less than 5% of the time if the null were true

34
New cards

ANOVA

is the amount of variability between the group means greater than the amount of variability within each group?

35
New cards

F-distribution

null distribution of F-statistic, skewed right, all non-negative values (because variances can’t be negative)

36
New cards

ANOVA and F-distribution

if F-statistic from ANOVA falls in the right tail of the F-distribution, reject the null hypothesis

  • F-statistic is the ratio of among-group variance to within-group variance, used to detect differences in group means

    • F > 1 = more variation among treatment groups than within groups

    • F = 1 means equal among- and within-group variance
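
The among-to-within comparison can be sketched by hand and checked against R's ANOVA table. This example assumes R's built-in `PlantGrowth` data (three groups), chosen only for illustration. The variable names are hypothetical.

```r
# Built-in 'PlantGrowth' data: plant weight for three treatment groups
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_sizes <- tapply(PlantGrowth$weight, PlantGrowth$group, length)
grand_mean  <- mean(PlantGrowth$weight)

k <- length(group_means)   # number of groups
n <- nrow(PlantGrowth)     # total sample size

# Among-group mean square: spread of group means around the grand mean
ms_among <- sum(group_sizes * (group_means - grand_mean)^2) / (k - 1)

# Within-group mean square: spread of observations around their own group mean
# (ave() returns each observation's group mean)
within_resid <- PlantGrowth$weight - ave(PlantGrowth$weight, PlantGrowth$group)
ms_within <- sum(within_resid^2) / (n - k)

# F-statistic and its P-value from the right tail of the F-distribution
f_manual <- ms_among / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Same test via R's built-in ANOVA table
anova_table <- anova(lm(weight ~ group, data = PlantGrowth))
```

The hand-computed `f_manual` should agree with the `F value` column of `anova_table`.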

37
New cards

what we report for t-test

t-statistic, p-value, and df

38
New cards

what we report for ANOVA

F-statistic, df, and p-value

39
New cards

when to use t-test or ANOVA

categorical predictor and numerical response

40
New cards

when to use χ²

categorical predictor and categorical response

41
New cards

goal of χ²

what is the observed ratio of members of each group? does it differ from expected ratio?

are the two categorical variables independent of each other?

42
New cards

χ² distribution

frequency distribution of χ² under a null hypothesis predicting expected categorical counts

  • not symmetric and right skewed

  • all non-negative values

43
New cards

what we report for χ²

χ², df, and p-value

44
New cards

parametric assumptions

  • random, independent sampling and sufficient sample size

  • normality

  • homoscedasticity

  • no outliers

45
New cards

requirements for χ² test

all expected counts should be at least 1, and at least 80% of them should be at least 5

46
New cards

testing for normality

visualization of residuals (histogram) or shapiro-wilk test

47
New cards

Shapiro-Wilk

null hypothesis is that the data are normal, so a p-value greater than 0.05 means there is no evidence against normality

48
New cards

testing for homoscedasticity

look at actual values of raw data, look at residuals vs fitted plots (dotted line should be approximately horizontal)

  • use levene’s test

49
New cards

homoscedasticity

one group should not be much more variable than the others

50
New cards

power of log transformations

helps with right skew, outliers, and heteroscedasticity

51
New cards

why are transformations like log permitted

still preserves the relationship in the data, it just rescales it to make things behave better

52
New cards

non parametric tests

  • less powerful so you’re more likely to miss a real difference

  • much harder to interpret biologically

53
New cards

why it’s important to start with a thorough visualization of data

if you start out by only testing one hypothesis, you could get tunnel vision and miss finding something really important

  • data visualization opens your mind so you don’t miss important patterns, weird shapes, outliers, etc.

54
New cards

QQ plot

visualize normality

55
New cards

residuals vs fitted values

visualize heteroscedasticity

56
New cards

Post-ANOVA analysis for fixed effects

planned comparisons and unplanned comparisons

57
New cards

syntax for t-test

t.test(y ~ x, data = dataName) OR t.test(Data$Var1, Data$Var2, paired = TRUE)
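
Both forms of the syntax can be tried on R's built-in datasets. This is a sketch only: `ToothGrowth` and `sleep` are stand-ins chosen for illustration, not the course data.

```r
# Formula form: numerical response ~ categorical predictor with two groups.
# Built-in 'ToothGrowth' data: tooth length (len) by supplement type (supp).
two_sample <- t.test(len ~ supp, data = ToothGrowth)

# Two-vector paired form: two measurements on the same 10 subjects
# (built-in 'sleep' data, one row per subject per drug).
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
paired <- t.test(drug1, drug2, paired = TRUE)

# Values to report: t-statistic, df, and p-value
c(t  = unname(paired$statistic),
  df = unname(paired$parameter),
  p  = paired$p.value)
```

With 10 pairs, the paired test has 10 - 1 = 9 degrees of freedom.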

58
New cards

calculating expected for chi squared test

chisq.test(fish_table)$expected

59
New cards

syntax for chi squared test

chisq.test(fish_table, correct = FALSE)
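
A sketch of both calls on a made-up table. The cards do not show what `fish_table` contains, so the counts and labels below are invented purely for illustration.

```r
# Hypothetical 2x2 contingency table of counts (values invented for illustration)
fish_table <- matrix(c(30, 10,
                       20, 40),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(habitat  = c("reef", "open"),
                                     infected = c("yes", "no")))

# Chi-squared test of independence without the continuity correction
result <- chisq.test(fish_table, correct = FALSE)

# Expected counts under independence:
# (row total * column total) / grand total
result$expected
manual_expected <- outer(rowSums(fish_table), colSums(fish_table)) / sum(fish_table)

# Values to report: chi-squared statistic, df, and p-value
c(chi_sq = unname(result$statistic),
  df     = unname(result$parameter),
  p      = result$p.value)
```

Checking `result$expected` before interpreting the p-value is how the expected-count requirements get verified.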

60
New cards

planned comparisons

focus in on a few scientifically sensible comparisons. You can't decide which comparisons to do after looking at the data. The choice must be based on the scientific questions you are asking, and be chosen when you design the experiment

61
New cards

unplanned comparisons

post hoc tests

testing differences between all pairs of group means while providing protection against rising Type I errors that would result from multiple comparisons
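
One common unplanned-comparison method is Tukey's HSD. The card does not name a specific test, so using it here is an assumption. The sketch below runs it on R's built-in `PlantGrowth` data, again chosen only for illustration.

```r
# One-way ANOVA on the built-in 'PlantGrowth' data (three groups)
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# Tukey's HSD: compares every pair of group means and adjusts the
# p-values to control the Type I error rate across all comparisons
tukey <- TukeyHSD(fit_aov)

# One row per pair: difference in means, confidence limits, adjusted p-value
tukey$group
```

With three groups there are three pairwise comparisons, so the table has three rows.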

62
New cards

Q-Q plot

plots quantiles of the residuals against quantiles of a theoretical (normal) distribution; points should fall close to a straight line

63
New cards

scale-location plot (homoscedasticity)

checks the assumption of equal variances

64
New cards

histogram of residuals

checks the assumption of normality using the residuals from the least-squares line

65
New cards

linearity

checked with the residuals vs fitted plot: residuals should scatter around zero with no curved pattern
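
The diagnostic plots from the last few cards can be produced from a fitted model in base R. This is a sketch assuming the built-in `cars` data as a stand-in. Levene's test is not part of base R, so it is left out here.

```r
# Fit a simple regression to the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)
res <- resid(fit)

# Normality: histogram of residuals and normal Q-Q plot
hist(res, main = "Histogram of residuals", xlab = "Residual")
plot(fit, which = 2)   # Q-Q plot

# Linearity and homoscedasticity
plot(fit, which = 1)   # residuals vs fitted
plot(fit, which = 3)   # scale-location

# Shapiro-Wilk test on the residuals (null hypothesis: residuals are normal)
normality <- shapiro.test(res)
normality$p.value
```

When run non-interactively (for example with `Rscript`), the plots are written to a PDF file instead of a window.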