Biostats test 2


95 Terms

1
New cards

properties of a normal distribution

  • fully described by its mean and standard deviation

  • symmetric around its mean

  • mean=median=mode

  • about 2/3 (~68%) of random draws are within one SD of the mean

  • ~95% of random draws are within 2 SD of the mean
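
The last two bullets can be checked numerically. The following is only a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is made up for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal draw lands within k SDs of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

within_1sd = prob_within(1)   # roughly 0.68, i.e. about 2/3
within_2sd = prob_within(2)   # roughly 0.95
print(round(within_1sd, 3), round(within_2sd, 3))
```

The result does not depend on the mean or SD chosen, which is why the rule holds for every normal distribution.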

2
New cards

standard normal distribution

  • mean is zero

  • standard deviation is 1

3
New cards

standard normal table

gives the probability of getting a random draw from a standard normal distribution greater than a given value

4
New cards

standard normal is symmetric so…

Pr[Z>x] = Pr[Z<-x]

Pr[Z<x]=1-Pr[Z>x]

5
New cards

what about other normal distributions

  • all normal distributions are shaped alike, just with different means and variances

  • any normal distribution can be converted to a standard normal distribution by Z = (Y - μ)/σ

6
New cards

What does Z tell us

how many standard deviations Y is from the mean
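
As a rough illustration of standardizing and then reading off a tail probability, here is a small sketch. The population values (mean 100, SD 15) and the function names are invented for the example, and `NormalDist` stands in for the printed standard normal table.

```python
from statistics import NormalDist

def z_score(y, mu, sigma):
    """How many standard deviations y lies from the mean."""
    return (y - mu) / sigma

def prob_greater(y, mu, sigma):
    """Pr[Y > y], found from the upper tail of the standard normal."""
    z = z_score(y, mu, sigma)
    return 1.0 - NormalDist().cdf(z)

# Hypothetical population: mean 100, SD 15
z = z_score(130, 100, 15)            # 2.0 standard deviations above the mean
upper = prob_greater(130, 100, 15)   # Pr[Z > 2]
lower_mirror = NormalDist().cdf(-z)  # Pr[Z < -2], equal by symmetry
print(z, round(upper, 4), round(lower_mirror, 4))
```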

7
New cards

sample means are normally distributed

  • the mean of the sample means is μ

  • the standard deviation of the sample means is σ/√n, where n is the number of observations in each sample

8
New cards

standard error

standard deviation of the distribution of sample means

= σ/√n, estimated from a sample as s/√n

9
New cards

central limit theorem

the sum or mean of a large number of measurements randomly sampled from any population is approximately normally distributed
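
A simulation can make this concrete. The sketch below is only an illustration under assumed settings (an exponential population with mean 1 and SD 1, samples of n = 50, 2000 replicate samples, a fixed seed). It checks that the sample means centre on the population mean and that their SD is close to σ/√n.

```python
import math
import random
import statistics

random.seed(1)        # fixed seed so the sketch is repeatable

n = 50                # observations per sample
replicates = 2000     # number of samples drawn

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)   # should be near 1
sd_of_means = statistics.stdev(sample_means)     # should be near 1/sqrt(50)
expected_se = 1.0 / math.sqrt(n)
print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
```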

10
New cards

inference about means

because y bar is normally distributed, we can convert its distribution to a standard normal distribution

  • this gives a probability distribution of the difference between a sample mean and the population mean

11
New cards

what can s be used for

an estimate of the population standard deviation σ

12
New cards

student’s t test

t = (Ȳ - μ)/(s/√n), which uses the sample s in place of σ; it resembles the standard normal Z (closely for large n) but follows Student's t distribution

13
New cards

degrees of freedom for t-test

n-1

14
New cards

what can we use the t-distribution for

calculate confidence interval of the mean

15
New cards

one-sample t-test

compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis

16
New cards

test statistic for one sample t-test

t = (Ȳ - μ₀)/(s/√n), where μ₀ is the mean proposed by the null hypothesis
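
A minimal sketch of this calculation, using made-up data and a made-up null mean of 5. It returns only the statistic and its degrees of freedom; turning t into a P-value needs the t distribution, which is not in Python's standard library.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test against the null mean mu0."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)     # sample SD, with n - 1 in the denominator
    se = s / math.sqrt(n)            # standard error of the mean
    return (y_bar - mu0) / se, n - 1

# Hypothetical measurements
t_stat, df = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], mu0=5.0)
print(round(t_stat, 3), df)
```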

17
New cards
18
New cards

one sample t-test assumptions

  • variable is normally distributed

  • the sample is a random sample

19
New cards

comparing two means

  • tests with one categorical and one numeric variable

  • goal: to compare the mean of a numerical variable for different groups

20
New cards

paired design examples

  • before and after treatment

  • upstream and downstream of a power plant

  • identical twins: one with a treatment and one without

  • earwigs in each ear: how to get them out? compare tweezers to hot oil

21
New cards

paired t-test

  • compares the mean of the differences to a value given in the null hypothesis

  • for each pair, calculate the difference. the paired t-test is simply a one-sample t-test on the differences
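
The "one-sample test on the differences" idea can be sketched as follows. The before/after numbers are invented, and the function name `paired_t` is just for illustration.

```python
import math
import statistics

def paired_t(first, second, mu_d0=0.0):
    """Return (t, df) for a paired t-test on the differences second - first."""
    if len(first) != len(second):
        raise ValueError("paired data need equal lengths")
    diffs = [b - a for a, b in zip(first, second)]   # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return (d_bar - mu_d0) / se, n - 1               # df = number of pairs - 1

# Hypothetical before/after measurements on four individuals
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```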

22
New cards

degrees of freedom for paired t-test

number of pairs-1

23
New cards

assumptions of paired t-test

  • pairs are chosen at random

  • differences have a normal distribution

24
New cards

2 sample t-test

compares the means of a numerical variable between two populations

25
New cards

assumptions of two-sample t-test

  • both samples are random samples

  • both populations have normal distributions

  • the variances of both populations are equal

26
New cards

Welch’s t-test

compares the means of two normally distributed populations that have unequal variances
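
To show how the two tests differ, here is a hedged sketch of both statistics. The pooled-variance form and the Welch-Satterthwaite degrees of freedom are the standard textbook formulas rather than anything stated on these cards, and the data are made up.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t assuming equal variances; df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def welch_t(a, b):
    """Welch's t, which does not assume equal variances."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1
    q2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with a four-fold difference in variance
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(round(t_pooled, 3), df_pooled, round(t_welch, 3), round(df_welch, 2))
```

With equal sample sizes the two t values coincide; the difference shows up in the degrees of freedom, which Welch's method reduces when the variances differ.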

27
New cards

how to compare variance between groups

the f-test

28
New cards

F-test

  • two different degrees of freedom, one for the numerator and one for the denominator

  • very sensitive to assumption that both distributions are normal

29
New cards

levene’s test

more robust test to compare variances (between 2 or more groups)

30
New cards

how to detect deviations from normality

  • previous data/theory

  • histograms

  • quantile plots

  • shapiro-wilk test

31
New cards

shapiro-wilk test

used to test statistically whether a set of data comes from a normal distribution

32
New cards

what to do when assumptions aren’t true

  • transformations

  • non-parametric tests

  • randomization and resampling

33
New cards

the normal approximation

  • means of large samples are normally distributed

  • the parametric tests on large samples work relatively well, even for non-normal data

  • rule of thumb, if n>~50, the normal approximations may work

34
New cards

parametric tests - unequal variance

  • welch’s t-test would work

  • if sample sizes are equal and large, then even a ten-fold difference in variance is approximately acceptable

35
New cards

data transformations

changes each data point by some simple mathematical formula

36
New cards

log-transformation

Y' = ln[Y]

37
New cards

when is the log transformation useful

  • the variable is likely to be the result of multiplication of various components

  • the frequency distribution of the data is skewed to the right

  • the variance seems to increase as the mean gets larger (in comparisons across groups)
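
A toy example of the first two bullets: the data below are invented so that each value is ten times the previous one, which gives a strongly right-skewed variable. After taking natural logs the values are evenly spaced, and the mean and median agree.

```python
import math
import statistics

raw = [1, 10, 100, 1000]               # multiplicative steps, right-skewed
logged = [math.log(y) for y in raw]    # natural log; requires positive values

# In skewed data the mean is pulled away from the median
raw_gap = statistics.mean(raw) - statistics.median(raw)
log_gap = statistics.mean(logged) - statistics.median(logged)
print(raw_gap, round(log_gap, 6))
```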

38
New cards

other transformations

arcsine, square-root, square, reciprocal, antilog

39
New cards

valid transformations

  • require the same transformation be applied to each individual

  • have one-to-one correspondence to original values

  • have a monotonic relationship with the original values

40
New cards

choosing transformations

  • must transform each individual in the same way

  • you CAN try different transformations until you find one that makes the data fit the assumptions

  • you CANNOT keep trying transformations until P<0.05

41
New cards

non-parametric methods

  • assume less about the underlying distributions

  • also called “distribution-free”

  • “parametric” methods assume a distribution or a parameter

42
New cards

non-parametric test

sign test

  • compares data from one sample to a constant

  • simple: for each data point, record whether individual is above (+) or below (-) the hypothesized constant

  • use a binomial test to compare result to 1/2
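
The three steps above can be sketched as below. Dropping values exactly equal to the constant and doubling the smaller tail for a two-sided P-value are assumptions of this sketch, not something the card specifies; the data are made up.

```python
from math import comb

def sign_test(sample, hypothesized):
    """Return (plus, minus, two-sided P) from an exact binomial test with p = 1/2."""
    plus = sum(1 for y in sample if y > hypothesized)
    minus = sum(1 for y in sample if y < hypothesized)
    n = plus + minus                      # values equal to the constant are dropped
    k = min(plus, minus)
    # Probability of a split at least as lopsided as k, in one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical data compared against a hypothesized constant of 10
plus, minus, p_value = sign_test([12, 15, 11, 18, 14, 16, 13, 17, 8, 9], 10)
print(plus, minus, p_value)
```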

43
New cards

the sign test has very low power

it is quite likely to not reject a false null hypothesis

44
New cards

most non-parametric methods use ranked order of data points

  • rank each data point in all samples from lowest to highest

  • lowest data point gets rank 1, next lowest gets rank 2, and so on

45
New cards

mann-whitney U test

  • compares the central tendencies of two groups using ranks

  • non-parametric method

46
New cards

Performing a mann-whitney U test

  • rank all individuals from both groups together in order

  • sum the ranks for all individuals in each group → R1 and R2
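
A sketch of the ranking and summing steps. Giving tied values the average of their ranks, and computing U1 = n1·n2 + n1(n1 + 1)/2 - R1 with U2 = n1·n2 - U1, are common textbook conventions assumed here rather than taken from the card.

```python
def midranks(values):
    """Rank values from lowest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sums(group1, group2):
    """Return (R1, R2, U1, U2) after ranking both groups together."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return r1, r2, u1, u2

# Hypothetical groups with no overlap at all
r1, r2, u1, u2 = rank_sums([1, 2, 3], [4, 5, 6])
print(r1, r2, u1, u2)
```

The larger of U1 and U2 is then compared with a critical value for the two sample sizes.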

47
New cards

assumptions of mann-whitney U test

  • both samples are random samples

  • both populations have the same shape of distribution

48
New cards

permutation tests

  • also known as randomization tests

  • used for hypothesis testing on measures of association

  • mixes the real data randomly

  • variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. this is done for all individuals

  • the estimate is made on the randomized data

  • this is repeated numerous times

49
New cards

without replacement

  • permutation tests are done without replacement

  • all data points are used exactly once in each permuted data set
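
A minimal sketch of a permutation test for a correlation, assuming made-up data, 2000 permutations, and a fixed seed. `shuffle` reorders the Y values without replacement, so every data point appears exactly once in each permuted data set.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided P-value for the correlation, by re-pairing x with shuffled y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)             # reorder y without replacement
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            extreme += 1
    # Count the observed arrangement once so the estimate is never exactly 0
    return (extreme + 1) / (n_perm + 1)

# Hypothetical, nearly linear data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
p_value = permutation_test(x, y)
print(p_value)
```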

50
New cards

goals of experiments

  • eliminate bias

  • reduce sampling error (increase precision and power)

51
New cards

what is the question

  • what kind of data do you need?

  • how much time/space/money/other resources do you have?

52
New cards

factor

the independent or experimental variable

53
New cards

level

one version of the experimental variable

54
New cards

treatment

the total experimental manipulation applied to a “unit” or “sample”

55
New cards

features that reduce bias

controls, random assignment to treatments, blinding

56
New cards

controls

  • a group which is identical to the experimental treatment in all respects aside from the treatment itself

  • establish a baseline

  • compare to the status quo

  • placebo-procedural control

57
New cards

example of placebo

  • some illnesses, e.g. pain and depression, respond to the fact of being treated, even when the treatment has no pharmaceutically active ingredients

  • control: “sugar pills”

58
New cards

independent recovery

  • patients tend to seek treatment when they feel very bad

  • as a result, they often visit the doctor when they are at their worst. improvement may be inevitable, even without treatment

59
New cards

random assignment averages out the effects of confounding variables

  • allocation of treatments at random to avoid unknown bias

  • use a random number table, coin flip, deck of cards, etc.
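
In software, the same idea can be sketched with a shuffle. The plot names, treatment labels, and seed below are hypothetical; dealing the shuffled units out in turn also keeps the design balanced.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign units to treatments, as evenly as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units to the treatments in turn, like dealing cards
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

# Hypothetical experiment: 12 plots, 3 treatments
plots = [f"plot{i}" for i in range(1, 13)]
groups = assign_treatments(plots, ["control", "low", "high"], seed=42)
for treatment, members in groups.items():
    print(treatment, members)
```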

60
New cards

blinding

  • preventing the experimenter (or patient) from knowing which treatment is given to whom

  • unblinded studies usually find much larger effects (sometimes threefold higher), showing bias that results from lack of blinding

61
New cards

error and variation

  • experimental error

    • natural differences in experimental units

    • variation in measurement

    • environmental conditions

  • variance of experimental error is used to conduct statistical comparisons

62
New cards

replication

carry out study on multiple independent objects

63
New cards

balance

nearly equal sample sizes in each treatment

64
New cards

blocking

grouping of experimental units; within each group, different experimental treatments are applied to different units

65
New cards

extreme treatments

stronger treatments can increase the signal-to-noise ratio

66
New cards

blocking

  • controls for known bias or variation

    • age

    • sex

    • weight

    • nutrient level

    • size

    • location

67
New cards

replication

  • used to minimize unknown bias or error

  • indication of variation of results

68
New cards

experimental unit

  • in field biology, known as “plot”

  • physical entity to which a treatment is randomly assigned or a subject that is randomly selected from a treatment population

  • avoid pseudoreplication

69
New cards
70
New cards

analysis of variance (ANOVA)

  • like a t-test, but can compare more than two groups

  • asks whether any of two or more means is different from any other

  • in other words, is the variance among groups greater than 0?
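
The comparison of variance among groups with variance within groups is usually summarized by an F-ratio. The sketch below uses the standard single-factor formulas (mean square among groups divided by mean square error), which are not spelled out on the card, with invented data.

```python
def anova_f(groups):
    """Return (F, df_groups, df_error) for a single-factor ANOVA."""
    all_values = [y for g in groups for y in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    # Variation of the group means around the grand mean
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individuals around their own group mean
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_groups, df_error = k - 1, n_total - k
    f_ratio = (ss_groups / df_groups) / (ss_error / df_error)
    return f_ratio, df_groups, df_error

# Hypothetical data for three groups
f_ratio, df_groups, df_error = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_ratio, 3), df_groups, df_error)
```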

71
New cards

ANOVA assumptions

  • all samples are random samples

  • all populations are normally distributed

  • the variances of all groups are equal

72
New cards

kruskal-wallis test

  • non-parametric alternative to ANOVA

  • uses the ranks of the data points

73
New cards

correlation: r

describes the strength and direction of the linear relationship between two numerical variables

74
New cards

correlation assumes…

  • random sample

  • X is normally distributed with equal variance for all values of Y

  • Y is normally distributed with equal variance for all values of X

75
New cards

regression

  • predicts Y from X

  • linear regression assumes that the relationship between X and Y can be described by a line

76
New cards

regression assumes…

  • random sample

  • Y is normally distributed with equal variance for all values of X

77
New cards

multiple-factor ANOVA (not the same as MANOVA, which is multivariate ANOVA for several response variables)

  • a factor is a categorical variable

  • ANOVAs can be generalized to look at more than one categorical variable at a time

  • not only can we ask whether each categorical variable affects a numerical variable, but also do they interact in affecting the numerical variable

78
New cards

fixed effects

treatments are chosen by experimenter, they are not a random subset of all possible treatments

79
New cards

random effects

the treatments are a random sample from all possible treatments

80
New cards

method for multiple comparisons

tukey-kramer test

81
New cards

tukey-kramer test

  • done after finding variation among groups with single-factor ANOVA

  • compares all group means to all other group means

82
New cards

why not use a series of two-sample t-tests

  • multiple comparisons would cause the t-tests to reject too many true null hypotheses

  • tukey-kramer adjusts for the number of tests

  • tukey-kramer also uses info about the variance within groups from all the data, so it has more power than a t-test with a bonferroni correction
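
To see how quickly the problem grows, the sketch below computes the chance of at least one false rejection across all pairwise tests. It assumes the tests are independent, which pairwise t-tests are not exactly, so treat the number as a rough guide. The choice of five groups and α = 0.05 is only an example.

```python
def n_pairwise(k):
    """Number of distinct pairs among k groups."""
    return k * (k - 1) // 2

def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

m = n_pairwise(5)                      # 10 pairwise comparisons for 5 groups
fwer = familywise_error(0.05, m)       # about 0.40 instead of 0.05
bonferroni_alpha = 0.05 / m            # per-test threshold under Bonferroni
print(m, round(fwer, 3), bonferroni_alpha)
```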

83
New cards

estimate correlation coefficient

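r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² · Σ(Y - Ȳ)²], i.e. the sum of products divided by the square root of the product of the two sums of squares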
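
A short sketch of the calculation, written out from the sums of products and sums of squares; the data are made up.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length lists."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sum_products = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)

# Hypothetical measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))
```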

84
New cards

Spearman’s rank correlation

alternative to correlation that does not make so many assumptions

85
New cards

attenuation

the estimated correlation will be closer to zero if X or Y is measured with error

86
New cards

parameters of linear regression

Y = α + βX (α = intercept, β = slope)

87
New cards

estimating a regression line

Y = a + bX, where a and b are the sample estimates of α and β

88
New cards

best estimate of the slope

b = “sum of cross products” over “sum of squares of X” = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²

89
New cards

coefficient of determination

r², square of the correlation coefficient r
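
The slope, intercept, and r² can all be computed from the same three sums. This is a minimal sketch with invented data; the intercept formula a = Ȳ - bX̄ is the usual least-squares result and is not stated on the cards.

```python
def fit_line(x, y):
    """Return (a, b, r_squared) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    b = sum_products / ss_x              # slope
    a = y_bar - b * x_bar                # intercept: line passes through the means
    r_squared = sum_products ** 2 / (ss_x * ss_y)
    return a, b, r_squared

# Hypothetical measurements
a, b, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 3), round(b, 3), round(r_squared, 3))
```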

90
New cards

non-linear relationships

transformations, quadratic regression, splines

91
New cards

AIC vs. inferential statistics

power of p-value

multiple models as alternative hypotheses

statistically significant versus biologically significant

92
New cards

AIC

  • estimate relative fit of a set of competing statistical models → model selection

  • model fit is never exact, some fit better than others

  • AIC balances goodness of fit with number of parameters in the model (more parameterized models are penalized)

93
New cards

AIC calculations

  • models get scored

    • AIC = 2k - 2 ln(L-hat)

    • k = number of parameters

    • L-hat = maximum value of the likelihood function

  • measures how much information (fit) is gained/lost by adding a predictor (parameter)
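
A small sketch of scoring and comparing models with AIC = 2k - 2 ln(L-hat). The three models and their log-likelihoods are entirely hypothetical; in practice the log-likelihoods come from fitting each model to the same data.

```python
def aic(k, log_likelihood):
    """AIC from the number of parameters k and the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood

# Hypothetical model set: name -> (number of parameters, maximized log-likelihood)
models = {
    "intercept only": (1, -120.0),
    "one predictor": (2, -112.5),
    "two predictors": (3, -112.2),
}

scores = {name: aic(k, ll) for name, (k, ll) in models.items()}
best = min(scores, key=scores.get)                    # lowest AIC fits best
delta = {name: score - scores[best] for name, score in scores.items()}
for name in models:
    print(name, scores[name], round(delta[name], 2))
```

Here the second predictor improves the likelihood slightly but not enough to offset the penalty for the extra parameter. The differences are only meaningful within this particular set of models.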

94
New cards

relative quality of the model

  • AIC score describes the relative quality of the model

  • changes with changes in the model set

  • how to choose the model set?

  • how to make an inference?

95
New cards

summary of AIC approach

  • not leaning on p-values

  • multiple models as alternative hypotheses

  • comparing relative model fit

  • strength of evidence approach

  • information theory statistics