properties of a normal distribution
fully described by its mean and standard deviation
symmetric around its mean
mean=median=mode
2/3 of random draws are within one SD of the mean
~95% of random draws are within 2 SD of the mean
standard normal distribution
mean is zero
standard deviation is 1
standard normal table
gives the probability of getting a random draw from a standard normal distribution greater than a given value
standard normal is symmetric so…
Pr[Z>x] = Pr[Z<-x]
Pr[Z<x]=1-Pr[Z>x]
what about other normal distributions
all normal distributions are shaped alike, just with different means and variances
any normal distribution can be converted to a standard normal distribution by Z = (Y − μ)/σ
What does Z tell us
how many standard deviations Y is from the mean
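A minimal sketch (made-up numbers, assuming SciPy is installed) of standardizing a value and reading tail probabilities in place of a standard normal table:

```python
from scipy.stats import norm

# Hypothetical example: Y comes from a normal distribution with mu = 50, sigma = 10
y, mu, sigma = 65.0, 50.0, 10.0

z = (y - mu) / sigma          # Z = (Y - mu) / sigma: how many SDs Y is from the mean
upper_tail = norm.sf(z)       # Pr[Z > z], what a standard normal table reports
lower_tail = norm.cdf(-z)     # Pr[Z < -z]; equals the upper tail by symmetry

print(z, upper_tail, lower_tail)
```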
sample means are normally distributed
the mean of the sample means is μ
the standard deviation of the sample means is σ/√n (the population SD divided by the square root of the sample size)
standard error
standard deviation of the distribution of sample means
= s/√n
central limit theorem
the sum or mean of a large number of measurements randomly sampled from any population is approximately normally distributed
inference about means
because y bar is normally distributed, we can convert its distribution to a standard normal distribution
this gives a probability distribution of the difference between a sample mean and the population mean
what can s be used for
an estimate of the population standard deviation σ
student’s t test
when s is used in place of σ, the standardized sample mean no longer follows the standard normal exactly; it has a t distribution, which is a good approximation to the standard normal for large n
degrees of freedom for t-test
n-1
what can we use the t-distribution for
calculate confidence interval of the mean
one-sample t-test
compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis
test statistic for one sample t-test
t = (ȳ − μ0) / (s/√n), i.e. the sample mean minus the mean proposed by the null, divided by the standard error
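A minimal sketch (made-up data, assuming SciPy is installed) computing the one-sample t statistic by hand and via scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

# Made-up sample; null hypothesis: population mean = 10
y = np.array([9.1, 10.4, 8.7, 11.2, 10.9, 9.8, 10.1, 9.5])
mu0 = 10.0

# Test statistic from the formula on this card
t_manual = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))

# Same test via SciPy (df = n - 1)
t_scipy, p = stats.ttest_1samp(y, popmean=mu0)
print(t_manual, t_scipy, p)
```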
one sample t-test assumptions
variable is normally distributed
the sample is a random sample
comparing two means
tests with one categorical and one numeric variable
goal: to compare the mean of a numerical variable for different groups
paired design examples
before and after treatment
upstream and downstream of a power plant
identical twins: one with a treatment and one without
earwigs in each ear: how to get them out? compare tweezers to hot oil
paired t-test
compares the mean of the differences to a value given in the null hypothesis
for each pair, calculate the difference. the paired t-test is simply a one-sample t-test on the differences
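A sketch (hypothetical before/after measurements, assuming SciPy) showing that a paired t-test is a one-sample t-test on the pairwise differences:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same individuals
before = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 14.0])
after  = np.array([11.4, 13.9, 11.1, 12.8, 12.5, 13.2])

# One-sample t-test on the differences (null: mean difference = 0) ...
d = after - before
t1, p1 = stats.ttest_1samp(d, popmean=0.0)

# ... matches SciPy's paired t-test (df = number of pairs - 1)
t2, p2 = stats.ttest_rel(after, before)
print(t1, p1, t2, p2)
```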
degrees of freedom for paired t-test
number of pairs-1
assumptions of paired t-test
pairs are chosen at random
differences have a normal distribution
2 sample t-test
compares the means of a numerical variable between two populations
assumptions of two-sample t-test
both samples are random samples
both populations have normal distributions
the variances of both populations are equal
Welch’s t-test
compares the means of two normally distributed populations that have unequal variances
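A minimal sketch (made-up samples, assuming SciPy): passing equal_var=False to scipy.stats.ttest_ind requests Welch's t-test, which drops the equal-variance assumption:

```python
from scipy import stats

# Hypothetical samples with visibly different spreads
group1 = [4.9, 5.3, 5.1, 4.8, 5.0, 5.2]
group2 = [6.0, 7.9, 4.1, 8.3, 5.6, 9.0]

# equal_var=False requests Welch's t-test (no equal-variance assumption)
t, p = stats.ttest_ind(group1, group2, equal_var=False)
print(t, p)
```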
how to compare variance between groups
the f-test
the F statistic
ratio of the two sample variances; has two different degrees of freedom, one for the numerator and one for the denominator
the F-test is very sensitive to the assumption that both distributions are normal
levene’s test
more robust test to compare variances (between 2 or more groups)
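A sketch (hypothetical data, assuming SciPy) comparing variances two ways: a variance-ratio F-test built from the F distribution, and the more robust Levene's test via scipy.stats.levene:

```python
import numpy as np
from scipy import stats

group1 = np.array([4.9, 5.3, 5.1, 4.8, 5.0, 5.2])
group2 = np.array([6.0, 7.9, 4.1, 8.3, 5.6, 9.0])

# F-test: ratio of the sample variances, df = (n2 - 1, n1 - 1)
f = group2.var(ddof=1) / group1.var(ddof=1)
dfn, dfd = len(group2) - 1, len(group1) - 1
p_f = 2 * min(stats.f.sf(f, dfn, dfd), stats.f.cdf(f, dfn, dfd))   # two-sided

# Levene's test: more robust to non-normality, works for 2 or more groups
w, p_levene = stats.levene(group1, group2, center='median')
print(f, p_f, w, p_levene)
```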
how to detect deviations from normality
previous data/theory
histograms
quantile plots
shapiro-wilk test
shapiro-wilk test
used to test statistically whether a set of data comes from a normal distribution
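A minimal sketch (simulated, deliberately skewed data, assuming SciPy) of the Shapiro-Wilk test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.8, size=40)   # deliberately non-normal data

w, p = stats.shapiro(data)
print(w, p)   # a small P suggests the data do not come from a normal distribution
```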
what to do when assumptions aren’t true
transformations
non-parametric tests
randomization and resampling
the normal approximation
means of large samples are normally distributed
the parametric tests on large samples work relatively well, even for non-normal data
rule of thumb, if n>~50, the normal approximations may work
parametric tests - unequal variance
welch’s t-test would work
if sample sizes are equal and large, then even a ten-fold difference in variance is approximately acceptable
data transformations
changes each data point by some simple mathematical formula
log-transformation
Y' = ln[Y]
when is the log transformation useful
the variable is likely to be the result of multiplication of various components
the frequency distribution of the data is skewed to the right
the variance seems to increase as the mean gets larger (in comparisons across groups)
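A small sketch (simulated right-skewed data, assuming SciPy) showing the log transformation pulling a skewed variable toward symmetry:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.6, size=50)   # right-skewed, multiplicative-style data

y_log = np.log(y)                                  # Y' = ln[Y]
print(stats.skew(y), stats.skew(y_log))            # skewness should move toward 0 after the transform
```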
other transformations
arcsine, square-root, square, reciprocal, antilog
valid transformations
require the same transformation be applied to each individual
have one-to-one correspondence to original values
have a monotonic relationship with the original values
choosing transformations
must transform each individual in the same way
you CAN try different transformations until you find one that makes the data fit the assumptions
you CANNOT keep trying transformations until P<0.05
non-parametric methods
assume less about the underlying distributions
also called "distribution-free"
“parametric” methods assume a distribution or a parameter
non-parametric test
sign test
compares data from one sample to a constant
simple: for each data point, record whether it is above (+) or below (−) the hypothesized constant
use a binomial test to compare result to 1/2
the sign test has very low power
it is quite likely to not reject a false null hypothesis
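A minimal sketch (made-up data, assuming SciPy ≥ 1.7 for scipy.stats.binomtest) of the sign test: count the "+" signs and compare the proportion to 1/2 with a binomial test:

```python
import numpy as np
from scipy import stats

# Hypothetical data; null hypothesis: the median equals 10
y = np.array([11.2, 9.8, 12.5, 10.6, 13.1, 10.9, 8.7, 11.8, 12.2, 10.4])
hypothesized = 10.0

n_above = int(np.sum(y > hypothesized))
n_total = int(np.sum(y != hypothesized))   # ties with the constant are dropped

# Binomial test of the observed proportion of '+' signs against 1/2
result = stats.binomtest(n_above, n_total, p=0.5)
print(result.pvalue)
```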
most non-parametric methods use ranked order of data points
rank each data point in all samples from lowest to highest
lowest data point gets rank 1, next lowest gets rank 2
mann-whitney U test
compares the central tendencies of two groups using ranks
non-parametric method
Performing a mann-whitney U test
rank all individuals from both groups together in order
sum the ranks for all individuals in each group —> R1 and R2
assumptions of mann-whitney U test
both samples are random samples
both populations have the same shape of distribution
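A minimal sketch (hypothetical two-group data, assuming SciPy) of the rank-based Mann-Whitney U test:

```python
from scipy import stats

group1 = [3.1, 4.5, 2.8, 5.0, 3.9, 4.2]
group2 = [5.8, 6.4, 4.9, 7.1, 6.0, 5.5]

# Rank-based comparison of the two groups' central tendencies
u, p = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(u, p)
```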
permutation tests
also known as randomization tests
used for hypothesis testing on measures of association
mixes the real data randomly
variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. this is done for all individuals
the estimate is made on the randomized data
this is repeated numerous times
without replacement
permutation tests are done without replacement
all data points are used exactly once in each permuted data set
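A sketch (made-up data, assuming NumPy) of a permutation test on the difference between two group means: group labels are shuffled without replacement, the statistic is recomputed many times, and the observed value is compared to that permutation distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group data; test statistic: difference in group means
group1 = np.array([3.1, 4.5, 2.8, 5.0, 3.9, 4.2])
group2 = np.array([5.8, 6.4, 4.9, 7.1, 6.0, 5.5])
observed = group2.mean() - group1.mean()

pooled = np.concatenate([group1, group2])
n1 = len(group1)

n_perm = 10_000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)          # without replacement: each point used exactly once
    diff = shuffled[n1:].mean() - shuffled[:n1].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_perm
print(observed, p_value)
```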
goals of experiments
eliminate bias
reduce sampling error (increase precision and power)
what is the question
what kind of data do you need?
how much time/space/money/other resources do you have?
factor
the independent or experimental variable
level
one version of the experimental variable
treatment
the total experimental manipulation applied to a “unit” or “sample”
features that reduce bias
controls, random assignment to treatments, blinding
controls
a group which is identical to the experimental treatment in all respects aside from the treatment itself
establish a baseline
compare to the status quo
placebo-procedural control
example of placebo
some illnesses, e.g. pain and depression, respond to the fact of being treated, even with no pharmaceutically active ingredient
control: “sugar pills”
independent recovery
patients tend to seek treatment when they feel very bad
as a result, they often visit the doctor when they are at their worst. improvement may be inevitable, even without treatment
random assignment averages out the effects of confounding variables
allocation of treatments at random to avoid unknown bias
use a random number table, coin flip, deck of cards, etc.
blinding
preventing knowledge of experimenter (or patient) of which treatment is given to whom
unblinded studies usually find much larger effects (sometimes threefold higher), showing bias that results from lack of blinding
error and variation
experimental error
natural differences in experimental units
variation in measurement
environmental conditions
variance of experimental error is used to conduct statistical comparisons
replication
carry out study on multiple independent objects
balance
nearly equal sample sizes in each treatment
blocking
grouping of experimental units; within each group, different experimental treatments are applied to different units
extreme treatments
stronger treatments can increase the signal-to-noise ratio
blocking
controls for known bias or variation
age
sex
weight
nutrient level
size
location
replication
used to minimize unknown bias or error
indication of variation of results
experimental unit
in field biology, known as “plot”
physical entity to which a treatment is randomly assigned or a subject that is randomly selected from a treatment population
avoid pseudoreplication
analysis of variance (ANOVA)
like a t-test, but can compare more than two groups
asks whether the mean of any of two or more groups differs from any other
in other words, is the variance among groups greater than 0?
ANOVA assumptions
all samples are random samples
all populations are normally distributed
the variances of all groups are equal
kruskal-wallis test
non-parametric alternative to ANOVA
uses the ranks of the data points
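A minimal sketch (made-up three-group data, assuming SciPy) running a single-factor ANOVA and its non-parametric alternative, the Kruskal-Wallis test:

```python
from scipy import stats

# Hypothetical measurements from three groups
a = [5.1, 4.8, 5.5, 5.0, 4.9]
b = [5.9, 6.2, 5.7, 6.4, 6.0]
c = [5.3, 5.6, 5.2, 5.8, 5.4]

f, p_anova = stats.f_oneway(a, b, c)      # single-factor ANOVA
h, p_kw = stats.kruskal(a, b, c)          # rank-based, non-parametric alternative
print(p_anova, p_kw)
```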
correlation:r
describes the relationship between two numerical variables
correlation assumes…
random sample
X is normally distributed with equal variance for all values of Y
Y is normally distributed with equal variance for all values of X
regression
predicts Y from X
linear regression assumes that the relationship between X and Y can be described by a line
regression assumes…
random sample
Y is normally distributed with equal variance for all values of X
multiple-factor (factorial) ANOVA
a factor is a categorical variable
ANOVAs can be generalized to look at more than one categorical variable at a time
not only can we ask whether each categorical variable affects a numerical variable, but also do they interact in affecting the numerical variable
fixed effects
treatments are chosen by the experimenter; they are not a random subset of all possible treatments
random effects
the treatments are a random sample from all possible treatments
method for multiple comparisons
tukey-kramer test
tukey-kramer test
done after finding variation among groups with single-factor ANOVA
compares all group means to all other group means
why not use a series of two-sample t-tests
multiple comparisons would cause the t-tests to reject too many true null hypotheses
tukey-kramer adjusts for the number of tests
tukey-kramer also uses info about the variance within groups from all the data, so it has more power than a t-test with a bonferroni correction
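A sketch (hypothetical stacked data, assuming statsmodels is installed) of Tukey-Kramer pairwise comparisons after an ANOVA, using statsmodels' pairwise_tukeyhsd:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data from three groups, stacked into one response vector with group labels
values = np.array([5.1, 4.8, 5.5, 5.0, 4.9,
                   5.9, 6.2, 5.7, 6.4, 6.0,
                   5.3, 5.6, 5.2, 5.8, 5.4])
groups = np.repeat(['a', 'b', 'c'], 5)

# All pairwise comparisons of group means, adjusted for the number of tests
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)
```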
estimate correlation coefficient
r = sum of cross products / square root of (sum of squares of X × sum of squares of Y)
spearman’s rank correlation
alternative to correlation that does not make so many assumptions
attenuation
the estimated correlation will be lower if X or Y are estimated with error
parameter of linear regression
Y = α + β X
estimating a regression line
Y=a+bX
best estimate of the slope
b = sum of cross products / sum of squares of X
coefficient of determination
r², square of the correlation coefficient r
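A minimal sketch (made-up paired data, assuming NumPy) computing r, the regression slope b, the intercept a, and r² directly from the sums of squares and cross products used on these cards:

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

ssx = np.sum((x - x.mean()) ** 2)                 # sum of squares of X
ssy = np.sum((y - y.mean()) ** 2)                 # sum of squares of Y
sxy = np.sum((x - x.mean()) * (y - y.mean()))     # sum of cross products

r = sxy / np.sqrt(ssx * ssy)    # correlation coefficient
b = sxy / ssx                   # estimated slope of Y = a + bX
a = y.mean() - b * x.mean()     # estimated intercept
r2 = r ** 2                     # coefficient of determination

print(r, a, b, r2)
```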
non-linear relationships
transformations, quadratic regression, splines
AIC vs. inferential statistics
power of p-value
multiple models as alternative hypotheses
statistically significant versus biologically significant
AIC
estimate relative fit of a set of competing statistical models → model selection
model fit is never exact, some fit better than others
AIC balances goodness of fit with number of parameters in the model (more parameterized models are penalized)
AIC calculations
models get scored
AIC = 2k − 2 ln(L-hat)
k = number of parameters
L-hat = maximum value of the likelihood function
measures how much information (fit) is gained/lost by adding a predictor (parameter)
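A sketch (made-up data, assuming NumPy) comparing the AIC of two competing least-squares models; the maximized Gaussian log-likelihood is computed from the residual sum of squares, and AIC = 2k − 2 ln(L-hat):

```python
import numpy as np

# Hypothetical data with a roughly linear trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.8, 4.1, 4.6, 5.9, 6.2, 7.4, 8.1])
n = len(y)

def gaussian_loglik(residuals):
    """Maximized log-likelihood for a least-squares fit with normal errors."""
    rss = np.sum(residuals ** 2)
    return -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)

# Model 1: intercept only (k = 2: intercept + error variance)
ll1 = gaussian_loglik(y - y.mean())
aic1 = 2 * 2 - 2 * ll1

# Model 2: simple linear regression (k = 3: intercept, slope, error variance)
b, a = np.polyfit(x, y, 1)        # returns slope, then intercept
ll2 = gaussian_loglik(y - (a + b * x))
aic2 = 2 * 3 - 2 * ll2

print(aic1, aic2)   # lower AIC = better balance of fit and parameter count
```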
relative quality of the model
the AIC score describes the relative quality of the model
changes with changes in the model set
how to choose the model set?
how to make an inference?
summary of AIC approach
not leaning on p-values
multiple models as alternative hypotheses
comparing relative model fit
strength of evidence approach
information theory statistics