biostats review

85 Terms

1

SSM/SSE for bivariate regression

2

how to find p value for bivariate regression

1 - pf(f value, df model, df error)

3

if R² is close to 1 then

model explains the data well

4

R² for bivariate regression measures

the proportion of the variance in Y that is explained by the model

SSM/SST

5

correlation r for bivariate regression

measures the strength and direction of the linear relationship between X and Y

found by taking sqrt(R²) and giving it the same sign as the slope

6

σ^e is

estimated standard deviation of the residuals from the line of best fit

sqrt(MSE)
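
The regression cards above can be tied together in one small base R sketch. The data are made up for illustration (they are not from the course), and the variable names are only suggestions. It computes SSM, SSE and SST by hand, then R², r, the residual standard deviation, the F value and its upper-tail p value.

```r
# Hypothetical toy data: x = hours studied, y = exam score
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(52, 55, 61, 60, 68, 71, 75, 74)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

n   <- length(y)
SST <- sum((y - mean(y))^2)      # total variation
SSM <- sum((y_hat - mean(y))^2)  # variation explained by the model
SSE <- sum((y - y_hat)^2)        # residual (unexplained) variation

df_model <- 1                    # one predictor in bivariate regression
df_error <- n - 2

MSM <- SSM / df_model
MSE <- SSE / df_error

r_squared <- SSM / SST
r         <- sign(coef(fit)[["x"]]) * sqrt(r_squared)  # sign taken from the slope
sigma_e   <- sqrt(MSE)                                 # residual standard deviation
f_value   <- MSM / MSE
p_value   <- pf(f_value, df_model, df_error, lower.tail = FALSE)  # same as 1 - pf(...)
```

The hand-computed values should agree with summary(fit) and anova(fit), which is a quick way to check the formulas.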

7

calculate z score without the sample size

(observed - null) / SD

8

when is a z score considered unusual

when the value lies more than 2 standard deviations above or below the mean (|z| > 2)

9

what is a pooled proportion and when to use it

combines the successes and sample sizes of both groups: (x1 + x2) / (n1 + n2). use it when finding the SE for a normal-distribution test of a difference in proportions, because the null hypothesis says both groups share one proportion
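
A short base R sketch of a pooled proportion in a two-proportion z test. The counts are hypothetical and the variable names are illustrative only.

```r
# Hypothetical counts
x1 <- 45; n1 <- 100   # successes and sample size, group 1
x2 <- 30; n2 <- 100   # successes and sample size, group 2

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Pool the successes and the sample sizes across both groups
p_pool <- (x1 + x2) / (n1 + n2)

# Standard error of the difference under H0: p1 = p2
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z           <- (p1_hat - p2_hat) / se
p_two_sided <- 2 * pnorm(-abs(z))
```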

10

Standard error

measures how much a sample statistic (like the mean) would vary from sample to sample.

MANY SAMPLES

11

SE gets smaller when

sample sizes are large

data is less variable

12

A standard error of 0.050 means

a difference in proportions of about ± 0.05 is what to expect from random chance variation alone

13

Standard deviation (SD)

measures spread of individual data points in a sample or population.

ONE SAMPLE

14

z score is used for

normal distributions

15

what does z score tell us

How unusual is this result if the null hypothesis were true

if close to 0, that means the result is very typical under the null

16

the p value is

how likely is it to see a test statistic this extreme or more extreme if the null hypothesis is true?

17

is bootstrap or null distributions used for SE

bootstrap! it estimates variability. it conceptualizes uncertainty in our estimate

18

what are randomization distributions also called and what are they for

null distributions and hypothesis testing

19

the middle 95% of values fall within

mean ± 2*SE

20

what type of distribution is used to get p value

randomization or null distributions

21

a randomization spread tells us

what we would expect if there were no real difference in whatever is being measured

22

what to look for when comparing bootstrap and randomization distributions

whether the graphs have a similar shape and spread

bulk of data lies in the middle 95%

23

when p value is greater than the significance level

fail to reject the null hypothesis

24

when p value is smaller than the significance level

reject H0

25

when looking at residual plots you want to have

no patterns present in spacing and an even spread to the dots

26

in a residual plot if the dot is above the line it means

the model underestimated the value of the observed

27

in a residual plot if the dot is below the line it means

the model overestimated the value of the observed

28

residual is

observed value - predicted value

29

what can we learn from our residual plots

we can infer if the model is good or not

the model is not a good fit if the residual plot shows patterns or curves
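
The residual cards can be illustrated with a minimal base R sketch. The data are hypothetical. A positive residual means the point sits above the line (the model underestimated), and a negative one means it sits below (the model overestimated).

```r
# Hypothetical data that are close to linear
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

fit       <- lm(y ~ x)
predicted <- fitted(fit)

residual       <- y - predicted   # observed value - predicted value
underestimated <- residual > 0    # dot above the zero line
overestimated  <- residual < 0    # dot below the zero line

# Residual plot: look for an even, patternless scatter around 0
if (interactive()) {
  plot(x, residual)
  abline(h = 0)
}
```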

30

what are the 4 steps to hypothesis testing

  1. state the null and alternative hypothesis

  2. calculate the test statistic

  3. find the p value

  4. draw a conclusion

31

middle line in a boxplot is the

median

32

the box length spreads from

Q1 to Q3

it represents the middle 50% of the data

33

if the whisker is longer on the right the distribution is

right-skewed

34

when multiple boxplots are side-by-side they are used to compare a

quantitative variable across the groups of a categorical variable

35

a larger IQR means

The group is more spread out and is less consistent

36

mean compared to median

the mean is less resistant to outliers than the median, so it gets pulled in the direction of the skew

37

50% of the values in a boxplot fall in

the IQR
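
A small base R sketch of the boxplot numbers, using made-up right-skewed data. It shows the quartiles, the IQR, and the mean being pulled toward the long tail.

```r
# Hypothetical right-skewed data (one large value on the right)
x <- c(2, 3, 3, 4, 5, 6, 7, 9, 30)

q   <- quantile(x, c(0.25, 0.50, 0.75))  # Q1, median, Q3
iqr <- q[["75%"]] - q[["25%"]]           # box length, Q3 - Q1

# The mean is pulled toward the skew; the median is resistant
mean_above_median <- mean(x) > median(x)

if (interactive()) boxplot(x, horizontal = TRUE)
```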

38

a curved distribution will have a boxplot with

a LARGE IQR because the middle 50% spreads across a sparse middle region

39

how would i estimate the p value given a dotplot

locate the test statistic, then count how many dots are as extreme or more extreme than it

divide that number by the total number of dots

40

r command for a randomization distribution

do(1000) * diffmean(response variable ~ shuffle(explanatory variable), data = YOUR DATA)

diffmean could also be diffprop

41

diffmean is used when the response variable for a randomization distribution is

quantitative

42

diffprop is used when the response variable for a randomization distribution is

categorical

43

to make a histogram of a randomization/null distribution use

gf_histogram(~ diffmean, data = YOUR RANDOMIZATION DATA)

diffmean could also be diffprop
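
For intuition, here is roughly what the randomization commands above do, written as a base R sketch instead of the course commands. The data, the helper diff_in_means(), and the seed are all hypothetical.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a quantitative response measured in two groups
group    <- rep(c("A", "B"), each = 10)
response <- c(12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
              16, 18, 15, 19, 17, 16, 20, 18, 17, 19)

# Illustrative helper: mean of group B minus mean of group A
diff_in_means <- function(y, g) mean(y[g == "B"]) - mean(y[g == "A"])

observed <- diff_in_means(response, group)

# Shuffle the group labels 1000 times to mimic "no real difference"
null_diffs <- replicate(1000, diff_in_means(response, sample(group)))

# Two-sided p value: share of shuffled differences at least as extreme as observed
p_value <- mean(abs(null_diffs) >= abs(observed))

if (interactive()) hist(null_diffs)
```

The shuffled differences pile up around 0, which is the null value, and the p value is the fraction of them at least as far from 0 as the observed difference.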

44

How do you use a confidence interval to estimate the p-value for a hypothesis test?

look to see if the 95% confidence interval includes 0 (the null value)

if it does not include 0, the result is statistically significant at 0.05 (p-value less than 0.05)

45

when is the result statistically significant at a 95% CI

at 0.05 (two-sided)

46

if the confidence interval includes 0 then

The p-value is greater than 0.05

47

how is critical value found with a t distribution

qt(1 - (1 - confidence level)/2, df = n-1), for example qt(0.975, df = n-1) for 95% confidence

48

degrees freedom for critical value computation for one mean

sample size -1
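
A minimal base R sketch of a t critical value and the confidence interval for one mean that it produces. The sample values are hypothetical. Note that a 95% interval leaves 2.5% in each tail, so the quantile passed to qt() is 0.975.

```r
# Hypothetical sample
x <- c(4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7, 5.4, 5.1)

n          <- length(x)
conf_level <- 0.95
alpha      <- 1 - conf_level

t_star <- qt(1 - alpha / 2, df = n - 1)  # critical value, df = sample size - 1
se     <- sd(x) / sqrt(n)                # standard error of the mean

ci <- mean(x) + c(-1, 1) * t_star * se   # mean ± t* × SE
```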

49

empirical rule

68 (1 sd), 95 (2 sd), 99.7 (3 sd)
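
The empirical rule can be checked against the standard normal curve with pnorm(). The helper name within_sd() is made up for this sketch.

```r
# Proportion of a normal distribution within k standard deviations of the mean
within_sd <- function(k) pnorm(k) - pnorm(-k)

percent_within <- round(100 * within_sd(1:3), 1)
percent_within  # 68.3 95.4 99.7
```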

50

how to find percentages in a normal distribution when given specific values

pnorm(#, mean, sd)

51

how to find a specific value in a normal distribution when given percentile

qnorm(percentile, mean, sd)
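
pnorm() and qnorm() undo each other: pnorm() turns a value into the proportion below it, and qnorm() turns a proportion into a value. A small sketch with a hypothetical normal distribution (mean 100, sd 15):

```r
# Hypothetical normal distribution
m <- 100
s <- 15

below_130 <- pnorm(130, mean = m, sd = s)          # proportion below 130
above_130 <- 1 - below_130                         # proportion above 130
between   <- pnorm(115, m, s) - pnorm(85, m, s)    # proportion between 85 and 115

p90 <- qnorm(0.90, mean = m, sd = s)               # value at the 90th percentile
```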

52

how to find p value when given SE

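(p̂ - p0) / SE

gives z score

then z score is plugged into r

pnorm(z score) for a lower-tail test, 1 - pnorm(z score) for an upper-tail test, 2 * pnorm(-abs(z score)) for a two-sided test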
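
A short sketch of turning a z score into a p value, with hypothetical numbers. Which tail to use depends on the direction of the alternative hypothesis.

```r
# Hypothetical values
p_hat <- 0.58   # observed sample proportion
p0    <- 0.50   # null value
se    <- 0.04   # standard error (given)

z <- (p_hat - p0) / se

p_lower <- pnorm(z)              # alternative: p < p0
p_upper <- 1 - pnorm(z)          # alternative: p > p0
p_two   <- 2 * pnorm(-abs(z))    # alternative: p != p0
```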

53

when to use one sided vs two sided

if the question points in a certain direction (greater than or less than) use one sided; if it only asks whether there is a difference use two sided

54

where is the randomization distribution centered?

at the value of the parameter specified in the null hypothesis

55

where is the bootstrap distribution centered?

at the observed sample statistic

56

how do i see if two events are independent?

P(A)P(B) = P(A and B)
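
The independence check can be run on a two-way table of counts. The table below is hypothetical and was chosen so that the two events come out independent.

```r
# Hypothetical 2x2 table of counts: rows = event A, columns = event B
counts <- matrix(c(30, 20,
                   30, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(A = c("yes", "no"), B = c("yes", "no")))

total     <- sum(counts)
p_a       <- sum(counts["yes", ]) / total   # P(A)
p_b       <- sum(counts[, "yes"]) / total   # P(B)
p_a_and_b <- counts["yes", "yes"] / total   # P(A and B)

# Independent when P(A) * P(B) equals P(A and B)
independent <- isTRUE(all.equal(p_a * p_b, p_a_and_b))
```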

57

when matching the boxplot with the ANOVA table look for

sum of squares residual

F

58

what does the sum of squares residual tell you for ANOVA

difference WITHIN groups

if large, it means the data points are more spread out within groups

larger IQR for a boxplot

59

A big F for ANOVA means

the variance between groups is large compared to the variance within groups

there is a significant difference between the means of at least two of the groups

60

what does a large F value result in?

A low p-value

61

IQR is

Q3-Q1

qnorm(0.75, mean, sd) - qnorm(0.25, mean, sd)
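
Q1 and Q3 are values, not proportions, so for a normal model they come from qnorm(). A sketch with a hypothetical normal distribution (mean 100, sd 15):

```r
# Hypothetical normal distribution
m <- 100
s <- 15

q1  <- qnorm(0.25, mean = m, sd = s)  # 25th percentile
q3  <- qnorm(0.75, mean = m, sd = s)  # 75th percentile
iqr <- q3 - q1                        # about 1.35 standard deviations wide
```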

62

df for model for ANOVA

#groups - 1

63

df for residuals for ANOVA

#observations - # of groups

64

mean sq for groups for ANOVA

SSM/DFM

65

mean square for residuals ANOVA

SSE/DFE

66

f value for ANOVA

MSM/MSE, which is (SSM/DFM) / (SSE/DFE)

67

how is pvalue found for anova

1-pf(f value, df1 = ___, df2 = ___)
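
The ANOVA cards fit together as one table. Below is a base R sketch that builds it by hand from hypothetical data (three groups of four observations); the F value is the ratio of the two mean squares.

```r
# Hypothetical data: three groups, four observations each
y <- c(5, 6, 7, 6,   8, 9, 7, 8,   10, 11, 12, 11)
g <- factor(rep(c("a", "b", "c"), each = 4))

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

SSM <- sum(group_sizes * (group_means - grand_mean)^2)  # between groups
SSE <- sum((y - group_means[as.character(g)])^2)        # within groups (residual)

DFM <- nlevels(g) - 1           # number of groups - 1
DFE <- length(y) - nlevels(g)   # number of observations - number of groups

MSM <- SSM / DFM
MSE <- SSE / DFE

f_value <- MSM / MSE
p_value <- 1 - pf(f_value, df1 = DFM, df2 = DFE)
```

The result can be compared with anova(lm(y ~ g)), which reports the same sums of squares, F value and p value.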

68

bayes theorem

P(A|B)= P(B|A) * P(A) / P(B)

69

sensitivity means

positive given they have it

70

specificity means

negative given they dont have it

71

bayes theorem

P(A|B) = P(B|A) * P(A) / P(B)

72

how to find the denominator for bayes theorem

P(B) = (P(B|A) * P(A)) + (P(B|not A) * P(not A))
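
A worked Bayes example in base R, using sensitivity and specificity. All three input numbers are hypothetical. Here A is "has the disease" and B is "tests positive".

```r
# Hypothetical test characteristics
prevalence  <- 0.02   # P(disease)
sensitivity <- 0.95   # P(positive | disease)
specificity <- 0.90   # P(negative | no disease)

# False positive rate: P(positive | no disease)
p_pos_given_no_disease <- 1 - specificity

# Denominator: total probability of testing positive
p_positive <- sensitivity * prevalence +
  p_pos_given_no_disease * (1 - prevalence)

# Bayes theorem: P(disease | positive)
p_disease_given_positive <- sensitivity * prevalence / p_positive
```

With these made-up numbers only about 16% of positive tests belong to people who have the disease, because the disease is rare.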

73

what does one dot on a bootstrap dotplot represent

the statistic from 1 bootstrap sample

74

to approximate a confidence interval using a dot plot of a bootstrap you

count the total number of dots

for a 95% interval, count off 2.5% of the dots from each tail; the values where the counting stops are the endpoints of the CI

75

how to find p value for a hypothesis test

find observed difference

Calculate the test statistic, either t (means) or z (proportions)

then pt() or pnorm()

76

how to make a bootstrap distribution with R

do(1000) * diffmean(response variable ~ explanatory variable, data = resample(YOUR DATA))
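
For intuition, here is a base R sketch of a bootstrap distribution for a difference in means. It is not the course command: the data and seed are hypothetical, and resampling within each group is a simplifying choice of this sketch so that neither group can come up empty.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a quantitative response measured in two groups
group    <- rep(c("A", "B"), each = 10)
response <- c(12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
              16, 18, 15, 19, 17, 16, 20, 18, 17, 19)

observed <- mean(response[group == "B"]) - mean(response[group == "A"])

# Each replicate resamples WITH replacement and recomputes the statistic
boot_diffs <- replicate(1000, {
  a <- sample(response[group == "A"], replace = TRUE)
  b <- sample(response[group == "B"], replace = TRUE)
  mean(b) - mean(a)
})

se_boot <- sd(boot_diffs)                            # bootstrap standard error
ci_2se  <- observed + c(-2, 2) * se_boot             # statistic ± 2 × SE
ci_pct  <- quantile(boot_diffs, c(0.025, 0.975))     # middle 95% of the dots
```

The bootstrap distribution is centered near the observed statistic, and its spread is the standard error.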

77

Q1

(the first quartile) corresponds to the 25th percentile, or the value at which 25% of the data lies at or below this value.

78

Median in a boxplot

corresponds to the 50th percentile or the middle value, or the value at which 50% of the data lies at or below this value.

79

Q3

(the third quartile) corresponds to the 75th percentile, or the value at which 75% of the data lies at or below this value.

80

for a normal distribution the mean and median are

the same

81

z score for a normal distribution

z = (x - mean) / sd

82

central limit theorem tells us

the sampling distribution of the sample mean will be approximately normal when the sample size is large.
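
A quick simulation sketch of the central limit theorem. The population (exponential with mean 1), the sample size of 40, and the seed are all arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Strongly right-skewed population: exponential with mean 1 and sd 1
n <- 40
sample_means <- replicate(2000, mean(rexp(n, rate = 1)))

center <- mean(sample_means)  # close to the population mean, 1
spread <- sd(sample_means)    # close to sd / sqrt(n) = 1 / sqrt(40)

# The histogram of sample means looks roughly bell shaped
if (interactive()) hist(sample_means)
```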

83

if you know a population sd use

normal (Z) distribution

84

type one error

rejecting a true null hypothesis

85

level of significance is

probability of a Type 1 error.  It is the probability of rejecting the null hypothesis when the null hypothesis is true.
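
A simulation sketch of what the significance level means. Every sample below is drawn with the null hypothesis true, so each rejection is a Type 1 error. The sample size, number of repetitions, and seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

alpha <- 0.05

# 2000 one-sample t tests where H0 (mean = 0) is actually true
p_values <- replicate(2000, t.test(rnorm(20, mean = 0), mu = 0)$p.value)

# Share of tests that wrongly reject H0; should land near alpha
type1_rate <- mean(p_values < alpha)
```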