Biostats Final Exam

Last updated 9:26 PM on 12/15/25

112 Terms

1

Biostatistics (def)

Analysis of data derived from biological sciences and medicine

Statistical tests to analyze biological data

Describing (how large, small, or variable a trait is)

Testing (whether two things are related/different)

Predicting (whether an intervention is effective)

2

Categorical data (def, types)

Non-ordered (nominal): yes/no

Ordered (ordinal): ranked, agree/disagree/neutral

3

Nominal data (def, ex)

Categorical

Non-ordered, non-numerical, can’t be ranked

Yes or no

Eye color

4

Ordinal data (def, ex)

Categorical

Natural rank or order, but distances between are not known or equal

Poor, average, good

Education levels

Military ranks

5

Numeric data (def, types)

Scale

Discrete: number of siblings

Continuous/interval: height, BMI

6

Discrete data (def, ex)

Numeric

Countable, distinct values that can’t be broken down into smaller parts

Number of students

Clicks on a website

7

Continuous/interval data

Numeric

Values that can be infinitely divided, measured

Height

Weight

Temperature

Measured, not counted

8

Rank (def)

Relative position of a set of measurements

9

Rate (def, ex, notes)

Ratio of two quantities

Percentages, proportions, ratios

Zero as lowest meaningful point

10

Descriptive statistics (use, ex)

To present and describe data in a useful way

Show patterns and associations

Summary statistics, graphs

11

Inferential statistics (use, ex)

To draw conclusions about a population from a sample; taking action based on the data

Estimation

Hypothesis tests, statistical modelling

12

Box and whisker plot (type, notes)

Median, range, IQR, outliers

Numeric data

25% in each whisker

50% in the box

13

Presenting categorical data

Table

Bar chart

Pie chart (<6 slices)

14

Presenting numeric data

Histograms

Box and whisker plots

15

Mode (def, features)

Value that occurs most frequently

Useful for discrete or categorical data

16

Median (def, features)

Middle value

Useful for asymmetric or skewed data or outlying values

17

Mean (def, feat)

Average of data set (add up and divide by number of points)

Affected by extreme values

For symmetrical distributions

Can be fictitious (may not equal any observed value)

Only for interval data

18

Skew (left vs right)

Left peak: POSITIVE skew

Right peak: NEGATIVE skew

Lefty win :)

19

Range (def, feat)

Minimum and maximum values

Sensitive to extreme values

20

Interquartile range (def, feat)

Difference between 25th and 75th centiles

Range of middle 50% of data

Not influenced by extreme values; ok for skewed

21

Variance (def, feat)

Mean of the squared distances to the mean

Are data close to the mean or spread out?

22

Standard deviation (def, feat)

Square root of the variance

Symmetric data

23

Coefficient of variation (use, calc)

Compare different scales or measurements

Standard deviation divided by the mean; usually multiplied by 100 to get a percentage
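
As a rough illustration of the summary statistics on the cards above, the sketch below computes them with Python's standard `statistics` module. The data values are made up. `pvariance` and `pstdev` are used because they match the "mean of the squared distances" definition; a sample variance would divide by n - 1 instead.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # sum of values / number of values
median = statistics.median(values)        # middle value of the sorted data
mode = statistics.mode(values)            # most frequent value
data_range = max(values) - min(values)    # sensitive to extreme values
variance = statistics.pvariance(values)   # mean of squared distances to the mean
sd = statistics.pstdev(values)            # square root of the variance
cv_percent = sd / mean * 100              # coefficient of variation, as a percentage

print(mean, median, mode, data_range, variance, sd, cv_percent)
```

Because the coefficient of variation is unitless, it can be compared across traits measured on different scales.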

24

P (def)

Central or expected value for a given outcome, proportion or probability

25

N (def)

Sample size

26

Variance and standard error (calc)

Variance of a proportion = p(1-p)/n

Standard error = square root of the variance, i.e. sqrt(p(1-p)/n)
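
A minimal sketch of the proportion formulas on this card, using made-up values for p and n. It also shows that quadrupling the sample size halves the standard error.

```python
import math

p = 0.3    # hypothetical proportion
n = 100    # hypothetical sample size

variance = p * (1 - p) / n           # variance of the proportion
se = math.sqrt(variance)             # standard error = square root of the variance

# Same proportion with four times the sample size.
se_larger_sample = math.sqrt(p * (1 - p) / (4 * n))

print(variance, se, se_larger_sample)
```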

27

Probability (def, calc)

Frequency of event in many many trials

Categorical variables

Number of events/total number of trials

28

Probability of two events (calcs)

Mutually exclusive (either event occurs): Add probabilities

Independent (both events occur): Multiply probabilities

29

Binomial probabilities (def, notes)

Chance of getting a specific number of “successes” in a fixed number of independent trials

Only two outcomes (success/failure)

Constant success probability

30

Conditions for binomial probabilities

BINS

Binary outcomes

Independent trials

Number of trials (N)

Same probability of success
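
When the BINS conditions hold, the chance of exactly k successes in n trials can be sketched as below. The function name `binomial_pmf` and the example numbers are illustrative assumptions, not from the cards; the formula is the standard binomial one.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 fair coin flips.
prob_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities over all possible counts (0..n) add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(prob_three_heads, total)
```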

31

Normal distribution

Probability of a range of values is represented by the area under the curve; total area = 1

Mean/median/mode equal

Symmetrical around the mean

32

Standard deviation Normal distribution (percents)

68% +- 1 SD of mean

95% within 2

99.7% within 3

33

Standard Normal distribution mean and SD

Mean (μ) = 0

SD (σ) = 1

34

Z value (def, equ)

Any Normal variable can be standardized to get a z-value

z = (value - mean)/SD
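
The standardization on this card, and the 68/95/99.7 percentages from the earlier card, can be checked with a short sketch. It assumes Python 3.8+ for `statistics.NormalDist`; the helper name `z_value` and the example numbers are made up.

```python
from statistics import NormalDist

def z_value(value, mean, sd):
    """Standardize a value: how many SDs it sits above or below the mean."""
    return (value - mean) / sd

# Hypothetical example: a value of 130 when the mean is 100 and the SD is 15.
z = z_value(130, 100, 15)

# Standard Normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)
within_3_sd = std_normal.cdf(3) - std_normal.cdf(-3)

print(z, within_1_sd, within_2_sd, within_3_sd)
```

The three areas come out close to 0.68, 0.95 and 0.997, which is where the rule of thumb comes from.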

35

Kurtosis (def, names)

How tailed/peaked the distribution is

High kurtosis = leptokurtic = fat tails and sharper peaks, more outliers

Low kurtosis = platykurtic = thinner tails, flatter peak, fewer outliers

Mesokurtic similar to Normal

36

Population and Sample (defs, issue)

Population: group of interest

Sample: group chosen to represent population

Issue: sample may not reflect population; repeated samples from same population may give different results

37

Population parameters (def)

What is of interest to determine

Sample estimates = sample statistics, point estimates of population parameters

38

Conventional notation

Population:

N = population size

μ = mean

σ = SD, with σ² = population variance

Sample:

n = sample size

x̄ = mean

sd = sample SD

sd² = sample variance

39

Sampling distribution of the mean (def)

Distribution of the means of repeated samples, described by its own mean and standard deviation

40

Standard error of the mean (def, equ)

Standard deviation of the sampling distribution of the means

Used to comment on the population mean, not to describe the dispersion of values around a mean

SEM = sd/square root of n

41

Confidence intervals (def, equ)

Range of values within which the statistic would fall 95% of the time

Indication of how good the sample mean is as an estimate of the population mean

Mean +- 1.96*SEM (for 95% CI)
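
A small sketch of the SEM and 95% confidence interval formulas from the last two cards. The sample is made up and deliberately tiny so the arithmetic is visible; with so few observations a t-based multiplier would normally replace 1.96, so treat this only as an illustration of the formula.

```python
import math
import statistics

sample = [10, 12, 14, 16, 18]    # hypothetical measurements

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)    # sample SD (divides by n - 1)
sem = sd / math.sqrt(n)          # SEM = sd / square root of n

ci_low = mean - 1.96 * sem       # lower limit of the 95% CI
ci_high = mean + 1.96 * sem      # upper limit of the 95% CI

print(mean, sem, (ci_low, ci_high))
```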

42

Z score requisites

Random samples

Quantitative data

Variable Normally distributed

Sample size > 30

43

Z score interpretation

Positive Z = observation is >mean

Negative z = observation is <mean

44

Sampling distribution of a proportion (def, notes)

the distribution of all possible values of the proportion that could be obtained in repeated samples of the same size from the population of interest

has a shape, mean and SD

mean = π

45

Standard error of proportion

Standard deviation of the sampling distribution of the proportion

Gets smaller as sample size increases

46

Central limit theorem

Large enough sample size 30+ → distribution of sample means will approximate a normal distribution

47

Finite population correction factor

1 - f, where f is the sampling fraction (n/N)

Applies when the sample covers most or all of the population

48

Poisson distribution (use, ex)

Model rare events that occur across time, e.g. 100-year floods

Less than 20 events

Rare health events like infant mortality, cancer

49

Confidence intervals for an age-adjusted rate

SE = R/square root of N

R = age-adjusted rate

N = number of events (deaths)

CI: R+-1.96xSE
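
The calculation on this card can be sketched as follows. The rate and the number of deaths are invented for illustration.

```python
import math

rate = 50.0     # hypothetical age-adjusted rate (R), e.g. per 100,000
events = 100    # hypothetical number of events (N), e.g. deaths

se = rate / math.sqrt(events)    # SE = R / square root of N
ci_low = rate - 1.96 * se        # lower 95% confidence limit
ci_high = rate + 1.96 * se       # upper 95% confidence limit

print(se, (ci_low, ci_high))
```

With these numbers the standard error is 5, so the interval runs from about 40.2 to 59.8. Fewer events would widen it.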

50

Pnorm

Converts a Z score into a probability

Find probability that a randomly selected value from a Normal distribution would be less than or equal to a specified value

Output = cumulative distribution function, area under the curve to the left of a Z score

51

Qnorm

Converts probability into a Z score

Find the specific value in a Normal distribution below which a given proportion of the distribution falls

Output = quantile or value below which the given percentage of the distribution lies
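
`pnorm` and `qnorm` are R function names. As a hedged Python analogue (assuming Python 3.8+), the two wrappers below reuse those names around `statistics.NormalDist`; the wrappers are an illustration, not part of any library.

```python
from statistics import NormalDist

def pnorm(z, mean=0.0, sd=1.0):
    """Z score (or value) -> probability: area under the curve to the left."""
    return NormalDist(mean, sd).cdf(z)

def qnorm(p, mean=0.0, sd=1.0):
    """Probability -> Z score (or value): the quantile below which p of the distribution lies."""
    return NormalDist(mean, sd).inv_cdf(p)

left_area = pnorm(1.96)          # close to 0.975
cutoff = qnorm(0.975)            # close to 1.96
round_trip = qnorm(pnorm(0.5))   # the two functions undo each other

print(left_area, cutoff, round_trip)
```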

52

Hypothesis (def)

A test of belief or set of rules for coming to a yes/no conclusion

Statistical hypothesis = statement of belief regarding the value of one or more population characteristics

53

Hypothesis (process)

State null hypothesis (Ho) → there is NO EFFECT

Apply a statistical test

Decide whether to reject or fail to reject the null hypothesis (p-value)

Interpret the test

54

T test (use, assumptions)

To test if two means are the same

One continuous, one categorical

Assumes:

  • distributions roughly symmetrical

  • Observations independent

  • Variances are similar

55

T statistic/value (def, equ)

How far the data are from the null hypothesis

A standardized difference in means

(mean 1 - mean 2)/SE of mean diff

56

T score (meaning, equ)

How many SEMs are we from zero?

A standardized score for comparison

57

P value (def, threshold)

Probability of a test statistic at least as extreme as the one observed, if the null is true

P>0.05, fail to reject the null

P<0.05, reject the null

58

Cohen’s d (def, interp)

How many SDs separate the means of two groups

Standardized measure of effect size

Small = .2

Medium = .5

Large = .8

Effect size for t tests
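
The t statistic and Cohen's d from the cards above can be sketched together. This assumes the equal-variance (pooled) form of the two-sample t test, which matches the "variances are similar" assumption; the two groups are made-up data.

```python
import math
import statistics

group_a = [5, 7, 9]    # hypothetical measurements, group 1
group_b = [1, 3, 5]    # hypothetical measurements, group 2

n_a, n_b = len(group_a), len(group_b)
mean_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
pooled_sd = math.sqrt(pooled_var)

se_diff = pooled_sd * math.sqrt(1 / n_a + 1 / n_b)   # SE of the mean difference
t_stat = mean_diff / se_diff                         # (mean 1 - mean 2) / SE of mean diff
cohens_d = mean_diff / pooled_sd                     # how many SDs separate the means

print(t_stat, cohens_d)
```

The p-value would then come from a t distribution with n_a + n_b - 2 degrees of freedom, which the standard library does not provide; statistical software reports it directly.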

59

Type I error

False positive

Rejecting the null when it’s true

Controlled by significance level α, usually .05 or 5%

60

Type II error

False negative

Failing to reject null when it’s false

Controlled through large enough sample size

β

61

Power of a test (def, equ)

Probability of correctly rejecting a null

1-β

β usually .2, i.e. power of 80%

62

Paired t test (use)

Needed for a paired sample, e.g. pre- and post-treatment measurements on the same participants

Calculates the difference within each pair (change score) first (subtract pre from post), then takes the mean of those differences
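
A minimal sketch of the change-score idea, with invented pre and post values. The last step, dividing the mean change by its standard error, is the usual paired t statistic and is an addition beyond what the card states.

```python
import math
import statistics

pre = [10, 12, 9, 11]     # hypothetical pre-treatment measurements
post = [12, 15, 10, 13]   # hypothetical post-treatment measurements, same participants

# Change score for each participant: post minus pre.
changes = [after - before for before, after in zip(pre, post)]

mean_change = statistics.mean(changes)
sem_change = statistics.stdev(changes) / math.sqrt(len(changes))
t_paired = mean_change / sem_change   # mean change in units of its standard error

print(changes, mean_change, t_paired)
```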

63

ANOVA

Analysis of variance: how different the means are

Extends the t test to more than two means

Alt: at least one mean is different

Effect size: Eta squared or partial eta squared

64

Between-group differences (ANOVA)

Using group means and grand mean

Sums of squared deviations, then mean squared deviations (between)

65

Within-group differences

Using raw values and their group mean

Sum of squared deviations, then mean squared deviations (within)

66

Mean squares (ANOVA, calc)

= Sums of squares/degrees of freedom

67

Degrees of freedom (ANOVA)

Between groups (df1) = number of groups - 1

Within groups (df2) = N total - number of groups

Total df = N total - 1
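
The sums of squares, degrees of freedom and mean squares from the ANOVA cards above fit together as in this sketch. The three groups are made up. The F ratio (mean square between divided by mean square within) and the eta squared line are standard steps, though the cards do not spell them out.

```python
# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
n_total = len(all_values)
grand_mean = sum(all_values) / n_total

# Between groups: group means compared with the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within groups: raw values compared with their own group mean.
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

df_between = len(groups) - 1        # number of groups - 1
df_within = n_total - len(groups)   # N total - number of groups
df_total = n_total - 1              # N total - 1

ms_between = ss_between / df_between   # mean square = sum of squares / df
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
eta_squared = ss_between / (ss_between + ss_within)

print(ss_between, ss_within, f_ratio, eta_squared)
```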

68

ANOVA interpretation

If significant (p<0.05), justification to go look at differences between groups to find the weirdo using Tukey’s HSD post-hoc

69

ANOVA post-hoc tests (why, ex)

To keep the overall type I error at the 5% level across multiple pairwise comparisons

Usually Tukey’s HSD

70

Partial eta squared (what, which test, interp)

ANOVA

Proportion of the variance in outcome variable explained by the grouping factor

Small effect: 0.01

Medium: 0.06

Large: 0.14

71

ANOVA assumptions

Groups normally distributed

Groups have same variance (up to 2x ok)

Observations and groups independent

Ok with moderate violations of assumptions (robust) but worse if groups very different sizes/small sample sizes

72

Correlation (def)

Measures association between two CONTINUOUS numeric variables

Association not causation

Assumes a straight line is the true relationship

73

Pearson’s r (def, interp, assum)

Correlation coefficient

Ranges from -1 to 1

|r| <0.5 = weak

|r| .5-.7 = moderate

|r| >.7 = strong

Assumes Normally distributed variables, straight line relationship

74

Scatterplot (set up)

Dependent variable on the Y

Predictor on the X

75

Spearman’s ρ and Kendall’s τ

Alternate correlation coefficients for non-Normal or ordinal variables

Non-parametrics

76

R² (R squared) (def, use)

Square of Pearson’s r

Effect size for correlation; amount of overlap between the variables

If r² = 0.25, 25% of the variance is shared between variables
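
Pearson's r and its square can be computed by hand as in this sketch. The x and y values are invented, and the formula is the standard sums-of-squares form rather than anything stated on the cards.

```python
import math

x = [1, 2, 3, 4, 5]    # hypothetical predictor values
y = [2, 4, 5, 4, 5]    # hypothetical outcome values

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))   # cross-products
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)

r = ss_xy / math.sqrt(ss_x * ss_y)   # Pearson's r, between -1 and 1
r_squared = r ** 2                   # proportion of variance shared

print(r, r_squared)
```

Here r is about 0.77 and its square is 0.6, so roughly 60% of the variance is shared between the two variables.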

77

Uses of correlation (3)

Association between two variables in observational studies

Validating a new test against a gold standard

Reliability of a test

78

Null hypothesis for correlation

There is no correlation, r = 0

79

Linear regression (def, equ)

Gives equation of the line of the association

y = mx+b or y=b0+b1x

m or b1 is slope

b or b0 is intercept

80

Interpreting the slope

Slope quantifies how different y is for a +1 unit difference in x

81

Least squares (def, equ)

Regression fits the “best line” where the distance squared from each data point to the line is kept as small as possible

slope = SSxy/SSx (sum of cross-products of x and y divided by sum of squares of x)

82

Error/residuals in regression (def)

The distances from each point to the line (vertically)

Represents unexplained variability in Y
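
The least-squares slope and the residuals from the last two cards can be sketched with made-up data. The intercept formula (mean of y minus slope times mean of x) is the standard one and is an addition beyond what the cards state.

```python
x = [1, 2, 3, 4, 5]    # hypothetical predictor values
y = [2, 4, 5, 4, 5]    # hypothetical outcome values

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)

slope = ss_xy / ss_x                   # b1: difference in y per +1 unit of x
intercept = mean_y - slope * mean_x    # b0: predicted y when x = 0

# Residuals: vertical distances from each point to the fitted line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(slope, intercept, residuals)
```

For a least-squares line the residuals sum to zero (up to rounding), which is a quick sanity check on the fit.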

83

Software output of note for regression (4)

R² value

Standard error of the estimate

Coefficients: Intercept and Slope, including P-values

84

Assumptions of linear regression (3)

Linear relationship (check using scatterplots)

Constant variance of the residuals (no wedge shape)

Residuals have Normal distribution

85

Stability of a regression line (depends on)

Sample size

<5 observations per predictor is unstable

10-15 at least, >20 ideally

100 plus the number of predictor variables

86

Chi-square test (use, output, null)

Compare proportions between two or more groups

Categorical and categorical

Gives χ² (chi-squared) test statistic

Null = proportions in groups are the same, e.g. being active or not is independent of gender

87

Expected table (chi) (def, equ, assum)

What you would expect in the cells of a table if the null were true

= row total x column total/grand total

Should have N>=20 and no expected cell count <=5 unless N>=40; if not, use Fisher’s exact test

88

Chi-squared degrees of freedom (how, equ)

Depends on size of table

df = (#rows-1)x(#columns-1)
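
The expected table and degrees of freedom from the chi-square cards above can be sketched for a made-up 2x2 table. The test statistic line uses the standard sum of (observed - expected)² / expected, which the cards do not write out.

```python
# Hypothetical 2x2 table of counts (rows = groups, columns = outcome yes/no).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the null were true:
# row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum((o - e) ** 2 / e
                  for obs_row, exp_row in zip(observed, expected)
                  for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (#rows - 1) x (#columns - 1)

print(expected, chi_squared, df)
```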

89

Assumptions of chi-square test (3, additional/alternate tests)

Sample size large enough (otherwise Fisher’s exact test)

Independent data points, not pre-post (otherwise McNemar’s test)

Distribution curve is continuous, while cell counts are discrete (Yates’ continuity correction)

90

Complete follow-up (def, notes)

All participants followed to death

Length of survival known for everyone

Rare and difficult; data is often “censored” or lost to follow-up

91

Right-censored data (def)

Blind to after

We don’t know what happened after a particular time or when a future event happens

92

Left-censored data (def)

Blind to the past

We don’t know what happened before a particular time or when a past event happened

93

Variable time follow-up (def)

Participants aren’t followed to death

94

Analyzing variable-time studies (best to worst, 4)

Life table approach (optimum)

Using person-years as a unit of observation (acceptable but unrealistic)

Comparing N-year risk (biased if any incomplete FUP)

Comparing mean survivals (doesn’t work)

95

Clinical/actuarial life table (def, equ)

Gives survival probabilities over time

Probability of surviving from time 0 to time b = (prob of surviving 0 to a)*(prob of surviving a to b)

96

Kaplan Meier approach (def)

Shows survival probabilities over time (steps)

Splits periods at events (outcome aka death, withdrawal, etc)

97

Log-rank test (use)

Compare two+ groups for survival time

98

Median survival time (def, note)

Time point where 50% survival is reached

Only estimable when curves drop below 50% survival
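
The Kaplan Meier steps and the median survival time from the cards above can be sketched as follows. The function name `kaplan_meier` and the follow-up data are hypothetical; each observation is a (time, died) pair, with died = False meaning the participant was censored at that time.

```python
def kaplan_meier(observations):
    """Return [(time, survival probability)] at each time a death occurs."""
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for time in sorted({t for t, _ in observations}):
        deaths = sum(1 for t, died in observations if t == time and died)
        leaving = sum(1 for t, _ in observations if t == time)
        if deaths:
            # Multiply by the probability of surviving this interval.
            survival *= (at_risk - deaths) / at_risk
            curve.append((time, survival))
        # Deaths and censored participants both leave the risk set.
        at_risk -= leaving
    return curve

# Hypothetical follow-up data: (time in months, died?)
data = [(2, True), (3, False), (5, True), (7, True), (8, False)]
curve = kaplan_meier(data)

# Median survival: first time the curve reaches 50% or lower (None if it never does).
median_survival = next((t for t, s in curve if s <= 0.5), None)

print(curve, median_survival)
```

Censored participants lower the number at risk without producing a step in the curve, which is how the method uses partial follow-up. If the curve never dropped to 50%, `median_survival` would be `None`, matching the note that the median is only estimable when the curve reaches 50% survival.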

99

Exploratory study objectives (3)

State of low knowledge

Baselines, natural history

Discovery of patterns/potential associations

100

Confirmatory study objectives (3)

Find effect of a given magnitude

Control of type I and II errors

Possible population state/hypothesis test