Statistically Large Biostats quizlet

Last updated 7:48 PM on 4/29/26

125 Terms

1
New cards

What is a population

the entire collection of individual/observational units that share a common property

2
New cards

What is a sample

a subset of the population

3
New cards

Observation

Set of one or more quantities (measurements) on a single observational unit

4
New cards

what is a parameter

a quantity describing a statistical population

5
New cards

what is an estimate

a quantity calculated from a sample to approximate a population parameter, aka a statistic

6
New cards

what is a categorical variable

describes membership in a category or group; the values do not have magnitude on a numerical scale

7
New cards

what are the types of categorical variables

nominal (named) or ordinal (ordered)

8
New cards

example of nominal variable

survival (alive or dead), eye color, breed of dog

9
New cards

example of an ordinal variable

life stage, size class (sm,m,lg), severity score

10
New cards

are months and weekdays categorical?

yes, they are treated both nominally and ordinally depending on the context

11
New cards

what is a numerical variable

characteristics of observations that have magnitude on a numerical scale

12
New cards

what are the types of numerical variables

continuous and discrete

13
New cards

what is a continuous variable

can take any real number value within a range, ex. degrees Celsius, cm

14
New cards

what is a discrete variable

can only take indivisible units, though the values can be non-integers, ex. age at death, # of eggs in a bird nest

15
New cards

Can continuous variables have an exact value?

the probability of any exact value is zero, whereas it is nonzero with discrete variables

16
New cards

what is a graph

visual representation of a relationship between variables

17
New cards

What is a bar graph

columns (bars) representing the distribution of a numerical variable against one or more categorical variables (better than pie charts)

18
New cards

what is an experimental study

researcher randomly assigns observational units to different groups (treatments). Researcher controls the treatments

19
New cards

what is an explanatory variable

the treatment variable that has been manipulated by the researcher (independent variable)

20
New cards

what is a response variable

the measured effect of the treatment (dependent variable)

21
New cards

what is an observational study

researchers have no control over which observational unit falls into which treatment. Passive observation

22
New cards

what is a scatter plot

graphical display of two numerical variables; each observation is represented as a point on a graph with 2 or 3 axes

23
New cards

what is a line graph

uses dots connected by line segments to display trends measured over time or other ordered states

24
New cards

what is a frequency distribution

a representation (either graphical or tabular) that displays the number of observations within a given interval of a quantitative variable

25
New cards

what is the mode

interval corresponding to the highest peak in the frequency distribution

26
New cards

what is Skew

asymmetry in the shape of a frequency distribution for a numerical variable

27
New cards

what is the primary goal of statistics

to infer/estimate an unknown characteristic of an entire population based on sample data

28
New cards

what does location tell us

something about the average/typical individual units

29
New cards

what does spread tell us

how much measurements vary among individual units (how widely scattered the values are around the centre/location)

30
New cards

what is the most important location statistic

the arithmetic mean

31
New cards

what are the most important spread statistics

variance and standard deviation

32
New cards

what is variance

average squared deviation of observations from the mean. measures the overall uncertainty/spread
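
A minimal sketch of the variance and standard deviation calculation in Python. The measurements are made-up values for illustration, and the sum of squared deviations is divided by n-1 (the sample variance).

```python
import statistics

# Hypothetical measurements (illustrative values only)
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)

# Sum of squared deviations from the mean, divided by n - 1
squared_deviations = [(v - mean) ** 2 for v in values]
sample_variance = sum(squared_deviations) / (len(values) - 1)

# The standard deviation is the square root of the variance,
# which puts the spread back in the original units
sample_sd = sample_variance ** 0.5

print(mean, sample_variance, sample_sd)
```

The hand calculation should agree with `statistics.variance(values)`, which also divides by n-1.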

33
New cards

what is the coefficient of variation (CV)

the standard deviation divided by the mean. It tells us how hard it is to guess a typical value (mean/location) relative to its size. Small CV means most values fall close to the mean relative to its magnitude; large CV means guesses are more uncertain.
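
A small sketch of why spread is judged relative to the mean, assuming the usual definition CV = standard deviation / mean. Both data sets are hypothetical and have the same standard deviation.

```python
import statistics

# Two hypothetical data sets with identical spread but different magnitude
small_scale = [9.0, 10.0, 11.0]
large_scale = [99.0, 100.0, 101.0]

def coefficient_of_variation(data):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(cv_small, cv_large)
```

The standard deviation is 1 in both cases, but the CV is ten times larger for the small-scale data because the same spread is large relative to a mean of 10 and small relative to a mean of 100.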

34
New cards

what is the median

the middle measurement/value of a distribution

35
New cards

what is the mean

arithmetic average, more sensitive to extreme values than the median

36
New cards

what is the median for an even number of observations

the average of the two central numbers
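
A quick illustration with made-up numbers: the median of an even number of observations, and how one extreme value moves the mean but not the median.

```python
import statistics

# Four hypothetical observations (even count)
data = [1, 3, 5, 7]
median_even = statistics.median(data)    # average of the two central values, 3 and 5

# Replace the largest value with an extreme one
data_extreme = [1, 3, 5, 70]
median_extreme = statistics.median(data_extreme)
mean_before = statistics.mean(data)
mean_after = statistics.mean(data_extreme)

print(median_even, median_extreme, mean_before, mean_after)
```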

37
New cards

what is left skewed

few small values, >1/2 of values exceed the mean

38
New cards

what is right skewed

few large values, >1/2 of values are less than the mean
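
A sketch of right skew using hypothetical data: a single large value pulls the mean above the median, so more than half of the values fall below the mean.

```python
import statistics

# Hypothetical right-skewed data: mostly small values plus one large value
data = [1, 2, 2, 3, 3, 3, 4, 50]

mean_value = statistics.mean(data)
median_value = statistics.median(data)

# Fraction of observations that are smaller than the mean
share_below_mean = sum(1 for v in data if v < mean_value) / len(data)

print(mean_value, median_value, share_below_mean)
```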

39
New cards

what is the interquartile range

the measure of spread that accompanies the median: the range of the middle 50% of values (Q3-Q1)
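
A minimal sketch of the Q3-Q1 calculation using Python's `statistics.quantiles`. The data are hypothetical, and quartile conventions differ between textbooks and software, so other tools may give slightly different cut points for the same data.

```python
import statistics

# Hypothetical ordered data: the integers 1 through 11
data = list(range(1, 12))

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1

print(q1, q2, q3, iqr)
```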

40
New cards

what are the advantages of a box plot

quickly shows where the most values lie (location) and how spread out the data are. Provides quick info on symmetry and skewness

41
New cards

what is accuracy

how close an estimate is to the population parameter (mean, median, or standard deviation)

42
New cards

what is precision

how much estimates vary across samples

43
New cards

accurate and precise

low sampling variation and low bias

44
New cards

accurate and imprecise

high sampling variation and low bias

45
New cards

inaccurate and precise

low sampling variation and high bias

46
New cards

inaccurate and imprecise

high sampling variation and high bias

47
New cards

random sampling

minimizes bias and allows us to quantify sampling variation

48
New cards

sample bias

occurs when some observational units in the target population have a higher or lower probability of being sampled than others; leads to inaccuracy

49
New cards

sampling variation

refers to the natural variation in statistics across different samples drawn from the same population

50
New cards

sampling of convenience

whomever you can get, some members of a population are systematically more likely to be selected in a sample than others, leads to inaccuracy

51
New cards

survivorship bias

occurs when we draw conclusions based on the individuals that remain observable (sample observations) while ignoring those that did not survive, failed, or disappeared (unobserved)

52
New cards

what is μ

population mean (mu)

53
New cards

what is σ

population standard deviation (sigma)

54
New cards

what is σ^2

population variance (sigma squared)

55
New cards

what is X̄

sample mean (X bar)

56
New cards

what is s

sample standard deviation

57
New cards

what is s^2

sample variance

58
New cards

properties of sampling distribution

1. the mean of the sample means (over all possible samples) is equal to the population mean

2. selection of observational units is unbiased and independent

3. under random sampling, increasing size reduces variability, more precise estimation

59
New cards

as sample size increases, sampling variability______

decreases, yielding a more precise estimate
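
A simulation sketch of this idea. The population (normal, mean 50, standard deviation 10), the sample sizes, and the number of repeated samples are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sd_of_sample_means(sample_size, n_samples=2000):
    """Draw many random samples and return the standard deviation of their means."""
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(50.0, 10.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

spread_small_n = sd_of_sample_means(10)
spread_large_n = sd_of_sample_means(100)

print(spread_small_n, spread_large_n)
```

The sample means scatter much less around the population mean when each sample has 100 units than when it has 10, which is what "more precise" means here.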

60
New cards

what is Ȳ

the sampling distribution of sample means

61
New cards

sampling error

how much a statistic calculated from a sample differs from the true population value due to random variation

62
New cards

criteria of a random sample

every observational unit in the population has an equal chance of being included, and each selection is independent of the others

63
New cards

does random sampling ensure accuracy or precision

accuracy (it minimizes bias); precision improves as sample size increases

64
New cards

what is a confidence interval

A range of values that is likely to contain the population parameter with a specified level of confidence. reflects the uncertainty of the mean estimate

65
New cards

what is the margin of error

the maximum typical deviation we expect between a sample estimate and the true population parameter due to sampling variability at a chosen confidence level

66
New cards

what does a larger confidence level (95% or 99%) tell us

the interval is a range of plausible values for the parameter, and a larger confidence level makes that range wider. Values inside the interval are more plausible; those that lie outside are considered less plausible based on the sample data

67
New cards

what does a 95% confidence interval tell us

we are 95% confident the true population mean lies between the lower and upper limits of the interval (NOT 95% probability)
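
A worked sketch of a 95% confidence interval for a mean. The ten measurements are hypothetical, and the critical value 2.262 is the two-sided 95% value of the t-distribution with 9 degrees of freedom taken from a standard t table.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]

n = len(sample)
sample_mean = statistics.mean(sample)

# Standard error of the mean: s / sqrt(n)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Two-sided 95% critical value for t with n - 1 = 9 degrees of freedom (t table)
t_critical = 2.262

margin_of_error = t_critical * standard_error
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(lower, upper)
```

Using a larger critical value (for example the 99% value) widens the interval, which is the trade-off between confidence and precision.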

68
New cards

higher confidence =

wider interval, less precision

69
New cards

lower confidence =

narrower interval, more precision

70
New cards

what is normal distribution

a symmetric, bell-shaped curve (bell curve) described by its mean and standard deviation

71
New cards

what is a t-distribution

It is bell-shaped and has a mean of zero, but has a larger standard deviation than the standard normal distribution, and therefore, has thicker tails than the standard normal distribution.

72
New cards

what is the t-statistic

measures the deviation of the sample mean from the population mean in units of standard error; it is unit-free and comparable across populations. Its distribution is universal and only varies as a function of sample size (degrees of freedom)
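
A sketch of the usual one-sample formula, t = (sample mean - hypothesized mean) / (s / sqrt(n)). The function name `t_statistic` and the example values are hypothetical.

```python
import math
import statistics

def t_statistic(sample, hypothesized_mean):
    """Deviation of the sample mean from a hypothesized mean, in standard-error units."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - hypothesized_mean) / standard_error

# Hypothetical measurements
sample = [1.0, 2.0, 3.0]

t_at_sample_mean = t_statistic(sample, 2.0)   # hypothesized mean equals the sample mean
t_one_unit_below = t_statistic(sample, 1.0)   # hypothesized mean is one unit lower

print(t_at_sample_mean, t_one_unit_below)
```

Because the deviation is divided by the standard error, the result has no units, which is why t values can be compared across different variables and populations.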

73
New cards

when do we use the t-distribution

when the sample size is small and the population standard deviation is unknown (so it is estimated by s)

74
New cards

what is the square-root transformation

it compresses large values more than small ones, stabilizing variance and reducing skewness; primarily used to normalize right-skewed data

75
New cards

What is Jensen's inequality?

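a mathematical rule comparing the average of a function to the function of the average: for a curved function the two are not equal (for a convex function f, f(mean of x) ≤ mean of f(x); for a concave function the inequality reverses). Used to reason about risk and inequality in curved functions; relevant for small sample sizes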
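
A numeric sketch with made-up values, using the square root as the curved function. The square root is concave, so the average of the square roots comes out smaller than the square root of the average.

```python
import math
import statistics

# Hypothetical values
values = [1.0, 4.0, 9.0, 16.0]

# Average of the function: mean of the square roots
mean_of_roots = statistics.mean(math.sqrt(v) for v in values)

# Function of the average: square root of the mean
root_of_mean = math.sqrt(statistics.mean(values))

print(mean_of_roots, root_of_mean)
```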

76
New cards

is variance biased or unbiased when divided by n instead of n-1

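BIASED (it underestimates the population variance), because one degree of freedom is used up by estimating the mean from the same sample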
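
A small check of this claim by enumeration rather than simulation. The population {1, 2, 3} and the sample size of 2 (drawn with replacement) are assumptions chosen so that every possible sample can be listed.

```python
import itertools
import statistics

# Tiny hypothetical population
population = [1.0, 2.0, 3.0]
population_variance = statistics.pvariance(population)

# Every possible sample of size 2, drawn with replacement (9 samples)
samples = list(itertools.product(population, repeat=2))

# Average of the "divide by n" estimates across all samples
mean_divide_by_n = statistics.mean(statistics.pvariance(s) for s in samples)

# Average of the "divide by n - 1" estimates across all samples
mean_divide_by_n_minus_1 = statistics.mean(statistics.variance(s) for s in samples)

print(population_variance, mean_divide_by_n, mean_divide_by_n_minus_1)
```

Averaged over all samples, dividing by n gives half the true variance here, while dividing by n-1 recovers it.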

77
New cards

what is scientific evidence

information, facts, or data that support or challenge a claim, prediction, assumption, or hypothesis

78
New cards

statistical hypothesis framework

a quantitative method of statistical inference that allows us to generate evidence for or against a hypothesis

79
New cards

Null Hypothesis (H0)

nothing systematic is going on

80
New cards

Alternative hypothesis (HA)

Hypothesis in which there are nonzero effects and there are differences between treatments

81
New cards

The frequentist hypothesis testing framework

evaluates how compatible the data are with an assumed model (H0), measured by P-values

82
New cards

what does a small p value mean (<0.05)

strong evidence against the null hypothesis, reject null

83
New cards

what does a large p value mean (>0.05)

insufficient evidence against the null hypothesis; fail to reject H0 (this does not prove that H0 is true)

84
New cards

what does the significance level alpha represent

the probability of rejecting the null hypothesis when it is actually true (a Type I error or "false positive")

85
New cards

Type I Error

H0 is true, we reject H0 (false positive)

86
New cards

Type II Error

H0 is false, we fail to reject H0 (false negative)

87
New cards

if p<a

reject the null hypothesis

88
New cards

if p>a

fail to reject the null hypothesis

89
New cards

1-β (statistical power)

probability of correctly rejecting the null hypothesis when it is truly false

90
New cards

what is a one sample t test

testing a hypothesis based on a single sample

91
New cards

assumptions of one sample t test

the sample is random and independent, the variable of interest is assumed to follow normal distribution for small sample sizes

92
New cards

two sample t test

a statistical method used to compare the means of 2 groups of subjects

93
New cards

two sample t test assumptions

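samples are random and independent; the standard (Student's) version also assumes normally distributed data and equal variances in the two groups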

94
New cards

what is a paired design

both treatments are applied to every sampled unit, which minimizes the impact of variability among units and thus increases precision

95
New cards

what is the F test

Variance among group means/variance within groups, large value indicates significance
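
A sketch of the ratio for three hypothetical groups, computed by hand as the mean square among groups divided by the mean square within groups. The group values are made up for illustration.

```python
import statistics

# Three hypothetical treatment groups
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

all_values = [v for g in groups for v in g]
grand_mean = statistics.mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Variation among group means, divided by its degrees of freedom (k - 1)
ss_among = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ms_among = ss_among / (k - 1)

# Variation within groups, divided by its degrees of freedom (n_total - k)
ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
ms_within = ss_within / (n_total - k)

f_statistic = ms_among / ms_within

print(f_statistic)
```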

96
New cards

Welch's t-test

compares the means of two groups and can be used even when the variances of the two groups are not equal

97
New cards

Welch's t-test assumptions

normal distribution, independent + random, does not assume equal variance, degrees of freedom can be non-whole numbers to provide more accurate results accounting for unequal variances
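
A sketch of the Welch statistic and its degrees of freedom using the Welch-Satterthwaite approximation. The function name `welch_t_and_df` and the data are hypothetical.

```python
import math
import statistics

def welch_t_and_df(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group: s^2 / n
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Equal variances and equal sizes: df equals n1 + n2 - 2
t_equal, df_equal = welch_t_and_df([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0])

# Very unequal variances: df is a non-whole number, smaller than n1 + n2 - 2
t_unequal, df_unequal = welch_t_and_df([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0, 50.0])

print(df_equal, df_unequal)
```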

98
New cards

ANOVA

One continuous response variable and one categorical predictor variable, uses F statistic

99
New cards

what does ANOVA assume

independent, random, normally distributed, equal variance

100
New cards

Tukey's honestly significant difference (HSD) test

post hoc test (after ANOVA), determines which group means differ