Statistically Large Biostats quizlet

Last updated 7:48 PM on 4/29/26

125 Terms


What is a population

the entire collection of individual/observational units that share a common property


What is a sample

a subset of the population


Observation

Set of one or more quantities (measurements) on a single observational unit


what is a parameter

a quantity describing a statistical population


what is an estimate

a quantity calculated from sample data to estimate a population parameter, aka a statistic


what is a categorical variable

describes membership in a category or group; categorical variables do not have magnitude on a numerical scale


what are the types of categorical variables

nominal (named categories with no inherent order) or ordinal (categories with a natural order)


example of nominal variable

survival (alive or dead), eye color, breed of dog


example of an ordinal variable

life stage, size class (sm,m,lg), severity score


are months and weekdays categorical?

yes; depending on the context, they can be treated as nominal or as ordinal


what is a numerical variable

characteristics of observations that have magnitude on a numerical scale


what are the types of numerical variables

continuous and discrete


what is a continuous variable

can take any real-number value (within some range), e.g., temperature in degrees Celsius, length in cm


what is a discrete variable

takes only indivisible units, though these need not be integers, e.g., age at death in whole years, number of eggs in a bird nest


Can continuous variables have an exact value?

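they can take any exact value, but the probability of observing any one exact value is zero (only intervals have nonzero probability), whereas individual values of a discrete variable can have nonzero probability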


what is a graph

visual representation of a relationship between variables


What is a bar graph

columns (bars) representing the distribution of a numerical variable against one or more categorical variables (better than pie charts)


what is an experimental study

researcher randomly assigns observational units to different groups (treatments). Researcher controls the treatments


what is an explanatory variable

the treatment variable that has been manipulated by the researcher (independent variable)


what is a response variable

the measured effect of the treatment (dependent variable)


what is an observational study

researchers have no control over which observational unit falls into which treatment. Passive observation


what is a scatter plot

graphical display of the relationship between two numerical variables; each observation is represented as a point on a graph with 2 (or, for three variables, 3) axes


what is a line graph

uses dots connected by line segments to display trends measured over time or other ordered states


what is a frequency distribution

a representation (either graphical or tabular) that displays the number of observations within a given interval of a quantitative variable


what is the mode

interval corresponding to the highest peak in the frequency distribution


what is Skew

asymmetry in the shape of a frequency distribution for a numerical variable


what is the primary goal of statistics

to infer/estimate an unknown characteristic of an entire population based on sample data


what does location tell us

where the average/typical observational unit lies (the centre of the distribution)


what does spread tell us

how much measurements vary among individual units (how widely scattered the values are around the centre/location)


what is the most important location statistic

the arithmetic mean


what are the most important spread statistics

variance and standard deviation


what is variance

average squared deviation of observations from the mean. measures the overall uncertainty/spread
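
A minimal Python sketch of this definition (the data values are hypothetical): the sample variance sums the squared deviations from the mean and divides by n - 1, and the standard deviation is its square root, back in the original units of the data.

```python
import math
import statistics

def sample_variance(values):
    """Average squared deviation from the mean, using n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: square root of the variance, in the same units as the data."""
    return math.sqrt(sample_variance(values))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements, mean = 5
print(sample_variance(data))  # 32 / 7, about 4.571
print(sample_sd(data))        # about 2.138
# The standard library's statistics.variance / statistics.stdev also divide by n - 1
print(statistics.variance(data), statistics.stdev(data))
```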


what is the coefficient of variation (CV)

the standard deviation expressed as a percentage of the mean (CV = 100% × SD / mean). It tells us how hard it is to guess a typical value (mean/location) relative to its size: a small CV means most values fall close to the target relative to their magnitude; a large CV means guesses are more uncertain.
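
A small sketch, assuming hypothetical body-mass data, of why the CV is useful: two groups of very different size can have wildly different standard deviations but the same relative spread.

```python
import statistics

def coefficient_of_variation(values):
    """CV (%) = 100 * sample SD / sample mean; intended for positive, ratio-scale data."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)

# Hypothetical body masses (g): the elephant values are the mouse values x 200,000
mice = [20.0, 22.0, 18.0, 21.0, 19.0]
elephants = [m * 200_000 for m in mice]

print(statistics.stdev(mice), statistics.stdev(elephants))  # SDs differ enormously
print(coefficient_of_variation(mice), coefficient_of_variation(elephants))  # same CV, about 7.9%
```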


what is the median

the middle measurement/value of a sorted distribution


what is the mean

arithmetic average, more sensitive to extreme values than the median


what is the median for an even number of observations

the average of the two central numbers
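
The median rules for odd and even n, and the mean's sensitivity to extreme values, can be checked with a short Python sketch (all numbers are made up for illustration):

```python
def median(values):
    """Middle value of the sorted data; for an even n, the average of the two central values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # odd n: 3
print(median([7, 1, 3, 10]))  # even n: (3 + 7) / 2 = 5.0

# One extreme value drags the mean far more than the median
skewed = [1, 3, 7, 10, 1000]
print(median(skewed), sum(skewed) / len(skewed))  # 7 204.2
```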


what is left skewed

a long left tail of a few small values pulls the mean below the median, so typically >1/2 of values exceed the mean


what is right skewed

a long right tail of a few large values pulls the mean above the median, so typically >1/2 of values are less than the mean


what is the interquartile range

the spread of the middle 50% of the data (Q3 - Q1); the measure of spread usually reported alongside the median
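
A minimal sketch using Python's standard `statistics.quantiles`, assuming its "inclusive" quartile method; textbooks and software use several quartile conventions, so hand calculations on small samples may differ slightly.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the data.

    Quartile conventions differ between textbooks and software, so small samples
    can give slightly different Q1/Q3; 'inclusive' is one common choice.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values
print(interquartile_range(data))     # Q1 = 3.0, Q3 = 7.0, so 4.0
# Like the median, the IQR is barely affected by one extreme value
print(interquartile_range(data[:-1] + [900]))  # still 4.0
```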


what are the advantages of a box plot

quickly shows where most values lie (location) and how spread out the data are; provides quick information on symmetry and skewness


what is accuracy

how close an estimate is, on average, to the population parameter it estimates (e.g., mean, median, or standard deviation)


what is precision

how much estimates vary across samples


accurate and precise

low sampling variation and low bias


accurate and imprecise

high sampling variation and low bias


inaccurate and precise

low sampling variation and high bias


inaccurate and imprecise

high sampling variation and high bias


random sampling

minimizes bias and allows us to quantify sampling variation


sampling bias

occurs when some observational units in the target population have a higher or lower probability of being sampled than others; leads to inaccuracy


sampling variation

refers to the natural variation in statistics across different samples drawn from the same population


sampling of convenience

sampling whoever (or whatever) is easiest to obtain; some members of a population are systematically more likely to be selected than others, which leads to bias (inaccuracy)


survivorship bias

occurs when we draw conclusions based on the individuals that remain observable (sample observations) while ignoring those that did not survive, failed, or disappeared (unobserved)


what is μ

population mean (mu)


what is σ

population standard deviation (sigma)


what is σ^2

population variance (sigma squared)


what is X̄

sample mean (X bar)


what is s

sample standard deviation


what is s^2

sample variance


properties of sampling distribution

1. the mean of all possible sample means equals the population mean (the sample mean is an unbiased estimate)

2. it assumes selection of observational units is unbiased and independent

3. under random sampling, increasing sample size reduces the variability of the estimate, giving more precise estimation
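
The properties above can be illustrated with a small simulation sketch. The population (exponential, mean about 10), the sample sizes, and the number of repeated samples are all hypothetical choices; the point is that the average of the sample means stays near the population mean while their spread shrinks as n grows.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical right-skewed population of 10,000 values (exponential, mean about 10)
population = [random.expovariate(1 / 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def sample_means(pop, n, reps=2_000):
    """Means of `reps` random samples of size n drawn without replacement from pop."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(reps)]

for n in (5, 20, 80):
    means = sample_means(population, n)
    print(f"n={n:>2}: mean of sample means = {statistics.fmean(means):.2f} "
          f"(population mean = {mu:.2f}), SD of sample means = {statistics.stdev(means):.2f}")
```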


as sample size increases, sampling variability______

decreases, yielding a more precise estimate


what is Ȳ

the sample mean (Y bar); its distribution across repeated random samples is the sampling distribution of sample means


sampling error

how much a statistic calculated from a sample differs from the true population value due to random variation


criteria of a random sample

every observational unit in the population has an equal chance of being included, and the selection of units is independent


does random sampling ensure accuracy or precision

accuracy (it removes bias); precision improves as sample size increases


what is a confidence interval

A range of values that is likely to contain the population parameter, at a specified level of confidence. It reflects the uncertainty of the mean estimate


what is the margin of error

the typical maximum deviation we expect between a sample estimate and the true population parameter due to sampling variability, at a chosen confidence level (half the width of the confidence interval)


what does a larger confidence level (95% or 99%) tell us

it gives a wider range that is more likely to contain the parameter. Values inside the interval are plausible; values that lie outside are considered less plausible based on the sample data


what does a 95% confidence interval tell us

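we are 95% confident the true population mean lies between the lower and upper limits of the interval. This is NOT a 95% probability for this one interval: it means 95% of intervals constructed this way from repeated random samples would contain the true mean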
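
A minimal sketch of a t-based confidence interval for a mean, on a hypothetical sample of n = 10. The critical values are taken from a standard t table for df = 9 (2.262 for 95%, 3.250 for 99%) and passed in by hand; the 99% interval comes out wider, matching the "higher confidence = wider interval" card.

```python
import math
import statistics

def mean_confidence_interval(values, t_crit):
    """Ybar +/- t_crit * SE, where SE = s / sqrt(n).

    t_crit is the two-sided critical value of the t-distribution with n - 1 df,
    looked up in a t table (or, if SciPy is installed, scipy.stats.t.ppf(0.975, n - 1)
    for a 95% interval).
    """
    n = len(values)
    ybar = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    margin = t_crit * se  # margin of error
    return ybar - margin, ybar + margin

sample = [4.1, 5.3, 4.8, 5.0, 4.6, 5.5, 4.9, 5.2, 4.4, 5.1]  # hypothetical, n = 10
ci95 = mean_confidence_interval(sample, t_crit=2.262)  # df = 9, 95%
ci99 = mean_confidence_interval(sample, t_crit=3.250)  # df = 9, 99%
print(ci95)
print(ci99)  # higher confidence gives a wider interval
```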


higher confidence =

wider interval, less precision


lower confidence =

narrower interval, more precision


what is normal distribution

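bell curve: a symmetric, bell-shaped continuous distribution completely described by its mean (μ) and standard deviation (σ)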


what is a t-distribution

It is bell-shaped and has a mean of zero, but has a larger standard deviation than the standard normal distribution, and therefore, has thicker tails than the standard normal distribution.


what is the t-statistic

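measures how far the sample mean deviates from the (true or hypothesized) population mean in units of standard error: t = (sample mean - μ) / SE, with SE = s / √n. It is unit free and comparable across populations; its distribution is universal and only varies as a function of sample size (degrees of freedom, n - 1)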
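
A minimal Python sketch of the t-statistic for a single sample, using hypothetical lengths and a hypothetical null value; converting the same data from cm to mm leaves t unchanged, which is what "unit free" means here.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """t = (Ybar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu0) / se, n - 1

lengths_cm = [5.0, 6.0, 7.0, 8.0, 9.0]  # hypothetical sample; H0: mu = 5 cm
t_cm, df = one_sample_t(lengths_cm, mu0=5.0)
# Same data in mm: t does not change, because it is unit-free
t_mm, _ = one_sample_t([10 * y for y in lengths_cm], mu0=50.0)
print(t_cm, t_mm, df)  # t is about 2.83 in both cases, df = 4
```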


when do we use the t-distribution

when the population standard deviation is unknown and must be estimated from the sample (s); this matters most for small sample sizes


what is the square-root transformation

it compresses large values more than small ones, stabilizing variance and reducing skewness; primarily used to normalize right-skewed data such as counts
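
A small sketch of the variance-stabilizing effect, using two hypothetical groups of counts where the group with the larger mean also has the larger variance. The ratio of the larger to the smaller group variance drops from 9 on the raw scale to about 2 after taking square roots.

```python
import math
import statistics

# Hypothetical counts: the group with the larger mean also has the larger variance
low = [0, 1, 2, 3, 4]
high = [14, 17, 20, 23, 26]

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one (1.0 = equal spread)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

print(variance_ratio(low, high))  # raw data: 22.5 / 2.5 = 9.0
sqrt_low = [math.sqrt(y) for y in low]
sqrt_high = [math.sqrt(y) for y in high]
print(variance_ratio(sqrt_low, sqrt_high))  # after the transform: about 2.1
```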


What is Jensen's inequality?

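a mathematical rule comparing the average of a function to the function of an average: for a convex (upward-curving) function f, mean of f(Y) ≥ f(mean of Y), and the reverse holds for a concave function. Used to reason about risk and inequality in curved functions; e.g., because the square root is concave, the sample standard deviation tends to underestimate σ, most noticeably for small sample sizes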
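
A short sketch of two consequences of Jensen's inequality, with hypothetical data: the back-transformed mean of logged values is smaller than the raw mean (log is concave), and the sample SD underestimates σ on average in small samples (square root is concave). The simulation settings (n = 3, σ = 1, 20,000 repeats) are arbitrary choices for illustration.

```python
import math
import random
import statistics

values = [1.0, 4.0, 16.0]  # hypothetical positive data

# log is concave, so the mean of the logs is at most the log of the mean
mean_of_logs = statistics.fmean(math.log(v) for v in values)
log_of_mean = math.log(statistics.fmean(values))
print(mean_of_logs <= log_of_mean)  # True

# Back-transforming the mean of logged data gives the geometric mean (about 4),
# not the arithmetic mean of the raw data (7.0)
print(math.exp(mean_of_logs), statistics.fmean(values))

# sqrt is concave too: the sample SD s = sqrt(s^2) underestimates sigma on average,
# noticeably so for small samples (here n = 3, true sigma = 1)
random.seed(2)
sds = [statistics.stdev([random.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(20_000)]
print(statistics.fmean(sds))  # roughly 0.89, below 1
```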


is variance biased or unbiased when divided by n instead of n-1

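BIASED: dividing by n underestimates the population variance on average, because deviations are measured from the sample mean rather than μ, which uses up one degree of freedom; dividing by n-1 corrects this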
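
A simulation sketch of this bias, assuming a hypothetical normal population with σ² = 4 and samples of n = 5: averaged over many samples, dividing by n lands near σ²(n - 1)/n = 3.2, while dividing by n - 1 lands near 4.

```python
import random
import statistics

random.seed(3)
SIGMA2 = 4.0  # hypothetical true population variance (sigma = 2)
n, reps = 5, 50_000

biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    ybar = sum(sample) / n
    ss = sum((y - ybar) ** 2 for y in sample)  # sum of squared deviations
    biased.append(ss / n)          # divide by n
    unbiased.append(ss / (n - 1))  # divide by n - 1 (degrees of freedom)

print(statistics.fmean(biased))    # close to SIGMA2 * (n - 1) / n = 3.2
print(statistics.fmean(unbiased))  # close to SIGMA2 = 4.0
```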


what is scientific evidence

information, facts, or data that support or challenge a claim, prediction, assumption, or hypothesis


statistical hypothesis framework

a quantitative method of statistical inference that allows us to generate evidence for or against a hypothesis


Null Hypothesis (H0)

nothing systematic is going on (no effect, no difference between treatments)


Alternative hypothesis (HA)

Hypothesis in which there are nonzero effects and there are differences between treatments


The frequentist hypothesis testing framework

evaluates how compatible the data are with an assumed model (H0), as measured by P-values


what does a small p value mean (<0.05)

strong evidence against the null hypothesis, reject null


what does a large p value mean (>0.05)

not enough evidence against the null hypothesis; fail to reject H0 (this does NOT prove H0 is true or that HA is false)


what does the significance level alpha represent

the probability of rejecting the null hypothesis when it is actually true (a Type I error or "false positive")
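
A simulation sketch of what α means, under hypothetical settings: H0 (μ = 10) is made true by construction, samples of n = 10 are tested with a two-sided one-sample t test at α = 0.05 (critical t = 2.262 for df = 9, from a t table), and the fraction of false rejections comes out close to 0.05.

```python
import math
import random
import statistics

random.seed(4)
MU0, SIGMA, N = 10.0, 3.0, 10  # hypothetical: H0 (mu = 10) is TRUE in this simulation
T_CRIT = 2.262                 # two-sided critical t for alpha = 0.05, df = 9 (t table)
reps = 20_000

rejections = 0
for _ in range(reps):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    t = (statistics.fmean(sample) - MU0) / (statistics.stdev(sample) / math.sqrt(N))
    if abs(t) > T_CRIT:
        rejections += 1  # a Type I error: rejecting a true H0

print(rejections / reps)  # close to alpha = 0.05
```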


Type I Error

H0 is true, we reject H0 (false positive)


Type II Error

H0 is false, but we fail to reject H0 (false negative)


if p ≤ α

reject the null hypothesis


if p > α

fail to reject the null hypothesis


1 - β (power)

probability of correctly rejecting the null hypothesis when it is truly false
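
A simulation sketch of power (1 - β), under hypothetical settings: H0 says μ = 10 but the true mean is 12 (σ = 3), and a two-sided one-sample t test at α = 0.05 is run many times. Critical values come from a t table (2.262 for df = 9, 2.045 for df = 29); the larger sample rejects the false H0 far more often.

```python
import math
import random
import statistics

random.seed(5)

def simulated_power(true_mu, mu0, sigma, n, t_crit, reps=10_000):
    """Fraction of simulated samples in which a two-sided one-sample t test rejects H0: mu = mu0.

    t_crit must be the two-sided critical value for alpha = 0.05 with n - 1 df (from a t table).
    """
    rejections = 0
    for _ in range(reps):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        t = (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
        rejections += abs(t) > t_crit  # True counts as 1
    return rejections / reps

# Hypothetical scenario: H0 says mu = 10, but the true mean is 12 (sigma = 3)
print(simulated_power(12.0, 10.0, 3.0, n=10, t_crit=2.262))  # df = 9
print(simulated_power(12.0, 10.0, 3.0, n=30, t_crit=2.045))  # df = 29, higher power
```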


what is a one sample t test

tests a hypothesis about a population mean using a single sample, comparing the sample mean to the value μ0 stated by the null hypothesis


assumptions of one sample t test

the sample is random and independent; the variable of interest is assumed to follow a normal distribution (most important for small sample sizes)


two sample t test

a statistical method used to compare the means of 2 groups of subjects


two sample t test assumptions

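the samples are random and independent; the variable is normally distributed in both populations; the two populations have equal variances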


what is a paired design

both treatments are applied to every sampled unit; this minimizes the impact of variability among units, thus increasing precision
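
A minimal sketch of how a paired design is usually analysed: take the difference within each unit and run a one-sample t on those differences (H0: mean difference = 0). The before/after values are hypothetical; units differ a lot from one another, but every unit shifts by 1 to 3, and pairing removes the unit-to-unit spread from the comparison.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: a one-sample t on within-unit differences (H0: mean difference = 0)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / se, n - 1

# Hypothetical: 5 units measured before and after a treatment
before = [10, 20, 30, 40, 50]
after = [12, 21, 33, 41, 53]
t, df = paired_t(before, after)
print(t, df)  # t is about 4.47 with df = 4
```

Treating the two columns as independent groups instead would compare the shift of about 2 against the much larger unit-to-unit spread.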


what is the F test

F = variance among group means / variance within groups (MS_groups / MS_error); a large F value indicates the group means differ more than expected by chance, i.e., significance
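
A minimal sketch of the one-way ANOVA F ratio computed by hand on three hypothetical groups. If SciPy is available, `scipy.stats.f_oneway(*groups)` should report the same F.

```python
def one_way_anova_f(groups):
    """F = MS_groups / MS_error for a one-way ANOVA.

    MS_groups: variance among group means (between-group mean square, df = k - 1)
    MS_error:  variance within groups (pooled within-group mean square, df = N - k)
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_groups / ms_error, (k - 1, n_total - k)

# Hypothetical data: three treatment groups of three observations each
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat, df = one_way_anova_f(groups)
print(f_stat, df)  # 27.0 (2, 6): group means differ far more than values within groups
```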


Welch's t-test

compares the means of two groups and can be used even when the variances of the two groups are not equal


Welch's t-test assumptions

normally distributed populations; random and independent samples; does NOT assume equal variances. Its degrees of freedom can be non-whole numbers, which adjusts the test for unequal variances and gives more accurate results
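
A minimal sketch of Welch's t statistic with the Welch-Satterthwaite degrees of freedom, on two hypothetical groups with unequal spread. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the Welch test.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    se2_1 = statistics.variance(sample1) / n1  # squared standard error of mean 1
    se2_2 = statistics.variance(sample2) / n2  # squared standard error of mean 2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with clearly unequal spread
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t, df = welch_t(a, b)
print(t, df)  # t is about -2.38, df is about 6.97 (not a whole number)
```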


ANOVA

one continuous (numerical) response variable and one categorical explanatory (predictor) variable; compares the means of two or more groups using the F statistic


what does ANOVA assume

independent, random, normally distributed, equal variance


Tukey's honestly significant difference (HSD) test

post hoc test (after ANOVA) that determines which group means differ from each other