data collection
sampling
population
def: the entire collection of individuals being studied
population parameter
a value describing the population
a census (data collected from every individual in the population) is the only way to determine a parameter with certainty
not practical for large or constantly changing populations
sample
def: a subset of a population being studied
the goal is for a sample to be as representative of the population as possible; otherwise, it is biased
different samples from the same population will usually differ from one another (sampling variability)
sample statistic
a value describing the sample of a population
if a statistic is accurate enough, it can be generalized to the population as an estimate of the corresponding parameter
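A quick simulation can make the parameter vs. statistic distinction concrete. The sketch below assumes a made-up population of 10,000 heights (synthetic data, not real measurements): the parameter comes from a census of the whole list, while each sample mean is a statistic that changes from one sample to the next. The names `population`, `mu`, and `sample_mean` are hypothetical, chosen only for illustration.

```python
import random
import statistics

# Synthetic population: 10,000 heights in cm (made-up data for illustration).
rng = random.Random(1)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: found with certainty only by a census of every individual.
mu = statistics.mean(population)


def sample_mean(pop, n, rng):
    """Statistic: the mean of a simple random sample of size n (hypothetical helper)."""
    return statistics.mean(rng.sample(pop, n))


# Five samples of 100 give five different statistics (sampling variability).
estimates = [sample_mean(population, 100, rng) for _ in range(5)]
print(f"parameter mu = {mu:.2f}")
for xbar in estimates:
    print(f"statistic x-bar = {xbar:.2f}")
```

Each x-bar should land near mu, but not exactly on it, and no two samples are likely to agree.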
random sampling methods
good sampling
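simple random sampling (SRS): a sampling method where every individual, and every possible group of individuals of the chosen sample size, has an equal chance of being selected
stratified sampling: a sampling method where a population is divided into strata (groups of similar individuals), and an SRS is taken from each stratum
works best when individuals within each stratum are similar (homogeneous)
cluster sampling: a sampling method where a population is divided into clusters, some clusters are randomly selected, and a census is taken of every individual in the selected clusters
works best when each cluster is heterogeneous (a mix that resembles the whole population)
systematic sampling: a sampling method where individuals are selected at a fixed interval (e.g., every kth individual) starting from a randomly chosen point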
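The four random sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical school of 100 students (IDs 1 to 100) split into four grade-level strata and five homeroom clusters of 20; the function names are made up for this example and do not come from any statistics library.

```python
import random


def simple_random_sample(pop, n, rng):
    # Every group of n individuals is equally likely to be chosen.
    return rng.sample(pop, n)


def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of individuals.
    # Take a separate SRS from each stratum, then combine them.
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen


def cluster_sample(clusters, n_clusters, rng):
    # clusters: list of lists. Randomly pick whole clusters, then include
    # every individual in them (a census of each chosen cluster).
    chosen = []
    for cluster in rng.sample(clusters, n_clusters):
        chosen.extend(cluster)
    return chosen


def systematic_sample(pop, k, rng):
    # Random starting point among the first k, then every kth individual.
    start = rng.randrange(k)
    return pop[start::k]


# Hypothetical school: 100 students, 4 grades of 25, 5 homerooms of 20.
rng = random.Random(0)
students = list(range(1, 101))
grades = {"9th": students[:25], "10th": students[25:50],
          "11th": students[50:75], "12th": students[75:]}
homerooms = [students[i:i + 20] for i in range(0, 100, 20)]

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(grades, 3, rng)
clus = cluster_sample(homerooms, 2, rng)
syst = systematic_sample(students, 10, rng)
print(len(srs), len(strat), len(clus), len(syst))  # 10 12 40 10
```

Note how the stratified sample guarantees every grade is represented, while the cluster sample takes whole homerooms and skips the rest entirely.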
bad sampling
convenience sampling: a sampling method where the individuals who are easiest to reach are selected
voluntary response sampling: a sampling method where individuals choose for themselves whether to respond to a general invitation
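To see why convenience sampling counts as bad sampling, the sketch below compares it with an SRS on synthetic data. It assumes, hypothetically, that students are listed in the order they arrive at school and that early arrivers (the easiest to survey) have the shortest commutes; the commute times are generated, not real.

```python
import random
import statistics

rng = random.Random(2)
# Hypothetical: 1,000 commute times (minutes), listed in arrival order.
# Assumption: earlier arrivals live closer, so the list runs shortest to longest.
commutes = sorted(rng.uniform(5, 60) for _ in range(1000))

true_mean = statistics.mean(commutes)            # parameter (census)
convenience = statistics.mean(commutes[:50])     # first 50 students to arrive
srs = statistics.mean(rng.sample(commutes, 50))  # simple random sample of 50

print(f"parameter:          {true_mean:.1f}")
print(f"convenience sample: {convenience:.1f}")
print(f"SRS:                {srs:.1f}")
```

Both samples have 50 people, but only the SRS gives every student a chance to be chosen, so only its mean lands near the parameter.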
observational studies vs. experiments
observational studies
doesn't impose treatments (individuals are only observed)
can be retrospective or prospective
retrospective: collecting present/past data
prospective: collecting future data
cannot determine a causal relationship
experiments
deliberately imposes treatments on individuals
can determine a causal relationship (when treatments are randomly assigned)
biases
a bias only matters when the underrepresented group's opinions differ enough from everyone else's to change the sample statistic
undercoverage bias
caused by a sampling method that leaves a group of the population out of the sample (or underrepresents it)
response bias
caused by factors that alter respondents' answers, such as question wording (framing), incentives, etc.
non-response bias
caused by selected individuals who cannot be reached or who decide not to respond
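The point that a bias only matters when the missing group's opinions differ can be checked with a small non-response simulation. This sketch assumes a hypothetical population in which 40% support a proposal and a made-up `survey` helper that contacts an SRS but only records the people who respond. When willingness to respond does not depend on opinion, the sample proportion stays near the parameter; when supporters are more likely to respond, it is biased upward.

```python
import random

rng = random.Random(3)
# Hypothetical population: 10,000 people, each True (supports) with probability 0.40.
population = [rng.random() < 0.40 for _ in range(10_000)]
p = sum(population) / len(population)  # parameter


def survey(pop, n, respond_prob, rng):
    """Contact an SRS of n people; each responds with probability respond_prob(opinion).

    Returns the proportion of supporters among responders only (hypothetical helper).
    """
    contacted = rng.sample(pop, n)
    responses = [x for x in contacted if rng.random() < respond_prob(x)]
    return sum(responses) / len(responses)


# Case 1: response rate does not depend on opinion -> little bias.
p_hat_1 = survey(population, 2000, lambda supports: 0.3, rng)
# Case 2: supporters respond twice as often -> non-response bias.
p_hat_2 = survey(population, 2000, lambda supports: 0.6 if supports else 0.3, rng)

print(f"parameter p = {p:.3f}")
print(f"equal response rates:   p-hat = {p_hat_1:.3f}")
print(f"unequal response rates: p-hat = {p_hat_2:.3f}")
```

In case 2, roughly 0.24 of those contacted are responding supporters and 0.18 are responding non-supporters, so p-hat drifts toward 0.24 / 0.42, about 0.57, even though the true proportion is about 0.40.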