data collection

sampling

population

def: the entire collection of individuals being studied

population parameter

a value describing the population

  • a census (data collected from the whole population) is the only way to determine a parameter with certainty

    impractical for large or constantly changing populations

sample

def: a subset of a population being studied

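the goal of a sample is to be as representative of the population as possible; a sample that systematically over- or underrepresents part of the population is biased

different samples from the same population will usually contain different individuals and give different statistics (sampling variability)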

sample statistic

a value describing a sample

if a statistic comes from a representative sample and is accurate enough, it can be generalized to the population as an estimate of the corresponding parameter
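A minimal Python sketch of the parameter vs. statistic distinction, assuming a made-up population of 10,000 heights: the census mean is the parameter, and each simple random sample of 100 gives its own statistic, which varies from sample to sample but stays close to the parameter. The data and the `sample_mean` helper are hypothetical, for illustration only.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 10,000 people, generated for illustration only.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Census: using every individual gives the parameter exactly.
mu = statistics.mean(population)

def sample_mean(pop, n, rng):
    """Statistic: mean of a simple random sample of size n (drawn without replacement)."""
    return statistics.mean(rng.sample(pop, n))

# Five different samples of size 100 give five different statistics.
stats = [sample_mean(population, 100, rng) for _ in range(5)]

print(f"parameter (census mean): {mu:.2f}")
for i, x_bar in enumerate(stats, 1):
    print(f"sample {i} statistic: {x_bar:.2f}")
```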

sampling methods

good sampling

  1. simple random sampling (SRS): a sampling method where every individual, and every possible group of n individuals, has an equal chance of being selected

  2. stratified sampling: a sampling method where the population is divided into strata (groups of similar individuals), and an SRS is taken from each stratum

    works best when each stratum is homogeneous (individuals within a stratum are similar to one another)

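  3. cluster sampling: a sampling method where the population is divided into clusters, a random sample of clusters is selected, and a census is taken of every selected cluster

    works best when each cluster is heterogeneous (each cluster looks like a small version of the population)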

  4. systematic sampling: a sampling method where individuals are selected at a fixed interval (every kth individual), starting from a randomly chosen individual
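The four good sampling methods above can be sketched in Python as follows. The 12-student population, the grade strata, and the four homeroom clusters are hypothetical, as are the helper function names; `random.Random.sample` draws without replacement, which is what makes each draw an SRS.

```python
import random

# Hypothetical population: 12 students, 4 in each of grades 9, 10 and 11.
grades = ["9"] * 4 + ["10"] * 4 + ["11"] * 4
population = [{"id": i, "grade": g} for i, g in enumerate(grades)]

def simple_random_sample(pop, n, rng):
    # Every individual, and every group of size n, is equally likely to be drawn.
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum, rng):
    # Divide the population into strata by `key`, then take an SRS within each stratum.
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then include everyone in them (a census of each chosen cluster).
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

def systematic_sample(pop, k, rng):
    # Pick a random start among the first k individuals, then take every kth one after it.
    start = rng.randrange(k)
    return pop[start::k]

def ids(people):
    return [p["id"] for p in people]

# Hypothetical clusters: four homerooms of three students, each mixing all three grades.
clusters = [population[i::4] for i in range(4)]

rng = random.Random(0)
print("SRS:        ", ids(simple_random_sample(population, 3, rng)))
print("stratified: ", ids(stratified_sample(population, "grade", 1, rng)))
print("cluster:    ", ids(cluster_sample(clusters, 1, rng)))
print("systematic: ", ids(systematic_sample(population, 4, rng)))
```

Note the design contrast: the stratified sample is guaranteed one student from each grade, while the cluster sample is only representative because each homeroom already mixes grades.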

bad sampling

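  1. convenience sampling: a sampling method where the individuals who are easiest to reach are selected

  2. voluntary response sampling: a sampling method where individuals choose themselves by responding to a general invitation; people with strong (often negative) opinions tend to respond most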
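A rough simulation of why voluntary response sampling is biased, under the assumption (invented for illustration) that dissatisfied customers are ten times as likely to respond as everyone else. The volunteers' mean rating ends up well below the population mean, while an SRS of the same size lands close to it. The ratings and the `volunteers` helper are hypothetical.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population: satisfaction ratings (1-10) for 5,000 customers.
population = [rng.randint(1, 10) for _ in range(5_000)]
true_mean = statistics.mean(population)

def volunteers(pop, rng):
    # Assumption: ratings of 1-3 respond with probability 0.5, everyone else with 0.05.
    return [r for r in pop if rng.random() < (0.5 if r <= 3 else 0.05)]

responses = volunteers(population, rng)
srs = rng.sample(population, len(responses))  # SRS of the same size, for comparison

print(f"population mean: {true_mean:.2f}")
print(f"volunteer mean:  {statistics.mean(responses):.2f}  (n = {len(responses)})")
print(f"SRS mean:        {statistics.mean(srs):.2f}")
```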

observational studies vs. experiments

observational studies

  • do not impose treatments; researchers only observe and measure

  • can be retrospective or prospective

    • retrospective: examines data on events that have already happened

    • prospective: follows subjects forward and collects data as events occur

  • cannot establish a causal relationship, since confounding variables may explain any association

experiments

  • deliberately impose treatments on subjects

  • can establish a causal relationship when treatments are randomly assigned
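Experiments typically impose treatments through random assignment, which is what lets a well-designed experiment support a causal conclusion. A minimal sketch, assuming 20 hypothetical subjects split into a treatment group and a control group; the `randomly_assign` helper and subject IDs are illustrative only.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects, then split them into two groups of (nearly) equal size."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subject IDs
groups = randomly_assign(subjects, random.Random(3))
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because chance alone decides who gets the treatment, other variables tend to be balanced between the groups, so a difference in outcomes can be attributed to the treatment.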

biases

a bias only matters when the misrepresented group’s responses differ from the rest of the population’s, so that leaving them out (or distorting them) shifts the sample statistics

undercoverage bias

caused by a sampling method that leaves a group of the population out of the sample, or gives it less representation than it has in the population
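A small simulation of undercoverage, and of the point that a bias only matters when the missing group differs: the hypothetical sampling frame below leaves out every rural resident. When rural residents resemble urban ones, the estimate is still close to the parameter; when they differ, the estimate is pushed away from it. All numbers and helper names are made up for illustration.

```python
import random
import statistics

rng = random.Random(11)

def make_population(rural_mean):
    # Hypothetical: weekly hours online for 8,000 urban and 2,000 rural residents.
    urban = [("urban", rng.gauss(20, 4)) for _ in range(8_000)]
    rural = [("rural", rng.gauss(rural_mean, 4)) for _ in range(2_000)]
    return urban + rural

def undercovered_sample_mean(pop, n):
    # The sampling frame contains only urban residents, so rural ones can never be selected.
    frame = [value for group, value in pop if group == "urban"]
    return statistics.mean(rng.sample(frame, n))

results = {}
for rural_mean in (20, 10):  # rural group the same as / different from urban
    pop = make_population(rural_mean)
    parameter = statistics.mean(value for _, value in pop)
    estimate = undercovered_sample_mean(pop, 500)
    results[rural_mean] = (parameter, estimate)
    print(f"rural mean {rural_mean}: parameter {parameter:.2f}, estimate {estimate:.2f}")
```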

response bias

caused by anything that alters respondents’ answers, such as question framing, incentives, etc.

non-response bias

caused when individuals selected for the sample do not respond (they refuse or cannot be reached)