Key Concepts: Population, Sampling, and Data Types

  • Population vs. Sample
    • Population: the entire group of interest (the larger set); a sample is a subset drawn from the population.
    • Descriptive statistics summarize data from the sample; inferential statistics use the sample to make inferences about the population.
  • Sampling Principles
    • Surveying every member of the population (a census) eliminates sampling error, but this is usually impractical, so in practice a sample is used.
    • A sample should be drawn with known, well-defined selection probabilities so that results can be generalized to the population.
    • In a simple random sample, every member of the population has an equal chance of being selected, and every possible sample of size $n$ is equally likely.
    • Non-random sampling can produce biased results; careful randomization is essential for valid generalization.
    • Real-world sampling must account for whether members can be reached and are willing to respond; nonresponse can bias a sample even when selection is random.
    • The population must be defined explicitly (e.g., the entire US population vs. a specific subpopulation), because it determines to whom the results can be generalized.
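The selection-probability ideas above can be checked exactly for a tiny population. This is a minimal sketch, assuming sampling without replacement; `srs_probabilities` is a hypothetical helper that enumerates all $\binom{N}{n}$ possible samples, gives each the same probability, and shows that every member's inclusion probability is then $n/N$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def srs_probabilities(population, n):
    """Under simple random sampling without replacement, return the probability
    of each possible sample and each member's inclusion probability."""
    samples = list(combinations(population, n))
    assert len(samples) == comb(len(population), n)  # C(N, n) possible samples
    p_sample = Fraction(1, len(samples))  # every sample is equally likely
    inclusion = {member: Fraction(0) for member in population}
    for sample in samples:
        for member in sample:
            # Add P(sample) once for each sample that contains this member.
            inclusion[member] += p_sample
    return p_sample, inclusion

population = ["A", "B", "C", "D", "E"]
p_sample, inclusion = srs_probabilities(population, 2)
print(p_sample)        # 1/10, since there are C(5, 2) = 10 possible samples
print(inclusion["A"])  # 2/5, i.e. n/N: "A" appears in 4 of the 10 samples
```

Exact fractions are used instead of floats so the equal probabilities compare exactly.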
  • Random Sampling and Assignment
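    • Simple random sampling: every member has an equal chance of being selected, and selections are independent.
    • Random assignment: randomly dividing a sample into groups (e.g., treatment vs. control); random sampling supports generalization to the population, whereas random assignment supports conclusions about the effect of the treatment.
    • Stratified random sampling: divide the population into subgroups (strata) and draw a random sample from each stratum in proportion to its share of the population.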
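The three procedures above can be sketched with Python's standard `random` module. The function names, the 60/40 urban/rural population, and the half-and-half split in `random_assignment` are illustrative assumptions. `stratified_sample` uses proportional allocation with simple rounding, so when strata do not divide evenly the total may differ slightly from $n$.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng):
    # Draw n members without replacement; every subset of size n is equally likely.
    return rng.sample(population, n)

def random_assignment(sample, rng):
    # Shuffle a copy of the sample, then split it in half: treatment vs. control.
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def stratified_sample(population, stratum_of, n, rng):
    # Group members by stratum, then take a simple random sample from each
    # stratum, sized in proportion to that stratum's share of the population.
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)  # fixed seed so the run is reproducible
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
srs = simple_random_sample(population, 10, rng)
treatment, control = random_assignment(srs, rng)
strat = stratified_sample(population, lambda m: m[0], 10, rng)
print(Counter(m[0] for m in strat))  # Counter({'urban': 6, 'rural': 4})
```

A simple random sample of 10 from this population could contain, say, 8 urban members by chance; the stratified version always matches the 60/40 split.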
  • Core Concepts in Inference
    • Inferential statistics use information from a sample to draw conclusions about the population; this relies on random sampling.
    • Randomness is a property of the sampling procedure, not of the particular sample that results.
    • Sampling error decreases as sample size increases and increases with population variability; the sample size required depends on that variability and on the desired precision.
    • Larger samples give more precise estimates of population parameters.
    • A population parameter is a true, fixed quantity (e.g., population mean $\mu$ or population proportion $p$).
    • A sample statistic is computed from the sample to estimate the population parameter (e.g., sample mean $\bar{x}$).
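The parameter/statistic distinction and the effect of sample size can be sketched with a small simulation. This is a minimal illustration, assuming a hypothetical population of 10,000 normally distributed values (mean 170, standard deviation 10, e.g. heights in cm); `sample_means` is a hypothetical helper. The population mean `mu` is the fixed parameter; each sample mean is a statistic that varies from sample to sample.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical population of 10,000 values drawn from a normal distribution.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # parameter: fixed, computed from every member

def sample_means(n, reps=1_000):
    # Draw `reps` simple random samples of size n and return each sample's mean.
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

for n in (10, 100):
    means = sample_means(n)
    # Each sample mean is a statistic; print its average offset from mu
    # and its spread (standard deviation) across the repeated samples.
    print(n, round(statistics.fmean(means) - mu, 2), round(statistics.stdev(means), 2))
```

With these settings the spread for $n = 100$ should come out at roughly a third of the spread for $n = 10$ (about $1/\sqrt{10}$), though the exact numbers depend on the seed.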
  • Variables and Measurements
    • Variables: properties or characteristics that can take different values across observations (e.g., independent and dependent variables).
    • Independent variable: deliberately manipulated by the experimenter.
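    • Dependent variable: the outcome that is measured to see whether it is affected by the independent variable.
    • Qualitative (categorical) variables: express attributes (e.g., hair color, gender); their values are categories rather than numerical quantities.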
    • Quantitative (numerical) variables: measurable quantities (e.g., height, weight); can be discrete or continuous.
    • Qualitative vs. quantitative: qualitative describes categories; quantitative describes quantities.
  • Data Types and Scales
    • Univariate data: one piece of information per observation.
    • Bivariate data: two pieces of information per observation.
    • Multivariate data: three or more pieces of information per observation.
    • Numerical data can be discrete (countable) or continuous (any value within a range).
    • A proportion is the fraction of observations (or members) that have a given characteristic, a number between 0 and 1; multiplying it by 100 expresses it as a percentage.
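A minimal sketch of these data types, assuming a hypothetical set of five observations that record hair color (qualitative) and height in cm (quantitative, continuous). Each observation carries two values, so the data set is bivariate; each extracted column on its own is univariate. The proportion is a part divided by the whole, and the percentage is that proportion times 100.

```python
# Hypothetical observations: each tuple is (hair_color, height_cm), so bivariate.
observations = [
    ("brown", 172.4), ("black", 165.0), ("brown", 180.2),
    ("blonde", 158.9), ("brown", 169.5),
]
hair = [color for color, _ in observations]  # univariate, qualitative
heights = [h for _, h in observations]       # univariate, quantitative, continuous

# Proportion: part / whole, always between 0 and 1.
proportion_brown = hair.count("brown") / len(hair)
percentage_brown = proportion_brown * 100
print(proportion_brown, percentage_brown)  # 0.6 60.0
```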
  • Terminology for Population Studies
    • A parameter is a true, fixed population quantity (e.g., population mean $\mu$, population proportion $p$).
    • A statistic is a quantity computed from the sample used to estimate a parameter.
    • Descriptive statistics describe the data at hand and do not inherently generalize beyond it.
    • A sample is a subset of a larger population; we often rely on samples to draw inferences about that population.
  • Measurement and Distribution Notes
    • Continuous data (e.g., height) have no "next" value on the real number line: between any two values there are infinitely many others.
    • Larger samples generally reduce sampling error, which is measured by the standard error: the standard deviation of the sampling distribution of a statistic.
    • A simple random sample is chosen so that every possible sample of size $n$ has an equal chance of being selected.
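For the sample mean, the standard error has a simple closed form, assuming independent draws (sampling with replacement, or a population much larger than the sample) from a population with standard deviation $\sigma$:

$$\operatorname{SE}(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For example, with a hypothetical $\sigma = 10$: $n = 25$ gives $\operatorname{SE} = 10/\sqrt{25} = 10/5 = 2$, while $n = 100$ gives $\operatorname{SE} = 10/\sqrt{100} = 10/10 = 1$. Because $n$ enters under a square root, quadrupling the sample size only halves the standard error.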