Key Concepts: Population, Sampling, and Data Types
- Population vs. Sample
- Population: the entire group of interest.
- Sample: the subset of the population that is actually observed; it is drawn from the population.
- Descriptive statistics summarize data from the sample; inferential statistics use the sample to make inferences about the population.
- Sampling Principles
- Surveying every member of the population leaves no sampling error, but this is rarely practical, so in practice we work with a sample.
- Each member's probability of selection should be well defined so that results can be generalized to the population.
- In a simple random sample, every member of the population has an equal chance of being selected, and every possible sample of size $n$ is equally likely.
- Non-random sampling can yield biased results; careful randomization is essential for valid generalization.
- Real-world sampling must account for who can be reached and who is willing to respond.
- Generalizability depends on which population was sampled (e.g., the US population vs. a specific subpopulation).
- Random Sampling and Assignment
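- Simple random sampling: every member has an equal chance of selection, and picking one member does not make any other member more or less likely to be picked relative to the rest.
- Random assignment: randomly dividing a sample into groups (e.g., treatment vs. control).
- Stratified random sampling: divide the population into subgroups (strata) and sample randomly within each, in proportion to each subgroup's share of the population.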
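The three procedures above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation; the function names, the `seed` parameter, and the urban/rural strata are hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def random_assignment(sample, seed=None):
    """Shuffle the sample, then split it into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def stratified_sample(strata, n, seed=None):
    """Sample within each stratum in proportion to its share of the population.

    Rounding means the stratum sizes may not sum to exactly n in general.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)
        result[name] = rng.sample(list(members), k)
    return result

# Hypothetical population of 1000 ID numbers, 60% urban and 40% rural.
population = list(range(1000))
strata = {"urban": population[:600], "rural": population[600:]}

sample = simple_random_sample(population, 10, seed=1)
treatment, control = random_assignment(sample, seed=2)
by_stratum = stratified_sample(strata, 50, seed=3)
print(len(treatment), len(control))                        # 5 5
print(len(by_stratum["urban"]), len(by_stratum["rural"]))  # 30 20
```

Note that random assignment operates on a sample that has already been drawn, whereas the two sampling functions operate on the population itself.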
- Core Concepts in Inference
- Inferential statistics convert sample information into informed estimates about the population; the reasoning relies on random sampling.
- Randomness is a property of the sampling procedure, not of the particular sample it happens to produce.
- Sampling error decreases as sample size grows; the sample size needed depends on the population's variability and the precision desired.
- Larger samples give more precise estimates of population parameters.
- A population parameter is a true, fixed quantity (e.g., population mean $\mu$ or population proportion $\pi$).
- A sample statistic is computed from the sample to estimate the population parameter (e.g., sample mean $\bar{x}$).
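The relationship between a fixed parameter, a varying statistic, and sample size can be checked with a small simulation. The sketch below is only illustrative: it assumes a made-up population of 10,000 heights (roughly normal, mean 170, standard deviation 10), and the helper name `mean_abs_error` is hypothetical. It repeatedly draws simple random samples and reports how far the sample mean $\bar{x}$ typically lands from the population mean $\mu$.

```python
import random
import statistics

def mean_abs_error(population, n, trials=200, seed=0):
    """Average distance between the sample mean and the population mean
    over repeated simple random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)  # the parameter: fixed for this population
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        x_bar = statistics.fmean(sample)  # the statistic: varies from sample to sample
        errors.append(abs(x_bar - mu))
    return statistics.fmean(errors)

# Hypothetical population (an assumption for illustration only).
gen = random.Random(42)
population = [gen.gauss(170, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(population, n), 3))
```

The printed error should shrink as $n$ grows, which is the sense in which larger samples give more precise estimates.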
- Variables and Measurements
- Variables: properties that can take different values; in an experiment they are classed as independent or dependent.
- Independent variable: deliberately manipulated by the experimenter.
- Dependent variable: measured outcome.
- Qualitative (categorical) variables: express attributes (e.g., hair color, gender); their values are category labels rather than numerical measurements.
- Quantitative (numerical) variables: measurable quantities (e.g., height, weight); can be discrete or continuous.
- Qualitative vs. quantitative: qualitative describes categories; quantitative describes quantities.
- Data Types and Scales
- Univariate data: one piece of information per observation.
- Bivariate data: two pieces of information per observation.
- Multivariate data: three or more pieces of information per observation.
- Numerical data can be discrete (countable) or continuous (any value within a range).
- A proportion is the fraction of a group that has some characteristic, a number between 0 and 1; it is not the same as a percentage, which is the proportion multiplied by 100.
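The distinction between a proportion and a percentage is easy to see in a short calculation. The example below is a sketch on hypothetical univariate data (one yes/no answer per respondent); the function name `proportion` is an assumption made for illustration.

```python
def proportion(values, category):
    """Fraction of observations equal to `category` (a number between 0 and 1)."""
    return sum(1 for v in values if v == category) / len(values)

# Hypothetical univariate data: one categorical value per observation.
responses = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

p_yes = proportion(responses, "yes")  # 5 of 8 answers
pct_yes = 100 * p_yes                 # the same quantity expressed as a percentage
print(p_yes, pct_yes)                 # 0.625 62.5
```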
- Terminology for Population Studies
- A parameter is a true, fixed population quantity (e.g., population mean $\mu$, population proportion $\pi$).
- A statistic is a quantity computed from the sample used to estimate a parameter.
- Descriptive statistics describe the data at hand and do not inherently generalize beyond it.
- A sample is a small subset of a larger data set; we often rely on samples to draw inferences about the larger population.
- Measurement and Distribution Notes
- Continuous data (e.g., height) have no next-larger value: between any two values on the real number line there are infinitely many others.
- A bigger sample size generally reduces sampling error, which is usually quantified by the standard deviation of the sampling distribution (the standard error).
- A simple random sample is chosen so that every possible sample of size $n$ has an equal chance of being selected.
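As a worked illustration of the note on sample size, assume a simple random sample of $n$ independent observations from a population with standard deviation $\sigma$ (a symbol introduced here, not used in the notes above). Under that assumption, the standard deviation of the sampling distribution of the sample mean, often called the standard error, is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

For example, with $\sigma = 10$, a sample of $n = 25$ gives $\sigma_{\bar{x}} = 10/\sqrt{25} = 2$, while $n = 100$ gives $\sigma_{\bar{x}} = 10/\sqrt{100} = 1$. Quadrupling the sample size halves the sampling error.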