4.1: Sampling and Surveys
Introductory Terms
- Population: an entire group of people about which information is sought
- Sample: the actual part of the population studied in order to gather information
- Information from the sample is used to draw conclusions about the entire population
- Subset of total population
- Census: an attempt to contact everyone in a population
- Very difficult to obtain
- US only attempts national censuses once every 10 years
- It is only appropriate to generalize about a population if the sample is randomly selected or otherwise representative of that population
- A sample is only generalizable to the population from which it was selected
- Sample design: the method used to choose a sample from a population
- Sampling frame: the list of individuals from which a sample is drawn
- Biased sample/biased study: a sample or study which systematically favors certain individuals or outcomes
- Does not represent the population
- Consistently overestimates or underestimates the value sought
Replacement Sampling
- Sampling with replacement: when an item from a population can be selected more than once
- Sampling without replacement: when an item from a population cannot be selected more than once
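The difference between the two can be sketched in a few lines of Python. This is only an illustration: the five-item population and the seed are made up, and the standard library functions `random.choices` and `random.sample` stand in for drawing by hand.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = ["A", "B", "C", "D", "E"]  # hypothetical population

# With replacement: each draw is made from the full population,
# so the same item may be selected more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected item is removed from consideration,
# so each item appears at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Drawing 5 items without replacement from a population of 5 always returns every item exactly once; drawing with replacement usually does not.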
Types of Sampling
Relatively Ineffective Methods
- Convenience sampling: choosing individuals who are in close proximity or otherwise easy to reach
- Often produces unrepresentative data
- Almost guaranteed to show bias
- Voluntary response sample: individuals choose themselves as participants by responding to a general appeal
- Shows bias because people with strong opinions (often negative) are more likely to respond
- E.g., call-in opinion polls
Generally Effective Methods (if used correctly)
- Good sampling designs have the goal of creating a sample which is representative of the population
- Random sampling: the use of chance to select a sample
- An essential principle of statistical sampling
- E.g., dice, spinners, cards
- Simple random sample (SRS): choosing individuals from a population in such a way that every individual has an equal chance of being chosen and every possible sample of the desired size has an equal chance of being chosen
- The hat method is one way to select an SRS
- Number the individuals on identical slips of paper
- Place them in a hat
- Mix thoroughly
- Draw one at a time until the desired sample size has been selected
- The numbers you draw represent the individuals that are chosen to be in the sample
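The hat method above could be mimicked in Python roughly as follows. This is a sketch, not a standard routine: the function name `hat_method`, the class of 30 numbered students, and the seed are all assumptions made for illustration.

```python
import random

def hat_method(individuals, sample_size, seed=None):
    """Draw an SRS by shuffling 'slips' and drawing one at a time."""
    if sample_size > len(individuals):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    hat = list(individuals)   # one slip per individual goes into the hat
    rng.shuffle(hat)          # mix thoroughly
    drawn = []
    while len(drawn) < sample_size:
        drawn.append(hat.pop())  # draw a slip and do not put it back
    return drawn

students = list(range(1, 31))            # hypothetical class, numbered 1 to 30
sample = hat_method(students, 5, seed=1)
print(sample)
```

The standard library call `random.sample(students, 5)` does the same job in one step; the loop is written out only to mirror the slips-in-a-hat procedure.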
- Stratified random sample
- More complicated than an SRS
- Divide the population into groups of similar individuals based on something that might influence results
- These groups are called strata (singular: stratum)
- Select an SRS from each stratum and combine to form a full sample
- Multiple hats; take a little from each
- This way, you are guaranteed to have representation from each group
- Individuals within a stratum are more alike than the population as a whole, and taking an SRS from every stratum ensures that each group appears in the sample
- Can produce better information about the population than an SRS of the same size
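A stratified random sample can be sketched as "one hat per stratum." In the example below, the function name `stratified_sample`, the 40 students, and the choice of grade level as the stratifying variable are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Take an SRS of size per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)  # sort into "hats"
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], per_stratum))  # SRS per stratum
    return sample

# Hypothetical population: 40 students as (id, grade), 10 in each of grades 9-12.
students = [(i, 9 + i % 4) for i in range(40)]
sample = stratified_sample(students, lambda s: s[1], per_stratum=3, seed=2)
print(sample)
```

Every grade contributes exactly 3 students here, which a single SRS of 12 from the whole school would not guarantee.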
- Cluster sample
- The population is naturally divided into groups, called clusters, each of which contains a mixture of individuals, like a mini population
- Number each cluster, then choose an SRS from the clusters
- Use all of the individuals in the chosen clusters for the sample
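The cluster procedure above differs from stratified sampling in that chance selects whole groups rather than individuals within groups. A minimal sketch, assuming eight hypothetical homerooms of five students each and an illustrative function name `cluster_sample`:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters, then keep everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    return [person for label in chosen for person in clusters[label]]

# Hypothetical clusters: homerooms 1 to 8, five students in each.
homerooms = {
    room: [f"room{room}-student{i}" for i in range(5)]
    for room in range(1, 9)
}
sample = cluster_sample(homerooms, n_clusters=2, seed=3)
print(sample)
```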
- Multistage sample
- Perform selection in stages, often done for national samples
- Systematic sample
- Order the list by a feature for which you want to ensure a range of responses
- E.g., height, GPA, income
- Select every nth item from the ordered list
- To find n, divide the total number of items in the list by the desired sample size
- Starting point should be randomized
- Will spread the sample more evenly throughout the population
- Systematic random sample: sample members are selected from a population according to a random starting point and a fixed, periodic interval
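The interval arithmetic and random start for a systematic sample can be sketched as below. The function name `systematic_sample` and the list of 100 ordered items are assumptions for illustration, and the interval is rounded down when the division is not exact.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Pick a random start, then take every nth item from an ordered list."""
    if not 0 < sample_size <= len(ordered_list):
        raise ValueError("sample size must be between 1 and the list length")
    rng = random.Random(seed)
    interval = len(ordered_list) // sample_size  # n = list size / sample size
    start = rng.randrange(interval)              # random start in the first interval
    return ordered_list[start::interval][:sample_size]

# Hypothetical list of 100 individuals already ordered by some feature.
ordered = list(range(100))
sample = systematic_sample(ordered, 10, seed=4)
print(sample)
```

With 100 items and a desired sample of 10, the interval is 100 / 10 = 10, so the sample is every 10th item after the random start.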
Bias
- Bias: when certain responses are systematically favored over others
- When writing about bias, you must:
- Identify the population and sample
- Explain how the sampled individuals might differ from the general population
- Explain how this leads to an overestimate or underestimate
- Non-random sampling methods have the potential for bias because they do not use chance to select the individuals
- Two such methods are voluntary response sampling and convenience sampling
- Voluntary response bias: when a sample is composed entirely of volunteers, the sample will typically not be representative of the population
- Convenience bias: when those that are most convenient to access get selected for a sample
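A small simulation can show how a voluntary response sample systematically misses the true value while an SRS does not. Everything here is invented for illustration: the 1 to 5 satisfaction ratings, the population size, and especially the assumed response chances, which simply encode the idea that strong (often negative) opinions are more likely to respond.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 satisfaction ratings from 1 (worst) to 5 (best).
population = [rng.randint(1, 5) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Assumed chance that a person with each rating answers a general appeal:
# strong opinions, especially negative ones, respond more often.
response_chance = {1: 0.50, 2: 0.20, 3: 0.05, 4: 0.05, 5: 0.20}
volunteers = [r for r in population if rng.random() < response_chance[r]]
volunteer_mean = sum(volunteers) / len(volunteers)

# Simple random sample of 500 for comparison.
srs = rng.sample(population, 500)
srs_mean = sum(srs) / len(srs)

print(f"population mean: {true_mean:.2f}")
print(f"volunteer mean:  {volunteer_mean:.2f}")
print(f"SRS mean:        {srs_mean:.2f}")
```

Under these assumptions the volunteer mean underestimates the population mean, and it would keep doing so on repeated runs, which is what distinguishes bias from ordinary sampling variability.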
Types of Bias
In addition to the two types covered above:
- Undercoverage bias: when some groups of the population are left out in the process of choosing a sample
- Response bias: when the behavior of the respondent or the interviewer causes bias
- Can be intentional or unintentional
- Nonresponse bias: when an individual chosen for a sample can’t be reached or chooses not to respond
- Question wording: when the complexity, style, or order of the questions influences a response
- Self-reported responses: when individuals inaccurately report their own data