4.1: Sampling and Surveys
Population: an entire group of people about which information is sought
Sample: the actual part of the population studied in order to gather information
Information from the sample is used to draw conclusions about the entire population
Subset of total population
Census: an attempt to contact everyone in a population
Very difficult to obtain
The US attempts a national census only once every 10 years
It is only appropriate to generalize about a population if the sample is randomly selected or otherwise representative of that population
A sample is only generalizable to the population from which it was selected
Sample design: the method used to choose a sample from a population
Sampling frame: the list of individuals from which a sample is drawn
Biased sample/biased study: a sample or study which systematically favors certain individuals or outcomes
Does not represent the population
Consistently overestimates or underestimates the value sought
Sampling with replacement: when an item from a population can be selected more than once
Sampling without replacement: when an item from a population cannot be selected more than once
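The difference between the two can be sketched in Python with the standard `random` module. The population labels and the seed are made up for illustration.

```python
import random

population = ["A", "B", "C", "D", "E"]  # hypothetical population labels
rng = random.Random(1)                  # fixed seed so the sketch is repeatable

# With replacement: the same item can be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each item can appear at most once.
without_replacement = rng.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

`choices` may repeat an item; `sample` never does, so it cannot draw more items than the population contains.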
Convenience sampling: choosing individuals who are in close proximity or otherwise easy to reach
Often produces unrepresentative data
Almost guaranteed to show bias
Voluntary response sample: individuals choose themselves as participants by responding to a general appeal
Shows bias because people with strong opinions (often negative) are more likely to respond
E.g., call-in opinion polls
Good sampling designs have the goal of creating a sample which is representative of the population
Random sampling: the use of chance to select a sample
E.g., dice, spinners, cards
An essential principle of statistical sampling
Simple random sample (SRS): choosing individuals from a population in such a way that every individual in the population has an equal chance of being chosen and every possible sample has an equal chance of being chosen
The hat method is one way to select an SRS
Number the individuals on identical slips of paper
Place them in a hat
Mix thoroughly
Draw one at a time until the desired sample size has been selected
The numbers you draw represent the individuals that are chosen to be in the sample
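The hat method steps can be imitated in code. This is a minimal sketch, assuming a made-up roster of 30 students; the function name `hat_method` and the seed are illustrative only.

```python
import random

def hat_method(population, sample_size, seed=None):
    """Imitate the hat method: numbered slips, mixed, drawn one at a time."""
    if sample_size > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    hat = list(range(len(population)))               # one numbered slip per individual
    rng.shuffle(hat)                                 # mix thoroughly
    drawn = [hat.pop() for _ in range(sample_size)]  # draw one slip at a time
    return [population[i] for i in drawn]            # numbers map back to individuals

students = [f"student_{i}" for i in range(1, 31)]    # hypothetical roster
sample = hat_method(students, 5, seed=42)
print(sample)
```

Because a drawn slip is removed from the hat, this samples without replacement, and every group of 5 students is equally likely to be chosen.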
Stratified random sample
More complicated than an SRS
Divide the population into groups of similar individuals based on something that might influence results
These groups are called strata (singular: stratum)
Select an SRS from each stratum and combine to form a full sample
Multiple hats; take a little from each
This way, you are guaranteed to have representation from each group
The individuals within each stratum are less varied than the population as a whole, and taking an SRS from each stratum ensures that every group appears in the sample
Can produce better information about the population than an SRS of the same size
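A stratified random sample can be sketched the same way, with one "hat" per stratum. The strata (students grouped by grade level), the function name, and the equal number taken from each stratum are assumptions for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take an SRS of `per_stratum` individuals from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():                      # one "hat" per stratum
        sample.extend(rng.sample(members, per_stratum))  # SRS within the stratum
    return sample

# Hypothetical strata: students grouped by grade level.
strata = {
    "grade_9": [f"g9_{i}" for i in range(40)],
    "grade_10": [f"g10_{i}" for i in range(35)],
    "grade_11": [f"g11_{i}" for i in range(30)],
}
sample = stratified_sample(strata, per_stratum=4, seed=7)
print(sample)
```

Every grade contributes exactly 4 students here, which a single SRS of 12 from the whole school would not guarantee.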
Cluster sample
The population is naturally divided into groups that contain a mixture of individuals like mini populations, called clusters
Number each cluster, then choose an SRS of clusters
Use all of the individuals in the chosen clusters for the sample
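A cluster sample randomizes over the clusters rather than over individuals. The sketch below assumes six made-up homerooms of five students each; the function name is illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters, then keep every individual in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n_clusters)        # SRS of cluster numbers
    sample = [person for i in chosen for person in clusters[i]]  # everyone in those clusters
    return chosen, sample

# Hypothetical clusters: 6 homerooms of 5 students each.
clusters = [[f"room{r}_student{s}" for s in range(5)] for r in range(6)]
chosen, sample = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen)
print(sample)
```

Unlike the stratified design, nothing is sampled inside a chosen cluster: all of its members are included.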
Multistage sample
Perform selection in stages, often done for national samples
Systematic sample
Order the list by a feature across which you want to ensure a range of responses
E.g., height, GPA, income
Select every nth item from the ordered list
To find n, divide the total number of items in the list by the number you want in your sample
Starting point should be randomized
Spreads the sample more evenly throughout the population
Systematic random sample: a method in which sample members from a population are selected according to a random starting point and a fixed, periodic interval
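A systematic random sample can be sketched as follows, assuming a made-up ordered list of 100 items and a desired sample of 10. The function name is illustrative, and rounding the interval down when the division is not exact is an assumption of this sketch.

```python
import random

def systematic_sample(ordered_items, sample_size, seed=None):
    """Pick every nth item from an ordered list, starting at a random position."""
    if not 0 < sample_size <= len(ordered_items):
        raise ValueError("sample size must be between 1 and the list length")
    rng = random.Random(seed)
    step = len(ordered_items) // sample_size  # n = list size / desired sample size
    start = rng.randrange(step)               # random start within the first interval
    return ordered_items[start::step][:sample_size]

ordered = list(range(1, 101))                 # hypothetical list already sorted by some feature
sample = systematic_sample(ordered, 10, seed=5)
print(sample)
```

With 100 items and a sample of 10, n is 10, so consecutive picks are always 10 positions apart and the sample spans the whole ordered list.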
Bias: when certain responses are systematically favored over others
When writing about bias, you must:
Identify the population and sample
Explain how the sampled individuals might differ from the general population
Explain how this leads to an overestimate or underestimate
Non-random sampling methods have the potential for bias because they do not use chance to select the individuals
Two such methods are voluntary response sampling and convenience sampling
Voluntary response bias: when a sample is composed entirely of volunteers, the sample will typically not be representative of the population
Convenience bias: when those that are most convenient to access get selected for a sample
In addition to the two types covered above:
Undercoverage bias: when some groups of the population are left out in the process of choosing a sample
Response bias: when the behavior of the respondent or the interviewer causes bias
Can be intentional or unintentional
Nonresponse bias: when an individual chosen for a sample can’t be reached or chooses not to respond
Question wording: when the complexity, style, or order of the questions influences a response
Self-reported responses: when individuals inaccurately report their own data