# 4.1: Sampling and Surveys

## Introductory Terms

• Population: an entire group of people about which information is sought

• Sample: the actual part of the population studied in order to gather information

• Information from the sample is used to draw conclusions about the entire population

• Subset of total population

• Census: an attempt to contact everyone in a population

• Very difficult to obtain

• The US attempts a national census only once every 10 years

• It is only appropriate to generalize about a population if the sample is randomly selected or otherwise representative of that population

• A sample is only generalizable to the population from which it was selected

• Sample design: the method used to choose a sample from a population

• Sampling frame: the list of individuals from which a sample is drawn

• Biased sample/biased study: a sample or study which systematically favors certain individuals or outcomes

• Does not represent the population

• Consistently overestimates or underestimates the value sought

### Replacement Sampling

• Sampling with replacement: when an item from a population can be selected more than once

• Sampling without replacement: when an item from a population cannot be selected more than once
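The difference between the two can be sketched in Python with the standard `random` module. The five-item population and the fixed seed are assumptions made only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = ["A", "B", "C", "D", "E"]  # hypothetical population of five items

# With replacement: every draw is made from the full population,
# so the same item can be selected more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: once an item is drawn it cannot be drawn again,
# so every item in the sample is distinct.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Drawing five items without replacement from a population of five always returns every item exactly once, while the draw with replacement may repeat some items and miss others.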

## Types of Sampling

### Relatively Ineffective Methods

• Convenience sampling: choosing individuals who are in close proximity or otherwise easy to reach

• Often produces unrepresentative data

• Almost guaranteed to show bias

• Voluntary response sample: individuals choose themselves as participants by responding to a general appeal

• Shows bias because people with strong opinions (often negative) are more likely to respond

• E.g., call-in opinion polls

### Generally Effective Methods (if used correctly)

• Good sampling designs have the goal of creating a sample which is representative of the population

• Random sampling: an essential principle of statistical sampling

• The use of chance to select a sample

• E.g., dice, spinners, cards

• Simple random sample (SRS): choosing individuals from a population in such a way that every individual in the population has an equal chance of being chosen and every possible sample has an equal chance of being chosen

• The hat method is one way to select an SRS

• Number the individuals on identical slips of paper

• Place them in a hat

• Mix thoroughly

• Draw one at a time until the desired sample size has been selected

• The numbers you draw represent the individuals that are chosen to be in the sample
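The hat method steps above can be mimicked in a few lines of Python. This is a minimal sketch: the function name `hat_method_srs`, the population of 30, and the seed are hypothetical choices, not part of the notes.

```python
import random

def hat_method_srs(population_size, sample_size, seed=None):
    """Draw slip numbers 1..population_size one at a time, without replacement."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)
    hat = list(range(1, population_size + 1))  # one numbered slip per individual
    rng.shuffle(hat)                           # mix thoroughly
    drawn = []
    while len(drawn) < sample_size:
        drawn.append(hat.pop())                # draw one slip; it stays out of the hat
    return drawn

# Hypothetical class of 30 students, sample of 5.
sample = hat_method_srs(population_size=30, sample_size=5, seed=1)
print(sample)
```

The numbers returned identify the individuals in the sample. In practice `random.sample(range(1, 31), 5)` does the same job in one call.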

• Stratified random sample

• More complicated than an SRS

• Divide the population into groups of similar individuals based on something that might influence results

• These groups are called strata (singular: stratum)

• Select an SRS from each stratum and combine to form a full sample

• Multiple hats; take a little from each

• This way, you are guaranteed to have representation from each group

• The individuals in each stratum are less varied than the population as a whole, but when you select an SRS from each stratum, you will definitely have people from each group

• Can produce better information about the population than an SRS of the same size
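A stratified random sample can be sketched as "group, then take an SRS from each group." The helper name `stratified_sample`, the student roster, and the choice of grade level as the stratum are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, seed=None):
    """Split individuals into strata, then take an SRS of equal size from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in individuals:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for name in sorted(strata):  # one "hat" per stratum
        # random.sample raises ValueError if a stratum has fewer than per_stratum members
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical roster: 25 students in each of grades 9 through 12.
students = [(f"g{grade}-s{i}", grade) for grade in (9, 10, 11, 12) for i in range(25)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, seed=2)
print(Counter(grade for _, grade in sample))
```

Every grade contributes exactly three students, which an SRS of 12 from the whole roster would not guarantee.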

• Cluster sample

• The population is naturally divided into groups called clusters, each containing a mixture of individuals, like a mini population

• Number each cluster, then choose an SRS from the clusters

• Use all of the individuals in the chosen clusters for the sample
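A cluster sample differs from a stratified sample in that chance selects whole groups, and everyone in a chosen group is included. The sketch below assumes a hypothetical school with ten homerooms of 20 students each; `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Take an SRS of clusters, then include every individual in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster labels
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical clusters: homerooms room1..room10, 20 students each.
homerooms = {
    f"room{r}": [f"room{r}-student{i}" for i in range(20)]
    for r in range(1, 11)
}
chosen, sample = cluster_sample(homerooms, num_clusters=2, seed=3)
print(chosen, len(sample))
```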

• Multistage sample

• Perform selection in stages (e.g., randomly select clusters, then select an SRS within each chosen cluster); often done for national samples
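One common form of multistage sampling uses two stages: an SRS of clusters, followed by an SRS within each chosen cluster. The sketch below shows only that two-stage case; the counties, households, and function name are hypothetical.

```python
import random

def two_stage_sample(clusters, num_clusters, per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of individuals inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)          # stage 1
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], per_cluster))   # stage 2
    return chosen, sample

# Hypothetical data: 8 counties with 50 households each.
counties = {
    f"county{c}": [f"county{c}-household{h}" for h in range(50)]
    for c in range(1, 9)
}
chosen, sample = two_stage_sample(counties, num_clusters=3, per_cluster=5, seed=4)
print(chosen, len(sample))
```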

• Systematic sample

• Order the list by some feature across which you want to ensure a range of responses

• E.g., height, GPA, income

• Select every nth item from the ordered list

• To find n, divide the total number of items in the list by the desired sample size

• Starting point should be randomized

• Spreads the sample more evenly throughout the population

• Systematic Random Sample: a method in which sample members from a population are selected according to a random starting point and a fixed, periodic interval

• Starting point should be randomized
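The interval calculation and random starting point can be sketched as follows. The GPA list is made-up data, and rounding the interval down with integer division is an assumption for lists whose length is not an exact multiple of the sample size.

```python
import random

def systematic_sample(ordered_items, sample_size, seed=None):
    """Pick a random start within the first interval, then take every nth item."""
    if not 0 < sample_size <= len(ordered_items):
        raise ValueError("sample size must be between 1 and the list length")
    rng = random.Random(seed)
    interval = len(ordered_items) // sample_size  # n = list size / desired sample size
    start = rng.randrange(interval)               # random starting point: 0..interval-1
    return [ordered_items[start + i * interval] for i in range(sample_size)]

# Hypothetical list of 100 GPAs, ordered from lowest to highest.
gpas = sorted(round(2.0 + 0.02 * i, 2) for i in range(100))
sample = systematic_sample(gpas, sample_size=10, seed=5)
print(sample)
```

With 100 items and a sample of 10, the interval is 10, so the selected GPAs are spread across the whole ordered list rather than bunched at one end.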

## Bias

• Bias: when certain responses are systematically favored over others

• When writing about bias, you must:

• Identify the population and sample

• Explain how the sampled individuals might differ from the general population

• Explain how this leads to an overestimate or underestimate

• Non-random sampling methods have the potential for bias because they do not use chance to select the individuals

• Two such methods are voluntary response sampling and convenience sampling

• Voluntary response bias: when a sample is composed entirely of volunteers, the sample will typically not be representative of the population

• Convenience bias: when those who are most convenient to access get selected for a sample
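A small simulation can show how voluntary response leads to an overestimate. Everything here is invented for illustration: the population of 10,000, the 20% dissatisfaction rate, and the response rates (dissatisfied people are assumed to be five times as likely to answer a general appeal).

```python
import random

rng = random.Random(6)

# Hypothetical population: True = dissatisfied, False = satisfied.
population = [True] * 2000 + [False] * 8000
true_rate = sum(population) / len(population)  # 0.2 by construction

# Assumed response rates to a general appeal (not real data).
RESPOND_IF_DISSATISFIED = 0.5
RESPOND_IF_SATISFIED = 0.1

# Voluntary response: each person decides for themselves whether to respond.
volunteers = [
    person for person in population
    if rng.random() < (RESPOND_IF_DISSATISFIED if person else RESPOND_IF_SATISFIED)
]
voluntary_rate = sum(volunteers) / len(volunteers)

# Comparison: an SRS of 500, where chance alone decides who is included.
srs = rng.sample(population, 500)
srs_rate = sum(srs) / len(srs)

print(f"true rate:          {true_rate:.2f}")
print(f"voluntary response: {voluntary_rate:.2f}")
print(f"SRS of 500:         {srs_rate:.2f}")
```

Under these assumptions the volunteers are mostly dissatisfied people, so the voluntary response estimate lands well above the true 20%, while the SRS estimate stays close to it.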

### Types of Bias

• In addition to the two types covered above:

• Undercoverage bias: when some groups of the population are left out in the process of choosing a sample

• Response bias: when the behavior of the respondent or the interviewer causes bias

• Can be intentional or unintentional

• Nonresponse bias: when an individual chosen for a sample can’t be reached or chooses not to respond

• Question wording bias: when the complexity, style, or order of the questions influences a response

• Self-reported responses: when individuals inaccurately report their own data