# 4.1: Sampling and Surveys

## Introductory Terms

**Population**: the entire group of individuals about which information is sought

**Sample**: the part of the population actually studied in order to gather information

A sample is a subset of the total population; information from the sample is used to draw conclusions about the entire population

**Census**: an attempt to contact everyone in a population

Very difficult to carry out, so the US attempts a national census only once every 10 years

It is only appropriate to generalize about a population if the sample is randomly selected or otherwise representative of that population

A sample is only generalizable to the population from which it was selected

**Sample design**: the method used to choose a sample from a population

**Sampling frame**: the list of individuals from which a sample is drawn

**Biased sample/biased study**: a sample or study that systematically favors certain individuals or outcomes

Does not represent the population, so it consistently overestimates or underestimates the value sought
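"Consistently overestimates" means the error does not average out when the study is repeated. The simulation below is a sketch built on a made-up population of household incomes. It compares two designs. In the first, every individual can be chosen. In the second, the frame only reaches the top half of incomes. The names `population`, `top_half`, and `average_estimate` are hypothetical.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 10,000 made-up household incomes (in $1,000s).
population = [random.gauss(60, 15) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Illustrative biased frame: only the top half of incomes can ever be sampled.
top_half = sorted(population)[len(population) // 2:]

def average_estimate(frame, n=100, reps=500):
    """Average of `reps` sample means, each from a random sample of size n drawn from `frame`."""
    return statistics.fmean(
        statistics.fmean(random.sample(frame, n)) for _ in range(reps)
    )

srs_avg = average_estimate(population)   # every individual can be chosen
biased_avg = average_estimate(top_half)  # the lower half can never be chosen

print(f"true mean:               {true_mean:.1f}")
print(f"average unbiased design: {srs_avg:.1f}")
print(f"average biased design:   {biased_avg:.1f}")
```

The unbiased design's estimates center on the true mean. The biased design lands well above it on every repetition, and a larger sample would not fix that.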

### Replacement Sampling

**Sampling with replacement**: when an item from a population can be selected more than once

**Sampling without replacement**: when an item from a population cannot be selected more than once
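Python's standard `random` module happens to offer both behaviors, which makes the distinction easy to see. The sketch uses a hypothetical population of ten numbered individuals. `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

random.seed(0)
population = list(range(1, 11))  # ten individuals, numbered 1 to 10

# Without replacement: once drawn, an individual is out of the pool.
without_repl = random.sample(population, k=5)

# With replacement: every draw is from the full pool, so repeats can occur.
with_repl = random.choices(population, k=15)

print(without_repl)
print(with_repl)

# Drawing more distinct individuals than exist is impossible without replacement.
try:
    random.sample(population, k=15)
except ValueError as err:
    print("without replacement:", err)
```

Fifteen draws with replacement from only ten individuals must repeat someone. The same request without replacement raises an error.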

## Types of Sampling

### Relatively Ineffective Methods

**Convenience sampling**: choosing individuals who are nearby or otherwise easy to reach

Often produces unrepresentative data; almost guaranteed to show bias

**Voluntary response sample**: individuals choose themselves as participants by responding to a general appeal

Shows bias because people with strong opinions (often negative) are more likely to respond

E.g., call-in opinion polls
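The mechanism above can be simulated: if strongly unhappy people are the most likely to call in, the callers overstate unhappiness. This is a sketch only. The population mix and the call-in probabilities in `call_in_prob` are invented for illustration, not real polling data.

```python
import random

random.seed(3)

# Hypothetical population: 20,000 ratings of a policy on a 1-5 scale
# (1 = very unhappy, 5 = very happy). The mix below is made up.
ratings = random.choices([1, 2, 3, 4, 5], weights=[10, 15, 30, 30, 15], k=20_000)

# Assumed chance that each person calls in: strong opinions respond far
# more often than moderate ones, and very unhappy people most of all.
call_in_prob = {1: 0.50, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.20}
callers = [r for r in ratings if random.random() < call_in_prob[r]]

def share_unhappy(xs):
    """Proportion of ratings that are 1 or 2."""
    return sum(x <= 2 for x in xs) / len(xs)

print(f"unhappy in population: {share_unhappy(ratings):.0%}")
print(f"unhappy among callers: {share_unhappy(callers):.0%}")
```

About a quarter of the population is unhappy in this setup, but roughly half of the callers are. The poll overestimates unhappiness because the callers selected themselves.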

### Generally Effective Methods (if used correctly)

Good sampling designs have the goal of creating a sample which is representative of the population

**Random sampling**: the use of chance to select a sample; an essential principle of statistical sampling

E.g., dice, spinners, cards

**Simple random sample (SRS)**: choosing individuals from a population in such a way that every individual in the population has an equal chance of being chosen and every possible sample has an equal chance of being chosen

The hat method is one way to select an SRS:

Number the individuals on identical slips of paper

Place them in a hat

Mix thoroughly

Draw one at a time until the desired sample size has been selected

The numbers you draw represent the individuals that are chosen to be in the sample
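The hat method translates almost step for step into code. Shuffling the numbered slips and then taking the first *n* is equivalent to drawing *n* slips one at a time without replacement. Every possible sample of size *n* is therefore equally likely. `hat_method` and the student names are hypothetical.

```python
import random

def hat_method(individuals, n, seed=None):
    """Simple random sample of size n, mimicking the hat method:
    number every individual, mix, and draw n slips without replacement."""
    rng = random.Random(seed)
    slips = list(range(len(individuals)))   # one numbered slip per individual
    rng.shuffle(slips)                      # "mix thoroughly"
    drawn = slips[:n]                       # draw n slips, one at a time
    return [individuals[i] for i in drawn]  # the drawn numbers pick the individuals

students = ["Ava", "Ben", "Cruz", "Dee", "Eli", "Fay", "Gus", "Hana"]
print(hat_method(students, 3, seed=42))
```

In practice, `random.sample(students, 3)` does the same thing in a single call.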

**Stratified random sample**: more complicated than an SRS

Divide the population into groups of similar individuals based on a characteristic that might influence the results; these groups are called **strata** (singular: **stratum**)

Select an SRS from each stratum and combine to form a full sample

Multiple hats; take a little from each

This guarantees that every group is represented in the sample

Because the individuals within each stratum are less varied than the population as a whole, a stratified random sample can produce more precise information about the population than an SRS of the same size
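The "multiple hats" idea can be sketched with one list per stratum. The sketch takes an SRS from each list and combines the results. It assumes an equal number from every stratum; taking numbers proportional to stratum size is another common choice. The function `stratified_sample` and the grade-level data are hypothetical.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split `population` into strata using `stratum_of`, take an SRS of
    `n_per_stratum` from each stratum, and combine them into one sample."""
    rng = random.Random(seed)
    strata = {}
    for person in population:  # one "hat" per stratum
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # SRS within the stratum
    return sample

# Hypothetical school: 200 students as (id, grade) pairs, 50 in each of grades 9-12.
students = [(i, 9 + i % 4) for i in range(200)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=1)
print(Counter(grade for _, grade in sample))  # 5 from every grade
```

An SRS of 20 could, by chance, include few or no students from one grade. The stratified design rules that out.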

**Cluster sample**: the population is naturally divided into groups, called **clusters**, that each contain a mixture of individuals, like mini-populations

Number each cluster, then choose an SRS from the clusters

Use all of the individuals in the chosen clusters for the sample
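The two steps above are an SRS of clusters, then everyone in each chosen cluster. A minimal sketch follows, using hypothetical homerooms as clusters. `cluster_sample` and the room labels are illustrative names, not a standard API.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of cluster labels, then keep every individual in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters
    return chosen, [person for label in chosen for person in clusters[label]]

# Hypothetical school: 10 homerooms (clusters) of 25 students each.
homerooms = {
    f"Room {r}": [f"Room {r} student {s}" for s in range(1, 26)]
    for r in range(101, 111)
}
rooms, sample = cluster_sample(homerooms, n_clusters=3, seed=2)
print(rooms, len(sample))  # 3 rooms, 75 students
```

Unlike a stratified sample, the randomness is at the cluster level. Once a room is chosen, no one in it is left out.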

**Multistage sample**: selection is performed in stages (for example, an SRS of clusters followed by an SRS within each chosen cluster); often used for national samples
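One common two-stage version can be sketched as follows: an SRS of schools, then an SRS of students within each chosen school. Real national surveys use more stages and more elaborate rules. The function `two_stage_sample` and the school data are hypothetical.

```python
import random

def two_stage_sample(schools, n_schools, n_per_school, seed=None):
    """Stage 1: SRS of schools. Stage 2: SRS of students within each chosen school."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(schools), n_schools)               # stage 1
    sample = []
    for school in chosen:
        sample.extend(rng.sample(schools[school], n_per_school))  # stage 2
    return chosen, sample

# Hypothetical district: 8 schools with 300 students each.
schools = {f"School {c}": [f"{c}-{i}" for i in range(300)] for c in "ABCDEFGH"}
chosen, sample = two_stage_sample(schools, n_schools=3, n_per_school=20, seed=5)
print(chosen, len(sample))  # 3 schools, 60 students
```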

**Systematic sample**: order the list according to a feature across which you want to ensure a range of responses

E.g., height, GPA, income

Select every *n*th item from the ordered list

To find *n*, divide the total number of individuals in the list by the desired sample size

The starting point should be chosen at random from among the first *n* items

This spreads the sample evenly across the range of the ordering feature

**Systematic random sample**: a method in which sample members from a population are selected according to a random starting point and a fixed, periodic interval
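The procedure above can be sketched in a few lines: compute the interval, pick a random start within the first interval, then step through the ordered list. The notes call the interval *n*. The sketch names it `k` to keep it distinct from the sample size `n`. `systematic_sample` and the ranked list are hypothetical.

```python
import random

def systematic_sample(ordered, n, seed=None):
    """Every k-th item from an already ordered list, where k = len(ordered) // n
    and the starting point is chosen at random among the first k items."""
    if not 0 < n <= len(ordered):
        raise ValueError("sample size must be between 1 and the list length")
    rng = random.Random(seed)
    k = len(ordered) // n         # interval: list size divided by sample size
    start = rng.randrange(k)      # random start: index 0, 1, ..., k - 1
    return ordered[start::k][:n]  # take every k-th item, keep exactly n

# Hypothetical list: 100 students already ranked by GPA (labels 1-100).
ranked = list(range(1, 101))
print(systematic_sample(ranked, 10, seed=4))
```

When the list size is not a multiple of the sample size, the integer division rounds the interval down. The final slice then keeps exactly `n` items.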

## Bias

**Bias**: when certain responses are systematically favored over others

When writing about bias, you must:

Identify the population and sample

Explain how the sampled individuals might differ from the general population

Explain how this leads to an overestimate or underestimate

Non-random sampling methods have the potential for bias because they do not use chance to select the individuals

Two such methods are voluntary response sampling and convenience sampling

**Voluntary response bias**: when a sample consists entirely of volunteers, it will typically not be representative of the population

**Convenience bias**: when the individuals who are easiest to reach are the ones selected for a sample

### Types of Bias

In addition to the two types covered above:

**Undercoverage bias**: when some groups in the population are left out of the process of choosing a sample

**Response bias**: when the behavior of the respondent or the interviewer causes bias; can be intentional or unintentional

**Nonresponse bias**: when an individual chosen for the sample can’t be reached or chooses not to respond

**Question wording**: when the complexity, style, or order of a question influences the response

**Self-reported responses**: when individuals inaccurately report their own data
