4.1 Samples and Surveys

**population:** the entire group of individuals we want information about

**census**: data collected from every individual in the population

**sample:** a subset of the population

How to choose a sample:

1. Define the population.
2. Define what you want to measure.
3. Decide how to choose the sample (avoid bias, represent the whole population).

Choosing a sample is less costly than taking an entire census. Samples work because of the laws of probability (if 3% of a sample like apples, it is likely that about 3% of the population like apples) IF BIAS IS AVOIDED. It is important to choose the sample in a way that avoids as much bias as possible.
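The apples example can be checked with a small simulation. This is only a sketch: the population size, the sample size, and the helper name `sample_proportion` are made up for illustration.

```python
import random

def sample_proportion(population, sample_size, rng):
    """Draw a random sample and return the share of True values in it."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 people; exactly 3% like apples (True).
population = [True] * 3_000 + [False] * 97_000

# Repeat the survey a few times with unbiased samples of 2,000 people each.
estimates = [sample_proportion(population, 2_000, rng) for _ in range(5)]
for p in estimates:
    print(f"sample estimate: {p:.1%}")
```

Each estimate should land near 3% even though only 2% of the population was asked. That only holds because the sample was chosen by chance.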

Bad samples:

**convenience sample:** easy-to-reach sample (the first 100 people that walk into a store, your entire first period class)

**voluntary response sample:** open invitation to join the sample (a mail-in survey sent to everyone in the district, where people choose whether or not to complete the form)

How to Sample Well

**random sample:** a sample where a chance process decides which individuals are selected

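**simple random sample (SRS):** the simplest way of choosing a random sample; every individual, and every possible group of the desired size, has an equal chance of being chosen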

1. Label all individuals in the population from 1 to *n*, *n* being the total size of the population.
2. Read the random digit table *x* digits at a time, *x* being the number of digits *n* has (if individuals are labeled 00-99, read 2 digits at a time).
3. Ignore repeated numbers and values that are out of range.
4. Stop once you reach *y* numbers, *y* being the desired sample size.
5. Match each number to its individual.
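The SRS steps above can be sketched in Python. The function name `srs_from_digits`, the student names, and the digit string are assumptions for illustration; the digits are generated here as a stand-in for one line of a random digit table.

```python
import random

def srs_from_digits(digits, population_size, sample_size):
    """Pick labels 1..population_size by reading a digit string in fixed-width chunks."""
    width = len(str(population_size))       # x = number of digits n has
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if label < 1 or label > population_size:
            continue                         # out of range: ignore
        if label in chosen:
            continue                         # repeat: ignore
        chosen.append(label)
        if len(chosen) == sample_size:      # reached y numbers: stop
            break
    return chosen

# Stand-in for a random digit table (generated, not copied from a real table).
rng = random.Random(1)
digits = "".join(str(rng.randrange(10)) for _ in range(400))

names = [f"student_{i}" for i in range(1, 31)]    # hypothetical population, n = 30
labels = srs_from_digits(digits, population_size=len(names), sample_size=5)
sample = [names[label - 1] for label in labels]   # match each number to its individual
print(labels, sample)
```

In practice `random.sample(names, 5)` does the same job in one call; the longer version is only there to mirror the table procedure.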

**stratified random sample:** a more complex method of random sampling where the population is divided into groups of similar individuals (called **strata**), and an SRS is chosen from each group. This guarantees some level of representation for every group, which helps produce more accurate data.

1. Classify the population into groups of similar individuals (boys/girls, morning/evening/night, cat/dog).
2. Complete the SRS steps within each stratum.
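A minimal sketch of the two stratified steps, assuming a made-up population of 60 people split by pet preference. The helper `stratified_sample` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, per_stratum, rng):
    """Take an SRS of `per_stratum` individuals from every stratum."""
    strata = defaultdict(list)
    for person in individuals:
        strata[stratum_of(person)].append(person)   # step 1: classify into groups
    # step 2: SRS within each stratum
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

rng = random.Random(2)
# Hypothetical population: (id, pet preference); 20 "cat" people and 40 "dog" people.
people = [(i, "cat" if i % 3 == 0 else "dog") for i in range(1, 61)]
picked = stratified_sample(people, stratum_of=lambda p: p[1], per_stratum=4, rng=rng)
for stratum, members in picked.items():
    print(stratum, members)
```

Both groups are guaranteed to appear in the sample, which a plain SRS of 8 people would not guarantee.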

**cluster sample:** the population is divided by location into groups called **clusters**, and whole clusters are randomly chosen.

1. Complete an SRS to choose *x* number of clusters, following the SRS steps.
2. Interview EVERY individual in each chosen cluster.
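A short sketch of cluster sampling, assuming the clusters are city blocks. The block names, the people, and the helper `cluster_sample` are invented for the example.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters, then keep EVERY individual in each one."""
    chosen_names = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen_names}

rng = random.Random(3)
# Hypothetical clusters: people grouped by city block.
blocks = {
    "block A": ["Ana", "Ben", "Cy"],
    "block B": ["Dee", "Eli"],
    "block C": ["Fay", "Gus", "Hal", "Ivy"],
    "block D": ["Jo", "Kit"],
}
picked = cluster_sample(blocks, num_clusters=2, rng=rng)
print(picked)
```

Note the difference from stratified sampling: here chance picks the groups, and then nobody inside a chosen group is left out.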

**systematic random sample:** a more methodical form of random sampling, where every *n*th member is chosen

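1. Divide the population size by the desired sample size to get *n*.
2. Pick a random starting point among the first *n* individuals, then select every *n*th individual after it.

Because the starting point is random, every individual has an equal chance of being selected.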
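Systematic sampling can be sketched as below. The ticket list and the helper `systematic_sample` are hypothetical, and the example assumes the population size divides evenly by the sample size.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Select every n-th individual after a random start within the first n."""
    step = len(population) // sample_size    # n = population size / sample size
    start = rng.randrange(step)              # random start among the first n
    return population[start::step][:sample_size]

rng = random.Random(4)
tickets = list(range(1, 101))                # hypothetical population of 100 tickets
picked = systematic_sample(tickets, sample_size=10, rng=rng)
print(picked)
```

With 100 tickets and a sample of 10, the step is 10, so the picked tickets are always exactly 10 apart.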

**inference:** conclusions about the population drawn from sample data

**margin of error:** a bound on how far the sample result is likely to be from the true population value

Random sampling avoids bias in choosing the sample, and the laws of probability then allow for trustworthy inference.

**What Can Go Wrong?**

**undercoverage**: some of the population is not represented or cannot be chosen (sampling error). For example, older people are less likely to have mobile phones, so a survey carried out by calling mobile phone numbers will not properly represent the older generation.

**nonresponse:** some of the individuals chosen for the sample do not participate. This happens when the sample has ALREADY been chosen and people ACTIVELY choose not to send in their response.

**voluntary response bias:** people can CHOOSE to be part of the sample. This makes the collected data more extreme, because people with middle-ground views do not feel strongly enough to take the time to respond.

**response bias:** a systematic pattern of incorrect responses (people will tell a survey they do not do drugs for fear of getting caught, so the data is not correct).

**wording of questions:** the most important influence on the answers to a survey. Questions that “lead” toward an answer are not accurate and cause bias. Questions that are vague or hard to understand result in guesses rather than concrete opinions.
