The Practice of Statistics
What is a population?
the entire group of individuals we want information about
What is a sample?
a subset of individuals in the population from which we actually collect data
What is a census?
a study that collects data from every individual in the population
What is a sample survey?
a study that collects data from a sample that is chosen to represent a specific population
What are the steps for planning a sample survey?
Decide what population you want to describe
Decide what you want to measure
Decide how to choose a sample from the population
What does poor sampling lead to in your results?
bias
What is bias?
using a method that will consistently overestimate or underestimate the value you want to know
What is convenience sampling?
choosing individuals who are easy to reach
What is voluntary response sampling?
allowing people to choose to be in the sample by responding to a general invitation
Why might voluntary response sampling show bias?
because people with strong feelings (often in the same direction) are most likely to respond
How do you keep the conclusions of your study from being invalidated by the sampling method?
by using a chance process, rather than personal choice or convenience, to decide who is in the sample
What is random sampling?
using a chance process to determine which members of a population are included in the sample
What is a simple random sample (SRS)?
a sample chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample
Why might you choose a sample by chance?
to avoid bias affecting the results
How can you choose an SRS?
using technology or Table D
What are the 3 steps to choosing an SRS?
Label
Randomize
Select
What is N in regard to SRS?
the number of individuals in the population
What is n in regard to SRS?
sample size
What is the Label step of choosing an SRS with technology?
Give each individual in the population a distinct numerical label from 1 to N
What is the Randomize step of choosing an SRS with technology?
Use a random number generator to obtain n different integers from 1 to N
What is the Select step of choosing an SRS with technology?
Choose the individuals that correspond to the randomly selected integers
How do you find SRS using a calculator?
MATH → PRB → 5:randInt(1, N); repeat, ignoring duplicates, until you have n different integers
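The Label, Randomize, and Select steps with technology can be sketched in Python as follows. This is only an illustration, not a required method: the function name choose_srs, the seed argument, and the list of student names are all made up for the example. Python's random.sample draws without replacement, so the n integers are automatically different.

```python
import random

def choose_srs(population, n, seed=None):
    """Label, Randomize, Select: return an SRS of n individuals."""
    rng = random.Random(seed)          # seed is optional, only for repeatable demos
    N = len(population)
    # Label: the individual at position i gets label i + 1, so labels run 1..N.
    labels = range(1, N + 1)
    # Randomize: draw n different integers from 1 to N (no repeats).
    chosen_labels = rng.sample(labels, n)
    # Select: pick the individuals whose labels were drawn.
    return [population[label - 1] for label in chosen_labels]

# Hypothetical population of N = 8 students
students = ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus", "Hal"]
sample = choose_srs(students, n=3, seed=1)
print(sample)
```

Because every set of 3 labels is equally likely to be drawn, every group of 3 students has the same chance of being the sample.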
What is the Label step of choosing an SRS with Table D?
Give each member of the population a numerical label with the same number of digits. Use as few digits as possible
What is the Randomize step of choosing an SRS with Table D?
Read consecutive groups of digits of the appropriate length from left to right across a line in Table D. Ignore any group of digits that wasn’t used as a label or that duplicates a label already in the sample. Stop when you have chosen n different labels
What is the Select step of choosing an SRS with Table D?
Choose the individuals that correspond to the randomly selected integers
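The Table D procedure can be mimicked in code. In this rough sketch, the function name srs_from_digits is hypothetical and the digit string is invented for the example (it is not a line copied from Table D). It assumes the labels are 01 to N, so a population of 30 uses two-digit labels 01 through 30.

```python
def srs_from_digits(digits, N, n):
    """Read fixed-width groups of digits left to right; return n distinct labels."""
    width = len(str(N))                  # fewest digits that can label 1..N
    digits = digits.replace(" ", "")     # the spacing in the table carries no meaning
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Ignore groups that are not a label (0 or above N) or that repeat a label.
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:             # stop once n different labels are chosen
            return chosen
    raise ValueError("ran out of digits before choosing n labels")

line = "27038 12715 00440 93062"         # invented digits, for illustration only
print(srs_from_digits(line, N=30, n=4))  # prints [27, 3, 15, 9]
```

Reading pairs gives 27, 03, 81, 27, 15, 00, 44, 09: the groups 81, 00, and 44 are not labels, and the second 27 is a repeat, so the chosen labels are 27, 03, 15, and 09.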
What is a table of random digits?
a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
each entry in the table is equally likely to be any of the 10 digits (0-9)
the entries are independent of each other, and knowledge of one part of the table gives no information about any other part
What are strata?
groups of individuals in the population that are similar in some way that might affect their responses
What is a stratified random sample?
a sample formed by taking an SRS within each stratum and combining the SRSs into one overall sample
Why is it beneficial to use a stratified random sample?
it gives a more precise estimate (less variability) than an SRS of the same size, provided the strata are chosen well
How do you choose a variable to stratify by?
pick the variable that is the best predictor of what you’re measuring
When is it preferred to use cluster sampling instead of SRS or stratified random sampling?
when the populations are large and spread over a wide area
What is a cluster?
a group of individuals that are located near each other
What is cluster sampling?
randomly choosing clusters and including each member of the selected clusters in the sample
Why are cluster samples used?
for practical reasons like saving time and money
When do cluster samples work best?
when the cluster looks like the population, just on a smaller scale
How do you describe stratified random sampling?
Define the strata
Obtain an SRS of [n / number of strata] from each [stratum]
Result – a stratified random sample of n individuals
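The stratified recipe above could look like this in Python. It is a sketch under stated assumptions: the name stratified_sample, the stratum_of argument, and the 40-student population are hypothetical, and it assumes n divides evenly among the strata.

```python
import random

def stratified_sample(population, stratum_of, n, seed=None):
    """Take an SRS of n / (number of strata) from each stratum and combine them."""
    rng = random.Random(seed)
    # Define the strata: group individuals by the stratifying variable.
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    per_stratum = n // len(strata)       # assumes n divides evenly among strata
    sample = []
    for members in strata.values():
        # SRS within this stratum (drawn without replacement).
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical population: 40 students, 10 in each of grades 9 to 12
students = [(f"student{i}", 9 + i % 4) for i in range(40)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n=12, seed=2)
print(sample)
```

With 4 grade levels as strata and n = 12, the sample contains exactly 3 students from each grade.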
How do you describe cluster sampling?
Use […] as clusters, assuming x individuals per [cluster]
Randomly select [n / number of individuals per cluster] clusters
Result – the n individuals will be our sample
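The cluster recipe above can be sketched the same way. Here the function name cluster_sample and the classroom data are hypothetical, and the code assumes every cluster has the same number of individuals, as the template does.

```python
import random

def cluster_sample(clusters, n, seed=None):
    """Randomly choose whole clusters and include every member of each one."""
    rng = random.Random(seed)
    size = len(next(iter(clusters.values())))   # assumes equal-size clusters
    k = n // size                               # number of clusters needed
    chosen = rng.sample(sorted(clusters), k)    # SRS of cluster names
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical population: 8 classrooms (clusters) of 5 students each
clusters = {f"room{r}": [f"room{r}-student{j}" for j in range(5)] for r in range(8)}
chosen, sample = cluster_sample(clusters, n=15, seed=3)
print(chosen)
print(sample)
```

Note the contrast with stratified sampling: here the chance process picks the groups, and everyone in a chosen group is included.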
What is the drawback of SRS?
there is a large amount of variability, and it is time-consuming
What is the drawback of stratified random sampling?
there might not be many individuals in some strata, which can influence the result
What is the drawback of cluster sampling?
the clusters used may not be good representations of the entire population
What is systematic random sampling?
selecting a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and then choosing every kth individual thereafter
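A minimal sketch of this method in Python, assuming the population is already stored in order in a list. The function name systematic_sample and the numbered population are made up for the example.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """Pick a random start among the first k individuals, then take every kth."""
    rng = random.Random(seed)
    start = rng.randrange(k)             # random position 0..k-1 (one of the first k)
    return ordered_population[start::k]  # that individual and every kth one after

# Hypothetical ordered population: individuals numbered 1 to 100
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=4)
print(sample)
```

With 100 individuals and k = 10, the sample always has 10 individuals spaced exactly 10 apart; only the starting point is left to chance.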
What can affect sample surveys in addition to sampling variability?
other sources of error, such as undercoverage, nonresponse, and response bias
What do good sampling techniques include?
the art of reducing all sources of error
When does undercoverage occur?
when some members of the population are less likely to be chosen or cannot be chosen in a sample
When does nonresponse occur?
when an individual chosen for the sample can’t be contacted or refuses to participate
When does response bias occur?
when there is a systematic pattern of inaccurate answers to a survey question
What is the most important influence on the answers given to a sample survey?
the wording of questions
Why should you rely on random sampling?
to avoid bias in selecting samples from the lists of available individuals
the laws of probability allow trustworthy inference about the population
What is a margin of error?
how far, at most, we expect the sample proportion to be from the actual population proportion
What is the benefit of increasing the sample size?
increased precision (but not accuracy)
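A small simulation can illustrate the precision claim. This is a sketch, not a proof: it assumes a very large population in which a proportion p = 0.6 has some trait (so each sampled individual is modeled as an independent chance event), and the function name and all the numbers are chosen only for illustration.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, reps, rng):
    """Standard deviation of the sample proportion over many samples of size n."""
    props = []
    for _ in range(reps):
        # Each sampled individual has the trait with probability p (assumed model).
        successes = sum(rng.random() < p for _ in range(n))
        props.append(successes / n)
    return statistics.stdev(props)

rng = random.Random(0)
small = spread_of_sample_proportions(p=0.6, n=25, reps=500, rng=rng)
large = spread_of_sample_proportions(p=0.6, n=400, reps=500, rng=rng)
print(small, large)
```

The sample proportions from the larger samples should vary much less from sample to sample. A larger sample does nothing to fix a biased sampling method, which is why the gain is in precision and not accuracy.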
What are errors in design methods (designer flaw)?
convenience sampling
voluntary response sampling
What are other sources of bias in sample surveys (response flaw)?
undercoverage
nonresponse
wording of questions