Sampling and Surveys
Population and Samples
Population: the entire group of individuals that information is sought about
Census: collection of data from every individual in the population
Sample: the part of the population from which data is actually collected
Whether certain data is seen as population or sample depends on who is viewing it.
The definition of population depends on what you’re trying to study.
Parameter
A number that describes certain characteristics of the population
This is a fixed number, but in practice we usually don’t know its value
Statistic
A number that describes a characteristic of the sample; known once the sample is taken
This will vary from sample to sample
A statistic is used to estimate the parameter
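The difference between a parameter and a statistic can be seen in a small simulation. This is only a sketch: the population of heights is made up for illustration, and in a real study the parameter would not be available to compute.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption): heights in cm of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Statistics: each sample of 50 gives a slightly different sample mean.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

The parameter stays fixed while the five sample means differ from one another, and each one serves as an estimate of the parameter.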
Inference
The process of drawing conclusions based on a sample.
Bias
When one answer is systematically favored over another
When describing bias, state whether the sample result is likely to be too high or too low compared with the population value
Voluntary Response Bias
When people respond to a general invitation to answer a question
Subjects self-select
Causes bias because people who feel strongly are more likely to respond than people who don’t care enough to respond
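A rough simulation shows how self-selection can distort a result. Every number here is an assumption chosen for illustration (30% of the population opposes a proposal, opponents respond 40% of the time, everyone else 5% of the time); nothing comes from real data.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption): True means the person opposes a proposal.
population = [rng.random() < 0.30 for _ in range(50_000)]

def responds(opposes):
    # Assumed response rates: people who feel strongly respond far more often.
    return rng.random() < (0.40 if opposes else 0.05)

# Only the people who choose to respond end up in the "sample".
volunteers = [p for p in population if responds(p)]

true_share = sum(population) / len(population)
volunteer_share = sum(volunteers) / len(volunteers)

print(f"share opposed in population: {true_share:.2f}")
print(f"share opposed among volunteers: {volunteer_share:.2f}")
```

Under these assumptions the volunteers overestimate the share of opponents, because opponents are much more likely to respond.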
Convenience Sampling
When subjects are chosen based on convenience
Doesn’t fairly represent the whole population
Systematic Sample
Step 1: Randomly select starting point
Step 2: Select every kth item after
Quick and easy
Not every sample has an equal chance of being chosen
Can produce bias if the list has a repeating pattern that lines up with k
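The two steps can be sketched in a few lines. The function name `systematic_sample` and the roster of ID numbers are hypothetical, chosen only for illustration.

```python
import random

def systematic_sample(items, k, rng=random):
    """Pick a random start among the first k positions, then take every kth item."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = rng.randrange(k)  # Step 1: random starting point (index 0 to k-1)
    return items[start::k]    # Step 2: every kth item after the start

roster = list(range(1, 101))  # hypothetical list of 100 ID numbers
chosen = systematic_sample(roster, k=10, rng=random.Random(0))
print(chosen)
```

With 100 items and k = 10 there are only 10 possible samples (one per starting point), which is why not every sample has an equal chance of being chosen.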
Simple Random Sample
When every possible sample of a given size has an equal chance of being chosen (so every individual also has an equal chance of being chosen).
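One way to draw a simple random sample is with Python's standard `random.sample`, which selects without replacement. The class roster below is hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
roster = [f"student_{i}" for i in range(1, 31)]  # hypothetical class of 30

# Draw 5 students without replacement; every group of 5 is equally likely.
srs = rng.sample(roster, 5)
print(srs)
```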
Variability
How spread out statistics are among samples
Bigger samples tend to be less spread out than smaller ones
Studies should have low bias and low variability
Random sampling (low bias) + larger samples (low variability) = trustworthy results
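The effect of sample size on variability can be checked with a simulation. This is a sketch under assumptions: the population values are made up, and the helper `spread_of_sample_means` is a hypothetical name.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption): 20,000 values spread evenly from 0 to 100.
population = [rng.uniform(0, 100) for _ in range(20_000)]

def spread_of_sample_means(n, repeats=500):
    """Take many random samples of size n and measure how spread out their means are."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

The means of the larger samples cluster more tightly around the population mean than the means of the smaller samples.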
Explaining Bias
Step 1: Describe how the sampled individuals are likely to differ from the rest of the population
Step 2: Explain how this difference leads to an overestimate or an underestimate
Observational Studies
Retrospective: examines existing data or asks about past behaviors
Prospective: follows individuals forward in time and collects data as it happens
Sampling Frame
List of individuals that sample is drawn from
Undercoverage bias can occur if the list leaves out part of the population; sampling error (sample-to-sample variation) is still present
Non Sampling Errors
Voluntary response bias
Undercoverage bias
Non response bias
Non random sampling methods
These can occur even in a census. Increasing the sample size won’t reduce them.
Non Response Bias
When selected individuals can’t be contacted, refuse to respond, or only partially respond. (Lying is response bias, not non response bias.)
Not undercoverage bias (those individuals never had a chance to be selected)
Not voluntary response bias (selected by researchers as opposed to self selected)
Response Bias
When there are problems with the data gathering instrument
People can lie
People can answer questions about things they don’t actually know
The question is confusing
Question Wording Bias
The wording of questions can influence answers.
Stratified Random Sampling
Step 1: Divide the population into distinct groups (strata) whose members are similar to each other (homogeneous grouping)
Step 2: Use a random selection method to pick individuals from each stratum, then combine them to form the complete sample
Because individuals are drawn from every stratum, every group is guaranteed to be represented
Works best with small differences within each stratum and large differences between strata
Helps to reduce the variability of the estimate
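A minimal sketch of the two steps follows. The function `stratified_sample`, the student list, and the choice of grade level as the stratum are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(people, stratum_of, per_stratum, rng):
    """Group people into strata, then draw a simple random sample from each one."""
    strata = defaultdict(list)
    for person in people:                     # Step 1: divide into strata
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):               # Step 2: random pick from every stratum
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical school (assumption): 200 students, 50 in each of grades 9 to 12.
students = [(f"s{i}", 9 + i % 4) for i in range(200)]

picked = stratified_sample(students, lambda s: s[1], 5, random.Random(3))
print(picked)
```

Taking the same number from each stratum is only one option; the number drawn could instead be proportional to the size of each stratum.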
Cluster Sampling
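Step 1: Divide the population into groups of individuals that are located near each other (clusters)
Step 2: Use a random method to select one or more clusters, then collect data from everyone in the chosen clusters
This helps to save time and money
Each cluster should be varied within but similar to the other clusters (the opposite of strata)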
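Cluster sampling can be sketched as choosing whole groups at random and surveying everyone in them. The function `cluster_sample` and the homeroom data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include every member of each one."""
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, members

# Hypothetical school (assumption): 8 homerooms of 25 students each.
homerooms = {
    f"room_{r}": [f"room_{r}_student_{i}" for i in range(25)]
    for r in range(1, 9)
}

rooms, surveyed = cluster_sample(homerooms, 2, random.Random(5))
print(rooms, len(surveyed))
```

Only two homerooms need to be visited to survey 50 students, which is where the savings in time and money come from.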