Sampling Part I
1. Biased Sample
The design of a statistical study is biased if it systematically favors certain outcomes.
Selection of whichever individuals are easiest to reach is called convenience sampling.
A voluntary response sample chooses itself by responding to a general appeal (write-in or call-in opinion polls).
Convenience samples and voluntary response samples are often biased.
2. Convenience Sampling Effectiveness
Convenience sampling may work if the sample is a good representation of the population (e.g., using our class as a sample of college, or statistics, students in general).
3. Case Study: Write-In Opinion Polls
Ann Landers, an advice columnist whose daily column was published in over 1,200 newspapers in the United States and Canada, once asked the readers of her advice column, “If you had it to do over again, would you have children?”
She received nearly 10,000 responses, almost 70% saying no.
Is it true that 70% of parents regret having children?
This is an example of a voluntary response sample, in which the respondents tend to be those with strong, and often negative, opinions.
“I believe the logical explanation for this phenomenon is (a) the hurt, angry and disenchanted tend to write more readily than the contented, and (b) people tell me things they wouldn’t dare tell anyone else.” - Ann Landers
4. Voluntary Response Samples Limitations
Voluntary response samples are often not a good representation of the population.
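The effect described in the Ann Landers case can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true share of "no" answers (10%) and the two response rates (20% for unhappy parents, 1% for contented ones) are hypothetical values chosen for illustration, not figures from the column. The function name `simulate_voluntary_response` is likewise an illustrative choice.

```python
import random

def simulate_voluntary_response(pop_size, true_no_rate,
                                p_respond_no, p_respond_yes, seed=0):
    """Return (share of 'no' among respondents, number of respondents).

    All rates are hypothetical inputs: people who would answer 'no'
    are assumed to write in more readily than people who would answer 'yes'.
    """
    rng = random.Random(seed)
    no_responses = 0
    total_responses = 0
    for _ in range(pop_size):
        says_no = rng.random() < true_no_rate          # this person's true opinion
        p_respond = p_respond_no if says_no else p_respond_yes
        if rng.random() < p_respond:                   # do they bother to write in?
            total_responses += 1
            no_responses += says_no
    return no_responses / total_responses, total_responses

if __name__ == "__main__":
    share, n = simulate_voluntary_response(
        pop_size=50_000, true_no_rate=0.10,
        p_respond_no=0.20, p_respond_yes=0.01)
    print(f"{n} responses, share saying 'no': {share:.2f} (true share: 0.10)")
```

Under these assumed rates, the share of "no" among respondents comes out far above the true share in the population, even though thousands of people respond. A large voluntary response sample can therefore still be badly biased.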
5. Case Study: WWII War Planes
During WWII, statisticians tried to find where to add armor to their warplanes.
After analyzing damage data from the warplanes that returned, it was decided that armor should be added to the areas where those warplanes had been hit.
What’s wrong with this conclusion?
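To decide where to add protection, one should study the areas that were most damaged on airplanes that crashed, not on the planes that survived: the survivors only show where a plane can be hit and still return. Drawing conclusions from the survivors alone is called survivorship bias.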
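A toy simulation can make the survivorship argument concrete. Everything in this sketch is hypothetical: the four areas, the loss probabilities in `LOSS_PROBABILITY`, and the assumption that each plane takes exactly one hit in a uniformly random area are made up for illustration and are not historical data.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each area brings the plane down.
LOSS_PROBABILITY = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wings": 0.1}

def simulate_hits(num_planes, seed=0):
    """Return (hits on all planes, hits observed on returning planes)."""
    rng = random.Random(seed)
    areas = list(LOSS_PROBABILITY)
    all_hits = Counter()
    survivor_hits = Counter()
    for _ in range(num_planes):
        area = rng.choice(areas)                # every area is equally likely to be hit
        all_hits[area] += 1
        if rng.random() >= LOSS_PROBABILITY[area]:
            survivor_hits[area] += 1            # plane returned, so its damage is recorded
    return all_hits, survivor_hits

if __name__ == "__main__":
    all_hits, survivor_hits = simulate_hits(20_000)
    for area in LOSS_PROBABILITY:
        print(f"{area:9s} all planes: {all_hits[area]:5d}   returned: {survivor_hits[area]:5d}")
```

In this toy model every area is hit about equally often, yet the returning planes show the fewest hits in the most vulnerable area. Armoring the areas with the most recorded damage would protect the places where hits matter least.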
6. Case Study: Consumer Social Media Research
6.1 Reducing Bias
To reduce bias, use a sample that is more representative of the population; making a biased sample larger does not by itself remove the bias.
Research firms often have a database of consumers (panel) who represent the population better and participate in surveys (often for payment).
Source: Pew Research Center
Many third-party organizations provide ad hoc sample data from databases and collect data on behalf of clients.
6.2 Ipsos KnowledgePanel
Ipsos KnowledgePanel is the oldest and largest probability-based online panel in the U.S., with about 60,000 members; samples of specific subpopulations are also available.
Panel data allows tracking changes and trends and is useful for studying long-term effects or behavior patterns.
7. Simple Random Sample (SRS)
A simple random sample (SRS) of size n consists of n individuals from the population, chosen so that every set of n individuals has an equal chance of being the sample selected.
One way is to label each individual in the population with an index from 1 to N, where N is the population size.
Then generate a list of random numbers to decide which individuals to draw (see the hotel example below).
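There are other ways to draw a random sample, e.g., assign each individual a random number between 0 and 1 and pick the individual if the number is less than 0.5. With this rule each individual is selected with probability 0.5, so the sample size is not fixed in advance.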
A sequence of random numbers can be generated by a computer to select individuals for inclusion in a sample. This is related to simulation, which we will cover in a few weeks.
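As a rough sketch of how a computer can do this, the Python code below uses the standard library's `random.sample`, which draws without replacement so that every subset of the requested size is equally likely. The function names `simple_random_sample` and `threshold_sample`, and the population size of 28 in the usage lines, are illustrative assumptions rather than part of the course material.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw an SRS of labels from 1..population_size, without replacement."""
    rng = random.Random(seed)                  # a fixed seed makes the draw reproducible
    labels = range(1, population_size + 1)     # individuals labeled 1 to N
    return sorted(rng.sample(labels, sample_size))

def threshold_sample(population_size, threshold=0.5, seed=None):
    """Give each label a random number in [0, 1) and keep it if below the threshold.

    Each individual is kept independently, so the sample size varies from run to run.
    """
    rng = random.Random(seed)
    return [label for label in range(1, population_size + 1)
            if rng.random() < threshold]

if __name__ == "__main__":
    print(simple_random_sample(28, 4, seed=1))   # always exactly 4 distinct labels
    print(threshold_sample(28, seed=1))          # about half the labels, count varies
```

The first function fixes the sample size at n, which is what the SRS definition requires. The second implements the threshold rule and returns a sample whose size changes with the random draw.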
8. Example: Generating an SRS of Hotels
8.1 Hotel List
01 Aloha Kai
02 Anchor Down
03 Banana Bay
04 Banyan Tree
05 Beach Castle
06 Best Western
07 Cabana
08 Captiva
09 Casa del Mar
10 Coconuts
11 Diplomat
12 Holiday Inn
13 Lime Tree
14 Outrigger
15 Palm Tree
16 Radisson
17 Ramada
18 Sandpiper
19 Sea Castle
20 Sea Club
21 Sea Grape
22 Sea Shell
23 Silver Beach
24 Sunset Beach
25 Tradewinds
26 Tropical Breeze
27 Tropical Shores
28 Veranda
8.2 Random Digits Selection Process
Random digits provided:
Groups of two digits for picking 4 hotels among 28 options:
(ignore, > 28)
05 (Beach Castle) - Selected
16 (Radisson) - Selected
(ignore, > 28)
17 (Ramada) - Selected
(ignore, > 28)
(already selected, ignore)
(ignore, > 28)
(ignore, > 28)
(already selected, ignore)
(ignore, > 28)
(ignore, > 28)
(ignore, > 28)
(ignore, > 28)
(ignore, > 28)
(ignore, > 28)
20 (Sea Club) - Selected
8.3 Resulting SRS
Our SRS of 4 hotels is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club.
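The selection procedure above (read two digits at a time, skip labels outside 01 to 28, skip repeats, stop after 4 hotels) can be sketched in Python. The digit string `DIGITS` is an assumption: it was chosen only because it reproduces the sequence of decisions listed above, and it should not be read as the digits actually used. The function name `srs_from_digits` is also illustrative.

```python
HOTELS = [
    "Aloha Kai", "Anchor Down", "Banana Bay", "Banyan Tree", "Beach Castle",
    "Best Western", "Cabana", "Captiva", "Casa del Mar", "Coconuts",
    "Diplomat", "Holiday Inn", "Lime Tree", "Outrigger", "Palm Tree",
    "Radisson", "Ramada", "Sandpiper", "Sea Castle", "Sea Club",
    "Sea Grape", "Sea Shell", "Silver Beach", "Sunset Beach", "Tradewinds",
    "Tropical Breeze", "Tropical Shores", "Veranda",
]

# Assumed digits, picked to be consistent with the steps listed above.
DIGITS = "69051 64817 87174 09517 84534 06489 87201"

def srs_from_digits(digits, population_size, sample_size):
    """Read two-digit groups and keep the first valid, non-repeated labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())   # drop the spaces
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if label < 1 or label > population_size:
            continue                    # no hotel carries this label: ignore
        if label in chosen:
            continue                    # already selected: ignore
        chosen.append(label)
        if len(chosen) == sample_size:
            break                       # stop once the sample is complete
    return chosen

if __name__ == "__main__":
    for label in srs_from_digits(DIGITS, len(HOTELS), 4):
        print(f"{label:02d} {HOTELS[label - 1]}")
```

With the assumed digits, the function selects labels 05, 16, 17, and 20, matching the SRS above. A different digit string would generally give a different sample.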