Sampling Part I

1. Biased Sample
  • The design of a statistical study is biased if it systematically favors certain outcomes.

  • Selection of whichever individuals are easiest to reach is called convenience sampling.

  • A voluntary response sample chooses itself by responding to a general appeal (write-in or call-in opinion polls).

  • Convenience samples and voluntary response samples are often biased.

2. Convenience Sampling Effectiveness
  • Convenience sampling may work if the sample is a good representation of the population (e.g., using our class as a sample of college, or statistics, students in general).

3. Case Study: Write-In Opinion Polls
  • Ann Landers, an advice columnist whose daily column was published in over 1,200 newspapers in the United States and Canada, once asked the readers of her advice column, “If you had it to do over again, would you have children?”

  • She received nearly 10,000 responses, almost 70% saying no.

  • Is it true that 70% of parents regret having children?

  • This is an example of a voluntary response sample, in which respondents tend to be people with strong, often negative, opinions.

    • “I believe the logical explanation for this phenomenon is (a) the hurt, angry and disenchanted tend to write more readily than the contented, and (b) people tell me things they wouldn’t dare tell anyone else.” - Ann Landers
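
To see how self-selection alone could produce a figure like 70%, here is a minimal arithmetic sketch. The true share of "no" answers and the two write-in rates are hypothetical numbers chosen only for illustration; they are not estimates from the Ann Landers poll, and the function name is made up for this example.

```python
def observed_no_share(true_no_share, write_rate_no, write_rate_yes):
    """Share of "no" answers among those who choose to respond.

    true_no_share  : fraction of ALL parents who would answer "no" (assumed)
    write_rate_no  : chance that a "no" parent writes in (assumed)
    write_rate_yes : chance that a "yes" parent writes in (assumed)
    """
    no_letters = true_no_share * write_rate_no
    yes_letters = (1 - true_no_share) * write_rate_yes
    return no_letters / (no_letters + yes_letters)


# Hypothetical scenario: only 10% of parents would say "no", but they are
# 20 times more likely to write in than parents who would say "yes".
share = observed_no_share(true_no_share=0.10,
                          write_rate_no=0.02,
                          write_rate_yes=0.001)
print(round(share, 2))  # 0.69
```

If both groups wrote in at the same rate, the observed share would equal the true share; the distortion comes entirely from who chooses to respond.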

4. Voluntary Response Samples Limitations
  • Voluntary response samples are often not a good representation of the population.

5. Case Study: WWII War Planes
  • During WWII, statisticians tried to find where to add armor to their warplanes.

  • After analyzing damage data on the warplanes that returned, it was decided that armor should be added to the highlighted areas, where those warplanes had been hit.

  • What’s wrong with this conclusion?

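  • The data came only from planes that survived; planes hit in critical areas crashed and never entered the data. To decide where to add protection, one should study the areas that were most damaged on airplanes that crashed (rather than on the planes that survived). Drawing conclusions from the survivors alone is called survivorship bias.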
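
The effect can be illustrated with a small simulation. This is a sketch under made-up assumptions: each plane takes one hit, equally likely in one of two areas ("engine" or "wing"), and the loss probabilities are hypothetical values, not historical data.

```python
import random


def simulate_hits(n_planes, p_loss_engine, p_loss_wing, seed=0):
    """Count hits by area for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits = {"engine": 0, "wing": 0}
    returned_hits = {"engine": 0, "wing": 0}
    for _ in range(n_planes):
        area = rng.choice(["engine", "wing"])  # where this plane was hit
        all_hits[area] += 1
        p_loss = p_loss_engine if area == "engine" else p_loss_wing
        if rng.random() >= p_loss:  # the plane survives and returns
            returned_hits[area] += 1
    return all_hits, returned_hits


# Hypothetical loss rates: engine hits are far more dangerous than wing hits.
all_hits, returned_hits = simulate_hits(10_000, p_loss_engine=0.8,
                                        p_loss_wing=0.1, seed=1)
print("all planes:      ", all_hits)
print("returning planes:", returned_hits)
```

Among all planes, engine and wing hits are about equally common. Among returning planes, engine hits look rare, precisely because planes hit there seldom come back.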

6. Case Study: Consumer Social Media Research
6.1 Reducing Bias
  • To reduce bias, a larger, more representative sample should be used.

  • Research firms often have a database of consumers (panel) who represent the population better and participate in surveys (often for payment).

  • Source: Pew Research Center

  • Many third-party organizations provide ad hoc sample data from databases and collect data on behalf of clients.

6.2 Ipsos KnowledgePanel
  • Ipsos KnowledgePanel is the oldest and largest probability-based online panel in the U.S., with about 60,000 members and specific subpopulations available.

  • Panel data allows tracking changes and trends and is useful for studying long-term effects or behavior patterns.

7. Simple Random Sample (SRS)
  • A simple random sample (SRS) of size n consists of n individuals from the population, chosen so that every individual (and every set of n individuals) has an equal chance of being selected.

  • One way is to label each individual in the population with an index: 01, 02, 03, ….

  • Then generate a list of random numbers to decide which individuals to draw (see next slide).

  • There are other ways to draw a random sample, e.g., assign each individual a random number between 0 and 1 and pick the individual if the number is < 0.5. (With this method the sample size is not fixed in advance.)

  • A sequence of random numbers can be generated by a computer to select individuals for inclusion in a sample. This is related to simulation, which we will cover in a few weeks.
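
As a minimal sketch of computer-based selection, Python's standard `random.sample` draws n distinct items so that every subset of size n is equally likely. The helper name `draw_srs`, the population of 28 labels, and the seed are assumptions made for this illustration.

```python
import random


def draw_srs(population, n, seed=None):
    """Draw a simple random sample of n distinct individuals."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


labels = list(range(1, 29))            # individuals labeled 1 to 28
sample = draw_srs(labels, 4, seed=2024)
print(sorted(sample))
```

Sampling is done without replacement, so no individual can appear twice in the result.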

8. Example: Generating an SRS of Hotels
8.1 Hotel List
  • 01 Aloha Kai

  • 02 Anchor Down

  • 03 Banana Bay

  • 04 Banyan Tree

  • 05 Beach Castle

  • 06 Best Western

  • 07 Cabana

  • 08 Captiva

  • 09 Casa del Mar

  • 10 Coconuts

  • 11 Diplomat

  • 12 Holiday Inn

  • 13 Lime Tree

  • 14 Outrigger

  • 15 Palm Tree

  • 16 Radisson

  • 17 Ramada

  • 18 Sandpiper

  • 19 Sea Castle

  • 20 Sea Club

  • 21 Sea Grape

  • 22 Sea Shell

  • 23 Silver Beach

  • 24 Sunset Beach

  • 25 Tradewinds

  • 26 Tropical Breeze

  • 27 Tropical Shores

  • 28 Veranda

8.2 Random Digits Selection Process
  • Random digits provided: 69051 64817 87174 09517 84534 06489 87201 97245

  • Groups of two digits for picking 4 hotels among 28 options:

    • 69 (ignore, > 28)

    • 05 (Beach Castle) - Selected

    • 16 (Radisson) - Selected

    • 48 (ignore, > 28)

    • 17 (Ramada) - Selected

    • 87 (ignore, > 28)

    • 17 (already selected, ignore)

    • 40 (ignore, > 28)

    • 95 (ignore, > 28)

    • 17 (already selected, ignore)

    • 84 (ignore, > 28)

    • 53 (ignore, > 28)

    • 40 (ignore, > 28)

    • 64 (ignore, > 28)

    • 89 (ignore, > 28)

    • 87 (ignore, > 28)

    • 20 (Sea Club) - Selected

8.3 Resulting SRS
  • Our SRS of 4 hotels is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club.
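
The selection procedure in 8.2 can be sketched in a few lines of code. The function name `srs_from_digits` and its parameters are made up for this illustration; the digits are the ones given above.

```python
def srs_from_digits(digits, population_size, n, width=2):
    """Pick n distinct labels by reading a digit string in groups of `width`.

    Groups outside 1..population_size and repeated labels are skipped.
    """
    stream = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen


digits = "69051 64817 87174 09517 84534 06489 87201 97245"
print(srs_from_digits(digits, population_size=28, n=4))  # [5, 16, 17, 20]
```

The output matches the hotels selected by hand: 05, 16, 17, and 20.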