Chapter 1 stat pt 2

Section Objectives

  • Learn why random samples are important.

  • Understand how to create a simple random sample using random numbers.

  • See how to simulate a real-world event using random methods.

  • Describe different ways to sample, such as stratified sampling, cluster sampling, systematic sampling, multistage sampling, and convenience sampling.

Simple Random Sample

  • A simple random sample is like picking names out of a hat. If you want to choose n items from a larger group (population), a simple random sample means every single possible group of n items has an equal chance of being chosen.

    • Example: If you have 10 students and you want a simple random sample of 2, every pair of students (like Student A and Student B, or Student C and Student D) has the same chance of being picked.
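The 10-student example can be checked with a short Python sketch. The student labels A through J are made up for illustration; `random.sample` draws without replacement, which is one common way to get a simple random sample.

```python
import random
from itertools import combinations
from math import comb

students = list("ABCDEFGHIJ")  # hypothetical labels for the 10 students

# Every possible pair is a candidate sample; there are C(10, 2) of them.
all_pairs = list(combinations(students, 2))
print(len(all_pairs), comb(10, 2))  # 45 45

# Draw one simple random sample of size 2 (no student can be picked twice).
sample = random.sample(students, 2)
print(sample)
```

Each of the 45 pairs has the same chance of being the one drawn.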

Important Features of a Simple Random Sample

  • Fair Chance: Every possible sample of a specific size has an equal chance of being selected.

  • No Bias: The researcher doesn't influence which items are picked, making the selection unbiased.

  • Not Always Perfectly Diverse: Even with random selection, a sample might not perfectly represent the mix of the population. For instance, if you randomly pick 6 pets from a population of 10 cats and 10 dogs, you could, by chance, end up with all 6 cats, even though it's unlikely.
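To see how unlikely the all-cats sample is, here is a small sketch that counts samples directly. It assumes the 6 pets are drawn without replacement as a simple random sample from the 20 animals.

```python
from math import comb

cats, dogs, n = 10, 10, 6

# Samples made only of cats, divided by all possible samples of size 6.
all_cat_samples = comb(cats, n)        # 210
all_samples = comb(cats + dogs, n)     # 38760
p_all_cats = all_cat_samples / all_samples

print(f"{p_all_cats:.4f}")  # 0.0054
```

So roughly 1 sample in 185 would be all cats: rare, but not impossible.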

Guided Exercise 3 (Context)

  • Imagine a lottery in which money from ticket sales goes toward protecting open land around cities.

  • How to play: Pay $1, then choose any six different numbers from 1 to 42.

  • Winning: If your six numbers match the winning six (selected by simple random sampling), you win at least $1.5 million.

Guided Exercise 3 (Questions)

  • (a) Is the number 25 just as likely to be chosen in the winning group as the number 5?

  • (b) Could all the winning numbers be even numbers?

  • (c) Your friend always plays the numbers 1, 2, 3, 4, 5, 6. Could they ever win?

Guided Exercise 3: Solution (1 of 2)

  • (a) Solution: Yes. Since the winning numbers are chosen through simple random sampling, every number from 1 to 42 has an equal chance.

  • (b) Solution: Yes. Picking all six even numbers is one of the many possible combinations of six numbers, so it could happen.

Guided Exercise 3: Solution (2 of 2)

  • (c) Solution: Yes. The group of numbers 1, 2, 3, 4, 5, 6 is just as likely to be selected as any other specific group of six numbers out of the 5,245,786 possible combinations. (You'll learn how to calculate this in Section 4.3 using the formula \binom{42}{6} = 5{,}245{,}786.)
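The counts behind all three answers can be reproduced with Python's `math.comb`. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(42, 6)  # all possible six-number tickets
print(total)         # 5245786

# (a) Tickets containing one particular number (5, 25, or any other):
# fix that number, then choose the other 5 from the remaining 41.
p_contains = Fraction(comb(41, 5), total)
print(p_contains)    # 1/7

# (b) Tickets built only from the 21 even numbers 2, 4, ..., 42.
p_all_even = comb(21, 6) / total
print(round(p_all_even, 4))  # 0.0103

# (c) Any one specific ticket, such as 1, 2, 3, 4, 5, 6.
p_specific = Fraction(1, total)
print(p_specific)    # 1/5245786
```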

Example 3: Use a random-number table to pick a random sample of 30 cars from a population of 500 cars

  • Goal: Select 30 cars from a total of 500 cars.

  • Step 1: Number the Population: Assign a unique number from 1 to 500 to each car.

  • Step 2: Choose a Random Starting Point: Use a random-number table (like Table 1 in Appendix II, which has many rows and columns of random digits). You can start anywhere, for example, row 15, block 5.

  • Step 3: Read Digits: Since car numbers go up to 500, we need to read the digits from the table in groups of three. For instance, if we see 992815964015..., we read them as 992, 815, 964, 015, and so on.

Example 3: Solution (1 of 3)

  • How to Select: We start at our chosen point (e.g., row 15, block 5) and list the three-digit numbers. We only keep numbers that are between 001 and 500 (inclusive). If a number is too high (like 992) or has already been chosen, we skip it.

  • Initial Digits Read (and how we handle them):

    • 992 (Discard: too high, exceeds 500)

    • 815 (Discard: too high)

    • 964 (Discard: too high)

    • 015 (Keep: this is car #15)

    • 221 (Keep: this is car #221)

    • 960 (Discard: too high)

    • 079 (Keep: this is car #79)

    • 961 (Discard: too high)

    • 053 (Keep: this is car #53)

    • 71... (Continue reading in groups of three)

Example 3: Solution (2 of 3)

  • Continuing the Process: You would continue reading along the random-number table, taking numbers between 001 and 500, and skipping any that are repeats until you have 30 unique car numbers. The specific digits you encounter will depend on your starting point.

Example 3: Solution (3 of 3)

  • Final Sample: The first 30 suitable and unique car numbers you find in the random-number table will form your random sample of 30 cars.
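The keep-or-discard rule from this example can be sketched in Python. The function name `pick_from_digits` is hypothetical, and the digit string contains only the nine groups listed in the solution, so it yields 4 cars rather than the full 30; a real table would supply more digits.

```python
def pick_from_digits(digits, population_size, sample_size, width=3):
    """Read `digits` in groups of `width`; keep unique labels in 1..population_size."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels that are out of range (including 000) or already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# The groups 992, 815, 964, 015, 221, 960, 079, 961, 053 joined into one string.
digits = "992815964015221960079961053"
print(pick_from_digits(digits, population_size=500, sample_size=30))
# [15, 221, 79, 53]
```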

How to Draw a Random Sample

  1. Number Everything: Give every member of your population a unique number.

  2. Pick Random Numbers: Use a random-number table, a calculator with a random number function, or a computer program to generate random numbers within the range of your population numbers.

  3. Form the Sample: The population members whose numbers match the randomly selected numbers become your sample.
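With a computer program in place of the table, the three steps might look like the following sketch. It reuses the 500-car population from Example 3; the seed value is arbitrary and is there only so the run can be repeated.

```python
import random

# Step 1: number every member of the population (cars 1 through 500).
population = list(range(1, 501))

# Step 2: generate random numbers in that range, without repeats.
rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
chosen_numbers = rng.sample(population, 30)

# Step 3: the members with those numbers form the sample.
sample = sorted(chosen_numbers)
print(sample)
```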

Simulation

  • A simulation is a way to create a numerical copy or model of a real-world event or process. It helps us understand what might happen without actually doing the real-world experiment.

Guided Exercise 4 (Simulation): Penny Toss (10 tosses)

  • Task: Use a random-number table to simulate what would happen if you tossed a fair coin 10 times.

  • Mapping Outcomes: We need to connect numbers to coin sides. A common way is:

    • Even digit = Heads (H)

    • Odd digit = Tails (T)

  • Procedure: Go to a random-number table (like Table 1 in Appendix II) and pick a starting point, for example block 3, row 2. List the first 10 single digits you see.

Guided Exercise 4 (2 of 2): Outcomes and Sequence

  • (a) Outcomes: Each coin toss can result in one of two outcomes: Heads (H) or Tails (T).

  • (b) Digit sequence used (first 10 digits from example starting point): 7, 1, 5, 4, 9, 4, 4, 8, 4, 3.

  • (c) Resulting sequence of outcomes for 10 tosses (using our mapping):

    • 7 (odd) -> T

    • 1 (odd) -> T

    • 5 (odd) -> T

    • 4 (even) -> H

    • 9 (odd) -> T

    • 4 (even) -> H

    • 4 (even) -> H

    • 8 (even) -> H

    • 4 (even) -> H

    • 3 (odd) -> T
      The sequence is: T, T, T, H, T, H, H, H, H, T.

  • (d) Any specific sequence of 10 coin tosses (like all Heads or all Tails) is possible, though not very likely. There are 2^{10} = 1024 possible sequences for 10 tosses of a fair coin, and each one has the same probability, 1/1024.
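The digit-to-outcome mapping in this exercise is easy to reproduce in code. The sketch below uses the same 10 digits from part (b) and the even/odd rule given above.

```python
# Digits read from the table in part (b).
digits = [7, 1, 5, 4, 9, 4, 4, 8, 4, 3]

# Mapping from the exercise: even digit -> Heads, odd digit -> Tails.
tosses = ["H" if d % 2 == 0 else "T" for d in digits]

print(", ".join(tosses))                     # T, T, T, H, T, H, H, H, H, T
print(tosses.count("H"), tosses.count("T"))  # 5 5
print(2 ** len(digits))                      # 1024 possible sequences
```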

Sampling Techniques (1 of 3)

Here are different ways to gather a sample:

  • Random sampling (Simple Random Sample): As we discussed, you pick a sample from the entire population where every sample of that size has an equal chance of being chosen. It's like drawing names from a single hat containing everyone in the population.

    • Example: Putting all 100 students' names into a hat and drawing out 10 names.

  • Stratified sampling: First, you divide your entire population into important subgroups (called strata) based on a shared characteristic (like age groups, income levels, or different school grades). Then, you take a simple random sample from each subgroup.

    • Example: To survey student opinions, you first divide students into 9th, 10th, 11th, and 12th graders (these are your strata). Then, you randomly pick 20 students from each grade level.
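The grade-level example of stratified sampling could be sketched as follows. The roster is invented (100 made-up student IDs per grade), and the seed is arbitrary.

```python
import random

rng = random.Random(0)  # arbitrary seed, used only for reproducibility

# Hypothetical roster: 100 students in each grade, with IDs like "9-001".
strata = {
    grade: [f"{grade}-{i:03d}" for i in range(1, 101)]
    for grade in (9, 10, 11, 12)
}

# Take a simple random sample of 20 students from each stratum.
sample = {grade: rng.sample(members, 20) for grade, members in strata.items()}

for grade, picked in sample.items():
    print(grade, len(picked))
```

Every grade is guaranteed to appear in the sample, which a single simple random sample of 80 students would not guarantee.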

Sampling Techniques (2 of 3)

  • Systematic sampling: You number all population members sequentially. Then, you pick a random starting point and include every k^{\text{th}} member from that point onwards.

    • Example: To survey customers, you decide to survey every 5th person who walks into a store. You randomly pick a starting point (e.g., the 3rd person) and then survey the 3rd, 8th, 13th, 18th person, and so on.

  • Cluster sampling: You divide the population into pre-existing groups (clusters), usually based on geography. You then randomly pick some of these clusters and include every single member from the chosen clusters in your sample.

    • Example: To survey opinions in a city, you divide the city into city blocks (your clusters). You randomly select 10 city blocks, and then you interview everyone living in those 10 selected blocks.
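Both methods can be sketched in a few lines of Python. The helper names `systematic_sample` and `cluster_sample` are hypothetical, as are the 30 customers and the 50 city blocks with 4 residents each.

```python
import random

def systematic_sample(population, k, start):
    """Return the member at 1-based position `start` and every k-th member after it."""
    return population[start - 1::k]

def cluster_sample(clusters, n_clusters, rng):
    """Pick `n_clusters` cluster names at random and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [member for name in chosen for member in clusters[name]]

rng = random.Random(1)  # arbitrary seed, used only for reproducibility

# Systematic: every 5th customer, starting with the 3rd (as in the example).
customers = list(range(1, 31))  # hypothetical arrival order
print(systematic_sample(customers, k=5, start=3))  # [3, 8, 13, 18, 23, 28]
random_start = rng.randint(1, 5)  # in practice the start is drawn at random from 1..k

# Cluster: 10 of 50 made-up city blocks, interviewing everyone in each.
blocks = {
    f"block{b}": [f"block{b}-resident{r}" for r in range(1, 5)]
    for b in range(1, 51)
}
chosen_blocks, residents = cluster_sample(blocks, 10, rng)
print(len(chosen_blocks), len(residents))  # 10 40
```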

Sampling Techniques (3 of 3)

  • Multistage sampling: This method uses a combination of different sampling techniques in several stages to get smaller and smaller groups, eventually leading to your final sample.

    • Example: To survey nationwide opinions, you might first randomly select 5 states (cluster sampling). Then, within those states, you randomly select 10 counties (another stage of cluster sampling). Finally, within those counties, you might randomly select households (simple random sampling).

  • Convenience sampling: You create a sample by simply using data from people who are easily available or convenient to reach.

    • Example: Asking your friends, family, or the first 20 people you see on the street for their opinions because they are easy to access.
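The multistage example (states, then counties, then households) might be sketched as below. The nested population is invented, and two details are assumptions not stated in the notes: that 10 counties are drawn within each selected state, and that 3 households are drawn per county.

```python
import random

rng = random.Random(2)  # arbitrary seed, used only for reproducibility

# Hypothetical population: 20 states, 15 counties each, 20 households per county.
nation = {
    f"state{s}": {
        f"s{s}-county{c}": [f"s{s}-c{c}-house{h}" for h in range(1, 21)]
        for c in range(1, 16)
    }
    for s in range(1, 21)
}

# Stage 1: randomly select 5 states.
states = rng.sample(sorted(nation), 5)

sample = []
for state in states:
    # Stage 2: randomly select 10 counties within the state (assumed per state).
    counties = rng.sample(sorted(nation[state]), 10)
    for county in counties:
        # Stage 3: simple random sample of 3 households (assumed size).
        sample.extend(rng.sample(nation[state][county], 3))

print(len(states), len(sample))  # 5 150
```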

Sampling Frame and Undercoverage

  • A sampling frame is the list of individuals from which you actually select your sample. For example, a list of all eligible voters could serve as the sampling frame for a voter survey.

  • Undercoverage happens when some members of the population are accidentally left out of your sampling frame. This means they had no chance of being selected for your sample.

    • Example: If you conduct a phone survey using a landline phone book, you'll miss everyone who only uses a cell phone, leading to undercoverage of cell-only users.
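The landline example can be illustrated with a toy population. The numbers here are invented (60 people with a landline, 40 who are cell-only); the point is only that anyone missing from the frame can never appear in the sample.

```python
import random

rng = random.Random(3)  # arbitrary seed, used only for reproducibility

# Hypothetical population of 100 people; the last 40 are cell-only.
population = (
    [{"id": i, "landline": True} for i in range(60)]
    + [{"id": i, "landline": False} for i in range(60, 100)]
)

# Sampling frame: only people listed in the landline phone book.
frame = [person for person in population if person["landline"]]

sample = rng.sample(frame, 10)
left_out = len(population) - len(frame)
cell_only_in_sample = sum(not person["landline"] for person in sample)

print(left_out, cell_only_in_sample)  # 40 0
```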

Sampling and Nonsampling Error

  • Sampling error: This is the natural difference you see between measurements from your sample and the actual measurements from the whole population. It happens because your sample is just a part of the population and might not perfectly represent it, even if you do everything correctly.

    • Example: In our cat/dog example, picking 6 random pets might result in 4 cats and 2 dogs, even though the population has an equal number of each. The difference between the sample's cat/dog ratio and the population's is sampling error.

  • Nonsampling error: This type of error is not due to chance in sampling. It comes from problems with how the study was designed, how data was collected, faulty tools, biased survey questions, or mistakes in recording data.

    • Example: A survey question like "Don't you agree that our city needs more parks?" could be biased (a nonsampling error) because it encourages a