Statistical Sampling Techniques - Key Concepts and Applications

Sampling Overview

  • Sampling is the process of selecting a subset (the sample) from a larger population.
  • It is practical and cost-effective because studying an entire population is often time-consuming and expensive.
  • Real-world examples:
    • Government Census conducted roughly every 10 years to represent the entire population.
    • Opinion polls taken before presidential elections typically sample around 10^3-10^4 people to represent national views.
    • Manufacturers sample products (e.g., 16-ounce cans) to verify content accuracy and consistency across the batch.
  • The sample is examined, information is interpreted, and conclusions are applied to the entire population.
  • Purpose: to draw inferences about the population from the sample.

Sampling Requirements

  • Careful sample selection is needed so the population is represented well and results are meaningful.
  • All samples should be:
    • Representative: the sample has the same relevant characteristics as the population and does not favor any subgroup.
    • Random: selected by chance, with no intentional bias or ulterior motive.
  • Five sampling techniques covered in this lesson (considered as individual techniques):
    • Random
    • Systematic
    • Convenience
    • Stratified
    • Cluster
  • In practice, researchers often use a combination of two or more techniques to improve representation and randomness.

Statistical Sampling Techniques

  • RANDOM (Simple Random Sample)

    • Each sample of the same size has an equal opportunity of being chosen.
    • Initially, every member of the population has an equal chance of being selected for the sample.
    • Common examples:
      • Drawing names from a hat.
      • Using a Random Number Generator.
    • Mathematical note: for a population of size N and a desired sample size n, the number of possible samples is \binom{N}{n}, and each specific sample has probability \frac{1}{\binom{N}{n}}.
  • SYSTEMATIC

    • Select a random starting point, then select every L^{th} subject in the population.
    • Simple to use and thus commonly employed.
    • Description cue: look for an ordinal such as a "th" in the description (e.g., every 50th, every third).
    • Example form: let the sampling interval be L; choose starting index r \text{ with } 1 \le r \le L, then select indices r, r+L, r+2L, \dots until the desired sample size is reached.
    • Quick example from the material: selecting every 50^{th} student starting from a randomly chosen starting point to obtain 75 students.
  • CONVENIENCE

    • Use results that are readily available or easily accessible.
    • Examples: family members, students in a classroom, mall shoppers.
    • CAUTION: can lead to a non-representative or biased sample because it may over- or under-represent certain groups.
  • STRATIFIED

    • Divide the population into at least two distinct groups (strata) that share a common characteristic(s).
    • From each stratum, draw some subjects.
    • Benefits:
      • Results in a more representative sample.
      • Helps preserve certain characteristics of the population within the sample.
  • CLUSTER

    • Divide the population into groups (clusters).
    • Randomly select some clusters, then collect data from ALL members of the selected clusters.
    • Used extensively by government and private research organizations.
    • Example: exit polls (sampling whole clusters like a geographic area or polling location).
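
The random and systematic procedures described above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the function names, the seed parameter, and the roster of 4000 numbered students are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, interval, n, seed=None):
    """Pick a random start r in 1..interval, then take every interval-th member."""
    rng = random.Random(seed)
    r = rng.randint(1, interval)  # 1-based starting position
    positions = [r + j * interval for j in range(n)]  # r, r+L, r+2L, ...
    if positions[-1] > len(population):
        raise ValueError("population too small for this interval and sample size")
    return [population[p - 1] for p in positions]  # convert to 0-based indexing

# Hypothetical roster of 4000 student IDs (illustrative data only).
roster = list(range(1, 4001))

srs = simple_random_sample(roster, 75, seed=1)
sys_sample = systematic_sample(roster, interval=50, n=75, seed=1)
```

Because the hypothetical IDs equal list positions, consecutive entries of sys_sample differ by exactly 50, which matches the "every 50th student" cue.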

Examples and Scenarios (Applying the Techniques)

  • Scenario 1 (Stratified): Determine the average tuition paid by San Jose State undergraduate students per Fall semester.

    • Sample: 100 undergraduates organized by class (freshman, sophomore, junior, senior); 25 from each class are selected.
    • Technique: Stratified because the population is divided by class and samples are drawn from each class.
    • Classification in notes: Stratified.
  • Scenario 2 (Systematic): Same tuition study, different sampling methods.

    • Method: Use a random number generator to select a starting student from the entire alphabetical list; then select every 50th student until 75 students are included.
    • Classification in notes: Systematic (random starting point, fixed interval of 50).
    • Variant (Random): 75 students are selected completely at random using a random process.
    • Variant (Cluster): The freshman/sophomore/junior/senior levels are numbered (1,2,3,4). A random number generator picks two levels; all students in those two levels are included in the sample.
    • Classification in notes for this variant: Cluster (groups are years; all students in selected years are included).
  • Scenario 3 (Convenience): Administrative assistant stands by the library and asks the first 100 undergraduate students encountered about their tuition.

    • Method: Convenience sampling due to easy access.
    • Classification in notes: Convenience.
  • Scenario 4 (Stratified): Soccer coach forms a recreational team by age groups.

    • Ages: 8-10, 11-12, 13-14.
    • Selection: 6 players from 8-10, 7 players from 11-12, 3 players from 13-14.
    • Classification in notes: Stratified by age groups.
  • Scenario 5 (Cluster): Pollster interviews all HR personnel in five different high-tech companies.

    • Method: Cluster sampling by company.
  • Scenario 6 (Stratified): High school educational researcher interviews 50 female teachers and 50 male teachers.

    • Method: Stratified by gender.
  • Scenario 7 (Systematic): Medical researcher interviews every third cancer patient from a hospital list.

    • Method: Systematic with interval L=3.
  • Scenario 8 (Random): High school counselor generates 50 random numbers and selects students whose names correspond to those numbers.

    • Method: Random sampling.
  • Scenario 9 (Convenience): A student interviews classmates in algebra class to determine the average number of jeans owned per student.

    • Method: Convenience sampling.
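
The stratified design in Scenario 1 and the cluster design described for the same tuition study can be compared side by side. The sketch below assumes a hypothetical roster of 800 students (200 per class level); the helper names and record layout are invented for illustration.

```python
import random
from collections import defaultdict

def group_by(records, key):
    """Split records into groups that share the same value of `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return groups

def stratified_sample(records, key, per_stratum, seed=None):
    """Draw the same number of members at random from every stratum."""
    rng = random.Random(seed)
    sample = []
    for _label, members in sorted(group_by(records, key).items()):
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(records, key, n_clusters, seed=None):
    """Pick whole clusters at random and keep ALL members of each one."""
    rng = random.Random(seed)
    groups = group_by(records, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [rec for label in chosen for rec in groups[label]]

LEVELS = ["freshman", "sophomore", "junior", "senior"]
# Hypothetical roster: 200 students per class level (illustrative data only).
students = [{"id": i, "level": LEVELS[i % 4]} for i in range(800)]

strat = stratified_sample(students, "level", per_stratum=25, seed=7)
clus = cluster_sample(students, "level", n_clusters=2, seed=7)
```

With this roster, the stratified sample contains 100 students (25 from every level), while the cluster sample contains all 400 students from two levels and none from the other two.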

Key Takeaways and Connections

  • Sampling aims to represent the population while saving time and resources.
  • The five core techniques each have strengths and weaknesses:
    • Random: every member has the same probability of selection, so the method itself does not favor any subgroup.
    • Systematic: simple and fast, but potential biases if there is an underlying pattern aligned with the interval.
    • Convenience: highly practical but prone to bias.
    • Stratified: improves representativeness by preserving subgroup characteristics.
    • Cluster: practical for large populations and when natural groups exist; can be efficient but may introduce cluster-level bias if clusters are not representative.
  • Real-world relevance:
    • Government censuses inform policy and resource allocation.
    • Public opinion polls guide political strategy and media coverage.
    • Quality control in manufacturing relies on sampling to ensure product consistency.
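
The caution about systematic sampling and underlying patterns can be made concrete with a small simulation. The setup is hypothetical: a filling line where every 10th can is underfilled, sampled with an interval that either matches or does not match that pattern.

```python
# Hypothetical line: every 10th can is underfilled (illustrative data only).
cans = [(i % 10 == 0) for i in range(1, 1001)]  # True means underfilled
true_rate = sum(cans) / len(cans)

def systematic_defect_rate(flags, interval, start):
    """Defect rate seen when sampling every interval-th can from a 1-based start."""
    picked = flags[start - 1::interval]
    return sum(picked) / len(picked)

# Interval 10 lines up with the defect pattern: one rate per possible start 1..10.
aligned_rates = [systematic_defect_rate(cans, 10, r) for r in range(1, 11)]

# Interval 7 does not line up with the pattern.
unaligned_rate = systematic_defect_rate(cans, 7, 1)
```

With the aligned interval, nine of the ten possible starting points see no defects at all and the tenth sees only defects, so no single systematic sample reflects the true rate of 10%. With the unaligned interval, the observed rate lands close to the true rate.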

Formulas and Notation Highlights

  • Simple Random Sampling: probability of selecting a specific sample is \frac{1}{\binom{N}{n}}.
  • Systematic Sampling (conceptual):
    • Choose a random starting point r \in \{1,2,\dots, L\} and a fixed interval L; select indices i_j = r + (j-1)L, \ j = 1,2,\dots, t, where t is the number of selections needed (t = n when the whole sample is drawn this way).
  • General relationships:
    • Population size: N
    • Desired sample size: n
    • Sampling interval (systematic): L (e.g., 50 in the provided example)
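
As a worked check of the notation above, the following uses a small assumed population (N = 5, n = 2) for the simple random case, and the interval and sample size from the tuition example (L = 50, t = 75) for the systematic case. The small population is chosen only to keep the arithmetic short.

```latex
% Simple random sampling with N = 5, n = 2
\binom{5}{2} = \frac{5!}{2!\,3!} = 10,
\qquad
P(\text{one specific sample}) = \frac{1}{\binom{5}{2}} = \frac{1}{10}

% Systematic sampling with L = 50, t = 75
i_j = r + (j-1)\cdot 50, \quad j = 1, 2, \dots, 75,
\qquad
i_{75} = r + 74 \cdot 50 = r + 3700
```

Since 1 \le r \le 50, the last selected index lies between 3701 and 3750, so a list of at least 3750 students is enough to reach 75 selections for any starting point.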