Chapter 7: Sampling and Sampling Distributions

Learning Objectives (LOs)

  • LO 7.1: Explain common sample biases.

  • LO 7.2: Describe various sampling methods.

  • LO 7.3: Describe the sampling distribution of the sample mean.

  • LO 7.4: Explain the importance of the central limit theorem.

  • LO 7.5: Describe the sampling distribution of the sample proportion.

  • LO 7.6: Construct and interpret control charts for numerical and categorical variables.

Introductory Case: Marketing Iced Coffee

  • A coffee shop owner, Camila, implements a promotion offering half-price iced coffee from 1 PM to 4 PM to increase customer traffic during slow hours.

  • Key statistics from a review of current records:

    • Average spending on iced coffee: $4.18

    • Standard deviation of spending: $0.84

    • Percentage of iced-coffee customers who are women: 43%

    • Percentage of iced-coffee customers who are teenage girls: 21%

  • After the promotion, a survey of 50 customers is conducted. Camila wants to calculate the probability of:

    • Average customer spending being $4.26 or more.

    • 46% or more of customers being women.

    • 34% or more of customers being teenage girls.

7.1 Sampling

  • A significant aspect of statistics is statistical inference, which includes:

    • Estimating population parameters.

    • Testing hypotheses about population parameters.

  • Definitions:

    • Population: All items of interest in a statistical problem. If the entire population could be observed, its parameters would be known and no inference would be needed.

    • Sample: A subset of the population, from which sample statistics are used to infer knowledge about unknown population parameters.

  • Credibility of Statistical Inference: Inference is only as credible as the sample behind it, so the sample (or survey) must be representative of the population.

7.1 Sampling Biases

  • Bias: The tendency of a sample statistic to systematically overestimate or underestimate a population parameter, often caused by unrepresentative samples.

    • Selection Bias: Systematic underrepresentation of certain groups in the sample.

    • Nonresponse Bias: Systematic differences in preferences between surveyed respondents and nonrespondents.

    • Social-Desirability Bias: Variations between a group’s “socially acceptable” responses and their actual choices.

7.1 Good Samples

  • Characteristics of a “good” sample: must be representative of the population.

  • Sampling Methods:

    • Simple Random Sample: A sample of n observations that has the same probability of being selected from the population as any other sample of n observations. Most statistical methods presume a simple random sample.

    • Stratified Random Sample: The population is divided into mutually exclusive and collectively exhaustive groups (strata), and observations are randomly selected from each stratum in proportion to the stratum's share of the population.

Advantages of Stratified Random Sampling
  • Ensures representation of subdivisions of interest within the population.

  • Increases precision of parameter estimates compared to simple random sampling.
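The two sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the student population, and the stratum labels are all hypothetical, and proportional allocation is done by simple rounding.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n, seed=None):
    """Draw a proportionate stratified sample.

    strata maps a stratum label to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        # Allocate draws in proportion to the stratum's share of the population.
        k = round(n * len(members) / total)
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical population: 600 undergraduates and 400 graduate students.
strata = {
    "undergrad": [f"U{i}" for i in range(600)],
    "grad": [f"G{i}" for i in range(400)],
}
picked = stratified_sample(strata, n=50, seed=1)
print({label: len(members) for label, members in picked.items()})
# → {'undergrad': 30, 'grad': 20}
```

Because each stratum's allocation is rounded separately, the total can differ from n by an observation or two when the shares do not divide evenly.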

7.1 Cluster Sampling

  • Cluster Sampling: The population is divided into mutually exclusive clusters; all observations from randomly selected clusters are included.

Advantages and Disadvantages of Cluster Sampling
  • Advantages:

    • Cost-effective compared to other sampling methods.

    • Useful for populations with natural clusters (e.g., geographic regions).

    • Effective when it is challenging to compile a complete list of population members.

  • Disadvantages:

    • Generally provides less precision than simple random or stratified sampling.
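Cluster sampling, as described above, can be sketched as follows. The district and household names are hypothetical, and the helper `cluster_sample` is an illustrative name rather than a library function.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every observation inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for label in chosen for member in clusters[label]]
    return chosen, sample

# Hypothetical population: 10 districts with 20 households each.
clusters = {
    f"district_{d}": [f"d{d}_household_{h}" for h in range(20)]
    for d in range(10)
}
chosen, sample = cluster_sample(clusters, n_clusters=3, seed=7)
print(chosen, len(sample))  # three district labels and 60 households
```

Only a list of clusters is needed to draw the sample, which is why the method helps when a complete list of population members is hard to compile.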

7.1 Challenges in Sampling

  • In practice, a truly random sample is difficult to obtain, so sample data are rarely free of error. For inference to be credible, the data:

    • Must be sampled from the correct population.

    • Should be free of bias.

    • Must be properly collected, analyzed, and reported.

7.2 The Sampling Distribution of the Sample Mean

Definitions and Concepts
  • Population Parameter: A constant, usually unknown, that describes a characteristic of the population. Examples include:

    • Population mean for quantitative variables.

    • Population proportion for categorical variables.

  • Sample Statistic: A random variable whose value depends on which random sample is selected.

  • Estimator (Point Estimator): A statistic utilized to estimate a population parameter.

    • For example, $\bar{x}$ is an estimator of the population mean.

  • An estimate is the specific value the estimator takes for a particular sample.

The Sampling Distribution of the Sample Mean
  • The sampling distribution of the sample mean $\bar{x}$ is the probability distribution of the sample means obtained from all possible samples of a given size drawn from the population.

  • Procedure:

    • Draw one sample of size n, compute its sample mean.

    • Repeat the process numerous times.

    • The frequency distribution of the resulting sample means approximates the sampling distribution.
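The procedure above can be simulated directly. The sketch below assumes a hypothetical right-skewed population generated with `random.expovariate`; the population, the sample size of 25, and the 2,000 repetitions are illustrative choices only.

```python
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=None):
    """Repeatedly draw a sample of size n and record its mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]

# Hypothetical skewed population of 10,000 values.
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = simulate_sample_means(population, n=25, repetitions=2_000, seed=1)

# The average of the sample means should sit close to the population mean.
print(round(statistics.mean(population), 3), round(statistics.mean(means), 3))
# The spread of the sample means should be close to sigma / sqrt(n).
print(round(statistics.pstdev(population) / 25 ** 0.5, 3),
      round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` would show the frequency distribution that the procedure describes.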

Expected Value and Variance
  • The expected value and variance of $\bar{x}$ are tied to the corresponding population characteristics. Key points include:

    • The average of the sample means equals the average of all individual observations in the population, so $\text{E}(\bar{x})$ equals the population mean.

    • Unbiased Estimator: An estimator is unbiased if its expected value equals the corresponding population parameter.

  • Variance of $\bar{x}$:

    • Each sample captures both high and low values that may offset each other, so sample means vary less than individual observations.

Standard Error of the Sample Mean
  • The standard deviation of $\bar{x}$ is termed the standard error of the sample mean; it equals the population standard deviation divided by $\sqrt{n}$.
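The expected value, variance, and standard error of the sample mean can be written compactly. In this summary, $\mu$ and $\sigma$ denote the population mean and standard deviation (symbols assumed here, since the notes do not name them) and n is the sample size:

```latex
\text{E}(\bar{x}) = \mu, \qquad
\text{Var}(\bar{x}) = \frac{\sigma^2}{n}, \qquad
\text{se}(\bar{x}) = \sqrt{\text{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
```

The variance formula assumes independent observations, which holds approximately when the population is large relative to n.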

Examples and Applications
  • Illustrative example: a pizza maker produces pizzas whose sizes are normally distributed with a mean of 16 inches and a standard deviation of 0.8 inches.

  • Determine the expected value and standard error of the sample mean for random samples of sizes 2 and 4:

    • Compute the expected value and the variance (and hence the standard error) for each sample size.

  • Comparison:

    • Increasing the sample size lowers the standard error, while the expected value is unchanged.
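The pizza example can be worked through numerically. This sketch uses the stated mean of 16 inches and standard deviation of 0.8 inches; the function name is illustrative.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma = 16.0, 0.8  # population mean and standard deviation, in inches
for n in (2, 4):
    # The expected value of the sample mean is mu regardless of n.
    print(n, mu, round(standard_error_of_mean(sigma, n), 4))
# → 2 16.0 0.5657
# → 4 16.0 0.4
```

Doubling the sample size from 2 to 4 cuts the standard error from about 0.57 to 0.40 inches, a reduction by a factor of $\sqrt{2}$.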

The Central Limit Theorem (CLT)
  • States that:

    • For any population X with mean $\text{E}(X)$ and standard deviation $\text{SD}(X)$, the sampling distribution of $\bar{x}$ is approximately normal when n is sufficiently large (as a rule of thumb, $n \geq 30$).

  • The normal approximation improves as n increases.

  • In applications, the Central Limit Theorem supports inference about the sample mean even when the population is not normally distributed.
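As a worked application, the first question in the introductory case can be answered with the normal approximation. The sketch assumes that the pre-promotion figures ($4.18 and $0.84) can be treated as the population mean and standard deviation, and that n = 50 is large enough for the Central Limit Theorem to apply; the function name is illustrative.

```python
import math
from statistics import NormalDist

def prob_mean_at_least(threshold, mu, sigma, n):
    """P(sample mean >= threshold) under the normal approximation."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return 1 - NormalDist(mu, se).cdf(threshold)

p_mean = prob_mean_at_least(4.26, mu=4.18, sigma=0.84, n=50)
print(round(p_mean, 2))  # → 0.25
```

Under these assumptions, an average spend of $4.26 or more would occur in roughly one sample in four even if the promotion changed nothing.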

7.3 The Sampling Distribution of the Sample Proportion

Key Concepts
  • Focuses on the population proportion p and the binomial distribution of successes X in n trials with probability p .

  • Sample Proportion: The sample proportion $\bar{p}$ is an unbiased estimator of the population proportion p; that is, the expected value of $\bar{p}$ equals p. Its standard error is $\sqrt{p(1 - p)/n}$.

Examples Illustrating Sampling Distribution
  • A study reveals that 55% of British firms experienced cyber-attacks last year; computing the expected value and standard error of the sample proportion for sample sizes of 100 and 200 shows that the standard error decreases as the sample size grows.
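The cyber-attack example can be reproduced numerically, taking p = 0.55 from the study. The function name is illustrative.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.55  # population proportion of firms that experienced a cyber-attack
for n in (100, 200):
    # The expected value of the sample proportion is p for any n.
    print(n, p, round(standard_error_of_proportion(p, n), 4))
# → 100 0.55 0.0497
# → 200 0.55 0.0352
```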

Central Limit Theorem for Sample Proportion
  • States that for any population proportion p, the sampling distribution of $\bar{p}$ is approximately normal when the sample size n is sufficiently large. As a rule of thumb:

    • $np \geq 5$ and $n(1 - p) \geq 5$.

    • Larger samples are needed for the approximation when p is far from 0.50.

  • Examples link back to Camila's promotion analysis, calculating probabilities regarding customer proportions.
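The two proportion questions from the introductory case can be sketched the same way. This assumes the pre-promotion percentages (43% women, 21% teenage girls) serve as the population proportions and uses the survey size n = 50; the function names are illustrative.

```python
import math
from statistics import NormalDist

def normal_approx_ok(p, n):
    """Rule-of-thumb check: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def prob_proportion_at_least(threshold, p, n):
    """P(sample proportion >= threshold) under the normal approximation."""
    if not normal_approx_ok(p, n):
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return 1 - NormalDist(p, se).cdf(threshold)

p_women = prob_proportion_at_least(0.46, p=0.43, n=50)
p_teen = prob_proportion_at_least(0.34, p=0.21, n=50)
print(round(p_women, 3), round(p_teen, 3))
```

Under these assumptions the first probability is roughly 0.33 and the second roughly 0.01, so a sample with 34% or more teenage girls would be unusual if the customer mix had not changed.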

7.4 Statistical Quality Control

Importance in Business
  • Successful firms prioritize the quality of their products and services; statistical quality control applies statistical techniques to develop and maintain a firm's ability to produce high-quality output.

Methods of Sampling
  • Acceptance Sampling: Inspects a portion of the products once production is complete; defective items may be discarded or repaired, which is costly, and some defective products may still go undetected.

  • Detection Approach: Monitors the production process to check that it conforms to specification, so that adjustments can be made promptly before defects become widespread.

Control Chart Usage
  • A control chart plots sample estimates over time to indicate whether the process is stable:

    • Sample estimates that fall between the upper and lower control limits indicate an in-control process.

    • A point outside the limits, or a trend or shift within them, prompts investigation.

  • Different types of charts:

    • Numerical Variable Charts: Monitor central tendency ($\bar{x}$ chart) and variability (R chart, s chart).

    • Categorical Variable Charts: Monitor proportion (p-chart) and counts (c-chart) of defects.

Control Limits in Statistical Quality Control
  • Control limits for the charts are defined by:

    • Upper Control Limit (UCL): $\text{Expected Value} + (3 \times \text{Standard Error})$

    • Lower Control Limit (LCL): $\text{Expected Value} - (3 \times \text{Standard Error})$

  • The validity of these limits rests on the sampling distribution being approximately normal.
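The control limits above can be computed with a short sketch. The function names and the sample values are hypothetical; the numbers reuse the pizza figures (mean 16, standard deviation 0.8, samples of 4) purely for illustration. Clamping the limits of a proportion chart to the range 0 to 1 is a common convention assumed here, not something stated in the notes.

```python
import math

def xbar_limits(mu, sigma, n):
    """Control limits for a chart of sample means: mu +/- 3 * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    return mu - 3 * se, mu + 3 * se

def p_limits(p, n):
    """Control limits for a chart of sample proportions, clamped to [0, 1]."""
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - 3 * se), min(1.0, p + 3 * se)

def out_of_control(estimates, lcl, ucl):
    """Return the positions of sample estimates that fall outside the limits."""
    return [i for i, value in enumerate(estimates) if not lcl <= value <= ucl]

lcl, ucl = xbar_limits(mu=16.0, sigma=0.8, n=4)
print(round(lcl, 2), round(ucl, 2))  # → 14.8 17.2

# Hypothetical sequence of sample means from successive samples.
sample_means = [16.1, 17.5, 15.0, 14.2]
print(out_of_control(sample_means, lcl, ucl))  # → [1, 3]
```

A check like `out_of_control` only flags points beyond the limits; trends or shifts inside the limits still need to be judged from the plotted chart.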

Continuous Monitoring of Production Processes
  • An example on milk jug production illustrates ongoing monitoring: sample estimates are compared with the control limits to judge whether the process is in control.

  • An example on a firm producing 4-liter cans shows how control limits are used to check adherence to quality standards and to identify flaws in the process.