Chapter 7: Sampling and Sampling Distributions

Learning Objectives (LOs)

LO 7.1: Explain common sample biases.
LO 7.2: Describe various sampling methods.
LO 7.3: Describe the sampling distribution of the sample mean.
LO 7.4: Explain the importance of the central limit theorem.
LO 7.5: Describe the sampling distribution of the sample proportion.
LO 7.6: Construct and interpret control charts for numerical and categorical variables.

Introductory Case: Marketing Iced Coffee

A coffee shop owner, Camila, implements a promotion offering half-price iced coffee from 1 PM to 4 PM to increase customer traffic during slow hours.
Key statistics from a review of current records:
- Average spending on iced coffee: $4.18
- Standard deviation of spending: $0.84
- Percentage of iced-coffee customers who are women: 43%
- Percentage of iced-coffee customers who are teenage girls: 21%
A post-promotion survey of 50 customers is conducted. Camila wants to calculate the probability that:
- Average customer spending is $4.26 or more.
- 46% or more of the customers are women.
- 34% or more of the customers are teenage girls.

7.1 Sampling

A significant aspect of statistics is statistical inference, which includes:
- Estimating population parameters.
- Testing hypotheses about population parameters.
Definitions:
- Population: All items of interest in a statistical problem. If the entire population were accessible, the parameters would be known and no inference would be needed.
- Sample: A subset of the population, from which sample statistics are used to infer knowledge about unknown population parameters.
Credibility of Statistical Inference: Inference is only as credible as the sample behind it, so the sample (or survey) must be representative of the population.

7.1 Sampling Biases

Bias: The tendency of a sample statistic to systematically overestimate or underestimate a population parameter, often caused by unrepresentative samples.
- Selection Bias: Systematic underrepresentation of certain groups in the sample.
- Nonresponse Bias: Systematic differences in preferences between surveyed respondents and nonrespondents.
- Social-Desirability Bias: Systematic differences between a group’s “socially acceptable” responses and its actual choices.

7.1 Good Samples

Characteristics of a “good” sample: must be representative of the population.
Sampling Methods:
- Simple Random Sample: A sample of n observations that has the same probability of being selected from the population as any other sample of n observations. Most statistical methods presume a simple random sample.
- Stratified Random Sample: The population is divided into mutually exclusive and collectively exhaustive groups (strata), and a random sample is drawn from each stratum, with each stratum's sample size proportional to its share of the population.
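
The two methods can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names, the dictionary layout of `strata`, and the student population are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample.

    `strata` (an assumed layout) maps a stratum label to the list of its members.
    Each stratum contributes round(n * stratum share) members, so rounding can
    make the total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {
        label: rng.sample(members, round(n * len(members) / total))
        for label, members in strata.items()
    }

# Hypothetical population: 600 undergraduates and 400 graduate students.
strata = {
    "undergrad": [f"U{i}" for i in range(600)],
    "grad": [f"G{i}" for i in range(400)],
}
everyone = strata["undergrad"] + strata["grad"]

srs = simple_random_sample(everyone, 50, seed=1)
stratified = stratified_sample(strata, 50, seed=1)
sizes = {label: len(members) for label, members in stratified.items()}
print(sizes)  # {'undergrad': 30, 'grad': 20}
```

The stratified draw always returns 30 undergraduates and 20 graduates, matching the 60/40 population split, whereas the group counts in the simple random sample vary from draw to draw.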

Advantages of Stratified Random Sampling

Ensures representation of subdivisions of interest within the population.
Increases precision of parameter estimates compared to simple random sampling.

7.1 Cluster Sampling

Cluster Sampling: The population is divided into mutually exclusive clusters; all observations from randomly selected clusters are included.
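
A minimal sketch of the cluster-sampling idea, assuming the population is already grouped into clusters. The function name, the dictionary layout, and the city-block data are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick `num_clusters` clusters and keep every observation in them.

    `clusters` (an assumed layout) maps a cluster label to the list of its members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical population: households grouped by city block.
blocks = {
    "block_A": ["A1", "A2", "A3"],
    "block_B": ["B1", "B2"],
    "block_C": ["C1", "C2", "C3", "C4"],
    "block_D": ["D1", "D2", "D3"],
}
sample = cluster_sample(blocks, num_clusters=2, seed=3)
print(sample)
```

Randomness enters only through the choice of clusters. Within a selected cluster nothing is sampled, which is why a list of clusters suffices and no complete list of population members is needed.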

Advantages and Disadvantages of Cluster Sampling

Advantages:
- Cost-effective compared to other sampling methods.
- Useful for populations with natural clusters (e.g., geographic regions).
- Effective when it is challenging to compile a complete list of population members.
Disadvantages:
- Generally provides less precision than simple random or stratified sampling.

7.1 Challenges in Sampling

Obtaining a truly random sample is difficult in practice, so sample data may not be free of error. For sound inference, the data:
- Must be sampled from the correct population.
- Should be free of biases.
- Must be properly collected, analyzed, and reported.

7.2 The Sampling Distribution of the Sample Mean

Definitions and Concepts

Population Parameter: A constant that describes the population; its value may be unknown. Examples include:
- Population mean for quantitative variables.
- Population proportion for categorical variables.
Statistic: A random variable whose value depends on which random sample is selected.
Estimator (Point Estimator): A statistic used to estimate a population parameter.
- For example, $\bar{x}$ is an estimator of the population mean.
Estimate: A specific value of the estimator, computed from a particular sample.

The Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean $\bar{x}$ is the probability distribution of $\bar{x}$ over all possible samples of a given size from the population.
Procedure:
- Draw one sample of size n, compute its sample mean.
- Repeat the process numerous times.
- The resulting frequency distribution of the sample means approximates the sampling distribution of $\bar{x}$.
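
The procedure above can be simulated directly. The sketch below assumes a hypothetical skewed population (exponential, with mean near 10). The function name, sample size, and number of repetitions are illustrative choices, not values from the chapter.

```python
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=None):
    """Repeatedly draw a sample of size n and record its sample mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

# Hypothetical population of 10,000 values, deliberately right-skewed.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]

means = simulate_sample_means(population, n=30, repetitions=2_000, seed=7)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
spread_of_means = statistics.stdev(means)

print(f"population mean = {pop_mean:.2f}, mean of sample means = {mean_of_means:.2f}")
print(f"population sd   = {pop_sd:.2f}, sd of sample means   = {spread_of_means:.2f}")
```

The sample means should center on the population mean while varying much less than individual observations do.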

Expected Value and Variance

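The expected value and variance of $\bar{x}$ are tied to the population mean $\mu$ and the population standard deviation $\sigma$. Key points include:
- The average of all sample means equals the population mean: $E(\bar{x}) = \mu$.
- Unbiased Estimator: An estimator is unbiased if its expected value equals the corresponding population parameter, so $\bar{x}$ is an unbiased estimator of $\mu$.
Variance of $\bar{x}$ :
- $Var(\bar{x}) = \sigma^2 / n$, which is smaller than the population variance $\sigma^2$ because each sample captures both high and low values that tend to offset each other.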

Standard Error of the Sample Mean

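The standard deviation of $\bar{x}$ is termed the standard error of the sample mean: $se(\bar{x}) = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.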

Examples and Applications

An illustrative example involves a pizza maker whose pizza sizes are normally distributed with a mean of 16 inches and a standard deviation of 0.8 inches.
Determine the expected value and standard error of the sample mean for random samples of size 2 and size 4:
- The expected value is 16 inches in both cases; the standard errors are $0.8/\sqrt{2} \approx 0.57$ inches and $0.8/\sqrt{4} = 0.40$ inches.
Comparison:
- A larger sample size leads to a smaller standard error.
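
The pizza figures can be checked with a short calculation. This sketch uses the standard results that the expected value of the sample mean equals the population mean and that its standard error is the population standard deviation divided by the square root of n. The function name is hypothetical.

```python
import math

def sample_mean_distribution(mu, sigma, n):
    """Return (expected value, standard error) of the sample mean for sample size n."""
    return mu, sigma / math.sqrt(n)

MU, SIGMA = 16.0, 0.8  # pizza size in inches, from the example above

results = {n: sample_mean_distribution(MU, SIGMA, n) for n in (2, 4)}
for n, (expected, se) in results.items():
    print(f"n={n}: E(x-bar)={expected:.2f} in, se(x-bar)={se:.4f} in")
# n=2: E(x-bar)=16.00 in, se(x-bar)=0.5657 in
# n=4: E(x-bar)=16.00 in, se(x-bar)=0.4000 in
```

Doubling the sample size from 2 to 4 leaves the expected value unchanged and shrinks the standard error by a factor of $\sqrt{2}$.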

The Central Limit Theorem (CLT)

States that:
- For any population $X$ with mean $\text{E}(X)$ and standard deviation $\text{SD}(X)$ , the sampling distribution of $\bar{x}$ will approximate a normal distribution when n is large enough (usually $n \geq 30$ ).
The normal approximation improves as n increases.
Applications cover scenarios where the population may not be normally distributed, relying on the Central Limit Theorem to make inferences about the sample mean.
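
As one such application, the first question from the introductory case can be sketched as follows. The calculation treats the recorded mean ($4.18) and standard deviation ($0.84) as the population values that still hold after the promotion, which is an assumption. With n = 50 the normal approximation is taken to apply. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def prob_sample_mean_at_least(x, mu, sigma, n):
    """P(x-bar >= x) under the normal approximation to the sampling distribution."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return 1 - NormalDist(mu, se).cdf(x)

# Figures from the introductory case; using them as the population mean and
# standard deviation for the surveyed customers is an assumption.
prob = prob_sample_mean_at_least(4.26, mu=4.18, sigma=0.84, n=50)
print(f"P(x-bar >= 4.26) = {prob:.4f}")
```

Under these assumptions the probability comes out at roughly 0.25, so a sample average of $4.26 or more would not be unusual even if the promotion had no effect on spending.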

7.3 The Sampling Distribution of the Sample Proportion

Key Concepts

Focuses on the population proportion $p$ . The number of successes $X$ in $n$ trials, each with success probability $p$ , follows a binomial distribution.
Sample Proportion: $\bar{p} = X/n$ is an unbiased estimator of the population proportion, since $E(\bar{p}) = p$ ; its standard error is $se(\bar{p}) = \sqrt{p(1 - p)/n}$ .

Examples Illustrating Sampling Distribution

A study reveals that 55% of British firms experienced cyber-attacks last year; computing the expected value and standard error of the sample proportion for sample sizes of 100 and 200 shows that the standard error decreases as the sample size grows.
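
The cyber-attack figures can be reproduced with the standard formulas: the expected value of the sample proportion is p, and its standard error is the square root of p(1 - p)/n. The function name is hypothetical.

```python
import math

def sample_proportion_distribution(p, n):
    """Return (expected value, standard error) of the sample proportion for sample size n."""
    return p, math.sqrt(p * (1 - p) / n)

P = 0.55  # proportion of firms that experienced a cyber-attack, from the example above

results = {n: sample_proportion_distribution(P, n) for n in (100, 200)}
for n, (expected, se) in results.items():
    print(f"n={n}: E(p-bar)={expected:.2f}, se(p-bar)={se:.4f}")
# n=100: E(p-bar)=0.55, se(p-bar)=0.0497
# n=200: E(p-bar)=0.55, se(p-bar)=0.0352
```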

Central Limit Theorem for Sample Proportion

States that for any population proportion $p$ , the sampling distribution of $\bar{p}$ is approximately normal when the sample size $n$ is sufficiently large:
- As a rule of thumb, the approximation is justified when $np \geq 5$ and $n(1 - p) \geq 5$ .
- The farther $p$ is from 0.50, the larger the sample size needed for the approximation.
Examples return to Camila's promotion, calculating the probabilities for the proportions of women and teenage girls among the surveyed customers.
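
A sketch of the two proportion questions from the introductory case, using the normal approximation. It assumes that the recorded proportions (43% women, 21% teenage girls) are the population proportions for the 50 surveyed customers. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_approx_ok(p, n):
    """Rule of thumb for the normal approximation: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def prob_sample_proportion_at_least(p_bar, p, n):
    """P(sample proportion >= p_bar) under the normal approximation."""
    if not normal_approx_ok(p, n):
        raise ValueError("np and n(1 - p) should both be at least 5")
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return 1 - NormalDist(p, se).cdf(p_bar)

N = 50  # customers surveyed after the promotion

# Population proportions taken from the introductory case (an assumption).
prob_women = prob_sample_proportion_at_least(0.46, p=0.43, n=N)
prob_teen_girls = prob_sample_proportion_at_least(0.34, p=0.21, n=N)

print(f"P(p-bar >= 0.46 | p = 0.43) = {prob_women:.4f}")
print(f"P(p-bar >= 0.34 | p = 0.21) = {prob_teen_girls:.4f}")
```

Under these assumptions the first probability is roughly 0.33 and the second roughly 0.012. A sample with 34% or more teenage girls would therefore be quite unlikely if the true proportion were still 21%.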

7.4 Statistical Quality Control

Importance in Business

Effective firms prioritize the quality of their products and services; statistical quality control uses statistical techniques to build and maintain the ability to produce them.

Methods of Sampling

Acceptance Sampling: A subset of products is inspected once production is complete; defective items are discarded or repaired, which is costly, and some defective products may go undetected.
Detection Approach: The production process is monitored as it runs to check that it conforms to specifications, so that it can be adjusted promptly before defects become widespread.

Control Chart Usage

A control chart visualizes sample estimates, indicating process stability:
- Sample estimates that fall between the upper and lower control limits indicate an in-control process.
- Points outside the limits, or a trend or shift within them, prompt investigation.
Different types of charts:
- Numerical Variable Charts: Monitor central tendency ($\bar{x}$ chart) and variability (R-chart, s-chart).
- Categorical Variable Charts: Monitor the proportion (p-chart) or count (c-chart) of defects.

Control Limits in Statistical Quality Control

Control limits are defined by:
- Upper Control Limit (UCL): $\text{Expected Value} + (3 \times \text{Standard Error})$
- Lower Control Limit (LCL): $\text{Expected Value} - (3 \times \text{Standard Error})$
The validity of these limits depends on the sampling distribution being approximately normal.
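
The control limits can be sketched for an $\bar{x}$ chart and a p-chart as follows. The function names and the process figures are hypothetical. Clamping the p-chart limits to the range 0 to 1 is a common convention assumed here, since a proportion cannot fall outside that range.

```python
import math

def control_limits(expected_value, standard_error, floor=None, ceiling=None):
    """Return (LCL, UCL) = expected value -/+ 3 standard errors, optionally clamped."""
    lcl = expected_value - 3 * standard_error
    ucl = expected_value + 3 * standard_error
    if floor is not None:
        lcl = max(lcl, floor)
    if ceiling is not None:
        ucl = min(ucl, ceiling)
    return lcl, ucl

def xbar_chart_limits(mu, sigma, n):
    """Limits for sample means, with standard error sigma / sqrt(n)."""
    return control_limits(mu, sigma / math.sqrt(n))

def p_chart_limits(p, n):
    """Limits for sample proportions, clamped to [0, 1] (assumed convention)."""
    return control_limits(p, math.sqrt(p * (1 - p) / n), floor=0.0, ceiling=1.0)

def out_of_control(estimates, lcl, ucl):
    """Indices of sample estimates that fall outside the control limits."""
    return [i for i, value in enumerate(estimates) if not lcl <= value <= ucl]

# Hypothetical process: mean 100, standard deviation 6, samples of size 9.
lcl, ucl = xbar_chart_limits(mu=100, sigma=6, n=9)
sample_means = [99.1, 101.4, 106.8, 100.2, 93.5]
flagged = out_of_control(sample_means, lcl, ucl)
print(lcl, ucl, flagged)  # 94.0 106.0 [2, 4]

# Hypothetical defect rate of 5% with samples of size 100.
p_lcl, p_ucl = p_chart_limits(p=0.05, n=100)
print(f"p-chart limits: {p_lcl:.4f} to {p_ucl:.4f}")
```

Flagging points outside the limits covers only part of the check. Trends or shifts that stay within the limits need separate rules, which this sketch does not implement.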

Continuous Monitoring of Production Processes

An example on milk jug production illustrates ongoing monitoring, with sample estimates compared against the control limits to confirm the state of the process.
An analysis of a firm producing 4-liter cans likewise uses control limits to assess whether the process adheres to quality standards and whether the flaws identified are acceptable.