Essentials of Statistics for Business & Economics, Chapter 7 – Sampling and Sampling Distributions
Introduction
Simple random sampling is key for selecting samples from:
Finite populations
Infinite populations (a continuous process)
Using data from samples allows for estimates of:
A population mean from a sample mean
A population proportion from a sample proportion
Estimation errors are anticipated; this chapter determines how large these errors may be.
Introduction of sampling distributions facilitates comparison of sample estimates to population parameters.
The last sections focus on alternative sampling methods and the implications of large samples on sampling distributions.
Introduction to Sampling
Definitions:
Element: Entity from which data is collected.
Population: All elements of interest.
Sample: Subset of the population.
Sampling: Data collection method to analyze a population question.
Terms:
Sampled population: Source of the sample.
Frame: List from which sample elements will be drawn.
Results from the sample provide estimates of population characteristics.
Proper sampling methods ensure accurate estimates of population values.
7.1 The Electronics Associates Sampling Problem
Parameters: Numerical characteristics of a population.
Using descriptive statistics for the EAI dataset:
Population Mean: \mu = \text{mean of annual salaries} = 71,800
Population Standard Deviation: \sigma = 4,000
Total Managers: N = 2,500
Proportion of managers who completed the training program: p = \frac{1500}{2500} = 0.60
Drawing a sample of 30 managers reduces the time and cost compared to collecting data from all 2,500.
7.2 Sampling from a Finite Population
Simple Random Sample: Each sample of size n has the same selection probability from a finite population of size N.
With Replacement: A selected element is returned to the population and may be selected again.
Without Replacement: Each element can appear in the sample at most once; this is the most common method.
Procedure:
Assign each of the 2,500 managers a random number generated from a uniform distribution on (0, 1).
Select the 30 managers corresponding to the 30 smallest random numbers.
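The two-step procedure above can be sketched in a few lines of Python; the manager IDs here are stand-ins for the actual frame:

```python
import random

# Sketch of simple random sampling without replacement: assign each of the
# N managers a uniform(0, 1) random number, then take the n managers with
# the n smallest random numbers.
random.seed(1)          # fixed seed so the sketch is reproducible
N, n = 2500, 30

ranks = [(random.random(), manager_id) for manager_id in range(1, N + 1)]
ranks.sort()                              # ascending by random number
sample_ids = [mid for _, mid in ranks[:n]]

print(len(sample_ids))                    # 30 distinct managers selected
```

Because each manager receives an independent uniform draw, every subset of 30 managers is equally likely to own the 30 smallest numbers, which is exactly the definition of a simple random sample.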
7.2 Sampling from an Infinite Population
When no complete list of the population exists, a frame cannot be created.
Even in the absence of a frame, samples must be selected randomly.
Criteria for selection: To ensure a valid and representative sample from an infinite population, the following criteria must be strictly adhered to:
Each selected element must belong to the same population to avoid a heterogeneous sample and ensure accurate inferences.
Each element must be selected independently, meaning one selection does not influence another; this is crucial for minimizing selection bias and for valid statistical analysis.
Example: Production line parts or banking transactions representing infinite populations are classic examples. In a manufacturing setting, new parts are continuously produced, making it impossible to create a finite list of all potential items. Similarly, a continuous stream of financial transactions forms an infinite population. Sampling in these scenarios involves selecting items or events as they occur, ensuring each selection meets the independence criterion and belongs to the ongoing process.
7.3 A Simple Random Sample for EAI
Point Estimator: Sample statistic estimating a population parameter.
Example Table for 30 EAI Managers:
Salaries and Training Status (identifying 30 managers with random selection results highlighted).
7.3 Point Estimation
Sample statistics include:
Sample Mean \bar{X} as the point estimator of the population mean \mu
Calculated as \bar{X} = \frac{\sum x_i}{n} = \frac{\sum x_i}{30} = 71,814
Sample Standard Deviation s as the estimator of population standard deviation \sigma
Calculated using s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n-1}}
s = 3,348
Sample Proportion \bar{p} as the point estimator of the population proportion p
Calculated as \bar{p} = \frac{19}{30} \approx 0.633
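The three point estimators can be computed directly from sample data. The salaries and training indicators below are hypothetical stand-ins, not the textbook's actual 30 EAI observations:

```python
import math

# Hypothetical sample data (illustrative only, not the EAI data set).
salaries = [68000, 71500, 74200, 69800, 73100, 70400]
trained  = [1, 0, 1, 1, 0, 1]        # 1 = completed the training program

n = len(salaries)
x_bar = sum(salaries) / n                               # estimates mu
s = math.sqrt(sum((x - x_bar) ** 2 for x in salaries)
              / (n - 1))                                # estimates sigma
p_bar = sum(trained) / n                                # estimates p

print(round(x_bar, 2), round(s, 2), round(p_bar, 3))
```

Note the n - 1 divisor in the sample standard deviation, which matches the formula in the text and (as discussed later in the chapter) makes s^2 an unbiased estimator of \sigma^2.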
7.3 Practical Advice
Statistical Inference: Using a sample statistic as a point estimator for population parameters.
The target population: Population for inference.
The sampled population: Actual population from where the sample is taken.
Good practice ensures the sampled population closely matches the target population, so that the sample is representative.
7.4 Introduction to Sampling Distributions
Repeatedly sampling from the EAI population yields a different value of \bar{X} for each sample.
The probability distribution of all possible values of \bar{X} is called the sampling distribution of \bar{X}.
The distribution reflects variations in estimates across multiple samples.
7.4 Approximation of a Sampling Distribution
The random variable \bar{X} exhibits various values due to different samples leading to a sampling distribution.
Histogram Representation: A histogram of the sample means from many samples approximates the sampling distribution, revealing characteristics such as bell-shaped symmetry around the mean.
7.5 Sampling Distribution of \bar{X}
Defined by:
E(\bar{X}) = \mu: \bar{X} is an unbiased point estimator because its expected value equals the population parameter.
\sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}: standard error of the mean.
When the population is large relative to the sample (rule of thumb: n/N \leq 0.05), the finite population correction factor \sqrt{(N-n)/(N-1)} is negligible and \sigma_{\bar{X}} = \sigma/\sqrt{n}.
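A quick computation with the EAI figures (N = 2,500, n = 30, \sigma = 4,000) shows how little the finite population correction matters here:

```python
import math

# Standard error of the mean with and without the finite population
# correction (fpc), using the EAI figures: N = 2500, n = 30, sigma = 4000.
N, n, sigma = 2500, 30, 4000.0

se_infinite = sigma / math.sqrt(n)            # ignores the fpc
fpc = math.sqrt((N - n) / (N - 1))            # correction factor
se_finite = fpc * se_infinite

print(round(se_infinite, 1), round(fpc, 4), round(se_finite, 1))
# fpc is about 0.994, so the two standard errors differ by well under 1%
```

Since n/N = 0.012 is far below the 0.05 rule of thumb, dropping the correction changes the standard error only from roughly 726 to 730.3.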
7.5 Form of the Sampling Distribution of \bar{X}
For normally distributed populations, \bar{X} is normally distributed regardless of sample size.
For non-normally distributed populations, the Central Limit Theorem applies: as the sample size increases (usually n \geq 30), the sampling distribution of \bar{X} approaches a normal distribution.
Highly skewed populations may require larger samples (n \geq 50) for the sampling distribution to become approximately normal.
7.5 Sampling from Different Population Distributions
Population Distributions Considered:
Uniform Distribution
Rabbit-eared Distribution
Exponential Distribution (Right-skewed)
Analysis of sampling distribution shape illustrates convergence toward normality as sample size increases.
7.5 Illustration of the Central Limit Theorem
For increasing sample sizes:
[n=2]: Sampling Distribution differs from Population.
[n=5]: The sampling distributions for populations 1 and 2 are close to bell-shaped, while that of the skewed population 3 is still noticeably skewed.
[n=30]: All distributions approximate normal.
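A small Monte Carlo experiment makes the convergence concrete. This sketch uses a right-skewed exponential population (mean 1) and tracks how the spread of the sample means shrinks as n grows; the replication count and seed are arbitrary choices:

```python
import random
import statistics

# Monte Carlo sketch of the Central Limit Theorem for a right-skewed
# (exponential) population with mean 1: as n grows, the sampling
# distribution of the sample mean tightens around the population mean.
random.seed(42)
REPS = 5000

def mean_spread(n):
    """Empirical standard error of the sample mean for sample size n."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(REPS)]
    return statistics.stdev(means)

spreads = {n: mean_spread(n) for n in (2, 5, 30)}
print(spreads)   # standard error shrinks roughly like 1 / sqrt(n)
```

For n = 2 the empirical standard error is near 1/\sqrt{2} \approx 0.71, and for n = 30 it is near 1/\sqrt{30} \approx 0.18, matching the \sigma/\sqrt{n} formula; a histogram of the n = 30 means would also look approximately normal despite the skewed population.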
7.5 Sampling Distribution of \bar{X} for the EAI Problem
Parameters of Sampling Distribution:
Expected Value: E(\bar{X}) = \mu = 71,800
Standard Error: \sigma_{\bar{X}} = \frac{4000}{\sqrt{30}} \approx 730.3
Since n/N = 30/2{,}500 = 0.012 \leq 0.05, no finite population correction is needed.
7.5 Application of the Sampling Distribution of \bar{X}
Seeking probability that \bar{X} is within $500 of \mu:
With z = \pm\frac{500}{730.3} \approx \pm 0.68, the probability that \bar{X} falls between 71,300 and 72,300 is approximately 50%.
7.5 Relationship Between Sample Size and Sampling Distribution of \bar{X}
Increasing sample size to n = 100 narrows the standard error:
s_{\bar{X}} = \frac{4000}{\sqrt{100}} = 400
Yielding a probability of approximately 79% (z = \frac{500}{400} = 1.25) that \bar{X} falls within $500 of \mu.
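Both probabilities follow directly from the normal sampling distribution of \bar{X}; a short sketch using the standard library's normal CDF:

```python
from statistics import NormalDist

# P(|x_bar - mu| <= margin) under the normal sampling distribution of the
# sample mean, with standard error sigma / sqrt(n).
def prob_within(margin, sigma, n):
    se = sigma / n ** 0.5
    z = margin / se
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

p30 = prob_within(500, 4000, 30)      # n = 30  -> roughly 0.50
p100 = prob_within(500, 4000, 100)    # n = 100 -> roughly 0.79
print(round(p30, 4), round(p100, 4))
```

Quadrupling the sample size halves the standard error (from about 730 to 400), which is what lifts the probability of being within $500 of \mu from about 50% to about 79%.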
7.6 Sampling Distribution of p
The probability distribution of all possible values of the sample proportion \bar{p} is defined by:
E(\bar{p}) = p: \bar{p} is an unbiased estimator of the population proportion p.
\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\,\sqrt{\frac{p(1-p)}{n}}: standard error of the proportion.
The finite population correction is typically negligible for large populations (n/N \leq 0.05), leaving \sigma_{\bar{p}} = \sqrt{p(1-p)/n}.
7.6 Form of the Sampling Distribution of p
The sampling distribution of \bar{p} can be approximated by a normal distribution when:
np \geq 5 and n(1-p) \geq 5.
Whenever both conditions are satisfied, the sampling distribution of \bar{p} is approximately normal.
7.6 Practical Value of the Sampling Distribution of p
Investigating the probability that \bar{p} falls within 0.05 of the population proportion p = 0.6:
Expected value E(\bar{p}) = 0.6 and standard error \sigma_{\bar{p}} = \sqrt{\frac{0.6(0.4)}{30}} = 0.0894.
Mathematical evaluation showcases a probability around 42%.
7.6 Relationship Between Sample Size and Sampling Distribution of p
Elevating sample size to n = 100 narrows distribution:
New standard error: \sigma_{\bar{p}} = \sqrt{\frac{0.6(1-0.6)}{100}} = 0.049.
Yields a probability around 69% for being within 0.05 of p.
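The same normal-CDF calculation works for proportions, with standard error \sqrt{p(1-p)/n}; a sketch covering both sample sizes:

```python
from statistics import NormalDist

# P(|p_bar - p| <= margin) under the normal approximation to the sampling
# distribution of the sample proportion (valid when np >= 5, n(1-p) >= 5).
def prob_within(margin, p, n):
    se = (p * (1 - p) / n) ** 0.5
    z = margin / se
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

p30 = prob_within(0.05, 0.6, 30)      # n = 30  -> roughly 0.42
p100 = prob_within(0.05, 0.6, 100)    # n = 100 -> roughly 0.69
print(round(p30, 4), round(p100, 4))
```

For n = 30 both np = 18 and n(1-p) = 12 exceed 5, so the normal approximation underlying this calculation is justified.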
7.7 Properties of Point Estimators
Essential characteristics for effective point estimators include:
Unbiased: E(\hat{\theta}) = \theta where \hat{\theta} is the estimator and \theta is the parameter.
Efficiency: Smaller standard error indicates greater efficiency.
Consistency: Estimator accuracy improves with larger sample sizes.
7.7 Unbiased Point Estimator
A point estimator is considered unbiased if:
E(\hat{\theta}) = \theta.
Both the sample mean \bar{X} and the sample variance s^2 can be shown to be unbiased estimators of \mu and \sigma^2, respectively.
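Unbiasedness can be illustrated by simulation: averaged over many repeated samples, the sample mean and the (n - 1)-divisor sample variance land near the true parameters. The normal population, seed, and replication count below are arbitrary choices for the sketch:

```python
import random
import statistics

# Monte Carlo sketch of unbiasedness: draw many samples from a normal
# population with mu = 50, sigma = 10, and average the resulting sample
# means and sample variances.
random.seed(7)
MU, SIGMA, N_SAMP, REPS = 50.0, 10.0, 20, 20000

means, variances = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N_SAMP)]
    means.append(statistics.fmean(sample))
    variances.append(statistics.variance(sample))   # divides by n - 1

# Averages hover near mu = 50 and sigma^2 = 100, respectively.
print(round(statistics.fmean(means), 2),
      round(statistics.fmean(variances), 1))
```

Replacing `statistics.variance` with a divisor of n instead of n - 1 would make the average variance systematically undershoot \sigma^2, which is precisely the bias the n - 1 divisor removes.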
7.7 Efficient and Consistent Point Estimator
Efficiency: A point estimator with smaller standard error is more efficient.
Consistency: As samples grow, point estimators tend to converge to population parameters.
7.8 Stratified Random Sampling
Stratification: Divides population into strata (groups) with elements as homogeneous as possible.
Methodology: Collecting samples from each stratum; weighted averages combine results into a population estimate.
Advantage: More precise than simple random sampling when strata are correctly identified.
Disadvantage: Requires that the stratum membership of every element in the frame be known in advance.
Application Examples: Grouping by department, location, age, or industry type.
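The weighted-average step can be made concrete with a small sketch; the stratum names, sizes, and stratum sample means below are made-up illustrative numbers:

```python
# Hypothetical sketch of stratified estimation: combine stratum sample
# means into one population estimate using weights N_h / N.
strata = {
    # stratum: (stratum size N_h, stratum sample mean)
    "engineering": (1200, 74000.0),
    "sales":       (800,  69000.0),
    "operations":  (500,  71000.0),
}

N = sum(size for size, _ in strata.values())
x_bar_st = sum((size / N) * mean for size, mean in strata.values())
print(round(x_bar_st, 2))   # population-level weighted estimate
```

Each stratum contributes in proportion to its share of the population, which is what makes the combined estimate representative even when strata differ sharply from one another.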
7.8 Cluster Sampling
Clusters: Divide the population into groups (clusters), each ideally a small-scale, internally heterogeneous version of the population; a simple random sample of clusters is taken, and elements within the selected clusters form the sample.
Advantage: Can lower costs via collective proximity.
Disadvantage: Higher total sample sizes may be needed.
Common Example: Area sampling defined by geographical clusters.
7.8 Systematic Sampling
Method: To obtain a sample of size n from a population of N, choose one element at random from the first k = N/n elements of the frame.
Then select every k-th element thereafter until n elements are obtained.
Advantage: Easier identification of samples than random sampling.
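The random-start, fixed-interval procedure can be sketched as follows (IDs are stand-ins for positions in a frame):

```python
import random

# Sketch of systematic sampling: pick a random start among the first
# k = N // n frame positions, then take every k-th element after it.
random.seed(3)
N, n = 2500, 30
k = N // n                      # sampling interval (here 83)

start = random.randrange(k)     # random position in the first k elements
sample_ids = list(range(start, N, k))[:n]

print(len(sample_ids), sample_ids[:3])
```

The only random choice is the starting position; after that the sample is fully determined, which is why systematic samples are easier to identify than simple random samples.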
7.8 Nonprobability Sampling Methods
Convenience Sampling: Nonrandom, based on ease of access. Simple but may not represent the population.
Example: Professor using student volunteers.
Judgment Sampling: Elements selected by a person knowledgeable about the population; the sample's quality depends on that person's judgment.
Example: Media sampling legislators as reflective of broader opinion.
7.9 Sampling Error
Sampling Error: The difference between a sample estimate and the corresponding population parameter; it arises naturally from the randomness of sample selection.
Larger samples help mitigate sampling errors due to tighter standard errors.
7.9 Nonsampling Error
Nonsampling Errors: Deviations from population due to factors excluding random sampling:
Coverage Error: The sampled population (frame) does not match the target population of interest.
Nonresponse Error: Survey discrepancies due to unresponsive segments.
Measurement Error: Issues in collecting accurate population characteristics.
Interviewer Error: Bias introduced via survey methodologies.
Processing Error: Errors due to data recording and preparations.
7.9 Big Data and Sampling Error
Big Data: Datasets too large or complex to be handled by standard data-processing techniques.
Four V’s characterize big data:
Volume: Amount of data.
Variety: Diversity of data types.
Veracity: Reliability of data gathered.
Velocity: Speed of data generation.
Big data sets may be tall (many observations) or wide (many variables), complicating traditional statistical inference.
Summary
Chapter covers concepts on sampling and distributions.
Emphasizes both finite and infinite sampling techniques.
Introduces point estimation and acknowledges variability in point estimators as random variables.
Clarifies properties of unbiased estimators and conditions necessary for normal distributions.
Discusses sampling methods, errors, and implications of big data concerning sampling error.