Essentials of Statistics for Business & Economics, Chapter 7 – Sampling and Sampling Distributions
Introduction
Simple random sampling is key for selecting samples from:
Finite populations
Infinite populations (a continuous process)
Using data from samples allows for estimates of:
A population mean from a sample mean
A population proportion from a sample proportion
Estimation errors are anticipated; this chapter determines how large these errors may be.
Introduction of sampling distributions facilitates comparison of sample estimates to population parameters.
The last sections focus on alternative sampling methods and the implications of large samples on sampling distributions.
Introduction to Sampling
Definitions:
Element: Entity from which data is collected.
Population: All elements of interest.
Sample: Subset of the population.
Sampling: Data collection method to analyze a population question.
Terms:
Sampled population: Source of the sample.
Frame: List from which sample elements will be drawn.
Results from the sample provide estimates of population characteristics.
Proper sampling methods ensure accurate estimates of population values.
7.1 The Electronics Associates Sampling Problem
Parameters: Numerical characteristics of a population.
Using descriptive statistics for the EAI dataset:
Population Mean: \mu = \text{mean of annual salaries} = 71,800
Population Standard Deviation: \sigma = 4,000
Total Managers: N = 2,500
Proportion of managers who completed the training program: p = \frac{1500}{2500} = 0.60
Drawing a sample of 30 managers reduces the time and cost compared to collecting data from all 2,500.
7.2 Sampling from a Finite Population
Simple Random Sample: Each sample of size n has the same selection probability from a finite population of size N.
With Replacement: A selected element is returned to the population and may be selected again.
Without Replacement: Each element can appear in the sample at most once; this is the most common method.
Procedure:
Assign each of the 2,500 managers a random number generated from a uniform distribution on (0, 1).
Select the 30 managers corresponding to the 30 smallest random numbers.
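The two-step procedure above can be sketched in a few lines of Python; the manager IDs here are stand-ins for the actual frame:

```python
import random

# Sketch of simple random sampling without replacement: assign each of the
# N managers a uniform(0, 1) random number, then take the n managers with
# the n smallest random numbers.
random.seed(1)          # fixed seed so the sketch is reproducible
N, n = 2500, 30

ranks = [(random.random(), manager_id) for manager_id in range(1, N + 1)]
ranks.sort()                              # ascending by random number
sample_ids = [mid for _, mid in ranks[:n]]

print(len(sample_ids))                    # 30 distinct managers selected
```

Because each manager receives an independent uniform draw, every subset of 30 managers is equally likely to own the 30 smallest numbers, which is exactly the definition of a simple random sample.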
7.2 Sampling from an Infinite Population
When no complete list of the population exists, a frame cannot be created.
Even in the absence of a frame, samples must be selected randomly.
Criteria for selection: To ensure a valid and representative sample from an infinite population, the following criteria must be strictly adhered to:
Each selected element must belong to the same population to avoid a heterogeneous sample and ensure accurate inferences.
Each element must be selected independently, meaning one selection does not influence another; this is crucial for minimizing selection bias and for valid statistical analysis.
Example: Production line parts or banking transactions representing infinite populations are classic examples. In a manufacturing setting, new parts are continuously produced, making it impossible to create a finite list of all potential items. Similarly, a continuous stream of financial transactions forms an infinite population. Sampling in these scenarios involves selecting items or events as they occur, ensuring each selection meets the independence criterion and belongs to the ongoing process.
7.3 A Simple Random Sample for EAI
Point Estimator: Sample statistic estimating a population parameter.
Example Table for 30 EAI Managers:
Salaries and Training Status (identifying 30 managers with random selection results highlighted).
7.3 Point Estimation
Sample statistics include:
Sample Mean \bar{X} as the point estimator of the population mean \mu
Calculated as \bar{X} = \frac{\sum x_i}{n} = \frac{\sum x_i}{30} = 71,814
Sample Standard Deviation s as the estimator of population standard deviation \sigma
Calculated using s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n-1}}
s = 3,348
Sample Proportion \bar{p} as the point estimator of the population proportion p
Calculated as \bar{p} = \frac{19}{30} \approx 0.633
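The three point estimators can be computed directly from sample data. The salaries and training indicators below are hypothetical stand-ins, not the textbook's actual 30 EAI observations:

```python
import math

# Hypothetical sample data (illustrative only, not the EAI data set).
salaries = [68000, 71500, 74200, 69800, 73100, 70400]
trained  = [1, 0, 1, 1, 0, 1]        # 1 = completed the training program

n = len(salaries)
x_bar = sum(salaries) / n                               # estimates mu
s = math.sqrt(sum((x - x_bar) ** 2 for x in salaries)
              / (n - 1))                                # estimates sigma
p_bar = sum(trained) / n                                # estimates p

print(round(x_bar, 2), round(s, 2), round(p_bar, 3))
```

Note the n - 1 divisor in the sample standard deviation, which matches the formula in the text and (as discussed later in the chapter) makes s^2 an unbiased estimator of \sigma^2.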
7.3 Practical Advice
Statistical Inference: Using a sample statistic as a point estimator for population parameters.
The target population: Population for inference.
The sampled population: Actual population from where the sample is taken.
Good practice ensures the sampled population closely matches the target population, so that the sample is representative.
7.4 Introduction to Sampling Distributions
Repeatedly sampling from the EAI population yields a different value of \bar{X} for each sample.
The probability distribution of all possible values of \bar{X} is called the sampling distribution of \bar{X}.
The distribution reflects variations in estimates across multiple samples.
7.4 Approximation of a Sampling Distribution
The random variable \bar{X} exhibits various values due to different samples leading to a sampling distribution.
Histogram Representation: A histogram of the sample means from many samples approximates the sampling distribution, revealing characteristics such as bell-shaped symmetry around the mean.
7.5 Sampling Distribution of \bar{X}
Defined by:
E(\bar{X}) = \mu: \bar{X} is an unbiased point estimator because its expected value equals the population parameter.
\sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}: standard error of the mean.
When the population is large relative to the sample (rule of thumb: n/N \leq 0.05), the finite population correction factor \sqrt{(N-n)/(N-1)} is negligible and \sigma_{\bar{X}} = \sigma/\sqrt{n}.
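A quick computation with the EAI figures (N = 2,500, n = 30, \sigma = 4,000) shows how little the finite population correction matters here:

```python
import math

# Standard error of the mean with and without the finite population
# correction (fpc), using the EAI figures: N = 2500, n = 30, sigma = 4000.
N, n, sigma = 2500, 30, 4000.0

se_infinite = sigma / math.sqrt(n)            # ignores the fpc
fpc = math.sqrt((N - n) / (N - 1))            # correction factor
se_finite = fpc * se_infinite

print(round(se_infinite, 1), round(fpc, 4), round(se_finite, 1))
# fpc is about 0.994, so the two standard errors differ by well under 1%
```

Since n/N = 0.012 is far below the 0.05 rule of thumb, dropping the correction changes the standard error only from roughly 726 to 730.3.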
7.5 Form of the Sampling Distribution of \bar{X}
For normally distributed populations, \bar{X} is normally distributed regardless of sample size.
For non-normally distributed populations, the Central Limit Theorem applies: as the sample size increases (usually n \geq 30), the sampling distribution of \bar{X} approaches a normal distribution.
Highly skewed populations may require larger samples (n \geq 50) for the sampling distribution to become approximately normal.
7.5 Sampling from Different Population Distributions
Population Distributions Considered:
Uniform Distribution
Rabbit-eared Distribution
Exponential Distribution (Right-skewed)
Analysis of sampling distribution shape illustrates convergence toward normality as sample size increases.
7.5 Illustration of the Central Limit Theorem
For increasing sample sizes:
[n=2]: Sampling Distribution differs from Population.
[n=5]: The sampling distributions for populations 1 and 2 are close to bell-shaped, while that of the skewed population 3 is still noticeably skewed.
[n=30]: All distributions approximate normal.
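A small Monte Carlo experiment makes the convergence concrete. This sketch uses a right-skewed exponential population (mean 1) and tracks how the spread of the sample means shrinks as n grows; the replication count and seed are arbitrary choices:

```python
import random
import statistics

# Monte Carlo sketch of the Central Limit Theorem for a right-skewed
# (exponential) population with mean 1: as n grows, the sampling
# distribution of the sample mean tightens around the population mean.
random.seed(42)
REPS = 5000

def mean_spread(n):
    """Empirical standard error of the sample mean for sample size n."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(REPS)]
    return statistics.stdev(means)

spreads = {n: mean_spread(n) for n in (2, 5, 30)}
print(spreads)   # standard error shrinks roughly like 1 / sqrt(n)
```

For n = 2 the empirical standard error is near 1/\sqrt{2} \approx 0.71, and for n = 30 it is near 1/\sqrt{30} \approx 0.18, matching the \sigma/\sqrt{n} formula; a histogram of the n = 30 means would also look approximately normal despite the skewed population.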
7.5 Sampling Distribution of \bar{X} for the EAI Problem
Parameters of Sampling Distribution:
Expected Value: E(\bar{X}) = \mu = 71,800
Standard Error: \sigma_{\bar{X}} = \frac{4000}{\sqrt{30}} \approx 730.3
Since n/N = 30/2{,}500 = 0.012 \leq 0.05, no finite population correction is needed.
7.5 Application of the Sampling Distribution of \bar{X}
Seeking probability that \bar{X} is within $500 of \mu:
With z = \pm\frac{500}{730.3} \approx \pm 0.68, the probability that \bar{X} falls between 71,300 and 72,300 is approximately 50%.
7.5 Relationship Between Sample Size and Sampling Distribution of \bar{X}
Increasing sample size to n = 100 narrows the standard error:
s_{\bar{X}} = \frac{4000}{\sqrt{100}} = 400
Yielding a probability of approximately 79% (z = \frac{500}{400} = 1.25) that \bar{X} falls within $500 of \mu.
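Both probabilities follow directly from the normal sampling distribution of \bar{X}; a short sketch using the standard library's normal CDF:

```python
from statistics import NormalDist

# P(|x_bar - mu| <= margin) under the normal sampling distribution of the
# sample mean, with standard error sigma / sqrt(n).
def prob_within(margin, sigma, n):
    se = sigma / n ** 0.5
    z = margin / se
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

p30 = prob_within(500, 4000, 30)      # n = 30  -> roughly 0.50
p100 = prob_within(500, 4000, 100)    # n = 100 -> roughly 0.79
print(round(p30, 4), round(p100, 4))
```

Quadrupling the sample size halves the standard error (from about 730 to 400), which is what lifts the probability of being within $500 of \mu from about 50% to about 79%.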
7.6 Sampling Distribution of p
The probability distribution of all possible values of the sample proportion \bar{p} is defined by:
E(\bar{p}) = p: \bar{p} is an unbiased estimator of the population proportion p.
\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\,\sqrt{\frac{p(1-p)}{n}}: standard error of the proportion.
The finite population correction is typically negligible for large populations (n/N \leq 0.05), leaving \sigma_{\bar{p}} = \sqrt{p(1-p)/n}.
7.6 Form of the Sampling Distribution of p
The sampling distribution of \bar{p} can be approximated by a normal distribution when:
np \geq 5 and n(1-p) \geq 5.
Whenever both conditions are satisfied, the sampling distribution of \bar{p} is approximately normal.
7.6 Practical Value of the Sampling Distribution of p
Investigating the probability that \bar{p} falls within 0.05 of the population proportion p = 0.6:
Expected value E(\bar{p}) = 0.6 and standard error \sigma_{\bar{p}} = \sqrt{\frac{0.6(0.4)}{30}} = 0.0894.
Mathematical evaluation showcases a probability around 42%.
7.6 Relationship Between Sample Size and Sampling Distribution of p
Elevating sample size to n = 100 narrows distribution:
New standard error: \sigma_{\bar{p}} = \sqrt{\frac{0.6(1-0.6)}{100}} = 0.049.
Yields a probability around 69% for being within 0.05 of p.
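The same normal-CDF calculation works for proportions, with standard error \sqrt{p(1-p)/n}; a sketch covering both sample sizes:

```python
from statistics import NormalDist

# P(|p_bar - p| <= margin) under the normal approximation to the sampling
# distribution of the sample proportion (valid when np >= 5, n(1-p) >= 5).
def prob_within(margin, p, n):
    se = (p * (1 - p) / n) ** 0.5
    z = margin / se
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

p30 = prob_within(0.05, 0.6, 30)      # n = 30  -> roughly 0.42
p100 = prob_within(0.05, 0.6, 100)    # n = 100 -> roughly 0.69
print(round(p30, 4), round(p100, 4))
```

For n = 30 both np = 18 and n(1-p) = 12 exceed 5, so the normal approximation underlying this calculation is justified.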
7.7 Properties of Point Estimators
Essential characteristics for effective point estimators include:
Unbiased: E(\hat{\theta}) = \theta where \hat{\theta} is the estimator and \theta is the parameter.
Efficiency: Smaller standard error indicates greater efficiency.
Consistency: Estimator accuracy improves with larger sample sizes.
7.7 Unbiased Point Estimator
A point estimator is considered unbiased if:
E(\hat{\theta}) = \theta.
Both the sample mean \bar{X} and the sample variance s^2 can be shown to be unbiased estimators of \mu and \sigma^2, respectively.
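Unbiasedness can be illustrated by simulation: averaged over many repeated samples, the sample mean and the (n - 1)-divisor sample variance land near the true parameters. The normal population, seed, and replication count below are arbitrary choices for the sketch:

```python
import random
import statistics

# Monte Carlo sketch of unbiasedness: draw many samples from a normal
# population with mu = 50, sigma = 10, and average the resulting sample
# means and sample variances.
random.seed(7)
MU, SIGMA, N_SAMP, REPS = 50.0, 10.0, 20, 20000

means, variances = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N_SAMP)]
    means.append(statistics.fmean(sample))
    variances.append(statistics.variance(sample))   # divides by n - 1

# Averages hover near mu = 50 and sigma^2 = 100, respectively.
print(round(statistics.fmean(means), 2),
      round(statistics.fmean(variances), 1))
```

Replacing `statistics.variance` with a divisor of n instead of n - 1 would make the average variance systematically undershoot \sigma^2, which is precisely the bias the n - 1 divisor removes.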
7.7 Efficient and Consistent Point Estimator
Efficiency: A point estimator with smaller standard error is more efficient.
Consistency: As samples grow, point estimators tend to converge to population parameters.
7.8 Stratified Random Sampling
Stratification: Divides population into strata (groups) with elements as homogeneous as possible.
Methodology: Collecting samples from each stratum; weighted averages combine results into a population estimate.
Advantage: More precise than simple random sampling when strata are correctly identified.
Disadvantage: Requires that the stratum membership of every element in the frame be known in advance.
Application Examples: Grouping by department, location, age, or industry type.
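The weighted-average step can be made concrete with a small sketch; the stratum names, sizes, and stratum sample means below are made-up illustrative numbers:

```python
# Hypothetical sketch of stratified estimation: combine stratum sample
# means into one population estimate using weights N_h / N.
strata = {
    # stratum: (stratum size N_h, stratum sample mean)
    "engineering": (1200, 74000.0),
    "sales":       (800,  69000.0),
    "operations":  (500,  71000.0),
}

N = sum(size for size, _ in strata.values())
x_bar_st = sum((size / N) * mean for size, mean in strata.values())
print(round(x_bar_st, 2))   # population-level weighted estimate
```

Each stratum contributes in proportion to its share of the population, which is what makes the combined estimate representative even when strata differ sharply from one another.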
7.8 Cluster Sampling
Clusters: Divide the population into groups (clusters), each ideally a small-scale, internally heterogeneous version of the population; a simple random sample of clusters is taken, and elements within the selected clusters form the sample.
Advantage: Can lower costs via collective proximity.
Disadvantage: Higher total sample sizes may be needed.
Common Example: Area sampling defined by geographical clusters.
7.8 Systematic Sampling
Method: To obtain a sample of size n from a population of N, choose one element at random from the first k = N/n elements of the frame.
Then select every k-th element thereafter until n elements are obtained.
Advantage: Easier identification of samples than random sampling.
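The random-start, fixed-interval procedure can be sketched as follows (IDs are stand-ins for positions in a frame):

```python
import random

# Sketch of systematic sampling: pick a random start among the first
# k = N // n frame positions, then take every k-th element after it.
random.seed(3)
N, n = 2500, 30
k = N // n                      # sampling interval (here 83)

start = random.randrange(k)     # random position in the first k elements
sample_ids = list(range(start, N, k))[:n]

print(len(sample_ids), sample_ids[:3])
```

The only random choice is the starting position; after that the sample is fully determined, which is why systematic samples are easier to identify than simple random samples.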
7.8 Nonprobability Sampling Methods
Convenience Sampling: Nonrandom, based on ease of access. Simple but may not represent the population.
Example: Professor using student volunteers.
Judgment Sampling: Elements selected by a person knowledgeable about the population; the sample's quality depends on that person's judgment.
Example: Media sampling legislators as reflective of broader opinion.
7.9 Sampling Error
Sampling Error: The difference between a sample estimate and the corresponding population parameter; it arises naturally from the randomness of sample selection.
Larger samples help mitigate sampling errors due to tighter standard errors.
7.9 Nonsampling Error
Nonsampling Errors: Deviations from population due to factors excluding random sampling:
Coverage Error: The sampled population (frame) does not match the target population of interest.
Nonresponse Error: Survey discrepancies due to unresponsive segments.
Measurement Error: Issues in collecting accurate population characteristics.
Interviewer Error: Bias introduced via survey methodologies.
Processing Error: Errors due to data recording and preparations.
7.9 Big Data and Sampling Error
Big Data: Datasets too large or complex to be handled by standard data-processing techniques.
Four V’s characterize big data:
Volume: Amount of data.
Variety: Diversity of data types.
Veracity: Reliability of data gathered.
Velocity: Speed of data generation.
Big data sets may be tall (many observations) or wide (many variables), complicating traditional statistical inference.
Summary
Chapter covers concepts on sampling and distributions.
Emphasizes both finite and infinite sampling techniques.
Introduces point estimation and acknowledges variability in point estimators as random variables.
Clarifies properties of unbiased estimators and conditions necessary for normal distributions.
Discusses sampling methods, errors, and implications of big data concerning sampling error.