Chapter 7 – Sampling and Sampling Distributions
Essentials of Statistics for Business & Economics, Chapter 7 – Sampling and Sampling Distributions
Introduction
Simple random sampling is key for selecting samples from:
Finite populations
Infinite populations (a continuous process)
Using data from samples allows for estimates of:
A population mean from a sample mean
A population proportion from a sample proportion
Estimation errors are anticipated; this chapter determines how large these errors may be.
Introduction of sampling distributions facilitates comparison of sample estimates to population parameters.
The last sections focus on alternative sampling methods and the implications of large samples on sampling distributions.
Introduction to Sampling
Definitions:
Element: Entity from which data is collected.
Population: All elements of interest.
Sample: Subset of the population.
Sampling: Collecting data from a sample in order to answer a research question about a population.
Terms:
Sampled population: Source of the sample.
Frame: List from which sample elements will be drawn.
Results from the sample provide estimates of population characteristics.
With proper sampling methods, sample results provide good estimates of population values, although some estimation error is always expected.
7.1 The Electronics Associates Sampling Problem
Parameters: Numerical characteristics of a population.
Using descriptive statistics for the EAI dataset:
Population mean: $\mu = \frac{\sum x_i}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Total managers: $N = 2{,}500$
Proportion of managers who completed training: $p$
Drawing a sample of 30 managers reduces the time and cost compared to collecting data from all 2,500.
7.2 Sampling from a Finite Population
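Simple Random Sample: Each possible sample of size $n$ has the same probability of being selected from a finite population of size $N$.
With Replacement: An element can appear more than once in the sample.
Without Replacement: Once selected, an element cannot be selected again. This is the most common method.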
Procedure:
Assign a random number to each of the 2,500 managers using a random number generator that follows the uniform distribution on (0, 1).
Select the 30 managers corresponding to the 30 smallest random numbers.
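The two-step procedure above can be sketched in Python. This is a minimal illustration, not the textbook's spreadsheet workflow: the function name, the seed, and the integer manager IDs 1 to 2,500 are hypothetical stand-ins for the real frame.

```python
import random

def select_by_smallest_random_numbers(population_ids, n, seed=None):
    """Select n elements without replacement by ranking uniform random numbers."""
    rng = random.Random(seed)
    # Step 1: attach one uniform random number in [0, 1) to every element.
    tagged = [(rng.random(), element) for element in population_ids]
    # Step 2: keep the elements that carry the n smallest random numbers.
    tagged.sort(key=lambda pair: pair[0])
    return [element for _, element in tagged[:n]]

# Hypothetical manager IDs standing in for the EAI frame (assumption).
manager_ids = list(range(1, 2501))
sample_ids = select_by_smallest_random_numbers(manager_ids, n=30, seed=7)
print(sample_ids)
```

Because every manager receives an independent random number, every set of 30 managers is equally likely to hold the 30 smallest values, which is what makes the result a simple random sample without replacement.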
7.2 Sampling from an Infinite Population
When no complete list of the population exists, a frame cannot be created.
Even in the absence of a frame, samples must be selected randomly.
Criteria for selection: A random sample from an infinite population must satisfy two conditions:
Each selected element comes from the same population.
Each element is selected independently, so that one selection does not influence another. This guards against selection bias.
Example: Parts coming off a production line and transactions at a bank are classic infinite populations. New parts and transactions are generated continuously, so no finite list of all items can be compiled. Sampling means selecting items or events as they occur, making sure each one belongs to the ongoing process and is selected independently of the others.
7.3 A Simple Random Sample for EAI
Point Estimator: Sample statistic estimating a population parameter.
Example Table for 30 EAI Managers:
Salaries and Training Status (identifying 30 managers with random selection results highlighted).
7.3 Point Estimation
Sample statistics include:
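Sample mean $\bar{x}$ as the point estimator of the population mean $\mu$
Calculated as $\bar{x} = \frac{\sum x_i}{n}$
Sample standard deviation $s$ as the point estimator of the population standard deviation $\sigma$
Calculated using $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Sample proportion $\bar{p}$ as the point estimator of the population proportion $p$
Calculated as $\bar{p} = \frac{x}{n}$, where $x$ is the number of sample elements with the characteristic of interest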
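As a small worked illustration of the three point estimators, the sketch below computes them for a made-up sample of five managers. The salaries and training flags are hypothetical and are not the EAI data.

```python
import math

# Hypothetical sample of n = 5 managers (illustrative values, not EAI data).
salaries = [69000.0, 73500.0, 71200.0, 75800.0, 70500.0]
completed_training = [True, False, True, True, False]

n = len(salaries)

# Sample mean: sum of the observations divided by n.
x_bar = sum(salaries) / n

# Sample standard deviation: note the n - 1 divisor.
s = math.sqrt(sum((x - x_bar) ** 2 for x in salaries) / (n - 1))

# Sample proportion: count of "yes" responses divided by n.
p_bar = sum(completed_training) / n

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p_bar = {p_bar:.2f}")
```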
7.3 Practical Advice
Statistical Inference: Using a sample statistic as a point estimator for population parameters.
Target population: The population about which we want to make inferences.
Sampled population: The population from which the sample is actually taken.
Inferences are valid only when the sampled population closely matches the target population.
7.4 Introduction to Sampling Distributions
Repeated simple random samples of EAI managers yield different values of the sample mean $\bar{x}$.
Because $\bar{x}$ varies from sample to sample, it is a random variable. Its probability distribution is called the sampling distribution of $\bar{x}$.
The distribution reflects how the estimates vary across all possible samples.
7.4 Approximation of a Sampling Distribution
The random variable $\bar{x}$ takes different values for different samples, and the distribution of those values is its sampling distribution.
Histogram Representation: A histogram of $\bar{x}$ values from many repeated samples approximates the sampling distribution. For EAI it is bell-shaped and roughly symmetric about its mean.
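The histogram approximation can be reproduced with a short simulation. The sketch below is only an illustration under stated assumptions: the population of 2,500 salaries is synthetic (drawn from a normal distribution with an arbitrary mean of 70,000 and standard deviation of 4,000), and the choice of 500 repeated samples is arbitrary.

```python
import random
import statistics

rng = random.Random(42)

# Synthetic population of N = 2,500 salaries (assumption, not the EAI data).
population = [rng.gauss(70000, 4000) for _ in range(2500)]

# Draw many simple random samples of size 30 and record each sample mean.
sample_means = [
    statistics.mean(rng.sample(population, 30)) for _ in range(500)
]

# Crude text histogram of the sample means using 250-wide bins.
bin_width = 250
counts = {}
for m in sample_means:
    left_edge = int(m // bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1
for left_edge in sorted(counts):
    print(f"{left_edge:>7,d} | {'#' * counts[left_edge]}")

print("mean of sample means:", round(statistics.mean(sample_means), 1))
print("population mean:     ", round(statistics.mean(population), 1))
```

The bars should pile up around the population mean and thin out symmetrically on both sides, which is the bell shape described above.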
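7.5 Sampling Distribution of $\bar{x}$
Defined by:
$E(\bar{x}) = \mu$: $\bar{x}$ is an unbiased point estimator because its expected value equals the population parameter.
$\sigma_{\bar{x}}$: standard error of the mean. For a finite population, $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right)$.
When the population is large relative to the sample ($n/N \le 0.05$), the finite population correction factor $\sqrt{(N-n)/(N-1)}$ is close to 1 and becomes negligible:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.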
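A small helper makes the role of the finite population correction concrete. This is a sketch under assumptions: it applies the correction only when $n/N$ exceeds 0.05 (a common rule of thumb), the function name is hypothetical, and $\sigma = 4{,}000$ is an illustrative value.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of x-bar.

    The finite population correction is applied only when a population
    size N is supplied and n/N exceeds 0.05 (rule-of-thumb assumption).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma = 4000  # assumed population standard deviation, for illustration only

# n/N = 30/2500 = 0.012, so the correction is skipped.
print(standard_error_of_mean(sigma, 30, N=2500))
# n/N = 30/300 = 0.10, so the correction shrinks the standard error.
print(standard_error_of_mean(sigma, 30, N=300))
# How close to 1 the correction factor is when N = 2,500 and n = 30.
print(math.sqrt((2500 - 30) / (2500 - 1)))
```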
7.5 Form of the Sampling Distribution of $\bar{x}$
For normally distributed populations, $\bar{x}$ is normally distributed regardless of sample size.
For non-normally distributed populations, the Central Limit Theorem applies: as the sample size increases (usually $n \ge 30$), the distribution of $\bar{x}$ approaches normality.
Highly skewed populations may require larger sample sizes ($n \ge 50$) for the distribution of $\bar{x}$ to be approximately normal.
7.5 Sampling from Different Population Distributions
Population Distributions Considered:
Uniform Distribution
Rabbit-eared Distribution
Exponential Distribution (Right-skewed)
Analysis of sampling distribution shape illustrates convergence toward normality as sample size increases.
7.5 Illustration of the Central Limit Theorem
For increasing sample sizes:
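$n = 2$: The sampling distribution of $\bar{x}$ already begins to differ from the population distribution.
$n = 5$: The sampling distributions for populations 1 and 2 look similar and bell-shaped, while that for population 3 remains skewed.
$n = 30$: All three sampling distributions are approximately normal.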
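The convergence described above can be checked numerically for the right-skewed case. The sketch below is an illustration under assumptions: it uses an exponential population with rate 1, sample sizes 2, 5, and 30, and 5,000 repetitions per size, and it measures how far the sample means are from symmetric using a moment-based skewness (0 for a symmetric distribution).

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

rng = random.Random(1)

def sample_mean_skewness(n, reps=5000):
    """Skewness of `reps` sample means, each computed from n exponential draws."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

results = {n: sample_mean_skewness(n) for n in (2, 5, 30)}
for n, g in results.items():
    print(f"n = {n:>2}: skewness of sample means = {g:.2f}")
```

The skewness should shrink toward 0 as $n$ grows, matching the claim that even a skewed population produces an approximately normal sampling distribution once the sample is large enough.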
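7.5 Sampling Distribution of $\bar{x}$ for the EAI Problem
Parameters of Sampling Distribution:
Expected Value: $E(\bar{x}) = \mu$
Standard Error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sigma}{\sqrt{30}}$
Since $n = 30$ and $N = 2{,}500$, $n/N = 0.012 \le 0.05$: no finite population correction needed.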
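7.5 Application of the Sampling Distribution of $\bar{x}$
Seeking the probability that $\bar{x}$ is within \$500 of $\mu$: $P(\mu - 500 \le \bar{x} \le \mu + 500)$.
Converting the endpoints to z-scores, $z = \pm 500 / \sigma_{\bar{x}}$, yields a probability of approximately 50% for a sample of $n = 30$.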
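7.5 Relationship Between Sample Size and Sampling Distribution of $\bar{x}$
Increasing the sample size to $n = 100$ narrows the standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{100}} = \frac{\sigma}{10}$.
This yields a probability of approximately 79% for $\bar{x}$ being within \$500 of $\mu$.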
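The effect of sample size on this probability can be computed directly from the normal distribution. The sketch below is illustrative: $\sigma = 4{,}000$ is an assumed value (it is consistent with the roughly 50% figure for a sample of 30), and the sample sizes 30 and 100 are used for comparison.

```python
import math
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|x-bar - mu| <= margin) when x-bar is normal with SE sigma / sqrt(n)."""
    z = margin / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    # Two-sided area between -z and +z, not the one-sided cumulative value.
    return std_normal.cdf(z) - std_normal.cdf(-z)

sigma = 4000  # assumed population standard deviation (illustrative)
probabilities = {n: prob_within(500, sigma, n) for n in (30, 100)}
for n, prob in probabilities.items():
    print(f"n = {n:>3}: P(within 500 of mu) = {prob:.4f}")
```

Under that assumption the probabilities come out at roughly 0.51 for $n = 30$ and 0.79 for $n = 100$. The gain comes entirely from the smaller standard error, since the margin of 500 is unchanged.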
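7.6 Sampling Distribution of $\bar{p}$
The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion:
$E(\bar{p}) = p$: $\bar{p}$ is an unbiased estimator of the population proportion.
$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$: standard error of the proportion.
The finite population correction is typically neglected for large populations ($n/N \le 0.05$).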
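7.6 Form of the Sampling Distribution of $\bar{p}$
A binomial distribution can be approximated by a normal distribution when:
$np \ge 5$ and $n(1-p) \ge 5$.
Because $\bar{p} = x/n$ with $x$ binomial, the sampling distribution of $\bar{p}$ is approximately normal as long as these conditions are satisfied.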
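7.6 Practical Value of the Sampling Distribution of $\bar{p}$
Investigating the probability of having $\bar{p}$ within 0.05 of the population proportion $p$:
With $n = 30$, the sampling distribution has expected value $E(\bar{p}) = p$ and standard error $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{30}}$.
Converting $\pm 0.05$ to z-scores gives a probability of around 42%.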
7.6 Relationship Between Sample Size and Sampling Distribution of $\bar{p}$
Increasing the sample size to $n = 100$ narrows the distribution:
New standard error $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{100}}$.
This yields a probability of around 69% for $\bar{p}$ being within 0.05 of $p$.
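The two probabilities for the sample proportion can be reproduced with the normal approximation. The sketch below assumes $p = 0.60$, a value chosen because it is consistent with the 42% and 69% figures, and sample sizes of 30 and 100. The function name is hypothetical, and the function refuses to run when the $np \ge 5$ and $n(1-p) \ge 5$ conditions fail.

```python
import math
from statistics import NormalDist

def proportion_within(margin, p, n):
    """Normal-approximation P(|p-bar - p| <= margin)."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need np >= 5 and n(1-p) >= 5 for the normal approximation")
    se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    z = margin / se
    return 2 * NormalDist().cdf(z) - 1

p = 0.60  # assumed population proportion (illustrative)
prob_n30 = proportion_within(0.05, p, 30)
prob_n100 = proportion_within(0.05, p, 100)
print(f"n =  30: {prob_n30:.4f}")
print(f"n = 100: {prob_n100:.4f}")
```

Small differences from hand calculations are expected because table lookups round the z-score to two decimals.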
7.7 Properties of Point Estimators
Essential characteristics for effective point estimators include:
Unbiased: $E(\hat{\theta}) = \theta$, where $\hat{\theta}$ is the estimator and $\theta$ is the parameter.
Efficiency: Smaller standard error indicates greater efficiency.
Consistency: Estimator accuracy improves with larger sample sizes.
7.7 Unbiased Point Estimator
A point estimator $\hat{\theta}$ is considered unbiased if:
$E(\hat{\theta}) = \theta$.
This holds for both the sample mean and the sample variance: $E(\bar{x}) = \mu$ and $E(s^2) = \sigma^2$.
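Unbiasedness of the sample variance can be checked by simulation. The sketch below is an illustration under assumptions: it draws independent samples of size 5 from a uniform distribution on [0, 1), whose true variance is 1/12, and compares the long-run average of the variance computed with an $n - 1$ divisor against the version with an $n$ divisor.

```python
import random

rng = random.Random(3)

true_variance = 1 / 12   # variance of the uniform distribution on [0, 1)
n, reps = 5, 20000

total_unbiased = 0.0
total_biased = 0.0
for _ in range(reps):
    sample = [rng.random() for _ in range(n)]
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    total_unbiased += sum_sq / (n - 1)   # sample variance s^2
    total_biased += sum_sq / n           # divisor n underestimates on average

avg_unbiased = total_unbiased / reps
avg_biased = total_biased / reps
print(f"true variance      : {true_variance:.4f}")
print(f"average with n - 1 : {avg_unbiased:.4f}")
print(f"average with n     : {avg_biased:.4f}")
```

The average with the $n - 1$ divisor should sit close to the true variance, while the average with the $n$ divisor should fall short by a factor of about $(n-1)/n$.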
7.7 Efficient and Consistent Point Estimator
Efficiency: Of two unbiased point estimators of the same parameter, the one with the smaller standard error is more efficient.
Consistency: As samples grow, point estimators tend to converge to population parameters.
7.8 Stratified Random Sampling
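Stratification: The population is first divided into groups called strata, with the elements within each stratum as homogeneous as possible.
Methodology: A simple random sample is taken from each stratum; a weighted average combines the stratum results into a population estimate.
Advantage: When the strata are homogeneous, results can be as precise as simple random sampling with a smaller total sample size.
Disadvantage: The gain in precision depends on identifying the strata correctly; poorly chosen strata offer little improvement over simple random sampling.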
Application Examples: Grouping by department, location, age, or industry type.
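The weighted-average step in stratified sampling can be sketched as follows. The strata sizes and sampled values are hypothetical, and the weights are taken to be each stratum's share of the population, $N_h / N$, which is one common choice.

```python
def stratified_mean(strata):
    """Combine stratum sample means using population-share weights N_h / N.

    `strata` is a list of (stratum_population_size, sampled_values) pairs.
    """
    total_population = sum(size for size, _ in strata)
    estimate = 0.0
    for size, sample in strata:
        stratum_mean = sum(sample) / len(sample)
        estimate += (size / total_population) * stratum_mean
    return estimate

# Hypothetical strata: (stratum population size, sampled values).
strata = [
    (600, [52.0, 48.0, 50.0]),
    (300, [70.0, 74.0]),
    (100, [95.0, 105.0, 100.0, 100.0]),
]
estimate = stratified_mean(strata)
print(f"stratified estimate of the population mean: {estimate:.1f}")
```

Weighting by population share keeps a small stratum from dominating the estimate just because it happened to be sampled more heavily.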
7.8 Cluster Sampling
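Clusters: The population is divided into groups called clusters, each ideally a small-scale, heterogeneous version of the whole population. A simple random sample of clusters is taken, and the elements in the selected clusters form the sample.
Advantage: The close proximity of elements within a cluster can lower the cost of data collection.
Disadvantage: Generally requires a larger total sample size than simple or stratified random sampling.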
Common Example: Area sampling defined by geographical clusters.
7.8 Systematic Sampling
Method: If a sample of size $n$ is needed from a population of $N$ elements, select one element from every $N/n$ population elements.
Randomly choose an initial element from the first $N/n$ elements, then select every $(N/n)$th element that follows.
Advantage: The sample is usually easier to identify than with simple random sampling.
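A minimal sketch of systematic sampling follows. It assumes the frame is an ordered list whose length is at least the sample size; the frame of 5,000 integer IDs, the sample size of 50, and the function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k = N // n elements, then every k-th."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    start = random.Random(seed).randrange(k)   # random index in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of N = 5,000 element IDs (assumption).
frame = list(range(1, 5001))
sample = systematic_sample(frame, n=50, seed=11)
print(sample[:5], "...", sample[-1])
```

Only the starting point is random. Once it is fixed, the rest of the sample is determined, which is why the method behaves like simple random sampling only when the frame has no periodic ordering.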
7.8 Nonprobability Sampling Methods
Convenience Sampling: Nonrandom, based on ease of access. Simple but may not represent the population.
Example: Professor using student volunteers.
Judgment Sampling: The sample is selected by a person knowledgeable about the subject; the quality of the results depends on that person's judgment.
Example: A reporter sampling a few legislators judged to reflect broader opinion.
7.9 Sampling Error
Sampling Error: The deviation of a sample statistic from the population parameter that occurs because a random sample, rather than the whole population, is observed.
Larger samples reduce sampling error because the standard error shrinks as the sample size grows.
7.9 Nonsampling Error
Nonsampling Errors: Deviations of the sample from the population that arise for reasons other than random sampling:
Coverage Error: The sampled population does not match the research objective.
Nonresponse Error: Segments of the population are systematically underrepresented because they do not respond.
Measurement Error: The characteristic of interest is measured incorrectly.
Interviewer Error: Bias introduced by the way interviewers ask questions or record responses.
Processing Error: Errors due to data recording and preparations.
7.9 Big Data and Sampling Error
Big Data: Defined as datasets beyond processing capability of standard methods.
Four V’s characterize big data:
Volume: Amount of data.
Variety: Diversity of data types.
Veracity: Reliability of data gathered.
Velocity: Speed of data generation.
Big data challenges include tall data (very many observations) and wide data (very many variables), both of which complicate traditional statistical inference.
Summary
Chapter covers concepts on sampling and distributions.
Emphasizes both finite and infinite sampling techniques.
Introduces point estimation and acknowledges variability in point estimators as random variables.
Clarifies properties of unbiased estimators and conditions necessary for normal distributions.
Discusses sampling methods, errors, and implications of big data concerning sampling error.