Essentials of Statistics for Business & Economics, Chapter 7 – Sampling and Sampling Distributions

Introduction

  • Simple random sampling is key for selecting samples from:

    • Finite populations

    • Infinite populations (a continuous process)

  • Using data from samples allows for estimates of:

    • A population mean from a sample mean

    • A population proportion from a sample proportion

  • Estimation errors are anticipated; this chapter determines how large these errors may be.

  • Introduction of sampling distributions facilitates comparison of sample estimates to population parameters.

  • The last sections focus on alternative sampling methods and the implications of large samples on sampling distributions.

Introduction to Sampling

  • Definitions:

    • Element: Entity from which data is collected.

    • Population: All elements of interest.

    • Sample: Subset of the population.

  • Sampling: Data collection method to analyze a population question.

  • Terms:

    • Sampled population: Source of the sample.

    • Frame: List from which sample elements will be drawn.

  • Results from the sample provide estimates of population characteristics.

  • Proper sampling methods ensure accurate estimates of population values.

7.1 The Electronics Associates Sampling Problem

  • Parameters: Numerical characteristics of a population.

  • Using descriptive statistics for the EAI dataset:

    • Population Mean: \mu = \text{mean of salaries} = 71,800

    • Population Standard Deviation: \sigma = 4,000

    • Total Managers: N = 2,500

    • Proportion of managers who completed training: p = \frac{1500}{2500} = 0.60

  • Drawing a sample of 30 managers reduces the time and cost compared to collecting data from all 2,500.

7.2 Sampling from a Finite Population

  • Simple Random Sample: Each sample of size n has the same selection probability from a finite population of size N.

    • With Replacement: An element can reappear in samples.

    • Without Replacement: Most common method.

  • Procedure:

    1. Assign each of the 2,500 managers a random number drawn from a uniform (0, 1) distribution.

    2. Select the 30 managers corresponding to the 30 smallest random numbers.
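The two-step procedure above can be sketched in a few lines of Python (manager IDs and the seed are illustrative, not from the text):

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

N, n = 2500, 30  # population size and desired sample size

# Step 1: assign each manager a uniform (0, 1) random number.
managers = {manager_id: random.random() for manager_id in range(1, N + 1)}

# Step 2: the sample is the n managers with the smallest random numbers.
sample = sorted(managers, key=managers.get)[:n]

print(len(sample), len(set(sample)))  # 30 distinct managers (no replacement)
```

Because each manager can hold only one random number, this automatically yields sampling without replacement.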

7.2 Sampling from an Infinite Population

  • When no complete list of the population exists, a frame cannot be created.

  • Even in the absence of a frame, samples must be selected randomly.

  • Criteria for selection: To ensure a valid and representative sample from an infinite population, the following criteria must be strictly adhered to:

    1. Each selected element must belong to the same population to avoid a heterogeneous sample and ensure accurate inferences.

    2. Each element must be selected independently, meaning one selection does not influence another; independence is crucial for minimizing bias and for valid statistical analysis.

  • Example: Production line parts or banking transactions representing infinite populations are classic examples. In a manufacturing setting, new parts are continuously produced, making it impossible to create a finite list of all potential items. Similarly, a continuous stream of financial transactions forms an infinite population. Sampling in these scenarios involves selecting items or events as they occur, ensuring each selection meets the independence criterion and belongs to the ongoing process.

7.3 A Simple Random Sample for EAI

  • Point Estimator: Sample statistic estimating a population parameter.

  • Example Table for 30 EAI Managers:

    • Salaries and Training Status (identifying 30 managers with random selection results highlighted).

7.3 Point Estimation

  • Sample statistics include:

    • Sample Mean \bar{X} as the point estimator of the population mean \mu

    • Calculated as \bar{X} = \frac{\Sigma{X}}{30} = 71,800

    • Sample Standard Deviation s as the estimator of population standard deviation \sigma

    • Calculated using s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n-1}}

    • s = 3,348

    • Sample Proportion \bar{p} as the point estimator of the population proportion p

    • Calculated as \bar{p} = \frac{19}{30} = 0.633
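The three point estimators above can be computed directly. A minimal sketch, using simulated salary and training data (the values are illustrative, not the actual EAI sample):

```python
import math
import random

random.seed(1)

# Hypothetical sample of 30 annual salaries and training indicators
salaries = [random.gauss(71800, 4000) for _ in range(30)]
completed_training = [random.random() < 0.6 for _ in range(30)]

n = len(salaries)

# Point estimator of the population mean mu
x_bar = sum(salaries) / n

# Point estimator of the population standard deviation sigma (n - 1 divisor)
s = math.sqrt(sum((x - x_bar) ** 2 for x in salaries) / (n - 1))

# Point estimator of the population proportion p
p_bar = sum(completed_training) / n

print(round(x_bar), round(s), round(p_bar, 3))
```

Note the n − 1 divisor in s, which matches the formula above and makes the sample variance an unbiased estimator of the population variance.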

7.3 Practical Advice

  • Statistical Inference: Using a sample statistic as a point estimator for population parameters.

  • The target population: Population for inference.

  • The sampled population: Actual population from where the sample is taken.

  • Both populations should be closely aligned so that the sample is representative of the target population.

7.4 Introduction to Sampling Distributions

  • Repeatedly sampling from EAI yields different estimates of \bar{X}.

  • Each sample estimate forms a probability distribution called the sampling distribution of \bar{X}.

  • The distribution reflects variations in estimates across multiple samples.

7.4 Approximation of a Sampling Distribution

  • The random variable \bar{X} takes different values from sample to sample; these values form its sampling distribution.

  • Histogram Representation: A histogram of the sample means approximates the sampling distribution and reveals characteristics such as bell-shaped symmetry around the mean.
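This approximation can be demonstrated by repeated sampling. A sketch under assumed EAI-like parameters (\mu = 71{,}800, \sigma = 4{,}000), tallying many sample means into histogram-style bins:

```python
import random
import statistics

random.seed(11)

MU, SIGMA, N_SAMPLE, REPS = 71800, 4000, 30, 500

# Draw many samples of size 30 and record each sample mean
means = [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N_SAMPLE))
         for _ in range(REPS)]

# A crude text histogram: counts of sample means in 500-dollar bins
lo = min(means)
bins = {}
for m in means:
    b = int((m - lo) // 500)
    bins[b] = bins.get(b, 0) + 1

for b in sorted(bins):
    print(f"{lo + 500 * b:8.0f}+ {'#' * (bins[b] // 5)}")
```

The printed bars pile up near 71,800 and thin out symmetrically on both sides, the bell shape described above.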

7.5 Sampling Distribution of \bar{X}

  • Defined by:

    • E(\bar{X}) = \mu: \bar{X} is an unbiased point estimator because its expected value equals the population parameter.

    • \sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}: standard error of the mean.

    • When the population is large relative to the sample (n/N \leq 0.05), the finite population correction factor is negligible and \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.
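A small helper makes the role of the finite population correction concrete (EAI figures assumed):

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when the population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# EAI problem: sigma = 4000, n = 30, N = 2500
with_fpc = standard_error(4000, 30, N=2500)
without_fpc = standard_error(4000, 30)

# n/N = 30/2500 = 0.012 <= 0.05, so the correction changes very little
print(round(with_fpc, 1), round(without_fpc, 1))
```

With n/N this small, the two values differ by well under one percent, which is why the correction is routinely dropped.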

7.5 Form of the Sampling Distribution of \bar{X}

  • For normally distributed populations, \bar{X} is normally distributed regardless of sample size.

  • For non-normally distributed populations, Central Limit Theorem applies: as sample size increases (usually n \geq 30), the distribution approaches normality.

  • Highly skewed populations may require larger samples (e.g., n = 50) for the sampling distribution to approach normality.

7.5 Sampling from Different Population Distributions

  • Population Distributions Considered:

    1. Uniform Distribution

    2. Rabbit-eared Distribution

    3. Exponential Distribution (Right-skewed)

  • Analysis of sampling distribution shape illustrates convergence toward normality as sample size increases.

7.5 Illustration of the Central Limit Theorem

  • For increasing sample sizes:

    • [n=2]: Sampling Distribution differs from Population.

    • [n=5]: This shows similarity in distributions 1 and 2, while 3 remains skewed.

    • [n=30]: All distributions approximate normal.
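The convergence for the skewed case can be simulated. A sketch using an exponential population (mean 1) and a simple skewness measure that shrinks toward 0 as n grows:

```python
import random
import statistics

random.seed(42)

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from a right-skewed
    (exponential) population with mean 1."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (2, 5, 30):
    means = sample_means(n)
    m = statistics.fmean(means)
    sd = statistics.stdev(means)
    # Standardized third moment: near 0 for a symmetric, bell-shaped histogram
    skew = statistics.fmean((x - m) ** 3 for x in means) / sd ** 3
    print(n, round(m, 2), round(skew, 2))
```

At n = 2 the distribution of means is still clearly right-skewed; by n = 30 the skewness is close to 0, mirroring the three panels described above.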

7.5 Sampling Distribution of \bar{X} for the EAI Problem

  • Parameters of Sampling Distribution:

    • Expected Value: E(\bar{X}) = \mu = 71,800

    • Standard Error: \sigma_{\bar{X}} = \frac{4000}{\sqrt{30}} \approx 730.3

    • Since n/N = 30/2500 = 0.012 \leq 0.05, no finite population correction is needed.

7.5 Application of the Sampling Distribution of \bar{X}

  • Seeking probability that \bar{X} is within $500 of \mu:

    • Calculated probabilities yield approximately 50% between 71,300 and 72,300 based on Z-scores.

7.5 Relationship Between Sample Size and Sampling Distribution of \bar{X}

  • Increasing sample size to n = 100 narrows the standard error:

    • \sigma_{\bar{X}} = \frac{4000}{\sqrt{100}} = 400

  • With z = 500/400 = 1.25, the probability of being within $500 of \mu rises to approximately 79%.
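These probabilities can be checked directly with the standard normal CDF, here built from `math.erf` so no external libraries are assumed:

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_within(tolerance, sigma, n):
    """P(|x_bar - mu| <= tolerance) under the normal sampling distribution."""
    se = sigma / math.sqrt(n)
    z = tolerance / se
    return norm_cdf(z) - norm_cdf(-z)

print(round(prob_within(500, 4000, 30), 4))   # within $500 for n = 30  (~0.50)
print(round(prob_within(500, 4000, 100), 4))  # within $500 for n = 100 (~0.79)
```

Quadrupling the sample size halves the standard error, which is what pushes the probability from roughly one half to nearly four fifths.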

7.6 Sampling Distribution of p

  • The probability distribution of all possible values of the sample proportion \bar{p}:

    • E(\bar{p}) = p: \bar{p} is an unbiased estimator of the population proportion.

    • \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}: standard error of the proportion.

  • The finite population correction is typically negligible for large populations (n/N \leq 0.05).

7.6 Form of the Sampling Distribution of p

  • A binomial distribution approximates normal when:

    • np \geq 5 and n(1-p) \geq 5.

  • When these conditions hold, the sampling distribution of \bar{p} can be approximated by a normal distribution.

7.6 Practical Value of the Sampling Distribution of p

  • Investigating the probability that \bar{p} falls within 0.05 of the population proportion:

    • Expected value E(\bar{p}) = 0.60 and standard error \sigma_{\bar{p}} = \sqrt{\frac{0.6(0.4)}{30}} \approx 0.0894.

  • The normal approximation gives a probability of about 42%.

7.6 Relationship Between Sample Size and Sampling Distribution of p

  • Increasing the sample size to n = 100 narrows the distribution:

    • New standard error \sigma_{\bar{p}} = \sqrt{\frac{0.6(1-0.6)}{100}} \approx 0.049.

  • Yields a probability of about 69% for being within 0.05 of p.
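The same normal-approximation calculation works for proportions; a self-contained sketch (again using `math.erf` for the normal CDF):

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_p_bar_within(tolerance, p, n):
    """P(|p_bar - p| <= tolerance) under the normal approximation
    to the sampling distribution of the sample proportion."""
    se = math.sqrt(p * (1 - p) / n)
    z = tolerance / se
    return norm_cdf(z) - norm_cdf(-z)

print(round(prob_p_bar_within(0.05, 0.6, 30), 4))   # n = 30  (~0.42)
print(round(prob_p_bar_within(0.05, 0.6, 100), 4))  # n = 100 (~0.69)
```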

7.7 Properties of Point Estimators

  • Essential characteristics for effective point estimators include:

    • Unbiased: E(\hat{\theta}) = \theta where \hat{\theta} is the estimator and \theta is the parameter.

    • Efficiency: Smaller standard error indicates greater efficiency.

    • Consistency: Estimator accuracy improves with larger sample sizes.

7.7 Unbiased Point Estimator

  • A point estimator is considered unbiased if:

    • E(\hat{\theta}) = \theta.

    • Both the sample mean and the sample variance (with the n - 1 divisor) are unbiased estimators of their population counterparts.

7.7 Efficient and Consistent Point Estimator

  • Efficiency: A point estimator with smaller standard error is more efficient.

  • Consistency: As samples grow, point estimators tend to converge to population parameters.
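Consistency can be seen empirically: the spread of the sample mean around the parameter shrinks as n grows. A simulation sketch with assumed parameters (\mu = 50, \sigma = 10):

```python
import random
import statistics

random.seed(3)

def se_of_mean(n, reps=1000):
    """Empirical standard deviation of the sample mean across many
    samples of size n from a population with mu = 50, sigma = 10."""
    means = [statistics.fmean(random.gauss(50, 10) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Consistency: the estimator concentrates around the parameter as n grows
for n in (10, 40, 160):
    print(n, round(se_of_mean(n), 2))
```

Each fourfold increase in n roughly halves the empirical standard error, matching \sigma/\sqrt{n}.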

7.8 Stratified Random Sampling

  • Stratification: Divides population into strata (groups) with elements as homogeneous as possible.

  • Methodology: Collecting samples from each stratum; weighted averages combine results into a population estimate.

    • Advantage: More precise than simple random sampling when strata are correctly identified.

    • Disadvantage: Requires larger total sample sizes than others.

  • Application Examples: Grouping by department, location, age, or industry type.
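The weighted-average step can be sketched directly. The strata names and figures below are purely illustrative (not from the text); each stratum mean is weighted by that stratum's share of the population:

```python
# Hypothetical strata: (stratum size N_h, stratum sample mean)
strata = [
    (1200, 68000.0),  # e.g., plant managers
    (800,  74000.0),  # e.g., regional managers
    (500,  80000.0),  # e.g., head-office managers
]

N = sum(size for size, _ in strata)  # total population size

# Weighted average of stratum means = stratified estimate of the population mean
estimate = sum(size / N * mean for size, mean in strata)

print(round(estimate, 2))  # 72320.0
```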

7.8 Cluster Sampling

  • Clusters: Divide the population into groups (clusters), each ideally a small-scale representative version of the whole population; a sample of clusters is then selected.

    • Advantage: Can lower costs via collective proximity.

    • Disadvantage: Higher total sample sizes may be needed.

  • Common Example: Area sampling defined by geographical clusters.

7.8 Systematic Sampling

  • Method: To obtain a sample of size n from a population of N, select one element from every N/n population elements.

    • Randomly choose the first element from among the first N/n elements, then select every (N/n)th element thereafter.

    • Advantage: Identifying sample elements is easier than with simple random sampling.
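A minimal sketch of this interval-based selection (population and sample sizes are illustrative):

```python
import random

random.seed(5)

def systematic_sample(N, n):
    """Pick a random start within the first N // n elements,
    then take every (N // n)-th element after it."""
    k = N // n                   # sampling interval
    start = random.randrange(k)  # random element in the first interval
    return list(range(start, N, k))[:n]

sample = systematic_sample(2500, 50)
print(len(sample), sample[:3])
```

Every selected element sits exactly k positions after the previous one, which is what makes the sample easy to identify once the starting point is chosen.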

7.8 Nonprobability Sampling Methods

  • Convenience Sampling: Nonrandom, based on ease of access. Simple but may not represent the population.

    • Example: Professor using student volunteers.

  • Judgment Sampling: Selection by a knowledgeable person; quality depends on the judgment of the person making the selection.

    • Example: Media sampling legislators as reflective of broader opinion.

7.9 Sampling Error

  • Sampling Error: Natural divergence of sample from the population, expected due to randomness.

  • Larger samples help mitigate sampling errors due to tighter standard errors.

7.9 Nonsampling Error

  • Nonsampling Errors: Deviations from population due to factors excluding random sampling:

    • Coverage Error: Misalignment of research objectives and sampled population.

    • Nonresponse Error: Survey discrepancies due to unresponsive segments.

    • Measurement Error: Issues in collecting accurate population characteristics.

    • Interviewer Error: Bias introduced via survey methodologies.

    • Processing Error: Errors introduced while recording and preparing the data.

7.9 Big Data and Sampling Error

  • Big Data: Defined as datasets beyond processing capability of standard methods.

  • Four V’s characterize big data:

    • Volume: Amount of data.

    • Variety: Diversity of data types.

    • Veracity: Reliability of data gathered.

    • Velocity: Speed of data generation.

  • Big data challenges include tall and wide data, complicating traditional statistical inference.

Summary

  • Chapter covers concepts on sampling and distributions.

  • Emphasizes both finite and infinite sampling techniques.

  • Introduces point estimation and acknowledges variability in point estimators as random variables.

  • Clarifies properties of unbiased estimators and conditions necessary for normal distributions.

  • Discusses sampling methods, errors, and implications of big data concerning sampling error.