Statistics for Business - Sampling, Estimation

Introduction to Statistics
  • Definition of Statistics: Statistics involves the study of phenomena occurring en masse.

  • Statistical Population:

    • Definition: All the units (items) that are the object of a study; the collection of all elements of interest.

    • Types of Population:

      • Discrete Population: Countable units.

      • Continuous Population: Measurable data.

      • Finite Population: Limited number of units.

      • Infinite Population: Unlimited number of units.

Data Collection Methods
  • Full Scope Data Collection:

    • Definition: Observing all elements of the population.

    • Features:

      • Time-consuming.

      • Costly.

      • Applicable only to finite populations (an infinite population cannot be observed in full).

  • Sample-Based Data Collection:

    • Definition: Observing only a subset of elements from the population.

    • Related to inferential statistics, where conclusions about the population are drawn from sample data.

    • Applicable to both finite and infinite populations.

Sampling
  • Sample Definition: A set of elements chosen (drawn) from the population.

  • Sampling Frame: A reference list of the population, which is always finite.

    • Population Characteristics:

      • Probability samples rely on random selection, which makes them reliable and allows the sampling error to be calculated.

      • Sample elements are treated as random variables, denoted (X₁, X₂, …, Xₙ), because their values differ from sample to sample.

  • Sample Size (n): The number of elements in the sample, e.g., n = 5.

Sample Planning
  • Considerations for Sampling:

    • Choosing a sample size that balances accuracy against cost and time.

    • Type of population: infinite or finite.

    • Type of sampling technique (with or without replacement).

  • Practical Consideration: Large finite populations may be treated as infinite in practice, which simplifies the sampling technique.

Sampling Errors
  • Definition of Errors:

    • Sampling Errors: Errors that arise because only a sample, not the whole population, is observed; they depend on the chosen sample and can be measured and calculated.

    • Non-Sampling Errors: Difficult to measure; they include:

      • Frame errors (undercoverage or overcoverage).

      • Non-response (full or partial).

      • Measurement errors (imprecise information).

      • Processing errors (errors in data capture or in units of measurement).

      • Mode effect (differences caused by the data collection method, e.g., interview vs. online questionnaire).

  • Random Selection: Ensures a random (probability) sample.

Sampling Methods
  • Independent & Identically Distributed Sampling (IIDS):

    • Also known as Random Sampling.

    • Definition: Sampling from a homogeneous, infinite population (where drawing with or without replacement makes no difference), or from a finite population with replacement.

    • Key Note: Elements in the sample are independent of each other; in practice, however, full independence is hard to achieve.

    • Application: Common in quality control in production.

  • Simple Random Sampling (SRS):

    • Definition: Sampling from a homogeneous, finite population without replacement.

    • Requirements: A complete list of population elements and equal probability of selection for each element.

  • Stratified Sampling (SS):

    • Used for heterogeneous populations, which are divided into internally homogeneous groups (strata); SRS is applied within each stratum.

    • Objective: More accurate estimates (smaller sampling error) than an SRS of the same size.

    • Types of Stratified Sampling:

      • Proportionate Allocation: Each stratum's sample size is proportional to its share of the population.

      • Disproportionate Allocation: Stratum sample sizes deviate from the population proportions, based on stratum characteristics.

      • Optimal Allocation: More sample elements are taken from strata with higher variability.

      • Cost-Optimal Allocation: Allocation that minimizes the error for a given cost (or the cost for a given error).

  • Cluster Sampling (CS):

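    • Used when no adequate sampling frame of individual elements exists.

    • Method: The population is divided into groups (clusters); an SRS of clusters is drawn, and every element within the selected clusters is observed.

    • Variation: Multistage sampling, in which SRS is applied at several successive stages (e.g., first clusters, then elements within the selected clusters).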

  • Systematic Sampling (SYS):

    • The population is sorted according to some criterion; a sampling interval k (typically k = N/n) and a random starting point among the first k elements are determined.

    • Process: After the starting element, every kth element of the sorted population is selected.

    • Caution: The sample is only random if there is no stochastic relationship between the sorting variable and the observed variable.

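  • Non-Probability Sampling: Elements are not selected at random, so the sampling error cannot be calculated; systematic sampling also falls into this category when the sorting variable is related to the observed variable. Types include: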

    • Quota Sampling: The sample's composition (quotas) is fixed in advance, but elements within each quota are not chosen by a random mechanism.

    • Snowball Sampling: Selected elements recruit others from their network.

    • Concentrated Sampling: Influential (large) elements are favored in the selection.

    • Judgment Sampling: Elements are chosen based on expert judgment; often used in public opinion surveys.
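The probability methods above can be illustrated with Python's standard `random` module. This is a minimal sketch, not a survey-grade tool: the function names are hypothetical, the systematic sampler assumes the interval k = N // n, and the proportionate allocation uses largest-remainder rounding (an assumed convention; the notes do not say how to round).

```python
import random


def simple_random_sample(population, n, rng=random):
    """SRS: n distinct elements, each equally likely; drawn without replacement."""
    return rng.sample(population, n)


def iid_sample(population, n, rng=random):
    """IID sample from a finite list: independent draws WITH replacement."""
    return rng.choices(population, k=n)


def systematic_sample(sorted_population, n, rng=random):
    """Every k-th element after a random start among the first k elements.

    Assumes k = N // n, so start + (n - 1) * k always stays inside the list.
    """
    k = len(sorted_population) // n
    start = rng.randrange(k)
    return [sorted_population[start + i * k] for i in range(n)]


def proportionate_allocation(strata_sizes, n):
    """Split n across strata in proportion to the stratum sizes N_h.

    Rounds down, then gives leftover units to the strata with the largest
    fractional parts (largest-remainder rounding, an assumed convention).
    """
    N = sum(strata_sizes.values())
    exact = {h: n * size / N for h, size in strata_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata, n, rng=random):
    """Proportionate stratified sample: an SRS of size n_h inside each stratum."""
    alloc = proportionate_allocation({h: len(units) for h, units in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    population = list(range(1, 101))  # hypothetical population of 100 units
    print(simple_random_sample(population, 5, rng))
    print(systematic_sample(population, 5, rng))
    print(proportionate_allocation({"A": 500, "B": 300, "C": 200}, 50))
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible; the default uses the module-level generator.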

Inferential Statistics
  • Estimation: Estimating values of unknown population parameters using sample data.

  • Hypothesis Testing: Testing assumptions about the population (parameters or other features) with sample data.

Basic Concepts of Estimation
  • Parameter (θ): A numerical characteristic of a population, such as the expected value E(X) = μ, the standard deviation D(X) = σ, or the proportion P.

    • Parameters are fixed but unknown constants; the goal is to estimate them accurately.

  • Estimator (θ̂): A rule for calculating an estimate based on observed data.

  • Sample Statistic: Characteristics calculated from the sample to estimate parameters.

  • Point Estimator: The statistic that yields a point estimate for a parameter (e.g., mean x̄, standard deviation s, proportion p).

  • Sampling Distribution: The probability distribution of a sample statistic over all possible samples of a given size.

  • Standard Error: The standard deviation of a point estimator's sampling distribution, often denoted σ_x̄, s_x̄ or s_p.
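A short simulation makes the sampling distribution and the standard error concrete. This sketch assumes a hypothetical population (the integers 1 to 100) and draws samples with replacement, so the draws are IID and the standard error of the mean is σ/√n with no finite-population correction; the standard deviation of the simulated sample means should come out close to that value.

```python
import random
import statistics


def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` IID samples of size n and return the mean of each one.

    The returned list approximates the sampling distribution of x̄.
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]


population = list(range(1, 101))        # hypothetical population
sigma = statistics.pstdev(population)   # population standard deviation σ
n = 25
means = simulate_sample_means(population, n, reps=5000)

print("mean of sample means:", round(statistics.fmean(means), 2))  # close to μ
print("theoretical SE:", round(sigma / n ** 0.5, 3))
print("simulated SE:  ", round(statistics.stdev(means), 3))
```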

Properties of a Point Estimator
  • Unbiasedness: The expected value of the estimator equals the parameter being estimated, E(θ̂) = θ.

  • Asymptotic Unbiasedness: The bias, E(θ̂) − θ, shrinks toward zero as the sample size increases.

  • Consistency: As the sample size increases, the estimator converges (in probability) to the true parameter value.

  • Efficiency (Effectiveness): The smallest standard error among competing unbiased estimators.

  • Sufficiency: Uses all the information in the sample that is relevant to the parameter.
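Unbiasedness and asymptotic unbiasedness can be checked by simulation. The sketch below assumes a normal population with σ² = 4 and averages two variance estimators over many samples: dividing by n is biased downward (its expected value is (n − 1)/n · σ²), but the bias vanishes as n grows; dividing by n − 1 is unbiased.

```python
import random


def variance_estimates(x):
    """Return the (divide by n, divide by n-1) variance estimates for one sample."""
    n = len(x)
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)   # sum of squared deviations
    return ss / n, ss / (n - 1)


def average_variance_estimates(n, reps, sigma=2.0, seed=1):
    """Average both estimators over `reps` samples of size n from N(0, sigma²)."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(reps):
        b, u = variance_estimates([rng.gauss(0.0, sigma) for _ in range(n)])
        total_biased += b
        total_unbiased += u
    return total_biased / reps, total_unbiased / reps


for n in (5, 100):
    b, u = average_variance_estimates(n, reps=5000)
    # True σ² = 4; the divide-by-n estimator should average about 4 * (n - 1) / n.
    print(f"n={n}: divide by n -> {b:.3f}, divide by n-1 -> {u:.3f}")
```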

Theorems in Statistics
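  • Law of Large Numbers: As the number of trials (sample size) increases, the sample mean converges to the expected value.

  • Central Limit Theorem (CLT): The sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed (bell curve), whatever the original distribution.

  • Chebyshev's Theorem: For any distribution and any k > 1, at most 1/k² of the values lie k or more standard deviations from the mean; equivalently, at least 1 − 1/k² of the values lie within k standard deviations of the mean.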
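Chebyshev's bound holds for any data set, which a few lines of Python can confirm. The data below are a hypothetical, deliberately skewed list; the sketch compares the share of values lying strictly within k population standard deviations of the mean with the guaranteed minimum 1 − 1/k².

```python
import statistics


def share_within_k_sd(data, k):
    """Fraction of values with |x - mean| < k * sd (population sd)."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mu) < k * sd) / len(data)


def chebyshev_minimum(k):
    """Chebyshev: at least 1 - 1/k² of the values lie within k sd (k > 1)."""
    return 1 - 1 / k ** 2


data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21, 34, 55, 89]  # hypothetical, skewed
for k in (1.5, 2, 3):
    print(k, round(share_within_k_sd(data, k), 3), ">=", round(chebyshev_minimum(k), 3))
```

The bound is usually far from tight for a given data set; its value is that it needs no assumption about the shape of the distribution.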

Interval Estimation
  • Interval Estimation Definition: An estimate giving a range (interval) that contains the parameter with a specified confidence level (π).

  • Formula: Point estimate ± margin of error, giving the interval [L, U], where P(L < θ < U) = π.

  • Factors Affecting Width of Confidence Interval:

    • Sample Size (n).

    • Standard Deviation of the estimator.

    • Confidence Level (π).

    • Distribution of the population.

    • Maximum error of estimation (A).

Estimation Processes
  • Estimating Expected Value:

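    • For a normally distributed population (or any distribution when n is large, by the CLT):

      • If σ is known: $\bar{x} \pm z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

      • If σ is unknown (estimated by the sample standard deviation s): $\bar{x} \pm t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$, with n − 1 degrees of freedom.

  • Standard Error of the mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$; when σ is unknown it is estimated by $s_{\bar{x}} = \frac{s}{\sqrt{n}}$.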
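The two interval formulas translate directly into code. This sketch uses `statistics.NormalDist` from the standard library for the z critical value; since the standard library has no t distribution, the t critical value is passed in by the caller (for example from a t table, or from `scipy.stats.t.ppf` if SciPy is available). The sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_critical(confidence):
    """z_{1-α/2} for a two-sided interval with the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """x̄ ± z_{1-α/2} · σ/√n (σ known)."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


def mean_ci_unknown_sigma(sample, t_crit):
    """x̄ ± t_{1-α/2} · s/√n; t_crit must use n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error
    return xbar - t_crit * se, xbar + t_crit * se


print(mean_ci_known_sigma(xbar=50.0, sigma=10.0, n=100))           # about (48.04, 51.96)
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9]  # hypothetical, n = 10
print(mean_ci_unknown_sigma(sample, t_crit=2.262))                 # t_{0.975} with 9 df
```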

Estimation of Probability (Proportion)
  • Probability Estimator:

    • Point estimate: $p = \frac{k}{n}$, where k is the number of sample elements having the attribute of interest.

    • Standard Error: $s_p = \sqrt{\frac{p \cdot q}{n}}$, where q = 1 − p.
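The proportion formulas are straightforward to compute. The sketch also returns the large-sample (normal-approximation) interval p ± z·s_p; that interval is an addition assumed here for illustration and is only reasonable when both n·p and n·q are large. The counts k = 60 and n = 200 are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_proportion(k, n, confidence=0.95):
    """Point estimate p = k/n, standard error s_p, and a normal-approximation CI."""
    p = k / n
    q = 1 - p
    sp = math.sqrt(p * q / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p, sp, (p - z * sp, p + z * sp)


p, sp, ci = estimate_proportion(k=60, n=200)  # hypothetical: 60 of 200 have the attribute
print(f"p = {p:.3f}, s_p = {sp:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```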

Estimation of Standard Deviation
  • Point Estimate: The sample standard deviation s.

  • Population with Normal Distribution: The chi-square distribution with n − 1 degrees of freedom is used to construct confidence intervals.

  • Confidence interval for σ² (take the square roots of both limits to get the interval for σ):

    • $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
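A sketch of the chi-square interval for σ. The standard library has no chi-square quantile function, so the two critical values are passed in (here the table values χ²_{0.025} ≈ 2.700 and χ²_{0.975} ≈ 19.023 for 9 degrees of freedom). The sample is hypothetical, and the population is assumed normal, as the method requires.

```python
import math
import statistics


def sd_confidence_interval(sample, chi2_low, chi2_high):
    """Confidence interval for σ from a normal population.

    chi2_low  = χ²_{α/2}   with n - 1 degrees of freedom
    chi2_high = χ²_{1-α/2} with n - 1 degrees of freedom
    """
    n = len(sample)
    s2 = statistics.variance(sample)        # sample variance s²
    var_lower = (n - 1) * s2 / chi2_high     # the larger χ² value gives the lower limit
    var_upper = (n - 1) * s2 / chi2_low
    return math.sqrt(var_lower), math.sqrt(var_upper)


weights = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]  # hypothetical, n = 10
print(statistics.stdev(weights), sd_confidence_interval(weights, 2.700, 19.023))
```

Note that the interval is not symmetric around s, because the chi-square distribution is skewed.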

Estimation from Individual and Grouped Data
  • Individual Data Calculation: Use Excel functions such as AVERAGE and STDEV.S, or calculate manually.

  • Grouped Data Calculation: Use weighted averages and weighted standard deviations, with the frequencies as weights (and class midpoints for interval classes).

  • Example: Given a frequency distribution of usage data from a sample, the mean and standard deviation are calculated with the class frequencies as weights.
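For grouped data, each distinct value (or class midpoint) is weighted by its frequency. The sketch assumes the sample divisor n − 1, so that the result matches Excel's STDEV.S applied to the expanded data; `statistics.fmean` and `statistics.stdev` play the roles of AVERAGE and STDEV.S for individual data. The usage frequencies are hypothetical.

```python
import math
import statistics


def grouped_mean_sd(values, freqs):
    """Weighted mean and sample standard deviation from a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, math.sqrt(ss / (n - 1))


usage = [1, 2, 3, 4]    # hypothetical values (or class midpoints)
counts = [4, 10, 5, 1]  # hypothetical frequencies, n = 20
print(grouped_mean_sd(usage, counts))

# Cross-check against the individual-data functions on the expanded list.
expanded = [x for x, f in zip(usage, counts) for _ in range(f)]
print(statistics.fmean(expanded), statistics.stdev(expanded))
```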

Estimation of Sum of Value
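  • Application: For example, estimating the net filling weight of a certain product from a sample.

    • The sample mean estimates the population average, and scaling it up by the population size gives the estimated population total.

  • Steps: The calculation involves:

    • $\text{Average weight} = \frac{\sum (\text{weight} \times \text{frequency})}{n}$, where n = Σ frequency.

    • Estimating the total population weight as N × average weight, where N is the number of units in the population; the confidence interval for the total is N times the interval for the mean.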
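The filling-weight steps can be sketched as follows. The frequency table, the population size N = 10,000 and the critical value (t_{0.975} ≈ 2.093 for 19 degrees of freedom) are hypothetical inputs for illustration: the mean comes from the weighted average, and both the point estimate and the interval limits for the total are the mean's values multiplied by N.

```python
import math


def estimate_total(values, freqs, N, crit):
    """Estimate the population mean and total from a frequency table.

    crit is the critical value for the chosen confidence level
    (z, or t with n - 1 degrees of freedom), supplied by the caller.
    """
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n   # Σ(weight × frequency) / n
    s = math.sqrt(sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / (n - 1))
    margin = crit * s / math.sqrt(n)                       # margin of error for the mean
    total = N * mean                                       # estimated population total
    return mean, total, (N * (mean - margin), N * (mean + margin))


grams = [498, 500, 502]  # hypothetical net weights (g)
freq = [5, 12, 3]        # hypothetical frequencies, n = 20
mean, total, ci = estimate_total(grams, freq, N=10_000, crit=2.093)  # t_{0.975}, 19 df
print(f"mean = {mean:.2f} g, total = {total:,.0f} g, CI = ({ci[0]:,.0f}, {ci[1]:,.0f}) g")
```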

Determining Sample Size
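  • Margin of Error (Δ): Half the width of the confidence interval, i.e., the critical value times the standard error (for the mean, $\Delta = z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$).

  • Formulae for Sample Size (solve the margin of error for n and round up):

    • For the mean: $n = \frac{z_{1-\alpha/2}^2 \, \sigma^2}{\Delta^2}$

    • For a proportion: $n = \frac{z_{1-\alpha/2}^2 \, p \, q}{\Delta^2}$

    • The critical value comes from the standard normal distribution; the required n is set by the desired accuracy (Δ) and the confidence level.

    • Since Δ is proportional to 1/√n, an existing sample of size n₀ with margin Δ₀ can be rescaled to reach a new margin Δ₁: $n_1 = n_0 \left(\frac{\Delta_0}{\Delta_1}\right)^2$.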
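Solving Δ = z_{1−α/2}·σ/√n for n gives n = (z·σ/Δ)² for a mean, and the analogous n = z²·p·q/Δ² for a proportion; the result is always rounded up. A minimal sketch follows. Using p = 0.5 when no prior estimate of p exists is a common conservative assumption, not something stated in these notes, and the inputs in the example are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{1-α/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n with z·σ/√n <= margin."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)


def sample_size_proportion(margin, p=0.5, confidence=0.95):
    """Smallest n with z·sqrt(p·q/n) <= margin; p = 0.5 is the worst case."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_mean(sigma=10, margin=2))   # mean within ±2 units
print(sample_size_proportion(margin=0.03))    # proportion within ±3 percentage points
```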

Example Calculation for Sample Size Increase
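  • Example: An original sample of size 120 is to be enlarged so that the estimation error (margin of error) falls by 4%. With the confidence level and standard deviation unchanged, Δ is proportional to 1/√n, so the new sample size is $n_1 = 120 \cdot \left(\frac{\Delta_0}{\Delta_1}\right)^2$, rounded up.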
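Because Δ is proportional to 1/√n when the confidence level and standard deviation stay the same, the new size is n₁ = n₀·(Δ₀/Δ₁)², rounded up. The sketch reads "reduce estimation error by 4%" as a relative reduction (Δ₁ = 0.96·Δ₀); that interpretation is an assumption, and if a reduction of 4 percentage points was meant, the original margin would be needed instead.

```python
import math


def resized_sample(n_old, margin_ratio):
    """New sample size when the margin of error is scaled by margin_ratio = Δ_new / Δ_old.

    Assumes the confidence level and the standard deviation are unchanged.
    """
    return math.ceil(n_old / margin_ratio ** 2)


print(resized_sample(120, 0.96))  # assumed 4% relative reduction of Δ
```

Halving the margin (ratio 0.5) would require four times the original sample, which shows why small gains in precision can be expensive.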