Statistics for Business - Sampling, Estimation

STATISTICS FOR BUSINESS

SAMPLING AND ESTIMATION

Introduction to Statistics

Definition of Statistics: Statistics involves the study of phenomena occurring en masse.
Statistical Population:
- Definition: All the units (items) being the object of a study; collection of all elements of interest.
- Types of Population:
- Discrete Population: Countable units.
- Continuous Population: Measurable data.
- Finite Population: Limited number of units.
- Infinite Population: Unlimited units.

Data Collection Methods

Full Scope Data Collection:
- Definition: Observing all elements of the population.
- Features:
- Time-consuming.
- Costly.
- Applicable only to finite populations; an infinite population cannot be observed in full.
Sample-Based Data Collection:
- Definition: Observing only a subset of elements from the population.
- Related to inferential statistics, where conclusions are drawn about the population based on sample data.

Sampling

Sample Definition: A set of elements chosen (drawn) from the population.
Sampling Frame: A reference list of the population, which is always finite.
- Population Characteristics:
- Probability samples rely on randomness, which ensures reliability and allows for error calculation.
- Sample elements are represented as random variables, denoted (X₁, X₂, …, Xₙ); their values differ from sample to sample.
Sample Size (n):
- Example size: n = 5.

Sample Planning

Considerations for Sampling:
- Balancing accuracy versus costs/time in sample size.
- Types of the population: infinite or finite.
- Type of sampling technique (with or without replacement).
Practical Consideration: Large finite populations may be considered as infinite in practice, simplifying sampling techniques.

Sampling Errors

Definition of Errors:
- Sampling Errors: Errors that depend on the chosen sample, which can be measured and calculated.
- Non-Sampling Errors: Difficult to measure, includes:
- Frame errors (undercoverage or overcoverage).
- Non-response (full or partial).
- Measurement errors (imprecise information).
- Processing errors (errors in data capture or in units of measurement).
- Mode effect caused by the selection method.
Random Selection: Ensures a random (probability) sample.

Sampling Methods

Independent & Identically Distributed Sampling (IIDS):
- Also known as Random Sampling.
- Definition: Sampling from a homogeneous, infinite population, or from a finite population with replacement.
- Key Note: Elements in the sample are independent from each other; however, practical independence is challenging to achieve.
- Application: Common in quality control in production.
Simple Random Sampling (SRS):
- Definition: Sampling from a homogeneous, finite population without replacement.
- Requirements: A complete list of population elements and equal probability of selection for each element.
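The contrast between drawing without replacement (SRS) and drawing with replacement (independent draws) can be sketched with Python's standard library. This is only an illustration: the population list and the seed are hypothetical, not taken from these notes.

```python
import random

# Hypothetical finite population: unit IDs 1..1000 (this list plays the role of the sampling frame).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed only so the example is repeatable

# Simple random sampling: without replacement, so no unit can appear twice.
srs_sample = rng.sample(population, k=5)

# Independent draws from the same frame: with replacement, so repeats are possible.
iid_sample = rng.choices(population, k=5)

print("SRS (without replacement):", srs_sample)
print("Independent draws (with replacement):", iid_sample)
```

Both calls give every unit the same chance of selection on each draw; only `rng.sample` guarantees distinct units.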
Stratified Sampling (SS):
- Divides a heterogeneous population into strata and applies SRS within each stratum.
- Objectives: Aim to produce more accurate estimations, reducing errors.
- Types of Stratified Sampling:
- Proportionate Allocation: Sample sizes according to strata proportion.
- Disproportionate Allocation: Variations in allocations based on stratum characteristics.
- Optimal Allocation: More sample elements from strata with higher variability.
- Cost-Optimal Allocation: Minimizing error at certain cost levels.
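The first and third allocation rules can be sketched as follows. The sketch assumes that "optimal allocation" means the usual Neyman-type rule (sample share proportional to stratum size times stratum standard deviation); the stratum sizes and standard deviations are hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]


def optimal_allocation(stratum_sizes, stratum_sds, n):
    """Neyman-type allocation: weight each stratum by size times standard deviation."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: sizes and (assumed known) standard deviations.
sizes = [500, 300, 200]
sds = [2.0, 5.0, 10.0]

prop = proportionate_allocation(sizes, n=100)
opt = optimal_allocation(sizes, sds, n=100)
print("Proportionate:", prop)
print("Optimal:", [round(x, 1) for x in opt])
```

The results are fractional; in practice they would be rounded so that the stratum sample sizes still add up to n. In this example the smallest stratum receives the largest share under the optimal rule because it is the most variable.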
Cluster Sampling (CS):
- Used when no adequate sample frame exists.
- Method: The population is divided into groups (clusters), an SRS of clusters is drawn, and every element within the selected clusters is observed.
- Variations: Multistage sampling, applying SRS multiple times.
Systematic Sampling (SYS):
- Population is arranged according to a certain aspect, and a starting point and interval (k) are determined.
- Process: Selecting every kth element after the first selected element from the sorted population.
- Caution: The method is reliable only if no stochastic relationship exists between the sorting variable and the observed variable.
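The selection process for systematic sampling can be sketched like this. It is a minimal illustration that assumes the interval is taken as k = N // n and that the starting point is chosen at random within the first interval; the population list is hypothetical.

```python
import random


def systematic_sample(sorted_population, n, rng):
    """Pick a random start within the first interval, then take every k-th element."""
    k = len(sorted_population) // n   # sampling interval
    start = rng.randrange(k)          # random starting index in [0, k)
    return sorted_population[start::k][:n]


population = list(range(1, 101))      # hypothetical frame, already sorted
sample = systematic_sample(population, n=10, rng=random.Random(1))
print(sample)
```

Only the starting point is random; every later element is fixed by the interval, which is why a pattern in the sort order can bias the sample.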
Non-Probability Sampling: Elements are not selected at random, so the sample may be systematically biased and the sampling error cannot be calculated. Types include:
- Quota Sampling: Fixed sample composition (quotas), with non-random selection of elements within each quota.
- Snowball Sampling: Selected elements refer others in their network.
- Concentrated Sampling: Influential elements have a higher selection probability.
- Judgment Sampling: Often used in public opinion surveys.

Inferential Statistics

Estimation: Estimating values of unknown population parameters using sample data.
Hypothesis Testing: Testing assumptions about the population (parameters or other features) with sample data.

Basic Concepts of Estimation

Parameter (θ): A numerical characteristic of a population, such as the expected value E(X) = μ, the standard deviation D(X) = σ, or the proportion P.
- Parameters are constant values; the goal is to estimate them accurately.
Estimator (θ̂): A rule for calculating an estimate based on observed data.
Sample Statistic: Characteristics calculated from the sample to estimate parameters.
Point Estimator: The statistic that yields a point estimate for a parameter (e.g., mean x̄, standard deviation s, proportion p).
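Sampling Distribution: The probability distribution of a sample statistic over all possible samples of a given size.
Standard Error: The standard deviation of a point estimator's sampling distribution, often denoted σ_x or S_x (for the mean) and S_p (for a proportion).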

Properties of a Point Estimator

Unbiased: The expected value equals the parameter estimated.
Asymptotic Unbiasedness: The bias (the difference between the estimator's expected value and the parameter) tends to zero as the sample size increases.
Consistency: As sample size increases, the estimator approaches the true parameter value.
Efficiency: The estimator has the smallest standard error among competing unbiased estimators.
Sufficiency: The estimator uses all the information in the sample that is relevant to the parameter.

Theorems in Statistics

Law of Large Numbers: As the number of trials increases, the average result approaches the expected value.
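Central Limit Theorem (CLT): The sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed (bell curve).
Chebyshev's Theorem: For any distribution with finite variance, at most 1/k² of the values lie more than k standard deviations from the mean; equivalently, at least 1 − 1/k² lie within k standard deviations (k > 1).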
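The three theorems can be checked with a small simulation. The sketch below assumes a uniform(0, 1) population (μ = 0.5, σ² = 1/12); the sample sizes, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) population (mu = 0.5, sigma^2 = 1/12)."""
    return statistics.fmean(rng.random() for _ in range(n))


# Law of large numbers: the mean of a large sample is close to mu = 0.5.
big_mean = sample_mean(100_000)

# Central limit theorem: means of many samples of size n cluster around mu
# with a standard deviation close to sigma / sqrt(n).
n = 30
means = [sample_mean(n) for _ in range(5_000)]
observed_se = statistics.stdev(means)
theoretical_se = (1 / 12) ** 0.5 / n ** 0.5

# Chebyshev: at least 1 - 1/k^2 of the values lie within k standard deviations.
k = 2
center = statistics.fmean(means)
within = sum(abs(m - center) <= k * observed_se for m in means) / len(means)

print("Mean of large sample:", round(big_mean, 4))
print("Observed vs theoretical standard error:", round(observed_se, 4), round(theoretical_se, 4))
print("Share within 2 standard deviations:", round(within, 3), "(Chebyshev bound: 0.75)")
```

Because the sample means are close to normal, the observed share within two standard deviations is well above the Chebyshev bound, which holds for any distribution and is therefore conservative.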

Interval Estimation

Interval Estimation Definition: An estimate that gives a range (interval) expected to contain the parameter at a given confidence level (π).
Formula: Point estimate ± margin of error, giving the interval [L, U], where P(L < parameter < U) = π = 1 − α.
Factors Affecting Width of Confidence Interval:
- Sample Size (n).
- Standard Deviation of the estimator.
- Confidence Level (π).
- Distribution of the population.
- Maximum error of estimation (A).

Estimation Processes

Estimating Expected Value:
- For populations having unknown distributions:
- If known (σ): $\bar{x} \pm Z_{1-α/2} \times \frac{σ}{\sqrt{n}}$
- If unknown (using S): $\bar{x} \pm t_{1-α/2} \times \frac{S}{\sqrt{n}}$
Standard Error: Denoted as $S_x$ or $\frac{σ}{\sqrt{n}}$.
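The known-σ interval can be computed with the standard library alone. This is a minimal sketch; the function name and the input figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar ± Z_{1-α/2} · σ/√n for a known population σ."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the standard normal
    margin = z * sigma / sqrt(n)              # margin of error
    return x_bar - margin, x_bar + margin


low, high = mean_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```

When σ is unknown, the same calculation applies with S in place of σ and a critical value from the t-distribution with n − 1 degrees of freedom. The standard library has no t quantile function, so that value would come from a table or a statistics package.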

Estimation of Probability (Proportion)

Probability Estimator:
- Point estimate: $p = \frac{k}{n}$
- Standard Error (Sp): $S_p = \sqrt{\frac{p \times q}{n}}$ where q = 1 − p.
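The point estimate and standard error translate directly into code. The interval added below uses the normal approximation p ± Z · Sp, which is an assumption here (reasonable for large samples) since these notes state only the standard error; the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_estimate(k, n, confidence=0.95):
    """Return p = k/n, its standard error, and a normal-approximation interval."""
    p = k / n
    q = 1 - p
    se = sqrt(p * q / n)                                   # Sp = sqrt(p*q/n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)     # Z_{1-α/2}
    return p, se, (p - z * se, p + z * se)


# Hypothetical sample: 40 favourable cases out of 100 observations.
p, se, (low, high) = proportion_estimate(k=40, n=100)
print(round(p, 3), round(se, 4), round(low, 3), round(high, 3))
```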

Estimation of Standard Deviation

Point Estimate: Standard deviation (s).
Population with Normal Distribution: Uses the chi-square distribution to find confidence intervals.
Formula for the confidence limits of σ² (take square roots to obtain the limits for σ):
- $\frac{(n-1)s^2}{χ_{1-α/2}^{2}} \text{ and } \frac{(n-1)s^2}{χ_{α/2}^{2}}$
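A sketch of the chi-square interval follows. The standard library has no chi-square quantile function, so the two critical values are passed in as arguments read from a table; the sample figures (n = 10, s = 2) are hypothetical.

```python
from math import sqrt


def sd_confidence_interval(s, n, chi2_lower, chi2_upper):
    """Confidence limits for sigma from the chi-square formula.

    chi2_lower is the α/2 quantile and chi2_upper the 1-α/2 quantile of the
    chi-square distribution with n-1 degrees of freedom (table values).
    """
    var_low = (n - 1) * s**2 / chi2_upper    # dividing by the larger quantile gives the lower limit
    var_high = (n - 1) * s**2 / chi2_lower   # dividing by the smaller quantile gives the upper limit
    return sqrt(var_low), sqrt(var_high)


# 95% confidence, 9 degrees of freedom: table values 2.700 and 19.023.
low, high = sd_confidence_interval(s=2.0, n=10, chi2_lower=2.700, chi2_upper=19.023)
print(round(low, 3), round(high, 3))
```

The interval is not symmetric around s because the chi-square distribution is skewed.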

Estimation from Individual and Grouped Data

Individual Data Calculation: Utilizes functions like AVERAGE and STDEV.S in Excel or manual calculations.
Grouped Data Calculation: Includes weighted averages and standard deviations for grouped outcomes.
Example: Given data of usage from a sample, calculating averages and standard deviations from frequency distributions ensures accurate statistical representation.
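For grouped data, the mean and standard deviation are weighted by the frequencies. The sketch below assumes a frequency table of distinct values (for class intervals, the class midpoints would be used instead) and the sample formula with n − 1 in the denominator, matching STDEV.S; the data are hypothetical.

```python
from math import sqrt


def grouped_mean_sd(values, frequencies):
    """Weighted mean and sample standard deviation from a frequency table."""
    n = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / n
    squared_dev = sum(f * (v - mean) ** 2 for v, f in zip(values, frequencies))
    return mean, sqrt(squared_dev / (n - 1))


# Hypothetical frequency table: value 1 once, value 2 twice, value 3 once.
mean, sd = grouped_mean_sd(values=[1, 2, 3], frequencies=[1, 2, 1])
print(mean, round(sd, 4))
```

The result equals what AVERAGE and STDEV.S would give for the individual observations 1, 2, 2, 3.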

Estimation of Sum of Value

Application: For example, to estimate the net filling weights of a certain product using a sample.
- The sample average is calculated first and then scaled up to the population total.
Steps: Calculation involves:
- $\text{Average Weight} = \frac{\sum (\text{weight} \times \text{frequency})}{n}$
- Estimating total population weight based on sample average.
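The two steps can be sketched as follows, assuming the population total is estimated as N times the sample average, where N is the number of units in the population. The weights, frequencies, and N are hypothetical.

```python
def estimate_total(weights, frequencies, population_size):
    """Sample average from a frequency table, scaled up to a population total."""
    n = sum(frequencies)
    average = sum(w * f for w, f in zip(weights, frequencies)) / n
    return average, population_size * average


# Hypothetical sample of 10 packages (weights in grams) from a lot of 10,000.
average, total = estimate_total(
    weights=[498, 500, 502], frequencies=[2, 5, 3], population_size=10_000
)
print(round(average, 1), round(total))
```

Interval limits for the total follow the same logic: multiply both limits of the confidence interval for the mean by N.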

Determining Sample Size

Margin of Error Definition: The maximum error of estimation (A), that is, half the width of the confidence interval at the chosen confidence level.
Formulae for Sample Size:
- The required sample size is found by setting the margin of error equal to the critical value (from the standard normal distribution) times the standard error and solving for n.
- The inputs are the desired accuracy, the confidence level, and an estimate of the standard deviation, often taken from an existing (pilot) sample.
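Solving A = Z · σ/√n for n gives n = (Z · σ / A)², and the analogous step for a proportion gives n = Z² · p · q / A². The sketch below assumes a large (effectively infinite) population and rounds up to the next whole unit; the function names and figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with Z_{1-α/2} · σ/√n <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


def sample_size_for_proportion(p, margin, confidence=0.95):
    """Smallest n with Z_{1-α/2} · sqrt(p·q/n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)


n_mean = sample_size_for_mean(sigma=10.0, margin=2.0)
n_prop = sample_size_for_proportion(p=0.5, margin=0.05)
print(n_mean, n_prop)  # 97 385
```

Using p = 0.5 gives the largest possible p · q and therefore the most cautious sample size when nothing is known about the proportion.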

Example Calculation for Sample Size Increase

Suppose the original sample size is 120 and the estimation error is to be reduced by 4%. The adjusted sample size follows from the required margin of error at the same confidence level.
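One way to work this example is sketched below. It rests on assumptions the notes do not state: the 4% is read as a relative reduction of the margin of error, the confidence level and standard deviation stay the same, and the margin of error is proportional to 1/√n, so the new size is n / (1 − 0.04)².

```python
from math import ceil


def enlarged_sample_size(n_original, relative_reduction):
    """New n so that the margin of error shrinks by the given relative share.

    Assumes margin of error is proportional to 1/sqrt(n), with the confidence
    level and standard deviation unchanged.
    """
    return ceil(n_original / (1 - relative_reduction) ** 2)


n_new = enlarged_sample_size(120, 0.04)
print(n_new)  # 131
```

Under this reading, about 11 additional observations are needed. If the 4% were instead an absolute target for the margin of error, the sample size would be found from the margin-of-error formula directly.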