eco1020_week3

Distinguishing Population and Sample

Introduction

Understanding the difference between a population and a sample is crucial in statistics for economics (ECO1020): it forms the basis for statistical inference, allowing researchers to draw conclusions about a larger group from observations on a subset.

Population vs. Sample

Definition:

  • Let Y be a random variable with a probability distribution.

  • The values that Y can take represent the population, which is the entire set of observations or measurements of interest.

  • A sample of size n is drawn from this population, represented as (Y1, Y2, ..., Yn). The sample consists of n observations that are expected to represent the population accurately.

  • Assumption: To simplify analysis, it is often convenient to assume that the sample is random, meaning that it should reflect the characteristics of the population without bias.

Random Sampling

Characteristics of a Random Sample:

A sample (Y1, Y2, ..., Yn) is considered a random sample if:

  • All Yi's are independent draws from the same probability distribution, ensuring representativeness.

  • Each element has an equal chance of selection, which means there is no favoritism in the selection process.

  • The selection of one element does not affect the probability of another being selected (statistical independence), reinforcing the randomness of the sample.

These characteristics imply that the Yi's are independent and identically distributed (i.i.d) random variables, which is vital for many statistical methods.
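As a sketch of what i.i.d. sampling looks like in practice, the following Python snippet (using NumPy, with an illustrative normal population whose mean and standard deviation are chosen arbitrarily) draws a random sample of size n = 100:

```python
import numpy as np

# Hypothetical population: normal with mean 5 and standard deviation 2.
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=5.0, scale=2.0, size=100)

# Each of the 100 draws is independent and comes from the same
# distribution, so (Y1, ..., Y100) is i.i.d. by construction.
print(sample.shape)
```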

Using the Sample

Purpose of Sampling:

A random sample serves several key purposes in statistical analysis, including:

  • Estimating population moments (mean, variance): This helps in understanding the overall attributes of the population.

  • Obtaining a point estimate for the parameter of interest: A single value that serves as the best guess for a population parameter, like the mean.

  • Testing hypotheses regarding the population, which includes determining if there is enough evidence to support or refute specific claims about population characteristics.
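These purposes can be illustrated with a short simulation; the population parameters below (mean 5, variance 4) are hypothetical choices for the sketch:

```python
import numpy as np

# Hypothetical population: normal with mu_Y = 5 and sigma^2_Y = 4.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)

# Point estimates of the population moments:
mean_hat = sample.mean()      # estimates the population mean mu_Y
var_hat = sample.var(ddof=1)  # unbiased estimate of sigma^2_Y

print(mean_hat, var_hat)  # should land close to 5 and 4
```

With ddof=1 the variance estimator divides by n - 1 rather than n, which makes it unbiased.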

Population, Sample, and Probability

Understanding:

A clear understanding of the distinctions and relationships between population and sample is essential in making credible inferences and predictions based on statistical analysis.

Sampling Distribution of the Sample Average

Sample Mean:

The sample average (mean) is denoted Ȳ.

  • Formula: Ȳ = (1/n) * ΣYi for i = 1 to n.

  • Ȳ is itself random because it is a function of the random variables Yi, and its distribution reflects the properties of the underlying population.

  • The mean of Ȳ, denoted E(Ȳ), equals the population mean µY, which is instrumental in statistical calculations.

Variance of Ȳ

Calculating Variance:
  • Formula: var(Ȳ) = (1/n²) * Σvar(Yi) + (1/n²) * ΣΣ cov(Yi, Yj), where the double sum runs over i ≠ j.

  • For independent Yi's the covariance terms are zero, so this simplifies to: var(Ȳ) = σ²Y / n, where σ²Y is the population variance.

  • Standard Deviation: σȲ = √var(Ȳ) = σY/√n. This metric describes the variability of the sample mean around the population mean.
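The formula var(Ȳ) = σ²Y/n can be checked by simulation; the numbers below (σ²Y = 4, n = 25) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sigma2, n, reps = 4.0, 25, 100_000

# Draw many independent samples of size n and compute each sample mean.
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
ybar = samples.mean(axis=1)

# The empirical variance of Ybar should be close to sigma^2 / n = 0.16.
print(ybar.var())
```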

Large Sample Approximations

Sampling Distribution Characteristics:

  • If Y is normally distributed (Y ∼ N(µY , σ²Y )), then the sample average is also normally distributed: Ȳ ∼ N(µY , σ²Y/n).

  • Even if Y is not normally distributed, the Central Limit Theorem states that as the sample size n increases, the distribution of the sample average Ȳ becomes approximately normal, which gives great flexibility in the statistical methods that can be used.
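A quick way to see the Central Limit Theorem at work is to start from a clearly non-normal population; the exponential distribution used here is one arbitrary choice (it is skewed, with mean 1 and standard deviation 1):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 200, 20_000

# Skewed, non-normal population: exponential with mean 1 (and sigma = 1).
ybar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardize the sample average: (Ybar - mu_Y) / (sigma_Y / sqrt(n)).
z = (ybar - 1.0) / (1.0 / np.sqrt(n))

# By the CLT, z should look approximately standard normal.
print(z.mean(), z.std())  # close to 0 and 1 respectively
```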

Bernoulli Random Variable Example

Distribution:

Consider Y as a Bernoulli random variable defined as:

  • Pr(Y = 1) = 0.78;

  • Pr(Y = 0) = 0.22.

This example shows how probabilities can be calculated and the expected values derived from these distributions.

  • Expected value E(Y) = 0.78, representing an anticipated outcome.

  • Variance: var(Y) = p(1-p) = 0.78 × 0.22 = 0.1716, indicating the degree of variability in the data.
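These two quantities follow directly from the Bernoulli formulas and can be verified in a couple of lines of Python:

```python
p = 0.78  # success probability from the example above

# E(Y) = 1*p + 0*(1 - p) = p
expected = p

# var(Y) = E(Y^2) - E(Y)^2 = p - p^2 = p*(1 - p)
variance = p * (1 - p)

print(expected, variance)  # 0.78 and (up to rounding) 0.1716
```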

Sampling Distribution with n = 2

Probabilities:
  • Pr(Ȳ = 0) = 0.22² = 0.0484;

  • Pr(Ȳ = 1/2) = 2 × 0.78 × 0.22 = 0.3432;

  • Pr(Ȳ = 1) = 0.78² = 0.6084.
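These three probabilities come from enumerating the four possible outcomes (Y1, Y2) ∈ {0, 1}², which a short script can reproduce:

```python
from itertools import product

p = 0.78  # Pr(Y = 1) from the Bernoulli example

# Accumulate the probability of each possible value of the sample average.
dist = {}
for y1, y2 in product([0, 1], repeat=2):
    prob = (p if y1 else 1 - p) * (p if y2 else 1 - p)
    ybar = (y1 + y2) / 2
    dist[ybar] = dist.get(ybar, 0.0) + prob

for value, probability in sorted(dist.items()):
    print(value, probability)  # 0.0484, 0.3432, 0.6084 (up to rounding)
```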

Standardization of Distribution

Standardization Process:

This process is essential for comparison:

  1. Re-center the distribution around mean 0 to simplify analysis.

  2. Divide by standard deviation to adjust the scale, allowing different distributions to be compared effectively.
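The two steps map directly onto code; the data here are simulated from an arbitrary normal distribution just to have something to standardize:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
y = rng.normal(loc=10.0, scale=3.0, size=10_000)

# Step 1: re-center around mean 0.  Step 2: divide by the standard deviation.
z = (y - y.mean()) / y.std()

print(z.mean(), z.std())  # mean 0 and standard deviation 1, up to rounding
```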

Large Sample Theorem

Principle:

As n increases, the sampling distribution of Ȳ tightens around µY (Law of Large Numbers), ensuring consistency of the estimator.

  • The standardized sample average (Ȳ - µY)/σȲ approaches a standard normal distribution as the sample size increases (Central Limit Theorem).

Convergence in Probability and Consistency

Concepts:

The sample average Ȳ converges in probability to μY, which means that:

  • For any c > 0, the probability that Ȳ lies in the interval [μY - c, μY + c] approaches 1 as n increases.
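This definition can be illustrated by simulation: fix a small c and watch the probability climb toward 1 as n grows. The half-width c = 0.1 and the standard normal population are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, c, reps = 0.0, 0.1, 10_000

# For each n, estimate Pr(mu - c <= Ybar <= mu + c) by simulation.
coverages = []
for n in (10, 100, 1000):
    ybar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    coverages.append(float(np.mean(np.abs(ybar - mu) <= c)))
    print(n, coverages[-1])  # the estimated probability rises with n
```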

The Law of Large Numbers

Assumptions:

  • Yi's should be independent and identically distributed.

  • Expectation E[Yi] = μY must hold for valid conclusions.

  • If large outliers are unlikely (in particular, if the variance of Yi is finite), then Ȳ converges in probability to μY, which underpins the reliability of the estimator.

Central Limit Theorem

Statement:

For i.i.d. Yi's with E(Yi) = μY and var(Yi) = σ²Y, where 0 < σ²Y < ∞, as n approaches infinity the distribution of the standardized sample average (Ȳ - μY)/σȲ is well approximated by the standard normal distribution, which is the basis for inference about the sample mean relative to the population mean.