Sampling Distributions and Estimation

Sample statistic: A random variable whose value depends on which population items are included in the random sample.
Sampling distribution: A probability distribution of all possible values of a sample statistic for a given sample size selected from a population.
The sample statistic's ability to represent the population accurately depends on the sample size.
Sampling variation is illustrated using eight random samples of size $n = 5$ from a large population of GMAT scores.

Estimator: A statistic derived from a sample to infer the value of a population parameter.
Estimate: The value of the estimator in a particular sample.
Population parameters are represented by Greek letters, while corresponding statistics are represented by Roman letters.
- Sample mean ( $\bar{x}$ ) is the estimator for the population mean ( $\mu$ ).
- Sample proportion ( $p$ ) is the estimator for the population proportion ( $\pi$ ).
- Sample standard deviation ( $s$ ) is the estimator for the population standard deviation ( $\sigma$ ).
Sampling error: The difference between an estimate and the corresponding population parameter. For example:
- $\text{Sampling Error} = \bar{x} - \mu$
Bias: The difference between the expected value of the estimator and the true parameter.
- $\text{Bias} = E(\bar{X}) - \mu$
An estimator is unbiased if its expected value is the parameter being estimated.
- The sample mean is an unbiased estimator of the population mean since: $E(\bar{X}) = \mu$ . On average, an unbiased estimator neither overstates nor understates the true parameter.
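The unbiasedness claim can be checked by brute force on a very small population. The sketch below is only an illustration: the four-value population and the sample size of 2 are hypothetical, chosen so that every possible sample drawn with replacement can be listed. It builds the full sampling distribution of the sample mean and compares its expected value with $\mu$.

```python
from itertools import product
from statistics import mean

# Hypothetical population, chosen only so that every sample can be enumerated.
population = [2, 4, 6, 8]
n = 2  # sample size

# All equally likely ordered samples of size n, drawn with replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)                      # population mean
expected_sample_mean = mean(sample_means)  # E(X-bar) over all possible samples
bias = expected_sample_mean - mu           # Bias = E(X-bar) - mu

print("population mean:", mu)
print("mean of all sample means:", expected_sample_mean)
print("bias:", bias)
```

Individual sample means range from 2 to 8, so any single estimate can have a nonzero sampling error, yet the bias across all possible samples is zero.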

Different samples of the same size from the same population will yield different sample means.
Standard Error of the Mean: The standard deviation of the sampling distribution of the mean, that is, of all possible sample means for samples of size $n$. It measures how much the sample mean varies from sample to sample.
- $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ , assuming sampling with replacement or without replacement from an infinite population.
- The standard error of the mean decreases as the sample size increases.
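A minimal sketch of the formula above shows how quickly the standard error shrinks. The function name `standard_error_of_mean` and the value $\sigma = 15$ are assumptions made for illustration only.

```python
import math

def standard_error_of_mean(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the standard error of the sample mean."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)

# sigma = 15 is an illustrative value, not data from a real population.
standard_errors = {n: standard_error_of_mean(15, n) for n in (4, 25, 100)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.2f}")
```

Because $n$ sits under a square root, the sample size has to be quadrupled to cut the standard error in half.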

If a population is normal with mean $\mu$ and standard deviation $\sigma$ , the sampling distribution of $\bar{x}$ is also normally distributed with:
- $\mu_{\bar{x}} = \mu$
- $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Z-value for Sampling Distribution of the Mean:
- $Z = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$
- Where:
  - $\bar{X}$ = sample mean
  - $\mu$ = population mean
  - $\sigma$ = population standard deviation
  - $n$ = sample size

To find a symmetrically distributed interval around $\mu$ that will include 95% of the sample means when $\mu = 368$ , $\sigma = 15$ , and $n = 25$ :
- Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
- Since the interval is symmetric, 2.5% will be above the upper limit, and 2.5% will be below the lower limit.
- From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96, and the Z score with 2.5% (0.0250) above it is 1.96.
- Calculating the lower limit of the interval:
  - $\bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\left(\frac{15}{\sqrt{25}}\right) = 362.12$
- Calculating the upper limit of the interval:
  - $\bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\left(\frac{15}{\sqrt{25}}\right) = 373.88$
- Based on samples of size 25, the sample means in 95% of all samples are between 362.12 and 373.88.
Generalized equation for the interval that contains a specified percentage of all sample means:
- $\mu \pm Z\frac{\sigma}{\sqrt{n}}$
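The interval calculation above can be reproduced in a few lines. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `sample_mean_interval` is hypothetical. The Z score is computed from the inverse normal CDF rather than read from a table.

```python
import math
from statistics import NormalDist

def sample_mean_interval(mu: float, sigma: float, n: int, coverage: float = 0.95):
    """Symmetric interval around mu containing `coverage` of all sample means."""
    tail = (1 - coverage) / 2              # area left in each tail, e.g. 0.025
    z = NormalDist().inv_cdf(1 - tail)     # about 1.96 when coverage is 0.95
    margin = z * sigma / math.sqrt(n)      # Z times the standard error
    return mu - margin, mu + margin

lower, upper = sample_mean_interval(mu=368, sigma=15, n=25)
print(round(lower, 2), round(upper, 2))    # 362.12 373.88
```

Changing `coverage` gives the interval for any other percentage of sample means, which is the generalized equation in code form.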

$\pi$ = the proportion of the population having some characteristic.
$\sigma^2$ of a proportion is defined as $\pi(1 - \pi)$ , so:
$\sigma$ of a proportion is defined as $\sqrt{\pi(1 - \pi)}$
Sample proportion ( $p$ ) provides an estimate of $\pi$ :
- $p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
- $0 \leq p \leq 1$
- $p$ is approximately distributed as a normal distribution when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population).

Sampling Distribution of p is approximated by a normal distribution if:
- $n\pi > 5$ and $n(1 - \pi) > 5$
- Where: $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
- Where: $\pi$ is the population proportion.
Z-Value for Proportions
- Standardize p to a Z value with the formula:
  - $Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$
Example: If the true proportion of voters who support Proposition A is $\pi = 0.4$ , what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
- i.e.: if $\pi = 0.4$ and $n = 200$ , what is $P(0.40 \leq p \leq 0.45)$ ?
  - Find $\sigma_p$ :
    - $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.40(1-0.40)}{200}} = 0.03464$
  - $P(0.40 \leq p \leq 0.45) = P(\frac{0.40 - 0.40}{0.03464} \leq Z \leq \frac{0.45 - 0.40}{0.03464}) = P(0 \leq Z \leq 1.44)$
  - Utilize the cumulative normal table:
    - $P(0 \leq Z \leq 1.44) = 0.9251 - 0.5000 = 0.4251$
    - Rounded to two significant digits: 0.43, or 43%
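The Proposition A example can be checked numerically. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `prob_sample_proportion_between` is hypothetical. It first applies the $n\pi > 5$ and $n(1 - \pi) > 5$ check, then standardizes both endpoints.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_between(pi: float, n: int, low: float, high: float) -> float:
    """Normal-approximation probability that the sample proportion p lies in [low, high]."""
    # Rule of thumb from the notes for using the normal approximation.
    if n * pi <= 5 or n * (1 - pi) <= 5:
        raise ValueError("normal approximation not appropriate for this n and pi")
    sigma_p = math.sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    std_normal = NormalDist()                # standard normal: mean 0, sd 1
    z_low = (low - pi) / sigma_p
    z_high = (high - pi) / sigma_p
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

prob = prob_sample_proportion_between(pi=0.4, n=200, low=0.40, high=0.45)
print(round(prob, 2))  # 0.43
```

The unrounded result differs from the table value of 0.4251 in the fourth decimal place because the code does not round Z to 1.44 before looking up the probability.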