Parameters and Statistics:
A parameter describes a characteristic of a population; its value is usually unknown.
A statistic describes a characteristic of a sample; its value can be computed from the sample data and varies from sample to sample.
Statistics are used to estimate unknown parameters.
Mnemonic: "s and p" - statistics come from samples, parameters come from populations.
µ (mu) represents the population mean, and σ represents the population standard deviation.
\bar{x} (x-bar) represents the sample mean, and s represents the sample standard deviation.
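To make the sample quantities concrete, here is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical values invented for illustration:

```python
import statistics

# A hypothetical sample of 8 exam scores drawn from a larger population.
sample = [72, 85, 90, 68, 77, 95, 81, 88]

x_bar = statistics.mean(sample)  # sample mean, written x-bar
s = statistics.stdev(sample)     # sample standard deviation s (divides by n - 1)

print(x_bar)         # 82.0
print(round(s, 2))   # 9.26
```

The population mean µ and standard deviation σ of the underlying population remain unknown; \bar{x} and s are the statistics we would use to estimate them.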
Statistical Estimation:
Statistical inference involves using sample information to draw conclusions about the wider population.
Different random samples yield different statistics, necessitating the description of the sampling distribution of possible statistic values.
Sampling Variability:
Sampling variability refers to the variation of a statistic's value in repeated random sampling.
To understand sampling variability, consider what would happen if many samples were taken.
Sampling Distributions:
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
It consists of all possible values of the statistic and the relative frequency of each value.
This distribution can be plotted using a histogram.
Simulation:
In practice, obtaining the actual sampling distribution by taking all possible samples is difficult.
Simulation can be used to imitate the process of taking many samples.
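The simulation idea can be sketched with the standard library alone. The population below is hypothetical (10,000 values generated from a Normal model); the point is only the process of drawing many samples and recording each sample mean:

```python
import random
import statistics

random.seed(1)

# Hypothetical population of 10,000 values (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(10_000)]

# Imitate taking many SRSs of size n and record each sample mean.
n = 25
sample_means = [
    statistics.mean(random.sample(population, n))
    for _ in range(2_000)
]

# The 2,000 recorded means approximate the sampling distribution of x-bar.
print(round(statistics.mean(sample_means), 1))   # close to the population mean
print(round(statistics.stdev(sample_means), 1))  # much smaller than the population SD
```

Plotting `sample_means` as a histogram would display the approximate sampling distribution described above.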
Bias and Variability:
Bias concerns the center of the sampling distribution.
A statistic is unbiased if the mean of its sampling distribution equals the true value of the parameter being estimated.
Variability is the spread of the sampling distribution, determined by the sampling design and sample size n.
Larger samples have smaller spreads.
Analogy:
The true population parameter is like the bull’s-eye on a target, and the sample statistic is like an arrow fired at the target.
Bias and variability describe the pattern of many shots at the target.
Managing Bias and Variability:
Reduce bias by using random sampling.
Reduce variability by using a larger sample size.
The variability of a statistic from a random sample does not depend on the population size, as long as the population is at least 20 times larger than the sample.
Why Randomize?
The purpose of a sample is to provide information about a larger population, and inference is the process of drawing conclusions about a population based on sample data.
Reasons to use random sampling:
Eliminates bias in choosing the sample.
Allows trustworthy inference using probability laws, including a margin of error.
Larger random samples provide better information.
Population Distribution:
The population distribution is the distribution of values of a variable among all individuals in the population.
It is also the probability distribution of the variable when one individual is chosen at random.
In some cases, the population of interest does not actually exist, such as future exam scores.
Mean and Standard Deviation of a Sample Mean:
The mean of the sampling distribution of the sample mean equals the population mean µ, so \bar{x} is an unbiased estimator of µ.
The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample.
It is smaller than the standard deviation of the population by a factor of \sqrt{n}. Averages are less variable than individual observations.
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
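A quick arithmetic check of the formula, with hypothetical values σ = 12 and n = 36:

```python
import math

# If the population SD is sigma = 12 and we average n = 36 observations,
# the SD of the sample mean is sigma / sqrt(n).
sigma, n = 12, 36
sigma_xbar = sigma / math.sqrt(n)
print(sigma_xbar)  # 2.0
```

Averaging 36 observations cuts the variability by a factor of 6, illustrating why averages are less variable than individual observations.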
The Sampling Distribution of Sample Means:
When choosing many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution.
The Central Limit Theorem:
As the sample size increases, the distribution of sample means begins to resemble a Normal distribution, regardless of the population distribution shape, provided the population has a finite standard deviation.
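The theorem can be illustrated even with a strongly skewed population. The sketch below draws from an exponential distribution (mean 1), which is far from Normal, yet the averages of n = 64 draws behave symmetrically around 1:

```python
import random
import statistics

random.seed(3)

# Strongly right-skewed population: exponential with mean 1.
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Distribution of averages of n = 64 skewed observations.
means = [sample_mean(64) for _ in range(2_000)]

print(round(statistics.mean(means), 2))    # near 1.0, the population mean
print(round(statistics.median(means), 2))  # median close to mean: roughly symmetric
```

A histogram of `means` would look close to the N(1, 1/8) curve the theorem predicts, even though individual observations are heavily skewed.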
A Few More Facts:
Any linear combination of independent Normal random variables is also Normally distributed.
More generally, the central limit theorem notes that the distribution of a sum or average of many small random quantities is close to Normal.
The central limit theorem also applies to discrete random variables.
An average of discrete random variables will never result in a continuous sampling distribution, but the Normal distribution often serves as a good approximation.
The Binomial Setting:
A binomial setting arises when performing several independent trials of the same chance process and recording the number of times a particular outcome (success) occurs.
The four conditions (BINS) are:
Binary: Outcomes can be classified as "success" or "failure."
Independent: Trials must be independent.
Number: The number of trials n must be fixed in advance.
Success: The probability p of success must be the same on every trial.
Binomial Distribution:
The count X of successes in a binomial setting has the binomial distribution with parameters n and p, denoted as X \sim B(n, p).
The possible values of X are whole numbers from 0 to n.
Form of the Binomial Distribution:
In a binomial setting with n trials and success probability p, the probability of exactly k successes is:
P(X = k) = {n \choose k} p^k (1 - p)^{n-k} = \frac{n!}{k!(n-k)!} p^k (1 - p)^{n-k}
k! means k(k − 1)(k − 2) ⋯ (2)(1). For example, 5! = 5(4)(3)(2)(1) = 120, and 0! = 1.
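The formula above translates directly into code. A minimal sketch using `math.comb` for the binomial coefficient:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), straight from the binomial formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 2 successes in 5 trials with p = 0.5.
print(binomial_pmf(2, 5, 0.5))  # C(5,2) * 0.5^5 = 10/32 = 0.3125
print(math.factorial(5))        # 5! = 120
```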
Binomial Mean and Standard Deviation:
If X has a binomial distribution with n trials and success probability p, the mean and standard deviation of X are:
µ_X = np
σ_X = \sqrt{np(1 - p)}
Note: These formulas work ONLY for binomial distributions.
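A quick check of both formulas with hypothetical values n = 100, p = 0.25:

```python
import math

n, p = 100, 0.25
mu = n * p                        # mean of X
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of X

print(mu)               # 25.0
print(round(sigma, 2))  # 4.33, i.e. sqrt(100 * 0.25 * 0.75)
```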
Normal Approximation for Binomial Distributions:
When n is large, the distribution of X is approximately Normal with mean and standard deviation:
µ_X = np
σ_X = \sqrt{np(1 - p)}
Rule of thumb: Use the Normal approximation when np ≥ 10 and n(1 – p) ≥ 10.
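The quality of the approximation can be checked directly, since the exact binomial probability can be summed from the formula and the Normal probability computed with the error function. The values n = 100, p = 0.4 below are hypothetical and satisfy the rule of thumb (np = 40, n(1 − p) = 60):

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) by summing the binomial formula."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma), computed via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 100, 0.4                          # np = 40 and n(1 - p) = 60, both >= 10
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = binomial_cdf(45, n, p)
approx = normal_cdf(45, mu, sigma)
print(round(exact, 3), round(approx, 3))  # the two values are close
```

The small remaining gap comes from approximating a discrete distribution by a continuous one; it shrinks as n grows.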
Sample Proportion:
There is an important connection between the sample proportion \hat{p} and the number of “successes” X in the sample.
\hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n}
Sampling Distribution of a Sample Proportion:
Choose an SRS of size n from a population of size N with proportion p of successes. Let \hat{p} be the sample proportion of successes. Then:
The mean of the sampling distribution is p.
The standard deviation of the sampling distribution is σ_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}.
For large n, \hat{p} has approximately the N(p, \sqrt{\frac{p(1 - p)}{n}}) distribution.
Sampling Distribution of a Sample Proportion Example:
Continuing the earlier online shopping example: what is the probability that at least 58% of an SRS of 2500 adults agree?
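A worked sketch of the calculation. The true proportion p = 0.60 used below is a hypothetical stand-in, since the earlier example's value is not reproduced here; with that assumption, standardize 0.58 and use the Normal approximation:

```python
import math

# Hypothetical setup: suppose the true population proportion is p = 0.60
# (the actual value comes from the earlier example, not shown here).
p, n = 0.60, 2500

sigma_phat = math.sqrt(p * (1 - p) / n)  # SD of the sampling distribution of p-hat
z = (0.58 - p) / sigma_phat              # standardize 0.58

# P(p-hat >= 0.58) = 1 - Phi(z), with Phi computed via the error function.
prob = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(sigma_phat, 4))  # 0.0098
print(round(prob, 3))        # 0.979
```

With such a large sample, the sampling distribution of \hat{p} is tightly concentrated near p, so a result of 58% or more is very likely under the assumed p = 0.60.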
Normal Approximation for Counts and Proportions:
Draw an SRS of size n from a large population having population proportion p of successes. Let X be the count of successes in the sample and \hat{p} = X/n be the sample proportion of successes. When n is large, the sampling distributions of these statistics are approximately Normal:
X is approximately N(np, \sqrt{np(1 - p)}).
\hat{p} is approximately N(p, \sqrt{\frac{p(1 - p)}{n}}).
As a rule of thumb, use this approximation for values of n and p that satisfy np ≥ 10 and n(1 − p) ≥ 10.