
Chapter 5 Sampling Distributions Flashcards

5.1 Toward Statistical Inference

  • Parameters and Statistics:

    • A parameter describes a characteristic of a population; its value is usually unknown.

    • A statistic describes a characteristic of a sample; its value can be computed from the sample data and varies from sample to sample.

    • Statistics are used to estimate unknown parameters.

    • Mnemonic: "s and p" - statistics come from samples, parameters come from populations.

    • µ (mu) represents the population mean, and σ represents the population standard deviation.

    • \bar{x} (x-bar) represents the sample mean, and s represents the sample standard deviation.

  • Statistical Estimation:

    • Statistical inference involves using sample information to draw conclusions about the wider population.

    • Different random samples yield different statistics, necessitating the description of the sampling distribution of possible statistic values.

  • Sampling Variability:

    • Sampling variability refers to the variation of a statistic's value in repeated random sampling.

    • To understand sampling variability, consider what would happen if many samples were taken.

  • Sampling Distributions:

    • The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

    • It consists of all possible values of the statistic and the relative frequency of each value.

    • This distribution can be plotted using a histogram.

  • Simulation:

    • In practice, obtaining the actual sampling distribution by taking all possible samples is difficult.

    • Simulation can be used to imitate the process of taking many samples.
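The simulation idea can be sketched in Python with the standard library; the 0/1 population, its 60% success rate, and the sample size below are made-up values for illustration:

```python
import random
import statistics

# Hypothetical population: 1000 yes/no responses, 60% "yes" (an assumed value).
random.seed(1)
population = [1] * 600 + [0] * 400

# Imitate taking many SRSs of size 100, recording the sample proportion each time.
sample_props = []
for _ in range(2000):
    sample = random.sample(population, 100)  # sampling without replacement, like an SRS
    sample_props.append(sum(sample) / 100)

# The 2000 recorded statistics approximate the sampling distribution.
print(round(statistics.mean(sample_props), 3))   # centered near the parameter, 0.6
print(round(statistics.stdev(sample_props), 3))  # the sampling variability
```

Plotting a histogram of `sample_props` would display the approximate sampling distribution described above.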

  • Bias and Variability:

    • Bias concerns the center of the sampling distribution.

      • A statistic is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

    • Variability is the spread of the sampling distribution, determined by the sampling design and sample size n.

      • Larger samples have smaller spreads.

  • Analogy:

    • The true population parameter is like the bull’s-eye on a target, and the sample statistic is like an arrow fired at the target.

      • Bias and variability describe the pattern of many shots at the target.

  • Managing Bias and Variability:

    • Reduce bias by using random sampling.

    • Reduce variability by using a larger sample size.

    • The variability of a statistic from a random sample does not depend on the population size, as long as the population is at least 20 times larger than the sample.

  • Why Randomize?

    • The purpose of a sample is to provide information about a larger population, and inference is the process of drawing conclusions about a population based on sample data.

    • Reasons to use random sampling:

      1. Eliminates bias in the choice of the sample.

      2. Allows trustworthy inference using probability laws, including a margin of error.

      3. Larger random samples provide better information.

5.2 The Sampling Distribution of a Sample Mean

  • Population Distribution:

    • The population distribution is the distribution of values of a variable among all individuals in the population.

    • It is also the probability distribution of the variable when one individual is chosen at random.

    • In some cases, the population of interest does not actually exist, such as future exam scores.

  • Mean and Standard Deviation of a Sample Mean:

    • The mean of the sampling distribution of the sample mean \bar{x} is equal to the population mean µ, so \bar{x} is an unbiased estimator of µ.

    • The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample.

      • It is smaller than the standard deviation of the population by a factor of \sqrt{n}. Averages are less variable than individual observations.

    • \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
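A quick empirical check of the \sigma/\sqrt{n} formula; the Normal population parameters mu = 50, sigma = 12, and the sample size n = 36 are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(2)
mu, sigma, n = 50.0, 12.0, 36

# Draw many samples of size n from a N(mu, sigma) population; record each sample mean.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(5000)]

print(round(statistics.mean(means), 2))   # close to mu = 50 (unbiased)
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n) = 12/6 = 2
```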

  • The Sampling Distribution of Sample Means:

    • When choosing many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution.

  • The Central Limit Theorem:

    • As the sample size increases, the distribution of sample means begins to resemble a Normal distribution, regardless of the population distribution shape, provided the population has a finite standard deviation.
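A sketch of the central limit theorem in action: sample means drawn from a strongly right-skewed (exponential) population lose their skewness as n grows. The sample sizes and replication count below are arbitrary:

```python
import random
import statistics

random.seed(3)

# Strongly right-skewed population: exponential with mean 1.
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

skews = {}
for n in (2, 10, 50):
    means = [sample_mean(n) for _ in range(4000)]
    m = statistics.mean(means)
    s = statistics.stdev(means)
    # Standardized third moment; a Normal distribution has skewness 0.
    skews[n] = statistics.mean(((x - m) / s) ** 3 for x in means)
    print(n, round(skews[n], 2))
```

The printed skewness shrinks toward 0 as n increases, which is the CLT's claim: the shape of the sampling distribution approaches Normal regardless of the population's shape.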

  • A Few More Facts:

    • Any linear combination of independent Normal random variables is also Normally distributed.

    • More generally, the central limit theorem notes that the distribution of a sum or average of many small random quantities is close to Normal.

    • The central limit theorem also applies to discrete random variables.

      • An average of discrete random variables will never result in a continuous sampling distribution, but the Normal distribution often serves as a good approximation.

5.3 Sampling Distributions for Counts and Proportions

  • The Binomial Setting:

    • A binomial setting arises when performing several independent trials of the same chance process and recording the number of times a particular outcome (success) occurs.

    • The four conditions (BINS) are:

      • Binary: Outcomes can be classified as "success" or "failure."

      • Independent: Trials must be independent.

      • Number: The number of trials n must be fixed in advance.

      • Success: The probability p of success must be the same on every trial.

  • Binomial Distribution:

    • The count X of successes in a binomial setting has the binomial distribution with parameters n and p, denoted as X \sim B(n, p).

    • The possible values of X are whole numbers from 0 to n.

  • Form of the Binomial Distribution:

    • In a binomial setting with n trials and success probability p, the probability of exactly k successes is:

    • P(X = k) = {n \choose k} * p^k * (1 - p)^{(n-k)} = \frac{n!}{k! (n-k)!} * p^k * (1 - p)^{(n-k)}

    • k! means k(k – 1)(k – 2) . . . 2(1). For example, 5! = 5(4)(3)(2)(1) = 120 and 0! = 1.
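The binomial probability formula can be evaluated directly with Python's standard library; the values n = 5, k = 3, p = 0.5 below are arbitrary:

```python
from math import comb, factorial

# P(X = k) for X ~ B(n, p).
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 3 successes in 5 trials with p = 0.5.
print(binom_pmf(5, 3, 0.5))  # 10 * 0.125 * 0.25 = 0.3125

# The binomial coefficient matches the factorial form n! / (k!(n-k)!).
print(factorial(5) // (factorial(3) * factorial(2)))  # 10, same as comb(5, 3)
```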

  • Binomial Mean and Standard Deviation:

    • If X has a binomial distribution with n trials and success probability p, the mean and standard deviation of X are:

      • µ_X = np

      • σ_X = \sqrt{np(1 - p)}

    • Note: These formulas work ONLY for binomial distributions.
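A sketch verifying the shortcut formulas against the definitions of mean and variance, for an arbitrary choice of n = 20 and p = 0.25:

```python
from math import sqrt, comb

n, p = 20, 0.25

# Mean and standard deviation from the shortcut formulas.
mu = n * p                     # np = 5.0
sigma = sqrt(n * p * (1 - p))  # sqrt(3.75)

# Check against the definitions: sum over the entire distribution.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mu_direct = sum(k * pmf[k] for k in range(n + 1))
var_direct = sum((k - mu_direct) ** 2 * pmf[k] for k in range(n + 1))

print(round(mu, 4), round(mu_direct, 4))            # both 5.0
print(round(sigma, 4), round(sqrt(var_direct), 4))  # both agree
```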

  • Normal Approximation for Binomial Distributions:

    • When n is large, the distribution of X is approximately Normal with mean and standard deviation:

      • µ_X = np

      • σ_X = \sqrt{np(1 - p)}

    • Rule of thumb: Use the Normal approximation when np ≥ 10 and n(1 – p) ≥ 10.
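A sketch comparing an exact binomial probability with its Normal approximation, for an arbitrary n = 100 and p = 0.3 (so np = 30 and n(1 – p) = 70, satisfying the rule of thumb):

```python
from math import comb, sqrt, erf

n, p = 100, 0.3

# Exact binomial probability P(X <= 25).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(26))

# Normal approximation with mean np and sd sqrt(np(1-p)).
mu, sigma = n * p, sqrt(n * p * (1 - p))
def normal_cdf(x):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(round(exact, 4))
print(round(normal_cdf(25), 4))    # rough approximation
print(round(normal_cdf(25.5), 4))  # closer, using a continuity correction
```

Evaluating the Normal cdf at 25.5 rather than 25 (a continuity correction) typically tightens the approximation, since the binomial is discrete.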

  • Sample Proportion:

    • There is an important connection between the sample proportion \hat{p} and the number of “successes” X in the sample.

      • \hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n}

  • Sampling Distribution of a Sample Proportion:

    • Choose an SRS of size n from a population of size N with proportion p of successes. Let \hat{p} be the sample proportion of successes. Then:

      • The mean of the sampling distribution is p.

      • The standard deviation of the sampling distribution is σ_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}.

      • For large n, \hat{p} has approximately the N(p, \sqrt{\frac{p(1 - p)}{n}}) distribution.

  • Sampling Distribution of a Sample Proportion Example:
    Consider the previous online shopping example: what is the probability that at least 58% of a sample of 2500 adults agree?
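The "previous online shopping example" is not reproduced in these notes, so the population proportion used below (p = 0.60) is an assumed placeholder; under that assumption, the calculation runs:

```python
from math import sqrt, erf

# Assumed parameter value; the original example's p is not shown in these notes.
p, n = 0.60, 2500

# Sampling distribution of p-hat: mean p, sd sqrt(p(1-p)/n).
sigma = sqrt(p * (1 - p) / n)  # about 0.0098

# P(p-hat >= 0.58) = 1 - Phi((0.58 - p) / sigma), using the Normal approximation.
z = (0.58 - p) / sigma
prob = 1 - 0.5 * (1 + erf(z / sqrt(2)))
print(round(sigma, 4), round(z, 2), round(prob, 4))
```

With a different value of p from the actual example, only the first line changes; the structure of the calculation is the same.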

  • Normal Approximation for Counts and Proportions:

    • Draw an SRS of size n from a large population having population proportion p of successes. Let X be the count of successes in the sample and \hat{p} = X/n be the sample proportion of successes. When n is large, the sampling distributions of these statistics are approximately Normal:

      • X is approximately N(np, \sqrt{np(1 - p)}).

      • \hat{p} is approximately N(p, \sqrt{\frac{p(1 - p)}{n}}).

    • As a rule of thumb, use this approximation when np ≥ 10 and n(1 – p) ≥ 10.