Chapter 7: Probability and Samples: The Distribution of Sample Means

Chapter Learning Objectives

  • Define the distribution of sample means.

  • For a specific sampling situation, describe the distribution by identifying its:

    • Shape

    • Expected value of M (the sample mean)

    • Standard error of M

  • Use the unit normal table to find the probability that corresponds to a given z-score, or the z-score that corresponds to a given probability.

  • Understand that each sample mean $M$ has a specific location within the distribution of sample means, which can be quantified by a z-score.

  • Using the distribution of sample means, z-scores, and the unit normal table, determine probabilities corresponding to specific sample means.

Samples, Populations, and the Distribution of Sample Means

  • Individual Scores: Whenever a score is selected from a population, a z-score can be computed to describe its exact location in the distribution.

    • If the population is normal, it's possible to determine the probability value for obtaining any individual score.

  • Difficulty with Samples: Working with samples is challenging because a sample provides only an incomplete picture of the overall population.

  • Sampling Error: This is defined as the natural discrepancy or amount of error that exists between a sample statistic (e.g., sample mean $M$) and its corresponding population parameter (e.g., population mean $\mu$).

The Distribution of Sample Means

  • Definition: The distribution of sample means is the collection of all possible sample means for all the random samples of a particular size ($n$) that can be obtained from a population.

  • Sampling Distribution: More broadly, a sampling distribution is any distribution of statistics obtained by selecting all possible samples of a specific size from a population (e.g., distribution of sample variances, distribution of sample medians, etc.).
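
The definition above can be made concrete by listing every possible sample from a very small population. The sketch below is illustrative only: the population values (2, 4, 6, 8) and the sample size $n = 2$ are hypothetical choices, not taken from these notes, and sampling is assumed to be with replacement.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# Hypothetical population chosen for illustration (not from the notes).
population = [2, 4, 6, 8]
n = 2  # sample size

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# The frequency of each sample mean is the distribution of sample means.
distribution = Counter(sample_means)
for m in sorted(distribution):
    print(f"M = {m}: {distribution[m]} sample(s)")

# Summary values for the complete set of sample means.
mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
print("mean of sample means:", mean_of_means)
print("SD of sample means:", round(sd_of_means, 3))
print("sigma / sqrt(n):", round(pstdev(population) / n ** 0.5, 3))
```

For this small population there are 16 possible samples; their means average to the population mean of 5, and their standard deviation equals the population standard deviation divided by $\sqrt{n}$.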

Characteristics of the Distribution of Sample Means

  • Central Tendency: The sample means tend to pile up around the population mean $\mu$.

  • Shape: The pile of sample means generally tends to form a normal-shaped distribution.

  • Effect of Sample Size: As the sample size ($n$) increases, the sample means tend to be closer to the population mean $\mu$. This implies less variability among sample means for larger samples.
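
The effect of sample size can be checked with a small simulation. This is a rough sketch under assumed values: the normal population with $\mu = 50$ and $\sigma = 10$, the number of samples, and the helper name `simulate_sample_means` are all hypothetical, chosen only to illustrate the pattern.

```python
import random
from statistics import mean, pstdev

random.seed(7)  # fixed seed so the run is repeatable

MU, SIGMA = 50, 10    # hypothetical population parameters
NUM_SAMPLES = 2000    # number of random samples drawn per sample size


def simulate_sample_means(n, num_samples=NUM_SAMPLES):
    """Return the means of num_samples random samples of size n."""
    return [
        mean([random.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(num_samples)
    ]


# Spread (standard deviation) of the sample means for each sample size.
spread_by_n = {}
for n in (4, 25, 100):
    means = simulate_sample_means(n)
    spread_by_n[n] = pstdev(means)
    print(f"n = {n:3d}: mean of M = {mean(means):.2f}, "
          f"SD of M = {spread_by_n[n]:.2f}")
```

In each case the sample means should center near 50, while their spread shrinks as $n$ grows.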

The Central Limit Theorem (CLT)

  • Statement: For any population with a mean $\mu$ and a standard deviation $\sigma$, the distribution of sample means for sample size $n$ will have:

    • A mean of $\mu$.

    • A standard deviation of $\sigma / \sqrt{n}$ (this is known as the standard error of M).

    • A shape that approaches a normal distribution as $n$ approaches infinity.

  • Application: The CLT applies to the distribution of sample means for any population, regardless of its original shape, mean, or standard deviation.

  • Rate of Approach to Normality: The distribution of sample means approaches a normal distribution very rapidly. Even with a relatively small sample size, such as $n = 30$, the distribution is already almost perfectly normal.
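
The claim that the theorem holds regardless of the population's shape can be explored by sampling from a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), which is a hypothetical choice for illustration, and uses $n = 30$. It compares the simulated results with the values the theorem predicts, and uses the proportion of sample means within one standard error of $\mu$ (about 0.68 for a normal distribution) as a rough check on shape.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the run is repeatable

N = 30              # sample size
NUM_SAMPLES = 5000  # number of random samples drawn
MU = SIGMA = 1.0    # an exponential population with rate 1 has mean 1, SD 1

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

observed_mean = mean(sample_means)
observed_sd = pstdev(sample_means)
predicted_se = SIGMA / N ** 0.5  # sigma / sqrt(n)

# Proportion of sample means lying within one standard error of mu.
within_one_se = sum(
    abs(m - MU) <= predicted_se for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {observed_mean:.3f} (predicted {MU})")
print(f"SD of sample means:   {observed_sd:.3f} (predicted {predicted_se:.3f})")
print(f"proportion within one standard error: {within_one_se:.3f}")
```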

The Shape of the Distribution of Sample Means

  • The distribution of sample means is considered almost perfectly normal if at least one of the following two conditions is met:

    1. The population from which the samples are selected already has a normal distribution.

    2. The number of scores ($n$) in each sample is relatively large, typically around 30 or more.

The Standard Error of M

  • Definition: The standard error of $M$ (${\sigma_M}$) is defined as the standard deviation of the distribution of sample means. It quantifies the standard distance between a sample mean ($M$) and the population mean ($\mu$).

  • Significance: It serves as a measure of how accurately, on average, a sample mean represents its corresponding population mean. A smaller standard error indicates a more precise representation.

  • Formula: ${\sigma_M} = \sigma / \sqrt{n}$

Factors Determining the Magnitude of the Standard Error

  • The magnitude of the standard error of M is determined by two primary factors:

    1. The Size of the Sample ($n$):

      • Law of Large Numbers: This principle states that the larger the sample size ($n$), the more probable it is that the sample mean will be close to the population mean.

      • Relationship: The standard error is inversely related to the sample size. Because ${\sigma_M} = \sigma / \sqrt{n}$, it shrinks in proportion to $\sqrt{n}$; for example, quadrupling $n$ halves the standard error. Larger samples produce smaller error, so the sample means cluster more tightly around the population mean. Conversely, smaller samples produce larger error.

    2. The Standard Deviation of the Population ($\sigma$): The population standard deviation is the numerator of the standard error formula, so greater variability in the population produces a larger standard error.
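
Both factors can be seen by evaluating the formula ${\sigma_M} = \sigma / \sqrt{n}$ directly. The helper name `standard_error` and the values used below ($\sigma = 10$ and $\sigma = 20$) are hypothetical, chosen only to show the pattern.

```python
import math


def standard_error(sigma, n):
    """Standard error of M: sigma divided by the square root of n."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    if sigma < 0:
        raise ValueError("sigma cannot be negative")
    return sigma / math.sqrt(n)


# Factor 1: a larger sample size gives a smaller standard error.
for n in (1, 4, 25, 100):
    print(f"sigma = 10, n = {n:3d}: standard error = {standard_error(10, n):.2f}")

# Factor 2: a larger population standard deviation gives a larger standard error.
for sigma in (10, 20):
    print(f"sigma = {sigma}, n = 25: standard error = {standard_error(sigma, 25):.2f}")
```

Note that with $n = 1$ the standard error equals $\sigma$, since each "sample mean" is then just a single score.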

Probability and the Distribution of Sample Means

  • Using the Unit Normal Table: Because the distribution of sample means tends to be normal, z-score values obtained for sample means can be used with the unit normal table to determine probabilities.

  • Procedure Similarity: The general procedure for computing z-scores and finding probabilities for sample means is essentially the same as for individual scores.

  • Crucial Differences for Sample Means:

    • Always remember to consider the sample size ($n$) and first compute the standard error (${\sigma_M}$) before attempting any other calculations.

    • Before using the unit normal table, ensure that the distribution of sample means satisfies at least one of the criteria for a normal shape (either the population is normal, or $n \ge 30$).

z-Scores and Location within the Distribution of Sample Means

  • z-Score Formula for Sample Means: Within the distribution of sample means, the location of each sample mean ($M$) can be specified by a z-score:
    $z = \frac{M - \mu}{\sigma_M}$

  • Interpretation of z-Scores:

    • A positive z-score indicates that the sample mean ($M$) is greater than the population mean ($\mu$).

    • A negative z-score indicates that the sample mean ($M$) is smaller than the population mean ($\mu$).

    • The numerical value of the z-score quantifies the distance between $M$ and $\mu$, measured in units of the standard error (${\sigma_M}$). For example, $z = +2.00$ means $M$ is two standard errors above $\mu$.
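
As a small sketch of the z-score calculation for a sample mean, the function below first computes the standard error and then locates $M$ relative to $\mu$. The function name `z_for_sample_mean` and the numbers in the usage lines ($\mu = 80$, $\sigma = 12$, $n = 16$) are hypothetical values picked for illustration.

```python
import math


def z_for_sample_mean(m, mu, sigma, n):
    """Locate a sample mean M within the distribution of sample means."""
    sigma_m = sigma / math.sqrt(n)  # standard error of M
    return (m - mu) / sigma_m


# Hypothetical population: mu = 80, sigma = 12; samples of n = 16,
# so the standard error is 12 / 4 = 3.
z_above = z_for_sample_mean(86, mu=80, sigma=12, n=16)
z_below = z_for_sample_mean(77, mu=80, sigma=12, n=16)

print(z_above)  # 2.0: M is two standard errors above mu
print(z_below)  # -1.0: M is one standard error below mu
```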

Example: Probability with Sample Means (SAT Scores)

  • Scenario: A population of SAT scores forms a normal distribution with a mean $\mu = 500$ and a standard deviation $\sigma = 100$.

  • Question: If a random sample of $n = 25$ students is taken, what is the probability that the sample mean ($M$) will be greater than 540?

  • Restating as a Proportion Question: Out of all possible sample means from this population with $n = 25$, what proportion has values greater than 540?
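
One way to carry this example through is to apply the standard error and z-score formulas from the earlier sections. Because the population is normal, the distribution of sample means is also normal, so a normal probability lookup is appropriate. In the sketch below, `NormalDist` from Python's standard library stands in for the unit normal table; that substitution is an assumption of this sketch, not part of the original example.

```python
import math
from statistics import NormalDist

mu, sigma = 500, 100  # population mean and standard deviation (from the scenario)
n = 25                # sample size
m = 540               # sample mean of interest

# Step 1: standard error of M.
sigma_m = sigma / math.sqrt(n)  # 100 / 5 = 20.0

# Step 2: z-score for the sample mean.
z = (m - mu) / sigma_m          # (540 - 500) / 20 = 2.0

# Step 3: proportion of the normal distribution above z
# (the tail beyond z = 2.00 in the unit normal table).
p_greater = 1 - NormalDist().cdf(z)

print(sigma_m)              # 20.0
print(z)                    # 2.0
print(round(p_greater, 4))  # 0.0228
```

Under these steps, a sample mean greater than 540 corresponds to $z = +2.00$, and roughly 2.28% of all possible sample means with $n = 25$ fall above that value.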