Study Notes on Sampling Distributions and Standard Error

Introduction

  • Content is based on the textbook "Statistics, Biology, and R" by Arndt F. Laemmerzahl.

Samples and Populations

  • Objective: Estimate average reaction time for adults at a concussion clinic.
  • Population: All eligible adults seen at the concussion clinic during the study period under the same reaction-time testing protocol. This is the group from which we intend to draw conclusions.
  • Sample: A randomly selected group of 25 adults tested under the same protocol.
  • Sample Size (n): 25
  • Example Measurement: Average reaction time from the sample is 362 ms (milliseconds).
  • Distribution Characteristics: The sample distribution of individual reaction times is right-skewed (the right tail is longer). A few slower reaction times pull the distribution to the right.

Repeating the Sampling Process

  • Second Sample (Sample B): Another random sample of 25 adults, with an average reaction time of 315 ms.
  • Distribution Characteristics of Sample B:
    • Distribution appears closer to normal.
    • Possible explanations:
      • Random sampling variation: Each sample consists of different individuals, so shapes and means can differ even when the samples are drawn from the same population.
      • Skew and outliers: In right-skewed data, a few very slow responses pull the average upward.
      • Sample B has a more balanced mix of faster and slower times around its mean, leading to a more symmetric histogram.

Sampling Distribution

  • Definitions:
    • Sample Distribution: Refers to the distribution of individual observations within one sample (in this case, the 25 reaction times).
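    • Sampling Distribution: Refers to the distribution of a statistic (here, the sample mean) across many repeated random samples of the same size, rather than the distribution of individual observations.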
  • Constructing Sampling Distribution:
    • By repeatedly taking random samples of size 25 and calculating their means, you can create a histogram of sample means.
  • Purpose of Focusing on Sample Means:
    • Estimate population average reaction time (µ) directly using the sample mean (𝑦̅).
    • Sample mean provides a summary that smooths variability from individual data points.
  • Sampling Distribution Behavior:
    • Initially, with 12 samples, the distribution of sample means did not appear normal due to insufficient samples.
    • Increasing the number of samples to 120 starts to show a bell-shaped distribution.
    • Further increase to 15,000 samples continues to shape the distribution more clearly.
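
The construction described above can be sketched as a small simulation. This is a minimal illustration, not the textbook's code: the right-skewed population (a 200 ms floor plus a gamma-distributed delay, giving a mean of 320 ms) and the function names are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def draw_sample(n=25):
    # Hypothetical right-skewed population of reaction times (ms):
    # a 200 ms floor plus a gamma-distributed delay with mean 2 * 60 = 120 ms.
    return [200 + random.gammavariate(2.0, 60.0) for _ in range(n)]

def sample_means(num_samples, n=25):
    # One mean per random sample of size n.
    return [statistics.fmean(draw_sample(n)) for _ in range(num_samples)]

for num_samples in (12, 120, 15000):
    means = sample_means(num_samples)
    print(num_samples,
          round(statistics.fmean(means), 1),   # center of the sample means
          round(statistics.stdev(means), 1))   # spread of the sample means
```

With only 12 sample means the center and spread are still noisy; with 15,000 the center should settle near the assumed population mean and the spread near the population SD divided by the square root of 25. Plotting a histogram of `means` at each stage would show the bell shape emerging.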

Understanding the Sampling Distribution

  • Key Takeaways:
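    • The mean of a single sample can differ markedly from the population mean (µ) because of random sampling variation; the sample's shape, center, and outliers all change from one sample to the next.
    • Collecting more samples stabilizes the distribution of sample means, centering it close to µ and revealing the underlying distribution without altering it.
    • The Central Limit Theorem explains this behavior: for sufficiently large n, the distribution of sample means is approximately normal even when the population itself is skewed.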

Standard Error

  • Definition: The standard deviation of the sampling distribution of the sample mean. It measures how precisely a sample mean estimates the true population mean.
  • Formula for Standard Error:
    SE = \frac{s}{\sqrt{n}}, where s is the sample standard deviation and n is the sample size.
  • Distinguishing Standard Deviation from Standard Error:
    • Standard Deviation (SD): Represents the spread of individual data points within a sample relative to the sample mean.
    • Standard Error (SE): Reflects the variation of sample means across multiple samples relative to the population mean.
  • Estimation: The true standard error is \frac{\sigma}{\sqrt{n}}. Since the true value of σ is usually unknown, we substitute the sample standard deviation s, giving the estimate SE = \frac{s}{\sqrt{n}}.
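
To make the SD versus SE distinction concrete, here is a minimal sketch that computes both from one sample. The ten reaction times are made-up values (chosen so that their mean is 362 ms), not data from the notes.

```python
import math
import statistics

# Hypothetical reaction times (ms) for one small sample; not data from the notes.
sample = [298, 310, 325, 331, 340, 352, 367, 389, 421, 487]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)   # sample SD (n - 1 denominator): spread of individuals
se = s / math.sqrt(n)          # estimated SE: spread expected among sample means

print(f"n = {n}, mean = {mean:.1f} ms, s = {s:.1f} ms, SE = {se:.1f} ms")
```

The SD describes how far individual reaction times sit from the sample mean, while the SE is smaller by a factor of the square root of n because it describes the mean rather than the individuals.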

Reliability of Estimates Based on Standard Error

  • Interpretation of SE:
    • A small SE indicates a reliable estimate of µ by the sample mean.
    • A large SE suggests poor estimation of µ.
  • Limitations of SE:
    • SE does not provide information on the spread of individual data points in the sample (which is captured by standard deviation).
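
One way to check the interpretation above is to compare the observed spread of many sample means with the population SD divided by the square root of n. The sketch below assumes a hypothetical normal population (mean 350 ms, SD 80 ms); those values and the function name are illustrative only.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

MU, SIGMA = 350.0, 80.0  # hypothetical population mean and SD (ms)

def sd_of_sample_means(n, num_samples=5000):
    # Draw num_samples samples of size n and return the SD of their means.
    means = [
        statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

for n in (25, 100):
    observed = sd_of_sample_means(n)
    theoretical = SIGMA / math.sqrt(n)
    print(f"n = {n}: observed {observed:.2f}, sigma/sqrt(n) = {theoretical:.2f}")
```

The observed and theoretical values should agree closely, and both shrink as n grows, which is why a small SE signals a more reliable estimate of µ.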

Impact of Sample Size

  • Effect on 𝑦̅:
    • Larger sample sizes lead to better representation of the population, resulting in means closer to the true population mean.
    • Conversely, means from smaller samples fluctuate more and tend to fall further from the true µ.
    • As n approaches infinity, the sample mean converges to the population mean: \bar{Y} \to \mu.
  • Effect on σ:
    • σ is a fixed property of the population and does not depend on sample size. A larger sample, however, gives a better estimate of σ.
  • Effect on s (Sample SD):
    • The sample SD s varies from sample to sample, and how much it varies depends on sample size.
    • Larger samples yield a sample SD that more closely represents the population SD.
    • Conversely, smaller samples yield more variable sample SDs (not systematically larger ones) that represent the population less reliably.
  • Effect on SE Based on Sample Size:
    • As the sample size increases, SE = \frac{s}{\sqrt{n}} decreases, leading to a more reliable estimate of the population mean.
  • Specific Example (s = 4.67):
    • For n = 50, SE = 4.67 / \sqrt{50} ≈ 0.6604.
    • For n = 100, SE = 4.67 / \sqrt{100} = 0.467.
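
The two values in the specific example come from dividing the sample SD by the square root of n. A short sketch of that arithmetic follows; the function name `standard_error` is just a label chosen for this example.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s divided by sqrt(n)."""
    return s / math.sqrt(n)

s = 4.67  # sample SD used in the example
for n in (50, 100):
    print(f"n = {n}: SE = {standard_error(s, n):.4f}")
```

Doubling n from 50 to 100 does not halve the SE; it shrinks it by a factor of the square root of 2, from about 0.66 to about 0.47.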

Case Studies and Examples

  1. Gene Expression Study:
    • True average (µ) = 150 units; σ = 30 units.
    • Impact of increasing sample size from 10 to 100: SE decreases.
  2. Fish Species Length Study:
    • Initial sample of 20 fish vs. 200 fish: fluctuations in the sample mean are observed between the two sample sizes.
  3. Deer Body Weight Study:
    • Increasing sample size from 50 to 1,000 won't change σ, which is fixed.
  4. Cholesterol Level Study:
    • Population mean (µ) remains unchanged even if sample size is increased or decreased.
  5. Height of Trees Study:
    • Probability questions are framed around z-scores to assess likelihood.
      • Example: Finding the probability of sample mean being less than 70 meters with calculated z-scores.
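
For the gene expression study in item 1, σ is given, so the change in SE can be worked out directly. The sketch below uses only the values stated there (σ = 30, n = 10 and n = 100); the variable names are illustrative.

```python
import math

sigma = 30.0  # population SD from the gene expression study (units)

se_small = sigma / math.sqrt(10)    # SE with n = 10
se_large = sigma / math.sqrt(100)   # SE with n = 100

print(f"n = 10:  SE = {se_small:.2f}")
print(f"n = 100: SE = {se_large:.2f}")
print(f"ratio = {se_small / se_large:.2f}")  # equals sqrt(100 / 10)
```

A tenfold increase in sample size reduces the SE by a factor of the square root of 10, roughly 3.16, while σ and µ stay fixed, as noted in the deer and cholesterol items.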

Practice Problems

  1. mRNA Levels Calculation:
    • Calculate the following probabilities for the mean mRNA level of a sample of n = 40, given true mean μ = 200 and standard deviation σ = 25:
      a) Pr{ 𝑌̅ > 210 } = 0.0057
      b) Pr{ 𝑌̅ < 187 } = 0.0005
      c) Pr{ 192 < 𝑌̅ < 207 } = 0.9399
      d) Pr{ 𝑌̅ > 207 OR 𝑌̅ < 192 } = 0.0601
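
The listed answers can be checked with a short sketch. It assumes the sampling distribution of the mean is normal with mean μ = 200 and standard error 25 divided by the square root of 40, and it uses `NormalDist` from Python's standard library rather than a z-table, so results may differ from the listed values in the fourth decimal place because table lookups round z to two decimals.

```python
import math
from statistics import NormalDist

mu, sigma, n = 200, 25, 40
se = sigma / math.sqrt(n)      # standard error of the mean
ybar = NormalDist(mu, se)      # assumed normal sampling distribution of the mean

p_a = 1 - ybar.cdf(210)              # Pr{Ybar > 210}
p_b = ybar.cdf(187)                  # Pr{Ybar < 187}
p_c = ybar.cdf(207) - ybar.cdf(192)  # Pr{192 < Ybar < 207}
p_d = 1 - p_c                        # Pr{Ybar > 207 or Ybar < 192}

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"{label}) {p:.4f}")
```

Parts c and d are complements, so their probabilities sum to 1, which is a quick consistency check on any hand calculation.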