CH9 – Estimation of the Mean

Understanding How We Estimate Things

  • What is Estimation?

    • It's like taking a small peek (a sample) to guess something big and unknown (the whole population). We use information from our small peek to make a smart guess about the bigger picture.

    • The guess we make is called an estimate.

    • The rule or formula we use to make that guess is called an estimator. For example, to guess the average height of all students, we measure a few students and average their heights: the sample average (the formula) is the estimator, and the number it produces is the estimate.

  • 4 Simple Steps to Estimate Anything

    1. Pick a good, representative group (sample) to look at.

    2. Get all the needed information from everyone in that group.

    3. Calculate a basic number (statistic) from your group.

    4. Use that number to guess the real value for the whole population. This guess can be a single number (point estimate) or a range (interval estimate).
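Here is a minimal sketch of the four steps in Python. The population is made-up data (simulated student heights in cm), and the names `population`, `sample`, and `point_estimate` are just illustrative choices, not part of the original notes.

```python
import random
import statistics

# Hypothetical population: simulated heights (cm) of 10,000 students.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Step 1: pick a simple random sample of 100 students.
sample = rng.sample(population, 100)

# Steps 2-3: collect the values and compute a statistic (the sample mean).
point_estimate = statistics.mean(sample)

# Step 4: use the statistic as the guess for the population mean.
# In real life the true mean is unknown; we can check here only
# because the population was simulated.
true_mean = statistics.mean(population)
print(f"point estimate: {point_estimate:.1f} cm, true mean: {true_mean:.1f} cm")
```

The point estimate lands near the true mean but not exactly on it, which is why the notes go on to add a range around it.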

Making a Single Guess (Point Estimates)

  • What is it? It's just one single number that we think is the best guess for the true value of something in the whole population. For example, if we measure the average height of 100 people and get 5'8", that 5'8" is our point estimate for the average height of everyone.

  • Good to know: A single guess doesn't tell us how far off it might be. That's why we often report a range as well.

Making a Range Guess (Interval Estimates & Confidence Intervals)

  • Idea of Interval Estimation

    • Instead of just one guess, we create a range of numbers where we're pretty sure the true value lies. It's like saying, "I'm 95% confident the true average height is between 5'7" and 5'9"."

    • It generally looks like this:
      \text{Our best single guess} \; \pm \; \text{A wiggle room (Margin of Error)}

  • Margin of Error (E)

    • This is the "wiggle room" we add to and subtract from our single guess. It accounts for how much a sample average can differ from the true population average just by chance.

    • How big this wiggle room is depends on things like how spread out the data is, how many people we sampled, and how confident we want to be.

  • Confidence Level (CL)

    • This tells us how sure we are about our method. If we say we're 95% confident, it means that if we repeated this guessing process many times, 95% of the ranges we create would actually contain the true value. It's a statement about our process, not about a single range once it's made.

    • Common confidence levels are 90%, 95%, and 99%. So a 95% confidence interval (CI) means: "If we built this interval from many different samples, about 95% of those intervals would capture the real average."
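The "repeat the process many times" idea can be checked with a small simulation. This is only a sketch under assumptions: the population is taken to be normal with a known spread, the interval uses the z-based margin covered in the next section, and the function name `coverage` and all its default values are made up for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(trials=2000, n=25, mu=50.0, sigma=8.0, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Critical z-value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and compute its average.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        # Does this interval capture the true mean?
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The fraction should come out close to 0.95. Any single interval either contains the true mean or it doesn't; the 95% describes the long-run success rate of the method.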

Guessing the Average When We Know the Population's Spread (Standard Deviation \sigma Known)

  • Three Situations

    1. Small sample (less than 30) AND population looks like a "normal" bell curve: Use the standard normal (z) distribution.

    2. Large sample (30 or more): Even if the population isn't normal, the average of a big sample is approximately normal (this is the Central Limit Theorem), so we still use the standard normal (z) distribution.

    3. Small sample (less than 30) AND population is NOT normal or its shape is unknown: Our usual methods won't work well here; we need non-parametric methods (ones that don't assume a normal population).

  • Formula for making the range guess (for situations 1 & 2):
    \text{Sample average} \; \pm \; \text{Z-score for confidence level} \times \frac{\text{Population spread}}{\sqrt{\text{Sample size}}}

    • We find the Z-score from a standard normal (z) table, or with software.

  • How to calculate the Wiggle Room (Margin of Error) when Population Spread is Known:
    E \; = \; \text{Z-score for confidence level} \times \frac{\text{Population spread}}{\sqrt{\text{Sample size}}}
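As a worked sketch of the two formulas above, the snippet below computes the margin of error and the range for made-up numbers (sample average 68.0, population spread 3.0, sample size 36). The function name `z_interval` is hypothetical; `NormalDist().inv_cdf` from Python's standard library stands in for the z-table lookup.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level=0.95):
    """Return (low, high, margin) for the mean when sigma is known."""
    # Z-score that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

low, high, E = z_interval(xbar=68.0, sigma=3.0, n=36, level=0.95)
print(f"E = {E:.2f}, interval = ({low:.2f}, {high:.2f})")
# prints: E = 0.98, interval = (67.02, 68.98)
```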

  • Common Z-Scores for different Confidence Levels:

    • 90 % \Rightarrow 1.645

    • 95 % \Rightarrow 1.96

    • 99 % \Rightarrow 2.576 (some tables round this to 2.575)
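These Z-scores don't have to be memorized or read off a table. As a quick sketch, Python's standard library can compute them; the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    alpha = 1 - level
    # Put alpha / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# prints:
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```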

  • Controlling the Width of Your Range

    • The total width of your range is simply 2 \times E.

    • You can make your range narrower (more precise) or wider (less precise) by:

      • Changing your confidence level (e.g., being less confident makes the range narrower).

      • Changing your sample size (a larger sample makes the range narrower, because the spread is divided by the square root of the sample size).

    • The population's natural spread is fixed; you can't change that.
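To see both levers at work, here is a small sketch with made-up numbers (population spread 10). The function name `width` is hypothetical; it just computes 2 × E using the known-spread formula.

```python
from math import sqrt
from statistics import NormalDist

def width(sigma, n, level):
    """Total width (2 * E) of a z-based range for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / sqrt(n)

w_base = width(sigma=10, n=25, level=0.95)       # starting point
w_more = width(sigma=10, n=100, level=0.95)      # four times the sample size
w_less_conf = width(sigma=10, n=25, level=0.90)  # lower confidence level

print(f"n=25,  95%: {w_base:.2f}")
print(f"n=100, 95%: {w_more:.2f}")
print(f"n=25,  90%: {w_less_conf:.2f}")
```

Quadrupling the sample size cuts the width in half, since the width shrinks with the square root of the sample size. Dropping from 95% to 90% confidence also narrows the range, but at the cost of being wrong more often.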

  • Figuring out how many people to sample for a desired wiggle room:
    \text{Sample size} = \frac{(\text{Z-score})^2 \times (\text{Population spread})^2}{(\text{Desired wiggle room})^2}

    • Always round the result UP to the next whole number.
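A short sketch of the sample-size formula, using made-up numbers (population spread 15, desired wiggle room 2, 95% confidence). The function name `required_sample_size` is an illustrative choice.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, level=0.95):
    """Smallest n whose z-based margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # n = z^2 * sigma^2 / E^2, rounded UP to the next whole number.
    return ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=15, margin=2, level=0.95)
print(n)  # prints 217 (the raw formula gives about 216.08)
```

Rounding down to 216 would leave the wiggle room slightly larger than 2, which is why the result is always rounded up.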

Guessing the Average When We DON'T Know the Population's Spread (\sigma Is UNKNOWN)

  • Situations when we don't know the spread

    1. Small sample (less than 30) AND population looks like a "normal" bell curve: We use a special distribution called Student’s t.

    2. Large sample (30 or more): We still use the t distribution, but for large samples, it behaves almost exactly like the normal (z) distribution.

    3. Small sample (less than 30) AND population is NOT normal or its shape is unknown: Again, our usual methods won't work; we need non-parametric methods (ones that don't assume a normal population).

  • About Student’s t Distribution

    • It's also bell-shaped but has "fatter" tails than the normal curve, meaning extreme values are a bit more likely.

    • It has a parameter called degrees of freedom (df), which is simply n-1 (sample size minus 1).

    • As the degrees of freedom get larger, the t distribution looks more and more like the normal distribution.

    • We use a special t-table to find critical values.
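To see the t distribution closing in on the normal one, the sketch below compares a few 95% critical values. The t values are typed in from a standard t-table (Python's standard library has no t distribution), so treat the dictionary `t_table_95` as a hand-copied excerpt rather than a computed result.

```python
from statistics import NormalDist

# 95% two-sided critical z-value (about 1.960).
z = NormalDist().inv_cdf(0.975)

# 95% two-sided critical t-values, copied from a standard t-table.
t_table_95 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

# How much larger t is than z at each df.
gaps = {df: t - z for df, t in t_table_95.items()}

for df, t in t_table_95.items():
    print(f"df={df:>3}: t = {t:.3f}, t - z = {gaps[df]:.3f}")
```

The gap shrinks as df grows, so for large samples the t-based and z-based ranges are nearly the same.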

  • Formula for making the range guess (when population spread is unknown):
    \text{Sample average} \; \pm \; \text{t-score for confidence and df} \times \frac{\text{Sample spread}}{\sqrt{\text{Sample size}}}

    • Here, "sample spread" (s) is the standard deviation calculated from your sample.

    • Wiggle Room (Margin of Error): E = \text{t-score for confidence and df} \times \frac{\text{Sample spread}}{\sqrt{\text{Sample size}}}
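Here is a worked sketch of the t-based range for a small made-up sample of 10 measurements. The critical value 2.262 is the 95% entry for df = 9 copied from a standard t-table; it is hard-coded because Python's standard library does not include the t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample of 10 measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

n = len(sample)
df = n - 1            # degrees of freedom = 9
t_crit = 2.262        # 95% two-sided t value for df = 9 (from a t-table)

xbar = mean(sample)   # sample average
s = stdev(sample)     # sample spread (divides by n - 1)
E = t_crit * s / sqrt(n)

print(f"mean = {xbar:.2f}, s = {s:.3f}, E = {E:.3f}")
print(f"95% interval: ({xbar - E:.3f}, {xbar + E:.3f})")
```

Note that `stdev` is the sample standard deviation (it divides by n - 1), which is the "sample spread" s the formula calls for.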

Key Ideas to Remember

  • The Trade-off Triangle: If you want to be more confident (higher confidence level), your wiggle room (margin of error) will get bigger, unless you increase your sample size. A larger sample means a smaller wiggle room for the same confidence.

  • When things aren't normal: If your data doesn't fit the typical bell curve and you have a small sample, you might need different statistical tools (non-parametric methods).

Quick Formulas Reference

  • Range guess for average, when population spread is known (z):
    \bar{x} \pm z_{\alpha/2} \sigma/\sqrt{n}

  • Range guess for average, when population spread is unknown (t):
    \bar{x} \pm t_{\alpha/2,df} \; s/\sqrt{n}

  • How to calculate Margin of Error: E = \text{critical value (z or t)} \times \text{Standard Error (SE)}, where SE = \sigma/\sqrt{n} when the population spread is known and SE = s/\sqrt{n} when it is not

  • How many samples needed for a desired wiggle room (when population spread is known):
    n = z_{\alpha/2}^2 \sigma^2 / E^2

  • Degrees of Freedom for average guesses: df = n - 1