MK

Confidence Intervals and Binomial Distributions

Introduction to Confidence Intervals

  • Opening quote by Richard Feynman: "The first principle is that you must not fool yourself – and you are the easiest person to fool."

Key Learning Objectives

  • Understand the difference between the binomial probability distribution and the cumulative binomial distribution.

  • Learn how to use binomial distributions to make inferences about a population.

  • Identify the assumptions that underpin confidence intervals.

  • Calculate 95% confidence intervals (CIs).

  • Understand what a 95% confidence interval actually means.

Binomial Distribution Example Questions

  • If you flip a coin 10 times, what’s the probability of observing exactly 7 heads?

  • Given that, on average, 5% of patients undergoing an operation get an infection, what’s the chance that exactly 10 of the next 30 patients will get an infection?

  • If 40% of voters are Democrats, what is the chance that a random sample of 600 voters will comprise exactly 45% Democrats?
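The three questions above can be answered with the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch using only the Python standard library; the helper name `binom_pmf` is our own, and the 45% case is read as exactly 270 of 600 voters:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

heads_7 = binom_pmf(7, 10, 0.5)       # exactly 7 heads in 10 fair flips
infect_10 = binom_pmf(10, 30, 0.05)   # exactly 10 infections among 30 patients
dems_270 = binom_pmf(270, 600, 0.40)  # exactly 45% (270) Democrats among 600 voters

print(f"{heads_7:.4f}")  # 0.1172
print(infect_10, dems_270)
```

For large samples, the chance of any single exact count is small, so such questions are usually better posed cumulatively.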

Cumulative Binomial Distribution

  • Question: If you flip a coin 10 times, what’s the probability of observing 7 or more heads?

  • Scenario: If 5% of patients get infections, what’s the chance that 10 or more of the next 30 patients will contract infections?

  • Query: If 40% of voters are GOP, what is the chance that a random sample of 600 people will be at least 45% GOP?
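A cumulative question sums the probability mass function over a range of outcomes. A minimal sketch, assuming our own hypothetical helpers `binom_pmf` and `prob_at_least`, and reading "at least 45% of 600" as X >= 270:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): sum the upper tail of the PMF."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

heads = prob_at_least(7, 10, 0.5)        # 7 or more heads in 10 flips
infections = prob_at_least(10, 30, 0.05) # 10 or more infections in 30 patients
gop = prob_at_least(270, 600, 0.40)      # at least 45% of 600 voters = 270

print(f"{heads:.4f}")  # 0.1719
print(infections, gop)
```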

Probability vs. Cumulative Distribution Function

  • Data from the U.S. National Lightning Detection Network shows the number of lightning strikes for the year 2022, peaking in late May:

    • Daily lightning counts peaked at 3.3 million strikes on May 21.

    • The cumulative count (in millions) was plotted across the full year, from January to December 2022.
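The cumulative lightning curve is a running total of the daily counts, and the cumulative distribution function (CDF) relates to the probability mass function (PMF) in the same way. The lightning data are not reproduced here, so this sketch uses an assumed Binomial(10, 0.5) coin-flip distribution instead and builds the CDF with `itertools.accumulate`:

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.5
# PMF: probability of exactly k heads, for k = 0..n
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
# CDF: running total of the PMF, like the cumulative lightning curve
cdf = list(accumulate(pmf))

for k, (f, F) in enumerate(zip(pmf, cdf)):
    print(f"{k:2d}  P(X={k:2d}) = {f:.4f}   P(X<={k:2d}) = {F:.4f}")
```

The last CDF value is 1, just as the cumulative lightning count ends at the year's total.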

Binomial Distribution Characteristics

  • Binomial probability relies on a known probability for the population to predict the likelihood of observing various outcomes in random samples.

  • In scientific research, however, we usually observe only a single sample proportion, and the population value is what we want to estimate.

Example I: Calculating Successes and Success Proportions

  • First, determine how many successes occurred and what proportion of all trials they represent.

  • Then consider whether these numbers support broader conclusions about the phenomenon.

Example I Continued: Estimating Ranges

  • Questions:

    • How far from the observed success proportion is the true population proportion likely to be?

    • How certain can we be of these estimates?

  • Confidence intervals provide a range of values that are expected to contain the overall population percentage.

Confidence Interval (CI)

  • Definition: A 95% confidence interval (CI) is a range of values, calculated from the sample, that we are 95% confident contains the true population value.

  • Question: how do we derive this interval from our data?

Steps for Calculating CIs

  • First step: plot the data to visualize observed successes versus failures (missed attempts).

  • Observations must be clear and unambiguous to avoid uncertainty about what we observed.

CI Calculation Methods

  • Clopper-Pearson exact method (Clopper and Pearson, 1934).

  • Standard Wald method.

  • Modified Wald method (also known as the adjusted Wald or Agresti-Coull method).
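The Clopper-Pearson interval is called exact because it inverts the binomial distribution directly instead of using a normal approximation: for s successes in n trials, the lower bound is the proportion at which P(X >= s) = 0.025, and the upper bound is the proportion at which P(X <= s) = 0.025. A minimal sketch that finds both bounds by bisection; the function names are our own, and statistics libraries usually compute the same interval from beta-distribution quantiles instead:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(s, n, alpha=0.05, tol=1e-10):
    """Exact (Clopper-Pearson) CI for a proportion, found by bisection."""
    def bisect(f, lo=0.0, hi=1.0):
        # Assumes f is increasing in p; returns the p where f(p) = 0.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= s | p) = alpha/2 (this tail grows with p)
    lower = 0.0 if s == 0 else bisect(
        lambda p: (1 - binom_cdf(s - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= s | p) = alpha/2 (this tail shrinks with p)
    upper = 1.0 if s == n else bisect(
        lambda p: alpha / 2 - binom_cdf(s, n, p))
    return lower, upper

print(clopper_pearson(33, 100))  # Example II poll: 33 of 100 voters
```

For s = 0 the lower bound is 0, and for s = n the upper bound is 1, which the code handles explicitly.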

Example II: Polling Voters

  • Scenario: 100 randomly selected voters are polled before an election, with 33 indicating a vote for a specific candidate.

  • Question: what does this imply about the proportion of all voters who will support the candidate?

Sample Representativity and Sampling Error

  • First Issue: Ensuring that the sample is genuinely representative of the population.

  • Second Issue: Accounting for potential sampling error in our results.

Limitations in Estimating Proportions

  • There is no absolute certainty about the proportion of voters in the population.

  • The best course of action is to calculate a range of possible values for this proportion, known as the 95% confidence interval.

Assumptions of Confidence Intervals

  1. Random representative sample: In polling, if the sample isn't randomly selected, this assumption is violated.

  2. Independent observations: Each observation must be independent of the others.

  3. Accurate data: Data quality must be assured, because poorly recorded data lead to meaningless results (garbage in, garbage out).

Understanding 95% CI Calculation

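  • Confidence interval formula: $CI = \bar{p} \pm z \times SE$, where:

    • $\bar{p}$ is the sample proportion (number of successes divided by the sample size $n$).

    • $SE = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ is the standard error of the proportion.

    • $z$ is the critical value drawn from the standard normal distribution.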
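Applied to Example II (33 of 100 voters), this formula gives the standard Wald interval. A minimal sketch with z = 1.96; `wald_ci` is a hypothetical helper name:

```python
from math import sqrt

def wald_ci(s, n, z=1.96):
    """Standard Wald interval: p_bar +/- z * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = s / n                          # sample proportion
    se = sqrt(p_bar * (1 - p_bar) / n)     # standard error
    return p_bar - z * se, p_bar + z * se

lo, hi = wald_ci(33, 100)  # Example II: 33 of 100 polled voters
print(f"{lo:.3f} to {hi:.3f}")  # 0.238 to 0.422
```

So the poll is consistent with roughly 24% to 42% support in the wider electorate, provided the sampling assumptions hold.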

Interpretation of Standard Errors

  • Why multiply the standard error by z: z sets how many standard errors the interval must extend on each side of the sample proportion to reach the desired confidence level. For a 95% CI, z = 1.96, because about 95% of a normal distribution lies within 1.96 standard deviations of its mean.
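The critical value for any confidence level can be read from the inverse of the standard normal CDF, leaving half of the remaining probability in each tail. A sketch using Python's standard-library `statistics.NormalDist` (`critical_z` is our own name):

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value: leave (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z = {critical_z(c):.3f}")
```

Higher confidence requires a larger z and therefore a wider interval.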

Insights from the Monty Hall Problem

  • A CI gives a range of plausible values rather than a single point estimate, which makes the variability of the estimate explicit.

  • Illustration: confidence intervals, drawn as error bars, calculated from many simulated experiments.

Proportion Estimation Techniques

  • Explanation of 95% CI: When performing repeated experiments, 95% of confidence intervals will encompass the true population value.

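  • True or false: "There is a 95% chance that the population value lies within this 95% CI." Understanding the semantics of confidence intervals is key here: the 95% describes how often the procedure captures the true value across repeated experiments, not the probability that one particular, already-computed interval contains it.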
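The repeated-experiment interpretation can be checked by simulation: fix a true proportion (an assumed value, 0.33 here, which is only knowable in a simulation), draw many samples, build a 95% Wald interval from each, and count how often the intervals contain the true value. A minimal sketch with hypothetical helper names:

```python
import random
from math import sqrt

def wald_ci(s, n, z=1.96):
    p_bar = s / n
    se = sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - z * se, p_bar + z * se

def coverage(true_p, n, trials=2000, seed=1):
    """Fraction of simulated experiments whose 95% CI contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated sample: n independent trials with success prob true_p
        s = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_ci(s, n)
        hits += lo <= true_p <= hi
    return hits / trials

print(coverage(0.33, 100))
```

The result should be near 0.95 but rarely exactly 0.95: the Wald interval is an approximation, and the estimate itself varies with the random seed and the number of trials.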

Special Significance of 95%

  • Understanding what makes 95% a standard choice for confidence levels in statistical applications.

Best Practices for Calculating Binomial CIs

  1. Calculate the adjusted proportion, where $s$ is the number of successes and $n$ is the sample size:
    $p' = \frac{s + 2}{n + 4}$

  2. Compute the margin of error ($W$):
    $W = 2\sqrt{\frac{p'(1-p')}{n+4}}$

  3. Establish the 95% confidence interval: from $p' - W$ to $p' + W$.
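The three steps translate directly into code. A minimal sketch applied to the Example II poll (33 of 100); `modified_wald_ci` is a hypothetical name:

```python
from math import sqrt

def modified_wald_ci(s, n):
    """95% CI via the adjusted proportion p' = (s + 2) / (n + 4)."""
    p_adj = (s + 2) / (n + 4)                    # step 1: adjusted proportion
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))  # step 2: margin of error W
    return p_adj - w, p_adj + w                  # step 3: p' - W to p' + W

lo, hi = modified_wald_ci(33, 100)
print(f"{lo:.3f} to {hi:.3f}")  # 0.244 to 0.429
```

Compared with the plain Wald interval for the same data (about 0.238 to 0.422), the adjusted interval is shifted slightly toward 0.5.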

Derivation of Equation Constants

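  • The constants 2 and 4 in the adjusted proportion $p' = \frac{s + 2}{n + 4}$ come from the 95% critical value: since $z = 1.96$, $z^2 \approx 4$, so the adjustment adds roughly $z^2/2 = 2$ successes and $z^2 = 4$ observations (two successes and two failures).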
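The rounding behind the constants can be written out as a short derivation, assuming the Agresti-Coull form of the adjusted interval, in which $z^2/2$ successes and $z^2$ trials are added before computing the proportion:

```latex
\begin{aligned}
\tilde{n} &= n + z^2, \qquad \tilde{p} = \frac{s + z^2/2}{n + z^2},\\
z = 1.96 \;&\Rightarrow\; z^2 = 3.8416 \approx 4, \qquad z^2/2 \approx 2,\\
\text{so}\quad \tilde{p} &\approx \frac{s + 2}{n + 4} = p', \qquad
W = z\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}} \approx 2\sqrt{\frac{p'(1-p')}{n+4}}.
\end{aligned}
```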

Confidence Interval Comparisons: Wald vs. Agresti-Coull

  • Graphs compare Wald and Agresti-Coull intervals across different sample sizes (n) and proportions (p).
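One difference such comparisons reveal is at the extremes. When s = 0 (or s = n), the Wald standard error is zero, so the Wald interval collapses to a single point, while the modified Wald interval still has width. A minimal sketch comparing the two on a few assumed (s, n) pairs, with intervals truncated to [0, 1]; all helper names are our own:

```python
from math import sqrt

def wald_ci(s, n, z=1.96):
    p_bar = s / n
    se = sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - z * se, p_bar + z * se

def modified_wald_ci(s, n):
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

def clip(ci):
    """Proportions cannot leave [0, 1], so truncate the interval there."""
    lo, hi = ci
    return max(0.0, lo), min(1.0, hi)

for s, n in [(0, 10), (1, 10), (5, 10), (50, 100)]:
    w_lo, w_hi = clip(wald_ci(s, n))
    m_lo, m_hi = clip(modified_wald_ci(s, n))
    print(f"s={s:3d} n={n:3d}  Wald [{w_lo:.3f}, {w_hi:.3f}]  "
          f"modified Wald [{m_lo:.3f}, {m_hi:.3f}]")
```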

Key Reminders

  • Note: the adjusted proportion p' is not a p-value; the symbol p' here denotes a proportion.

  • The margin of error W extends in both directions from p', so the CI runs from p' - W to p' + W and its total width is 2W: a larger margin of error means a wider interval.

Practical Case Studies

Case Study 1: Plant Growth Under Different Light Conditions

  • Research on plant species growth under various light types:

    • 75 plants were exposed to red light.

    • Of these, 45 showed significant growth improvement.

Case Study 2: Antibiotic Resistance in Bacteria

  • Study conducted by microbiologists into bacterial populations:

    • 120 colonies assessed.

    • 80 found to exhibit antibiotic resistance to specific treatments.

Case Study 3: Frog Population Sex Ratio in a Polluted Pond

  • Investigation by ecologists examining frog populations in habitats affected by pollution:

    • Of 60 frogs caught in a pond, 25 were determined to be female.
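The modified Wald steps can be applied to all three case studies at once. A minimal sketch; the dictionary labels are our own shorthand:

```python
from math import sqrt

def modified_wald_ci(s, n):
    """95% CI via p' = (s + 2) / (n + 4) and W = 2 * sqrt(p'(1 - p') / (n + 4))."""
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

cases = {
    "Plant growth under red light": (45, 75),
    "Antibiotic-resistant colonies": (80, 120),
    "Female frogs in polluted pond": (25, 60),
}

results = {}
for name, (s, n) in cases.items():
    lo, hi = modified_wald_ci(s, n)
    results[name] = (lo, hi)
    print(f"{name}: {s}/{n} = {s / n:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

For the frogs, the interval includes 0.5, so these 60 frogs alone do not rule out an even sex ratio.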