Confidence Intervals and Binomial Distributions

Introduction to Confidence Intervals

Opening quote by Richard Feynman: "The first principle is that you must not fool yourself – and you are the easiest person to fool."

Key Learning Objectives

Understand the difference between the binomial probability distribution and the cumulative binomial distribution.
Learn how to use binomial distributions to make inferences about a population.
Identify the assumptions that underpin confidence intervals.
Calculate 95% confidence intervals (CIs).
Comprehend the meaning behind the 95% confidence interval.

Binomial Distribution Example Questions

If you flip a coin 10 times, what’s the probability of observing exactly 7 heads?
Given that, on average, 5% of patients undergoing an operation get an infection, what’s the chance that 10 out of the next 30 patients will get an infection?
If 40% of voters are Democrats, what is the chance that a random sample of 600 voters will comprise 45% Democrats?
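
The three questions above all ask for the probability of exactly k successes in n independent trials. A minimal sketch of that calculation follows, assuming each trial is independent with the same success probability; the helper name `binom_pmf` is illustrative and not part of the original notes.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 7 heads in 10 flips of a fair coin (120/1024).
coin = binom_pmf(7, 10, 0.5)

# Exactly 10 infections among 30 patients when the infection rate is 5%.
infections = binom_pmf(10, 30, 0.05)

# Exactly 45% Democrats in a sample of 600 (0.45 * 600 = 270) when 40% are Democrats.
voters = binom_pmf(270, 600, 0.40)

print(f"coin: {coin:.4f}, infections: {infections:.2e}, voters: {voters:.4f}")
```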

Cumulative Binomial Distribution

Question: If you flip a coin 10 times, what’s the probability of observing 7 or more heads?
Scenario: If 5% of patients get infections, what’s the chance that 10 or more of the next 30 patients will contract infections?
Query: If 40% of voters are GOP, what is the chance that a random sample of 600 people will be at least 45% GOP?
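
The cumulative versions of these questions sum the exact probabilities over every outcome from k up to n. A sketch under the same independence assumption follows; `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes: sum of the exact
    probabilities for k, k+1, ..., n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# 7 or more heads in 10 fair coin flips (176/1024).
coin = prob_at_least(7, 10, 0.5)

# 10 or more infections among 30 patients at a 5% infection rate.
infections = prob_at_least(10, 30, 0.05)

# At least 45% GOP (270 or more) in a sample of 600 when 40% are GOP.
voters = prob_at_least(270, 600, 0.40)

print(f"coin: {coin:.4f}, infections: {infections:.2e}, voters: {voters:.4f}")
```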

Probability vs Cumulative Density Function

Data from the U.S. National Lightning Detection Network shows the number of lightning strikes for the year 2022, peaking in late May:
- Daily lightning counts peaked at 3.3 million strikes on May 21.
- Cumulative lightning counts were presented in millions throughout the months from January to December 2022.

Binomial Distribution Characteristics

Binomial probability relies on a known probability for the population to predict the likelihood of observing various outcomes in random samples.
In scientific data, results are often represented as a single proportion.

Example I: Calculating Successes and Success Proportions

It’s crucial to determine how many successes occurred and the proportions thereof.
Consider if calculations will help make broader conclusions about a phenomenon.

Example I Continued: Estimating Ranges

Questions:
- How far away from those calculated success proportions can the actual population percentages likely be?
- How certain can we be of these estimates?
Confidence intervals provide a range of values that are expected to contain the overall population percentage.

Confidence Interval (CI)

Definition: A confidence interval (CI) represents a range of percentages in which we are 95% confident the true population value lies.
Queries posed: How do we derive this interval from our data?

Steps for Calculating CIs

First step: Plotting the data to visualize observed successes versus missed attempts.
Observations must be clear and unambiguous to avoid uncertainty about what we observed.

CI Calculation Methods

Clopper and Pearson exact method (1934).
Standard Wald method.
Modified Wald method.
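
As an illustration of the first method in this list, the sketch below computes a Clopper and Pearson interval by inverting the binomial tail probabilities with bisection. This is one possible standard-library implementation, not necessarily how the notes or any particular package compute it; the function names and the 60-iteration bisection are assumptions made for the example.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, target: float, increasing: bool) -> float:
    """Find p in [0, 1] with f(p) == target for a monotonic f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(s: int, n: int, alpha: float = 0.05):
    """Exact interval: the lower bound is the p at which P(X >= s) = alpha/2,
    the upper bound is the p at which P(X <= s) = alpha/2."""
    lower = 0.0 if s == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(s - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if s == n else _bisect(
        lambda p: binom_cdf(s, n, p), alpha / 2, increasing=False)
    return lower, upper

low, high = clopper_pearson(33, 100)
print(f"Exact 95% CI for 33/100: {low:.3f} to {high:.3f}")
```

Unlike the Wald formulas, this approach needs no normal approximation, which is why it is usually described as "exact"; the cost is that the interval tends to be somewhat wider.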

Example II: Polling Voters

Scenario: 100 randomly selected voters are polled before an election, with 33 indicating a vote for a specific candidate.
Queries arise on what this implies about the proportion of voters who will support the candidate in the larger population.

Sample Representativity and Sampling Error

First Issue: Ensuring that the sample is genuinely representative of the population.
Second Issue: Accounting for potential sampling error in our results.

Limitations in Estimating Proportions

There is no absolute certainty about the proportion of voters in the population.
The best course of action is to calculate a range of possible values for this proportion, known as the 95% confidence interval.

Assumptions of Confidence Intervals

Random representative sample: In polling, if the sample isn't randomly selected, this assumption is violated.
Independent observations: Each observation must be selected independently of the others.
Accurate data: Data quality must be assured, as poor recording leads to unreliable results (garbage in, garbage out).

Understanding 95% CI Calculation

Confidence Interval Formula: \( CI = \bar{p} \pm z \times SE \), where:
- \( \bar{p} \) is the sample proportion.
- \( SE = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \) is the standard error of the proportion, with n the sample size.
- z is the critical value drawn from the standard normal distribution.
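
Applied to the polling example (33 of 100 voters), the standard Wald formula can be sketched as follows. The function name `wald_ci` is illustrative, and z = 1.96 is assumed for a 95% interval.

```python
from math import sqrt

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Standard Wald interval: p_bar +/- z * SE,
    with SE = sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n
    se = sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - z * se, p_bar + z * se

low, high = wald_ci(33, 100)
print(f"Wald 95% CI for 33/100: {low:.3f} to {high:.3f}")
```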

Interpretation of Standard Errors

Reason behind multiplying the standard error by the z-value: To ascertain how many standard errors must be extended from the mean in order to obtain our desired confidence level (e.g., 1.96 for a 95% CI).
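
The critical value can be read off the standard normal distribution rather than memorized. A small sketch using Python's `statistics.NormalDist` follows; the helper name `z_for` is an assumption for this example.

```python
from statistics import NormalDist

def z_for(confidence: float) -> float:
    """Two-sided critical value: the point that leaves
    (1 - confidence) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for(level):.3f}")
```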

Insights from the Monty Hall Problem

The CI represents a range of plausible values rather than a single estimate, reflecting sampling variability.
Illustrative example: bars showing the confidence intervals calculated from multiple simulated experiments.

Proportion Estimation Techniques

Explanation of 95% CI: If the experiment is repeated many times and a 95% CI is computed each time, about 95% of those intervals will contain the true population value.
True or False Question: "There is a 95% chance that the population value lies within this 95% CI" - understanding the semantics of confidence intervals is key.
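
The repeated-experiment reading of a 95% CI can be checked by simulation. The sketch below assumes a true population proportion of 0.33, samples of 100, and 2000 repeated experiments; all three are arbitrary choices made for illustration, and the standard Wald interval is used.

```python
import random
from math import sqrt

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Standard Wald interval for a proportion."""
    p_bar = successes / n
    se = sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - z * se, p_bar + z * se

def coverage(true_p: float, n: int, experiments: int, seed: int = 0) -> float:
    """Fraction of simulated experiments whose CI contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # One experiment: n independent trials with success probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(successes, n)
        hits += low <= true_p <= high
    return hits / experiments

rate = coverage(true_p=0.33, n=100, experiments=2000)
print(f"Share of intervals containing the true value: {rate:.3f}")
```

In a simulation the true value is fixed and known; what varies from run to run is the interval. That is the distinction the true-or-false question is probing.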

Special Significance of 95%

Understanding what makes 95% a standard choice for confidence levels in statistical applications.

Best Practices for Calculating Binomial CIs

Calculate adjusted proportion (s is the number of successes, n the number of trials):
\( p' = \frac{s + 2}{n + 4} \)
Compute Margin of Error (W):
\( W = 2 \times \sqrt{\frac{p'(1-p')}{n+4}} \)
Establish 95% confidence interval: From \( p' - W \) to \( p' + W \).
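
The three steps above can be sketched as a single function. The name `modified_wald_ci` is illustrative, and the polling numbers (33 of 100) are reused from Example II.

```python
from math import sqrt

def modified_wald_ci(s: int, n: int):
    """95% CI from the adjusted proportion p' = (s + 2) / (n + 4)
    and margin of error W = 2 * sqrt(p' * (1 - p') / (n + 4))."""
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

low, high = modified_wald_ci(33, 100)
print(f"Modified Wald 95% CI for 33/100: {low:.3f} to {high:.3f}")
```

Because the formula does not clip its result, the bounds can fall slightly below 0 or above 1 when s is near 0 or n; truncating them to the range 0 to 1 is a common convention, though the notes do not specify one.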

Derivation of Equation Constants

The constants “2” and “4” within the adjusted proportion formula \( p' = \frac{s + 2}{n + 4} \) are explained in context.
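
One common way to see where the constants come from, assuming the adjusted proportion is the Agresti-Coull form named in the comparison section, is to start from the general version written in terms of z and round the 95% critical value 1.96 up to 2:

```latex
\[
p' = \frac{s + z^{2}/2}{n + z^{2}},
\qquad z = 1.96 \approx 2
\;\Longrightarrow\;
\frac{z^{2}}{2} \approx 2,\quad z^{2} \approx 4
\;\Longrightarrow\;
p' \approx \frac{s + 2}{n + 4}.
\]
```

Under the same rounding, the factor 2 in the margin of error W is simply z itself, so the adjustment amounts to adding two successes and two failures before applying the usual formula.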

Confidence Interval Comparisons: Wald vs. Agresti-Coull

Comparative analysis of confidence intervals for varying sample sizes (n) and proportions (p) within produced graphical representations.

Key Reminders

Note that p' is not the p-value; p' denotes the adjusted proportion.
The margin of error W extends in both directions from p', so the CI's total length is 2W.

Practical Case Studies

Case Study 1: Plant Growth Under Different Light Conditions

Research on plant species growth under various light types:
- 75 had exposure to red light.
- 45 exhibited significant growth improvement.

Case Study 2: Antibiotic Resistance in Bacteria

Study conducted by microbiologists into bacterial populations:
- 120 colonies assessed.
- 80 found to exhibit antibiotic resistance to specific treatments.

Case Study 3: Frog Population Sex Ratio in a Polluted Pond

Investigation by ecologists examining frog populations in habitats affected by pollution:
- Of 60 frogs caught in a pond, 25 were determined to be female.
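
As a worked illustration, the modified Wald steps from the best-practices section can be applied to the three case studies. This sketch assumes that in Case Study 1 the 45 plants with improved growth are among the 75 exposed to red light, and that each sample meets the random, independent, and accurate-data assumptions; the dictionary labels are made up for readability.

```python
from math import sqrt

def modified_wald_ci(s: int, n: int):
    """95% CI using p' = (s + 2) / (n + 4) and W = 2 * sqrt(p'(1 - p') / (n + 4))."""
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

# (successes, sample size) taken from the three case studies.
cases = {
    "plants with improved growth": (45, 75),
    "antibiotic-resistant colonies": (80, 120),
    "female frogs": (25, 60),
}

results = {label: modified_wald_ci(s, n) for label, (s, n) in cases.items()}

for label, (low, high) in results.items():
    s, n = cases[label]
    print(f"{label}: {s}/{n} = {s / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```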