Confidence Intervals and Binomial Distributions
Introduction to Confidence Intervals
Opening quote by Richard Feynman: "The first principle is that you must not fool yourself – and you are the easiest person to fool."
Key Learning Objectives
Understand the difference between probability distribution and cumulative binomial distribution.
Learn how to use binomial distributions to make inferences about a population.
Identify the assumptions that underpin confidence intervals.
Calculate 95% confidence intervals (CIs).
Comprehend the meaning behind the 95% confidence interval.
Binomial Distribution Example Questions
If you flip a coin 10 times, what’s the probability of observing exactly 7 heads?
Given that, on average, 5% of patients undergoing an operation get an infection, what’s the chance that 10 out of the next 30 patients will get an infection?
If 40% of voters are Democrats, what is the chance that a random sample of 600 voters will comprise 45% Democrats?
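Each of the three questions above asks for the probability of exactly k successes in n independent trials, which the binomial probability formula answers directly. The following is a minimal Python sketch using only the standard library. The helper name `binom_pmf` is an illustrative choice rather than part of the course material, and "45% of 600 voters" is assumed to mean exactly 270 voters.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 7 heads in 10 flips of a fair coin.
coin = binom_pmf(7, 10, 0.5)

# Exactly 10 infections among 30 patients when the infection rate is 5%.
infection = binom_pmf(10, 30, 0.05)

# Exactly 270 Democrats (assumed reading of "45% of 600") when 40% of voters are Democrats.
voters = binom_pmf(270, 600, 0.40)

print(f"coin: {coin:.4f}, infection: {infection:.2e}, voters: {voters:.2e}")
```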
Cumulative Binomial Distribution
Question: If you flip a coin 10 times, what’s the probability of observing 7 or more heads?
Scenario: If 5% of patients get infections, what’s the chance that 10 or more of the next 30 patients will contract infections?
Query: If 40% of voters are GOP, what is the chance that a random sample of 600 people will be at least 45% GOP?
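The cumulative versions of these questions add up the individual binomial probabilities for every outcome at or above the threshold. A minimal sketch, again with the standard library only; `prob_at_least` is a hypothetical helper name, and "at least 45% of 600" is assumed to mean 270 or more:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """Cumulative probability P(X >= k): sum the exact probabilities for k, k+1, ..., n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

coin_tail = prob_at_least(7, 10, 0.5)         # 7 or more heads in 10 flips
infection_tail = prob_at_least(10, 30, 0.05)  # 10 or more infections in 30 patients
voter_tail = prob_at_least(270, 600, 0.40)    # at least 270 of 600 sampled voters

print(f"{coin_tail:.4f} {infection_tail:.2e} {voter_tail:.4f}")
```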
Probability Distribution vs Cumulative Distribution
Data from the U.S. National Lightning Detection Network shows the number of lightning strikes for the year 2022, peaking in late May:
Daily lightning counts peaked at 3.3 million strikes on May 21.
Cumulative lightning counts were presented in millions throughout the months from January to December 2022.
Binomial Distribution Characteristics
Binomial probability relies on a known probability for the population to predict the likelihood of observing various outcomes in random samples.
In scientific data, results are often represented as a single proportion.
Example I: Calculating Successes and Success Proportions
It’s crucial to determine how many successes occurred and the proportions thereof.
Consider if calculations will help make broader conclusions about a phenomenon.
Example I Continued: Estimating Ranges
Questions:
How far away from those calculated success proportions can the actual population percentages likely be?
How certain can we be of these estimates?
Confidence intervals provide a range of values that are expected to contain the overall population percentage.
Confidence Interval (CI)
Definition: A confidence interval (CI) represents a range of percentages in which we are 95% confident the true population value lies.
Queries posed: How do we derive this interval from our data?
Steps for Calculating CIs
First step: Plotting the data to visualize observed successes versus missed attempts.
Observations must be clear and unambiguous to avoid uncertainty about what we observed.
CI Calculation Methods
Clopper-Pearson exact method (1934).
Standard Wald method.
Modified Wald method.
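Of the three methods listed, the Clopper-Pearson interval is the one defined directly from binomial tail probabilities: the lower limit is the proportion at which seeing the observed count or more has probability 2.5%, and the upper limit is the proportion at which seeing the observed count or fewer has probability 2.5%. The sketch below finds both limits by bisection with the standard library only. The function names are illustrative, and the counts used (33 of 100) are borrowed from the polling example in these notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _root_of_increasing(f, iterations=60):
    """Bisection on [0, 1] for a function that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_ci(s, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for s successes in n trials."""
    # Lower limit: P(X >= s | p) rises with p; solve for alpha / 2.
    lower = 0.0 if s == 0 else _root_of_increasing(
        lambda p: (1 - binom_cdf(s - 1, n, p)) - alpha / 2)
    # Upper limit: P(X <= s | p) falls with p; solve for alpha / 2.
    upper = 1.0 if s == n else _root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(s, n, p))
    return lower, upper

cp_low, cp_high = clopper_pearson_ci(33, 100)
print(f"{cp_low:.3f} to {cp_high:.3f}")
```

Bisection is used here only to keep the sketch dependency-free; statistical libraries typically compute the same limits from the beta distribution.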
Example II: Polling Voters
Scenario: 100 randomly selected voters are polled before an election, with 33 indicating a vote for a specific candidate.
Queries arise on what this implies about the proportion of voters who will support the candidate in the larger population.
Sample Representativity and Sampling Error
First Issue: Ensuring that the sample is genuinely representative of the population.
Second Issue: Accounting for potential sampling error in our results.
Limitations in Estimating Proportions
There is no absolute certainty about the proportion of voters in the population.
The best course of action is to calculate a range of possible values for this proportion, known as the 95% confidence interval.
Assumptions of Confidence Intervals
Random representative sample: In polling, if the sample isn't randomly selected, this assumption is violated.
Independent observations: Each observation must be selected independently of the others.
Accurate data: Quality of data must be assured, as poor recording leads to unreliable results (garbage in, garbage out).
Understanding 95% CI Calculation
Confidence Interval Formula: $CI = \bar{p} \pm z \times SE$, where:
$\bar{p}$ is the sample proportion.
$SE = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ is the standard error of the proportion, with $n$ the sample size.
z is the critical value drawn from the standard normal distribution.
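As a worked instance of the formula, the sketch below applies the standard Wald interval to the polling example from these notes (33 of 100 voters). It assumes the standard error is the square root of p(1 - p)/n and uses z = 1.96 for 95% confidence; the function name `wald_ci` is an illustrative choice.

```python
from math import sqrt

def wald_ci(successes, n, z=1.96):
    """Standard Wald interval: sample proportion plus or minus z standard errors."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
    return p_hat - z * se, p_hat + z * se

wald_low, wald_high = wald_ci(33, 100)
print(f"95% CI: {wald_low:.3f} to {wald_high:.3f}")
```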
Interpretation of Standard Errors
Why the standard error is multiplied by the z-value: the z-value states how many standard errors the interval must extend on either side of the sample proportion to reach the desired confidence level (e.g., 1.96 for a 95% CI).
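The value 1.96 can be recovered from the standard normal distribution: a 95% interval leaves 2.5% in each tail, so z is the 97.5th percentile. A short sketch using Python's built-in `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Critical value z such that the central `level` of a standard normal lies within +/- z."""
    tail = (1 - level) / 2            # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)
print(f"z for 95%: {z95:.3f}, z for 99%: {z99:.3f}")
```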
Insights from the Monty Hall Problem
The CI represents a range of plausible values rather than a single point estimate, reflecting how much the estimate varies from sample to sample.
Illustrative example using probability bars calculated from multiple simulated experiments regarding confidence intervals.
Proportion Estimation Techniques
Explanation of 95% CI: If the same experiment were repeated many times, about 95% of the resulting confidence intervals would contain the true population value.
True or False Question: "There is a 95% chance that the population value lies within this 95% CI" - understanding the semantics of confidence intervals is key.
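The repeated-experiment interpretation can be checked by simulation: fix a true proportion, draw many samples, compute a 95% CI from each, and count how often the interval contains the truth. The sketch below does this with standard Wald intervals. The true proportion (0.3), sample size (100), number of experiments (2000), and seed are arbitrary illustrative choices, not values from the notes.

```python
import random
from math import sqrt

def wald_ci(successes, n, z=1.96):
    """Standard Wald interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def simulate_coverage(true_p, n, experiments, seed=0):
    """Fraction of simulated experiments whose 95% CI contains true_p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(experiments):
        # One experiment: n independent trials, each a success with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / experiments

coverage = simulate_coverage(true_p=0.3, n=100, experiments=2000)
print(f"Fraction of intervals containing the true value: {coverage:.3f}")
```

In a run like this the fraction should land near 0.95. Any single interval either contains the true value or it does not; the 95% describes the procedure over many repetitions.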
Special Significance of 95%
Understanding what makes 95% a standard choice for confidence levels in statistical applications.
Best Practices for Calculating Binomial CIs
Calculate adjusted proportion: $p' = \frac{s + 2}{n + 4}$
Compute margin of error (W): $W = 2 \times \sqrt{\frac{p'(1-p')}{n+4}}$
Establish 95% confidence interval: from $p' - W$ to $p' + W$.
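The three steps above translate almost line for line into code. A minimal sketch, applied to the polling example from these notes (33 of 100 voters); the function name `modified_wald_ci` is an illustrative choice:

```python
from math import sqrt

def modified_wald_ci(s, n):
    """95% CI by the modified Wald steps: s successes out of n trials."""
    p_adj = (s + 2) / (n + 4)                    # step 1: adjusted proportion p'
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))  # step 2: margin of error W
    return p_adj - w, p_adj + w                  # step 3: p' - W to p' + W

mw_low, mw_high = modified_wald_ci(33, 100)
print(f"95% CI: {mw_low:.3f} to {mw_high:.3f}")
```

The function does not clip the limits to the range 0 to 1, so with very small or very large counts a limit can fall slightly outside that range.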
Derivation of Equation Constants
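The constants “2” and “4” in the adjusted proportion formula $p' = \frac{s + 2}{n + 4}$ come from the 95% critical value z ≈ 1.96, rounded to 2: the adjustment adds z²/2 ≈ 2 successes and z² ≈ 4 trials to the observed counts.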
Confidence Interval Comparisons: Wald vs. Agresti-Coull
Comparative analysis of confidence intervals for varying sample sizes (n) and proportions (p) within produced graphical representations.
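One way to make such a comparison concrete is to compute, for a given n and true p, the exact probability that each method's interval contains p, by summing binomial probabilities over every possible count. The sketch below does this for the standard Wald interval and for the modified Wald interval described in these notes (the z ≈ 2 simplification of Agresti-Coull). The grid of n and p values is an arbitrary illustrative choice and does not reproduce the original figures.

```python
from math import comb, sqrt

def wald_ci(s, n, z=1.96):
    """Standard Wald interval."""
    p_hat = s / n
    w = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - w, p_hat + w

def modified_wald_ci(s, n):
    """Modified Wald interval (add 2 successes and 4 trials)."""
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

def exact_coverage(ci_method, n, p):
    """Probability that the interval from ci_method contains the true p."""
    total = 0.0
    for s in range(n + 1):
        low, high = ci_method(s, n)
        if low <= p <= high:
            total += comb(n, s) * p**s * (1 - p)**(n - s)  # P(X = s)
    return total

for n in (10, 30, 100):
    for p in (0.05, 0.2, 0.5):
        wald_cov = exact_coverage(wald_ci, n, p)
        mod_cov = exact_coverage(modified_wald_ci, n, p)
        print(f"n={n:3d} p={p:.2f}  Wald={wald_cov:.3f}  modified Wald={mod_cov:.3f}")
```

For small n and p near 0, the standard Wald interval collapses to a single point when no successes are observed, which is why its coverage can fall far below the nominal 95% there.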
Key Reminders
Note that p' is a proportion and is not the same thing as a p-value.
The margin of error W extends in both directions from p', so the total length of the CI is twice the margin of error.
Practical Case Studies
Case Study 1: Plant Growth Under Different Light Conditions
Research on plant species growth under various light types:
75 plants were exposed to red light.
45 exhibited significant growth improvement.
Case Study 2: Antibiotic Resistance in Bacteria
Study conducted by microbiologists into bacterial populations:
120 colonies assessed.
80 found to exhibit antibiotic resistance to specific treatments.
Case Study 3: Frog Population Sex Ratio in a Polluted Pond
Investigation by ecologists examining frog populations in habitats affected by pollution:
Of 60 frogs caught in a pond, 25 were determined to be female.
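The three case studies can be run through the modified Wald steps from these notes. The sketch below assumes each pair of numbers is a success count out of a total (45 of 75 plants, 80 of 120 colonies, 25 of 60 frogs); the dictionary labels and function name are illustrative.

```python
from math import sqrt

def modified_wald_ci(s, n):
    """95% CI by the modified Wald method: s successes out of n trials."""
    p_adj = (s + 2) / (n + 4)
    w = 2 * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - w, p_adj + w

# (successes, total) for each case study; assumed reading of the counts above.
case_studies = {
    "plants with improved growth under red light": (45, 75),
    "antibiotic-resistant colonies": (80, 120),
    "female frogs in the polluted pond": (25, 60),
}

results = {name: modified_wald_ci(s, n) for name, (s, n) in case_studies.items()}
for name, (low, high) in results.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

For the frog data, a natural follow-up question is whether the interval includes 0.5, the value expected under an even sex ratio.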