Confidence Intervals and Hypothesis Testing Notes

Sampling Distributions

  • Sample statistics are used to estimate population parameters.

  • When the population parameter is known, the sampling distribution describes how a sample statistic varies from sample to sample.

  • Similar methodology is used when the population parameter is unknown.

  • We cannot calculate the population parameter exactly, but we can build an interval around the sample statistic and attach a level of confidence that the interval contains the parameter.

Confidence Interval for a Proportion

Margin of Error

  • In early 2021, when asked, “Have you read a book in any format (print, ebook, or audiobook) in the past 12 months?”, 75% of 1502 US adults surveyed responded “yes”, with a margin of error of ± 3 percentage points.

  • Margin of Error Definition: Indicates the range within which the true population parameter is likely to fall.

Margin of Error (Cont.)

  • Polling agencies have agreed on what to report as the “Margin of Error”.

  • If the poll is designed to estimate a population proportion, the “Margin of Error” is approximately two times: SD(\hat{p}) = \sqrt{\frac{pq}{n}}.

Standard Error

  • The formula for the standard deviation of the sampling distribution of the sample proportion includes the population proportion p.

  • Since p is usually unknown, the sample statistic (\hat{p}) is used to estimate p in the formula.

  • The estimate of the standard deviation of a sampling distribution is called a “standard error.”

  • For a sample proportion, the standard error is SE(\hat{p}) = \sqrt{\frac{{\hat{p}} \times {\hat{q}}}{n}}.

  • SE(\hat{p}) can be used in place of SD(\hat{p}) as an estimate when describing the sampling distribution.
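The standard error formula is simple to compute directly; here is a minimal Python sketch using the reading-poll figures from above (p̂ = 0.75, n = 1502):

```python
import math

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Reading poll: p_hat = 0.75, n = 1502
print(round(se_proportion(0.75, 1502), 4))  # 0.0112
```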

Analyzing the Reading Poll

  • 75% responded they read a book in any format (print, ebook, or audiobook) in the past 12 months.

  • Is the proportion of all US adults who read a book in any format close to 75%?

  • An interval is built around 75% that we are “pretty confident” contains the true population proportion.

Confidence Interval

  • By the 68-95-99.7% Rule:

    • About 68% of all possible samples will have (\hat{p})’s within 1 SE of p.

    • About 95% of all possible samples will have (\hat{p})’s within 2 SEs of p.

    • About 99.7% of all possible samples will have (\hat{p})’s within 3 SEs of p.

  • There’s a 95% chance that (\hat{p}) is no more than 2 SEs away from p.

  • Thus, we’re 95% confident that p is no more than 2 SEs away from (\hat{p}).

  • If we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of (\hat{p}), we can be 95% confident that this interval contains the true proportion p.

  • This is called a 95% confidence interval.

Assumptions and Conditions to Check, CI

  • To be able to say the sampling distribution of proportions is approximately normal, the following three conditions must be met:

    • Randomization Condition: The sample should be a simple random sample of the population. NOTE: This is also called the Independence Assumption.

    • 10% Condition: The sample size, n, must be no larger than 10% of the population.

    • Success/Failure Condition: The sample size has to be big enough so that both n\hat{p} \geq 10 and n(1- \hat{p}) \geq 10 are true.

Critical Values

  • The ‘2’ in (\hat{p} \pm 2 \times SE(\hat{p})) (our 95% confidence interval) came from the 68-95-99.7% Rule.

  • Using technology, we find that a more exact value for our 95% confidence interval is 1.96 instead of 2.

  • 1.96 is called the critical value, and we denote it z*.

Confidence Interval Formula

  • Confidence Interval formula for the true population proportion: \hat{p} \pm z^* \times SE(\hat{p}) where SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}

  • The critical value z^* is the appropriate value from the Normal Distribution for the desired confidence level.

  • For a 90% confidence interval, the critical value z* is 1.645.
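A sketch of the confidence interval formula in Python, applied to the reading-poll numbers from earlier (z* = 1.96 for 95% confidence):

```python
import math

def prop_ci(p_hat, n, z_star=1.96):
    """Confidence interval: p_hat +/- z* x SE(p_hat)."""
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

lo, hi = prop_ci(0.75, 1502)       # 95% CI for the reading poll
print(f"({lo:.3f}, {hi:.3f})")     # (0.728, 0.772)
```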

Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?

  • Our confidence is in the process of constructing the interval, not in any one interval itself.

  • Thus, we expect 95% of all 95% confidence intervals to contain the true parameter that they are estimating.
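The “confidence is in the process” idea can be checked by simulation: repeatedly draw samples from a population with a known p and count how often the 95% interval captures it. A sketch (the population values p = 0.30 and n = 500 are arbitrary choices for illustration):

```python
import math
import random

random.seed(1)                     # reproducible illustration
p, n, trials = 0.30, 500, 2000     # known population proportion
covered = 0
for _ in range(trials):
    p_hat = sum(random.random() < p for _ in range(n)) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - 1.96 * se <= p <= p_hat + 1.96 * se:
        covered += 1
print(covered / trials)            # should land near 0.95
```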

How to Interpret a Confidence Interval

  • The interpretation of any confidence interval must contain 4 generic parts:

    • Confidence level

    • Parameter of interest

    • Upper limit

    • Lower limit

  • For example: We are [confidence level]% confident that the [population parameter of interest] is contained in the interval from [lower limit] to [upper limit].

Margin of Error: Certainty vs. Precision

  • The extent of the interval on either side of (\hat{p}) is called the margin of error (ME).

  • ME = z^* \times SE(\hat{p})

  • In general, confidence intervals have the form estimate ± ME.

  • The more confident we want to be (given the same sample data), the larger our ME needs to be, which makes the interval wider and therefore less precise.

  • We need more values in our confidence interval to be more certain about the parameter.

  • Because of this, every confidence interval is a balance between certainty and precision.

  • The choice of confidence level is somewhat arbitrary.

  • The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used).
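A short sketch showing the certainty/precision trade-off numerically: with the same sample data (again the reading-poll figures, purely as an illustration), the margin of error grows as the confidence level rises. scipy's norm.ppf is used to look up the critical values:

```python
import math
from scipy.stats import norm

p_hat, n = 0.75, 1502
se = math.sqrt(p_hat * (1 - p_hat) / n)

margins = {}
for level in (0.90, 0.95, 0.99):
    z_star = norm.ppf(1 - (1 - level) / 2)   # two-sided critical value
    margins[level] = z_star * se
    print(f"{level:.0%}: z* = {z_star:.3f}, ME = {margins[level]:.4f}")
```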

Confidence Interval for the Mean

Small Problem: What if we don’t know µ and σ?

  • So, would a 95% confidence interval for the population mean look like this? \bar{y} \pm 1.96 \frac{\sigma}{\sqrt{n}}

  • Unfortunately, we don’t know σ.

  • As with proportions, we substitute s for σ; but then the sampling distribution of \bar{y} is no longer normally distributed, so we can’t use 1.96.

The Solution: William Gosset

  • William S. Gosset, an employee of the Guinness Brewery in Dublin, Ireland, worked long and hard to find out what the sampling distribution model was.

  • The sampling model that Gosset found has become known as Student’s t (or t-distribution).

  • The Student’s t-models form a whole family of related distributions that depend on a parameter known as degrees of freedom.

  • We often denote degrees of freedom as df, and the model as t_{df}.

Student’s t vs. the Normal Distribution

  • Student’s t-models are unimodal, symmetric, and bell shaped, just like the Normal.

  • But t-models with only a few degrees of freedom have much fatter tails than the Normal.

The Impact on Confidence Intervals for Means

  • Because the Student’s t distribution has heavier tails than the normal distribution, confidence intervals using Gosset’s t-model will be just a bit wider than if the Normal model applied.

  • A slightly wider interval is the price that is paid for not only estimating the mean of the population, but having to estimate the standard deviation of the population as well.
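The “slightly wider interval” can be seen by comparing critical values: the t* for a 95% interval is always larger than z* = 1.96, especially at small df. A sketch using scipy:

```python
from scipy.stats import norm, t

z_star = norm.ppf(0.975)                    # 1.96 for the Normal model
t_stars = {df: t.ppf(0.975, df) for df in (2, 5, 10, 30, 100)}
for df, t_star in t_stars.items():
    print(f"df = {df:>3}: t* = {t_star:.3f} (vs z* = {z_star:.3f})")
```

As df grows, t* approaches z*, which matches the picture of the fat tails thinning out.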

Assumptions and Conditions

  • Independence Assumption: The data values should be independent.

    • Randomization Condition: The sample should be a simple random sample of the population.

    • 10% Condition: The sample size, n, must be no larger than 10% of the population.

  • Normal Population Assumption: We can never be certain that the data are from a population that follows a Normal model, but we can check the following condition.

    • Nearly Normal Condition: The data come from a distribution that is unimodal and symmetric and the sample size is appropriate.

      • Unimodal and Symmetric: Check this condition by making a histogram or Normal probability plot, or by running a Goodness of Fit test on the raw data in the sample.

  • The smaller the sample size (n < 15 or so), the more closely the data should follow a Normal model.

  • For moderate sample sizes (n between 15 and 40 or so), the t works well as long as the data are unimodal and reasonably symmetric.

  • For sample sizes larger than 40 or 50, the t methods are safe to use unless the data are extremely skewed.

Confidence Interval for Means

  • The confidence interval is \bar{y} \pm t_{n-1}^* \times SE(\bar{y}) where the standard error of the mean is SE(\bar{y}) = \frac{s}{\sqrt{n}}

  • The critical value, t_{n-1}^*, depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, n – 1, which we get from the sample size.
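A sketch of the one-sample t-interval in Python (the data values are made up for illustration):

```python
import math
import statistics
from scipy.stats import t

def mean_ci(data, level=0.95):
    """t-interval: ybar +/- t*_{n-1} x s/sqrt(n)."""
    n = len(data)
    ybar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t_star = t.ppf(1 - (1 - level) / 2, n - 1)
    return ybar - t_star * se, ybar + t_star * se

sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]  # made-up data
lo, hi = mean_ci(sample)
print(f"({lo:.2f}, {hi:.2f})")
```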

Statistical Hypotheses

  • The hypothesis of the researcher (the “police” in the courtroom analogy) gets called the alternative hypothesis.

  • The “opposite” or initial hypothesis is called the null hypothesis, which we denote by H0. In other words, we start by assuming the researcher’s claim is wrong and then try to show that the null is false. Good for trials and good for science.

  • Notation: H0: population parameter = hypothesized value.

  • The alternative hypothesis, which we denote by HA, contains the values of the parameter that we consider plausible when we reject the null hypothesis.

Zero In on the Null

  • To perform a hypothesis test, the null hypothesis must be a statement about the value of a population parameter.

  • We then use this hypothesized value to help us compute the probability that the observed sample statistic—or something even farther from the hypothesized value—will occur.

Alternative Alternatives

  • There are three possible alternative hypotheses:

    • HA: parameter ≠ hypothesized value

    • HA: parameter < hypothesized value

    • HA: parameter > hypothesized value

P-Values

  • Once we have both hypotheses, we collect sample data.

  • We will compare our data to what we would expect given that H0 is true.

  • We calculate the probability (P-value) of getting the sample data we got (or results more unusual than that) if the null hypothesis were true.

  • To calculate this probability, we start by calculating how many standard deviations the sample statistic is from the proposed population parameter.

  • We can use our understanding of sampling distributions to calculate the probability of our sample statistic being that many (or more) standard deviations from the hypothesized population parameter.

  • Thus, the p-value is the probability of seeing data like these (or even more unlikely data) given the null hypothesis is true.

  • When the P-value is large, say, above 0.05, we are unable to reject the null hypothesis.

  • We can’t claim to have proved it; instead we say we “fail to reject the null hypothesis”.

  • When the P-value is small, say, 0.05 or less, we say we “reject the null hypothesis”, since what we observed would be very unlikely were the null hypothesis true.

Alternative Alternatives

  • HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesized value.

  • For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.

  • The other two alternative hypotheses are called one-sided alternatives.

  • A one-sided alternative focuses on deviations from the null hypothesis value in only one direction.

  • Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.

The Reasoning of Hypothesis Testing

  • There are four basic parts to a hypothesis test:

    • Hypotheses

    • Model

    • Mechanics

    • Conclusion

  • The null hypothesis: a statement about a parameter in a statistical model.

  • In general, we have H0: parameter = hypothesized value.

  • The alternative hypothesis: the value of the parameter we consider plausible if we reject the null.

  • In general, we have three options:

    • HA: parameter ≠ hypothesized value

    • HA: parameter > hypothesized value

    • HA: parameter < hypothesized value

  • Depending on what type of population parameter you are testing, a sampling distribution model is used to compare the sample statistic to the corresponding hypothesized population parameter.

  • All models require assumptions, so state the assumptions and check any corresponding conditions.

  • The test about proportions is called a one-proportion z-test.

  • To be able to use the Normal model when calculating the test statistic, the following three conditions must be met:

    • Randomization Condition: The sample should be a simple random sample of the population. NOTE: This is also called the Independence Assumption.

    • 10% Condition: The sample size, n, must be no larger than 10% of the population.

    • Success/Failure Condition: The sample size has to be big enough so that both np_0 \geq 10 and n(1 - p_0) \geq 10 are true.

  • When the conditions are met, the sampling distribution of the sample proportion follows the Normal model, so we can use that model to obtain a P-value.

  • We test the hypothesis H0: p = p_0 using the “test statistic” z = \frac{\hat{p} - p_0}{SD(\hat{p})} where SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}

  • Mechanics include the calculation of our test statistic from the data.

  • Different tests will have different formulas and different test statistics.

  • Mechanics also include the calculation of the P-value.

  • The P-value is the probability that the observed test statistic value (or an even more extreme value) could occur if the null hypothesis were true.

  • The conclusion always begins with a statement about the null hypothesis.

  • It must begin with a statement that we reject or that we fail to reject the null hypothesis.

  • The remainder of the conclusion should be in easy to understand language, and in the context of the specific situation.
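The mechanics of the one-proportion z-test can be sketched in a few lines of Python. The inputs below (p̂ = 0.75 and n = 1502 from the reading poll, plus a hypothesized p0 = 0.72) are chosen purely for illustration:

```python
import math
from scipy.stats import norm

def one_prop_ztest(p_hat, n, p0, alternative="two-sided"):
    """One-proportion z-test of H0: p = p0.

    The standard deviation of p_hat under H0 uses the hypothesized p0.
    """
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    if alternative == "two-sided":
        p_value = 2 * norm.sf(abs(z))
    elif alternative == "greater":
        p_value = norm.sf(z)
    else:  # "less"
        p_value = norm.cdf(z)
    return z, p_value

# Illustration only: is the true proportion above a hypothesized 72%?
z, p_value = one_prop_ztest(0.75, 1502, 0.72, alternative="greater")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```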

One-sample t-Test for the mean

  • The assumptions and conditions for the one-sample t-test for the mean are the same as for the one-sample t-interval.

  • We test the hypothesis: H0: µ = µ0 using the “test statistic” t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})} where SE(\bar{y}) = \frac{s}{\sqrt{n}}

  • When the conditions are met and the null hypothesis is true, this statistic follows a Student’s t-model with n – 1 df. We use that model to obtain a P-value.

  • To be able to say the sampling distribution of sample means follows the t-distribution, the following three conditions must be satisfied:

    • Randomization Condition: The sample should be a simple random sample of the population.

    • 10% Condition: The sample size, n, must be no larger than 10% of the population.

    • Nearly Normal Condition: The data come from a distribution that is unimodal and symmetric and the sample size is appropriate.

      • Unimodal and Symmetric

      • Sufficient Sample Size

  • For example, asking whether a population mean exceeds 0.08 ppm leads to the hypothesis test H0: µ = 0.08 ppm vs. HA: µ > 0.08 ppm
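A sketch of the one-sample t-test for the 0.08 ppm hypothesis above; the readings are hypothetical values invented for illustration:

```python
import math
import statistics
from scipy.stats import t

# Hypothetical mercury readings (ppm), invented for illustration
readings = [0.095, 0.081, 0.102, 0.077, 0.089, 0.091, 0.084, 0.098, 0.086, 0.093]

n = len(readings)
ybar = statistics.mean(readings)
se = statistics.stdev(readings) / math.sqrt(n)   # SE(ybar) = s / sqrt(n)
t_stat = (ybar - 0.08) / se                      # H0: mu = 0.08
p_value = t.sf(t_stat, n - 1)                    # one-sided: HA is mu > 0.08
print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}")
```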

Connection Between Confidence Intervals and Hypothesis Tests

  • Confidence intervals and hypothesis tests are built from the same calculations.

  • They have the same assumptions and conditions.

  • You can approximate a hypothesis test by examining a confidence interval.

  • Just ask whether the null hypothesized value is consistent with a confidence interval for the parameter at the corresponding confidence level.

  • Because confidence intervals are two-sided, they correspond to two-sided tests.

  • In general, a confidence interval with a confidence level of C% (for example, 95%) corresponds to a two-sided hypothesis test with a cutoff of (100 – C)% (for example, 100% – 95% = 5%).
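The duality can be verified numerically: for a mean, the 95% t-interval excludes µ0 exactly when the two-sided P-value falls below 0.05. A sketch with made-up data and a made-up µ0:

```python
import math
import statistics
from scipy.stats import t

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]  # made-up sample
n, mu0 = len(data), 10.3                              # made-up null value

ybar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)
t_star = t.ppf(0.975, n - 1)
lo, hi = ybar - t_star * se, ybar + t_star * se       # 95% CI

t_stat = (ybar - mu0) / se
p_value = 2 * t.sf(abs(t_stat), n - 1)                # two-sided test

# The CI contains mu0 exactly when the two-sided P-value is 0.05 or more
print(lo <= mu0 <= hi, p_value >= 0.05)
```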

Interpreting P-Values

  • A P-value is a conditional probability—the probability of the sample result (or a result more unusual than that) given that the null hypothesis is true.

  • Be careful to interpret the P-value correctly.

Alpha Levels

  • The threshold is called an alpha level or level of significance, denoted by α.

  • If our P-value falls below our threshold, we’ll reject the null hypothesis. We call such results statistically significant. P-value < α

  • Common alpha levels are 0.10, 0.05, 0.01, and 0.001.

  • You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation.

  • When we reject the null hypothesis, we say that the test is “significant at that level.”

  • The P-value gives the reader far more information than just stating that you reject or fail to reject the null.

  • In fact, by providing a p-value to the reader, you allow that person to make his or her own decisions about the test.

Statistical vs. Practical Significance

  • What do we mean when we say that a test is “statistically significant”? p-value < α

  • If the test is statistically significant, does that imply the results have some practical importance?

Effect Size

  • The difference between the hypothesized value of the parameter and the actual value of the parameter is called the Effect Size.

  • Large samples – even a small, unimportant effect size can be statistically significant

  • Sample not large enough – even a large financially or scientifically important effect may not be statistically significant

Making Errors

  • When we perform a hypothesis test, we can make mistakes in two ways:

    • The null hypothesis is true, but we mistakenly reject it. (Type I error)

    • The null hypothesis is false, but we fail to reject it. (Type II error)

  • Which type of error is more serious is context dependent.

  • When H0 is true and we reject it, we have made a Type I error.

  • The probability of a Type I error is our α (alpha) level.

  • The researcher has complete control over the probability of a Type I error (they pick the α level but usually before data collection).

  • When H0 is false and we fail to reject it, we have made a Type II error.

  • We assign the letter β (beta) to the probability of this mistake.

What Impacts the Size of β?

  • “How false” is H0?

    • The difference between the hypothesized value of the parameter and the actual value of the parameter is called the Effect Size.

  • The larger the effect size, the smaller β will be.

  • What is the α Level?: The more we’re willing to accept a Type I error, the less likely we will be to make a Type II error.

  • So, the larger the researcher is willing to set the value of α, the smaller β will be.

  • What is the Sample Size?: You are bound to make a better decision if you base it on more information.

  • So, the larger the sample size (n), the smaller β will be.

  • Once the null and alternative hypotheses are in place, the researcher has no control over the Effect Size.

  • The researcher can easily make α smaller, but that automatically makes β bigger.

Power

  • The power of a test is the probability that it correctly rejects a false null hypothesis.

  • The power of a test is 1 – β.

  • When we calculate power, we imagine that the null hypothesis is false.

  • The numerical value of the power depends on how far the truth lies from the null hypothesis value.

  • The distance between the null hypothesis value, p0, and the truth, p, is effect size.

  • Power depends directly on effect size (and α and n).
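A sketch of a power calculation for a one-sided one-proportion z-test, showing that power grows with both effect size and sample size (the values p0 = 0.50, p_true, and n below are arbitrary illustrations):

```python
import math
from scipy.stats import norm

def power_one_prop(p0, p_true, n, alpha=0.05):
    """Power of a one-sided (HA: p > p0) one-proportion z-test."""
    sd0 = math.sqrt(p0 * (1 - p0) / n)           # SD of p_hat under H0
    crit = p0 + norm.ppf(1 - alpha) * sd0        # reject when p_hat exceeds this
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    return norm.sf((crit - p_true) / sd_true)    # P(reject | p = p_true)

# Power rises with sample size (n) and with effect size (p_true - p0)
print(round(power_one_prop(0.50, 0.55, 100), 3))
print(round(power_one_prop(0.50, 0.55, 400), 3))
print(round(power_one_prop(0.50, 0.60, 100), 3))
```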

Confidence Interval for the Difference Between Two Means

  • A natural display for comparing two groups is boxplots of the data for the two groups, placed side-by-side.

  • Once we have examined the side-by-side boxplots of sample data, we can turn to the comparison of the two population means.

  • The parameter of interest is the difference between the two population means, µ1 – µ2.

  • For independent random quantities, variances add (standard deviations are not additive).

  • So, the standard deviation of the difference between two sample means is
    SD(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

  • We still don’t know the true standard deviations of the two groups, so we need to use the standard error:
    SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

  • The sampling distribution of the difference in sample means of two independent groups is a Student’s t.

  • The confidence interval we build is called a two-sample t-interval for the difference in means.

  • The corresponding hypothesis test is called a two-sample t-test.

Assumptions and Conditions

  • Independence Assumption (Each condition needs to be checked for both groups.):

    • Randomization Condition: The sample should be a simple random sample of the population.

    • 10% Condition: The sample size, n, must be no larger than 10% of the population.

  • Normal Population Assumption:

  • Nearly Normal Condition: This must be checked for both groups. A violation by either group violates the condition.

  • Independent Groups Assumption:

  • Independent Group Condition: The two groups we are comparing must be independent of each other.

  • When the conditions are met, we are ready to find the confidence interval for the difference between means of two independent groups.

  • The confidence interval is (\bar{y}_1 - \bar{y}_2) \pm t_{df}^* \times SE(\bar{y}_1 - \bar{y}_2)

  • where the standard error of the difference of the means is
    SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

  • The critical value t_{df}^* depends on the particular confidence level, C, that you specify and on the number of degrees of freedom.

The Two-Sample t-Test: Testing for the Difference Between Two Means

  • We test the hypothesis H0: µ1 – µ2 = ∆0, where the hypothesized difference, ∆0, is almost always 0, using the statistic:
    t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}

  • As before, the standard error is
    SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

  • The two-sample t-test has the same conditions as the two-sample t-interval.
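A sketch of the two-sample t-test using scipy's ttest_ind (equal_var=False requests the Welch version, which does not pool the variances); both groups are made-up data:

```python
from scipy import stats

# Made-up data for two independent groups
group1 = [23.1, 25.4, 24.8, 26.0, 22.9, 25.1, 24.3, 23.8]
group2 = [21.0, 22.5, 20.8, 23.1, 21.9, 22.0, 20.5, 21.7]

# equal_var=False: do not assume the two population variances are equal
result = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {result.statistic:.2f}, P-value = {result.pvalue:.4f}")
```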

Test of Independence

  • Contingency tables categorize counts on two categorical variables so that we can see whether the distribution of counts on one variable is contingent on the other.

  • A test of independence examines whether there is a significant association between a pair of categorical variables.

  • In a test of independence of two categorical variables, the “generic” hypotheses are:

  • H0: Row and column classifications are independent

  • HA: Row and column classifications are not independent (i.e., they are associated with each other)

Assumptions and Conditions

  • Counted Data Condition: Check that the data are counts for the categories of two categorical variables.

  • Independence Assumption: The counts in the cells should be independent of each other.

  • Randomization Condition: The sample should be a simple random sample of the population.

  • 10% Condition: The sample size, n, must be no larger than 10% of the population.

  • Sample Size Assumption: We must have enough data for the methods to work.

  • Expected Cell Frequency Condition: We should “expect” to see at least 5 individuals in each cell.

What Is Meant by “Expected” Cell Count?

  • Given the row totals and column totals in the contingency table, what cell counts would we “expect” to see inside the table if the row and column classifications were independent of each other?

  • If rows and columns were really independent, you should be able to take:

    • the probability of being in a particular row TIMES

    • the probability of being in a particular column TIMES

    • the total number of observations to get the expected count for that corresponding cell.

Calculating Expected Cell Counts

  • Multiplying the row probability, the column probability, and the total count gives the “expected” cell count. This is done for all cells in the contingency table.

  • Mechanically, it is simpler (although equivalent) to calculate, for each cell:
    Expected Cell Count = \frac{row total \times column total}{total number of observations}

Calculations

  • How different are the actual (observed) cell counts from these “expected” cell counts?

  • It is natural to look at the differences between the observed and expected counts in each cell: (Obs − Exp)

  • These differences are actually residuals, so adding up all of these differences will result in a sum of 0.
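The expected-count formula and the residuals-sum-to-zero fact can be checked with a small sketch (the observed table is invented for illustration):

```python
# Expected counts under independence: (row total x column total) / grand total
observed = [
    [30, 10],
    [20, 40],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
residuals = [[observed[i][j] - expected[i][j]
              for j in range(len(col_totals))] for i in range(len(row_totals))]

print(expected)                            # [[20.0, 20.0], [30.0, 30.0]]
print(sum(sum(row) for row in residuals))  # 0.0 -- the residuals cancel out
```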