Hypothesis Testing I: The One-Sample Case

LEARNING OBJECTIVES

Explain the logic of hypothesis testing.
Explain what it means to “reject the null hypothesis” or “fail to reject the null hypothesis.”
Identify and cite examples of situations in which one-sample tests of hypotheses are appropriate.
Test the significance of single-sample means and proportions using the five-step model, and correctly interpret the results.
Explain the difference between one- and two-tailed tests, and specify when each is appropriate.
Define and explain Type I and Type II errors, and relate each to the selection of an alpha level.
Use the Student’s t distribution to test the significance of a sample mean for a small sample.

CHAPTER OUTLINE

Using Statistics
An Overview of Hypothesis Testing
The Five-Step Model for Hypothesis Testing
Choosing a One-Tailed or Two-Tailed Test
Selecting an Alpha Level
The Student’s t Distribution
Tests of Hypotheses for Single-Sample Proportions (Large Samples)

USING STATISTICS

One-sample hypothesis tests are used to answer research questions, such as:
- Are residents of a particular neighborhood more likely to use a park than all residents of the city?
- Do student athletes consume less alcohol than all other college students?
- Does a program designed to reduce sexual risk taking among teenagers work?
Notice that these examples all involve comparing a sample statistic to a known population parameter.
Hypothesis testing (or significance testing) is the second branch of inferential statistics.
Hypothesis testing in the one-sample case is used when we have a probability sample that we want to compare to a population. We want to know if the group represented by the sample is different from the population on some characteristic.
A significant finding means that the difference between the sample and the population on a particular characteristic is very unlikely to be caused by random chance alone.
There is always a small amount of uncertainty in our conclusions. However, an advantage of inferential statistics is that we know the probability of being wrong and can judge results accordingly.

AN OVERVIEW OF HYPOTHESIS TESTING

Example:
- A sociologist is assessing the effectiveness of a rehabilitation program for alcoholics in her city. The program serves a large area, and she cannot test every single client. Instead, she draws a random sample of 127 people from the list of all clients and questions them on a variety of issues. She notices that, on average, the people in her sample miss fewer days of work each year than workers in the city as a whole. On the basis of this unexpected finding, she decides to conduct a test to see if the workers in her sample really are more reliable than workers in the community as a whole.
- Our research question is: Are people treated in this program more reliable workers than people in the community?
A sample (N=127) of the program’s clients is used because it is impractical to compare all clients of the program to all city residents.
Hypothesis testing will be used to compare the absentee rates of the sample of clients to a known average for all city residents.
The sample of program clients (mean=6.8 missed days/year) appears to be different from the city population (mean=7.2 missed days/year).
Is the mean for our sample of program clients different enough from the mean for the city population for us to conclude that the difference is not due to chance?

TESTING HYPOTHESES IN 5 STEPS

The Five-Step Model:
- Step 1: Make assumptions and meet test requirements.
- Step 2: State the null hypothesis.
- Step 3: Select the sampling distribution and establish the critical region.
- Step 4: Compute the test statistic.
- Step 5: Make a decision and interpret the results.

STEP 1 OF 5

Step 1: Make Assumptions and Meet Test Requirements
- All statistical tests must meet certain assumptions prior to their use. For hypothesis testing, these include:
  - A probability (EPSEM) sample
  - An interval-ratio level variable – so that calculation of a mean is appropriate
  - A normal sampling distribution – so that we can use the normal curve table to find areas and probabilities
    - We can be sure that this assumption is satisfied by using large samples (see the Central Limit Theorem in Chapter 6).

STEP 2 OF 5

Step 2: State the Null Hypothesis
- The null hypothesis is a statement of “no difference.”
- For one-sample hypothesis tests, the null hypothesis states that the sample comes from a population with a certain characteristic.
- The null hypothesis usually takes the form of $H_0: \mu = \text{some number}$ .
- We also usually state the research hypothesis, which directly contradicts the null hypothesis, and represents the researcher’s expectations.
- The research hypothesis usually takes the form of $H_1: \mu \neq \text{some number}$ .
Why might our sample of program clients differ from the population of city residents? Two options:
- The difference between the two means reflects a real difference between the two groups (of program clients and city residents).
  - That is, the difference is statistically significant.
- The difference between the two means was caused by random chance. The respective groups are not different.
  - This reason is represented by the null hypothesis ( $H_0$ ).
  - In this example, the null would be written as $H_0: \mu = 7.2$ .
- Hypothesis tests assume that the null hypothesis (no difference) is true and that the sample results were due to chance.
Before we find the probability of our sample outcome, we must first establish a threshold of difference.
- That is, exactly how different must our sample mean (of program clients) be from what we hypothesize the population mean (of program clients) to be, for us to determine that the difference is real rather than an artifact of chance?
- Often, we use the threshold of 0.05. If there is less than a 5% chance of getting a sample mean as extreme as 6.8 when the population mean really is 7.2, then we reject the null hypothesis.
Given the assumption of a true null hypothesis, we can estimate the probability of obtaining our sample result (with a mean of 6.8 absences).
- That is, if the population of program clients really has the same mean as the city population (7.2), what is the probability that we could have selected a sample of clients from the population of all clients that has a mean of 6.8?
- Using the sampling distribution, the normal curve, and the Central Limit Theorem, we can determine this probability.

STEP 3 OF 5

Step 3: Select the Sampling Distribution and Establish the Critical Region
- The sampling distribution depends on the type of hypothesis test being conducted. For one-sample hypothesis tests it will often be the Z distribution (Appendix A).
- The critical region (or rejection region) refers to the shaded area of unlikely sample outcomes in the sampling distribution (given a true null hypothesis).
- The alpha level (which represents the total area of the critical region) determines the critical test statistic, $Z(critical)$ .
- For example, when alpha is 0.05 in a two-tailed test, the $Z(critical)$ is ±1.96. Other common alpha levels are 0.01 and 0.001.
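The link between alpha and $Z(critical)$ can be checked numerically rather than looked up in Appendix A. The sketch below is one way to do it with Python's standard library, assuming a two-tailed test; the function name `z_critical_two_tailed` is illustrative and does not come from the text.

```python
from statistics import NormalDist

def z_critical_two_tailed(alpha: float) -> float:
    """Positive Z score that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# The three alpha levels named in the text
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z(critical) = ±{z_critical_two_tailed(alpha):.2f}")
# alpha = 0.05: Z(critical) = ±1.96
# alpha = 0.01: Z(critical) = ±2.58
# alpha = 0.001: Z(critical) = ±3.29
```

The smaller the alpha, the farther out the critical value sits, so the sample outcome must be more extreme before the null hypothesis is rejected.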

STEP 4 OF 5

Step 4: Compute the Test Statistic
- Convert the sample outcome to a standardized Z score.
- This statistic is called the $Z(obtained)$ .
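- For a one-sample hypothesis test of a mean (large sample) use: $Z(obtained) = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$ , where $\sigma / \sqrt{N}$ is the standard error. When the population standard deviation is unknown, substitute $s / \sqrt{N-1}$ for the standard error.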
- With alpha set at 0.05 in a two-tailed test, the 0.05 area is divided in half across the two tails (0.025 in each). The Z score that corresponds to this area is ±1.96.
- If our sample outcome falls in the shaded area (the critical region or rejection region) then the null hypothesis can be rejected. That is, the difference is likely real and not due to chance.
To locate our sample mean in the sampling distribution we must standardize it into a Z score.
- Previously, we converted single raw values into Z scores by subtracting the mean, and dividing by the sample standard deviation.
- To standardize a sample mean into a Z score for a sampling distribution, rather than an empirical distribution, we must compare it to the population mean and divide by the standard error (which is the standard deviation of the sampling distribution).
Using this formula, the Z score for our sample mean of 6.8 is -3.15. Our sample mean falls 3.15 standard errors below our hypothesized population mean of 7.2.
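The standardization described above can be sketched in a few lines of Python. The text does not report the sample standard deviation, so the value `s = 1.43` below is a hypothetical input, chosen only so that the result lands near the Z score quoted above. The function name is also illustrative. The standard error uses N-1 because the sample standard deviation stands in for the population value.

```python
import math

def z_obtained_mean(sample_mean, pop_mean, sample_sd, n):
    """Z(obtained) for a one-sample test of a mean, using s / sqrt(N - 1) as the standard error."""
    standard_error = sample_sd / math.sqrt(n - 1)
    return (sample_mean - pop_mean) / standard_error

# 6.8, 7.2, and N = 127 come from the example; s = 1.43 is an assumed value
z = z_obtained_mean(sample_mean=6.8, pop_mean=7.2, sample_sd=1.43, n=127)
print(round(z, 2))  # about -3.14, close to the -3.15 reported above
```

The small gap between -3.14 and -3.15 would be consistent with rounding the standard error before dividing, though that is an inference and not something the text states.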

STEP 5 OF 5

Step 5: Make a Decision and Interpret the Results
- Compare the $Z(obtained)$ to the $Z(critical)$ .
- If the $Z(obtained)$ falls in the critical region, then reject the null hypothesis.
  - The difference is statistically significant; the sample results likely reflect a true difference.
- If the $Z(obtained)$ does not fall in the critical region, then fail to reject the null hypothesis.
  - The difference is not statistically significant; the sample results could plausibly be due to chance.
Now we can conclude the hypothesis test from our example. We just locate our sample mean (converted to a Z statistic) on the sampling distribution.
Because our sample mean falls in the shaded region, we can conclude that it is highly unlikely that, given a mean of 7.2 for the population of program clients, we could have selected a sample that had a mean of 6.8.
- Thus, we reject the null hypothesis and conclude that program clients are significantly different from city residents on absenteeism.
- Of course, we could be wrong. If the null hypothesis is actually true, a test with alpha set at 0.05 will still reject it 5% of the time. This would be called a Type I error.
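The decision rule in Step 5 can be written out as a short Python sketch. It assumes a two-tailed test and takes the $Z(obtained)$ of -3.15 from the example. The function name `decide` is illustrative, and the reported probability is the two-tailed area beyond the obtained Z score under a true null hypothesis.

```python
from statistics import NormalDist

def decide(z_obtained, alpha=0.05):
    """Compare Z(obtained) to Z(critical) for a two-tailed test and return the decision."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Area in both tails at least as extreme as the obtained Z score
    p_value = 2 * std_normal.cdf(-abs(z_obtained))
    decision = "reject" if abs(z_obtained) > z_critical else "fail to reject"
    return decision, p_value

decision, p_value = decide(-3.15)
print(decision, round(p_value, 4))
```

For the example, the area beyond ±3.15 is well under 0.05, which is why the sample mean falls in the critical region.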

CHOOSING A ONE-TAILED OR TWO-TAILED TEST

In a two-tailed hypothesis test, the researcher hypothesizes that the population mean is “equal to” or “not equal to” some value.
- In a two-tailed test, the critical region is evenly split across both tails of the sampling distribution.
In a one-tailed hypothesis test, the researcher hypothesizes that the population mean is “greater than” or “less than” some value.
- In a one-tailed test, the entire critical region is put in one tail of the sampling distribution.
- A one-tailed test is described as a directional test.
The decision to use a one-tailed or two-tailed test is reflected in the research hypothesis.
- In a two-tailed test the research hypothesis looks like this: $H_1: \mu \neq \text{(some number)}$
- In a one-tailed test the research hypothesis looks like this: $H_1: \mu > \text{(some number)}$ or $H_1: \mu < \text{(some number)}$
Because the critical region differs across one-tailed and two-tailed tests, the $Z(critical)$ will also change.
Notice that it is more ‘difficult’ to reject the null with a two-tailed versus one-tailed critical value.
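To make the comparison concrete, the sketch below computes both critical values at alpha = 0.05 and applies them to a hypothetical $Z(obtained)$ of 1.80. That value is invented for illustration and does not appear in the text. The one-tailed case assumes the critical region is in the upper tail.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-tailed: alpha is split across both tails; one-tailed: all of alpha sits in one tail
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
z_one_tailed = std_normal.inv_cdf(1 - alpha)
print(f"two-tailed: ±{z_two_tailed:.3f}, one-tailed: +{z_one_tailed:.3f}")
# two-tailed: ±1.960, one-tailed: +1.645

z_hypothetical = 1.80  # assumed Z(obtained), for illustration only
rejected_two_tailed = abs(z_hypothetical) > z_two_tailed
rejected_one_tailed = z_hypothetical > z_one_tailed
print(rejected_two_tailed, rejected_one_tailed)  # False True
```

The same sample outcome is significant under the one-tailed test but not under the two-tailed test, which is the sense in which the two-tailed critical value is harder to exceed.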
Typically, two-tailed tests are preferred because they are more conservative (they reduce the likelihood of making a Type I error).
Conceptually, choosing between the two depends on the research question. If the researcher is only interested in one direction of difference, then a one-tailed test may be appropriate.
- One-tailed example: An SAT tutoring company is only interested in whether its services improve student scores over the national average.
- Two-tailed example: A gym franchise wants to know if a sample of regulars who take a promotional supplement experience higher or lower performance compared to the population of members.

SELECTING AN ALPHA LEVEL

Setting an alpha level requires defining what an “unlikely” sample outcome is.
The most common alpha level is 0.05, but 0.10, 0.01, and 0.001 are also used.
There are two possible mistakes:
- Type I Errors: Rejecting a null hypothesis that is actually true.
- Type II Errors: Failing to reject a null hypothesis that is actually false.
The probability of committing a Type I Error is alpha. The lower we set alpha, the lower the probability of committing a Type I Error.
However, as the probability of committing a Type I Error decreases, the probability of committing a Type II Error increases.
But as social scientists, we are usually more concerned with Type I Error than Type II Error.
An accepted balance between these two error probabilities is with an alpha of 0.05.
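The trade-off between the two error types can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the population standard deviation of 1.5, the sample size of 50, the "true" alternative mean of 6.8, and the function name `rejection_rate`. It draws repeated samples, runs a two-tailed Z test on each, and counts how often the null hypothesis is rejected.

```python
import random
from statistics import NormalDist

def rejection_rate(alpha, true_mu, null_mu=7.2, sigma=1.5, n=50, trials=2000, seed=1):
    """Share of simulated samples whose two-tailed Z test rejects H0: mu = null_mu."""
    rng = random.Random(seed)  # fixed seed so every alpha level sees the same samples
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = (sample_mean - null_mu) / standard_error
        if abs(z) > z_critical:
            rejections += 1
    return rejections / trials

for alpha in (0.10, 0.05, 0.01):
    type_i = rejection_rate(alpha, true_mu=7.2)       # null is true: any rejection is a Type I error
    type_ii = 1 - rejection_rate(alpha, true_mu=6.8)  # null is false: any non-rejection is a Type II error
    print(f"alpha = {alpha}: Type I rate = {type_i:.3f}, Type II rate = {type_ii:.3f}")
```

When the null hypothesis is true, the Type I rate should come out close to alpha. When it is false, lowering alpha raises the Type II rate, because fewer samples clear the stricter critical value.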

THE Z DISTRIBUTION & THE STUDENT’S T DISTRIBUTION

For large samples of 100 or more, we use the Z Distribution.
- In these cases, the sample standard deviation is a good estimate of the population standard deviation.
- However, when substituting the sample standard deviation for the population standard deviation, we use N-1 in the denominator instead of N (see the formula in Slide 21).
For smaller samples of fewer than 100, we use the Student’s t Distribution.
- The Student’s t distribution is flatter than the Z distribution.
- The Student’s t distribution varies with sample size. The larger the sample, the closer the t distribution is to the Z distribution.

THE STUDENT’S T DISTRIBUTION

The Student’s t Distribution is displayed differently than the Z table.
For the t distribution, we need to calculate degrees of freedom (df). For a one-sample hypothesis test, df is equal to N-1.
The numbers in the body of the t table are not areas, but critical values of the test statistic, or $t(critical)$ .
When conducting a one-sample hypothesis test with a small sample (N<100), Step 3 of the hypothesis test will change.
- Instead of using the Z table in Appendix A to find $Z(critical)$ , you will use the t table in Appendix B to find $t(critical)$ .
- In Step 5 you will compare the computed test statistic to the value from the t table.
See Appendix B for the full Student’s t Distribution.
Practice finding the critical t value when the sample size is 20, and the df (N-1) is equal to 19.
- For a One-tailed test, using the .05 significance level, the Critical t would be 1.729.
- For a Two-tailed test, using the .05 significance level, the Critical t would be 2.093.
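These table values can be reproduced numerically. Python's standard library has no t distribution, so the sketch below integrates the t density with Simpson's rule and then searches for the critical value by bisection. The function names are illustrative, and the search range of 0 to 100 is an assumption that suits common alpha levels and moderate df. In practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus the 0.5 area below zero."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df, tails=2):
    """Positive t score that leaves alpha/tails in the upper tail, found by bisection."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, df=19, tails=1), 3))  # 1.729
print(round(t_critical(0.05, df=19, tails=2), 3))  # 2.093
```

Both results match the t table entries for df = 19 given above.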

TESTS OF HYPOTHESES FOR SINGLE-SAMPLE PROPORTIONS (LARGE SAMPLES)

One-sample hypothesis tests of proportions use the same five steps as tests of means, with a couple of changes:
- In Step 2 your null and research hypotheses will reference the population proportion ( $P_u$ ).
- In Step 3 you will always use the Z distribution (Appendix A) to look up the $Z(critical)$ .
- In Step 4 you will calculate the $Z(obtained)$ as: $Z(obtained) = \frac{P_s - P_u}{\sqrt{P_u(1 - P_u)/N}}$ , where $P_s$ is the sample proportion.
Example: A random sample of 122 households in a low-income neighborhood revealed that 53 (or a proportion of 0.43) of the households were headed by women. In the city as a whole, the proportion of female-headed households is 0.39. Are households in the low-income neighborhood significantly different from the city as a whole in terms of this characteristic?
Step 1: Make Assumptions and Meet Test Requirements
- Probability sample
- Nominal level of measurement
- Normal sampling distribution
Step 2: State the Null Hypothesis
- $H_0: P_u = 0.39$
- $H_1: P_u \neq 0.39$
Step 3: Select the Sampling Distribution and Establish the Critical Region
- Use the Z distribution
- Alpha = 0.10, two-tailed
- $Z(critical) = ±1.65$
Step 4: Compute the Test Statistic
- $Z(obtained) = \frac{0.43 - 0.39}{\sqrt{(0.39)(0.61)/122}} = \frac{0.04}{0.044} = +0.91$
Step 5: Make a Decision and Interpret the Results
- The $Z(obtained)$ does not fall in the critical region.
- Fail to reject the null hypothesis.
- We cannot conclude that there is a statistically significant difference between the low-income neighborhoods and the city as a whole in terms of female-headed households.
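The five steps for this example can be checked with a short Python sketch. The numbers (0.43, 0.39, N = 122, alpha = 0.10, two-tailed) come from the example above. The function name is illustrative, and the standard error is built from the population proportion stated in the null hypothesis.

```python
import math
from statistics import NormalDist

def z_obtained_proportion(sample_prop, pop_prop, n):
    """Z(obtained) for a one-sample test of a proportion (large sample)."""
    standard_error = math.sqrt(pop_prop * (1 - pop_prop) / n)
    return (sample_prop - pop_prop) / standard_error

z = z_obtained_proportion(sample_prop=0.43, pop_prop=0.39, n=122)
z_critical = NormalDist().inv_cdf(1 - 0.10 / 2)  # about 1.645, which the table rounds to 1.65

print(round(z, 3), round(z_critical, 3))  # 0.906 1.645
print("reject" if abs(z) > z_critical else "fail to reject")  # fail to reject
```

Because 0.906 is well inside ±1.65, the sample proportion does not fall in the critical region.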

SUMMARY

All tests of a hypothesis involve finding the probability of the observed sample outcome, given that the null hypothesis is true.
The five-step model will be our framework for decision making throughout the hypothesis testing chapters. What we do during each step, however, will vary, depending on the specific test being conducted.
If we can predict a direction for the difference in stating the research hypothesis, a one-tailed test is called for. If no direction can be predicted, a two-tailed test is appropriate.
There are two kinds of errors in hypothesis testing. Type I, or alpha error, is rejecting a true null hypothesis; Type II, or beta error, is failing to reject a false null hypothesis. The probabilities of committing these two types of error are inversely related.
When testing sample means, the t distribution is used to find the critical region when the population standard deviation is unknown and sample size is small.
Sample proportions can also be tested for significance using the five-step model. Unlike tests using sample means, tests of sample proportions assume a nominal level of measurement, use different symbols to state the null hypothesis, and use a different formula to compute $Z(obtained)$ .

BASIC TERMS

Alpha level ( $\alpha$ )
Critical region (region of rejection)
Five-step model
Hypothesis testing
Null hypothesis ( $H_0$ )
One-tailed test
Research hypothesis ( $H_1$ )
Significance testing
Student’s t distribution
$t(critical)$
$t(obtained)$
Test statistic
Two-tailed test
Type I error (alpha error)
Type II error (beta error)
$Z(critical)$
$Z(obtained)$