Study Notes on Hypothesis Testing, Errors, and Power
HYPOTHESIS TESTING, ERRORS, AND POWER
Topic Overview
Instructor: Dr. Erin K. Freeman
HYPOTHESIS TESTING
Is My Coin Fair?
Scenario: You have two coins: one fair (equal heads/tails probability) and one biased toward heads. They are identical in appearance, weight, and feel.
Task: You randomly select one coin and must decide to bet all your money on it being fair or not. Consider how to make this decision based on probability.
Probability Example for Coin Flips
Question: What’s the probability of flipping a coin 5 times and getting three heads and two tails?
Calculation of Total Possible Outcomes: Total possible outcomes = 2^N = 2^5 = 32
Determination of Specific Outcomes: How many ways can you get three heads (H) and two tails (T)? The outcome can be arranged as HHHTT, HHTHT, and so on:
The number of ways = C(5, 3) = 10 (ways to choose which 3 of the 5 flips are heads)
Thus, the probability: 10/32 = 0.3125, or a 31.25% chance.
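The counting argument above can be checked directly; this minimal sketch computes the same numbers with Python's standard library:

```python
from math import comb

n, k = 5, 3                      # 5 flips, 3 heads
total_outcomes = 2 ** n          # 32 equally likely sequences of H/T
favorable = comb(n, k)           # C(5, 3) = 10 ways to place the heads
probability = favorable / total_outcomes

print(favorable, total_outcomes, probability)  # 10 32 0.3125
```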
Consideration: What would you bet based on the probability?
Statistical Significance
Definition of Statistical Significance: Results are statistically significant if they are unlikely due to chance alone. Improbable results from a sample suggest an effect or difference exists besides random chance.
Revisiting Coin Example: Flipping a fair coin 100 times and getting 100 heads has probability (1/2)^100, roughly 8 x 10^-31. It is possible but so improbable that we reject the notion of the coin being fair.
Determining Significance
Sampling Distribution: To assess significance, refer to a theoretical reference distribution showing probabilities stemming from chance alone.
Standardizing Mean: To evaluate significance further, convert means to z-scores and use a z-table for percentile ranks.
Example Calculation: A calculated z-score of -1.47 falls between the 7th and 8th percentiles; there is a 7.08% chance of observing a score that low or lower if only chance were at play.
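Instead of reading a z-table, the percentile rank can be computed from the standard normal CDF; a minimal sketch for the z = -1.47 example:

```python
from statistics import NormalDist

z = -1.47
p = NormalDist().cdf(z)   # proportion of the standard normal below z
print(round(p, 4))        # 0.0708, i.e., about the 7th percentile
```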
Criterion for Statistical Significance
Unlikely Results Definition: The benchmark for statistical significance is somewhat arbitrary; the professional standard is generally taken as alpha = 0.05 (5%).
If results occur 5% or less of the time due to chance alone, they are considered statistically significant.
Hypothesis Testing Definitions
Alternative Hypothesis (H1): The hypothesis stating that there is a true relationship or difference; results are not merely due to chance.
Null Hypothesis (H0): The default hypothesis; suggests there is no real relationship or difference, and results arise from random chance.
Significance Level (Alpha)
Definition: The probability threshold for determining statistical significance; typically set at alpha = 0.05 before research begins.
P-value Definition: Represents the probability of observing the sample results if the null hypothesis is true.
Decision Rules Based on P-values
Reject H0: If p-value <= alpha; results are statistically significant.
Retain H0: If p-value > alpha; the results could plausibly have arisen from chance alone.
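The two decision rules reduce to a single comparison; a minimal sketch (the p-values passed in are illustrative numbers, not from the lecture):

```python
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "reject H0" if p_value <= alpha else "retain H0"

print(decide(0.03))  # reject H0  (0.03 <= 0.05)
print(decide(0.20))  # retain H0  (0.20 > 0.05)
```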
Critical Value and Test Statistic
Critical Value: The threshold for the test statistic defined using the significance level prior to testing.
Directional vs Non-Directional Hypothesis
Non-Directional Hypothesis: Suggests a difference exists without specifying the direction; leads to two-tailed tests.
Example Null Hypothesis: No difference in spatial reasoning between male and female students.
Directional Hypothesis: Specifies which direction a difference exists (greater than or less than); employs one-tailed tests.
Example Directional Hypothesis (H1): Female students show greater spatial reasoning than male students (the corresponding null: they do not).
Testing Process Examples
Testing Sample Means with Known Populations:
Use standard z-tests for two-tailed or one-tailed tests depending on the hypothesis.
Example of Comparing Means: Test whether statistics students at OU differ from the general population, given the sample mean and sample size plus the known population mean and standard deviation.
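A one-sample z-test of this kind can be sketched as follows; the numbers are hypothetical (the lecture's actual OU values are not given in the notes):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15        # known population mean and standard deviation (illustrative)
x_bar, n = 104, 36         # sample mean and sample size (illustrative)

z = (x_bar - mu) / (sigma / sqrt(n))           # standard error = sigma / sqrt(n)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p_two_tailed, 4))     # z = 1.6, p ≈ 0.1096 -> retain H0 at alpha = 0.05
```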
Statistical Conclusion
Conclusion Type: Depending on results from critical value tests and p-value tests, conclusions are either to reject or retain the null hypothesis.
Practical Hypothesis Testing Scenarios: For example, determining how factors such as sample size and significance level influence the power and outcome of a hypothesis test.
Errors in Hypothesis Testing
Type I Error (False Positive): Rejecting the null hypothesis when it is actually true.
Type II Error (False Negative): Failing to reject the null when it is false.
Power of a Test (1 - β): The likelihood of correctly rejecting a false null hypothesis.
Factors Influencing Power: Effect size, sample size, variability, significance level, and the type of hypothesis.
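The interplay of effect size, sample size, and alpha can be made concrete for a one-tailed one-sample z-test; a minimal sketch with illustrative numbers (not from the lecture):

```python
from math import sqrt
from statistics import NormalDist

def power(effect_size, n, alpha=0.05):
    """P(reject H0) when the true standardized effect is `effect_size` (Cohen's d)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)      # critical z for a one-tailed test
    # Under the alternative, the test statistic is centered at d * sqrt(n).
    return 1 - NormalDist().cdf(z_crit - effect_size * sqrt(n))

print(round(power(0.5, 25), 3))   # medium effect, n = 25
print(round(power(0.5, 100), 3))  # same effect, larger n -> higher power
```

Raising n (or alpha, or the effect size) shifts the alternative distribution further past the critical value, which is exactly why each of the listed factors increases power.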
Effect Size
Definition: Quantifies the true difference between groups; by Cohen's conventions, d is categorized as small (about 0.2), medium (about 0.5), and large (about 0.8).
Effect size helps interpret whether results have practical significance in relation to statistical significance.
Variance and Power
More variability in the data means lower power. To increase power (assuming a true effect exists), increase the sample size: a larger n shrinks the standard error of the mean (sigma / sqrt(n)), making the same effect easier to detect.
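The shrinking standard error is easy to see numerically; a small illustration (sigma = 10 is an arbitrary choice):

```python
from math import sqrt

sigma = 10  # illustrative population standard deviation
for n in (25, 100, 400):
    print(n, sigma / sqrt(n))   # standard error shrinks as n grows: 2.0, 1.0, 0.5
```

Quadrupling the sample size halves the standard error, so diminishing returns set in as n grows.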
Alpha Manipulation and Power
Increasing alpha can enhance power but may come at the cost of increased type I error risk.
Conclusion
Confidence intervals and hypothesis testing: A confidence interval is interpreted with respect to the population mean; whether it contains the hypothesized value has implications for both the statistical decision and the practical significance of the result.
This section encourages a deeper understanding of hypothesis testing principles, types, errors, and power implications on research results.
Next Up
Topic Twelve: Examination of hypothesis testing and confidence intervals applied to different statistical tests.