Study Notes on Hypothesis Testing, Errors, and Power
HYPOTHESIS TESTING, ERRORS, AND POWER
Topic Overview
Instructor: Dr. Erin K. Freeman
HYPOTHESIS TESTING
Is My Coin Fair?
Scenario: You have two coins: one fair (equal heads/tails probability) and one biased toward heads. They are identical in appearance, weight, and feel.
Task: You randomly select one coin and must decide to bet all your money on it being fair or not. Consider how to make this decision based on probability.
Probability Example for Coin Flips
Question: What’s the probability of flipping a coin 5 times and getting three heads and two tails?
Calculation of Total Possible Outcomes: Total possible outcomes = 2^N = 2^5 = 32
Determination of Specific Outcomes: How many ways can you get three heads (H) and two tails (T)? The outcome can be arranged as HHHTT, HHTHT, and so on:
The number of ways = C(5, 3) = 10 (ways to choose which 3 of the 5 flips are heads)
Thus, the probability: 10/32 = 0.3125, or a 31.25% chance.
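The counting argument above can be checked directly; this minimal sketch computes the same numbers with Python's standard library:

```python
from math import comb

n, k = 5, 3                      # 5 flips, 3 heads
total_outcomes = 2 ** n          # 32 equally likely sequences of H/T
favorable = comb(n, k)           # C(5, 3) = 10 ways to place the heads
probability = favorable / total_outcomes

print(favorable, total_outcomes, probability)  # 10 32 0.3125
```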
Consideration: What would you bet based on the probability?
Statistical Significance
Definition of Statistical Significance: Results are statistically significant if they are unlikely due to chance alone. Improbable results from a sample suggest an effect or difference exists besides random chance.
Revisiting Coin Example: Flipping a fair coin 100 times and getting 100 heads has probability (1/2)^100, roughly 8 x 10^-31. It is possible but so improbable that we reject the notion of the coin being fair.
Determining Significance
Sampling Distribution: To assess significance, refer to a theoretical reference distribution showing probabilities stemming from chance alone.
Standardizing Mean: To evaluate significance further, convert means to z-scores and use a z-table for percentile ranks.
Example Calculation: A calculated z-score of -1.47 falls between the 7th and 8th percentiles; there is a 7.08% chance of observing a score that low or lower if only chance were at play.
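Instead of reading a z-table, the percentile rank can be computed from the standard normal CDF; a minimal sketch for the z = -1.47 example:

```python
from statistics import NormalDist

z = -1.47
p = NormalDist().cdf(z)   # proportion of the standard normal below z
print(round(p, 4))        # 0.0708, i.e., about the 7th percentile
```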
Criterion for Statistical Significance
Unlikely Results Definition: The benchmark for statistical significance is somewhat arbitrary; the professional standard is generally taken as alpha = 0.05 (5%).
If results occur 5% or less of the time due to chance alone, they are considered statistically significant.
Hypothesis Testing Definitions
Alternative Hypothesis (H1): The hypothesis stating that there is a true relationship or difference; results are not merely due to chance.
Null Hypothesis (H0): The default hypothesis; suggests there is no real relationship or difference, and results arise from random chance.
Significance Level (Alpha)
Definition: The probability threshold for determining statistical significance; typically set at alpha = 0.05 before research begins.
P-value Definition: Represents the probability of observing the sample results if the null hypothesis is true.
Decision Rules Based on P-values
Reject H0: If p-value <= alpha; results are statistically significant.
Retain H0: If p-value > alpha; the results could plausibly have arisen from chance alone.
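The two decision rules reduce to a single comparison; a minimal sketch (the p-values passed in are illustrative numbers, not from the lecture):

```python
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "reject H0" if p_value <= alpha else "retain H0"

print(decide(0.03))  # reject H0  (0.03 <= 0.05)
print(decide(0.20))  # retain H0  (0.20 > 0.05)
```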
Critical Value and Test Statistic
Critical Value: The threshold for the test statistic defined using the significance level prior to testing.
Directional vs Non-Directional Hypothesis
Non-Directional Hypothesis: Suggests a difference exists without specifying the direction; leads to two-tailed tests.
Example Null Hypothesis: No difference in spatial reasoning between male and female students.
Directional Hypothesis: Specifies which direction a difference exists (greater than or less than); employs one-tailed tests.
Example Directional Hypothesis (H1): Female students show greater spatial reasoning than male students (the corresponding null: they do not).
Testing Process Examples
Testing Sample Means with Known Populations:
Use standard z-tests for two-tailed or one-tailed tests depending on the hypothesis.
Example of Comparing Means: Test whether statistics students at OU differ from the general population, given the sample mean and sample size plus the known population mean and standard deviation.
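A one-sample z-test of this kind can be sketched as follows; the numbers are hypothetical (the lecture's actual OU values are not given in the notes):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15        # known population mean and standard deviation (illustrative)
x_bar, n = 104, 36         # sample mean and sample size (illustrative)

z = (x_bar - mu) / (sigma / sqrt(n))           # standard error = sigma / sqrt(n)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p_two_tailed, 4))     # z = 1.6, p ≈ 0.1096 -> retain H0 at alpha = 0.05
```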
Statistical Conclusion
Conclusion Type: Depending on results from critical value tests and p-value tests, conclusions are either to reject or retain the null hypothesis.
Practical Hypothesis Testing Scenarios: For example, determining how factors such as sample size and significance level influence the power and outcome of a hypothesis test.
Errors in Hypothesis Testing
Type I Error (False Positive): Rejecting the null hypothesis when it is actually true.
Type II Error (False Negative): Failing to reject the null when it is false.
Power of a Test (1 - β): The likelihood of correctly rejecting a false null hypothesis.
Factors Influencing Power: Effect size, sample size, variability, significance level, and the type of hypothesis.
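The interplay of effect size, sample size, and alpha can be made concrete for a one-tailed one-sample z-test; a minimal sketch with illustrative numbers (not from the lecture):

```python
from math import sqrt
from statistics import NormalDist

def power(effect_size, n, alpha=0.05):
    """P(reject H0) when the true standardized effect is `effect_size` (Cohen's d)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)      # critical z for a one-tailed test
    # Under the alternative, the test statistic is centered at d * sqrt(n).
    return 1 - NormalDist().cdf(z_crit - effect_size * sqrt(n))

print(round(power(0.5, 25), 3))   # medium effect, n = 25
print(round(power(0.5, 100), 3))  # same effect, larger n -> higher power
```

Raising n (or alpha, or the effect size) shifts the alternative distribution further past the critical value, which is exactly why each of the listed factors increases power.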
Effect Size
Definition: Quantifies the true difference between groups; by Cohen's conventions, d is categorized as small (about 0.2), medium (about 0.5), and large (about 0.8).
Effect size helps interpret whether results have practical significance in relation to statistical significance.
Variance and Power
More variability in the data means lower power. To increase power (assuming a true effect exists), increase the sample size: a larger n shrinks the standard error of the mean (sigma / sqrt(n)), making the same effect easier to detect.
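The shrinking standard error is easy to see numerically; a small illustration (sigma = 10 is an arbitrary choice):

```python
from math import sqrt

sigma = 10  # illustrative population standard deviation
for n in (25, 100, 400):
    print(n, sigma / sqrt(n))   # standard error shrinks as n grows: 2.0, 1.0, 0.5
```

Quadrupling the sample size halves the standard error, so diminishing returns set in as n grows.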
Alpha Manipulation and Power
Increasing alpha can enhance power but may come at the cost of increased type I error risk.
Conclusion
Confidence intervals and hypothesis testing: A confidence interval is interpreted with respect to the population mean; whether it contains the hypothesized value has implications for both the statistical decision and the practical significance of the result.
This section encourages a deeper understanding of hypothesis testing principles, types, errors, and power implications on research results.
Next Up
Topic Twelve: Examination of hypothesis testing and confidence intervals applied to different statistical tests.