Hypothesis Testing Overview
Overview of key concepts related to hypothesis testing: Statistical Power, Statistical Confidence, Statistical Errors, Effect Sizes, and the One Sample Z Test
Statistical Significance
Definition: When a result is "statistically significant," the observed difference, relationship, association, or prediction is backed by enough evidence that it is unlikely to be due to random sampling error alone, leading us to believe it reflects a genuine effect (such as a treatment effect).
Evaluation of Statistical Significance:
p-values are the metrics used to assess whether a statistical result is significant.
Key Points to Remember:
Low probability events are found in the tails of the population or sampling distribution of sample means.
High probability events cluster around the mean of the population or sampling distribution, making them more common.
Confidence Intervals:
Used to estimate the plausible range of a population parameter from a sample statistic.
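As a rough illustration of how such an interval is built when the population standard deviation is known, the sketch below computes a z-based confidence interval for a mean. The function name and the example numbers (mean 100, σ = 15, n = 36) are assumptions made for illustration and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds of a z-based confidence interval for a mean.

    Assumes the population standard deviation (sigma) is known.
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical numbers, chosen only to show the call
lower, upper = z_confidence_interval(sample_mean=100.0, sigma=15.0, n=36)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

A wider interval (higher confidence, larger σ, or smaller n) signals a less precise estimate of the population parameter.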
Effect Sizes/Explained Variability:
Used to signify the practical significance or usefulness of findings.
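Important distinction: Statistical significance does not always imply practical significance. A tiny effect can reach significance in a very large sample, and a practically meaningful effect can fail to reach significance in a small one, so the two should be evaluated separately.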
Visual Representation of Sample Mean Confidence Score
Example: Sample mean confidence score depicted on a scale, illustrating that statistically significant sample means fall in the tails (critical regions) of the sampling distribution, beyond the critical values.
Four Steps of Hypothesis Testing
Step 1: State Hypotheses
Null and Alternative Hypotheses:
Null Hypothesis (H₀): The starting assumption, indicating no effect or difference.
Alternative Hypothesis (H₁): Indicates what is presumed true if the null hypothesis is rejected.
Step 2: Set Criteria
Alpha Level (α):
Defined as the threshold for rejecting the null hypothesis.
Represents the risk of incorrectly rejecting a true null hypothesis (Type 1 Error).
If p-value < α, reject the null hypothesis.
If p-value ≥ α, retain (fail to reject) the null hypothesis.
Critical Value:
Determined by the alpha level and the number of tails; for tests whose distribution depends on degrees of freedom, it also depends on sample size.
Tails of the Statistical Test:
One-tailed test: Interest in one direction only, that is, whether the mean increases or whether it decreases.
Two-tailed test: Interest in any change; the alternative hypothesis allows a difference in either direction.
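To make the link between alpha, tails, and critical values concrete, here is a minimal sketch that looks up critical z-values from the standard normal distribution. The helper name `critical_z` is an assumption for illustration.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, tails=2):
    """Return the positive critical z-value for a one- or two-tailed z test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha across both tails; a one-tailed test does not
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01, 0.001):
    two = critical_z(alpha, tails=2)
    one = critical_z(alpha, tails=1)
    print(f"alpha={alpha}: two-tailed ±{two:.3f}, one-tailed {one:.3f}")
```

For the same alpha, the one-tailed critical value is smaller because the whole rejection region sits in a single tail.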
Step 3: Collect Data and Calculate Statistics
Compute the descriptive statistics and the test statistic accurately.
Software such as SPSS or Excel can be used to generate p-values.
Step 4: Make a Decision
Reject or retain the null hypothesis by comparing the calculated p-value to the alpha level.
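The decision step can be sketched in a few lines. The function names below are hypothetical, and the one-tailed branch assumes the observed z falls in the hypothesized direction.

```python
from statistics import NormalDist


def p_value_from_z(z, tails=2):
    """Convert a z-statistic into a p-value under the standard normal curve."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    # Assumption: for a one-tailed test, z lies in the hypothesized direction
    return 2 * upper_tail if tails == 2 else upper_tail


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 if p < alpha, otherwise retain H0."""
    return "reject H0" if p_value < alpha else "retain H0"


for z in (0.819, 2.5):
    p = p_value_from_z(z)
    print(f"z={z}: p={p:.3f}, {decide(p)}")
```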
Errors in Hypothesis Testing
Statistical Confidence and Type 1 Error
Definition: Statistical confidence (1 − α) is the probability of correctly retaining a true null hypothesis.
Type 1 Error (α): Occurs when a true null hypothesis is incorrectly rejected.
The standard α level is commonly set to 0.05, indicating a 5% chance of this error occurring.
Causes of Type 1 Error:
Random chance or sampling error leading to exaggerated findings.
Poor research designs, such as non-random sampling or other biases.
Numerical Effects:
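Large differences between the sample mean and the population mean enlarge the numerator of the z-statistic, making a significant result (and, when the null hypothesis is true, a Type 1 error) more likely.
Smaller standard errors, which result from lower variability or larger sample sizes, shrink the denominator and likewise increase the chance of a significant z-statistic.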
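The numerical effects above can be seen by holding the mean difference fixed and varying the sample size. This is a sketch with illustrative numbers (a 2-point difference and σ = 9.35); the function name is an assumption.

```python
from math import sqrt


def z_statistic(sample_mean, mu, sigma, n):
    """Z = (M - mu) / SE, where SE = sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


# Same mean difference and sigma; only the sample size changes
for n in (30, 60, 120, 240):
    print(f"n={n}: z={z_statistic(82, 80, 9.35, n):.2f}")
```

Quadrupling n halves the standard error and so doubles z, which is why very large samples can make small differences statistically significant.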
Statistical Power and Type 2 Error
Definition of Statistical Power
The probability of correctly rejecting a false null hypothesis, detecting real differences, relationships, or associations.
Type 2 Error (β)
Occurs when a false null hypothesis is retained.
β is typically set at 0.20, indicating a 20% chance of this error occurring.
Conditions Affecting Type 2 Error:
Small sample sizes or high variability in samples can lead to increased Type 2 error likelihood.
Expected statistical power (power = 1 − β) is calculated and assessed before and after studies. The target power level is often set at 0.80.
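One way to estimate power before a study is to compute it analytically for a two-tailed one-sample z test. The sketch below assumes a known σ and a specific true mean under the alternative; the function name and example values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu_null, mu_true, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z test against a specific true mean."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true mean sits from the null mean, in standard-error units
    shift = (mu_true - mu_null) / (sigma / sqrt(n))
    upper = 1 - std_normal.cdf(z_crit - shift)   # rejections in the upper tail
    lower = std_normal.cdf(-z_crit - shift)      # rejections in the lower tail
    return upper + lower


# Hypothetical scenario: true mean is 5 units above the null mean
power = z_test_power(mu_null=80, mu_true=85, sigma=9.35, n=30)
print(f"power={power:.2f}, beta={1 - power:.2f}")
```

When the true mean equals the null mean, the function returns α, since every rejection is then a Type 1 error.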
Effect Sizes
Cohen’s d and R-squared (R²) as Measures of Effect Size
Cohen’s d:
Formula: \( d = \frac{M_{\text{treatment}} - M_{\text{no treatment}}}{SD} \), where the mean difference is scaled by the standard deviation to evaluate treatment importance.
Interpretation: for example, d = 0.8 means music increased honey production to 0.8 standard deviations above the population mean.
R-squared (R²):
Represents the proportion of variability in the dependent variable explained by the independent variable.
Example: A Pearson correlation coefficient of r = 0.6 yields R² = 0.36, interpreted as “Good sleep accounts for 36% of the variability in scores on a statistics exam.”
Calculating and Interpreting Cohen’s d
Formula: \( d = \frac{M_{\text{treatment}} - M_{\text{no treatment}}}{SD} \)
Cohen’s d interpretations:
d = 0.2 represents a small effect size.
d = 0.5 represents a medium effect size.
d = 0.8 represents a large effect size.
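The calculation and benchmarks above can be sketched as follows. Treating the benchmarks as lower cut-offs and labeling anything below 0.2 as "negligible" are assumptions made for this illustration, as are the function names and example numbers.

```python
def cohens_d(mean_treatment, mean_no_treatment, sd):
    """Cohen's d: the mean difference expressed in standard deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_treatment - mean_no_treatment) / sd


def label_effect(d):
    """Map |d| onto the benchmarks 0.2, 0.5, and 0.8 (cut-offs are an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label not from the notes


# Hypothetical means and SD, chosen only to show the call
d = cohens_d(mean_treatment=88, mean_no_treatment=80, sd=10)
print(f"d={d:.2f} ({label_effect(d)})")
```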
One Sample Z Test
Conditions for Use
Population mean (μ) and population standard deviation (σ) must be known.
A random sample of size n ≥ 30 is required.
Critical Z Value
Critical z-values of \( \pm 1.96 \) correspond to a two-tailed α level of 0.05, that is, a 5% Type 1 error risk.
One Sample Z Test Formulae
Standard error of the mean: \( SE = \frac{\sigma}{\sqrt{n}} \)
Z-test statistic: \( Z = \frac{M - \mu}{SE} \)
One Sample Z Test Example
Scenario Description
Researching whether music affects productivity in chair making.
Known parameters: Population mean = 80, Population standard deviation = 9.35, Sample size (n) = 30.
Hypothesis Formulation
Null Hypothesis (H₀): Mean productivity with music = 80 chairs/day (music has no effect).
Alternative Hypothesis (H₁): Mean productivity with music ≠ 80 chairs/day.
This is a two-tailed test.
Step 2: Decision Criteria
Alpha levels can be set at conventional thresholds (α = 0.05; α = 0.01; α = 0.001).
The critical regions lie beyond the critical z-scores; for this two-tailed test they are ±1.96 for α = 0.05, ±2.576 for α = 0.01, and ±3.291 for α = 0.001.
Steps 3 and 4: Data Collection and Conclusion
Data Collection Steps
Calculate the sample mean, standard error, and z-statistic using the established formulas.
Decision Making
For example, if the calculated z is not in the critical region, retain the null hypothesis: "We failed to reject the null hypothesis (Z = 0.819, p ≥ 0.05)."
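The full procedure for the chair-making scenario can be sketched end to end. The notes do not give the sample mean, so the value 81.4 chairs/day below is a hypothetical figure chosen to land near the reported Z; the function name is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test; returns (SE, z, p, decision)."""
    se = sigma / sqrt(n)                        # standard error of the mean
    z = (sample_mean - mu) / se                 # z-test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value
    decision = "reject H0" if p < alpha else "retain H0"
    return se, z, p, decision


# mu, sigma, and n come from the scenario; sample_mean is hypothetical
se, z, p, decision = one_sample_z_test(sample_mean=81.4, mu=80, sigma=9.35, n=30)
print(f"SE = {se:.3f}, Z = {z:.3f}, p = {p:.3f}: {decision}")
```

Because the resulting z lies between −1.96 and 1.96, it falls outside the critical region and the null hypothesis is retained at α = 0.05.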