Hypothesis Testing Overview

Overview of key concepts related to hypothesis testing: Statistical Power, Statistical Confidence, Statistical Errors, Effect Sizes, and the One Sample Z Test

Statistical Significance

Definition: When something is "statistically significant," it implies that a difference, relationship, association, or prediction is backed by substantial evidence, leading us to believe it is genuine (caused by a treatment effect rather than by random sampling error).
Evaluation of Statistical Significance:
- p-values are the metrics used to assess whether a statistical result is significant.
- Key Points to Remember:
- Low probability events are found in the tails of the population or sampling distribution of sample means.
- High probability events cluster around the mean of the population or sampling distribution, making them more common.
- Confidence Intervals:
- Utilized to ascertain the probable range of actual population parameters based on tested sample statistics.
- Effect Sizes/Explained Variability:
- Used to signify the practical significance or usefulness of findings.
- Important distinction: Statistical significance does not always imply practical significance. By convention, effect sizes are interpreted only once a result has been found statistically significant.

Visual Representation of Sample Mean Confidence Score

Example: Sample mean confidence score depicted on a scale, illustrating that statistically significant sample means are expected to fall within certain ranges of the sampling distribution.

Four Steps of Hypothesis Testing

Step 1: State Hypotheses

Null and Alternative Hypotheses:
- Null Hypothesis (H₀): The starting assumption, indicating no effect or difference.
- Alternative Hypothesis (H₁): Indicates what is presumed true if the null hypothesis is rejected.

Step 2: Set Criteria

Alpha Level (α):
- Defined as the threshold for rejecting the null hypothesis.
- Represents the risk of incorrectly rejecting a true null hypothesis (Type 1 Error).
- If p-value < α, reject the null hypothesis.
- If p-value ≥ α, retain the null hypothesis.
Critical Value:
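- Determined by the alpha level and the number of tails of the test. For a z test it does not depend on sample size (for example, ±1.96 at α = 0.05, two-tailed).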
Tails of the Statistical Test:
- One-tailed test: Interest in only one direction of change (whether the mean increases, or whether it decreases).
- Two-tailed test: Interest in any change; the alternative hypothesis allows the mean to move in either direction.
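
The criteria above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `critical_z` and `decide` are hypothetical and not part of the original notes.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive critical z-value for a given alpha level."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject only when p is strictly below alpha."""
    return "reject H0" if p_value < alpha else "retain H0"


print(round(critical_z(0.05), 2))                    # two-tailed: 1.96
print(round(critical_z(0.05, two_tailed=False), 3))  # one-tailed: 1.645
```

Note that a one-tailed test places all of α in one tail, so its critical value is smaller than the two-tailed value at the same α.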

Step 3: Collect Data and Calculate Statistics

Accurate descriptive statistics are essential at this step.
- Software such as SPSS or Excel can be used to generate the test statistic and its p-value.

Step 4: Make a Decision

Reject or retain the null hypothesis by comparing the calculated p-value to the alpha level.

Errors in Hypothesis Testing

Statistical Confidence and Type 1 Error

Definition: Statistical confidence (1 - α) is the probability of correctly retaining a true null hypothesis.
Type 1 Error (α): Occurs when a true null hypothesis is incorrectly rejected.
- The standard α level is commonly set to 0.05, indicating a 5% chance of this error occurring.

Causes of Type 1 Error:

Random chance or sampling error leading to exaggerated findings.
Poor research designs, such as non-random sampling or other biases.
Numerical Effects:
- A large difference between the sample mean and the population mean enlarges the numerator of the z-statistic, making rejection of the null more likely.
- A smaller standard error (from lower variability or a larger sample size) shrinks the denominator, which also increases the chance of a significant z-statistic.
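
The meaning of α as a Type 1 error rate can be checked with a small simulation. The sketch below assumes a normally distributed population and borrows the chair-making parameters (μ = 80, σ = 9.35, n = 30) purely as illustrative defaults; the function name, trial count, and seed are hypothetical choices. Because every sample is drawn from a population where the null is true, each rejection is a Type 1 error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(mu=80, sigma=9.35, n=30, alpha=0.05,
                     trials=10_000, seed=0):
    """Estimate how often a two-tailed z test rejects a TRUE null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    false_rejections = 0
    for _ in range(trials):
        # Sample from the null population, so H0 is true by construction.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / se
        if abs(z) > z_crit:
            false_rejections += 1
    return false_rejections / trials


rate = type1_error_rate()
print(f"Observed Type 1 error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, with some run-to-run variation that depends on the number of trials.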

Statistical Power and Type 2 Error

Definition of Statistical Power

The probability of correctly rejecting a false null hypothesis, detecting real differences, relationships, or associations.

Type 2 Error (β)

Occurs when a false null hypothesis is retained.
- β is typically set at 0.20, indicating a 20% chance of this error occurring.

Conditions Affecting Type 2 Error:

Small sample sizes or high variability in samples can lead to increased Type 2 error likelihood.
Expected statistical power (1 - β) is calculated and assessed before and after studies.
Target power level is often set at 0.80.
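
As a rough sketch of how expected power can be calculated before a study, the function below computes the power of a two-tailed one-sample z test for an assumed true mean difference. The function name `z_test_power` and the 5-unit difference in the example are hypothetical; σ = 9.35 and n = 30 are borrowed from the chair-making example for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(true_diff, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z-statistic when the true difference is true_diff.
    shift = true_diff / (sigma / sqrt(n))
    # Probability that the z-statistic falls in either critical region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


power = z_test_power(true_diff=5, sigma=9.35, n=30)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Raising n or lowering σ shrinks the standard error, which increases power and lowers β. With a true difference of zero, the function returns α, since every rejection is then a Type 1 error.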

Effect Sizes

Cohen’s d and R-squared (R²) as Measures of Effect Size

Cohen’s d:
- Formula: \( d = \frac{M_{\text{treatment}} - M_{\text{no treatment}}}{SD} \), where the mean difference is scaled against the standard deviation to evaluate treatment importance.
- Interpretation: E.g., music increased honey production by 0.8 standard deviations above the population mean.
R-squared (R²):
- Represents the variability in the dependent variable explained by the independent variable.
- Example: A Pearson correlation coefficient of r = 0.6 yields R² = 0.36, interpreted as “Good sleep accounts for 36% of the variability in scores on a statistics exam.”

Calculating and Interpreting Cohen’s d

Formula: \( d = \frac{M_{\text{treatment}} - M_{\text{no treatment}}}{SD} \)
Cohen’s d interpretations:
- d = 0.2 represents small effect size.
- d = 0.5 represents medium effect size.
- d = 0.8 represents large effect size.
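
A minimal sketch of the calculation and labeling follows. The notes give only the three benchmark values, so treating each benchmark as the lower bound of its category (and calling anything below 0.2 "negligible") is an assumption made here for illustration. The function names and the example numbers (means of 58 and 50, SD of 10) are also hypothetical.

```python
def cohens_d(mean_treatment, mean_no_treatment, sd):
    """Mean difference expressed in standard deviation units."""
    return (mean_treatment - mean_no_treatment) / sd


def interpret_d(d):
    """Label an effect size using the 0.2 / 0.5 / 0.8 benchmarks.

    Assumption: each benchmark is treated as the lower bound of its label.
    """
    size = abs(d)  # the sign only indicates the direction of the effect
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


d = cohens_d(mean_treatment=58, mean_no_treatment=50, sd=10)
print(d, interpret_d(d))
```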

One Sample Z Test

Conditions for Use

Population mean (μ) and population standard deviation (σ) must be known.
A random sample of at least 30 observations (n ≥ 30) is required.

Critical Z Value

Critical z-values of \( \pm 1.96 \) correspond to a two-tailed α level of 0.05, that is, a 5% Type 1 error risk.

One Sample Z Test Formulae

Standard error of the mean: \( SE = \frac{\sigma}{\sqrt{n}} \)
Z-Test Statistic: \( Z = \frac{M - \mu}{SE} \)
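
The two formulas translate directly into code. The sketch below is one possible implementation using only the Python standard library; the function name `one_sample_z_test` and the example inputs (M = 82, μ = 80, σ = 10, n = 25) are hypothetical, and the p-value is computed for a two-tailed test.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu, sigma, n):
    """Return (SE, Z, two-tailed p-value) for a one-sample z test."""
    se = sigma / sqrt(n)              # SE = sigma / sqrt(n)
    z = (sample_mean - mu) / se       # Z = (M - mu) / SE
    # Two-tailed p-value: area in both tails beyond |Z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_value


se, z, p_value = one_sample_z_test(sample_mean=82, mu=80, sigma=10, n=25)
print(f"SE = {se:.2f}, Z = {z:.2f}, p = {p_value:.4f}")
```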

One Sample Z Test Example

Scenario Description

Researching whether music affects productivity in chair making.
Known parameters: Population mean = 80, Population standard deviation = 9.35, Sample size (n) = 30.

Hypothesis Formulation

Null Hypothesis (H₀): Mean productivity with music = 80 chairs/day.
Alternative Hypothesis (H₁): Mean productivity with music ≠ 80 chairs/day.
- This is a two-tailed test.

Step 2: Decision Criteria

Alpha levels can be set at conventional thresholds (α = 0.05; α = 0.01; α = 0.001).
The critical regions are the tails beyond the critical z-scores. For a two-tailed test these are ±1.96 (α = 0.05), ±2.58 (α = 0.01), and ±3.29 (α = 0.001).

Steps 3 and 4: Data Collection and Conclusion

Data Collection Steps

Calculate the sample mean, standard error, and z-statistic using the established formulas.

Decision Making

For example, if the calculated z does not fall in the critical region, retain the null hypothesis: "We failed to reject the null hypothesis (Z = 0.819, p ≥ 0.05)".
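
The decision can be retraced from the numbers given in the scenario. The notes do not state the sample mean, so the sketch below does not invent one. It takes the reported Z = 0.819 as given, computes the standard error from σ = 9.35 and n = 30, and back-calculates the mean difference that this z-value would imply. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Known parameters from the scenario
mu, sigma, n = 80, 9.35, 30
alpha = 0.05

se = sigma / sqrt(n)      # standard error of the mean
z = 0.819                 # z-statistic as reported in the notes

# Two-tailed critical value and p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Difference M - mu implied by the reported z (the sample mean is not given)
implied_diff = z * se

decision = "reject H0" if abs(z) > z_crit else "retain H0"
print(f"SE = {se:.3f}, implied M - mu = {implied_diff:.2f}")
print(f"p = {p_value:.3f}, decision: {decision}")
```

Because |Z| is well below the critical value of 1.96, the statistic falls outside the critical region and the null hypothesis is retained, matching the reported conclusion.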