Hypothesis Testing Overview

  • Overview of key concepts related to hypothesis testing: Statistical Power, Statistical Confidence, Statistical Errors, Effect Sizes, and the One Sample Z Test

Statistical Significance

  • Definition: A result is "statistically significant" when a difference, relationship, association, or prediction is backed by strong enough evidence that we believe it is genuine (i.e., due to a treatment effect rather than random sampling error).

  • Evaluation of Statistical Significance:

    • p-values are the metric used to assess whether a statistical result is significant.

    • Key Points to Remember:

    • Low probability events are found in the tails of the population or sampling distribution of sample means.

    • High probability events cluster around the mean of the population or sampling distribution, making them more common.

    • Confidence Intervals:

    • Used to estimate the probable range of the true population parameter based on the tested sample statistic (a small sketch follows this list).

    • Effect Sizes/Explained Variability:

    • Used to signify the practical significance or usefulness of findings.

    • Important distinction: Statistical significance does not always imply practical significance, but practical significance cannot exist without statistical significance.
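
  • A minimal Python sketch of a 95% confidence interval for a mean with a known population standard deviation; all values (sample_mean, sigma, n) are hypothetical, not from these notes.

    import math
    from scipy.stats import norm

    # Hypothetical example values: sample mean, known population SD, sample size.
    sample_mean = 82.0
    sigma = 9.0
    n = 36

    # Standard error of the mean and the two-tailed critical z for 95% confidence.
    se = sigma / math.sqrt(n)
    z_crit = norm.ppf(0.975)  # approx. 1.96

    lower = sample_mean - z_crit * se
    upper = sample_mean + z_crit * se
    print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")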

Visual Representation of Sample Mean Confidence Score

  • Example: Sample mean confidence score depicted on a scale, illustrating that statistically significant sample means fall in the extreme regions (tails) of the sampling distribution.

Four Steps of Hypothesis Testing

Step 1: State Hypotheses

  • Null and Alternative Hypotheses:

    • Null Hypothesis (H₀): The starting assumption, indicating no effect or difference.

    • Alternative Hypothesis (H₁): Indicates what is presumed true if the null hypothesis is rejected.

Step 2: Set Criteria

  • Alpha Level (α):

    • Defined as the threshold for rejecting the null.

    • Represents the risk of incorrectly rejecting a true null hypothesis (Type 1 error).

    • If p-value < α, reject the null hypothesis.

    • If p-value ≥ α, retain the null hypothesis (see the sketch after this list).

  • Critical Value:

    • Determined by both the alpha level and sample size.

  • Tails of the Statistical Test:

    • One-tailed test: Interest in a change in one specific direction only (either an increase or a decrease, but not both).

    • Two-tailed test: Interest in any change; the alternative hypothesis allows the difference to go in either direction.
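
  • A minimal sketch of the reject/retain decision rule from the Alpha Level bullet above; the p-value here is made up for illustration.

    # Sketch of the Step 2 decision rule, assuming a p-value has already
    # been computed (the value below is hypothetical).
    alpha = 0.05
    p_value = 0.03

    if p_value < alpha:
        print("Reject the null hypothesis (statistically significant).")
    else:
        print("Retain the null hypothesis.")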

Step 3: Collect Data and Calculate Statistics

  • Importance of generating accurate descriptive statistics.

    • Software such as SPSS or Excel can assist in calculating statistics and generating p-values.

Step 4: Make a Decision

  • Deciding on hypotheses based on calculated p-values and comparisons to alpha levels.

Errors in Hypothesis Testing

Statistical Confidence and Type 1 Error

  • Definition: Statistical confidence is the probability of correctly retaining a true null hypothesis (1 − α).

  • Type 1 Error (α): Occurs when a true null hypothesis is incorrectly rejected.

    • The standard α level is commonly set to 0.05, indicating a 5% chance of this error occurring.

Causes of Type 1 Error:

  • Random chance or sampling error leading to exaggerated findings.

  • Poor research designs, such as non-random sampling or other biases.

  • Numerical Effects:

    • Large differences between the sample mean and the population mean increase the probability of rejecting the null (and thus of a Type 1 error when the null is actually true), because the numerator of the z-statistic grows.

    • Smaller standard errors, which result from lower variability or larger sample sizes, likewise increase the chance of a significant z-statistic (see the sketch below).
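
  • A small sketch of these numerical effects, using the one-sample z formula introduced later in these notes; μ and σ are borrowed from the chair-making example, while the sample means and sizes are hypothetical.

    import math

    mu, sigma = 80.0, 9.35  # population parameters from the later example

    def z_stat(sample_mean, n):
        se = sigma / math.sqrt(n)      # standard error shrinks as n grows
        return (sample_mean - mu) / se

    print(z_stat(81.0, 30))   # small mean difference -> small z
    print(z_stat(85.0, 30))   # larger numerator -> larger z
    print(z_stat(81.0, 300))  # larger n -> smaller SE -> larger z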

Statistical Power and Type 2 Error

Definition of Statistical Power

  • The probability of correctly rejecting a false null hypothesis, detecting real differences, relationships, or associations.

Type 2 Error (β)

  • Occurs when a false null hypothesis is retained.

    • β is typically set at 0.20, indicating a 20% chance of this error occurring.

Conditions Affecting Type 2 Error:

  • Small sample sizes or high variability in samples can lead to increased Type 2 error likelihood.

  • Expected statistical power (1 − β) is calculated and assessed before and after studies.

  • Target power level is often set at 0.80.
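
  • A rough sketch of how power (1 − β) can be computed for a two-tailed one-sample z test; the assumed true mean, σ, and n below are hypothetical illustrations, not values from these notes.

    import math
    from scipy.stats import norm

    mu0, mu_true, sigma, n, alpha = 80.0, 83.0, 9.35, 30, 0.05

    se = sigma / math.sqrt(n)
    z_crit = norm.ppf(1 - alpha / 2)   # two-tailed critical value (approx. 1.96)
    delta = (mu_true - mu0) / se       # true effect in standard-error units

    # Probability that the z-statistic lands in either critical region.
    power = norm.cdf(-z_crit - delta) + (1 - norm.cdf(z_crit - delta))
    beta = 1 - power
    print(f"power = {power:.2f}, beta = {beta:.2f}")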

Effect Sizes

Cohen’s d and R-squared (R²) as Measures of Effect Size

  • Cohen’s d:

    • Formula: (d = \frac{M_{treatment} - M_{no\ treatment}}{SD}), where the mean difference is scaled against the standard deviation to evaluate the importance of the treatment effect.

    • Interpretation: e.g., music increased honey production by 0.8 standard deviations above the population mean.

  • R-squared (R²):

    • Represents the variability in the dependent variable explained by the independent variable.

    • Example: A Pearson correlation coefficient of R = 0.6 yields R² = 0.36, interpreted as “Good sleep accounts for 36% of the variability in scores on a statistics exam.”

Calculating and Interpreting Cohen’s d

  • Formula: (d = \frac{M_{treatment} - M_{no\ treatment}}{SD})

  • Cohen’s d interpretations:

    • d = 0.2 represents small effect size.

    • d = 0.5 represents medium effect size.

    • d = 0.8 represents large effect size.
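
  • A short sketch computing both effect sizes described above; the treatment and no-treatment means and the standard deviation are hypothetical stand-ins chosen so that d works out to 0.8, and R = 0.6 is taken from the sleep example.

    m_treatment = 54.0     # hypothetical mean with treatment
    m_no_treatment = 50.0  # hypothetical mean without treatment
    sd = 5.0               # hypothetical standard deviation

    d = (m_treatment - m_no_treatment) / sd
    print(f"Cohen's d = {d:.1f}")       # 0.8 -> large effect

    r = 0.6                              # Pearson correlation from the notes
    print(f"R-squared = {r ** 2:.2f}")   # 0.36 -> 36% of variability explained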

One Sample Z Test

Conditions for Use

  • Population mean (μ) and population standard deviation (σ) must be known.

  • A random sample size (n) of at least 30 is required.

Critical Z Value

  • Critical z-values of (\pm 1.96) correspond to an α level of 0.05 (two-tailed), i.e., a 5% Type 1 error risk.

One Sample Z Test Formulae

  • Standard error of the mean: (SE = \frac{σ}{\sqrt{n}})

  • Z-Test Statistic: (Z = \frac{M - μ}{SE})
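
  • The two formulas above translate directly into a small Python helper; the function and variable names are my own, not from the notes.

    import math

    def one_sample_z(sample_mean, mu, sigma, n):
        se = sigma / math.sqrt(n)       # SE = sigma / sqrt(n)
        return (sample_mean - mu) / se  # Z = (M - mu) / SE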

One Sample Z Test Example

Scenario Description

  • Researching whether music affects productivity in chair making.

  • Known parameters: Population mean = 80, Population standard deviation = 9.35, Sample size (n) = 30.

Hypothesis Formulation

  • Null Hypothesis (H₀): With music, mean productivity = 80 chairs/day (no effect).

  • Alternative Hypothesis (H₁): With music, mean productivity ≠ 80 chairs/day.

    • This is a two-tailed test.

Step 2: Decision Criteria

  • Alpha levels can be set at conventional thresholds (α = 0.05; α = 0.01; α = 0.001).

  • Critical regions are defined by the critical z-values that correspond to the chosen α level (see the sketch below).
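
  • A quick sketch of where the two-tailed critical z-values come from for the conventional α levels listed above.

    from scipy.stats import norm

    for alpha in (0.05, 0.01, 0.001):
        z_crit = norm.ppf(1 - alpha / 2)
        print(f"alpha = {alpha}: reject H0 if |z| > {z_crit:.3f}")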

Steps 3 and 4: Data Collection and Conclusion

Data Collection Steps

  • Calculate the sample mean, standard error, and z-statistic using the established formulas.

Decision Making

  • For example, if the calculated z does not fall in the critical region, retain the null hypothesis: "We failed to reject the null hypothesis (Z = 0.819, p ≥ 0.05)".
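
  • An end-to-end sketch of the chair-making example. The sample mean is not given in these notes, so 81.4 chairs/day is assumed here purely because it reproduces a z-statistic close to the reported 0.819.

    import math
    from scipy.stats import norm

    mu, sigma, n, alpha = 80.0, 9.35, 30, 0.05
    sample_mean = 81.4                    # assumed, not from the notes

    se = sigma / math.sqrt(n)             # standard error of the mean
    z = (sample_mean - mu) / se           # one-sample z-statistic
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value

    if p_value < alpha:
        print(f"Reject H0 (Z = {z:.3f}, p = {p_value:.3f})")
    else:
        print(f"Fail to reject H0 (Z = {z:.3f}, p = {p_value:.3f})")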