Effect Sizes and Confidence Intervals
Statistical Significance Testing
Previous lecture focused on statistical significance testing.
Current lecture shifts focus to effect sizes and confidence intervals.
Effect sizes and confidence intervals are becoming preferred metrics for understanding data.
Transitioning from Statistical Significance Testing
Some experts argue against continuing use of statistical significance testing.
Confidence intervals convey everything a significance test does (whether the "no effect" value is plausible) while also describing the size of the effect.
Effect Sizes
Effect sizes quantify the strength or magnitude of an experimental effect.
They show how large or small an effect or relationship between two variables is.
Different from statistical significance testing, which answers the binary question of whether an effect exists.
Effect sizes clarify how impactful that effect actually is.
Types of Effect Sizes
Three main types of effect sizes discussed:
Standardized difference
Correlation
Proportions
Note that many different tests can be used to measure each of these effect sizes.
Standardized Differences
Standardized mean difference is commonly associated with experimental designs.
Difference between standardized and raw mean difference:
The standardized mean difference makes results interpretable even when there is no context for what a given change signifies.
Example:
Suppose a drug is meant to reduce hiccups from 1,000 to 925 daily (a reduction of 75).
Without context, the magnitude of a 75-hiccup reduction is unclear.
Standardization supplies that context via the standard deviation, showing how large the change really is.
Raw mean difference may suffice when the context offers meaningful understanding.
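As a minimal sketch of this distinction (all numbers below are made up to echo the hiccup example), a standardized mean difference such as Cohen's d divides the raw difference by the pooled standard deviation:

```python
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(loc=1000, scale=120, size=50)  # ~1000 hiccups/day
treated = rng.normal(loc=925, scale=120, size=50)   # ~925 hiccups/day

raw_diff = control.mean() - treated.mean()          # raw mean difference

# Cohen's d: raw difference divided by the pooled standard deviation
n1, n2 = len(control), len(treated)
pooled_sd = np.sqrt(((n1 - 1) * control.var(ddof=1) +
                     (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = raw_diff / pooled_sd

print(f"raw difference: {raw_diff:.1f} hiccups/day")
print(f"Cohen's d:      {cohens_d:.2f}")
```

Dividing by the pooled standard deviation turns "a reduction of roughly 75 hiccups" into standard-deviation units, which can then be judged against the conventional benchmarks discussed later.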
Comparison of Standardized and Raw Mean Differences
Standardized mean difference = interpretable without domain context (expressed in standard-deviation units).
Raw mean difference = appropriate when the context itself is meaningful (e.g., a reduction of 10 cigarettes daily in a smoking intervention).
Common Tests for Effect Sizes
Standardized Differences:
Cohen's d, Hedges' g
Correlations:
Pearson's r, Point-Biserial
Proportions:
Odds ratio (the one proportion-based measure covered here)
Understanding Correlation Measures
Two types of correlation:
Pearson's r: Strength between two continuous variables.
Point-biserial: Strength between one continuous variable and one dichotomous variable (two options).
Dichotomous variables define two clear categories (e.g., yes/no).
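A brief sketch of both measures using scipy (the variables and data here are synthetic, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
hours_studied = rng.uniform(0, 10, size=100)
exam_score = 50 + 4 * hours_studied + rng.normal(0, 8, size=100)

# Pearson's r: strength between two continuous variables
r, _ = stats.pearsonr(hours_studied, exam_score)

# Point-biserial: one continuous and one dichotomous (0/1) variable
passed = (exam_score >= 70).astype(int)
r_pb, _ = stats.pointbiserialr(passed, hours_studied)

print(f"Pearson's r:    {r:.2f}")
print(f"point-biserial: {r_pb:.2f}")
```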
Odds Ratios
Commonly used in observational studies.
Note the unique scoring range:
Score range: 0 to ∞ (no negative odds ratios).
Odds ratio of 1 indicates no effect.
Scores < 1 signify a protective factor; scores > 1 indicate greater risk of outcome.
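As a small illustration with a hypothetical 2x2 table, the odds ratio is the odds of the outcome among the exposed divided by the odds among the unexposed:

```python
# Rows: exposed / unexposed; columns: outcome / no outcome (made-up counts).
exposed   = [20, 80]
unexposed = [10, 90]

odds_exposed   = exposed[0] / exposed[1]       # 20/80 = 0.25
odds_unexposed = unexposed[0] / unexposed[1]   # 10/90 ~ 0.11

odds_ratio = odds_exposed / odds_unexposed     # ~2.25
print(f"odds ratio: {odds_ratio:.2f}")         # > 1 -> greater risk of outcome
```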
Effect Size Ranges
Effect size measures have predefined ranges for interpretation:
Standardized Mean Difference:
Small effect = 0.2; Medium effect = 0.5; Large effect = 0.8 (see the helper sketch after this list).
Pearson's r (Correlations):
Small = 0.1 to 0.3;
Medium = 0.3 to 0.5;
Large > 0.5.
Odds Ratios:
< 1 = protective factor; > 1 = risk factor.
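These cutoffs are conventions rather than hard rules. A hypothetical helper applying the Cohen's d benchmarks above might look like:

```python
# Hypothetical helper mapping a standardized mean difference to the
# conventional labels listed above (cutoffs are conventions, not hard rules).
def label_cohens_d(d: float) -> str:
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

print(label_cohens_d(0.64))  # -> "medium"
```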
Interpretation of Effect Sizes
Effect sizes communicate the magnitude of relationships or experimental effects.
Confidence intervals further clarify expected ranges of effect sizes.
Confidence Intervals
Confidence interval (CI): the range of effect-size values we would expect to see if the study were repeated with different samples.
Gives a qualitative sense of how much confidence to place in a finding, given that different samples yield different estimates.
Common calculation levels: 95% and 99%.
CIs built around effect sizes are based on sample characteristics.
Relationship between Confidence Intervals and Statistical Testing
CIs connect to significance testing via alpha levels:
Alpha = 0.05 relates to a 95% CI.
Alpha = 0.01 relates to a 99% CI.
The calculated CI is always associated with the effect size calculated in the study.
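A quick sketch of that correspondence: the critical z value used to build a Wald-style CI comes from the 1 - alpha/2 quantile of the normal distribution.

```python
from scipy import stats

for alpha in (0.05, 0.01):
    z = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha} -> {100 * (1 - alpha):.0f}% CI, z = {z:.3f}")
# alpha = 0.05 -> 95% CI, z = 1.960
# alpha = 0.01 -> 99% CI, z = 2.576
```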
Interpretation Essentials for CIs
Report CIs using lower and upper limits.
For example, an odds ratio of 7.5 might be reported with a 95% CI of [5.32, 10.45].
Check whether the interval contains the "no effect" value (0 for mean differences, 1 for odds ratios), as in the sketch below.
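A minimal sketch of how such an interval can be computed, assuming a Wald-style 95% CI on the log-odds scale with hypothetical 2x2 counts:

```python
import math

# Hypothetical counts: exposed+outcome, exposed+none, unexposed+outcome, unexposed+none
a, b, c, d = 40, 10, 20, 30
or_hat = (a * d) / (b * c)

# Standard error of the log odds ratio
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
z = 1.96  # ~97.5th normal percentile for a 95% CI

lower = math.exp(math.log(or_hat) - z * se_log_or)
upper = math.exp(math.log(or_hat) + z * se_log_or)

print(f"OR = {or_hat:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
# If the interval excludes 1, the "no effect" value is not plausible.
```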
Meaning of Confidence Intervals
If confidence intervals contain 0 or 1:
Confidence is weak in asserting an effect occurred.
Provides more information than simple statistical significance:
Indicates if effect size is likely positive, negative, protective, or risky.
Precision of Estimates
Narrower confidence intervals indicate greater precision in the estimate.
Wider intervals indicate less precision, i.e. more variability in the effect size across samples (see the sketch below).
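The following synthetic illustration shows the same idea numerically: as the sample size grows, the 95% CI around a mean estimate narrows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (20, 200, 2000):
    x = rng.normal(0.5, 1.0, size=n)  # synthetic data, true mean = 0.5
    sem = stats.sem(x)                # standard error of the mean
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=sem)
    print(f"n={n:5d}  95% CI width = {hi - lo:.3f}")
```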
Examples of Confidence Intervals
Example (Standardized Mean Difference):
Cohen's d with a 95% CI of [0.1, 0.9]: an effect occurred, since 0 is not included; however, the interval is wide, so precision is low. The true effect could be anywhere from small (0.1) to large (0.9), though it is positive throughout.
Example (Odds Ratios):
An odds ratio of 2 with a 99% CI of [0.7, 3.5] is inconclusive: the interval contains 1, so statistical significance is not assured, and we cannot even say whether the exposure is protective (< 1) or risky (> 1).