Random Error: An Introduction

Why: Understanding random error in epidemiology is crucial for interpreting study results accurately. Random error affects the reliability of results and can lead to misinterpretation if it is not accounted for. The concept reflects the fact that biological systems vary, so no single study can definitively capture the "exact truth."

What: The main components covered in this lecture are the definitions of exact truth, biological variability, confidence intervals, and p-values. Confidence intervals provide a range of values that is likely to contain the true effect in the population, while p-values indicate the statistical significance of observed results. The two concepts are closely related, and together they show how random error influences the interpretation of data.

How: The session uses practical examples to show how random error manifests in epidemiological findings, emphasizing statistical measures such as odds ratios (OR), relative risk (RR), and risk difference (RD). By discussing various scenarios, including significant and non-significant findings and study power, it illustrates the implications of random error in real-world research. Participants will engage in discussions to identify how the theoretical concepts apply to their own experience, fostering a richer learning environment.

This is the first presentation on random error, followed by presentations on p-values and confidence intervals, and then study power.

The second formula of the one picture, two formulas, and three acronyms relates to confidence intervals.

Exact Truth

In epidemiology, "exact truth" refers to the precise value of a measure like odds ratio (OR), relative risk (RR), risk difference, or number needed to treat (NNT). This true value is conceptualized as the center of a bull's eye, which is difficult to pinpoint in biological science due to:

Biology's inherent variability and randomness
Studies conducted on samples rather than entire populations

Biological Variability

Biology is a moving, living, and random target. Even under identical conditions, results vary due to inherent biological randomness.

Studying Samples

Due to practical constraints such as limited time and funding, studies are performed on samples rather than whole populations, and each sample differs from the others because of random variation. The resulting variation in study results is known as random sampling error: results from different samples will be spread randomly around the true but unknown result that would be obtained if the entire population were studied.
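To make random sampling error concrete, the Python sketch below simulates it under stated assumptions: a hypothetical population of 100,000 people in which exactly 9% have the outcome, from which 1,000 random samples of 500 people are drawn. The names and numbers are illustrative and do not come from the lecture. Each sample gives a slightly different risk, and the estimates scatter around the true value.

```python
import random
import statistics

# Hypothetical population: 100,000 people, 9,000 of whom have the outcome.
# In real research the true population risk is unknown; here it is fixed
# so we can see how sample estimates scatter around it.
POPULATION_SIZE = 100_000
CASES = 9_000
TRUE_RISK = CASES / POPULATION_SIZE
population = [1] * CASES + [0] * (POPULATION_SIZE - CASES)

def sample_risk(pop, n, rng):
    """Draw one random sample of size n (without replacement) and return its observed risk."""
    sample = rng.sample(pop, n)
    return sum(sample) / n

rng = random.Random(42)  # fixed seed so the run is reproducible
estimates = [sample_risk(population, 500, rng) for _ in range(1000)]

print(f"true risk:          {TRUE_RISK:.3f}")
print(f"mean of estimates:  {statistics.mean(estimates):.3f}")
print(f"smallest, largest:  {min(estimates):.3f}, {max(estimates):.3f}")
```

Individual samples land above and below 0.09, but their average sits close to it: the spread is random sampling error.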

Confidence Intervals

When reading a research paper, remember that its results come from one of many possible studies that could have been conducted in the population. A confidence interval estimates the range of values likely to contain the true result for the entire population. Each epidemiological measure (OR, RR, risk difference, NNT) is subject to random sampling error, which is typically expressed as a 95% confidence interval.

Example

For instance, a study might report a relative risk of 0.31 with a 95% confidence interval of 0.11 to 0.83.

Definitions of 95% Confidence Interval

Exact Definition: If 100 identical studies were conducted using samples from the same population, about 95 of the 100 resulting 95% confidence intervals would be expected to include the true value, and about 5 would not.
Acceptable Definition: Based on one study, there is about a 95% chance that the true value in the population lies within the 95% confidence interval, assuming minimal non-random error.
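The exact definition can be checked with a rough simulation. The Python sketch below assumes a hypothetical true risk of 9%, repeats a study of 1,000 people 2,000 times (more than 100 studies, so the proportion is stable), and computes a normal-approximation (Wald) 95% interval each time. Roughly 95% of the intervals should contain the true value. The Wald interval is a simple textbook choice, not necessarily the method used in the lecture.

```python
import math
import random

def wald_ci_95(cases, n):
    """Normal-approximation 95% CI for a proportion: p ± 1.96 × SE."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

TRUE_RISK = 0.09    # hypothetical true population risk
N = 1000            # hypothetical participants per study
N_STUDIES = 2000    # number of identical studies simulated

rng = random.Random(1)  # fixed seed for reproducibility
covered = 0
for _ in range(N_STUDIES):
    # One study: each participant independently has the outcome with probability TRUE_RISK.
    cases = sum(rng.random() < TRUE_RISK for _ in range(N))
    lower, upper = wald_ci_95(cases, N)
    if lower <= TRUE_RISK <= upper:
        covered += 1

coverage = covered / N_STUDIES
print(f"{coverage:.1%} of the 95% intervals contain the true risk")
```

The remaining few percent of intervals miss the true value purely by chance, which is why a single study can never guarantee its interval is one of the "good" ones.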

Understanding Confidence Interval

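The 95% confidence interval from a study signifies approximately a 95% chance that the true value in the underlying population from which participants were sampled lies within that interval, provided non-random error is minimal. The formula is: $95\%\ \text{C.I.} = \text{mean} \pm 1.96 \times \text{standard error}$
[The standard error measures the precision of the estimate, that is, how much the mean would be expected to vary from sample to sample; it shrinks as the sample size grows. It is not the spread of the individual data values, which is the standard deviation.]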
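A minimal Python sketch of the formula for a sample mean, using a small made-up dataset. It separates two quantities that are easy to confuse: the standard deviation, which describes the spread of the individual values, and the standard error (standard deviation divided by the square root of the sample size), which describes the precision of the mean. The 1.96 multiplier is the large-sample normal value; with only eight observations a t-distribution multiplier would normally be used instead.

```python
import math
import statistics

def mean_ci_95(values):
    """Large-sample 95% CI for a mean: mean ± 1.96 × standard error."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # spread of the individual data values
    se = sd / math.sqrt(n)             # precision of the mean itself
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, lower, upper = mean_ci_95(data)
print(f"mean = {mean:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Quadrupling the number of observations with the same spread would halve the standard error, and therefore halve the width of the interval.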

Components of a Confidence Interval Graph

Point Estimate: The square in the middle of the horizontal line (e.g., the 0.31 relative risk).
Lower Confidence Limit: The end of the horizontal line with the smaller value (e.g., 0.11).
Upper Confidence Limit: The end of the horizontal line with the larger value (e.g., 0.83).
95% Confidence Interval: The entire interval between the lower and upper limits.
No Effect Line: A vertical line marking no effect; it is at zero for a risk difference and at one for a relative risk.

Interpreting Confidence Intervals

Example: Heart attacks per 100 people in 5 years.

EGO (Exposed Group Outcome) = 9 (Confidence Interval: 8 to 10).
SEGO (Unexposed Group Outcome) = 6 (Confidence Interval: 5 to 7).
Risk Difference = 3 (Confidence Interval: 2 to 4).
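The example figures can be approximately reproduced with the estimate ± 1.96 × standard error formula applied to risks (the normal-approximation or Wald method). The group sizes below (3,000 people per group, with 270 and 180 heart attacks) are hypothetical, chosen only so that the point estimates equal 9 and 6 per 100; the lecture does not state its sample sizes. The standard error of the risk difference combines the variances of the two groups.

```python
import math

Z = 1.96  # multiplier for a 95% interval under the normal approximation

def risk_ci(cases, n):
    """Risk with a normal-approximation (Wald) 95% CI."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - Z * se, p + Z * se

def risk_difference_ci(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk difference (exposed minus unexposed) with a Wald 95% CI."""
    p1, p0 = cases_exp / n_exp, cases_unexp / n_unexp
    se = math.sqrt(p1 * (1 - p1) / n_exp + p0 * (1 - p0) / n_unexp)
    rd = p1 - p0
    return rd, rd - Z * se, rd + Z * se

# Hypothetical counts: 3,000 people per group.
ego = risk_ci(270, 3000)
sego = risk_ci(180, 3000)
rd = risk_difference_ci(270, 3000, 180, 3000)

for label, (est, lo, hi) in [("EGO", ego), ("SEGO", sego), ("RD", rd)]:
    print(f"{label}: {est * 100:.1f} per 100 (95% CI {lo * 100:.1f} to {hi * 100:.1f})")
```

Rounded to whole numbers, these intervals match the example: 8 to 10, 5 to 7, and 2 to 4 per 100.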

Interpretation of EGO:

There is about a 95% chance that the heart attack rate per 100 people in 5 years in the exposed population lies between 8 and 10.

Key Observation:

The confidence interval for EGO (8 to 10) does not overlap with that for SEGO (5 to 7). Even the lowest estimate for EGO (8) is higher than the highest estimate for SEGO (7), so EGO is statistically significantly higher than SEGO. Consistent with this, the confidence interval for the risk difference (2 to 4) does not include zero.

Non-Significant Findings

Example:

Identical point estimates for EGO, SEGO, and risk difference (9, 6, and 3) but with wider confidence intervals:

EGO: 6 to 14.
SEGO: 4 to 10.
Risk Difference: -2 to +8.

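In this situation, the lower limit of EGO's confidence interval (6) is below the upper limit of SEGO's (10), so the two intervals overlap, and the risk difference interval (-2 to +8) includes zero, the no-effect value. A difference between the groups therefore cannot be established, and the finding is classified as statistically non-significant. (Strictly, it is the risk difference interval crossing zero that settles this; overlapping group intervals on their own do not always mean a difference is non-significant.)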
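The two checks used in these examples (do the group intervals overlap, and does the risk difference interval exclude zero?) can be written as small functions and applied to the figures from both examples. This is an illustrative sketch; the function names are hypothetical, and the decisive test of significance is whether the risk difference interval excludes zero.

```python
def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

def excludes(ci, no_effect):
    """True if the interval lies entirely on one side of the no-effect value."""
    lower, upper = ci
    return upper < no_effect or lower > no_effect

# Figures from the two examples (heart attacks per 100 people in 5 years).
narrow = {"EGO": (8, 10), "SEGO": (5, 7), "RD": (2, 4)}
wide = {"EGO": (6, 14), "SEGO": (4, 10), "RD": (-2, 8)}

for name, ex in [("narrow", narrow), ("wide", wide)]:
    print(name,
          "| EGO/SEGO intervals overlap:", intervals_overlap(ex["EGO"], ex["SEGO"]),
          "| RD interval excludes 0:", excludes(ex["RD"], 0))
```

For the narrow intervals the groups do not overlap and the risk difference excludes zero (significant); for the wide intervals both checks point the other way (non-significant).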

Relative Risk Confidence Intervals

Relative risks lack units as they are derived from the ratio of EGO to SEGO. For example:

EGO = 9 and SEGO = 6, so Relative Risk = 9/6 = 1.5.

If both limits of the confidence interval lie on the same side of 1 (the no-effect line), the result is statistically significant. Conversely, if the confidence interval crosses 1, the finding is not statistically significant.

Symmetry of Confidence Intervals

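Confidence intervals for risk differences are usually symmetrical around the point estimate (when calculated with the standard estimate ± 1.96 × standard error formula).
Confidence intervals for relative risks are not symmetrical: a relative risk cannot fall below zero but has no upper bound, so its interval is calculated on the logarithmic scale, where it is symmetrical, and becomes asymmetrical when converted back.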
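The asymmetry can be seen with the usual log-scale method for a relative risk. In the Python sketch below, the counts (270/3,000 exposed and 180/3,000 unexposed, giving RR = 1.5) are hypothetical; the interval is built as ln(RR) ± 1.96 × SE using the standard large-sample formula for the standard error of ln(RR), then exponentiated. The same pattern appears in the earlier example: 0.31 is 0.20 above its lower limit of 0.11 but 0.52 below its upper limit of 0.83.

```python
import math

Z = 1.96  # 95% multiplier, normal approximation on the log scale

def relative_risk_ci(cases_exp, n_exp, cases_unexp, n_unexp):
    """Relative risk with a 95% CI computed on the log scale and back-transformed."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Large-sample standard error of ln(RR).
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - Z * se_log)
    upper = math.exp(math.log(rr) + Z * se_log)
    return rr, lower, upper

# Hypothetical counts giving EGO = 9 and SEGO = 6 per 100.
rr, lower, upper = relative_risk_ci(270, 3000, 180, 3000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"distance below estimate: {rr - lower:.2f}, above: {upper - rr:.2f}")
print("symmetric on log scale:",
      math.isclose(math.log(rr) - math.log(lower), math.log(upper) - math.log(rr)))
```

The interval (about 1.25 to 1.80) lies entirely above 1, so the result is statistically significant, and it extends further above 1.5 than below it.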

Summary

The objective is to estimate what EGO and SEGO, and consequently the risk difference and relative risk, would be in the total population, based on the findings from one study. The true result in the population most likely lies within the 95% confidence interval. In practice, confidence intervals are best calculated with statistical software or online calculators rather than by hand.

P-Values and Confidence Intervals

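This section explains the distinct roles of, and the relationship between, p-values and confidence intervals. Conclusions about statistical significance depend on where the confidence interval lies relative to the no-effect line: a 95% confidence interval that excludes the no-effect value generally corresponds to p < 0.05, and one that includes it corresponds to p ≥ 0.05.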
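Because the two are linked, an approximate p-value can be recovered from a reported 95% confidence interval for a ratio measure. The Python sketch below applies a common back-calculation to the relative risk example (0.31, 95% CI 0.11 to 0.83): it infers the standard error of ln(RR) from the width of the interval on the log scale, then computes a z statistic against ln(1) = 0. It assumes the interval was calculated symmetrically on the log scale, and because published figures are rounded, the result is only approximate.

```python
import math
from statistics import NormalDist

def p_value_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio measure (RR or OR) from its 95% CI.

    Assumes the CI was computed as ln(estimate) ± z_crit × SE on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se          # distance from the no-effect value ln(1) = 0
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = p_value_from_ratio_ci(0.31, 0.11, 0.83)
print(f"p ≈ {p:.3f}")
```

The result (p of roughly 0.02) is below 0.05, consistent with the interval lying entirely below 1; an interval that crosses 1 would give p above 0.05.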

Conclusion of Statistical Significance

Two key factors affecting a study's power are the sample size and the size of the effect: the larger the sample and the larger the true effect, the greater the power tends to be.
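A rough Python sketch of how power depends on these two factors, using a simplified normal-approximation formula for comparing two risks (unpooled standard error, ignoring the negligible opposite tail). The risks and sample sizes are hypothetical, and dedicated power-calculation software uses more refined methods.

```python
import math
from statistics import NormalDist

def approx_power_two_risks(p1, p0, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two risks
    (normal approximation, unpooled SE, opposite tail ignored)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p0 * (1 - p0) / n_per_group)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)

# Hypothetical scenarios: risk in the unexposed fixed at 6%.
for n in (250, 500, 1000, 3000):
    print(f"n = {n:>4} per group: power for 9% vs 6% = {approx_power_two_risks(0.09, 0.06, n):.2f}, "
          f"for 12% vs 6% = {approx_power_two_risks(0.12, 0.06, n):.2f}")
```

Reading down each column shows power rising with sample size; reading across each row shows power rising with the size of the effect.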

Meta-Analysis of Low-Power Studies

When individual studies are underpowered, look for systematic reviews or meta-analyses, which combine results across studies to increase overall power and may reveal a statistically significant effect even when no individual study shows one.
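A minimal sketch of how pooling increases power, using fixed-effect inverse-variance weighting of log relative risks, one standard meta-analytic method (random-effects models are often preferred when studies differ). The four studies are hypothetical: each interval crosses 1, yet the pooled interval lies entirely below 1 because combining the studies shrinks the standard error.

```python
import math

Z = 1.96  # 95% multiplier

# Hypothetical studies: (relative risk, standard error of ln RR).
studies = [(0.70, 0.22), (0.80, 0.20), (0.75, 0.25), (0.72, 0.21)]

def ci_from_log(log_est, se):
    """Back-transform a log-scale estimate ± Z × SE into a ratio-scale interval."""
    return math.exp(log_est - Z * se), math.exp(log_est + Z * se)

# Each study on its own.
for rr, se in studies:
    lo, hi = ci_from_log(math.log(rr), se)
    print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")

# Fixed-effect inverse-variance pooling on the log scale.
weights = [1 / se**2 for _, se in studies]
pooled_log = sum(w * math.log(rr) for w, (rr, _) in zip(weights, studies)) / sum(weights)
pooled_se = 1 / math.sqrt(sum(weights))
pooled_lo, pooled_hi = ci_from_log(pooled_log, pooled_se)
print(f"Pooled RR {math.exp(pooled_log):.2f} (95% CI {pooled_lo:.2f} to {pooled_hi:.2f})")
```

Each study alone is non-significant, but the pooled standard error is roughly half that of any single study, so the combined interval no longer reaches 1.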