EBP Week 11: Estimates, Hypothesis Testing and P-values

Point Estimate

The term ‘estimate’ indicates that the data are obtained from a representative sample of the population.
The ‘point estimate’ provides an estimate about what might be observed in the whole population from which the sample is derived.

Mean Difference

Perhaps the most basic and commonly reported measure of treatment effect size is the mean difference.
This is the absolute difference between two sets of values.
Example: A trial involving cream for heel pain showed participants took three more steps before experiencing pain after one week.
Question: Is ‘three extra steps’ a small, moderate, or large change?

Standardised Mean Difference (SMD)

The SMD (also referred to as the effect size or Cohen’s d) expresses the absolute change relative to the standard deviation.
Calculation: Absolute difference (mean difference between the experimental condition and the control) divided by the standard deviation (either the pooled standard deviation or the standard deviation of the baseline scores).
The formula for SMD is: $SMD = \frac{\text{Mean Difference}}{\text{Standard Deviation}}$
An SMD of:
- 0.20 or less represents a small change
- 0.50 represents a moderate change
- 0.80 represents a large change
In the example above, an improvement of three extra steps is considered a small to moderate effect.
Advantage: Easier to compare effect size estimates across replicated experiments using similar methods.
Limitations: Less reliable when the data are skewed or when the standard deviations differ appreciably between treatment conditions.
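The SMD calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names are hypothetical, the inputs (mean difference 15, standard deviation 10) are taken from the caffeine example later in these notes, and assigning in-between values to the lower band is a simplifying assumption of the sketch.

```python
def standardised_mean_difference(mean_difference, standard_deviation):
    """SMD = mean difference / standard deviation (pooled or baseline SD)."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_difference / standard_deviation


def describe_smd(smd):
    """Label an SMD using the 0.20 / 0.50 / 0.80 benchmarks from these notes.

    Assumption of this sketch: values that fall between two benchmarks
    are given the label of the lower benchmark.
    """
    size = abs(smd)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    return "small"


# Caffeine example from these notes: mean difference 15, SD 10.
smd = standardised_mean_difference(15, 10)
print(smd, describe_smd(smd))
```

Because the result is unit-free, the same labels can be applied to outcomes measured on different scales, which is what makes comparison across similar experiments easier.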

Odds Ratio (OR) and Relative Risk (RR)

Two popular point estimates of treatment effect used to compare risk in two different groups of people.
In health research, groups of people (e.g. smokers) are compared to other groups (e.g. non-smokers), to see whether belonging to a group increases or decreases a person’s risk of developing certain diseases (e.g. lung cancer).
OR and RR are often interpreted in the same way, although they are only approximately equal when the event is rare.
Both range from zero to infinity.
Values > 1.0 indicate increased risk.
Values < 1.0 indicate reduced risk.
A value of 1.0 indicates that the risk is the same in both groups (‘no effect’).

Odds Ratio

The OR compares the odds of an event in two groups; odds are one way of representing probability.
The ‘odds of an event’ is the number of cases who experience the event of interest, divided by the number of those who do not experience the event of interest.
Odds are expressed as a number from zero (event will never happen) to infinity (event is certain to happen).
The OR is the odds in one group divided by the odds in the other group.
Example: A trial investigating the effect of zinc (mineral) supplementation on the incidence of the common cold.
- A 2x2 table shows the number of individuals who developed a cold and the number who did not, split by whether they belonged to the zinc or placebo group.
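The definition of odds and the odds ratio can be sketched as follows. The counts below are hypothetical and are NOT the zinc trial data (the 2x2 table for that trial is not reproduced in these notes); the function names are also illustrative only.

```python
def odds(events, non_events):
    """Odds of an event: cases with the event divided by cases without it."""
    if non_events == 0:
        return float("inf")  # everyone in the group experienced the event
    return events / non_events


def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds in group A divided by odds in group B."""
    odds_b = odds(events_b, non_events_b)
    if odds_b == 0:
        raise ValueError("odds in the comparison group are zero")
    return odds(events_a, non_events_a) / odds_b


# Hypothetical 2x2 table (illustrative numbers only):
#              event   no event
# treatment      15       85
# control        30       70
treatment_odds = odds(15, 85)
control_odds = odds(30, 70)
or_value = odds_ratio(15, 85, 30, 70)
print(round(treatment_odds, 3), round(control_odds, 3), round(or_value, 3))
```

An OR below 1.0 here would be read as lower odds of the event in the treatment group than in the control group.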

Relative Risk

The RR is the ratio of the incidence in people with the risk factor (exposed persons) to the incidence in people without the risk factor (nonexposed persons).
Not only used to estimate treatment effects (e.g. using some type of therapy), but also to estimate the risk for developing a disease in the presence of a particular characteristic.
Example: Using zinc for the common cold.
- Risk (Zinc) = $9/20 = 0.45$
- Risk (Placebo) = $18/20 = 0.9$
- Relative Risk = $0.45 / 0.9 = 0.5$
A person taking zinc is half as likely to develop a cold as someone taking a placebo.
The RR (0.5) is closer to 1.0 than the OR (0.3); if the OR were read as though it were an RR, it would overestimate the size of the effect.
There is a ‘push’ for researchers to use RR rather than OR for reporting treatment effects.
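The zinc arithmetic above can be reproduced with a short sketch. The counts (9 of 20 and 18 of 20) come from the example in these notes; the function names are hypothetical.

```python
def risk(events, group_size):
    """Risk (incidence) in a group: events divided by everyone in the group."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return events / group_size


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    unexposed_risk = risk(events_unexposed, n_unexposed)
    if unexposed_risk == 0:
        raise ValueError("risk in the comparison group is zero")
    return risk(events_exposed, n_exposed) / unexposed_risk


# Zinc example from these notes: 9/20 colds with zinc, 18/20 with placebo.
zinc_risk = risk(9, 20)
placebo_risk = risk(18, 20)
rr = relative_risk(9, 20, 18, 20)
print(zinc_risk, placebo_risk, rr)
```

Note that risk divides by the whole group, whereas odds divide by those who did not have the event, which is why the two measures diverge when the event is common.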

Confidence Intervals (CI)

When reviewing evidence, you need to make decisions about the precision of the point estimate of treatment effect.
If an experiment is repeated, the point estimate may be smaller or larger than the original study.
The CI is the key to interpreting treatment effects and is paramount when deciding whether the treatment effect makes a treatment worth implementing in clinical practice.
The CI is a range, either side of the point estimate, that tells you how much the point estimate may vary in the population.
Sometimes described as a margin of error.
Confidence ‘limits’ are simply the extreme ends of the CI – the highest and lowest values of the interval.
To calculate the CI, you need to know three things: the sample size, the standard deviation and the ‘level of confidence’.
Reported CIs are typically 95%, and sometimes 90% or 99%. A 95% CI means that, if the study were repeated many times, about 95% of the intervals calculated in this way would contain the true population value.

Confidence Intervals for Mean Difference and Standardised Mean Difference

Example: Caffeine trial on exam results.
- Mean difference = 15 points.
- Standard deviation = 10, sample size = 50 students, 95% CI = ± 2.77.
- Lower limit = 15 - 2.77 = 12.23.
- Upper limit = 15 + 2.77 = 17.77.
- 95% CI = 12.23–17.77.
- Interpretation: We can be 95% confident that the true treatment effect in the population (which the point estimate of 15 points is estimating) lies somewhere between 12.23 and 17.77.
SMD = 1.5 (large effect size), 95% CI = 1.19–1.81.
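The limits in the caffeine example can be reproduced with a short sketch. It assumes the ± 2.77 margin was obtained with the normal approximation (critical value × SD / √n); that formula reproduces the figure but is not stated in these notes, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(standard_deviation, sample_size, confidence=0.95):
    """Half-width of the CI under a normal approximation (an assumption here)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * standard_deviation / sqrt(sample_size)


def confidence_interval(point_estimate, standard_deviation, sample_size,
                        confidence=0.95):
    """Return (lower limit, upper limit) around the point estimate."""
    margin = margin_of_error(standard_deviation, sample_size, confidence)
    return point_estimate - margin, point_estimate + margin


# Caffeine example: mean difference 15, SD 10, 50 students, 95% confidence.
lower, upper = confidence_interval(15, 10, 50)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level (for example to 0.99) increases the critical value and therefore widens the interval for the same data.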

Effect of Sample Size on CI Width

Reducing sample size increases the width of the CI.
Example: Reducing sample size from 50 to 10 students increases the 95% CI from ± 2.77 to ± 7.15.
The 95% CI becomes much wider. We are now less certain about the precision of the point estimate of treatment effect (15 points).
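The two margins quoted above can be compared in a small sketch. It rests on an assumption about how the figures were derived: ± 2.77 is consistent with a normal critical value of 1.96, while ± 7.15 is consistent with a Student's t critical value of 2.262 (9 degrees of freedom, from a standard t table), which is commonly preferred for small samples. Neither choice is stated in these notes, and the names below are hypothetical.

```python
from math import sqrt

Z_CRITICAL_95 = 1.96        # two-sided 95% critical value, normal distribution
T_CRITICAL_95_DF9 = 2.262   # two-sided 95% critical value, t with 9 df (table value)


def margin(standard_deviation, sample_size, critical_value):
    """Half-width of the CI: critical value times the standard error."""
    return critical_value * standard_deviation / sqrt(sample_size)


# Same SD (10) as the caffeine example, two different sample sizes.
margin_n50 = margin(10, 50, Z_CRITICAL_95)
margin_n10 = margin(10, 10, T_CRITICAL_95_DF9)
print(round(margin_n50, 2), round(margin_n10, 2))
```

Two things widen the interval when the sample shrinks: the standard error grows because of the smaller √n, and the critical value itself is larger for small samples when the t distribution is used.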

Key Concepts from Forest Plot

The CI becomes wider as the sample size decreases; that is, the lower the sample size, the less certainty about the true size of the point estimate.
Where the 95% CI crosses the vertical line of no effect, the interval includes the value zero (0). This means the true effect could plausibly lie anywhere between favoring the decaffeinated placebo and favoring the caffeinated supplement.

Confidence Intervals for Risk Ratios

Risk ratios with a value of 1.0 indicate a point of no effect (i.e. no change in risk).
Example: In the zinc study, the odds ratio for developing a cold in the zinc group compared with the placebo group was 0.30. The 95% CI for this point estimate is 0.18–0.53.
We can be 95% confident that the true odds ratio in the population lies somewhere between 0.18 and 0.53. Because this interval does not include 1.0, the finding is statistically significant.

Summary of Confidence Intervals

The CI is a range, either side of the point estimate, that tells you how much the point estimate may vary in the population.
To calculate the CI, you need to know the sample size, the standard deviation, and the level of confidence.
A 95% CI of ‘5–10’ indicates that we can be 95% confident that the true treatment effect is between 5 and 10.
Smaller sample size and larger variance lead to wider CIs.
If the CI includes 0 (for mean difference and SMD), the findings are not statistically significant.
If the CI includes 1 (for risk ratios), the findings are not statistically significant.
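The last two rules can be expressed as a single check. This is a minimal sketch with a hypothetical function name; the first two intervals come from the caffeine and zinc examples in these notes, and the third is an invented interval used only to show the non-significant case.

```python
def ci_excludes_no_effect(lower, upper, no_effect_value):
    """True when the CI does not contain the 'no effect' value.

    Use no_effect_value=0 for a mean difference or SMD,
    and no_effect_value=1 for an odds ratio or relative risk.
    """
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return not (lower <= no_effect_value <= upper)


# Caffeine mean difference, 95% CI 12.23 to 17.77 (no effect = 0).
caffeine_significant = ci_excludes_no_effect(12.23, 17.77, 0)

# Zinc odds ratio, 95% CI 0.18 to 0.53 (no effect = 1).
zinc_significant = ci_excludes_no_effect(0.18, 0.53, 1)

# Hypothetical interval that crosses zero.
crossing_significant = ci_excludes_no_effect(-2.0, 6.0, 0)

print(caffeine_significant, zinc_significant, crossing_significant)
```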

Null Hypothesis Significance Testing

A hypothesis is an idea or explanation for something that is based on known facts but has not yet been proved.
To prove or disprove the explanation, it is necessary to undertake an experiment using a scientific method (i.e. a research experiment).

Statistical Hypotheses

Null hypothesis: Results were obtained from the same population and there is no real difference between groups or observations.
Alternative hypothesis: Sample observations are influenced by some non-random cause, and there is a real difference between the groups due to some systematic cause.
The alternative hypothesis is therefore the counterpart of the null hypothesis.
‘Null hypothesis significance testing’ means that statistical testing using p-values is used to decide whether to reject or retain the null hypothesis.
The outcome is dichotomous – there is either a difference or there is not.

What Does the P-Value Indicate?

Definition: The p-value first assumes that the null hypothesis is true and indicates the probability of obtaining the observed difference (or a larger difference) under that assumption.
In other words, the p-value is the probability of seeing a result at least as extreme as the one observed if chance alone were operating.
Example: Group A performed 12 points higher than Group B, p = 0.01. If there were truly no difference between the groups, a difference of 12 points or more would be expected only 1% of the time.
A p-value close to 1 means the data are consistent with the null hypothesis. However, as the p-value approaches zero, the data become harder to reconcile with the null hypothesis, which adds support to the alternative hypothesis.
By convention, a p-value of less than 0.05 is taken as support for the alternative hypothesis, and means that we reject the null hypothesis in favor of the alternative hypothesis.
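One way to make the definition concrete is an exact permutation test, sketched below. This technique is not mentioned in these notes and is used here only for illustration: if the null hypothesis is true, the group labels are interchangeable, so the p-value is the share of all possible relabellings that give a difference at least as large as the one observed. The scores and the function name are hypothetical.

```python
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means, by full enumeration."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

    at_least_as_extreme = 0
    total = 0
    # Every way of choosing which pooled scores get the label "group A".
    for chosen in combinations(range(len(pooled)), size_a):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in range(len(pooled))
                        if i not in chosen_set]
        difference = abs(sum(relabelled_a) / len(relabelled_a)
                         - sum(relabelled_b) / len(relabelled_b))
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Hypothetical exam scores for two small groups (illustrative only).
scores_a = [78, 85, 90, 82, 88]
scores_b = [70, 65, 72, 68, 74]
p_value = exact_permutation_p_value(scores_a, scores_b)
print(round(p_value, 4), p_value < 0.05)
```

Full enumeration is only practical for small groups, because the number of relabellings grows very quickly with sample size.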

Type I Errors

Occur when a true null hypothesis is rejected (i.e. concluding there is a ‘difference’ when no difference exists).
The observed difference between the groups in the trial is due to chance alone (i.e. a random finding rather than a systematic finding).

Type II Errors

Involve failing to reject the null hypothesis when it is false.
Often occur when there is a small sample size, which leads to wide CIs that cross the point of ‘no effect’.
There is in fact a difference, but the sample size is insufficient to power the study.

Power and Sample Size Estimations

Power and sample size estimations are used to determine how many participants are needed in a study.
Most quantitative studies should use a sample size calculation to work out how many participants are required to adequately ‘power’ the study.
If we do not include enough participants, we are at risk of a type II error, which essentially means that we might miss a statistically significant difference.
To estimate the required sample size, you need to know the following:
- The size of the effect (e.g. mean difference or SMD) that is important or meaningful – sometimes this is just estimated, because the desired effect is unknown
- How certain we want to be of avoiding a type I error (i.e. critical level of significance, α)
- The precision and variance of measurements within any sample
The power of a study is the probability that the study will detect a predetermined difference in measurement between the two groups if such a difference truly exists.
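A sample size estimate for comparing two group means can be sketched with a widely used normal-approximation formula, n per group = 2 × ((z for α + z for power) / SMD)². This formula is not given in these notes; the default power of 80% and the function name are assumptions of the sketch, and dedicated software (often based on the t distribution) may give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(smd, alpha=0.05, power=0.80):
    """Approximate participants needed in each of two equal-sized groups.

    smd   : smallest standardised mean difference worth detecting
    alpha : two-sided significance level (risk of a type I error)
    power : probability of detecting the effect if it truly exists
    """
    if smd == 0:
        raise ValueError("the effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / smd) ** 2)


# Benchmarks from these notes: small (0.2), moderate (0.5), large (0.8).
for effect in (0.2, 0.5, 0.8):
    print(effect, participants_per_group(effect))
```

The pattern to notice is that smaller effects require many more participants, which is why a study designed around a large expected effect can be underpowered if the true effect turns out to be small.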

Interpreting P-Values from a Research Article

You will need to be able to distinguish statistical significance (particularly using p-values) from practical or medical relevance and clinically worthwhile effects.
You must pay attention to the size of the difference and the CI, because these are what matter for successful practice.
Power and sample size calculations reduce the risk of a study being ‘underpowered’ because it has too few participants.