EBP Week 11: Estimates, Hypothesis Testing and P-values

Point Estimate

  • The term ‘estimate’ indicates that the data are obtained from a representative sample of the population.
  • The ‘point estimate’ provides an estimate about what might be observed in the whole population from which the sample is derived.

Mean Difference

  • Perhaps the most basic and commonly reported measure of treatment effect size is the mean difference.
  • This is the absolute difference between two sets of values.
  • Example: A trial involving cream for heel pain showed participants took three more steps before experiencing pain after one week.
  • Question: Is ‘three extra steps’ a small, moderate, or large change?

Standardised Mean Difference (SMD)

  • The SMD (also referred to as the effect size or Cohen’s d) expresses the absolute change relative to the standard deviation.
  • Calculation: Absolute difference (mean difference between the experimental condition and the control) divided by the standard deviation (either the pooled standard deviation or the standard deviation of the baseline scores).
  • The formula for SMD is: $SMD = \frac{\text{Mean Difference}}{\text{Standard Deviation}}$
  • An SMD of:
    • 0.20 or less represents a small change
    • 0.50 represents a moderate change
    • 0.80 represents a large change
  • In the example above, an improvement of three extra steps is considered a small to moderate effect.
  • Advantage: Easier to compare effect size estimates across replicated experiments using similar methods.
  • Limitations: Skewed data, appreciably different standard deviations between treatment conditions.
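As a rough illustration of the calculation described above, the Python sketch below computes a pooled standard deviation and the resulting SMD. The function names and the input numbers are hypothetical (they are not taken from the heel pain trial), and the pooled standard deviation uses the usual formula weighted by degrees of freedom.

```python
from math import sqrt

def pooled_sd(sd_1, n_1, sd_2, n_2):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    return sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2))

def smd(mean_experimental, mean_control, sd):
    """Standardised mean difference: mean difference divided by a standard deviation."""
    return (mean_experimental - mean_control) / sd

# Hypothetical numbers for illustration only (not from a real trial).
sd = pooled_sd(sd_1=6.0, n_1=30, sd_2=6.0, n_2=30)
effect = smd(mean_experimental=13.0, mean_control=10.0, sd=sd)
print(f"pooled SD = {sd:.2f}, SMD = {effect:.2f}")
```

With these made-up inputs the mean difference is half of one standard deviation, which the thresholds above would describe as a moderate change.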

Odds Ratio (OR) and Relative Risk (RR)

  • Two popular point estimates of treatment effect used to compare risk in two different groups of people.
  • In health research, groups of people (e.g. smokers) are compared to other groups (e.g. non-smokers), to see whether belonging to a group increases or decreases a person’s risk of developing certain diseases (e.g. lung cancer).
  • OR and RR are often interpreted in the same way, although the two values are close only when the event of interest is rare.
  • Both range from zero to infinity.
  • Values > 1.0 indicate increased risk.
  • Values < 1.0 indicate reduced risk.
  • A value of 1.0 indicates no difference in risk between the groups (‘no effect’).
Odds Ratio
  • The OR is a way of representing probability.
  • The ‘odds of an event’ is the number of cases who experience the event of interest, divided by the number of those who do not experience the event of interest.
  • Expressed as a number from zero (event will never happen) to infinity (event is certain to happen).
  • Example: A trial investigating the effect of zinc (a mineral) supplementation on the incidence of the common cold.
    • A 2x2 table shows the number of individuals who developed a cold and the number who did not, split by whether they belonged to the zinc or placebo group.
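To make the 'odds of an event' definition concrete, here is a minimal Python sketch that computes the odds in each group and their ratio from a 2x2 table. The counts and function names are hypothetical and chosen only for easy arithmetic; they are not the counts from the zinc trial.

```python
def odds(events, non_events):
    """Odds of an event: cases with the event divided by cases without it."""
    return events / non_events

def odds_ratio(events_treated, non_events_treated, events_control, non_events_control):
    """Ratio of the odds in the treated group to the odds in the control group."""
    return odds(events_treated, non_events_treated) / odds(events_control, non_events_control)

# Hypothetical 2x2 table (illustrative counts only):
#              event   no event
# treated        10       30
# control        20       20
treated_odds = odds(10, 30)
control_odds = odds(20, 20)
or_value = odds_ratio(10, 30, 20, 20)
print(f"odds (treated) = {treated_odds:.2f}, odds (control) = {control_odds:.2f}, OR = {or_value:.2f}")
```

An OR below 1.0 here means the odds of the event are lower in the treated group than in the control group.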
Relative Risk
  • The RR is the ratio of the incidence in people with the risk factor (exposed persons) to the incidence in people without the risk factor (nonexposed persons).

  • Not only used to estimate treatment effects (e.g. using some type of therapy), but also to estimate the risk for developing a disease in the presence of a particular characteristic.

  • Example: Using zinc for the common cold.

    • Risk (Zinc) = 9/20 = 0.45
    • Risk (Placebo) = 18/20 = 0.9
    • Relative Risk = 0.45/0.9 = 0.5
  • A person taking zinc is half as likely to develop a cold as someone taking a placebo.

  • The RR (0.5) is closer to 1.0 than the OR (0.3), meaning the OR overstates the size of the effect.

  • There is a ‘push’ for researchers to use RR rather than OR for reporting treatment effects.
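The relative risk arithmetic in the zinc example can be reproduced with a short Python sketch. The function names are illustrative; the counts (9 of 20 in the zinc group and 18 of 20 in the placebo group developing a cold) are the ones given in the example.

```python
def risk(events, total):
    """Risk (incidence): number of events divided by the number of people in the group."""
    return events / total

def relative_risk(events_exposed, total_exposed, events_unexposed, total_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(events_exposed, total_exposed) / risk(events_unexposed, total_unexposed)

# Counts from the zinc example: colds in the zinc group vs the placebo group.
risk_zinc = risk(9, 20)
risk_placebo = risk(18, 20)
rr = relative_risk(9, 20, 18, 20)
print(f"risk (zinc) = {risk_zinc:.2f}, risk (placebo) = {risk_placebo:.2f}, RR = {rr:.2f}")
```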

Confidence Intervals (CI)

  • When reviewing evidence, you need to make decisions about the precision of the point estimate of treatment effect.
  • If an experiment is repeated, the point estimate may be smaller or larger than the original study.
  • The CI is the key to interpreting treatment effects and is paramount when deciding whether the treatment effect makes a treatment worth implementing in clinical practice.
  • The CI is a range, either side of the point estimate, that tells you how much the point estimate may vary in the population.
  • Sometimes described as a margin of error.
  • Confidence ‘limits’ are simply the extreme ends of the CI – the highest and lowest values of the interval.
  • To calculate the CI, you need to know three things: the sample size, the standard deviation and the ‘level of confidence’.
  • Commonly reported CIs are 95%, 98% and 99%. For example, a 95% CI indicates that, if the study were repeated many times, about 95% of the intervals calculated this way would contain the true population value.
Confidence Intervals for Mean Difference and Standardised Mean Difference
  • Example: Caffeine trial on exam results.
    • Mean difference = 15 points.
    • Standard deviation = 10, sample size = 50 students, 95% CI = ± 2.77.
    • Lower limit = 15 - 2.77 = 12.23.
    • Upper limit = 15 + 2.77 = 17.77.
    • 95% CI = 12.23–17.77.
    • Interpretation: We can be 95% confident that the true treatment effect in the population is somewhere between 12.23 and 17.77 points.
  • SMD = 1.5 (large effect size), 95% CI = 1.19–1.81.
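The limits in the caffeine example can be reproduced with the sketch below. It assumes the margin of error is the normal critical value multiplied by the standard deviation divided by the square root of the sample size; the notes do not state the formula, but this assumption gives the ± 2.77 shown above. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_difference(mean_diff, sd, n, confidence=0.95):
    """Return (lower, upper, margin) using a normal critical value (an assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean_diff - margin, mean_diff + margin, margin

# Values from the caffeine example: mean difference 15, SD 10, 50 students.
lower, upper, margin = ci_mean_difference(mean_diff=15, sd=10, n=50)
print(f"95% CI = {lower:.2f} to {upper:.2f} (margin {margin:.2f})")
```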
Effect of Sample Size on CI Width
  • Reducing sample size increases the width of the CI.
  • Example: Reducing sample size from 50 to 10 students increases the 95% CI from ± 2.77 to ± 7.15.
  • The 95% CI becomes much wider. We are now less certain about the precision of the point estimate of treatment effect (15 points).
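The effect of sample size on the margin of error can be sketched as follows. The ± 7.15 figure for 10 students is consistent with using a t critical value for 9 degrees of freedom (2.262, from standard t-tables) rather than the normal value of about 1.96; that is an assumption about how the figure was obtained, and the helper name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

SD = 10                               # standard deviation from the caffeine example
Z_95 = NormalDist().inv_cdf(0.975)    # normal critical value, about 1.96
T_95_DF9 = 2.262                      # two-sided 95% t critical value for 9 df (t-table value)

def margin_of_error(critical_value, sd, n):
    """Half-width of the CI: critical value times the standard error."""
    return critical_value * sd / sqrt(n)

margin_n50 = margin_of_error(Z_95, SD, 50)        # matches the ± 2.77 in the example
margin_n10_z = margin_of_error(Z_95, SD, 10)      # normal critical value, smaller sample
margin_n10_t = margin_of_error(T_95_DF9, SD, 10)  # t critical value, matches ± 7.15

print(f"n = 50: ± {margin_n50:.2f}")
print(f"n = 10: ± {margin_n10_z:.2f} (normal) or ± {margin_n10_t:.2f} (t, 9 df)")
```

Either way, dividing by a smaller square root of n widens the interval, which is the point of the example.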
Key Concepts from Forest Plot
  • The CI becomes wider as the sample size decreases; that is, the lower the sample size, the less certainty about the true size of the point estimate.
  • When the 95% CI crosses the vertical line of no effect (that is, the interval includes the value zero), the true effect could lie anywhere between favoring the decaffeinated placebo and favoring the caffeinated supplement, so the direction of the effect is uncertain.
Confidence Intervals for Risk Ratios
  • Risk ratios with a value of 1.0 indicate a point of no effect (i.e. no change in risk).
  • Example: In the zinc study, the odds ratio for developing a cold with zinc compared to the placebo was 0.30. The 95% CI for this point estimate is 0.18–0.53.
  • We can be 95% confident that the true odds ratio in the population lies somewhere between 0.18 and 0.53. Because this interval does not include 1.0, the finding is statistically significant.

Summary of Confidence Intervals

  • The CI is a range, either side of the point estimate, that tells you how much the point estimate may vary in the population.
  • To calculate the CI, you need to know the sample size, the standard deviation, and the level of confidence.
  • A 95% CI of ‘5–10’ indicates that we can be 95% confident the true treatment effect is between 5 and 10.
  • Smaller sample size and larger variance lead to wider CIs.
  • If the CI includes 0 (for mean difference and SMD), the findings are not statistically significant.
  • If the CI includes 1 (for risk ratios), the findings are not statistically significant.
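The two rules above amount to checking whether the CI contains the 'no effect' value (0 for differences, 1 for ratios). A minimal Python sketch, with an illustrative function name, applied to the caffeine and zinc intervals from these notes plus one hypothetical interval that crosses zero:

```python
def ci_excludes_no_effect(lower, upper, no_effect_value):
    """True if the 'no effect' value lies outside the interval (statistically significant)."""
    return not (lower <= no_effect_value <= upper)

# Caffeine mean difference, 95% CI 12.23 to 17.77; no effect is 0.
caffeine_significant = ci_excludes_no_effect(12.23, 17.77, no_effect_value=0.0)

# Zinc odds ratio, 95% CI 0.18 to 0.53; no effect is 1.
zinc_significant = ci_excludes_no_effect(0.18, 0.53, no_effect_value=1.0)

# Hypothetical interval that crosses zero (illustration only).
crossing_significant = ci_excludes_no_effect(-1.0, 4.0, no_effect_value=0.0)

print(caffeine_significant, zinc_significant, crossing_significant)
```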

Null Hypothesis Significance Testing

  • A hypothesis is an idea or explanation for something that is based on known facts but has not yet been proved.
  • To prove or disprove the explanation, it is necessary to undertake an experiment using a scientific method (i.e. a research experiment).
Statistical Hypotheses
  • Null hypothesis: The groups were drawn from the same population, and there is no real difference between groups or observations.
  • Alternative hypothesis: Sample observations are influenced by some non-random cause, and there is a real difference between the groups due to some systematic cause.
  • The alternative hypothesis is therefore the counterpart of the null hypothesis.
  • ‘Null hypothesis significance testing’ means that statistical testing using p-values is used to decide whether to reject or retain the null hypothesis.
  • The outcome is dichotomous – there is either a difference or there is not.
What Does the P-Value Indicate?
  • Definition: The p-value first assumes that the null hypothesis is true and indicates the probability of obtaining the observed difference (or a larger difference).
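  • In other words, the p-value is the probability of obtaining the observed result (or a more extreme one) by chance alone, if the null hypothesis were true.
  • Example: Group A performed 12 points higher than Group B, p = 0.01. If there were truly no difference between the groups, a difference of 12 points or more would be expected only 1% of the time.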
  • As the p-value approaches 1, the data are increasingly consistent with the null hypothesis. As the p-value approaches zero, the data increasingly support the alternative hypothesis.
  • A p-value of less than 0.05 provides support for the alternative hypothesis, and means that we reject the null hypothesis in favor of the alternative hypothesis.
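One way to see what 'assuming the null hypothesis is true' means is an exact permutation test: if group labels do not matter, every way of splitting the pooled scores into two groups is equally likely, and the p-value is the share of splits with a difference at least as large as the one observed. This is a sketch of one possible test, not necessarily the one used in any given article, and the scores below are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled scores to "group A".
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical scores (illustration only).
group_a = [20.0, 22.0, 24.0, 26.0]
group_b = [10.0, 12.0, 14.0, 16.0]
p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
```

With these scores only 2 of the 70 possible splits give a difference as large as the observed one, so the p-value is 2/70, which is below 0.05.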
Type I Errors
  • Occur when a true null hypothesis is rejected (i.e. concluding there is a ‘difference’ when no difference exists).
  • The groups in the trial appear different due to chance alone (i.e. a random finding rather than a systematic finding).
Type II Errors
  • Involve failing to reject the null hypothesis when it is false.
  • Occur when there is a small sample size, which leads to wide CIs that cross the point of ‘no effect’.
  • There is in fact a difference, but the sample size is insufficient to power the study.
Power and Sample Size Estimations
  • Power and sample size estimations are measures of how many participants are needed in a study.
  • Most quantitative studies should use a sample size calculation to work out how many participants are required to adequately ‘power’ the study.
  • If we do not include enough participants, we are at risk of a type II error, which essentially means that we might miss a statistically significant difference.
  • To estimate the required sample size, you need to know the following:
    • The size of the effect (e.g. mean difference or SMD) that is important or meaningful – sometimes this is just estimated, because the desired effect is unknown
    • How certain we want to be of avoiding a type I error (i.e. critical level of significance, α)
    • The precision and variance of measurements within any sample
  • The power of a study is the probability that the study will detect a predetermined difference in measurement between the two groups if such a difference truly exists.
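A rough sample size estimate for comparing two group means can be sketched with the standard normal-approximation formula, n per group = 2 × ((z for α + z for power) / SMD)². The 80% power default is a common convention assumed here rather than something stated in these notes, the function name is illustrative, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(smd, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = NormalDist().inv_cdf(power)          # value corresponding to the desired power
    return ceil(2 * ((z_alpha + z_power) / smd) ** 2)

# Sample sizes for small, moderate and large effects (Cohen's thresholds).
for effect in (0.2, 0.5, 0.8):
    print(f"SMD = {effect}: about {n_per_group(effect)} participants per group")
```

The smaller the effect that needs to be detected, the more participants are required, which is why underpowered studies are at risk of a type II error.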
Interpreting P-Values from a Research Article
  • You will need to be able to distinguish statistical significance (particularly using p values) from practical or medical relevance and clinically worthwhile effects.
  • You must pay attention to the size of the difference and the CI, because these are what matter for successful practice.
  • Power and sample size calculations reduce the risk of a study being ‘underpowered’ because it has too few participants.