Power
The power is the probability that we detect an effect which is large enough to have practical value: the probability of correctly rejecting the null hypothesis when the alternative is true (the portion of the alternative distribution which falls in the relevant rejection region of the null), i.e. 1 − β, where β is the probability of a Type II error. We can compute this value for different sample sizes or different effect sizes.
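As a worked formula, assume (for illustration only) an upper-tailed one-sample z-test of H0: μ = μ0 against H1: μ > μ0, with known population standard deviation σ, sample size n and significance level α; Φ is the standard normal CDF and z_{1−α} its upper critical value. If the true mean is μ0 + δ, the power is:

```latex
\begin{aligned}
\text{Power} &= P\!\left(\bar{X} > \mu_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}} \;\middle|\; \mu = \mu_0 + \delta\right) \\
&= 1 - \Phi\!\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right)
\end{aligned}
```

The power grows with both δ and n, which is why it can be tabulated against either.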
The minimum effect size is the smallest effect that would be practically significant (i.e. worth detecting). E.g., if we care about finding any effect of the independent variable on the outcome that is x units or larger compared to normal, then x is the minimum effect size. If the minimum effect size is not far enough from the null mean that samples drawn under it would typically fall in the rejection region, the power will be small.
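How the minimum effect size drives power can be sketched with the standard library's `statistics.NormalDist`, assuming an upper-tailed one-sample z-test with known σ. `power_upper_z` is a hypothetical helper, and the numbers (σ = 15, n = 25, α = 0.05) are illustrative only:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_upper_z(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of an upper-tailed one-sample z-test when the true mean is mu0 + delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # rejection boundary in z units
    shift = delta * n ** 0.5 / sigma        # alternative mean, in standard errors from mu0
    return 1 - STD_NORMAL.cdf(z_crit - shift)


# Hypothetical study: sigma = 15, n = 25, alpha = 0.05
for delta in (1, 3, 6, 9):
    print(f"minimum effect {delta}: power = {power_upper_z(delta, sigma=15, n=25):.3f}")
```

Small effects sit close to the null mean and rarely reach the rejection region, so their power is low (about 0.09 for an effect of 1), while an effect of 9 gives power above 0.9.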
Steps for calculating power:
Determine the sampling distribution under the alternative by shifting the null distribution by the minimum effect size.
Determine the rejection region(s) under the null, i.e. the critical value(s) for the chosen significance level.
Find the fraction of the shifted sampling distribution which falls in the rejection region.
Calculate a z-score using the bound of the rejection region as the sample mean and the mean of the shifted distribution (null mean plus minimum effect size) as the population mean, dividing by the standard error.
The probability of observing a value more extreme than this z-score, in the direction of the rejection region, is the power; multiply by 100 to express it as a percentage.
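The steps above can be sketched in Python for an upper-tailed one-sample z-test with known σ. All numbers (null mean 100, σ = 15, n = 25, minimum effect size 5, α = 0.05) are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist

# Hypothetical example values: null mean, known sd, sample size,
# minimum effect size and significance level for an upper-tailed test.
mu0, sigma, n, min_effect, alpha = 100.0, 15.0, 25, 5.0, 0.05

se = sigma / n ** 0.5  # standard error of the sample mean

# Sampling distribution under the alternative = null distribution shifted by min_effect
null_dist = NormalDist(mu0, se)
alt_dist = NormalDist(mu0 + min_effect, se)

# Rejection region under the null: sample means above the upper critical value
critical_mean = null_dist.inv_cdf(1 - alpha)

# z-score of the rejection bound, with mu0 + min_effect as the population mean
z = (critical_mean - (mu0 + min_effect)) / se

# Probability of a value more extreme than z (towards the rejection region) = power
power = 1 - NormalDist().cdf(z)

# Cross-check: the same fraction read directly off the shifted distribution
assert abs(power - (1 - alt_dist.cdf(critical_mean))) < 1e-12

print(f"critical sample mean = {critical_mean:.2f}, z = {z:.3f}, power = {power:.1%}")
```

With these numbers the minimum effect sits almost exactly on the rejection boundary, so the power is only about 51%.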
Note that we typically ignore the rejection region in the opposite direction from the hypothesised true effect (the minimum effect size). Rejecting the null there would mean concluding that the effect goes the other way, which is not the effect we want to detect, so it does not count towards the power; when the true effect lies in the expected direction, the chance of landing there is also very small.
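To check how little the ignored region matters, the sketch below splits the rejection probability of a two-sided z-test into its two regions. It assumes hypothetical values (true mean 5 above the null mean, σ = 15, n = 25, α = 0.05 split equally between the tails); `two_sided_tail_powers` is an illustrative helper, not a library function:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_tail_powers(delta: float, sigma: float, n: int, alpha: float = 0.05):
    """Return P(reject in the direction of delta) and P(reject in the opposite
    direction) for a two-sided one-sample z-test with true mean mu0 + delta (delta > 0)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)      # each null tail holds alpha / 2
    shift = delta * n ** 0.5 / sigma                # alternative mean, in standard errors
    same_side = 1 - STD_NORMAL.cdf(z_crit - shift)  # upper rejection region
    opposite_side = STD_NORMAL.cdf(-z_crit - shift) # lower rejection region
    return same_side, opposite_side


same, opposite = two_sided_tail_powers(delta=5, sigma=15, n=25)
print(f"same direction: {same:.4f}, opposite direction: {opposite:.6f}")
```

Here the opposite-direction region captures only about 0.01% of the alternative distribution, compared with roughly 38% in the intended direction.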
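If the sample is large enough, it is simpler to carry out this process by approximating the sampling distribution with a normal distribution. For small samples, where the population standard deviation has to be estimated from the data, we must use the t-distribution instead.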
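To see why the normal approximation is not good enough for small samples, the sketch below compares it with a simulated one-sample t-test. It assumes a hypothetical study with n = 10 observations drawn from N(0.8, 1) (a standardised effect of 0.8), testing H0: μ = 0 one-sided at α = 0.05. The critical value 1.8331 is the upper 5% point of the t-distribution with 9 degrees of freedom, taken from t-tables rather than computed:

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical small study: n = 10, true standardised effect d = 0.8, one-sided alpha = 0.05.
n, d, alpha = 10, 0.8, 0.05
T_CRIT_DF9 = 1.8331  # upper 5% point of the t-distribution with n - 1 = 9 df (from t-tables)

# Normal approximation: treats the standard deviation as known (sigma = 1).
z_approx_power = 1 - NormalDist().cdf(NormalDist().inv_cdf(1 - alpha) - d * n ** 0.5)


def simulated_t_power(reps: int = 20_000, seed: int = 1) -> float:
    """Estimate t-test power by simulation: sample from N(d, 1), test H0: mu = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        # t statistic uses the sample standard deviation, not the true sigma
        t_stat = statistics.fmean(sample) / (statistics.stdev(sample) / n ** 0.5)
        if t_stat > T_CRIT_DF9:
            rejections += 1
    return rejections / reps


t_power = simulated_t_power()
print(f"normal approximation: {z_approx_power:.3f}, simulated t-test: {t_power:.3f}")
```

The simulated t-test power comes out noticeably below the normal approximation (about 0.81), because estimating the standard deviation from only 10 observations adds variability that the normal approximation ignores.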
The power is affected by the sample size:
Using too small a sample limits our ability to detect a real effect (low power), and can therefore be both costly (money is wasted if the study cannot detect a result) and ethically questionable (the treatment group was exposed to the trial without the study being able to produce a conclusive result).
Using too large a sample, such that the power is very high (>90%), exposes an unnecessary number of participants to the treatment, which is also ethically questionable, and is needlessly expensive.
We therefore choose the sample size so that the power is 80–90% (depending on context), balancing high power against exposing too many participants and wasting money.
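The sample size that gives a target power can be found by solving the power formula for n. A minimal sketch, again assuming an upper-tailed one-sample z-test with known σ and hypothetical values (minimum effect 5, σ = 15, α = 0.05); `required_n` and `power_upper_z` are illustrative helper names, not a library API:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def required_n(delta: float, sigma: float, target_power: float, alpha: float = 0.05) -> int:
    """Smallest n with at least target_power for an upper-tailed one-sample z-test.

    Solves 1 - Phi(z_{1-alpha} - delta * sqrt(n) / sigma) = target_power for n,
    then rounds up to a whole number of participants.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_power = STD_NORMAL.inv_cdf(target_power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)


def power_upper_z(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of the same test for a given n, used to check the result."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - delta * math.sqrt(n) / sigma)


# Hypothetical study: minimum effect 5, sigma 15, alpha 0.05.
for target in (0.80, 0.90):
    n = required_n(5, 15, target)
    print(f"target {target:.0%}: n = {n}, achieved power = {power_upper_z(5, 15, n):.3f}")
```

With these values, 80% power needs n = 56 and 90% power needs n = 78, showing how quickly the required sample grows as the target power rises.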
If a trial is inexpensive and poses no ethical risk, it may be reasonable to run a larger experiment when a more precise estimate of the effect is valuable (a larger sample also increases the power).