Biostatistics Notes

Role of Biostatistics in Pharmacy

Statistics involves the collection, analysis, and interpretation of data.
It's a crucial tool in pharmacological research for:
- Summarizing experimental data using descriptive statistics.
- Conducting hypothesis testing.
It helps determine if one drug's pharmacological effect is superior to another, impacting drug development.
Statistics aids in formulating experimental designs and drawing appropriate inferences from data, reducing the risk of misleading claims and increasing confidence in translating proof-of-concept studies to clinical applications.

Key Statistical Terms

Population: All objects of a similar type in the universe, represented by "N".
Sample: A fraction of the population that represents the population of interest, represented by "n". Samples are used to make generalizations about the population.
Sampling Types:
- Simple Random Sampling: Individuals are chosen randomly.
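- Stratified Random Sampling: Population is divided into mutually exclusive groups (strata) based on a specific factor (e.g., race, gender), and a random sample is drawn from each stratum.
- Cluster Sampling: The population is divided into naturally occurring clusters (e.g., clinics, schools), and whole clusters are selected at random; all individuals, or a random subset, within the selected clusters are studied.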
- Convenience Sampling: Participants are selected based on the researcher's convenience.
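
The probability-based sampling types above can be sketched in a few lines. This is a minimal illustration, assuming one-stage cluster sampling (whole clusters are picked at random and everyone in them is kept); the patient records, field names, and function names are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n individuals at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(population, stratum_of, n_per_stratum, seed=0):
    """Group individuals into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 20 patients with a recorded sex and clinic.
patients = [
    {"id": i, "sex": "F" if i % 2 == 0 else "M", "clinic": f"clinic_{i % 4}"}
    for i in range(20)
]
by_clinic = {}
for p in patients:
    by_clinic.setdefault(p["clinic"], []).append(p)

srs = simple_random_sample(patients, 5)
stratified = stratified_random_sample(patients, lambda p: p["sex"], 3)
clustered = cluster_sample(by_clinic, 2)
```

Convenience sampling has no random step, so it is not sketched here.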

Measurement Scales

Nominal/Categorical Scale: Classifies observations into named categories (e.g., HIV status: positive/negative).
Ordinal Scale: Categories with a rank order (e.g., cancer stage: I, II, III, IV).
Interval Scale: Numerical values with equal intervals between them but no absolute zero point, so differences are meaningful while ratios are not (e.g., temperature in Celsius).
Ratio Scale: Has all properties of an interval scale, with the ability to calculate ratios due to an absolute zero point (e.g., height, weight).

Variables and Data

Variable: A characteristic being observed or measured.
Data: Measured values assigned to a variable.
- Example: Age of a person (age is the variable, 32 is the data).
- Example: HIV status of a person (HIV status is the variable, +/- is the data).
Types of Variables:
- Dependent Variable (DV): The outcome variable.
- Independent Variable (IV): The variable being manipulated.
  - Example: Effect of physical activity on serum cholesterol.
- Confounding Variable: Affects the DV beyond the effect of the IV, but is not of specific research interest.
  - Example: Diet is a confounding variable when studying the effect of physical activity on serum cholesterol.

Continuous and Discrete Variables

Continuous Variables: Data that can take any value within a range, measured on interval or ratio scales (e.g., age, weight).
Discrete Variables: Data with distinct, separate values, typically integers or whole numbers (e.g., number of students in a class).

Accuracy and Precision

Accuracy: How closely a computed/measured value agrees with the true value.
Precision: How closely individual computed/measured values agree with each other.
Inaccuracy: Bias.
Imprecision: Uncertainty.
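As a rough illustration, bias can be estimated as the mean of repeated measurements minus the true value, and imprecision as their standard deviation. The assay readings and the true value of 100 below are hypothetical.

```python
import statistics

def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean error vs. the true value, and sample SD."""
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical repeated assay readings of a standard whose true content is 100.
assay = [98.0, 98.2, 97.9, 98.1, 97.8]
bias, spread = bias_and_spread(assay, true_value=100.0)
# Readings cluster tightly (small spread) but sit about 2 units low (bias):
# precise, yet inaccurate.
```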

Statistical Significance

Statistical significance indicates whether a result is likely due to chance or a factor of interest.
It provides evidence regarding the plausibility of the null hypothesis.
The null hypothesis states that there is nothing more than random chance at work in the data.

Hypothesis Testing

Hypothesis testing is also called significance testing.
It tests claims about parameters.
Steps:
1. State the problem.
2. Null and alternative hypotheses.
3. Test statistic.
4. P-value and interpretation.
5. Significance level (optional).

P-Value

The P-value is the probability of finding the observed, or more extreme, results when the null hypothesis is true.
The null hypothesis is usually a hypothesis of "no difference."
Statistical significance is often defined as P < 0.05, and highly significant as P < 0.001.
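A minimal sketch of a two-sided P-value, assuming the test statistic is a z-score that follows a standard normal distribution when the null hypothesis is true. The z value of 2.0 is a hypothetical result, not taken from any study.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z-score at least this extreme (either tail) under H0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(2.0)   # roughly 0.046
significant = p < 0.05       # compare with the conventional 0.05 threshold
```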

Type I and Type II Errors

$α$ = probability of a Type I error.
$β$ = probability of a Type II error.
Type I Error: Erroneous rejection of a true null hypothesis (false positive).
Type II Error: Erroneous retention of a false null hypothesis (false negative).

Decision	H0 True	H0 False
Retain H0	Correct retention	Type II error
Reject H0	Type I error	Correct rejection

Alpha (α) represents the upper bound of the Type I error rate, not the exact rate.
Beta (β) is the probability of making a Type II error.
Power is (1 - β), the probability of observing an effect in the sample if one exists in the population.
Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80).

Statistical vs. Clinical Significance

Statistically significant differences may not always translate into clinically significant differences, and vice versa.
Example 1: A drug reduces blood pressure from 180/100 to 145/90. This is statistically significant but may not be clinically significant.
Example 2: A compound in grapefruit juice inhibits cytochrome P450 enzymes, increasing bioavailability of certain medications by 5%. This may not be clinically significant if narrow therapeutic index medications are excluded.

Types of Statistics

Descriptive Statistics: Present, organize, and summarize data numerically, graphically, or in tables. Includes measures of central tendency and dispersion.
Inferential Statistics: Indicate whether a difference exists between groups or an association between variables. Used to determine if the difference or association is real or due to random chance (e.g., t-test, ANOVA).

Measures of Central Tendency and Dispersion

Measures of Central Tendency:
- Mean
- Median
- Mode
Measures of Dispersion (Variability):
- Range
- Percentiles
- Interquartile range
- Variance
- Standard deviation
- Standard error of the mean

Mean

The average numerical value for data within a variable.
Calculated by summing all data for a variable and dividing by the total number of data points ( $n$ ).
Sensitive to extreme values, which can be misleading.

Median

The value that divides a distribution into two equal parts.
The middle value when observations are arranged in order of magnitude.
If the number of observations is odd, it's the middle value.
If the number of observations is even, it's the average of the two middle values.

Mode

The most frequently occurring value in a data set.
A data set can have no mode, one mode, or multiple modes.
Can be used for qualitative and quantitative data.
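The three measures of central tendency can be compared with Python's standard `statistics` module. The data values below are hypothetical and chosen to show how a single extreme value pulls the mean but barely moves the median.

```python
import statistics

values = [2, 3, 3, 4, 5]
with_outlier = values + [40]          # one extreme observation added

mean_plain = statistics.mean(values)            # 3.4
median_plain = statistics.median(values)        # 3 (odd count: middle value)
modes_plain = statistics.multimode(values)      # [3]

mean_outlier = statistics.mean(with_outlier)      # 9.5, pulled up by 40
median_outlier = statistics.median(with_outlier)  # 3.5 (even count: mean of 3 and 4)

# The mode also works for qualitative data.
dosage_forms = ["tablet", "capsule", "tablet", "syrup"]
modes_forms = statistics.multimode(dosage_forms)  # ["tablet"]
```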

Range

The difference between the maximum and minimum values in a data set.
The simplest measure of dispersion.

Percentiles

Divide the distribution into 100 equal parts, each representing 1% of the data set.
The median is the 50th percentile.
$Percentile = (number \ of \ values \ below \ score) \div (total \ number \ of \ scores) \times 100$
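The percentile formula above translates directly into code. The list of exam scores is hypothetical, and the function name `percentile_rank` is only an illustrative choice.

```python
def percentile_rank(scores, score):
    """Percentage of values in `scores` that fall strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank_85 = percentile_rank(scores, 85)   # 6 of 10 values are below 85 -> 60.0
```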

Interquartile Range (IQR)

The difference between the 25th (Q1) and 75th (Q3) percentiles.
$Outliers < Q1 - 1.5(Q3 - Q1) \ or > Q3 + 1.5(Q3 - Q1)$
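A short sketch of the IQR outlier rule. Quartile definitions differ between textbooks and software; this example assumes the "inclusive" interpolation method of Python's `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

def iqr_outliers(data):
    """Return (Q1, Q3, IQR, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, outliers

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]
q1, q3, iqr, outliers = iqr_outliers(data)
# With this quartile method: Q1 = 4.25, Q3 = 9.75, IQR = 5.5,
# fences at -4.0 and 18.0, so only 30 is flagged.
```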

Variance

A measure of dispersion that accounts for the spread of all data points in a data set, calculated as the average of the squared deviations from the mean.

Standard Deviation

The square root of the variance; it reflects the typical distance of individual data points from the mean, in the same units as the data.
Most informative for approximately normally distributed quantitative data.
Directly proportional to variability; larger variability means larger standard deviation ( $SD$ ).

Standard Error of the Mean (SEM)

Measures the precision of a sample mean by quantifying the sample-to-sample variability of sample means.
Describes how precise the sample mean is as an estimate of the true population mean.
$SEM = standard \ deviation / \sqrt{sample \ size}$
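Variance, standard deviation, and SEM can be computed together. This sketch assumes the sample forms (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` use; the data are hypothetical.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]

variance = statistics.variance(data)   # sample variance: 17.5 / 5 = 3.5
sd = statistics.stdev(data)            # square root of the variance, about 1.87
sem = sd / math.sqrt(len(data))        # SD / sqrt(n), about 0.76
```

Because SEM divides by the square root of the sample size, it shrinks as n grows even when the SD stays the same.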

Measures of Shape

The distribution of the dependent variable is important for choosing the appropriate statistical test.
Most parametric statistical tests assume an approximately normal distribution.

Graphical Representations

Useful for visually inspecting the distribution of variables, especially with large sample sizes.
Common types: Histograms, boxplots, and scatterplots.
Histograms show the range and frequency of data.

Boxplot

Displays the distribution of data based on a 5-point summary: minimum, Q1, median, Q3, maximum.
IQR = Q3 – Q1.
Fences are calculated to identify outliers: $FU = Q3 + 1.5(IQR)$, $FL = Q1 - 1.5(IQR)$

Commonly Used Inferential Statistics

Student's t-test: Compares the means of two groups to determine statistical significance.
Analysis of Variance (ANOVA): Compares the means of two or more groups for statistical significance.
Correlation: Examines the association between two or more variables.
Regression: Examines how one or more independent variables predict a dependent variable.
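As an illustration of what a two-group comparison computes, the sketch below derives the t statistic for an independent two-sample t-test, assuming equal variances in both groups (the pooled form). The group values are hypothetical. It stops at the t statistic and degrees of freedom, because turning them into a P-value needs the t distribution, which Python's standard library does not provide.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df

# Hypothetical responses under two drugs.
drug_a = [5.1, 4.9, 5.6, 5.8, 6.0]
drug_b = [4.1, 4.5, 4.0, 4.8, 4.6]
t_stat, df = pooled_t_statistic(drug_a, drug_b)   # t is about 4.19 with 8 df
```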

Pharmacoepidemiology

The study of the use and effects of drugs in large numbers of people.
Applies epidemiology principles to study the effects of medications in human populations.

Pharmacoepidemiology Research Questions

Patterns of Use:
- What are the patterns of drug utilization?
- How are drugs used in specific patient populations?
- How long do people take a drug?
Safety:
- What is the frequency of drug-induced outcomes?
- Are there drug–drug interactions?
- Are there drug–disease interactions?
Effectiveness:
- What are the clinical benefits of a drug?
- Is a drug effective when used in the “real world”?
- Is drug A more effective than drug B?
Economic Evaluations:
- What are the economic consequences of therapy?

Study Designs Used in Pharmacoepidemiology

Identification/Exploration of Associations: Case studies, case series, prevalence studies, cross-sectional studies.
Determination of Causal Relationships: Randomized controlled trials, quasi-experiments.

Sensitivity, Specificity, and Predictive Values

Measure the ability of a test to correctly identify those who have the condition and those who do not.
True-positives (TP): Have the disease and test positive.
False-positives (FP): Do not have the disease but test positive.
True-negatives (TN): Do not have the disease and test negative.
False-negatives (FN): Have the disease but test negative.

Formulas

In the formulas below, a = TP, b = FP, c = FN, and d = TN.
$Sensitivity = [a/(a+c)] \times 100$
$Specificity = [d/(b+d)] \times 100$
$Positive \ Predictive \ Value (PPV) = [a/(a+b)] \times 100$
$Negative \ Predictive \ Value (NPV) = [d/(c+d)] \times 100$
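The four formulas can be checked with a small function, assuming the usual 2x2 layout in which a = TP, b = FP, c = FN, and d = TN. The counts in the example are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # a / (a + c)
        "specificity": 100 * tn / (fp + tn),   # d / (b + d)
        "ppv": 100 * tp / (tp + fp),           # a / (a + b)
        "npv": 100 * tn / (fn + tn),           # d / (c + d)
    }

# Hypothetical screening results for 1000 people, 100 of whom have the disease.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
# sensitivity 90.0, specificity about 96.7, PPV 75.0, NPV about 98.9
```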

Risk Ratio (RR)

The ratio of the risk of developing an event in exposed individuals to that in unexposed individuals.
$RR = Risk \ in \ Exposed / Risk \ in \ Unexposed$
RR of 1 indicates equal risk in both groups.
RR > 1 means exposure is associated with an increased risk of the event.
RR < 1 suggests that the exposure has protective effects.
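A minimal sketch of the risk ratio, where risk is the number of events divided by the group size. The counts are hypothetical.

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed have the event.
rr = risk_ratio(20, 100, 10, 100)   # 0.20 / 0.10, about 2: risk doubles with exposure
```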

Odds Ratio (OR)

Odds are the ratio of the probability of an event to that of the nonevent; the odds ratio compares the odds of the event in the exposed group with the odds in the unexposed group.
$OR = (events \ in \ exposed / nonevents \ in \ exposed) / (events \ in \ unexposed / nonevents \ in \ unexposed)$
OR of 1 indicates equal odds of the event in both groups.
OR > 1 indicates higher odds for the event in the exposed group.
OR < 1 suggests an event is less likely in the exposed group.
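A minimal sketch of the odds ratio, taking the two groups being compared as exposed and unexposed (an assumption about how the groups are labeled). The counts are hypothetical.

```python
def odds_ratio(events_exposed, nonevents_exposed,
               events_unexposed, nonevents_unexposed):
    """Odds of the event in the exposed group divided by odds in the unexposed group."""
    odds_exposed = events_exposed / nonevents_exposed
    odds_unexposed = events_unexposed / nonevents_unexposed
    return odds_exposed / odds_unexposed

# Hypothetical data: 20 events and 80 nonevents among the exposed,
# 10 events and 90 nonevents among the unexposed.
or_value = odds_ratio(20, 80, 10, 90)   # (20/80) / (10/90), about 2.25
```

With the same counts the risk ratio would be about 2, so the OR overstates the RR somewhat when the event is not rare.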

Measuring Therapeutic Effects

Relative Risk Reduction (RRR)
Absolute Risk Reduction (ARR)
Number Needed to Treat (NNT)
Number Needed to Harm (NNH)

Relative Risk Reduction (RRR)

Measures the extent to which an exposure (therapy) reduces a risk.
$RRR = (Risk \ in \ Untreated - Risk \ in \ Treated) / Risk \ in \ Untreated$

Absolute Risk Reduction (ARR)

The absolute difference in event rates between the exposed and unexposed groups.
$ARR = |Event \ Rate \ in \ Treated - Event \ Rate \ in \ Untreated|$

Number Needed to Treat (NNT)

The number of individuals who need to be treated for one to benefit.
$NNT = 1 / ARR$
ARR is entered as a proportion (e.g., 0.05), not as a percentage.
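RRR, ARR, and NNT all follow from the same two event rates. In this sketch the rates are hypothetical and are entered as proportions; the function name is illustrative only.

```python
def therapeutic_effect(risk_untreated, risk_treated):
    """Return RRR, ARR, and NNT from two event rates given as proportions."""
    arr = abs(risk_treated - risk_untreated)
    if arr == 0:
        raise ValueError("NNT is undefined when the two event rates are equal")
    rrr = (risk_untreated - risk_treated) / risk_untreated
    nnt = 1 / arr
    return {"RRR": rrr, "ARR": arr, "NNT": nnt}

# Hypothetical trial: 20% of untreated and 15% of treated patients have the event.
effect = therapeutic_effect(risk_untreated=0.20, risk_treated=0.15)
# ARR is about 0.05, RRR about 0.25, NNT about 20
```

A 25% relative reduction sounds larger than a 5-percentage-point absolute reduction, which is why the two are usually reported together. In practice NNT is commonly rounded up to the next whole person.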

Number Needed to Harm (NNH)

The number of persons who need to be treated for one to experience an adverse event.
$NNH = 1 / (Incidence \ Rate \ in \ Treatment - Incidence \ Rate \ in \ Control)$
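The NNH formula can be sketched the same way. The adverse-event incidences below are hypothetical proportions, and the guard for a non-positive difference is an added assumption: the formula only yields a meaningful NNH when the treatment group has the higher incidence.

```python
def number_needed_to_harm(incidence_treatment, incidence_control):
    """1 / (excess incidence of the adverse event in the treatment group)."""
    excess = incidence_treatment - incidence_control
    if excess <= 0:
        raise ValueError("treatment incidence must exceed control incidence")
    return 1 / excess

# Hypothetical adverse-event incidence: 8% on treatment vs. 3% on control.
nnh = number_needed_to_harm(0.08, 0.03)   # about 20
```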