Biostatistics Notes

Role of Biostatistics in Pharmacy

  • Statistics involves the collection, analysis, and interpretation of data.
  • It's a crucial tool in pharmacological research for:
    • Summarizing experimental data using descriptive statistics.
    • Conducting hypothesis testing.
  • It helps determine if one drug's pharmacological effect is superior to another, impacting drug development.
  • Statistics aids in formulating experimental designs and drawing appropriate inferences from data, reducing the risk of misleading claims and increasing confidence in translating proof-of-concept studies to clinical applications.

Key Statistical Terms

  • Population: All objects of a similar type in the universe, represented by "N".
  • Sample: A fraction of the population that represents the population of interest, represented by "n". Samples are used to make generalizations about the population.
  • Sampling Types:
    • Simple Random Sampling: Individuals are chosen randomly.
    • Stratified Random Sampling: Population is divided into mutually exclusive groups (strata) based on a specific factor (e.g., race, gender).
    • Cluster Sampling: The population is divided into clusters (e.g., clinics, schools); entire clusters are randomly selected, and individuals within the selected clusters are sampled.
    • Convenience Sampling: Participants are selected based on the researcher's convenience.
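
The first two sampling types above can be sketched with the standard library's `random` module; the patient list and the sex stratum here are hypothetical stand-ins for a real sampling frame.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 patients, each labeled with a stratum (sex)
population = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(100)]

# Simple random sampling: every individual has an equal chance of selection
simple_sample = random.sample(population, k=10)

# Stratified random sampling: split into mutually exclusive strata,
# then draw a simple random sample from each stratum
strata = {"F": [p for p in population if p["sex"] == "F"],
          "M": [p for p in population if p["sex"] == "M"]}
stratified_sample = [p for group in strata.values()
                     for p in random.sample(group, k=5)]

print(len(simple_sample), len(stratified_sample))  # 10 10
```

Note the trade-off: the stratified draw guarantees 5 participants per stratum, whereas the simple random draw may happen to over- or under-represent either sex.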

Measurement Scales

  • Nominal/Categorical Scale: Classifies observations into named categories (e.g., HIV status: positive/negative).
  • Ordinal Scale: Categories with a rank order (e.g., cancer stage: I, II, III, IV).
  • Interval Scale: Numerical values with equal intervals between points but no true zero (e.g., temperature in Celsius).
  • Ratio Scale: Has all properties of an interval scale, with the ability to calculate ratios due to an absolute zero point (e.g., height, weight).

Variables and Data

  • Variable: A characteristic being observed or measured.
  • Data: Measured values assigned to a variable.
    • Example: Age of a person (age is the variable, 32 is the data).
    • Example: HIV status of a person (HIV status is the variable, +/- is the data).
  • Types of Variables:
    • Dependent Variable (DV): The outcome variable.
    • Independent Variable (IV): The variable being manipulated.
      • Example: In studying the effect of physical activity on serum cholesterol, physical activity is the IV and serum cholesterol is the DV.
    • Confounding Variable: Affects the DV beyond the effect of the IV, but is not of specific research interest.
      • Example: Diet is a confounding variable when studying the effect of physical activity on serum cholesterol.

Continuous and Discrete Variables

  • Continuous Variables: Data measured on interval or ratio scales (e.g., age, weight).
  • Discrete Variables: Data with distinct, separate values, typically integers or whole numbers (e.g., number of students in a class).

Accuracy and Precision

  • Accuracy: How closely a computed/measured value agrees with the true value.
  • Precision: How closely individual computed/measured values agree with each other.
  • Inaccuracy: Bias.
  • Imprecision: Uncertainty.

Statistical Significance

  • Statistical significance indicates whether a result is likely due to chance or a factor of interest.
  • It provides evidence regarding the plausibility of the null hypothesis.
  • The null hypothesis states that there is nothing more than random chance at work in the data.

Hypothesis Testing

  • Hypothesis testing is also called significance testing.
  • It tests claims about parameters.
  • Steps:
    1. State the problem.
    2. Null and alternative hypotheses.
    3. Test statistic.
    4. P-value and interpretation.
    5. Significance level (optional).
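
The five steps above can be walked through with a one-sample z-test using only the standard library's `statistics.NormalDist`; the cholesterol scenario and all numbers are hypothetical, and the population SD is assumed known (otherwise a t-test would be used).

```python
from statistics import NormalDist
from math import sqrt

# Step 1-2: Does a drug change mean serum cholesterol from 200 mg/dL?
#   H0: mu = 200    Ha: mu != 200
mu0, sigma = 200, 15        # hypothetical null mean and known population SD
sample_mean, n = 206, 25    # hypothetical trial results

# Step 3: test statistic (one-sample z, since sigma is known)
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Step 4: two-sided P-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: compare with a significance level of alpha = 0.05
print(round(z, 2), round(p_value, 4))  # 2.0 0.0455 -> reject H0
```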

P-Value

  • The P-value is the probability of finding the observed, or more extreme, results when the null hypothesis is true.
  • The null hypothesis is usually a hypothesis of "no difference."
  • Statistical significance is often defined as P < 0.05, and highly significant as P < 0.001.

Type I and Type II Errors

  • α = probability of a Type I error.
  • β = Probability of a Type II error.
  • Type I Error: Erroneous rejection of a true null hypothesis (false positive).
  • Type II Error: Erroneous retention of a false null hypothesis (false negative).
  | Decision  | H0 True           | H0 False          |
  | --------- | ----------------- | ----------------- |
  | Retain H0 | Correct retention | Type II error     |
  | Reject H0 | Type I error      | Correct rejection |
  • Alpha (α) represents the upper bound of the Type I error rate, not the exact rate.
  • Beta (β) is the probability of making a Type II error.
  • Power is (1 - β), the probability of observing an effect in the sample if one exists in the population.
  • Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80).
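
The relationship between alpha, beta, and power can be illustrated with a normal-approximation power calculation for a one-sample, two-sided test; the effect size, SD, and sample size below are hypothetical.

```python
from statistics import NormalDist
from math import sqrt

alpha, sigma, n = 0.05, 15, 25
delta = 6                      # hypothetical true mean difference to detect
se = sigma / sqrt(n)           # standard error of the mean

# Critical z for a two-sided test at alpha = 0.05 (about 1.96)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Power = probability the test statistic clears the critical value
# when the true difference is delta (ignoring the negligible far tail)
power = 1 - NormalDist().cdf(z_crit - delta / se)
beta = 1 - power
print(round(power, 3))  # 0.516 -> underpowered vs. the usual 0.80 target
```

With these numbers the study has only about 52% power, well below the conventional 80% (beta = 0.20), which is why power should be checked before data collection.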

Statistical vs. Clinical Significance

  • Statistically significant differences may not always translate into clinically significant differences, and vice versa.
  • Example 1: A drug reduces blood pressure from 180/100 to 145/90 mmHg. The reduction is statistically significant, but the patient remains hypertensive, so it may not be clinically significant.
  • Example 2: A compound in grapefruit juice inhibits cytochrome P450 (CYP3A4) enzymes, increasing the bioavailability of certain medications by 5%. This may not be clinically significant if narrow-therapeutic-index medications are excluded.

Types of Statistics

  • Descriptive Statistics: Present, organize, and summarize data numerically, graphically, or in tables. Includes measures of central tendency and dispersion.
  • Inferential Statistics: Indicate whether a difference exists between groups or an association between variables. Used to determine if the difference or association is real or due to random chance (e.g., t-test, ANOVA).

Measures of Central Tendency and Dispersion

  • Measures of Central Tendency:
    • Mean
    • Median
    • Mode
  • Measures of Dispersion (Variability):
    • Range
    • Percentiles
    • Interquartile range
    • Variance
    • Standard deviation
    • Standard error of the mean

Mean

  • The average numerical value for data within a variable.
  • Calculated by summing all data for a variable and dividing by the total number of data points (n).
  • Sensitive to extreme values, which can be misleading.

Median

  • The value that divides a distribution into two equal parts.
  • The middle value when observations are arranged in order of magnitude.
  • If the number of observations is odd, it's the middle value.
  • If the number of observations is even, it's the average of the two middle values.

Mode

  • The most frequently occurring value in a data set.
  • A data set can have no mode, one mode, or multiple modes.
  • Can be used for qualitative and quantitative data.
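
The three measures of central tendency above, and the mean's sensitivity to extreme values, can be shown with the standard library's `statistics` module on a hypothetical set of diastolic blood pressure readings.

```python
from statistics import mean, median, mode

# Hypothetical diastolic BP readings (mmHg), including one extreme value
readings = [78, 82, 85, 85, 90, 95, 140]

print(round(mean(readings), 1))  # 93.6 -- pulled upward by the outlier 140
print(median(readings))          # 85 -- middle of the 7 ordered values
print(mode(readings))            # 85 -- most frequent value
```

Note how the single extreme reading drags the mean above every typical value, while the median and mode are unaffected.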

Range

  • The difference between the maximum and minimum values in a data set.
  • The simplest measure of dispersion.

Percentiles

  • Divide the distribution into 100 equal parts, each representing 1% of the data set.
  • The median is the 50th percentile.
  • Percentile = (number of values below score ÷ total number of scores) × 100

Interquartile Range (IQR)

  • The difference between the 75th (Q3) and 25th (Q1) percentiles: IQR = Q3 − Q1.
  • Outliers are values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Variance

  • A measure of dispersion that accounts for the spread of all data points in a data set.
  • Calculated as the average of the squared deviations from the mean; the sample variance divides by n − 1 rather than n.

Standard Deviation

  • A measure of the average distance of individual data points from the mean.
  • Applies to normally distributed quantitative data.
  • Directly proportional to variability; larger variability means larger standard deviation (SD).

Standard Error of the Mean (SEM)

  • Measures the accuracy of a sample mean by measuring the sample-to-sample variability of the sample means.
  • Describes how precise the sample mean is as an estimate of the true population mean.
  • SEM = standard deviation ÷ √(sample size)
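
The SD-to-SEM relationship can be checked with `statistics.stdev` (which computes the sample SD, dividing by n − 1); the five measurements are hypothetical.

```python
from statistics import stdev
from math import sqrt

data = [10, 12, 14, 16, 18]    # hypothetical measurements
sd = stdev(data)               # sample SD: sqrt(40 / 4) = sqrt(10)
sem = sd / sqrt(len(data))     # SEM = SD / sqrt(n) = sqrt(10)/sqrt(5) = sqrt(2)
print(round(sd, 3), round(sem, 3))  # 3.162 1.414
```

Because SEM shrinks with √n, larger samples estimate the population mean more precisely even when the SD of the raw data stays the same.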

Measures of Shape

  • The distribution of the dependent variable is important for choosing the appropriate statistical test.
  • Most parametric statistical tests require a normal distribution.

Graphical Representations

  • Useful for visually inspecting the distribution of variables, especially with large sample sizes.
  • Common types: Histograms, boxplots, and scatterplots.
  • Histograms show the range and frequency of data.

Boxplot

  • Displays the distribution of data based on a 5-point summary: minimum, Q1, median, Q3, maximum.
  • IQR = Q3 – Q1.
  • Fences are calculated to identify outliers: FU = Q3 + 1.5(IQR), FL = Q1 – 1.5(IQR)
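
The 5-point summary and fence calculations above can be reproduced with `statistics.quantiles`; the data set here is hypothetical and includes one deliberately extreme value. (Note that `quantiles` defaults to the "exclusive" method, so quartile values may differ slightly from other software.)

```python
from statistics import quantiles

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 50]    # hypothetical, one extreme value
q1, q2, q3 = quantiles(data, n=4)          # quartile cut points
iqr = q3 - q1

# Fences per the boxplot rule: FL = Q1 - 1.5*IQR, FU = Q3 + 1.5*IQR
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [50]
```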

Commonly Used Inferential Statistics

  • Student's t-test: Compares the means of two groups to determine statistical significance.
  • Analysis of Variance (ANOVA): Compares the means of two or more groups for statistical significance.
  • Correlation: Examines the association between two or more variables.
  • Regression: Examines how one or more independent variables predict a dependent variable.

Pharmacoepidemiology

  • The study of the use and effects of drugs in large numbers of people.
  • Applies epidemiology principles to study the effects of medications in human populations.

Pharmacoepidemiology Research Questions

  • Patterns of Use:
    • What are the patterns of drug utilization?
    • How are drugs used in specific patient populations?
    • How long do people take a drug?
  • Safety:
    • What is the frequency of drug-induced outcomes?
    • Are there drug–drug interactions?
    • Are there drug–disease interactions?
  • Effectiveness:
    • What are the clinical benefits of a drug?
    • Is a drug effective when used in the “real world”?
    • Is drug A more effective than drug B?
  • Economic Evaluations:
    • What are the economic consequences of therapy?

Study Designs Used in Pharmacoepidemiology

  • Identification/Exploration of Associations: Case studies, case series, prevalence studies, cross-sectional studies.
  • Determination of Causal Relationships: Randomized controlled trials, quasi-experiments.

Sensitivity, Specificity, and Predictive Values

  • Measure the ability of a test to correctly identify those experiencing an event and those who did not.
  • True-positives (TP): Have the disease and test positive.
  • False-positives (FP): Do not have the disease but test positive.
  • True-negatives (TN): Do not have the disease and test negative.
  • False-negatives (FN): Have the disease but test negative.

Formulas

  • Using the standard 2 × 2 table: a = TP, b = FP, c = FN, d = TN.
  • Sensitivity = [a / (a + c)] × 100
  • Specificity = [d / (b + d)] × 100
  • Positive Predictive Value (PPV) = [a / (a + b)] × 100
  • Negative Predictive Value (NPV) = [d / (c + d)] × 100
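
A worked example on a hypothetical 2 × 2 table of screening-test results (all counts invented for illustration):

```python
# Hypothetical screening-test counts: a = TP, b = FP, c = FN, d = TN
tp, fp, fn, tn = 90, 20, 10, 80

sensitivity = tp / (tp + fn) * 100   # a / (a + c): diseased who test positive
specificity = tn / (fp + tn) * 100   # d / (b + d): healthy who test negative
ppv = tp / (tp + fp) * 100           # a / (a + b): positives who are diseased
npv = tn / (fn + tn) * 100           # d / (c + d): negatives who are healthy

print(sensitivity, specificity, round(ppv, 1), round(npv, 1))
# 90.0 80.0 81.8 88.9
```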

Risk Ratio (RR)

  • The ratio of the risk of developing an event in exposed individuals to that in unexposed individuals.
  • RR = risk in exposed ÷ risk in unexposed
  • RR of 1 indicates equal risk in both groups.
  • RR > 1 means exposure is associated with the event.
  • RR < 1 suggests that the exposure has protective effects.

Odds Ratio (OR)

  • The ratio of the probability of an event to that of the nonevent.
  • OR = (events in cases ÷ nonevents in cases) ÷ (events in controls ÷ nonevents in controls)
  • OR of 1 is a baseline for comparison.
  • OR > 1 indicates higher odds for the event in the exposed group.
  • OR < 1 suggests an event is less likely in the exposed group.
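
The RR and OR definitions above can be computed side by side on a hypothetical cohort (all counts invented for illustration); note that the OR exceeds the RR whenever the event is not rare.

```python
# Hypothetical cohort: event counts per group
exposed_events, exposed_total = 20, 100
unexposed_events, unexposed_total = 10, 100

# Risk ratio: risk in exposed / risk in unexposed
rr = (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# Odds ratio: odds = events / nonevents within each group
odds_exposed = exposed_events / (exposed_total - exposed_events)
odds_unexposed = unexposed_events / (unexposed_total - unexposed_events)
odds_ratio = odds_exposed / odds_unexposed

print(round(rr, 2), round(odds_ratio, 2))  # 2.0 2.25
```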

Measuring Therapeutic Effects

  • Relative Risk Reduction (RRR)
  • Absolute Risk Reduction (ARR)
  • Number Needed to Treat (NNT)
  • Number Needed to Harm (NNH)

Relative Risk Reduction (RRR)

  • Measures the extent to which an exposure (therapy) reduces a risk.
  • RRR = (risk in untreated − risk in treated) ÷ risk in untreated

Absolute Risk Reduction (ARR)

  • The absolute difference in event rates between the exposed and unexposed groups.
  • ARR = |event rate in treated − event rate in untreated|

Number Needed to Treat (NNT)

  • The number of individuals who need to be treated for one to benefit.
  • NNT = 1 / ARR

Number Needed to Harm (NNH)

  • The number of persons who need to be treated for one to experience an adverse event.
  • NNH = 1 ÷ (incidence rate in treated − incidence rate in control)
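
The therapeutic-effect measures above fit together in one short calculation; the trial event rates here are hypothetical.

```python
# Hypothetical trial event rates
control_rate = 0.20    # event rate in untreated patients
treated_rate = 0.15    # event rate in treated patients

rrr = (control_rate - treated_rate) / control_rate   # relative risk reduction
arr = abs(treated_rate - control_rate)               # absolute risk reduction
nnt = 1 / arr                                        # number needed to treat

print(round(rrr, 2), round(arr, 2), round(nnt))  # 0.25 0.05 20
```

A 25% relative reduction sounds large, but the absolute reduction is only 5 percentage points, meaning 20 patients must be treated for one to benefit; this is why ARR and NNT are reported alongside RRR.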