Statistics involves the collection, analysis, and interpretation of data.
It's a crucial tool in pharmacological research for:
Summarizing experimental data using descriptive statistics.
Conducting hypothesis testing.
It helps determine whether one drug's pharmacological effect is superior to another's, which informs drug development decisions.
Statistics aids in formulating experimental designs and drawing appropriate inferences from data, reducing the risk of misleading claims and increasing confidence in translating proof-of-concept studies to clinical applications.
Key Statistical Terms
Population: All individuals or objects of a similar type that are of interest; the population size is represented by "N".
Sample: A fraction of the population that represents the population of interest, represented by "n". Samples are used to make generalizations about the population.
Sampling Types:
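Simple Random Sampling: Individuals are chosen randomly, so every member of the population has an equal chance of being selected.
Stratified Random Sampling: The population is divided into mutually exclusive groups (strata) based on a specific factor (e.g., race, gender), and a random sample is drawn from each stratum.
Cluster Sampling: The population is divided into clusters (e.g., hospitals or geographic areas), a random sample of clusters is selected, and individuals are studied only within the selected clusters.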
Convenience Sampling: Participants are selected based on the researcher's convenience.
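To make the differences between the sampling types above concrete, here is a minimal sketch using only Python's standard `random` module. The function names, the 100 hypothetical patients, and the three hypothetical clinics are illustrative assumptions. Cluster sampling is shown in its one-stage form (whole clusters drawn at random, everyone in them kept), and convenience sampling is modelled crudely as "take the first n".

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every individual has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split into mutually exclusive strata, then sample randomly within each."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def convenience_sample(population, n):
    """Non-random: take whoever is easiest to reach (here, simply the first n)."""
    return population[:n]

# Hypothetical data: 100 patient IDs with sex recorded, spread over three clinics
patients = [(i, "F" if i % 2 else "M") for i in range(100)]
clinics = {"A": patients[:30], "B": patients[30:60], "C": patients[60:]}

print(len(simple_random_sample(patients, 10, seed=1)))              # 10
print(len(stratified_sample(patients, lambda p: p[1], 5, seed=1)))  # 10 (5 F + 5 M)
print(len(cluster_sample(clinics, 2, seed=1)))                      # 60 or 70, depending on the clinics drawn
print(convenience_sample(patients, 3))
```

The stratified version guarantees each stratum is represented in fixed numbers, whereas the cluster version trades that guarantee for the practicality of studying a few whole sites.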
Measurement Scales
Nominal/Categorical Scale: Classifies observations into named categories (e.g., HIV status: positive/negative).
Ordinal Scale: Categories with a rank order (e.g., cancer stage: I, II, III, IV).
Interval Scale: Numerical values with equal intervals between points on a number line but an arbitrary zero point, so differences are meaningful but ratios are not (e.g., temperature in Celsius).
Ratio Scale: Has all the properties of an interval scale plus an absolute zero point, so ratios can be calculated (e.g., height, weight).
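A short worked example of why ratios need an absolute zero: Celsius (interval) is compared with Kelvin, a ratio-scale temperature that is not mentioned in the notes and is brought in here only as an illustration. The two temperatures are arbitrary.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has an absolute zero, so it is a ratio scale; Celsius is an interval scale."""
    return celsius + 273.15

low_c, high_c = 20.0, 40.0

# Differences are meaningful on both scales: both give a 20-degree change
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are only meaningful on the ratio scale
naive_ratio = high_c / low_c                                       # 2.0, but 40 °C is not "twice as hot"
true_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)  # about 1.07

print(f"difference: {diff_c:.1f} °C vs {diff_k:.1f} K")
print(f"ratio: {naive_ratio:.2f} (Celsius) vs {true_ratio:.2f} (Kelvin)")
```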
Variables and Data
Variable: A characteristic being observed or measured.
Data: Measured values assigned to a variable.
Example: Age of a person (age is the variable, 32 is the data).
Example: HIV status of a person (HIV status is the variable, +/- is the data).
Types of Variables:
Dependent Variable (DV): The outcome variable.
Independent Variable (IV): The variable being manipulated.
Example: In a study of the effect of physical activity on serum cholesterol, physical activity is the IV and serum cholesterol is the DV.
Confounding Variable: A variable that is associated with the IV and also affects the DV, distorting the apparent effect of the IV; it is not of specific research interest.
Example: Diet is a confounding variable when studying the effect of physical activity on serum cholesterol.
Continuous and Discrete Variables
Continuous Variables: Can take any value within a range and are measured on interval or ratio scales (e.g., age, weight).
Discrete Variables: Data with distinct, separate values, typically integers or whole numbers (e.g., number of students in a class).
Accuracy and Precision
Accuracy: How closely a computed/measured value agrees with the true value.
Precision: How closely individual computed/measured values agree with each other.
Inaccuracy: Bias (systematic error).
Imprecision: Uncertainty (random error).
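The accuracy/precision distinction can be quantified: bias as the mean measurement minus the true value, and imprecision as the sample standard deviation of repeated measurements. The sketch below uses Python's `statistics` module; the true concentration and both assay data sets are hypothetical, chosen so that one assay is precise but inaccurate and the other accurate on average but imprecise.

```python
from statistics import mean, stdev

def bias(measurements, true_value):
    """Inaccuracy: systematic offset of the average measurement from the true value."""
    return mean(measurements) - true_value

def imprecision(measurements):
    """Imprecision: scatter of repeated measurements around their own mean (sample SD)."""
    return stdev(measurements)

true_conc = 10.0                       # hypothetical true concentration, mg/L
assay_a = [12.0, 12.1, 11.9, 12.0]     # precise but inaccurate
assay_b = [8.0, 12.0, 9.0, 11.0]       # accurate on average but imprecise

for name, data in [("A", assay_a), ("B", assay_b)]:
    print(f"assay {name}: bias = {bias(data, true_conc):+.2f}, SD = {imprecision(data):.2f}")
```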
Statistical Significance
Statistical significance indicates whether an observed result is likely to be due to chance alone or to the factor of interest.
It provides evidence regarding the plausibility of the null hypothesis.
The null hypothesis states that there is nothing more than random chance at work in the data.
Hypothesis Testing
Hypothesis testing is also called significance testing.
It tests claims about population parameters (e.g., a population mean) using sample data.
Steps:
State the problem.
State the null and alternative hypotheses.
Calculate the test statistic.
Obtain the P-value and interpret it.
Compare with a significance level (optional).
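The steps above can be traced through a single test. The sketch below assumes a one-sample, two-sided z-test with a known population standard deviation (the simplest case, chosen for illustration; a t-test would normally be used when σ is estimated from the sample). The drug, the 25 patients, the 5 mmHg mean reduction, and σ = 10 mmHg are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mu == mu0 vs H1: mu != mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # test statistic
    p = 2 * NormalDist().cdf(-abs(z))            # P(|Z| >= |z|) under H0
    return z, p

# Step 1 (problem): does a hypothetical drug lower systolic BP?
# Step 2 (hypotheses): H0: mean reduction = 0 mmHg; H1: mean reduction != 0 mmHg
# Step 3 (test statistic): 25 patients, mean reduction 5 mmHg, sigma assumed to be 10 mmHg
z, p = one_sample_z_test(sample_mean=5.0, mu0=0.0, sigma=10.0, n=25)

# Step 4 (P-value) and optional step 5 (compare with a significance level)
alpha = 0.05
decision = "reject H0" if p < alpha else "retain H0"
print(f"z = {z:.2f}, P = {p:.4f}: {decision}")  # z = 2.50, P = 0.0124: reject H0
```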
P-Value
The P-value is the probability of finding the observed, or more extreme, results when the null hypothesis is true.
The null hypothesis is usually a hypothesis of "no difference."
Statistical significance is often defined as P < 0.05, and highly significant as P < 0.001.
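The definition of the P-value ("observed, or more extreme, results when the null hypothesis is true") can be computed directly with an exact permutation test, a technique swapped in here purely for illustration. If H0 (no difference) holds, the group labels are interchangeable, so the P-value is the fraction of all relabellings whose mean difference is at least as large as the observed one. The two small groups are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Under H0 the group labels are exchangeable, so every split of the pooled
    data into groups of the original sizes is equally likely. The P-value is
    the fraction of splits whose mean difference is at least as extreme as
    the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [x for i, x in enumerate(pooled) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical responses (arbitrary units) in two small groups
drug = [12, 15, 14, 16]
placebo = [8, 9, 11, 10]
print(permutation_p_value(drug, placebo))
```

Here only 2 of the 70 possible splits (the observed one and its mirror image) are as extreme as the data, so P = 2/70 ≈ 0.029.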
Type I and Type II Errors
α = probability of a Type I error.
β = probability of a Type II error.
Type I Error: Erroneous rejection of a true null hypothesis (false positive).
Type II Error: Erroneous retention of a false null hypothesis (false negative).
Decision     | H0 true            | H0 false
Retain H0    | Correct retention  | Type II error
Reject H0    | Type I error       | Correct rejection
Alpha (α) represents the upper bound of the Type I error rate, not the exact rate.
Beta (β) is the probability of making a Type II error.
Power is (1 − β): the probability of detecting a statistically significant effect in the sample if a true effect exists in the population.
Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80).
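To see how α, β, power, and sample size fit together, the sketch below computes power for a two-sided one-sample z-test with known σ, plus the usual approximate sample-size formula n = ((z₁₋α/₂ + z₁₋β) / d)². Both the z-test setting and the standardized effect size d = 0.5 are assumptions made for illustration; t-tests and other designs need different formulas.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_z_test(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect_size is the standardized difference (true mean - mu0) / sigma.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic falls in either rejection region
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def sample_size_for_power(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test (ignores the far rejection tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

n = sample_size_for_power(0.5)         # alpha = 0.05, beta = 0.20
print(n, round(power_z_test(0.5, n), 3))
print(round(power_z_test(0.0, n), 3))  # with no true effect, "power" equals alpha
```

The last line shows why α is a Type I error rate: when the null hypothesis is true, the chance of rejecting it is exactly the significance level.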
Statistical vs. Clinical Significance
Statistically significant differences may not always translate into clinically significant differences, and vice versa.
Example 1: A drug reduces blood pressure from 180/100 to 145/90 mmHg. This reduction is statistically significant but may not be clinically significant, because the patient's blood pressure remains in the hypertensive range.
Example 2: A compound in grapefruit juice inhibits cytochrome P450 enzymes, increasing the bioavailability of certain medications by 5%. This increase may not be clinically significant for most medications, although it could matter for medications with a narrow therapeutic index.
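One common reason statistical and clinical significance diverge is sample size: with enough participants, even a trivial difference becomes statistically significant. The sketch below assumes a two-sample z-test with a common, known SD; the 1 mmHg difference, the SD of 10 mmHg, and both group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means between two equal-sized groups,
    assuming a common, known SD."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

# The same hypothetical 1 mmHg difference (SD 10 mmHg) in a small and a very large trial
for n in (50, 10_000):
    z, p = two_sample_z_test(mean_diff=1.0, sd=10.0, n_per_group=n)
    print(f"n = {n:>6} per group: z = {z:.2f}, P = {p:.2g}")
```

The effect is identical (and clinically negligible) in both cases; only the precision of its estimate changes.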
Types of Statistics
Descriptive Statistics: Present, organize, and summarize data numerically, graphically, or in tables. Includes measures of central tendency and dispersion.
Inferential Statistics: Indicate whether a difference exists between groups or an association between variables. Used to determine if the difference or association is real or due to random chance (e.g., t-test, ANOVA).
Measures of Central Tendency and Dispersion
Measures of Central Tendency:
Mean
Median
Mode
Measures of Dispersion (Variability):
Range
Percentiles
Interquartile range
Variance
Standard deviation
Standard error of the mean
Mean
The average numerical value for data within a variable.
Calculated by summing all data for a variable and dividing by the total number of data points (n).
Sensitive to extreme values (outliers), which can make it misleading, especially for skewed data.
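A minimal sketch of the mean calculation, showing how a single extreme value pulls it upward. The half-life values are hypothetical.

```python
def arithmetic_mean(values):
    """Sum of all data points divided by the number of data points (n)."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)

half_lives = [4.0, 5.0, 6.0, 5.0, 5.0]     # hypothetical half-lives, hours
with_outlier = [4.0, 5.0, 6.0, 5.0, 45.0]  # one extreme value replaces a 5.0

print(arithmetic_mean(half_lives))    # 5.0
print(arithmetic_mean(with_outlier))  # 13.0
```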
Median
The value that divides a distribution into two equal parts.
The middle value when observations are arranged in order of magnitude.
If the number of observations is odd, it's the middle value.
If the number of observations is even, it's the average of the two middle values.
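The odd/even rule above translates directly into code. This hand-written sketch follows the same rule as Python's built-in `statistics.median`; the data are hypothetical and reuse the outlier example, where the median stays at the typical value.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([4.0, 5.0, 6.0, 5.0, 45.0]))  # 5.0 (unaffected by the extreme value 45.0)
print(median([3, 1, 4, 2]))                # 2.5
```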
Mode
The most frequently occurring value in a data set.
A data set can have no mode, one mode, or multiple modes.
Can be used for qualitative and quantitative data.
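A sketch covering the three cases (no mode, one mode, multiple modes) for both numeric and qualitative data. The "no mode" convention used here (every distinct value occurs equally often) is an assumption; Python's `statistics.multimode` handles that case differently and returns every value.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value (in first-seen order).

    Convention assumed here: if all distinct values occur equally often,
    the data set has no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]

print(modes([1, 2, 3]))             # [] (no mode)
print(modes([1, 2, 2, 3]))          # [2] (one mode)
print(modes([1, 1, 2, 2, 3]))       # [1, 2] (two modes)
print(modes(["A", "O", "O", "B"]))  # ['O'] (qualitative data, e.g., blood group)
```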
Range
The difference between the maximum and minimum values in a data set.
The simplest measure of dispersion.
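Because the range uses only the maximum and minimum, one extreme value can change it dramatically. A minimal sketch with hypothetical data:

```python
def data_range(values):
    """Maximum minus minimum; depends only on the two most extreme values."""
    if not values:
        raise ValueError("range requires at least one data point")
    return max(values) - min(values)

print(data_range([4.0, 5.0, 6.0, 5.0, 5.0]))   # 2.0
print(data_range([4.0, 5.0, 6.0, 5.0, 45.0]))  # 41.0
```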
Percentiles
Divide the distribution into 100 equal parts, each representing 1% of the data set.
The median is the 50th percentile.
$$\text{Percentile rank} = \frac{\text{number of values below the score}}{\text{total number of scores}} \times 100$$
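A sketch of the percentile-rank formula for a single score, counting only values strictly below it. Conventions differ on ties (some definitions add half of the values equal to the score), so this is one reasonable reading rather than the only one. The exam scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores strictly below the given score."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < score)
    return below / len(scores) * 100

exam_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical
print(percentile_rank(exam_scores, 80))  # 50.0: five of the ten scores are below 80
```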
Interquartile Range (IQR)
The difference between the 75th (Q3) and 25th (Q1) percentiles (IQR = Q3 − Q1); it describes the spread of the middle 50% of the data.
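A sketch of the IQR using Python's `statistics.quantiles` (Python 3.8+) to find Q1 and Q3. Software packages compute quartiles in slightly different ways, so the IQR for the same data can vary with the method; the two methods offered by `statistics.quantiles` are shown on hypothetical data.

```python
from statistics import quantiles

def interquartile_range(values, method="exclusive"):
    """IQR = Q3 - Q1, using statistics.quantiles to find the quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

data = list(range(1, 11))  # hypothetical data: 1, 2, ..., 10
print(interquartile_range(data))                      # 5.5 (Q1 = 2.75, Q3 = 8.25)
print(interquartile_range(data, method="inclusive"))  # 4.5 (Q1 = 3.25, Q3 = 7.75)
```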