Statistics: Variability and Standard Deviation
Introduction
- Importance of refining formulas to improve accuracy in statistics.
- Application to a smoking cessation trial.
- Availability of study questions and scripts for practice.
Understanding Variability
- Definition of Variability: The degree to which scores in a dataset differ from one another. It can be visualized as the shape of a distribution graph:
- Low Variability: Distribution is tall and narrow, indicating many scores clustered in a narrow range.
- High Variability: Distribution is low and wide, indicating scores spread across a wide range.
- Key Terms:
- Range: Difference between the highest and lowest score.
- Average Absolute Deviation: Average distance of scores from the mean without considering direction (ignoring signs).
- Variance: Average squared difference of scores from the mean.
- Standard Deviation: The square root of the variance. It is the preferred measure of variability in research studies and reflects the typical distance of scores from the mean.
- Average Absolute Deviation vs. Variance: Although intuitive, average absolute deviation is used less often than variance and standard deviation in statistical analysis. Squared deviations are easier to work with mathematically, and later calculations build on them.
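The four key terms can be compared side by side on two small sets of scores. This is a minimal sketch: the scores are hypothetical (not from the source), the `describe` helper is an illustrative name, and the variance here divides by $n$, the uncorrected form.

```python
import math

def describe(scores):
    """Return four measures of variability for a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    variance = sum(d ** 2 for d in deviations) / n  # average squared deviation
    return {
        "range": max(scores) - min(scores),
        "avg_abs_dev": sum(abs(d) for d in deviations) / n,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }

# Hypothetical scores: both sets have a mean of 5.
clustered = [4, 5, 5, 5, 5, 6]
spread_out = [1, 3, 5, 5, 7, 9]

for label, scores in [("clustered", clustered), ("spread out", spread_out)]:
    print(label, describe(scores))
```

Both sets share the same mean, yet every measure is larger for the spread-out set, which is what distinguishes variability from central tendency.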
Calculating Measures of Variability
- Concepts and Calculations:
- Begin with raw scores. Calculate deviations from the mean.
- Deviation ($d$): $d = x - \bar{x}$ (where $x$ is the score, $\bar{x}$ is the mean).
- The deviations always sum to zero because positive and negative deviations cancel out, so each deviation is squared before summing to retain information about spread.
- Sum of Squares (SS): Sum of squared deviations from the mean, useful for later calculations.
- Standard Deviation Calculation Steps:
- Calculate deviations.
- Square each deviation.
- Sum squared deviations (Sum of Squares).
- Average these squared deviations to get the variance.
- Take the square root to find standard deviation.
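The five steps above can be sketched in code as follows. The scores are hypothetical and the function name `standard_deviation_steps` is illustrative. The average in step 4 divides by $n$; the $n - 1$ adjustment is covered under degrees of freedom.

```python
import math

def standard_deviation_steps(scores):
    """Walk through the five steps, keeping every intermediate value."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: d = x - mean
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    sum_of_squares = sum(squared)             # step 3: Sum of Squares (SS)
    variance = sum_of_squares / n             # step 4: average squared deviation
    std_dev = math.sqrt(variance)             # step 5: back to original units
    return {
        "mean": mean,
        "sum_of_deviations": sum(deviations),
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_dev": std_dev,
    }

# Hypothetical raw scores, used only for illustration.
result = standard_deviation_steps([2, 4, 4, 4, 5, 5, 7, 9])
print(result["sum_of_deviations"])  # 0.0, the raw deviations cancel out
print(result["sum_of_squares"])     # 32.0
print(result["variance"])           # 4.0
print(result["std_dev"])            # 2.0
```

Keeping the sum of deviations in the output makes it easy to see why the squaring step is needed: without it, the total is zero for any dataset.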
The Importance of Standard Deviation and Variance
- Standard Deviation ($s$): Represents typical distance from the mean, calculated as:
$s = \sqrt{\text{Variance}}$
- Variance measures "spread" of the data, while standard deviation gives a direct interpretation by returning to the original units of measurement (by taking the square root).
- Bell Curve Properties:
- In a normal distribution:
- ~68% of scores fall within one standard deviation of the mean.
- ~95% of scores fall within two standard deviations of the mean.
- Usage in Research: Helps to characterize datasets and compare group differences.
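The 68% and 95% figures can be checked against a normal distribution directly. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `proportion_within` and its default parameters are illustrative assumptions.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The proportions do not depend on the particular mean or standard deviation, which is why the rule applies to any normally distributed variable.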
Adjustments for Sample Size - Degrees of Freedom
- Bias in Samples: Sample statistics like the standard deviation tend to underestimate the population parameters. Deviations are measured from the sample mean, which sits closer to the sample's own scores than the population mean does, so some of the population's variability goes unobserved.
- Degrees of Freedom ($n - 1$): Dividing by $n - 1$ instead of $n$ in the variance and standard deviation formulas reduces this bias. It compensates for the reduced variability captured in the sample compared to the population.
- Population vs. Sample:
- Population mean ($\mu$) vs. Sample mean ($\bar{x}$) and how differences between them arise.
- Corrected Standard Deviation: Uses $n - 1$ in the denominator, $s = \sqrt{SS / (n - 1)}$, to improve estimates of population parameters.
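One way to see the bias and the effect of the $n - 1$ correction is to enumerate every possible sample from a tiny population and average the sample variances. This is a sketch under stated assumptions: the four-score population is hypothetical, samples of size 2 are drawn with replacement, and the `ddof` parameter name is an illustrative choice.

```python
from itertools import product

def variance(scores, ddof):
    """Sum of squares divided by (n - ddof); ddof=0 is uncorrected, ddof=1 is corrected."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / (n - ddof)

# Hypothetical population, small enough to list every possible sample.
population = [2, 4, 6, 8]
population_variance = variance(population, ddof=0)

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
avg_uncorrected = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_corrected = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(population_variance)  # 5.0
print(avg_uncorrected)      # 2.5, too small on average
print(avg_corrected)        # 5.0, matches the population variance on average
```

Dividing by $n$ recovers only half of the population variance on average here, because $(n - 1)/n = 1/2$ when $n = 2$; dividing by $n - 1$ removes that shortfall.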
Application to Research: Smoking Cessation Trial
- Example from a smoking cessation trial comparing two medication arms:
- Data Representation: Mean and standard deviation output for each medication group.
- Interpret Results: The standard deviation indicates variability within each group, showing how consistent smoking scores are among participants.
- Implications for Analysis:
- Researchers judge differences between the medication groups against these standard deviations when interpreting how the medications influence smoking behavior.
- Outliers should be considered because extreme scores inflate the variance and standard deviation.
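A per-group summary of this kind could be produced as sketched below. All scores and the arm names `medication_a` and `medication_b` are hypothetical placeholders, not data from the trial. `statistics.stdev` uses the $n - 1$ denominator.

```python
from statistics import mean, stdev

# Hypothetical smoking scores for two medication arms (not trial data).
arms = {
    "medication_a": [10, 12, 11, 9, 13],
    "medication_b": [4, 18, 11, 2, 20],
}

def summarize(scores):
    """Mean and corrected (n - 1) standard deviation for one group."""
    return {"mean": mean(scores), "sd": stdev(scores)}

for name, scores in arms.items():
    s = summarize(scores)
    print(f"{name}: mean={s['mean']:.2f}, sd={s['sd']:.2f}")

# Effect of a single extreme score on the first arm's variability.
with_outlier = arms["medication_a"] + [40]
print(f"with outlier: sd={stdev(with_outlier):.2f}")
```

In this made-up example the two arms have the same mean but very different standard deviations, and one extreme score is enough to inflate the standard deviation of the more consistent arm.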
Summary
- Understanding statistical measures of central tendency and variability is crucial for accurate data analysis.
- Variance, standard deviation, and bias corrections play a significant role in statistical analysis and interpretation of research data.