Measures of Variability
Variability and Measures of Central Tendency
Discussion of variability and its importance alongside measures of central tendency.
Two primary characteristics of a score distribution: central tendency and variability.
Measures of Variability
Different metrics available for measuring variability, some more useful than others.
Range
Defined as the difference between the largest and smallest scores.
Reported in descriptive statistics output; it depends only on the two extreme scores and does not use deviation scores.
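A minimal sketch of the range calculation, assuming a small set of hypothetical scores chosen only for illustration:

```python
# Hypothetical scores, used only for illustration.
scores = [12, 11, 10, 9, 8]

# Range = largest score minus smallest score.
score_range = max(scores) - min(scores)
print(score_range)  # 4
```

Because only the two extreme scores enter the calculation, changing any of the middle scores leaves the range unchanged.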
Deviation Scores
Definition: Deviation score is the difference between a score and the mean (Score - Mean).
Deviation scores always sum to zero. Example calculation displayed:
Mean = 10; Scores: 12 (2), 11 (1), 10 (0), 9 (-1), 8 (-2); Sum of deviations = 0.
Deviation scores are not ideal for representing the variability of a dataset: each one describes only a single score, and their sum is always zero.
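The zero-sum property can be checked with a short sketch. The five scores below are hypothetical and were picked so that the mean is exactly 10:

```python
# Hypothetical scores with a mean of exactly 10.
scores = [12, 11, 10, 9, 8]

mean = sum(scores) / len(scores)

# Deviation score = score - mean, computed for every score.
deviations = [x - mean for x in scores]

print(deviations)       # [2.0, 1.0, 0.0, -1.0, -2.0]
print(sum(deviations))  # 0.0
```

The positive and negative deviations cancel, which is why their plain sum cannot serve as a measure of spread.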
Sum of Squares
Squaring deviation scores eliminates negative signs, resolving the cancellation issue when summed.
Definition: Sum of squares refers to the total of squared deviation scores.
Example: Squaring the deviation scores gives a clearer representation of variability (e.g., 1^2 + 0^2 + (-1)^2 + (-2)^2 ...).
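As a rough illustration, the sum of squares for a hypothetical set of deviation scores (assumed here, not taken from a specific dataset in these notes) could be computed like this:

```python
# Hypothetical deviation scores (they sum to zero, as deviations must).
deviations = [2, 1, 0, -1, -2]

# Square each deviation so negatives no longer cancel, then add them up.
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)

print(squared)         # [4, 1, 0, 1, 4]
print(sum_of_squares)  # 10
```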
Variance
For a sample, obtained by dividing the sum of squares by N - 1, where N is the number of observations. Dividing by N - 1 rather than N accounts for degrees of freedom.
Degrees of Freedom: N - 1. Explained with a practical example:
In choosing four phone-number digits that must sum to a certain total, only three can be chosen freely; the last is determined by the total.
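The idea can be sketched with made-up numbers. The total and the three free choices below are assumptions for illustration only:

```python
# Hypothetical fixed total that four numbers must add up to.
total = 20

# Three numbers can be picked freely...
free_choices = [3, 7, 4]

# ...but the fourth is forced by the total, so it is not free to vary.
last = total - sum(free_choices)

print(last)  # 6
```

With N numbers and a fixed total (or, equivalently, a fixed mean), only N - 1 of them are free to vary.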
Calculating Variance
To calculate variance for a sample:
Calculate mean.
Compute each deviation score.
Square each deviation score.
Sum the squared deviations.
Divide by N - 1.
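The five steps above can be sketched as a small function. The name `sample_variance` and the scores are hypothetical; the result is compared with `statistics.variance` from the Python standard library, which also divides by N - 1:

```python
import math
import statistics


def sample_variance(scores):
    """Sample variance following the manual steps (assumes at least two scores)."""
    n = len(scores)
    mean = sum(scores) / n                    # Step 1: calculate the mean
    deviations = [x - mean for x in scores]   # Step 2: deviation scores
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    sum_of_squares = sum(squared)             # Step 4: sum the squared deviations
    return sum_of_squares / (n - 1)           # Step 5: divide by N - 1


# Hypothetical scores, used only for illustration.
scores = [12, 11, 10, 9, 8]
variance = sample_variance(scores)
print(variance)  # 2.5

# Cross-check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(scores)))  # True
```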
Standard Deviation
Found by taking the square root of the variance, returning to original measurement units.
Definition: Standard deviation measures approximately how much scores deviate from the mean on average.
Important to memorize procedures for manual calculation. Quiz/exam questions may focus on this.
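For checking manual work, a minimal sketch of the square-root step, again using hypothetical scores and the standard library's `statistics` module:

```python
import math
import statistics

# Hypothetical scores, used only for illustration.
scores = [12, 11, 10, 9, 8]

variance = statistics.variance(scores)  # sample variance (divides by N - 1)
sd = math.sqrt(variance)                # back to the original measurement units

print(variance)      # 2.5
print(round(sd, 2))  # 1.58
```

`statistics.stdev(scores)` returns the same value in one call; the two-step version is shown only because it mirrors the hand calculation.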
Interquartile Range (IQR)
Defined as the spread of the middle 50% of scores, found by dividing the ordered dataset into quartiles; used alongside the median as the measure of central tendency.
IQR is useful when the median is preferred due to its resistance to skewed data.
Process for IQR:
Order the scores and identify the median; find the first quartile (Q1) as the median of the lower half and the third quartile (Q3) as the median of the upper half.
Calculate IQR as the difference between the upper and lower quartiles (Q3 - Q1).
Example: If Q3 = 8 and Q1 = 7, then IQR = 1.
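A sketch of the median-of-halves process described above. The function name `iqr_by_halves` and the scores are hypothetical, and the sketch assumes at least two scores; when N is odd, the middle score is left out of both halves, which is one common convention:

```python
import statistics


def iqr_by_halves(scores):
    """Return (Q1, Q3, IQR) using the median of each half of the ordered scores."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]                  # scores below the median position
    upper = ordered[len(ordered) - half:]   # scores above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1


# Hypothetical scores, used only for illustration.
scores = [2, 4, 5, 7, 8, 9, 11, 12]
q1, q3, iqr = iqr_by_halves(scores)
print(q1, q3, iqr)  # 4.5 10.0 5.5
```

Statistical software may use other quartile conventions, so its Q1 and Q3 can differ slightly from values found by hand with this method.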
Reporting Statistics Based on Data Distribution
When datasets are skewed, use the median and IQR for reporting instead of mean and standard deviation.
If datasets are symmetrical, mean and standard deviation are appropriate.
Importance of using appropriate measures depending on data distribution.
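To illustrate why the distribution shape matters, the sketch below compares two hypothetical datasets that differ only in one extreme score. The data are invented for illustration:

```python
import statistics

# Hypothetical datasets: identical except for one extreme high score.
symmetric = [8, 9, 10, 11, 12]
skewed = [8, 9, 10, 11, 52]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)
    print(name, "mean:", mean, "median:", median, "sd:", round(sd, 2))
```

In the skewed dataset the single extreme score pulls the mean from 10 to 18 and inflates the standard deviation, while the median stays at 10. This is the behaviour that motivates reporting the median and IQR for skewed data.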
Conclusion
Reviewed measures of variability, including the range, sum of squares, variance, standard deviation, and IQR.
Advocated for proper handling of skewed datasets with median and IQR, alongside mean and standard deviation for symmetrical datasets.