Central Tendency and Measures of Variance
Introduction
This lecture combines chapters two and three, focusing on central tendency and measures of variance.
The goal is to ensure a basic understanding of mean, median, mode, range, and standard deviation for the exam covering chapters one through four.
Take your time and work precisely when dealing with descriptive statistics, frequency distribution tables, and central tendency.
Central Tendency
Definition
Central tendency involves using a single number to describe a large dataset.
Example: The average cost of a car purchased in Southern California between January and June.
Averages are commonly used, such as GPA, which represents all grades in an undergraduate degree with one number.
Purpose
To find a single value that represents the center of all the collected scores.
Mean
The most common measure of central tendency.
It does not indicate how scores vary or spread out (variability).
The mean is the balance point of the scores and is sensitive to extreme scores.
Suitable for interval data.
Formula
The mean is calculated as the sum of all scores divided by the number of scores.
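For a sample: \( \bar{X} = \frac{\sum X}{n} \)
Where:
\( \bar{X} \) = sample mean
\( \sum X \) = sum of all raw scores
\( n \) = total number of scores in the sample
For a population: \( \mu = \frac{\sum X}{N} \)
Where:
\( \mu \) = population mean
\( \sum X \) = sum of all raw scores
\( N \) = total number of scores in the population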
Example Calculation
Linda's exam scores: 58, 67, 60, 84, 93, 98, 100.
Calculate the mean: \( \frac{58 + 67 + 60 + 84 + 93 + 98 + 100}{7} = \frac{560}{7} = 80 \)
Linda's grade in the class is a B (80%).
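The calculation for Linda's scores can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` and the variable names are chosen here for illustration only.

```python
# Linda's exam scores, taken from the example above.
scores = [58, 67, 60, 84, 93, 98, 100]


def mean(values):
    """Return the sum of all scores divided by the number of scores."""
    return sum(values) / len(values)


linda_mean = mean(scores)
print(f"Sum = {sum(scores)}, n = {len(scores)}, mean = {linda_mean}")
```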
Computing the Mean from a Frequency Table
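Formula: \( \text{mean} = \frac{\sum fX}{n} \), where \( n = \sum f \)
Multiply each score (\( X \)) by its frequency (\( f \)).
Sum the products of each frequency multiplied by the raw score (\( \sum fX \)).
Divide by the total number of scores (\( n \)).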
Example:
Scores: 6 (frequency 1), 5 (frequency 0), 4 (frequency 3), 3 (frequency 3), 2 (frequency 2)
Total number of scores = 9
Mean = \( \frac{(6)(1) + (5)(0) + (4)(3) + (3)(3) + (2)(2)}{9} = \frac{31}{9} \approx 3.44 \)
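The same three steps (multiply, sum, divide) can be sketched in Python using the frequency table from the example. The dictionary layout and the function name `mean_from_frequencies` are assumptions made for this illustration.

```python
# Frequency table from the example: score -> frequency.
freq_table = {6: 1, 5: 0, 4: 3, 3: 3, 2: 2}


def mean_from_frequencies(table):
    """Compute the mean from a {score: frequency} table."""
    # The total number of scores is the sum of the frequencies.
    n = sum(table.values())
    # Multiply each score by its frequency, then sum the products.
    total = sum(score * freq for score, freq in table.items())
    return total / n


n_scores = sum(freq_table.values())
freq_mean = mean_from_frequencies(freq_table)
print(f"n = {n_scores}, mean = {freq_mean:.2f}")
```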
When Not to Use the Mean
The mean is susceptible to outliers (extreme scores that lie outside the trend).
Outliers can skew the mean, making it a less representative measure of central tendency.
Example: Salaries of 10 staff members where two large salaries skew the mean.
Most workers earn between $12,000 and $18,000, but the mean salary is $30,700 because the two managers' salaries are much higher.
In skewed situations, the median is a better measure of central tendency.
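The effect of outliers on the mean can be illustrated with a short Python sketch. The lecture does not list the ten individual salaries, so the figures below are hypothetical, chosen only so that eight salaries fall between $12,000 and $18,000 and the mean comes out at the stated $30,700.

```python
import statistics

# HYPOTHETICAL salaries (not from the lecture): eight staff salaries
# between 12,000 and 18,000 plus two much larger manager salaries.
hypothetical_salaries = [
    12000, 13000, 14000, 15000, 15000,
    16000, 17000, 18000, 90000, 97000,
]

salary_mean = statistics.mean(hypothetical_salaries)
salary_median = statistics.median(hypothetical_salaries)

print(f"Mean salary:   {salary_mean}")
print(f"Median salary: {salary_median}")
```

With these assumed figures the median stays within the range that most workers earn, while the mean sits well above it.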
Median
The score that marks the 50th percentile, with half of the scores above and half below.
The median is the middle score or middle value.
Uses position rather than the actual value of the data.
Changes in extreme values do not affect the median.
Applicable to interval data.
Steps to Find the Median
Put the scores in rank order (smallest to largest).
For an odd number of data points, the median is the middle value.
For an even number of data points, add the two middle numbers and divide by two.
Examples
Even set: 10, 12, 19, 25, 28, 30. Middle numbers are 19 and 25. Median = \( \frac{19 + 25}{2} = 22 \)
Odd set: 10, 12, 19, 25, 28. The median is 19.
For large datasets, software like SPSS can be used to find the median.
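For smaller datasets, the steps for finding the median can be sketched directly in Python. The function name `median` is chosen here for illustration; it ranks the scores and then handles the odd and even cases separately.

```python
def median(values):
    """Return the middle value of the scores after ranking them."""
    ordered = sorted(values)          # Step 1: rank order, smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # Odd count: the single middle value
    # Even count: add the two middle values and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2


even_set = [10, 12, 19, 25, 28, 30]
odd_set = [10, 12, 19, 25, 28]

even_median = median(even_set)
odd_median = median(odd_set)
print(f"Even set median: {even_median}")
print(f"Odd set median:  {odd_median}")
```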
Mode
The data point that occurs most frequently.
No calculations are needed; it is found by inspection.
There can be more than one mode in a dataset.
Best used with nominal data (categorical data collected by name).
Example
Count the letters in each word of the sentence: "Count the letters in each word of the sentence and then find the mode."
Counts: 5, 3, 7, 2, 4, 4, 2, 3, 8, 3, 4, 4, 3, 4.
The mode is 4, as it appears most frequently (five times).
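Finding the mode by inspection can also be sketched in Python by tallying how often each word length occurs. The variable names are illustrative, and trailing punctuation is stripped so the final period is not counted as a letter.

```python
from collections import Counter

sentence = "Count the letters in each word of the sentence and then find the mode."

# Split into words and drop trailing punctuation before counting letters.
words = [word.strip(".,") for word in sentence.split()]
lengths = [len(word) for word in words]

# Tally how many times each word length occurs.
tally = Counter(lengths)
highest = max(tally.values())

# A dataset can have more than one mode, so collect every length
# that reaches the highest frequency.
modes = sorted(length for length, count in tally.items() if count == highest)

print(f"Word lengths: {lengths}")
print(f"Mode(s): {modes}, occurring {highest} times")
```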
Data Skewness
Skewed data are not distributed symmetrically: the scores pile up on one side and trail off into a tail on the other.
In a normal distribution, the mean, median, and mode are all in the center.
If data is skewed (not a normal shape), the mean is pulled in the direction of the skew.
Right-skewed data: the tail is on the right side.
Mode is at the peak, the median is to the right of the mode, and the mean is pulled towards the right (the tail).
In this case, the mean does not represent the dataset well; the median is a more appropriate measure of central tendency.
Variability
Definition
Variability provides a quantitative measure of how spread out the scores are in a distribution.
When the differences between scores are small, variability is small.
When the differences between scores are large, variability is large.
Variability describes the distribution by how far the values deviate from the average.
Range
The difference between the largest and smallest scores.
Measures the spread of the data but only uses the highest and lowest values, ignoring the other values.
Example: John's scores range from 75 to 99 (range = 24), while Joe's range from 80 to 90 (range = 10).
A smaller range indicates more consistency.
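The range calculation can be sketched in Python. Only the lowest and highest scores for John and Joe are given in the example, so the lists below contain just those two values; this is enough because the range ignores every other score. The function name `score_range` is an illustrative choice.

```python
def score_range(values):
    """Return the difference between the largest and smallest scores."""
    return max(values) - min(values)


# Only the endpoints are known from the example; any scores in
# between would not change the result.
john_scores = [75, 99]
joe_scores = [80, 90]

john_range = score_range(john_scores)
joe_range = score_range(joe_scores)
print(f"John's range: {john_range}")
print(f"Joe's range:  {joe_range}")
```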
Standard Deviation
Measures the dispersion of data values around the mean (how much the data deviates from the mean).
"Standard" means typical or average.
"Deviation" is the distance of a score from the mean.
SS (sum of squares) is the sum of the squared deviations.
Variance is the average of the squared deviations: SS divided by N for a population, or by n - 1 for a sample.
Formulas:
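For a sample: \( s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \)
Where:
\( s \) = sample standard deviation
\( X \) = each value in the sample
\( \bar{X} \) = mean of the sample
\( n \) = number of values in the sample
For a population: \( \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \)
Where:
\( \sigma \) = population standard deviation
\( X \) = each value in the population
\( \mu \) = mean of the population
\( N \) = number of values in the population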
Big Blossom Greenhouse Example
Measuring rose diameters in inches for the Rose Bowl parade.
Bush 1: 2, 3, 4, 5, 6, 8, 10, 10
Bush 2: 5, 5, 5, 6, 6, 6, 7, 8
Calculation for Bush 1
Sum of raw scores = 48
Mean = \( \frac{48}{8} = 6 \)
Calculate deviations (X - Mean): -4, -3, -2, -1, 0, 2, 4, 4
Square the deviations: 16, 9, 4, 1, 0, 4, 16, 16
Sum of squares (SS) = 66
Variance = \( \frac{SS}{n - 1} = \frac{66}{7} \approx 9.43 \)
Standard deviation = \( \sqrt{9.43} \approx 3.07 \)
On average, the rose size is 6 inches, with a typical deviation of about 3.07 inches.
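Each step of this calculation can be recomputed in Python so the arithmetic can be checked. This is a sketch that assumes the roses are treated as a sample (dividing SS by n - 1); the function name `sample_std_dev` is illustrative. It also runs the same steps on Bush 2, which the notes list but do not work through.

```python
import math


def sample_std_dev(values):
    """Return (mean, SS, variance, standard deviation) for a sample."""
    n = len(values)
    mean = sum(values) / n
    # Deviation: distance of each score from the mean.
    deviations = [x - mean for x in values]
    # SS: sum of the squared deviations.
    ss = sum(d ** 2 for d in deviations)
    # Sample variance divides SS by n - 1.
    variance = ss / (n - 1)
    return mean, ss, variance, math.sqrt(variance)


bush_1 = [2, 3, 4, 5, 6, 8, 10, 10]
bush_2 = [5, 5, 5, 6, 6, 6, 7, 8]

for name, data in (("Bush 1", bush_1), ("Bush 2", bush_2)):
    mean, ss, variance, sd = sample_std_dev(data)
    print(f"{name}: mean = {mean}, SS = {ss}, "
          f"variance = {variance:.2f}, SD = {sd:.2f}")
```

Both bushes have the same mean, so comparing their standard deviations shows which one produces roses of more consistent size.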
Coefficient of Variation
Measures relative variation (standard deviation divided by the mean).
Useful for comparing one data series to another.
Formulas
Sample: \( CV = \frac{s}{\bar{X}} \times 100\% \)
Population: \( CV = \frac{\sigma}{\mu} \times 100\% \)
Example
New cars: mean price = $20,000, standard deviation = $6,000. Coefficient of variation = \( \frac{6000}{20000} \times 100\% = 30\% \)
Used cars: mean price = $5,485, standard deviation = $2,730. Coefficient of variation = \( \frac{2730}{5485} \times 100\% \approx 49.8\% \)
The smaller relative variation in the price of new cars (30%) indicates more stability compared to used cars (about 50%).
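The comparison of the two price series can be sketched in Python using the means and standard deviations given in the example. The function name `coefficient_of_variation` is an illustrative choice, and the result is expressed as a percentage.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


# Figures from the example above (prices in dollars).
new_cv = coefficient_of_variation(std_dev=6000, mean=20000)
used_cv = coefficient_of_variation(std_dev=2730, mean=5485)

print(f"New cars:  CV = {new_cv:.1f}%")
print(f"Used cars: CV = {used_cv:.1f}%")
```

Because the coefficient of variation is relative to the mean, it allows the two series to be compared even though their price levels are very different.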