Central Tendency and Measures of Variance

This lecture combines chapters two and three, focusing on central tendency and measures of variance.
The goal is to ensure a basic understanding of mean, median, mode, range, and standard deviation for the exam covering chapters one through four.
Descriptive statistics, frequency distribution tables, and central tendency deserve careful study, with enough time to work through them precisely.

Central tendency involves using a single number to describe a large dataset.
Example: The average cost of a car purchased in Southern California between January and June.
Averages are used everywhere. A GPA, for example, summarizes all the grades in an undergraduate degree with one number.

The mean is susceptible to outliers (extreme scores that lie outside the trend).
- Outliers can skew the mean, making it a less representative measure of central tendency.
Example: the salaries of 10 staff members, where two very large salaries skew the mean.
- Most workers earn between \$12,000 and \$18,000, but the mean salary is \$30,700 because of the two managers' salaries.
- In skewed situations, the median is a better measure of central tendency.
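A minimal Python sketch of the outlier effect. The ten salaries are hypothetical (the lecture's actual figures are not listed), chosen so that eight fall between 12,000 and 18,000 and two are much larger. `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries (not the lecture's data): eight typical workers
# plus two much larger salaries that act as outliers.
salaries = [12_000, 13_000, 14_000, 15_000, 15_000,
            16_000, 17_000, 18_000, 90_000, 110_000]

avg = mean(salaries)     # pulled upward by the two large salaries
mid = median(salaries)   # average of the 5th and 6th ranked salaries

print(f"mean   = {avg:,.0f}")   # mean   = 32,000
print(f"median = {mid:,.0f}")   # median = 15,500
```

The two large salaries lift the mean to about twice what most workers earn, while the median stays inside the typical range.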

The median is the middle score or middle value: it marks the 50th percentile, with half of the scores above it and half below.
It uses the position of the scores rather than their actual values.
Because of this, changes in extreme values do not affect the median.
It is applicable to interval data, and also to ordinal data, since it only requires that the scores can be put in rank order.

Put the scores in rank order (smallest to largest).
For an odd number of data points, the median is the middle value.
For an even number of data points, add the two middle numbers and divide by two.

Even set: 10, 12, 19, 25, 28, 30. Middle numbers are 19 and 25. Median = $(19 + 25) / 2 = 22$
Odd set: 10, 12, 19, 25, 28. The median is 19.
For large datasets, software like SPSS can be used to find the median.
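The rank-order steps for finding the median translate directly into a short function. This is a hand-rolled sketch for illustration (the name `median_by_hand` is hypothetical); Python's built-in `statistics.median` gives the same results.

```python
def median_by_hand(scores):
    """Median via the rank-order steps: sort, then take the middle."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    ranked = sorted(scores)      # put the scores in rank order
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:               # odd count: the single middle value
        return ranked[middle]
    # even count: add the two middle values and divide by two
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median_by_hand([10, 12, 19, 25, 28, 30]))  # 22.0
print(median_by_hand([10, 12, 19, 25, 28]))      # 19
```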

Count the letters in each word of the sentence: "Count the letters in each word of the sentence and then find the mode."
Counts: 5, 3, 7, 2, 4, 4, 2, 3, 8, 3, 4, 4, 3, 4.
The mode is 4, as it appears most frequently (five times).
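The letter-counting exercise can be checked in a few lines of Python. This sketch strips punctuation (the trailing period) before counting and uses `collections.Counter` from the standard library to find the most frequent value.

```python
import string
from collections import Counter

sentence = "Count the letters in each word of the sentence and then find the mode."

# Letter count of each word, ignoring punctuation such as the final period.
counts = [len(word.strip(string.punctuation)) for word in sentence.split()]
print(counts)  # [5, 3, 7, 2, 4, 4, 2, 3, 8, 3, 4, 4, 3, 4]

# The mode is the value that occurs most often.
mode, frequency = Counter(counts).most_common(1)[0]
print(mode, frequency)  # 4 5
```

If two values tie for the highest frequency, `most_common(1)` returns only one of them; `statistics.multimode` returns all of them.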

Skewed data are not symmetrically distributed: the scores pile up on one side and trail off into a long tail on the other.
In a normal distribution, the mean, median, and mode are equal and sit at the center.
If data is skewed (not a normal shape), the mean is pulled in the direction of the skew.
Right-skewed data: the tail is on the right side.
- Mode is at the peak, the median is to the right of the mode, and the mean is pulled towards the right (the tail).
- In this case, the mean does not represent the dataset well; the median is a better measure.
When data is skewed, the median is a more appropriate measure of central tendency.
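A small sketch of the mode < median < mean ordering in right-skewed data. The scores below are made up for illustration: most are small, and a couple of large values form the right tail.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed scores (illustrative only).
scores = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]

print("mode  :", mode(scores))    # mode  : 2    (the peak)
print("median:", median(scores))  # median: 3.0
print("mean  :", mean(scores))    # mean  : 4.2  (pulled toward the tail)
```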

Variability provides a quantitative measure of how spread out the scores are in a distribution.
Small differences between scores: variability is small.
Large differences between scores: variability is large.
Variability describes the distribution by how far the values deviate from the average.

The range is the difference between the largest and smallest scores.
It measures the spread of the data using only the highest and lowest values, ignoring all the others.
Example: John's scores range from 75 to 99 (range = 24), while Joe's range from 80 to 90 (range = 10).
Smaller range indicates more consistency.
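The range needs only the maximum and minimum. In this sketch, only the endpoints (75 and 99 for John, 80 and 90 for Joe) come from the example; the scores in between are hypothetical, and they have no effect on the result.

```python
def score_range(scores):
    """Range = highest score minus lowest score; all other scores are ignored."""
    return max(scores) - min(scores)

# Hypothetical score lists: only the lowest and highest values are from the example.
john = [75, 82, 90, 99]
joe = [80, 85, 88, 90]

print(score_range(john))  # 24
print(score_range(joe))   # 10
```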

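The standard deviation measures the dispersion of data values around the mean (how much the data deviate from the mean).
"Standard" refers to the typical amount of movement away from the average.
A deviation is the distance of a score from the mean.
SS (sum of squares) is the sum of the squared deviations.
Variance is the average of the squared deviations: $SS / N$ for a population, or $SS / (n - 1)$ for a sample. The standard deviation is the square root of the variance.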
Formulas:
- For a sample:
  - $s = \sqrt{\frac{\sum(X - M)^2}{n-1}}$
  - Where:
    - $s$ = sample standard deviation
    - $X$ = each value in the sample
    - $M$ = mean of the sample
    - $n$ = number of values in the sample
- For a population:
  - $\sigma = \sqrt{\frac{\sum(X - \mu)^2}{N}}$
  - Where:
    - $\sigma$ = population standard deviation
    - $X$ = each value in the population
    - $\mu$ = mean of the population
    - $N$ = number of values in the population
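The two formulas differ only in the divisor applied to SS. A minimal Python sketch, assuming a hypothetical helper named `standard_deviation` with a `sample` flag; the small dataset is made up so that the population result is a whole number.

```python
import math

def standard_deviation(values, sample=True):
    """Standard deviation via deviations, SS, and variance.

    sample=True divides SS by n - 1 (the sample formula for s);
    sample=False divides SS by N (the population formula for sigma).
    """
    n = len(values)
    if n == 0:
        raise ValueError("no values")
    if sample and n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    mean = sum(values) / n                      # M (sample) or mu (population)
    ss = sum((x - mean) ** 2 for x in values)   # SS: sum of squared deviations
    variance = ss / (n - 1) if sample else ss / n
    return math.sqrt(variance)

# Made-up data: mean = 5, SS = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, sample=False))  # 2.0
print(round(standard_deviation(data), 3))      # 2.138
```

Python's standard library provides the same two quantities as `statistics.stdev` (sample) and `statistics.pstdev` (population).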

Measuring rose diameters in inches for the Rose Bowl parade.
- Bush 1: 2, 3, 4, 5, 6, 8, 10, 10
- Bush 2: 5, 5, 5, 6, 6, 6, 7, 8

Bush 1: sum of raw scores = 48
Mean = $48 / 8 = 6$
Calculate deviations (X - Mean): -4, -3, -2, -1, 0, 2, 4, 4
Square the deviations: 16, 9, 4, 1, 0, 4, 16, 16
Sum of squares (SS) = $16 + 9 + 4 + 1 + 0 + 4 + 16 + 16 = 66$
Variance = $66 / (8 - 1) = 66 / 7 \approx 9.43$
Standard deviation = $\sqrt{9.43} \approx 3.07$
On average, the roses on Bush 1 are 6 inches across, with a typical deviation of about 3.07 inches.
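Running the Bush 1 steps in code is a quick check on the hand arithmetic, and the same loop handles Bush 2, which is listed but not worked through. This is a plain-Python sketch of the sample formula; `statistics.stdev` should give the same standard deviations.

```python
import math

# Rose diameters in inches, from the example.
bushes = {
    "Bush 1": [2, 3, 4, 5, 6, 8, 10, 10],
    "Bush 2": [5, 5, 5, 6, 6, 6, 7, 8],
}

results = {}
for name, diameters in bushes.items():
    n = len(diameters)
    mean = sum(diameters) / n                   # M
    deviations = [x - mean for x in diameters]  # X - M
    ss = sum(d ** 2 for d in deviations)        # sum of squares
    variance = ss / (n - 1)                     # sample variance: SS / (n - 1)
    sd = math.sqrt(variance)                    # sample standard deviation s
    results[name] = (mean, ss, variance, sd)
    print(f"{name}: mean={mean:.0f}, SS={ss:.0f}, "
          f"variance={variance:.2f}, SD={sd:.2f}")
# Bush 1: mean=6, SS=66, variance=9.43, SD=3.07
# Bush 2: mean=6, SS=8, variance=1.14, SD=1.07
```

Both bushes average 6 inches, but the much smaller standard deviation for Bush 2 means its roses are far more uniform in size.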

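The coefficient of variation expresses the standard deviation as a percentage of the mean, which makes it possible to compare the spread of datasets with very different means.
New cars: mean price = \$20,000, standard deviation = \$6,000. Coefficient of variation = $(6000 / 20000) \times 100 = 30\%$
Used cars: mean price = \$5,485, standard deviation = \$2,730. Coefficient of variation = $(2730 / 5485) \times 100 \approx 49.8\%$
The smaller coefficient of variation for new cars (30%) indicates that their prices are more stable, relative to their mean, than used-car prices (about 49.8%).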
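The coefficient of variation is a one-line calculation. A minimal sketch, assuming a hypothetical helper named `coefficient_of_variation` that takes the standard deviation and the mean:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

new_cars = coefficient_of_variation(6_000, 20_000)
used_cars = coefficient_of_variation(2_730, 5_485)
print(f"new cars:  {new_cars:.1f}%")   # new cars:  30.0%
print(f"used cars: {used_cars:.1f}%")  # used cars: 49.8%
```

Multiplying before dividing (`100 * sd / mean`) keeps the new-car result an exact 30.0 in floating point.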