Central Tendency and Measures of Variance

Introduction

  • This lecture combines chapters two and three, focusing on central tendency and measures of variance.

  • The goal is to ensure a basic understanding of mean, median, mode, range, and standard deviation for the exam covering chapters one through four.

  • Descriptive statistics, frequency distribution tables, and central tendency should be approached with time and precision.

Central Tendency

Definition
  • Central tendency involves using a single number to describe a large dataset.

  • Example: The average cost of a car purchased in Southern California between January and June.

  • Averages are everywhere; for example, a GPA summarizes all the grades in an undergraduate degree with one number.

Purpose
  • To find a number that represents the middle of all collected numbers.

Mean
  • The most common measure of central tendency.

  • It does not indicate how scores vary or spread out (variability).

  • The mean is the balance point of the scores and is sensitive to extreme scores.

  • Suitable for interval and ratio data.

Formula
  • The mean is calculated as the sum of all scores divided by the number of scores.

    • For a sample:

      • \( M = \frac{\sum X}{n} \)

      • Where:

        • \( M \) = sample mean

        • \( \sum X \) = sum of all raw scores

        • \( n \) = total number of scores in the sample

    • For a population:

      • \( \mu = \frac{\sum X}{N} \)

      • Where:

        • \( \mu \) = population mean

        • \( \sum X \) = sum of all raw scores

        • \( N \) = total number of scores in the population

Example Calculation
  • Linda's exam scores: 58, 67, 60, 84, 93, 98, 100.

  • Calculate the mean: \( (58 + 67 + 60 + 84 + 93 + 98 + 100) / 7 = 560 / 7 = 80 \)

  • Linda's grade in the class is a B (80%).
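The arithmetic can be checked with a short Python sketch. The helper name `mean` is our own (hypothetical, not from the lecture); it applies "sum of the scores divided by the number of scores" to Linda's exam scores. The population mean μ uses the same arithmetic, only with different symbols.

```python
def mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by how many there are."""
    if not scores:
        raise ValueError("mean() needs at least one score")
    return sum(scores) / len(scores)


linda = [58, 67, 60, 84, 93, 98, 100]
print(sum(linda))   # 560
print(mean(linda))  # 80.0
```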

Computing the Mean from a Frequency Table
  • Formula: \( M = \frac{\sum fX}{n} \), where \( n = \sum f \)

    • Multiply each score (\( X \)) by its frequency (\( f \)).

    • Sum the products of each frequency multiplied by the raw score.

    • Divide by the total number of scores.

    • Example:

      • Scores: 6 (frequency 1), 5 (frequency 0), 4 (frequency 3), 3 (frequency 3), 2 (frequency 2)

      • \( (6 \times 1) + (5 \times 0) + (4 \times 3) + (3 \times 3) + (2 \times 2) = 6 + 0 + 12 + 9 + 4 = 31 \)

      • Total number of scores = \( 1 + 0 + 3 + 3 + 2 = 9 \)

      • Mean = \( 31 / 9 \approx 3.44 \)
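The frequency-table method can be sketched in Python, assuming the table is stored as a dictionary that maps each score to its frequency (the dictionary layout and the function name `mean_from_frequency_table` are illustrative assumptions, not part of the lecture).

```python
def mean_from_frequency_table(table):
    """Mean of grouped scores: sum(f * X) / n, where n = sum(f).

    `table` maps each score X to its frequency f (a hypothetical layout).
    """
    n = sum(table.values())                       # total number of scores
    if n == 0:
        raise ValueError("the table contains no scores")
    total = sum(x * f for x, f in table.items())  # sum of f * X
    return total / n


freq = {6: 1, 5: 0, 4: 3, 3: 3, 2: 2}
print(sum(x * f for x, f in freq.items()))        # 31
print(sum(freq.values()))                         # 9
print(round(mean_from_frequency_table(freq), 2))  # 3.44

# Cross-check: expand the table into raw scores and average them directly.
raw = [x for x, f in freq.items() for _ in range(f)]
print(len(raw), sum(raw) / len(raw))
```

Expanding the table back into raw scores gives the same mean, which is a handy way to catch arithmetic slips.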

When Not to Use the Mean
  • The mean is susceptible to outliers (extreme scores that lie outside the trend).

    • Outliers can skew the mean, making it a less representative measure of central tendency.

  • Example: Salaries of 10 staff members where two large salaries skew the mean.

    • Most workers earn between $12,000 and $18,000, but the mean salary is $30,700 because the managers' two large salaries pull it upward.

    • In skewed situations, the median is a better measure of central tendency.

Median
  • The score that marks the 50th percentile, with half of the scores above and half below.

  • The median is the middle score or middle value.

  • Uses position rather than the actual value of the data.

  • Changes in extreme values do not affect the median.

  • Applicable to ordinal, interval, and ratio data.

Steps to Find the Median
  • Put the scores in rank order (smallest to largest).

  • For an odd number of data points, the median is the middle value.

  • For an even number of data points, add the two middle numbers and divide by two.

Examples
  • Even set: 10, 12, 19, 25, 28, 30. The middle numbers are 19 and 25, so the median = \( (19 + 25) / 2 = 22 \)

  • Odd set: 10, 12, 19, 25, 28. The median is 19.

  • For large datasets, software like SPSS can be used to find the median.
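The ranking steps translate directly into code. This is a minimal sketch with a hypothetical `median` helper; Python's standard library also provides `statistics.median`, which follows the same odd/even rule. The last lines swap in a hypothetical extreme value to show that the median ignores how extreme the end values are.

```python
import statistics


def median(scores):
    """Median via the lecture's steps: rank the scores, then take the middle one
    (odd count) or the average of the two middle ones (even count)."""
    if not scores:
        raise ValueError("median() needs at least one score")
    ordered = sorted(scores)   # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


print(median([10, 12, 19, 25, 28, 30]))  # 22.0
print(median([10, 12, 19, 25, 28]))      # 19

# The standard library agrees.
print(statistics.median([10, 12, 19, 25, 28, 30]))  # 22.0

# Replace the largest score (28) with a hypothetical extreme value (280):
# the median stays at 19, while the mean jumps from 18.8 to 69.2.
odd_set = [10, 12, 19, 25, 28]
extreme = [10, 12, 19, 25, 280]
print(median(extreme), sum(odd_set) / len(odd_set), sum(extreme) / len(extreme))
```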

Mode
  • The data point that occurs most frequently.

  • No calculations are needed; it is found by inspection.

  • There can be more than one mode in a dataset.

  • Best used with nominal data (categorical data collected by name).

Example
  • Count the letters in each word of the sentence: "Count the letters in each word of the sentence and then find the mode."

  • Counts: 5, 3, 7, 2, 4, 4, 2, 3, 8, 3, 4, 4, 3, 4.

  • The mode is 4, as it appears most frequently (five times).
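Finding the mode "by inspection" amounts to tallying values. A minimal sketch using the standard library's `collections.Counter` and `statistics.multimode` (Python 3.8+); stripping the final period before splitting is our own assumption about how to count letters.

```python
from collections import Counter
from statistics import multimode

sentence = "Count the letters in each word of the sentence and then find the mode."
words = sentence.rstrip(".").split()   # drop the final period so it is not counted
lengths = [len(w) for w in words]
print(lengths)  # [5, 3, 7, 2, 4, 4, 2, 3, 8, 3, 4, 4, 3, 4]

counts = Counter(lengths)              # tally how often each length occurs
print(counts.most_common(1))           # [(4, 5)]
print(multimode(lengths))              # [4]
```

`multimode` returns every value tied for the highest count, so a dataset with more than one mode yields a longer list, e.g. `multimode([1, 1, 2, 2, 3])` gives `[1, 2]`.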

Data Skewness
  • Skewed data are not symmetrically distributed: scores pile up on one side and trail off in a long tail on the other.

  • In a normal distribution, the mean, median, and mode are all in the center.

  • If data is skewed (not a normal shape), the mean is pulled in the direction of the skew.

  • Right-skewed data: the tail is on the right side.

    • The mode is at the peak, the median lies to the right of the mode, and the mean is pulled farthest to the right, toward the tail.

    • In this case, the mean does not represent the dataset well.

  • In general, when data are skewed, the median is the more appropriate measure of central tendency.
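Comparing the mean with the median is a quick, informal check for the direction of skew. Linda's exam scores from the mean example show the mirror image of the right-skewed case: a few low scores (58, 60, 67) form a tail on the left, and the mean sits below the median. The helper `skew_direction` is hypothetical, and this comparison is a rule of thumb rather than a formal test of skewness.

```python
from statistics import mean, median


def skew_direction(scores):
    """Informal check: the mean is pulled toward the longer tail.
    A rule of thumb only, not a formal test of skewness."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "right"   # tail on the right: mean pulled above the median
    if m < md:
        return "left"    # tail on the left: mean pulled below the median
    return "none indicated"


linda = [58, 67, 60, 84, 93, 98, 100]
print(mean(linda), median(linda))  # mean 80, median 84
print(skew_direction(linda))       # left
```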

Variability

Definition
  • Variability provides a quantitative measure of how spread out the scores are in a distribution.

  • Small differences: variability is small.

  • Large differences: variability is large.

  • Variability describes the distribution by how far the values deviate from the average.

Range
  • The difference between the largest and smallest scores.

  • Measures the spread of the data but only uses the highest and lowest values, ignoring the other values.

  • Example: John's scores range from 75 to 99 (range = 24), while Joe's range from 80 to 90 (range = 10).

  • A smaller range indicates more consistent scores, so Joe's scores are more consistent than John's.
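Because the range uses only the largest and smallest scores, the endpoints given for John and Joe are all the calculation needs. A minimal sketch; the name `data_range` is our own choice (it avoids shadowing Python's built-in `range`).

```python
def data_range(scores):
    """Range = largest score minus smallest score; every other score is ignored."""
    if not scores:
        raise ValueError("data_range() needs at least one score")
    return max(scores) - min(scores)


john = [75, 99]  # only the lowest and highest scores are given in the example
joe = [80, 90]
print(data_range(john), data_range(joe))  # 24 10
```

Adding any score between 75 and 99 to John's list would leave his range at 24, which is exactly the limitation noted above.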

Standard Deviation
  • Measures the dispersion of data values around the mean (how much the data deviates from the mean).

  • "Standard" refers to the typical amount that scores move away from the average.

  • Deviation is the distance from the mean.

  • SS (sum of squares) is the sum of the squared deviations.

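  • Variance is the average of the squared deviations: SS divided by N for a population, or by n - 1 for a sample. The standard deviation is the square root of the variance.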

  • Formulas:

    • For a sample:

      • \( s = \sqrt{\frac{\sum (X - M)^2}{n - 1}} \)

      • Where:

        • \( s \) = sample standard deviation

        • \( X \) = each value in the sample

        • \( M \) = mean of the sample

        • \( n \) = number of values in the sample

    • For a population:

      • \( \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \)

      • Where:

        • \( \sigma \) = population standard deviation

        • \( X \) = each value in the population

        • \( \mu \) = mean of the population

        • \( N \) = number of values in the population
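The two formulas differ only in the divisor. A minimal sketch with hypothetical helper names (`sum_of_squares`, `sample_sd`, `population_sd`), checked against the standard library, where `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N. Linda's exam scores are reused as test data.

```python
import math
import statistics


def sum_of_squares(values):
    """SS: the sum of squared deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def sample_sd(values):
    """s = sqrt(SS / (n - 1)): divide by n - 1 for a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    return math.sqrt(sum_of_squares(values) / (n - 1))


def population_sd(values):
    """sigma = sqrt(SS / N): divide by N for a whole population."""
    if not values:
        raise ValueError("a population standard deviation needs at least one value")
    return math.sqrt(sum_of_squares(values) / len(values))


scores = [58, 67, 60, 84, 93, 98, 100]  # Linda's exam scores
print(sum_of_squares(scores))           # 1962.0
print(round(sample_sd(scores), 2), round(population_sd(scores), 2))

# Cross-check against the standard library.
print(math.isclose(sample_sd(scores), statistics.stdev(scores)))       # True
print(math.isclose(population_sd(scores), statistics.pstdev(scores)))  # True
```

Dividing by n - 1 makes the sample value slightly larger than the population value for the same data; the gap shrinks as n grows.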

Big Blossom Greenhouse Example
  • Measuring rose diameters in inches for the Rose Bowl parade.

    • Bush 1: 2, 3, 4, 5, 6, 8, 10, 10

    • Bush 2: 5, 5, 5, 6, 6, 6, 7, 8

Calculation for Bush 1
  • Sum of raw scores = 48

  • Mean = \( 48 / 8 = 6 \)

  • Calculate deviations (X - Mean): -4, -3, -2, -1, 0, 2, 4, 4

  • Square the deviations: 16, 9, 4, 1, 0, 4, 16, 16

  • Sum of squares (SS) = \( 16 + 9 + 4 + 1 + 0 + 4 + 16 + 16 = 66 \)

  • Variance = \( 66 / (8 - 1) = 66 / 7 \approx 9.43 \)

  • Standard deviation = \( \sqrt{9.43} \approx 3.07 \)

  • On average, the roses are 6 inches in diameter, with a typical deviation of about 3.07 inches.
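The same steps can be run for both bushes. This sketch computes SS directly from the listed diameters and uses `statistics.variance` and `statistics.stdev`, which apply the sample (n - 1) formulas; the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev, variance


def summarize(diameters):
    """Return (mean, SS, sample variance, sample SD) for a list of diameters."""
    m = mean(diameters)
    ss = sum((x - m) ** 2 for x in diameters)  # sum of squared deviations
    # statistics.variance and statistics.stdev divide by n - 1 (sample formulas).
    return m, ss, variance(diameters), stdev(diameters)


bushes = {
    "Bush 1": [2, 3, 4, 5, 6, 8, 10, 10],
    "Bush 2": [5, 5, 5, 6, 6, 6, 7, 8],
}

for name, diameters in bushes.items():
    m, ss, var, sd = summarize(diameters)
    print(f"{name}: mean={m}, SS={ss}, variance={var:.2f}, sd={sd:.2f}")
```

Both bushes average 6 inches, but Bush 2's standard deviation (about 1.07 inches) is much smaller than Bush 1's (about 3.07 inches), so Bush 2's roses are more uniform in size.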

Coefficient of Variation
  • Measures relative variation (standard deviation divided by the mean).

  • Useful for comparing the variability of data series that have different means.

Formulas
  • Sample: \( CV = \frac{s}{M} \times 100 \)

  • Population: \( CV = \frac{\sigma}{\mu} \times 100 \)

Example
  • New cars: mean price = $20,000, standard deviation = $6,000. Coefficient of variation = \( (6000 / 20000) \times 100 = 30\% \)

  • Used cars: mean price = $5,485, standard deviation = $2,730. Coefficient of variation = \( (2730 / 5485) \times 100 \approx 49.8\% \)

  • New-car prices vary less relative to their mean (30%) than used-car prices (about 49.8%), so new-car prices are more stable.
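A minimal sketch of the coefficient of variation, using the car-price figures above; the function name `coefficient_of_variation` is our own. The guard reflects that the CV is only meaningful when the mean is nonzero (in practice, for ratio-scale data with a positive mean).

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100


new_cars = coefficient_of_variation(6000, 20000)
used_cars = coefficient_of_variation(2730, 5485)
print(f"new: {new_cars:.1f}%  used: {used_cars:.1f}%")  # new: 30.0%  used: 49.8%
```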