Shifting and Scaling Data

Module 3 - Section 1: Shifting and Scaling Data

Presented by: Rosana Fok

Overview of Data Shifting and Scaling

  • This section covers the effects of shifting and scaling data on the measures of position, spread, and shape of the data sets.


Shifting Data

  1. Definition of Shifting Data

    • Shifting data involves adding (or subtracting) a constant to every data value.

    • This operation results in the following effects:

      • Position Measures: Measures of position, such as the center (mean and median), percentiles, maximum, and minimum values will all increase (or decrease) by the constant added or subtracted.

      • Shape and Spread: The spread of the distribution (range, interquartile range (IQR), and standard deviation) and its shape remain unchanged.

  2. Mathematical Representation

    • If $y_{\text{new}} = y_{\text{original}} + c$ for each observation, the effects are:

      • Center and Position Measures:

      • $\text{Center}_{\text{new}} = \text{Center}_{\text{original}} + c$

      • $\text{Position}_{\text{new}} = \text{Position}_{\text{original}} + c$

      • Spread and Shape Measures:

      • $\text{Spread}_{\text{new}} = \text{Spread}_{\text{original}}$

      • $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$
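
A short derivation shows why the center moves while the spread does not. The symbols $\bar{y}$ (sample mean), $s$ (sample standard deviation), and $n$ (number of observations) are notation assumed here for illustration; they do not appear in the slides.

```latex
\begin{align*}
\bar{y}_{\text{new}} &= \frac{1}{n}\sum_{i=1}^{n}(y_i + c)
                      = \frac{1}{n}\sum_{i=1}^{n} y_i + c
                      = \bar{y} + c \\
s_{\text{new}} &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl((y_i + c) - (\bar{y} + c)\bigr)^2}
                = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}
                = s
\end{align*}
```

The constant $c$ cancels inside every deviation from the mean, which is why the standard deviation (and likewise the range and IQR) is unaffected by a shift.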


Example of Shifting Data

  • Data Set Provided:

    • $y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.

  • Constant Added: $c = 2$.

  • Summary Statistics Calculation:

    • Original Statistics:

      • n = 5

      • Mean = 3

      • Variance = 2.5

      • Std. Dev. = 1.581138

      • Median = 3

      • Range = 4 (from 1 to 5)

      • Min = 1

      • Max = 5

      • Q1 = 2

      • Q3 = 4

    • Shifting the Data:

      • New Data Set: $y + 2$ (i.e., $y_1 + 2, y_2 + 2, y_3 + 2, y_4 + 2, y_5 + 2$, giving 3, 4, 5, 6, 7)

      • New Statistics:

      • n = 5

      • Mean = 5

      • Variance = 2.5

      • Std. Dev. = 1.581138

      • Median = 5

      • Range = 4

      • Min = 3

      • Max = 7

      • Q1 = 4

      • Q3 = 6
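
The figures in this example can be checked with a minimal Python sketch using only the standard library. Two choices here are assumptions rather than something the slides state: the variance is treated as the sample variance (divisor $n - 1$), and the quartiles use the "inclusive" method, which happens to reproduce Q1 = 2 and Q3 = 4 for this data set. The helper name `summarize` is hypothetical.

```python
import statistics


def summarize(values):
    """Return the summary statistics listed in the example."""
    # Assumption: "inclusive" quartiles, which match Q1 = 2 and Q3 = 4 here.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": median,
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
    }


original = [1, 2, 3, 4, 5]
c = 2
shifted = [y + c for y in original]

before = summarize(original)
after = summarize(shifted)

# Position measures move by c; spread measures stay the same.
for name in before:
    print(f"{name:9s} {before[name]:>9.6f} -> {after[name]:>9.6f}")
```

Every position measure (mean, median, min, max, Q1, Q3) should differ by exactly 2 between the two columns, while variance, standard deviation, and range should be identical.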


Rescaling Data

  1. Definition of Rescaling Data

    • Rescaling data involves multiplying (or dividing) all data values by a constant.

    • This action affects:

      • Measures of Position: All measures of position (mean, median, percentiles) are multiplied (or divided) by the same constant.

      • Measures of Spread: All measures of spread (range, IQR, standard deviation) are multiplied (or divided) by the same constant; the variance is multiplied (or divided) by the square of that constant.

  2. Mathematical Representation

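    • If $y_{\text{new}} = d \times y_{\text{original}}$ for each observation:

      • Center and Position Measures:

      • $\text{Center}_{\text{new}} = d \times \text{Center}_{\text{original}}$

      • $\text{Position}_{\text{new}} = d \times \text{Position}_{\text{original}}$

      • Spread and Shape Measures:

      • $\text{Spread}_{\text{new}} = d \times \text{Spread}_{\text{original}}$

      • $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$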
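
The same kind of derivation explains the scaling rules, and also why the variance behaves differently from the standard deviation. As before, $\bar{y}$, $s$, and $n$ are notation assumed here for the sample mean, sample standard deviation, and sample size.

```latex
\begin{align*}
\bar{y}_{\text{new}} &= \frac{1}{n}\sum_{i=1}^{n} d\,y_i = d\,\bar{y} \\
s^2_{\text{new}} &= \frac{1}{n-1}\sum_{i=1}^{n}(d\,y_i - d\,\bar{y})^2
                  = d^2 \cdot \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2
                  = d^2 s^2 \\
s_{\text{new}} &= \sqrt{d^2 s^2} = |d|\,s
\end{align*}
```

For a positive constant, $|d| = d$, so the standard deviation is multiplied by $d$ while the variance is multiplied by $d^2$.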


Example of Rescaling Data

  • Data Set Provided:

    • $y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.

  • Constant Multiplied: $d = 2$.

  • Summary Statistics Calculation:

    • Original Statistics:

      • n = 5

      • Mean = 3

      • Variance = 2.5

      • Std. Dev. = 1.581138

      • Median = 3

      • Range = 4

      • Min = 1

      • Max = 5

      • Q1 = 2

      • Q3 = 4

    • After Rescaling:

      • New Data Set: $y \times 2$ (i.e., $y_1 \times 2, y_2 \times 2, y_3 \times 2, y_4 \times 2, y_5 \times 2$, giving 2, 4, 6, 8, 10)

      • New Statistics:

      • n = 5

      • Mean = 6

      • Variance = 10

      • Std. Dev. = 3.162278

      • Median = 6

      • Range = 8

      • Min = 2

      • Max = 10

      • Q1 = 4

      • Q3 = 8
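
As a quick check of this example, the sketch below computes the ratio of each new statistic to its original value. It assumes the sample variance (divisor $n - 1$) and "inclusive" quartiles, neither of which the slides specify; the helper name `iqr` is hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range, assuming 'inclusive' quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


original = [1, 2, 3, 4, 5]
d = 2
rescaled = [d * y for y in original]

# Ratio of each statistic after rescaling to its value before rescaling.
ratios = {
    "mean": statistics.mean(rescaled) / statistics.mean(original),
    "median": statistics.median(rescaled) / statistics.median(original),
    "range": (max(rescaled) - min(rescaled)) / (max(original) - min(original)),
    "iqr": iqr(rescaled) / iqr(original),
    "std_dev": statistics.stdev(rescaled) / statistics.stdev(original),
    "variance": statistics.variance(rescaled) / statistics.variance(original),
}

for name, ratio in ratios.items():
    print(f"{name:9s} multiplied by {ratio:g}")
```

All ratios should come out as $d = 2$ (up to floating-point rounding) except the variance, whose ratio is $d^2 = 4$, matching the change from 2.5 to 10 in the table.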


Summary of Shifting and Rescaling Data

  1. Combined Effects

    • For transformations such as $y_{\text{new}} = d \times y_{\text{original}} + c$:

      • Center and Position Measures:

      • $\text{Center}_{\text{new}} = d \times \text{Center}_{\text{original}} + c$

      • $\text{Position}_{\text{new}} = d \times \text{Position}_{\text{original}} + c$

      • Spread and Shape Measures:

      • $\text{Spread}_{\text{new}} = d \times \text{Spread}_{\text{original}}$

      • $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$


Example of Application in Real-World Context

  • Context: Students in an introductory statistics class report the number of credit hours they are taking.

  • Given Charges: The college charges $73 per credit hour plus a flat fee of $35 per quarter.

    • Cost Calculation: A student taking 12 credit hours pays:

      • Cost = $35 + 12(73) = $911

    • Summary Statistics Provided (credit hours):

      • Mean = 16.65

      • Standard Deviation = 2.96

      • Minimum = 5

      • Q1 = 15

      • Median = 16

      • Q3 = 19

      • Maximum = 28
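
The combined rule can be applied to this example to obtain the statistics of the fees paid. The sketch below assumes the summary statistics listed above describe credit hours (values from 5 to 28 are plausible credit loads, not dollar amounts) and that every student's fee is $73 \times \text{hours} + 35$. The variable names are hypothetical.

```python
RATE_PER_CREDIT = 73  # dollars per credit hour (d in the combined rule)
FLAT_FEE = 35         # dollars per quarter (c in the combined rule)

# Summary statistics from the example, assumed to be in credit hours.
credit_hours = {
    "mean": 16.65,
    "std_dev": 2.96,
    "min": 5,
    "q1": 15,
    "median": 16,
    "q3": 19,
    "max": 28,
}

# Position measures are scaled and then shifted.
position = ("mean", "min", "q1", "median", "q3", "max")
fees = {name: RATE_PER_CREDIT * credit_hours[name] + FLAT_FEE for name in position}

# Spread measures are only scaled; the flat fee has no effect on them.
fees["std_dev"] = RATE_PER_CREDIT * credit_hours["std_dev"]
fees["iqr"] = RATE_PER_CREDIT * (credit_hours["q3"] - credit_hours["q1"])

for name, value in fees.items():
    print(f"{name:8s} ${value:,.2f}")
```

Under these assumptions the mean fee is about $1,250.45 and the median fee is $1,203, while the standard deviation of the fees is about $216.08 and the IQR is $292.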


The Standard Deviation as a Ruler

  1. Concept Explanation

    • The distance of an observation from the mean, measured in standard deviations, shows the observation's relative standing compared with others in the sample.

    • It is particularly useful in standardized testing scenarios to evaluate where an individual's score stands.


z-score (Standardized Value)

  1. Definition of z-score

    • A z-score is a measure of relative standing expressing how many standard deviations an observation lies from the mean.

      • Formula: $z = \frac{y - \mu}{\sigma}$

      • Where $y$ is the observation, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.

    • It allows for comparison of values measured on different scales or from different populations.

  2. Key Points about Standardization:

    • Standardizing involves shifting the data by subtracting the mean and rescaling by dividing by the standard deviation.

    • Because z-scores have no units, they allow values from different data sets to be compared by how many standard deviations each lies from its own mean.


Example of Calculating z-scores in a Class

  • Class Statistics: Average score on the final is 75% with a standard deviation of 5%.

  • Individual Scores:

    • Amy: 80%

      • Calculation: How many standard deviations above the mean?

    • Mandy: 70%

      • Calculation: How many standard deviations below the mean?

    • Judy: 75%

      • Calculation: How many standard deviations from the mean?
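
Applying the z-score formula with a mean of 75 and a standard deviation of 5 (both in percentage points) answers the three questions. The subscripted names are labels introduced here for readability.

```latex
\begin{align*}
z_{\text{Amy}}   &= \frac{80 - 75}{5} = 1  \\
z_{\text{Mandy}} &= \frac{70 - 75}{5} = -1 \\
z_{\text{Judy}}  &= \frac{75 - 75}{5} = 0
\end{align*}
```

Amy scored one standard deviation above the mean, Mandy one standard deviation below it, and Judy exactly at the mean.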


Summary of z-scores

  1. Types of z-scores

    • Positive z-score: The observation lies above the mean (above-average performance).

    • Negative z-score: The observation lies below the mean (below-average performance).

    • z-score of 0: The observation lies exactly at the mean.


Summary for Standardization

  1. Effects of Standardizing Data

    • Standardization does not change the shape of a distribution.

    • It modifies the center by making the mean 0.

    • It alters the spread by making the standard deviation 1.
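
These effects can be seen on a small data set. The sketch below reuses the values 1 to 5 from the earlier examples and standardizes them with the sample mean and sample standard deviation; using sample statistics in place of the population $\mu$ and $\sigma$ is an assumption made for illustration, and `standardize` is a hypothetical helper name.

```python
import statistics


def standardize(values):
    """Shift by the mean, then rescale by the standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # assumption: sample standard deviation
    return [(y - mean) / std_dev for y in values]


data = [1, 2, 3, 4, 5]
z_scores = standardize(data)

print("z-scores:", [round(z, 4) for z in z_scores])
print("mean of z-scores:", statistics.mean(z_scores))
print("std. dev. of z-scores:", statistics.stdev(z_scores))
```

The z-scores keep the same ordering and relative spacing as the original values, so the shape is preserved, while their mean is 0 and their standard deviation is 1 up to floating-point rounding.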


Job Offer Example Comparing Two Students

  • Student A: Accounting major with a job offer of $35,000.

  • Student B: Advertising major with a job offer of $33,000.

  • Population Statistics for Context:

    • Accounting students: Mean ($\mu$) = $34,500, Standard deviation ($\sigma$) = $1,500.

    • Advertising students: Mean ($\mu$) = $32,500, Standard deviation ($\sigma$) = $1,000.

    • Analysis: Compute the z-score of each offer using the mean and standard deviation for that student's major, then compare the z-scores to judge which student received the better offer relative to peers and so may feel happier.
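
One way to carry out this comparison is sketched below, using the figures given for each major. The function name `z_score` and the variable names are hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Student A: accounting major, compared against accounting offers.
z_a = z_score(35_000, mean=34_500, std_dev=1_500)

# Student B: advertising major, compared against advertising offers.
z_b = z_score(33_000, mean=32_500, std_dev=1_000)

print(f"Student A z-score: {z_a:.2f}")
print(f"Student B z-score: {z_b:.2f}")
print("Higher relative offer:", "Student A" if z_a > z_b else "Student B")
```

Student A's offer is about 0.33 standard deviations above the accounting mean, while Student B's is 0.5 standard deviations above the advertising mean. Although Student A's offer is larger in dollars, Student B's is better relative to peers in the same major, so Student B arguably has more reason to be pleased.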


Conclusion

  • Understanding how shifting and scaling affect measures of position and spread is essential for interpreting transformed data correctly in real-world scenarios.


Thank you for Watching!