Shifting and Scaling Data

Presented by: Rosana Fok

This section covers how shifting and scaling data affect the measures of position, spread, and shape of a data set.

Definition of Shifting Data
- Shifting data involves adding (or subtracting) a constant to every data value.
- This operation results in the following effects:
  - Position Measures: All measures of position, including the center (mean and median), the percentiles, the minimum, and the maximum, increase (or decrease) by the constant.
  - Shape and Spread: The shape of the distribution and its measures of spread (range, interquartile range (IQR), and standard deviation) remain unchanged.
Mathematical Representation
- If $y_{\text{new}} = y_{\text{original}} + c$ for each observation, the effects are:
  - Center Measures:
  - $\text{Center}_{\text{new}} = \text{Center}_{\text{original}} + c$
  - $\text{Position}_{\text{new}} = \text{Position}_{\text{original}} + c$
  - Spread and Shape Measures:
  - $\text{Spread}_{\text{new}} = \text{Spread}_{\text{original}}$
  - $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$

Data Set Provided:
- $y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.
Constant Added: $c = 2$.
Summary Statistics Calculation:
- Original Statistics:
  - n = 5
  - Mean = 3
  - Variance = 2.5
  - Std. Dev. = 1.581138
  - Median = 3
  - Range = 4 (from 1 to 5)
  - Min = 1
  - Max = 5
  - Q1 = 2
  - Q3 = 4
- Shifting the Data:
  - New Data Set: $y + 2$ (i.e., $y_1 + 2, y_2 + 2, y_3 + 2, y_4 + 2, y_5 + 2$)
  - New Statistics:
  - n = 5
  - Mean = 5
  - Variance = 2.5
  - Std. Dev. = 1.581138
  - Median = 5
  - Range = 4
  - Min = 3
  - Max = 7
  - Q1 = 4
  - Q3 = 6
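
The shifted statistics above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is introduced here for illustration, and the quartile convention (`method="inclusive"`) is an assumption, chosen because it reproduces Q1 = 2 and Q3 = 4 for the original data. Other quartile conventions can give different values.

```python
import statistics

def summarize(values):
    """Return the summary statistics listed in the tables above."""
    # Assumption: "inclusive" quartiles, which match Q1 = 2 and Q3 = 4 for 1..5.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }

y = [1, 2, 3, 4, 5]
c = 2
shifted = [value + c for value in y]

before = summarize(y)
after = summarize(shifted)

# Position measures move by c; spread measures stay the same.
for name in ("mean", "median", "min", "max", "q1", "q3"):
    print(f"{name}: {before[name]} -> {after[name]}")
for name in ("stdev", "range"):
    print(f"{name}: {before[name]:.6f} -> {after[name]:.6f}")
```

Every position measure in `after` should equal the corresponding value in `before` plus `c`, while `stdev` and `range` should be identical in both.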

Definition of Rescaling Data
- Rescaling data involves multiplying (or dividing) all data values by a constant.
- This action affects:
  - Measures of Position: All measures of position (mean, median, percentiles) are multiplied (or divided) by the same constant.
  - Measures of Spread: All measures of spread (range, IQR, standard deviation) are multiplied (or divided) by the same constant.
Mathematical Representation
- If $y_{\text{new}} = d \times y_{\text{original}}$ for each observation:
  - Center Measures:
  - $\text{Center}_{\text{new}} = d \times \text{Center}_{\text{original}}$
  - $\text{Position}_{\text{new}} = d \times \text{Position}_{\text{original}}$
  - Spread and Shape Measures:
  - $\text{Spread}_{\text{new}} = d \times \text{Spread}_{\text{original}}$
  - $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$

Data Set Provided:
- $y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.
Constant Multiplied: $d = 2$.
Summary Statistics Calculation:
- Original Statistics:
  - n = 5
  - Mean = 3
  - Variance = 2.5
  - Std. Dev. = 1.581138
  - Median = 3
  - Range = 4
  - Min = 1
  - Max = 5
  - Q1 = 2
  - Q3 = 4
- After Rescaling:
  - New Data Set: $y \times 2$ (i.e., $y_1 \times 2, y_2 \times 2, y_3 \times 2, y_4 \times 2, y_5 \times 2$)
  - New Statistics:
  - n = 5
  - Mean = 6
  - Variance = 10
  - Std. Dev. = 3.1622777
  - Median = 6
  - Range = 8
  - Min = 2
  - Max = 10
  - Q1 = 4
  - Q3 = 8
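
One detail in the table is easy to miss: the standard deviation, range, and quartiles double, but the variance goes from 2.5 to 10, a factor of $d^2 = 4$, because variance is measured in squared units. A short sketch of that check follows; the helper `iqr` and the `method="inclusive"` quartile convention are assumptions made for this illustration.

```python
import statistics

def iqr(values):
    """Interquartile range, assuming the 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

y = [1, 2, 3, 4, 5]
d = 2
scaled = [d * value for value in y]

# Ratios of each statistic after rescaling to the same statistic before.
ratio_mean = statistics.mean(scaled) / statistics.mean(y)
ratio_stdev = statistics.stdev(scaled) / statistics.stdev(y)
ratio_iqr = iqr(scaled) / iqr(y)
ratio_variance = statistics.variance(scaled) / statistics.variance(y)

print("mean ratio:    ", ratio_mean)
print("stdev ratio:   ", ratio_stdev)
print("IQR ratio:     ", ratio_iqr)
print("variance ratio:", ratio_variance)
```

The first three ratios should equal `d`, and the variance ratio should equal `d ** 2`.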

Combined Effects
- For transformations such as $y_{\text{new}} = d \times y_{\text{original}} + c$:
  - Center and Position Measures:
  - $\text{Center}_{\text{new}} = d \times \text{Center}_{\text{original}} + c$
  - $\text{Position}_{\text{new}} = d \times \text{Position}_{\text{original}} + c$
  - Spread and Shape Measures:
  - $\text{Spread}_{\text{new}} = d \times \text{Spread}_{\text{original}}$
  - $\text{Shape}_{\text{new}} = \text{Shape}_{\text{original}}$
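
The rules for the mean and the standard deviation follow directly from their definitions. The derivation below is a sketch; the symbols $\bar{y}$ (sample mean), $s$ (sample standard deviation), and $n$ (sample size) are notation introduced here and do not appear in the slides.

```latex
\begin{align*}
\bar{y}_{\text{new}}
  &= \frac{1}{n}\sum_{i=1}^{n} (d\,y_i + c)
   = d\,\bar{y} + c, \\
s_{\text{new}}^2
  &= \frac{1}{n-1}\sum_{i=1}^{n} \bigl(d\,y_i + c - (d\,\bar{y} + c)\bigr)^2
   = d^2\,\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2
   = d^2 s^2, \\
s_{\text{new}} &= |d|\,s.
\end{align*}
```

The constant $c$ cancels inside the squared deviations, which is why shifting never changes the spread. For a positive multiplier, as in the examples here, $|d| = d$.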

Context: Students in an introductory statistics class reported the number of credit hours they were taking.
Given Charges: The college charges $73 per credit hour plus a flat fee of $35 per quarter.
- Cost Calculation: For 12 credit hours, the payment is:
  - Cost = $35 + 12(73) = $911
- Summary Statistics Provided (credit hours):
  - Mean = 16.65
  - Standard Deviation = 2.96
  - Minimum = 5
  - Q1 = 15
  - Median = 16
  - Q3 = 19
  - Maximum = 28
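
The slides do not state the question that goes with this example. Assuming the summary statistics describe credit hours and the goal is to convert them into fees paid, the combined rule applies with $d = 73$ and $c = 35$: position measures are multiplied by 73 and then shifted by 35, while spread measures are only multiplied by 73. The names `fee`, `credit_stats`, and `fee_stats` below are introduced for this sketch.

```python
FLAT_FEE = 35  # dollars per quarter (the shift, c)
RATE = 73      # dollars per credit hour (the scale factor, d)

def fee(credit_hours):
    """Fee in dollars for a given number of credit hours."""
    return FLAT_FEE + RATE * credit_hours

# Summary statistics from the slide, assumed to be in credit hours.
credit_stats = {"mean": 16.65, "min": 5, "q1": 15, "median": 16, "q3": 19, "max": 28}
credit_sd = 2.96

# Position measures: scale, then shift.
fee_stats = {name: round(fee(value), 2) for name, value in credit_stats.items()}

# Spread measures: scale only; the flat fee has no effect.
fee_sd = round(RATE * credit_sd, 2)
fee_iqr = fee_stats["q3"] - fee_stats["q1"]

print("12 credit hours cost:", fee(12))
print("fee statistics:", fee_stats)
print("fee standard deviation:", fee_sd)
print("fee IQR:", fee_iqr)
```

Under these assumptions the mean fee is $1,250.45, the median fee is $1,203, the standard deviation is $216.08, and the IQR is $292.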

Concept Explanation
- The distance of an observation from the mean, measured in standard deviations, shows the observation's standing relative to the others in the sample.
- It is particularly useful in standardized testing scenarios to evaluate where an individual's score stands.

Definition of z-score
- A z-score is a measure of relative standing expressing how many standard deviations an observation lies from the mean.
  - Formula: $z = \frac{y - \mu}{\sigma}$
  - Where $y$ is the observation, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
- It allows for comparison of values measured on different scales or from different populations.
Key Points about Standardization:
- Standardizing involves shifting the data by subtracting the mean and then rescaling by dividing by the standard deviation.
- Because z-scores carry no units, values from different data sets can be compared by how many standard deviations each lies from its own mean.

Class Statistics: Average score on the final is 75% with a standard deviation of 5%.
Individual Scores:
- Amy: 80%
  - Calculation: How many standard deviations above the mean?
- Mandy: 70%
  - Calculation: How many standard deviations below the mean?
- Judy: 75%
  - Calculation: How many standard deviations from the mean?
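
The three calculations can be worked directly from the z-score formula with a mean of 75 and a standard deviation of 5. A minimal sketch follows; the function name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from mean."""
    return (value - mean) / sd

CLASS_MEAN = 75  # percent
CLASS_SD = 5     # percentage points

scores = {"Amy": 80, "Mandy": 70, "Judy": 75}
z_scores = {name: z_score(score, CLASS_MEAN, CLASS_SD) for name, score in scores.items()}

for name, z in z_scores.items():
    print(f"{name}: z = {z:+.1f}")
```

Amy is 1 standard deviation above the mean ($z = 1$), Mandy is 1 standard deviation below it ($z = -1$), and Judy is exactly at the mean ($z = 0$).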

Types of z-scores
- Positive z-score: The observation lies above the mean (for a test score, above-average performance).
- Negative z-score: The observation lies below the mean (below-average performance).
- z-score of 0: The observation lies exactly at the mean.

Effects of Standardizing Data
- Standardization does not change the shape of a distribution.
- It modifies the center by making the mean 0.
- It alters the spread by making the standard deviation 1.
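
These effects can be seen by standardizing the small data set 1, 2, 3, 4, 5 used earlier. The sketch below uses the sample mean and the sample standard deviation (an assumption; the z-score formula above is written with population parameters), and the helper name `standardize` is illustrative.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score using the sample mean and sample sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(value - mean) / sd for value in values]

y = [1, 2, 3, 4, 5]
z = standardize(y)

print("z-scores:", [round(value, 4) for value in z])
print("mean of z-scores:", round(statistics.mean(z), 10))
print("sd of z-scores:", round(statistics.stdev(z), 10))
```

The z-scores keep the same order and even spacing as the original values, so the shape is unchanged, while the mean becomes 0 and the standard deviation becomes 1 (up to floating-point rounding).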

Student A: Accounting major with a job offer of $35,000.
Student B: Advertising major with a job offer of $33,000.
Population Statistics for Context:
- Accounting students: Mean ($\mu$) = $34,500, Standard deviation ($\sigma$) = $1,500.
- Advertising students: Mean ($\mu$) = $32,500, Standard deviation ($\sigma$) = $1,000.
- Analysis: Compute the z-score of each offer against the mean and standard deviation for that student's own major, then compare the z-scores to see which student received the better offer relative to peers.
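
A sketch of this comparison, using the figures above; the function name `z_score` and the dictionary layout are illustrative choices.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from mean."""
    return (value - mean) / sd

offers = {
    "Student A (accounting)": {"offer": 35_000, "mean": 34_500, "sd": 1_500},
    "Student B (advertising)": {"offer": 33_000, "mean": 32_500, "sd": 1_000},
}

offer_z = {
    student: z_score(info["offer"], info["mean"], info["sd"])
    for student, info in offers.items()
}

for student, z in offer_z.items():
    print(f"{student}: z = {z:.2f}")

# The student whose offer has the larger z-score did better relative to peers.
better = max(offer_z, key=offer_z.get)
print("Higher relative offer:", better)
```

Both offers are $500 above their major's mean, but Student A's z-score is about 0.33 while Student B's is 0.50. Student B's offer is therefore the stronger one relative to the major, even though it is lower in dollars.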

Understanding shifting and scaling is essential in statistics because these transformations change how measures of position and spread are interpreted in real-world settings.