Shifting and Scaling Data
Module 3 - Section 1: Shifting and Scaling Data
Presented by: Rosana Fok
Overview of Data Shifting and Scaling
This section covers the effects of shifting and scaling data on the measures of position, spread, and shape of the data sets.
Shifting Data
Definition of Shifting Data
Shifting data involves adding (or subtracting) a constant to every data value.
This operation results in the following effects:
Position Measures: Measures of position, such as the center (mean and median), percentiles, maximum, and minimum values will all increase (or decrease) by the constant added or subtracted.
Shape and Spread: The spread of the distribution (range, interquartile range (IQR), and standard deviation) and its shape remain unchanged.
Mathematical Representation
If $y_{\text{new}} = y_{\text{original}} + c$ for each observation, the effects are:
Center and Position Measures:
Center new $= \text{Center original} + c$
Position new $= \text{Position original} + c$
Spread and Shape Measures:
Spread new $= \text{Spread original}$
Shape new $= \text{Shape original}$
Example of Shifting Data
Data Set Provided:
$y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.
Constant Added: $c = 2$.
Summary Statistics Calculation:
Original Statistics:
n = 5
Mean = 3
Variance = 2.5
Std. Dev. = 1.581138
Median = 3
Range = 4 (from 1 to 5)
Min = 1
Max = 5
Q1 = 2
Q3 = 4
Shifting the Data:
New Data Set: $y + 2$ (i.e., $y_1 + 2, y_2 + 2, y_3 + 2, y_4 + 2, y_5 + 2$)
New Statistics:
n = 5
Mean = 5
Variance = 2.5
Std. Dev. = 1.581138
Median = 5
Range = 4
Min = 3
Max = 7
Q1 = 4
Q3 = 6
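The shifting example can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is hypothetical, and `method="inclusive"` is an assumption chosen because it reproduces the quartiles listed in these notes (other quartile conventions give different values for Q1 and Q3).

```python
import statistics

def summarize(values):
    """Return the summary statistics listed in the shifting example."""
    # Assumption: the "inclusive" quartile method matches the notes (Q1 = 2, Q3 = 4).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "median": median,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }

original = [1, 2, 3, 4, 5]
c = 2  # constant added to every observation
shifted = [y + c for y in original]

before = summarize(original)
after = summarize(shifted)

# Print each statistic before and after the shift.
for name in before:
    print(f"{name:>8}: {before[name]:.6g} -> {after[name]:.6g}")
```

The position measures (mean, median, min, max, Q1, Q3) each increase by 2, while the variance, standard deviation, and range are the same before and after.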
Rescaling Data
Definition of Rescaling Data
Rescaling data involves multiplying (or dividing) all data values by a constant.
This action affects:
Measures of Position: All measures of position (mean, median, percentiles) are multiplied (or divided) by the same constant.
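Measures of Spread: All measures of spread (range, IQR, standard deviation) are multiplied (or divided) by the same constant. The variance, which is measured in squared units, is multiplied (or divided) by the square of the constant.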
Mathematical Representation
If $y_{\text{new}} = d \times y_{\text{original}}$ for each observation:
Center and Position Measures:
Center new $= d \times \text{Center original}$
Position new $= d \times \text{Position original}$
Spread and Shape Measures:
Spread new $= d \times \text{Spread original}$
Shape new $= \text{Shape original}$
Example of Rescaling Data
Data Set Provided:
$y_1 = 1, y_2 = 2, y_3 = 3, y_4 = 4, y_5 = 5$.
Constant Multiplied: $d = 2$.
Summary Statistics Calculation:
Original Statistics:
n = 5
Mean = 3
Variance = 2.5
Std. Dev. = 1.581138
Median = 3
Range = 4
Min = 1
Max = 5
Q1 = 2
Q3 = 4
After Rescaling:
New Data Set: $y \times 2$ (i.e., $y_1 \times 2, y_2 \times 2, y_3 \times 2, y_4 \times 2, y_5 \times 2$)
New Statistics:
n = 5
Mean = 6
Variance = 10
Std. Dev. = 3.1622777
Median = 6
Range = 8
Min = 2
Max = 10
Q1 = 4
Q3 = 8
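One way to see the pattern in the rescaling example is to compute the ratio of each spread measure after and before multiplying by $d = 2$. This is an illustrative sketch only; the helper name `spread` and the `"inclusive"` quartile method are assumptions made here, not part of the original notes.

```python
import statistics

def spread(values):
    """Range, IQR, sample standard deviation, and sample variance."""
    # Assumption: "inclusive" quartiles, which match the Q1 and Q3 in the notes.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(values),
        "variance": statistics.variance(values),
    }

original = [1, 2, 3, 4, 5]
d = 2  # constant multiplying every observation
rescaled = [d * y for y in original]

before = spread(original)
after = spread(rescaled)

# Ratio of each spread measure: new value divided by original value.
ratios = {name: after[name] / before[name] for name in before}
print(ratios)
```

The range, IQR, and standard deviation ratios come out at 2 (up to floating-point rounding), while the variance ratio is 4, which is why the variance goes from 2.5 to 10 in the example.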
Summary of Shifting and Rescaling Data
Combined Effects
For transformations such as $y_{\text{new}} = d \times y_{\text{original}} + c$:
Center and Position Measures:
Center new $= d \times \text{Center original} + c$
Position new $= d \times \text{Position original} + c$
Spread and Shape Measures:
Spread new $= d \times \text{Spread original}$
Shape new $= \text{Shape original}$
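A short derivation shows where the combined rules come from for the mean and the standard deviation. The symbols $\bar{y}$ (sample mean), $s$ (sample standard deviation), and $n$ (number of observations) are notation introduced here for illustration and do not appear in the original notes.

```latex
\begin{align*}
\bar{y}_{\text{new}}
  &= \frac{1}{n}\sum_{i=1}^{n} (d\,y_i + c)
   = d\,\bar{y} + c \\
s_{\text{new}}^2
  &= \frac{1}{n-1}\sum_{i=1}^{n} \bigl((d\,y_i + c) - (d\,\bar{y} + c)\bigr)^2
   = d^2\,s^2 \\
s_{\text{new}}
  &= |d|\,s
\end{align*}
```

The constant $c$ cancels inside the squared deviations, so it has no effect on spread. For a positive multiplier, $|d| = d$, which agrees with the rule stated above.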
Example of Application in Real-World Context
Context: Students in an introductory statistics class report the number of credit hours they are taking.
Given Charges: College charges $73 per credit hour plus a flat fee of $35 per quarter.
Cost Calculation: For 12 credit hours, payment calculation:
Cost = $35 + 12(73) = $911
Summary Statistics Provided (credit hours):
Mean = 16.65
Standard Deviation = 2.96
Minimum = 5
Q1 = 15
Median = 16
Q3 = 19
Maximum = 28
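Treating the summary statistics above as credit hours, the fee is a shift-and-rescale of the form fee = 73 × hours + 35, so the fee statistics can be sketched as below. The variable names are hypothetical, and the interpretation of the listed statistics as credit hours is an assumption based on the context.

```python
# Summary statistics from the notes, assumed to be in credit hours.
credit_hours = {
    "mean": 16.65,
    "min": 5,
    "q1": 15,
    "median": 16,
    "q3": 19,
    "max": 28,
}
credit_hours_sd = 2.96

RATE = 73      # dollars per credit hour (the multiplier d)
FLAT_FEE = 35  # dollars per quarter (the shift c)

# Position measures: multiply by the rate, then add the flat fee.
fees = {name: RATE * value + FLAT_FEE for name, value in credit_hours.items()}

# Spread measures: multiply by the rate only; the flat fee cancels out.
fee_sd = RATE * credit_hours_sd
fee_iqr = fees["q3"] - fees["q1"]
fee_range = fees["max"] - fees["min"]

for name, value in fees.items():
    print(f"{name:>6} fee: {value:,.2f}")
print(f"std. dev.: {fee_sd:,.2f}")
print(f"IQR: {fee_iqr:,.2f}, range: {fee_range:,.2f}")
```

Under these assumptions the mean fee is about 1,250.45 dollars and the standard deviation is about 216.08 dollars; the IQR (292) and range (1,679) are 73 times their credit-hour counterparts.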
The Standard Deviation as a Ruler
Concept Explanation
The distance of an observation from the mean, measured in standard deviations, indicates the observation's relative standing compared to others in the sample.
It is particularly useful in standardized testing scenarios to evaluate where an individual's score stands.
z-score (Standardized Value)
Definition of z-score
A z-score is a measure of relative standing expressing how many standard deviations an observation lies from the mean.
Formula: $z = \frac{y - \mu}{\sigma}$
Where $y$ is the observation, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
It allows for comparison of values measured on different scales or from different populations.
Key Points about Standardization:
Standardizing involves shifting data by subtracting the mean and rescaling via dividing by the standard deviation.
Standardized values help compare observations from disparate data sets, because each value is expressed as a number of standard deviations from its own mean.
Example of Calculating z-scores in a Class
Class Statistics: Average score on the final is 75% with a standard deviation of 5%.
Individual Scores:
Amy: 80%
Calculation: How many standard deviations above the mean?
Mandy: 70%
Calculation: How many standard deviations below the mean?
Judy: 75%
Calculation: How many standard deviations from the mean?
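The three calculations follow directly from the z-score formula with a mean of 75 and a standard deviation of 5. A minimal sketch, where the function name `z_score` is a hypothetical helper:

```python
def z_score(y, mean, std_dev):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / std_dev

CLASS_MEAN = 75  # average final score, in percent
CLASS_SD = 5     # standard deviation, in percent

scores = {"Amy": 80, "Mandy": 70, "Judy": 75}
z_scores = {
    name: z_score(score, CLASS_MEAN, CLASS_SD)
    for name, score in scores.items()
}
print(z_scores)  # {'Amy': 1.0, 'Mandy': -1.0, 'Judy': 0.0}
```

Amy is one standard deviation above the mean, Mandy is one standard deviation below it, and Judy is exactly at the mean.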
Summary of z-scores
Types of z-scores
Positive z-score: Indicates above-average performance.
Negative z-score: Indicates below-average performance.
z-score of 0: Indicates performance exactly at the mean.
Summary for Standardization
Effects of Standardizing Data
Standardization does not change the shape of a distribution.
It modifies the center by making the mean 0.
It alters the spread by making the standard deviation 1.
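These effects can be illustrated on the small data set used earlier (1 through 5). The sketch below standardizes with the sample mean and sample standard deviation, which is an assumption made for illustration; the formula in these notes is written with the population values.

```python
import statistics

data = [1, 2, 3, 4, 5]
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

# Shift by the mean, then rescale by the standard deviation.
z = [(y - mean) / sd for y in data]

z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
print(z)
print(z_mean, z_sd)
```

The standardized values have mean 0 and standard deviation 1 (up to floating-point rounding), and they keep the same order and even spacing as the original values, so the shape is unchanged.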
Job Offer Example Comparing Two Students
Student A: Accounting major with a job offer of $35,000.
Student B: Advertising major with a job offer of $33,000.
Population Statistics for Context:
Accounting students: Mean ($\mu$) = $34,500, Standard deviation ($\sigma$) = $1,500.
Advertising students: Mean ($\mu$) = $32,500, Standard deviation ($\sigma$) = $1,000.
Analysis: Determine which student may feel happier by comparing the z-scores of their job offers, each computed relative to the statistics for that student's own major.
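The comparison can be sketched by computing each offer's z-score within its own major. The dictionary layout and names below are hypothetical; the numbers are those given in the example.

```python
offers = {
    "Student A (accounting)": {"offer": 35_000, "mean": 34_500, "sd": 1_500},
    "Student B (advertising)": {"offer": 33_000, "mean": 32_500, "sd": 1_000},
}

# z-score of each offer relative to the distribution for that major.
z = {
    who: (info["offer"] - info["mean"]) / info["sd"]
    for who, info in offers.items()
}

for who, value in z.items():
    print(f"{who}: z = {value:.2f}")

# The student whose offer is furthest above the mean for their major.
best = max(z, key=z.get)
print("Higher relative offer:", best)
```

Student A's offer is about 0.33 standard deviations above the accounting mean, while Student B's is 0.5 standard deviations above the advertising mean. Although Student A's offer is larger in dollars, Student B's offer is stronger relative to others in the same major.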
Conclusion
Awareness of shifting and scaling data is integral in statistics as it impacts interpretations of measures of position, spread, and their implications in real-world scenarios.