Standard Deviation and Variation Notes
Do Now
- Aim: Calculate the standard deviation of a data set to describe the variation of the data.
- Using a box and whisker plot, identify the values: minX, Q1, Med, Q3, maxX.
- Create a box plot for the data: 65, 75, 92, 84, 62, 96, 88, 79, 82.
Homework Check
- Review answers for packet pages 19-20.
- Example data:
- Median = 30
- Min = 67
- Q1 = 78
- Med (Q2) = 85
- Q3 = 90
- Max = 98
Variation
- Two data sets can have the same mean, median, and mode but appear very different due to the variation within the sets.
- Focus on understanding variation.
Standard Deviation
- Standard deviation: a measure of how far, on average, a data point lies from the mean of the data set.
- A small standard deviation means the data points tend to be very close to the mean (more consistent data).
- A large standard deviation means the values are more spread out.
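The idea of "average distance from the mean" can be sketched in a few lines of Python. This is only an illustration: the function name sample_std and the two small data sets are hypothetical and are not part of the notes. It uses the usual sample formula (divide the sum of squared distances by n - 1, then take the square root), which is the value a calculator reports as Sx.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation: sqrt(sum of squared distances from the mean / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    squared_distances = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_distances) / (n - 1))

# Hypothetical data sets (not from the notes); both have a mean of 10.
tight = [9, 10, 10, 11]    # values close to the mean
spread = [2, 8, 12, 18]    # values far from the mean

print(round(sample_std(tight), 1))
print(round(sample_std(spread), 1))

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(sample_std(spread), statistics.stdev(spread)))
```

Both lists have the same mean, yet the second one produces a much larger standard deviation, which matches the description of "more spread out" data.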
Calculating Standard Deviation Using a Calculator
- Press STAT, then 1: Edit. Enter the data into list L1.
- Press STAT, go to CALC, and select 1: 1-Var Stats.
- Press ENTER twice.
Example 1
- Use a calculator to find the standard deviation (Sx) of the two data sets.
- Round answers to the nearest tenth.
- Data Set #1: 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11
- Data Set #2: 5, 5, 6, 6, 7, 7, 8, 8, 9, 9
- The data set with the larger standard deviation has the more spread-out data.
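As a way to check the calculator work for Example 1, the same values can be computed with Python's standard statistics module. This is a sketch, not the method used in class; the variable names are made up for illustration. statistics.stdev divides by n - 1, so it corresponds to Sx.

```python
import statistics

# Data from Example 1.
data_set_1 = [3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11]
data_set_2 = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

# Sample standard deviation (Sx), rounded to the nearest tenth.
sx_1 = round(statistics.stdev(data_set_1), 1)
sx_2 = round(statistics.stdev(data_set_2), 1)

print("Data Set #1: mean =", statistics.mean(data_set_1), "Sx =", sx_1)  # Sx = 2.7
print("Data Set #2: mean =", statistics.mean(data_set_2), "Sx =", sx_2)  # Sx = 1.5
```

Both sets have a mean of 7, but Data Set #1 has the larger Sx, so its values are more spread out.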
Example 2
- A farm studies the weight of baby chickens after one week.
- The weights (in ounces) of 20 chicks are:
- 2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
- Find the mean, interquartile range, and standard deviation.
- Round the answers to the nearest tenth and include units.
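The three values asked for in Example 2 can be checked with a short Python sketch. The helper quartiles is hypothetical. It assumes the median-of-halves rule for Q1 and Q3 (split the sorted data in half, leave out the middle value when the count is odd, and take the median of each half), which is the rule commonly used by graphing calculators; other software may use a different quartile rule and give slightly different quartiles.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when the number of data points is odd.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

# Weights in ounces of the 20 chicks from Example 2.
weights = [2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3]

mean = round(statistics.mean(weights), 1)
q1, med, q3 = quartiles(weights)
iqr = q3 - q1
sx = round(statistics.stdev(weights), 1)

print("mean =", mean, "oz")   # 3.7 oz
print("IQR =", iqr, "oz")     # 2.5 oz
print("Sx =", sx, "oz")       # 1.8 oz
```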
Example 3
- A marketing company studies diversity in the age of soft drink consumers.
- Ages of people who prefer Soda A:
- 16, 16, 18, 18, 21, 22, 22, 25, 27, 28, 29, 36, 38, 40, 44
- Ages of people who prefer Soda B:
- 18, 18, 19, 19, 20, 22, 22, 23, 25, 25, 26, 27, 28, 29, 30
- (a) Explain why standard deviation is better than the mean for measuring age diversity.
- (b) Determine which soda has greater age diversity and explain the choice.
- (c) Use a calculator to find the sample standard deviation (Sx) for both data sets and round to the nearest tenth. Check if this result supports the choice from (b).
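Part (c) can also be checked in Python. This sketch uses statistics.stdev (the n - 1 version, matching Sx); the variable names are illustrative only.

```python
import statistics

# Ages from Example 3.
soda_a = [16, 16, 18, 18, 21, 22, 22, 25, 27, 28, 29, 36, 38, 40, 44]
soda_b = [18, 18, 19, 19, 20, 22, 22, 23, 25, 25, 26, 27, 28, 29, 30]

# Sample standard deviation (Sx), rounded to the nearest tenth.
sx_a = round(statistics.stdev(soda_a), 1)
sx_b = round(statistics.stdev(soda_b), 1)

print("Soda A: Sx =", sx_a)  # 9.1
print("Soda B: Sx =", sx_b)  # 4.1
```

The larger Sx for Soda A indicates that the ages of its consumers are more spread out.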
Example 4
- Which data set has a standard deviation closest to zero? (Answer without a calculator.)
- (1) {-5, -2, -1, 0, 1, 2, 5}
- (2) {5, 8, 10, 16, 20}
- (3) {11, 11, 12, 13, 13}
- (4) {3, 7, 11, 11, 11, 18}
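Example 4 is meant to be answered by inspection, but a by-eye answer can be confirmed afterwards with a short Python sketch. The dictionary and variable names are hypothetical; the data are the four choices above.

```python
import statistics

# The four answer choices from Example 4, keyed by choice number.
choices = {
    1: [-5, -2, -1, 0, 1, 2, 5],
    2: [5, 8, 10, 16, 20],
    3: [11, 11, 12, 13, 13],
    4: [3, 7, 11, 11, 11, 18],
}

# Sample standard deviation (Sx) of each choice.
sx_by_choice = {number: statistics.stdev(values) for number, values in choices.items()}

# The choice whose standard deviation is smallest, that is, closest to zero.
closest_to_zero = min(sx_by_choice, key=sx_by_choice.get)

for number, sx in sx_by_choice.items():
    print(number, round(sx, 1))
print("Closest to zero: choice", closest_to_zero)
```

The set whose values are packed most tightly around its mean is the one the code reports, which is the same reasoning used without a calculator.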
Example 5
- Home run data for the 16 batters with the most home runs in the 2005 MLB season:
- 51, 48, 47, 46, 45, 43, 41, 40, 40, 39, 42, 44, 46, 48, 49, 38
- Identify values for the data set.
Data Set Comparison
- Data Set A has a mean of 8.9 and a standard deviation of 1.
- Data Set B has a mean of 8.9 and a standard deviation of 2.
- The data set with the smaller standard deviation has values closer together.
Homework Questions
- Packet page 24, problems 3-5.
- Use your calculator to find the interquartile range (IQR) and sample standard deviation (Sx).
- Show the calculation for the IQR. Round non-integer values to the nearest tenth.
- (a) 4, 6, 8, 10, 15, 19, 22, 25
- (b) 3, 3, 4, 5, 5, 6, 6, 7, 7, 8
- Given a dot plot, determine the closest population standard deviation (σx).
- What is the IQR of the data set represented in the box plot?
- Which measure best represents the average distance of a data value from the mean?
- Which data set has the largest standard deviation?
Concept Review
- QUANTitative: deals with numbers
- QUALITative: a description or label
- An unbiased sample must be random
Types of Distribution (Shape)
- Symmetric (bell-shaped)
- Skewed (left or right)
Frequency Table
- Example: Ages and # of people
Box-and-Whisker Plot (Box Plot)
- Use a calculator to get the five-number summary: Min, Q1, Med, Q3, Max
- Enter the data in L1 (STAT > 1: Edit…)
- STAT > CALC > 1: 1-Var Stats
- Press ENTER three times
- IQR = Q3 - Q1
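The five-number summary and IQR can be reproduced in Python as a check on the calculator output. This sketch uses the Do Now data; the function name five_number_summary is hypothetical, and the quartiles assume the median-of-halves rule (the middle value is left out of both halves when the count is odd), which other software may not follow.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Leave the middle value out of both halves when n is odd.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

# Data from the Do Now box plot exercise.
scores = [65, 75, 92, 84, 62, 96, 88, 79, 82]

min_x, q1, med, q3, max_x = five_number_summary(scores)
iqr = q3 - q1

print(min_x, q1, med, q3, max_x)  # 62 70.0 82 90.0 96
print("IQR =", iqr)               # IQR = 20.0
```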
Standard Deviation
- Sx: sample standard deviation
- σx: population standard deviation
- x̄: mean
- Small SD: values are close to the mean (less variation)
- Large SD: values are more spread out from the mean (greater variation)
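The difference between the sample and population standard deviations can be seen by computing both for the same list. This is a sketch using Data Set #2 from Example 1; the variable names are illustrative. In Python's statistics module, stdev divides by n - 1 (like Sx) and pstdev divides by n (like σx).

```python
import statistics

# Data Set #2 from Example 1.
data = [5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

x_bar = statistics.mean(data)       # mean
sx = statistics.stdev(data)         # sample standard deviation, divides by n - 1
sigma_x = statistics.pstdev(data)   # population standard deviation, divides by n

print("mean =", x_bar)               # 7
print("Sx =", round(sx, 1))          # 1.5
print("σx =", round(sigma_x, 1))     # 1.4
```

Because Sx divides by the smaller number n - 1, it comes out slightly larger than σx for the same data.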