Exam Notes on Standard Deviation, Distributions, and Z-Scores
General Properties of Standard Deviation
- Measures spread (variability) by looking at how far the observations are from their mean.
- S=0 indicates no spread; all scores are the same.
- S increases as scores become more spread out (further from the mean), indicating more variation.
- S, like the mean, is influenced by extreme values (non-resistant).
- Interpretation template: [context] typically varies by about [SD] from the mean of [mean].
- Example: The height of power forwards in the NBA typically varies by 1.52 inches from the mean of 80.1 inches.
Distributions and Skewness
- Symmetric Distribution:
- Mean and median are approximately equal.
- Skewed Right Distribution:
- The mean is pulled towards the extreme values in the right tail.
- Mean > Median.
- Center: Median.
- Spread: Range or IQR.
- Skewed Left Distribution:
- Mean < Median.
- Center: Median.
- Spread: Range or IQR.
- The median is a better representation of center in skewed distributions.
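The mean/median relationship under skew can be checked on small made-up data sets. The sketch below uses hypothetical values (they are not from these notes) and Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [2, 3, 3, 4, 5, 6, 40]
# Hypothetical left-skewed data: most values are large, one is very small.
left_skewed = [1, 35, 36, 37, 38, 38, 39]

right_mean, right_median = mean(right_skewed), median(right_skewed)
left_mean, left_median = mean(left_skewed), median(left_skewed)

print(right_mean, right_median)  # 9 4   -> mean > median
print(left_mean, left_median)    # 32 37 -> mean < median
```

In both cases the single extreme value drags the mean towards the tail while the median barely moves, which is why the median is preferred as the center for skewed data.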
Data Types and Visualizations
- Categorical Data:
- Quantitative Data:
- Dot plots, stem plots, histograms.
Frequency Tables
- Counts, relative frequency, marginal frequency, joint frequency, and conditional frequencies.
Measures of Center and Spread
- Mean: Symmetrical data.
- Standard Deviation: Spread, typical distance from the mean.
Calculations and Interpretations
- Five-number summary.
- Determining outliers.
- Describing and comparing distributions.
- Median is the 50th percentile.
- When outliers are removed, the mean decreases if the outlier was larger than the mean and increases if the outlier was smaller than the mean.
- If a value equal to the mean is added, the mean does not change and the SD decreases slightly (the sum of squared deviations is unchanged but n is larger).
- If a value higher than but close to the mean is added, the spread decreases and the mean increases, because the new value is above the current mean.
- If an outlier far above the mean is added, the mean increases because it is pulled towards the outlier, and the SD increases as well, since neither measure is resistant.
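The effect of adding a single value can be checked numerically. The sketch below uses a hypothetical data set with mean 14 and the sample SD (n−1 divisor) from Python's `statistics` module; the added values 14, 15, and 60 are chosen only for illustration.

```python
from statistics import mean, stdev

data = [10, 12, 14, 16, 18]  # hypothetical scores; the mean is 14

def summarize(values):
    """Return (mean, sample SD) for a list of values."""
    return mean(values), stdev(values)

base_mean, base_sd = summarize(data)
same_mean, same_sd = summarize(data + [14])  # new value equal to the mean
near_mean, near_sd = summarize(data + [15])  # slightly above the mean
far_mean, far_sd = summarize(data + [60])    # outlier far above the mean

for label, m, s in [("original", base_mean, base_sd),
                    ("add 14", same_mean, same_sd),
                    ("add 15", near_mean, near_sd),
                    ("add 60", far_mean, far_sd)]:
    print(f"{label:9s} mean={m:.2f} sd={s:.2f}")
```

With these numbers, adding 14 leaves the mean at 14 and lowers the sample SD a little, adding 15 raises the mean slightly and still lowers the SD, and adding 60 raises both the mean and the SD sharply.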
Interquartile Range (IQR)
- Identifies the middle 50% of the data.
- IQR=Q3−Q1
- Outlier Identification:
- Greater than Q3+1.5×IQR
- Less than Q1−1.5×IQR
Example
- Data set: (1,10,20,30,40,50,120)
- Median (Q2) = 30
- Q1 = 10
- Q3 = 50
- IQR=50−10=40
- Lower outlier boundary: 10−1.5(40)=−50
- Higher outlier boundary: 50+1.5(40)=110
- 120 is an outlier.
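The worked example above can be reproduced in a few lines. This sketch assumes quartiles are found as the medians of the lower and upper halves, leaving out the middle value when n is odd, which matches the Q1 and Q3 given above; other software may use a different quartile convention. The helper name `quartiles` is made up for this example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumes at least two values. The middle value is excluded
    from both halves when the number of values is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return median(lower), median(ordered), median(upper)

data = [1, 10, 20, 30, 40, 50, 120]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)           # 10 30 50 40
print(lower_fence, upper_fence)  # -50.0 110.0
print(outliers)                  # [120]
```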
Five-Number Summary
- Minimum, Q1, Median (Q2), Q3, Maximum.
- Displayed on boxplots.
- Include outliers.
- Needs title and label.
Statistical Calculations
- Using stats plot in calculator to generate the five-number summary.
- Accessing mean ($\bar{x}$) using calculator functions.
- The n−1 divisor is used when calculating the variance and SD of a sample (rather than a population).
Parameter vs. Statistic
- Parameter: Describes some characteristic of a population.
- Statistic: Describes some characteristic of a sample.
Variance
- Average squared deviation from the mean.
- $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
- Where:
- x is each value in the data set.
- $\bar{x}$ is the mean.
- n is the number of values in the data set.
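As a quick check of the formula, the sketch below computes the sample variance step by step for a hypothetical five-value sample and compares it with `statistics.variance`, which also divides by n−1.

```python
from statistics import mean, variance

data = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(data)
x_bar = mean(data)                                  # 5

deviations = [x - x_bar for x in data]              # -3, -1, -1, 1, 4
squared_deviations = [d ** 2 for d in deviations]   # 9, 1, 1, 1, 16

s_squared = sum(squared_deviations) / (n - 1)       # 28 / 4

print(sum(deviations))   # 0  (raw deviations always cancel)
print(s_squared)         # 7.0
print(variance(data))    # same value from the library
```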
Standard Deviation
- Square root of the average squared deviation from the mean (square root of variance).
- Deviations are squared because raw deviations from the mean always sum to zero, which would wrongly suggest no variability.
- Expressed in the same units as the original data.
Standard Deviation Types
- $s_x$ and $\sigma_x$ are the standard deviations for samples and populations, respectively.
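The two standard deviations differ only in the divisor. A minimal sketch with a hypothetical data set, using `stdev` (divides by n−1) and `pstdev` (divides by n) from Python's `statistics` module:

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 6, 9]  # hypothetical values; squared deviations sum to 28

s_x = stdev(data)       # sample SD: sqrt(28 / 4)
sigma_x = pstdev(data)  # population SD: sqrt(28 / 5)

print(round(s_x, 3))      # 2.646
print(round(sigma_x, 3))  # 2.366
```

The sample SD is always a little larger than the population SD computed from the same values, and the gap shrinks as n grows.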
Percentiles and Relative Location
- The pth percentile of a distribution is the value with p percent of the observations less than or equal to it.
- Example: If Jenny is at the 88th percentile in a class's test score, it means that 88% of scores are below or equal to Jenny's score.
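The "less than or equal to" definition translates directly into a count. The class scores below are hypothetical and the function name `percentile_of` is made up for this sketch.

```python
def percentile_of(score, scores):
    """Percent of observations less than or equal to `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

class_scores = [55, 62, 68, 70, 74, 77, 81, 85, 90, 96]  # hypothetical class

print(percentile_of(90, class_scores))  # 90.0 -> a score of 90 is at the 90th percentile
print(percentile_of(74, class_scores))  # 50.0 -> a score of 74 is at the 50th percentile
```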
Cumulative Relative Frequency Graph
- Displays the cumulative relative frequency of each class of a frequency distribution.
Z-Scores
- How many standard deviations from the mean an observation falls and in what direction.
- $z = \frac{x - \mu}{\sigma}$
- x is the observed value.
- μ is the mean.
- σ is the standard deviation.
- Units of measure for a z-score are standard deviations (SD above or below the mean).
- Adding or subtracting a constant:
- Affects measures of center and location (mean, median, quartiles, percentiles).
- Does not change shape and measures of spread (range, IQR, SD).
- Multiplying or dividing each observation by the same number:
- Affects measures of center and location.
- Affects measures of spread.
- Does not change the shape of the distribution.
Percentiles
- Measure of relative location, described as "at" a certain percentile, not "in".
Standardized Score
- Converting to a comparable number.
- Number of standard deviations above/below the mean.
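A standardized score is one subtraction and one division. The sketch below reuses the mean (80.1 inches) and SD (1.52 inches) from the NBA power forward example; the two player heights are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 80.1, 1.52          # mean and SD from the NBA height example

tall_z = z_score(83.14, mu, sigma)   # hypothetical player, 83.14 inches
short_z = z_score(78.58, mu, sigma)  # hypothetical player, 78.58 inches

print(round(tall_z, 2))   # 2.0  -> 2 SD above the mean
print(round(short_z, 2))  # -1.0 -> 1 SD below the mean
```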
Effects of Transformations (Summary)
- Adding/subtracting a constant a:
- Shape: Unchanged
- Center: +a
- Spread: Unchanged
- Multiplying/dividing by a constant b:
- Shape: Unchanged
- Center: ×/÷b
- Spread: ×/÷b
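These effects can be seen on a small hypothetical data set. The constants a = 10 and b = 2 are arbitrary choices for this sketch.

```python
from statistics import mean, median, stdev

data = [3, 5, 7, 9, 16]  # hypothetical; mean 8, median 7, sample SD 5
a, b = 10, 2

shifted = [x + a for x in data]  # add a constant
scaled = [x * b for x in data]   # multiply by a constant

def summary(values):
    """Return (mean, median, range, sample SD)."""
    return mean(values), median(values), max(values) - min(values), stdev(values)

print(summary(data))     # mean 8,  median 7,  range 13, SD 5
print(summary(shifted))  # mean 18, median 17, range 13, SD 5  (center +a, spread unchanged)
print(summary(scaled))   # mean 16, median 14, range 26, SD 10 (center and spread both ×b)
```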
Density Curves
- Always on or above the horizontal axis.
- Has area exactly 1 underneath it regardless of mean and SD.
- Describes the overall pattern of a distribution.
- The area under the curve and above any interval of values on the x-axis is the proportion of all observations that fall in the interval.
- Its mean and SD may differ slightly from those of the actual data but should be close; a density curve is an idealized description.
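To make the area properties concrete, the sketch below uses a hypothetical triangular density (height x/2 on the interval from 0 to 2) and approximates areas with a simple midpoint rule. Neither the curve nor the helper `area_under` comes from these notes.

```python
def density(x):
    """Hypothetical density curve: rises linearly from 0 at x=0 to 1 at x=2."""
    return x / 2 if 0 <= x <= 2 else 0.0

def area_under(f, lo, hi, steps=1000):
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under(density, 0, 2)   # whole curve
lower_half = area_under(density, 0, 1)   # interval from 0 to 1

print(round(total_area, 4))  # 1.0  -> total area is 1
print(round(lower_half, 4))  # 0.25 -> 25% of observations fall between 0 and 1
```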
Normal Distribution
- Always symmetric, single-peaked, and bell-shaped.
- Any specific normal curve is completely described by giving its mean μ and standard deviation σ: N(μ,σ)
The Empirical Rule (68-95-99.7 Rule)
- About 68% of data falls within 1 standard deviation of the mean.
- About 95% of data falls within 2 standard deviations of the mean.
- About 99.7% of data falls within 3 standard deviations of the mean.
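The three percentages can be checked against the standard normal curve. This sketch uses `NormalDist` from Python's `statistics` module, whose `cdf` method gives the area to the left of a value.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, SD 1

# Area between -k and +k standard deviations for k = 1, 2, 3
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```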
Standard Normal Distribution
Calculations with Normal Distributions
- To find percentile for a score:
- Calculate the z-score.
- Find the area to the left of the z-score on the standard normal table.
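The two steps can be sketched in code, with `NormalDist().cdf` standing in for the standard normal table. The example assumes, for illustration only, that power forward heights follow N(80.1, 1.52), and the 82-inch height is hypothetical.

```python
from statistics import NormalDist

mu, sigma = 80.1, 1.52   # mean and SD from the NBA height example
height = 82.0            # hypothetical player height in inches

z = (height - mu) / sigma           # step 1: calculate the z-score
area_left = NormalDist().cdf(z)     # step 2: area to the left of z
percentile = round(100 * area_left)

print(round(z, 2))          # 1.25
print(round(area_left, 4))  # 0.8944
print(percentile)           # 89 -> about the 89th percentile
```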
Assessing Normality
- If a normal probability plot (scores plotted against expected normal scores) is roughly a straight line, the data can be considered approximately normal.
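Without a plotting tool, one rough way to judge straightness is the correlation between the sorted data and their expected normal scores. The sketch below makes several assumptions that are not from these notes: the plotting positions (i − 0.5)/n, the use of correlation as a stand-in for looking at the plot, and both hypothetical data sets.

```python
from statistics import NormalDist, mean

def expected_normal_scores(n):
    """Expected z-scores for n ordered observations, using (i - 0.5) / n positions."""
    standard = NormalDist()
    return [standard.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, used here as a rough measure of straightness."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def straightness(data):
    """Correlation between the sorted data and their expected normal scores."""
    ordered = sorted(data)
    return correlation(expected_normal_scores(len(ordered)), ordered)

bell_shaped = [61, 64, 66, 67, 68, 69, 70, 71, 73, 76]  # hypothetical, roughly symmetric
skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]               # hypothetical, strongly right-skewed

r_bell = straightness(bell_shaped)
r_skewed = straightness(skewed)
print(round(r_bell, 3), round(r_skewed, 3))
```

The roughly symmetric data give a correlation very close to 1, while the skewed data give a noticeably lower value. A plot is still the better tool, since it shows where the points bend away from the line.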
Calculations and Z-Scores
- Always draw a normal curve.
- Calculate the z-score and use it to look up the corresponding area in the standard normal table.