Exam Notes on Standard Deviation, Distributions, and Z-Scores

General Properties of Standard Deviation

  • Measures spread (variability) by looking at how far the observations are from their mean.
  • $S = 0$ indicates no spread; all scores are the same.
  • $S$ increases as scores become more spread out (farther from the mean), indicating more variation.
  • $S$, like the mean, is influenced by extreme values (non-resistant).
  • Interpretation in context: the values typically vary by about one standard deviation (SD) from the mean.
    • Example: The height of power forwards in the NBA typically varies by 1.52 inches from the mean of 80.1 inches.
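
These properties can be checked numerically. The sketch below uses Python's standard `statistics` module; the four data sets are made up purely for illustration and do not come from the notes.

```python
from statistics import stdev

# Hypothetical data sets, chosen only to illustrate the properties.
identical = [5, 5, 5, 5, 5]        # every score is the same
tight = [4, 5, 5, 5, 6]            # scores close to the mean
wide = [1, 3, 5, 7, 9]             # scores far from the mean
with_outlier = [4, 5, 5, 5, 60]    # one extreme value

sd_identical = stdev(identical)    # sample SD (divides by n - 1)
sd_tight = stdev(tight)
sd_wide = stdev(wide)
sd_outlier = stdev(with_outlier)

print(sd_identical, sd_tight, sd_wide, sd_outlier)
```

The SD is 0 for the identical scores, grows as the scores spread out, and jumps when a single extreme value is present, which is what "non-resistant" means.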

Distributions and Skewness

  • Symmetric Distribution:
    • Mean and median are approximately equal.
  • Skewed Right Distribution:
    • The mean is pulled toward the extreme high values in the right tail.
    • Mean > Median.
    • Center: Median.
    • Spread: Range or IQR.
  • Skewed Left Distribution:
    • Mean < Median.
    • Center: Median.
    • Spread: Range or IQR.
    • The median is a better representation of center in skewed distributions.
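
The relationship between mean and median can be seen on small examples. This is a minimal sketch with invented data sets (not from the notes), using the standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # long tail to the right
left_skewed = [-20, 1, 2, 2, 2, 3, 3, 4]    # long tail to the left

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data))
```

In the right-skewed set the mean (4.75) sits above the median (3); in the left-skewed set the mean (-0.375) sits below the median (2). The median barely moves, which is why it is preferred as the center for skewed data.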

Data Types and Visualizations

  • Categorical Data:
    • Pie charts, bar graphs.
  • Quantitative Data:
    • Dot plots, stem plots, histograms.

Frequency Tables

  • Counts, relative frequency, marginal frequency, joint frequency, and conditional frequencies.

Measures of Center and Spread

  • Mean: Center; appropriate for roughly symmetric data.
  • Standard Deviation: Spread, typical distance from the mean.

Calculations and Interpretations

  • Five-number summary.
  • Determining outliers.
  • Describing and comparing distributions.

Median and Outliers

  • Median is the 50th percentile.
  • When outliers are removed, the mean decreases if the outlier was larger than the mean and increases if the outlier was smaller than the mean.
  • If a value equal to the mean is added, the mean does not change. The sum of squared deviations also stays the same, but $n$ increases, so the SD decreases slightly.
  • Adding a value slightly above the mean increases the mean (the new value is higher than the current mean) and decreases the SD (the new value is closer to the mean than a typical observation).
  • Adding an outlier far from the mean pulls the mean toward the outlier (up for a high outlier, down for a low one) and increases the SD, because neither measure is resistant.
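
A quick numerical check of how added values affect the mean and SD. The data set is hypothetical, and the SD is the sample SD (dividing by $n - 1$). With that formula, adding a value equal to the mean leaves the sum of squared deviations unchanged while $n$ grows, so the computed SD shrinks slightly.

```python
from statistics import mean, stdev

data = [10, 12, 14, 16, 18]    # hypothetical scores; mean is 14

def summarize(values):
    """Return (mean, sample SD) for a list of values."""
    return mean(values), stdev(values)

base_mean, base_sd = summarize(data)
same_mean, same_sd = summarize(data + [14])    # value equal to the mean
near_mean, near_sd = summarize(data + [15])    # slightly above the mean
far_mean, far_sd = summarize(data + [100])     # outlier far above the mean

print("original:     ", base_mean, round(base_sd, 3))
print("add 14 (mean):", same_mean, round(same_sd, 3))
print("add 15 (near):", round(near_mean, 3), round(near_sd, 3))
print("add 100 (far):", round(far_mean, 3), round(far_sd, 3))
```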

Interquartile Range (IQR)

  • Identifies the middle 50% of the data.
  • $IQR = Q_3 - Q_1$
  • Outlier Identification:
    • Greater than $Q_3 + 1.5 \times IQR$
    • Less than $Q_1 - 1.5 \times IQR$

Example

  • Data set: $(1, 10, 20, 30, 40, 50, 120)$
  • Median (Q2) = 30
  • Q1 = 10
  • Q3 = 50
  • $IQR = 50 - 10 = 40$
  • Lower outlier boundary: $10 - 1.5(40) = -50$
  • Higher outlier boundary: $50 + 1.5(40) = 110$
  • 120 is an outlier.
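
The example above can be reproduced in code. This sketch assumes the quartile convention the example appears to use (when $n$ is odd, the median is left out of both halves); the helper name `quartiles` is made up for illustration, and other software may use a different quartile rule.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3), leaving the median out of both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [1, 10, 20, 30, 40, 50, 120]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
# prints: 10 30 50 40 -50.0 110.0 [120]
```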

Five-Number Summary

  • Minimum, Q1, Median (Q2), Q3, Maximum.
  • Displayed on boxplots.
  • A boxplot should mark any outliers individually.
  • A boxplot needs a title and a labeled axis.

Statistical Calculations

  • Using stats plot in calculator to generate the five-number summary.
  • Accessing mean ($\bar{x}$) using calculator functions.
  • $n - 1$ is the divisor used when calculating the variance and SD of a sample.

Parameter vs. Statistic

  • Parameter: Describes some characteristic of a population.
  • Statistic: Describes some characteristic of a sample.

Variance

  • Average squared deviation from the mean.
  • $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
    • Where:
      • $x$ is each value in the data set.
      • $\bar{x}$ is the mean.
      • $n$ is the number of values in the data set.
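
The formula can be worked step by step. The sketch below uses a hypothetical data set and a hand-written helper, `sample_variance` (a name chosen for illustration), and compares it against `statistics.variance`, which also divides by $n - 1$.

```python
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)

data = [2, 4, 4, 6, 9]    # hypothetical values; mean is 5

# Deviations are -3, -1, -1, 1, 4; their squares sum to 28; 28 / 4 = 7.
print(sample_variance(data))
print(variance(data))
```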

Standard Deviation

  • Square root of the average squared deviation from the mean (square root of variance).
  • If the deviations were not squared, the positive and negative deviations would cancel and sum to zero, falsely suggesting no variability.
  • Expressed in the same units as the original data.
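
A small check of why the deviations are squared before averaging, and of the square-root step. The data set is hypothetical; `stdev` from the standard library is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 6, 9]                     # hypothetical values; mean is 5
x_bar = mean(data)
deviations = [x - x_bar for x in data]     # -3, -1, -1, 1, 4

sum_of_deviations = sum(deviations)        # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)
sd = sqrt(sum_of_squares / (len(data) - 1))

print(sum_of_deviations)    # 0, no matter how spread out the data are
print(sd, stdev(data))      # the two SD values agree
```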

Standard Deviation Types

  • $s_x$ and $\sigma_x$ are the standard deviations for samples and populations, respectively.

Percentiles and Relative Location

  • The $p^{th}$ percentile of a distribution is the value with $p$ percent of the observations less than or equal to it.
    • Example: If Jenny is at the 88th percentile of her class's test scores, 88% of the scores are less than or equal to Jenny's score.
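
The definition translates directly into a count. This is a minimal sketch: the function name `percentile_of` and the list of scores are hypothetical, and it follows the "less than or equal to" definition given above.

```python
def percentile_of(value, scores):
    """Percent of scores that are less than or equal to value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

# Hypothetical class test scores.
scores = [55, 62, 68, 70, 74, 77, 81, 85, 90, 96]

print(percentile_of(85, scores))    # 8 of 10 scores are <= 85, so 80.0
```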

Cumulative Relative Frequency Graph

  • Displays the cumulative relative frequency of each class of a frequency distribution.
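
The values plotted on such a graph can be computed from a frequency table. The classes and counts below are invented for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution of test scores.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
counts = [2, 3, 8, 5, 2]

total = sum(counts)
relative = [c / total for c in counts]
# Running total of counts up to and including each class, as a proportion.
cumulative = [running / total for running in accumulate(counts)]

for label, rel, cum in zip(classes, relative, cumulative):
    print(label, round(rel, 2), round(cum, 2))
```

The last cumulative relative frequency is always 1, since every observation falls in or below the final class.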

Z-Scores

  • How many standard deviations from the mean an observation falls and in what direction.
  • $z = \frac{x - \mu}{\sigma}$
    • $x$ is the observed value.
    • $\mu$ is the mean.
    • $\sigma$ is the standard deviation.
  • Units of measure for a z-score are standard deviations (SD above or below the mean).
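
A short sketch of the z-score formula. It reuses the mean (80.1 inches) and SD (1.52 inches) from the NBA power forward example earlier in these notes; the two player heights are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 80.1, 1.52    # mean and SD of heights, in inches

z_tall = z_score(83.14, mu, sigma)     # hypothetical player, about 2 SD above the mean
z_short = z_score(78.58, mu, sigma)    # hypothetical player, about 1 SD below the mean

print(round(z_tall, 2), round(z_short, 2))
```

The sign gives the direction (above or below the mean) and the magnitude gives the distance in standard deviations.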

Transformations

  • Adding or subtracting a constant:
    • Affects measures of center and location (mean, median, quartiles, percentiles).
    • Does not change shape and measures of spread (range, IQR, SD).
  • Multiplying or dividing each observation by the same number:
    • Affects measures of center and location.
    • Affects measures of spread.
    • Does not change the shape of the distribution.

Percentiles

  • Measure of relative location, described as "at" a certain percentile, not "in".

Standardized Score

  • Converting to a comparable number.
  • Number of standard deviations above/below the mean.

Standard Normal Distribution

  • Mean: 0
  • SD: 1

Transforming Data

  • Adding/subtracting a constant $a$:
    • Shape: Unchanged
    • Center: $+ a$
    • Spread: Unchanged
  • Multiplying/dividing by a constant $b$:
    • Shape: Unchanged
    • Center: $\times / \div b$
    • Spread: $\times / \div b$
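
These rules can be verified on a small data set. The data and the constants `a` and `b` are hypothetical; a positive `b` is assumed so that the spread scales by `b` itself.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 6, 9]    # hypothetical values
a, b = 10, 3              # hypothetical constants (b > 0)

shifted = [x + a for x in data]    # add a constant to every value
scaled = [x * b for x in data]     # multiply every value by a constant

print("original:", mean(data), median(data), round(stdev(data), 3))
print("shifted: ", mean(shifted), median(shifted), round(stdev(shifted), 3))
print("scaled:  ", mean(scaled), median(scaled), round(stdev(scaled), 3))
```

Shifting moves the mean and median by `a` and leaves the SD alone; scaling multiplies the mean, median, and SD by `b`.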

Density Curves

  • Always on or above the horizontal axis.
  • Has area exactly 1 underneath it regardless of mean and SD.
  • Describes the overall pattern of a distribution.
  • The area under the curve and above any interval of values on the x-axis is the proportion of all observations that fall in the interval.
  • Its mean and SD can differ slightly from those of the actual data, but they are close (a density curve is an idealized description).

Normal Distribution

  • Always symmetric, single-peaked, and bell-shaped.
  • Any specific normal curve is completely described by giving its mean $\mu$ and standard deviation $\sigma$: $N(\mu, \sigma)$

The Empirical Rule (68-95-99.7 Rule)

  • In a normal distribution, approximately 68% of the data fall within 1 standard deviation of the mean.
  • Approximately 95% of the data fall within 2 standard deviations of the mean.
  • Approximately 99.7% of the data fall within 3 standard deviations of the mean.
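
The three percentages can be recovered from the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = area_within(1)    # about 0.6827
within_2 = area_within(2)    # about 0.9545
within_3 = area_within(3)    # about 0.9973

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```

The rule's 68, 95, and 99.7 are rounded versions of these areas.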

Standard Normal Distribution

  • $N(0, 1)$

Calculations with Normal Distributions

  • To find percentile for a score:
    • Calculate the z-score.
    • Find the area to the left of the z-score on the standard normal table.
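
The two steps above can be sketched in code, with `NormalDist().cdf` standing in for the standard normal table. The example assumes, for illustration only, that the NBA power forward heights from earlier in the notes are approximately normal with mean 80.1 and SD 1.52 inches; the height 83.14 and the function name `normal_percentile` are hypothetical.

```python
from statistics import NormalDist

def normal_percentile(x, mu, sigma):
    """Return (z, percentile) for x, assuming the distribution is N(mu, sigma)."""
    z = (x - mu) / sigma               # step 1: calculate the z-score
    area_left = NormalDist().cdf(z)    # step 2: area to the left of z
    return z, 100 * area_left

z, percentile = normal_percentile(83.14, 80.1, 1.52)
print(round(z, 2), round(percentile, 1))
```

A z-score of about 2 corresponds to roughly the 97.7th percentile, consistent with the 68-95-99.7 rule (95% within 2 SD leaves about 2.5% in each tail).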

Assessing Normality

  • If a normal probability plot (the scores plotted against their expected normal scores) is approximately a straight line, the data can be considered approximately normal.

Calculations and Z-Scores

  • Always draw a normal curve.
  • Calculate z-scores and use them to look up the corresponding areas in the standard normal table.