Data Analysis Notes

Quantitative Data Analysis

Chapters 1, 2, 3, and 12

Dotplot, Stemplot, Boxplot, Histogram
  • When discussing the shape of data, use "approximately symmetrical" instead of "approximately normal."
Numerical Summary
  • Includes the minimum, $Q_1$, $Q_2$ (median), $Q_3$, and maximum values.
Outlier Rule
  • Any number less than $Q_1 - 1.5 \times IQR$ is considered an outlier.
  • Any number greater than $Q_3 + 1.5 \times IQR$ is considered an outlier.
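A minimal Python sketch of the five-number summary and the outlier rule. The data are made up, the function names are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves (other software may use a different quartile convention).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the overall median when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def find_outliers(values):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data
print(five_number_summary(scores))
print(find_outliers(scores))
```

Here 30 lies above the upper fence, so it is flagged as an outlier.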
Center
  • Mean
  • Median
Spread
  • Standard deviation
  • Variance
  • IQR (Interquartile Range)
  • Range
  • $\text{Variance} = (\text{Standard Deviation})^2$
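A short sketch of these spread measures using Python's standard library. The data are made up, and the sample versions (dividing by n - 1) are assumed.

```python
from statistics import stdev, variance

data = [3, 5, 7, 8, 12]  # made-up data

s = stdev(data)                      # sample standard deviation (divides by n - 1)
var = variance(data)                 # sample variance
data_range = max(data) - min(data)   # range = maximum - minimum

print(f"standard deviation = {s:.3f}")
print(f"variance           = {var:.3f}")
print(f"range              = {data_range}")

# The variance equals the standard deviation squared (up to rounding error).
print(abs(var - s ** 2) < 1e-9)
```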
Skewness
  • Right Skew: The right tail is longer, and the mean is pulled towards the right.
  • Left Skew: The left tail is longer, and the mean is pulled towards the left.
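A quick check of how skew pulls the mean away from the median, using two made-up data sets (the second is the mirror image of the first).

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # long right tail
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]   # long left tail

# Right skew: the mean is pulled above the median.
print(mean(right_skewed), median(right_skewed))

# Left skew: the mean is pulled below the median.
print(mean(left_skewed), median(left_skewed))
```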

Distributions

  • For symmetrical distributions, use mean and standard deviation.
  • For skewed distributions, use median and IQR.

Bar Graphs

  • Use percentages, especially when comparing categorical data.
  • When comparing graphs, use comparison words and SOCS (Shape, Outliers, Center, Spread).

Transformations of Data

  • Adding a constant to (or subtracting a constant from) every value shifts measures of center but does not change measures of spread.
  • Multiplying or dividing every value by a constant (other than 1) changes measures of both center and spread.
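The two rules above can be checked with a small sketch. The data and the helper name `summarize` are hypothetical.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return a few measures of center and spread for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }

original = [10, 12, 15, 18, 25]        # made-up data
shifted = [x + 5 for x in original]    # add a constant to every value
scaled = [x * 2 for x in original]     # multiply every value by a constant

print(summarize(original))
print(summarize(shifted))   # center moves up by 5; sd and range are unchanged
print(summarize(scaled))    # center, sd, and range are all doubled
```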

Median

  • The median is resistant to extreme observations: making the smallest or largest values more extreme does not change it.
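A small illustration with made-up numbers: replacing the largest value with an extreme one leaves the median alone but moves the mean.

```python
from statistics import mean, median

incomes = [30, 35, 40, 45, 50]         # made-up values, in thousands
with_extreme = [30, 35, 40, 45, 500]   # largest value replaced by an extreme one

print(median(incomes), median(with_extreme))   # the median stays the same
print(mean(incomes), mean(with_extreme))       # the mean changes a lot
```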

Graphing

  • When creating a graph, include a label and a scale.

Percentile Graphs

  • Also known as cumulative relative frequency graphs or ogives (Chapter 2).
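A sketch of how the points on a cumulative relative frequency graph could be computed. The class upper bounds and counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical class upper bounds and the count of observations in each class
upper_bounds = [10, 20, 30, 40, 50]
counts = [4, 6, 10, 8, 2]

total = sum(counts)
cumulative_counts = list(accumulate(counts))                   # running totals
cumulative_rel_freq = [c / total for c in cumulative_counts]   # proportion at or below each bound

# Each (upper bound, cumulative relative frequency) pair is one point on the graph.
for bound, crf in zip(upper_bounds, cumulative_rel_freq):
    print(f"<= {bound}: {crf:.1%}")
```

Reading across from a percentage on the vertical axis to the curve, then down, gives the value at that percentile.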

Scatterplots

  • Explanatory variable on the x-axis.
  • Response variable on the y-axis.
  • Use DOFS (Direction, Outliers, Form, Strength) to describe scatterplots.

Correlation

  • Represented by "r."
  • Read computer output and describe properties of r.
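A minimal sketch that computes r from its definition and checks a few standard properties: r lies between -1 and 1, swapping x and y does not change it, and changing units does not change it. The data and the function name `correlation` are hypothetical.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # made-up explanatory values
scores = [52, 60, 61, 70, 77]    # made-up response values

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)                       # swap x and y
r_rescaled = correlation([60 * h for h in hours], scores)    # hours -> minutes

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```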

Computer Output

  • Slope
  • y-intercept
  • s (standard deviation of residuals)

Interpretation

  • Interpret slope, y-intercept, r², r, and residuals.
  • Write the equation from computer output: $\hat{y} = a + bx$
  • Calculate predicted values and residuals.
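A sketch of the quantities listed above, computed by hand rather than read from computer output. The data and the function name `least_squares` are hypothetical. A residual is the observed value minus the predicted value.

```python
from math import sqrt
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up response values

a, b = least_squares(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted

sse = sum(e ** 2 for e in residuals)          # sum of squared residuals
sst = sum((y - mean(ys)) ** 2 for y in ys)    # total variation in y
r_squared = 1 - sse / sst                     # fraction of variation explained by the line
s = sqrt(sse / (len(xs) - 2))                 # standard deviation of the residuals

print(f"y-hat = {a:.2f} + {b:.2f}x")
print([round(e, 2) for e in residuals])
print(round(r_squared, 3), round(s, 3))
```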

Extrapolation

Transformations for Non-Linear to Linear Forms

  • Only $y$ is changed to $\log y$ or $\ln y$: The original model is exponential.
  • Both $x$ and $y$ are changed to $\log x$ / $\ln x$ and $\log y$ / $\ln y$: The original model is a power model.
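Why these two cases work can be seen by taking logs of each model. The constants $a$, $b$, and $p$ below are generic symbols chosen for illustration; they do not come from the notes.

```latex
\begin{align*}
\text{Exponential: } y = a\,b^{x} &\;\Longrightarrow\; \log y = \log a + (\log b)\,x \\
\text{Power: } y = a\,x^{p} &\;\Longrightarrow\; \log y = \log a + p \log x
\end{align*}
```

In the first case $\log y$ is linear in $x$; in the second, $\log y$ is linear in $\log x$. The same holds with $\ln$ in place of $\log$.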

Logarithmic Transformations

  • For $\log y$, the base is 10.
    • If $\log \hat{y} = 35$, then $\hat{y} = 10^{35}$.
  • For $\ln y$, the base is $e$.
    • If $\ln \hat{y} = 35$, then $\hat{y} = e^{35}$.
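A sketch of the full round trip for an exponential model: take $\ln y$, fit a line, predict, then undo the log with $e$. The data are made up so that they follow $y = 3 \cdot 2^x$ exactly, and `least_squares` is a hypothetical helper.

```python
from math import exp, log
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]   # made-up data following y = 3 * 2**x

ln_ys = [log(y) for y in ys]       # transform only y, using the natural log
a, b = least_squares(xs, ln_ys)    # fitted model: ln(y-hat) = a + b * x

ln_y_hat = a + b * 5               # prediction on the ln scale at x = 5
y_hat = exp(ln_y_hat)              # back-transform: y-hat = e ** (ln y-hat)

print(round(ln_y_hat, 4), round(y_hat, 4))
```

If the model had been fit to $\log y$ (base 10) instead, the last step would be `10 ** log_y_hat` rather than `exp(...)`.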

Residual Plots

  • A clear pattern (such as a curve) in the residual plot indicates that a linear model is not appropriate.
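A small sketch of what such a pattern looks like numerically: fitting a line to made-up data that actually follow $y = x^2$ leaves residuals that are positive, then negative, then positive again. The helper `least_squares` is hypothetical.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]   # made-up curved data: y = x**2

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed - predicted

# A U-shaped run of signs (+, -, -, -, +) is a pattern, so a line is a poor fit.
print(residuals)
```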

Scatter Plot Analysis

  • When you see a curve in the scatter plot, the relationship is not linear.