Data Analysis Notes
Quantitative Data Analysis
Chapters 1, 2, 3, and 12
Dotplot, Stemplot, Boxplot, Histogram
- When discussing the shape of data, use "approximately symmetrical" instead of "approximately normal."
Numerical Summary
- Includes the minimum, $Q_1$, $Q_2$ (median), $Q_3$, and maximum (the five-number summary).
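A minimal sketch of computing the five-number summary in Python. It assumes the "median of each half" rule for quartiles (when n is odd, the median itself is left out of both halves), which many calculators use. Other software may use a different quartile rule and report slightly different Q1 and Q3. The function name `five_number_summary` and the data are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values so that each half is non-empty.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]              # values below the median
    upper = values[half + n % 2:]      # values above the median (skips the middle value when n is odd)
    return (values[0], median(lower), median(values), median(upper), values[-1])

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]   # hypothetical data
print(five_number_summary(scores))
```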
Outlier Rule
- Any number less than $Q_1 - 1.5 \times \text{IQR}$ is considered an outlier.
- Any number greater than $Q_3 + 1.5 \times \text{IQR}$ is considered an outlier.
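The 1.5 × IQR rule can be checked with a short sketch like the one below. The helper names are hypothetical, and the quartiles passed in were worked out by hand with the median-of-halves rule; a different quartile convention could shift the fences slightly.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """List the values that fall strictly below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 25]   # hypothetical data
# Quartiles for this data set (median-of-halves rule): Q1 = 4, Q3 = 10, so IQR = 6
print(outlier_fences(4, 10))
print(find_outliers(data, 4, 10))
```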
Center
- Mean
- Median
Spread
- Standard deviation
- Variance
- IQR (Interquartile Range)
- Range
- $\text{Variance} = (\text{standard deviation})^2$
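A small sketch of the center and spread measures using Python's standard `statistics` module on a hypothetical data set. Note that `statistics.stdev` and `statistics.variance` are the sample versions (dividing by n − 1); `pstdev` and `pvariance` are the population versions. The last line checks the variance = (standard deviation)² relationship.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set

# Center
mean = statistics.mean(data)      # pulled toward outliers and long tails
med = statistics.median(data)     # resistant to extreme values

# Spread
sd = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
var = statistics.variance(data)   # sample variance
data_range = max(data) - min(data)

print(mean, med, round(sd, 3), round(var, 3), data_range)
print(math.isclose(var, sd ** 2))  # variance = (standard deviation)^2
```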
Skewness
- Right Skew: The right tail is longer, and the mean is pulled toward that tail, so it is usually greater than the median.
- Left Skew: The left tail is longer, and the mean is pulled toward that tail, so it is usually less than the median.
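A quick illustration of how a long tail pulls the mean away from the median, using made-up data. Comparing mean and median is only a rough check on skewness, not a rule that always holds, so it should back up a graph rather than replace one.

```python
from statistics import mean, median

# Hypothetical data with one long tail to the right
right_skewed = [1, 2, 2, 3, 3, 4, 20]
# Mirror image of the same data: long tail to the left
left_skewed = [-x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))   # mean is larger than the median
print(mean(left_skewed), median(left_skewed))     # mean is smaller than the median
```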
Distributions
- For roughly symmetric distributions, describe center and spread with the mean and standard deviation.
- For skewed distributions (or ones with outliers), use the median and IQR.
Bar Graphs
- Use percentages, especially when comparing categorical data.
- When comparing graphs, use comparison words and SOCS (Shape, Outliers, Center, Spread).
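- Adding a constant to (or subtracting a constant from) every value shifts measures of center and position by that constant but does not change measures of spread or the shape.
- Multiplying or dividing every value by a constant multiplies or divides measures of center, position, and spread by that constant (spread by its absolute value); the shape does not change.
- The median is resistant to extreme observations: making the largest or smallest values more extreme does not change it.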
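The effect of adding or multiplying by a constant, and the resistance of the median, can be seen numerically. This sketch uses a hypothetical data set and a hypothetical helper `summarize`; the constant 3 is positive, so the standard deviation is simply multiplied by 3.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample SD, range) of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
        max(values) - min(values),
    )

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data set
shifted = [x + 10 for x in data]     # add a constant
scaled = [x * 3 for x in data]       # multiply by a constant

print(summarize(data))
print(summarize(shifted))   # mean and median move up by 10; SD and range unchanged
print(summarize(scaled))    # mean, median, SD, and range all multiplied by 3

# Resistance: making the maximum far larger changes the mean but not the median
extreme = data[:-1] + [900]
print(statistics.mean(extreme), statistics.median(extreme))
```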
Graphing
- When creating a graph, label the axes and include a scale.
Percentile Graphs
- Also known as cumulative relative frequency graphs or ogives (Chapter 2).
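A sketch of the arithmetic behind a cumulative relative frequency graph, starting from a hypothetical frequency table. Each cumulative proportion is plotted at the upper boundary of its class; reading across from the y-axis gives the percentile of that boundary.

```python
from itertools import accumulate

# Hypothetical frequency table: upper boundary of each class and its count
upper_bounds = [10, 20, 30, 40, 50]
counts = [4, 6, 5, 3, 2]

total = sum(counts)
cumulative = list(accumulate(counts))            # running totals of the counts
cum_rel_freq = [c / total for c in cumulative]   # proportion at or below each upper boundary

for bound, p in zip(upper_bounds, cum_rel_freq):
    print(f"<= {bound}: {p:.0%}")
```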
Scatterplots
- Explanatory variable on the x-axis.
- Response variable on the y-axis.
- Use DOFS (Direction, Outliers, Form, Strength) to describe scatterplots.
Correlation
- Represented by "r."
- Read computer output and describe properties of r.
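A minimal sketch of computing r from raw data with the z-score formula r = (1/(n − 1)) Σ z_x z_y, using made-up points. It also demonstrates two standard properties of r: it does not change if x and y are swapped, and it does not change under a change of units such as 2x + 7. Python 3.10 and later also provide `statistics.correlation`; the hand-written `correlation` function here is only for illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Pearson correlation: average product of z-scores, dividing by n - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values

r = correlation(x, y)
print(round(r, 3))
print(round(correlation(y, x), 3))                     # same r with x and y swapped
print(round(correlation([2 * xi + 7 for xi in x], y), 3))  # same r after changing units of x
```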
Computer Output
- Slope
- y-intercept
- r²
- s (standard deviation of residuals)
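To see where the numbers in regression output come from, this sketch computes the least-squares slope, y-intercept, r², and s from a small hypothetical data set. The function name `least_squares` is illustrative, and real software may label or round these values differently. Here s divides the sum of squared residuals by n − 2.

```python
from math import sqrt
from statistics import mean

def least_squares(x, y):
    """Return slope b, intercept a, r-squared, s, and residuals for y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                      # slope
    a = my - b * mx                    # y-intercept: the line passes through (x-bar, y-bar)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # actual minus predicted
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - sse / sst          # fraction of the variation in y explained by the line
    s = sqrt(sse / (len(x) - 2))       # typical size of a residual
    return b, a, r_squared, s, residuals

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b, a, r2, s, res = least_squares(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, r^2 = {r2:.3f}, s = {s:.3f}")
print(a + b * 6)      # predicted y at x = 6
```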
Interpretation
- Interpret slope, y-intercept, r², r, and residuals.
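- Write the equation from computer output: $\hat{y} = a + bx$, where $a$ is the y-intercept and $b$ is the slope.
- Calculate predicted values and residuals (residual $= y - \hat{y}$, actual minus predicted).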
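- If only y is transformed (to $\log y$ or $\ln y$) and the result is linear, the original model is exponential.
- If both x and y are transformed (to $\log x$ / $\ln x$ and $\log y$ / $\ln y$) and the result is linear, the original model is a power model.
- For $\log y$, the base is 10.
- If $\widehat{\log y} = 35$, then $\hat{y} = 10^{35}$.
- For $\ln y$, the base is $e$.
- If $\widehat{\ln y} = 35$, then $\hat{y} = e^{35}$.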
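A sketch of both transformations on made-up data that follow an exact exponential curve (y = 2 · 3^x) and an exact power curve (y = 4x²). For the exponential model only y is logged (base 10 here); for the power model both variables are logged (natural logs here). Each predicted log is then undone with 10^(·) or e^(·). The helper `fit_line` is hypothetical; real data will not fit this perfectly.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return b, my - b * mx

# Exponential model: transform only y, then fit log10(y) against x
x = [0, 1, 2, 3, 4]
y = [2 * 3 ** xi for xi in x]                 # hypothetical data: y = 2 * 3^x
b, a = fit_line(x, [math.log10(yi) for yi in y])
log_pred = a + b * 5                          # predicted log10(y) at x = 5
print(10 ** log_pred)                         # undo the base-10 log

# Power model: transform both variables (natural logs here)
xp = [1, 2, 3, 4, 5]
yp = [4 * xi ** 2 for xi in xp]               # hypothetical data: y = 4 * x^2
bp, ap = fit_line([math.log(xi) for xi in xp], [math.log(yi) for yi in yp])
ln_pred = ap + bp * math.log(6)               # predicted ln(y) at x = 6
print(math.exp(ln_pred))                      # undo the natural log
```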
Residual Plots
- A curved pattern in the residual plot indicates that a linear model is not appropriate; a random scatter of residuals around 0 suggests a linear model fits well.
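A small illustration with hypothetical curved data (y = x²): fitting a straight line anyway leaves residuals that are positive at both ends and negative in the middle, the U-shaped pattern that signals a linear model is not appropriate. The helper name `residuals` is illustrative.

```python
from statistics import mean

def residuals(x, y):
    """Residuals (actual minus predicted) from the least-squares line."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5, 6, 7]
curved = [xi ** 2 for xi in x]       # hypothetical curved relationship
res = residuals(x, curved)
print([round(e, 2) for e in res])    # positive at both ends, negative in the middle
```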
Scatterplot Analysis
- A curved pattern in the scatterplot indicates that the relationship is not linear.