Definition: Measure of how far the individual data points spread from the mean, computed as the average of the squared deviations from the mean.
Standard Deviation:
Definition: Square root of variance.
Formulas:
Sample variance notation: $s^2$
Population variance notation: $\sigma^2$ (sigma squared)
Sample standard deviation notation: $s$
Population standard deviation notation: $\sigma$
Calculation Procedures
Variance formulas:
For population: $\text{Variance} = \frac{\text{sum of } (x - \mu)^2}{N}$
For sample: $\text{Variance} = \frac{\text{sum of } (x - \bar{x})^2}{n - 1}$
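Explanation on degrees of freedom: The sample mean $\bar{x}$ is itself computed from the data, so deviations from it tend to be slightly smaller than deviations from the true population mean $\mu$. Dividing by $n-1$ (the degrees of freedom) instead of $n$ corrects for this underestimate.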
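The two formulas above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function names and the data values are made up for the example, and the standard-library `statistics` module is used only to cross-check the hand-written versions.

```python
import math
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


# Hypothetical data set chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # divides by N = 8
samp_var = sample_variance(data)      # divides by n - 1 = 7

# Standard deviation is the square root of the variance.
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# Cross-check against the standard library's implementations.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)

print(pop_var, samp_var, pop_sd, samp_sd)
```

For the same data, the sample variance is always larger than the population variance because the same sum of squares is divided by a smaller number.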
Box Plots
Definition: Visual representation of the five-number summary (minimum, first quartile Q1, median Q2, third quartile Q3, maximum).
Quartiles:
Quartile 1 (Q1): 25th percentile
Quartile 2 (Q2): Median
Quartile 3 (Q3): 75th percentile
Interquartile Range (IQR): $\text{IQR} = Q3 - Q1$
Outlier Representation: Any data point below $Q1 - 1.5 \times \text{IQR}$ or above $Q3 + 1.5 \times \text{IQR}$ is considered an outlier.
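A short sketch of the IQR outlier rule in Python, assuming the standard-library `statistics.quantiles` function with its "inclusive" method. Textbooks and software use several slightly different quartile conventions, so Q1 and Q3 computed by hand or by another tool may differ a little from these. The data values are hypothetical.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1

# Fences: anything outside these limits is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outliers)
```

Only the value 30 falls outside the fences here, so it is the only point a box plot of this data would mark as an outlier.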
Z Scores and Normal Distribution
Normal Distribution: Bell-shaped curve where the total area under the curve equals 1, representing 100% of the data.
Properties:
Symmetrical with mean at the center.
Asymptotic: Curve approaches the horizontal axis but never touches it.
Z Score Calculation:
Formula: $Z = \frac{(X - \text{mean})}{\text{standard deviation}}$
Explanation: Z score represents the number of standard deviations a data point is from the mean.
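As a small worked example of the Z score formula, the sketch below uses made-up numbers (a value of 85 from a distribution with mean 70 and standard deviation 10). The function name `z_score` is only illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Hypothetical values: X = 85, mean = 70, standard deviation = 10.
z_above = z_score(85, 70, 10)   # (85 - 70) / 10 = 1.5
z_below = z_score(55, 70, 10)   # (55 - 70) / 10 = -1.5

print(z_above, z_below)
```

A positive Z score means the value is above the mean and a negative Z score means it is below the mean.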
Empirical Rule:
For normally distributed data, approximately 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three.
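The percentages in the empirical rule can be checked numerically. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and uses `NormalDist` from Python's standard-library `statistics` module. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The computed proportions round to the 68%, 95%, and 99.7% figures quoted in the rule.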
Assignments and Projects
Instructions on upcoming assignments and expectations.
Emphasis on projects utilizing quantitative data, focused on mean and standard deviation, and avoiding qualitative data unless for regression analysis.
Summary & Key Definitions
Sample vs. Population
Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Range, Variance, Standard Deviation