Lecture 3.1

Chapter 3: Descriptive Statistics: Numerical Measures

Definition: Measures of location describe where the data are centered and where individual values sit relative to the rest of the data.
Key Measures:
- Mean: Average of data values; calculated as x̄ = ∑(xi) / n. The sample mean is a point estimator of the population mean.
  - Example: Average rent from a sample of 70 efficiency apartments results in x̄ = 590.8.
- Median: Middle value when the data are sorted; preferred over the mean for skewed distributions or data sets with outliers.
  - Example with an odd number of values: {12, 14, 18, 19, 26, 27, 27}, where the median is 19 (the 4th of the 7 sorted values).
  - Example with an even number of values: The median is the average of the two middle values (22.5 in the lecture's example).
- Mode: The most frequent value; datasets can be unimodal, bimodal, or multimodal.
- Weighted Mean: Mean in which each value xi is given a weight wi reflecting its importance; used, for example, in a GPA, where credit hours serve as the weights.
  - Formula: x̄ = ∑(wi×xi) / ∑(wi).
  - Example: Weighted average wage calculation results in $20.05.
- Geometric Mean: Useful for averaging growth rates; calculated as the nth root of the product of n values.
  - Example: A set of growth factors whose geometric mean corresponds to an average growth rate of -2.2% per period.
- Percentiles: Describe relative position; the pth percentile is a value such that approximately p% of the observations are at or below it and approximately (100 − p)% are at or above it.
  - Example: Calculation of the 80th percentile in rent data yields a value of 646.2.
- Quartiles: Specific percentiles that divide the sorted data into four parts.
  - 1st quartile = 25th percentile; 2nd quartile = median; 3rd quartile = 75th percentile.
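The location measures above can be sketched in a few lines of Python. This is only an illustration: the data set is the odd-sized median example from the list, while the weighted-mean values, the weights, and the growth factors are made-up numbers, and `percentile` is a hypothetical helper. The helper assumes the location rule L = (p/100)(n + 1) with linear interpolation, which is one common textbook convention; the lecture does not state which rule it uses.

```python
import math
import statistics

# Odd-sized data set from the median example above.
data = [12, 14, 18, 19, 26, 27, 27]

mean = sum(data) / len(data)            # x̄ = ∑xi / n
median = statistics.median(data)        # middle value of the sorted data
modes = statistics.multimode(data)      # all most-frequent values

# Weighted mean: x̄ = ∑(wi × xi) / ∑wi (values and weights are made up).
values = [3.0, 4.0, 2.0]
weights = [3, 4, 1]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Geometric mean: nth root of the product of n growth factors (made up).
growth_factors = [1.10, 0.90, 1.05]
geometric_mean = math.prod(growth_factors) ** (1 / len(growth_factors))
average_growth_rate = (geometric_mean - 1) * 100   # percent per period


def percentile(xs, p):
    """pth percentile using location L = (p/100)(n + 1), an assumed convention."""
    s = sorted(xs)
    n = len(s)
    loc = p / 100 * (n + 1)      # 1-based position in the sorted data
    k = int(loc)                 # integer part of the position
    frac = loc - k               # fractional part used for interpolation
    if k < 1:
        return s[0]
    if k >= n:
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])


q1 = percentile(data, 25)   # 1st quartile
q2 = percentile(data, 50)   # 2nd quartile (median)
q3 = percentile(data, 75)   # 3rd quartile

print(mean, median, modes, weighted_mean, average_growth_rate, q1, q2, q3)
```

Other percentile conventions exist and can give slightly different values on small data sets, so results should be compared only against software that uses the same rule.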

Definition: Measures of variability (dispersion) indicate how much the data values spread out or differ from one another.
Key Measures:
- Range: Difference between the maximum and minimum values; simplest measure but sensitive to extremes.
- Interquartile Range (IQR): Difference between Q3 and Q1; it is the range of the middle 50% of the data and is less sensitive to extremes than the range.
- Variance: Average of the squared deviations from the mean; the population form divides by N, the sample form by n − 1.
  - Population Variance: σ² = ∑(xi−µ)² / N.
  - Sample Variance: s² = ∑(xi−x̄)² / (n−1).
- Standard Deviation: Square root of the variance; easier to interpret because it is in the same units as the data.
  - Population and Sample forms: σ = √σ², s = √s².
- Coefficient of Variation (CV): Ratio of the standard deviation to the mean, expressed as a percentage; measures variability relative to the size of the mean.
  - Population CV: (σ / µ) × 100%; Sample CV: (s / x̄) × 100%.
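As a rough sketch, the variability measures above can be computed directly from their formulas. The data set is reused from the median example purely for illustration. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which is an assumed choice of quartile rule, not one stated in the lecture.

```python
import math
import statistics

data = [12, 14, 18, 19, 26, 27, 27]   # illustrative data from the median example
n = len(data)
mean = sum(data) / n

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# IQR = Q3 - Q1 (quartile rule is an assumption; others exist).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Sum of squared deviations from the mean, shared by both variance forms.
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n              # σ² = ∑(xi − µ)² / N, treating data as a population
sample_var = ss / (n - 1)     # s² = ∑(xi − x̄)² / (n − 1), treating data as a sample

pop_sd = math.sqrt(pop_var)         # σ
sample_sd = math.sqrt(sample_var)   # s

sample_cv = sample_sd / mean * 100  # (s / x̄) × 100%

print(data_range, iqr, pop_var, sample_var, pop_sd, sample_sd, sample_cv)
```

Dividing by n − 1 rather than n makes the sample variance slightly larger than the population formula applied to the same numbers; the gap shrinks as n grows.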

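Skewness: Measures the asymmetry of a distribution; positive when the right tail is longer, negative when the left tail is longer, and near zero when the distribution is roughly symmetric. In practice it is computed with a skewness formula, usually by software.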
z-Scores: Standardized value giving the number of standard deviations a data point lies from the mean.
  - Formula: zi = (xi − x̄) / s; lets the relative positions of individual observations in a data set be compared.
  - Example: The smallest rent has a z-score of -1.2, meaning it lies 1.2 standard deviations below the mean rent.
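A minimal sketch of the z-score formula, using the small data set from the median example rather than the rent data (which is not reproduced in these notes). The skewness line is an assumption: it uses the adjusted formula n / ((n − 1)(n − 2)) × ∑zi³, one common sample skewness formula, which may not be the one the lecture has in mind.

```python
import statistics

data = [12, 14, 18, 19, 26, 27, 27]   # illustrative data, not the rent sample

x_bar = statistics.mean(data)   # sample mean x̄
s = statistics.stdev(data)      # sample standard deviation s (n − 1 divisor)

# zi = (xi − x̄) / s for every observation.
z_scores = [(x - x_bar) / s for x in data]

# Assumed sample skewness formula built from the cubed z-scores.
n = len(data)
skewness = n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores)

for x, z in zip(data, z_scores):
    print(f"x = {x:2d}  z = {z:+.2f}")
print(f"skewness = {skewness:.3f}")
```

Values below the mean get negative z-scores and values above it get positive ones, and the z-scores of a data set always sum to zero.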

Covariance: Measures how two variables vary together; a positive value indicates a direct relationship, a negative value an inverse one.
  - Population Formula: σxy = ∑(xi−µx)(yi−µy) / N.
  - Sample Formula: sxy = ∑(xi−x̄)(yi−ȳ) / (n−1).
Correlation Coefficient: Normalizes the covariance by the two standard deviations, measuring the strength of the linear relationship; it does not imply causation.
  - Range: Correlation varies between -1 and +1; values near -1 indicate a strong negative linear relationship, values near +1 a strong positive one, and values near 0 a weak one.
  - Calculation: rxy = sxy / (sx × sy).
  - Example: Golf driving distance and scores show a strong negative correlation.
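The sample covariance and correlation formulas can be sketched as follows. The paired values `x` and `y` are made up for illustration and are not the golf data from the lecture.

```python
import math

# Made-up paired observations in which y tends to fall as x rises.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sxy = ∑(xi − x̄)(yi − ȳ) / (n − 1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: rxy = sxy / (sx × sy).
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")
```

The covariance depends on the units of x and y, whereas dividing by sx × sy gives a unit-free value between -1 and +1, which is why correlation is easier to compare across data sets.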