Lecture 3.1
Definition: Measures of location describe the center of the data (mean, median, mode) and the relative position of values within the data (percentiles, quartiles).
Key Measures:
Mean: The average of the data values, calculated as x̄ = ∑(xi) / n. The sample mean x̄ is a point estimator of the population mean µ.
Example: For a sample of 70 efficiency apartments, the mean rent is x̄ = 590.8.
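As a quick check of the mean formula, here is a minimal Python sketch. The lecture's 70 rents are not listed, so the five rents below are hypothetical values chosen only for illustration.

```python
def sample_mean(values):
    """Return x̄ = ∑(xi) / n for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical rents for five apartments (not the lecture's data).
rents = [525, 550, 575, 600, 650]
x_bar = sample_mean(rents)
print(x_bar)  # 580.0
```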
Median: The middle value when the data are arranged in ascending order; preferred over the mean for skewed distributions or datasets with outliers, because it is not pulled toward extreme values.
Example (odd number of values): For {12, 14, 18, 19, 26, 27, 27}, the median is the 4th of the 7 sorted values, 19.
Example (even number of values): The median is the average of the two middle values, giving 22.5.
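A minimal Python sketch of the median rule for odd and even n. The lecture does not list its even-sized set, so the even example here is hypothetical: it appends the value 30 to the odd set, which makes the two middle values 19 and 26 and the median 22.5.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

odd_set = [12, 14, 18, 19, 26, 27, 27]
print(median(odd_set))  # 19

# Hypothetical even-sized set: the odd set plus one extra value of 30.
even_set = odd_set + [30]
print(median(even_set))  # 22.5
```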
Mode: The value that occurs most frequently; a dataset can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).
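Python's standard-library `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which makes the unimodal/bimodal/multimodal distinction easy to sketch. The helper name `modes_and_label` and the second and third datasets are hypothetical.

```python
from statistics import multimode

def modes_and_label(values):
    """Return the most frequent value(s) and a unimodal/bimodal/multimodal label."""
    if not values:
        raise ValueError("the mode requires at least one value")
    # multimode returns every value tied for the highest count, in first-seen order.
    # Note: if every value occurs once, it returns all of them.
    modes = multimode(values)
    label = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, label

print(modes_and_label([12, 14, 18, 19, 26, 27, 27]))  # ([27], 'unimodal')
print(modes_and_label([1, 1, 2, 2, 3]))                # ([1, 2], 'bimodal')
print(modes_and_label([1, 1, 2, 2, 3, 3, 4]))          # ([1, 2, 3], 'multimodal')
```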
Weighted Mean: A mean in which each value xi is multiplied by a weight wi reflecting its importance; for example, a GPA weights each grade by its credit hours.
Formula: x̄ = ∑(wi×xi) / ∑(wi).
Example: A weighted average wage calculation gives $20.05.
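A minimal sketch of the weighted-mean formula, applied to a hypothetical GPA (grade points weighted by credit hours). The wage data behind the $20.05 example are not given, so they are not reproduced here.

```python
def weighted_mean(values, weights):
    """Return ∑(wi × xi) / ∑(wi)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points weighted by credit hours.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 3, 2]
print(weighted_mean(grade_points, credit_hours))  # 3.125
```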
Geometric Mean: The nth root of the product of n values; used to average growth rates (rates of change) over several periods.
Example: The geometric mean of the growth factors (1 + growth rate in each period), minus 1, gives an average growth rate of -2.2% per period.
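The sketch below, assuming growth rates are supplied as decimals (e.g. -0.20 for -20%), converts each rate to a growth factor 1 + r, takes the nth root of their product, and subtracts 1. The three rates are hypothetical, not the lecture's data; the simple arithmetic average is printed alongside for comparison.

```python
import math

def average_growth_rate(rates):
    """Average per-period growth rate: (nth root of ∏(1 + ri)) - 1."""
    factors = [1 + r for r in rates]  # e.g. -0.20 -> 0.80
    if any(f <= 0 for f in factors):
        raise ValueError("each growth factor (1 + rate) must be positive")
    geometric_mean = math.prod(factors) ** (1 / len(factors))
    return geometric_mean - 1

rates = [0.10, -0.20, 0.05]  # hypothetical: +10%, -20%, +5%
rate = average_growth_rate(rates)
print(f"{rate:.2%}")                     # -2.60%
print(f"{sum(rates) / len(rates):.2%}")  # -1.67% (simple average overstates growth)
```

Compounding the geometric result over the three periods reproduces the actual overall change (0.924 here), which the simple average does not.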
Percentiles: Describe the relative position of values; the pth percentile is a value such that approximately p% of the observations are less than or equal to it and approximately (100 − p)% are greater than or equal to it.
Example: The 80th percentile of the rent data is 646.2.
Quartiles: The percentiles that divide the sorted data into four parts, each containing approximately 25% of the observations.
1st quartile (Q1) = 25th percentile; 2nd quartile (Q2) = median; 3rd quartile (Q3) = 75th percentile.
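The lecture does not say which percentile convention produced 646.2, so the sketch below assumes one common textbook rule: find the position L_p = (p/100)(n + 1) in the sorted data and interpolate linearly between the neighboring values. Other conventions (statistics packages and spreadsheets differ) can give slightly different results. The sample is hypothetical; the quartiles come from the same function with p = 25, 50, 75, and Python's `statistics.quantiles(..., method="exclusive")`, which uses the same (n + 1) positions, serves as a cross-check.

```python
import math
from statistics import quantiles

def percentile(values, p):
    """pth percentile via the position L_p = (p / 100) * (n + 1), interpolating linearly.

    Assumed convention; positions outside the data are clamped to the min/max.
    """
    if not values or not 0 <= p <= 100:
        raise ValueError("need data and 0 <= p <= 100")
    ordered = sorted(values)
    n = len(ordered)
    location = (p / 100) * (n + 1)  # 1-based position in the sorted data
    if location <= 1:
        return ordered[0]
    if location >= n:
        return ordered[-1]
    lower = math.floor(location)   # whole part of the position
    fraction = location - lower    # how far to move toward the next value
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [12, 14, 18, 19, 26, 27, 27, 30]  # hypothetical sample
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 15.0 22.5 27.0
print(percentile(data, 80))
# Cross-check the quartiles against the standard library's 'exclusive' method.
print(quantiles(data, n=4, method="exclusive"))  # [15.0, 22.5, 27.0]
```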
Definition: Measures of variability (dispersion) indicate how much the data values spread out or differ from one another.
Key Measures:
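Range: The largest value minus the smallest value; the simplest measure, but it depends only on the two extreme values and is therefore highly sensitive to outliers.
Interquartile Range (IQR): IQR = Q3 − Q1, the range of the middle 50% of the data; unlike the range, it is not affected by extreme values.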
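A short sketch contrasting the range and the IQR on a hypothetical sample. The quartiles come from Python's `statistics.quantiles` with `method="exclusive"`, an assumed convention; other quartile rules can shift the IQR slightly. Replacing the largest value with an outlier shows how sensitive the range is to extremes.

```python
from statistics import quantiles

def range_and_iqr(values):
    """Return (range, IQR); quartiles use the (n + 1) 'exclusive' convention."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return max(values) - min(values), q3 - q1

data = [12, 14, 18, 19, 26, 27, 27, 30]  # hypothetical sample
print(range_and_iqr(data))  # (18, 12.0)

# Replace the largest value with an outlier: the range jumps, the IQR does not move.
print(range_and_iqr(data[:-1] + [300]))  # (288, 12.0)
```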
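Variance: A measure based on the squared deviations from the mean; the population variance averages them over all N values, while the sample variance divides their sum by n − 1.
Population Variance: σ² = ∑(xi−µ)² / N.
Sample Variance: s² = ∑(xi−x̄)² / (n−1); dividing by n − 1 rather than n makes s² an unbiased estimator of σ².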
Standard Deviation: The square root of the variance; easier to interpret because it is measured in the same units as the data.
Population and Sample forms: σ = √σ², s = √s².
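A minimal sketch of both variance formulas and their square roots. The same hypothetical numbers are fed to both formulas only to show the effect of dividing by N versus n − 1; in practice a dataset is treated as either a population or a sample.

```python
import math

def population_and_sample_variance(values):
    """Return (σ², s²) computed from the sum of squared deviations from the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)  # ∑(xi − mean)²
    return squared_deviations / n, squared_deviations / (n - 1)

data = [46, 54, 42, 46, 32]  # hypothetical sample, mean 44
pop_var, samp_var = population_and_sample_variance(data)
print(pop_var, samp_var)  # 51.2 64.0
# Standard deviations σ and s: the square roots of the variances.
print(f"{math.sqrt(pop_var):.2f} {math.sqrt(samp_var):.2f}")  # 7.16 8.00
```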
Coefficient of Variation (CV): The standard deviation expressed as a percentage of the mean; a relative measure useful for comparing variability across datasets with different means or units.
Population CV: (σ / µ) × 100%; Sample CV: (s / x̄) × 100%.
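A minimal sketch of the CV using Python's `statistics.stdev` (sample) or `statistics.pstdev` (population). The data are hypothetical; multiplying them by 10, as a change of units would, leaves the CV unchanged, which is why it suits comparisons across datasets.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Return (standard deviation / mean) × 100, using s (sample) or σ (population)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100

data = [46, 54, 42, 46, 32]  # hypothetical sample: x̄ = 44, s = 8
print(f"{coefficient_of_variation(data):.2f}%")  # 18.18%

# Multiplying every value by 10 (a change of units) leaves the CV unchanged.
print(f"{coefficient_of_variation([10 * x for x in data]):.2f}%")  # 18.18%
```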
Skewness: Measures the asymmetry of a distribution; it is negative when the longer tail is on the left, near zero for symmetric data, and positive when the longer tail is on the right.
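The lecture does not state which skewness formula it uses. The sketch below assumes one common sample formula, n / ((n − 1)(n − 2)) · ∑((xi − x̄) / s)³, the adjusted version computed by Excel's SKEW function. Both datasets are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * ∑((xi - x̄) / s)³ (assumed formula)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

print(sample_skewness([1, 2, 3, 4, 5]))             # approximately 0 (symmetric)
print(round(sample_skewness([1, 2, 2, 3, 10]), 2))  # 2.03 (long right tail)
```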
z-Scores: The standardized value of an observation, i.e., the number of standard deviations it lies from the mean.
Formula: zi = (xi − x̄) / s; z-scores allow the relative positions of individual observations within a dataset to be compared.
Example: The smallest rent has a z-score of -1.2, so it lies 1.2 standard deviations below the mean rent.
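A minimal sketch computing z-scores with the sample standard deviation. The rent data are not reproduced in the notes, so a hypothetical five-value sample (x̄ = 44, s = 8) stands in.

```python
import statistics

def z_scores(values):
    """Return zi = (xi - x̄) / s for every observation (sample standard deviation)."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]

data = [46, 54, 42, 46, 32]  # hypothetical sample: x̄ = 44, s = 8
print(z_scores(data))  # [0.25, 1.25, -0.25, 0.25, -1.5]
```

The value 32 has z = -1.5, i.e., it lies 1.5 standard deviations below the mean.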
Covariance: Measures how two variables vary together; a positive value indicates a positive (direct) linear relationship and a negative value a negative (inverse) one. Its magnitude depends on the units of measurement.
Population Formula: σxy = ∑(xi−µx)(yi−µy) / N.
Sample Formula: sxy = ∑(xi−x̄)(yi−ȳ) / (n−1).
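A minimal sketch of both covariance formulas on hypothetical paired data. The last line multiplies y by 100 (as a change of units would) to show that the size of the covariance depends on units.

```python
def covariance(x, y, sample=True):
    """Sample sxy (divide by n - 1) or population σxy (divide by N)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1) if sample else cross_products / n

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # hypothetical: tends to rise with x
y_down = [10, 8, 6, 4, 2]  # hypothetical: falls as x rises

print(covariance(x, y_up), covariance(x, y_up, sample=False))  # 1.5 1.2
print(covariance(x, y_down))                                   # -5.0
print(covariance(x, [100 * v for v in y_up]))                  # 150.0
```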
Correlation Coefficient: Standardizes the covariance, giving a unit-free measure of the strength and direction of a linear relationship; even a strong correlation does not imply causation.
Calculation: rxy = sxy / (sx × sy).
Interpretation: -1 ≤ rxy ≤ +1; values close to -1 indicate a strong negative linear relationship, values close to +1 a strong positive one, and values near 0 little linear relationship.
Example: Golf driving distance and score show a strong negative correlation: golfers with longer drives tend to have lower scores.
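A minimal sketch of rxy = sxy / (sx × sy) using sample formulas throughout. The golfer numbers are made up for illustration and are not the lecture's data; converting yards to meters shows that, unlike the covariance, r does not depend on units.

```python
import math

def sample_correlation(x, y):
    """rxy = sxy / (sx * sy), with all three quantities using n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / (s_x * s_y)

# Hypothetical golfers (made-up numbers): driving distance in yards and score.
distance = [270, 280, 290, 300, 310]
score = [72, 71, 70, 70, 68]
r = sample_correlation(distance, score)
print(round(r, 2))  # -0.96

# Rescaling a variable (yards -> meters) changes the covariance but not r.
meters = [d * 0.9144 for d in distance]
print(math.isclose(sample_correlation(meters, score), r))  # True
```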