Chapter 3 Business Statistics: Communicating with Numbers, 5e - Summary Measures
Learning Objectives
LO 3.1 Calculate and interpret measures of central location.
LO 3.2 Interpret a percentile and a boxplot.
LO 3.3 Calculate and interpret measures of dispersion.
LO 3.4 Explain mean-variance analysis and the Sharpe ratio.
LO 3.5 Apply Chebyshev’s theorem, the empirical rule, and z-scores.
LO 3.6 Calculate and interpret measures of association.
LO 3.7 Calculate and interpret a geometric mean return and a compound growth rate.
Introductory Case: Investment Decision
Context: Dorothy, a financial advisor, assists an inexperienced investor with questions about mutual fund investing.
Mutual Funds:
Fidelity’s Growth Index Mutual Fund (Growth)
Fidelity’s Value Index Mutual Fund (Value)
Annual Return Data:
1984: Growth: -5.50%, Value: -8.59%
1985: Growth: 39.91%, Value: 22.10%
2019: Growth: 38.42%, Value: 31.62%
Tasks for Dorothy:
Calculate and interpret the typical return for both mutual funds.
Calculate and interpret the investment risk for both mutual funds.
Determine which mutual fund provides the greater return relative to risk.
3.1 Measures of Central Location
Definition: Central location refers to the clustering of numerical data around a middle or central value.
Objective: To find a central value that describes the data.
Primary Measure: Arithmetic mean
Also known as the mean or average.
Calculation: Add all observations and divide by the number of observations.
3.1.1 Population Mean vs. Sample Mean
Notation Difference:
Sample Mean: Denoted as ( \bar{x} ) (a statistic describing a sample).
Population Mean: Denoted as ( \mu ) (a parameter describing a population).
Outlier Influence: The mean can be misleading in the presence of outliers (extremely small or large observations).
3.1.2 Example: Salaries at Acetech
Salaries:
Administrative Assistant: 40,000
Research Assistant: 40,000
Data Analyst: 65,000
Senior Research Associate: 90,000
Senior Data Analyst: 100,000
Senior Sales Associate: 145,000
Chief Financial Officer: 150,000
President (and owner): 550,000
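Population Mean Calculation: ( \mu = \frac{40,000 + 40,000 + 65,000 + 90,000 + 100,000 + 145,000 + 150,000 + 550,000}{8} = \frac{1,180,000}{8} = 147,500 )
The mean reflects the average salary, but it is not typical because the outlier (the president's 550,000 salary) pulls it upward: 6 of the 8 employees earn less than the mean.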
3.1.3 Median
Definition: The median is the middle value, dividing the data in half.
Calculation:
Odd number of observations: middle value.
Even number of observations: average of the two middle values.
Usefulness: Particularly helpful when outliers are present.
3.1.4 Example: Median Salary Calculation
Ordered Salaries:
40,000
40,000
65,000
90,000
100,000
145,000
150,000
550,000
Median Calculation:
Since there are 8 salaries, median = (90,000 + 100,000) / 2 = 95,000.
Comparison: Mean 147,500 vs. Median 95,000.
3.1.5 Mode
Definition: The mode is the value that appears most frequently.
Types:
Unimodal: one mode
Bimodal: two modes
Example: For salaries at Acetech, $40,000 is the mode because it occurs twice, more often than any other salary, even though most employees earn considerably more.
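The three measures for the Acetech salaries can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `multimode` is used instead of a single-mode function so that a bimodal data set would report both modes.

```python
import statistics

# Acetech salaries from the example (a population of 8 employees)
salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

mean_salary = statistics.mean(salaries)      # sum of all salaries divided by 8
median_salary = statistics.median(salaries)  # even count: average of the two middle values
modes = statistics.multimode(salaries)       # every value tied for the highest frequency

# Count how many employees earn less than the mean (the outlier pulls the mean up)
below_mean = sum(s < mean_salary for s in salaries)

print(f"Mean: {mean_salary:,}")
print(f"Median: {median_salary:,}")
print(f"Mode(s): {modes}")
print(f"Employees below the mean: {below_mean} of {len(salaries)}")
```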
3.1.6 Categorical Variables
Mode as the Only Measure: For categorical data like sweatshirt sizes (S, M, L):
Sizes:
S: 2
M: 3
L: 5
Modal Size: L (most frequent).
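For categorical data the mode is found by counting category frequencies. A minimal sketch using `collections.Counter`, with the sweatshirt frequencies expanded into individual observations:

```python
from collections import Counter

# Sweatshirt sizes, expanded from the frequency table (S: 2, M: 3, L: 5)
sizes = ["S"] * 2 + ["M"] * 3 + ["L"] * 5

counts = Counter(sizes)
# most_common(1) returns a one-element list holding the (category, count) pair
modal_size, frequency = counts.most_common(1)[0]

print(f"Modal size: {modal_size} (frequency {frequency})")
```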
3.1.7 Descriptive Measures in Excel
Common Functions:
Mean: =AVERAGE(array)
Median: =MEDIAN(array)
Mode: =MODE.MULT(array)
Minimum: =MIN(array)
Maximum: =MAX(array)
Percentile: =PERCENTILE.INC(array, p)
3.1.8 Example: Centrality Measures for Growth and Value Funds
Outputs from Excel:
Growth Fund:
Mean: 15.755
Standard Error: 3.966
Median: 15.245
Mode: N/A
Standard Deviation: 23.799
Sample Variance: 566.406
Kurtosis: 0.973
Skewness: -0.029
Range: 120.38
Minimum: -40.9
Maximum: 79.48
Count: 36
Value Fund:
Mean: 12.005
Standard Error: 2.997
Median: 15.38
Mode: N/A
Standard Deviation: 17.979
Sample Variance: 323.251
Kurtosis: 1.853
Skewness: -1.024
Range: 90.6
Minimum: -46.52
Maximum: 44.08
Count: 36
3.1.9 Weighted Mean
Definition: Used when observations contribute unequally to the average; each observation is multiplied by its weight, and the total is divided by the sum of the weights.
Calculation: [ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} ] where ( w_i ) is the weight of observation ( x_i ).
3.1.10 Example of Weighted Mean Calculation
Grades Example:
Exam 1: 60 (25% weight), Exam 2: 70 (25% weight), Exam 3: 80 (50% weight)
Weighted mean = ( 0.25 \times 60 + 0.25 \times 70 + 0.50 \times 80 = 72.50 ) (the weights sum to 1)
Unweighted Mean Comparison: Treats all three exams equally, yielding ( (60 + 70 + 80)/3 = 70 ).
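The weighted-mean formula translates directly into a short function. A minimal sketch, applied to the exam example; dividing by the sum of the weights means the weights need not add up to 1.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

exam_scores = [60, 70, 80]
exam_weights = [0.25, 0.25, 0.50]

weighted = weighted_mean(exam_scores, exam_weights)
unweighted = sum(exam_scores) / len(exam_scores)  # every exam counts equally

print(f"Weighted mean: {weighted:.2f}, unweighted mean: {unweighted:.2f}")
```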
3.1.11 Distribution Symmetry
Types of Distribution:
Symmetric: Mean = Median = Mode
Positively Skewed: Mean > Median
Negatively Skewed: Mean < Median
Skewness Coefficient: Indicates skewness direction.
Zero: Symmetric
Positive: Right-skewed
Negative: Left-skewed
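The notes do not give the skewness formula. As an assumption, the sketch below uses the adjusted sample skewness reported by Excel's SKEW function, ( \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3 ), and checks that its sign matches the three cases above. The small data sets are illustrative only.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]         # long right tail: mean > median
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

for name, data in [("symmetric", symmetric), ("right", right_skewed), ("left", left_skewed)]:
    print(f"{name}: skewness = {sample_skewness(data):.3f}")
```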
3.1.12 Subsetting Observations
Example: Mean spending categorized by sex at an online store: Female and Male average spending values calculated using Excel and R.
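Subsetting means splitting the observations by a category and computing the measure within each group. A minimal sketch of that idea; the purchase records below are hypothetical and are not the online-store data used in the example.

```python
from statistics import mean

# HYPOTHETICAL purchase records (sex, amount spent), for illustration only
purchases = [
    ("Female", 120.0), ("Male", 80.0), ("Female", 95.0),
    ("Male", 60.0), ("Female", 150.0), ("Male", 100.0),
]

def mean_by_group(records):
    """Subset the observations by group label and return each group's mean."""
    groups = {}
    for label, amount in records:
        groups.setdefault(label, []).append(amount)
    return {label: mean(amounts) for label, amounts in groups.items()}

averages = mean_by_group(purchases)
for label, avg in averages.items():
    print(f"{label}: {avg:.2f}")
```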
3.2 Percentiles and Boxplots
3.2.1 Percentiles
Definition: The pth percentile divides a variable into two parts: approximately p% of the observations are less than the pth percentile, and approximately (100 - p)% are greater.
Quartile Breakdown:
25th percentile: Q1
50th percentile: Q2
75th percentile: Q3
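Application: Percentiles are most meaningful for larger data sets; different calculation methods (for example, Excel's PERCENTILE.INC and PERCENTILE.EXC) can give slightly different results.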
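One common method is the linear interpolation behind Excel's PERCENTILE.INC: sort the data, locate position ( h = (n - 1)p ) (counting from 0), and interpolate between the two neighboring observations. A minimal sketch of that method, applied to the Acetech salaries:

```python
import math

def percentile_inc(data, p):
    """Percentile by linear interpolation, following Excel's PERCENTILE.INC
    convention; p is a fraction between 0 and 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(data)
    h = (len(ordered) - 1) * p          # 0-based position in the sorted data
    lower, upper = math.floor(h), math.ceil(h)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
q1, q2, q3 = (percentile_inc(salaries, p) for p in (0.25, 0.50, 0.75))
print(f"Q1 = {q1:,.0f}, Q2 = {q2:,.0f}, Q3 = {q3:,.0f}")
```

For cross-checking, `statistics.quantiles(salaries, n=4, method="inclusive")` in Python's standard library uses the same interpolation and returns the same three quartiles here. Note that Q2 equals the median of 95,000.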
3.2.2 Five-Number Summary
Elements: Minimum, Q1, Median (Q2), Q3, Maximum.
Example Calculation for Growth and Value Funds:
Growth: Min -40.90, Q1 2.86, Median 15.25, Q3 36.97, Max 79.48
Value: Min -46.52, Q1 1.70, Median 15.38, Q3 22.44, Max 44.08
3.2.3 Boxplots
Definition: Graphical display of five-number summary.
Construction Steps:
a. Plot values on a horizontal axis
b. Draw a box from Q1 to Q3
c. Mark the median inside the box
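d. Extend whiskers from the box to the smallest and largest observations that lie within 1.5 × IQR of Q1 and Q3, respectively (IQR = Q3 - Q1).
e. Mark any observation beyond these limits as an outlier, for example with an asterisk.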
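The 1.5 × IQR limits can be computed from the five-number summaries of the Growth and Value funds. A minimal sketch; because only the minimum and maximum are known here, it can only say whether those extremes lie beyond the limits, not how many outliers each fund has. The quartiles depend on the percentile method used, so other software may draw slightly different limits.

```python
def boxplot_limits(q1, q3):
    """Return (lower, upper) limits lying 1.5 * IQR beyond the box."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Five-number summaries from the notes: (min, Q1, median, Q3, max)
funds = {
    "Growth": (-40.90, 2.86, 15.25, 36.97, 79.48),
    "Value": (-46.52, 1.70, 15.38, 22.44, 44.08),
}

for name, (minimum, q1, median, q3, maximum) in funds.items():
    lower, upper = boxplot_limits(q1, q3)
    print(f"{name}: IQR = {q3 - q1:.2f}, limits = ({lower:.2f}, {upper:.2f}), "
          f"min beyond limit: {minimum < lower}, max beyond limit: {maximum > upper}")
```

With these figures, Value's minimum (-46.52) falls below its lower limit (about -29.41), while both of Growth's extremes lie inside its limits.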
3.2.4 Distribution Shape from Boxplot
Characteristics:
Symmetric: Median in center, balanced whiskers
Positively skewed: Median left of center with longer right whisker
Negatively skewed: Median right of center with longer left whisker
3.2.5 Example of Boxplot Construction
Using Excel or R: Instructions provided for creating boxplots with data for Growth and Value funds.
3.3 Measures of Dispersion
3.3.1 Definition and Importance
Purpose: Analyze variability of data points. Measures include range, interquartile range, variance, and standard deviation.
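Range: Difference between the maximum and minimum, Range = Max - Min. It is easy to compute but uses only the two extreme observations, so it is sensitive to outliers and ignores how the rest of the data are spread.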
3.3.2 Interquartile Range (IQR)
Definition: Difference between third (Q3) and first (Q1) quartile, capturing middle 50% variability.
Benefit: Not influenced by extreme outliers.
3.3.3 Mean Absolute Deviation (MAD)
Definition: Average of absolute deviations from the mean
For sample observations: ( MAD = \frac{\sum{|x_i - \bar{x}|}}{n} )
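A minimal sketch of the MAD formula, applied to the Acetech salaries (for a population, the same computation uses ( \mu ) and N). Excel's AVEDEV function computes the same quantity.

```python
def mean_absolute_deviation(data):
    """Average absolute deviation from the mean: sum(|x_i - mean|) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
print(f"MAD: {mean_absolute_deviation(salaries):,.2f}")
```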
3.3.4 Variance and Standard Deviation
Definition: Variance measures average of squared deviations from the mean.
Sample Formula: ( Var_s = \frac{\sum (x_i - \bar{x})^2}{n-1} )
Population Formula: ( Var_p = \frac{\sum (x_i - \mu)^2}{N} )
Standard Deviation: Positive square root of variance, bringing it back into original units of measure.
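The sample and population formulas differ only in the divisor (n - 1 versus N). A minimal sketch on a small illustrative data set, cross-checked against `statistics.variance` and `statistics.pvariance`:

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """Sum of squared deviations from the population mean, divided by N."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N

data = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative data set (mean 5)

var_s = sample_variance(data)
var_p = population_variance(data)
# Standard deviations: positive square roots, back in the original units
print(f"Sample: variance {var_s:.4f}, SD {math.sqrt(var_s):.4f}")
print(f"Population: variance {var_p:.4f}, SD {math.sqrt(var_p):.4f}")
```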
3.3.5 Example Measures of Dispersion for Growth and Value
Growth:
Range: 120.38
MAD: 17.491
Variance: 566.406
Standard Deviation: 23.799
Value:
Range: 90.6
MAD: 13.667
Variance: 323.251
Standard Deviation: 17.979
3.3.6 Coefficient of Variation (CV)
Definition: Measure of relative dispersion that divides the standard deviation by the mean. Because it is unitless, it allows variability to be compared across data sets with different units or different means.
Sample and Population Formulas:
Sample ( CV_s = \frac{SD}{\bar{x}} )
Population ( CV_p = \frac{σ}{μ} )
3.3.7 Example CV Calculation for Growth and Value
Growth: ( CV = 23.799 / 15.755 \approx 1.51 ); Value: ( CV = 17.979 / 12.005 \approx 1.50 ). The two CVs are nearly equal, indicating similar relative dispersion.
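A minimal sketch of the sample CV (standard deviation divided by mean) for the two funds, using the summary statistics from the Excel output. The ratio is only meaningful when the mean is positive.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation divided by the mean (mean > 0)."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return sd / mean

# (mean return %, standard deviation %) from the Excel output
summary = {"Growth": (15.755, 23.799), "Value": (12.005, 17.979)}

cvs = {name: coefficient_of_variation(sd, mean) for name, (mean, sd) in summary.items()}
for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.2f}")
```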
3.4 Mean-Variance Analysis and the Sharpe Ratio
3.4.1 Investment Analysis Overview
Context: Analysis of investments (stocks, bonds, mutual funds).
Average Return: Represents the investor's reward, whereas variance and standard deviation measure risk.
Mean-Variance Analysis Postulate: Performance is measured by the relationship between reward (mean) and risk (variance).
3.4.2 Sharpe Ratio
Definition: Measures reward relative to risk, i.e., how much return in excess of the risk-free return an investment earns per unit of risk.
Calculation:
Let ( R_f ) denote the risk-free return. [ \text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p} ] where ( R_p ) is the asset's mean return and ( \sigma_p ) is the standard deviation of its returns.
3.4.3 Example: Sharpe Ratio for Growth and Value Funds
Growth Fund:
Mean Return: 15.755
Standard Deviation: 23.799
Value Fund:
Mean Return: 12.005
Standard Deviation: 17.979
Comparison: Growth has a slightly higher Sharpe ratio than Value, indicating better compensation for the risk taken.
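The notes do not state the risk-free return used in this comparison, so the sketch below takes it as a parameter and uses a hypothetical value of 2.0% for illustration. Means and standard deviations are the fund statistics from the Excel output.

```python
def sharpe_ratio(mean_return, sd, risk_free):
    """(mean return - risk-free return) / standard deviation, all in percent."""
    return (mean_return - risk_free) / sd

# (mean return %, standard deviation %) from the Excel output
funds = {"Growth": (15.755, 23.799), "Value": (12.005, 17.979)}
RISK_FREE = 2.0  # HYPOTHETICAL risk-free return in percent; substitute your own

for name, (mean, sd) in funds.items():
    print(f"{name}: Sharpe ratio = {sharpe_ratio(mean, sd, RISK_FREE):.3f}")
```

The ranking depends on the assumed risk-free return: with these summary figures, Growth's ratio exceeds Value's only when ( R_f ) is above roughly 0.42%. At ( R_f = 0 ), Value's ratio (about 0.668) edges out Growth's (about 0.662).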
3.5 Analysis of Relative Location
3.5.1 Understanding Standard Deviation and Relative Location
A low standard deviation indicates that observations cluster closely around the mean, while a high standard deviation indicates greater spread.
Chebyshev’s Theorem:
States that for any k > 1, at least ( 1 - \frac{1}{k^2} ) of the observations lie within k standard deviations of the mean, regardless of the shape of the distribution:
For k=2: At least 75% lie within ( \bar{x} \pm 2s )
For k=3: At least 89% lie within ( \bar{x} \pm 3s )
3.5.2 Example of Chebyshev’s Theorem Application
Class Context: 280 students, mean score of 74, SD of 8. The scores 58 and 90 lie k = 2 standard deviations below and above the mean (74 ± 2 × 8).
By Chebyshev's theorem, at least 75% of students scored between 58 and 90, i.e., at least 0.75 × 280 = 210 students.
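A minimal sketch of Chebyshev's lower bound ( 1 - 1/k^2 ), applied to the class example. It assumes the interval is symmetric about the mean, as 58 to 90 is around 74.

```python
def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k**2 on the share within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

n_students, mean_score, sd_score = 280, 74, 8
low, high = 58, 90

# The interval must be symmetric about the mean for a single k to describe it
assert mean_score - low == high - mean_score
k = (high - mean_score) / sd_score  # number of standard deviations

min_share = chebyshev_min_proportion(k)
print(f"k = {k}: at least {min_share:.0%} of scores, "
      f"i.e., at least {min_share * n_students:.0f} students")
```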
3.5.3 Using the Empirical Rule
Applicable to symmetric, bell-shaped distributions; gives the approximate percentage of observations within 1, 2, and 3 standard deviations of the mean:
Approximately 68% within 1 standard deviation
Approximately 95% within 2 standard deviations
Almost all (approximately 99.7%) within 3 standard deviations
3.5.4 Example of Empirical Rule Calculation
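Using the same class score data (now assumed bell-shaped), a score of 90 lies 2 standard deviations above the mean. About 95% of scores fall between 58 and 90, so about 5% fall outside this range; by symmetry, about 2.5% score above 90, or roughly 7 of the 280 students.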
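A minimal sketch of the empirical-rule reasoning: take the share outside ±k standard deviations and, because the distribution is assumed symmetric, assign half of it to the upper tail.

```python
# Approximate shares within k standard deviations of the mean (bell-shaped data)
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

n_students, mean_score, sd_score = 280, 74, 8
k = (90 - mean_score) / sd_score  # 90 is 2 standard deviations above the mean

outside = 1 - EMPIRICAL_RULE[int(k)]  # share beyond +/- k SDs
share_above = outside / 2             # symmetry: half of it lies above the mean
print(f"About {share_above:.1%} of students, roughly "
      f"{share_above * n_students:.0f}, scored above 90")
```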
3.5.5 Z-Scores
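Definition: A z-score measures how many standard deviations an observation lies from the mean: ( z = \frac{x - \bar{x}}{s} ).
Application: Outlier detection; for a bell-shaped distribution, observations with a z-score greater than 3 or less than -3 are considered outliers and merit further review.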
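A minimal sketch of z-scores and the ±3 outlier screen, reusing the class mean of 74 and standard deviation of 8. The individual scores are hypothetical, and "beyond 3" is read strictly, so a z-score of exactly -3 is not flagged.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def flag_outliers(data, mean, sd, threshold=3.0):
    """Return observations whose z-score is beyond +/- threshold."""
    return [x for x in data if abs(z_score(x, mean, sd)) > threshold]

scores = [50, 62, 74, 90, 99]  # HYPOTHETICAL individual scores
print([z_score(s, 74, 8) for s in scores])
print("Outliers:", flag_outliers(scores, 74, 8))
```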
3.6 Measures of Association
3.6.1 Assessing Relationships with Covariance
Covariance:
Measures direction of linear relationships:
Sample: ( Cov(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1} )
Population: ( Cov(x,y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} )
3.6.2 Correlation Coefficient
Definition: Describes both direction and strength of linear relationships.
Sample: ( r_{xy} = \frac{Cov(x,y)}{SD_x \cdot SD_y} )
Range: ([-1, 1])
Interpretations:
-1: Perfect negative correlation
0: No linear relationship
1: Perfect positive correlation
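A minimal sketch of the sample covariance and correlation coefficient. Dividing the covariance by the two standard deviations removes the units, which is why r always lies between -1 and 1. The small data sets are illustrative.

```python
import math

def sample_covariance(x, y):
    """sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Cov(x, y) / (SD_x * SD_y); Cov(x, x) is the sample variance of x."""
    sd_x = math.sqrt(sample_covariance(x, x))
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

x = [1, 2, 3, 4]
print(sample_covariance(x, [2, 4, 6, 8]))   # positive: y rises with x
print(sample_correlation(x, [2, 4, 6, 8]))  # perfect positive
print(sample_correlation(x, [8, 6, 4, 2]))  # perfect negative
```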
3.7 The Geometric Mean
3.7.1 Differences from Arithmetic Mean
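Arithmetic Mean: Appropriate for summarizing returns over a single period; because it is an additive average, it overstates the average per-period return over multiple periods when returns vary.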
Geometric Mean:
Multiplicative average; less sensitive to outliers.
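Formula for the geometric mean return of n multiperiod returns ( R_1, R_2, \ldots, R_n ) (expressed as decimals):
[ GM = \left[(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)\right]^{\frac{1}{n}} - 1 ]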
3.7.2 Example Calculation of Geometric Mean Return
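An initial investment of 1,000 earns 10% in Year 1 and -10% in Year 2:
Year 1: 10% -> Value: 1,100
Year 2: -10% -> Value: 990
Arithmetic mean return: ( (10\% + (-10\%))/2 = 0\% ), even though the investment lost money.
Geometric mean return: ( \sqrt{(1.10)(0.90)} - 1 = \sqrt{0.99} - 1 \approx -0.50\% ) per year, a negative annualized return that correctly reflects the decline from 1,000 to 990.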
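A minimal sketch that compares the arithmetic and geometric mean returns for the two-year example (+10%, then -10%, on an initial 1,000). Returns are entered as decimals, and compounding the geometric mean over both years recovers the actual ending value.

```python
import math

def geometric_mean_return(returns):
    """[(1 + R_1)(1 + R_2)...(1 + R_n)]**(1/n) - 1, returns as decimals."""
    if any(r <= -1 for r in returns):
        raise ValueError("each return must exceed -100%")
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

def arithmetic_mean_return(returns):
    return sum(returns) / len(returns)

returns = [0.10, -0.10]
gm = geometric_mean_return(returns)
am = arithmetic_mean_return(returns)

print(f"Arithmetic mean: {am:.2%}, geometric mean: {gm:.2%}")
# Compounding the geometric mean reproduces the actual path 1,000 -> 990
print(f"Ending value: {1000 * (1 + gm) ** len(returns):,.2f}")
```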
3.7.3 Compound Growth Rate Calculation
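Formula: ( \left(\frac{x_n}{x_1}\right)^{\frac{1}{n-1}} - 1 ), where ( x_1 ) and ( x_n ) are the first and last values of a series of n observations.
Example: Sales growth over five years for a multinational corporation gives a compound growth rate of 6.15%.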
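The compound growth rate uses only the first and last values of the series: ( (x_n / x_1)^{1/(n-1)} - 1 ), where n - 1 is the number of growth periods. A minimal sketch; the sales figures are hypothetical, not the corporation's data.

```python
def compound_growth_rate(values):
    """(x_n / x_1) ** (1 / (n - 1)) - 1 for a series of n positive values."""
    if len(values) < 2 or values[0] <= 0:
        raise ValueError("need at least two values and a positive first value")
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1

# HYPOTHETICAL sales series: 100 -> 121 over 2 periods (middle value is ignored)
sales = [100, 108, 121]
print(f"Compound growth rate: {compound_growth_rate(sales):.2%}")
```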
Exam Items
Ability to calculate central tendency measures: mean, mode, median.
Ability to calculate variability measures: variance and standard deviation.
Proficiency in formulas for population and sample data.
Understanding coefficient of variation.
Knowledge of Sharpe ratio.
Familiarity with the empirical rule and its applications.
Differences in arithmetic and geometric returns, with calculations.