Chapter 3: Summary Measures (Business Statistics: Communicating with Numbers, 5e)

Learning Objectives

LO 3.1 Calculate and interpret measures of central location.
LO 3.2 Interpret a percentile and a boxplot.
LO 3.3 Calculate and interpret measures of dispersion.
LO 3.4 Explain mean-variance analysis and the Sharpe ratio.
LO 3.5 Apply Chebyshev’s theorem, the empirical rule, and z-scores.
LO 3.6 Calculate and interpret measures of association.
LO 3.7 Calculate and interpret a geometric mean return and a compound growth rate.

Introductory Case: Investment Decision

Context: Dorothy, a financial advisor, assists an inexperienced investor with questions about mutual fund investing.
Mutual Funds:
- Fidelity’s Growth Index Mutual Fund (Growth)
- Fidelity’s Value Index Mutual Fund (Value)
Annual Return Data (36 years, 1984 to 2019; selected years shown):
- 1984: Growth: -5.50%, Value: -8.59%
- 1985: Growth: 39.91%, Value: 22.10%
- 2019: Growth: 38.42%, Value: 31.62%
Tasks for Dorothy:
1. Calculate and interpret the typical return for both mutual funds.
2. Calculate and interpret the investment risk for both mutual funds.
3. Determine which mutual fund provides the greater return relative to risk.

3.1 Measures of Central Location

Definition: Central location refers to the clustering of numerical data around a middle or central value.
Objective: To find a central value that describes the data.
Primary Measure: Arithmetic mean
- Also known as the mean or average.
- Calculation: Add all observations and divide by the number of observations.

3.1.1 Population Mean vs. Sample Mean

Notation Difference:
- Sample Mean: ( \bar{x} = \frac{\sum x_i}{n} ), a statistic describing a sample of n observations.
- Population Mean: ( \mu = \frac{\sum x_i}{N} ), a parameter describing a population of N observations.
Outlier Influence: The mean can be misleading in the presence of outliers (extremely small or large observations).

3.1.2 Example: Salaries at Acetech

Salaries:
- Administrative Assistant: 40,000
- Research Assistant: 40,000
- Data Analyst: 65,000
- Senior Research Associate: 90,000
- Senior Data Analyst: 100,000
- Senior Sales Associate: 145,000
- Chief Financial Officer: 150,000
- President (and owner): 550,000
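Population Mean Calculation:
- μ = (40,000 + 40,000 + 65,000 + 90,000 + 100,000 + 145,000 + 150,000 + 550,000) / 8 = 1,180,000 / 8 = 147,500.
- The mean reflects the average salary, but it is not typical. The president's 550,000 salary is an outlier, and 6 of the 8 employees earn less than the mean.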
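The calculation above can be sketched in Python with the standard-library `statistics` module. The variable names are illustrative.

```python
from statistics import mean

# Acetech salaries: all 8 employees, so this is a population
salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

mu = mean(salaries)                         # population mean: sum / N
below = sum(1 for s in salaries if s < mu)  # employees earning less than the mean

print(f"Population mean: {mu:,}")
print(f"{below} of {len(salaries)} employees earn less than the mean")
```

The single 550,000 salary pulls the mean above six of the eight observations.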

3.1.3 Median

Definition: The median is the middle value of the data when they are arranged in ascending order, so it divides the data in half.
- Calculation:
  - Odd number of observations: middle value.
  - Even number of observations: average of the two middle values.
Usefulness: Particularly helpful when outliers are present.

3.1.4 Example: Median Salary Calculation

Ordered Salaries:
1. 40,000
2. 40,000
3. 65,000
4. 90,000
5. 100,000
6. 145,000
7. 150,000
8. 550,000
Median Calculation:
- Since there are 8 salaries, median = (90,000 + 100,000) / 2 = 95,000.
- Comparison: Mean 147,500 vs. Median 95,000. The median better reflects a typical salary because the 550,000 outlier does not pull it upward.
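The odd/even rule from 3.1.3 can be written out by hand and checked against `statistics.median`. This is a sketch; `median_by_hand` is a hypothetical helper name.

```python
from statistics import median

def median_by_hand(values):
    """Median via the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)              # the median requires ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

print(median_by_hand(salaries))  # hand-rolled rule
print(median(salaries))          # standard-library check
```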

3.1.5 Mode

Definition: The mode is the value that appears most frequently.
- Types:
  - Unimodal: one mode
  - Bimodal: two modes
Example: For salaries at Acetech, 40,000 is the mode because it appears twice, even though most employees earn considerably more.
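In Python, `statistics.multimode` returns every most-frequent value, so one function covers both unimodal and bimodal data. One caveat: when no value repeats, `multimode` returns all the values, whereas Excel's `MODE.MULT` returns an error instead. A small sketch, with toy data for the bimodal case:

```python
from statistics import multimode

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

# multimode lists every value tied for the highest frequency
print(multimode(salaries))         # one mode: unimodal
print(multimode([1, 1, 2, 2, 3]))  # two modes: bimodal (toy data)
```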

3.1.6 Categorical Variables

Mode as the Only Measure: For categorical data like sweatshirt sizes (S, M, L):
- Sizes:
  - S: 2
  - M: 3
  - L: 5
- Modal Size: L (most frequent).
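When categorical data arrive as a frequency table, as with the sweatshirt sizes, the mode is the category with the largest count. A minimal sketch; the dictionary layout is an assumption:

```python
# Frequency table of sweatshirt sizes from the example
size_counts = {"S": 2, "M": 3, "L": 5}

# The mode is the category with the highest frequency; keep ties in case the data are bimodal
top = max(size_counts.values())
modal_sizes = [size for size, count in size_counts.items() if count == top]

print(modal_sizes)
```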

3.1.7 Descriptive Measures in Excel

Common Functions:
- Mean: =AVERAGE(array)
- Median: =MEDIAN(array)
- Mode: =MODE.MULT(array)
- Minimum: =MIN(array)
- Maximum: =MAX(array)
- Percentile: =PERCENTILE.INC(array, p)

3.1.8 Example: Centrality Measures for Growth and Value Funds

Excel Descriptive Statistics output (annual returns, %):
  Measure              Growth     Value
  Mean                 15.755    12.005
  Standard Error        3.966     2.997
  Median               15.245     15.38
  Mode                    N/A       N/A
  Standard Deviation   23.799    17.979
  Sample Variance     566.406   323.251
  Kurtosis              0.973     1.853
  Skewness             -0.029    -1.024
  Range                120.38      90.6
  Minimum               -40.9    -46.52
  Maximum               79.48     44.08
  Count                    36        36

3.1.9 Weighted Mean

Definition: A mean in which observations contribute unequally, each weighted by its relative importance.
Calculation: The weighted mean ( \bar{x}_w ) is [ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} ] where ( w_i ) is the weight assigned to observation ( x_i ).

3.1.10 Example of Weighted Mean Calculation

Grades Example:
- Exam 1: 60 (25% weight), Exam 2: 70 (25% weight), Exam 3: 80 (50% weight)
- Weighted mean = (0.25 \times 60 + 0.25 \times 70 + 0.50 \times 80 = 72.50)
- Unweighted Mean Comparison: Treating the three exams equally gives (60 + 70 + 80) / 3 = 70.
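A minimal Python sketch of the weighted-mean formula applied to the grades example. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

scores = [60, 70, 80]
weights = [0.25, 0.25, 0.50]  # exam weights from the example

print(weighted_mean(scores, weights))  # weighted mean
print(sum(scores) / len(scores))       # unweighted mean for comparison
```

Dividing by `sum(weights)` means the weights do not have to add up to 1. For example, weights of 1, 1, 2 give the same result.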

3.1.11 Distribution Symmetry

Types of Distribution:
- Symmetric: Mean = Median = Mode
- Positively Skewed: Mean > Median
- Negatively Skewed: Mean < Median
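Skewness Coefficient: Indicates the direction of skewness.
- Zero: Symmetric
- Positive: Right-skewed
- Negative: Left-skewed
Example: Value's skewness of -1.024, with its mean (12.005) below its median (15.38), indicates negative skew. Growth's -0.029 is close to symmetric.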

3.1.12 Subsetting Observations

Example: Mean spending at an online store is computed separately for female and male customers by subsetting the observations on sex, using Excel and R.
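The subsetting idea can be sketched in plain Python by grouping observations on a category and averaging each group. The source does not reproduce the store's data, so the records below are made up to show the mechanics only.

```python
from collections import defaultdict

# Hypothetical (sex, spending) records: NOT the textbook's online-store data
purchases = [
    ("Female", 120.0), ("Male", 80.0), ("Female", 60.0),
    ("Male", 70.0), ("Female", 90.0),
]

# Subset by sex: accumulate a running total and count for each group
totals = defaultdict(float)
counts = defaultdict(int)
for sex, amount in purchases:
    totals[sex] += amount
    counts[sex] += 1

group_means = {sex: totals[sex] / counts[sex] for sex in totals}
print(group_means)
```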

3.2 Percentiles and Boxplots

3.2.1 Percentiles

Definition: The pth percentile divides a variable into two parts: approximately p% of the observations are less than it, and (100 - p)% are greater.
Quartile Breakdown:
- 25th percentile: Q1
- 50th percentile: Q2
- 75th percentile: Q3
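Application: Most meaningful for larger data sets. Software packages use different methods to locate percentiles, so results can differ slightly, especially for small samples.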
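To see how the choice of method matters, this sketch computes the Acetech salary quartiles two ways with `statistics.quantiles`. The `"inclusive"` method interpolates at rank (n - 1)p + 1, which is the rule behind Excel's `PERCENTILE.INC`. The default `"exclusive"` method interpolates at rank (n + 1)p, which is the rule behind `PERCENTILE.EXC`. Which rule the textbook uses is not stated here.

```python
from statistics import quantiles

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

# Quartiles (Q1, Q2, Q3) under two common interpolation rules
q_inc = quantiles(salaries, n=4, method="inclusive")  # rank (n - 1)p + 1
q_exc = quantiles(salaries, n=4, method="exclusive")  # rank (n + 1)p

print("inclusive Q1, Q2, Q3:", q_inc)
print("exclusive Q1, Q2, Q3:", q_exc)
```

With only 8 observations, the two rules give the same median but noticeably different Q1 and Q3.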

3.2.2 Five-Number Summary

Elements: Minimum, Q1, Median (Q2), Q3, Maximum.
Example Calculation for Growth and Value Funds:
- Growth: Min -40.90, Q1 2.86, Median 15.25, Q3 36.97, Max 79.48
- Value: Min -46.52, Q1 1.70, Median 15.38, Q3 22.44, Max 44.08

3.2.3 Boxplots

Definition: Graphical display of five-number summary.
Construction Steps:
a. Plot values on a horizontal axis
b. Draw a box from Q1 to Q3
c. Mark the median inside the box
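d. Extend whiskers to the smallest and largest values that lie within 1.5 × IQR of the box, i.e., no lower than Q1 - 1.5 × IQR and no higher than Q3 + 1.5 × IQR.
e. Mark any observation beyond these limits as an outlier with an asterisk.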
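The construction steps can be sketched as a function that returns the five-number summary, the whisker ends, and any outliers. The sketch assumes inclusive quartiles; another quartile rule would shift the fences slightly. `boxplot_summary` is a hypothetical name.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Five-number summary plus the 1.5 * IQR fences used to flag outliers."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")  # assumption: inclusive quartiles
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    return {
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),  # whiskers stop at the most extreme non-outliers
        "outliers": [x for x in ordered if x < lower_fence or x > upper_fence],
    }

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
print(boxplot_summary(salaries))
```

For the salary data, the upper fence is 146,250 + 1.5 × 87,500 = 277,500. The president's 550,000 is therefore drawn as an outlier, and the right whisker stops at 150,000.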

3.2.4 Distribution Shape from Boxplot

Characteristics:
- Symmetric: Median in center, balanced whiskers
- Positively skewed: Median left of center with longer right whisker
- Negatively skewed: Median right of center with longer left whisker

3.2.5 Example of Boxplot Construction

Using Excel or R: Instructions provided for creating boxplots with data for Growth and Value funds.

3.3 Measures of Dispersion

3.3.1 Definition and Importance

Purpose: Analyze variability of data points. Measures include range, interquartile range, variance, and standard deviation.
Range: Range = Max - Min. It is easy to compute, but it uses only the two most extreme observations, so a single outlier can inflate it.

3.3.2 Interquartile Range (IQR)

Definition: IQR = Q3 - Q1, the spread of the middle 50% of the observations.
Benefit: Not influenced by outliers, because it ignores the lowest and highest 25% of the observations.
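Both spread measures can be read straight off the five-number summaries in 3.2.2. A short sketch:

```python
# Five-number summaries (Min, Q1, Median, Q3, Max) from 3.2.2; annual returns in %
summaries = {
    "Growth": (-40.90, 2.86, 15.25, 36.97, 79.48),
    "Value": (-46.52, 1.70, 15.38, 22.44, 44.08),
}

results = {}
for fund, (mn, q1, _median, q3, mx) in summaries.items():
    data_range = mx - mn  # spread of the full data set
    iqr = q3 - q1         # spread of the middle 50%
    results[fund] = (data_range, iqr)
    print(f"{fund}: range = {data_range:.2f}, IQR = {iqr:.2f}")
```

The ranges reproduce the Excel output (120.38 and 90.6). Growth's IQR (34.11) is also wider than Value's (20.74).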

3.3.3 Mean Absolute Deviation (MAD)

Definition: Average of absolute deviations from the mean
- For sample observations: ( MAD = \frac{\sum{|x_i - \bar{x}|}}{n} )
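The full 36-year fund data are not listed here, so this sketch applies the MAD formula to the Acetech salaries instead. `mean_absolute_deviation` is a hypothetical helper name.

```python
def mean_absolute_deviation(values):
    """Average absolute distance from the mean: sum(|x_i - mean|) / n."""
    n = len(values)
    center = sum(values) / n
    return sum(abs(x - center) for x in values) / n

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
print(mean_absolute_deviation(salaries))
```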

3.3.4 Variance and Standard Deviation

Definition: Variance measures average of squared deviations from the mean.
- Sample Formula: ( s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} )
- Population Formula: ( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} )
Standard Deviation: The positive square root of the variance (( s = \sqrt{s^2} ) or ( \sigma = \sqrt{\sigma^2} )), which returns the measure to the original units.
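A sketch of both variance formulas, cross-checked against `statistics.pvariance` and `statistics.variance`. The salary data stand in for an example because the salaries form a population. The `variance` helper and its `sample` flag are hypothetical.

```python
import math
import statistics

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    center = sum(values) / n
    ss = sum((x - center) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

pop_var = variance(salaries, sample=False)  # divide by N
pop_sd = math.sqrt(pop_var)                 # back in the original units

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(salaries))
assert math.isclose(variance(salaries), statistics.variance(salaries))
print(pop_var, pop_sd)
```

The sample version divides by n - 1, so it is always a little larger than the population version for the same numbers.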

3.3.5 Example Measures of Dispersion for Growth and Value

Growth:
- Range: 120.38
- MAD: 17.491
- Variance: 566.406
- Standard Deviation: 23.799
Value:
- Range: 90.6
- MAD: 13.667
- Variance: 323.251
- Standard Deviation: 17.979

3.3.6 Coefficient of Variation (CV)

Definition: A measure of relative dispersion that divides the standard deviation by the mean, adjusting for differences in the magnitudes of the means. It is unitless, so it can compare data sets measured in different units or on different scales.
Sample and Population Formulas:
- Sample ( CV = \frac{s}{\bar{x}} )
- Population ( CV = \frac{\sigma}{\mu} )

3.3.7 Example CV Calculation for Growth and Value

Growth: CV = 23.799 / 15.755 ≈ 1.51. Value: CV = 17.979 / 12.005 ≈ 1.50. The two CVs are nearly equal, so the funds have similar relative dispersion.
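A short sketch of the CV comparison, using the means and standard deviations from the Excel output in 3.1.8:

```python
# (mean, standard deviation) of annual returns in %, from the Excel output
funds = {"Growth": (15.755, 23.799), "Value": (12.005, 17.979)}

cv = {name: sd / avg for name, (avg, sd) in funds.items()}  # CV = s / x-bar, unitless
for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}")
```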

3.4 Mean-Variance Analysis and the Sharpe Ratio

3.4.1 Investment Analysis Overview

Context: Analysis of investments (stocks, bonds, mutual funds).
Average Return: The mean return measures the investor's reward, and the variance (or standard deviation) measures the risk.
Mean-Variance Analysis Postulate: Performance is measured by the relationship between reward (mean) and risk (variance).

3.4.2 Sharpe Ratio

Definition: Measures reward relative to risk, i.e., how well an asset's return in excess of the risk-free return compensates for the risk taken.
Calculation:
Let ( R_f ) denote the risk-free return. [ \text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p} ] where ( R_p ) is the asset's mean return and ( \sigma_p ) is its standard deviation. A higher Sharpe ratio indicates more excess return per unit of risk.

3.4.3 Example: Sharpe Ratio for Growth and Value Funds

Growth Fund:
- Mean Return: 15.755
- Standard Deviation: 23.799
Value Fund:
- Mean Return: 12.005
- Standard Deviation: 17.979
Comparison: Growth has a slightly higher Sharpe ratio than Value, so it offers better compensation per unit of risk.
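The risk-free return used in this example is not stated here. This sketch therefore assumes a hypothetical ( R_f ) of 2%, flagged in the code, purely to show the arithmetic.

```python
def sharpe_ratio(mean_return, sd, risk_free):
    """(mean return - risk-free return) / standard deviation; all inputs in percent."""
    return (mean_return - risk_free) / sd

RISK_FREE = 2.0  # HYPOTHETICAL risk-free return in %, not a value given in the source

growth = sharpe_ratio(15.755, 23.799, RISK_FREE)
value = sharpe_ratio(12.005, 17.979, RISK_FREE)
print(f"Growth: {growth:.2f}, Value: {value:.2f}")
```

The ranking depends on the risk-free rate. With `risk_free=0`, Value's ratio (about 0.668) narrowly exceeds Growth's (about 0.662). It is worth stating ( R_f ) explicitly whenever Sharpe ratios are compared.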

3.5 Analysis of Relative Location

3.5.1 Understanding Standard Deviation and Relative Location

A low standard deviation indicates that observations cluster closely around the mean. A high standard deviation indicates greater spread.
Chebyshev’s Theorem:
- States that for any data set and any k > 1, at least ( 1 - \frac{1}{k^2} ) of the observations lie within k standard deviations of the mean:
- For k=2: At least 75% within ( \bar{x} \pm 2s )
- For k=3: At least 89% within ( \bar{x} \pm 3s )

3.5.2 Example of Chebyshev’s Theorem Application

Class Context: 280 students, mean score of 74, standard deviation of 8. How many students scored between 58 and 90?
- Both limits are 2 standard deviations from the mean (74 ± 2 × 8). By Chebyshev's theorem, at least 75% of the students scored in this range, i.e., at least 0.75 × 280 = 210 students.
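A minimal sketch of the Chebyshev calculation for the class example. `chebyshev_min_share` is a hypothetical helper name.

```python
import math

def chebyshev_min_share(k):
    """Chebyshev lower bound on the share of observations within k SDs of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2

mean_score, sd, n_students = 74, 8, 280
low, high = 58, 90
k = (high - mean_score) / sd  # 90 is 2 SDs above the mean (and 58 is 2 below)
min_share = chebyshev_min_share(k)
min_students = math.ceil(min_share * n_students)  # "at least", so round up any fraction

print(k, min_share, min_students)
```

Because the theorem makes no assumption about the shape of the distribution, its bound is conservative. The empirical rule gives tighter estimates when the data are bell-shaped.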

3.5.3 Using the Empirical Rule

Applies to symmetric, bell-shaped distributions. It gives the approximate percentage of observations within k standard deviations of the mean:
- About 68% within 1 standard deviation
- About 95% within 2 standard deviations
- Almost all (99.7%) within 3 standard deviations

3.5.4 Example of Empirical Rule Calculation

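Using the class score data again (mean 74, standard deviation 8) and assuming a bell-shaped distribution, how many students scored above 90? A score of 90 is 2 standard deviations above the mean. About 95% of scores lie within 2 standard deviations, so 5% lie outside, split evenly between the two tails by symmetry. About 2.5% scored above 90, or approximately 0.025 × 280 = 7 students.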
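The same tail calculation as a sketch. It assumes a bell-shaped distribution and a cutoff that falls a whole number of standard deviations from the mean.

```python
# Approximate shares within k SDs of the mean for bell-shaped data (empirical rule)
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

mean_score, sd, n_students = 74, 8, 280
cutoff = 90
k = (cutoff - mean_score) / sd  # 2.0 SDs above the mean
assert k.is_integer(), "the rule is tabulated only for whole k"

upper_tail = (1 - EMPIRICAL[int(k)]) / 2  # symmetry: split the outside share evenly
print(f"~{upper_tail:.1%} above {cutoff}, about {upper_tail * n_students:.0f} students")
```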

3.5.5 Z-Scores

Definition: A z-score measures how many standard deviations an observation lies from the mean: ( z = \frac{x - \bar{x}}{s} ). A negative z-score means the observation is below the mean.
Application: Detecting outliers; observations with a z-score above 3 or below -3 merit further review.
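A sketch of z-scores and the |z| > 3 screen, using the class mean and standard deviation. The individual scores are hypothetical values chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

mean_score, sd = 74, 8
scores = [58, 74, 90, 100, 45]  # hypothetical individual scores

for x in scores:
    print(f"score {x}: z = {z_score(x, mean_score, sd):+.2f}")

# Observations more than 3 SDs from the mean merit further review
flagged = [x for x in scores if abs(z_score(x, mean_score, sd)) > 3]
print("review:", flagged)
```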

3.6 Measures of Association

3.6.1 Assessing Relationships with Covariance

Covariance:
- Measures direction of linear relationships:
- Sample: ( Cov(x,y) = \frac{1}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y}) )
- Population: ( Cov(x,y) = \frac{1}{N} \sum (x_i - \mu_x)(y_i - \mu_y) )

3.6.2 Correlation Coefficient

Definition: Describes both direction and strength of linear relationships.
- Sample: ( r_{xy} = \frac{Cov(x,y)}{s_x s_y} ), where ( s_x ) and ( s_y ) are the sample standard deviations
- Range: ([-1, 1])
- Interpretations:
  - -1: Perfect negative correlation
  - 0: No linear relationship
  - 1: Perfect positive correlation
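A sketch of the sample covariance and correlation formulas. It uses only the three years of fund returns listed in the introductory case, not the full 36-year data set, so the numbers are illustrative and will not match the textbook's results. The standard-library cross-check needs Python 3.10+ and is skipped on older versions.

```python
import math
import statistics

# Only the three years listed in the case (1984, 1985, 2019); illustrative only
growth = [-5.50, 39.91, 38.42]
value = [-8.59, 22.10, 31.62]

def sample_covariance(x, y):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance scaled by both sample standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

cov = sample_covariance(growth, value)
r = sample_correlation(growth, value)

# Optional cross-check: statistics.covariance/correlation exist in Python 3.10+
if hasattr(statistics, "correlation"):
    assert math.isclose(cov, statistics.covariance(growth, value))
    assert math.isclose(r, statistics.correlation(growth, value))

print(f"covariance = {cov:.2f}, correlation = {r:.3f}")
```

The covariance is in squared percentage points, which is hard to interpret. Dividing by both standard deviations yields the unitless correlation.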

3.7 The Geometric Mean

3.7.1 Differences from Arithmetic Mean

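Arithmetic Mean: Appropriate for a single-period (e.g., one-year) return. Over multiple periods it ignores compounding and overstates the average return whenever returns vary from period to period.
Geometric Mean:
- A multiplicative average that accounts for compounding; less sensitive to outliers than the arithmetic mean.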
- Formula for n multiperiod returns:
  [ GM = \left[(1 + R_1)(1 + R_2) \cdots (1 + R_n)\right]^{1/n} - 1 ]

3.7.2 Example Calculation of Geometric Mean Return

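Example: An initial investment of 1,000 earns 10% in Year 1 and -10% in Year 2:
- Year 1: 10% -> Value: 1,100
- Year 2: -10% -> Value: 990
- Arithmetic mean return: (10% + (-10%)) / 2 = 0%, which suggests the investor broke even.
- Geometric mean return: ( [(1 + 0.10)(1 - 0.10)]^{1/2} - 1 = \sqrt{0.99} - 1 \approx -0.50\% ) per year.
- Interpretation: The geometric mean correctly reflects the loss from 1,000 to 990; the arithmetic mean does not.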
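A sketch of the two-year example in Python. It assumes the initial 1,000 investment implied by the Year 1 value of 1,100. `math.prod` needs Python 3.8+.

```python
import math

returns = [0.10, -0.10]  # Year 1 and Year 2 returns from the example
initial = 1_000

arithmetic = sum(returns) / len(returns)
growth_factor = math.prod(1 + r for r in returns)    # (1 + R1)(1 + R2)
geometric = growth_factor ** (1 / len(returns)) - 1  # annualized multiperiod return

final_value = initial * growth_factor
print(f"arithmetic: {arithmetic:.2%}, geometric: {geometric:.2%}, final value: {final_value:.2f}")
```

Compounding the geometric mean for two years reproduces the ending value: 1,000 × (1 + G)² = 990.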

3.7.3 Compound Growth Rate Calculation

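Example: Applied to five years of sales for a multinational corporation, the compound growth rate is 6.15%. Formula: ( \left(\frac{x_n}{x_1}\right)^{1/(n-1)} - 1 ), where ( x_1 ) and ( x_n ) are the first and last of n observations.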
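The corporation's sales figures are not listed here. This sketch therefore checks the compound growth rate formula on hypothetical numbers (100, 110, 121), which grow by exactly 10% per period. `compound_growth_rate` is a hypothetical helper name.

```python
def compound_growth_rate(first, last, n_obs):
    """Average compound growth per period across n_obs observations: (last/first)**(1/(n-1)) - 1."""
    if n_obs < 2 or first <= 0:
        raise ValueError("need at least two observations and a positive starting value")
    return (last / first) ** (1 / (n_obs - 1)) - 1

sales = [100, 110, 121]  # HYPOTHETICAL series, not the textbook's data
rate = compound_growth_rate(sales[0], sales[-1], len(sales))
print(f"{rate:.2%}")
```

Only the first and last observations enter the formula. The intermediate values affect the path but not the average compound rate.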

Exam Items

Ability to calculate central tendency measures: mean, mode, median.
Ability to calculate variability measures: variance and standard deviation.
Proficiency in formulas for population and sample data.
Understanding coefficient of variation.
Knowledge of Sharpe ratio.
Familiarity with the empirical rule and its applications.
Differences in arithmetic and geometric returns, with calculations.