Chapter 3: Summary Measures - Business Statistics: Communicating with Numbers, 5e

Learning Objectives

  • LO 3.1 Calculate and interpret measures of central location.

  • LO 3.2 Interpret a percentile and a boxplot.

  • LO 3.3 Calculate and interpret measures of dispersion.

  • LO 3.4 Explain mean-variance analysis and the Sharpe ratio.

  • LO 3.5 Apply Chebyshev’s theorem, the empirical rule, and z-scores.

  • LO 3.6 Calculate and interpret measures of association.

  • LO 3.7 Calculate and interpret a geometric mean return and a compound growth rate.

Introductory Case: Investment Decision

  • Context: Dorothy, a financial advisor, assists an inexperienced investor with questions about mutual fund investing.

  • Mutual Funds:

    • Fidelity’s Growth Index Mutual Fund (Growth)

    • Fidelity’s Value Index Mutual Fund (Value)

  • Annual Return Data:

    • 1984: Growth: -5.50%, Value: -8.59%

    • 1985: Growth: 39.91%, Value: 22.10%

    • 2019: Growth: 38.42%, Value: 31.62%

  • Tasks for Dorothy:

    1. Calculate and interpret the typical return for both mutual funds.

    2. Calculate and interpret the investment risk for both mutual funds.

    3. Determine which mutual fund provides the greater return relative to risk.

3.1 Measures of Central Location

  • Definition: Central location refers to the clustering of numerical data around a middle or central value.

  • Objective: To find a central value that describes the data.

  • Primary Measure: Arithmetic mean

    • Also known as the mean or average.

    • Calculation: Add all observations and divide by the number of observations.

3.1.1 Population Mean vs. Sample Mean

  • Notation Difference:

    • Sample Mean: Denoted as \( \bar{x} \) (a statistic describing a sample).

    • Population Mean: Denoted as \( \mu \) (a parameter describing a population).

  • Outlier Influence: The mean can be misleading in the presence of outliers (extremely small or large observations).

3.1.2 Example: Salaries at Acetech

  • Salaries:

    • Administrative Assistant: 40,000

    • Research Assistant: 40,000

    • Data Analyst: 65,000

    • Senior Research Associate: 90,000

    • Senior Data Analyst: 100,000

    • Senior Sales Associate: 145,000

    • Chief Financial Officer: 150,000

    • President (and owner): 550,000

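  • Population Mean Calculation:

    • \( \mu = \frac{\sum x_i}{N} = \frac{1,180,000}{8} = 147,500 \)

    • The mean reflects the average salary but not a typical one because of the outlier: 6 of the 8 employees earn less than the mean.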

3.1.3 Median

  • Definition: The median is the middle value, dividing the data in half.

    • Calculation:

      • Odd number of observations: middle value.

      • Even number of observations: average of the two middle values.

  • Usefulness: Particularly helpful when outliers are present.

3.1.4 Example: Median Salary Calculation

  • Ordered Salaries:

    1. 40,000

    2. 40,000

    3. 65,000

    4. 90,000

    5. 100,000

    6. 145,000

    7. 150,000

    8. 550,000

  • Median Calculation:

    • Since there are 8 salaries, median = (90,000 + 100,000) / 2 = 95,000.

    • Comparison: Mean 147,500 vs. Median 95,000.
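The mean and median comparison above can be reproduced with a short sketch. It uses Python's standard `statistics` module and the eight Acetech salaries listed in the notes; the variable names are illustrative only.

```python
from statistics import mean, median

# Acetech salaries from the example above, in dollars.
salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]

avg = mean(salaries)    # sum of the observations divided by their count
mid = median(salaries)  # n is even, so this averages the two middle values

# Count the salaries below the mean to see how far the outlier pulls it.
below_mean = sum(1 for s in salaries if s < avg)

print(avg, mid, below_mean)
```

Six of the eight salaries fall below the mean, which is why the median is the more representative figure here.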

3.1.5 Mode

  • Definition: The mode is the value that appears most frequently.

    • Types:

      • Unimodal: one mode

      • Bimodal: two modes

  • Example: For salaries at Acetech, $40,000 is the mode, being the most common salary despite many employees earning considerably more.

3.1.6 Categorical Variables

  • Mode as the Only Measure: For a categorical variable such as sweatshirt size (S, M, L), the mode is the only meaningful measure of central location.

    • Frequency by size:

      • S: 2

      • M: 3

      • L: 5

    • Modal Size: L (most frequent).
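As a rough illustration, the mode can be found for both the numerical salary data and the categorical sweatshirt sizes. The sketch below uses `statistics.multimode` and `collections.Counter` from the standard library. The list of sizes is rebuilt from the frequency counts in the notes, and the variable names are hypothetical.

```python
from collections import Counter
from statistics import multimode

# Numerical variable: Acetech salaries in dollars.
salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
salary_modes = multimode(salaries)  # list of every most-frequent value

# Categorical variable: sizes rebuilt from the counts S: 2, M: 3, L: 5.
sizes = ["S"] * 2 + ["M"] * 3 + ["L"] * 5
size_counts = Counter(sizes)
modal_size, modal_count = size_counts.most_common(1)[0]

print(salary_modes, modal_size, modal_count)
```

`multimode` returns a list, so a bimodal variable would yield two entries, similar in spirit to Excel's `MODE.MULT`.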

3.1.7 Descriptive Measures in Excel

  • Common Functions:

    • Mean: =AVERAGE(array)

    • Median: =MEDIAN(array)

    • Mode: =MODE.MULT(array)

    • Minimum: =MIN(array)

    • Maximum: =MAX(array)

    • Percentile: =PERCENTILE.INC(array, p)

3.1.8 Example: Centrality Measures for Growth and Value Funds

  • Outputs from Excel:

    • Growth Fund:

      • Mean: 15.755

      • Standard Error: 3.966

      • Median: 15.245

      • Mode: N/A

      • Standard Deviation: 23.799

      • Sample Variance: 566.406

      • Kurtosis: 0.973

      • Skewness: -0.029

      • Range: 120.38

      • Minimum: -40.9

      • Maximum: 79.48

      • Count: 36

    • Value Fund:

      • Mean: 12.005

      • Standard Error: 2.997

      • Median: 15.38

      • Mode: N/A

      • Standard Deviation: 17.979

      • Sample Variance: 323.251

      • Kurtosis: 1.853

      • Skewness: -1.024

      • Range: 90.6

      • Minimum: -46.52

      • Maximum: 44.08

      • Count: 36

3.1.9 Weighted Mean

  • Definition: The weighted mean is used when some observations contribute more than others; each observation \( x_i \) is multiplied by its weight \( w_i \).

  • Calculation: For the weighted mean \( \bar{x}_w \): \[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \] where \( w_i \) is the weight of the \( i \)th observation.

3.1.10 Example of Weighted Mean Calculation

  • Grades Example:

    • Exam 1: 60 (25% weight), Exam 2: 70 (25% weight), Exam 3: 80 (50% weight)

    • Weighted mean = \( 0.25 \times 60 + 0.25 \times 70 + 0.50 \times 80 = 72.50 \)

    • Unweighted Mean Comparison: Weighting all three exams equally gives (60 + 70 + 80) / 3 = 70.
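The grade example can be checked with a minimal sketch of the weighted mean formula. The function name `weighted_mean` is an illustrative choice, not something taken from the notes.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [60, 70, 80]
weights = [0.25, 0.25, 0.50]

weighted = weighted_mean(scores, weights)  # the third exam counts double
unweighted = sum(scores) / len(scores)     # every exam counts equally

print(weighted, unweighted)
```

Dividing by the sum of the weights means the function also works when the weights are not already normalized to 1, for example credit hours.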

3.1.11 Distribution Symmetry

  • Types of Distribution:

    • Symmetric: Mean = Median = Mode

    • Positively Skewed: Mean > Median

    • Negatively Skewed: Mean < Median

  • Skewness Coefficient: Indicates skewness direction.

    • Zero: Symmetric

    • Positive: Right-skewed

    • Negative: Left-skewed

3.1.12 Subsetting Observations

  • Example: Mean spending categorized by sex at an online store: Female and Male average spending values calculated using Excel and R.

3.2 Percentiles and Boxplots

3.2.1 Percentiles

  • Definition: The pth percentile divides a variable into two parts: approximately p% of the observations fall below it and approximately (100 - p)% fall above it.

  • Quartile Breakdown:

    • 25th percentile: Q1

    • 50th percentile: Q2

    • 75th percentile: Q3

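  • Application: Percentiles are most informative for large datasets; different software packages may use different calculation methods and report slightly different values.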

3.2.2 Five-Number Summary

  • Elements: Minimum, Q1, Median (Q2), Q3, Maximum.

  • Example Calculation for Growth and Value Funds:

    • Growth: Min -40.90, Q1 2.86, Median 15.25, Q3 36.97, Max 79.48

    • Value: Min -46.52, Q1 1.70, Median 15.38, Q3 22.44, Max 44.08
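The full 36-year fund data is not reproduced in these notes, so the sketch below illustrates a five-number summary on the Acetech salaries from section 3.1.2 instead. It uses `statistics.quantiles` with `method="inclusive"`. That method should correspond to the interpolation behind Excel's `PERCENTILE.INC`, but treat the equivalence as an assumption to verify. The helper name `five_number_summary` is hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 asks for the three cut points that split the data into quartiles.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
summary = five_number_summary(salaries)

print(summary)
```

With only eight observations the quartiles are interpolated between neighboring salaries, which is one reason percentile methods can disagree on small datasets.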

3.2.3 Boxplots

  • Definition: Graphical display of five-number summary.

  • Construction Steps:
    a. Plot values on a horizontal axis
    b. Draw a box from Q1 to Q3
    c. Mark the median inside the box
    d. Extend whiskers to the smallest and largest observations that lie within 1.5 × IQR of Q1 and Q3, where IQR = Q3 - Q1.
    e. Mark any observation beyond the whiskers as an outlier with an asterisk.
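The numbers behind the construction steps can be computed before anything is drawn. The following sketch finds the box, the whisker ends, and the outliers for the Acetech salaries. It assumes the inclusive quartile method, and the helper name `boxplot_stats` is hypothetical.

```python
from statistics import quantiles

def boxplot_stats(data):
    """Compute the box, whisker ends, and outliers using the 1.5 * IQR rule."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
stats = boxplot_stats(salaries)

print(stats)
```

Here the president's salary lies beyond the upper fence, so it would be drawn as an asterisk and the right whisker would stop at 150,000.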

3.2.4 Distribution Shape from Boxplot

  • Characteristics:

    • Symmetric: Median in center, balanced whiskers

    • Positively skewed: Median left of center with longer right whisker

    • Negatively skewed: Median right of center with longer left whisker

3.2.5 Example of Boxplot Construction

  • Using Excel or R: Instructions provided for creating boxplots with data for Growth and Value funds.

3.3 Measures of Dispersion

3.3.1 Definition and Importance

  • Purpose: Analyze variability of data points. Measures include range, interquartile range, variance, and standard deviation.

  • Range: Difference between the maximum and minimum, Range = Max - Min. It is simple to compute but uses only the two extreme observations, so it is sensitive to outliers.

3.3.2 Interquartile Range (IQR)

  • Definition: Difference between the third quartile (Q3) and the first quartile (Q1), IQR = Q3 - Q1; it captures the spread of the middle 50% of the observations.

  • Benefit: Not influenced by extreme outliers.

3.3.3 Mean Absolute Deviation (MAD)

  • Definition: Average of absolute deviations from the mean

    • For sample observations: \( MAD = \frac{\sum |x_i - \bar{x}|}{n} \)

3.3.4 Variance and Standard Deviation

  • Definition: Variance is the average of the squared deviations from the mean.

    • Sample Formula: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \)

    • Population Formula: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \)

  • Standard Deviation: Positive square root of the variance, which returns the measure to the original units of the data.
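To make the formulas concrete, here is a minimal sketch that computes the range, MAD, variance, and standard deviation by hand. It is applied to the Acetech salaries because the full fund data is not included in these notes. The `sample` flag and the function name `dispersion` are illustrative assumptions.

```python
from math import sqrt

def dispersion(data, sample=True):
    """Range, MAD, variance, and standard deviation of a list of numbers."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    mad = sum(abs(x - mean) for x in data) / n
    squared_devs = sum((x - mean) ** 2 for x in data)
    # The sample formula divides by n - 1, the population formula by N.
    variance = squared_devs / (n - 1) if sample else squared_devs / n
    return {
        "range": max(data) - min(data),
        "mad": mad,
        "variance": variance,
        "sd": sqrt(variance),
    }

salaries = [40_000, 40_000, 65_000, 90_000, 100_000, 145_000, 150_000, 550_000]
as_population = dispersion(salaries, sample=False)
as_sample = dispersion(salaries, sample=True)

print(as_population)
print(as_sample)
```

The sample variance is always larger than the population variance for the same data, because the sum of squared deviations is divided by a smaller number.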

3.3.5 Example Measures of Dispersion for Growth and Value

  • Growth:

    • Range: 120.38

    • MAD: 17.491

    • Variance: 566.406

    • Standard Deviation: 23.799

  • Value:

    • Range: 90.6

    • MAD: 13.667

    • Variance: 323.251

    • Standard Deviation: 17.979

3.3.6 Coefficient of Variation (CV)

  • Definition: A measure of relative dispersion that adjusts for differences in the magnitudes of the means, making it unitless and comparable across datasets.

  • Sample and Population Formulas:

    • Sample: \( CV = \frac{s}{\bar{x}} \)

    • Population: \( CV = \frac{\sigma}{\mu} \)

3.3.7 Example CV Calculation for Growth and Value

  • Growth CV = 23.799 / 15.755 ≈ 1.51; Value CV = 17.979 / 12.005 ≈ 1.50. The two values are nearly equal, indicating similar relative dispersion.
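The coefficient of variation for each fund follows directly from the Excel summary statistics in section 3.1.8. The sketch below is a minimal check; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean

# Means and standard deviations reported for the two funds (in percent).
growth_cv = coefficient_of_variation(sd=23.799, mean=15.755)
value_cv = coefficient_of_variation(sd=17.979, mean=12.005)

print(round(growth_cv, 2), round(value_cv, 2))
```

Because the units cancel, the two results can be compared directly even though the funds have different mean returns.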

3.4 Mean-Variance Analysis and the Sharpe Ratio

3.4.1 Investment Analysis Overview

  • Context: Analysis of investments (stocks, bonds, mutual funds).

  • Average Return: Represents the investor's reward, whereas the variance and standard deviation of returns measure risk.

  • Mean-Variance Analysis Postulate: The performance of an asset is measured by its reward (mean return) relative to its risk (variance or standard deviation of returns).

3.4.2 Sharpe Ratio

  • Definition: Measures reward relative to risk. Characterizes how well additional returns compensate for the risk undertaken.

  • Calculation: \[ \text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p} \] where \( R_p \) is the asset's mean return, \( R_f \) is the risk-free return, and \( \sigma_p \) is the standard deviation of the asset's returns.

3.4.3 Example: Sharpe Ratio for Growth and Value Funds

  • Growth Fund:

    • Mean Return: 15.755

    • Standard Deviation: 23.799

  • Value Fund:

    • Mean Return: 12.005

    • Standard Deviation: 17.979

  • Comparison: Growth has a slightly higher Sharpe ratio than Value, indicating more reward per unit of risk.
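The notes do not record the risk-free return used for this comparison. The sketch below therefore assumes a risk-free return of 2%, which is a hypothetical value chosen for illustration. The fund means and standard deviations come from the Excel output in section 3.1.8.

```python
def sharpe_ratio(mean_return, risk_free, sd):
    """Excess return over the risk-free rate, per unit of standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_return - risk_free) / sd

RISK_FREE = 2.0  # ASSUMPTION: percent per year, not stated in the notes

growth_sharpe = sharpe_ratio(15.755, RISK_FREE, 23.799)
value_sharpe = sharpe_ratio(12.005, RISK_FREE, 17.979)

print(round(growth_sharpe, 3), round(value_sharpe, 3))
```

Under this assumption Growth comes out slightly ahead. The two ratios are close enough that the ranking depends on the risk-free rate: with a rate of 0, Value would rank slightly higher.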

3.5 Analysis of Relative Location

3.5.1 Understanding Standard Deviation and Relative Location

  • A low standard deviation indicates that observations cluster closely around the mean, while a high standard deviation indicates greater spread.

  • Chebyshev’s Theorem:

    • States that, for any dataset and any \( k > 1 \), at least \( 1 - \frac{1}{k^2} \) of the observations lie within k standard deviations of the mean:

    • For k=2: At least 75% within \( \bar{x} \pm 2s \)

    • For k=3: At least 89% within \( \bar{x} \pm 3s \)

3.5.2 Example of Chebyshev’s Theorem Application

  • Class Context: 280 students, mean score of 74, standard deviation of 8. The range 58 to 90 is \( 74 \pm 2(8) \), so k = 2.

    • At least 75% of the scores fall in this range, so at least 0.75 × 280 = 210 students scored between 58 and 90.
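The class example can be reproduced with a small sketch of the Chebyshev bound, \( 1 - 1/k^2 \). The function name `chebyshev_min_proportion` is illustrative.

```python
def chebyshev_min_proportion(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

mean_score, sd_score, n_students = 74, 8, 280
low, high = 58, 90

# The interval is symmetric around the mean, so either end gives k.
k = (high - mean_score) / sd_score
min_share = chebyshev_min_proportion(k)
min_students = min_share * n_students

print(k, min_share, min_students)
```

The bound holds for any distribution shape, which is why it is weaker than the empirical rule's figure for bell-shaped data.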

3.5.3 Using the Empirical Rule

  • Applicable to symmetric, bell-shaped distributions; gives the approximate percentage of observations within a given number of standard deviations of the mean.

    • Approximately 68% within 1 standard deviation

    • Approximately 95% within 2 standard deviations

    • Almost all (approximately 99.7%) within 3 standard deviations

3.5.4 Example of Empirical Rule Calculation

  • Using the class score data again and assuming the scores are bell-shaped, about 95% of scores fall between 58 and 90. By symmetry, half of the remaining 5%, or approximately 2.5%, scored above 90.
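The arithmetic behind the 2.5% figure can be written out as follows, assuming the class of 280 students from the Chebyshev example and a bell-shaped distribution of scores:

```latex
\[
74 \pm 2(8) = [58,\ 90]
\quad\Rightarrow\quad
\text{share above } 90 \approx \frac{1 - 0.95}{2} = 0.025,
\qquad
0.025 \times 280 = 7 \text{ students}
\]
```

Chebyshev's theorem alone only guarantees that at most 25% of scores fall outside 58 to 90. The sharper estimate depends on the bell-shape assumption.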

3.5.5 Z-Scores

  • Definition: A z-score measures how many standard deviations an observation is from the mean: \( z = \frac{x - \bar{x}}{s} \).

  • Application: Detects outliers; observations with a z-score above 3 or below -3 merit further review.
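A minimal sketch of the z-score calculation and the outlier rule, using the class mean of 74 and standard deviation of 8 from the earlier example. The individual scores in the list are hypothetical values chosen to show the threshold in action.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

def flag_outliers(values, mean, sd, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold."""
    return [x for x in values if abs(z_score(x, mean, sd)) > threshold]

mean_score, sd_score = 74, 8
scores = [58, 74, 90, 100]  # hypothetical scores, for illustration only

z_values = [z_score(x, mean_score, sd_score) for x in scores]
flagged = flag_outliers(scores, mean_score, sd_score)

print(z_values, flagged)
```

A score of 90 sits exactly two standard deviations above the mean and is not flagged. A score of 100 has a z-score above 3 and would merit a closer look.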

3.6 Measures of Association

3.6.1 Assessing Relationships with Covariance

  • Covariance:

    • Measures the direction of the linear relationship between two variables:

    • Sample: \( Cov(x,y) = \frac{1}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y}) \)

    • Population: \( Cov(x,y) = \frac{1}{N} \sum (x_i - \mu_x)(y_i - \mu_y) \)

3.6.2 Correlation Coefficient

  • Definition: Describes both direction and strength of linear relationships.

    • Sample: \( r_{xy} = \frac{Cov(x,y)}{s_x \cdot s_y} \), where \( s_x \) and \( s_y \) are the sample standard deviations of x and y.

    • Range: \( [-1, 1] \)

    • Interpretations:

      • -1: Perfect negative correlation

      • 0: No linear relationship

      • 1: Perfect positive correlation
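The sample covariance and correlation formulas can be sketched as below. Only the three annual returns listed in the introductory case (1984, 1985, 2019) are used, so the result is a mechanical illustration and not the correlation for the full 36-year sample. The function names are illustrative.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sum of cross-deviations from the means, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance scaled by both sample standard deviations."""
    # The covariance of a variable with itself is its sample variance.
    sd_x = sqrt(sample_covariance(x, x))
    sd_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

# Annual returns (percent) for 1984, 1985, and 2019 from the case data.
growth = [-5.50, 39.91, 38.42]
value = [-8.59, 22.10, 31.62]

cov_gv = sample_covariance(growth, value)
r_gv = sample_correlation(growth, value)

print(cov_gv, r_gv)
```

The covariance carries squared-percent units and is hard to interpret on its own. Dividing by both standard deviations gives a unitless value between -1 and 1.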

3.7 The Geometric Mean

3.7.1 Differences from Arithmetic Mean

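  • Arithmetic Mean: An additive average; appropriate as an estimate of the return for a single period, but it ignores compounding across periods.

  • Geometric Mean:

    • Multiplicative average; less sensitive to outliers than the arithmetic mean.

    • Formula for n multiperiod returns \( R_1, R_2, \ldots, R_n \):
      \[ GM = \left[ (1 + R_1)(1 + R_2) \cdots (1 + R_n) \right]^{\frac{1}{n}} - 1 \]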

3.7.2 Example Calculation of Geometric Mean Return

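  • An initial investment of 1,000 earns different returns in two consecutive years:

    • Year 1: 10% -> Value: 1,100

    • Year 2: -10% -> Value: 990

    • Geometric mean return: \( (1.10 \times 0.90)^{1/2} - 1 \approx -0.0050 \), or about -0.50% per year.

    • Interpretation: Although the arithmetic mean return is 0%, the investment loses value over the two years, which the negative geometric mean return captures.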
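The two-year example can be verified with a short sketch that converts each return to a growth factor (1 + R) before averaging multiplicatively. The starting value of 1,000 is inferred from the 1,100 balance after the 10% gain, and the function name is illustrative.

```python
from math import prod

def geometric_mean_return(returns):
    """Annualized return implied by a sequence of per-period returns."""
    if not returns:
        raise ValueError("need at least one return")
    growth_factors = [1 + r for r in returns]
    if any(g < 0 for g in growth_factors):
        raise ValueError("returns below -100% are not supported")
    return prod(growth_factors) ** (1 / len(returns)) - 1

returns = [0.10, -0.10]  # +10% in year 1, -10% in year 2

gm = geometric_mean_return(returns)
arithmetic = sum(returns) / len(returns)
final_value = 1_000 * prod(1 + r for r in returns)

print(arithmetic, gm, final_value)
```

The arithmetic mean of the two returns is zero, yet the investment ends below its starting value. The geometric mean return reflects that loss.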

3.7.3 Compound Growth Rate Calculation

  • Example: Average annual sales growth over five years for a multinational corporation, calculated as a compound growth rate of 6.15%.
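The notes report only the 6.15% result and not the underlying sales figures, so the calculation is shown here in general form. The symbols \( x_1 \) and \( x_n \) (the first and last of n observations) are introduced for illustration and may differ from the textbook's notation.

```latex
\[
G_g = \left( \frac{x_n}{x_1} \right)^{\frac{1}{n-1}} - 1
\]
```

The exponent uses \( n - 1 \) because n observations span only \( n - 1 \) growth periods.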

Exam Items

  • Ability to calculate central tendency measures: mean, mode, median.

  • Ability to calculate variability measures: variance and standard deviation.

  • Proficiency in formulas for population and sample data.

  • Understanding coefficient of variation.

  • Knowledge of Sharpe ratio.

  • Familiarity with the empirical rule and its applications.

  • Differences between arithmetic and geometric mean returns, with calculations.