ISL2441 Statistics for Management I - Describing Data

Page 2: Overview of Numeric Descriptive Statistics

  • Measures of Central Tendency and Location:

    • Mean (Arithmetic Mean)

    • Median

    • Mode

    • Geometric Mean

    • Weighted Mean

  • Measures of Dispersion (Variability):

    • Quartiles

    • Percentiles

    • Range

    • Interquartile Range

    • Variance

    • Standard Deviation

    • Coefficient of Variation

  • Measures of Shape:

    • Skewness

    • Kurtosis

Page 3: Measures of Central Tendency

  • Population Mean (µ):

    • Formula: (\mu = \frac{\sum_{i=1}^N x_i}{N})

    • (N) = Population size

    • (x_i) = ith value of variable X

  • Sample Mean ((\bar{X})):

    • Formula: (\bar{X} = \frac{\sum_{i=1}^n x_i}{n})

    • (n) = Sample size

Page 4: Example of Mean Calculation

  • Example (Lind et al., 2021):

    • Verizon study on mobile phone usage.

    • Data: Daily usage in hours from 12 customers

    • Calculation:

      • Total Usage = 4.1 + 3.7 + 4.3 + 4.2 + 5.5 + 5.1 + 4.2 + 5.1 + 4.2 + 4.6 + 5.2 + 3.8 = 54.0 hours

      • Mean = (\frac{54.0}{12} = 4.5) hours
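
The arithmetic above can be checked with a short Python sketch. The variable names are illustrative; the values are the twelve usage figures listed in the example.

```python
from statistics import mean

# Daily usage in hours for the 12 customers in the example above
usage_hours = [4.1, 3.7, 4.3, 4.2, 5.5, 5.1, 4.2, 5.1, 4.2, 4.6, 5.2, 3.8]

total_usage = sum(usage_hours)                 # numerator of the sample mean
sample_mean = total_usage / len(usage_hours)   # divide by n = 12

print(round(total_usage, 1), round(sample_mean, 2))

# The standard library computes the same quantity directly
print(round(mean(usage_hours), 2))
```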

Page 5: Understanding the Mean

  • Characteristics of Mean:

    • Only usable for interval or ratio data

    • Sensitive to outliers

    • Uses every value in the dataset in its calculation

    • Unique: There is only one mean for a given dataset

    • Deviations from the mean sum to zero: (\sum (x_i - \bar{X}) = 0)
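
Two of these properties are easy to see numerically. The sketch below reuses the usage data from the Page 4 example; the 24.0-hour outlier is a made-up value added only for illustration.

```python
import math

usage_hours = [4.1, 3.7, 4.3, 4.2, 5.5, 5.1, 4.2, 5.1, 4.2, 4.6, 5.2, 3.8]
mean_usage = sum(usage_hours) / len(usage_hours)

# Deviations from the mean cancel out (up to floating-point rounding)
deviation_sum = sum(x - mean_usage for x in usage_hours)
print(math.isclose(deviation_sum, 0.0, abs_tol=1e-9))

# Hypothetical outlier: replace the last value with 24.0 hours
with_outlier = usage_hours[:-1] + [24.0]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

# A single extreme value pulls the mean upward noticeably
print(round(mean_usage, 2), round(mean_with_outlier, 2))
```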

Page 6: Median Calculation

  • Definition: The median is the middle value in an ordered dataset.

  • Calculation Rule:

    • For an odd number of observations: Median = middle value

    • For an even number of observations: Median = average of the two middle values

    • Example: Ordered values: 3.7, 4.1, 4.2, 4.2, 4.3, 5.1, 5.5

    • Median = 4.2 (the 4th of the 7 ordered values)
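
The odd/even rule can be sketched as a small function. The function name and the extra value 5.2 used for the even case are assumptions made for illustration, not part of the slides.

```python
from statistics import median


def median_by_rule(values):
    """Median via the odd/even rule stated above."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values


odd_sample = [3.7, 4.1, 4.2, 4.2, 4.3, 5.1, 5.5]
even_sample = odd_sample + [5.2]  # hypothetical extra observation

print(median_by_rule(odd_sample))
print(median_by_rule(even_sample))

# The standard library follows the same rule
print(median(odd_sample), median(even_sample))
```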

Page 7: Mode Calculation

  • Definition: The mode is the most frequently appearing value in a dataset.

  • Applications: Useful for nominal data and for identifying the most common value or category.

  • Example:

    • Server failures: 1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3

    • Mode = 3
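
A quick way to confirm the mode is to count how often each value occurs. This sketch uses the server-failure data above; the variable names are illustrative.

```python
from collections import Counter

failures = [1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3]

# Count occurrences of each value and take the most frequent one
counts = Counter(failures)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```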

Page 8: Additional Mode Examples

  • Modes can be absent (no value repeats), or multiple (bimodal or multimodal).
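
The absent and multiple-mode cases can be handled with a small helper. The function name `find_modes` and the sample lists are hypothetical. The helper follows the convention stated above (no mode when nothing repeats), which differs from `statistics.multimode` in the standard library: that function returns every value when none repeats.

```python
from collections import Counter


def find_modes(values):
    """Return all most frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(find_modes([1, 2, 3]))        # no mode
print(find_modes([3, 3, 1]))        # one mode
print(find_modes([1, 1, 2, 2, 3]))  # bimodal
```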

  • Example data of system failures showed frequencies of different failure rates.

Page 9: Frequency Distribution

  • Example: Delay complaints and their frequency.

  • Mean calculated as weighted average based on frequencies.

  • Mean: (\bar{X} = \frac{\sum (f_i \cdot x_i)}{n}), where (f_i) is the frequency of value (x_i) and (n = \sum f_i)
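
As a minimal sketch of the frequency-weighted mean: the slide's complaint data is not reproduced here, so the table below is entirely hypothetical (complaints per day mapped to the number of days with that count).

```python
# Hypothetical frequency table: complaints per day -> number of days observed
complaint_frequencies = {0: 5, 1: 8, 2: 4, 3: 2, 4: 1}

n = sum(complaint_frequencies.values())  # total number of observations
weighted_total = sum(x * f for x, f in complaint_frequencies.items())
mean_complaints = weighted_total / n

print(n, weighted_total, mean_complaints)
```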

Page 10: Numerical Measures in Frequency Distribution

  • Formulas:

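    • Mean: (\bar{X} = \frac{\sum f_i \times m_i}{\sum f_i}), where (m_i) is the midpoint of class i and (f_i) is its frequency

    • Median: Found in the class where the cumulative frequency first reaches half of the observations, then interpolated within that class

    • Mode: Taken from the class with the highest frequency (the modal class)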
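
The grouped-data measures can be sketched as follows. The class limits and frequencies are hypothetical, and the median uses the common textbook interpolation (lower limit + (n/2 - cumulative frequency before the class) / class frequency × class width); this is an assumption about the slide's method, not a transcription of it.

```python
# Hypothetical grouped data: ((lower limit, upper limit), frequency)
classes = [((0, 10), 4), ((10, 20), 10), ((20, 30), 6)]

n = sum(f for _, f in classes)

# Grouped mean: weight each class midpoint by its frequency
grouped_mean = sum(f * (lo + hi) / 2 for (lo, hi), f in classes) / n


def grouped_median(classes):
    """Interpolated median for grouped data (common textbook formula)."""
    total = sum(f for _, f in classes)
    if total == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0
    for (lo, hi), f in classes:
        if f > 0 and cumulative + f >= total / 2:
            # Interpolate inside the class that contains the middle observation
            return lo + (total / 2 - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("median class not found")


# Modal class: the interval with the highest frequency
modal_class = max(classes, key=lambda item: item[1])[0]

print(grouped_mean, grouped_median(classes), modal_class)
```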

Page 11: Class Limits

  • Example of Data Classes with Frequency Distribution:

    • ...

Page 12: Geometric Mean

  • Definition: The nth root of the product of n positive values; used to average growth rates, ratios, and percentage changes.

  • Formula: (\bar{X}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n})

  • Example Calculation: Growth rates leading to the geometric mean calculation.
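
A minimal sketch of an average growth rate calculation: the annual rates below are hypothetical. Note that the geometric mean is applied to growth factors (1 + rate), not to the raw percentage rates, because the formula requires positive values that multiply together.

```python
import math
from statistics import geometric_mean  # available in Python 3.8+

# Hypothetical annual growth rates: +10%, +20%, -5%
growth_rates = [0.10, 0.20, -0.05]
growth_factors = [1 + r for r in growth_rates]

# nth root of the product of the growth factors
manual_gm = math.prod(growth_factors) ** (1 / len(growth_factors))
library_gm = geometric_mean(growth_factors)

average_growth_rate = manual_gm - 1  # back to a rate
arithmetic_rate = sum(growth_rates) / len(growth_rates)

print(round(average_growth_rate, 4), round(arithmetic_rate, 4))
```

The geometric average comes out slightly below the arithmetic average of the same rates, which is the usual pattern when the rates differ.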

Page 13: Weighted Mean

  • General Formula: (\bar{X} = \frac{\sum w_i x_i}{\sum w_i})

  • Example: Weighted grades from different assessments.

  • Conclusion: The weighted mean reflects the outcome based on the importance (weight) of each assessment.
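
The general formula can be sketched as a small function. The scores and weights are hypothetical (for instance a quiz, a midterm, and a final exam); they are not taken from the slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical assessment scores and their weights
scores = [70, 85, 90]
weights = [0.2, 0.3, 0.5]

print(round(weighted_mean(scores, weights), 2))

# With equal weights the result reduces to the arithmetic mean
print(round(weighted_mean(scores, [1, 1, 1]), 2))
```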

Page 14: Example of Weighted Mean Calculation

  • Scenario: Total return from different mutual funds.

  • Calculate average total return using weighted mean approach.

Page 15: Calculating Mean Cost

  • Example: Raw material purchases over time to calculate average cost per pound using weighted mean.

Page 16: Understanding Quartiles

  • Definition: Quartiles are measures that divide a dataset into quarters.

    • Q1: Lower quartile

    • Q2: Median

    • Q3: Upper quartile

Page 17: Quartile Positions

  • Calculation of quartile positions based on ordered data values.

    • Formulas for Q1, Q2, Q3.

  • Example: Interpreting values for Q1 and Q3.

Page 18: Example of Quartile Calculation

  • Daily electricity consumption example to calculate Q1, Q2, Q3 for the data set.

Page 19: Understanding Percentiles

  • Definition: Percentiles divide data into 100 equal parts.

  • Procedure for Calculation:

    • Find percentile position for desired percentile rank.
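
One way to sketch the position-based procedure is shown below. It assumes the location formula L_p = (n + 1) × p / 100 with linear interpolation between neighboring values, a convention common in business statistics textbooks; the slides may use a different rule. The function name and the data are hypothetical. Quartiles are the special cases p = 25, 50, and 75.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using the location (n + 1) * p / 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100         # 1-based position in the ordered data
    position = min(max(position, 1), n)  # keep the position inside the data
    lower = int(position)                # whole part of the position
    fraction = position - lower          # fractional part used for interpolation
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical sample of 11 observations
sample_values = [12, 15, 17, 19, 21, 22, 24, 27, 30, 33, 38]

q1 = percentile_value(sample_values, 25)
q2 = percentile_value(sample_values, 50)
q3 = percentile_value(sample_values, 75)
p40 = percentile_value(sample_values, 40)

print(q1, q2, q3, round(p40, 1))
```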

Page 20: Five-number Summary

  • A five-number summary describes a dataset with five values: Min, Q1, Median (Q2), Q3, and Max.
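
A five-number summary can be sketched with the standard library. The data is hypothetical, and `statistics.quantiles` is used with its default "exclusive" method; other tools apply other quartile conventions, so results can differ slightly between software packages.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a dataset."""
    q1, q2, q3 = quantiles(values, n=4)  # default method is "exclusive"
    return min(values), q1, q2, q3, max(values)


# Hypothetical sample of 11 observations
sample_values = [12, 15, 17, 19, 21, 22, 24, 27, 30, 33, 38]

summary = five_number_summary(sample_values)
print(summary)
```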

Page 21: Dispersion Measures

  • Definitions and importance of measures in analyzing data dispersion:

    • Range

    • Interquartile Range (IQR)

    • Variance

    • Standard Deviation

    • Coefficient of Variation

Page 22: Example of Dispersion Analysis

  • Case study of supplier reliability compared using dispersion analysis.

Page 23: Dispersion Visualization

  • Figures representing varying spreads with identical means.

Page 24: Understanding Range and IQR

  • Formula for Range: (R = X_{max} - X_{min})

  • Formula for IQR: (IQR = Q_3 - Q_1); it measures the spread of the central 50% of the data.
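
The difference between the two measures shows up when an extreme value is present. In this sketch the data and the outlier of 90 are hypothetical, and the quartiles come from `statistics.quantiles` with its default method.

```python
from statistics import quantiles


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR: Q3 minus Q1."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical sample, and the same sample with one extreme value
sample_values = [12, 15, 17, 19, 21, 22, 24, 27, 30, 33, 38]
with_outlier = sample_values[:-1] + [90]

print(value_range(sample_values), interquartile_range(sample_values))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

The range changes sharply with the outlier while the IQR stays the same, since the IQR ignores the lowest and highest quarters of the data.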

Page 25: Variance

  • Definition: The variance is the average squared deviation of the values from their mean.

  • Population Variance Formula: (\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N})

  • Sample Variance Formula: (s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1})
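
Both formulas can be applied step by step. The sketch below reuses the usage data from the Page 4 example; treating the same 12 values as a population is done only to show the difference between dividing by N and by n - 1.

```python
from statistics import pvariance, variance

usage_hours = [4.1, 3.7, 4.3, 4.2, 5.5, 5.1, 4.2, 5.1, 4.2, 4.6, 5.2, 3.8]

n = len(usage_hours)
mean_usage = sum(usage_hours) / n

# Sum of squared deviations from the mean (the numerator in both formulas)
squared_deviations = sum((x - mean_usage) ** 2 for x in usage_hours)

population_variance = squared_deviations / n     # divide by N
sample_variance = squared_deviations / (n - 1)   # divide by n - 1

print(round(population_variance, 4), round(sample_variance, 4))

# The standard library implements the same two formulas
print(round(pvariance(usage_hours), 4), round(variance(usage_hours), 4))
```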

Page 26: Standard Deviation

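  • Definition: The square root of the variance; it measures the typical distance of the values from the mean, in the same units as the data.

  • Population Standard Deviation: (\sigma = \sqrt{\sigma^2})

  • Sample Standard Deviation: (s = \sqrt{s^2})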
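
Since the standard deviation is the square root of the variance, it can be sketched in a few lines. The data is the usage sample from the Page 4 example; the variable names are illustrative.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

usage_hours = [4.1, 3.7, 4.3, 4.2, 5.5, 5.1, 4.2, 5.1, 4.2, 4.6, 5.2, 3.8]

# Square roots of the population and sample variances (units: hours)
population_sd = math.sqrt(pvariance(usage_hours))
sample_sd = math.sqrt(variance(usage_hours))

print(round(population_sd, 3), round(sample_sd, 3))

# pstdev and stdev return the same values directly
print(round(pstdev(usage_hours), 3), round(stdev(usage_hours), 3))
```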

Page 27: Variance and Standard Deviation Calculation Example

  • Perform variance and standard deviation calculation with data values.

Page 28: Continued Calculation of Variance and Standard Deviation

  • Detailed example demonstrating the process.

Page 29: Coefficient of Variation (CV)

  • Definition and importance of CV in expressing variability.

  • Formulas for Population and Sample CV:

    • Population: (CV = \frac{\sigma}{\mu} \times 100\%)

    • Sample: (CV = \frac{s}{\bar{X}} \times 100\%)

Page 30: CV Example Analysis

  • Example of comparing stock price variability between two companies using CV.
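
A comparison of this kind can be sketched with the sample CV formula. The two price series are hypothetical and chosen so that the stock with the larger standard deviation has the smaller relative variability.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: s / mean * 100."""
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average * 100


# Hypothetical closing prices for two stocks
stock_a = [48, 52, 50, 47, 53]
stock_b = [196, 204, 200, 198, 202]

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(round(stdev(stock_a), 2), round(cv_a, 2))
print(round(stdev(stock_b), 2), round(cv_b, 2))
```

Because the CV is unit-free, it allows a fair comparison between series whose means differ widely.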

Page 31: Performance Comparison

  • Case study comparing assembly line performances using mean and standard deviation.

Page 32: Supplier Reliability Exercise

  • Analysis of delivery times for two suppliers to ascertain reliability.

Page 33: Exploring Data Shape

  • Importance of understanding data shape in descriptive statistics.

  • Measures of shape: skewness (asymmetry) and kurtosis (peakedness) of a distribution.

Page 34: Skewness

  • Definition: Skewness measures the asymmetry of a distribution. Positive skew indicates a longer right tail; negative skew indicates a longer left tail.

Page 35: Kurtosis

  • Describes the 'peakedness' or flatness of a distribution relative to the normal distribution.

Page 36: Skewness and Kurtosis in Normal Distribution

  • For a normal distribution, skewness equals 0 (symmetry) and excess kurtosis equals 0 (plain kurtosis equals 3), indicating the standard bell shape.
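
Both shape measures can be sketched from central moments. The definitions below are the population-moment versions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); this is one common convention and an assumption here, since spreadsheet and statistics packages often apply sample-size adjustments that give slightly different numbers. The server-failure data comes from the Page 7 example; the symmetric list is hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]                                 # hypothetical
failures = [1, 3, 0, 3, 26, 2, 7, 4, 0, 2, 3, 3, 6, 3]      # Page 7 data

print(round(skewness(symmetric), 4), round(excess_kurtosis(symmetric), 4))
print(round(skewness(failures), 4), round(excess_kurtosis(failures), 4))
```

The symmetric list has zero skewness, while the single large value of 26 in the failure data produces a clearly positive skew.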

Page 37: Properties of Normal Distribution

  • Characteristics of the normal distribution: bell-shaped, symmetric about the mean, and asymptotic (the tails approach but never touch the horizontal axis).

Page 38: Measures of Central Tendency in Normal Distribution

  • In a symmetric, unimodal distribution such as the normal distribution, the mean, median, and mode coincide.

Page 39: Recap of Numeric Descriptive Statistics

  • Summary of key statistical measures and their categories.

Page 40: References

  • Textbook References:

    • Various sources cited, all supporting foundational statistical concepts studied in the course.