Notes on Variance, Standard Deviation, and Coefficient of Variation

Understanding Variance and Standard Deviation

Population vs. Sample Variance Formulas

  • There are two distinct formulas for variance:

    • Population Variance: Denoted by the Greek letter \sigma^2 (sigma squared).

    • Sample Variance: Denoted by the regular alphabet letter s^2.

  • Mathematical Difference: The numerator has the same form in both formulas (the sum of squared deviations from the mean). The only difference lies in the denominator:

    • For sample variance, we divide by n-1, where n is the number of observations.

    • For population variance, we divide by N, the population size.

  • This distinction is crucial, as using the incorrect formula for the given data type (population or sample) will lead to an incorrect answer, even if the difference seems minor.
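
Written out, the two formulas look as follows. The symbols \mu (population mean), \bar{x} (sample mean), and x_i (an individual observation) are standard notation assumed here; they are not defined in these notes.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```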

Calculating Variance in Excel

  • Excel provides specific functions for both population and sample variance.

  • To access them: Go to the "Insert Function" dialog box and type "variance."

  • Population Variance Function: VAR.P (the p stands for population).

  • Sample Variance Function: VAR.S (the s stands for sample).

  • Usage: For both functions, you only need to tell Excel where your data are located (e.g., highlight the range of cells). Excel then performs all the complex calculations, including counting the values and applying the correct denominator (n or n-1).

  • Example from Transcript: If there are 70 values in the dataset:

    • VAR.P would divide by 70.

    • VAR.S would divide by 70-1 = 69.

  • Importance of Selection: The two results often differ enough (e.g., .34 vs. .74 when rounded to two decimal places) that using the wrong function will cause the answer to be marked incorrect in quizzes or homework.

  • Key Takeaway: Always read problem statements carefully to determine whether you are dealing with population data or sample data.
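
The denominator choice that VAR.P and VAR.S make can be reproduced outside Excel. The following is a minimal Python sketch, assuming the standard library's statistics.pvariance and statistics.variance as stand-ins for VAR.P and VAR.S. The data values are hypothetical and made up for illustration.

```python
import statistics

# Hypothetical data, made up for illustration (not the course dataset).
data = [4.0, 7.0, 5.0, 9.0, 10.0]
n = len(data)
mean = sum(data) / n

# The numerator has the same form in both formulas:
# the sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

pop_var_by_hand = sum_sq_dev / n            # population: divide by N
sample_var_by_hand = sum_sq_dev / (n - 1)   # sample: divide by n - 1

# Library equivalents (analogous to VAR.P and VAR.S).
pop_var = statistics.pvariance(data)
sample_var = statistics.variance(data)

print(f"population variance: {pop_var:.2f}")   # 5.20
print(f"sample variance:     {sample_var:.2f}")  # 6.50
```

With only five values the two results differ noticeably (5.20 vs. 6.50). The gap shrinks as n grows, but it does not vanish.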

Interpreting Variance

  • One challenge with variance is its interpretation because its units are squared (e.g., dollars squared, years squared). This makes it less intuitive for practical understanding.
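
A small worked example shows where the squared units come from. It assumes three hypothetical sample values of 2, 4, and 6 dollars, made up for illustration. Each deviation from the mean is in dollars, so each squared deviation is in dollars squared.

```latex
\begin{align*}
\bar{x} &= \frac{2 + 4 + 6}{3} = 4 \text{ dollars} \\
s^2 &= \frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3 - 1}
     = \frac{4 + 0 + 4}{2}
     = 4 \text{ dollars}^2
\end{align*}
```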

Standard Deviation

  • Definition: The standard deviation is always the positive square root of the variance.

    • Population Standard Deviation: \sigma = \sqrt{\sigma^2}

    • Sample Standard Deviation: s = \sqrt{s^2}

  • Advantage over Variance: The standard deviation is expressed in the same units of measure as the original raw data (e.g., dollars, years). This makes it much easier to interpret and understand in real-world contexts.

  • Prevalence: In research papers and academic literature, authors usually report standard deviation rather than variance because it is easier to interpret.

  • Frequency of Use: Standard deviation is a foundational concept and will be used extensively throughout the course, including in more advanced statistical methods such as hypothesis testing and confidence intervals.
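
The relationship between the two measures can be checked with a short Python sketch. It assumes the standard library's statistics.variance and statistics.stdev as stand-ins for the sample formulas, and the ages are hypothetical values made up for illustration.

```python
import math
import statistics

# Hypothetical sample of ages in years, made up for illustration.
ages = [21, 25, 30, 34, 40]

# Step 1: sample variance (units: years squared).
sample_var = statistics.variance(ages)

# Step 2: the square root brings the units back to years.
sample_sd = math.sqrt(sample_var)

# statistics.stdev performs both steps in one call.
assert math.isclose(sample_sd, statistics.stdev(ages))

print(f"sample variance: {sample_var:.2f} years^2")
print(f"sample std dev:  {sample_sd:.2f} years")
```

The two-step route and the one-call route agree, so the separate variance step is needed only when the variance itself is of interest.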

Calculating Standard Deviation in Excel

  • Similar to variance, Excel has specific functions for population and sample standard deviation.

  • To access them: Go to the "Insert Function" dialog box and type "standard deviation."

  • Population Standard Deviation Function: STDEV.P (for population data).

  • Sample Standard Deviation Function: STDEV.S (for sample data).

  • Usage: Simply point Excel to the data range. Excel handles all the calculations, including taking the square root of the variance.

  • Historical Context: In the past, when calculations were done by hand, one first had to calculate the variance and then take its square root to find the standard deviation. Excel's built-in functions streamline this process, often eliminating the need to calculate variance separately.

  • Importance of Selection: As with variance, selecting the correct STDEV.P or STDEV.S function is critical to obtaining the correct answer, as results will differ.

Coefficient of Variation (CV)

  • Purpose: The Coefficient of Variation is a valuable measure for comparing the relative variability between two or more datasets, especially when:

    • The datasets have different units of measure (e.g., prices in euros vs. dollars).

    • The datasets have significantly different means, making direct comparison of standard deviations misleading.

  • Formula: Unlike variance and standard deviation, there is no built-in Excel function for the Coefficient of Variation. You must calculate it manually using the formula:
    CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100

  • Steps for Manual Calculation in Excel:

    1. Calculate the Mean (using the AVERAGE function).

    2. Calculate the appropriate Standard Deviation (using STDEV.P or STDEV.S based on data type).

    3. Input the formula into a cell, referencing the calculated standard deviation and mean. Multiply by 100 to express it as a percentage.

      • Example: If the standard deviation is in cell M6 and the mean is in cell M8, the formula would be =(M6/M8)*100.

  • Excel Order of Operations - Caution with Manual Formulas:

    • When manually inputting formulas, Excel follows standard order of operations. Be explicit with parentheses () to ensure calculations are performed in the intended order.

    • Example: (2+3)*4 evaluates to 20 (adds first, then multiplies). 2+3*4 evaluates to 14 (multiplies first, then adds).

    • Excel can color-code parentheses to help identify corresponding pairs, which is useful for complex formulas.

  • Interpretation: When comparing two datasets, the dataset with the higher numerical value for the Coefficient of Variation is considered to have more relative variability.

  • Practical Use Cases:

    • Comparing apartment rental variability between a U.S. city (dollars) and a European city (euros).

    • Comparing the risk of different stocks, especially when their prices or average returns vary significantly.
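
The manual calculation and the comparison step can be sketched in Python as well. This is an illustrative sketch, not the course's method: the helper name coefficient_of_variation and the rent figures are hypothetical, and statistics.stdev / statistics.pstdev stand in for STDEV.S / STDEV.P.

```python
import statistics

def coefficient_of_variation(values, population=False):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    # Parentheses make the intended order explicit, as in =(M6/M8)*100.
    return (sd / mean) * 100

# Python follows the same precedence rules for + and * as described above.
assert (2 + 3) * 4 == 20
assert 2 + 3 * 4 == 14

# Hypothetical monthly rents (sample data), made up for illustration.
rents_usd = [1800, 2000, 2200, 2400, 2600]   # U.S. city, dollars
rents_eur = [900, 950, 1000, 1050, 1100]     # European city, euros

cv_usd = coefficient_of_variation(rents_usd)
cv_eur = coefficient_of_variation(rents_eur)

print(f"CV (USD rents): {cv_usd:.1f}%")
print(f"CV (EUR rents): {cv_eur:.1f}%")
print("More relative variability:", "USD" if cv_usd > cv_eur else "EUR")
```

Because the CV is a ratio of two quantities in the same unit, the currency cancels out, which is why the dollar and euro figures can be compared directly.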

General Advice for Problems

  • Read Carefully: Always identify whether the problem provides population data or sample data. This determines the correct variance and standard deviation functions to use.

  • Excel as a Tool: While Excel performs the calculations, understanding the underlying formulas and concepts is essential for correct application and interpretation.

  • Practice: Utilize provided practice problems to become proficient with Excel functions for mean, standard deviation, and variance, and for manually calculating the coefficient of variation.

  • Tolerance for Answers: For quizzes, pay attention to the rounding instructions and the tolerance set for answers. Choosing the wrong function can produce a wrong answer even when the rounding is correct. The problem statement will always state explicitly whether the data are a sample or a population. The forthcoming quizzes and homework will primarily focus on sample data.