Notes on Variance, Standard Deviation, and Coefficient of Variation
Understanding Variance and Standard Deviation
Population vs. Sample Variance Formulas
There are two distinct formulas for variance:
Population Variance: Denoted by the Greek letter \sigma^2 (sigma squared).
Sample Variance: Denoted by the regular alphabet letter s^2.
Mathematical Difference: The numerator is mathematically the same for both. The only difference lies in the denominator:
For sample variance, we divide by n-1, where n is the number of observations.
For population variance, we divide by N, the population size (some texts also write this as n).
This distinction is crucial, as using the incorrect formula for the given data type (population or sample) will lead to an incorrect answer, even if the difference seems minor.
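Written out, the two formulas look like the following. The symbols x_i (an individual observation), \mu (the population mean), and \bar{x} (the sample mean) are standard notation assumed here; the notes above name only \sigma^2, s^2, N, and n.
Population Variance: \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
Sample Variance: s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
In both cases the numerator is the sum of squared deviations from the mean; only the denominator changes.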
Calculating Variance in Excel
Excel provides specific functions for both population and sample variance.
To access them: Go to the "Insert Function" dialog box and type "variance."
Population Variance Function:
VAR.P (the P stands for population).
Sample Variance Function:
VAR.S (the S stands for sample).
Usage: For both functions, you only need to tell Excel where your data are located (e.g., highlight the range of cells). Excel then performs all the complex calculations, including counting the values and applying the correct denominator (n or n-1).
Example from Transcript: If there are 70 values in the dataset:
VAR.P would divide by 70.
VAR.S would divide by 70-1 = 69.
Importance of Selection: The two results often differ enough (e.g., 0.34 vs. 0.74 when rounded to two decimal places) that using the wrong function will be marked as an incorrect answer on quizzes or homework.
Key Takeaway: Always read problem statements carefully to determine whether you are dealing with population data or sample data.
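For readers who want to check the arithmetic outside Excel, here is a minimal Python sketch of the same idea. The data values are made up for illustration and do not come from the course. The standard-library functions statistics.pvariance and statistics.variance play roughly the roles of VAR.P and VAR.S.

```python
import statistics

# Made-up data for illustration only; not from the course materials.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = statistics.fmean(values)

# The numerator is the same for both formulas: the sum of squared deviations.
sum_sq_dev = sum((x - mean) ** 2 for x in values)

population_variance = sum_sq_dev / n      # divides by n, like VAR.P
sample_variance = sum_sq_dev / (n - 1)    # divides by n - 1, like VAR.S

print(f"Population variance: {population_variance:.4f}")  # 4.0000
print(f"Sample variance:     {sample_variance:.4f}")      # 4.5714

# The standard library computes the same two quantities directly.
library_population = statistics.pvariance(values)
library_sample = statistics.variance(values)
```

With only 8 values the two answers differ noticeably; the gap shrinks as the number of observations grows, but it never disappears.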
Interpreting Variance
One challenge with variance is its interpretation because its units are squared (e.g., dollars squared, years squared). This makes it less intuitive for practical understanding.
Standard Deviation
Definition: The standard deviation is always the positive square root of the variance.
Population Standard Deviation: \sigma = \sqrt{\sigma^2}
Sample Standard Deviation: s = \sqrt{s^2}
Advantage over Variance: The standard deviation is expressed in the same units of measure as the original raw data (e.g., dollars, years). This makes it much easier to interpret and understand in real-world contexts.
Prevalence: In research papers and academic literature, authors almost exclusively report standard deviation rather than variance due to its interpretability.
Frequency of Use: Standard deviation is a foundational concept and will be used extensively throughout the course, including in more advanced statistical methods such as hypothesis testing and confidence intervals.
Calculating Standard Deviation in Excel
Similar to variance, Excel has specific functions for population and sample standard deviation.
To access them: Go to the "Insert Function" dialog box and type "standard deviation."
Population Standard Deviation Function:
STDEV.P (for population data).
Sample Standard Deviation Function:
STDEV.S (for sample data).
Usage: Simply point Excel to the data range. Excel handles all the calculations, including taking the square root of the variance.
Historical Context: In the past, when calculations were done by hand, one first had to calculate the variance and then take its square root to find the standard deviation. Excel's built-in functions streamline this process, often eliminating the need to calculate variance separately.
Importance of Selection: As with variance, selecting the correct STDEV.P or STDEV.S function is critical to obtaining the correct answer, as results will differ.
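The relationship between the two measures can be sketched in Python as well. The data are again made up for illustration, and statistics.pstdev and statistics.stdev stand in for STDEV.P and STDEV.S. The sketch also takes the square root of each variance by hand to show that both routes give the same number.

```python
import math
import statistics

# Made-up data for illustration only; not from the course materials.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(values)  # like STDEV.P
sample_sd = statistics.stdev(values)       # like STDEV.S

# The "by hand" route: compute the variance first, then take its square root.
population_sd_by_hand = math.sqrt(statistics.pvariance(values))
sample_sd_by_hand = math.sqrt(statistics.variance(values))

print(f"Population SD: {population_sd:.4f}")  # 2.0000
print(f"Sample SD:     {sample_sd:.4f}")      # 2.1381
```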
Coefficient of Variation (CV)
Purpose: The Coefficient of Variation is a valuable measure for comparing the relative variability between two or more datasets, especially when:
The datasets have different units of measure (e.g., prices in euros vs. dollars).
The datasets have significantly different means, making direct comparison of standard deviations misleading.
Formula: Unlike variance and standard deviation, there is no built-in Excel function for the Coefficient of Variation. You must calculate it manually using the formula:
CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100
Steps for Manual Calculation in Excel:
Calculate the Mean (using the AVERAGE function).
Calculate the appropriate Standard Deviation (using STDEV.P or STDEV.S based on data type).
Input the formula into a cell, referencing the calculated standard deviation and mean. Multiply by 100 to express it as a percentage.
Example: If the standard deviation is in cell M6 and the mean is in cell M8, the formula would be =(M6/M8)*100.
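The manual steps above can be sketched in Python as follows. The helper name coefficient_of_variation and its is_sample flag are hypothetical, introduced only for this illustration, and the data are made up.

```python
import statistics

def coefficient_of_variation(values, is_sample=True):
    """Return the CV as a percentage: (standard deviation / mean) * 100.

    Hypothetical helper for illustration; not a built-in function.
    """
    mean = statistics.fmean(values)          # step 1, like AVERAGE
    if is_sample:
        sd = statistics.stdev(values)        # step 2, like STDEV.S
    else:
        sd = statistics.pstdev(values)       # step 2, like STDEV.P
    return (sd / mean) * 100                 # step 3, like =(M6/M8)*100

# Made-up data for illustration only; not from the course materials.
values = [2, 4, 4, 4, 5, 5, 7, 9]

cv_population = coefficient_of_variation(values, is_sample=False)
cv_sample = coefficient_of_variation(values, is_sample=True)

print(f"CV (population): {cv_population:.1f}%")  # 40.0%
print(f"CV (sample):     {cv_sample:.1f}%")      # 42.8%
```

Because the formula divides by the mean, the CV is undefined when the mean is zero.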
Excel Order of Operations - Caution with Manual Formulas:
When manually inputting formulas, Excel follows the standard order of operations. Be explicit with parentheses () to ensure calculations are performed in the intended order.
Example:
(2+3)*4 evaluates to 20 (adds first, then multiplies).
2+3*4 evaluates to 14 (multiplies first, then adds).
Excel can color-code parentheses to help identify corresponding pairs, which is useful for complex formulas.
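Python applies the same precedence to addition, multiplication, and division, so the effect of parentheses can be checked there too. This short sketch repeats the example above and then shows how a misplaced parenthesis changes a CV-style calculation. The values 2.0 and 5.0 are made up for illustration.

```python
grouped = (2 + 3) * 4     # addition first, then multiplication: 20
ungrouped = 2 + 3 * 4     # multiplication first, then addition: 14

# Made-up standard deviation and mean for illustration only.
sd, mean = 2.0, 5.0

cv_intended = (sd / mean) * 100    # divide, then scale to a percentage
cv_misplaced = sd / (mean * 100)   # parentheses in the wrong place

print(grouped, ungrouped)          # 20 14
print(f"{cv_intended:.1f}")        # 40.0
print(f"{cv_misplaced:.3f}")       # 0.004
```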
Interpretation: When comparing two datasets, the dataset with the higher numerical value for the Coefficient of Variation is considered to have more relative variability.
Practical Use Cases:
Comparing apartment rental variability between a U.S. city (dollars) and a European city (euros).
Comparing the risk of different stocks, especially when their prices or average returns vary significantly.
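As a rough illustration of the first use case, the sketch below compares two small sets of rents in different currencies. All figures are invented for this example, the helper cv_percent is hypothetical, and both datasets are treated as samples (an assumption).

```python
import statistics

def cv_percent(sample):
    """Sample CV as a percentage (hypothetical helper for illustration)."""
    return statistics.stdev(sample) / statistics.fmean(sample) * 100

# Invented monthly rents; treated as sample data for this sketch.
rents_usd = [1200, 1500, 1350, 1800, 1650]   # U.S. city, in dollars
rents_eur = [900, 950, 1000, 1050, 1100]     # European city, in euros

cv_usd = cv_percent(rents_usd)
cv_eur = cv_percent(rents_eur)

print(f"CV, U.S. city:     {cv_usd:.2f}%")   # 15.81%
print(f"CV, European city: {cv_eur:.2f}%")   # 7.91%

# The dataset with the larger CV has more relative variability.
more_variable = "U.S. city" if cv_usd > cv_eur else "European city"
print(more_variable)                          # U.S. city
```

The standard deviations themselves (in dollars and in euros) cannot be compared directly, but the two percentages can.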
General Advice for Problems
Read Carefully: Always identify whether the problem provides population data or sample data. This determines the correct variance and standard deviation functions to use.
Excel as a Tool: While Excel performs the calculations, understanding the underlying formulas and concepts is essential for correct application and interpretation.
Practice: Utilize provided practice problems to become proficient with Excel functions for mean, standard deviation, and variance, and for manually calculating the coefficient of variation.
Tolerance for Answers: For quizzes, pay attention to rounding instructions and the tolerance set for answers. Incorrect function choice can result in a wrong answer even with correct rounding. It is crucial to determine the data type (sample or population) given in the problem statement, which will always be explicitly mentioned. The forthcoming quizzes and homework will primarily focus on sample data.