Study Notes on Covariance and Correlation Coefficient
Covariance and Correlation Coefficient
Introduction
Covariance and correlation coefficient are two different, yet related measures.
Both involve the examination of relationships between two variables.
Previous statistical measures discussed involved only one variable, whereas these involve two variables.
Understanding Variables
Key question: How do two variables move together?
Example: If price changes, what happens to quantity demanded?
Importance for decision making in fields such as business.
Understanding these relationships provides insights into effects of variable changes.
Parameter Types
Important to distinguish between sample data and population data in calculations.
Most practical applications utilize sample data (99% of the time).
The Excel work in these notes therefore focuses primarily on sample data, since complete population data is rarely available.
Covariance
Definition: Covariance measures how two variables change together.
Indicates the direction of the linear relationship between variables.
A positive covariance suggests that as one variable increases, the other variable tends to also increase.
A negative covariance suggests that as one variable increases, the other tends to decrease.
Important Understanding:
Covariance does not imply causation.
Just because two variables move together does not mean one causes the other to change.
Common media misinterpretation: treating correlation or covariance as proof of causality.
Covariance Formula
There are separate formulas for sample covariance and population covariance:
Sample Covariance Formula:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Population Covariance Formula:
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{n}$
The main difference between the two is the denominator (n-1 for sample and n for population).
The numerator is essentially the same for both formulations.
Notation
Sample covariance is denoted by $s_{xy}$ and population covariance by $\sigma_{xy}$.
Variable x represents one variable and variable y represents the other in all formulas.
Observations on Covariance
Covariance can yield negative numbers, unlike variance or standard deviation, which are always non-negative.
The sign of the covariance is critical:
Indicator of the relationship: positive (same direction) or negative (opposite direction).
Thus, pay attention to the sign when interpreting covariance results.
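As a small illustration of reading the sign, the sketch below uses the price and quantity-demanded example from earlier in these notes. The numbers are hypothetical and the helper names are made up for the example.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def direction(cov):
    """Translate the sign of a covariance into a plain-language label."""
    if cov > 0:
        return "same direction"
    if cov < 0:
        return "opposite direction"
    return "no linear relationship"


# Hypothetical data: quantity demanded falls as price rises.
price = [1, 2, 3, 4, 5]
quantity = [10, 8, 7, 3, 2]

cov_pq = sample_covariance(price, quantity)
print(cov_pq, direction(cov_pq))  # -5.25 opposite direction
```

The value -5.25 is read only for its sign: price and quantity tend to move in opposite directions in this made-up data set.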
Correlation Coefficient
The correlation coefficient also measures the linear relationship between two variables.
Definition: A standardized measure of the covariance.
Range: Can only take values from -1 to +1.
Negative values indicate a negative linear relationship.
Positive values indicate a positive linear relationship.
A value of 0 indicates no linear relationship.
Unlike covariance, whose size depends on the units of the variables, the magnitude of the correlation coefficient is meaningful:
Closer to 1 or -1 indicates a stronger relationship.
Use the correlation coefficient to judge the strength of a relationship, and treat covariance mainly as an indicator of direction.
Correlation Coefficient Formulas
Sample Correlation Coefficient Formula:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
Population Correlation Coefficient Formula:
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
This means correlation coefficients normalize covariance by the product of the standard deviations of both variables.
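The normalization can be seen in a short Python sketch of the sample version: the sample covariance divided by the product of the two sample standard deviations. The function names and data are invented for illustration. The second half rescales x (as if its units changed) to show that covariance changes while the correlation coefficient does not.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_covariance(values, values))


def sample_correlation(x, y):
    """Sample covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_scaled = [100 * v for v in x]  # same data expressed in smaller units

cov_xy = sample_covariance(x, y)
cov_scaled = sample_covariance(x_scaled, y)
r_xy = sample_correlation(x, y)
r_scaled = sample_correlation(x_scaled, y)

print(cov_xy, cov_scaled)                   # 1.5 150.0
print(round(r_xy, 4), round(r_scaled, 4))   # 0.7746 0.7746
```

The covariance grows by a factor of 100 with the change of units, while the correlation coefficient stays at about 0.77, which is why only the latter is read for strength.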
Significance of the Sign
If the covariance is negative, the correlation coefficient must also be negative (and vice versa).
Important check: if the covariance and the correlation coefficient have different signs, there is a calculation error.
Summary of Key Relationships
Positive covariance indicates that two variables tend to move in the same direction.
Negative covariance indicates variables move in opposite directions.
Zero covariance indicates no linear relationship; the variables may still be related in a nonlinear way.
The correlation coefficient provides not just the direction but also the strength of the linear relationship.
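A word of caution on the zero case: a covariance of zero rules out only a linear relationship. The sketch below uses made-up data in which y is completely determined by x (y equals x squared), yet the sample covariance is exactly zero because the positive and negative deviation products cancel.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: x is symmetric around zero and y = x squared.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # [4, 1, 0, 1, 4]

cov_xy = sample_covariance(x, y)
print(cov_xy)  # 0.0
```

Because the covariance is zero, the correlation coefficient is zero as well, even though knowing x tells you y exactly.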
Practical Application with Excel
Excel offers built-in functions for calculating both covariance and correlation coefficients.
Sample Covariance Function:
COVARIANCE.S (uses n-1 in calculations).
Population Covariance Function:
COVARIANCE.P (uses n in calculations).
Correlation Coefficient Function:
CORREL.
Each function in Excel has specific input requirements, typically involving selecting the corresponding data ranges for [array1] and [array2].
Practical Considerations with Data
Data pairs must be kept consistent; rearranging data can lead to relationship loss.
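To see why the pairing matters, the sketch below computes the sample covariance twice on hypothetical data: once with the original pairs and once after sorting only the y column, which is what happens if one column is rearranged without the other.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical paired observations: row i of x belongs with row i of y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Sorting only one column breaks the original pairs.
y_reordered = sorted(y, reverse=True)  # [5, 5, 4, 4, 2]

cov_paired = sample_covariance(x, y)
cov_broken = sample_covariance(x, y_reordered)
print(cov_paired, cov_broken)  # 1.5 -1.75
```

The individual values in each column are unchanged, yet the covariance flips from positive to negative, so the result no longer describes the original observations.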
Excel automatically ignores non-numerical or blank cells while performing calculations.
Correct use of the Excel functions is essential for accuracy, especially choosing between the sample and population versions.