Study Notes on Covariance and Correlation Coefficient

Covariance and Correlation Coefficient

Introduction

  • Covariance and correlation coefficient are two different, yet related measures.

  • Both involve the examination of relationships between two variables.

  • The statistical measures discussed previously involved only one variable, whereas these involve two.

Understanding Variables

  • Key question: How do two variables move together?

    • Example: If price changes, what happens to quantity demanded?

  • Importance for decision making in fields such as business.

  • Understanding these relationships provides insights into effects of variable changes.

Parameter Types

  • Important to distinguish between sample data and population data in calculations.

  • Most practical applications use sample data (roughly 99% of the time, as a rule of thumb).

  • Excel work here focuses primarily on sample data, because complete population data is rarely available.

Covariance

  • Definition: Covariance measures how two variables change together.

  • Indicates the direction of the linear relationship between variables.

    • A positive covariance suggests that as one variable increases, the other variable tends to also increase.

    • A negative covariance suggests that as one variable increases, the other tends to decrease.

  • Important caveat:

    • Covariance does not imply causation.

    • Just because two variables move together does not mean one causes the other to change.

    • A common misinterpretation in the media is to treat correlation or covariance as proof of causality.

Covariance Formula

  • There are separate formulas for sample covariance and population covariance:

  • Sample Covariance Formula:


    s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}

  • Population Covariance Formula:


    \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{n}

  • The main difference between the two is the denominator (n-1 for sample and n for population).

  • The numerator is essentially the same for both formulations.
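
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `covariance`, its `sample` flag, and the price and quantity figures are all assumptions made up for the example.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data: n - 1 denominator if sample, otherwise n."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < (2 if sample else 1):
        raise ValueError("not enough data pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator shared by both formulas: sum of the products of deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

# Hypothetical data: price and quantity demanded over five periods.
price = [2.0, 4.0, 6.0, 8.0, 10.0]
quantity = [50.0, 45.0, 35.0, 30.0, 20.0]

sample_cov = covariance(price, quantity, sample=True)
population_cov = covariance(price, quantity, sample=False)
print(sample_cov, population_cov)  # -37.5 -30.0
```

Both results are negative, which matches the intuition that quantity demanded tends to fall as price rises. Only the denominator differs between the two calls.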

Notation

  • Sample covariance is denoted by $s_{xy}$ and population covariance by $\sigma_{xy}$.

  • Variable x represents one variable and variable y represents the other in all formulas.

Observations on Covariance

  • Covariance can yield negative numbers, unlike variance or standard deviation, which are always non-negative.

  • The sign of the covariance is critical:

    • Indicator of the relationship: positive (same direction) or negative (opposite direction).

  • Thus, pay attention to the sign when interpreting covariance results.

Correlation Coefficient

  • The correlation coefficient also measures the linear relationship between two variables.

  • Definition: A standardized measure of the covariance.

  • Range: Can only take values from -1 to +1.

    • Negative values indicate a negative linear relationship.

    • Positive values indicate a positive linear relationship.

    • A value of 0 indicates no linear relationship.

  • Unlike covariance, the magnitude of the correlation coefficient is meaningful, because it does not depend on the units of the variables:

    • A value closer to 1 or -1 indicates a stronger linear relationship.

    • Use the correlation coefficient to judge the strength of a relationship, and treat covariance mainly as an indicator of direction.
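
One way to see why the size of a covariance is hard to interpret is to change the units of one variable. The sketch below uses made-up price and quantity data and illustrative helper names (`sample_cov`, `sample_corr`). It re-expresses price in cents instead of dollars and compares the results.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance of paired data (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    # The covariance of a variable with itself is its variance.
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

# Hypothetical data: the same prices expressed in dollars and in cents.
price_dollars = [2.0, 4.0, 6.0, 8.0, 10.0]
price_cents = [p * 100 for p in price_dollars]
quantity = [50.0, 45.0, 35.0, 30.0, 20.0]

cov_dollars = sample_cov(price_dollars, quantity)
cov_cents = sample_cov(price_cents, quantity)
corr_dollars = sample_corr(price_dollars, quantity)
corr_cents = sample_corr(price_cents, quantity)
print(cov_dollars, cov_cents)
print(corr_dollars, corr_cents)
```

The covariance is 100 times larger when price is in cents, although the data describe the same relationship. The correlation coefficient is the same in both cases (approximately -0.993).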

Correlation Coefficient Formulas

  • Sample Correlation Coefficient Formula:


    r_{xy} = \frac{s_{xy}}{s_x s_y}

  • Population Correlation Coefficient Formula:


    \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}

  • This means correlation coefficients normalize covariance by the product of the standard deviations of both variables.
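
The sample formula can be followed step by step in Python. This is a rough sketch under stated assumptions: the study-hours and exam-score data are invented, and the function names are illustrative. `statistics.stdev` from the standard library returns the sample standard deviation (n - 1 denominator).

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance of paired data (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_xy = sample_covariance(xs, ys)
    s_x = statistics.stdev(xs)  # sample standard deviation of x
    s_y = statistics.stdev(ys)  # sample standard deviation of y
    return s_xy / (s_x * s_y)

# Hypothetical data: hours studied and exam score for five students.
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]
exam_score = [52.0, 60.0, 61.0, 75.0, 77.0]

s_xy = sample_covariance(hours_studied, exam_score)
r = sample_correlation(hours_studied, exam_score)
print(s_xy)  # 16.25
print(r)
```

Because standard deviations are never negative, dividing by their product cannot change the sign. Here both the covariance and the correlation coefficient (roughly 0.96) are positive.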

Significance of the Sign

  • If the covariance is negative, the correlation coefficient must also be negative (and vice versa).

  • Sanity check: standard deviations are never negative, so if the covariance and the correlation coefficient have different signs, there is a calculation error.

Summary of Key Relationships

  • Positive covariance indicates that two variables tend to move in the same direction.

  • Negative covariance indicates that two variables tend to move in opposite directions.

  • Zero covariance indicates no linear relationship.

  • The correlation coefficient provides not just the direction but also the strength of the linear relationship.
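
A zero covariance rules out only a linear pattern. Two variables can still be related in a non-linear way. The sketch below is a made-up example with an illustrative helper named `sample_cov`: y is fully determined by x, yet the covariance is zero because the relationship is a symmetric curve rather than a line.

```python
def sample_cov(xs, ys):
    """Sample covariance of paired data (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: y = x squared, with x values symmetric around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

curve_cov = sample_cov(xs, ys)
print(curve_cov)  # 0.0
```

The positive and negative products of deviations cancel exactly, so neither the covariance nor the correlation coefficient would detect this relationship.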

Practical Application with Excel

  • Excel offers built-in functions for calculating both covariance and correlation coefficients.

  • Sample Covariance Function: COVARIANCE.S (uses n-1 in calculations).

  • Population Covariance Function: COVARIANCE.P (uses n in calculations).

  • Correlation Coefficient Function: CORREL.

  • Each function in Excel has specific input requirements, typically involving selecting the corresponding data ranges for [array1] and [array2].

Practical Considerations with Data

  • Data pairs must stay aligned: each x value has to remain in the same row as its y value. Sorting or rearranging one column on its own destroys the relationship.

  • Excel automatically ignores non-numerical or blank cells while performing calculations.

  • Correct usage of Excel functions is essential for accuracy, especially when distinguishing between sample and population calculations.
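
The two data-handling points above can be illustrated outside of Excel. The sketch below is an assumption-laden Python analogue, not a description of Excel's internals: blank cells are represented as `None`, a text cell as the string "n/a", and the helper names are invented for the example. It keeps only complete numeric pairs, then shows what happens when one column is sorted independently of the other.

```python
def sample_cov(xs, ys):
    """Sample covariance of paired data (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def complete_pairs(xs, ys):
    """Keep only the rows in which both values are numbers."""
    kept = [(x, y) for x, y in zip(xs, ys)
            if isinstance(x, (int, float)) and isinstance(y, (int, float))]
    return [x for x, _ in kept], [y for _, y in kept]

# Hypothetical worksheet columns: None is a blank cell, "n/a" is a text cell.
price = [2.0, 4.0, None, 6.0, 8.0]
quantity = [50.0, 40.0, 35.0, 30.0, "n/a"]

xs, ys = complete_pairs(price, quantity)  # rows 3 and 5 are dropped
paired_cov = sample_cov(xs, ys)

# Sorting quantity on its own breaks the original row pairing.
resorted_cov = sample_cov(xs, sorted(ys))
print(paired_cov, resorted_cov)  # -20.0 20.0
```

With the pairs intact, the covariance is negative. After one column is sorted by itself, the same numbers produce a positive covariance, so the original relationship is lost.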