2.3

Strength of Linear Relationships

  • A linear relationship is strong if the data points lie close to a straight line.
  • It is weak if the data points are widely scattered around the line.
  • Visual assessment alone, however, is an unreliable measure of the strength of the relationship.
  • To quantify the strength of a linear relationship, we use a statistic called correlation.

Correlation

  • In statistics, correlation quantifies the degree to which two variables are related to each other.

Sample Observations

  • Suppose we have a sample of n paired observations (x_i, y_i), i = 1, ..., n, on two variables, x and y.
  • Sample means are calculated as follows:
    • $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
    • $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
  • Sample variances are denoted as:
    • $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
    • $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
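The sample mean and sample variance above can be sketched in a few lines of Python. The notes themselves use R/Rcmdr, so this is only an illustrative stand-in; the function names `sample_mean` and `sample_variance` and the data values are made up for the example.

```python
from math import sqrt


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance with the n - 1 denominator used in these notes."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sample_mean(values)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Made-up illustrative values (not from the notes).
x = [2.0, 4.0, 6.0, 8.0]

x_bar = sample_mean(x)    # (2 + 4 + 6 + 8) / 4
s2_x = sample_variance(x)  # (9 + 1 + 1 + 9) / 3
s_x = sqrt(s2_x)           # sample standard deviation

print(x_bar, s2_x, s_x)
```

Dividing by n - 1 rather than n matches the variance definition given above; the standard deviation is the square root of that variance.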

Pearson Product-Moment Correlation Coefficient

  • The Pearson product-moment correlation coefficient (r) quantifies both the direction and strength of the linear relationship between two quantitative variables. The formula is given by:
    • $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1) s_x s_y}$
  • Where:
    • n = number of observations
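    • s_x = sample standard deviation of x (the square root of the sample variance of x)
    • s_y = sample standard deviation of y (the square root of the sample variance of y)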
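As a minimal sketch, the formula for r can be translated term by term into Python. The function name `pearson_r` and the test data are assumptions made for illustration; the notes themselves compute r in R/Rcmdr.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over (n - 1) * s_x * s_y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sample standard deviations (n - 1 denominator).
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")

    # Numerator: sum of products of deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Made-up data lying exactly on a line, so r should be +1 or -1
# up to floating-point rounding.
x = [1.0, 2.0, 3.0, 4.0]
y_up = [3.0, 5.0, 7.0, 9.0]    # y = 2x + 1
y_down = [9.0, 7.0, 5.0, 3.0]  # y = -2x + 11

print(pearson_r(x, y_up), pearson_r(x, y_down))
```

The zero-variance check is a design choice of this sketch: if either variable is constant, the denominator is zero and r is not defined.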

Properties of Correlation Coefficient

  • The correlation coefficient r has several important properties:
    • Range: $-1 \leq r \leq 1$
    • Dimensionless: The value of r is unit-free.
    • Negative Association: If r < 0, there is a negative correlation.
    • Positive Association: If r > 0, there is a positive correlation.
    • Sensitivity to Outliers: The value of r is heavily influenced by outliers in the data.
    • Incomplete Description: r alone does not fully describe the relationship between two variables.
    • Coefficient of Determination: The square of the correlation coefficient, $r^2$, represents the proportion of the variance in y explained by its linear relationship with x.
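Several of these properties can be checked numerically. The sketch below uses made-up data (not from the notes) and a hypothetical helper `pearson_r` to illustrate that r does not change when a variable is rescaled, that a single outlier can change r drastically, and how r^2 is obtained.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from the sample formula with n - 1 denominators."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Made-up illustrative data with a nearly linear, increasing pattern.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_clean = pearson_r(x, y)

# Dimensionless: changing the units of x (multiplying by 100) leaves r unchanged.
r_rescaled = pearson_r([100 * xi for xi in x], y)

# Sensitivity to outliers: one extra point far from the pattern lowers r sharply.
r_with_outlier = pearson_r(x + [6.0], y + [0.0])

# Proportion of variance explained by the linear relationship.
r_squared = r_clean ** 2

print(r_clean, r_rescaled, r_with_outlier, r_squared)
```

The outlier point (6, 0) is purely hypothetical; it is chosen to sit far below the trend so that its effect on r is easy to see.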

Example Correlations

  • Example correlation values:
    • r = 0 indicates no linear correlation.
    • r = -0.3 indicates a weak negative correlation.
    • r = 0.5 signifies a moderate positive correlation.
    • r = -0.7 shows a strong negative correlation.
    • r = 0.9 demonstrates a very strong positive correlation.
    • r = -0.99 signals an extremely strong negative correlation.

Application: Correlation in Coffee Prices and Deforestation

Context

  • A specific example involves analyzing coffee prices and deforestation rates in Indonesia.
  • As coffee prices increase, farmers tend to clear forest land to make room for more coffee plantations.

Data Collected

  • Price (cents per pound) and Deforestation (per cent) data for 5 years:
    • Price: 29 cents/pound, Deforestation: 0.49%
    • Price: 40 cents/pound, Deforestation: 1.59%
    • Price: 54 cents/pound, Deforestation: 1.69%
    • Price: 55 cents/pound, Deforestation: 1.82%
    • Price: 72 cents/pound, Deforestation: 3.10%

Analysis Tasks

  1. Scatterplot Creation: Generate a scatterplot from the collected data.
    • Identify the explanatory variable.
    • Analyze the pattern shown in the scatterplot.
  2. Correlation Calculation: Use R/Rcmdr software to compute the correlation coefficient r for the given data.

Scatterplot Overview

  • The scatterplot plots Deforestation (y-axis) against Price (x-axis); Price is the explanatory variable, since higher prices are what prompt farmers to clear more forest.
  • The computed correlation in this case is r = 0.955, indicating a very strong positive linear relationship between coffee prices and deforestation rates.
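The reported value can be reproduced from the five data points listed under Data Collected. The notes call for R/Rcmdr (where the base function `cor(x, y)` computes the same quantity); the Python below is only a stand-in sketch, and the variable and function names are chosen for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from the sample formula with n - 1 denominators."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Data from the notes: coffee price (cents per pound) and deforestation (percent).
price = [29, 40, 54, 55, 72]
deforestation = [0.49, 1.59, 1.69, 1.82, 3.10]

r = pearson_r(price, deforestation)
print(round(r, 3))  # 0.955, matching the value reported in the notes
```

Price is passed as x because it is the explanatory variable, although r itself is symmetric: swapping the two arguments gives the same value.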