2.3 Strength of Linear Relationships
- A linear relationship is strong if the data points lie close to a straight line.
- It is weak if the data points are widely scattered around the line.
- Judging the strength of a relationship by eye is unreliable.
- To measure the strength of a linear relationship numerically, we use a statistic called the correlation.
Correlation
- In statistics, correlation quantifies the degree to which two variables are related to each other.
Sample Observations
- Suppose we have a sample of n paired observations $(x_i, y_i)$ on two variables, x and y.
- Sample means are calculated as follows:
- $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
- Sample variances are denoted as:
- $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
- $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
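The mean and variance formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method; the function names `sample_mean` and `sample_variance` are chosen here for readability, and the input is the coffee price column from the application later in this section.

```python
# Coffee prices (cents per pound) from the application later in this section.
price = [29, 40, 54, 55, 72]

def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    mean = sample_mean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

x_bar = sample_mean(price)
s2_x = sample_variance(price)
print(x_bar, s2_x)  # 50.0 266.5
```

Dividing by n - 1 rather than n matches the sample variance defined above.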
Pearson Product-Moment Correlation Coefficient
- The Pearson product-moment correlation coefficient (r) quantifies both the direction and strength of the linear relationship between two quantitative variables. The formula is given by:
- $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$
- Where:
- $n$ = number of observations
- $s_x$ = sample standard deviation of x (the square root of $s_x^2$)
- $s_y$ = sample standard deviation of y (the square root of $s_y^2$)
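As a rough sketch, the formula for r translates directly into code. The helper name `pearson_r` and the small data sets below are made up for illustration: the points are chosen to lie exactly on a line, so r should come out at the extremes of its range.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed directly from the formula for r.

    Raises ValueError for mismatched or too-short inputs. If either
    variable is constant, its standard deviation is 0 and the division
    fails, because r is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of products of paired deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

# Illustrative values only (not from the notes): exact straight lines.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(round(pearson_r(x, y_up), 6))    # 1.0
print(round(pearson_r(x, y_down), 6))  # -1.0
```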
Properties of Correlation Coefficient
- The correlation coefficient r has several important properties:
- Range: $-1 \le r \le 1$
- Dimensionless: The value of r is unit-free.
- Negative Association: If r < 0, there is a negative correlation.
- Positive Association: If r > 0, there is a positive correlation.
- Sensitivity to Outliers: The value of r is heavily influenced by outliers in the data.
- Incomplete Description: r alone does not fully describe the relationship between two variables.
- The square of the correlation coefficient, $r^2$, is the proportion of the variance in y that is explained by the linear relationship with x.
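Two of these properties, unit-freeness and sensitivity to outliers, are easy to check numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this example, the data are the coffee figures from the application later in this section, and the sixth "outlier" year is hypothetical and not part of the collected data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the definition (sample standard deviations)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

price_cents = [29, 40, 54, 55, 72]
deforestation = [0.49, 1.59, 1.69, 1.82, 3.10]

r = pearson_r(price_cents, deforestation)

# Unit-free: expressing price in dollars instead of cents leaves r unchanged.
price_dollars = [p / 100 for p in price_cents]
r_dollars = pearson_r(price_dollars, deforestation)

# Outlier sensitivity: one HYPOTHETICAL extra year (price 100, deforestation
# 0.50) that breaks the pattern pulls r far away from its original value.
r_outlier = pearson_r(price_cents + [100], deforestation + [0.50])

print("r (cents):      ", round(r, 3))
print("r (dollars):    ", round(r_dollars, 3))
print("r with outlier: ", round(r_outlier, 3))
print("r squared:      ", round(r ** 2, 3))
```

Comparing the first two printed values with the third shows why a scatterplot should be inspected alongside r.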
Example Correlations
- Example correlation values:
- r = 0 indicates no linear correlation.
- r = -0.3 indicates a weak negative correlation.
- r = 0.5 signifies a moderate positive correlation.
- r = -0.7 shows a strong negative correlation.
- r = 0.9 demonstrates a very strong positive correlation.
- r = -0.99 signals an extremely strong negative correlation.
Application: Correlation in Coffee Prices and Deforestation
Context
- A specific example involves analyzing coffee prices and deforestation rates in Indonesia.
- As coffee prices increase, farmers tend to clear forest land for more coffee plantations.
Data Collected
- Price (cents per pound) and deforestation (percent) for 5 years:
- Price: 29 cents/pound, Deforestation: 0.49%
- Price: 40 cents/pound, Deforestation: 1.59%
- Price: 54 cents/pound, Deforestation: 1.69%
- Price: 55 cents/pound, Deforestation: 1.82%
- Price: 72 cents/pound, Deforestation: 3.10%
Analysis Tasks
- Scatterplot Creation: Generate a scatterplot from the collected data.
- Identify the explanatory variable.
- Analyze the pattern shown in the scatterplot.
- Correlation Calculation: Use R/Rcmdr software to compute the correlation coefficient r for the given data.
Scatterplot Overview
- The scatterplot plots deforestation (y-axis) against price (x-axis), so price is treated as the explanatory variable.
- The computed correlation in this case is r = 0.955, indicating a very strong positive linear relationship between coffee prices and deforestation rates.
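The notes ask for this calculation in R/Rcmdr, where a call such as `cor(price, deforestation)` would be the natural route. As a cross-check in Python, the sketch below applies the formula for r to the five data points using the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation); the variable names are chosen here for illustration.

```python
from statistics import mean, stdev

# Data from the "Data Collected" list above.
price = [29, 40, 54, 55, 72]                     # cents per pound (x)
deforestation = [0.49, 1.59, 1.69, 1.82, 3.10]   # percent (y)

n = len(price)
x_bar = mean(price)
y_bar = mean(deforestation)

# Numerator of r: sum of products of paired deviations from the means.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, deforestation))

# Denominator of r: (n - 1) times the two sample standard deviations.
r = cross / ((n - 1) * stdev(price) * stdev(deforestation))

print(round(r, 3))  # 0.955
```

The result agrees with the value of r reported for the scatterplot.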