Chapter 3: Describing Relationships - Section 3.1: Scatterplots and Correlation
The Practice of Statistics, 4th edition
- The section outlines learning objectives:
- IDENTIFY explanatory and response variables.
- CONSTRUCT scatterplots to display relationships.
- INTERPRET scatterplots.
- MEASURE linear association using correlation.
- INTERPRET correlation.
Explanatory and Response Variables
- Most statistical studies examine data on more than one variable.
- Explanatory Variable:
- Definition: A variable that helps explain or influence changes in a response variable.
- Response Variable:
- Definition: A variable that measures an outcome of a study.
- Correlated Relationship:
- Definition: An association between two variables.
- Causal Relationship:
- Definition: A relationship showing that one variable directly affects changes in another variable.
Displaying Relationships: Scatterplots
- Definition of Scatterplot:
- A scatterplot displays the relationship between two quantitative variables measured on the same individuals.
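- Axes: The values of one variable are shown on the horizontal axis and the values of the other on the vertical axis. When there is an explanatory variable, it goes on the horizontal axis (x) and the response variable goes on the vertical axis (y).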
- Each individual in the dataset appears as a point on the graph.
- Example dataset:
- Explanatory (x), body weight in pounds: 120, 187, 109, 103, 131, 165, 158, 116
- Response (y), backpack weight: 26, 30, 26, 24, 29, 35, 31, 28
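To make the construction concrete, the example dataset can be paired into (x, y) points and drawn on a rough character grid. This is only a minimal sketch using the Python standard library; `text_scatterplot` is a hypothetical helper written for this illustration, and it assumes the x values are not all equal and the y values are not all equal.

```python
# Example data from the section: x is the explanatory variable, y the response.
x = [120, 187, 109, 103, 131, 165, 158, 116]
y = [26, 30, 26, 24, 29, 35, 31, 28]

# Each individual becomes one (x, y) point.
points = list(zip(x, y))


def text_scatterplot(points, width=40, height=12):
    """Return a rough character-grid scatterplot (hypothetical helper)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for px, py in points:
        # Scale each value to a column/row index; row 0 is the top of the plot.
        col = round((px - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - py) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    body = "\n".join("|" + "".join(r) for r in grid)
    return body + "\n+" + "-" * width


plot = text_scatterplot(points)
print(plot)
```

A real plotting library would normally replace the character grid; the pairing step, one point per individual, is the part that matches the definition above.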
Interpreting Scatterplots
- Follow this basic strategy to interpret scatterplots:
- Look for the overall pattern and striking departures from that pattern.
- Analyze key components:
- Direction: Positive or negative association.
- Form: Linear, curved, or clustered.
- Strength: How closely the points follow the form (weak, moderate, or strong).
- Identifying Outliers:
- Definition: An individual value that falls outside the overall pattern of the relationship.
Observations and Conclusions from Scatterplots
- Tasks for Observations:
- Strength, Direction, Form, Outliers.
- Example Conclusion:
- Describe the relationship observed, such as between student weight and backpack weight.
Example Analysis of a Scatterplot
- Outlier Observation:
- A possible outlier: the hiker with a body weight of 187 pounds carries less weight than the overall pattern would suggest for that body weight.
- Relationship Strength and Direction:
- There is a moderately strong, positive, linear relationship: lighter students tend to carry lighter backpacks, and heavier students tend to carry heavier ones.
- Understanding Associations:
- Two variables have a positive association if above-average values of one tend to accompany above-average values of the other, and vice versa for below-average values.
- Two variables have a negative association if above-average values of one tend to accompany below-average values of the other, and vice versa.
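One informal way to see the definition at work is to count how many individuals fall on the same side of both means versus on opposite sides. The sketch below does this for the example dataset from this section; it is an illustration of the idea, not a formal test of association.

```python
from statistics import mean

# Example data from the section.
x = [120, 187, 109, 103, 131, 165, 158, 116]
y = [26, 30, 26, 24, 29, 35, 31, 28]

x_bar, y_bar = mean(x), mean(y)

same_side = 0      # both above average, or both below average
opposite_side = 0  # one above average, the other below

for xi, yi in zip(x, y):
    product = (xi - x_bar) * (yi - y_bar)
    if product > 0:
        same_side += 1
    elif product < 0:
        opposite_side += 1

print(same_side, opposite_side)  # 7 1
```

Most points sit on the same side of both means, which is what a positive association looks like in the data.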
Analyzing Scatterplots in Context
- Using the SAT scores example:
- Relationship indicators include Direction, Form, and Strength.
- There is a moderately strong, negative, curved relationship between the percentage of students in a state who take the SAT and the mean SAT math score.
- Two distinct clusters and possible outliers exist in this data set.
Measuring Linear Association: Correlation
- Importance of Linear Relationships:
- Linear relationships are important because a straight line is a simple pattern that is quite common.
- Correlation Definition (r):
- Correlation measures the direction and strength of the linear relationship between two quantitative variables.
- r is always a number between -1 and 1, inclusive.
- Values of r > 0 indicate a positive association.
- Values of r < 0 indicate a negative association.
- Values near 0 indicate a very weak linear relationship.
- Values near -1 or 1 indicate a strong linear relationship.
- r = -1 and r = 1 occur only for a perfect linear relationship, when every point lies exactly on a straight line.
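The extreme values can be checked numerically. The sketch below uses made-up points that lie exactly on a rising line and on a falling line; `pearson_r` is a hypothetical helper that computes the correlation from standardized values with the usual n - 1 divisor. Results are compared with a tolerance because floating-point arithmetic may not return exactly 1 or -1.

```python
from math import isclose
from statistics import mean, stdev


def pearson_r(x, y):
    """Correlation from standardized values (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    n = len(x)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


x = [1, 2, 3, 4, 5]                  # made-up values for illustration
y_up = [2 * xi + 1 for xi in x]      # points exactly on a rising line
y_down = [10 - 3 * xi for xi in x]   # points exactly on a falling line

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)

print(isclose(r_up, 1.0), isclose(r_down, -1.0))  # True True
```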
Examples of Correlation Values
- Understanding different correlation results:
- Correlation r = 0
- Correlation r = -0.3
- Correlation r = 0.5
- Correlation r = -0.7
- Correlation r = 0.9
- Correlation r = -0.99
Correlation Calculation
- Formula for Correlation (r):
- For variables x and y from n individuals:
- Means and standard deviations for values are defined as follows:
- Mean of x: $\bar{x}$, Standard Deviation of x: $s_x$
- Mean of y: $\bar{y}$, Standard Deviation of y: $s_y$
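- The correlation is essentially the average of the products of the standardized values: $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$. The arithmetic is tedious by hand, so calculators or software are recommended for practical calculations.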
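As a worked illustration, the calculation can be carried out step by step for the example dataset from this section: standardize each value, multiply the paired standardized values, add the products, and divide by n - 1. This is a sketch with the Python standard library; the variable names are chosen for this illustration only.

```python
from statistics import mean, stdev

# Example data from the section.
x = [120, 187, 109, 103, 131, 165, 158, 116]
y = [26, 30, 26, 24, 29, 35, 31, 28]
n = len(x)

# Means and sample standard deviations (stdev uses the n - 1 divisor).
x_bar, s_x = mean(x), stdev(x)
y_bar, s_y = mean(y), stdev(y)

# Standardized values (z-scores) for each individual.
z_x = [(xi - x_bar) / s_x for xi in x]
z_y = [(yi - y_bar) / s_y for yi in y]

# Sum the products of paired z-scores and divide by n - 1.
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(round(r, 2))  # 0.79
```

A value of about 0.79 is consistent with the moderately strong, positive, linear relationship described for the scatterplot.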
Facts about Correlation
- Characteristics:
- Correlation does not differentiate between explanatory and response variables.
- Changes in the units of x or y do not affect r.
- Correlation r itself is unitless.
- Cautions:
- Correlation requires both variables to be quantitative.
- Correlation does not describe curved relationships, no matter how strong they are.
- Correlation is not resistant: a few outlying observations can change r substantially.
- Correlation is not a complete summary of two-variable data; report the means and standard deviations of x and y along with r.
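Several of these facts and cautions can be checked numerically. The sketch below reuses the example dataset from this section; `pearson_r` is a hypothetical helper, the extra point (200, 10) is a made-up outlier that is not part of the original data, the curved data are also made up, and 0.4536 is an approximate pounds-to-kilograms factor used only to show a change of units.

```python
from math import isclose
from statistics import mean, stdev


def pearson_r(x, y):
    """Correlation from standardized values (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    n = len(x)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


# Example data from the section.
x = [120, 187, 109, 103, 131, 165, 158, 116]
y = [26, 30, 26, 24, 29, 35, 31, 28]
r = pearson_r(x, y)

# Fact: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

# Fact: changing units leaves r unchanged (pounds to kilograms, approximate).
x_kg = [xi * 0.4536 for xi in x]
r_kg = pearson_r(x_kg, y)

# Caution: one made-up outlying point can change r a great deal.
r_outlier = pearson_r(x + [200], y + [10])

# Caution: a perfect but curved relationship can still have r near 0.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [xi ** 2 for xi in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print(isclose(r, r_swapped), isclose(r, r_kg))
print(r_outlier < r, isclose(r_curve, 0.0, abs_tol=1e-9))
```

In this sketch the single added point is enough to turn a clearly positive correlation into a negative one, which is why a scatterplot should always accompany r.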
Practice
- Estimate the correlation for each graph and interpret it in context:
- The graphs plot the following pairs of variables:
- Manatees killed by boats vs. boats registered in Florida (in 1000s).
- Storms observed vs. storms predicted.
- Healing rate of limb 2 vs. healing rate of limb 1.
- Percent return in the current year vs. percent return in the previous year.
Summary of Section 3.1
- Key takeaways include:
- A scatterplot displays relationships between two quantitative variables.
- An explanatory variable may help explain or predict changes in a response variable.
- When interpreting scatterplots, observe overall patterns for strength, direction, form, and look for outliers.
- Correlation (r) quantifies the strength and direction of linear relationships between two quantitative variables.
Looking Ahead
- Upcoming topics in the following section will include:
- Least-squares regression line.
- Prediction.
- Residuals and residual plots.
- The Role of $r^2$ in Regression.
- Correlation and Regression Wisdom.