Chapter 3: Describing Relationships - Section 3.1: Scatterplots and Correlation

The Practice of Statistics, 4th edition

  • The section outlines learning objectives:
    • IDENTIFY explanatory and response variables.
    • CONSTRUCT scatterplots to display relationships.
    • INTERPRET scatterplots.
    • MEASURE linear association using correlation.
    • INTERPRET correlation.

Explanatory and Response Variables

  • Most statistical studies examine data on more than one variable.
  • Explanatory Variable:
    • Definition: A variable that helps explain or influence changes in a response variable.
  • Response Variable:
    • Definition: A variable that measures an outcome of a study.
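  • Correlated Relationship:
    • Definition: An association between two variables. Knowing the value of one variable tells us something about the likely value of the other.
  • Causal Relationship:
    • Definition: A relationship in which changes in one variable directly cause changes in another variable. An association between two variables does not by itself show that the relationship is causal.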

Displaying Relationships: Scatterplots

  • Definition of Scatterplot:
    • A scatterplot displays the relationship between two quantitative variables measured on the same individuals.
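    • Axes: The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. If there is an explanatory variable, plot it on the horizontal (x) axis and the response variable on the vertical (y) axis.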
    • Each individual in the dataset appears as a point on the graph.
    • Example dataset (body weight and backpack weight of 8 hikers, in pounds):
      Body weight, explanatory (x):    120  187  109  103  131  165  158  116
      Backpack weight, response (y):    26   30   26   24   29   35   31   28
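To make "each individual appears as a point" concrete, here is a minimal text-mode scatterplot of the example dataset, assuming only the Python standard library. The helper name `text_scatterplot` and the grid size are hypothetical, illustrative choices; a real analysis would use graphing software or a calculator.

```python
def text_scatterplot(xs, ys, width=40, height=12):
    """Return rows of a character grid with one '*' per (x, y) individual."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Explanatory variable x sets the column (horizontal axis).
        col = round((x - x_min) / x_span * (width - 1))
        # Response variable y sets the row; row 0 is the top, so larger y is higher.
        row = round((y_max - y) / y_span * (height - 1))
        grid[row][col] = "*"
    # Left border is the vertical axis; the last line is the horizontal axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # x, explanatory (lb)
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # y, response (lb)

for line in text_scatterplot(body_weight, pack_weight):
    print(line)
```

Individuals with identical coordinates would share one `*` in this sketch; with these eight hikers every point lands in its own cell.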

Interpreting Scatterplots

  • Follow this basic strategy to interpret scatterplots:
    1. Look for the overall pattern and for striking departures from that pattern.
    2. Describe the overall pattern by its:
    • Direction: positive or negative.
    • Form: linear (straight-line), curved, clustered, or no clear pattern.
    • Strength: how closely the points follow the form (for example, strong, moderately strong, or weak).
  • Identifying Outliers:
    • Definition: An outlier is an individual value that falls outside the overall pattern of the relationship.

Observations and Conclusions from Scatterplots

  1. What to describe:
    • Strength, direction, form, and any outliers.
  2. How to conclude:
    • Describe the observed relationship in context, for example the relationship between student body weight and backpack weight.

Example Analysis of A Scatterplot

  • Outlier Observation:
    • A possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other hikers.
  • Relationship Strength and Direction:
    • There is a moderately strong, positive, linear relationship between body weight and backpack weight; lighter students tend to carry lighter backpacks.
  • Understanding Associations:
    • Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.
    • Two variables have a negative association when above-average values of one tend to accompany below-average values of the other.
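The "above-average with above-average" idea can be checked directly on the backpack data by counting how many hikers fall on the same side of both means versus opposite sides. This is a hedged sketch; the helper `quadrant_counts` is a hypothetical name, and the count is only an informal check of direction, not a measure of strength.

```python
from statistics import mean


def quadrant_counts(xs, ys):
    """Count pairs on the same side of both means vs. on opposite sides."""
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:      # both above average, or both below average
            same += 1
        elif product < 0:    # one above average, the other below average
            opposite += 1
        # a pair lying exactly on a mean (product == 0) is not counted
    return same, opposite


body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # x (lb)
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # y (lb)

same, opposite = quadrant_counts(body_weight, pack_weight)
print(f"same side: {same}, opposite sides: {opposite}")  # same side: 7, opposite sides: 1
```

Seven of the eight hikers lie on the same side of both means, consistent with a positive association. The single exception is the 131-pound hiker, who is slightly below the mean body weight (136.125 lb) but slightly above the mean pack weight (28.625 lb).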

Analyzing Scatterplots in Context

  • Example: state SAT scores
    • Describe the relationship by its direction, form, and strength.
    • There is a moderately strong, negative, curved relationship between the percentage of students in a state who take the SAT and the state's mean SAT math score.
    • The states form two distinct clusters, and there are possible outliers that fall outside the overall pattern.

Measuring Linear Association: Correlation

  • Importance of Linear Relationships:
    • Linear (straight-line) relationships are important because a straight line is a simple pattern that is quite common.
  • Correlation Definition (r):
    • Correlation measures the direction and strength of the linear relationship between two quantitative variables.
    • r is always a number between -1 and 1.
    • Values of r > 0 indicate a positive association.
    • Values of r < 0 indicate a negative association.
    • Values near 0 indicate a very weak linear relationship.
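    • The strength of the linear relationship increases as r moves away from 0 toward either -1 or 1.
    • The values r = -1 and r = 1 occur only for a perfect linear relationship, when the points lie exactly along a straight line.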

Examples of Correlation Values

  • Scatterplots illustrating different correlations (figure panels):
    • r = 0, r = -0.3, r = 0.5, r = -0.7, r = 0.9, r = -0.99
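The figure panels are not reproduced here. As a hedged substitute, this sketch simulates a point cloud for each listed correlation and reports its sample r. It assumes the standard construction y = r·x + sqrt(1 - r²)·z with independent standard normal x and z, whose population correlation is r. The helper names `pearson_r` and `simulate_pairs`, the sample size, and the seed are hypothetical, illustrative choices; sample values will be close to, not exactly equal to, the targets.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation r computed from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pairs(target_r, n=5000, seed=1):
    """Draw n (x, y) pairs whose population correlation equals target_r."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)  # noise independent of x
        xs.append(x)
        ys.append(target_r * x + math.sqrt(1 - target_r ** 2) * z)
    return xs, ys


for target in [0, -0.3, 0.5, -0.7, 0.9, -0.99]:
    xs, ys = simulate_pairs(target)
    print(f"target r = {target:5}: sample r = {pearson_r(xs, ys):.3f}")
```

Plotting each `(xs, ys)` pair would show the cloud tightening around a line as the target moves from 0 toward -1 or 1.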

Correlation Calculation

  • Formula for Correlation (r):
    • Suppose we have data on variables x and y for n individuals.
    • Mean of x: $\bar{x}$, standard deviation of x: $s_x$
    • Mean of y: $\bar{y}$, standard deviation of y: $s_y$
    • The correlation r between x and y is the sum of the products of the standardized values, divided by n - 1:
      $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$
  • Computing r by hand is tedious, so use a calculator or software in practice.
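As a sketch of what the calculator or software does, the function below computes r as the sum of the products of the standardized x and y values divided by n - 1, using the sample standard deviations. The function name `correlation` is a hypothetical helper; the data are the body weights and backpack weights from the example dataset.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Return r = (1 / (n - 1)) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    total = 0.0
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x  # standardized x value
        z_y = (y - y_bar) / s_y  # standardized y value
        total += z_x * z_y
    return total / (n - 1)


body_weight = [120, 187, 109, 103, 131, 165, 158, 116]  # x (lb)
pack_weight = [26, 30, 26, 24, 29, 35, 31, 28]          # y (lb)

r = correlation(body_weight, pack_weight)
print(f"r = {r:.3f}")  # r = 0.795
```

On Python 3.10 or later, `statistics.correlation(body_weight, pack_weight)` from the standard library should return the same value. The result, r ≈ 0.795, agrees with the description of a moderately strong, positive, linear relationship.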

Facts about Correlation

  • Characteristics:
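    1. Correlation makes no distinction between explanatory and response variables: swapping x and y does not change r.
    2. Changing the units of measurement of x, y, or both does not change r.
    3. Correlation r has no unit of measurement; it is just a number.
  • Cautions:
    • Correlation requires both variables to be quantitative.
    • Correlation measures the strength of linear relationships only; it does not describe curved relationships, no matter how strong they are.
    • Correlation is not resistant: like the mean and standard deviation, r can be strongly affected by a few outlying observations.
    • Correlation is not a complete summary of two-variable data, even when the relationship is linear; report the means and standard deviations of both x and y along with r.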
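The characteristics and two of the cautions can be checked numerically. This hedged sketch reuses the backpack data and adds two made-up datasets for illustration only: a perfect curve y = x² on a symmetric range, and a perfect line with one hypothetical outlier at (6, -10). The helper name `correlation` is hypothetical; 0.45359237 is the exact pound-to-kilogram factor.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


body_lb = [120, 187, 109, 103, 131, 165, 158, 116]
pack_lb = [26, 30, 26, 24, 29, 35, 31, 28]
r = correlation(body_lb, pack_lb)

# Characteristic 1: swapping explanatory and response variables gives the same r.
r_swapped = correlation(pack_lb, body_lb)

# Characteristic 2: converting pounds to kilograms leaves r unchanged.
LB_TO_KG = 0.45359237
r_kg = correlation([w * LB_TO_KG for w in body_lb], [w * LB_TO_KG for w in pack_lb])

# Caution: a perfect but curved relationship (illustrative y = x**2) has r near 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

# Caution: one hypothetical outlier, (6, -10), turns a perfect line negative.
line_x = [1, 2, 3, 4, 5]
line_y = [1, 2, 3, 4, 5]
r_line = correlation(line_x, line_y)
r_outlier = correlation(line_x + [6], line_y + [-10])

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, in kg = {r_kg:.3f}")
print(f"curved: {r_curved:.3f}, line: {r_line:.3f}, line + outlier: {r_outlier:.3f}")
```

The curved data follow y = x² exactly, yet r is essentially 0, which is why a scatterplot should always be examined before r is interpreted.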

Practice

  • Estimate the correlation for each graph and interpret it in context.
  • Graphs to analyze (figure panels):
    • Manatees killed by boats and boats registered in Florida (in 1000s).
    • Storms observed and storms predicted.
    • Healing rates of limbs 1 and 2.
    • Percent returns in the current year and the previous year.

Summary of Section 3.1

  • Key takeaways include:
    • A scatterplot displays relationships between two quantitative variables.
    • An explanatory variable may help explain or predict changes in a response variable.
    • When interpreting a scatterplot, describe the overall pattern by its direction, form, and strength, and look for outliers.
    • Correlation (r) quantifies the strength and direction of linear relationships between two quantitative variables.

Looking Ahead

  • Upcoming topics in the following section will include:
    • Least-squares regression line.
    • Prediction.
    • Residuals and residual plots.
    • The role of $r^2$ in regression.
    • Correlation and Regression Wisdom.