Chapter 14: Scatterplots

Introduction to Scatterplots

  • Definition: A scatterplot displays the relationship (association) between two quantitative variables measured on the same individuals.

  • Creating a Scatterplot: Measure two variables, X and Y, on each individual, then plot each individual as a point (x, y), with X on the horizontal axis and Y on the vertical axis.

  • Key Aspects to Analyze:

    • Direction of the association

    • Shape or form of the association

    • Strength of the association
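The pairing step can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the slides: the helper names pair_points and text_scatter and the sample values are hypothetical, and the text grid is only a rough stand-in for a real plot.

```python
def pair_points(xs, ys):
    """Pair the X and Y measurements taken on the same individuals."""
    if len(xs) != len(ys):
        raise ValueError("each individual needs both an x and a y value")
    return list(zip(xs, ys))


def text_scatter(points, width=20, height=8):
    """Draw the points on a coarse character grid (Y increases upward)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 so identical values do not divide by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical measurements on three individuals.
points = pair_points([1, 2, 3], [2, 4, 6])
print(text_scatter(points, width=3, height=3))
```

With these made-up values the stars climb from lower left to upper right, which is what a positive association looks like.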

Direction of Association

  • Positive Direction: Occurs when, as X increases, Y also tends to increase. Points slope upward in a scatterplot.

  • Negative Direction: Occurs when, as X increases, Y tends to decrease. Points slope downward in a scatterplot.

  • No Direction: Occurs when Y shows no tendency to rise or fall as X increases. The points form a roughly horizontal band, indicating no relationship.
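One rough way to check direction numerically is to look at the sign of the summed products of deviations from the means. The sketch below assumes that approach; the function name direction, the tolerance tol, and the sample values are hypothetical choices for illustration. Real data with no direction would rarely give a sum of exactly zero, so in practice the scatterplot itself remains the main guide.

```python
def direction(xs, ys, tol=1e-9):
    """Classify the direction of association from the sign of the
    summed products of deviations (the numerator of the covariance)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > tol:
        return "positive"   # Y tends to increase as X increases
    if total < -tol:
        return "negative"   # Y tends to decrease as X increases
    return "none"           # no overall upward or downward tendency


# Hypothetical data sets, one for each case.
print(direction([1, 2, 3, 4], [2, 4, 5, 9]))
print(direction([1, 2, 3, 4], [9, 5, 4, 2]))
print(direction([1, 2, 3, 4], [5, 5, 5, 5]))
```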

Changing Direction

  • Example: A plot of a crash test dummy's acceleration over time changes direction: none, then negative, then positive, then none again. No single direction describes such a complex relationship.

Shape or Form of Association

  • Forms: Relationships may be linear or nonlinear. To identify the form, consider the shape of the line or curve that best follows the points.

    • Linear Form: Points fall close to a straight line.

    • Nonlinear Form: Points follow a curve rather than a straight line, such as a quadratic or exponential curve.

Strength of the Relationship

  • Strong Relationship: Points are close to the line; predictions of Y from X are reliable.

  • Moderate Relationship: Points are more spread out around the line but still show a visible pattern.

  • Weak Relationship: Points are widely dispersed, indicating poor correlation and unreliable predictions.

  • Misleading Strength: A relationship can look stronger than it is when the axes are manipulated to create excessive empty space around the points, which misleads the reader about the strength of the correlation.
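As a rough check for excessive empty space, one can compare the area the data actually span with the area the axes cover. The sketch below is only an illustration under that assumption: fill_fraction is a hypothetical helper, and the data and axis limits are made up.

```python
def fill_fraction(xs, ys, x_lim, y_lim):
    """Fraction of the plotting area covered by the data's bounding box.

    x_lim and y_lim are (low, high) axis limits. A value near 1 means the
    axes hug the data; a value near 0 means mostly empty space, which can
    squeeze the points together and make them look more tightly related.
    """
    data_width = max(xs) - min(xs)
    data_height = max(ys) - min(ys)
    axis_width = x_lim[1] - x_lim[0]
    axis_height = y_lim[1] - y_lim[0]
    return (data_width * data_height) / (axis_width * axis_height)


# Hypothetical data plotted with tight and with very wide axes.
xs = [10, 12, 15, 20]
ys = [30, 33, 35, 40]
tight = fill_fraction(xs, ys, (10, 20), (30, 40))
wide = fill_fraction(xs, ys, (0, 100), (0, 100))
print(tight, wide)
```

The data are identical in both cases; only the share of the plot they occupy changes, which is why the visual impression of strength can differ.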

Making a Scatterplot: Practical Example

  • Archaeopteryx Fossils: Example data on femur and humerus lengths illustrates a strong, positive linear relationship.

Correlation Coefficients

  • Definition: The correlation coefficient (r) quantifies the strength and direction of a linear relationship between X and Y.

  • Characteristics:

    • The value of r is always between -1 and 1, inclusive.

    • Values near 1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; values near 0 indicate little or no linear relationship.

Calculating the Correlation Coefficient

  1. Compute mean and standard deviation for both X and Y.

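  2. Convert each x and y value to a z-score by subtracting the variable's mean and dividing by its standard deviation.

  3. Multiply the paired z-scores, add the products, and divide by n - 1: r = Σ(z_x · z_y) / (n - 1), where the z-scores use the sample standard deviation.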
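The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming the sample standard deviation (divisor n - 1) and dividing the sum of z-score products by n - 1; the function name and the data values are hypothetical.

```python
from statistics import mean, stdev


def correlation_from_z_scores(xs, ys):
    """Return r as the sum of paired z-score products divided by n - 1."""
    n = len(xs)
    # Step 1: mean and sample standard deviation of each variable.
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Step 2: standardize every value.
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    # Step 3: combine the paired z-scores.
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


# Hypothetical data for five individuals.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_from_z_scores(xs, ys)
print(round(r, 4))  # approximately 0.7746
```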

  • Computational Formula for r:

    • r = (Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n) / √[(Σxᵢ² − (Σxᵢ)²/n)(Σyᵢ² − (Σyᵢ)²/n)]

  • For accuracy, build a data table with columns for x, y, xy, x², and y², then total each column to obtain the necessary sums.
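The table-and-sums approach can be sketched in Python. This is an illustration under stated assumptions, not the slides' worked example: the function name correlation_from_sums and the data are hypothetical, and each row of the table holds x, y, xy, x², and y² for one individual.

```python
from math import sqrt


def correlation_from_sums(xs, ys):
    """Build the data table, total its columns, and apply the
    computational formula for r. Returns (table_rows, r)."""
    n = len(xs)
    # One row per individual: x, y, x*y, x squared, y squared.
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    # Column totals: sum of x, y, xy, x squared, y squared.
    sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return rows, numerator / denominator


# Hypothetical data for five individuals.
rows, r = correlation_from_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for row in rows:
    print(row)
print(round(r, 4))  # approximately 0.7746
```

For these values the column totals are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = (66 − 60) / √(10 · 6) ≈ 0.7746.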

Nonlinear Relationships and Correlation

  • Low Correlation: A low value of r indicates a weak linear relationship; it does not rule out a strong nonlinear relationship.

  • Outliers: A single influential outlier can substantially distort r. An example showed the correlation dropping because of one outlier's influence on the data set.
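Both points can be illustrated with small made-up data sets. The sketch below is not the example from the slides: the helper name correlation and all values are hypothetical. The first data set follows a perfect quadratic curve yet has r = 0; the second is perfectly linear until one outlier is added.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# A strong nonlinear relationship (y = x squared) with no linear trend.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x * x for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# A perfectly linear data set, then the same data plus one outlier.
line_x = [1, 2, 3, 4, 5]
line_y = [2, 4, 6, 8, 10]
r_clean = correlation(line_x, line_y)
r_outlier = correlation(line_x + [6], line_y + [0])

print(r_curve, r_clean, round(r_outlier, 3))
```

Here the single added point (6, 0) pulls r from 1 down to about 0.14, even though the other five points still lie exactly on a line.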

Causation vs. Correlation

  • Caution: Correlation does not imply causation. The presence of a correlation does not confirm that changes in one variable cause changes in the other. Further investigation is required to establish causation.

Examples for Practice

  • Student Performance Data: Analyze students' current quiz percentages against their grades on a test, calculating the correlation to practice the concepts from this chapter.