Definition: A scatterplot displays the relationship (association) between two quantitative variables measured on the same individuals.
Creating a Scatterplot: To create a scatterplot, measure two variables, X and Y, for each individual, resulting in points (x, y) plotted on a graph.
Key Aspects to Analyze:
Direction of the association
Shape or form of the association
Strength of the association
Positive Direction: Occurs when, as X increases, Y also tends to increase. Points slope upward in a scatterplot.
Negative Direction: Occurs when, as X increases, Y tends to decrease. Points slope downward in a scatterplot.
No Direction: Occurs when Y shows no tendency to increase or decrease as X increases. Points form a roughly horizontal band, indicating no association.
Example: A plot of a crash test dummy's acceleration over time changes direction (none, then negative, then positive, then none again), indicating a complex relationship.
Forms: Relationships may be linear or nonlinear. To identify the form, consider the shape of a line that can be drawn through the points.
Linear Form: Points align closely to a straight line.
Nonlinear Form: Points do not align to a straight line and can take shapes such as quadratic or exponential curves.
Strong Relationship: Points lie close to the line (or curve) that describes the form; predictions of Y from X are reliable.
Moderate Relationship: Points are more spread out around the line but still show a clear overall pattern.
Weak Relationship: Points are widely scattered around the line, so the pattern is hard to see and predictions of Y from X are unreliable.
Misleading Strength: Axis scaling can distort the apparent strength. Stretching the axes so that the points are squeezed into a small part of the plot, surrounded by excessive empty space, makes an association look stronger than it is.
Archaeopteryx Fossils: Example data on femur and humerus lengths illustrates a strong, positive linear relationship.
Definition: The correlation coefficient (r) quantifies the strength and direction of a linear relationship between X and Y.
Characteristics:
The value of r is between -1 and 1.
Values near 1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; values near 0 indicate little or no linear relationship.
Compute mean and standard deviation for both X and Y.
Calculate z-scores for each X and Y, standardizing the measures.
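Use these z-scores to compute r: multiply each individual's two z-scores, add the products, and divide by n - 1 (when the z-scores use sample standard deviations), i.e. r = Σ(z_x · z_y) / (n - 1).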
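The z-score steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name and the data are hypothetical, and it assumes sample standard deviations (divisor n - 1), so the sum of z-score products is also divided by n - 1.

```python
from statistics import mean, stdev

def correlation_from_z_scores(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    # Step 1: mean and (sample) standard deviation of each variable.
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Step 2: standardize every observation.
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    # Step 3: add the products of paired z-scores and divide by n - 1.
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data for illustration only: hours studied (x), quiz score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r_z = correlation_from_z_scores(hours, scores)
print(f"r = {r_z:.3f}")
```

Because both variables are standardized, changing the units of X or Y (for example, hours to minutes) leaves r unchanged.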
Computational Formula for r:
r = (Σxᵢyᵢ - (Σxᵢ)(Σyᵢ)/n) / √((Σxᵢ² - (Σxᵢ)²/n)(Σyᵢ² - (Σyᵢ)²/n))
For accuracy, make a data table with columns x, y, xy, x², and y², then total each column to get the sums the formula needs.
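A short Python sketch of the computational formula, building the data table of sums first. The function name and the data are hypothetical, chosen only so the arithmetic is easy to verify by hand.

```python
from math import sqrt

def correlation_from_sums(xs, ys):
    """Pearson r from the column totals of an x, y, xy, x^2, y^2 table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Print the data table one row per individual.
print(f"{'x':>4}{'y':>4}{'xy':>5}{'x^2':>5}{'y^2':>5}")
for x, y in zip(xs, ys):
    print(f"{x:>4}{y:>4}{x * y:>5}{x * x:>5}{y * y:>5}")

r_sums = correlation_from_sums(xs, ys)
print(f"r = {r_sums:.4f}")
```

For this data the column totals are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so the numerator is 66 - (15)(20)/5 = 6 and the denominator is √((55 - 45)(86 - 80)) = √60, giving r ≈ 0.775.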
Low Correlation: Low r values indicate weak linear relationships which do not preclude the existence of strong non-linear relationships.
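As an illustration of this point, the sketch below computes r for points that lie exactly on a parabola. The data and the helper name `pearson_r` are made up for the example; the x-values are symmetric about 0 so that the upward and downward halves cancel.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: y is determined exactly by x, but not linearly.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]

r_curve = pearson_r(curve_x, curve_y)
print(f"r = {r_curve:.3f}")
```

Here Y is perfectly predictable from X, yet r comes out as 0 because r measures only the linear part of a relationship. Plotting the data first is the way to catch this.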
Outliers: r is not resistant to outliers; a single extreme point can raise or lower it substantially. An example showed a drop in correlation caused by one outlier in the data set.
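A small sketch of the outlier effect, comparing r with and without one extreme point. The data are invented for illustration and are not the example referred to above; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: six points that follow a nearly straight upward line.
base_x = [1, 2, 3, 4, 5, 6]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# The same data plus one point far below the pattern.
with_outlier_x = base_x + [7]
with_outlier_y = base_y + [1.0]

r_without = pearson_r(base_x, base_y)
r_with = pearson_r(with_outlier_x, with_outlier_y)
print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

With these numbers, one added point out of seven pulls r from above 0.99 down to roughly 0.3, which is why a scatterplot should be checked for outliers before r is interpreted.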
Caution: Correlation does not imply causation. The presence of a correlation does not confirm that changes in one variable cause changes in another. Further investigation is required to establish causation.
Student Performance Data: Analyzing current quiz percentages vs. grades on a test to calculate correlation and illustrate the concepts learned.