Study Notes on Correlation, Scatter Plots, and Regression
Class Overview
Time Stamp: 04:10
Welcome Back: Introduction and reminder of class schedule
Upcoming Exam: Details for Exam 2 and review session
Exam Duration: 24 hours, Monday 04:10 PM to Tuesday 04:10 PM
Class Review: Extra credit available during class discussion
Study Guide: Available on Canvas, essential for preparation
Study Guide Details
Structure of Study Guide: Organized by chapter (10-15)
Content Requirements: Must be able to describe concepts, identify situations, etc.
Chapters Covered: 10, 11, 12, 13, 14, and 15 (note: chapters 14 and 15 not yet covered)
Chapter 14 and 15 Introduction
Connection Between Chapters: Discussing relationships
Chapter 14: Focus on correlation and scatter plots
Chapter 15: Focus on regression and line of best fit
Key Focus: Relationships between two quantitative variables
Terminology: "Relationship" and "correlation" used interchangeably in this context
Understanding Correlation
Correlation Defined: Refers to a relationship between two quantitative variables
Connection to Categorical Variables: Mention of chi-square tests for two categorical variables (not covered in this course)
Visualization: Use of scatter plots to visualize relationships
Implicit Assumptions in Data Reporting
Example Discussion: SAT/ACT scores rankings by state
Implicit Story: Higher average scores are taken to mean that a state has a better education system
Caution: The rankings show only a correlation, which does not imply causation
Potential Confounding Variables: The need to explore other variables influencing relationships
Correlation Examples
Short Women and Heart Attacks:
Observed Correlation: Shorter women have fewer heart attacks
Causal Inference Discussion: Height itself does not directly cause heart attacks
Third Variable: Genetics as a potential influencing factor
Car Weight and Fatal Accidents:
Observation: Heavier cars have fewer fatal accidents
Discussion: Consideration of variables like speed and driver experience
Confounding Variables: Driving habits associated with heavier vehicles
Viewing Relationships in Data
Primary Themes:
Causation is not implied by correlation.
Importance of considering lurking variables and confounding factors in relationships.
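A small simulation can make the lurking-variable idea concrete. The sketch below uses made-up data, not anything from the lecture: a hypothetical third variable z drives both x and y, and x has no effect on y at all. The helper name pearson_r and all variable names are illustrative assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the simulation is repeatable
n = 2000

# Hypothetical lurking variable z drives both x and y; x never affects y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Remove z's contribution (possible here only because we built the data).
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_after = pearson_r(x_without_z, y_without_z)

print(f"r(x, y) with the lurking variable present: {r_xy:.2f}")
print(f"r(x, y) after removing z's contribution:   {r_after:.2f}")
```

In this setup x and y are strongly correlated only because both depend on z. Once z's contribution is removed, the remaining correlation should be close to zero.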
Visualizing Relationships with Scatter Plots
Scatter Plot Analysis:
Analysis follows two structured steps:
Visualization of data
Summarization of numerical relationships
Key Characteristics to Observe:
Form (linear vs. curvilinear)
Direction (positive vs. negative)
Strength (strong vs. weak)
Outliers: Importance of identifying outliers in scatter plots
Outliers can significantly influence correlation coefficients
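As a rough illustration of how much a single outlier can move the correlation coefficient, the sketch below computes it for a small invented data set, then again after adding one extreme point. The data values and the helper name pearson_r are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data with a clear positive, roughly linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
r_clean = pearson_r(x, y)

# Add one point far to the right with a very low y value.
x_out = x + [20]
y_out = y + [1]
r_with_outlier = pearson_r(x_out, y_out)

print(f"r without the outlier: {r_clean:.3f}")
print(f"r with the outlier:    {r_with_outlier:.3f}")
```

With these numbers, one added point is enough to turn a strong positive correlation into one near zero. This is why the scatter plot should be inspected before the coefficient is trusted.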
Correlation Coefficient
Definition: Symbolized as r, it describes the direction and strength of a straight-line relationship between two quantitative variables.
Assumptions:
Assumes linearity; applicable primarily in linear relationships.
Influenced by outliers and range restrictions.
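The linearity assumption can be checked with a tiny invented example: y below is completely determined by x, but through a curve rather than a straight line. The data and the helper name pearson_r are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect curvilinear (U-shaped) relationship: y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_curve = pearson_r(x, y)
print(f"r for y = x^2 on a symmetric range: {r_curve:.3f}")
```

Here the coefficient comes out at zero even though the relationship is perfect. A value near zero therefore means no straight-line relationship, which is different from no relationship at all.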
Calculation:
Involves converting each variable's scores to z-scores, multiplying the paired z-scores, and averaging the products.
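The z-score calculation can be sketched in a few lines. This version assumes the common sample convention, r = sum(z_x * z_y) / (n - 1), with sample standard deviations. Some textbooks divide by n and use population standard deviations instead, which gives the same r. The hours and score values are invented for illustration, and the function name is an assumption.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """Correlation as the (n - 1)-averaged product of paired z-scores."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    products = [a * b for a, b in zip(z_x, z_y)]
    return sum(products) / (len(xs) - 1)

# Invented example: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation_from_z_scores(hours, score)
print(f"r = {r:.3f}")
```

Because z-scores are unitless, the result does not change if either variable is rescaled, for example converting hours to minutes.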
Interpreting Values:
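Scale from -1 to +1: the sign gives the direction, and the distance from 0 gives the strength (values near -1 or +1 are strong, values near 0 indicate little or no linear relationship).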
Important Statistical Concepts
Mean and Standard Deviation: Should be reported alongside correlation coefficients
Provide context for the distribution of data points
Use of Scatter Plots: Important for visualizing and determining relationships
Check for range restrictions in data
Differences between states in test participation rates can skew comparisons of average scores
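The effect of range restriction can be illustrated with simulated data. The sketch below is an assumption-laden example, not lecture data: it generates a noisy linear relationship over a wide range of x, then recomputes the correlation using only the observations in a narrow middle band. The helper name pearson_r, the cutoffs 40 and 60, and the noise level are all arbitrary choices.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the simulation is repeatable
n = 1000

# Simulated linear relationship with noise over the full range 0 to 100.
x = [random.uniform(0, 100) for _ in range(n)]
y = [xi + random.gauss(0, 15) for xi in x]
r_full = pearson_r(x, y)

# Keep only observations whose x falls in a narrow band (restricted range).
kept = [(xi, yi) for xi, yi in zip(x, y) if 40 <= xi <= 60]
x_kept = [xi for xi, _ in kept]
y_kept = [yi for _, yi in kept]
r_restricted = pearson_r(x_kept, y_kept)

print(f"r over the full range:       {r_full:.2f}")
print(f"r over the restricted range: {r_restricted:.2f}")
```

The underlying relationship is identical in both cases, yet the restricted sample shows a noticeably weaker correlation. A sample that covers only part of the range can understate how strongly two variables are related.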
Potential Misuse of Correlation Information
Critical Thinking Needed: Recognizing that correlation does not imply causation
Must consider potential confounding variables when interpreting relationships
Homework and Special Projects: Emphasis on statistical rigor and thorough analysis