Using Scatter Plots and Addressing Regression Assumptions

Restricted Ranges in Data Collection

  • Definition: A restricted range occurs when the sample covers only part of the range of scores found in the population, which can alter the observed relationship between variables.
  • Impact: Initial data might show a strong correlation that diminishes or reverses as the sample becomes more comprehensive. The opposite also happens: a narrow range can hide a correlation that exists across the full range.
  • Conclusions can be inaccurate if the initial, limited data are treated as representative of the entire population.
  • Rectification: Employ random sampling methods and recruit participants as widely as possible so that the sample covers the full range of scores.
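
The effect of a restricted range can be illustrated with simulated data. The following is a minimal sketch, not a prescribed procedure: the population values, the noise level, and the cut-offs of 55 and 60 are all assumptions chosen for illustration, and it shows only one direction of the problem (a correlation that looks weaker in the narrow band than in the full range).

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated population (assumed values): y depends linearly on x plus noise.
x = [rng.gauss(50, 10) for _ in range(5000)]
y = [0.8 * xi + rng.gauss(0, 8) for xi in x]

# Restricted sample: keep only cases whose x falls in a narrow band.
restricted = [(xi, yi) for xi, yi in zip(x, y) if 55 <= xi <= 60]
rx, ry = zip(*restricted)

r_full = pearson_r(x, y)
r_restricted = pearson_r(rx, ry)
print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

Widening the band moves the restricted estimate back toward the full-range value, which is the reason for recruiting as widely as possible.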

Heterogeneous Subsamples

  • Description: The population comprises distinct subgroups (e.g., males/females, clinical/non-clinical depression).
  • Impact: Analyzing a single subgroup might reveal a strong correlation, while another shows no correlation.
  • Combining subgroups can mask or distort the true relationships present within each.
  • Scatter Plot Usefulness: Scatter plots help visualize these subgroups and assess relationships within each separately.
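
A numerical counterpart to inspecting the scatter plot by subgroup is to compute the correlation within each subgroup and in the pooled data. The sketch below uses two hypothetical subgroups, "A" and "B", with made-up parameters; it is meant only to show how a pooled correlation can misrepresent both groups.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical subgroup A: y rises with x.
xa = [rng.gauss(10, 2) for _ in range(1000)]
ya = [xi + rng.gauss(0, 1) for xi in xa]

# Hypothetical subgroup B: y is unrelated to x.
xb = [rng.gauss(10, 2) for _ in range(1000)]
yb = [rng.gauss(10, 2.2) for _ in range(1000)]

r_a = pearson_r(xa, ya)
r_b = pearson_r(xb, yb)
r_all = pearson_r(xa + xb, ya + yb)  # subgroups pooled into one sample
print(f"A: r = {r_a:.2f}   B: r = {r_b:.2f}   pooled: r = {r_all:.2f}")
```

The pooled value sits between the two subgroup values and describes neither of them well.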

Assumptions for Simple Regression

  • Independence: Data points must be independent; avoid repeated measures from the same individuals.
  • This is a design issue: plan the study so that each data point represents a unique observation.
  • Linearity: Regression assumes a straight-line relationship between variables.
  • Curvilinear associations violate this assumption.
  • Normality: The residuals (errors of prediction) should be normally distributed. Strongly skewed variables often produce residuals that are not.
  • Homoscedasticity: Equal variance of errors across all levels of the independent variable (opposite of heteroscedasticity).
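
Most of these assumptions are checked through the residuals, that is, the differences between observed and predicted values. As a minimal sketch, the least-squares line and its residuals can be computed as follows; the five data points and the helper name fit_line are made up for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Plotting these residuals against the predicted values gives the residual plot used to assess linearity, normality, and homoscedasticity.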

Nonlinear Relationships

  • Example: The relationship between arousal and performance.
  • Performance rises as arousal increases from low to moderate levels, where it peaks.
  • Excessive arousal results in decreased performance.
  • Graphical Representation: Performance (y-axis) against arousal (x-axis) forms an inverted U-shape.
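
An inverted U can produce a correlation coefficient close to zero even though the two variables are strongly related. The sketch below simulates this with assumed values (arousal scored 0 to 10, peak performance at 5); in real data the location of the peak would not be known in advance.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Simulated inverted U (assumed values): performance peaks at arousal = 5.
arousal = [rng.uniform(0, 10) for _ in range(2000)]
performance = [25 - (a - 5) ** 2 + rng.gauss(0, 2) for a in arousal]

# A straight-line summary misses the relationship almost entirely.
r_linear = pearson_r(arousal, performance)

# Correlating performance with squared distance from the peak reveals it.
distance_sq = [(a - 5) ** 2 for a in arousal]
r_curve = pearson_r(distance_sq, performance)

print(f"linear r = {r_linear:.2f}, r with squared distance = {r_curve:.2f}")
```

A near-zero linear correlation therefore does not mean "no relationship"; the scatter plot has to be inspected first.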

Non-Normality: Skewness

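  • Positively Skewed Variables: Variables where most scores cluster at the low end, creating a long tail on the right.
  • Impact on Regression: When both X and Y are positively skewed, most scores cluster near the low end of both axes (towards the intercept) with a few extreme high values, which tends to produce residuals that violate the normality and equal-variance assumptions.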
  • Solution: Transform data to achieve a more normal distribution before applying regression.
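
The notes do not name a specific transformation. As one common option for positively skewed, strictly positive scores, the sketch below applies a natural-log transform and compares skewness before and after. The simulated scores and the helper name skewness are assumptions for illustration.

```python
import math
import random
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (positive = long right tail)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Simulated positively skewed scores (log-normal, so always positive).
scores = [rng.lognormvariate(0, 0.8) for _ in range(5000)]

# Log transform: only valid for values greater than zero.
log_scores = [math.log(s) for s in scores]

skew_raw = skewness(scores)
skew_log = skewness(log_scores)
print(f"skewness before = {skew_raw:.2f}, after log transform = {skew_log:.2f}")
```

Other transformations, such as a square root, may suit milder skew; whichever is used, the residuals should be re-checked after the regression is refitted.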

Heteroscedasticity

  • Definition: Unequal variance across the range of predictor variable values.
  • Visualization: In residual plots, heteroscedasticity appears as a fanning effect, where the spread of residuals around the zero line widens (or narrows) as the predicted values increase.
  • Ideal Scenario (Homoscedasticity): Equal variance, depicted as a consistent spread of residuals along the whole length of the zero line.
  • Assessment: Residual plots are used to assess this assumption.
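
Fanning can also be checked roughly in numbers by comparing the spread of the residuals at low and high values of the predictor. The following is an informal sketch rather than a formal test: the data are simulated so that the noise grows with x, and splitting the sample into halves is an arbitrary choice.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Simulated data (assumed values): the noise SD grows with x, so errors fan out.
x = [rng.uniform(1, 10) for _ in range(2000)]
y = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Order residuals by predictor value and compare spread in each half.
pairs = sorted(zip(x, residuals))
half = len(pairs) // 2
sd_low = statistics.pstdev([r for _, r in pairs[:half]])
sd_high = statistics.pstdev([r for _, r in pairs[half:]])
print(f"residual SD, low x = {sd_low:.2f}; high x = {sd_high:.2f}")
```

With a positive slope, ordering by x is the same as ordering by predicted value, so a clearly larger spread in the upper half corresponds to the fan shape seen in the residual plot.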

Interpreting Residual Plots

  • Skewness: If the residuals fall predominantly above or below the zero line, this indicates skewness.
  • Curvilinear Association: Residuals forming a bend, such as an inverted U-shape, suggest a curvilinear relationship, which violates the linearity assumption.
  • Heteroscedasticity (Fanning): An uneven spread of residuals that widens across the plot (a "shotgun effect") signifies heteroscedasticity.
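
The first two patterns can be summarised with simple numbers: the share of residuals below the zero line (imbalance suggests skewness) and the mean residual in the lower, middle, and upper thirds of the predictor (a bend shows up as the middle differing in sign from the ends). This is an informal sketch on simulated data; the helper name residual_checks and the split into thirds are assumptions, not a standard procedure.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_checks(xs, ys):
    """Share of residuals below zero, and mean residual in each third of x."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted((x, y - (intercept + slope * x)) for x, y in zip(xs, ys))
    resid = [r for _, r in pairs]  # residuals ordered by predictor value
    third = len(resid) // 3
    return {
        "share_below_zero": sum(r < 0 for r in resid) / len(resid),
        "mean_low": statistics.fmean(resid[:third]),
        "mean_mid": statistics.fmean(resid[third:2 * third]),
        "mean_high": statistics.fmean(resid[2 * third:]),
    }

rng = random.Random(5)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0, 10) for _ in range(1500)]

# Straight-line data: residuals should scatter evenly around zero.
y_straight = [2 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
# Inverted-U data: a straight line leaves a bend in the residuals.
y_curved = [25 - (xi - 5) ** 2 + rng.gauss(0, 2) for xi in x]

straight = residual_checks(x, y_straight)
curved = residual_checks(x, y_curved)
print("straight:", {k: round(v, 2) for k, v in straight.items()})
print("curved:  ", {k: round(v, 2) for k, v in curved.items()})
```

These numbers support, but do not replace, looking at the residual plot itself.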

Reporting Residual Analysis

  • Describe the distribution of residuals: random distribution, no bends, even distribution around the zero line, and no fanning.
  • Academic Integrity: Use unique wording to avoid plagiarism while conveying the same analytical findings.
  • The report should accurately reflect what the residual analysis actually shows.