Using Scatter Plots and Addressing Regression Assumptions
Restricted Ranges in Data Collection
- Definition: A restricted range occurs when the sample covers only part of the range of values a variable takes in the population, which can alter the observed relationship between variables.
- Impact: Initial data might show a strong correlation that diminishes or reverses as the sample becomes more comprehensive. The opposite also happens: a restricted sample can hide a relationship that exists across the full range.
- Generalizing from the initial, limited data to the entire population can therefore lead to inaccurate conclusions.
- Rectification: Employ random sampling methods and recruit participants as widely as possible so that the sample is diverse and spans the full range of values in the population.
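The effect of a restricted range can be sketched with simulated data. The example below is only an illustration under assumed values: the population (x from 0 to 100, noise with a standard deviation of 15) and the recruitment window (x between 40 and 60) are hypothetical, and `pearson_r` is a small helper written for this sketch.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: x spans 0-100, y follows x plus random noise.
x_full = [rng.uniform(0, 100) for _ in range(5000)]
y_full = [x + rng.gauss(0, 15) for x in x_full]

# Restricted sample: only cases with x between 40 and 60 were recruited.
restricted = [(x, y) for x, y in zip(x_full, y_full) if 40 <= x <= 60]
x_restricted = [x for x, _ in restricted]
y_restricted = [y for _, y in restricted]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)

print(f"full range:       r = {r_full:.2f} (n = {len(x_full)})")
print(f"restricted range: r = {r_restricted:.2f} (n = {len(x_restricted)})")
```

In this setup the same underlying relationship yields a clearly weaker correlation in the restricted sample, because the narrow window leaves little variation in x relative to the noise.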
Heterogeneous Subsamples
- Description: The sample combines distinct subgroups from the population (e.g., males/females, clinical/non-clinical depression).
- Impact: One subgroup might show a strong correlation while another shows none.
- Combining the subgroups can mask or distort the true relationship present within each.
- Scatter Plot Usefulness: Scatter plots help visualize these subgroups and assess relationships within each separately.
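A numerical companion to the scatter plot is to compute the correlation within each subgroup and again for the pooled data. The sketch below uses two made-up subgroups (A, where y rises with x, and B, where y is unrelated to x); the group definitions, sample sizes, and noise levels are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)  # fixed seed so the illustration is repeatable

# Hypothetical subgroup A: y rises with x.
x_a = [rng.uniform(0, 10) for _ in range(1000)]
y_a = [x + rng.gauss(0, 1) for x in x_a]

# Hypothetical subgroup B: y is unrelated to x.
x_b = [rng.uniform(0, 10) for _ in range(1000)]
y_b = [rng.gauss(5, 3) for _ in x_b]

r_a = pearson_r(x_a, y_a)
r_b = pearson_r(x_b, y_b)
r_pooled = pearson_r(x_a + x_b, y_a + y_b)

print(f"subgroup A: r = {r_a:.2f}")
print(f"subgroup B: r = {r_b:.2f}")
print(f"pooled:     r = {r_pooled:.2f}")
```

The pooled coefficient falls between the two subgroup values and describes neither group well, which is why the subgroups are better examined separately.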
Assumptions for Simple Regression
- Independence: Each data point must be a unique, independent observation; avoid repeated measures from the same individuals.
- This is a design issue: it is addressed when planning data collection, not during analysis.
- Linearity: Regression assumes a straight-line relationship between variables.
- Curvilinear associations violate this assumption.
- Normality: The residuals (errors) should be approximately normally distributed; strongly skewed variables often produce non-normal residuals.
- Homoscedasticity: Equal variance of errors across all levels of the independent variable (opposite of heteroscedasticity).
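Several of these assumptions are checked on the residuals, that is, the differences between observed and predicted values. As a minimal sketch, the code below fits a straight line by ordinary least squares and computes the residuals; the five data points and the helper name `fit_line` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: five independent observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus predicted

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Least-squares residuals always sum to zero when an intercept is fitted, so it is their pattern and spread, not their average, that carries the diagnostic information.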
Nonlinear Relationships
- Example: The relationship between arousal and performance.
- Performance rises with arousal up to a moderate level, where it peaks.
- Beyond that point, excessive arousal results in decreased performance.
- Graphical Representation: Performance (y-axis) against arousal (x-axis) forms an inverted U-shape.
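A short sketch shows why a straight-line measure misses this kind of relationship. The arousal scale (0 to 10) and the performance formula peaking at 5 are hypothetical values chosen to produce an inverted U; `pearson_r` is a helper written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inverted-U data: performance peaks at moderate arousal (5).
arousal = list(range(11))  # 0 (low) to 10 (high)
performance = [25 - (a - 5) ** 2 for a in arousal]

# Straight-line association between arousal and performance.
r_linear = pearson_r(arousal, performance)

# Association between performance and squared distance from the peak.
distance_sq = [(a - 5) ** 2 for a in arousal]
r_curved = pearson_r(distance_sq, performance)

print(f"linear r = {r_linear:.2f}")
print(f"r with squared distance from peak = {r_curved:.2f}")
```

Here performance is completely determined by arousal, yet the linear correlation is zero because the rising and falling halves cancel out. The relationship only becomes visible once the curve is taken into account.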
Non-Normality: Skewness
- Positively Skewed Variables: Most scores cluster at the low end of the scale, with a long tail extending to the right (towards high values).
- Impact on Regression: When both X and Y are positively skewed, most points crowd into the lower-left corner of the scatter plot, near the origin, and the residuals are often non-normal as a result.
- Solution: Transform the data (for example, with a log or square-root transformation) to achieve a more normal distribution before applying regression.
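The effect of a transformation can be checked by comparing skewness before and after. The sketch below uses a log transformation, one common choice for positive skew; the scores are constructed (not real data) and `skewness` is a helper defined here as the third central moment divided by the cubed standard deviation.

```python
import math


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed scores: most are small, a few are large.
scores = [math.exp(z / 2) for z in range(-4, 5)]
log_scores = [math.log(s) for s in scores]

skew_raw = skewness(scores)
skew_log = skewness(log_scores)

print(f"skewness before transform: {skew_raw:.2f}")
print(f"skewness after log transform: {skew_log:.2f}")
```

These scores were built so that the log transformation makes them exactly symmetric; with real data the transformation typically reduces skew without removing it entirely, so the result should be re-checked after transforming.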
Heteroscedasticity
- Definition: Unequal variance of the residuals across the range of predictor variable values.
- Visualization: In residual plots, heteroscedasticity appears as a fanning effect, where the spread of residuals around the zero line widens (or narrows) from one side of the plot to the other.
- Ideal Scenario (Homoscedasticity): Equal variance, depicted as a consistent band of residuals around the zero line.
- Assessment: Residual plots are used to assess this assumption.
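Alongside the visual check, a rough numerical comparison can be made by splitting the residuals at the median of the predictor and comparing their variances. This is a crude sketch, not a formal test: the simulated data (noise that grows with x), the split-half approach, and the helper names `fit_line` and `variance` are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def variance(values):
    """Population variance of a sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(3)  # fixed seed so the illustration is repeatable

# Hypothetical data in which the noise grows with the predictor.
x = sorted(rng.uniform(1, 10) for _ in range(2000))
y = [2 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# x is sorted, so the first half holds the residuals at the lower x values.
half = len(x) // 2
var_low = variance(residuals[:half])
var_high = variance(residuals[half:])
ratio = var_high / var_low

print(f"residual variance, lower half of x: {var_low:.2f}")
print(f"residual variance, upper half of x: {var_high:.2f}")
print(f"ratio (upper / lower): {ratio:.2f}")
```

A ratio near 1 is consistent with homoscedasticity, while a ratio well above or below 1 corresponds to the fanning pattern in the residual plot.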
Interpreting Residual Plots
- Skewness: If most residuals sit on one side of the zero line, with a few lying far from it on the other side, this indicates skewed residuals.
- Curvilinear Association: Data points forming a U or inverted U-shape suggest a curvilinear relationship, which violates the linearity assumption.
- Heteroscedasticity (Fanning): A spread of points that widens across the plot, a fan shape sometimes described as a "shotgun effect", signifies heteroscedasticity.
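The "bend" in a residual plot can also be seen in numbers by averaging the residuals over the low, middle, and high parts of the predictor range. The sketch below assumes made-up data with a linear trend plus an inverted-U component; the three-way split and the helper names are illustrative choices, not a standard procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean(values):
    """Arithmetic mean of a sequence."""
    return sum(values) / len(values)


# Hypothetical data: a linear trend plus an inverted-U component.
x = list(range(11))
y = [3 * xi - 0.5 * (xi - 5) ** 2 for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Average residual in the low, middle, and high parts of the x range.
low = mean(residuals[:3])
middle = mean(residuals[3:8])
high = mean(residuals[8:])

print(f"mean residual (low x):    {low:.2f}")
print(f"mean residual (middle x): {middle:.2f}")
print(f"mean residual (high x):   {high:.2f}")
```

Negative averages at both ends with a positive average in the middle correspond to the inverted U-shape in the plot; if the linearity assumption held, all three averages would be close to zero.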
Reporting Residual Analysis
- Describe the distribution of residuals. When the assumptions are met, this means a random scatter with no bends (linearity), an even distribution around the zero line (normality), and no fanning (homoscedasticity).
- Academic Integrity: Use unique wording to avoid plagiarism while conveying the same analytical findings.
- The report should accurately reflect the analysis, including any assumption violations that were observed.