Ch 3
The Scatter Diagram
A bivariate distribution shows an individual’s score on two variables at the same time.
A scatter diagram is a visual representation of the relationship between two variables.
Correlation
A correlation is designed to examine a linear relationship between two variables.
Definition: The correlation coefficient is a statistic that depicts the direction and magnitude of the relationship between two variables.
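Example (sketch): The notes do not give a formula for the coefficient itself. The snippet below assumes the standard raw-score formula for Pearson's r, r = (N ΣXY - ΣX ΣY) / sqrt((N ΣX² - (ΣX)²)(N ΣY² - (ΣY)²)). The function name pearson_r and the data (hours, scores) are made up for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson r from raw-score sums; assumes x and y are equal-length lists."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: five pairs of scores
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 3))  # 0.775
```

The sign of r gives the direction of the relationship and its absolute value gives the magnitude.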
Regression
Definition: Regression is a statistical method used to predict scores on one variable based on knowing the scores of another variable.
Predictions are obtained from a regression line, which is the best-fitting straight line through the points on a scatter diagram.
The regression line is calculated using the formula:
Y' = a + bX
where:
Y' is the predicted score.
a is the y-intercept.
b is the slope of the regression line.
X is the independent variable.
The formula for calculating the slope (b) of the regression line is given by:
b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}
where:
N is the number of pairs of scores.
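Example (sketch): The slope formula can be computed directly from the raw-score sums. The intercept is not given in these notes; the snippet assumes the standard least-squares result a = mean(Y) - b * mean(X). The function name regression_line and the data are hypothetical.

```python
def regression_line(x, y):
    """Return (a, b) for Y' = a + bX using the raw-score slope formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (sum XY - (sum X)(sum Y)/N) / (sum X^2 - (sum X)^2/N)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept (assumed standard formula): mean(Y) - b * mean(X)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Hypothetical data: five pairs of scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # predicted Y' for X = 6: 5.8
```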
The Best-Fitting Line
The best-fitting line through a series of data points involves both predicted and actual scores, which are rarely identical.
Definition: The difference between them is called a residual, expressed as (Y - Y').
Testing the Statistical Significance of a Correlation Coefficient
A test of statistical significance assesses whether a finding is likely to have occurred by chance alone.
The process begins with a null hypothesis: H_0: r = 0 (no relationship).
If the evidence against it is strong enough, the null hypothesis is rejected in favor of the alternative: H_1: r \neq 0.
Significance of the correlation coefficient is assessed using a t-distribution with N - 2 degrees of freedom:
t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}
where N is the sample size.
A critical value table is used to determine whether the absolute value of t exceeds the critical value for the chosen significance level; if it does, reject the null hypothesis.
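Example (sketch): The t formula applied to a hypothetical result of r = .7746 with N = 5 pairs. The critical value 3.182 is the two-tailed .05 value for df = 3 from a standard t table; the function name t_for_r and the numbers are illustrative assumptions.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H_0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.7746, 5            # hypothetical correlation and sample size
t = t_for_r(r, n)
critical_t = 3.182          # two-tailed, alpha = .05, df = n - 2 = 3
reject_null = abs(t) > critical_t

print(round(t, 2))          # 2.12
print(reject_null)          # False
```

Even a fairly large r can fail to reach significance when the sample is very small, because the critical value is high at low degrees of freedom.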
Interpretation of Regression Plots
Regression plots visually represent the relationship between variables.
Criterion Validity Evidence: Correlation is used to establish the relationship between a test score and a well-defined criterion.
Normative data (representative groups) may be employed for comparison.
Examples:
Association between a test of job aptitude and actual performance.
Association between an IQ test and academic performance.
Other Correlation Coefficients
Spearman’s rho: Measures association between two sets of ranks.
Biserial correlation: Measures the relationship between one continuous and one artificial dichotomous variable.
Point biserial correlation: Measures the relationship between one continuous and one true dichotomous variable.
Phi coefficient: Measures the relationship between two dichotomous variables, at least one of which is a true dichotomy.
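Example (sketch): For Spearman's rho, the snippet below assumes the common shortcut formula for ranks with no ties, rho = 1 - 6 Σd² / (N(N² - 1)), where d is the difference between paired ranks. This formula is not stated in the notes; the function name and the rank data are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho from two lists of ranks; valid only when there are no ties."""
    n = len(ranks_a)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical ranks assigned to five people by two judges
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 2))  # 0.8
```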
Terms and Issues in the Use of Correlation
Residual and Standard Error of Estimate
A regression calls for both estimated values and actual values.
Definition: The difference between them is called the residual (Y - Y'). The sum of the residuals always equals 0.
The standard deviation of the residuals is known as the standard error of estimate:
SE = \sqrt{\frac{\sum(Y - Y')^2}{N - 2}}
A lower standard error indicates a more accurate prediction.
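Example (sketch): Residuals and the standard error of estimate for a small hypothetical data set. The values a = 2.2 and b = 0.6 are the least-squares intercept and slope for these particular made-up scores; all names are illustrative.

```python
import math

# Hypothetical data and its least-squares line Y' = 2.2 + 0.6X
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * xi for xi in x]
residuals = [yi - yp for yi, yp in zip(y, predicted)]   # Y - Y'

# Residuals around the least-squares line sum to 0 (up to rounding error)
print(abs(sum(residuals)) < 1e-9)   # True

# Standard error of estimate: sqrt(sum((Y - Y')^2) / (N - 2))
n = len(x)
se = math.sqrt(sum(res ** 2 for res in residuals) / (n - 2))
print(round(se, 3))                 # 0.894
```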
Coefficients of Determination and Alienation
Coefficient of determination (r²): The square of the correlation coefficient, indicating the proportion of variation in one variable (Y) that is explained by the other variable (X).
Coefficient of alienation: A measure of nonassociation, represented as \sqrt{1 - r^2}, the square root of the proportion of variation that is not explained (1 - r²).
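Example (sketch): Both coefficients computed for a hypothetical correlation of r = .60. The variable names are illustrative.

```python
import math

r = 0.6                                  # hypothetical correlation coefficient

determination = r ** 2                   # proportion of variation explained
alienation = math.sqrt(1 - r ** 2)       # measure of nonassociation

print(round(determination, 2))           # 0.36
print(round(alienation, 2))              # 0.8
```

Here 36% of the variation is explained and 64% is not; the coefficient of alienation is the square root of that unexplained 64%.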
Shrinkage
This refers to the decrease in the predictive accuracy of a regression equation when it is applied to a population other than the one from which it was derived.
The amount of shrinkage can be estimated from the variance, covariance, and sample size associated with the original regression equation.
Cross Validation
Definition: Cross-validation involves applying a regression equation to a group other than the one for which it was created, allowing for an estimate of the standard error of estimate for the relationship between predicted and actual values.
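Example (sketch): A regression equation is fitted on one hypothetical group and then applied, unchanged, to a second hypothetical group. The snippet reuses the N - 2 form of the standard error of estimate from these notes for both groups, which is an assumption made for simplicity; the function names and data are made up.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * (sum_x / n)
    return a, b

def standard_error(x, y, a, b):
    """sqrt(sum((Y - Y')^2) / (N - 2)) for a given line applied to (x, y)."""
    squared = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / (len(x) - 2))

# Derivation group: the equation is created here
x_derive, y_derive = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
a, b = fit_line(x_derive, y_derive)

# Cross-validation group: the same equation is applied without refitting
x_valid, y_valid = [2, 3, 4, 6], [3, 5, 4, 7]

se_derive = standard_error(x_derive, y_derive, a, b)
se_valid = standard_error(x_valid, y_valid, a, b)
print(round(se_derive, 3))  # 0.894
print(round(se_valid, 3))   # 1.217
```

In this made-up case the error is larger in the new group than in the group used to build the equation, which is the pattern described under shrinkage.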
The Correlation-Causation Problem
It is crucial to remember that a relationship between two variables does not imply causation.
Further research is necessary to establish any cause-and-effect relationship.
Third Variable Explanation
Two variables may be correlated because both are influenced by a third variable, rather than because one causes the other.
Restricted Range
A restricted range occurs when a sample does not cover the full range of a variable, which can distort (typically lower) the observed correlation between variables.
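Example (sketch): The same hypothetical scores are correlated twice, once over the full range of X and once using only cases with X of 7 or higher. The data are made up to show the effect, and the helper pearson_r uses the standard deviation-score formula, which is not given in these notes.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from deviations around the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_dxdy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
    sum_dy2 = sum((yi - mean_y) ** 2 for yi in y)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Hypothetical scores across the full range of X
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# Keep only the high scorers on X (restricted range)
kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= 7]
x_top = [xi for xi, _ in kept]
y_top = [yi for _, yi in kept]
r_restricted = pearson_r(x_top, y_top)

print(round(r_full, 3))        # 0.939
print(round(r_restricted, 3))  # 0.6
```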
Knowledge Check (Example Questions)
Which correlation coefficient expresses the relationship between a continuous variable and an artificial dichotomous variable?
Options:
Pearson r
biserial r
Spearman’s rho
phi coefficient
In Professor Lopez's study, a significant correlation was found between the number of hours children played violent video games and their level of aggression. A colleague points out, however, that this does not imply causation.
This example demonstrates the correlation-causation problem.
Note: All content referenced is adapted from Robert M. Kaplan and Dennis P. Saccuzzo, "Psychological Testing: Principles, Applications, and Issues," 9th Edition, 2023, Cengage.