Ch 3

The Scatter Diagram

  • A bivariate distribution shows each individual’s scores on two variables at the same time.

  • A scatter diagram is a visual representation of the relationship between two variables.

Correlation

  • A correlation is designed to examine a linear relationship between two variables.

  • Definition: The correlation coefficient is a statistic that depicts the direction and magnitude of the relationship between two variables.
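  The notes define r but do not give a computing formula. The sketch below codes the standard raw-score formula for Pearson's r. The paired scores are hypothetical and the function name `pearson_r` is illustrative. The sign of the result gives the direction of the relationship, and its absolute size gives the magnitude.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from raw-score sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 scores")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired scores (e.g., X = hours studied, Y = quiz score).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Reversing the sign of every Y score flips the sign of r but leaves its magnitude unchanged.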

Regression

  • Definition: Regression is a statistical method used to predict scores on one variable from known scores on another variable.

  • Predictions are obtained from a regression line, which is the best-fitting straight line through the points on a scatter diagram.

  • The regression line is calculated using the formula:

    Y' = a + bX

    where:

    • Y' is the predicted score.

    • a is the y-intercept.

    • b is the slope of the regression line.

    • X is the score on the predictor (independent) variable.

  • The slope (b) of the regression line is calculated as:
    b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}
    where:

    • N is the number of pairs of scores.
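  A minimal Python sketch of the slope formula, using hypothetical scores. The intercept formula a = (ΣY - bΣX)/N (the mean of Y minus b times the mean of X) does not appear in these notes. It is the standard least-squares intercept, included here as an assumption so the line can actually make predictions. The helper names `regression_line` and `predict` are hypothetical.

```python
def regression_line(x, y):
    """Return (a, b) for Y' = a + bX using the raw-score slope formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = [ΣXY - (ΣX)(ΣY)/N] / [ΣX² - (ΣX)²/N]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept (assumed standard least-squares form): a = (ΣY - bΣX) / N
    a = (sum_y - b * sum_x) / n
    return a, b

def predict(a, b, x_value):
    """Predicted score Y' for a given X."""
    return a + b * x_value

# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(f"Y' = {a:.2f} + {b:.2f}X")  # Y' = 2.20 + 0.60X
print(round(predict(a, b, 6), 2))  # 5.8
```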

The Best-Fitting Line

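  • For each data point, the regression line yields a predicted score (Y') that rarely matches the actual score (Y) exactly. The best-fitting line is the one that minimizes the sum of the squared differences between the two (the principle of least squares).

  • Definition: The difference between an actual and a predicted score is called a residual, expressed as (Y − Y').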
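  To make the least-squares idea concrete, the sketch below computes residuals for hypothetical scores. It uses the line Y' = 2.2 + 0.6X, which is the least-squares line for these particular scores, and compares it with an arbitrary alternative line. The function name `residuals` is illustrative.

```python
def residuals(x, y, a, b):
    """Residual (Y - Y') for each pair, given the line Y' = a + bX."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical paired scores and their least-squares line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y, 2.2, 0.6)
print([round(e, 2) for e in res])          # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sum(e * e for e in res), 2))   # 2.4

# Any other line (here an arbitrary a = 2.0, b = 0.7) gives a larger sum of squared residuals.
other = residuals(x, y, 2.0, 0.7)
print(round(sum(e * e for e in other), 2))  # 2.55
```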

Testing the Statistical Significance of a Correlation Coefficient

  • A test of statistical significance assesses how likely it is that an observed correlation arose by chance alone.

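  • The process begins with a null hypothesis of no relationship in the population: H_0: \rho = 0, where \rho is the population correlation.

  • The alternative hypothesis is H_1: \rho \neq 0. If the observed r is too large to be plausibly due to chance, the null hypothesis is rejected in favor of H_1.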

  • The significance of r is assessed with the t distribution, using N - 2 degrees of freedom:
    t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}
    where N is the number of pairs of scores (the sample size).

  • Compare the absolute value of t with the critical value from a t table for N - 2 degrees of freedom and the chosen significance level (e.g., .05). If |t| exceeds the critical value, reject the null hypothesis.
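  A sketch of the t test for r. The r value and sample sizes are hypothetical. To keep the example dependency-free, the critical value is hard-coded rather than computed. The value 3.182 is the two-tailed .05 critical value for 3 degrees of freedom from a standard t table. The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

r, n = 0.7746, 5   # hypothetical: r computed from 5 pairs of scores
t, df = correlation_t(r, n)
t_critical = 3.182  # two-tailed, alpha = .05, df = 3 (from a t table)
print(f"t({df}) = {t:.2f}")  # t(3) = 2.12
print("reject H0" if abs(t) > t_critical else "fail to reject H0")  # fail to reject H0
```

The same r computed from 20 pairs gives t(18) of about 5.20, which exceeds the tabled value of 2.101 for 18 degrees of freedom. Sample size matters as much as the size of r.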

Interpretation of Regression Plots

  • Regression plots visually represent the relationship between variables.

  • Criterion Validity Evidence: Correlation is used to establish the relationship between a test score and a well-defined criterion.

  • Normative data (representative groups) may be employed for comparison.

  • Examples:

    • Association between a test of job aptitude and actual performance.

    • Association between an IQ test and academic performance.

Other Correlation Coefficients

  • Spearman’s rho: Measures association between two sets of ranks.

  • Biserial correlation: Measures the relationship between one continuous and one artificial dichotomous variable.

  • Point biserial correlation: Measures the relationship between one continuous and one real dichotomous variable.

  • Phi coefficient: Measures the relationship between two dichotomous variables, at least one of which is a true dichotomous variable.
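  Three of these coefficients can be sketched in a few lines. All data are hypothetical. The Spearman formula used here assumes there are no tied ranks. The point biserial r is computed as an ordinary Pearson r with the true dichotomy coded 0/1, which is equivalent. Phi is computed from the four cell counts of a 2×2 table. The biserial r is omitted because it relies on an assumed underlying normal distribution rather than a simple Pearson computation.

```python
import math

def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho for two sets of ranks (assumes no tied ranks)."""
    n = len(ranks_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Pearson r from deviation scores; gives point biserial r when one variable is coded 0/1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of counts [[a, b], [c, d]] (rows: X = 1, 0; columns: Y = 1, 0)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical data for each coefficient.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])        # two judges' rankings
r_pb = pearson_r([10, 12, 14, 16, 18, 20], [0, 0, 0, 1, 1, 1])  # score vs. group
phi = phi_coefficient(20, 10, 5, 15)
print(round(rho, 2), round(r_pb, 3), round(phi, 3))  # 0.8 0.878 0.408
```

Phi is the Pearson r of the two 0/1-coded variables, so expanding the table into individual cases and calling `pearson_r` gives the same value.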

Terms and Issues in the Use of Correlation

Residual and Standard Error of Estimate

  • A regression equation produces predicted (estimated) values that can be compared with the actual values.

  • Definition: The difference between an actual and a predicted value is called the residual (Y − Y'). The residuals always sum to 0.

  • The standard deviation of the residuals is known as the standard error of estimate:
    SE = \sqrt{\frac{\sum(Y - Y')^2}{N - 2}}

  • A lower standard error indicates a more accurate prediction.
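  A sketch of the standard error of estimate formula, applied to hypothetical actual scores and the predictions from the line Y' = 2.2 + 0.6X. The function name is illustrative.

```python
import math

def standard_error_of_estimate(y, y_pred):
    """Standard deviation of the residuals, with N - 2 in the denominator."""
    n = len(y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    return math.sqrt(ss_residual / (n - 2))

y = [2, 4, 5, 4, 5]                 # hypothetical actual scores
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]  # Y' = 2.2 + 0.6X for X = 1..5
print(round(standard_error_of_estimate(y, y_pred), 3))  # 0.894
```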

Coefficients of Determination and Alienation

  • Coefficient of determination (r²): The square of the correlation coefficient, indicating the proportion of the total variation in one variable (Y) that is explained by variation in the other variable (X).

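  • Coefficient of alienation: A measure of nonassociation between two variables, calculated as \sqrt{1 - r^2}. Here 1 - r^2 is the proportion of variation in Y not explained by X, so the coefficient of alienation is the square root of the complement of the coefficient of determination.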
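  A short worked example with a hypothetical r of about .77. Roughly 60% of the variation in Y is accounted for by X. The coefficient of alienation is the square root of the remaining 40%, not the 40% itself.

```python
import math

r = 0.7746                             # hypothetical correlation between X and Y
r_squared = r ** 2                     # coefficient of determination
alienation = math.sqrt(1 - r_squared)  # coefficient of alienation
print(f"r^2 = {r_squared:.2f}")              # r^2 = 0.60
print(f"unexplained = {1 - r_squared:.2f}")  # unexplained = 0.40
print(f"alienation = {alienation:.2f}")      # alienation = 0.63
```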

Shrinkage

  • Shrinkage is the decrease in the predictive accuracy of a regression equation when it is applied to a population other than the one from which it was derived.

  • The amount of shrinkage can be estimated from the variance, covariance, and sample size of the data used to derive the original regression equation.

Cross Validation

  • Definition: Cross-validation involves applying a regression equation to a group other than the one for which it was created, then estimating the standard error of estimate in the new group to see how closely the predicted values match the actual values.
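  A minimal cross-validation sketch. Both groups are hypothetical. The line is fit on the derivation group only and then scored, unchanged, on the new group. The sketch keeps N - 2 in the denominator for both groups to match the standard error formula in these notes. That choice is an assumption, since other sources may use a different denominator in the validation group. The helper names are illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def standard_error(x, y, a, b):
    """Standard error of estimate of Y' = a + bX on the given (x, y) pairs (N - 2 denominator)."""
    ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss / (len(x) - 2))

# Hypothetical derivation group and a separate validation group.
x_derive, y_derive = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
x_new, y_new = [1, 2, 3, 4, 5, 6], [3, 3, 4, 6, 4, 6]

a, b = fit_line(x_derive, y_derive)  # fit on the derivation group only
se_derive = standard_error(x_derive, y_derive, a, b)
se_new = standard_error(x_new, y_new, a, b)
print(round(se_derive, 3), round(se_new, 3))  # 0.894 0.954
```

The larger error in the new group is the kind of loss of accuracy that the Shrinkage section describes.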

The Correlation-Causation Problem

  • It is crucial to remember that a relationship between two variables does not imply causation.

  • Further research is necessary to establish any cause-and-effect relationship.

Third Variable Explanation

  • Two variables may be correlated not because one causes the other, but because both are influenced by a third variable.
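  A small simulation can show how a third variable produces a correlation without any causal link between X and Y. In this hypothetical model, Z drives both X and Y, and X has no effect on Y. The simulation then applies the first-order partial correlation, a technique not covered in these notes, to show that the X–Y correlation largely disappears once Z is held constant. The seed and sample size are arbitrary choices.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r computed from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
# Hypothetical model: Z influences both X and Y; X has no direct effect on Y.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
# First-order partial correlation: X-Y correlation with Z held constant.
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
# Under this model r_xy should land near 0.5 and r_xy_given_z near 0.
print(round(r_xy, 2), round(r_xy_given_z, 2))
```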

Restricted Range

  • Restricted range occurs when a sample covers only part of the full range of scores on a variable. Because correlation depends on variability, a restricted range usually lowers the observed correlation and can make a real relationship look weak.
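  A simulation sketch of restricted range. The data are hypothetical bivariate scores generated with a population correlation of about .6. The sketch then computes r again using only cases above the mean on X, as might happen if only admitted applicants were studied. The seed, sample size, and cutoff are arbitrary assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r computed from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
rho = 0.6  # hypothetical population correlation
x = [random.gauss(0, 1) for _ in range(5000)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

full_r = pearson_r(x, y)
# Keep only cases scoring above the mean on X (a restricted range).
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
restricted_r = pearson_r([p[0] for p in kept], [p[1] for p in kept])
# full_r should be near 0.6; restricted_r should be noticeably smaller.
print(round(full_r, 2), round(restricted_r, 2))
```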

Knowledge Check (Example Questions)

  1. Which correlation coefficient expresses the relationship between a continuous variable and an artificial dichotomous variable?

    • Options:

      • Pearson r

      • biserial r

      • Spearman’s rho

      • phi coefficient

  2. Professor Lopez's study: a significant correlation was found between the number of hours children spent playing violent video games and their level of aggression. However, a colleague points out that this does not imply causation.

    • This example demonstrates the correlation-causation problem.


Note: All content is adapted from Robert M. Kaplan and Dennis P. Saccuzzo, "Psychological Testing: Principles, Applications, and Issues," 9th edition (Cengage, 2023).