Ch 3

The Scatter Diagram
  • A bivariate distribution shows each individual’s scores on two variables at the same time.

  • A scatter diagram is a visual representation of the relationship between two variables.

Correlation
  • A correlation is designed to examine a linear relationship between two variables.

  • Definition: The correlation coefficient is a statistic that depicts the direction and magnitude of the relationship between two variables.
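The notes do not give a computing formula for r. Below is a minimal sketch that assumes the standard Pearson product-moment (raw-score) formula and a small hypothetical data set. The function name `pearson_r` is chosen only for illustration. The sign of the result gives the direction of the relationship, and its absolute size gives the magnitude.

```python
import math


def pearson_r(x, y):
    """Pearson r from raw scores; x and y are equal-length lists of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Sum of cross-products of deviations (numerator)
    sp = sum_xy - sum_x * sum_y / n
    # Sums of squared deviations for X and for Y
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical paired scores for five people (illustrative only)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(round(r, 3))  # 0.775
```

Perfectly reversed scores, such as `pearson_r([1, 2, 3], [3, 2, 1])`, give -1. The magnitude is the same as for a perfect positive relationship; only the direction differs.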

Regression
  • Definition: Regression is a statistical method used to predict scores on one variable from scores on another variable.

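  • Predictions are obtained from a regression line, the best-fitting straight line through the points on a scatter diagram (the line that minimizes the sum of the squared differences between actual and predicted scores).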

  • The regression line is calculated using the formula:
    Y' = a + bX
    where:

    • Y' is the predicted score.

    • a is the y-intercept.

    • b is the slope of the regression line.

    • X is the independent variable.

  • The formula for calculating the slope (b) of the regression line is given by:
    b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}
    where:

    • N is the number of pairs of scores.
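The sketch below applies the slope formula above to hypothetical scores. The intercept is computed as a = \bar{Y} - b\bar{X}. This is the standard least-squares intercept, but the notes do not state it. `regression_line` is an illustrative name, not a library function.

```python
def regression_line(x, y):
    """Return (a, b) for Y' = a + bX using the raw-score slope formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (sum XY - (sum X)(sum Y)/N) / (sum X^2 - (sum X)^2/N)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: the least-squares line passes through (mean X, mean Y)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical paired scores (illustrative only)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = regression_line(X, Y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # predicted Y' for X = 6: 5.8
```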

The Best-Fitting Line
  • For each data point, the best-fitting line gives a predicted score (Y') that is rarely identical to the actual score (Y).

  • Definition: The difference between them is called a residual, expressed as (Y - Y').
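The following short sketch makes the residual concrete. It uses hypothetical scores and the line Y' = 2.2 + 0.6X, which is the least-squares line for these particular five pairs. You can confirm this with the slope formula.

```python
# Hypothetical actual scores (illustrative only)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares line for these data: Y' = 2.2 + 0.6X

predicted = [a + b * x for x in X]                   # Y' for each person
residuals = [y - yp for y, yp in zip(Y, predicted)]  # Y - Y'
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```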

Testing the Statistical Significance of a Correlation Coefficient
  • Significance testing assesses whether an observed correlation is likely to be due to chance (sampling error) rather than a true relationship.

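  • The process begins with a null hypothesis: H_0: \rho = 0 (no relationship in the population, where \rho is the population correlation).

  • The alternative hypothesis is H_1: \rho \neq 0. If the observed r is too large to be plausibly due to chance, the null hypothesis is rejected.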

  • Significance of the correlation coefficient is assessed using the t distribution with N - 2 degrees of freedom:
    t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}
    where N is the number of pairs of scores.

  • A table of critical values is used to check whether the absolute value of t exceeds the critical value for the chosen significance level (e.g., .05) and degrees of freedom; if it does, reject the null hypothesis.
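The sketch below shows the t test for a correlation. The r value and sample size are hypothetical. The critical value 3.182 (two-tailed, .05 level, df = 3) comes from a standard t table rather than being computed. `t_for_r` is an illustrative helper name.

```python
import math


def t_for_r(r, n):
    """Return (t, df) for testing H0: rho = 0 from a correlation r based on n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


r, n = 0.7746, 5          # hypothetical correlation from N = 5 pairs
t, df = t_for_r(r, n)
critical = 3.182          # two-tailed .05 critical value for df = 3 (from a t table)
print(round(t, 3), df)    # 2.121 3
print("reject H0" if abs(t) > critical else "retain H0")  # retain H0
```

With only five pairs, even an r of about .77 does not reach significance at the .05 level.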

Interpretation of Regression Plots
  • Regression plots visually represent the relationship between variables.

  • Criterion Validity Evidence: Correlation is used to establish the relationship between a test score and a well-defined criterion.

  • Normative data from representative groups may be used for comparison.

  • Examples:

    • Association between a test of job aptitude and actual performance.

    • Association between an IQ test and academic performance.

Other Correlation Coefficients
  • Spearman’s rho: Measures association between two sets of ranks.

  • Biserial correlation: Measures the relationship between one continuous and one artificial dichotomous variable.

  • Point biserial correlation: Measures the relationship between one continuous and one real dichotomous variable.

  • Phi coefficient: Measures the relationship between two dichotomous variables, at least one of which is a true dichotomy.
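Several of these coefficients can be computed as special cases, sketched below with hypothetical data. Spearman's rho uses the usual shortcut formula for untied ranks. The point biserial and phi coefficients are computed as Pearson r applied to variables coded 0/1. The biserial r is omitted because it also assumes a normal distribution underlying the artificial dichotomy and requires a normal-curve ordinate.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def spearman_rho(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (N(N^2 - 1))."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data, for illustration only
judge_a = [1, 2, 3, 4, 5]            # ranks from one judge
judge_b = [2, 1, 4, 3, 5]            # ranks from another judge
rho = spearman_rho(judge_a, judge_b)

scores = [10, 12, 15, 18, 20]        # continuous variable
group = [0, 0, 1, 1, 1]              # true dichotomy coded 0/1
r_pb = pearson_r(scores, group)      # point biserial r

item1 = [1, 1, 0, 0, 1, 0]           # two dichotomies coded 0/1
item2 = [1, 0, 0, 0, 1, 1]           # (assume at least one is a true dichotomy)
phi = pearson_r(item1, item2)        # phi coefficient

print(round(rho, 2), round(r_pb, 3), round(phi, 3))  # 0.8 0.886 0.333
```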

Terms and Issues in the Use of Correlation
Residual and Standard Error of Estimate
  • A regression yields an estimated (predicted) value for each case, which can be compared with the actual value.

  • Definition: The difference between them is called the residual (Y - Y'). For a least-squares regression line, the sum of the residuals always equals 0.

  • The standard deviation of the residuals is known as the standard error of estimate:
    SE = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}

  • A lower standard error indicates a more accurate prediction.
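Here is a minimal sketch of the standard error of estimate. It uses hypothetical actual scores and the predictions from the line Y' = 2.2 + 0.6X for X = 1 to 5. It also checks that the residuals sum to zero, up to floating-point rounding. `standard_error_of_estimate` is an illustrative name.

```python
import math


def standard_error_of_estimate(y, y_pred):
    """sqrt(sum((Y - Y')^2) / (N - 2)): spread of actual scores around the line."""
    n = len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    return math.sqrt(ss_res / (n - 2))


Y = [2, 4, 5, 4, 5]                  # hypothetical actual scores
Y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]   # Y' = 2.2 + 0.6X for X = 1..5
residuals = [a - p for a, p in zip(Y, Y_pred)]
# Least-squares residuals sum to zero (up to floating-point rounding)
assert abs(sum(residuals)) < 1e-9
print(round(standard_error_of_estimate(Y, Y_pred), 3))  # 0.894
```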

Coefficients of Determination and Alienation
  • Coefficient of determination (r²): The square of the correlation coefficient, indicating the proportion of the variation in one variable (Y) that is explained by the other variable (X).

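  • Coefficient of alienation: A measure of nonassociation between two variables, computed as \sqrt{1 - r^2}. The quantity under the square root, 1 - r^2, is the proportion of variation left unexplained, which is the complement of the coefficient of determination.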
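Here is a quick worked example with a hypothetical r = .60. In that case r² = .36, so 36% of the variation in Y is accounted for by X. The remaining 1 - r² = .64 is unexplained, and the coefficient of alienation is \sqrt{.64} = .80.

```python
import math

r = 0.60                            # hypothetical correlation between X and Y
r2 = r ** 2                         # coefficient of determination
unexplained = 1 - r2                # proportion of variation not explained
alienation = math.sqrt(1 - r ** 2)  # coefficient of alienation
print(round(r2, 2), round(unexplained, 2), round(alienation, 2))  # 0.36 0.64 0.8
```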

Shrinkage
  • Shrinkage is the decrease in the predictive accuracy of a regression equation when it is applied to a population other than the one from which it was derived.

  • The amount of shrinkage can be estimated from the variance, covariance, and sample size of the original sample used to derive the regression equation.

Cross Validation
  • Definition: Cross-validation applies a regression equation to a group other than the one for which it was created. The standard error of estimate can then be computed for the relationship between the values predicted by the equation and the values actually observed.
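Below is a minimal cross-validation sketch using two small hypothetical samples. The equation is derived in one group and then applied unchanged to a second group, and the standard error of estimate is computed in each. `fit_line` and `std_error_of_estimate` are illustrative names. In this made-up example the error is larger in the new group, which is the kind of loss in accuracy that shrinkage describes; real data will not necessarily behave identically.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    return my - b * mx, b


def std_error_of_estimate(x, y, a, b):
    """Standard error of estimate for the line a + bX applied to (x, y)."""
    ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss / (len(x) - 2))


# Hypothetical derivation and validation samples (illustrative only)
x_dev, y_dev = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
x_val, y_val = [1, 2, 3, 4, 5], [3, 3, 4, 6, 4]

a, b = fit_line(x_dev, y_dev)                       # derive the equation in one group
se_dev = std_error_of_estimate(x_dev, y_dev, a, b)  # fit in the derivation sample
se_val = std_error_of_estimate(x_val, y_val, a, b)  # same equation, new group
print(round(se_dev, 3), round(se_val, 3))           # 0.894 1.095
```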

The Correlation-Causation Problem
  • It is crucial to remember that a relationship between two variables does not imply causation.

  • Further research is necessary to establish any cause-and-effect relationship.

Third Variable Explanation
  • Two variables may be correlated not because one causes the other but because both are influenced by a third variable.

Restricted Range
  • Restricted range occurs when a sample covers only part of the full range of a variable. This can distort the observed correlation, usually making it smaller than it would be across the full range.
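The effect of restricted range can be sketched with deterministic hypothetical data in which Y follows X plus an alternating +3/-3 term. Across X = 1 to 20 the correlation is high. When only cases with X from 8 to 13 are kept, the noise term accounts for most of the remaining variation in Y, and r drops sharply.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: Y tracks X, plus an alternating +3/-3 term
x = list(range(1, 21))
y = [xi + (3 if xi % 2 == 0 else -3) for xi in x]

# Keep only cases with X between 8 and 13 (a restricted range)
pairs = [(xi, yi) for xi, yi in zip(x, y) if 8 <= xi <= 13]
x_r = [p[0] for p in pairs]
y_r = [p[1] for p in pairs]

r_full = pearson_r(x, y)
r_restricted = pearson_r(x_r, y_r)
print(round(r_full, 2), round(r_restricted, 2))  # 0.9 0.28
```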

Knowledge Check (Example Questions)
  1. Which correlation coefficient expresses the relationship between a continuous variable and an artificial dichotomous variable?

    • Options:

      • Pearson r

      • biserial r

      • Spearman’s rho

      • phi coefficient

  2. Professor Lopez's study found a significant correlation between the number of hours children played violent video games and their level of aggression. However, a colleague points out that this does not imply causation.

    • This example demonstrates the correlation-causation problem.


Note: All content referenced is adapted from Robert M. Kaplan and Dennis P. Saccuzzo's "Psychological Testing: Principles, Applications, and Issues, 9th Edition, 2023 Cengage."