Correlation

Chapter 6: Describing the relationship between two variables

Research on Relations Between Variables

  • Many research questions are relational in nature, such as:

    • Whether the occurrence of depression is related to gender.

    • How income relates to one’s psychological well-being.

    • Whether the time spent playing violent video games relates to one’s tendency to be violent.

Data Analysis with Correlation

  • When investigating a relational (non-causal) question about two variables measured on an INTERVAL or RATIO scale, one can conduct a correlational analysis.

  • Correlational analysis reveals:

    • The direction of the association between the two variables.

    • The strength of the association (co-variation) between these variables.

Steps in Relational Research

  1. Ask a Relational Question:

    • Example: Is there a relation between (X) length of psychotherapy and (Y) psychological well-being?

  2. Construct the Measures:

    • Length of Psychotherapy (Variable X) = number of months.

    • Psychological Well-Being (Variable Y) = Score on a 100-point scale.

  3. Make Paired Observations:

    • Assess both X and Y for each client, so that every subject contributes one (X, Y) pair.

  4. Analyze the Relationship:

    • Visually through scatterplots.

    • Mathematically through the correlation coefficient (r), the coefficient of determination (r²), and regression analysis.

Co-Variation Explained

  • Analyzing the numerical relation between two variables (e.g., the length of therapy and psychological well-being) involves examining "co-variation", which means:

    • One variable changes together (co) with changes in the other variable.

    • For instance, when one variable increases as the other also increases, the relationship demonstrates positive co-variation.

Visualizing Relationships

  • Example of Scatterplot:

    • X-axis: Months of therapy (X)

    • Y-axis: Well-being (Y)

    • Each point on the scatterplot represents an XY pair.

Types of Relationships

  1. Perfect Positive:

    • All points fall on a line that slopes upward.

    • When X increases, Y increases by a constant, proportional amount.

  2. Imperfect Positive:

    • Points cluster around a line that slopes upward.

    • When X increases, Y tends to increase.

  3. Imperfect Negative:

    • Points cluster around a line that slopes downward.

    • When X increases, Y tends to decrease.

  4. Perfect Negative:

    • All points fall on a line that slopes downward.

    • When X increases, Y decreases by a constant, proportional amount.

  5. Zero Relationship:

    • Points do not cluster around any line.
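
The five patterns above can be illustrated with a minimal Python sketch. The data sets and the helper `pearson_r` are hypothetical and written only for illustration; the helper assumes the Z-score definition of r with population standard deviations (dividing by N).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r as the mean product of Z-scores (population SD, an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]
# Hypothetical Y values, one list per type of relationship.
patterns = {
    "perfect positive":   [10, 20, 30, 40, 50],
    "imperfect positive": [12, 18, 33, 37, 52],
    "imperfect negative": [52, 37, 33, 18, 12],
    "perfect negative":   [50, 40, 30, 20, 10],
    "zero":               [30, 10, 50, 10, 30],
}
results = {name: pearson_r(x, y) for name, y in patterns.items()}
for name, r in results.items():
    print(f"{name:20s} r = {r:+.2f}")
```

The perfect cases land on +1 and -1, the imperfect cases fall between those limits, and the zero case has no linear trend at all.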

Non-Linear Relationships

  • A relationship is non-linear when the data are best summarized by a curve rather than by a straight line.

Goals of This Chapter

  • Express mathematically the direction and strength of linear relationships.

  • Analyze relationships as positive or negative.

Key Concepts

Pearson Correlation Coefficient (r)
  • An index that describes the strength and direction of the linear relationship between two interval and/or ratio variables.

Coefficient of Determination (r²)
  • Also known as the squared correlation coefficient.

  • An index that describes only the strength of the linear relationship between two interval and/or ratio variables.

  • A technique for describing the strength of co-variation.

Computation Approaches

  • Two approaches to compute correlation will be discussed:

    1. Definitional formula

    2. Computational formula
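
As a rough sketch of the second approach, the block below assumes the computational formula is the usual raw-score form, r = (NΣXY − ΣXΣY) / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²]). The function name and the therapy data are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt

def pearson_r_computational(xs, ys):
    """Raw-score (computational) form of Pearson r; assumed formula, see lead-in."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations: months of therapy (X), well-being score (Y).
months = [2, 4, 6, 8, 10]
well_being = [55, 60, 70, 72, 85]
r = pearson_r_computational(months, well_being)
print(f"r = {r:.3f}")
```

This form needs only running sums of X, Y, XY, X², and Y², which is why it is convenient for hand calculation.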

Definitional Formula of Pearson Correlation Coefficient

  • This formula illustrates correlation as a standardized (Z score based) co-variation between two variables.

  • Because Z-scores are unit-free, r is unaffected by the units in which X and Y are measured.
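
For reference, a common way to write the definitional formula is given below, where Z_X and Z_Y are the Z-scores of X and Y and N is the number of XY pairs. This assumes the standard deviations are computed by dividing by N; texts that use the sample standard deviation divide by N − 1 here instead.

```latex
r = \frac{\sum Z_X Z_Y}{N}
```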

Reminder for Z-scores
  • The Z-scores for variables X and Y can be computed as follows:
    Z_X = \frac{X - \bar{X}}{SD_X}
    Z_Y = \frac{Y - \bar{Y}}{SD_Y}
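
A minimal sketch of the Z-score route to r follows. It assumes population standard deviations (dividing by N) and uses hypothetical therapy data; `z_scores` is an illustrative helper, not a library function.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to Z-scores using the population SD (an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical paired observations: months of therapy (X), well-being score (Y).
months = [2, 4, 6, 8, 10]
well_being = [55, 60, 70, 72, 85]

z_x = z_scores(months)
z_y = z_scores(well_being)

# r is the average product of the paired Z-scores.
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / len(months)
print(f"r = {r:.3f}")
```

Pairs in which both Z-scores share a sign add to the sum and push r toward +1; pairs with opposite signs subtract from it.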

Types of Relationships in Employment Context

  • Example:

    • Hours Worked (X) and Salary Earned (Y)

    • Each point represents one subject’s pair of values: hours worked and the corresponding salary earned.

Computation of Correlation Example

  • The example data illustrate a perfect positive relationship, r = +1.00, which is interpreted as follows:

    • The “+” indicates that the relationship is positive.

    • A correlation of “1.00” indicates a perfect relationship.
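
The original data for this example are not reproduced here, so the sketch below uses hypothetical values: four workers paid at an assumed fixed hourly rate, which makes salary an exact linear function of hours worked.

```python
from math import sqrt

hours = [10, 20, 30, 40]             # hypothetical hours worked (X)
rate = 20                            # assumed hourly rate, for illustration only
salary = [rate * h for h in hours]   # salary earned (Y)

n = len(hours)
mean_x, mean_y = sum(hours) / n, sum(salary) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in hours) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in salary) / n)

# Average product of paired Z-scores (population SD assumed).
r = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(hours, salary)) / n
print(f"r = {r:+.2f}")
```

Every worker has the same Z-score on hours as on salary, so each product is a squared Z-score and their average is 1.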

Coefficient of Determination (r²)

  • r² gives the proportion of variability in Y that is accounted for by variability in X, and vice versa; multiplied by 100, it can be read as a percentage.

  • For example, if r² = 1.00 between salary earned (Y) and hours worked (X), then variability in salary can be perfectly explained by the variability in hours worked.
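
One way to see the "variability explained" reading is to fit a least-squares line and compare the leftover variability with the total. The sketch below does this for hypothetical therapy data; the variable names are illustrative.

```python
from math import sqrt

# Hypothetical paired observations: months of therapy (X), well-being score (Y).
months = [2, 4, 6, 8, 10]
well_being = [55, 60, 70, 72, 85]

n = len(months)
mean_x, mean_y = sum(months) / n, sum(well_being) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, well_being))
sum_xx = sum((x - mean_x) ** 2 for x in months)
sum_yy = sum((y - mean_y) ** 2 for y in well_being)

# Least-squares line predicting Y from X.
slope = sum_xy / sum_xx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in months]

# Proportion of Y's variability that the line accounts for.
ss_residual = sum((y - p) ** 2 for y, p in zip(well_being, predicted))
r_squared = 1 - ss_residual / sum_yy

# The same quantity obtained by squaring r.
r = sum_xy / sqrt(sum_xx * sum_yy)
print(f"r^2 from line = {r_squared:.3f}, r squared = {r ** 2:.3f}")
```

The two numbers agree, which is what justifies describing r² as the share of variability in Y accounted for by X.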

Definitional Formula (Second Example)

  • Sample question:

    • What is the relationship between X and Y?

    • Example measures for X and Y should be constructed and paired for analysis.

Summary of Results

  • r ranges from -1.00 to +1.00:

    • Imperfect relations fall strictly between -1.00 and +1.00; only perfect relations reach either limit.

    • r cannot be less than -1.00 or greater than +1.00.

Understanding the Definition of Correlation

  • Why convert raw scores to Z-scores?

    • To put X and Y on a common, unit-free scale without changing the shape of either distribution or the relative distances between scores.
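
A small sketch of this point: the same hypothetical therapy lengths are expressed in months and in years, and the Z-scores come out identical. `z_scores` is an illustrative helper that assumes the population standard deviation.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to Z-scores using the population SD (an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

months = [2, 4, 6, 8, 10]            # hypothetical therapy lengths in months
years = [m / 12 for m in months]     # the same lengths expressed in years

z_months = z_scores(months)
z_years = z_scores(years)

# Changing the unit rescales the raw scores but leaves the Z-scores unchanged.
same = all(abs(a - b) < 1e-9 for a, b in zip(z_months, z_years))
print(same)
```

Since r is built only from Z-scores, it takes the same value whichever unit is used.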

Interpretation of the Correlation Coefficient (r)

  • r values indicate strength and direction:

    • Absolute Value: Indicates the strength of the relationship.

    • Sign: Indicates the direction (+ or -).

    • Example Strength Ratings (applied to the absolute value of r, so they also hold for negative correlations of the same size):

      • +0.60 to +0.80: Very Strong

      • +0.40 to +0.60: Strong

      • +0.20 to +0.40: Moderate
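
The ratings above can be turned into a small lookup, sketched below. `strength_label` is a hypothetical helper; how the shared boundaries (0.40 and 0.60) are assigned is an assumption, and values outside the listed ranges are deliberately left unrated because the list does not cover them.

```python
def strength_label(r):
    """Map r to one of the listed ratings, using the absolute value of r.

    Boundary handling is an assumption: a shared boundary such as 0.60
    is assigned to the stronger category.
    """
    size = abs(r)
    if 0.60 <= size <= 0.80:
        return "Very Strong"
    if 0.40 <= size < 0.60:
        return "Strong"
    if 0.20 <= size < 0.40:
        return "Moderate"
    return None  # not covered by the ratings listed above

for value in (0.72, -0.65, 0.45, -0.25, 0.10):
    print(f"r = {value:+.2f}: {strength_label(value)}")
```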

Conditions for Appropriate Use of r

  • Use r when:

    1. Evaluating strength and direction of linear relationships between two variables.

    2. Both X and Y are measured on an interval or ratio scale; nominal or ordinal variables cannot be used.

    3. Scatterplot appears linear.

    4. Scatterplot displays homoscedasticity.

Linear vs. Non-Linear Relationships

  • The correlation coefficient r will underestimate the strength of a non-linear relationship, because it captures only the linear component.

  • r accurately describes the strength of a linear relationship.
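
A short sketch with made-up data shows how far r can understate a non-linear relationship. Here Y is completely determined by X (Y = X²), yet the linear correlation is zero.

```python
from math import sqrt

x = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical X values, symmetric around zero
y = [xi ** 2 for xi in x]      # Y is fully determined by X, but not linearly

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
sum_yy = sum((yi - mean_y) ** 2 for yi in y)

r = sum_xy / sqrt(sum_xx * sum_yy)
print(f"r = {r:.2f}")
```

The upward and downward halves of the curve cancel each other, which is why inspecting the scatterplot before computing r matters.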

Homoscedasticity vs. Heteroscedasticity

  • Homoscedasticity: The variability of Y is roughly constant at all levels of X.

  • Heteroscedasticity: The variability of Y changes across levels of X.
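
An informal way to check this numerically is to compare the spread of Y around the trend line for low and high values of X. The sketch below uses hypothetical data built around an assumed line Y = 2X; `spread_by_half` is an illustrative helper, not a standard test.

```python
from statistics import pstdev

x = list(range(1, 11))
noise = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

# Hypothetical data: constant scatter versus scatter that grows with X.
y_homo = [2 * xi + e for xi, e in zip(x, noise)]
y_hetero = [2 * xi + e * xi for xi, e in zip(x, noise)]

def spread_by_half(xs, ys, slope=2):
    """SD of deviations from the assumed line Y = slope * X, for low and high X."""
    residuals = [yi - slope * xi for xi, yi in zip(xs, ys)]
    half = len(xs) // 2
    return pstdev(residuals[:half]), pstdev(residuals[half:])

homo_low, homo_high = spread_by_half(x, y_homo)
hetero_low, hetero_high = spread_by_half(x, y_hetero)
print(f"homoscedastic:   low X = {homo_low:.2f}, high X = {homo_high:.2f}")
print(f"heteroscedastic: low X = {hetero_low:.2f}, high X = {hetero_high:.2f}")
```

Similar spreads in both halves suggest homoscedasticity; a clearly larger spread in one half suggests heteroscedasticity. In practice, the scatterplot remains the primary check.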

Proper Uses of Correlation

  • Appropriate when:

    • X and Y are measured on interval or ratio scales.

    • Scatterplots are linear and exhibit homoscedasticity.

  • Questionable or improper use occurs when:

    • Using nominal or ordinal measurements, or when scatterplots are non-linear or heteroscedastic.

  • Conclusions should describe the relationship and should not claim causation.