Understanding Relationships Between Variables
Abstract
Understanding relationships between variables is fundamental to psychological research and statistical analysis.
This primer aims to provide second-year psychology students with a comprehensive introduction to bivariate descriptive statistics, focusing on:
Methods for quantifying and visualizing associations between two continuous variables.
Discussion topics include:
Conceptual foundations of scatterplots.
Covariance and correlation coefficients.
Emphasis on Pearson's correlation and Spearman's rank correlation.
Key distinctions addressed:
Different types of relationships.
Interpretation frameworks.
Practical applications in psychological research.
Concrete examples relevant to cognitive and social psychology, such as human perception of AI agents.
Common pitfalls in interpretation and analysis are highlighted, alongside assumptions and limitations.
Visual aids and a worked example demonstrate practical applications of these concepts, preparing students to analyze bivariate relationships confidently in their research.
The primer emphasizes accessibility while maintaining academic rigor, balancing conceptual understanding with technical precision.
Keywords: correlation, scatterplot, bivariate statistics, Pearson correlation, covariance.
Introduction
Psychological research rarely examines variables in isolation. Examples include:
Investigating how sleep affects mood.
Understanding how social media use relates to well-being.
Analyzing how AI agent appearance influences trust judgments.
Bivariate descriptive statistics are essential for:
Exploring, quantifying, and communicating relationships between two variables simultaneously.
Fundamental questions: Do these variables tend to change together? If so, in what direction and how strongly?
Importance of bivariate descriptive statistics:
Foundations for advanced inferential techniques.
Crucial insights for theory development.
As noted by Madsen (2016), understanding relationships between variables benefits organizational decision-making and scientific progress across disciplines.
Correlation and regression analyses are vital for observational studies where experimental manipulation is impractical or unethical.
This primer systematically introduces core bivariate descriptive statistics concepts:
Visual methods for exploring relationships.
Quantitative measures capturing association strength and direction.
Emphasis on the power and limitations of these tools, fostering technical proficiency and critical statistical thinking.
Visualizing Bivariate Relationships: The Scatterplot
Exploration of a bivariate relationship begins with visualization.
Scatterplot (or scatter diagram)
Displays the relationship between two continuous variables by plotting individual observations as points in a two-dimensional space (Madsen, 2016).
Each point represents one observation, whose position reflects the values of both variables.
The Anatomy of a Scatterplot
One variable is plotted on the horizontal axis (X) and the other on the vertical axis (Y).
Conventionally:
Independent or predictor variable on the X-axis.
Dependent or outcome variable on the Y-axis.
Example: Parent Tracking Sleep and Mood (Navarro and Foxcroft, 2025)
Parent's sleep hours plotted on X-axis and grumpiness rating on Y-axis over 100 days.
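A minimal sketch of such a plot, using synthetic data to stand in for the actual diary records (the variable names, trend parameters, and output file name are our own choices, not the original study's):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Synthetic stand-in data: 100 days, with a negative sleep-grumpiness trend.
rng = np.random.default_rng(0)
sleep = rng.uniform(4, 10, size=100)                  # hours slept (X)
grumpiness = 90 - 8 * sleep + rng.normal(0, 5, 100)   # grumpiness rating (Y)

fig, ax = plt.subplots()
ax.scatter(sleep, grumpiness)                         # one point per day
ax.set_xlabel("Hours slept (X)")
ax.set_ylabel("Grumpiness rating (Y)")
ax.set_title("Sleep vs. grumpiness over 100 days")
fig.savefig("sleep_scatter.png")
```

Note the convention in action: the presumed predictor (sleep) goes on the X-axis and the outcome (grumpiness) on the Y-axis.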
Reading Scatterplot Patterns
Scatterplots reveal three features of bivariate relationships:
Direction:
Positive: variables tend to increase together.
Negative: variables move in opposite directions.
Example: negative relationship in sleep-mood: as sleep hours increase, grumpiness ratings decrease.
Strength:
Strong relationships show tight clustering of points; weak ones show a more diffuse spread.
Same direction can differ in strength (e.g., parent sleep vs. infant sleep).
Form:
Indicates whether relationships are linear (straight line) or nonlinear (curved).
Most correlation methods assume linear relationships, making this diagnostic feature critical.
Quantifying Relationships: From Covariance to Correlation
Numerical measures complement visual insights provided by scatterplots.
Covariance: The Foundation
Covariance extends variance from univariate to bivariate analysis.
Measures how two variables deviate together from their means.
Formula for sample covariance of variables X and Y:
$\mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})$
Explanation of components:
$X_{i}, Y_{i}$: Individual paired observations of the two variables.
$\bar{X}, \bar{Y}$: Mean values of X and Y.
$(X_{i} - \bar{X})(Y_{i} - \bar{Y})$: Product of deviations from the mean for each pair.
$\frac{1}{n-1}\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})$: Average of these co-deviations.
Key points:
Positive covariance signifies that both variables tend to increase together.
Negative covariance suggests an inverse relationship.
Zero covariance indicates no linear relationship.
Limitation: Magnitude depends on units of measurement, making comparison difficult.
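The definition above translates directly into code; the following sketch (function and variable names are our own) computes the sample covariance for two small illustrative samples:

```python
# Sample covariance from scratch, mirroring the formula:
# Cov(X, Y) = sum((Xi - mean_x) * (Yi - mean_y)) / (n - 1)
def sample_covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y)
               for xi, yi in zip(x, y)) / (n - 1)

x = [8, 6, 7, 9, 5, 4, 8, 6]      # e.g. stress ratings
y = [5, 8, 6, 3, 10, 12, 4, 7]    # e.g. weekly social hours
print(sample_covariance(x, y))    # negative: an inverse relationship
```

The units problem is visible here: the result is in "rating points times hours," a scale that depends on both measurements, which is exactly what correlation fixes.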
Pearson's Correlation Coefficient
Pearson's product-moment correlation coefficient (denoted r) addresses scaling issues by standardizing covariance.
Formula:
$r = \frac{\mathrm{Cov}(X, Y)}{s_{X}\, s_{Y}}$
Explanation of components:
$\mathrm{Cov}(X,Y)$: Covariance between X and Y.
$s_{X}, s_{Y}$: Standard deviations of X and Y respectively.
Properties of Pearson's r:
Ranges from -1 to +1.
Perfect positive correlation (r = +1).
Perfect negative correlation (r = -1).
Zero indicates no linear relationship.
Symmetric: r(X,Y) = r(Y,X).
Unitless and specifically linear.
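As a from-scratch sketch (helper name ours), r is simply the covariance divided by the product of the two standard deviations:

```python
import math

def pearson_r(x, y):
    """Pearson's r: sample covariance standardized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]          # perfectly linear in x
print(pearson_r(x, y))        # ≈ 1.0 (perfect positive)
print(pearson_r(y, x))        # symmetric: same value
```

Because the units of covariance cancel against those of the standard deviations, the result is unitless and directly comparable across studies.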
Interpreting Correlation Coefficients
Guidelines for interpreting correlation strengths (Table 1):
Correlation Range   Strength      Direction
-1.0 to -0.9        Very strong   Negative
-0.9 to -0.7        Strong        Negative
-0.7 to -0.4        Moderate      Negative
-0.4 to -0.2        Weak          Negative
0.2 to 0.4          Weak          Positive
0.4 to 0.7          Moderate      Positive
0.7 to 0.9          Strong        Positive
0.9 to 1.0          Very strong   Positive
Context matters in interpretation:
A correlation of r = 0.3 may be considered meaningful in one domain yet weak in another.
Practical significance differs from statistical significance.
The Danger of Anscombe's Quartet
Anscombe's quartet illustrates the importance of visualization.
Four datasets yield identical correlations but vastly different patterns:
Linear relationship.
Curvilinear pattern.
Linear relationship disrupted by a single outlier.
No relationship except for one extreme point.
Emphasizes that correlation coefficients must not be interpreted without examining scatterplots.
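Anscombe's published values are widely reproduced, so the point can be checked directly for the first two datasets (linear and curvilinear) using a from-scratch Pearson helper (function name ours):

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Anscombe's quartet, datasets I and II (shared X values).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]  # linear
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]   # curved

print(round(pearson_r(x, y1), 3), round(pearson_r(x, y2), 3))  # both ≈ 0.816
```

The coefficients are essentially identical even though one relationship is linear and the other is a smooth curve; only the scatterplots reveal the difference.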
Beyond Pearson: Alternative Correlation Measures
Pearson's correlation is the default choice, but it has notable limitations. Several alternative measures exist for other data characteristics or for robustness when Pearson's assumptions are violated.
Spearman's Rank Correlation
Spearman's rank correlation coefficient (denoted ρ or rs) is an appropriate alternative when:
Data are ordinal.
Relationships are monotonic (not strictly linear).
To compute Spearman's ρ:
Convert each variable to ranks (tied values receive mean rank).
Calculate Pearson's r on ranks.
Range: -1 to +1 with same interpretation as Pearson's r, but measures monotonic association.
Because it operates on ranks, Spearman's ρ is less influenced by skewed values and outliers, making it more robust than Pearson's r.
Robust Correlation Methods
Modern methods offer robust alternatives to Pearson's correlation:
Winsorized correlation replaces the most extreme values in each variable with less extreme ones before computing the correlation.
Percentage bend correlation downweights extreme observations rather than removing them (Wilcox, 2023).
These methods matter in advanced research but are beyond the scope of an introductory analysis.
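As a rough sketch of the winsorizing idea only (function names and the 20% trimming proportion are our own choices; see Wilcox, 2023 for proper treatments):

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def winsorize(values, gamma=0.2):
    """Clamp the lowest/highest gamma fraction to the nearest retained value."""
    s = sorted(values)
    g = int(gamma * len(values))
    lo, hi = s[g], s[len(values) - g - 1]
    return [min(max(v, lo), hi) for v in values]

def winsorized_r(x, y, gamma=0.2):
    # Winsorize each variable marginally, then correlate the clamped values.
    return pearson_r(winsorize(x, gamma), winsorize(y, gamma))

print(winsorize([1, 2, 3, 4, 100], 0.2))   # the outlier 100 is clamped to 4
```

Clamping (rather than deleting) extreme values keeps the sample size intact while limiting any single observation's leverage on the coefficient.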
Practical Applications in Psychology
Bivariate descriptive statistics are applied widely in psychological research. Here are illustrative examples:
Example 1: Cognitive Load and Performance
Research on how mental workload affects task performance.
Variables: Workload (number of items to remember) and performance (response time in milliseconds).
Scatterplot reveals:
Linear increase in response time with workload up to 4 items.
Nonlinear increase beyond 4 items.
Computing a single Pearson's r yields r = 0.65, masking the complexity of the relationship. Analyzing the two workload ranges separately is more informative:
Low workload: r = 0.82.
High workload: r = 0.71.
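A sketch of this split analysis with made-up numbers (the study's actual data are not reproduced here, so the values and the resulting coefficients below are purely illustrative):

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical workload/response-time pairs: roughly linear up to 4 items,
# then a steeper rise beyond 4 items.
workload = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]
rt_ms    = [420, 455, 490, 530, 640, 730, 850, 990,
            410, 460, 485, 540, 620, 760, 830, 1010]

low  = [(w, rt) for w, rt in zip(workload, rt_ms) if w <= 4]
high = [(w, rt) for w, rt in zip(workload, rt_ms) if w > 4]

r_overall = pearson_r(workload, rt_ms)
r_low  = pearson_r([w for w, _ in low],  [rt for _, rt in low])
r_high = pearson_r([w for w, _ in high], [rt for _, rt in high])
print(r_overall, r_low, r_high)
```

Splitting at the point where the scatterplot changes form lets each coefficient describe a genuinely linear segment.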
Example 2: Trust in AI Agents
Study on the connection between AI agents' human-likeness and trust judgments.
Variables: Anthropomorphism ratings and trustworthiness ratings.
Moderate positive correlation found (r = 0.45).
Scatterplot reveals:
Relationship holds for moderately human-like agents.
Extremely human-like agents are rated lower, consistent with the "uncanny valley" effect.
Example 3: Academic Achievement and Study Time
Research examining weekly study hours versus exam performance.
Variable skew present in self-reported study time.
Pearson's r computation yields r = 0.32.
Inspection of extreme cases reveals that the skew distorts the Pearson estimate; Spearman's ρ = 0.48 provides a more robust estimate of the relationship.
Worked Example: Computing and Interpreting Correlation
Data Collection: 8 participants, perceived stress (1-10 scale) and social hours per week.
Participant   Stress (X)   Social Hours (Y)
1             8            5
2             6            8
3             7            6
4             9            3
5             5            10
6             4            12
7             8            4
8             6            7
Calculate Means:
$\bar{X} = (8 + 6 + 7 + 9 + 5 + 4 + 8 + 6) / 8 = 6.625$
$\bar{Y} = (5 + 8 + 6 + 3 + 10 + 12 + 4 + 7) / 8 = 6.875$
Calculate Deviations:
Compute $(X_{i} - \bar{X})(Y_{i} - \bar{Y})$ for each participant:
For participant 1:
$\text{Deviation Product} = (8 - 6.625) \times (5 - 6.875) = 1.375 \times (-1.875) = -2.578$
Repeat and sum:
$\text{Sum} = \sum_{i=1}^{8}(X_{i} - \bar{X})(Y_{i} - \bar{Y}) = -35.375$
Calculate Standard Deviations:
$s_{X} = \sqrt{19.875 / 7} \approx 1.685$ and $s_{Y} = \sqrt{64.875 / 7} \approx 3.044$
Final Calculation of Correlation:
$r = \frac{\mathrm{Cov}(X, Y)}{s_{X}\, s_{Y}} = \frac{-35.375 / 7}{1.685 \times 3.044} \approx -0.99$
Interpretation:
r = -0.99 indicates a very strong negative relationship between perceived stress and social interaction hours.
The negative sign indicates an inverse relationship: more social interaction is associated with lower reported stress.
Important note: Correlation does not imply causation.
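The hand calculation above can be checked end-to-end in a few lines (a sketch using only the values from the table; variable names ours):

```python
import math

stress = [8, 6, 7, 9, 5, 4, 8, 6]          # X
social = [5, 8, 6, 3, 10, 12, 4, 7]        # Y
n = len(stress)

mean_x = sum(stress) / n                   # 6.625
mean_y = sum(social) / n                   # 6.875
dev_products = [(x - mean_x) * (y - mean_y) for x, y in zip(stress, social)]
cov = sum(dev_products) / (n - 1)          # -35.375 / 7
sx = math.sqrt(sum((x - mean_x) ** 2 for x in stress) / (n - 1))
sy = math.sqrt(sum((y - mean_y) ** 2 for y in social) / (n - 1))
r = cov / (sx * sy)
print(round(r, 3))                         # ≈ -0.985, i.e. r ≈ -0.99
```

Running each intermediate step separately (means, deviation products, standard deviations) is a good way to verify a hand calculation line by line.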
Common Pitfalls and Misconceptions
Key errors to avoid:
Confusing Correlation with Causation
Strong correlations can arise from varied scenarios:
X causes Y
Y causes X
Third variable Z causes both
Coincidental relationship
Observational data alone cannot distinguish these possibilities.
Ignoring Nonlinearity
Pearson's correlation only captures linear relationships and can misrepresent the strength of curved relationships.
Always inspect scatterplots for curvilinear patterns before interpreting correlations.
Sensitivity to Outliers
Pearson's correlation is affected significantly by outliers.
Examining scatterplots is vital for identifying influential points.
Utilize robust alternatives like Spearman's correlation when necessary.
Restricted Range Problems
Narrow variable ranges can artificially weaken correlations, often occurring in psychological research.
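A deterministic toy demonstration of the effect (all names and numbers are our own construction): y tracks x with a bounded wiggle, and restricting the sampled X range noticeably weakens the correlation.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# y follows x plus a bounded sinusoidal "error" term.
xs = list(range(100))
ys = [x + 10 * math.sin(x) for x in xs]

r_full = pearson_r(xs, ys)

# Keep only the middle of the X range (e.g. a sample of similar individuals).
kept = [(x, y) for x, y in zip(xs, ys) if 40 <= x < 60]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(r_full, r_restricted)   # restricted range -> weaker correlation
```

The underlying relationship has not changed; only the spread of X has, yet the coefficient drops. This is why samples of, say, university students can understate correlations that hold in the broader population.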
Ecological Fallacy
Do not generalize individual-level conclusions from aggregate-level correlations.
Statistical vs. Practical Significance
Important to recognize that large samples can yield significant but trivial correlations, while small samples might miss real effects.
Assumptions of Pearson's Correlation
Linearity: Pearson's r quantifies only linear relationships; a nonlinear true relationship can compress the coefficient toward zero.
Level of Measurement: Both variables should be on an interval or ratio scale; applying Pearson's r to ordinal data may misrepresent the relationship.
Bivariate Normality: Each variable should ideally be normally distributed; this assumption becomes critical for inferential procedures.
Homoscedasticity: Variance of Y should remain constant across X values; heteroscedasticity can mislead.
Independence of Observations: Each (X,Y) pair must be independent; violations distort correlation.
Effect Size and Practical Significance
Coefficient of Determination: Defined as r², the proportion of variance in one variable accounted for by the other.
Example: For a height-weight correlation of r = 0.70, r² = 0.49, so height accounts for 49% of the variance in weight.
Even small proportions of explained variance can be meaningful for complex psychological phenomena.
Context-Dependent Interpretation
Strength of correlations varies greatly; r = 0.30 viewed differently in different domains (social vs. cognitive psychology).
Consider research questions and domain expectations when interpreting correlation significance.
Sample Size Considerations
Correlation significance is heavily influenced by sample size; appropriate context required for interpretation.
Conclusion: Key Takeaways
Start with scatterplots for visual insights.
Pearson's correlation coefficient provides a standardized linear association measure.
Covariance lays the foundation, but correlation standardizes it.
Spearman's correlation is more robust for ordinal and some nonlinear data scenarios.
Correlation does not imply causation.
Do not conflate linear and nonlinear relationships.
Outlier checks are essential before drawing conclusions from correlations.
Context drives practical significance interpretation.
Range restriction can dampen correlation estimates.
Differentiating statistical significance from practical importance is critical.
Mastering these concepts equips psychology students to explore relationships in research confidently and critically evaluate claims made in literature.
References
Heathcote, A., Brown, S., & Wagenmakers, E.-J. (2015). An introduction to good practices in cognitive modeling. In B. Forstmann & E.-J. Wagenmakers (Eds.), An introduction to model-based cognitive neuroscience. Springer.
Madsen, B. S. (2016). Statistics for non-statisticians (2nd ed.). Springer-Verlag.
Navarro, D. J., & Foxcroft, D. R. (2025). Learning statistics with jamovi: A tutorial for beginners in statistical analysis. Open Book Publishers.
Wilcox, R. R. (2023). A guide to robust statistical methods (2nd ed.). Springer.