Correlation and t-Tests Study Guide

Chapter 6 Study Guide: Correlation and Statistics

1. Definition of Correlation

  • Correlation: A statistical measure that expresses the extent to which two variables are linearly related. It indicates both the strength and direction of a relationship between two variables.

2. Important Uses of Correlation

  • Prediction: Utilized to predict values of one variable based on the values of another.
  • Testing Validity: Applied to test the validity of theoretical constructs or methods.
  • Theory Development: Assists in formulating and refining theories based on observed relationships.

3. Scatterplots

  • Definition: A visual representation showing the relationship between two numerical variables.
  • Construction:
      - Plot paired values as points on the Cartesian plane, where each point represents a pair of values (X, Y).
  • Interpretation:
      - Direction: Can be upward (positive correlation) or downward (negative correlation).
      - Form: Identifies whether the relationship is linear (straight line) or nonlinear (curved).
      - Strength: Observed by how closely the points cluster around a line (the tighter the cluster, the stronger the correlation).
      - Outliers: Points that deviate significantly from the overall pattern, which can influence the correlation.

4. Equation of a Straight Line

  • Formula: Y = bX + a
      - Components:
        - Y: Predicted variable (dependent variable).
        - X: Predictor variable (independent variable).
        - b (slope): Indicates the change in Y for each 1-unit increase in X.
        - a (Y-intercept): Value of Y when X equals zero.
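The slope-intercept idea translates directly into code. A minimal Python sketch, with a hypothetical slope and intercept chosen only for illustration:

```python
def predict(x, b, a):
    """Predicted Y on the line Y = bX + a."""
    return b * x + a

# Hypothetical line: slope b = 2.5, intercept a = 10.0
print(predict(0, 2.5, 10.0))  # 10.0  (the Y-intercept: Y when X = 0)
print(predict(1, 2.5, 10.0))  # 12.5  (each 1-unit increase in X adds b = 2.5)
```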

5. Types of Relationships

  • Positive Relationship: Both X and Y increase together.
  • Negative Relationship: As X increases, Y decreases (the variables move in opposite directions, not both downward).
  • Perfect Relationship: All points fall exactly on a straight line, indicated when r = +1.00 or r = -1.00.
  • Imperfect Relationship: Points scatter around a line, represented by values of r that fall between -1 and +1.

6. Correlation Coefficient & Pearson’s r

  • Correlation Coefficient: A numerical measure indicating the strength and direction of the relationship.
      - Pearson’s r: Specifically measures linear relationships and requires interval or ratio data.
      - Range: Values between -1.00 and +1.00.
      - Definitional Formula (deviation scores):
    r = Σ[(X - Mx)(Y - My)] / √( Σ(X - Mx)² · Σ(Y - My)² )
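The deviation-score formula maps term by term onto code. A small Python sketch using made-up data chosen to show the perfect-correlation case:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of deviation cross-products divided by the
    square root of the product of the summed squared deviations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfectly linear, so r should be +1.0
print(pearson_r(x, y))      # 1.0
```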

7. Second Interpretation of r (r²)

  • Coefficient of Determination: r² = proportion of variance explained.
      - Example: If r = 0.60, then r² = 0.36, meaning 36% of the variance in Y is explained by X.

8. Choosing the Correct Correlation

  • Key Factors:
      - Level of Measurement: Types include nominal, ordinal, interval, and ratio levels.
      - Shape of Relationship: Determine if the relationship is linear or nonlinear.
  • Other Types of Correlation:
      - Spearman’s rho (ρ): Applicable for ordinal (ranked) data.
      - Phi (φ): Suitable for two dichotomous (binary) variables.
      - Point-biserial correlation: Involves one dichotomous and one continuous variable.
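As one illustration of matching the statistic to the level of measurement, Spearman's rho can be computed from ranks alone. This sketch uses the common shortcut formula, which assumes no tied ranks:

```python
def spearman_rho(xs, ys):
    """Spearman's rho for ranked data (no ties):
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between paired ranks."""
    def rank(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for pos, i in enumerate(order, start=1):
            r[i] = pos          # rank 1 = smallest value
        return r
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Monotonic but nonlinear data still yields a perfect rho:
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

Note how rho rewards any monotonic relationship, while Pearson's r specifically measures linear ones.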

9. Range, Outliers, and Correlation

  • Restricted Range: Limiting the range of scores reduces variability, making the apparent correlation weaker than the correlation present in the full data.
  • Outliers: Extreme scores can either inflate or weaken the correlation; it is vital to review scatterplots for outliers.

10. Key Skills for Exams

  • Interpret r: Understand both the direction and strength of correlation.
  • Calculate or recognize r and r²: Be comfortable working with correlation coefficients.
  • Choose the correct correlation type: Differentiate which correlation analysis is appropriate for the data at hand.

11. Final Quick Summary

  • Definitions:
      - r = strength + direction
      - r² = variance explained

12. Inferential vs Descriptive Statistics

  • Inferential Statistics: Draws conclusions about populations based on sample data, relevant for estimation, prediction, and hypothesis testing.
  • Descriptive Statistics: Focuses on summarizing and organizing data without generalizing beyond the sample.
  Two Goals of Inferential Statistics:
  1. Estimation: Estimating population parameters from sample statistics.
  2. Hypothesis Testing: Evaluating claims about population parameters using sample data.

13. Random Sampling Methods

  • With Replacement: The selected individual is returned to the population and can be chosen again.
  • Without Replacement: The selected individual is removed and cannot be chosen again.

14. A Priori vs A Posteriori Probability

  • A Priori Probability (Theoretical): Based on logical reasoning or established structures, not on experimental data.
      - Formula: P = favorable outcomes / total outcomes
  • A Posteriori Probability (Empirical): Based on actual observed data from trials or experiments.
      - Formula: P = number of observed successes / total trials
  • Long-Run Behavior: As the number of trials approaches infinity, a posteriori probability converges toward a priori probability.
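The long-run convergence can be demonstrated with a quick simulation. A minimal sketch assuming a fair six-sided die, so the a priori probability of rolling a 3 is 1/6:

```python
import random

def empirical_probability(event, trials, rng):
    """A posteriori estimate: observed successes / total trials."""
    return sum(1 for _ in range(trials) if event(rng)) / trials

rng = random.Random(42)   # seeded so the run is reproducible
a_priori = 1 / 6          # theoretical P(roll a 3) on a fair die
for n in (100, 10_000, 100_000):
    est = empirical_probability(lambda r: r.randint(1, 6) == 3, n, rng)
    print(n, round(est, 4))   # estimates drift toward 0.1667 as n grows
```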

15. Addition Rule of Probability

  • General Formula: For finding the probability that event A or event B occurs.
    P(A or B) = P(A) + P(B) - P(A and B)
  • Terms:
      - P(A), P(B) = probability of each event.
      - P(A and B) = probability of both events occurring simultaneously.
  • Mutually Exclusive Events: If events A and B cannot occur simultaneously, then P(A and B) = 0, simplifying the formula to:
    P(A or B) = P(A) + P(B).
  • Exhaustive Events: Set of events that covers all possible outcomes; one of them must occur.
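The addition rule can be checked with a classic card example. A short sketch; the deck probabilities are standard, and the function is a direct transcription of the formula:

```python
def p_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule; pass p_a_and_b=0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

# One card from a standard 52-card deck: P(heart or king).
# Subtracting P(king of hearts) avoids counting that card twice.
p_heart = 13 / 52
p_king = 4 / 52
p_king_of_hearts = 1 / 52
print(p_a_or_b(p_heart, p_king, p_king_of_hearts))  # 16/52, about 0.3077

# Mutually exclusive: one card cannot be both a heart and a spade.
print(p_a_or_b(13 / 52, 13 / 52))  # 0.5
```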

16. Multiplication Rule of Probability

  • Definition: Used to find the probability that both events A and B occur.
  • General Formula: P(A and B) = P(A) × P(B | A)
  • Key Terms:
      - P(B | A): The conditional probability of B given that A has occurred.
  • Types of Events:
      - Independent Events: Events where the occurrence of one does not affect the other.
      - Dependent Events: Events where the occurrence of one affects the probability of the other.
      - Mutually Exclusive Events: Cannot occur together.
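The dependent-versus-independent distinction shows up cleanly in a without-replacement example. A sketch using exact fractions so the arithmetic stays visible:

```python
from fractions import Fraction

def p_a_and_b(p_a, p_b_given_a):
    """General multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

# Dependent events: drawing two aces without replacement from 52 cards.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace already removed
print(p_a_and_b(p_first_ace, p_second_ace_given_first))  # 1/221

# Independent events: P(B | A) = P(B), e.g. two fair coin flips both heads.
print(p_a_and_b(Fraction(1, 2), Fraction(1, 2)))  # 1/4
```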

17. Continuous Variables & Probability

  • For continuous variables, probabilities are not calculated for exact values but rather over ranges.

18. t Tests Overview

  1. What is a t Test?: A statistical test used to determine whether there is a significant difference between the means of two groups.
  2. When to Use a t Test:
      - Appropriate when comparing means and the population standard deviation is unknown (it must be estimated from the sample).
      - Connection to z-scores: Raw scores can be converted into z-scores to compare how far a value is from the mean.
      - z-score formula: z = (X - M) / SD
      - Probability Finding Steps:
        1. Convert X to z-score.
        2. Utilize the z-table (normal distribution) to find the probability under the curve.
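The two probability-finding steps above can be sketched with Python's standard library, using the error function in place of a printed z-table. The mean and SD below are hypothetical, chosen for a clean example:

```python
import math

def z_score(x, mean, sd):
    """Step 1: convert a raw score to a z-score, z = (X - M) / SD."""
    return (x - mean) / sd

def normal_cdf(z):
    """Step 2: P(Z <= z) under the standard normal curve,
    computed from the error function instead of a z-table."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical distribution: M = 100, SD = 15. What is P(X <= 130)?
z = z_score(130, 100, 15)
print(z)               # 2.0
print(normal_cdf(z))   # about 0.9772
```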

19. Key Skills for Exams on t Tests

  • Distinguish between inferential and descriptive statistics.
  • Apply the addition and multiplication rules correctly.
  • Identify independent vs dependent events.
  • Convert raw scores to z-scores.
  • Interpret probability from the normal distribution.

20. Summary of t Tests

  • Types of t Tests:
      1. One-Sample t Test: compares the sample mean to a known population mean.
      2. Independent-Samples t Test: compares the means of two different groups.
      3. Dependent-Samples t Test (Paired t Test): compares means from the same participants measured twice.
  • t Statistic Formula: t = (difference between means) / (variability of the difference)
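For the one-sample case, the general formula becomes t = (M - μ) / (s / √n). A minimal sketch with made-up scores (the hypothesized population mean of 50 is arbitrary):

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """One-sample t: (sample mean - hypothesized mean) / estimated
    standard error (s / sqrt(n)). Returns (t, degrees of freedom)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # stdev() uses the n-1 denominator
    return (mean(sample) - mu) / se, n - 1

# Hypothetical data: did this sample come from a population with mu = 50?
scores = [52, 55, 48, 57, 53, 51]
t, df = one_sample_t(scores, 50)
print(round(t, 3), df)   # t is about 2.08 with df = 5
```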

21. Degrees of Freedom (df)

  • Explanation: The number of independent values in an analysis that are free to vary without violating any constraints. For example:
      - Independent-samples t test: df = n1 + n2 - 2
      - One-sample t test: df = n - 1.
  • More degrees of freedom provide a more accurate estimate of the parameter being analyzed.
  • It affects the shape of the t distribution, which has heavier tails than the normal curve; as df increases, the t distribution approaches the normal distribution.

22. Assumptions of t Tests

  • Random Sampling: Selection of subjects must be random.
  • Independence of Observations: Each measurement must not influence another.
  • Normal Distribution: Data should be normally distributed, especially crucial for small sample sizes.
  • Equal Variances: Certain t-tests require equal variances across compared groups.

23. Effect Size (Cohen’s d)

  • Definition: A measure of practical significance that indicates the magnitude of difference, not merely statistical significance.
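For two independent groups, Cohen's d is the mean difference divided by the pooled standard deviation. A sketch with hypothetical data; the 0.2/0.5/0.8 small/medium/large benchmarks are Cohen's conventional labels:

```python
import math
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical groups; conventionally d ~ 0.2 small, 0.5 medium, 0.8 large.
a = [10, 12, 11, 13, 12]
b = [8, 10, 9, 11, 9]
print(round(cohens_d(a, b), 3))   # a large effect, close to 1.93
```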

24. Study Guide for Selected Questions

Common Statistical Topics:
  • Z Tests: Implications, when to apply.
  • Sampling Distributions: Understand distribution of means vs. individual scores.
  • Power of Tests: Definitions, implications of sample size, effect size, alpha levels, and one-tailed vs two-tailed tests.

25. Final Summary of Assumptions for z Tests

  • Assumptions: Random sampling, independence, normal distribution (or large sample size), and known population standard deviation.

26. Additional Insights

  • Understanding Differences in Distributions: Knowledge of population distributions versus sampling distributions, along with standard deviation and standard error distinctions, is vital for accurate statistical analysis.
  • Statistical Power and its Components: Recognizing the factors influencing power, such as sample size and effect size, allows for improved anticipation of a test's ability to detect true differences while minimizing error risks.

27. Multiple Choice Questions Review

  • A review of multiple-choice questions covering concept definitions, statistical properties, and applications, to ensure a comprehensive grasp of the subject matter.