Correlation and t-Tests Study Guide

Chapter 6 Study Guide: Correlation and Statistics

1. Definition of Correlation

  • Correlation: A statistical measure that expresses the extent to which two variables are linearly related. It indicates both the strength and direction of a relationship between two variables.

2. Important Uses of Correlation

  • Prediction: Utilized to predict values of one variable based on the values of another.
  • Testing Validity: Applied to test the validity of theoretical constructs or methods.
  • Theory Development: Assists in formulating and refining theories based on observed relationships.

3. Scatterplots

  • Definition: A visual representation showing the relationship between two numerical variables.
  • Construction:
      - Plot paired values as points on the Cartesian plane, where each point represents a pair of values (X, Y).
  • Interpretation:
      - Direction: Can be upward (positive correlation) or downward (negative correlation).
      - Form: Identifies whether the relationship is linear (straight line) or nonlinear (curved).
      - Strength: Observed by how closely the points cluster around a line (the tighter the cluster, the stronger the correlation).
      - Outliers: Points that deviate significantly from the overall pattern, which can influence the correlation.

4. Equation of a Straight Line

  • Formula: Y = bX + a
      - Components:
        - Y: Predicted variable (dependent variable).
        - X: Predictor variable (independent variable).
        - b (slope): Indicates the change in Y for each 1-unit increase in X.
        - a (Y-intercept): Value of Y when X equals zero.
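The slope-intercept idea translates directly into code. A minimal Python sketch, with a hypothetical slope and intercept chosen only for illustration:

```python
def predict(x, b, a):
    """Predicted Y on the line Y = bX + a."""
    return b * x + a

# Hypothetical line: slope b = 2.5, intercept a = 10.0
print(predict(0, 2.5, 10.0))  # 10.0  (the Y-intercept: Y when X = 0)
print(predict(1, 2.5, 10.0))  # 12.5  (each 1-unit increase in X adds b = 2.5)
```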

5. Types of Relationships

  • Positive Relationship: Both X and Y increase together.
  • Negative Relationship: As X increases, Y decreases (the variables move in opposite directions, not both downward).
  • Perfect Relationship: All points fall exactly on a straight line, indicated when r = +1.00 or r = -1.00.
  • Imperfect Relationship: Points scatter around a line, represented by values of r that fall between -1 and +1.

6. Correlation Coefficient & Pearson’s r

  • Correlation Coefficient: A numerical measure indicating the strength and direction of the relationship.
      - Pearson’s r: Specifically measures linear relationships and requires interval or ratio data.
      - Range: Values between -1.00 and +1.00.
      - Definitional Formula (deviation scores):
    r = Σ[(X - Mx)(Y - My)] / √( Σ(X - Mx)² · Σ(Y - My)² )
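The deviation-score formula maps term by term onto code. A small Python sketch using made-up data chosen to show the perfect-correlation case:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of deviation cross-products divided by the
    square root of the product of the summed squared deviations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfectly linear, so r should be +1.0
print(pearson_r(x, y))      # 1.0
```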

7. Second Interpretation of r (r²)

  • Coefficient of Determination: r² = proportion of variance explained.
      - Example: If r = 0.60, then r² = 0.36, meaning 36% of the variance in Y is explained by X.

8. Choosing the Correct Correlation

  • Key Factors:
      - Level of Measurement: Types include nominal, ordinal, interval, and ratio levels.
      - Shape of Relationship: Determine if the relationship is linear or nonlinear.
  • Other Types of Correlation:
      - Spearman’s rho (ρ): Applicable for ordinal (ranked) data.
      - Phi (φ): Suitable for two dichotomous (binary) variables.
      - Point-biserial correlation: Involves one dichotomous and one continuous variable.
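As one illustration of matching the statistic to the level of measurement, Spearman's rho can be computed from ranks alone. This sketch uses the common shortcut formula, which assumes no tied ranks:

```python
def spearman_rho(xs, ys):
    """Spearman's rho for ranked data (no ties):
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between paired ranks."""
    def rank(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for pos, i in enumerate(order, start=1):
            r[i] = pos          # rank 1 = smallest value
        return r
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Monotonic but nonlinear data still yields a perfect rho:
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```

Note how rho rewards any monotonic relationship, while Pearson's r specifically measures linear ones.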

9. Range, Outliers, and Correlation

  • Restricted Range: Limiting the range of scores reduces variability, making the apparent correlation weaker than the correlation present in the full data.
  • Outliers: Extreme scores can either inflate or weaken the correlation; it is vital to review scatterplots for outliers.

10. Key Skills for Exams

  • Interpret r: Understand both the direction and strength of correlation.
  • Calculate or recognize r and r²: Be comfortable working with correlation coefficients.
  • Choose the correct correlation type: Differentiate which correlation analysis is appropriate for the data at hand.

11. Final Quick Summary

  • Definitions:
      - r = strength + direction
      - r² = variance explained

12. Inferential vs Descriptive Statistics

  • Inferential Statistics: Draws conclusions about populations based on sample data, relevant for estimation, prediction, and hypothesis testing.
  • Descriptive Statistics: Focuses on summarizing and organizing data without generalizing beyond the sample.
  Two Goals of Inferential Statistics:
  1. Estimation: Estimating population parameters from sample statistics.
  2. Hypothesis Testing: Evaluating claims about population parameters using sample data.

13. Random Sampling Methods

  • With Replacement: The selected individual is returned to the population and can be chosen again.
  • Without Replacement: The selected individual is removed and cannot be chosen again.

14. A Priori vs A Posteriori Probability

  • A Priori Probability (Theoretical): Based on logical reasoning or established structures, not on experimental data.
      - Formula: P = favorable outcomes / total outcomes
  • A Posteriori Probability (Empirical): Based on actual observed data from trials or experiments.
      - Formula: P = number of observed successes / total trials
  • Long-Run Behavior: As the number of trials approaches infinity, a posteriori probability converges toward a priori probability.
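The long-run convergence can be demonstrated with a quick simulation. A minimal sketch assuming a fair six-sided die, so the a priori probability of rolling a 3 is 1/6:

```python
import random

def empirical_probability(event, trials, rng):
    """A posteriori estimate: observed successes / total trials."""
    return sum(1 for _ in range(trials) if event(rng)) / trials

rng = random.Random(42)   # seeded so the run is reproducible
a_priori = 1 / 6          # theoretical P(roll a 3) on a fair die
for n in (100, 10_000, 100_000):
    est = empirical_probability(lambda r: r.randint(1, 6) == 3, n, rng)
    print(n, round(est, 4))   # estimates drift toward 0.1667 as n grows
```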

15. Addition Rule of Probability

  • General Formula: For finding the probability that event A or event B occurs.
    P(A or B) = P(A) + P(B) - P(A and B)
  • Terms:
      - P(A), P(B) = probability of each event.
      - P(A and B) = probability of both events occurring simultaneously.
  • Mutually Exclusive Events: If events A and B cannot occur simultaneously, then P(A and B) = 0, simplifying the formula to:
    P(A or B) = P(A) + P(B).
  • Exhaustive Events: Set of events that covers all possible outcomes; one of them must occur.
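The addition rule can be checked with a classic card example. A short sketch; the deck probabilities are standard, and the function is a direct transcription of the formula:

```python
def p_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule; pass p_a_and_b=0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

# One card from a standard 52-card deck: P(heart or king).
# Subtracting P(king of hearts) avoids counting that card twice.
p_heart = 13 / 52
p_king = 4 / 52
p_king_of_hearts = 1 / 52
print(p_a_or_b(p_heart, p_king, p_king_of_hearts))  # 16/52, about 0.3077

# Mutually exclusive: one card cannot be both a heart and a spade.
print(p_a_or_b(13 / 52, 13 / 52))  # 0.5
```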

16. Multiplication Rule of Probability

  • Definition: Used to find the probability that both events A and B occur.
  • General Formula: P(A and B) = P(A) × P(B | A)
  • Key Terms:
      - P(B | A): The conditional probability of B given that A has occurred.
  • Types of Events:
      - Independent Events: Events where the occurrence of one does not affect the other.
      - Dependent Events: Events where the occurrence of one affects the probability of the other.
      - Mutually Exclusive Events: Cannot occur together.
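The dependent-versus-independent distinction shows up cleanly in a without-replacement example. A sketch using exact fractions so the arithmetic stays visible:

```python
from fractions import Fraction

def p_a_and_b(p_a, p_b_given_a):
    """General multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

# Dependent events: drawing two aces without replacement from 52 cards.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace already removed
print(p_a_and_b(p_first_ace, p_second_ace_given_first))  # 1/221

# Independent events: P(B | A) = P(B), e.g. two fair coin flips both heads.
print(p_a_and_b(Fraction(1, 2), Fraction(1, 2)))  # 1/4
```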

17. Continuous Variables & Probability

  • For continuous variables, probabilities are not calculated for exact values but rather over ranges.

18. t Tests Overview

  1. What is a t Test?: A statistical test used to determine whether there is a significant difference between the means of two groups.
  2. When to Use a t Test:
      - Appropriate when comparing means and the population standard deviation is unknown (it must be estimated from the sample).
      - Connection to z-scores: Raw scores can be converted into z-scores to compare how far a value is from the mean.
      - z-score formula: z = (X - M) / SD
      - Probability Finding Steps:
        1. Convert X to z-score.
        2. Utilize the z-table (normal distribution) to find the probability under the curve.
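The two probability-finding steps above can be sketched with Python's standard library, using the error function in place of a printed z-table. The mean and SD below are hypothetical, chosen for a clean example:

```python
import math

def z_score(x, mean, sd):
    """Step 1: convert a raw score to a z-score, z = (X - M) / SD."""
    return (x - mean) / sd

def normal_cdf(z):
    """Step 2: P(Z <= z) under the standard normal curve,
    computed from the error function instead of a z-table."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical distribution: M = 100, SD = 15. What is P(X <= 130)?
z = z_score(130, 100, 15)
print(z)               # 2.0
print(normal_cdf(z))   # about 0.9772
```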

19. Key Skills for Exams on t Tests

  • Distinguish between inferential and descriptive statistics.
  • Apply the addition and multiplication rules correctly.
  • Identify independent vs dependent events.
  • Convert raw scores to z-scores.
  • Interpret probability from the normal distribution.

20. Summary of t Tests

  • Types of t Tests:
      1. One-Sample t Test: compares the sample mean to a known population mean.
      2. Independent-Samples t Test: compares the means of two different groups.
      3. Dependent-Samples t Test (Paired t Test): compares means from the same participants measured twice.
  • t Statistic Formula: t = (difference between means) / (variability of the difference)
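For the one-sample case, the general formula becomes t = (M - μ) / (s / √n). A minimal sketch with made-up scores (the hypothesized population mean of 50 is arbitrary):

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """One-sample t: (sample mean - hypothesized mean) / estimated
    standard error (s / sqrt(n)). Returns (t, degrees of freedom)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # stdev() uses the n-1 denominator
    return (mean(sample) - mu) / se, n - 1

# Hypothetical data: did this sample come from a population with mu = 50?
scores = [52, 55, 48, 57, 53, 51]
t, df = one_sample_t(scores, 50)
print(round(t, 3), df)   # t is about 2.08 with df = 5
```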

21. Degrees of Freedom (df)

  • Explanation: The number of independent values in an analysis that are free to vary without violating any constraints. For example:
      - Independent-samples t test: df = n1 + n2 - 2
      - One-sample t test: df = n - 1.
  • More degrees of freedom provide a more accurate estimate of the parameter being analyzed.
  • It affects the shape of the t distribution, which has heavier tails than the normal curve; as df increases, the t distribution approaches the normal distribution.

22. Assumptions of t Tests

  • Random Sampling: Selection of subjects must be random.
  • Independence of Observations: Each measurement must not influence another.
  • Normal Distribution: Data should be normally distributed, especially crucial for small sample sizes.
  • Equal Variances: Certain t-tests require equal variances across compared groups.

23. Effect Size (Cohen’s d)

  • Definition: A measure of practical significance that indicates the magnitude of difference, not merely statistical significance.
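For two independent groups, Cohen's d is the mean difference divided by the pooled standard deviation. A sketch with hypothetical data; the 0.2/0.5/0.8 small/medium/large benchmarks are Cohen's conventional labels:

```python
import math
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical groups; conventionally d ~ 0.2 small, 0.5 medium, 0.8 large.
a = [10, 12, 11, 13, 12]
b = [8, 10, 9, 11, 9]
print(round(cohens_d(a, b), 3))   # a large effect, close to 1.93
```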

24. Study Guide for Selected Questions

Common Statistical Topics:
  • Z Tests: Implications, when to apply.
  • Sampling Distributions: Understand distribution of means vs. individual scores.
  • Power of Tests: Definitions, implications of sample size, effect size, alpha levels, and one-tailed vs two-tailed tests.

25. Final Summary of Assumptions for z Tests

  • Assumptions: Random sampling, independence, normal distribution (or large sample size), and known population standard deviation.

26. Additional Insights

  • Understanding Differences in Distributions: Knowledge of population distributions versus sampling distributions, along with standard deviation and standard error distinctions, is vital for accurate statistical analysis.
  • Statistical Power and its Components: Recognizing the factors influencing power, such as sample size and effect size, allows for improved anticipation of a test's ability to detect true differences while minimizing error risks.

27. Multiple Choice Questions Review

  • A review of multiple-choice questions covering concept definitions, statistical properties, and applications, to ensure a comprehensive grasp of the subject matter.