Measures of Association

Differences vs Relationships: Key Questions
  • Faculty Satisfaction: Are faculty in one department more satisfied than those in another? To assess this, surveys measuring job satisfaction, work-life balance, and available resources can be employed, and mean satisfaction scores can then be compared across departments.

  • Cookie Consumption: Do children consume more red cookies than blue cookies? This could be tested in controlled settings to ensure reliability, while factors such as preferences, color psychology, and peer influence are accounted for.

Correlation: Definitions and Uses
  • Definition: Correlation is a statistical procedure that describes both the strength and direction of the relationship between two variables. This is crucial in various fields such as psychology, medicine, and social sciences to understand patterns of behavior.

  • Tendency: Two variables may vary together; an increase in one tends to accompany an increase in the other (positive correlation), or an increase in one tends to accompany a decrease in the other (negative correlation).

  • Uses: 1. Describe the pattern of change in two variables (Privitera, 2012), which can highlight significant behavioral trends.

    2. Determine if the observed pattern in a sample is present in the population, thereby allowing for generalizations.

Correlation Coefficient
  • Definition: A strategy for quantifying the relationship between variables, facilitating a clearer understanding of their interaction.

  • Uses: 1. Measure the strength and direction of the relationship, providing insights into possible dependencies between variables.

    2. Validate if the sample pattern is present in the broader population, enhancing the reliability of the conclusions drawn.

  • Types of Correlation Coefficients (each is computed in the sketch after this list): - Pearson's r (used for linear relationships among interval/ratio data)

    • Spearman's rho (measuring correlation for ordinal data)

    • Point-biserial (for continuous vs. dichotomous variables)

    • Pearson's chi-square (assessing relationships in categorical data).
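
To make these options concrete, here is a minimal sketch using Python's scipy.stats; all data arrays are toy values invented for illustration:

```python
import numpy as np
from scipy import stats

# Toy data (invented for illustration).
hours_studied = np.array([2, 4, 5, 7, 8, 10])          # interval/ratio
exam_score    = np.array([55, 62, 70, 74, 80, 88])     # interval/ratio
class_rank    = np.array([6, 5, 4, 3, 2, 1])           # ordinal
passed        = np.array([0, 0, 1, 1, 1, 1])           # dichotomous

# Pearson's r: linear relationship between two interval/ratio variables.
r, p_r = stats.pearsonr(hours_studied, exam_score)

# Spearman's rho: monotonic relationship based on ranks (ordinal data).
rho, p_rho = stats.spearmanr(hours_studied, class_rank)

# Point-biserial: one continuous and one dichotomous variable.
r_pb, p_pb = stats.pointbiserialr(passed, exam_score)

# Pearson's chi-square: association between two categorical variables,
# given a contingency table of observed frequencies.
table = np.array([[20, 10],   # hypothetical 2x2 table of counts
                  [15, 25]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, "
      f"point-biserial r = {r_pb:.2f}, chi-square = {chi2:.2f}")
```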

Caution: Correlation vs. Causation
  • Important Warning: Correlation does not imply causation. Just because two variables correlate does not mean one causes the other to change. It is crucial to conduct further analyses or controlled studies to establish a causal relationship.

Pearson's Product-Moment Correlation Coefficient (r)
  • Definition: A measure of the linear relationship of two factors with data on an interval or ratio scale, which provides a detailed understanding of their relationship.

  • Assumptions (a rough check appears in the sketch after this list): 1. Linearity: Data should be describable by a straight line.

    2. Normality: Data points should be normally distributed within the sample and the population.

    3. Bivariate Normal Distribution: Together, the data points from both variables form a normal distribution.
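
These assumptions are usually screened before computing r. Below is a minimal sketch of a rough normality screen with scipy, using invented toy data; marginal normality of each variable does not guarantee bivariate normality, but it is a sensible first check:

```python
import numpy as np
from scipy import stats

# Toy data (invented for illustration).
x = np.array([2.1, 3.4, 4.0, 4.8, 5.5, 6.1, 7.2, 8.0])
y = np.array([1.9, 3.0, 4.4, 4.9, 5.2, 6.5, 7.0, 8.3])

# Rough screen for the normality assumption: Shapiro-Wilk on each
# variable (H0: the data are normally distributed).
for name, data in [("x", x), ("y", y)]:
    w, p = stats.shapiro(data)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")

# Linearity is usually screened visually with a scatterplot of y against x.
```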

  • Interpretation of Pearson's r: - Positive correlation: As one variable increases, the other increases (e.g., height and weight).

    • Negative correlation: As one variable increases, the other decreases (e.g., number of hours studied and amount of time spent on social media).

    • Scale of strength (judged by the absolute value of r; the sign gives the direction; see the helper sketched below):

      • |r| = 0.00 to 0.10: Little to no relationship

      • |r| = 0.10 to 0.30: Weak relationship

      • |r| = 0.30 to 0.50: Moderate relationship

      • |r| = 0.50 to 1.00: Strong relationship.
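
As a quick illustration of this scale, a small helper function (the name describe_r is mine, not standard) could label an observed coefficient:

```python
def describe_r(r: float) -> str:
    """Label a correlation's strength by |r|, following the scale above."""
    strength = abs(r)
    if strength < 0.10:
        label = "little to no"
    elif strength < 0.30:
        label = "weak"
    elif strength < 0.50:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    return f"{label} {direction} relationship"

print(describe_r(0.42))   # moderate positive relationship
print(describe_r(-0.75))  # strong negative relationship
```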

Testing the Null Hypothesis
  • Null Hypothesis: In hypothesis testing for correlation, the null hypothesis typically states that there is no linear relationship between the variables examined (H0: ρ = 0).

  • Alternative Hypotheses: - Non-directional alternative: Suggests that some linear relationship exists in either direction (ρ ≠ 0).

    • Directional alternative: Suggests a specific direction of the relationship, either negative (ρ < 0) or positive (ρ > 0).

Steps for Hypothesis Testing with Pearson's r
  1. Select Statistical Test: Choose the test (here, Pearson's r) and set the significance level (commonly 0.05).

  2. Select Sample and Collect Data: Describe the sample used for analysis, ensuring it's representative of the population.

  3. Determine Regions of Rejection: Based on alpha, hypothesis type, and degrees of freedom (calculated as N-2).

  4. Calculate Test Statistic: Analyze the data with Pearson's r, ensuring all assumptions are met.

  5. Make Statistical Decision: Reject the null hypothesis if the test statistic falls in the region of rejection (p < alpha); otherwise retain it.

  6. Interpretation: Report the strength and direction of the relationship along with statistical significance, including Pearson's r, degrees of freedom, and the p-value. (The full sequence is sketched below.)
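
The six steps can be traced in code. A minimal sketch with scipy.stats, assuming a two-tailed test at alpha = 0.05 and toy data invented for illustration:

```python
import numpy as np
from scipy import stats

alpha = 0.05  # Step 1: significance level for a two-tailed test

# Step 2: sample data (invented for illustration).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9])

# Step 3: region of rejection from df = N - 2; the critical value of r
# follows from the critical t value via r = t / sqrt(t^2 + df).
n = len(x)
df = n - 2
t_crit = stats.t.ppf(1 - alpha / 2, df)
r_crit = t_crit / np.sqrt(t_crit**2 + df)

# Step 4: calculate the test statistic.
r, p = stats.pearsonr(x, y)

# Step 5: statistical decision.
decision = "reject H0" if p < alpha else "retain H0"

# Step 6: interpretation/report.
print(f"r({df}) = {r:.2f}, p = {p:.4f} -> {decision} "
      f"(critical |r| = {r_crit:.2f})")
```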

Cautions with Correlation Coefficients
  • While Pearson's r provides valuable insights, it is not valid for non-linear relationships; a strong curvilinear relationship can produce an r near zero. Inspect a scatterplot for curvilinear patterns to avoid misleading conclusions.

Point-Biserial Correlation Coefficient
  • Definition: Used for relationships between one continuous variable and one dichotomous variable (e.g., gender, yes/no responses), providing insights into differences in continuous scores across groups.

  • Indicates: Whether scores on the continuous variable tend to be higher in one group than in the other, helping to highlight significant disparities.

  • Interpretation: Significance is assessed similarly to Pearson's r; a larger coefficient denotes a greater difference between the groups, offering a deeper understanding of the impact of the dichotomous variable. (A minimal example follows.)
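
A minimal sketch with scipy.stats.pointbiserialr, using invented scores for two groups coded 0/1:

```python
import numpy as np
from scipy import stats

# Toy data (invented): scores for two groups coded 0/1.
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])            # dichotomous
scores = np.array([60, 65, 62, 58, 75, 80, 78, 82])    # continuous

# Point-biserial r: its size reflects how far apart the group means are
# relative to the overall spread of the scores.
r_pb, p = stats.pointbiserialr(group, scores)
print(f"r_pb = {r_pb:.2f}, p = {p:.4f}")

# Sanity check: the point-biserial coefficient is Pearson's r computed
# with a 0/1 variable, so pearsonr gives the same value.
r, _ = stats.pearsonr(group, scores)
assert np.isclose(r_pb, r)
```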

Regression
  • Definition: A statistical procedure used to determine the regression line, which allows us to predict one variable based on another, highlighting real-world applications in forecasting.

  • Types of Regression: 1. Linear Regression: A bivariate technique predicting outcome variable scores based on one predictor variable, useful in settings like sales forecasting.

    2. Multiple Regression: Involves predicting scores based on multiple independent variables, allowing for a multifaceted analysis that reflects real-world complexity.

  • Key Terms: - Independent Variable (X): The known variable used to predict another variable.

    • Dependent Variable (Y): The variable that is predicted by the independent variable, critical for understanding the effects of changes in X.

    • Regression Equation: Defines the relationship between the variables, Ŷ = bX + a, where the slope b represents the change in Y for a one-unit change in X and the intercept a is the value of Y when X is zero (see the sketch below).
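
A minimal sketch of fitting and using the regression equation with scipy.stats.linregress; the toy data and variable names (hours of sleep, mood rating) are invented for illustration:

```python
import numpy as np
from scipy import stats

# Toy data (invented): hours of sleep (X) predicting a mood rating (Y).
X = np.array([4, 5, 6, 6, 7, 8, 9], dtype=float)
Y = np.array([3, 4, 5, 6, 6, 8, 9], dtype=float)

# Fit the regression line Y-hat = bX + a.
fit = stats.linregress(X, Y)
b, a = fit.slope, fit.intercept

# Use the equation to predict Y for a new X value.
x_new = 7.5
y_hat = b * x_new + a
print(f"Y-hat = {b:.2f}X + {a:.2f}; prediction at X = {x_new}: {y_hat:.2f}")
```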

Assumptions of Regression
  • Independent Random Sampling: Data points should be independent of each other, ensuring unbiased results.

  • Linearity: A straight line should best describe data patterns, as non-linear relationships will distort predictions.

  • Normality: Population data points should be normally distributed, supporting valid inferential statistics.

  • Homoscedasticity: Similar variance for all X-values should be present, ensuring consistent error variance across levels of X.

  • No Multicollinearity: In multiple regression, predictor variables should not be strongly correlated with one another, as this makes it difficult to identify which predictors are significant (rough diagnostics are sketched below).
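
Some of these assumptions can be screened with simple diagnostics. A rough sketch with scipy and numpy, using invented toy data; homoscedasticity is typically checked visually rather than in code:

```python
import numpy as np
from scipy import stats

# Toy data (invented): two predictors and one outcome.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
y  = np.array([2.2, 2.8, 4.1, 4.5, 6.2, 6.4, 8.1, 8.3])

# Linearity and normality: inspect residuals from a simple fit of y on x1.
fit = stats.linregress(x1, y)
residuals = y - (fit.slope * x1 + fit.intercept)

# Normality of residuals: Shapiro-Wilk test (H0: normally distributed).
w, p_norm = stats.shapiro(residuals)

# Multicollinearity: correlation between predictors; values near +/-1
# suggest the predictors carry largely redundant information.
r_pred = np.corrcoef(x1, x2)[0, 1]

print(f"Shapiro-Wilk p = {p_norm:.3f}; predictor correlation = {r_pred:.2f}")
# Homoscedasticity is usually checked visually: plot the residuals
# against x1 and look for a roughly constant spread across X.
```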

Analysis of Regression
  • Definition: Used to test hypotheses about predictor variables and to determine predictive validity of regression equations, playing a pivotal role in data-driven decision-making.

  • Null Hypothesis: The outcome variable is not related to the predictor variable under analysis, asserting no effect.

  • Data Collection and Analysis Steps: Select a representative sample, calculate the test statistic, and interpret the output, including the coefficient of determination (R²) and the F statistic, to evaluate model effectiveness (a minimal sketch follows).
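
For a single predictor, R² and the F statistic can be recovered from scipy.stats.linregress, using the identity F = t² that holds in simple regression; the data are toy values invented for illustration:

```python
import numpy as np
from scipy import stats

# Toy data (invented for illustration).
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
Y = np.array([1.8, 3.1, 3.9, 5.2, 5.8, 7.1, 8.2, 8.8, 10.1, 10.9])

fit = stats.linregress(X, Y)

# Coefficient of determination: proportion of variance in Y explained.
r_squared = fit.rvalue ** 2

# F statistic for one predictor: F = t^2, with df = (1, N - 2).
n = len(X)
t_stat = fit.slope / fit.stderr        # t test on the slope
F = t_stat ** 2
p_F = stats.f.sf(F, 1, n - 2)

print(f"R^2 = {r_squared:.3f}, F(1, {n - 2}) = {F:.2f}, p = {p_F:.4f}")
```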