Regression

COMG 102: Everyday Communication with Numbers - Regression

1. Overview of Regression

  • Definition of Regression: Regression analysis is a statistical method used to determine the relationship between two or more variables and assess how well one can predict another.

2. Recap: Correlation

  • Definition of Correlation: Correlation refers to the relationship between two continuous variables.

  • Graphical Representation: The relationship is visually represented using a scatterplot.

  • Statistical Representation: A correlation test is employed to quantify the relationship.

  • Correlation Coefficient (r): This statistic measures the strength and direction of the relationship between variables.

  • Calculation: The correlation coefficient can be calculated from z-scores: standardize each variable, multiply the paired z-scores, and average the products.

  • Interpretation: The correlation coefficient indicates both the direction (positive/negative) and strength (weak/strong) of the relationship.
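
The z-score calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a course-provided implementation: the function name pearson_r and the study-time data are invented for the example, and the population standard deviation (dividing by n) is assumed.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient as the mean product of paired z-scores.

    Assumes population standard deviations (divide by n).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Standardize each score, then average the products of the paired z-scores.
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n

# Hypothetical data, invented for illustration only.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [55, 60, 70, 72, 83]

r = pearson_r(study_hours, exam_scores)
print(round(r, 2))  # about 0.98: a strong positive relationship
```

The sign of r gives the direction of the relationship and its distance from zero gives the strength.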

3. Correlation vs. Regression

  • Purpose of Both Analyses: Both correlation and regression are used to assess relationships between two variables.

  • Independent and Dependent Variables: Regression designates at least one independent variable (predictor) and one dependent variable (outcome), while correlation treats the two variables symmetrically.

  • Multiple Independent Variables: Regression can include multiple predictors, allowing for more complex analyses.

  • Use of Correlation: Correlation indicates whether and how strongly two variables are related, but by itself it does not give an equation for predicting one variable from the other.

  • Use of Regression: Regression specifically enables predictions based on relationships.

4. Functions of Regression

  • Prediction/Forecasting: Regression can be used to predict outcomes based on the values of independent variables.

  • Explanation of Change: It estimates the extent to which changes in the independent variable(s) account for changes in the dependent variable.

  • Identifying Spurious Relationships: Regression helps to detect spurious relationships, i.e., cases where an independent variable appears to be related to the outcome but has no actual relationship with the observed changes.

  • Overlapping Effects: It allows for the examination of overlapping effects of independent variables on the dependent variable.

  • Impact Determination: Regression assesses how much independent variables influence the dependent variable.

5. Correlation vs. Regression Example

  • Example Question: How does study time relate to exam scores?

    • Correlation Analysis: Investigate if there is a correlation between study time and exam scores.

    • Regression Analysis: Determine how much study time contributes to exam scores, e.g., predict the expected exam score for a student who studied for 4 hours.

6. Line of Best Fit

  • Purpose: The line of best fit is used to summarize the relationship between variable X (study time) and variable Y (exam score).

  • Creating a Scatterplot: A scatterplot visualizes paired X and Y scores.

  • Fitting the Line: A straight line is fitted to the data points in the scatterplot, representing the best estimate of the relationship.

  • Importance of the Line Equation: The equation of this line will provide insights into the relationship and aid in answering prediction questions.

7. Linear Regression Formula

  • General Formula: The linear regression equation is given by: y = bx + a

    • y: the predicted value of the dependent variable.

    • b: the slope of the line of best fit, indicating the change in y for each unit change in x.

    • x: a specific value of the independent variable.

    • a: the intercept (the value of y when x = 0).
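
As a rough sketch of how the formula is used, the slope b and intercept a can be estimated with ordinary least squares and then plugged into y = bx + a. The function names fit_line and predict and the data are hypothetical, chosen only to mirror the study-time example.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: chosen so the line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return b, a

def predict(x, b, a):
    """Predicted y for a given x, using y = b*x + a."""
    return b * x + a

# Hypothetical data, invented for illustration only.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [55, 60, 70, 72, 83]

b, a = fit_line(study_hours, exam_scores)
predicted = predict(4, b, a)
print(round(b, 1), round(a, 1), round(predicted, 1))  # 6.8 47.6 74.8
```

With these made-up numbers, each additional hour of study is associated with about 6.8 more points, and a student who studied 4 hours has a predicted score of about 74.8.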

8. Important Notes on Regression

  • Causation vs. Correlation: It is crucial to understand that regression does not imply causation. Predictions made from data must be interpreted carefully.

  • External Influences: For example, both study time and test scores could be influenced by external factors like motivation and intelligence.

  • Linear Relationships Only: Linear regression captures straight-line relationships; it may not accurately describe curvilinear relationships or ceiling effects (when values of a variable plateau).
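
The linearity caveat can be illustrated with a small sketch. The data are invented so that y depends perfectly on x through a U-shaped curve (y equals x squared); the helper name pearson_r is hypothetical. Even though y is fully determined by x, the correlation coefficient comes out as zero, so a straight line would suggest no relationship.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from sums of deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: a perfect curvilinear (U-shaped) relationship, y = x squared.
x_values = [-2, -1, 0, 1, 2]
y_values = [x ** 2 for x in x_values]  # [4, 1, 0, 1, 4]

r = pearson_r(x_values, y_values)
print(r)  # 0.0: the linear measure misses the curved pattern entirely
```

This is why a scatterplot is worth checking before fitting a line.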

9. Bivariate & Multivariate Regression

  • Bivariate Regression: Analyzes relationships between two variables.

  • Multivariate Regression: Explores relationships among multiple variables.

    • Partial Correlations: This analysis helps understand the unique contributions of different independent variables.

    • Causal Relationships: Multivariate regression provides insights into potential causal relationships and aids in hypothesis testing regarding levels of covariance among variables.

10. Multivariate Regression Example

  • House Price Estimation: To estimate the cost of a house, one may collect various data points such as:

    • Location

    • Number of bedrooms

    • Square footage

    • Availability of facilities

  • Prediction Based on Data: The price of a home is predicted based on the interrelation of these variables.
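
A minimal sketch of a regression with two predictors follows. Everything here is an assumption made for illustration: the data are synthetic and were constructed to follow price = 50 + 20 × bedrooms + 10 × size exactly (price in thousands, size in hundreds of square feet), so the fit can be checked by eye. Location and facilities are left out because categorical variables would first need a numeric encoding. The function names are hypothetical.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data: (bedrooms, size in hundreds of square feet).
houses = [(2, 9), (3, 12), (3, 15), (4, 16), (4, 20), (5, 24)]
prices = [180, 230, 260, 290, 330, 390]  # in thousands, built from the formula above

a, b_bedrooms, b_size = fit_multiple_regression(houses, prices)

# Predicted price for a hypothetical 3-bedroom house of size 14.
predicted_price = a + b_bedrooms * 3 + b_size * 14
print(round(a), round(b_bedrooms), round(b_size), round(predicted_price))  # 50 20 10 250
```

Each coefficient describes the change in predicted price for a one-unit change in that predictor while the other predictor is held constant. Real data would not fit this cleanly.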

11. Research Findings in Regression

  • Example Correlation Coefficients:

    • r (shoe size, height) = 0.73

    • r (shoe size, sex) = 0.83

    • r (sex, height) = 0.43

12. Further Correlation Coefficients in Multivariate Regression

  • Additional Findings:

    • r (shoe size, height) = 0.62

    • r (shoe size, sex) = 0.75

13. Popularity and Cautions of Regression

  • Popularity: Regression analysis is widely utilized in statistical modeling and analysis.

  • Reporting Findings: Results from regression are frequently reported with the term "associated"; be cautious and avoid hastily drawing causal conclusions from the relationships observed.