Regression
COMG 102: Everyday Communication with Numbers - Regression
1. Overview of Regression
Definition of Regression: Regression analysis is a statistical method used to determine the relationship between two or more variables and assess how well one can predict another.
2. Recap: Correlation
Definition of Correlation: Correlation describes how strongly, and in which direction, two continuous variables vary together.
Graphical Representation: The relationship is visually represented using a scatterplot.
Statistical Representation: A correlation test is employed to quantify the relationship.
Correlation Coefficient (r): This statistic measures the strength and direction of the relationship between variables.
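Calculation: r can be computed from z-scores: standardize each variable, multiply the paired z-scores, and average the products (dividing by n - 1 for a sample).
Interpretation: r ranges from -1 to +1. The sign gives the direction of the relationship (positive/negative), and the absolute value gives its strength (values near 0 are weak, values near 1 are strong).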
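The z-score calculation of r can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names and the five data points (hours studied and exam scores) are made up for the example.

```python
import math

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Pearson r: sum of paired z-score products divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data (assumption): hours studied and exam scores for five students.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [55, 60, 70, 72, 85]

r = correlation(study_hours, exam_scores)
print(f"r = {r:.3f}")
```

With these invented numbers r comes out close to +1, which would be read as a strong positive relationship.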
3. Correlation vs. Regression
Purpose of Both Analyses: Both correlation and regression are used to assess relationships between two variables.
Independent and Dependent Variables: Regression analyses generally involve one independent variable (predictor) and one dependent variable, while correlation does not emphasize this distinction as strongly.
Multiple Independent Variables: Regression can include multiple predictors, allowing for more complex analyses.
Use of Correlation: Correlation indicates whether, and how strongly, two variables are related, but on its own it does not produce a predicted value of one variable from the other.
Use of Regression: Regression specifically enables predictions based on relationships.
4. Functions of Regression
Prediction/Forecasting: Regression can be used to predict outcomes based on the values of independent variables.
Explanation of Change: Regression estimates how much of the change in the dependent variable is accounted for by changes in the independent variable(s).
Identifying Spurious Relationships: By including several independent variables at once, regression helps reveal when an apparent relationship disappears once other variables are taken into account (i.e., a spurious relationship).
Overlapping Effects: It allows for the examination of overlapping effects of independent variables on the dependent variable.
Impact Determination: Regression assesses how much independent variables influence the dependent variable.
5. Correlation vs. Regression Example
Example Question: How does study time relate to exam scores?
Correlation Analysis: Investigate if there is a correlation between study time and exam scores.
Regression Analysis: Estimate how much exam scores change with each additional hour of study, and use that estimate to predict, e.g., the expected exam score for a student who studied for 4 hours.
6. Line of Best Fit
Purpose: The line of best fit is used to summarize the relationship between variable X (study time) and variable Y (exam score).
Creating a Scatterplot: A scatterplot visualizes paired X and Y scores.
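Fitting the Line: A straight line is fitted to the data points in the scatterplot so that it lies as close as possible to all of them (conventionally, by minimizing the sum of squared vertical distances between the points and the line).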
Importance of the Line Equation: The equation of this line will provide insights into the relationship and aid in answering prediction questions.
7. Linear Regression Formula
General Formula: The linear regression equation is given by: y = bx + a
y: the predicted value of the dependent variable.
b: the slope of the line of best fit, indicating the change in y for each unit change in x.
x: a specific value of the independent variable.
a: the intercept (the value of y when x = 0).
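To make the formula concrete, the sketch below fits y = bx + a by ordinary least squares and then answers the 4-hour prediction question from the study-time example. The data values and the function name fit_line are assumptions made for illustration; they are not taken from the course material.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return b, a

# Hypothetical data (assumption): hours studied and exam scores for five students.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [55, 60, 70, 72, 85]

b, a = fit_line(study_hours, exam_scores)
predicted = b * 4 + a  # plug x = 4 hours into y = bx + a

print(f"y = {b:.1f}x + {a:.1f}")
print(f"Predicted score for 4 hours of study: {predicted:.1f}")
```

For these invented numbers the fitted line is y = 7.2x + 46.8, so each extra hour of study goes with about 7.2 more points, and a student who studied for 4 hours is predicted to score about 75.6.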
8. Important Notes on Regression
Causation vs. Correlation: A regression relationship does not by itself imply causation. Being able to predict y from x does not show that x causes y, so predictions must be interpreted carefully.
External Influences: For example, both study time and test scores could be influenced by external factors like motivation and intelligence.
Linear Relationships Only: Linear regression captures straight-line relationships; it may not accurately describe curvilinear relationships or ceiling effects (when the values of a variable plateau).
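The linear-only caveat can be seen in a small Python sketch. The data below are invented: y is completely determined by x (y = x squared), yet a straight line finds nothing, because the relationship is curved.

```python
import math

def slope_and_r(xs, ys):
    """Return the least-squares slope and Pearson r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data (assumption): a U-shaped pattern.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, r = slope_and_r(xs, ys)
print(f"slope = {slope:.2f}, r = {r:.2f}")
```

Both the slope and r come out as zero here, even though x predicts y perfectly. A scatterplot would show the curve immediately, which is one reason to plot the data before fitting a line.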
9. Bivariate & Multivariate Regression
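Bivariate Regression: Analyzes the relationship between two variables (one independent, one dependent).
Multivariate Regression: Relates one dependent variable to two or more independent variables (this setup is often called multiple regression).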
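Partial Correlations: A partial correlation is the correlation between two variables after the influence of one or more other variables has been removed. It helps isolate the unique contribution of each independent variable.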
Causal Relationships: Multivariate regression can help evaluate potential causal explanations by testing hypotheses about how variables covary once other variables are held constant, but it does not establish causation on its own.
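As an illustration of partial correlation, the sketch below uses the standard first-order formula, which removes the influence of a single third variable z from the correlation between x and y. The three input coefficients are hypothetical values chosen for the example, not results from these notes.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the influence of z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations (assumption, for illustration only).
r_xy, r_xz, r_yz = 0.60, 0.50, 0.40

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(f"r(x, y) = {r_xy:.2f}")
print(f"r(x, y | z) = {r_xy_given_z:.2f}")
```

In this made-up case the correlation between x and y shrinks from 0.60 to roughly 0.50 once z is controlled for, meaning part of the original relationship was shared with z.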
10. Multivariate Regression Example
House Price Estimation: To estimate the cost of a house, one may collect various data points such as:
Location
Number of bedrooms
Square footage
Availability of facilities
Prediction Based on Data: The price of a home is predicted based on the interrelation of these variables.
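A regression with more than one predictor can be sketched as follows. Everything here is an assumption made for illustration: the five houses and their prices are invented (and deliberately noise-free, so the fitted coefficients are easy to check), only the two numeric predictors are used, and the helper names solve and fit_multiple are hypothetical. Location and facilities would first need to be coded as numbers and are left out.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], so row operations act on both sides.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution, from the last unknown to the first.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_multiple(rows, ys):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data (assumption): (bedrooms, square footage) and sale price.
houses = [(2, 900), (3, 1500), (3, 1200), (4, 2000), (5, 2200)]
prices = [180000, 260000, 230000, 330000, 370000]

a, b_bedrooms, b_sqft = fit_multiple(houses, prices)
predicted = a + b_bedrooms * 3 + b_sqft * 1400  # a 3-bedroom, 1400 sq ft house

print(f"price = {a:.0f} + {b_bedrooms:.0f} * bedrooms + {b_sqft:.0f} * sqft")
print(f"Predicted price: {predicted:.0f}")
```

Each coefficient is read as the expected change in price for a one-unit change in that predictor while the other predictor is held constant. Real data would be noisy, and a statistics library would normally be used instead of hand-written linear algebra.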
11. Research Findings in Regression
Example Correlation Coefficients:
r(shoe size, height) = 0.73
r(shoe size, sex) = 0.83
r(sex, height) = 0.43
12. Further Correlation Coefficients in Multivariate Regression
Additional Findings:
r(shoe size, height) = 0.62
r(shoe size, sex) = 0.75
13. Popularity and Cautions of Regression
Popularity: Regression analysis is widely utilized in statistical modeling and analysis.
Reporting Findings: Regression results are frequently reported using the term "associated." This wording signals a statistical relationship only, so readers should be cautious and avoid drawing causal conclusions from it.