In-Depth Notes on Regression and Correlation Analysis
Relationships between Continuous Variables
- Understanding the relationship between two continuous variables is crucial in data analysis and research.
- Correlation summarizes the strength and direction of a linear relationship between two variables through coefficients ranging from -1 to +1.
- +1 indicates perfect positive correlation (as one variable increases, the other increases),
- 0 indicates no linear correlation (a straight line through the data trends neither up nor down, although a nonlinear relationship may still exist),
- -1 indicates perfect negative correlation (as one variable increases, the other decreases).
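The three reference values can be checked on small made-up datasets. The sketch below is illustrative only: `pearson_r` is a hypothetical helper written for these notes, and the data are invented so that each case is exact.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]                          # invented values
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # points on an increasing line
r_neg = pearson_r(x, [-3 * v for v in x])      # points on a decreasing line
r_zero = pearson_r(x, [v ** 2 for v in x])     # symmetric curve, no linear trend
print(r_pos, r_neg, r_zero)
```

The last case gives r = 0 even though Y is completely determined by X, which is why a coefficient of 0 is best read as "no linear relationship" rather than "no relationship".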
Associations and Their Importance
- Associations help to identify significant trends and relationships in data.
- Nobel Laureates per 10 Million Population example:
- A high correlation is noted between happiness and GDP per person across countries such as Denmark (r = 0.791, p < 0.0001).
- Scatterplots of happiness against GDP are essential for visualizing such associations.
Understanding Covariance and Correlation
- Covariance measures how much two variables vary together. However, it is not standardized: its size depends on the units of X and Y, which makes it hard to interpret on its own.
- Pearson Correlation Coefficient (r) provides a standardized measure of the relationship.
- Formula: $r = \frac{\mathrm{Cov}(X, Y)}{\mathrm{SD}(X) \cdot \mathrm{SD}(Y)}$
- Value interpretations (a common rule of thumb):
- Strong Positive: r ≥ 0.5
- Moderate Positive: 0.3 ≤ r < 0.5
- Weak Positive: 0 < r < 0.3
- No Correlation: r = 0
- Weak Negative: -0.3 < r < 0
- Moderate Negative: -0.5 < r ≤ -0.3
- Strong Negative: r ≤ -0.5
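To make the difference between covariance and r concrete, here is a minimal sketch that computes both on invented height and weight data, then labels r using the cut-offs listed above. The helper names (`covariance`, `pearson_r`, `describe`) and the data are assumptions made for illustration, and the sample (n - 1) form of covariance is assumed. Values falling exactly on 0.3 or 0.5 are assigned to the stronger category, which is also an assumption.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance, using an n - 1 denominator (assumed convention)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """r = Cov(X, Y) / (SD(X) * SD(Y))."""
    sd_x = sqrt(covariance(xs, xs))  # Cov(X, X) is Var(X)
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

def describe(r):
    """Label r with the rule-of-thumb categories (boundary handling assumed)."""
    if r == 0:
        return "No Correlation"
    sign = "Positive" if r > 0 else "Negative"
    size = abs(r)
    strength = "Strong" if size >= 0.5 else "Moderate" if size >= 0.3 else "Weak"
    return f"{strength} {sign}"

# Invented data: the same heights expressed in metres and in centimetres.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [52.0, 58.0, 69.0, 72.0, 85.0]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)   # about 100 times cov_m
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)      # same as r_m, up to rounding
print(cov_m, cov_cm)
print(round(r_m, 3), round(r_cm, 3), describe(r_m))
```

Changing the unit of height rescales the covariance but leaves r unchanged, which is the sense in which r is standardized.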
Regression Analysis Overview
- Linear regression models the linear relationship between two variables, with the aim of predicting the dependent variable (Y) from the independent variable (X).
- Essential components:
- Independent Variable (X): Predictor used to make predictions.
- Dependent Variable (Y): Outcome being predicted.
- The primary equation for regression:
- Y=bX+a
- where:
- b is the slope (indicating change in Y for each unit change in X)
- a is the intercept (value of Y when X = 0)
Determining the Regression Line
- Compute b (Slope): Using the covariance of X and Y and the variance of X:
- $b = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$
- Compute a (Intercept): Using the means of X and Y:
- $a = \mathrm{mean}(Y) - b \times \mathrm{mean}(X)$
- Equation Formulation: Once both coefficients are calculated, formulate the predicted regression equation.
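The three steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `fit_line` is a hypothetical helper and the hours/score data are invented. Covariance and variance both use the n - 1 denominator here. Any consistent choice gives the same slope, because the denominator cancels in the ratio.

```python
def fit_line(xs, ys):
    """Return (b, a) for Y = bX + a (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x          # slope: Cov(X, Y) / Var(X)
    a = mean_y - b * mean_x     # intercept: mean(Y) - b * mean(X)
    return b, a

# Invented data for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b, a = fit_line(hours, score)
print(f"Y = {b:.2f}X + {a:.2f}")
```

Because the intercept is defined from the two means, the fitted line always passes through the point (mean(X), mean(Y)).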
Predictive Analysis Example
- Given example data for stress vs symptoms:
- The derived regression equation helps in making predictions:
- Consider a person with stress level 25:
- Predicted Symptoms:
- Symptoms = 0.7831 × 25 + 73.891 ≈ 93.47
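The same calculation can be wrapped in a small function so that other stress levels are easy to try. The slope and intercept are the ones stated in the equation above. The function name `predict_symptoms` is a hypothetical choice made for this sketch.

```python
SLOPE = 0.7831       # b, taken from the regression equation above
INTERCEPT = 73.891   # a, taken from the regression equation above

def predict_symptoms(stress):
    """Predicted symptoms score for a given stress level (hypothetical helper)."""
    return SLOPE * stress + INTERCEPT

prediction = predict_symptoms(25)
print(round(prediction, 2))  # 93.47
```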
Evaluation of Prediction Accuracy
- Assess prediction accuracy using the Standard Error of Estimate and Coefficient of Determination (r²).
- r² measures the proportion of the variance in the dependent variable that can be explained by the independent variable. Values range from 0 (no explanation) to 1 (perfect explanation).
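Both measures can be computed from the residuals of a fitted line. The sketch below makes two assumptions that these notes do not state: the standard error of estimate is taken as sqrt(SS_residual / (n - 2)), a common textbook definition for simple regression, and r² is computed as 1 - SS_residual / SS_total. The data are invented and `evaluate_fit` is a hypothetical helper.

```python
from math import sqrt

def evaluate_fit(xs, ys, b, a):
    """Return (r_squared, standard_error_of_estimate) for the line Y = bX + a."""
    n = len(xs)
    mean_y = sum(ys) / n
    predicted = [b * x + a for x in xs]
    # Unexplained variation: squared gaps between observed and predicted Y.
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # Total variation: squared gaps between observed Y and its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_residual / ss_total
    see = sqrt(ss_residual / (n - 2))  # assumed n - 2 denominator
    return r_squared, see

# Invented data; b = 4.1 and a = 47.7 are the least-squares values for it.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r_squared, see = evaluate_fit(hours, score, b=4.1, a=47.7)
print(round(r_squared, 4), round(see, 4))
```

The standard error of estimate is expressed in the units of Y, so it reads as a typical size of prediction error, while r² is unit-free.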