Correlation and Regression Analysis Notes
Correlation Analysis
Definition: A statistical method to determine if a relationship exists between variables.
Measures the association or strength of the relationship between two variables (x and y).
"Co" means together and "relation" signifies a connection.
What is Correlation Analysis?
It describes the relationship between two or more variables that are linked together.
In statistics, a scatter diagram plots the paired values of the two variables being investigated as points on the rectangular (Cartesian) coordinate system.
Examples of Correlation Analysis
The more social distancing, the lower the risk of acquiring COVID-19 virus.
More time on a treadmill leads to more calories burned.
Taller people tend to have larger shoe sizes, and shorter people have smaller shoe sizes.
Longer hair requires more shampoo.
Less marketing results in fewer new customers.
Slower internet speed increases the chances of disconnection from online classes.
Correlation May Indicate:
Degree of Association:
A high value of r suggests that students with high grades in English are likely to have high grades in Math.
Example: English and Math grades of a group of students.
Cause and Effect:
Example: Pupils' nutritional status and their academic performance in Math.
Predictive Ability:
A high degree of relationship could imply that the entrance test could predict the grades of freshmen students.
Example: Entrance test and grades of freshmen students.
Reliability of Test:
If students perform consistently in a test regardless of when it is taken, the test may be considered reliable.
Example: Teacher-made Test
Terminologies
Univariate Data:
Deals with a single variable independently of other variables.
Statistical options: Measures of Central Tendency, Variation, and other descriptive statistics.
Bivariate Data:
Involves two variables.
Describes relationships using Correlation Analysis
Scatter Plot
Also known as "scatter graph" or "scatter diagram."
Shows how the points collected from a set of bivariate data are scattered on the Cartesian plane.
x – independent variable
y – dependent variable
A graphical representation of two variables that helps in finding the relationship between them.
How to Create a Scatter Plot Diagram?
Plot the observed points on a graph, where each point represents one pair of values of x and y as a coordinate (x, y). The graph portrays the relationship between these two variables.
Types of Scatter Plot Diagram
Positive Linear Correlation:
y increases in a perfectly predictable manner as x increases.
Negative Linear Correlation:
y decreases in a perfectly predictable manner as x increases.
No Correlation:
No predictable relationship between x and y.
Degree of Association (Strength of Relationship)
Closeness of the points on the trend line indicates the strength of the relationship:
Strong correlation (perfect positive or negative)
Moderate correlation
Weak or no correlation
Pearson Product-Moment Correlation (r)
First derived by Karl Pearson.
A statistical tool for quantifying the linear relationship between two random variables, x and y.
Data are parametric (numerical measurement describing a characteristic of a sample).
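The notes do not write out the formula for r, so the following is only a minimal sketch using the common raw-score form, r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²]). The function name pearson_r and the sample scores are hypothetical and are not the data from the examples in these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed with the raw-score (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical Math and English scores for five students (assumed data).
math_scores = [10, 12, 15, 18, 20]
english_scores = [11, 14, 14, 17, 19]
print(round(pearson_r(math_scores, english_scores), 2))  # prints 0.96
```

Points that lie exactly on a rising line, such as x = 1, 2, 3, 4, 5 with y = 2, 4, 6, 8, 10, give r = 1.0 with this function.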
Interpretation of Correlation Coefficient
The degree of correlation can be determined by the correlation coefficient r, which ranges from -1 to +1.
If r = +1, then the variables are perfectly positively correlated.
If r = -1, then the variables are perfectly negatively correlated.
If r = 0, then the variables are uncorrelated.
Value of r and Verbal Interpretation:
0.00: No Correlation
to : Slight Correlation
to : Low Correlation
to : Moderate Correlation
to : High Correlation
to : Very High Correlation
±1.00: Perfect Correlation
Interpretation of Coefficient of Correlation Table
| Value of r | Strength of Correlation |
|---|---|
| +1 | Perfect positive correlation |
| +0.71 to +0.99 | Strong positive correlation |
| +0.51 to +0.70 | Moderately positive correlation |
| +0.31 to +0.50 | Weak positive correlation |
| +0.01 to +0.30 | Negligible positive correlation |
| 0 | No correlation |
| -0.01 to -0.30 | Negligible negative correlation |
| -0.31 to -0.50 | Weak negative correlation |
| -0.51 to -0.70 | Moderately negative correlation |
| -0.71 to -0.99 | Strong negative correlation |
| -1 | Perfect negative correlation |
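The table lookup can be sketched as a small function. This is an illustration only: the name interpret_r is hypothetical, and treating the two-decimal band edges as continuous cut-offs (for example, any value above 0.70 in size counts as strong) is an assumption the table itself does not state.

```python
def interpret_r(r):
    """Return the verbal strength label for r, following the table's bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"Perfect {direction} correlation"
    if size > 0.70:
        return f"Strong {direction} correlation"
    if size > 0.50:
        return f"Moderately {direction} correlation"
    if size > 0.30:
        return f"Weak {direction} correlation"
    return f"Negligible {direction} correlation"


print(interpret_r(0.96))   # Strong positive correlation
print(interpret_r(-0.40))  # Weak negative correlation
```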
Examples
Example 1 analyzes the math and English scores of five students to determine the degree of association.
Example 2 involves a survey conducted by nursing students on six women, relating age to systolic blood pressure, where x is age and y is systolic blood pressure.
Regression Analysis
A tool for predicting the value of a dependent variable (y) from a given value of an independent variable (x) when the two are related.
It can predict the dependent variable (y) if the independent variable (x) is known.
Written as an equation: y = a + bx
Determines the trend of two related variables (rising or falling).
Terminologies
Regress:
The act of passing, describing how points are scattered with reference to the trend line.
Regression Line:
Also called the "trend line."
The line that lies closest, on average, to the points in the scatter plot.
Equation of the Regression
y = the equation of the trend line (the predicted value of the dependent variable)
x = predictor
a = ordinate, or the point where the regression line crosses the y-axis (the y-intercept)
b = weight, or the slope of the line
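The notes do not show how the intercept and slope are computed, so the sketch below assumes the usual least-squares formulas for a trend line written as y = a + bx: slope b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and intercept a = (Σy − bΣx) / n. The letters a and b, the function names fit_line and predict, and the sample points are this sketch's own labels and data, not taken from the examples in these notes.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope (weight)
    a = (sum_y - b * sum_x) / n                     # y-intercept
    return a, b


def predict(a, b, x):
    """Predicted y for a given x on the fitted line."""
    return a + b * x


# Hypothetical points that lie exactly on y = 1 + 2x (assumed data).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)               # prints 1.0 2.0
print(predict(a, b, 15))  # prints 31.0
```

With real scores or blood-pressure readings the points will not fall exactly on a line, and predict then returns the value on the trend line rather than an observed value.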
Examples
Example 1 asks for the regression equation of a given data set and predicts the English score given a Math score of 15.
Example 2 requires finding the regression equation and predicting the blood pressure of women aged 60 and 50.