Correlation and Regression Analysis Notes

Correlation Analysis

  • Definition: A statistical method to determine if a relationship exists between variables.

  • Measures the association or strength of the relationship between two variables (x and y).

  • "Co" means together and "relation" signifies a connection.

What is Correlation Analysis?

  • It describes the relationship between two or more variables that are linked together.

  • In statistics, plotting the paired values of the two variables under investigation as coordinates on the rectangular (Cartesian) coordinate system produces a scatter diagram.

Examples of Correlation Analysis

  • The more social distancing, the lower the risk of acquiring the COVID-19 virus.

  • More time on a treadmill leads to more calories burned.

  • Taller people tend to have larger shoe sizes, and shorter people have smaller shoe sizes.

  • Longer hair requires more shampoo.

  • Less marketing results in fewer new customers.

  • Slower internet speed increases the chances of disconnection from online classes.

Correlation May Indicate:

  • Degree of Association:

    • A high value of r suggests that students with high grades in English are likely to have high grades in Math.

    • Example: English and Math grades of a group of students.

  • Cause and Effect:

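    • A strong correlation may hint at a cause-and-effect link, although correlation alone does not prove causation.

    • Example: Pupils' nutritional status and their academic performance in Math.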

  • Predictive Ability:

    • A high correlation could imply that entrance test scores can predict the grades of freshman students.

    • Example: Entrance test and grades of freshman students.

  • Reliability of Test:

    • If students perform consistently in a test regardless of when it is taken, the test may be considered reliable.

    • Example: Teacher-made test.

Terminologies

  • Univariate Data:

    • Deals with a single variable independently of other variables.

    • Statistical options: Measures of Central Tendency, Variation, and other descriptive statistics.

  • Bivariate Data:

    • Involves two variables.

    • Relationships between the two variables are described using Correlation Analysis.

Scatter Plot

  • Also known as "scatter graph" or "scatter diagram."

  • Shows how the points collected from a set of bivariate data are scattered on the Cartesian plane.

  • x – independent variable

  • y – dependent variable

  • A graphical representation of two variables that helps in finding the relationship between them.

How to Create a Scatter Plot Diagram?

  • Plot each observed pair as a point whose coordinates are its x and y values. The resulting graph portrays the relationship between the two variables.
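As a minimal sketch of that plotting step, the Python function below places each (x, y) pair on a character grid using only the standard library. The function name text_scatter, the grid size, and the sample values are illustrative assumptions and do not come from these notes; in practice a plotting library would normally be used.

```python
def text_scatter(x, y, width=21, height=11):
    """Return a text scatter plot of the paired observations (x[i], y[i])."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    # Fall back to a span of 1 when every value on an axis is the same.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / x_span * (width - 1))
        row = round((yi - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical paired observations (not data from these notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(text_scatter(x, y))
```

Each asterisk marks one observation. The horizontal position encodes x and the vertical position encodes y, so a cloud that rises from left to right suggests a positive relationship.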

Types of Scatter Plot Diagram

  • Positive Linear Correlation:

    • y increases in a perfectly predictable manner as x increases.

    • r = 1

  • Negative Linear Correlation:

    • y decreases in a perfectly predictable manner as x increases.

    • r = -1

  • No Correlation:

    • No predictable linear relationship between x and y.

    • r = 0

Degree of Association (Strength of Relationship)

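  • The closeness of the points to the trend line indicates the strength of the relationship:

    • Strong correlation (up to perfect positive or negative): points lie on or close to the line

    • Moderate correlation: points are loosely clustered around the line

    • Weak or no correlation: points are widely scattered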

Pearson Product-Moment Correlation (r)

  • First derived by Karl Pearson.

  • A statistical tool for quantifying the linear relationship between two random variables, x and y.

  • Data are parametric (numerical measurements on an interval or ratio scale).
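These notes do not write out the formula for r, so the sketch below uses the standard raw-score (computational) form, r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²]). The function name pearson_r and the sample scores are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from raw sums.

    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


# Hypothetical paired scores (not data from these notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 30 / sqrt(1500), about 0.7746
```

For these values n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so the numerator is 330 − 300 = 30 and the denominator is sqrt(50 × 30) = sqrt(1500).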

Interpretation of Correlation Coefficient

  • The degree of correlation can be determined by the correlation coefficient, which ranges from -1 to +1.

    • If r = +1, then the variables are perfectly positively correlated.

    • If r = -1, then the variables are perfectly negatively correlated.

    • If r = 0, then the variables are uncorrelated (no linear relationship).

  • Value of r and Verbal Interpretation:

    • 0.00: No Correlation

    • ±0.01 to ±0.20: Slight Correlation

    • ±0.21 to ±0.40: Low Correlation

    • ±0.41 to ±0.70: Moderate Correlation

    • ±0.71 to ±0.80: High Correlation

    • ±0.81 to ±0.99: Very High Correlation

    • ±1.0: Perfect Correlation
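The scale above can be turned into a small lookup. This is only a sketch: the function name interpret_r is hypothetical, and rounding r to two decimals before classifying is an assumption made here so that values falling between the listed ranges (for example 0.205) still receive a label.

```python
def interpret_r(r):
    """Return the verbal interpretation of r from the scale above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # Assumption: classify on the magnitude rounded to two decimals,
    # because the scale is stated to two decimal places.
    size = round(abs(r), 2)
    if size == 0.00:
        return "No Correlation"
    if size <= 0.20:
        return "Slight Correlation"
    if size <= 0.40:
        return "Low Correlation"
    if size <= 0.70:
        return "Moderate Correlation"
    if size <= 0.80:
        return "High Correlation"
    if size < 1.00:
        return "Very High Correlation"
    return "Perfect Correlation"


for value in (0.0, 0.15, -0.35, 0.55, 0.77, -0.85, 1.0):
    print(value, interpret_r(value))
```

The sign of r is ignored on purpose, since the scale attaches the same label to positive and negative values of equal size. The direction of the relationship is read from the sign separately.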

Interpretation of Coefficient of Correlation Table

Value of r          Strength of Correlation
+1                  Perfect positive correlation
+0.71 to +0.99      Strong positive correlation
+0.51 to +0.70      Moderately positive correlation
+0.31 to +0.50      Weak positive correlation
+0.01 to +0.30      Negligible positive correlation
0                   No correlation
-0.01 to -0.30      Negligible negative correlation
-0.31 to -0.50      Weak negative correlation
-0.51 to -0.70      Moderately negative correlation
-0.71 to -0.99      Strong negative correlation
-1                  Perfect negative correlation

Examples

  • Example 1 analyzes the Math and English scores of five students to determine the degree of association.

  • Example 2 involves a survey conducted by nursing students on six women, relating age to systolic blood pressure, where x is age and y is systolic blood pressure.

Regression Analysis

  • A tool for predicting the value of a dependent variable (y) from a given value of an independent variable (x) when the two are related.

  • Once the value of x is known, the regression equation gives the predicted value of y.

  • Written as an equation: y = a + bx

  • Determines the trend of two related variables (rising or falling).

Terminologies

  • Regress:

    • The act of passing, describing how points are scattered with reference to the trend line.

  • Regression Line:

    • Also called the "trend line."

    • The line that lies closest, on average, to the points in the scatter plot.
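One common way to make "closest" precise is the least-squares criterion: choose the line with the smallest sum of squared vertical distances to the points. These notes do not name a criterion, so treating it as least squares is an assumption here. The data are hypothetical, and a = 2.2, b = 0.6 are the least-squares values for this particular data set.

```python
def sum_squared_errors(a, b, x, y):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical paired observations (not data from these notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = sum_squared_errors(2.2, 0.6, x, y)     # least-squares line for this data
steeper = sum_squared_errors(2.2, 0.7, x, y)  # same intercept, steeper slope
lower = sum_squared_errors(2.0, 0.6, x, y)    # same slope, lower intercept

print(best < steeper and best < lower)  # True
```

Nudging either the slope or the intercept away from the least-squares values increases the total squared distance, which is the sense in which that line is the closest one.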

Equation of the Regression

  • y = the predicted value of the dependent variable, obtained from the trend-line equation

  • x = predictor (the independent variable)

  • a = y-intercept, the ordinate of the point where the regression line crosses the y-axis

  • b = weight or the slope of the line
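A minimal sketch of how a and b might be computed, assuming the usual least-squares formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n, which these notes do not spell out. The names fit_line and predict, and the data, are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope (weight)
    a = (sum_y - b * sum_x) / n                     # y-intercept
    return a, b


def predict(a, b, x_new):
    """Predicted y for a given x, using y = a + b*x."""
    return a + b * x_new


# Hypothetical paired observations (not data from these notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2), round(predict(a, b, 6), 2))
```

For this data the slope works out to 30 / 50 = 0.6 and the intercept to (20 − 9) / 5 = 2.2, so the predicted y at x = 6 is about 5.8. A positive b corresponds to a rising trend and a negative b to a falling one.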

Examples

  • Example 1 asks for the regression equation of a given data set and predicts the English score given a Math score of 15.

  • Example 2 requires finding the regression equation and predicting the blood pressure of women aged 60 and 50.