L2 BS2 – Pearson Correlation and Simple Linear Regression

Pearson Correlation

A measure of the degree of linear relationship (correlation) between two quantitative variables, indicating how they change together. It does not prove that one variable causes the other.

3 types of correlations

- Positive correlation: Variables change in the same direction e.g. if one increases, the other increases.

- Negative correlation: Variables change in opposite directions, e.g. if one increases, the other decreases.

- Zero correlation: No linear relationship between the variables.

Pearson correlation coefficient (r):

Measures the strength and direction of the linear relationship between two variables. It is a number between -1 and 1, calculated by dividing the covariance of the two variables by the product of their standard deviations.

Covariance

Measures how much two random variables change together and shows the direction of the linear relationship. A positive covariance means the variables tend to move in the same direction, while a negative covariance means they move in opposite directions. Covariance is not standardized (its size depends on the units of the variables), so it is hard to use for comparing the strength of relationships.
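
To make the link between covariance and r concrete, here is a minimal sketch in plain Python. The data values and the function names (`sample_covariance`, `pearson_r`) are made up for illustration and are not part of the course material. The sketch rescales y by 100 to show that the covariance changes with the units while r does not.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_covariance(x, y):
    # Average product of deviations from the means, using n - 1 (sample version).
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    # Covariance divided by the product of the two standard deviations.
    # The variance of x is simply the covariance of x with itself.
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Hypothetical data, invented for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_scaled = [100 * v for v in y]  # same variable in different units

cov_xy = sample_covariance(x, y)
cov_scaled = sample_covariance(x, y_scaled)
r_xy = pearson_r(x, y)
r_scaled = pearson_r(x, y_scaled)

print(cov_xy, cov_scaled)  # covariance grows by a factor of 100
print(r_xy, r_scaled)      # r is unchanged by the rescaling
```

Swapping the arguments, `pearson_r(y, x)`, gives the same value, which is the symmetry property of r.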

Properties of Pearson Correlation coefficient (r):

- It can be positive or negative, sharing the same sign as the sample covariance between X and Y.

- It varies between -1 (perfect negative correlation) and 1 (perfect positive correlation).

- It exhibits symmetry: the correlation between Y and X is the same as the correlation between X and Y

Interpretation Rules of Thumb:

If the absolute value of r is above 0.3, there is a weak correlation; above 0.5 suggests a moderate correlation; and above 0.7 suggests a strong correlation.
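
The thresholds above can be sketched as a small helper. The function name `describe_correlation` and the labels "weak", "moderate" and "strong" are shorthand chosen for this example; only the cut-offs 0.3, 0.5 and 0.7 come from the rule of thumb.

```python
def describe_correlation(r):
    """Label r using the 0.3 / 0.5 / 0.7 rule-of-thumb thresholds."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size > 0.7:
        strength = "strong"
    elif size > 0.5:
        strength = "moderate"
    elif size > 0.3:
        strength = "weak"
    else:
        # Below the lowest threshold the rule of thumb reports no clear pattern.
        return "little or no linear correlation"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear correlation"

print(describe_correlation(0.97))  # the interest-rate example, r = 0.97
print(describe_correlation(-0.6))
print(describe_correlation(0.1))
```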

Example using interest rates:

Using short-term interest rate (x) and long-term interest rate (y), the calculated Pearson correlation coefficient was r=0.97. Since it's close to 1 it shows a strong positive correlation.

Simple Regression Analysis:

A regression model that estimates the relationship between one independent variable and one dependent variable using a straight line. Both variables should be quantitative.

Simple Regression Analysis Highlights:

- It estimates how a change in one variable predicts a change in another, showing a relationship that can be modeled as a line.

- It provides an equation (y = a + bx) and can be extended with further variables to control for them when exploring potential causal effects. It doesn't prove causality, but it gives a clearer picture of the relationship between two variables.

- It is typically applied instead of correlation analysis alone because it allows for the investigation of causal effects (though causality is difficult to claim) and shows how much x impacts y.

The regression model:

- The true population model is Y = B0 + B1x + e, where B1 is hypothesized to be greater than zero in cases like house size (x) increasing price (y).

- The parameters B0 and B1 are estimated by b0 and b1, leading to the estimated model: y^ = b0 + b1x.

- The goal of estimation is to choose b0 and b1 so that the errors are as small as possible, typically by minimizing the sum of squared residuals (least squares).
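
As a rough illustration of the estimation step, the sketch below fits y^ = b0 + b1x by ordinary least squares using the standard closed-form formulas. It assumes least squares is the estimation method, and the house sizes and prices are hypothetical numbers invented for the example, as are the helper names `fit_line` and `sum_squared_residuals`.

```python
def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

def sum_squared_residuals(x, y, b0, b1):
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: house size (square metres) and price (thousands).
size = [50, 70, 90, 110, 130]
price = [150, 200, 230, 290, 330]

b0, b1 = fit_line(size, price)
best = sum_squared_residuals(size, price, b0, b1)
worse = sum_squared_residuals(size, price, b0, b1 + 0.1)

print(b0, b1)       # estimated intercept and slope
print(best, worse)  # any other slope gives a larger sum of squared residuals
```

Here b1 is positive, which is in line with the hypothesis that a larger house has a higher price.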

Error term:

Represents all factors that affect the dependent variable but are not included as independent variables. It accounts for the variability in the dependent variable that the model does not explain. In the estimated model, its counterpart is the residual: the difference between an observed value and the value predicted by the model.

Reasons for the existence of the error term:

- The vagueness of theory

- Intrinsic randomness in human behavior (e.g., unique preferences for a house's location)

- Poor proxy variables/measurement error

Data used for regression analysis may be:

- Cross-sectional (variation between individuals/firms)

- Time series data (data over time)

- Panel data (a mix of both)

Steps in Regression Analysis

- Statement of theory or hypothesis.

- Specify a mathematical model: Y = b0 + b1x, b1 > 0

  - Y = Dependent variable

  - x = Independent variable

  - b0 and b1 = Parameters

- Specify an econometric model, e.g. Y = b0 + b1x + e

  - e = Error term

- Find data (Cross-sectional, Time series, or Panel data).

- Estimate the model (find the b0 and b1 that make the errors as small as possible, typically by minimizing the sum of squared residuals).

  - After this, we divide the variation of the dependent variable into two parts:

    - One part of the variation is explained by the regression model.

    - The other part is not explained by the regression model.

  - Total variation equation: {Total variation} = {Explained by regression model} + {Not explained by regression model}.

- Test hypothesis.

- Make predictions

The coefficient of determination (R²):

Measures how much of the variation in the dependent variable (y) is explained by the independent variable in the regression model.

- Measures how well a model predicts an outcome (the model's dependent variable).

- R² ranges from 0 to 1. The closer R² is to 1, the better the model fits the data: 1 means a perfect fit, and 0 means the independent variable explains none of the variation.

Independent variable:

The variable that is manipulated or varied to explore its effects. It is not influenced by other variables in the study.

Dependent variable:

The outcome variable. It changes as a result of changes in the independent variable.

Coefficient of determination formula

R² = SSR / SST

- Numerator (explained variation): Regression sum of squares (SSR). This measures how much the predicted values (y^) vary from the mean of the observed values.

- Denominator (total variation): Total sum of squares (SST). This measures how much the observed values (y) vary from their mean.

Total variation equation: {Total variation} = {Explained by regression model} + {Not explained by regression model}.
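
The decomposition can be checked numerically with a short sketch. It uses hypothetical house-size and price data and a least-squares fit, both assumptions made for this example. The name SSE for the unexplained part (the sum of squared residuals) is also introduced here and does not appear on the cards.

```python
# Hypothetical data: house size (square metres) and price (thousands).
size = [50, 70, 90, 110, 130]
price = [150, 200, 230, 290, 330]

n = len(size)
mean_x = sum(size) / n
mean_y = sum(price) / n

# Least-squares slope and intercept.
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(size, price)) / sum(
    (x - mean_x) ** 2 for x in size
)
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * x for x in size]

# Total variation: observed values around their mean.
sst = sum((y - mean_y) ** 2 for y in price)
# Explained variation: predicted values around the mean of the observed values.
ssr = sum((p - mean_y) ** 2 for p in predicted)
# Unexplained variation: observed values around the predicted values.
sse = sum((y - p) ** 2 for y, p in zip(price, predicted))

r_squared = ssr / sst

print(sst, ssr, sse)  # sst equals ssr + sse
print(r_squared)      # share of the variation in price explained by size
```

In simple regression with one independent variable, this R² equals the square of the Pearson correlation coefficient between x and y.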

In regression analysis, the main goal for hypothesis testing is

to evaluate if the causal effect (B1) is statistically different from zero.
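
One common way to carry out this test is a t-test on the estimated slope. The sketch below is an illustration under several assumptions: a least-squares fit, hypothetical house-size and price data, a two-sided test at the 5% level, and a critical value of 3.182 read from a standard t-table for n - 2 = 3 degrees of freedom.

```python
from math import sqrt

# Hypothetical data: house size (square metres) and price (thousands).
size = [50, 70, 90, 110, 130]
price = [150, 200, 230, 290, 330]

n = len(size)
mean_x = sum(size) / n
mean_y = sum(price) / n
sxx = sum((x - mean_x) ** 2 for x in size)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(size, price))

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
residuals = [y - (b0 + b1 * x) for x, y in zip(size, price)]
sse = sum(e ** 2 for e in residuals)
degrees_of_freedom = n - 2
se_b1 = sqrt((sse / degrees_of_freedom) / sxx)  # standard error of the slope

# Null hypothesis: B1 = 0. The t statistic measures how many standard
# errors the estimate b1 lies away from zero.
t_stat = b1 / se_b1

T_CRITICAL = 3.182  # assumed: two-sided 5% critical value for 3 degrees of freedom
reject_null = abs(t_stat) > T_CRITICAL

print(t_stat, reject_null)
```

If `reject_null` is true, the data give evidence that B1 differs from zero at the chosen significance level. That alone does not establish that the effect is causal.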

Total variation in Y:

This is how much Y varies overall, regardless of X.

Explained variation

The part of Y's variation that the regression model accounts for: the variation of the predicted values (y^) around the mean of Y.

Unexplained variation:

The residuals — the part of Y's variation that the model doesn't explain.