Module 10


21 Terms

1
New cards

Linear regression

A method that is used for analyzing how the variation in one variable can explain the variation in another variable.

2
New cards

Dependent variable (Y)

The variable whose variation we want to explain

3
New cards

Independent variable (X)

The variable used to explain the variation in Y

4
New cards

Simple linear regression

In simple linear regression we assume a linear relationship exists between the dependent and independent variables

5
New cards

Line of best fit

With regression, we can graph the relationship between the variables using the line of best fit, which is the line that best fits the observations; this helps visualize the relationship.

The line of best fit is the line that minimizes the sum of the squared vertical distances between the observations and the regression line (the sum of squared errors, SSE). This is also known as the least squares criterion.
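The least squares criterion above has a closed-form solution for simple regression. A minimal sketch in plain Python (the data points here are made up for illustration):

```python
# Least squares criterion: choose slope b1 and intercept b0 that minimize
# the sum of squared vertical distances (SSE) between each observation
# and the regression line. Data are made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS solution: b1 = Cov(X, Y) / Var(X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# SSE for the fitted line: no other (b0, b1) pair can make this smaller
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(b0, b1, sse)
```

For these points the fit works out to b0 = 2.2, b1 = 0.6, with SSE = 2.4.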

6
New cards

Cross-section Regression

A cross-sectional regression is a type of regression analysis used to examine the relationship between variables at a single point in time across multiple entities, such as different companies, countries, or assets.

7
New cards

Time Series Regression

A time-series regression is a statistical method used to analyze the relationship between variables over time for a single entity, such as a company, market index, or economic indicator. It examines how a dependent variable (like stock price or GDP) is influenced by one or more independent variables (such as interest rates or inflation) across different time periods.

8
New cards

Assumptions for linear regression

  1. Linearity: The relationship between X and Y is linear, which implies the independent variable (X) is not random. To check this assumption, plot the error terms (known as residuals) against the independent variable; they should look random and display no pattern. If they show a pattern, the relationship is not linear.

  2. Normality assumption: The error term is normally distributed; however, this does not mean the independent and dependent variables themselves need to come from a normal distribution. A violation shows up as a histogram of the error terms that is not bell shaped and is skewed.

  3. Homoscedasticity assumption: The variance of the error term is constant for all observations, known as homoscedasticity. A violation shows up as residuals whose spread grows as the predicted values increase, which is known as heteroscedasticity.

  4. Independence assumption: The (X, Y) observations are independent of each other, which implies the error terms are uncorrelated across observations.

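Two of the checks above can be roughed out numerically. A sketch on made-up data with a hand-fitted line (b0 = 2.2, b1 = 0.6); real diagnostics would use residual plots:

```python
# Rough residual diagnostics for the assumptions above.
# Data and fitted coefficients are made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Spot check: OLS residuals should average to ~0
mean_resid = sum(residuals) / len(residuals)

# Homoscedasticity spot check: residual spread for low X vs high X
# should be similar; a large gap hints at heteroscedasticity
half = len(x) // 2
spread_low = max(residuals[:half]) - min(residuals[:half])
spread_high = max(residuals[half:]) - min(residuals[half:])
print(mean_resid, spread_low, spread_high)
```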
9
New cards

Sum of Squares Total

A measure of the total variation of the dependent variable.
SST = Explained Variation (SSR) + Unexplained Variation (SSE)

Explained variation (SSR) is the variation of the predicted y-values around the mean of Y; this is the variation the regression explains.

Unexplained variation (SSE) is the variation of the observed y-values around the predicted y-values; this is what the regression leaves unexplained.

Their total gives you the sum of squares total.

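The SST = SSR + SSE decomposition can be verified directly. A sketch using made-up data and the fitted line y-hat = 2.2 + 0.6x:

```python
# Decomposition of total variation: SST = SSR + SSE.
# Data and fitted coefficients are made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(sst, ssr, sse)  # for an OLS fit, sst equals ssr + sse
```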
10
New cards

Coefficient of Determination

It is the percentage of the total variability of the dependent variable explained by the model. R2 lies between 0 and 1; a higher R2 means the model explains more of the variability.
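In terms of the sums of squares, R2 is the ratio of explained to total variation. A tiny sketch, using SSR and SST values from a made-up example fit:

```python
# R^2 as the share of total variation explained by the regression.
# SSR and SST values are made up for illustration.
ssr = 3.6   # explained variation
sst = 6.0   # total variation
r_squared = ssr / sst
print(r_squared)  # the model explains this fraction of the variability in Y
```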

11
New cards

F-Test

The F-test determines how effectively a group of independent variables explains the variation of the dependent variable; a higher F-statistic suggests the model does a good job of explaining that variation.

The null hypothesis: b1 = 0 (the slope is 0; no linear relationship between X and Y)

Alternative hypothesis: b1 ≠ 0 (the slope is other than 0; some sort of relationship, positive or negative, between Y and X)

To calculate the F-statistic, you need

  • Total number of observations (n)

  • Total number of independent variables (k)

  • SSE and SSR, which we use to calculate MSE = SSE / (n − k − 1) and MSR = SSR / k

  • F-statistic = MSR / MSE

  • Reject the null hypothesis if F-value > F-critical
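The steps above can be sketched numerically. SSR and SSE here come from a made-up 5-point simple regression (k = 1):

```python
# F-statistic for a simple linear regression.
# n, k, SSR, and SSE values are made up for illustration.
n, k = 5, 1
ssr, sse = 3.6, 2.4

msr = ssr / k            # mean square regression
mse = sse / (n - k - 1)  # mean square error
f_stat = msr / mse

print(f_stat)  # reject H0: b1 = 0 if this exceeds the F critical value
```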

12
New cards

Standard Error of Estimate (SEE or Se)

Measures the distance between the observed values of the dependent variable and the values predicted by the regression model; in other words, it measures the fit of the regression line. The smaller the SEE, the better the fit.
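For simple regression the SEE is the square root of SSE divided by its degrees of freedom, n − 2. A sketch with made-up values:

```python
import math

# Standard error of estimate for simple regression: sqrt(SSE / (n - 2)).
# SSE and n values are made up for illustration.
n = 5
sse = 2.4
see = math.sqrt(sse / (n - 2))
print(see)  # smaller SEE means observations sit closer to the fitted line
```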

13
New cards

Hypothesis tests of Regression Coefficients

  1. State the hypotheses (whether the slope b1 is 0 or not; this is two-sided, and we can also test greater than or less than for a one-sided test)

  2. Identify the test-statistic

  3. State the level of significance

  4. State the decision rule

  5. Calculate the test statistic and make a decision
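Step 5 for the slope boils down to t = b1 / SE(b1). A sketch on made-up values from the 5-point example fit:

```python
import math

# t-statistic for H0: b1 = 0 in simple linear regression.
# Data, slope, and SSE values are made up for illustration.
x = [1, 2, 3, 4, 5]
b1 = 0.6                        # estimated slope
see = math.sqrt(2.4 / (5 - 2))  # standard error of estimate

x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
se_b1 = see / math.sqrt(sxx)    # standard error of the slope

t_stat = b1 / se_b1
print(t_stat)  # compare with the t critical value at the chosen significance level
```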

14
New cards

Correlation testing and Linear regression model

The test statistic for the slope in a simple linear regression gives the same value as the t-test of the pairwise correlation between X and Y, as these two tests are related.

15
New cards

Dummy Variables

These are known as indicator variables or binary variables; they take the value 0 or 1 and are used in regression analysis to represent categorical data. They are used for qualitative information that requires numerical input; a categorical variable with n categories is represented with n − 1 dummy variables.

16
New cards

Analysis of Variance (ANOVA)

A statistical procedure used to partition the total variation of a variable into components that can be ascribed to different sources.

It is used to determine the effectiveness of the independent variable in explaining the variation of the dependent variable

17
New cards

ANOVA Table

Source of variation | Sum of squares | Degrees of freedom | Mean square
Regression (explained) | SSR | k | MSR = SSR / k
Error (unexplained) | SSE | n − k − 1 | MSE = SSE / (n − k − 1)
Total | SST | n − 1 |

F-statistic = MSR / MSE
18
New cards

F test and t-test for hypothesis testing

In simple linear regression, the F-statistic is the square of the t-statistic for the slope coefficient, meaning it leads to the same inference as the t-test. Either test can be used for hypothesis testing on the slope or intercept coefficient.
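The F = t² relationship can be checked numerically. SSR, SSE, slope, and Sxx here are from a made-up 5-point fit:

```python
import math

# Check that F equals t^2 for the slope in simple linear regression.
# All input values are made up for illustration.
n, k = 5, 1
ssr, sse = 3.6, 2.4
b1, sxx = 0.6, 10.0

f_stat = (ssr / k) / (sse / (n - k - 1))

see = math.sqrt(sse / (n - 2))
t_stat = b1 / (see / math.sqrt(sxx))

print(f_stat, t_stat ** 2)  # the two values agree
```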

19
New cards

Confidence Interval / Prediction interval

This is an interval around a predicted value of the dependent variable for a given value of the independent variable

20
New cards

How do we deal with regression if either the dependent or independent variable is not linear

We can transform these types of models using the following transformations

  1. Log-linear model

    • Dependent variable is logarithmic, but the independent variable is linear

  2. Linear-log model

    • Dependent variable is linear but the independent variable is logarithmic

  3. Log-log model

    • Both the dependent and independent variables are in logarithmic form

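Each transformation reduces to an ordinary linear regression on transformed data. A sketch with made-up data and a made-up `ols()` helper (in the log-log form, the slope is an elasticity: the percentage change in Y for a 1% change in X):

```python
import math

def ols(x, y):
    """Plain least-squares fit; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / \
         sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1

# Made-up data where y doubles whenever x doubles
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0, 6.0, 12.0, 24.0]

log_linear = ols(x, [math.log(v) for v in y])                      # ln(Y) on X
linear_log = ols([math.log(v) for v in x], y)                      # Y on ln(X)
log_log = ols([math.log(v) for v in x], [math.log(v) for v in y])  # ln(Y) on ln(X)

print(log_log)  # slope near 1: a 1% rise in X goes with about a 1% rise in Y
```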
21
New cards

How do we know to select the correct functional form?

  1. Coefficient of determination (R2): A higher value is better

  2. F-Statistic: A high value is better

  3. Standard error of estimate (Se): a lower value is better

  4. Look at the residuals, they should be random and uncorrelated for the model

These measures indicate whether the model is good or whether transformations may be needed, and they let you test whether the model improved after a transformation.