Model Assumptions:
The error term ($$\epsilon_i$$) in a regression model $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \epsilon_i$$ should satisfy several assumptions:
Mean of zero: $$E(\epsilon_i) = 0$$.
Constant variance (homoscedasticity): $$Var(\epsilon_i) = \sigma^2$$ for all $$i$$.
No correlation between error terms.
Normally distributed error terms.
Numerical explanatory variables should not exhibit perfect multicollinearity.
Why Check Model Assumptions?
To ensure that least squares estimators are reliable.
To understand the consequences of violating assumptions.
To detect any assumption violations.
To modify the model to satisfy the assumptions.
Diagnostics for Assumptions on the Errors:
Diagnostics are based on residuals.
Visual inspection of graphs of standardized residuals.
If no assumption is violated, the graphs show no systematic pattern.
Homoscedasticity: The variance of the error term is constant across all levels of the independent variables: $$Var(\epsilon_i) = \sigma^2$$.
Consequences of heteroscedasticity:
Increase in the standard errors of the coefficient estimates.
Low t-values.
Potentially leading to the incorrect conclusion that explanatory variables do not make a significant contribution.
Under homoscedasticity, standardized residuals and standardized predictions are not correlated.
Check: Plot standardized residuals against standardized predictions.
Interpretation: No pattern implies no violation of homoscedasticity.
Specific patterns in the plot of standardized residuals versus standardized predictions, such as a fan shape, indicate heteroscedasticity.
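A minimal sketch of this residual check in Python with statsmodels; the data file survey.csv, the dependent variable NArtzero, and the covariate names are assumptions used only for illustration:

```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Hypothetical data: a file with NArtzero and some covariates is assumed for illustration.
df = pd.read_csv("survey.csv")
X = sm.add_constant(df[["Age", "Education", "Income"]])
model = sm.OLS(df["NArtzero"], X).fit()

# Standardize residuals and fitted values, then plot them against each other.
std_resid = (model.resid - model.resid.mean()) / model.resid.std()
std_pred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()

plt.scatter(std_pred, std_resid, alpha=0.5)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predictions")
plt.ylabel("Standardized residuals")
plt.title("Homoscedasticity check: a fan shape signals heteroscedasticity")
plt.show()
```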
Possible remedies for heteroscedasticity:
Log transformation of the dependent variable.
Log transformation of one or more explanatory variables.
Regress the log of the dependent variable (e.g., log of NArtzero) against the same explanatory variables.
Pay attention to the presence of 0 values in the dependent variable, which may require special handling before applying the log transformation.
The coefficient of a numerical independent variable represents the average change in the logarithm of the original dependent variable associated with a unit increase in the covariate, controlling for all other covariates.
Multiplied by 100, the coefficient can be interpreted approximately as the average percentage change in the original variable associated with a unit increase in the covariate, when all other covariates are kept constant.
For example, an increase in age is associated with a 0.2% average change in the number of visits to an art museum, controlling for other variables.
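A possible implementation of the log-transformed regression in Python; adding 1 before taking the log is just one common way to handle the zeros in NArtzero, and the file and column names are again illustrative assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey.csv")  # hypothetical file and column names
X = sm.add_constant(df[["Age", "Education", "Income"]])

# log(NArtzero + 1) is one simple way to handle the zero counts before the log transform.
log_model = sm.OLS(np.log(df["NArtzero"] + 1), X).fit()

# A coefficient b on Age means one extra year of age is associated with roughly
# a 100*b percent change in the dependent variable; exp(b) - 1 gives the exact factor.
b_age = log_model.params["Age"]
print(f"Approximate % change per extra year of age: {100 * b_age:.2f}%")
print(f"Exact % change per extra year of age: {100 * (np.exp(b_age) - 1):.2f}%")
```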
Multicollinearity:
Definition:
Numerical explanatory variables should not have perfect correlation.
There should be no overlap of information or redundancy among the explanatory variables.
Standard errors of the coefficient estimates become large, leading to:
Small t-statistic values.
Coefficient signs may not match prior expectations.
Diagnostics for multicollinearity:
Tolerance
Variance Inflation Factor (VIF)
Partial correlations
Computation:
Regress the considered explanatory variable against all other independent variables.
Compute the R-squared ($$R^2$$) of the regression.
Calculate Tolerance as $$Tolerance = 1 - R^2$$.
Interpretation:
Tolerance represents the proportion of the explanatory variable's variability that is not explained by all other variables.
A greater tolerance indicates a lower inter-correlation.
Computation:
$$VIF = \frac{1}{Tolerance}$$
Interpretation:
VIF measures the amount by which the variance of an explanatory variable's coefficient is increased due to collinearity.
A VIF greater than 10 suggests severe multicollinearity.
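The auxiliary-regression computation of Tolerance and VIF described above can be sketched as follows (column names are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm

def tolerance_and_vif(covariates: pd.DataFrame, target: str) -> tuple[float, float]:
    """Regress `target` on all other columns and return (Tolerance, VIF)."""
    y = covariates[target]
    X = sm.add_constant(covariates.drop(columns=[target]))
    r2 = sm.OLS(y, X).fit().rsquared
    tolerance = 1 - r2      # share of the target's variance not explained by the others
    return tolerance, 1 / tolerance

# Hypothetical covariates; a VIF above 10 would signal severe multicollinearity.
df = pd.read_csv("survey.csv")
tol, vif = tolerance_and_vif(df[["Education", "FatherEducation", "MotherEducation", "Income"]],
                             "MotherEducation")
print(f"Tolerance = {tol:.3f}, VIF = {vif:.3f}")
```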
Definition: Pearson’s correlation between the dependent variable and the considered explanatory variable, controlling for all other explanatory variables.
Computation:
Regress the dependent variable on all covariates, excluding the considered one.
Regress the considered explanatory variable on all other covariates.
Calculate the residuals of the two regressions.
Partial Correlation: The Pearson’s correlation between the residuals of the two regressions.
Interpretation:
A high partial correlation (in absolute value) indicates a high correlation between the dependent variable and the independent variable, controlling for all other independent variables.
Compare the values of the Pearson’s correlation and the corresponding partial correlation to understand the effect of controlling for other variables.
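A sketch of the residual-based computation of the partial correlation (the variable names are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def partial_correlation(df: pd.DataFrame, y: str, x: str, controls: list[str]) -> float:
    """Pearson correlation between y and x after removing the linear effect of the controls."""
    Z = sm.add_constant(df[controls])
    resid_y = sm.OLS(df[y], Z).fit().resid   # y purged of the other covariates
    resid_x = sm.OLS(df[x], Z).fit().resid   # x purged of the other covariates
    return np.corrcoef(resid_y, resid_x)[0, 1]

# Hypothetical example: partial correlation of NArtzero with Father Education,
# controlling for the remaining covariates.
df = pd.read_csv("survey.csv")
r_partial = partial_correlation(df, "NArtzero", "FatherEducation", ["Age", "Education", "Income"])
print(f"Partial correlation: {r_partial:.3f}")
```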
Remedies for multicollinearity:
Remove one (or more than one) of the collinear covariates (e.g., remove Mother Education).
Compute a new variable from the ones affected by collinearity, and replace the multicollinear variables with this new variable (see the sketch after this list).
Transform the numerical covariates into categorical variables by grouping the values.
Continue using the same model, keeping in mind that t-tests might be distorted.
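A minimal sketch of the composite-variable remedy, assuming parental education is measured by two hypothetical columns:

```python
import pandas as pd

df = pd.read_csv("survey.csv")  # hypothetical file and column names

# One possible composite: replace the collinear parental-education variables with
# their average, then use ParentEducation in place of both in the regression model.
df["ParentEducation"] = df[["FatherEducation", "MotherEducation"]].mean(axis=1)
df = df.drop(columns=["FatherEducation", "MotherEducation"])
```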
Linear Effect of a Numerical Independent Variable:
In the standard regression model, $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon$$, numerical variables are entered without mathematical transformations.
This assumes a linear relationship between the dependent variable and each numerical independent variable, with a constant effect of a unit increase of a numerical independent variable.
To model non-linear effects, add polynomial terms in the considered variable (e.g., a quadratic term): $$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$$
To check for a quadratic effect of a variable (e.g., Family Income), include as an extra explanatory variable the square of the variable (Income²) and check for its significance.
Marginal Impact of Family Income
The marginal impact is obtained by taking the partial derivative of the model with respect to Family Income. With both a linear and a quadratic term, $$E(Y) = \beta_0 + \beta_1 Income + \beta_2 Income^2 + \dots$$, the marginal impact is $$\frac{\partial E(Y)}{\partial Income} = \beta_1 + 2\beta_2 \cdot Income$$, so the effect of one extra unit of income depends on the current level of income.
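A sketch of fitting the quadratic specification and evaluating the marginal impact, under the same hypothetical data assumptions as before:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey.csv")       # hypothetical file and column names
df["Income2"] = df["Income"] ** 2    # quadratic term for Family Income

X = sm.add_constant(df[["Age", "Education", "Income", "Income2"]])
model = sm.OLS(df["NArtzero"], X).fit()
print(model.summary())               # a significant Income2 coefficient signals a non-linear effect

# Marginal impact of income evaluated at a given income level: b1 + 2*b2*Income.
b1, b2 = model.params["Income"], model.params["Income2"]
income_level = df["Income"].median()
print(f"Marginal impact at the median income: {b1 + 2 * b2 * income_level:.4f}")
```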
Constant Effect: The standard regression model assumes that the marginal contribution of an independent variable does not depend on the value of any other variable.
Moderation Effect: A covariate $$X_1$$ has a moderation effect on the relationship between another covariate, $$X_2$$, and the dependent variable if $$X_1$$ affects the size of the marginal effect of $$X_2$$ on the dependent variable. This indicates an interaction between $$X_1$$ and $$X_2$$.
Consider whether an additional level of education has the same impact on the number of visits to an art museum for those who attended classes in visual art and those who did not attend such classes, controlling for all other variables.
Add as an extra explanatory variable the product of the two variables (e.g., ClassvisualXEducation).
Check for the significance of the interaction term.
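A possible implementation of the interaction check using the statsmodels formula interface (variable names such as Classvisual are taken from the example but their exact coding is assumed):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical file and column names

# Classvisual * Education expands to both main effects plus their product (the interaction term).
model = smf.ols("NArtzero ~ Age + Income + Classvisual * Education", data=df).fit()

# A significant coefficient on the Classvisual:Education term indicates that the effect of an
# extra level of education differs between those who did and did not attend visual art classes.
print(model.summary())
```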
Each explanatory variable has a direct effect on the dependent variable, measured by its slope.
The mediation effect explores whether the effect on the dependent variable of one independent variable is mediated by another independent variable.
Mediation Effect as Causal Effect: Variable $$X_2$$ is a mediator in the relationship between $$X_1$$ and Y if $$X_1$$ causes $$X_2$$ and $$X_2$$ causes Y.
Path analysis is used to sketch causal relationships.
Regress the dependent variable on all variables but the mediator.
Regress the mediator on the variable with an indirect effect.
Regress the dependent variable on all variables.
Step 1: Regress NArtzero on Father Education to estimate the total effect ($$\beta_{f,total}$$).
Step 2: Regress Personal Education on Father Education to estimate the effect on the mediator ($$\beta_f$$).
Step 3: Regress NArtzero on Father and Personal Education to estimate the direct effect of Father Education ($$\beta_{f,direct}$$) and the direct effect of Personal Education ($$\beta_{p,direct}$$).
The total effect can be decomposed into the direct effect and the mediated effect: $$\beta_{f,total} = \beta_{f,direct} + \beta_{f,mediated}$$
The mediated effect is calculated as the product of the coefficients for each leg of the path: $$\beta_{f,mediated} = \beta_f \cdot \beta_p$$
The Sobel test is used to test the significance of the mediation effect. The null hypothesis is that the mediated effect is zero: $$H_0: \beta_{f,mediated} = \beta_f \cdot \beta_p = 0$$
The test statistic can be computed as $$z = \frac{\beta_f \cdot \beta_p}{\sqrt{\beta_p^2 \, SE(\beta_f)^2 + \beta_f^2 \, SE(\beta_p)^2}}$$, where $$SE(\beta_f)$$ and $$SE(\beta_p)$$ are the standard errors of the estimated coefficients.
If the test statistic value is large enough, it indicates a significant mediation effect of Personal Education on the relationship between Father Education and NArtzero.
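A sketch of the three regressions and the Sobel test in Python; FatherEducation and Education (personal education) are assumed column names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical file and column names

# Step 1: total effect of Father Education on NArtzero.
total = smf.ols("NArtzero ~ FatherEducation", data=df).fit()

# Step 2: effect of Father Education on the mediator (personal education).
to_mediator = smf.ols("Education ~ FatherEducation", data=df).fit()

# Step 3: direct effects of Father Education and Personal Education on NArtzero.
direct = smf.ols("NArtzero ~ FatherEducation + Education", data=df).fit()

b_f, se_f = to_mediator.params["FatherEducation"], to_mediator.bse["FatherEducation"]
b_p, se_p = direct.params["Education"], direct.bse["Education"]

# Mediated effect and Sobel z statistic.
mediated = b_f * b_p
sobel_z = mediated / np.sqrt(b_p**2 * se_f**2 + b_f**2 * se_p**2)
print(f"Total effect:    {total.params['FatherEducation']:.4f}")
print(f"Direct effect:   {direct.params['FatherEducation']:.4f}")
print(f"Mediated effect: {mediated:.4f}  (Sobel z = {sobel_z:.2f})")
```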