Statistical Hypothesis Testing

Alpha Level (α) Specification
- The alpha level should typically be specified before conducting the hypothesis test.
- It is commonly set at 0.05 (5%).
Rejection Criteria
- If the p-value obtained is less than the alpha level (e.g., 5%), reject the null hypothesis.
- Example in the transcript: the p-value found is reported as $0.000$, which is less than 5%. This leads to the conclusion to reject the null hypothesis.
P-Value and Alpha Comparison
- If the p-value is less than alpha, state: "Since the p-value is less than alpha, we reject the null hypothesis."
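The decision rule above can be sketched in a few lines. The function name `decide` and the wording of the non-rejection message are illustrative choices for this sketch, not something taken from the transcript.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision statement for a test, given its p-value and alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "Since the p-value is less than alpha, we reject the null hypothesis."
    # Wording of this branch is an assumption; the notes only give the rejection case.
    return "Since the p-value is not less than alpha, we do not reject the null hypothesis."


# The transcript's example: a p-value reported as 0.000 against alpha = 5%.
print(decide(0.000))
# A hypothetical larger p-value, for contrast.
print(decide(0.20))
```

Note that alpha is a parameter with a default, which mirrors the advice to fix it before looking at the p-value.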
Confidence Interval
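- A confidence interval at level $1 - \alpha$ defines the range of hypothesized values that do not fall into the rejection region of a two-sided test at level α.
- If the hypothesized value falls within this range, do not reject the null hypothesis; if it falls outside, reject it.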
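The link between the interval and the test decision can be illustrated as follows. This is a minimal sketch that assumes a normal (z) critical value; regression software normally uses a t critical value instead, which gives a slightly wider interval for small samples. The estimate and standard error are made-up numbers.

```python
from statistics import NormalDist


def z_confidence_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval using a normal critical value (an assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


def reject_null(hypothesized, estimate, std_error, alpha=0.05):
    """Reject H0 exactly when the hypothesized value lies outside the interval."""
    low, high = z_confidence_interval(estimate, std_error, alpha)
    return not (low <= hypothesized <= high)


# Hypothetical estimate and standard error, for illustration only.
low, high = z_confidence_interval(estimate=2.0, std_error=0.5)
print(f"95% interval: ({low:.2f}, {high:.2f})")
print("Reject H0: value = 0?  ", reject_null(0.0, 2.0, 0.5))
print("Reject H0: value = 1.5?", reject_null(1.5, 2.0, 0.5))
```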

Interpreting Coefficients
- Each coefficient in a regression model needs interpretation.
- For example, each beta coefficient (β) gives the expected change in the dependent variable (Y) for a one-unit increase in its predictor variable, holding the other predictors fixed.
- Suggestion: If one variable can be manipulated, it should be the number of children due to its significant impact.
Significance of Coefficients
- It’s essential to determine if all coefficients are significant or not.
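A standard way to make this determination, stated here as general background rather than as the lecture's own derivation, is a t-test on each coefficient. The symbols $\hat{\beta}_j$ (estimated coefficient), $\text{SE}(\hat{\beta}_j)$ (its standard error), $n$ (number of observations) and $k$ (number of predictors) are notation introduced for this sketch:

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0, \qquad t_j = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}$$

Under the usual regression assumptions, $t_j$ is compared with a t distribution with $n - k - 1$ degrees of freedom, and the coefficient is judged significant when the resulting p-value is less than α.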

Residual Plotting
- Residuals should be plotted against the predictor variables (and the fitted values) to assess model fit.
- Analysis of residual plots is crucial to check for homoscedasticity (constant variance of residuals).
- In the example, the residuals were observed mainly around 10 and 15 with no clear pattern, suggesting they are randomly distributed.
Variable Consideration in Models
- Large residuals, or residuals that show a systematic pattern, might suggest that too few variables are considered in the model.
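The residual checks described above can be sketched numerically before any plotting. The data are invented for illustration, and `spread_by_half` is a hypothetical helper that gives only a crude look at constant variance; it is not a formal test.

```python
from statistics import pvariance


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted value for every observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_by_half(xs, res):
    """Residual variance for the lower and upper halves of x (rough check only)."""
    pairs = sorted(zip(xs, res))
    mid = len(pairs) // 2
    lower = [r for _, r in pairs[:mid]]
    upper = [r for _, r in pairs[mid:]]
    return pvariance(lower), pvariance(upper)


# Made-up data, not taken from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print("residuals:", [round(r, 3) for r in res])
print("variance (low x, high x):", spread_by_half(xs, res))
```

Similar variances in the two halves are consistent with homoscedasticity; a clear trend or curve in the residuals would instead point to a missing term or variable.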

Types of Regression Models
- Polynomial regression is a special case of multiple regression in which the predictors include powers of a variable.
- For a polynomial regression involving a predictor variable $x$, it can be represented as:
- $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \text{error}$
- Discussion of quadratic vs. cubic forms:
- Quadratic (second order) gives a U-shaped or inverted-U graph with a single turning point, while a cubic can have two turning points and cross the x-axis up to three times, creating more complexity.
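The difference between the quadratic and cubic forms can be checked with a small evaluator. The coefficients below are arbitrary illustrations, and `poly_predict` and `quadratic_turning_point` are names made up for this sketch.

```python
def poly_predict(betas, x):
    """Evaluate beta_0 + beta_1 x + beta_2 x^2 + ... using Horner's rule."""
    result = 0.0
    for beta in reversed(betas):
        result = result * x + beta
    return result


def quadratic_turning_point(beta_1, beta_2):
    """x-location of the single turning point of a quadratic (beta_2 must be non-zero)."""
    if beta_2 == 0:
        raise ValueError("beta_2 is zero, so the model is not quadratic")
    return -beta_1 / (2 * beta_2)


# Illustrative quadratic: y = 1 - 4x + x^2, a U shape with its minimum at x = 2.
quadratic = [1.0, -4.0, 1.0]
x_min = quadratic_turning_point(quadratic[1], quadratic[2])
print("turning point:", x_min, "value there:", poly_predict(quadratic, x_min))

# Illustrative cubic: y = -x + x^3, which crosses the x-axis at -1, 0 and 1.
cubic = [0.0, -1.0, 0.0, 1.0]
print("cubic at -1, 0, 1:", [poly_predict(cubic, x) for x in (-1.0, 0.0, 1.0)])
```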
Graphical Representation
- When the data are plotted, the shape of the scatter suggests whether the relationship is linear, quadratic, etc.
- An interaction term should be considered if the effect of one variable depends on the level of another variable.

Two Independent Variables
- If the order is $p = 1$ (first-order, linear), the model with two independent variables is:
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \text{error}$
Interaction Terms
- When there are two independent variables, potential interaction terms might need to be included.
- The model format becomes:
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \text{error}$
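What the interaction term does can be seen by computing the effect of $x_1$ at different levels of $x_2$: in the interaction model the slope in $x_1$ is $\beta_1 + \beta_3 x_2$ rather than a constant. The coefficient values below are invented for illustration.

```python
def predict(betas, x1, x2):
    """Mean response for the two-variable model with an interaction term."""
    b0, b1, b2, b3 = betas
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(betas, x2):
    """Change in y per one-unit increase in x1, at a given level of x2."""
    _, b1, _, b3 = betas
    return b1 + b3 * x2


# Hypothetical coefficients (beta_0, beta_1, beta_2, beta_3).
betas = (10.0, 2.0, 3.0, 0.5)

for x2 in (0.0, 2.0, 4.0):
    print(f"x2 = {x2}: effect of one more unit of x1 = {slope_in_x1(betas, x2)}")
```

With $\beta_3 = 0$ the slope would be the same at every level of $x_2$, which is the no-interaction model.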

Family Income and Age of Children
- The task is to determine how family income and the age of children relate to spending behavior at a fast-food restaurant.
- Graph plotting revenue against income suggests a quadratic relationship:
- Families with middle income levels tend to spend more than those at the extremes.
Quadratic Model Form
- The model must reflect this quadratic relationship:
- $\text{Revenue} = \beta_0 + \beta_1 (\text{Income}) + \beta_2 (\text{Income}^2) + \beta_3 (\text{Age}) + \beta_4 (\text{Age}^2) + \dots + \text{error}$
- Interaction terms can also be included where relevant, reflecting more complex behaviors.
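As a rough illustration of fitting such a model, the sketch below estimates only the income part, $\beta_0 + \beta_1 (\text{Income}) + \beta_2 (\text{Income}^2)$, by solving the least-squares normal equations. Everything in it is an assumption made for the example: the data are synthetic and noise-free, the coefficients are made up, and the age terms are left out to keep it short. In practice a statistics package would do this fit.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, ys):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def quadratic_row(income):
    """Design-matrix row: intercept, Income, Income squared."""
    return [1.0, income, income ** 2]


# Synthetic data from made-up coefficients; the negative squared term gives an
# inverted U, i.e. middle incomes spend the most.
true_betas = [5.0, 4.0, -0.5]
incomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
revenue = [sum(b * v for b, v in zip(true_betas, quadratic_row(x))) for x in incomes]

betas = fit_least_squares([quadratic_row(x) for x in incomes], revenue)
peak_income = -betas[1] / (2 * betas[2])
print("estimated betas:", [round(b, 4) for b in betas])
print("income at peak spending:", round(peak_income, 4))
```

Adding the age terms or an interaction would only mean extending `quadratic_row` with more columns; the fitting code stays the same.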

Determining Significance
- All coefficients should be tested for significance to confirm their relevance in the model.
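- A high p-value on a coefficient (e.g., $p > 0.05$) indicates that the coefficient is not statistically significant at the 5% level. When predictors that are expected to matter show high p-values, this may hint at multicollinearity issues.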
Use of Indicator Variables
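- When categories are present (e.g., colors of cars), indicator (dummy) variables taking the values 0 or 1 can be introduced to account for these categorical influences. A category with $k$ levels needs $k - 1$ indicators, with the remaining level serving as the baseline.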
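A minimal sketch of building indicator variables for the car-color example follows. The particular colors, the choice of baseline, and the helper name `indicator_columns` are assumptions made for illustration.

```python
def indicator_columns(values, baseline):
    """Build 0/1 indicator columns for every level except the baseline."""
    if baseline not in values:
        raise ValueError("baseline must be one of the observed categories")
    levels = sorted(set(values) - {baseline})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical observations of car color; "white" is chosen as the baseline.
colors = ["red", "blue", "white", "blue"]
levels, rows = indicator_columns(colors, baseline="white")

print("indicator columns:", levels)
for color, row in zip(colors, rows):
    print(color, row)
```

A baseline observation has zeros in every indicator column, so each indicator's coefficient is read as the difference from the baseline category.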