Statistical Hypothesis Testing
Alpha Level (α) Specification
- The alpha level should typically be specified before conducting the hypothesis test.
- Commonly set at 0.05 or 5%.
Rejection Criteria
- If the p-value obtained is less than the alpha level (e.g., 5%), reject the null hypothesis.
- Example in the transcript:
- The p-value found is less than 5%, which leads to the conclusion to reject the null hypothesis.
P-Value and Alpha Comparison
- If the p-value is less than alpha, state: "Since the p-value is less than alpha, we reject the null hypothesis."
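The decision rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-sided z-test with a known population standard deviation; the function name `two_sided_z_test` and the sample numbers are hypothetical and are not taken from the transcript.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (p_value, reject) for H0: mu == mu0, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a |Z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value, p_value < alpha


# Hypothetical data: sample mean 52, hypothesized mean 50, sigma 6, n = 36.
p_value, reject = two_sided_z_test(52, 50, 6, 36)
if reject:
    print("Since the p-value is less than alpha, we reject the null hypothesis.")
else:
    print("Since the p-value is not less than alpha, we do not reject the null hypothesis.")
```

Note that alpha is passed in as a fixed argument rather than chosen after seeing the p-value, which mirrors the advice to specify it before running the test.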
Confidence Interval
- A (1 − α) confidence interval defines the range of hypothesized values that do not fall into the rejection region of a two-sided test at level α.
- If the hypothesized value falls within this range, do not reject the null hypothesis.
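The confidence-interval version of the same decision might look like the sketch below. It assumes a z-interval for a mean with known standard deviation; `z_confidence_interval` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Return the (1 - alpha) confidence interval for the mean, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical data: sample mean 52, sigma 6, n = 36, hypothesized mean 50.
low, high = z_confidence_interval(52, 6, 36)
mu0 = 50
# Reject H0 only if the hypothesized value lies outside the interval.
reject = not (low <= mu0 <= high)
print(f"95% CI: ({low:.2f}, {high:.2f}); reject H0: {reject}")
```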
Regression Analysis
Interpreting Coefficients
- Each coefficient in a regression model needs interpretation.
- E.g., each beta coefficient (β) gives the expected change in the dependent variable (Y) for a one-unit increase in its predictor variable, holding the other predictors constant.
- Suggestion: If one variable can be manipulated, it should be the number of children due to its significant impact.
Significance of Coefficients
- It’s essential to determine if all coefficients are significant or not.
Residuals and Homoscedasticity
Residual Plotting
- Residuals should be plotted against the predictor variables to assess model fit.
- Analysis of residual plots is crucial to check for homoscedasticity (constant variance of residuals).
- In the example, the residuals were mostly around 10 and 15, suggesting that they are randomly distributed.
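As a rough numerical companion to a residual plot, the sketch below fits a straight line by ordinary least squares, computes the residuals, and compares their spread for low and high values of the predictor. The data and the helper names (`fit_line`, `residuals`, `spread_ratio`) are hypothetical, and the ratio is only a crude indicator, not a formal test of homoscedasticity.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, res):
    """Mean squared residual for the upper half of x divided by the lower half."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    lower = [res[i] for i in order[:half]]
    upper = [res[i] for i in order[-half:]]
    mean_sq_lower = sum(v * v for v in lower) / len(lower)
    mean_sq_upper = sum(v * v for v in upper) / len(upper)
    return mean_sq_upper / mean_sq_lower


# Hypothetical data, roughly linear with small noise.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
res = residuals(x, y)
print([round(r, 2) for r in res])
print(round(spread_ratio(x, res), 2))
```

A ratio far from 1 would suggest that the residual variance changes with the predictor; a ratio near 1 is consistent with constant variance.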
Variable Consideration in Models
- Large or systematically patterned residuals might suggest that relevant variables are missing from the model.
Polynomial Regression Models
Types of Regression Models
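- Polynomial regression is a type of multiple regression in which powers of a predictor variable are included as additional terms.
- For a polynomial regression of order p involving a predictor variable x, it can be represented as: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$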
- Discussion of quadratic vs. cubic forms:
- Quadratic (second order) gives a U-shaped or inverted-U curve with one turning point, while cubic (third order) can have two turning points and cross the x-axis up to three times, creating more complexity.
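A polynomial model can be fitted by treating each power of the predictor as its own column and solving the least-squares normal equations. The sketch below does this with the standard library only; `solve` and `fit_polynomial` are hypothetical helper names, and the data are generated from a known quadratic purely to show that the coefficients are recovered.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_polynomial(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + bp*x**p via the normal equations."""
    size = degree + 1
    # Each power of x is treated as its own predictor column.
    design = [[xi ** k for k in range(size)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(size)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2x - 0.5x^2 (an inverted U).
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]
b0, b1, b2 = fit_polynomial(x, y, degree=2)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The normal equations are used here only to keep the example dependency-free; for higher orders or real data a numerically more stable routine from a statistics library would usually be preferable.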
Graphical Representation
- When plotting, the form suggests whether the relationship is linear, quadratic, etc.
- An interaction term should be considered if the effect of one variable depends on the level of another variable.
Modeling with Multiple Predictors
Two Independent Variables
- If both variables enter linearly (first order), the model is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Interaction Terms
- When there are two independent variables, potential interaction terms might need to be included.
- The model format becomes: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
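To see what an interaction term does, consider a first-order model with two predictors and a product term, y = b0 + b1*x1 + b2*x2 + b3*x1*x2. The sketch below uses hypothetical coefficients, chosen only for illustration, to show that the effect of x1 on y then depends on the level of x2.

```python
def predict(x1, x2, b0, b1, b2, b3):
    """First-order model with interaction: y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(x2, b1, b3):
    """Change in y per one-unit increase in x1, at a given level of x2."""
    return b1 + b3 * x2


# Hypothetical coefficients, not estimated from any data.
b0, b1, b2, b3 = 10.0, 2.0, 1.0, 0.5

for x2 in (0, 4):
    step = predict(3, x2, b0, b1, b2, b3) - predict(2, x2, b0, b1, b2, b3)
    print(f"x2 = {x2}: one more unit of x1 changes y by {step}")
```

Without the product term (b3 = 0), the two printed changes would be identical, which is what a model with no interaction assumes.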
Case Study: Regression Model for Fast Food Restaurant
Family Income and Age of Children
- The task is to determine how family income and age of children relate to spending behavior at a fast-food restaurant.
- Graph plotting revenue against income suggests a quadratic relationship:
- Families with middle income levels tend to spend more than those at the extremes.
Quadratic Model Form
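- The model must reflect this quadratic relationship, e.g. for income alone: $\text{Revenue} = \beta_0 + \beta_1\,\text{Income} + \beta_2\,\text{Income}^2 + \varepsilon$, with the age of children entering through additional terms.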
- Interaction terms can also be included where relevant, reflecting more complex behaviors.
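If spending is modeled as a quadratic in income with a negative squared-term coefficient, the income level at which predicted spending peaks follows from setting the derivative to zero: income = -b1 / (2 * b2). The sketch below illustrates this with hypothetical coefficients (income in thousands of dollars); they are not estimates from the case study.

```python
def peak_income(b1, b2):
    """Income at which b0 + b1*income + b2*income**2 is largest (requires b2 < 0)."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for the curve to have a maximum")
    return -b1 / (2 * b2)


def predicted_spending(income, b0, b1, b2):
    """Quadratic model in income only; other predictors are ignored in this sketch."""
    return b0 + b1 * income + b2 * income ** 2


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 100.0, 10.0, -0.0625
best = peak_income(b1, b2)
for income in (40, best, 120):
    print(income, predicted_spending(income, b0, b1, b2))
```

With these numbers, predicted spending is higher at the peak income than at either extreme, matching the inverted-U pattern described for middle-income families.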
Model Significance Testing
Determining Significance
- All coefficients should be tested for significance to confirm their relevance in the model.
- High p-values (like p > 0.05) on coefficients indicate a lack of significance; when the overall model is significant but individual coefficients are not, this may hint at multicollinearity issues.
Use of Indicator Variables
- When categories are present (e.g., colors of cars), indicator (0/1) variables can be introduced to account for these categorical influences; a variable with m categories needs m − 1 indicator variables.
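A minimal sketch of indicator coding follows, assuming one category is chosen as the baseline and every other category gets its own 0/1 column. The function name `indicator_columns`, the colors, and the choice of baseline are all hypothetical.

```python
def indicator_columns(values, baseline):
    """Encode a categorical variable with m categories as m - 1 indicator variables.

    The baseline category is represented by all zeros, which avoids perfect
    collinearity with the intercept term.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical car colors; "white" is chosen arbitrarily as the baseline.
colors = ["red", "white", "silver", "red", "white"]
levels, rows = indicator_columns(colors, baseline="white")
print(levels)
for color, row in zip(colors, rows):
    print(color, row)
```

Each indicator's coefficient is then interpreted relative to the baseline category.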