Inference for Regression Notes
Introduction to Inference for Regression
Focus on understanding the statistical process for answering research questions.
Visualization of data using scatter plots is established as the starting point.
Model Types
Two-Parameter Model: Includes both an intercept and a slope (represents linear relationships).
Intercept-Only Model: Represents a flat line, suggesting no relationship between variables.
Model Evaluation
Start by examining scatter plots to judge the best-fit line visually.
Comparing residuals (differences between actual and predicted values) shows which model fits better; when X and Y are related, the two-parameter model leaves smaller residuals than the flat line.
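The residual comparison can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the data values and the helper names (fit_line, sum_squared_residuals) are invented, and the sum of squared residuals is assumed here as the measure of fit.

```python
# Hypothetical sample; the values are made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares intercept and slope for the two-parameter model."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(y, predicted):
    """Total squared distance between actual and predicted values."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))


intercept, slope = fit_line(x, y)
line_predictions = [intercept + slope * xi for xi in x]
flat_predictions = [mean(y)] * len(y)  # intercept-only model predicts the mean of Y

ssr_line = sum_squared_residuals(y, line_predictions)
ssr_flat = sum_squared_residuals(y, flat_predictions)
print(f"two-parameter SSR:  {ssr_line:.3f}")
print(f"intercept-only SSR: {ssr_flat:.3f}")
```

The two-parameter model can never have a larger sum of squared residuals than the flat line, because the flat line is the special case with slope 0. The inferential question is whether the improvement is larger than chance alone would produce.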
Regression Line Equation
Y = Intercept + (Slope * X)
The Y-axis shows the response variable; the X-axis shows the explanatory variable.
Observed points scatter around this line, so a statistical model is needed to describe the error (the spread of Y values around the line).
Statistical Assumptions
Normal distribution of errors.
Constant variance of errors across all X values (homoscedasticity).
Hypotheses and conclusions concern population parameters (the true intercept and slope); the intercept and slope computed from a sample are only estimates of them.
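The assumptions above are often summarized in a single population model. The symbols below are standard textbook notation introduced here for illustration, not taken from the notes: \beta_0 and \beta_1 are the population intercept and slope, \varepsilon_i is the error for observation i, and \sigma is the common standard deviation of the errors. Errors are usually also assumed to be independent of one another.

```latex
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
```

The single \sigma, with no dependence on x_i, is what encodes constant variance.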
Hypothesis Testing in Regression
Null Hypothesis (H0): No association between X and Y (slope = 0).
Alternative Hypothesis (H1): There is an association between X and Y (slope ≠ 0).
Importance of phrasing hypotheses clearly as part of the statistical process.
Observed Slope vs. Hypothesized Slope
Calculation of the observed slope based on sample data (example given: observed slope = 1.12).
Use of a test statistic to determine whether the observed slope is significantly different from the hypothesized slope (0 under H0).
Test Statistics
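Comparison of the observed slope with the hypothesized slope, scaled by the standard error of the slope: t = (observed slope - hypothesized slope) / standard error.
This standardized test statistic helps in deciding whether the data support a nonzero slope, that is, whether the two-parameter model is preferable to the flat line.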
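As a rough sketch of how the test statistic could be computed from raw data, assuming a simple linear regression with one explanatory variable: the function name slope_t_statistic and the sample values are hypothetical, and the standard error uses the usual least-squares formula (residual variance divided by the spread of X).

```python
import math


def slope_t_statistic(x, y, hypothesized_slope=0.0):
    """Return (slope, standard error, t statistic, degrees of freedom)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance divides by n - 2 because two parameters were estimated.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    residual_variance = sum(r ** 2 for r in residuals) / df
    standard_error = math.sqrt(residual_variance / sxx)

    # Standardized distance between the observed and hypothesized slopes.
    t_stat = (slope - hypothesized_slope) / standard_error
    return slope, standard_error, t_stat, df


# Hypothetical sample, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 1.9, 3.8, 3.1, 5.2, 4.6, 6.4, 6.0]
slope, standard_error, t_stat, df = slope_t_statistic(x, y)
print(f"slope = {slope:.3f}, SE = {standard_error:.3f}, t = {t_stat:.2f}, df = {df}")
```

Dividing by the standard error puts the slope on a unit-free scale, so the same reference distribution can be used regardless of the units of X and Y.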
Distribution of Test Statistics
Use of the t-distribution for the test statistic; it is symmetric and bell-shaped, with a shape that depends on the degrees of freedom (n - 2 for a regression with one explanatory variable).
Significance of Test Statistics
Large absolute values (far from zero in either direction) indicate strong evidence against the null hypothesis.
Values near zero suggest no reason to reject the null hypothesis.
P-Value
Represents the area under the t-distribution curve in the tails beyond the observed test statistic (both tails for the two-sided alternative, slope ≠ 0).
Small P-values indicate strong evidence against the null hypothesis.
Standard threshold for significance often set at 0.05 (5% risk tolerance for Type I error).
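The tail-area idea can be made concrete with a small standard-library sketch. It integrates the t density numerically with Simpson's rule, which is a stand-in chosen here to avoid outside libraries; statistical software normally uses a built-in t-distribution function instead. The function names and the example values (t = 2.5, 20 degrees of freedom) are assumptions for illustration.

```python
import math


def t_density(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat|, using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    # The curve is symmetric, so double the area between 0 and |t_stat|.
    central_area = 2 * (h / 3) * total
    return max(0.0, 1.0 - central_area)


# Hypothetical values: t = 2.5 with 20 degrees of freedom.
p = two_sided_p_value(2.5, 20)
alpha = 0.05
print(f"p-value = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

Whatever lies outside the central area between -|t| and |t| is the two-tailed P-value, which is then compared with the chosen significance level.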
Setting Alpha Level (α)
Decision on significance level (commonly set at 0.05) must be made before data collection.
Varying alpha levels can be employed depending on the context and desired stringency.
Calculating and Interpreting P-Values
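Example: a P-value of 0.035 means that, if the null hypothesis is true, there is a 3.5% chance of obtaining a sample slope at least as extreme as the one observed. It is not the probability that the null hypothesis is true.
Samples must be representative of the population (for example, randomly selected) to avoid misleading conclusions.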
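The phrase "chance of obtaining a sample that extreme assuming the null hypothesis is true" can be illustrated by simulation. The sketch below uses a permutation approach, which is a different technique from the t-test described in these notes: shuffling Y breaks any association with X, so the shuffled samples behave as if the null hypothesis were true. The data, the function names, and the number of shuffles are all assumptions for illustration.

```python
import random


def sample_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Fraction of shuffled samples whose slope is at least as extreme as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(sample_slope(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # pairs each X with a randomly chosen Y
        if abs(sample_slope(x, shuffled)) >= observed:
            count += 1
    return count / n_shuffles


# Hypothetical sample, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 1.9, 3.8, 3.1, 5.2, 4.6, 6.4, 6.0]
p_sim = permutation_p_value(x, y)
print(f"simulated two-sided p-value: {p_sim:.4f}")
```

A simulated value of 0.035 would mean that about 3.5% of the samples generated with no real association produced a slope as far from zero as the observed one.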
Overall Statistical Process
State the null and alternative hypotheses, choose α, assess the assumptions, calculate the test statistic, determine the P-value, and decide whether to reject or fail to reject the null hypothesis.
Conclude with interpretation of results in the context of the research question and variable relationships.