Inference for Regression Notes

  • Introduction to Inference for Regression

    • Focus on understanding the statistical process for answering research questions.

    • Visualization of data using scatter plots is established as the starting point.

  • Model Types

    • Two-Parameter Model: Includes both an intercept and a slope (represents linear relationships).

    • Intercept-Only Model: Represents a flat line, suggesting no relationship between variables.

  • Model Evaluation

    • Encouragement to analyze scatter plots to determine the best fit line visually.

    • Comparing residuals (differences between actual and predicted values) shows that the two-parameter model usually fits noticeably better than the flat line.
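The residual comparison above can be made concrete by summing squared residuals for each model. The sketch below is illustrative only: the data values and the helper names `fit_line` and `sse` are hypothetical, and the line is fitted by ordinary least squares, which the notes do not name explicitly.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for the two-parameter model."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(actual, predicted):
    """Sum of squared residuals (actual minus predicted)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

intercept, slope = fit_line(x, y)

# Two-parameter model: predictions follow the fitted line.
sse_line = sse(y, [intercept + slope * xi for xi in x])

# Intercept-only model: every prediction is the mean of y (a flat line).
sse_flat = sse(y, [mean(y)] * len(y))

print(f"SSE, two-parameter model: {sse_line:.3f}")
print(f"SSE, intercept-only model: {sse_flat:.3f}")
```

A much smaller sum of squared residuals for the two-parameter model is the numerical counterpart of the line "looking better" on the scatter plot.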

  • Regression Line Equation

    • Y = Intercept + (Slope * X)

    • Y is the response variable (vertical axis); X is the explanatory variable (horizontal axis).

    • A statistical model is needed to describe the error (scatter) around this line.

  • Statistical Assumptions

    • Normal distribution of errors.

    • Constant variance of errors across all X values (homoscedasticity).

    • Inference targets the population parameters (the true intercept and slope); the sample provides only estimates of them.
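One common way to write the regression line together with these assumptions is shown below. The symbols are assumed notation, not taken from the notes: \beta_0 and \beta_1 stand for the population intercept and slope, \varepsilon_i for the error of observation i, and \sigma for the error standard deviation, which is the same at every X value.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\varepsilon_i \sim N(0, \sigma^2)
```

The intercept and slope computed from a sample are estimates of \beta_0 and \beta_1.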

  • Hypothesis Testing in Regression

    • Null Hypothesis (H0): No association between X and Y (slope = 0).

    • Alternative Hypothesis (H1): There is an association between X and Y (slope ≠ 0).

    • Importance of phrasing hypotheses clearly as part of the statistical process.

  • Observed Slope vs. Hypothesized Slope

    • Calculation of the observed slope based on sample data (example given: observed slope = 1.12).

    • Use of a test statistic to determine whether the observed slope differs significantly from the hypothesized slope (0 under H0).

  • Test Statistics

    • Comparison of the observed slope with the hypothesized slope using the standard error.

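    • Standardized test statistic: (observed slope - hypothesized slope) / standard error of the slope. It helps decide whether the slope differs from zero, that is, whether the linear model is warranted over the flat line.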
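A minimal sketch of this calculation follows, assuming a least-squares fit and the usual standard error of the slope (residual standard error divided by the square root of the sum of squared X deviations). The function name `slope_t_statistic` and the data are hypothetical.

```python
import math
from statistics import mean

def slope_t_statistic(x, y, hypothesized_slope=0.0):
    """Return (slope, standard error, t statistic, degrees of freedom)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Observed slope and intercept from the sample.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error uses n - 2 because two parameters are estimated.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_se = math.sqrt(sse / (n - 2))

    # Standard error of the slope, then the standardized test statistic.
    slope_se = residual_se / math.sqrt(sxx)
    t_stat = (slope - hypothesized_slope) / slope_se
    return slope, slope_se, t_stat, n - 2

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, slope_se, t_stat, df = slope_t_statistic(x, y)
print(f"slope = {slope:.3f}, SE = {slope_se:.3f}, t = {t_stat:.3f}, df = {df}")
```

The test statistic counts how many standard errors the observed slope lies from the hypothesized slope.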

  • Distribution of Test Statistics

    • The test statistic follows a t-distribution, which is symmetric and bell-shaped; its exact shape depends on the degrees of freedom (n - 2 for the two-parameter model).

  • Significance of Test Statistics

    • Large values in absolute terms (strongly positive or strongly negative) indicate strong evidence against the null hypothesis.

    • Values near zero provide little evidence against the null hypothesis.

  • P-Value

    • Represents the area under the t-distribution curve at least as far from zero as the observed test statistic (both tails, for the two-sided alternative).

    • Small P-values indicate strong evidence against the null hypothesis.

    • Standard threshold for significance often set at 0.05 (5% risk tolerance for Type I error).
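The tail area can be approximated with the standard library alone. The sketch below integrates the t density numerically with Simpson's rule, which is a choice made here for self-containment rather than a method from the notes; the names `t_pdf` and `two_sided_p_value` and the value of `alpha` are assumptions for illustration.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat|, via Simpson's rule (steps is even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_stat|
    # By symmetry, the two tails hold whatever the central region does not.
    return max(0.0, 1.0 - 2.0 * central_area)

alpha = 0.05  # significance level, chosen in advance
p_value = two_sided_p_value(2.228, df=10)
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

In practice a statistics package would normally supply this value; the numerical integration only makes the "area under the curve" idea explicit.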

  • Setting Alpha Level (α)

    • Decision on significance level (commonly set at 0.05) must be made before data collection.

    • Varying alpha levels can be employed depending on the context and desired stringency.

  • Calculating and Interpreting P-Values

    • Example: a P-value of 0.035 means that, if the null hypothesis is true, there is a 3.5% chance of obtaining a sample at least as extreme as the one observed.

    • Representative, well-collected samples are essential; a poor sample can lead to misleading conclusions.
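The "chance of a sample at least as extreme, assuming the null hypothesis is true" reading can be illustrated by simulation. The sketch below uses a permutation approach, which is a different technique from the t-test described in these notes: shuffling Y breaks any association with X, so the shuffled slopes show what the null hypothesis would produce. The data, the function names, and the number of shuffles are hypothetical.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffled samples whose slope is at least as extreme as observed."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(slope(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # simulate "no association" between X and Y
        if abs(slope(x, shuffled)) >= observed:
            extreme += 1
    # Count the observed sample itself so the estimate is never exactly zero.
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.0]

p_sim = permutation_p_value(x, y)
print(f"simulated p-value = {p_sim:.4f}")
```

A small simulated value means that slopes as steep as the observed one rarely arise when X and Y are unrelated.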

  • Overall Statistical Process

    • State the null and alternative hypotheses, check the assumptions, calculate the test statistic, determine the P-value, and decide whether or not to reject the null hypothesis.

    • Conclude with interpretation of results in the context of the research question and variable relationships.