10/23/25: SOCI 252 - Data Analysis Class Discussion Notes on Linear Regression, Hypothesis Testing, and Validity

Notes on Class Discussion and Key Concepts

Sizing and Structure of Exam Sheets

  • Size of Handwritten Notes:

    • Students can use handwritten notes on sheets measuring 8.5 by 11 inches.

    • Smaller sizes are acceptable, but 8.5 by 11 inches is the maximum size allowed.

Nature of Exams

  • Cumulative Nature:

    • Acknowledgment that material builds upon prior knowledge.

    • New topics are introduced while previous topics remain relevant.

Grading Policies

  • Final Exam Adjustments:

    • If a student's score on the final exam exceeds their midterm score, the final exam score takes precedence.

    • The final will include material from earlier sections of the course, mixing new and previously covered concepts.

    • If higher scores are achieved in later sections of the final, those will be substituted for midterm scores.

Homework and T-Test Discussion

  • Using Variables in Homework:

    • Clarity about how to handle categorical versus numerical data.

    • The use of binary variables in analysis was confirmed acceptable.

    • Application of t-tests versus z-tests was emphasized.

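    • Both tests produce similar results when the sample is large, because the t-distribution approaches the standard normal distribution as the degrees of freedom grow.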
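As an illustration of using a binary variable with a t-test, the sketch below splits a numerical outcome into two groups by a 0/1 flag and computes a two-sample t statistic. The records are hypothetical, and the Welch (unequal-variance) form is an assumption chosen for the example; the homework may call for a different variant.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(group_a, group_b):
    """Welch t statistic for the difference between two group means."""
    # Standard error of the difference, allowing unequal group variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical records: (binary variable, numerical outcome).
records = [
    (1, 7.2), (0, 5.9), (1, 6.8), (0, 6.1),
    (1, 7.5), (0, 5.4), (1, 6.9), (0, 6.3),
]

# The binary variable splits the numerical outcome into two groups.
group_one = [outcome for flag, outcome in records if flag == 1]
group_zero = [outcome for flag, outcome in records if flag == 0]

t_stat = two_sample_t(group_one, group_zero)
print(f"difference in means = {mean(group_one) - mean(group_zero):.2f}")
print(f"t statistic = {t_stat:.2f}")
```

A large absolute t statistic means the gap between the group means is large relative to its standard error.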

Linear Modeling and Statistical Testing

  • Understanding Linear Models:

    • Explanation of linear regression:

    • Formula:
      $Y = \beta_0 + \beta_1 X$
      where $Y$ is the predicted outcome, $X$ is the predictor variable, $\beta_0$ (intercept) is the predicted value of $Y$ when $X = 0$, and $\beta_1$ (slope) indicates the change in $Y$ for a one-unit change in $X$.

    • Interpretations of coefficients:

    • A positive value of $\beta_1$ indicates an increase in $Y$ with an increase in $X$.
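A minimal sketch of fitting the line by ordinary least squares and reading off the intercept and slope. The education and wage numbers are hypothetical, and `fit_line` is a helper written for this example rather than a function from the course.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in y per one-unit change in x
    b0 = y_bar - b1 * x_bar   # intercept: predicted y when x = 0
    return b0, b1


# Hypothetical data: years of education (x) and hourly wage (y).
education = [10, 12, 12, 14, 16, 16, 18, 20]
wage = [12.0, 15.0, 14.0, 18.0, 21.0, 22.0, 25.0, 28.0]

b0, b1 = fit_line(education, wage)
predicted_at_15 = b0 + b1 * 15
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print(f"predicted wage at 15 years of education = {predicted_at_15:.2f}")
```

Here the slope is positive, so the predicted wage rises with each additional year of education in this made-up data set.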

Critical Statistical Concepts

  • T-Statistics and Z-Statistics:

    • Difference:

    • T-Test: Used for smaller sample sizes or when population standard deviation is unknown.

    • Z-Test: Used when population standard deviation is known.

    • An example from the discussion emphasized the need for significance testing on linear models fit to observational data sets.
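The difference between the two statistics can be sketched for a one-sample test of a mean. The sample values, the null value `mu0`, and the "known" `sigma` are all hypothetical. The standard library has no t-distribution, so only the z p-value is computed here; the t statistic would be compared against a t-distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_statistic(sample, mu0, sigma):
    """z test: the population standard deviation sigma is known."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))


def t_statistic(sample, mu0):
    """t test: sigma is unknown, so the sample standard deviation replaces it."""
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))


# Hypothetical sample and null value.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.3, 5.7]
z = z_statistic(sample, mu0=5.0, sigma=0.4)
t = t_statistic(sample, mu0=5.0)

# Two-sided p-value for the z statistic from the standard normal distribution.
z_p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, t = {t:.2f}, two-sided p-value for z = {z_p_value:.4f}")
```

The two formulas differ only in the denominator: the z statistic uses the known `sigma`, while the t statistic estimates it from the sample.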

Observational Studies vs Experiments

  • Observational Studies Defined:

    • Observational studies collect data without random assignment; they rely on already existing behaviors or pre-existing factors.

    • Limitations:

    • Cannot make causal inferences as robustly as controlled experiments; potential confounding variables exist.

  • Understanding Causation in Observational Data:

    • Emphasized the importance of controlling for confounders to support any causal interpretation of an observed association.
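The effect of a confounder can be sketched with a small simulation. Everything below is a hypothetical setup: a binary confounder `z` raises both `x` and `y`, while `x` has no effect on `y` at all. The naive slope of `y` on `x` is still clearly positive, and it shrinks toward zero once the comparison is made within each level of `z`. Stratifying on `z` is one simple way of controlling for it.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the simulation is reproducible
rows = []
for _ in range(5000):
    z = rng.random() < 0.5                      # hypothetical binary confounder
    x = (2.0 if z else 0.0) + rng.gauss(0, 1)   # z raises x
    y = (3.0 if z else 0.0) + rng.gauss(0, 1)   # z raises y; x plays no role
    rows.append((z, x, y))

# Naive analysis: ignore the confounder.
naive = slope([x for _, x, _ in rows], [y for _, _, y in rows])

# Controlled analysis: estimate the slope separately within each level of z.
within = {
    group: slope(
        [x for z, x, _ in rows if z == group],
        [y for z, _, y in rows if z == group],
    )
    for group in (False, True)
}

print(f"naive slope = {naive:.2f}")
for group, value in within.items():
    print(f"slope within z={group}: {value:.2f}")
```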

Hypothesis Testing

  • Understanding Null and Alternate Hypotheses:

    • Null Hypothesis (H0): Assumes no effect or relationship exists (e.g., $\beta_1 = 0$).

    • A p-value below the chosen significance level (conventionally 0.05) leads to rejection of the null hypothesis.

    • Alternative Hypothesis (H1): Suggests there is an effect or relationship.

  • Hypothesis Testing Framework:

    • Establishment of benchmarks for significance, often with p-values.

    • Importance of controlling for confounding factors when interpreting results.
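A sketch of testing the null hypothesis that a regression slope is zero. The data are simulated with a hypothetical true slope of 0.5, and `slope_test` is a helper written for this example. The exact p-value comes from a t-distribution with n - 2 degrees of freedom. The standard library has no t-distribution, so the normal approximation below is an assumption that is only reasonable for larger samples.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def slope_test(x, y):
    """Return (slope, standard error, t statistic) for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return b1, se, b1 / se


# Simulated data with a hypothetical true slope of 0.5 plus random noise.
rng = random.Random(42)
x = list(range(40))
y = [2.0 + 0.5 * xi + rng.gauss(0, 2) for xi in x]

b1, se, t_stat = slope_test(x, y)

# Large-sample approximation: treat the t statistic as standard normal.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
ci_low, ci_high = b1 - 1.96 * se, b1 + 1.96 * se

print(f"slope = {b1:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"approximate 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print("reject H0 at the 0.05 level" if p_value < 0.05 else "fail to reject H0")
```

A confidence interval that excludes zero and a p-value below 0.05 lead to the same decision here, since both are built from the same slope estimate and standard error.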

Validity in Statistical Analysis

  • Types of Validity:

    • Internal Validity:

    • Assesses whether a causal relationship can be inferred from data. Evaluates the ability to draw cause-effect conclusions.

    • External Validity:

    • Relates to the generalizability of study findings; can the results be applied to broader contexts?

  • Connection Between Validity Types:

    • Internal validity is often stronger in experimental studies, while external validity is often better in observational studies.

Application of Statistical Concepts in Practical Situations

  • Discussion about the importance of understanding relationships and implications in real-world contexts, especially regarding the effect of social pressure on behaviors such as voting.

Summary of Key Learning Objectives

  • Importance of understanding linear regression and its practical application in predicting outcomes.

  • Emphasis placed on causal inference considerations, hypothesis formulation, and adjusting for confounding factors.

  • The distinction between correlation and causation, especially within observational study designs.

  • Understanding hypothesis testing and interpretation of statistical significance, including confidence intervals and p-values.