10/23/25: SOCI 252 - Data Analysis Class Discussion Notes on Linear Regression, Hypothesis Testing, and Validity

Size of Handwritten Notes:
- Students can use handwritten notes on sheets measuring 8.5 by 11 inches.
- Smaller sizes are acceptable; 8.5 by 11 inches is the maximum size allowed.

Cumulative Nature:
- Acknowledgment that material builds upon prior knowledge.
- New topics are introduced while previous topics remain relevant.

Final Exam Adjustments:
- If a student's performance on the final exceeds their midterm grade, the final takes precedence.
- The final will include material from earlier sections of the course, mixing new and previously covered concepts.
- If higher scores are achieved in later sections of the final, those will be substituted for midterm scores.

Using Variables in Homework:
- Clarified how to handle categorical versus numerical data.
- Binary variables were confirmed as acceptable for use in the analysis.
- The choice between t-tests and z-tests was emphasized.
- Both tests produce similar results when the sample is large, because the t-distribution approaches the normal distribution as the sample size grows.

Understanding Linear Models:
- Explanation of linear regression:
- Formula:
  $Y = \beta_0 + \beta_1 X$
  where $Y$ is the predicted outcome, $X$ is the predictor variable, $\beta_0$ (intercept) is the predicted value of $Y$ when $X = 0$, and $\beta_1$ (slope) indicates the change in $Y$ for a one-unit change in $X$.
- Interpretations of coefficients:
- A positive value of $\beta_1$ indicates an increase in $Y$ with an increase in $X$.
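
As a minimal sketch of the formula above, the following estimates the intercept and slope by ordinary least squares. The data values and the names `fit_line`, `hours`, and `score` are made up for illustration and do not come from the class discussion.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: the predicted value of Y when X = 0.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: hours studied (X) and exam score (Y).
hours = [0, 1, 2, 3, 4]
score = [50, 55, 60, 65, 70]

b0, b1 = fit_line(hours, score)
print(f"intercept = {b0}, slope = {b1}")
print(f"predicted score at X = 6: {b0 + b1 * 6}")
```

With this made-up data the slope is positive, so each additional hour is associated with a higher predicted score.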

T-Statistics and Z-Statistics:
- Difference:
- T-Test: Used for smaller sample sizes or when population standard deviation is unknown.
- Z-Test: Used when population standard deviation is known.
- An example from the discussion emphasized the need for significance testing when linear models are fitted to observational data sets.
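
The difference between the two statistics can be sketched as follows. This is an illustrative example only: the sample, the hypothesized mean `mu0`, and the "known" population standard deviation `sigma` are assumed values, and the function names are hypothetical.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """z-statistic: uses a KNOWN population standard deviation sigma."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """t-statistic: estimates the standard deviation from the sample itself."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


# Hypothetical sample; mu0 and sigma are assumed values for illustration.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
mu0 = 50
sigma = 2.5

z = z_statistic(sample, mu0, sigma)
t = t_statistic(sample, mu0)
print(f"z = {z:.3f}, t = {t:.3f}, degrees of freedom = {len(sample) - 1}")
```

The z-statistic is compared against the standard normal distribution, while the t-statistic is compared against a t-distribution with n - 1 degrees of freedom, which has heavier tails for small samples.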

Observational Studies Defined:
- Observational studies collect data without random assignment; they rely on already existing behaviors or pre-existing factors.
- Limitations:
- Causal inferences are less robust than in controlled experiments because confounding variables may be present.

Understanding Causation in Observational Data:
- Emphasized importance of controlling for confounders to support any claims of association.
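
To illustrate why controlling for confounders matters, here is a small simulation under stated assumptions: a confounder `z` drives both `x` and `y`, and `x` has no effect on `y` at all. The data-generating process, the seed, and the helper names are all hypothetical. Controlling for `z` is done here by removing its linear contribution from both variables before relating them, which is one of several possible ways to adjust.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def residuals(xs, ys):
    """Part of ys not explained by a straight-line fit on xs."""
    b1 = slope(xs, ys)
    b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


random.seed(0)
n = 5000
# Simulated data: the confounder z drives both x and y; x has NO effect on y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * zi + random.gauss(0, 1) for zi in z]

naive = slope(x, y)
# Control for z by removing its linear contribution from both x and y first.
adjusted = slope(residuals(z, x), residuals(z, y))
print(f"naive slope = {naive:.2f}, slope after controlling for z = {adjusted:.2f}")
```

In this setup the naive slope is clearly nonzero even though `x` does not cause `y`, while the adjusted slope is close to zero.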

Understanding Null and Alternative Hypotheses:
- Null Hypothesis (H0): Assumes no effect or relationship exists (e.g., $\beta_1 = 0$).
- A p-value below 0.05 leads to rejection of the null hypothesis at the conventional 5% significance level.
- Alternative Hypothesis (H1): Suggests there is an effect or relationship (e.g., $\beta_1 \neq 0$).

Hypothesis Testing Framework:
- Establishment of benchmarks for significance, often with p-values.
- Importance of controlling for confounding factors when interpreting results.
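
A hedged sketch of how the null hypothesis that the slope is zero might be tested for a simple linear regression. The data and the function name `slope_test` are hypothetical. The p-value uses a normal approximation to keep the example within the standard library; an exact p-value would use the t-distribution with n - 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def slope_test(xs, ys):
    """Return (slope, standard error, t-statistic) for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b1, se, b1 / se


# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9, 10.2, 10.7]

b1, se, t_stat = slope_test(xs, ys)
# Normal approximation to the two-sided p-value (an assumption of this sketch).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"slope = {b1:.3f}, SE = {se:.3f}, t = {t_stat:.1f}")
print("reject H0 at the 0.05 level" if p_approx < 0.05 else "fail to reject H0")
```

Rejecting H0 here says only that the slope is statistically distinguishable from zero; with observational data it does not by itself establish causation.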

Types of Validity:
- Internal Validity:
- Assesses whether a causal (cause-and-effect) relationship can be inferred from the data.
- External Validity:
- Relates to the generalizability of study findings; can the results be applied to broader contexts?

Connection Between Validity Types:
- Internal validity is often stronger in experimental studies, while external validity is often better in observational studies.

The discussion stressed the importance of understanding relationships and their implications in real-world contexts, especially regarding social pressure and voting behavior.

Importance of understanding linear regression and its practical application in predicting outcomes.
Emphasis placed on causal inference considerations, hypothesis formulation, and adjusting for confounding factors.
The distinction between correlation and causation, especially within observational study designs.
Understanding hypothesis testing and interpretation of statistical significance, including confidence intervals and p-values.