Flashcards containing key terms and definitions related to general statistics and hypothesis testing.
Population
The complete set of subjects or observations about which we want to draw conclusions.
Sample
A subset of the population used to make inferences about the population.
Null Hypothesis (H₀)
A default statement to be tested, often representing 'no effect' or 'no difference.'
Alternative Hypothesis (Hₐ or H₁)
A statement contradicting H₀, representing the effect or difference we suspect exists.
Test Statistic
A calculated value (e.g., z-score, t-score) used to determine whether to reject H₀.
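As an illustration of the idea, here is a minimal Python sketch of a one-sample z statistic, z = (x̄ − μ₀) / (σ / √n); all numbers below are made up for illustration.

```python
import math

x_bar = 52.3   # sample mean (made up)
mu_0 = 50.0    # hypothesized population mean under H0
sigma = 8.0    # population standard deviation, assumed known
n = 36         # sample size

# One-sample z test statistic
z = (x_bar - mu_0) / (sigma / math.sqrt(n))
print(z)       # compare to a critical value, e.g. 1.96 for a two-sided test at alpha = 0.05
```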
Critical Value
A threshold value from a distribution (e.g., t-distribution) that defines the rejection region for H₀.
p-value
The probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming H₀ is true. Decision rule: reject H₀ if p-value < α.
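A minimal Python sketch of that decision rule, assuming scipy is available; the sample data and hypothesized mean are made up for illustration.

```python
from scipy import stats

sample = [5.1, 4.8, 5.4, 5.0, 4.7, 5.3, 5.2, 4.9]   # made-up data
alpha = 0.05                                         # significance level
mu_0 = 5.0                                           # hypothesized mean under H0

# One-sample t-test; returns the test statistic and the p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")
```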
Significance Level (α)
The probability of rejecting H₀ when it is true (Type I error). Common values: 0.01, 0.05, 0.10.
Type I Error
Incorrectly rejecting a true H₀ (false positive).
Type II Error
Failing to reject a false H₀ (false negative).
One-Sided Test
Tests for an effect in one direction (e.g., Hₐ: μ > μ₀).
Two-Sided Test
Tests for any difference (Hₐ: μ ≠ μ₀).
Regression
A statistical method to model the relationship between a dependent variable (Y) and one or more independent variables (X).
Line of Best Fit (OLS Regression Line)
The line that minimizes the sum of squared residuals in a scatterplot: ( \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X ).
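A minimal sketch of fitting this line in Python with scipy.stats.linregress (assumed library; x and y are made up); the same fit yields the slope and intercept defined in the next two cards.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up predictor
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])   # made-up response

res = stats.linregress(x, y)               # OLS fit: slope, intercept, r-value, ...
print(f"Y-hat = {res.intercept:.2f} + {res.slope:.2f} * X")
```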
Slope (β̂₁)
The estimated change in Y for a one-unit increase in X.
Intercept (β̂₀)
The predicted value of Y when X = 0. May not always have a meaningful interpretation.
Residual (êᵢ)
The difference between the observed value ( Y_i ) and the predicted value ( \hat{Y}_i ): ( \hat{e}_i = Y_i - \hat{Y}_i ).
Sum of Squared Residuals (SSR)
A measure of how well the regression line fits the data: ( \sum \hat{e}_i^2 ).
R-squared (R²)
The proportion of variance in Y explained by X. Range: 0 (no fit) to 1 (perfect fit).
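A minimal numpy sketch (assumed library, made-up data) tying the last three cards together: residuals, SSR, and R².

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up predictor
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])   # made-up response

beta1, beta0 = np.polyfit(x, y, 1)         # OLS slope and intercept
y_hat = beta0 + beta1 * x                  # predicted values
residuals = y - y_hat                      # e_i = Y_i - Y_hat_i

ssr = np.sum(residuals ** 2)               # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)          # total variation in Y around its mean
r_squared = 1 - ssr / sst                  # proportion of variance explained
print(ssr, r_squared)
```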
Homoskedasticity
Constant variance of residuals across all X values. Violation: Heteroskedasticity (uneven spread).
Simple Linear Regression Model
Assumes: ( Y = \beta_0 + \beta_1 X + \varepsilon ), where ( \mathbb{E}(\varepsilon|X) = 0 ).
Causal Inference
Requires the SLR assumptions to interpret ( \beta_1 ) as the causal effect of X on Y. Challenge: Omitted variable bias.
ANOVA (Analysis of Variance)
Tests whether group means differ by comparing between-group and within-group variability.
MSTR (Mean Square Treatment)
Measures variation between sample means.
MSE (Mean Square Error)
Measures variation within samples.
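A minimal Python sketch, assuming scipy and numpy and using made-up group data, that ties the ANOVA, MSTR, and MSE cards together by computing the F statistic as MSTR / MSE and checking it against scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

# Three made-up groups
groups = [np.array([23.0, 25.0, 21.0, 24.0]),
          np.array([30.0, 28.0, 31.0, 29.0]),
          np.array([22.0, 20.0, 23.0, 21.0])]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total observations
grand_mean = np.mean(np.concatenate(groups))

# Between-group variability (treatment)
sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
mstr = sstr / (k - 1)

# Within-group variability (error)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = sse / (n - k)

f_by_hand = mstr / mse
f_scipy, p_value = stats.f_oneway(*groups)
print(f_by_hand, f_scipy, p_value)       # the two F values should match
```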
Pooled-Variance t-test
Compares two means assuming equal variances.
Unequal-Variance t-test (Welch’s Test)
Compares two means without assuming equal variances.
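A minimal sketch contrasting the two t-tests via the equal_var flag of scipy.stats.ttest_ind (assumed library; the two samples are made up).

```python
from scipy import stats

group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]   # made-up sample A
group_b = [13.0, 12.7, 13.4, 12.9, 13.2, 12.8]   # made-up sample B

pooled = stats.ttest_ind(group_a, group_b, equal_var=True)   # pooled-variance t-test
welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's test
print(pooled.pvalue, welch.pvalue)
```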
Excel’s Analysis ToolPak
Used to perform regression, ANOVA, and hypothesis tests. Output includes coefficients, standard errors, t-statistics, and p-values.
Confidence Interval
A range of values likely to contain the population parameter (e.g., slope).
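A minimal sketch of a 95% confidence interval for a regression slope, built from scipy.stats.linregress output and a t critical value (assumed library; x and y are made up).

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # made-up predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])  # made-up response

res = stats.linregress(x, y)             # slope, intercept, stderr, ...
df = len(x) - 2                          # degrees of freedom in simple regression
t_crit = stats.t.ppf(0.975, df)          # two-sided 95% critical value

lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(f"slope = {res.slope:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```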
Sampling Variation
Random fluctuations in sample statistics due to drawing different samples. Even if H₀ is true, sample results vary.