Stats Modeling Final

55 Terms

1

2 Types of Studies

Randomized experiment and observational study

2

Randomized Experiment (2 types)

Matched pairs, Randomized Comparative

3

Type of conclusion that can be made from randomized experiment

Causal (if the explanatory variable is randomly assigned)

4

Type of conclusion that can be made from observational study

Only association

5

Confidence Interval Formula

Sample statistic ± (z* or t* multiplier) × standard error
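A minimal sketch of the formula for a single proportion, using the z multiplier. The counts (40 successes out of 100) are made up for illustration, and `proportion_ci` is a hypothetical helper name, not something from the course.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper) for a z-based interval on one proportion."""
    p_hat = successes / n                      # sample statistic
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z multiplier
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    margin = z_star * se
    return p_hat - margin, p_hat + margin


# Illustrative data only: 40 successes in 100 trials.
lower, upper = proportion_ci(successes=40, n=100)
print(round(lower, 3), round(upper, 3))
```

For a mean, the same shape applies with a t multiplier and s / sqrt(n) as the standard error.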

6

What uses the z multiplier vs. what uses the t multiplier

Proportions use z

Averages (means) or rho use t

7

SLR Conditions

Linearity

Independence

Normality

Equal Variance

Randomness

8

What to use to determine each (Note about randomness)

Independence, Randomness: Problem statement

  • Randomness of sampling determines whether you can generalize to the population

Linearity, Equal Variance: Residuals vs Fitted plots

Normality: QQ Plot

9

Residuals Formula

Observed - Predicted
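A small sketch of computing residuals from a least-squares line. The x and y values are invented for illustration, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
# Residual = observed - predicted, one per data point.
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print(residuals)
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding).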

10

Empirical Rule

68% - within 1 SD of the mean

95% - within 2 SD of the mean

99.7% - within 3 SD of the mean
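A quick check of these percentages against the standard normal curve, using only the Python standard library. `within` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def within(k):
    """Probability that a normal value falls within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(k, round(p, 4))
```

The rule's 68 / 95 / 99.7 figures are rounded versions of these values.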

11

Concavity rule with transforms

If concave up, try power > 1

If concave down, try power < 1

12

Outlier vs influential

Outlier: Far from regression line

Influential: Has big impact on the regression fit

13

For an influential point, how does studentized compare to standardized residual

|Studentized| > |Standardized|

14

What leverage measures

A point's potential to be influential

  • Only looks at how far a point's x value is from the mean of the x values
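A sketch of leverage for simple linear regression, where the leverage of point i is 1/n + (x_i - x_bar)^2 / Sxx. The x values are invented, with one deliberately far from the rest.

```python
def leverages(x):
    """Leverage of each point in a one-predictor regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Illustrative x values only; 20 sits far from the others.
x = [1, 2, 3, 4, 20]
h = leverages(x)
print([round(hi, 3) for hi in h])
```

Note that y never appears: leverage depends on x alone, which is why a high-leverage point is only potentially influential.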

15

Cook’s distance combines…

Residuals for y distance and leverage for x distance

16

R² interpretation and formula

Proportion of variability in response explained by the model

SSModel / SSTotal
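A sketch of the R² calculation for a one-predictor least-squares fit. The data are invented for illustration and `r_squared` is a hypothetical helper.

```python
def r_squared(x, y):
    """Return (SSModel, SSE, SSTotal, R^2) for a least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_model = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return ss_model, ss_error, ss_total, ss_model / ss_total


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
ss_model, ss_error, ss_total, r2 = r_squared(x, y)
print(round(r2, 4))
```

Since SSTotal = SSModel + SSE, the same value can be written as 1 - SSE / SSTotal.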

17

3 Ways to test for regression

  1. T-test for the slope

    1. Is B1 = 0 or not equal to 0

  2. T-test for correlation

    1. Is there a linear relationship between x and y

  3. Overall F-Test

    1. Are all slopes = 0 or is at least one of them not equal to 0

18

Confidence interval vs Prediction interval (Interpretations) (Which is wider?)

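CI: 95% confident that the true mean y value at this x is within these bounds

PI: 95% confident that a new individual value of y at this x is within these bounds

The PI is wider, because it also accounts for the variability of an individual observation around the mean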
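A sketch comparing the two standard errors in simple linear regression. The summary numbers (s, n, x_bar, Sxx, x0) are invented, and `interval_standard_errors` is a hypothetical helper.

```python
import math


def interval_standard_errors(s, n, x0, x_bar, sxx):
    """Standard errors for the mean response and for a new individual at x0."""
    shared = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(shared)        # used for the confidence interval
    se_pred = s * math.sqrt(1 + shared)    # used for the prediction interval
    return se_mean, se_pred


# Illustrative summary statistics only.
se_mean, se_pred = interval_standard_errors(
    s=2.0, n=25, x0=12.0, x_bar=10.0, sxx=100.0
)
print(round(se_mean, 4), round(se_pred, 4))
```

The extra "1 +" under the square root is the individual-to-individual variability, so the prediction standard error is always the larger of the two.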

19

Three types of Anova Tables

Type 1: Sequential Sum of Squares

Type 2: Hierarchical Sum of Squares

Type 3: Marginal Anova

20

Sequential Sum of Squares (What is it? df for predictors and residuals? Compresses to what?)

Additional variability explained when new variable is added to the model (sequentially adding)

Predictors df is how many slopes were needed to include it in the model (always 1 for quantitative)

Residuals: n-k-1

Compresses to overall anova table (model row has all the predictors combined, residuals row is just residuals)

21

Hierarchical (What is it? Matches thing…)

Additional variability explained by adding this new variable to a model containing everything else

P values match the table of coefficients

22

Marginal Anova

Like hierarchical, but with interaction terms

23

VIF Above ___ is typically bad

Above 5 (R² > 0.8 when that predictor is regressed on the other predictors)
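A one-line sketch of why the two cutoffs match: VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. `vif` is a hypothetical helper name.

```python
def vif(r_squared):
    """Variance inflation factor from the R^2 of predictor-on-predictors."""
    return 1.0 / (1.0 - r_squared)


vif_at_cutoff = vif(0.8)   # R^2 at the usual cutoff
vif_no_overlap = vif(0.0)  # predictor unrelated to the others
print(round(vif_at_cutoff, 6), vif_no_overlap)
```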

24

Note about predictions from a model with multicollinearity

Predictions are fine, but individual coefficient conclusions are not

25

“Good” value for Mallows's Cp

<= m+1, where m is the number of predictors in the subset model
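A sketch of the usual Cp formula, Cp = SSE_m / MSE_full + 2(m + 1) - n. All numbers below are invented, and `mallows_cp` is a hypothetical helper.

```python
def mallows_cp(sse_subset, mse_full, m, n):
    """Mallows's Cp for a subset model with m predictors and n observations."""
    return sse_subset / mse_full + 2 * (m + 1) - n


# Illustrative values only: n = 30 observations, full model has k = 4 predictors.
n, k = 30, 4
sse_full = 50.0
mse_full = sse_full / (n - k - 1)

cp_full = mallows_cp(sse_full, mse_full, m=k, n=n)   # full model
cp_subset = mallows_cp(54.0, mse_full, m=2, n=n)     # a 2-predictor subset
print(cp_full, cp_subset, cp_subset <= 2 + 1)
```

The full model always lands exactly on k + 1, so the comparison is only informative for subsets.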

26

Cp, AIC, BIC preference for small models

Cp and AIC favor small models moderately; BIC favors small models much more strongly

27

Methods of picking models (4)

Best Subsets - fits all 2^k models and picks the best based on a criterion

Backward Elimination - starts with the full model and deletes terms until deleting another no longer improves it (susceptible to multicollinearity)

Forward Selection - keeps adding terms until adding another no longer improves the model

Stepwise Regression - adds terms as in forward selection, but after each addition also checks, as in backward elimination, whether any term can be removed

28

Nested F-test

“Is anything gained by adding these terms to a smaller model”
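A sketch of the nested F statistic: the drop in SSE per added term, divided by the full model's MSE. The SSE values and sizes are invented, and `nested_f` is a hypothetical helper.

```python
def nested_f(sse_reduced, sse_full, num_added, n, k_full):
    """F statistic for testing the terms added to a reduced model.

    num_added: number of extra terms in the full model
    k_full:    number of predictors in the full model
    """
    numerator = (sse_reduced - sse_full) / num_added
    denominator = sse_full / (n - k_full - 1)   # MSE of the full model
    return numerator / denominator


# Illustrative values only.
f_stat = nested_f(sse_reduced=54.0, sse_full=50.0, num_added=2, n=30, k_full=4)
print(f_stat)
```

The statistic is compared to an F distribution with (num_added, n - k_full - 1) degrees of freedom; a value near 1, as here, suggests little is gained.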

29

Experimental Unit

Thing that is assigned treatment (usually a row in the dataset)

30

Balanced?

If each level of the explanatory factor gets the same number of experimental units

31

Two ways of writing the equation for an anova model and what each tests (what links?)

Y = μ_i + ε

H_0: μ_1 = μ_2 = … = μ_I

H_a: at least one μ_i is not equal to another μ_j

Y = μ + α_i + ε

H_0: α_1 = α_2 = … = α_I = 0

H_a: at least one α_i is not equal to 0

Link: μ_i = μ + α_i
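A sketch of the link between the two forms: each group mean equals the grand mean plus that group's effect. The data are invented and balanced (same group sizes), which is assumed here so the grand mean equals the average of the group means.

```python
# Illustrative balanced data only: three groups, two units each.
groups = {"A": [4, 6], "B": [8, 10], "C": [1, 1]}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

group_means = {name: sum(v) / len(v) for name, v in groups.items()}
# Group effect = group mean - grand mean.
effects = {name: m - grand_mean for name, m in group_means.items()}

for name in groups:
    # Rebuild each group mean from the grand mean and the effect.
    print(name, group_means[name], grand_mean + effects[name])
```

Testing that all group means are equal is the same as testing that every effect is zero.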

32

group to group vs unit to unit and scales of each if the treatment is important

group to group: different levels

unit to unit: per unit observations

If important, group-to-group variability >> unit-to-unit variability

33

In anova table, how do you find F value for row?

Divide the row's mean square by the mean square error: F = MS / MSE
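A sketch of the full one-way ANOVA arithmetic leading to F. The data are invented and `one_way_f` is a hypothetical helper.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]

    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    ms_groups = ss_groups / (k - 1)   # df for groups = k - 1
    mse = ss_error / (n - k)          # df for error = n - k
    return ms_groups / mse


# Illustrative data only.
f_value = one_way_f([[4, 6], [8, 10], [1, 1]])
print(f_value)
```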

34

More b/w group variability ___ p values. More samples ___ p value

decreases, decreases

35

Conditions for Anova (3) & how to check

  1. Normality - qq-plot

  2. Equal variance (sd of groups: max/min < 2)

  3. Independence - how data was collected
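A sketch of the equal-variance check from the list above, comparing the largest and smallest group SDs. The data are invented and `sd_ratio` is a hypothetical helper.

```python
from statistics import stdev


def sd_ratio(groups):
    """Largest group SD divided by smallest group SD."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)


# Illustrative data only.
groups = [[4, 6, 8], [10, 13, 16], [1, 2, 3]]
ratio = sd_ratio(groups)
print(ratio, "condition met" if ratio < 2 else "condition not met")
```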

36

Interpreting effect size of difference in means

> 0.5: Moderate

> 1: Large

37

What is FWER and ways to control

FWER: Family Wise Error Rate: Chance of making at least one type 1 error with multiple hypothesis tests
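A sketch of how quickly the FWER grows, assuming the tests are independent (an assumption made here for the arithmetic, not something stated in the card). Under that assumption FWER = 1 - (1 - alpha)^m for m tests.

```python
def fwer(alpha, num_tests):
    """Chance of at least one type 1 error across independent tests."""
    return 1 - (1 - alpha) ** num_tests


one_test = fwer(0.05, 1)
ten_tests = fwer(0.05, 10)
print(round(one_test, 4), round(ten_tests, 4))
```

Ten tests at alpha = 0.05 already give roughly a 40% chance of at least one false positive.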

38

Meaning of an additive effect

If effect of treatment A is the same for all levels of treatment B

39

Main effects

Effect of one factor averaged over all levels of the other factors

40

Two-Way Factorial design requirements (2)

  • at least two levels for each factor

  • Every combination is tested

41

Experimental Design Principles

  • Blocking: Block out a nuisance factor & assign treatments across all levels of that nuisance factor

  • Comparison: More than one group is needed & one is a placebo/control & include all levels that need to be studied

  • Crossing: include all combinations of factor levels

  • Replication: 2+ observations for each cell

  • Randomization: Randomly assign units

42

Two way main effects model (no interaction)

Y = μ + α_i + β_j + ε

43

Extra condition for two way main effects model

Effects are additive (If not, we would need an interaction term and that’s another model)

44

Two way main effects model (w/ interaction)

Y = μ + α_i + β_j + (αβ)_ij + ε

45

Explain each form of randomized block: subdivision, matching, reusing

Subdivision: Divide up by a known nuisance factor

Matching: Match people based on a known nuisance factor, and assign a treatment to each within the pair

Reusing: Randomly reuse same experimental unit under each treatment

46

Other method of checking condition that’s not sd_max / sd_min (and how to fix)

Checks equal variance. Plot log(sd) vs. log(mean) for the groups and find the slope

Transform via y^(1 - slope)

  • If the power 1 - slope is 0, use log(y)
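A sketch of the slope calculation behind this rule. The group means and SDs are invented so that SD grows in proportion to the mean, and `suggested_power` is a hypothetical helper.

```python
import math


def suggested_power(means, sds):
    """Return 1 - slope, where slope is from log(sd) regressed on log(mean)."""
    log_means = [math.log(m) for m in means]
    log_sds = [math.log(s) for s in sds]
    n = len(log_means)
    x_bar = sum(log_means) / n
    y_bar = sum(log_sds) / n
    sxx = sum((x - x_bar) ** 2 for x in log_means)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_sds))
    slope = sxy / sxx
    return 1 - slope


# Illustrative group summaries only: SD is always 10% of the mean.
power = suggested_power(means=[10, 100, 1000], sds=[1, 10, 100])
print(round(power, 6))
```

Here the slope is 1, so the suggested power is 0 and the log transform applies.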

47

Odds Formula

pi / (1 - pi)

48

Logit Form Formula (Key thing about error)

ln(pi / (1 - pi)) = B_0 + B_1X + …

No error term

49

Logit form converted to a pi= format

pi = e^(B_0 + B_1X) / (1 + e^(B_0 + B_1X))
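A sketch tying odds, the logit form, and the probability form together. The coefficients B_0 = -2 and B_1 = 0.5 are invented for illustration, and the function names are hypothetical.

```python
import math


def odds(pi):
    """Odds of the event given its probability."""
    return pi / (1 - pi)


def logit(pi):
    """Log odds of the event."""
    return math.log(odds(pi))


def prob_from_logit(b0, b1, x):
    """Probability form: e^(B_0 + B_1 x) / (1 + e^(B_0 + B_1 x))."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))


# Illustrative coefficients only.
b0, b1 = -2.0, 0.5
pi_at_4 = prob_from_logit(b0, b1, x=4)   # linear part is -2 + 0.5 * 4 = 0
print(pi_at_4, odds(pi_at_4), logit(pi_at_4))
```

Taking the logit of the probability returns the linear part B_0 + B_1 x, which is the round trip between the two forms.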

50

Logistic regression’s probability is based on which case of Y?

Y=1, or that the event happens

51

Odds ratio (2 forms) (And interpretations)

When comparing stuff:

Change from a to b is odds_b / odds_a

“Changing from a to b, odds of Y=1 inc/dec by a factor of __”

Unit to unit change

e^B_1

“For each one-unit increase in X, odds of Y=1 are multiplied by e^B_1”
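A sketch showing that the two forms agree: dividing the odds at x + 1 by the odds at x gives e^B_1. The coefficients are invented and `odds_at` is a hypothetical helper.

```python
import math


def odds_at(b0, b1, x):
    """Odds of Y=1 at a given x under the logit model."""
    return math.exp(b0 + b1 * x)


# Illustrative coefficients only.
b0, b1 = -2.0, 0.5
odds_a = odds_at(b0, b1, x=4)
odds_b = odds_at(b0, b1, x=5)

odds_ratio = odds_b / odds_a     # comparing form
unit_change = math.exp(b1)       # unit-to-unit form
print(round(odds_ratio, 4), round(unit_change, 4))
```

The ratio does not depend on the starting x, only on B_1.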

52

Theoretical Model for SLR

Y = B_0 + B_1X + e

Plug in appropriate variables for Y and X

53

What forward selection uses to know what to add next

Takes one with highest correlation

54

What backward elimination uses to know what to remove next

Takes highest p value to remove

55

What’s included in a complete second order model?

First order, second order (squared terms), interaction