Stats Modeling Final

55 Terms

New cards

2 Types of Studies

Randomized experiment and observational study

New cards

Randomized Experiment (2 types)

Matched pairs, Randomized Comparative

New cards

Type of conclusion that can be made from randomized experiment

Causal (If explanatory is randomly assigned)

New cards

Type of conclusion that can be made from observational study

Only association

New cards

Confidence Interval Formula

Sample statistic ± (z or t multiplier) × standard error
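A minimal sketch of this formula for a sample proportion, assuming the usual standard error sqrt(p_hat(1 - p_hat)/n) and using only the Python standard library. The function name `proportion_ci` and the example counts are hypothetical, not from the course.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Sample statistic ± z multiplier × standard error, for a proportion."""
    p_hat = successes / n                          # sample statistic
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z multiplier (about 1.96 for 95%)
    se = sqrt(p_hat * (1 - p_hat) / n)             # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 40 successes out of 100 trials
low, high = proportion_ci(40, 100)
print(round(low, 3), round(high, 3))
```

For a mean, the same structure applies with a t multiplier in place of z.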

New cards

What uses the z multiplier vs. what uses the t multiplier

Proportions use z

Means and correlation (rho) use t

New cards

SLR Conditions

Linearity

Independence

Normality

Equal Variance

Randomness

New cards

What to use to determine each (Note about randomness)

Independence, Randomness: Problem statement

Randomness of sampling determines whether you can generalize to the population

Linearity, Equal Variance: Residuals vs Fitted plots

Normality: QQ Plot

New cards

Residuals Formula

Observed - Predicted

New cards

Empirical Rule

68% of values fall within 1 SD of the mean

95% within 2 SD

99.7% within 3 SD
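These percentages can be checked against the normal distribution. A small sketch using the standard library's `NormalDist`; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "SD:", round(100 * share_within(k), 1), "%")
```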

New cards

Concavity rule with transforms

If concave up, try power > 1

If concave down, try power < 1

New cards

Outlier vs influential

Outlier: Far from regression line

Influential: Has big impact on the regression fit

New cards

For an influential point, how does studentized compare to standardized residual

|Studentized| > |Standardized|

New cards

What leverage measures

A point's potential to be influential

Only looks at how far a point's x value is from the mean of the x values

New cards

Cook’s distance combines…

Residuals for y distance and leverage for x distance

New cards

R² interpretation and formula

Proportion of variability in response explained by the model

SSModel / SSTotal
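A rough sketch of how this ratio is computed for a simple linear regression, fitting the line by least squares in plain Python. The data and the function names `fit_slr` and `r_squared` are hypothetical; residuals are observed minus predicted.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y):
    """Return (R², residuals) where R² = SSModel / SSTotal."""
    b0, b1 = fit_slr(x, y)
    y_bar = sum(y) / len(y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed - predicted
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum(r ** 2 for r in residuals)
    ss_model = ss_total - ss_error
    return ss_model / ss_total, residuals

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, residuals = r_squared(x, y)
print(round(r2, 3))
```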

New cards

3 Ways to test for regression

T-test for the slope: is B_1 = 0 or not equal to 0?
T-test for correlation: is there a linear relationship between x and y?
Overall F-test: are all slopes = 0, or is at least one of them not equal to 0?

New cards

Confidence interval vs Prediction interval (Interpretations) (Which is wider?)

CI: 95% confident that the true mean y value at this x is within these bounds

PI: 95% confident that a new individual value of y at this x is within these bounds

The PI is wider, since individual values vary more than the mean
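The prediction interval is the wider of the two because its standard error carries an extra term for the scatter of individual points. A sketch under the standard simple linear regression formulas; the function name and data are hypothetical, and both standard errors would be multiplied by the same t multiplier (n - 2 df).

```python
from math import sqrt

def interval_standard_errors(x, y, x_star):
    """Return (SE for the mean response, SE for a new individual) at x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))                   # residual standard error
    distance = (x_star - x_bar) ** 2 / sxx    # how far x_star is from the mean of x
    se_mean = s * sqrt(1 / n + distance)      # used for the CI
    se_pred = s * sqrt(1 + 1 / n + distance)  # used for the PI (extra "1 +")
    return se_mean, se_pred

# Hypothetical data
se_mean, se_pred = interval_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3)
print(round(se_mean, 3), round(se_pred, 3))
```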

New cards

Three types of Anova Tables

Type 1: Sequential Sum of Squares

Type 2: Hierarchical Sum of Squares

Type 3: Marginal Anova

New cards

Sequential Sum of Squares (What is it? df for predictors and residuals? Compresses to what?)

Additional variability explained when new variable is added to the model (sequentially adding)

Predictors df is how many slopes were needed to include it in the model (always 1 for quantitative)

Residuals: n-k-1

Compresses to overall anova table (model row has all the predictors combined, residuals row is just residuals)

New cards

Hierarchical (What is it? Matches thing…)

Additional variability explained by adding this new variable to a model containing everything else

P values match the table of coefficients

New cards

Marginal Anova

Like hierarchical, but with interaction terms

New cards

VIF Above ___ is typically bad

above 5 (r² > 0.8)
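The link between the two cutoffs comes from VIF = 1 / (1 - R²), where R² is from regressing that predictor on the other predictors. A tiny sketch; the function name `vif` is just for illustration.

```python
def vif(r_squared):
    """VIF for one predictor, given R² from regressing it on the other predictors."""
    return 1 / (1 - r_squared)

# R² of 0.8 corresponds to the VIF cutoff of about 5
print(round(vif(0.8), 2))
```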

New cards

Note about predictions from a model with multicollinearity

Predictions are fine, but individual coefficient conclusions are not

New cards

“Good” for Mallows' Cp

<= m+1, where m is the number of predictors in the subset model
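A sketch of the check, assuming the common textbook form Cp = SSE_m / MSE_full + 2(m + 1) - n. The function name and all numbers below are hypothetical.

```python
def mallows_cp(sse_subset, mse_full, m, n):
    """Mallows' Cp for a subset model with m predictors, fit to n observations."""
    return sse_subset / mse_full + 2 * (m + 1) - n

# Hypothetical values: subset model with m = 2 predictors, n = 15 observations
cp = mallows_cp(sse_subset=120, mse_full=10, m=2, n=15)
print(cp, "good" if cp <= 2 + 1 else "not good")
```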

New cards

CP, AIC, BIC preference for small models

CP and AIC are moderate, BIC prefers small ones a lot

New cards

Methods of picking models (4)

Best Subsets - fits all 2^k models and picks the best based on a criterion

Backwards Elimination - starts with the full model and deletes terms until deleting one doesn’t improve it (susceptible to multicollinearity)

Forward Selection - keeps adding terms until adding one no longer improves the model

Stepwise Regression - adds terms as in forward selection, but also checks, as in backward elimination, whether something can be removed

New cards

Nested F-test

“Is anything gained by adding these terms to a smaller model”
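A sketch of the test statistic, assuming the standard form F = ((SSE_reduced - SSE_full) / number of added terms) / (SSE_full / (n - k - 1)), with k predictors in the full model. The function name and numbers are hypothetical.

```python
def nested_f(sse_reduced, sse_full, added_terms, n, k_full):
    """F statistic comparing a reduced model to a full model that adds terms."""
    gain_per_term = (sse_reduced - sse_full) / added_terms  # variability explained per added term
    mse_full = sse_full / (n - k_full - 1)                  # error variance from the full model
    return gain_per_term / mse_full

# Hypothetical values: 2 terms added, n = 30, full model has 4 predictors
print(round(nested_f(200, 150, 2, 30, 4), 3))
```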

New cards

Experimental Unit

Thing that is assigned treatment (usually a row in the dataset)

New cards

Balanced?

If each level of the explanatory factor gets the same number of experimental units

New cards

Two ways of writing the equation for an anova model and what each tests (what links?)

Y = μ_i + ε

H_0: μ_1 = μ_2 = … = μ_k

H_a: at least one μ_i not equal to μ_j

Y = μ + α_i + ε

H_0: α_1 = α_2 = … = α_k = 0

H_a: at least one α_i not equal to 0

Link: μ_i = μ + α_i

New cards

group to group vs unit to unit and scales of each if the treatment is important

group to group: different levels

unit to unit: per unit observations

If important, group-to-group variability is much larger than unit-to-unit variability

New cards

In anova table, how do you find F value for row?

Divide the row's mean square by the mean square error: F = MS / MSE
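A sketch of this calculation for a one-way Anova with one factor, in plain Python. The function name and the three groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F = MS(groups) / MSE for a one-way Anova."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Group-to-group variability
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Unit-to-unit variability within groups
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (n - k)
    return ms_groups / mse

# Hypothetical data: three groups of three units
print(round(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]), 3))
```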

New cards

More between-group variability ___ p-values. More samples ___ p-values

decreases, decreases

New cards

Conditions for Anova (3) & how to check

Normality - qq-plot
Equal variance (sd of groups: max/min < 2)
Independence - how data was collected

New cards

Interpreting effect size of difference in means

> 0.5: Moderate

> 1: Large

New cards

What is FWER and ways to control

FWER: Family Wise Error Rate: Chance of making at least one type 1 error with multiple hypothesis tests
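To see why this matters, here is a sketch of how the rate grows with the number of tests. It assumes the tests are independent and each uses the same level alpha, which is a simplification; the function name is hypothetical.

```python
def family_wise_error_rate(alpha, num_tests):
    """Chance of at least one type 1 error across independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 10):
    print(m, "tests:", round(family_wise_error_rate(0.05, m), 3))
```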

New cards

Meaning of an additive effect

If effect of treatment A is the same for all levels of treatment B

New cards

Main effects

Effect of one factor averaged over all levels of the other factors

New cards

Two-Way Factorial design requirements (2)

at least two levels for each factor
Every combination is tested

New cards

Experimental Design Principles

Blocking: Block out a nuisance factor & assign treatments across all levels of that nuisance factor
Comparison: More than one group is needed & one is a placebo/control & include all levels that need to be studied
Crossing: include all combinations of factor levels
Replication: 2+ observations for each cell
Randomization: Randomly assign units

New cards

Two way main effects model (no interaction)

Y = μ + α_i + β_j + ε

New cards

Extra condition for two way main effects model

Effects are additive (If not, we would need an interaction term and that’s another model)

New cards

Two way model (w/ interaction)

Y = μ + α_i + β_j + (αβ)_ij + ε

New cards

Explain each form of randomized block: subdivision, matching, reusing

Subdivision: Divide up by a known nuisance factor

Matching: Match people based on a known nuisance factor, and assign a treatment to each within the pair

Reusing: Randomly reuse same experimental unit under each treatment

New cards

Other method of checking condition that’s not sd_max / sd_min (and how to fix)

Checks equal variance. Plot log(sd) vs. log(mean) for the groups and find the slope

Transform via y^(1 - slope)

If 1 - slope = 0, use log(y)
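A sketch of this diagnostic: fit a least-squares line to log(sd) against log(mean) across groups and take 1 minus the slope as the suggested power. The function name and the groups are hypothetical; the groups are chosen so that sd is proportional to the mean.

```python
from math import log
from statistics import mean, stdev

def suggested_power(groups):
    """Return 1 - slope, where slope is from log(sd) regressed on log(mean)."""
    log_means = [log(mean(g)) for g in groups]
    log_sds = [log(stdev(g)) for g in groups]
    x_bar, y_bar = mean(log_means), mean(log_sds)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_sds))
             / sum((x - x_bar) ** 2 for x in log_means))
    return 1 - slope

# Hypothetical groups where the sd grows in proportion to the mean
groups = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
power = suggested_power(groups)
print("use log(y)" if abs(power) < 1e-6 else f"use y^{power:.2f}")
```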

New cards

Odds Formula

pi / (1 - pi)

New cards

Logit Form Formula (Key thing about error)

Ln(pi/(1-pi)) = B_0 + B_1X + …

No error term

New cards

Logit form converted to a pi= format

pi = e^(B_0 + B_1X) / (1 + e^(B_0 + B_1X))
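A short sketch of the conversion with one predictor, checking that taking the log odds of the result recovers B_0 + B_1X. The coefficient values and function name are hypothetical.

```python
from math import exp, log

def probability(b0, b1, x):
    """pi = e^(B_0 + B_1 x) / (1 + e^(B_0 + B_1 x)), the probability that Y = 1."""
    linear = b0 + b1 * x
    return exp(linear) / (1 + exp(linear))

# Hypothetical coefficients
b0, b1, x = -1.0, 0.5, 3
pi = probability(b0, b1, x)
log_odds = log(pi / (1 - pi))  # back to logit form
print(round(pi, 3), round(log_odds, 3))
```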

New cards

Logistic regression’s probability is based on a Y= what case?

Y=1, or that the event happens

New cards

Odds ratio (2 forms) (And interpretations)

When comparing two groups:

Odds ratio for a change from a to b is odds_b / odds_a

“Changing from a to b, the odds of Y=1 inc/dec by a factor of __”

One-unit increase in X:

Odds ratio is e^(B_1)
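A sketch confirming that a one-unit increase in X multiplies the odds by e^(B_1), using the odds formula pi / (1 - pi). The coefficients and function name are hypothetical.

```python
from math import exp

def odds(b0, b1, x):
    """Odds of Y = 1 at x, computed as pi / (1 - pi)."""
    pi = exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))
    return pi / (1 - pi)

# Hypothetical coefficients
b0, b1 = -1.0, 0.5
odds_ratio = odds(b0, b1, 3) / odds(b0, b1, 2)  # odds_b / odds_a for a one-unit change
print(round(odds_ratio, 4), round(exp(b1), 4))
```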

New cards

Theoretical Model for SLR

Y = B_0 + B_1X + e

Plug in appropriate variables for Y and X

New cards

What forward selection uses to know what to add next

Adds the predictor with the highest correlation with the response

New cards

What backward elimination uses to know what to remove next

Removes the term with the highest p-value

New cards

What’s included in a complete second order model?

First order, second order (squared terms), interaction