Statistics for Data Science II

57 Terms

1

Pearson's Chi Squared Test Methodology

Goodness-of-fit test for a categorical variable with K possible outcomes, most simply a test for a uniform distribution. If the hypothesized probabilities are not the same for every outcome, they may need to be estimated by maximum likelihood (MLE).

2

Pearson's Chi Squared Test Statistic

sum [ (Ok-Ek)^2 / Ek ]

3

Pearson Chi Squared Test Statistic Null Distribution

Chi squared with K-1 degrees of freedom.
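
The statistic and degrees of freedom from the three cards above can be sketched in a few lines. The counts are hypothetical, and the uniform null (every outcome has probability 1/K) is an assumption chosen to keep the example simple.

```python
# Hypothetical counts for K = 4 outcomes; H0: all outcomes equally likely.
observed = [18, 22, 30, 30]
n = sum(observed)
K = len(observed)
expected = [n / K] * K  # E_k = n * p_k with p_k = 1/K under H0

# Pearson statistic: sum over k of (O_k - E_k)^2 / E_k
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = K - 1

# 7.815 is the 0.95 quantile of the chi squared distribution with 3 df.
reject_at_5pct = chi2_stat > 7.815
print(chi2_stat, df, reject_at_5pct)
```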

4

Binomial Test Methodology

Bernoulli random sample of n trials. Tests whether success occurs with a hypothesized probability p0.

5

Binomial Test Statistic

number of successes

6

Binomial Test Null Distribution

Bin(n,p0)
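
Because the null distribution is Bin(n,p0), an exact p value is just a binomial tail sum. The sketch below assumes a one-sided alternative p > p0 and hypothetical values for n, p0, and the observed count; the helper names are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(x, n, p0):
    """P(X >= x) under H0: one-sided p value for the alternative p > p0."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

# Hypothetical data: 10 successes in 20 trials, H0: p = 0.3
n, p0, successes = 20, 0.3, 10
p_value = binom_upper_tail(successes, n, p0)
print(p_value)
```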

7

Sign Test Methodology

Bernoulli random sample of n trials. Used to test that p=0.5.

8

Sign Test Test Statistic

number of successes

9

Sign Test Null Distribution

Bin(n,0.5)

10

Paired Samples Sign Test Methodology

2 paired random Bernoulli samples. Define a new variable Vi=Xi-Yi and drop the pairs with Vi=0; among the remaining pairs, P(Vi=1)=p and P(Vi=-1)=1-p. Test that p=0.5.

11

Paired Samples Sign Test Test Statistic

number of pairs with Vi=1 (successes) out of the pairs that are different

12

Paired Samples Sign Test Null Distribution

Bin(n,0.5) where n is the number of differences
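
A minimal sketch of the paired sign test on hypothetical 0/1 data. The two-sided p value used here (twice the smaller tail, capped at 1) is one common convention and is an assumption, not something the cards specify.

```python
from math import comb

# Hypothetical paired Bernoulli outcomes
x = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
y = [0, 1, 0, 0, 0, 1, 0, 1, 0, 0]

v = [xi - yi for xi, yi in zip(x, y)]
n_diff = sum(1 for vi in v if vi != 0)     # pairs that differ
successes = sum(1 for vi in v if vi == 1)  # test statistic

def pmf(k, n):
    """P(X = k) for X ~ Bin(n, 0.5)."""
    return comb(n, k) / 2**n

lower = sum(pmf(k, n_diff) for k in range(0, successes + 1))
upper = sum(pmf(k, n_diff) for k in range(successes, n_diff + 1))
p_value = min(1.0, 2 * min(lower, upper))
print(n_diff, successes, p_value)
```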

13

McNemar's Test Methodology

2 paired random Bernoulli samples. Tests whether the probability of success is equal in the two samples. Use with sample sizes greater than 25; otherwise use the sign test.

14

McNemar's Test Statistic

Qn = (V-W)^2/(V+W)

15
16

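where V is the number of pairs with a success only in the first sample and W is the number of pairs with a success only in the second sample (the discordant pairs).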

17

McNemar's Test Null Distribution

Chi squared with 1 degree of freedom
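
As a rough illustration, Qn can be computed from the discordant pairs, taking V as the count of pairs with a success only in the first sample and W as the count with a success only in the second. The pair counts are hypothetical. The p value uses the identity P(chi squared with 1 df > q) = erfc(sqrt(q/2)), which avoids any third-party library.

```python
from math import erfc, sqrt

# Hypothetical paired binary outcomes (x_i, y_i)
pairs = [(1, 0)] * 25 + [(0, 1)] * 10 + [(1, 1)] * 40 + [(0, 0)] * 25

v = sum(1 for x, y in pairs if x == 1 and y == 0)  # success only in sample 1
w = sum(1 for x, y in pairs if x == 0 and y == 1)  # success only in sample 2

q_n = (v - w) ** 2 / (v + w)
p_value = erfc(sqrt(q_n / 2))  # upper tail of chi squared with 1 df
print(q_n, p_value)
```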

18

Fisher's Exact Test Methodology

2 independent random samples. Testing to see if the probability of success is the same for both samples.

19

Fisher's Exact Test Test Statistic

Total number of successes in the first sample

20

Fisher's Exact Test Null Distribution

Hyper(n+m,n,s)

21
22

where n is the size of the first sample, n+m is the total number of observations in both samples, and s is the total number of successes in the two samples.
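
The hypergeometric null distribution can be evaluated directly with binomial coefficients. This sketch assumes a one-sided alternative (the first sample has the higher success probability) and hypothetical counts; `hyper_pmf` is an illustrative helper name.

```python
from math import comb

def hyper_pmf(x, total, n, s):
    """P(X = x): x successes in the first sample of size n,
    given s successes among `total` observations overall."""
    return comb(s, x) * comb(total - s, n - x) / comb(total, n)

# Hypothetical data: 6 of 8 successes in sample 1, 2 of 7 in sample 2
n, m = 8, 7
x_obs, y_obs = 6, 2
s = x_obs + y_obs
total = n + m

lo, hi = max(0, s - m), min(n, s)  # support of X under H0
p_upper = sum(hyper_pmf(x, total, n, s) for x in range(x_obs, hi + 1))
print(p_upper)
```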

23

Pearson's Chi Squared Test of Homogeneity Methodology

L independent random samples, each with K possible outcomes. Tests whether the probability of each outcome is the same in all L samples.

24

Pearson's Chi Squared Test of Homogeneity Test Statistic

Double summation over the samples and categories of (Okl-Ekl)^2/Ekl

25

Pearson's Chi Squared Test of Homogeneity Null Distribution

Chi squared with (K-1)(L-1) degrees of freedom

26

Pearson's Chi Squared Test of Independence Methodology

Two measurements on the same sample. Tests whether the two measurements are independent.

27

Pearson's Chi Squared Test of Independence Test Statistic

Double summation over all possible combinations of categories of (Okl-Ekl)^2/Ekl

28

Pearson's Chi Squared Test of Independence Null Distribution

Chi squared distribution with (K-1)(L-1) degrees of freedom
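
The homogeneity and independence tests share the same arithmetic on a K-by-L table of counts. The sketch below uses a hypothetical table and the usual expected count Ekl = (row total)(column total)/n, which is assumed here since the cards do not spell it out.

```python
# Hypothetical K x L contingency table: table[k][l] = O_kl
table = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2_stat = 0.0
for k, row in enumerate(table):
    for l, observed in enumerate(row):
        expected = row_totals[k] * col_totals[l] / n  # E_kl under H0
        chi2_stat += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
print(chi2_stat, df)
```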

29

Wilcoxon's One Sample Signed Rank Test Methodology

A random sample with median m. Tests whether m is equal to a hypothesized median m0.

30

Wilcoxon's One Sample Signed Rank Test Test Statistic

sum of sgn(Y)R

31
32

where Y is the observation minus the hypothesized median, sgn(Y) is its sign, and R is the rank of the absolute value of Y.

33

Wilcoxon's One Sample Signed Rank Test Null Distribution

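Under H0, the test statistic is distributed symmetrically about 0, so its expected value is 0 and its variance is n(n+1)(2n+1)/6. It is approximately normal for large n.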
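
For small samples the null distribution of the signed-rank statistic can be enumerated exactly, since under H0 every assignment of signs to the ranks 1..n is equally likely. The sketch assumes no ties and no zero differences, and the sample and hypothesized median are hypothetical.

```python
from itertools import product

def signed_rank_statistic(values, m0):
    """Return (sum of sgn(Y) * R, n) where Y = value - m0. Assumes no ties in |Y|."""
    y = [v - m0 for v in values if v != m0]  # drop zero differences
    order = sorted(range(len(y)), key=lambda i: abs(y[i]))
    ranks = [0] * len(y)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    stat = sum((1 if yi > 0 else -1) * r for yi, r in zip(y, ranks))
    return stat, len(y)

sample = [5.1, 6.3, 4.2, 7.8, 6.9, 8.4]  # hypothetical data
w_obs, n = signed_rank_statistic(sample, m0=5.0)

# Exact null distribution: all 2^n sign patterns over ranks 1..n
null = [sum(s * r for s, r in zip(signs, range(1, n + 1)))
        for signs in product((-1, 1), repeat=n)]
p_value = sum(1 for w in null if abs(w) >= abs(w_obs)) / len(null)
print(w_obs, p_value)
```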

34

Wilcoxon's Paired Samples Signed-Rank Test Methodology

2 paired random samples with medians mx and my. Tests whether the two medians are equal. Define a new variable Vi=Xi-Yi.

35

Wilcoxon's Paired Samples Signed-Rank Test Test Statistic

sum of sgn(V)R

36
37

where V is the difference between the paired observations and R is the rank of the absolute value of V.

38

Wilcoxon's Paired Samples Signed-Rank Test Null Distribution

Under H0, the test statistic is distributed symmetrically about 0, so its expected value is 0.

39

Wilcoxon's Rank Sum Test Methodology

Two independent random samples with medians mx and my. Tests whether the medians are equal. Merge the samples into one sample V and assign ranks R. RX and RY are the sums of the ranks of each sample.

40

Wilcoxon's Rank Sum Test Test Statistic

min { nm+n(n+1)/2-RX, nm+m(m+1)/2-RY }

41

Wilcoxon's Rank Sum Test Null Distribution

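Under H0, each of the two quantities inside the minimum has expected value nm/2 and variance nm(n+m+1)/12, and is approximately normal for large samples. Small values of the minimum are evidence against H0.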
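
A short sketch of the rank sum computation on hypothetical data without ties, taking n as the size of the X sample and m as the size of the Y sample. The two quantities inside the minimum always add up to nm, which gives a quick sanity check.

```python
# Hypothetical independent samples (no ties)
x = [1.1, 2.3, 3.5, 4.0]
y = [2.9, 5.2, 6.1, 7.4, 8.0]
n, m = len(x), len(y)

# Merge, sort, and rank; ranks start at 1
merged = sorted([(val, "x") for val in x] + [(val, "y") for val in y])
rx = sum(rank for rank, (_, label) in enumerate(merged, start=1) if label == "x")
ry = sum(rank for rank, (_, label) in enumerate(merged, start=1) if label == "y")

u_x = n * m + n * (n + 1) // 2 - rx
u_y = n * m + m * (m + 1) // 2 - ry
u = min(u_x, u_y)
print(rx, ry, u)
```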

42

Linear Constraints Tests

Given a linear model, test the linear constraints RB=r on the coefficient vector B.

43

Linear Constraints: Exact F Test Null Distribution

F distribution with k and n-p degrees of freedom, where k is the number of constraints (rows of R), p is the number of predictors (columns of the design matrix X), and n is the number of observations.
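
The card gives only the null distribution, so the statistic below is the standard textbook form for testing RB=r, assumed rather than taken from the source: F = (Rb-r)' [R (X'X)^-1 R']^-1 (Rb-r) / (k S^2). The data are simulated and the constraint (third coefficient equals zero) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

R = np.array([[0.0, 0.0, 1.0]])  # one constraint: third coefficient = 0
r = np.array([0.0])
k, p = R.shape[0], X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
resid = y - X @ b_hat
s2 = resid @ resid / (n - p)     # unbiased residual variance

diff = R @ b_hat - r
f_stat = float(diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / (k * s2))
print(f_stat, (k, n - p))        # compare with F(k, n - p)
```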

44

Linear Constraints: Wald Test Null Distribution

Chi squared with k degrees of freedom (asymptotically), where k is the number of constraints (rows of R).

45

Wald Test assumptions

Large sample size (n>100). Uses the unbiased estimate of the residual variance under the full model.

46

Linear Constraints: Score Test Null Distribution

Chi squared with k degrees of freedom (asymptotically), where k is the number of constraints (rows of R).

47

Score Test assumptions

Works for smaller sample sizes. Uses the MLE of the residual variance under the reduced model.

48

Linear Constraints: Likelihood Ratio Test Null Distribution

Chi squared with k degrees of freedom (asymptotically), where k is the number of constraints (rows of R).

49

Likelihood Ratio Test Assumptions

Combination of the Wald and Score tests: it uses the fits under both the full model and the reduced model.
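
To make the contrast between the three asymptotic tests concrete, the sketch below writes each statistic in terms of the residual sums of squares of the full and reduced fits. These forms follow the variance estimates named on the cards but are assumed textbook versions for the normal linear model; a given course may define them slightly differently. The data and the constraint (last two coefficients equal zero) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def rss(design, response):
    """Residual sum of squares of the least squares fit."""
    coef = np.linalg.lstsq(design, response, rcond=None)[0]
    resid = response - design @ coef
    return float(resid @ resid)

p = X.shape[1]
k = 2  # H0: the last two coefficients are zero
rss_full = rss(X, y)
rss_reduced = rss(X[:, : p - k], y)

wald = (rss_reduced - rss_full) / (rss_full / (n - p))  # unbiased variance, full model
score = (rss_reduced - rss_full) / (rss_reduced / n)    # MLE variance, reduced model
lr = n * np.log(rss_reduced / rss_full)                 # uses both fits
print(wald, score, lr)  # each compared with chi squared, k df
```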

50

Jackknife

For each i, exclude the ith observation, fit the linear model on the remaining observations, and record b hat. Use the resulting leave-one-out estimates to estimate the mean and variance of b hat.
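
A minimal jackknife sketch on simulated data. The variance formula (n-1)/n times the sum of squared deviations of the leave-one-out estimates is the usual jackknife form and is assumed here; `ols` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def ols(design, response):
    """Least squares coefficient estimate."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

# One estimate per left-out observation
loo = np.array([ols(np.delete(X, i, axis=0), np.delete(y, i)) for i in range(n)])

jack_mean = loo.mean(axis=0)
jack_var = (n - 1) / n * ((loo - jack_mean) ** 2).sum(axis=0)
print(jack_mean, jack_var)
```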

51

Nonparametric Bootstrap

Take a bootstrap sample of the pairs (Yi, Xi) with replacement, so that each response stays with its own covariates. Regress and calculate b hat. Repeat many times, then estimate the mean and variance of b hat.
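
A sketch of the nonparametric (pairs) bootstrap on simulated data. Rows of X and entries of Y are resampled with the same indices so each response stays with its covariates; the number of replicates B is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n, B = 40, 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def ols(design, response):
    """Least squares coefficient estimate."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

boot = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
    boot[b] = ols(X[idx], y[idx])

boot_mean = boot.mean(axis=0)
boot_var = boot.var(axis=0, ddof=1)
print(boot_mean, boot_var)
```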

52

Semiparametric Bootstrap

Calculate the least squares estimate b hat and the residual vector. Take a bootstrap sample of the residuals with replacement, add it to the fitted values X b hat to form a new Y, regress the new Y on X, and record b hat. Repeat many times, then estimate the mean and variance of b hat.
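
The semiparametric (residual) bootstrap keeps X fixed and resamples only the residuals. The sketch below uses simulated data and an arbitrary number of replicates B.

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 40, 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def ols(design, response):
    """Least squares coefficient estimate."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

b_hat = ols(X, y)
fitted = X @ b_hat
resid = y - fitted

boot = np.empty((B, X.shape[1]))
for b in range(B):
    e_star = rng.choice(resid, size=n, replace=True)  # resampled residuals
    boot[b] = ols(X, fitted + e_star)                 # refit on the new response

boot_mean = boot.mean(axis=0)
boot_var = boot.var(axis=0, ddof=1)
print(boot_mean, boot_var)
```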

53

Parametric Bootstrap

Calculate the least squares estimate b hat and the unbiased estimate S^2 of the residual variance. Simulate a residual vector from a normal distribution with mean 0 and variance S^2, calculate a new Y from X and b hat, and regress the new Y on X. Repeat many times, then estimate the mean and variance of b hat.
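
In the parametric version the residuals are simulated from N(0, S^2) rather than resampled. With many replicates the bootstrap variance should land close to the classical estimate S^2 times the diagonal of (X'X)^-1, which the sketch prints for comparison. Data and B are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, B = 40, 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
p = X.shape[1]

def ols(design, response):
    """Least squares coefficient estimate."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

b_hat = ols(X, y)
resid = y - X @ b_hat
s2 = resid @ resid / (n - p)  # unbiased residual variance

boot = np.empty((B, p))
for b in range(B):
    e_star = rng.normal(0.0, np.sqrt(s2), size=n)  # simulated normal residuals
    boot[b] = ols(X, X @ b_hat + e_star)

boot_mean = boot.mean(axis=0)
boot_var = boot.var(axis=0, ddof=1)
classical_var = s2 * np.diag(np.linalg.inv(X.T @ X))
print(boot_var, classical_var)
```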

54

Permutation F Test Algorithm

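Calculate the observed value of the F test statistic on the observed sample. Repeatedly take a random permutation of the response vector Y and recalculate the F test statistic on the permuted data. Estimate the p value as the proportion of permuted statistics that are at least as large as the observed one.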
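
A sketch of the permutation F test, assuming the overall F test that all slope coefficients are zero (the card does not say which F test is meant). The p value uses the (1 + count)/(B + 1) convention, which is one common choice; data and B are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n, B = 40, 999
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

def f_stat(design, response):
    """Overall F statistic: all slopes zero versus the full model."""
    n_obs, p = design.shape
    coef = np.linalg.lstsq(design, response, rcond=None)[0]
    rss = ((response - design @ coef) ** 2).sum()
    tss = ((response - response.mean()) ** 2).sum()
    return ((tss - rss) / (p - 1)) / (rss / (n_obs - p))

f_obs = f_stat(X, y)
# Permuting Y breaks any association between Y and X
perm = np.array([f_stat(X, rng.permutation(y)) for _ in range(B)])
p_value = (1 + (perm >= f_obs).sum()) / (B + 1)
print(f_obs, p_value)
```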

55

Permutation t Test Algorithm

Calculate the observed value of the t test statistic for bj hat on the observed sample. Regress Y on all the covariates except for Xj and calculate the fitted values and residuals. Repeatedly take a random permutation of the residuals, add it to the fitted values, regress the new response on all the covariates, and calculate the new test statistic. Estimate the p value of the t test as the proportion of permuted statistics that are at least as extreme as the observed one.
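
The steps above can be sketched as follows on simulated data. The two-sided comparison of absolute t values and the (1 + count)/(B + 1) p value are assumed conventions; `t_stat` and the choice of column j are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, B = 60, 999
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 1.5]) + rng.normal(size=n)
j = 2  # index of the coefficient under test

def t_stat(design, response, col):
    """t statistic for the coefficient in column `col`."""
    n_obs, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = xtx_inv @ design.T @ response
    resid = response - design @ coef
    s2 = resid @ resid / (n_obs - p)
    return coef[col] / np.sqrt(s2 * xtx_inv[col, col])

t_obs = t_stat(X, y, j)

# Reduced model: all covariates except X_j
X_red = np.delete(X, j, axis=1)
fitted_red = X_red @ np.linalg.lstsq(X_red, y, rcond=None)[0]
resid_red = y - fitted_red

perm = np.array([
    t_stat(X, fitted_red + rng.permutation(resid_red), j) for _ in range(B)
])
p_value = (1 + (np.abs(perm) >= abs(t_obs)).sum()) / (B + 1)
print(t_obs, p_value)
```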

56

Breusch-Pagan Test Algorithm

Regress Y on X and calculate the residual vector. Define the auxiliary variable Yaux = residuals^2, regress it on X, and calculate the R^2 of this auxiliary regression. Calculate the observed value of the test statistic as nR^2, which under H0 is asymptotically chi squared with degrees of freedom equal to the number of predictors excluding the intercept, and calculate the corresponding p value.
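
A sketch of the nR^2 computation on simulated data whose error standard deviation grows with the predictor, so the test should reject. With a single predictor the reference distribution has 1 degree of freedom, and the p value can be obtained from erfc without extra libraries; the data-generating choices are hypothetical.

```python
from math import erfc, sqrt
import numpy as np

rng = np.random.default_rng(8)
n = 400
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x  # error sd proportional to x

def fitted_values(design, response):
    """Fitted values of the least squares regression."""
    return design @ np.linalg.lstsq(design, response, rcond=None)[0]

resid = y - fitted_values(X, y)
aux = resid**2                               # auxiliary response
aux_fit = fitted_values(X, aux)
r2 = 1 - ((aux - aux_fit) ** 2).sum() / ((aux - aux.mean()) ** 2).sum()

bp_stat = n * r2
df = X.shape[1] - 1                          # predictors excluding the intercept
p_value = erfc(sqrt(bp_stat / 2))            # valid because df == 1 here
print(bp_stat, df, p_value)
```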

57

Brown-Forsythe Test Algorithm

Calculate the median Y value within each level of X. Define the auxiliary variable Yaux = |Y-med|, regress Yaux on X, and conduct an F test.
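
With a single categorical X, regressing Yaux on X and running the F test amounts to a one-way ANOVA on the absolute deviations from the group medians. The sketch below assumes that setting, with two hypothetical groups.

```python
from statistics import mean, median

# Hypothetical responses grouped by the level of X
groups = {
    "low": [1.0, 2.0, 3.0],
    "high": [0.0, 5.0, 10.0],
}

# Auxiliary variable: absolute deviation from the group median
aux = {g: [abs(v - median(vals)) for v in vals] for g, vals in groups.items()}

all_aux = [v for vals in aux.values() for v in vals]
grand_mean = mean(all_aux)
N, G = len(all_aux), len(aux)

ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in aux.values())
ss_within = sum((v - mean(vals)) ** 2 for vals in aux.values() for v in vals)

f_stat = (ss_between / (G - 1)) / (ss_within / (N - G))
print(f_stat, (G - 1, N - G))  # compare with F(G - 1, N - G)
```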