Objective: Determine whether oil prices differ between January and June, assuming the January and June price distributions have the same shape.
Significance Level: 5%.
Assumption: Samples are not normally distributed.
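Hypotheses:
H_0: The median oil price is the same in January and June.
H_1: The median oil price differs between January and June.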
Data and Ranks:
Oil Price | Month | Actual Ranks |
---|---|---|
66.7 | Jan | 1 |
68 | Jan | 2 |
68.9 | Jan | 3 |
69.5 | Jan | 4 |
70.3 | Jan | 5 |
70.9 | June | 6 |
71 | June | 7 |
72 | June | 8 |
72.1 | Jan | 9 |
72.5 | June | 10 |
73.1 | June | 11 |
75.5 | June | 12 |
Calculations:
R_1 (Sum of ranks for January) = 1 + 2 + 3 + 4 + 5 + 9 = 24
R_2 (Sum of ranks for June) = 6 + 7 + 8 + 10 + 11 + 12 = 54
n_1 = 6 (Sample size for January)
n_2 = 6 (Sample size for June)
u_1 = R_1 - \frac{n_1(n_1 + 1)}{2} = 24 - \frac{6 \times 7}{2} = 24 - 21 = 3
u_2 = R_2 - \frac{n_2(n_2 + 1)}{2} = 54 - \frac{6 \times 7}{2} = 54 - 21 = 33
u = \min(u_1, u_2) = \min(3, 33) = 3
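The rank sums and the statistic above can be reproduced in R. This is a minimal sketch, assuming the test intended here is the two-sided Mann-Whitney U (Wilcoxon rank-sum) test; the variable names are illustrative and not from the original notes.

```r
# Oil prices taken from the data table above.
jan  <- c(66.7, 68, 68.9, 69.5, 70.3, 72.1)
june <- c(70.9, 71, 72, 72.5, 73.1, 75.5)
n1 <- length(jan)
n2 <- length(june)

# Rank the pooled sample, then sum the ranks within each month.
ranks <- rank(c(jan, june))
R1 <- sum(ranks[seq_len(n1)])    # January rank sum
R2 <- sum(ranks[-seq_len(n1)])   # June rank sum

# U statistics and the test statistic.
u1 <- R1 - n1 * (n1 + 1) / 2
u2 <- R2 - n2 * (n2 + 1) / 2
u  <- min(u1, u2)

# Lower-tail critical value for a two-sided 5% test: the largest value c
# with P(U <= c) <= 0.025. qwilcox() returns the smallest value whose
# cumulative probability reaches 0.025, so step back by one.
critical <- qwilcox(0.025, n1, n2) - 1

# Built-in test; with no ties and small samples R computes an exact p-value.
mw <- wilcox.test(jan, june, alternative = "two.sided")
print(c(R1 = R1, R2 = R2, u = u, critical = critical))
print(mw)
```

The statistic reported by wilcox.test() is the U value for the first sample (January here), so it should agree with u_1.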
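Critical Value: 5 (Mann-Whitney U table, n_1 = n_2 = 6, two-tailed, \alpha = 0.05)
Decision: The test statistic (3) does not exceed the critical value (5), so reject H_0.
Conclusion: At the 5% significance level, there is evidence that the oil price differs between January and June.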
Find the values of \chi^2_{0.05, 16} and \chi^2_{0.025, 9} using tables.
R code to calculate these values:
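qchisq(0.05, 16, lower.tail = FALSE)  # upper-tail critical value, approximately 26.296
qchisq(0.025, 9, lower.tail = FALSE)  # upper-tail critical value, approximately 19.023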
For X \sim \chi^2_v:
E(X) = v
Var(X) = 2v
SD(X) = \sqrt{2v}
If X \sim \chi^2_7:
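The rest of this item did not survive in the notes. Assuming it asks for the mean, variance and standard deviation of X, the formulas above can be applied with v = 7, as in this short sketch (variable names are illustrative):

```r
# Degrees of freedom for X ~ chi-squared with 7 df.
v <- 7

mean_x <- v            # E(X) = v
var_x  <- 2 * v        # Var(X) = 2v
sd_x   <- sqrt(var_x)  # SD(X) = sqrt(2v)

print(c(mean = mean_x, var = var_x, sd = sd_x))
```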
Data:
Group | Infected | Not Infected | Totals |
---|---|---|---|
Inoculated | 5 | 241 | 246 |
Not Inoculated | 90 | 292 | 382 |
Totals | 95 | 533 | 628 |
Assumptions:
Expected Frequencies:
Yates' Continuity Correction:
Degrees of Freedom:
Critical Value:
Decision:
Conclusion:
Effect Size (Phi Coefficient):
If the expected frequencies assumption for using the \chi^2 test of association fails, Fisher's exact test may be used.
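The steps listed under the headings above can be sketched in R using the counts from the table. This is an illustration rather than the original worked solution: the 5% significance level is an assumption (no level is stated for this problem), and the object names are made up for the example.

```r
# Observed counts from the table (rows: group, columns: infection status).
observed <- matrix(
  c(5, 241,
    90, 292),
  nrow = 2, byrow = TRUE,
  dimnames = list(Group = c("Inoculated", "Not Inoculated"),
                  Outcome = c("Infected", "Not Infected"))
)

# Expected frequency per cell = row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared test of association with Yates' continuity correction
# (correct = TRUE is the default for 2 x 2 tables).
yates_test <- chisq.test(observed, correct = TRUE)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Critical value, ASSUMING a 5% significance level.
critical <- qchisq(0.05, df, lower.tail = FALSE)

# Phi coefficient, computed here from the uncorrected statistic: sqrt(chi^2 / n).
uncorrected <- chisq.test(observed, correct = FALSE)
phi <- sqrt(unname(uncorrected$statistic) / sum(observed))

# Fisher's exact test, the fallback when expected frequencies are too small.
fisher <- fisher.test(observed)

print(round(expected, 2))
print(yates_test)
print(c(df = df, critical = critical, phi = phi))
print(fisher)
```

Whether phi should be based on the corrected or the uncorrected statistic is a convention; the uncorrected version is used here only as one common choice.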
R Output Analysis:
Pearson's Chi-squared test
data: Personality Matrix
X-squared = 71.2, df = 3, p-value = 2.362e-15
Post Hoc Method:
Ordinal Data:
The linear-by-linear association procedure is used when we have ordinal data in contingency tables.
Typical Use:
Probability Calculation:
Linear Predictors in GLMs:
Log Link Function for Count Data:
R Code for Poisson Regression and Wald Test:
R code for performing a Poisson regression on the Surgery data where the dependent variable is Surgery Visits and the independent variable is Location:
SurgeryPoissonReg <- glm(SurgeryVisits ~ Location, data = Surgery, family = "poisson")
summary(SurgeryPoissonReg)
R code for performing a Wald test in this scenario:
library(lmtest)
waldtest(SurgeryPoissonReg, test = "Chisq")
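The Surgery data set is not included in these notes, so the following self-contained sketch uses a small hypothetical data frame with the same column names to show how the log link is read and how a Wald statistic for a single coefficient can be computed by hand. The counts and the location labels are invented for illustration only.

```r
# HYPOTHETICAL data: the real Surgery data set is not shown in the notes.
Surgery <- data.frame(
  Location = factor(rep(c("Rural", "Urban"), each = 5)),
  SurgeryVisits = c(2, 3, 1, 4, 2, 6, 5, 7, 4, 8)
)

# Poisson regression with the (default) log link: log(mean) = b0 + b1 * Urban.
fit <- glm(SurgeryVisits ~ Location, data = Surgery,
           family = poisson(link = "log"))

# Exponentiating the coefficients moves back to the count scale:
# exp(b0) is the fitted mean for the reference level,
# exp(b1) is the rate ratio of the other level relative to it.
rate_ratio <- exp(coef(fit))

# Wald test for the Location coefficient: z = estimate / standard error,
# and z^2 is compared with a chi-squared distribution on 1 df.
z <- coef(summary(fit))["LocationUrban", "z value"]
wald_chisq <- z^2
wald_p <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)

print(rate_ratio)
print(c(wald_chisq = wald_chisq, p_value = wald_p))
```

With a single two-level predictor the fitted means equal the group sample means, which makes the rate ratio easy to check by hand.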
Distribution of Dependent Variable:
Linearity:
Methods of Evaluating a Logistic Regression Model:
Odds and Logit Calculation:
odds = \frac{p}{1 - p} = \frac{0.85}{0.15} \approx 5.67
logit(p) = \ln(odds) = \ln(5.67) \approx 1.73
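The same arithmetic can be checked in R. This is a small sketch; qlogis() is base R's logit function and is used here only as a cross-check.

```r
p <- 0.85

odds <- p / (1 - p)    # approximately 5.67
logit_p <- log(odds)   # natural log of the odds, approximately 1.73

# qlogis() computes log(p / (1 - p)) directly.
stopifnot(isTRUE(all.equal(logit_p, qlogis(p))))

print(round(c(odds = odds, logit = logit_p), 2))
```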
Logit Equation and Probability Calculation:
Let y = logit(p') = \ln(\frac{p'}{1-p'}) = 3 + 2x_1
e^y = \frac{p'}{1-p'} = e^{3 + 2x_1}
p' = (1 - p')e^y
p' = e^y - p'e^y
p' + p'e^y = e^y
p'(1 + e^y) = e^y
p' = \frac{e^y}{1 + e^y} = \frac{e^{3 + 2x_1}}{1 + e^{3 + 2x_1}}
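The final expression can be evaluated numerically. The sketch below uses a hypothetical helper, predicted_prob(), and a few arbitrary values of x_1 chosen for illustration; plogis() is base R's inverse logit and serves as a cross-check on the derivation.

```r
# Hypothetical helper implementing p' = e^y / (1 + e^y) with y = 3 + 2 * x1.
predicted_prob <- function(x1) {
  y <- 3 + 2 * x1
  exp(y) / (1 + exp(y))
}

# Arbitrary illustrative values of the predictor.
x1 <- c(-2, -1.5, 0, 1)
p_hat <- predicted_prob(x1)

# plogis(y) is the inverse logit, so it should match the derived formula.
stopifnot(isTRUE(all.equal(p_hat, plogis(3 + 2 * x1))))

print(data.frame(x1 = x1, p_hat = p_hat))
```

At x_1 = -1.5 the linear predictor is 0, so the predicted probability is exactly 0.5, which is a quick sanity check on the formula.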