Inferences About the Difference of Two Populations
Inferential Statistics
- Inferential statistics uses sample statistics to make decisions or predictions about population parameters.
- Focuses on two main areas: Estimation and Tests of Hypothesis.
Statistical Examples
- Example 1: Estimating the difference between the mean salaries of male and female executives involves two samples from two different populations.
- Example 2: Estimating the difference in the weight of participants before and after a weight loss program involves two samples from one population.
Dependent versus Independent Samples
- Independent Samples: Drawn from two different populations where elements of the first sample have no relation to elements of the second sample.
- Dependent Samples: Elements of the two samples are related; for each value collected from one sample, there is a corresponding data value from the second sample.
- Comparison can only be made within pairs.
Inferences about the Difference Between Two Population Means
- Two Sample Population Means
- Independent Samples: Comparing Group 1 vs. Group 2 independently.
- Use a z-value when σ1 and σ2 are known.
- Use a t-value when σ1 and σ2 are unknown; they may be assumed equal or unequal.
- Dependent Samples: Comparing the same group before vs. after treatment.
Inferences About the Difference Between Two Population Means (µ1 - µ2) for Independent Samples
- Sample 1: Random sample of size n1 from population 1, with mean µ1 and variance σ_1^2.
- Sample 2: Random sample of size n2 from population 2, with mean µ2 and variance σ_2^2.
- Interval Estimation for µ1 - µ2:
- Point estimate ± Margin of error
- Hypothesis testing about µ1 - µ2:
- Test statistic = \frac{\text{Point estimate – Parameter value at } H_0}{\text{Standard deviation of point estimator (Estimated or true)}}
Inferences About the Difference Between Two Population Means for Independent Samples: σ1 and σ2 are known
- Large samples: n1 ≥ 30 and n2 ≥ 30
- Interval Estimation for µ1 - µ2: (\bar{x}_1 - \bar{x}_2) ± z_{\alpha/2} \sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}
- Hypothesis testing about µ1 - µ2:
- Test statistic (z-statistic): z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}}, where d_0 is the value of µ1 - µ2 under H0
Example 1
- A survey of credit card holders revealed that Americans carried an average credit card balance of $4600 in city A and $4000 in city B.
- These averages are based on random samples of 500 credit card holders in city A and 400 credit card holders in city B, with population standard deviations of $1000 in city A and $800 in city B.
- Construct a 95% confidence interval for the difference between the mean credit card balance for all credit card holders in city A and B.
- Test at the 5% significance level if mean credit card balances for credit card holders in city A and city B were different.
- Given data:
- n1 = 500, \bar{x}_1 = 4600, σ_1 = 1000
- n2 = 400, \bar{x}_2 = 4000, σ_2 = 800
- H0: µ1 – µ2 = 0 (µ1 = µ_2)
- H1: µ1 – µ2 ≠ 0 (µ1 ≠ µ_2)
- \alpha = 0.05
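- Confidence interval: (\bar{x}_1 - \bar{x}_2) ± z_{\alpha/2} \sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}
- Test statistic: z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}}
- Solution to Example 1:
- Standard error: \sqrt{\frac{1000^2}{500} + \frac{800^2}{400}} = 60
- 95% confidence interval: 600 ± 1.96(60) = (482.4, 717.6)
- z = 600 / 60 = 10 > 1.96 => reject H0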
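The numbers in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `two_sample_z` and its parameter names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, d0=0.0, alpha=0.05):
    """Return (ci_low, ci_high, z, z_crit) for mu1 - mu2 with known sigmas."""
    # Standard deviation of the point estimator x1 - x2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Two-sided critical value z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = x1 - x2
    z = (diff - d0) / se
    return diff - z_crit * se, diff + z_crit * se, z, z_crit


# Example 1: credit card balances in city A and city B
low, high, z, z_crit = two_sample_z(4600, 4000, 1000, 800, 500, 400)
print(f"95% CI: ({low:.1f}, {high:.1f})")
print(f"z = {z:.2f}, reject H0: {abs(z) > z_crit}")
```

The interval excludes 0 and |z| exceeds the critical value, so both approaches lead to the same decision.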
Inferences About the Difference Between Two Population Means for Independent Samples: σ1 and σ2 are Unknown
- The t-distribution is used to make inferences about µ1 - µ2 when the following assumptions hold true:
- The two populations from which the two samples are drawn are (approximately) normally distributed.
- The two population standard deviations are unknown, and either:
- They are equal (σ1 = σ2), or
- They are not equal (σ1 ≠ σ2).
Inferences About the Difference Between Two Population Means for Independent Samples: Unknown but Equal Standard Deviations
- The pooled sample variance for two samples:
- s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}
- Interval Estimation for µ1 - µ2:
- (\bar{x}_1 - \bar{x}_2) ± t_{\alpha/2, df} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
- df = n1 + n2 - 2
- Hypothesis testing about µ1 - µ2:
- Test statistic (t-statistic): t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
Example 2
- The following information was obtained from two independent samples selected from two normally distributed populations with unknown but equal standard deviations.
- n1 = 16, \bar{x}_1 = 27, s_1 = 4
- n2 = 9, \bar{x}_2 = 23, s_2 = 3
- Construct a 95% confidence interval for µ1 - µ2.
- Test at the 5% significance level if µ1 is not equal to µ2.
- H0: µ1 – µ2 = 0 (µ1 = µ_2)
- H1: µ1 – µ2 ≠ 0 (µ1 ≠ µ_2)
- \alpha = 0.05
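- (\bar{x}_1 - \bar{x}_2) ± t_{\alpha/2, df} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
- s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}
- df = n1 + n2 - 2
SOLUTION TO EXAMPLE 2
- df = 16 + 9 - 2 = 23, t_{0.025, 23} = 2.069
- s_p^2 = \frac{15(4^2) + 8(3^2)}{23} = 13.57, s_p = 3.68
- 95% confidence interval: 4 ± 2.069(3.68)\sqrt{\frac{1}{16} + \frac{1}{9}} = 4 ± 3.18 = (0.82, 7.18)
- t = \frac{4 - 0}{1.53} = 2.61, which lies outside the non-rejection region (-2.069, 2.069) => reject H0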
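The pooled-variance calculation for Example 2 can be sketched as follows. The function name `pooled_t` is illustrative, and the critical value 2.069 (the t-table entry for df = 23 with 0.025 in each tail) is hard-coded as an assumption, because the standard library has no t-distribution quantile function.

```python
from math import sqrt


def pooled_t(x1, x2, s1, s2, n1, n2, t_crit, d0=0.0):
    """Return (sp2, ci_low, ci_high, t) under the equal-variance assumption."""
    # Pooled sample variance with n1 + n2 - 2 degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    t = (diff - d0) / se
    return sp2, diff - t_crit * se, diff + t_crit * se, t


T_CRIT = 2.069  # assumed t-table value for df = 23, alpha/2 = 0.025

# Example 2: n1 = 16, n2 = 9
sp2, low, high, t = pooled_t(27, 23, 4, 3, 16, 9, T_CRIT)
print(f"sp^2 = {sp2:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"t = {t:.2f}, reject H0: {abs(t) > T_CRIT}")
```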
Inferences About the Difference Between Two Population Means for Independent Samples: unknown and Unequal Standard Deviations
- Interval Estimation for µ1 - µ2:
- (\bar{x}_1 - \bar{x}_2) ± t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
- Hypothesis testing about µ1 - µ2:
- Test statistic (t-statistic): t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
- df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} (df is always rounded down)
EXAMPLE 3
- A manufacturing company is interested in buying one of two different kinds of machines. The company tested two kinds of machines for production purposes. The first machine was run for 8 hours. It produced an average of 126 items per hour with a standard deviation of 9 items. The second machine was run for 10 hours. It produced an average of 117 items per hour with a standard deviation of 6 items. Assume that the production per hour for each machine is (approximately) normally distributed. Further assume that the standard deviations of the hourly production of the two populations are unequal.
- Using the 5% significance level, can you conclude that the mean number of items produced per hour by the first machine is higher than that of the second machine?
- Using the 5% significance level, can you conclude that the mean number of items produced per hour by the first machine exceeds that of the second machine by more than 5 items?
SOLUTION TO EXAMPLE 3
- n1 = 8, \bar{x}_1 = 126, s_1 = 9
- n2 = 10, \bar{x}_2 = 117, s_2 = 6
- \alpha = 0.05
- df = 11.71 => df ≈ 11 (rounded down), critical t = 1.796
- t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
- Question 1 (is the mean of the first machine higher?):
- H0: µ1 – µ2 ≤ 0 (µ1 ≤ µ2)
- H1: µ1 – µ2 > 0 (µ1 > µ2)
- t = \frac{(126 - 117) - 0}{\sqrt{\frac{9^2}{8} + \frac{6^2}{10}}} = 2.43 > 1.796 => reject H0, accept H1
- Question 2 (is it higher by more than 5 items per hour?):
- H0: µ1 – µ2 ≤ 5
- H1: µ1 – µ2 > 5
- t = \frac{(126 - 117) - 5}{\sqrt{\frac{9^2}{8} + \frac{6^2}{10}}} = 1.08 < 1.796 => fail to reject H0
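The unequal-variance test statistic and its degrees of freedom for Example 3 can be reproduced with the sketch below. The function name `welch_t` is illustrative (this unequal-variance procedure is commonly called Welch's t-test, a name the slides do not use), and the critical value 1.796 is taken from the example itself.

```python
from math import floor, sqrt


def welch_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = ((x1 - x2) - d0) / sqrt(v1 + v2)
    # Degrees of freedom from the formula given for unequal standard deviations
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


T_CRIT = 1.796  # critical value quoted in the example for df = 11, alpha = 0.05

t_a, df = welch_t(126, 117, 9, 6, 8, 10, d0=0)  # H1: mu1 - mu2 > 0
t_b, _ = welch_t(126, 117, 9, 6, 8, 10, d0=5)   # H1: mu1 - mu2 > 5

print(f"df = {df:.2f}, rounded down to {floor(df)}")
print(f"t (d0 = 0) = {t_a:.2f}, reject H0: {t_a > T_CRIT}")
print(f"t (d0 = 5) = {t_b:.2f}, reject H0: {t_b > T_CRIT}")
```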
Inferences About the Difference Between Two Population Means for Dependent (or Matched Paired) Samples
- Mean and standard deviation of the paired differences d for two samples:
- \bar{d} = \frac{\sum d}{n}, s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}
- We treat the differences as a random sample of size n from a population having mean µ_d.
- Sampling Distribution of \bar{d}:
- if n ≥ 30: t-distribution or normal distribution.
- if n < 30 and the population of paired differences is (approximately) normally distributed: t-distribution.
- Interval Estimation for µ_d (t-distribution: df = n - 1):
- \bar{d} ± t_{\alpha/2, df} \frac{s_d}{\sqrt{n}}
- Hypothesis Testing about µ_d:
- t = \frac{\bar{d} - d_0}{s_d / \sqrt{n}}
EXAMPLE 4
- A researcher wanted to find the effect of a special diet on systolic blood pressure. He selected a sample of seven adults and put them on this dietary plan for three months. The following table gives the systolic blood pressure of these seven adults before and after the completion of this plan.
- Using the 5% significance level, can you conclude that this special dietary plan reduces the mean systolic blood pressure? Assume that the population of paired differences is (approximately) normally distributed.
- Table:
- Before: 210, 180, 195, 220, 231, 199, 224
- After: 193, 186, 186, 223, 220, 183, 233
Summary Table
- Before, After, d = Before – After, (d – \bar{d})^2
- 210, 193, 210 – 193 = 17, (17 – 5)^2 = 144
- 180, 186, -6, 121
- 195, 186, 9, 16
- 220, 223, -3, 64
- 231, 220, 11, 36
- 199, 183, 16, 121
- 224, 233, -9, 196
- ∑d = 35, so \bar{d} = 35 / 7 = 5
- ∑(d – \bar{d})^2 = 698
SOLUTION: EXAMPLE 4
- This is a matched-pairs experiment where d = X1 – X2 (before minus after)
- H0: µ_d = 0, H1: µ_d > 0
- df = 7 – 1 = 6 and \alpha = 0.05
- critical t = 1.943
- t = \frac{5 - 0}{\sqrt{\frac{698}{7 \cdot 6}}} = 1.23
- Since 1.23 < 1.943, fail to reject H0.
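The paired-difference calculation for Example 4 can be checked with a few lines of standard-library Python. This is a sketch under the example's own setup (d = before minus after); the variable names are illustrative, and the critical value 1.943 is the one quoted in the solution.

```python
from math import sqrt
from statistics import mean, stdev

before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Paired differences d = before - after
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation (divides by n - 1)

# Test statistic for H0: mu_d = 0 against H1: mu_d > 0
t = (d_bar - 0) / (s_d / sqrt(n))
T_CRIT = 1.943  # critical value quoted in the solution for df = 6, alpha = 0.05

print(f"sum d = {sum(d)}, d_bar = {d_bar}")
print(f"t = {t:.2f}, reject H0: {t > T_CRIT}")
```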
Inferences Concerning Variance
- Hypothesis Test Concerning Two Variances
Hypothesis Test Concerning Two Variances
- In this section we describe a test of the null hypothesis σ1^2 = σ2^2, which applies to independent samples from normal populations.
- If S1^2 and S2^2 are the variances of two independent random samples of size n1 and n2, respectively, taken from two normal populations, then
- F = \frac{S1^2}{S2^2} is a random variable having the F-distribution with the parameters df1 = n1 – 1 and df2 = n2 – 1.
F - Distribution
- The parameters df1 and df2 are called the numerator and denominator degrees of freedom.
F-Distribution Table
- F_{\alpha}(df1, df2): the table value with left-tail probability 1-α (right-tail probability α).
- Lower-tail critical values follow from the reciprocal relation:
- F_{1-\alpha}(df2, df1) = \frac{1}{F_{\alpha}(df1, df2)}
A One-Tailed F-test of the Equality of Two Variances
- Null hypothesis: σ1^2 = σ2^2
- Alternative hypothesis: σ1^2 > σ2^2
- Level of significance: \alpha
- Test statistic: F = \frac{S1^2}{S2^2}
- Reject null hypothesis if: F > F_{\alpha}(n1-1, n2-1)
- Null hypothesis: σ1^2 = σ2^2
- Alternative hypothesis: σ1^2 < σ2^2
- Level of significance: \alpha
- Test statistic: F = \frac{S2^2}{S1^2}
- Reject null hypothesis if: F > F_{\alpha}(n2-1, n1-1)
A One-Tailed F-test of the Equality of Two Variances - Decision Rules
- H0: σ1^2 ≤ σ2^2, H1: σ1^2 > σ2^2
- Critical value: F_{\alpha}(n1-1, n2-1)
- Reject H0 if F > F_{\alpha}(n1-1, n2-1); otherwise do not reject H0.
A Two-Tailed F-test of the Equality of Two Variances
- Null hypothesis: σ1^2 = σ2^2
- Alternative hypothesis: σ1^2 ≠ σ2^2
- Level of significance: \alpha
- Test statistic: F = \frac{S1^2}{S2^2}
- Critical regions: F < F_{1-\alpha/2}(n1-1, n2-1) or F > F_{\alpha/2}(n1-1, n2-1)
- In practice:
- S_M^2: the larger variance, from the sample of size n_M
- S_m^2: the smaller variance, from the sample of size n_m
- Test statistic: F = \frac{S_M^2}{S_m^2}
- Reject null hypothesis if: F > F_{\alpha/2}(n_M-1, n_m-1)
A Two-Tailed F-test of the Equality of Two Variances - Decision Rules
- H0: σ1^2 = σ2^2, H1: σ1^2 ≠ σ2^2
- F = \frac{s1^2}{s2^2}, where s1^2 is the larger of the two sample variances and s2^2 is the smaller
- Reject H0 if F > F_{\alpha/2}(n1-1, n2-1); otherwise do not reject H0.
Example
- The following information was obtained from two independent random samples of size 26 and 25 selected from two normally distributed populations with S1 = 1.30 and S2 = 1.16.
- Test the null hypothesis σ1^2 = σ2^2 against the alternative hypothesis σ1^2 ≠ σ2^2 at the 0.10 level of significance.
F Test: Example Solution
- H0: σ1^2 = σ2^2, H1: σ1^2 ≠ σ2^2
- \alpha/2 = .05
- The test statistic is: F = \frac{s1^2}{s2^2} = \frac{1.30^2}{1.16^2} = 1.256
- Critical value: F_{\alpha/2}(25, 24) = 1.97
- F = 1.256 is not in the rejection region (F > 1.97), so we do not reject H0
- Conclusion: There is not sufficient evidence of a difference in variances at \alpha = .10
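The F statistic for this example can be reproduced with the sketch below, which follows the "larger variance in the numerator" convention described for the two-tailed test. The function name `f_two_tailed` is illustrative, and the critical value 1.97 is copied from the solution rather than computed.

```python
def f_two_tailed(s_a, s_b, n_a, n_b):
    """Return (F, df_num, df_den) with the larger sample variance on top."""
    var_a, var_b = s_a**2, s_b**2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


F_CRIT = 1.97  # F_{0.05}(25, 24) as quoted in the solution

# Samples of size 26 and 25 with S1 = 1.30 and S2 = 1.16
f_stat, df_num, df_den = f_two_tailed(1.30, 1.16, 26, 25)
print(f"F = {f_stat:.3f} with df = ({df_num}, {df_den})")
print(f"reject H0: {f_stat > F_CRIT}")
```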
Two-Sample Tests in EXCEL
- For paired samples (t test):
- – Data | data analysis | t-test: paired two sample for means
- For independent samples:
- Independent sample z test with variances known:
- – Data | data analysis | z-test: two sample for means
- Independent sample t test with variances unknown:
- – Data | data analysis | t-test: two sample assuming equal/unequal variances
- For variances:
- F test for two variances:
- – Data | data analysis | F-test: two sample for variances