Inferences About the Difference of Two Populations

Inferential Statistics

Inferential statistics uses sample statistics to make decisions or predictions about population parameters.
It focuses on two main areas: estimation and tests of hypothesis.

Statistical Examples

Example 1: Estimating the difference between the mean salaries of male and female executives involves two samples from two different populations.
Example 2: Estimating the difference in the weight of participants before and after a weight loss program involves two samples from one population.

Dependent versus Independent Samples

Independent Samples: Drawn from two different populations where elements of the first sample have no relation to elements of the second sample.
Dependent Samples: Elements of the two samples are related; for each value collected from one sample, there is a corresponding data value from the second sample.
- Comparison can only be made within pairs.

Inferences about the Difference Between Two Population Means

Two Sample Population Means
- Independent Samples: Comparing Group 1 vs. Group 2 independently.
 - Use a z-value when $σ_1$ and $σ_2$ are known.
 - Use a t-value when $σ_1$ and $σ_2$ are unknown, whether they are assumed equal or unequal.
- Dependent Samples: Comparing the same group before vs. after treatment.

Inferences About the Difference Between Two Population Means ( $µ_1 - µ_2$ ) for Independent Samples

Sample 1: Random sample of size $n_1$ from population 1, with mean $µ_1$ and variance $σ_1^2$ .
Sample 2: Random sample of size $n_2$ from population 2, with mean $µ_2$ and variance $σ_2^2$ .
Interval Estimation for $µ_1 - µ_2$ :
- Point estimate $±$ Margin of error
Hypothesis testing about $µ_1 - µ_2$ :
- Test statistic = $\frac{\text{Point estimate - Parameter value at } H_0}{\text{Standard deviation of point estimator (estimated or true)}}$

Inferences About the Difference Between Two Population Means for Independent Samples: $σ_1$ and $σ_2$ are known

Large samples: $n_1 ≥ 30$ and $n_2 ≥ 30$

Interval Estimation for $µ_1 - µ_2$ : $(\bar{x}_1 - \bar{x}_2) ± z_{\alpha/2} \sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}$
Hypothesis testing about $µ_1 - µ_2$ :
Test statistic (z-statistic): $z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{σ_1^2}{n_1} + \frac{σ_2^2}{n_2}}}$ , where $d_0$ is the value of $µ_1 - µ_2$ under $H_0$ .

Example 1

A survey of credit card holders revealed that Americans carried an average credit card balance of \$4600 in city A and \$4000 in city B.
These averages are based on random samples of 500 credit card holders in city A and 400 credit card holders in city B, with population standard deviations of \$1000 in city A and \$800 in city B.
1. Construct a 95% confidence interval for the difference between the mean credit card balance for all credit card holders in city A and B.
2. Test at the 5% significance level if mean credit card balances for credit card holders in city A and city B were different.
Given data:
- $n_1 = 500$ , $\bar{x}_1 = 4600$ , $σ_1 = 1000$
- $n_2 = 400$ , $\bar{x}_2 = 4000$ , $σ_2 = 800$
- $H_0: µ_1 - µ_2 = 0$ ( $µ_1 = µ_2$ )
- $H_1: µ_1 - µ_2 ≠ 0$ ( $µ_1 ≠ µ_2$ )
- $\alpha = 0.05$ , $z_{\alpha/2} = 1.96$
Solution to Example 1:
- Standard deviation of the point estimator: $\sqrt{\frac{1000^2}{500} + \frac{800^2}{400}} = 60$
- 95% confidence interval: $600 ± 1.96(60) = (482.4, 717.6)$
- Test statistic: $z = \frac{600 - 0}{60} = 10 > 1.96$ => reject $H_0$
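The arithmetic for Example 1 can be checked with a short Python sketch. The function name `two_sample_z` and its parameters are illustrative assumptions, not part of the original notes; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, d0=0.0, alpha=0.05):
    """Hypothetical helper: two-sided z interval and test for mu1 - mu2
    when both population standard deviations are known."""
    # Standard deviation of the point estimator x1 - x2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Two-sided critical value z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = x1 - x2
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = (diff - d0) / se
    return ci, z, z_crit

# Data from Example 1 (city A vs. city B)
ci, z, z_crit = two_sample_z(4600, 4000, 1000, 800, 500, 400)
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```

The interval and the test agree, as they must for a two-sided test at the same level: 0 lies outside the interval exactly when $|z|$ exceeds the critical value.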

Inferences About the Difference Between Two Population Means for Independent Samples: $σ_1$ and $σ_2$ are Unknown

The t-distribution is used to make inferences about $µ_1 - µ_2$ when the following assumptions hold true:
1. The two populations from which the two samples are drawn are (approximately) normally distributed.
2. The two population standard deviations are unknown. Two cases are treated separately:
 - They are equal ( $σ_1 = σ_2$ ).
 - They are not equal ( $σ_1 ≠ σ_2$ ).

Inferences About the Difference Between Two Population Means for Independent Samples: Unknown but Equal Standard Deviations

The Pooled Sample Variance for Two Samples:
- $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
Interval Estimation for $µ_1 - µ_2$ :
- $(\bar{x}_1 - \bar{x}_2) ± t_{\alpha/2, df} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
- $df = n_1 + n_2 - 2$
Hypothesis testing about $µ_1 - µ_2$ :
- Test statistic (t-statistic): $t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Example 2

The following information was obtained from two independent samples selected from two normally distributed populations with unknown but equal standard deviations.
- $n_1 = 16$ , $\bar{x}_1 = 27$ , $s_1 = 4$
- $n_2 = 9$ , $\bar{x}_2 = 23$ , $s_2 = 3$
1. Construct a 95% confidence interval for $µ_1 - µ_2$ .
2. Test at the 5% significance level if $µ_1$ is not equal to $µ_2$ .
 - $H_0: µ_1 - µ_2 = 0$ ( $µ_1 = µ_2$ )
 - $H_1: µ_1 - µ_2 ≠ 0$ ( $µ_1 ≠ µ_2$ )
 - $\alpha = 0.05$
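
SOLUTION TO EXAMPLE 2

- Pooled variance: $s_p^2 = \frac{(16-1)4^2 + (9-1)3^2}{16 + 9 - 2} = \frac{312}{23} ≈ 13.57$ , so $s_p ≈ 3.68$
- $df = 16 + 9 - 2 = 23$ , $t_{0.025, 23} = 2.069$
- 95% confidence interval: $4 ± 2.069(3.68)\sqrt{\frac{1}{16} + \frac{1}{9}} = 4 ± 3.18 = (0.82, 7.18)$
- Test statistic: $t = \frac{4 - 0}{3.68\sqrt{\frac{1}{16} + \frac{1}{9}}} ≈ 2.61$
- Non-rejection region: $-2.069 ≤ t ≤ 2.069$ . Since $2.61 > 2.069$ => reject $H_0$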
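The pooled-variance calculation for Example 2 can be sketched in Python as follows. The helper `pooled_t` is a hypothetical name introduced here, and the critical value 2.069 is assumed to be read from a t-table for 23 degrees of freedom rather than computed.

```python
from math import sqrt

def pooled_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Hypothetical helper: pooled-variance t statistic for mu1 - mu2,
    assuming unknown but equal population standard deviations."""
    df = n1 + n2 - 2
    # Pooled sample variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Estimated standard deviation of x1 - x2
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    t = ((x1 - x2) - d0) / se
    return sp2, se, t, df

# Data from Example 2
sp2, se, t, df = pooled_t(27, 23, 4, 3, 16, 9)

t_crit = 2.069  # assumed table value t_{0.025, 23}
margin = t_crit * se
ci = (27 - 23 - margin, 27 - 23 + margin)

print(f"sp^2 = {sp2:.2f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t:.2f}")
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```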

Inferences About the Difference Between Two Population Means for Independent Samples: Unknown and Unequal Standard Deviations

Interval Estimation for $µ_1 - µ_2$ :
- $(\bar{x}_1 - \bar{x}_2) ± t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Hypothesis testing about $µ_1 - µ_2$ :
- Test statistic (t-statistic): $t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Degrees of freedom for both procedures:
- $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1-1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2-1}}$ (df is always rounded down)

EXAMPLE 3

A manufacturing company is interested in buying one of two different kinds of machines. The company tested two kinds of machines for production purposes. The first machine was run for 8 hours. It produced an average of 126 items per hour with a standard deviation of 9 items. The second machine was run for 10 hours. It produced an average of 117 items per hour with a standard deviation of 6 items. Assume that the production per hour for each machine is (approximately) normally distributed. Further assume that the standard deviations of the hourly production of the two populations are unequal.
1. Using the 5% significance level, can you conclude that the mean number of items produced per hour by the first machine is higher than that of the second machine?
2. Using the 5% significance level, can you conclude that the mean number of items produced per hour by the first machine exceeds that of the second machine by more than 5 items per hour?

SOLUTION TO EXAMPLE 3

$n_1 = 8$ , $\bar{x}_1 = 126$ , $s_1 = 9$
$n_2 = 10$ , $\bar{x}_2 = 117$ , $s_2 = 6$
$\alpha = 0.05$
Degrees of freedom: $df = \frac{\left(\frac{9^2}{8} + \frac{6^2}{10}\right)^2}{\frac{(9^2/8)^2}{7} + \frac{(6^2/10)^2}{9}} = 11.71$ => $df ≈ 11$ , so the critical value is $t_{0.05, 11} = 1.796$
1. $H_0: µ_1 - µ_2 ≤ 0$ ( $µ_1 ≤ µ_2$ ), $H_1: µ_1 - µ_2 > 0$ ( $µ_1 > µ_2$ )
 - $t = \frac{126 - 117 - 0}{\sqrt{\frac{9^2}{8} + \frac{6^2}{10}}} = 2.43 > 1.796$ => reject $H_0$ , accept $H_1$
2. $H_0: µ_1 - µ_2 ≤ 5$ , $H_1: µ_1 - µ_2 > 5$
 - $t = \frac{126 - 117 - 5}{\sqrt{\frac{9^2}{8} + \frac{6^2}{10}}} = 1.08 < 1.796$ => fail to reject $H_0$
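A minimal Python sketch of the unequal-variance procedure, applied to the Example 3 data, is given below. The name `unequal_var_t` is an assumption made for illustration, and the critical value 1.796 is taken from a t-table for 11 degrees of freedom rather than computed.

```python
from math import sqrt, floor

def unequal_var_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """Hypothetical helper: t statistic and degrees of freedom for mu1 - mu2
    when the population standard deviations are unknown and unequal."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    t = ((x1 - x2) - d0) / se
    # Approximate degrees of freedom (not rounded here)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t_crit = 1.796  # assumed table value t_{0.05, 11}, right-tailed test

# Part 1: H1 is mu1 - mu2 > 0
t1, df = unequal_var_t(126, 117, 9, 6, 8, 10, d0=0)
# Part 2: H1 is mu1 - mu2 > 5
t2, _ = unequal_var_t(126, 117, 9, 6, 8, 10, d0=5)

print(f"df = {df:.2f}, rounded down to {floor(df)}")
print(f"part 1: t = {t1:.2f}", "reject H0" if t1 > t_crit else "fail to reject H0")
print(f"part 2: t = {t2:.2f}", "reject H0" if t2 > t_crit else "fail to reject H0")
```

Only `d0` changes between the two parts; the standard error and the degrees of freedom depend on the samples alone.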

Inferences About the Difference Between Two Population Means for Dependent (or Matched Paired) Samples

Mean and Standard Deviation of the paired differences for two samples:
- We treat the differences $d$ as a random sample of size $n$ from a population having mean $µ_d$ .
- $\bar{d} = \frac{\sum d}{n}$ , $s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$
Sampling Distribution of $\bar{d}$ :
- if $n ≥ 30$ : t-distribution or normal distribution.
- if $n < 30$ and the population of paired differences is (approximately) normally distributed: t-distribution.
Interval Estimation for $µ_d$ : (t-distribution: $df = n - 1$ )
- $\bar{d} ± t_{\alpha/2, df} \frac{s_d}{\sqrt{n}}$
Hypothesis Testing about $µ_d$ :
- $t = \frac{\bar{d} - d_0}{s_d / \sqrt{n}}$

EXAMPLE 4

A researcher wanted to find the effect of a special diet on systolic blood pressure. He selected a sample of seven adults and put them on this dietary plan for three months. The following table gives the systolic blood pressure of these seven adults before and after the completion of this plan.
Using the 5% significance level, can you conclude that the mean systolic blood pressure is reduced as a result of following this special dietary plan? Assume that the population of paired differences is (approximately) normally distributed.
Table:
- Before: 210, 180, 195, 220, 231, 199, 224
- After: 193, 186, 186, 223, 220, 183, 233

Summary Table

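| Before | After | $d$ = Before - After | $(d - \bar{d})^2$ |
|---|---|---|---|
| 210 | 193 | 210 - 193 = 17 | $(17 - 5)^2 = 144$ |
| 180 | 186 | -6 | 121 |
| 195 | 186 | 9 | 16 |
| 220 | 223 | -3 | 64 |
| 231 | 220 | 11 | 36 |
| 199 | 183 | 16 | 121 |
| 224 | 233 | -9 | 196 |

$\sum d = 35$ , so $\bar{d} = 35/7 = 5$
$\sum (d - \bar{d})^2 = 698$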

SOLUTION: EXAMPLE 4

This is a matched-pairs experiment where $d = X_1 - X_2$ (before minus after).
$H_0: µ_d = 0$ , $H_1: µ_d > 0$
$df = 7 - 1 = 6$ and $\alpha = 0.05$ , so critical $t = 1.943$
$t = \frac{5 - 0}{\sqrt{\frac{698}{7 \cdot 6}}} = 1.23$
Since $1.23 < 1.943$ , fail to reject $H_0$ .
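The paired-difference computation for Example 4 can be reproduced with the following Python sketch. Variable names are illustrative, and the critical value 1.943 is assumed to come from a t-table for 6 degrees of freedom.

```python
from math import sqrt

# Systolic blood pressure of the seven adults in Example 4
before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Paired differences d = before - after
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n

# Sum of squared deviations and sample standard deviation of d
ss = sum((x - d_bar) ** 2 for x in d)
s_d = sqrt(ss / (n - 1))

# Test statistic for H0: mu_d = 0 against H1: mu_d > 0
t = (d_bar - 0) / (s_d / sqrt(n))
t_crit = 1.943  # assumed table value t_{0.05, 6}

print(f"sum d = {sum(d)}, d_bar = {d_bar}, sum of squares = {ss}")
print(f"t = {t:.2f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```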

Inferences Concerning Variance

Hypothesis Test Concerning Two Variances

In this section we describe a test of the null hypothesis $σ_1^2 = σ_2^2$ , which applies to independent samples from normal populations.
If $S_1^2$ and $S_2^2$ are the variances of two independent random samples of sizes $n_1$ and $n_2$ , respectively, taken from two normal populations having the same variance, then
$F = \frac{S_1^2}{S_2^2}$ is a random variable having the F-distribution with the parameters $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$ .

F - Distribution

The parameters $df_1$ and $df_2$ are called the numerator and denominator degrees of freedom.

F-Distribution Table

F value with left-tail probability $1-\alpha$ (right-tail probability $\alpha$ ):
- $F_{\alpha}(df_1, df_2)$
Lower-tail critical values follow from the reciprocal rule:
- $F_{1-\alpha}(df_2, df_1) = \frac{1}{F_{\alpha}(df_1, df_2)}$

A One-Tailed F-test of the Equality of Two Variances

Null hypothesis: $σ_1^2 = σ_2^2$
Alternative hypothesis: $σ_1^2 > σ_2^2$
Level of significance: $\alpha$
Test statistic: $F = \frac{S_1^2}{S_2^2}$
Reject null hypothesis if: $F > F_{\alpha}(n_1-1, n_2-1)$

Null hypothesis: $σ_1^2 = σ_2^2$
Alternative hypothesis: $σ_1^2 < σ_2^2$
Level of significance: $\alpha$
Test statistic: $F = \frac{S_2^2}{S_1^2}$
Reject null hypothesis if: $F > F_{\alpha}(n_2-1, n_1-1)$

A One-Tailed F-test of the Equality of Two Variances - Decision Rules

$H_0: σ_1^2 ≤ σ_2^2$ , $H_1: σ_1^2 > σ_2^2$
Critical value: $F_{\alpha}(n_1-1, n_2-1)$
Reject $H_0$ if $F > F_{\alpha}(n_1-1, n_2-1)$ ; otherwise do not reject $H_0$ .

A Two-Tailed F-test of the Equality of Two Variances

Null hypothesis: $σ_1^2 = σ_2^2$
Alternative hypothesis: $σ_1^2 ≠ σ_2^2$
Level of significance: $\alpha$
Test statistic: $F = \frac{S_1^2}{S_2^2}$
Critical regions: $F < F_{1-\alpha/2}(n_1-1, n_2-1)$ or $F > F_{\alpha/2}(n_1-1, n_2-1)$
In Practice:
- $S_M^2$ : the larger variance, from the sample of size $n_M$
- $S_m^2$ : the smaller variance, from the sample of size $n_m$
Test statistic: $F = \frac{S_M^2}{S_m^2}$
Reject null hypothesis if: $F > F_{\alpha/2}(n_M-1, n_m-1)$

A Two-Tailed F-test of the Equality of Two Variances - Decision Rules

$H_0: σ_1^2 = σ_2^2$ , $H_1: σ_1^2 ≠ σ_2^2$
$F = \frac{s_1^2}{s_2^2}$ , where $s_1^2$ is the larger of the two sample variances and $s_2^2$ is the smaller of the two sample variances
Reject $H_0$ if $F > F_{\alpha/2}(n_1-1, n_2-1)$ ; otherwise do not reject $H_0$ .

Example

The following information was obtained from two independent random samples of size 26 and 25 selected from two normally distributed populations with $s_1 = 1.30$ and $s_2 = 1.16$ .
Test the null hypothesis $σ_1^2 = σ_2^2$ against the alternative hypothesis $σ_1^2 ≠ σ_2^2$ at the 0.10 level of significance.

F Test: Example Solution

$H_0: σ_1^2 = σ_2^2$ , $H_1: σ_1^2 ≠ σ_2^2$
$\alpha = .10$ , so $\alpha/2 = .05$
The test statistic is: $F = \frac{s_1^2}{s_2^2} = \frac{1.30^2}{1.16^2} = 1.256$
Critical value: $F_{\alpha/2}(25, 24) = 1.97$
$F = 1.256$ is not in the rejection region ( $F > 1.97$ ), so we do not reject $H_0$
Conclusion: There is not sufficient evidence of a difference in variances at $\alpha = .10$
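The "larger variance over smaller variance" rule used in this example can be sketched in Python. The function `f_statistic` is a hypothetical helper, and the critical value 1.97 is the table value quoted in the solution, not one computed by the code.

```python
def f_statistic(s_a, n_a, s_b, n_b):
    """Hypothetical helper: put the larger sample variance in the numerator
    and return (F, numerator df, denominator df)."""
    var_a, var_b = s_a**2, s_b**2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Data from the example: sample sizes 26 and 25, s1 = 1.30, s2 = 1.16
F, df_num, df_den = f_statistic(1.30, 26, 1.16, 25)

f_crit = 1.97  # assumed table value F_{0.05}(25, 24)

print(f"F = {F:.3f} with df = ({df_num}, {df_den})")
print("reject H0" if F > f_crit else "do not reject H0")
```

Because the larger variance always goes in the numerator, F is at least 1 and only the upper critical value $F_{\alpha/2}$ needs to be looked up.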

Two-Sample Tests in EXCEL

For paired samples (t test):
- Data | data analysis | t-test: paired two sample for means
For independent samples:
- Independent sample z test with variances known:
  - Data | data analysis | z-test: two sample for means
- Independent sample t test with variances unknown:
  - Data | data analysis | t-test: two sample assuming equal/unequal variances
For variances:
- F test for two variances:
  - Data | data analysis | F-test: two sample for variances
