Topic 7
The F distribution
A continuous distribution
Cannot be negative
Asymmetric - POSITIVELY skewed
A family of distributions with two degrees of freedom parameters, one for the numerator and one for the denominator
Asymptotic - the curve approaches the x-axis but doesn’t reach it
Equal variances test - two assumptions
Populations are normally distributed
The level of measurement is interval or ratio, so that the variances can be calculated
Typically run as a two-tailed test, as one-tailed tests are not very useful here
Equal variances test - steps
Step 1: set the hypotheses:
H_0:\sigma_1^2=\sigma_2^2
H_1:\sigma_1^2\ne\sigma_2^2
Step 2: Select the significance level
Step 3: Select the appropriate test statistic (F statistic) using:
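F=\frac{s_1^2}{s_2^2} - where s_1^2 and s_2^2 are the sample variances and the larger sample variance goes in the numerator - meaning only large values of the F statistic reject the null (for a two-tailed test, look up Fcrit at \alpha/2, with n_1-1 and n_2-1 degrees of freedom)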
Step 4: Formulate the decision rule - we reject the null hypothesis if the test statistic is in the critical region (F > Fcrit)
Step 5: Make your decision - compute F, find the critical value from the tables, and state the conclusion in context
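The five steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names and the data are made up for the example, and the critical value is assumed to be looked up in the F tables and passed in by hand.

```python
from statistics import variance


def variance_ratio_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator).

    The larger sample variance is placed in the numerator, so F >= 1.
    """
    var_a = variance(sample_a)  # sample variance (divides by n - 1)
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


def reject_equal_variances(f_stat, f_crit):
    """Decision rule: reject H0 only when F falls in the upper critical region."""
    return f_stat > f_crit


# Made-up samples, for illustration only.
sample_1 = [10, 14, 18, 22, 26]
sample_2 = [15, 17, 18, 19, 21]
f_stat, df_num, df_den = variance_ratio_f(sample_1, sample_2)
print(f_stat, df_num, df_den)
```

To finish the test, look up Fcrit for (df_num, df_den) at the chosen significance level and call `reject_equal_variances(f_stat, f_crit)`.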
ANOVA - analysis of variance
Enables us to test the equality of several means simultaneously
Previously, z/t tests only allowed us to compare two means at a time
Assumes the sampled populations follow a normal distribution and have equal standard deviations, and that the samples are randomly selected and independent
ANOVA - steps
Step 1: The null hypothesis is that the population means are all the same, and the alternate hypothesis is that at least one of the means is different
H0: \mu_1=\mu_2=\mu_3=\mu_4
H1: the means are not all equal
Step 2: Choose the significance level, commonly \alpha = 0.05 or \alpha = 0.01
Step 3: Use the F distribution (table) to find Fcritical value
Step 4: The decision rule is to reject the null hypothesis if the F statistic is in the critical region of the F distribution (if the computed F value is LARGER than Fcritical value we can then REJECT null hypothesis)
The degrees of freedom in the numerator = the number of populations sampled (k) minus 1
The degrees of freedom in the denominator are the total number of observations (n) minus k
Step 5: Compute the test statistic and make a decision using
F=\frac{\frac{SST}{\left(k-1\right)}}{\frac{SSE}{\left(n-k\right)}} - SST: treatment sum of squares, and SSE: error sum of squares
Calculate SSTotal (sum of squares of each individual x value minus the overall mean) and SSE (sum of squares of each individual x value minus the mean of the sample that value is in) - then you can get SST by doing SSTotal - SSE (easier)
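A short Python sketch of the Step 5 calculation, following the shortcut SST = SSTotal - SSE. The function name `one_way_anova` and the data are made up for illustration; the critical value still has to come from the F table with (k - 1, n - k) degrees of freedom.

```python
def one_way_anova(samples):
    """samples: list of lists, one inner list per treatment."""
    all_values = [x for sample in samples for x in sample]
    n = len(all_values)   # total number of observations
    k = len(samples)      # number of treatments
    grand_mean = sum(all_values) / n

    # SSTotal: squared deviations of every value from the overall mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # SSE: squared deviations of every value from its own sample mean.
    sse = 0.0
    for sample in samples:
        sample_mean = sum(sample) / len(sample)
        sse += sum((x - sample_mean) ** 2 for x in sample)

    sst = ss_total - sse
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SSTotal": ss_total, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse,
            "df": (k - 1, n - k)}


# Made-up data: three treatments with three observations each.
result = one_way_anova([[4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(result)
```

Reject the null hypothesis if `result["F"]` is larger than the table value for `result["df"]`.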
Analysis of variance - sum of squares
In ANOVA, we need to calculate 3 sum of squares terms:
Total sum of squares: SSTotal = SST + SSE - sum of squares of all deviations from the overall average: SSTotal=\Sigma\left(x_{i}-\overline{x}_{G}\right)^2
Treatment sum of squares: SST - sum of squares of the deviations of each factory (treatment) mean from the overall average, with each squared bracket multiplied by that sample's size n_j: SST=\Sigma n_{j}\left(\overline{x}_{j}-\overline{x}_{G}\right)^2
Random/Error sum of squares: SSE - sum of squares of the deviations within each factory from that factory's average, i.e. each observation minus the mean of its own sample: SSE=\Sigma\left(x_{i}-\overline{x}_{j}\right)^2, where \overline{x}_{j} is the mean of the sample containing x_i
Mean squared error formula
MSE = SSE/(n-k)
where SSE is the random/error sum of squares (the sum of squares of each observation minus the mean of its own sample), n is the total number of observations, and k is the number of treatments
Confidence interval for finding the DIFFERENCE in treatment means
We may know there is a difference but we don’t know which ‘treatment’/sample is different
However ONLY do this test if the null hypothesis is REJECTED
Use formula:
\left(\overline{x}_1-\overline{x}_2\right)\pm t_{\alpha,n-k}\sqrt{MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}
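\overline{x}_1 and \overline{x}_2 = means of the first and second samples; n_1 and n_2 = their sample sizes
t_{\alpha,n-k} - critical value from the t-distribution tables (two-tailed, for the required alpha) with n-k degrees of freedom - where n is the total number of observations across ALL samples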
Then if the confidence interval INCLUDES 0, the difference between means is NOT statistically significant
Therefore if both ends of the confidence interval have the same sign e.g. both negative then we can conclude these treatment means differ
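The interval and the "does it include 0?" check can be sketched in Python. The function names and input numbers below are made up for illustration; the t critical value is assumed to be read from the tables (2.447 is the two-tailed 95% value for 6 degrees of freedom).

```python
from math import sqrt


def treatment_mean_difference_ci(mean_1, mean_2, n_1, n_2, mse, t_crit):
    """Confidence interval for the difference between two treatment means.

    mse comes from the ANOVA table; t_crit is looked up with n - k degrees
    of freedom, where n is the total number of observations.
    """
    margin = t_crit * sqrt(mse * (1 / n_1 + 1 / n_2))
    diff = mean_1 - mean_2
    return diff - margin, diff + margin


def means_differ(interval):
    """True when both ends have the same sign, i.e. 0 is outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# Made-up inputs: two treatment means, 3 observations each, MSE = 1.
interval = treatment_mean_difference_ci(5, 11, 3, 3, mse=1.0, t_crit=2.447)
print(interval, means_differ(interval))
```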
Blocking variable
A second treatment variable that when included in the ANOVA analysis will have the effect of reducing the SSE term
Formula:
SSB=k\!\sum_{i=1}^{b}\left(\overline{x}_{i}-\overline{x}_{G}\right)^2 - k = number of treatments, b = number of blocks, xG = overall/grand mean
e.g. A test to compare the mean travel time on 4 different routes from point A to B - we can assume differences in travel time are either due to the routes or random
But if we then get 5 different drivers to drive each route - the drivers would be the blocking variable
Then find SSE = SSTotal - SST - SSB
This SSE should be lower than the SSE before adding the blocking variable
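A rough Python sketch of how removing the block variation shrinks SSE. It assumes one observation per block/treatment combination, with blocks (e.g. drivers) as rows and treatments (e.g. routes) as columns; the function name and the numbers are made up for illustration.

```python
def blocked_sums_of_squares(table):
    """table[i][j] = observation for block i (row) and treatment j (column)."""
    b = len(table)      # number of blocks
    k = len(table[0])   # number of treatments
    values = [x for row in table for x in row]
    grand_mean = sum(values) / (b * k)

    ss_total = sum((x - grand_mean) ** 2 for x in values)

    treatment_means = [sum(row[j] for row in table) / b for j in range(k)]
    block_means = [sum(row) / k for row in table]

    # Each treatment mean is based on b observations, each block mean on k.
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)

    return {"SSTotal": ss_total, "SST": sst, "SSB": ssb,
            "SSE_without_blocks": ss_total - sst,
            "SSE_with_blocks": ss_total - sst - ssb}


# Made-up travel times: 3 drivers (rows) by 3 routes (columns).
times = [[10, 13, 13],
         [12, 12, 15],
         [14, 17, 20]]
print(blocked_sums_of_squares(times))
```

In this made-up table most of the leftover variation is between drivers, so subtracting SSB leaves a much smaller error term.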
Two-factor experiment
Conducting hypothesis tests for the difference in TREATMENT means and in BLOCK means
First conduct the test for treatment means:
H0: the treatment means are all equal (write with \mu)
H1: the treatment means are not all equal
use k-1 for degrees of freedom in numerator
(b-1)(k-1) for denominator
Find critical value, and F statistic (using F = MST/MSE)
MST = SST/(k-1) AND MSE = SSE/[(k-1)(b-1)]
Reject H0 if F > Fcritical value
Then conduct the test for block means (H0: the block means are all equal, H1: the block means are not all equal)
use b-1 for degrees of freedom in numerator
and (b-1)(k-1) for denominator
Find new critical value and F statistic (using F = MSB/MSE)
MSB = SSB/(b-1)
Reject H0 if F > Fcritical value - then state the conclusion
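The two F statistics can be computed from the sums of squares as in the sketch below. The function name and the input values are made up for illustration; both critical values still come from the F tables using the degrees of freedom returned.

```python
def two_way_f_statistics(sst, ssb, sse, k, b):
    """F statistics for treatments and blocks in a two-factor ANOVA
    without interaction (k treatments, b blocks)."""
    df_error = (k - 1) * (b - 1)
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / df_error
    return {
        "F_treatment": mst / mse, "df_treatment": (k - 1, df_error),
        "F_block": msb / mse, "df_block": (b - 1, df_error),
    }


# Made-up sums of squares for 3 treatments and 3 blocks.
stats = two_way_f_statistics(sst=24.0, ssb=42.0, sse=6.0, k=3, b=3)
print(stats)
```

Compare `F_treatment` and `F_block` with their own critical values; each test is decided separately.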
Interaction effect
The effect of one factor on a response variable differs depending on the value of another factor
E.g. differences in mean travel time may depend on the COMBINED effect of driver and route
If means are plotted with NO interaction effect the lines would be parallel
Testing for interaction effect
First find and plot the means for each driver/route (the two factors) combination
If lines are NOT parallel there are likely interaction effects
Need a hypothesis test to see if the observed interactions are SIGNIFICANT
H0: There is no interaction between drivers and routes
H1: There is interaction between drivers and routes
Use the ANOVA with interaction table (e.g. from Excel) to find the p-value - reject the null hypothesis if the p-value is below the significance level
If interaction is present, test for differences in the means of one factor using a separate one-way ANOVA for each level of the other factor
e.g. conduct a one-way ANOVA of the driver means for each route
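The first step, finding the mean for each driver/route combination and checking whether the lines would be parallel, can be sketched in Python. The function names, driver and route labels, and travel times are all made up for illustration, and this is only a visual-style check: the hypothesis test above is still needed to decide whether any interaction is significant.

```python
from statistics import mean


def cell_means(observations):
    """observations maps (driver, route) -> list of repeated travel times."""
    return {cell: mean(times) for cell, times in observations.items()}


def route_gap_by_driver(means, route_a, route_b):
    """Difference in mean travel time between two routes, for each driver.

    Equal gaps for every driver correspond to parallel lines on the plot
    (no interaction); unequal gaps suggest a possible interaction.
    """
    drivers = sorted({driver for driver, _ in means})
    return {d: means[(d, route_a)] - means[(d, route_b)] for d in drivers}


# Made-up data: two drivers, two routes, two trips per combination.
observations = {
    ("driver_1", "route_A"): [18, 20],
    ("driver_1", "route_B"): [22, 24],
    ("driver_2", "route_A"): [19, 21],
    ("driver_2", "route_B"): [20, 20],
}
gaps = route_gap_by_driver(cell_means(observations), "route_A", "route_B")
print(gaps)
```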