Topic 6
Hypothesis
Statement about the value of a population parameter that is subject to verification
Hypothesis testing
A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement
One and two sample testing
One sample: a single sample is tested against a population value
Two sample: two samples are tested against each other
Three assumptions needed to conduct one sample hypothesis testing
Random sampling is employed
Level of measurement is interval or ratio - in order to calculate the mean
Sampling distribution is normal - we can be sure of this if the sample size is large enough - as per the central limit theorem
Null and alternate hypothesis
Null: Statement about the value of a population parameter developed for the purpose of testing numerical evidence - equalities are always part of the null (=, \le, \ge) - NEVER say we "accept" the null hypothesis; say we "fail to reject the null hypothesis"
Alternate: Statement that is ACCEPTED if the sample data provides sufficient evidence that H0 is false - inequalities are always part of the alternate (\ne,<,> )
Always assume H0 is true while carrying out the test; it is rejected only if the sample evidence is strong enough
Null hypothesis is NOT the same as a research hypothesis
Steps for one sample hypothesis test
State the null (H0) and alternate (H1) hypotheses
Select a level of significance (\alpha): the probability of rejecting the null hypothesis when it is true
Typically choose \alpha=0.05 (same as 95% confidence level for CI)
Select the test statistic - A value determined from sample info used to decide whether to reject the null hypothesis
If the population standard deviation is known we use standard normal distribution (z)
If the population standard deviation is unknown but the sample is large, the sample standard deviation s is substituted for the population s.d. and we still use the standard normal distribution (z)
If the population standard deviation is unknown and the sample is small, we use the t-distribution - look up the critical t value in the t table for the chosen significance level (e.g. 0.05) with n-1 degrees of freedom, and use it to decide whether to reject H0
Formulate the decision rule - determine the rejection region of the sampling distribution of the test statistic, i.e. find the cut-off values that leave 5% of the area under the distribution in one tail (one-tailed test), or 2.5% in each tail (two-tailed test). If the test statistic falls in the rejection region we reject H0
Make a decision and state the conclusion in context, e.g. "on average there is/isn't a statistically significant difference…", then interpret the results
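The steps above can be sketched for the case where the population standard deviation is known (z test). This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `one_sample_z_test` and the example numbers are hypothetical, and the one-tailed option assumes an upper-tailed alternative (H1: \mu > \mu_0).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Hypothetical helper: one-sample z test with known population s.d.

    Returns (z, critical_value, reject_h0). One-tailed means upper-tailed.
    """
    # Step 3: test statistic, computed assuming H0 (mu = mu0) is true
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: decision rule, the cut-off that leaves alpha in the tail(s)
    tail_area = alpha / 2 if two_tailed else alpha
    critical = NormalDist().inv_cdf(1 - tail_area)
    # Step 5: decision
    reject = abs(z) > critical if two_tailed else z > critical
    return z, critical, reject


# Hypothetical example: H0: mu = 100, sigma = 10, n = 36, sample mean = 103
z, crit, reject = one_sample_z_test(103, 100, 10, 36)
print(f"z = {z:.2f}, critical = ±{crit:.2f}, reject H0: {reject}")
```

With these made-up numbers z = 1.80, which lies inside ±1.96, so we fail to reject H0 in a two-tailed test; the same z would exceed the one-tailed cut-off of about 1.645.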
Critical value
The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected
p-value and how to find it
The probability of observing a sample value as extreme as, or more extreme than the value observed, given that the null hypothesis is true
Use the distribution (z/t) in reverse - look up the z/t statistic we calculated in the table to find the associated tail probability; p-values are usually calculated for a two-tailed hypothesis, in which case the p-value is double the one-tail area
The lower the p-value, the more confident we can be in rejecting the null hypothesis, e.g. a p-value below 0.001 is extremely strong evidence that H0 isn't true, whereas a p-value below 0.10 is only some evidence that H0 isn't true
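Finding a p-value "in reverse" from a z statistic can be sketched with the standard normal CDF instead of a printed table. A minimal sketch; `z_p_value` is a hypothetical name, and the one-tailed branch assumes an upper-tailed alternative.

```python
from statistics import NormalDist


def z_p_value(z, two_tailed=True):
    """Hypothetical helper: p-value for an observed z, assuming H0 is true."""
    std_normal = NormalDist()  # mean 0, s.d. 1
    if two_tailed:
        # area beyond |z| in both tails = double the one-tail area
        return 2 * (1 - std_normal.cdf(abs(z)))
    # upper tail only
    return 1 - std_normal.cdf(z)


for z in (1.0, 1.96, 3.29):
    print(f"z = {z:.2f}: two-tailed p = {z_p_value(z):.4f}")
```

z = 1.96 gives a two-tailed p of about 0.05 (the usual cut-off), while z of about 3.29 gives p of about 0.001, i.e. extremely strong evidence against H0.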
Two sample hypothesis test
Aims to find if there is a significant difference between two sample means
Same statistical principles as in one sample testing, but instead of comparing against a population value we need data from each sample
Three assumptions needed to conduct two sample hypothesis testing
Two independent random samples are used
Level of measurement is interval/ratio - in order to calculate the mean
Sampling distribution is normal - we can be sure of this if the sample sizes are large enough (as per the CLT)
Steps for two sample hypothesis test
State null hypothesis (H0:\mu_1=\mu_2) AND alternate hypothesis (H1:\mu_1\ne\mu_2)
Choose the level of significance - typically \alpha = 0.05
Choose the test statistic - use this formula to find z:
z=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} - where we assume H0 to be true for the test
Formulate the decision rule - for a two-tailed test with \alpha = 0.05 the critical values are ±1.96
Make a decision, e.g. if z > 1.96 or z < -1.96 we reject H0
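The two-sample z formula and decision rule above can be sketched as follows, assuming independent samples and known population standard deviations. The function name and the sample figures are hypothetical.

```python
from math import sqrt


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Hypothetical helper: z for the difference of two independent means,
    computed assuming H0: mu1 = mu2 with known population s.d.s."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical data: means 50 vs 48, sigmas 4 and 5, sample sizes 40 and 50
z = two_sample_z(50, 48, 4, 5, 40, 50)
critical = 1.96  # two-tailed, alpha = 0.05
print(f"z = {z:.2f}, reject H0: {abs(z) > critical}")
```

Here z is about 2.11, which is beyond +1.96, so H0 would be rejected at the 0.05 level.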
Two sample hypothesis test with INDEPENDENT samples and unknown but EQUAL population standard deviations
Like the one sample case we substitute the sample standard deviation for the population standard deviation
Pool the two sample variances using the formula:
s_{p}^2=\frac{\left(n_1-1\right)s_1^2+\left(n_2-1\right)s_2^2}{n_1+n_2-2}
Then use t-statistic as follows:
t=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{s_{p}^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}
Compare t with the critical value from the t table with n_1+n_2-2 degrees of freedom; if t falls in the critical region we reject H0
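The pooled-variance test above can be sketched as below. A minimal sketch: `pooled_t` and the summary statistics are hypothetical, and the critical value 2.086 is the two-tailed t table value for \alpha = 0.05 with 20 degrees of freedom.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Hypothetical helper: pooled-variance t statistic for independent samples
    with equal (but unknown) population s.d.s. Returns (t, sp2, df)."""
    # pooled variance: weighted by each sample's degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, sp2, df


# Hypothetical summary data
t, sp2, df = pooled_t(20, 18, 2, 3, 10, 12)
t_critical = 2.086  # two-tailed, alpha = 0.05, df = 20 (from a t table)
print(f"s_p^2 = {sp2:.2f}, t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_critical}")
```

With these numbers s_p^2 = 6.75 and t is about 1.80, which is inside ±2.086, so we fail to reject H0.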
Two sample hypothesis test with INDEPENDENT samples and unknown, UNEQUAL population standard deviations
Use t-statistic formula:
t=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}
But we adjust the degrees of freedom downward (reflecting the extra uncertainty) using this formula:
df=\frac{\left\lbrack\left(\frac{s_1^2}{n_1}\right)+\left(\frac{s_2^2}{n_2}\right)\right\rbrack^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1-1}+\frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2-1}}
Round the d.f. down to be on the safe side, then find the critical value for the chosen significance level in the t table
If the absolute t value is greater than the critical value, we reject H0
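The unequal-variance case can be sketched as below; this adjusted-d.f. approach is commonly known as Welch's t test. The function name and data are hypothetical, and 2.093 is the two-tailed t table value for \alpha = 0.05 with 19 degrees of freedom.

```python
from math import floor, sqrt


def welch_t(mean1, mean2, s1, s2, n1, n2):
    """Hypothetical helper: t statistic and adjusted degrees of freedom for
    independent samples with unequal, unknown population s.d.s."""
    a = s1**2 / n1  # s1^2 / n1
    b = s2**2 / n2  # s2^2 / n2
    t = (mean1 - mean2) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df


t, df = welch_t(20, 18, 2, 3, 10, 12)
df_table = floor(df)  # round down to be on the safe side
t_critical = 2.093    # two-tailed, alpha = 0.05, df = 19 (from a t table)
print(f"t = {t:.3f}, df = {df:.2f} -> use {df_table}, reject H0: {abs(t) > t_critical}")
```

The formula gives about 19.19 d.f., which is rounded down to 19; t is about 1.87, so H0 is not rejected here.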
Two sample hypothesis test with DEPENDENT samples
Find t value using
t=\frac{\overline{d}}{\frac{s_{d}}{\sqrt{n}}} where \overline{d} is the sample mean difference in the pairs of related observations, s_d is the standard deviation of these differences and n is the number of paired observations
s_d is found using the formula:
s_{d}=\sqrt{\frac{\Sigma_{i=1}^{n}\left(d_{i}-\overline{d}\right)^2}{n-1}}
Then compare t with the critical value from the t table with n-1 degrees of freedom; if t is beyond the critical value we reject H0
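The paired (dependent-samples) test above can be sketched as follows. A minimal sketch: `paired_t` and the before/after values are hypothetical; `statistics.stdev` divides by n - 1, matching the s_d formula, and 2.776 is the two-tailed t table value for \alpha = 0.05 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x1, x2):
    """Hypothetical helper: paired t statistic with d_i = x1_i - x2_i.
    Returns (t, df)."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = mean(d)   # mean difference
    s_d = stdev(d)    # sample s.d. of the differences (divides by n - 1)
    t = d_bar / (s_d / sqrt(n))
    return t, n - 1


after = [12, 14, 10, 12, 15]
before = [10, 12, 9, 11, 13]
t, df = paired_t(after, before)
t_critical = 2.776  # two-tailed, alpha = 0.05, df = 4 (from a t table)
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_critical}")
```

The differences are 2, 2, 1, 1, 2, giving \overline{d} = 1.6 and s_d of about 0.548, so t is about 6.53 and H0 is rejected.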
Errors in hypothesis testing
Type 1 error: incorrectly rejecting H0 when it is actually true - its probability equals the significance level \alpha, so it is more likely with a sig. level of 0.05 than with 0.01
Type 2 error: incorrectly failing to reject H0 when it is actually false - this becomes more likely as the sig. level decreases, e.g. 0.001 instead of 0.01
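That the Type 1 error rate equals \alpha can be checked with a small simulation: draw many samples from a population where H0 really is true and count how often a two-tailed z test rejects it. This is an illustrative sketch only; the function name, population values, sample size and number of trials are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def type1_error_rate(alpha, trials=5000, n=30, mu0=50.0, sigma=8.0, seed=1):
    """Hypothetical helper: estimate how often a two-tailed z test rejects a
    TRUE H0. Samples come from N(mu0, sigma), so every rejection is a Type 1 error."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.01):
    print(alpha, type1_error_rate(alpha))
```

The estimated rejection rates come out close to 0.05 and 0.01 respectively, i.e. lowering \alpha lowers the chance of a Type 1 error.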
Probability of making a type-2 error
Identified by Greek letter beta \beta
Find the z value of the critical sample mean \overline{x}_c when the true population mean is \mu_1:
z=\frac{\overline{x}_{c}-\mu_1}{\frac{\sigma}{\sqrt{n}}}
Then find the area between 0 and the z value on the z table
And then use formula
\beta=0.50-\text{(z table area)}
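The \beta calculation above can be sketched using the standard normal CDF in place of the z table. A minimal sketch assuming an upper-tailed test where \overline{x}_c lies below \mu_1 (so z is negative, the case in which \beta = 0.50 minus the table area); the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_prob(x_c, mu1, sigma, n):
    """Hypothetical helper: beta for an upper-tailed test, i.e. the probability
    the sample mean falls below x_c (H0 not rejected) when the true mean is mu1."""
    z = (x_c - mu1) / (sigma / sqrt(n))
    beta = NormalDist().cdf(z)
    return z, beta


z, beta = type2_error_prob(x_c=102, mu1=104, sigma=6, n=36)
# z table method from the notes: area between 0 and |z|, then beta = 0.50 - area
table_area = NormalDist().cdf(abs(z)) - 0.5
print(f"z = {z:.2f}, beta = {beta:.4f}, 0.50 - table area = {0.5 - table_area:.4f}")
```

Here z = -2.00, the table area between 0 and 2.00 is about 0.4772, and both routes give \beta of about 0.0228.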