Unit 7: hypothesis testing

New cards

Hypothesis tests for a population mean (o known)
Hypothesis, test statistic, P-value, statistical significance
Two sided tests and confidence intervals

New cards

A hypothesis test has a different goal than a confidence interval

New cards

Parameter of interest:

& evidence

u, the true population mean

Our goal is to determine whether there is strong enough evidence to support this claim
Our evidence comes from sample data
- Our evidence is x-bar, the sample mean

New cards

Claims

On population mean, u

New cards

We can never prove that a claim is correct

We can’t know for sure that u has any particular value without actually calculating it from the entire population
But we try to reach our conclusions with a reasonably high probability of being correct

New cards

“True mean”

Population mean

New cards

“If the True mean daily vitamin C intake is 75 mg, then what is the probability of observing a sample mean at least as low as 73mg?”

We assume that the population mean really is 75 (u = 75),
Then we know X ~ N → X-bar ~ N(u, o/sqrt(n))
P(X-bar <= 73) = P(Z <= (73 - u0)/(o/sqrt(n))) = P(Z <= -0.50) = 0.3085

so if the True mean daily vitamin C intake of all female Canadians was 75mg, the probability of observing a sample mean at least as low as 73mg would be 30.85% (“observation is likely to occur again”)

So our evidence that u < 75 is not very strong!
Therefore the evidence is insufficient to support the doctor’s claim
IMPORTANT! We would need a sample mean x-bar even lower than 73mg to be convinced that u < 75,
- If u = 75, there is a 30.85% chance of seeing a sample mean of 73 or lower, so this result is not rare. The probability needs to be much lower to count as strong evidence.
- We are NOT concluding that u = 75, or that the doctor’s claim that u < 75 is wrong
- We just don’t have strong enough evidence for us to be convinced that u < 75
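
A quick way to check the 30.85% figure, as a minimal Python sketch. It takes the z-score of -0.50 that this example works out to as given (the notes don't list o or n for the vitamin C sample), and uses `NormalDist` from Python's standard library as the standard normal curve.

```python
from statistics import NormalDist

# z-score for the vitamin C example (x-bar = 73, u0 = 75), taken as given
z = -0.50

# P(X-bar <= 73) = P(Z <= -0.50): area to the left of z under the standard normal curve
p_low = NormalDist().cdf(z)

print(round(p_low, 4))
```

The printed value should agree with the table lookup used in the notes.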

New cards

u0

assumed value of u

New cards

… “Is it possible that the True mean speed at the intersection really is 60km/h, and we observed a sample mean as high as 66km/hr purely by chance?”

We need to ask: if the True mean speed of motorists at the intersection is 60km/hr, then what is the probability of observing a sample mean at least as high as 66km/hr?
→ IMPORTANT NOTE: we look for P(X-bar >= 66), not P(X-bar = 66), because a continuous variable takes any single exact value with probability zero.
If the probability is low, then we have strong evidence in support of the parent council’s claim
- If it is unlikely that we observed a sample mean this high purely by chance, then it’s reasonable to conclude that the True mean u really is higher than 60
  - → e.g. if the probability is only 1%, we conclude that u > 60 (not that u equals the sample mean)
If the probability is high, then we have weak evidence in support of the parent council’s claim
- If it is likely that we observed a sample mean this high purely by chance, then this evidence is not strong enough to conclude that the True mean u is higher than 60.

If we assume that u = 60, then by the central limit theorem, we know

X-bar ~ approx. N(60, 15/sqrt(50))
P(X-bar >= 66) = P(Z >= (66 - 60)/(15/sqrt(50))) = P(Z >= 2.83) = 0.0023
So if the True mean speed of vehicles at this intersection was 60 km/hr, then the probability of observing a sample mean speed at least as high as 66 would only be 0.23%
So observing a sample mean as high as we did is extremely unlikely
In other words: if u = 60, then observing a sample mean at least as high as 66 is extremely surprising
So our evidence that u > 60 is very strong!
So there is sufficient evidence to support the parent council’s claim. It is reasonable to conclude that u > 60, and so a red light camera is installed.
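
The 0.23% figure can be reproduced with a short Python sketch. The numbers (u0 = 60, x-bar = 66, o = 15, n = 50) are read off the notes above; the variable names are just for illustration, and `NormalDist` from the standard library stands in for the Z table.

```python
from math import sqrt
from statistics import NormalDist

u0 = 60      # assumed true mean speed (km/hr)
x_bar = 66   # observed sample mean (km/hr)
o = 15       # population std. deviation
n = 50       # sample size

std_error = o / sqrt(n)            # std. deviation of X-bar
z = (x_bar - u0) / std_error       # how many std. errors x-bar sits above u0
p_high = 1 - NormalDist().cdf(z)   # P(X-bar >= 66) = P(Z >= z)

print(round(z, 2), round(p_high, 4))
```

The result may differ from the table answer in later decimal places, since the table rounds z to 2.83 before the lookup.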

New cards

There is always a possibility that we will be wrong in our conclusion

Maybe the True mean really is 60, and we just happened to have an exceptional sample of unusually fast vehicles!!! The probability of this happening is low (0.23%), but not impossible!
However, we are able to conclude in favour of the council’s claim (that u > 60) with a high level of certainty

New cards

The foundation and main idea of hypothesis testing can be summarized as follows:

“If our initial assumption were true, then how likely would it be to observe an estimate this extreme?” , and
“An outcome that would rarely occur if an assumption were true, is good evidence that the assumption is not true”

We are now ready to formalize the process of hypothesis testing, with new vocabulary

New cards

Alternative hypothesis

The statement making the claim we are trying to support is called the alternative hypothesis, denoted Ha.
In this unit, the alternative hypothesis is always an inequality in terms of u. this could look like:
- Ha: u > u0, or
- Ha: u < u0, or
- Ha: u /= u0
a hypothesis test is assessing the evidence in favour of the alternative hypothesis
e.g. Ha: u < 75

H0 → null hypothesis. the statement being tested in a hypothesis test

this is our initial assumption
in this unit, the null hypothesis is always an equality in terms of u.
- H0 : u = u0
H0 is always a statement of “no difference“ or “no effect“
the hypothesis test is assessing the strength of the evidence against the null hypothesis
e.g. H0: u = 75

New cards

if we assume that the null hypothesis (H0) is true, then the probability of observing a sample mean x-bar at least as high/low/extreme as the one observed is called the P-value of the test

high if Ha : u > u0
low if Ha : u < u0
extreme if Ha : u /= u0

a low P-value means we have strong evidence against the null hypothesis/ in favour of the alternative hypothesis

in the vitamin C example, the P-value was 0.3085. this is not low enough to reject H0 (u = 75) and conclude that u < 75 (i.e. conclude in favour of Ha)
in the speeding vehicle example, the P-value was 0.0023. we concluded this was low enough to reject our initial assumption that u = 60 (reject H0) and conclude that u > 60 (i.e. conclude in favour of Ha)

Q: how low does our P-value need to be to reject the null hypothesis in favour of the alternative hypothesis?

A: it depends! before we perform a hypothesis test, we choose the level of significance of our test, denoted α (alpha, a.k.a. “fish”)

if the P-value is less than or equal to “fish”, then we will reject H0 in favour of Ha
if the P-value is greater than fish, then we fail to reject H0
thus fish is the maximum P-value for which the null hypothesis will be rejected
low fish →we need stronger evidence to reject null hypothesis
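
The rule itself is a single comparison. A minimal Python sketch, where `decide` is a made-up helper name (not from the course) and the two calls reuse the P-values and significance levels from the speeding and vitamin C examples:

```python
def decide(p_value, fish):
    """Apply the decision rule: reject H0 when the P-value <= fish."""
    if p_value <= fish:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.0023, 0.05))   # speeding example
print(decide(0.3085, 0.01))   # vitamin C example
```

Note that a P-value exactly equal to fish still leads to rejection, since the rule uses "less than or equal to".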

New cards

level of significance (in irl…)

if something is “high stakes“, then you require strong evidence: choose a low value of fish →e.g. meds & vaccines
if something is “low stakes“, then you don’t need evidence to be that strong: higher value fish is okay

… in this course, the significance level is given…

common values of fish are 0.10(10%), 0.05(5%), and 0.01(1%)
we rarely would use a value higher than 0.10 → (for the same reason that we almost never use a confidence level less than 90%)

New cards

hypothesis test steps, “P-value method“

state lvl of significance (fish) >°))))彡
statement of hypotheses, H0 and Ha
statement of the decision rule (also known as the rejection rule)
- the decision rule (rejection rule) is the precise statement of what must happen in order for us to reject the null hypothesis
  → for this method, decision rule is always “reject H0 if P-value <= >°))))彡“
calculation of the test statistic
- the test statistic provides a measure of the compatibility between the null hypothesis and our data
- in this unit, our test statistic will always be Z:
  → z = (x-bar - u0)/(o/sqrt(n)) (a z-score)
calculate the P-value
conclusion
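
The steps above can be sketched as one small Python function. This is only an illustration for the right-sided case (Ha: u > u0) with o known; `right_sided_z_test` and its parameter names are made up for this sketch, and the sample call uses the intersection numbers with fish = 0.05.

```python
from math import sqrt
from statistics import NormalDist

def right_sided_z_test(x_bar, u0, o, n, fish):
    """P-value method for H0: u = u0 vs Ha: u > u0 (o known)."""
    # steps 1-3 (fish, hypotheses, decision rule) are fixed by the arguments
    z = (x_bar - u0) / (o / sqrt(n))     # step 4: test statistic
    p_value = 1 - NormalDist().cdf(z)    # step 5: P-value = P(Z >= z)
    reject = p_value <= fish             # step 6: apply the decision rule
    return z, p_value, reject

z, p_value, reject = right_sided_z_test(x_bar=66, u0=60, o=15, n=50, fish=0.05)
print(round(z, 2), round(p_value, 4), reject)
```

Keeping fish as an argument mirrors the idea that the significance level is chosen before any calculation is done.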

New cards

hypothesis testing (P-value method) example (test of significance ig)

perform a hypothesis test with a 5% level of significance for the car intersection example

solution:

state the level of significance
- let >°))))彡 = 0.05
- interpretation: “we will be willing to conclude in favour of the council (i.e. that u > 60) only if the P-value is less than or equal to 0.05“
statement of hypotheses
- H0 : u = 60 vs Ha : u > 60
- in words:
  - H0: “The true mean speed at the intersection is equal to the posted limit, and no red light camera is needed.”
  - Ha: “The true mean speed at the intersection is greater than the posted limit, and a red light camera is needed.”
statement of the decision rule
- reject H0 if the P-value <= (fish, significance level) = 0.05
calculation of test statistic
- z = (x-bar - u0)/(o/sqrt(n)) = (66 - 60)/(15/sqrt(50)) = 2.83
- “so the sample mean that we observed is 2.83 std. deviations above our assumed population mean“
calculate P-value
- P-value = P(Z >= 2.83)
  - = 1-P(Z < 2.83)
  - = 1 - 0.9977
  - = 0.0023
- interpretation: ”if the true mean speed of vehicles at this intersection was 60, the probability of observing a sample mean at least as high as 66 would be 0.0023.”
conclusion
- since the P-value = 0.0023 < significance level = 0.05, we reject the null hypothesis in favour of the alternative. At a 5% level of significance, we have sufficient statistical evidence to conclude that the true mean speed of motorists at the intersection is greater than 60km/hr

we call this example a right-sided test, since our alternative hypothesis is of the form Ha: u > u0

New cards

statistical significance

results that lead to the rejection of a null hypothesis are said to be statistically significant.
- statistical significance is an effect so large that it would rarely occur by chance alone
- for this reason, statistical hypothesis tests are also referred to as tests of significance.

New cards

hypothesis test:

perform a hypothesis test for the vitamin C example. use fish = 0.01

state the level of significance
- let fish = 0.01
- interpretation: “we will be willing to conclude in favour of the doctor’s claim (i.e. that u < 75) only if the P-value is less than or equal to 0.01”
statement of hypotheses
- H0: u = 75 vs Ha : u < 75
- H0: the true mean daily vitamin C intake for all Canadian females is equal to the recommended amount of 75mg
- Ha: the true mean daily vitamin C intake for all Canadian females is less than the recommended amount of 75mg.
statement of the decision rule
- reject H0 if P-value <= fish = 0.01
- “*notice: the decision rule is always the same! the only change is the value of fish”

calculation of test statistic
- z = (x-bar - u0)/(o/sqrtn)
- z = -0.50
- so the sample mean we observed is 0.50 std. deviations below our assumed population mean
calculate the P-value
- P-value = P(Z <= -0.50) = 0.3085
- “if the true average daily vitamin C intake for Canadian females was 75mg, the probability of observing a sample mean at least as low as 73mg would be 0.3085”
conclusion
- since the P-value = 0.3085 > significance level = 0.01, we fail to reject the null hypothesis. at the 1% level of significance, we have insufficient statistical evidence that the true mean vitamin C intake for Canadian females is less than 75mg
- IMPORTANT! note: the conclusion has a very specific format you must follow

we call this example a left-sided test, since our alternative hypothesis is of the form Ha: u < u0

we do not conclude that u = 75

notice we never conclude H0 is true. All we can say is that we do not have enough evidence to reject the null hypothesis
“NEVER SAY “ACCEPT H0“!!!”

New cards

P-values for left and right-sided tests

the only significant differences between left- and right-sided hypothesis tests are the alternative hypothesis, and how we calculate the P-value
a simple rule for remembering the difference between calculating the P-value for left- and right-sided tests:
- the direction of the inequality in the P-value is the same as the direction of the inequality in the alternative hypothesis
- Ha: u < u0 => P-value = P(Z < test stat.)
- Ha: u > u0 => P-value = P(Z > test stat.)

New cards

IMPORTANT!: template for conclusion (P-value method)

since the P-value = (#) (</>) sig. level = (#), we (reject/fail to reject) H0
at the (#) level of significance, we have (sufficient/insufficient) evidence that (Ha) is true
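
One way to practise the template is to fill it in mechanically. A rough Python sketch, where `conclusion` is a made-up helper name and the Ha wording passed in is shortened from the vitamin C example:

```python
def conclusion(p_value, fish, ha_text):
    """Fill in the conclusion template for the P-value method."""
    if p_value <= fish:
        compare, decision, strength = "<=", "reject", "sufficient"
    else:
        compare, decision, strength = ">", "fail to reject", "insufficient"
    return (
        f"since the P-value = {p_value} {compare} sig. level = {fish}, "
        f"we {decision} H0. at the {fish:.0%} level of significance, "
        f"we have {strength} evidence that {ha_text}."
    )

text = conclusion(0.3085, 0.01, "the true mean vitamin C intake is less than 75mg")
print(text)
```

The sketch writes "<=" rather than "<" so that the wording stays correct when the P-value exactly equals the significance level.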

New cards

two-sided tests

the last type of hypothesis test we’ll look at is a two-sided test. this is a hypothesis test where the alternative hypothesis is of the form:
- Ha : u /= u0
in this case, the P-value is the probability of observing a value of the sample mean at least as extreme (in either direction), (given that the null hypothesis is true)
- “at least as extreme“means “at least as far away from our hypothesized population mean, u0 “

we find the P-value by doubling the probability of the Left or right of our test statistic z (whichever probability is lower, i.e. whichever side is the “tail“)
(double left or right to find P-value)

z = test statistic

e.g. bottling company, underfilled/ overfilled

New cards

two sided hypothesis test: bottling company

statement of level of significance
- let level of significance = 0.05
statement of hypotheses
- H0 : u = 500 vs Ha : u /= 500
- H0: the true mean volume of root beer in all bottles in the shipment is 500 ml
- Ha: the true mean volume of root beer in all bottles in the shipment differs from 500ml
statement of the decision rule
- reject H0 if P-value <= sig level = 0.05
calculation of test statistic
- z = (x-bar - u0)/(o/sqrtn)
- z = 1.81
calculate the P-value
- P-value = 2P(Z >= 1.81) = 2(1 - P(Z < 1.81))
- = 2(1 - 0.9649)
- = 2(0.0351)
- = 0.0702
- if the true mean fill volume was 500ml, the probability of observing a sample mean at least as extreme (as far away from u0 = 500) as 502ml would be 0.0702
conclusion
- since the P-value = 0.0702 > sig. level = 0.05, we fail to reject H0. at the 5% level of significance, we have insufficient evidence that the true mean of all bottles in the shipment differs from 500ml.
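
The doubling step can be checked with a small Python sketch. It starts from the test statistic z = 1.81 given in the notes (o and n for the bottling sample are not listed here) and uses the standard library's `NormalDist` in place of the Z table.

```python
from statistics import NormalDist

z = 1.81      # test statistic from the bottling example
fish = 0.05   # level of significance

tail = 1 - NormalDist().cdf(abs(z))   # area in one tail: P(Z >= |z|)
p_value = 2 * tail                    # double the tail for Ha: u /= 500

print(round(p_value, 3), p_value <= fish)
```

The software answer can differ from the hand calculation in the fourth decimal place, because the table rounds P(Z < 1.81) to 0.9649 before doubling.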

New cards

template for P-value Interpretation

“if the (null hypothesis) were true, the probability of observing a sample mean at least as high/low/extreme as (sample mean observed) would be (P-value)“
high = for right-sided test
low = for left-sided test
extreme = for 2-sided test

New cards

P-value calculations

to summarize: the P-value for testing H0: u = u0 vs:
- Ha: u > u0 is P(Z > z)
- Ha: u < u0 is P(Z < z)
- Ha: u /= u0 is 2P(Z > |z|)
  - disregard |z| if you don’t know. just remember the P-value is double the tail area
these P-values are exact if the population is normal, and approximate for large sample sizes in other cases (by the CLT)
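
The three cases in this summary can be collected into one Python sketch. The function name `p_value` and the labels "greater", "less" and "two-sided" are choices made for this illustration, not course notation; the check values are the test statistics from the three worked examples.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for H0: u = u0, given the test statistic z.

    alternative: "greater"   for Ha: u > u0
                 "less"      for Ha: u < u0
                 "two-sided" for Ha: u /= u0
    """
    phi = NormalDist().cdf            # P(Z <= z) for the standard normal
    if alternative == "greater":
        return 1 - phi(z)             # right tail
    if alternative == "less":
        return phi(z)                 # left tail
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))  # double the tail beyond |z|
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

print(round(p_value(2.83, "greater"), 4))
print(round(p_value(-0.50, "less"), 4))
print(round(p_value(1.81, "two-sided"), 3))
```

Using |z| in the two-sided branch means the same answer comes out whether the sample mean landed above or below u0.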

New cards

verifying results

notice that in the root beer example, the company doesn't want to see the null hypothesis be rejected! → maybe take another sample? or change the fish after seeing the result?
…. no! that would be unethical. the sig. level was already chosen before the test, and the data are already in (sample mean, std. dev., normal distribution), so the result stands.