when do you use pooled proportion
two proportion hypothesis testing
two proportion confidence interval
use CI formula for two proportions
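A minimal sketch of the two-proportion confidence interval, assuming the usual unpooled standard error. The function name and the counts are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # z* is the critical value that leaves alpha/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: 60 of 100 in group 1, 45 of 100 in group 2
low, high = two_proportion_ci(60, 100, 45, 100)
print(f"({low:.3f}, {high:.3f})")
```

If the interval does not contain 0, the data suggest the two proportions differ.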
two proportion hypothesis testing
use p-pooled and z score formula
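The pooled test can be sketched as below, assuming a two-tailed test of H0: p1 = p2. The function name and counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z test for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pool the successes because H0 says both groups share one proportion
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60 of 100 in group 1, 45 of 100 in group 2
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```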
Minimum sample size
formula with n
Rounding with minimum sample size formula
Round up
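A sketch of the minimum sample size calculation, assuming the standard formula n = (z*/ME)^2 × p(1 - p) for one proportion. The default p = 0.5 is an assumption used when no prior estimate is available, since it gives the largest n.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Smallest n whose margin of error is at most the target."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin_of_error) ** 2 * p * (1 - p)
    # Always round UP: rounding down would give a margin of error that is too big
    return ceil(n)

print(min_sample_size(0.03))  # 1068
```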
Basics for one proportion hypothesis test
population parameter: p
Point estimate: p-hat
Conditions
Majority problems
Special kind of proportion test
Test for majority problems (more than half)
No majority (even split): null p = 0.5
Majority: alternative p > 0.5
Steps for hypothesis testing
Identify variables, hypothesis, claim, alpha
Check conditions
Find test statistic/z-score, area, p-value, sketch
Compare p value to alpha, conclusion
Important note
Reject null hypothesis = significant difference between null and sample => STATE WHETHER THE VALUE IS HIGHER OR LOWER (NOT JUST DIFFERENT)
how to interpret confidence interval
we are % confident that x%-x% of ____
Proportions are for
Categorical data
Proportions are written as
%, decimal, or fraction (x/n)
Sample proportion is called
P-hat
What is p-hat written as
%, decimal, Or fraction
P-hat is written as
^
P
P-hat without hat is
Population proportion (p)
Sample stats estimate
Population parameters
Sample stat is also known as
Point estimate
Sample stat should be ____ to pop parameter
very close
Sampling distribution
Distribution of all the p-hats (point estimates) from repeated samples of the same size; its average is p
Standard error
Variation or standard deviation in point estimates
SE vs SD
SE = variability of a sample statistic (like p-hat) from sample to sample
SD = spread of individual data values
can a graph be curved for categorical data
No, not continuous data
Larger sample size = ___ SE
Smaller
Central Limit Theorem
If observations are independent and sample size is sufficiently large, sample proportion will be nearly normally distributed (mean = p aka population proportion)
How to verify independence
Random sample, and sample is less than 10% of the population
How to verify sufficiently large sample
Must meet success failure condition
np ≥ 10 and n(1-p) ≥ 10
If p is not known, you can use
P-hat
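The two checks above can be sketched as small helpers. The function names are hypothetical, and whether the sample was actually random cannot be checked by code; only the 10% rule and the success-failure condition are.

```python
def independence_ok(n, population_size):
    """10% rule: the sample should be less than 10% of the population."""
    return n < 0.10 * population_size

def success_failure_ok(n, p, threshold=10):
    """Need at least `threshold` expected successes AND failures.
    Use p-hat in place of p when the population proportion is unknown."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(independence_ok(100, 5000))    # True: 100 is under 10% of 5000
print(success_failure_ok(100, 0.5))  # True: 50 successes, 50 failures
print(success_failure_ok(50, 0.1))   # False: only 5 expected successes
```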
Confidence interval
range of plausible values where we are likely to find population parameter
CI written as
(60%, 70%)
What do you need in confidence interval
Parenthesis!!
More commonly used confidence intervals
90% (confidence level = 1 - α, where α = significance level)
Leftover = alpha
95%
99%
Cutoff scores for critical values
90%: ±1.65
95%: ±1.96
98%: ±2.33
99%: ±2.58
(Given)
How to calculate confidence interval
Point estimate ± ME
Or: point estimate ± z* × SE
what do you use to calculate uncertainty of point estimate
standard error
variables in SE formula
n = sample size
x = observed count (number of successes)
p-hat = x/n = point estimate/sample stat
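A sketch of the one-proportion confidence interval, assuming the standard formula SE = sqrt(p-hat(1 - p-hat)/n). The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_ci(x, n, confidence=0.95):
    """Point estimate ± z* × SE for a single proportion."""
    p_hat = x / n                       # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z_star * se                    # margin of error
    return p_hat - me, p_hat + me

# Hypothetical sample: 130 successes out of 200
low, high = one_proportion_ci(130, 200)
print(f"({low:.1%}, {high:.1%})")
```

The result would be read as "we are 95% confident that between low and high of the population ...".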
how to get point estimate from confidence interval problem
average the two endpoints
how to get margin of error
distance from middle to endpoint
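Working backwards from an interval can be sketched like this; the helper name is made up, and the interval is the (60%, 70%) example from earlier in the set.

```python
def from_interval(low, high):
    """Recover the point estimate and margin of error from a CI."""
    point_estimate = (low + high) / 2    # middle of the interval
    margin_of_error = (high - low) / 2   # distance from middle to endpoint
    return point_estimate, margin_of_error

p_hat, me = from_interval(0.60, 0.70)
print(f"p-hat ≈ {p_hat:.2f}, ME ≈ {me:.2f}")
```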
steps to solve confidence interval question
check conditions
find point estimate/sample stat (aka phat) and z score (on chart)
calculate with formula
put into words
parameter is also known as
population proportion
success failure condition
np ≥ 10 and n(1-p) ≥ 10
point estimate
sample value used to estimate population parameter
example of point estimates
sample mean, sample proportion, sample standard deviation
error
difference between observations and the parameter
when conditions are not met distribution is
discrete (not continuous)
skewed
why do we need a confidence interval?
cuz point estimates will likely not exactly hit population proportions
margin of error formula in confidence interval
z* × SE
why do you need to check conditions
to make sure distribution will be near normal
What is statistical hypothesis testing?
Decision making process for evaluating claims mathematically
Distinguish between results that easily occur or are unlikely
What is a hypothesis
Claim or conjecture that may or may not be true
Two types of hypotheses in a test
Null and alternative
Null hypothesis
Always =, no difference, no change, status quo, written first (on top), H0
Alternative hypothesis
Inequality: not equal, difference or change written second, Ha
Example of null
Drug does not work
Example of alternative hypothesis
Drug works, does decrease level of depression!
do we like null or alt hypothesis in real world?
Alt because we want to see changes
Setup for proportions
H0: p=#
Ha: p ≠ #
Note: number should be the same
what test is used for setup for proportions
Two tailed test
Two possible decisions
Reject the null hypothesis or fail to reject the null hypothesis
reject the null hypothesis
Not the same (significant difference)
fail to reject null hypothesis
Data is not convincing, no sig difference
Testing hypothesis using confidence intervals
Confidence interval: 95% confident the true population value is captured in the interval
So if the null value is within the confidence interval, fail to reject the null hypothesis (the null is plausible, NOT proven true)
Decision errors
Hypothesis tests are NOT flawless
Type 1 error
Reject null hypothesis when H0 is true
Type 2 error
Failing to reject null hypothesis when it is false
Alpha
probability of making a type 1 error (reject null when it is true)
If making type 1 error (null is really true) is dangerous, choose a
Small significance level (a=0.01) and be careful about rejecting null hypothesis aka demand strong evidence before rejecting null
If type 2 error (alternative is true) is dangerous, choose
Higher significance level (0.10) and be cautious about failing to reject H0 when null is actually false
Hypothesis testing using z score and p value
identify parameter, list hypothesis, identify significance level, identify p-hat and n
check conditions
calculate z score and identify p value
conclusion (compare p value to alpha)
P value
Probability value is the tail area beyond the z-score, TIMES 2 for a two-tailed test because the area is on both left and right side
If p value < a
Reject H0
If p value > a
Fail to reject H0
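The z score and p-value steps can be sketched as follows, assuming a two-tailed one-proportion test where the standard error uses the null value p0. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Two-tailed z test for H0: p = p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # SE is computed under the null hypothesis
    z = (p_hat - p0) / se
    # Tail area beyond |z|, times 2 because both tails count
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05
z, p_value = one_proportion_z_test(60, 100, 0.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, {decision}")
```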
Let alpha be ___ if not given
.05
why should we use p-value to test hypothesis instead of CI
CI is not always suitable cuz a confidence interval cannot always be constructed
interpretation for p-value
If the null hypothesis is true that [add context], then the probability of getting our sample proportion of [context] or even more extreme is [p-value], which is [highly unlikely or plausible]
interpretation of inference
There is convincing evidence to reject the claim that there is no difference in the % of 1st and 2nd years that prefer JFK. Our data shows that more 2nd years like JFK
IF observations are independent,
sample size is
sufficiently large
parameter being estimated
proportion vs mean
standard error vs margin of error
standard error is used to calculate margin of error (ME = z* × SE), standard error measures variability
significance level vs confidence level
significance = alpha
confidence = xx% confident that blah blah
p-value is the
probability of observing data at least as extreme as the one found in our sample data set, if the null hypothesis is true
degrees of freedom in goodness of fit test
number of categories minus 1
Chi-square looks like
X²
Chi Squared is
right skewed
Values are positive because they are counts (how many people)
Categorical data
One parameter: degrees of freedom
Degrees of freedom are represented by
df
Goodness of Fit Test
Does sample fit population?
Does it represent population?
Do the observed counts equal the expected counts?
Conditions for chi square
Independent observations
E (expected) count is at least 5 in each category
Degrees of freedom at least 2
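A sketch of the goodness of fit calculation, assuming the usual statistic: the sum of (O - E)^2 / E over all categories. The function name, counts, and expected proportions are invented for illustration; the p-value would still be looked up as a tail area on a chi-square table.

```python
def chi_square_gof(observed, expected_proportions):
    """Chi-square statistic and degrees of freedom for a goodness of fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    # Condition: expected count of at least 5 in each category
    if any(e < 5 for e in expected):
        raise ValueError("every expected count must be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, df

# Hypothetical counts for 3 categories expected to split 25% / 50% / 25%
statistic, df = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])
print(statistic, df)  # 2.0 2
```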
Setting the hypothesis for chi square
NO SYMBOLS OR PARAMETERS => use sentences
General setting hypothesis format
H0: Observed counts follow same distribution as the expected counts (O=E)
HA: Observed counts do not follow the same distribution as expected counts
Finding areas under the chi square curve
P-value = tail area under chi square distribution
Top row = p values, alpha at top
Two tests chi squared is used for
Goodness of fit: compares population with sample based on one characteristic
Independence (“related to“) : tests for relationship between 2 characteristics
How many variables are there in any given contingency table?
TWO (not 20, even if the table is 5 by 4)
General format for hypothesis in chi square
H0: variables are independent (not related)
HA: variables are not independent (are related)
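For the independence test, a sketch assuming the standard expected count (row total × column total / grand total) and df = (rows - 1)(columns - 1). Neither formula is stated in these cards. The function name and the table are hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic, df, and expected counts for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count if the two variables were independent
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical 2 by 3 table: still only TWO variables (rows and columns)
statistic, df, expected = chi_square_independence([[10, 20, 30], [20, 20, 20]])
print(f"chi-square = {statistic:.3f}, df = {df}")
```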
Observational study
Association or relationship does not mean CAUSATION (experimentation)