Sadistic Torture Across Two Semesters
Key to all letters used in stats?
p = population proportion, yeses / total population. Usually a decimal like 0.6. → alternatively, in 1-PropZTest (hypothesis test), it represents p-value.
p̂ = sample proportion, yeses / sample size. Also called p-hat. Usually a decimal like 0.6
x = amount of “yeses”
n = total sample size
P0 = assumed population proportion in the null hypothesis. This is what you guess before you do the experiment.
H0 = null hypothesis
HA = alternative hypothesis
μ = mean
x-bar = sample mean
σ = population standard deviation
s = sample standard deviation
α = significance level
What is a population?
Group of individuals we wish to study (eg. all students at WVC).
What is a parameter?
Either a population proportion or population mean. Represented by p (proportion) or μ (mean). eg. proportion of all WVC students who work part-time.
What is a census?
Example of surveying entire population. However, this is usually unrealistic because the population is too big.
What is a sample?
Collection of individuals taken from population of interest.
What is a statistic?
Any number calculated from the sample, like a median, sample mean (x-bar), or sample proportion (number of “yeses” / sample size, represented by p-hat).
Which value can never be found, statistic or parameter?
Parameter. You can find the value of a statistic by collecting data, but you can only make inferences about the parameter (generalizations about the whole population).
What is sampling bias?
You get a sample that doesn’t accurately represent the whole population. eg. you survey a very opinionated population
What is voluntary-response bias?
Sampling bias where people only respond if they feel strongly about the results (miku fans will respond to surveys about how much they like miku. non-miku fans won’t fill it out at all)
What is nonresponse bias?
Sampling bias where people asked to do the survey refuse to fill it out. (might ask for uncomfortable info.)
What is measurement bias?
Survey questions do not produce true answers b/c confusing wording or misleading questions
What questions can you ask to determine if a survey is biased?
What % of people who were asked to participate actually did so?
Did the researchers choose people to participate, or did the people themselves choose to participate?
Did the researcher leave out whole segments of the population who are likely to answer the question differently from the rest of the population? (eg. only survey northern Californians, not southern ones)
How do you get a random number from the calculator?
assign a number to each and every member of the population
MATH → PRB → 5 → randInt (1, population size, amount of results you want)
pro: minimizes bias as long as individuals are selected without replacement (ie. don’t choose the number 3 twice, skip numbers that appear twice)
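A rough Python equivalent of the randInt steps above, as a sketch only. The function name `simple_random_sample` and the population of 500 are made up for illustration. `random.sample` draws without replacement, so there are no repeats to skip.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)
    # random.sample never returns the same ID twice (no replacement).
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical example: 10 people out of a numbered population of 500.
ids = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(ids))
```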
What is the difference between accuracy and precision?
Accurate → how close you are to the target value → measured by how unbiased you are → fixed by getting a random sample
Precise / variation→ how close the values are to one another → measured by size of standard error (the smaller the better) → fixed by getting a larger sample
What is a sampling distribution?
Take a bunch of samples and calculate the same statistic (like the mean or p-hat) for each one. Then graph those values. This is the sampling distribution of that sample statistic, and it is approximately Normal when the CLT conditions are met.
How do you calculate the population proportion based on the sampling distribution?
Mean of sampling distribution (p-hat, the sample proportion) is equal to population proportion.
How do you calculate shape, center, and spread for sampling distribution?
Shape → check CLT. If all 3 conditions check, shape is Normal and you can make inferences. Else, can’t use normalcdf to calculate probability, stop calculating.
Center → mean.
Mean of sampling distribution = population proportion
Mean of sample statistic = population parameter
Spread → standard error → √[p(1-p) / n] where p stands for population proportion AKA what % of the population is “yes” and n is the sample size (I put this on the TI-84 under PRGM 5: STDERROR)
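The standard error formula above, sketched in Python. This is not the TI-84 program itself, just the same arithmetic; the name `standard_error` and the example values (p = 0.6, n = 100) are assumptions for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: population proportion 0.6, sample size 100.
se = standard_error(0.6, 100)
print(round(se, 4))  # about 0.049
```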
Normal distribution vs. sampling distribution
Normal distribution → mean at μ (population mean) → axis on x (individual values) → calculate standard deviation
Sampling distribution → mean at P (population proportion) → axis on p-hat (proportion of yeses) → calculate standard error → way smaller standard deviation, centered at same number as normal dist. though
Standard deviation vs. standard error?
Standard deviation = variation of your sample around the mean
Standard error = you take the mean of a bunch of samples. Then you calculate the standard deviation of those means.
Criteria for CLT (Central Limit Theorem) for population proportions?
CLT tells you if distribution is Normal, if it is, only then can you run tests on it
Random sample
Large sample (at least 10 yeses and 10 nos)
Large population (at least 10x sample size)
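The three checks above can be written as a small helper. A sketch only: the function name is invented, and whether the sample is random cannot be computed, so it is passed in as a yes/no you decide from the problem description.

```python
def clt_ok_for_proportion(yeses, n, population_size, random_sample=True):
    """Return True only if all three CLT conditions for a proportion hold."""
    nos = n - yeses
    large_sample = yeses >= 10 and nos >= 10          # at least 10 yeses and 10 nos
    large_population = population_size >= 10 * n      # population at least 10x sample
    return random_sample and large_sample and large_population

print(clt_ok_for_proportion(yeses=60, n=100, population_size=5000))  # True
print(clt_ok_for_proportion(yeses=5, n=100, population_size=5000))   # False: only 5 yeses
```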
When do you use confidence interval vs. hypothesis test?
Confidence interval when you don’t know the past value or estimate for population proportion
Hypothesis test if you already know the past value
Note that confidence intervals give more info than hypothesis tests b/c they tell you BOTH if a parameter could be a specific value AND give a plausible range of values. A hypothesis test only tells you whether or not there’s significant evidence against the null.
Point estimate vs. interval estimate
Point estimate = single number, like a sample proportion or mean, that is our “best initial guess” for the parameter
Interval estimate = interval of numbers within which the parameter value is believed to fall
What is a confidence interval?
Interval containing the most plausible values for a parameter. Written like (point estimate) ± (margin of error).
How do you calculate confidence interval on a calculator?
Verify CLT:
Random sample
Large sample (at least 10 yeses and 10 nos)
Large population (at least 10x sample size)
STAT → TESTS → A: 1-PropZInt
x = number of yeses
n = sample size
c-level = confidence level
report the interval and write a sentence interpreting the interval (We are ___ % confident that the population proportion of all ____ that _____ is between ___ % and ___ %).
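For checking calculator output, here is a sketch of the same kind of interval in Python: point estimate ± z* × standard error, with the standard error based on p-hat. The function name and the example numbers (60 yeses out of 100) are made up, and this is an assumption about the formula behind 1-PropZInt, not a copy of the calculator's code.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, c_level):
    """Confidence interval for a population proportion."""
    p_hat = x / n                                        # sample proportion
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2) # critical value
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n) # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical example: 60 yeses in a sample of 100, 95% confidence.
low, high = one_prop_z_interval(x=60, n=100, c_level=0.95)
print(f"({low:.3f}, {high:.3f})")  # roughly (0.504, 0.696)
```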
What is margin of error for population proportions?
z-score * standard error
Multiply the standard error (SE) by the z-score that matches your confidence level:
99% confidence level = 2.58 standard errors
95% confidence level = 1.96 (shortcut method: 2) standard errors
90% confidence level = 1.645 standard errors
80% confidence level = 1.28 standard errors
How do you know if confidence interval means the majority likes something?
ENTIRE confidence interval has to be over 50% / 0.5. Basically, even if it’s (0.49, 0.51) the majority isn’t true because 0.49 < 0.50.
What is a confidence level?
Probability that the confidence intervals created with this process contain the true parameter. Basically: if I create a bunch of confidence intervals, what % of them capture the true value?
Does NOT apply to a single confidence interval. That one either captures the true value (100%) or doesn’t (0%).
Confidence level is a number chosen to be close to 1, most commonly 0.95.
confidence level * # of intervals = expected # of intervals that capture the true value
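A small simulation can make the "bunch of confidence intervals" idea concrete. This is a sketch under made-up assumptions (true proportion 0.6, samples of 200, 2000 repeated samples); the fraction of intervals that capture the true value should land near the confidence level, not exactly on it.

```python
import math
import random
from statistics import NormalDist

def coverage(true_p, n, c_level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample and count the yeses.
        x = sum(rng.random() < true_p for _ in range(n))
        p_hat = x / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage(true_p=0.6, n=200, c_level=0.95, trials=2000)
print(rate)  # expected to be close to 0.95
```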
Correlation between confidence level, margin of error, and sample size?
Increase confidence level → increase margin of error and increase width of confidence interval (a smaller sample size has the same widening effect)
Decrease confidence level → decrease margin of error and decrease width of confidence interval (a larger sample size has the same narrowing effect)
If you know you want a certain margin of error, how do you find what sample size you need?
sample size n = (z*/m)² * ¼ where z is the critical number from the critical value table (search z-score in this flashcard deck)
Always round up to the nearest whole number EVEN IF decimal is small (eg. 1.01 → 2)
Equation for sample size is registered as a program in the TI-84 under PRGM
What is the sample size short-cut formula?
Special case: you want a 95% confidence level so you can suppose critical value Z* ~ 2 (in reality it’s 1.96)
Short-cut formula: n = 1/m² where m is the margin of error you want and n is the sample size needed to get that margin of error
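Both sample size formulas (the full one and the 95% short-cut) sketched in Python. The function names are invented, and the 3% margin of error is just an example. `math.ceil` does the "always round up" step.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, c_level=0.95):
    """n = (z*/m)^2 * 1/4, rounded up to the next whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)
    return math.ceil((z_star / margin) ** 2 * 0.25)

def shortcut_sample_size(margin):
    """95% short-cut with z* ~ 2: n = 1/m^2, rounded up."""
    return math.ceil(1 / margin ** 2)

# Hypothetical example: desired margin of error of 3% (0.03).
n_full = sample_size_for_proportion(0.03)
n_short = shortcut_sample_size(0.03)
print(n_full, n_short)
```

The short-cut gives a slightly bigger n because it rounds 1.96 up to 2.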
Which proportions can be used to draw conclusions?
Population proportion. Never sample proportion.
What are the conditions for confidence interval for 2 populations?
Random sample OR individuals are randomly assigned & all other conditions are met. IF random assignment THEN specify for this condition: YES— BY RANDOM ASSIGNMENT
Samples are independent of one another (process of selecting 1 sample doesn’t affect selection of the other)
Large sample for BOTH (at least 10 yeses and 10 nos)
Large population for BOTH (at least 10x sample size)
How do you calculate a confidence interval for 2 proportions on the TI-84?
STAT → TESTS → B: 2-PropZInt
x1 = # of yeses for population 1
n1 = sample size population 1
x2 = # of yeses for population 2
n2 = sample size population 2
Calculate and report the interval. Search “interpret confidence interval for two populations” in this flashcard deck to find how to interpret the interval.
How do you interpret a confidence interval for two populations?
(+,+) → Population 1 is significantly larger
We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 1) than it is for (population 2).
(-,-) → Population 2 is significantly larger
We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 2) than it is for (population 1).
(-,+) → No significant difference between populations (contains 0)
We are ___ % confident that there is no significant difference in the proportion of (yeses, population 1) and the proportion of (yeses, population 2).
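A sketch of a two-proportion interval plus the sign check described above. The formula (difference in sample proportions ± z* × unpooled standard error) is the standard one and is assumed to match what 2-PropZInt reports; the function names and the counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, c_level):
    """Confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - c_level) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

def interpret(low, high):
    """Read the signs of the interval endpoints."""
    if low > 0:
        return "(+,+): population 1 is significantly larger"
    if high < 0:
        return "(-,-): population 2 is significantly larger"
    return "(-,+): no significant difference (interval contains 0)"

# Hypothetical example: 60/100 yeses vs. 40/100 yeses, 95% confidence.
low, high = two_prop_z_interval(60, 100, 40, 100, 0.95)
print(f"({low:.3f}, {high:.3f})", interpret(low, high))
```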
What do hypotheses in hypothesis testing describe?
Population parameters. NEVER sample statistics.
What are the 2 hypotheses?
Null: H0: p = P0
Alternative: HA
p > P0 , p < P0 , p ≠ P0
What is a significance level?
How okay you are with making a mistake. Usually 0.05, given by alpha (α).
Is the probability of making a type I error: rejecting the null when the null is true / concluding the alternative hypothesis is true when in fact it is not true (WORST kind of error)!!
What is a test statistic?
How many standard errors the observed proportion is above/below the null hypothesis value. The farther it is from 0, the more evidence you have against the null. Represented by z (like z-score).
Only use if the data passes CLT.
1-proportion test statistic can be found by running 1-PropZTest on the TI-84 (DON’T use the one programmed into PRGM).
If the test statistic is more than 2 standard errors from 0 (above 2 or below -2), it’s unusual and you can reject the null. If it’s closer to 0, not unusual and fail to reject the null.
What is a p-value?
How likely the data is to be the same as expected / probability of obtaining a test statistic as extreme or more extreme than the one we actually observed / “surprise” in sample data if null is true. Represented by p.
Small p-value → large z-test statistic → data isn’t likely to be the same as expected → reject the null
Large p-value → small z-test statistic → data is pretty likely to be the same as expected → don’t reject the null
What is the relationship between p-value and significance level?
p < significance level → enough evidence to reject the null
p > significance level → not enough evidence. don’t reject the null
What are the 4 steps for hypothesis testing?
Write the null and alternative hypotheses
Choose a significance level and check CLT
Random sample
Large sample (at least 10 yeses and 10 nos)
Large population (at least 10x sample size)
STAT → TESTS → 5: 1-PropZTest → find the z (test statistic) and p (p-value)
P0 = hypothesized proportion from the null hypothesis
X = amount of “yes”
N = sample size
Prop = (≠, <, or >) P0
Interpret that you either reject or fail to reject the null hypothesis (is p-value bigger or lesser than significance level?) Use sentence template.
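The calculation step above can be sketched in Python to see where z and the p-value come from. Assumptions: the standard error uses the null value P0 (the usual one-proportion z-test), and the function name, the `alternative` labels, and the example counts are all invented here.

```python
import math
from statistics import NormalDist

def one_prop_z_test(p0, x, n, alternative="two-sided"):
    """Return (z, p_value) for a one-proportion z-test."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = (p_hat - p0) / se               # test statistic
    cdf = NormalDist().cdf(z)
    if alternative == "less":           # left-tailed: p < P0
        p_value = cdf
    elif alternative == "greater":      # right-tailed: p > P0
        p_value = 1 - cdf
    else:                               # two-tailed: p != P0
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Hypothetical example: P0 = 0.5, 60 yeses out of 100, two-tailed.
z, p_value = one_prop_z_test(0.5, 60, 100)
alpha = 0.05
print(round(z, 2), round(p_value, 4))
print("reject the null" if p_value < alpha else "fail to reject the null")
```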
What are the “tailed” tests?
Right-tailed test: Result is bigger than expected (p > P0). The right part of the normal curve is shaded, representing the p-value.
Two-tailed test: Result is not equal to what is expected (p ≠ P0). The p-value is double what it would be on right- and left-tailed tests, and is shaded on the end of both sides of the normal curve. IF you got the p-value from calculator, DON’T double it, it’s already right! ☆
Left-tailed test: Result is smaller than expected (p < P0). The left part of the normal curve is shaded, representing the p-value.
What is a sampling distribution?
The probability distribution of a sample statistic. Approximately Normal when the CLT conditions are met, even if population distribution is skewed. Graphed on x-bar (sample means) scale, not x-scale like the population distribution.
1-Prop ZTest vs. 2-Prop ZTest
same thing but 2-Prop has 2 population proportions, therefore 2 sample sizes and 2 sample proportions (girls vs. boys, new vs. old)
What are the null and alternative hypotheses for 2-Prop ZTest?
Null hypothesis: H0: p1 = p2 AKA p1 - p2 = 0
Alternative hypothesis: HA
Left-tailed test: p1 < p2
Right-tailed test: p1 > p2
Two-tailed test: p1 ≠ p2
*no numbers in these hypotheses! Only comparing the two proportions against each other!
How do you calculate 2-Prop ZTest?
Write the null and alternative hypotheses (search 2-Prop ZTest in this flashcard deck)
Choose a significance level and check CLT
Random sample (assume true if not given)
Samples are independent of one another (selection of one doesn’t affect selection of the other)
Large sample (at least 10 yeses and 10 nos)
Calculate pooled sample proportion: p̂ = (pop #1 yes + pop #2 yes)/(pop1 sample size + pop2 sample size)
For each population, do the following:
→ p̂*sample size ≥ 10
→ (1 - p̂)*sample size ≥ 10
NO need for large population!!
STAT → TESTS → 6: 2-PropZTest → find the z (test statistic) and p (p-value)
x1 = pop1 amount of “yes”
n1 = pop1 sample size
x2 = pop2 amount of “yes”
n2 = pop2 sample size
p1 = (≠, <, or >) p2
Interpret that you either reject or fail to reject the null hypothesis (is p-value bigger or lesser than significance level?) Use sentence template.
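A sketch of the two-proportion test using the pooled sample proportion from the steps above. It assumes the standard pooled z-test formula; the function name and the counts (60/100 vs. 45/100) are made up for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (pooled p-hat, z, p_value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # pooled sample proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)
    if alternative == "less":                            # p1 < p2
        p_value = cdf
    elif alternative == "greater":                       # p1 > p2
        p_value = 1 - cdf
    else:                                                # p1 != p2
        p_value = 2 * min(cdf, 1 - cdf)
    return pooled, z, p_value

# Hypothetical example: 60/100 yeses vs. 45/100 yeses, two-tailed.
pooled, z, p_value = two_prop_z_test(60, 100, 45, 100)
print(round(pooled, 3), round(z, 3), round(p_value, 4))
```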
Z-Distribution vs. T-Distribution
Z-Distribution = used for population PROPORTIONS. large sample sizes. you know the population standard deviation. AKA Normal Distribution
T-Distribution = used for population MEANS. small sample sizes. you don’t know the population standard deviation (only the sample standard deviation). Shorter and wider than Normal Distribution to account for the extra error from estimating the standard deviation with the sample (degrees of freedom = n - 1)
How do you calculate population mean?
Average of all sample means
Spread of sample mean vs. spread of population mean
WAY SMALLER
What happens to sample means when you increase sample size?
Graph gets narrower (bigger sample size → better precision). Accuracy does not change (only impacted by level of bias).
How do you calculate the standard error of a sample mean?
population standard deviation / √n (where n is the sample size). If you only have a sample, use the sample standard deviation s as an estimate.
How do you calculate the mean of a sample mean?
Mean = same as population mean
How do you compute the z-test statistic?
Do 1 or 2-Prop ZTest (depending on which is appropriate) then look for z=
What are “tailed tests”?
Right-tail test → Ha: P > P0 → “result is as extreme or more extreme than hypothesis” → Z is on the right of P0, and everything to the right of Z is shaded
Left-tail test → Ha: P < P0 → Z is on the left of P0, and everything to the left of Z is shaded
2-tail test → Ha: P ≠ P0 → Z is on both sides of P0. The shaded part is twice as big as it is on a 1-tailed graph, and everything from the left and right are shaded (NOT in between).
*on all of these, it is the standard error graph (on a scale of p-hat). P0 (null hypothesis) is the middle of the graph, where the mean on a standard deviation graph would be. the shaded part represents the p-value, the smaller the p-value, the more evidence you have to discredit the null hypothesis
What is skew?
|||IIIIIiiii—- = right skew (it’s flat on the right)
—iiiiIIIIII|||| = left skew (it’s flat on the left)
bell-shaped or uniform = no skew
What are the two distributions that look similar?
Population distribution → distribution of values from the population → has a certain shape, center, and spread → but value of its parameters are generally unknown → graphed on x-axis
Distribution of the sample (if random and large) looks the same as population distribution → has the same shape, center, and spread → described by sample statistics → also graphed on x-axis
Which flavor of distribution is not like the other girls :D
Sampling distribution → the probability distribution of a sample statistic (basically the distribution of all the sample means) → describes how close the sample stat is to the population parameter → graphed on x-bar (sample mean) axis
looks NOTHING LIKE population distribution and distribution of the sample!! also usually narrower; its standard deviation = standard error of a sample mean = population standard deviation / √n
What are the CLT conditions for sample means?
Random
EITHER Normal or sample size ≥ 25
Large population (at least 10 times sample size) (use sentence from the template)
Equation for sampling distribution of sample means
N ( μ , σ/√n)
Equation for z-score
(sample mean - population mean) / standard error
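The sampling distribution N(μ, σ/√n) and the z-score can be tried out with Python's `statistics.NormalDist`. A sketch with invented numbers (μ = 100, σ = 15, n = 25, sample mean 106); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_score_for_sample_mean(x_bar, mu, sigma, n):
    """(sample mean - population mean) / standard error."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

mu, sigma, n = 100, 15, 25
z = z_score_for_sample_mean(x_bar=106, mu=mu, sigma=sigma, n=n)

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n)).
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
prob_above = 1 - sampling_dist.cdf(106)   # chance of a sample mean of 106 or more
print(z, round(prob_above, 4))
```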
How to find t-statistic with a calculator?
T-Test (STAT → TESTS → 2: T-Test → look for t=)
What makes a confidence interval narrower?
Small t-score AKA T* (critical value) → larger sample size → less variation
Lessen confidence level
What makes a confidence interval wider?
Large t-score (critical value) → smaller sample size → more variation
Increase confidence level
What is margin of error for population means?
t-score x standard error (VS population proportion using z-score x standard error)
Can you know the standard error for a population mean?
No. This would require you know the value of the population standard deviation, but you don’t. So, you substitute in the sample standard deviation and use that to estimate standard error instead.
How do you calculate confidence intervals for a population mean?
Verify CLT (random sample, Normal distribution or n ≥ 25, large population 10x n)
STAT → Tests → 8: TInterval → either Stats (if given summary stats) or Data (if given raw data in a table- Frequency is ALWAYS 1!!!). Report that interval.
Interpret
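A sketch of a t-interval from raw data: sample mean ± t* × (s / √n). Python's standard library has no t-distribution, so the critical value t* is passed in by hand from a t-table (2.365 is the table value for 95% confidence with 7 degrees of freedom). The function name and the data are made up.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_star):
    """Confidence interval for a population mean from raw data."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)   # s / sqrt(n): estimated standard error
    margin = t_star * se
    return x_bar - margin, x_bar + margin

# Hypothetical data, n = 8, so degrees of freedom = 7.
data = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = t_interval(data, t_star=2.365)
print(f"({low:.2f}, {high:.2f})")  # roughly (3.21, 6.79)
```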
How do you calculate sample size for population means?
n = [(2 x standard deviation or estimate of standard deviation) / desired margin of error]² (the 2 is the approximate critical value for 95% confidence)
*Always ROUND UP to NEXT whole number!! (eg. 72.01 → 73)
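The same formula in Python, as a sketch. The function name and the example values (standard deviation 15, margin of error 3.5) are assumptions for illustration; `math.ceil` handles the rounding up.

```python
import math

def sample_size_for_mean(std_dev, margin):
    """n = (2 * standard deviation / margin of error)^2, rounded up."""
    return math.ceil((2 * std_dev / margin) ** 2)

# Hypothetical example: standard deviation 15, desired margin of error 3.5.
n_needed = sample_size_for_mean(std_dev=15, margin=3.5)
print(n_needed)
```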
How do you calculate hypothesis testing for a population mean?
Hypothesize (H0 is always μ = the hypothesized population mean)
State significance level, verify CLT (ONLY random and Normal / n ≥ 25 !!! NO large populations requirement)
STAT → Tests → 2: T-Test → report t-test statistic and p-value
Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
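To see where the t= number comes from, here is a sketch of the t-test statistic computed by hand: (sample mean − hypothesized mean) / (s / √n). It stops at t and the degrees of freedom because the standard library cannot compute a t-distribution p-value; use the calculator for that. The function name, data, and μ0 = 4 are made up.

```python
import math
from statistics import mean, stdev

def t_statistic(data, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)   # estimated standard error
    t = (mean(data) - mu0) / se
    return t, n - 1

# Hypothetical data with hypothesized mean 4.
t, df = t_statistic([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
print(round(t, 3), df)
```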
How do you know if you can use a 2-sided test and if the results of the confidence interval will match with the results of the hypothesis test?
If significance level + confidence level add up to 100% (0.05 → 5% significance level + 95% confidence level).
What are dependent samples?
Dependent samples = matched pairs
Measured twice (“before and after,” same item in two stores)
Related somehow (twins, siblings, spouses)
Subjects deliberately matched to have similar characteristics (race, age)
What are independent samples?
No pairing, no connection
Both samples collected randomly
How do you make a confidence interval (to estimate the mean difference) for 2 independent samples?
Verify CLT (random, independent, Normal or n ≥ 25)
STAT → Tests → 0: 2-SampTInt → either Stats or Data → Pooled is always NO!!! → Report the calculated interval
If entire confidence interval is positive, μ1 is significantly larger. If it’s negative, μ2 is significantly larger. If it contains 0 (-, +), there is no significant difference (μ1 - μ2 = 0).
How do you do a hypothesis test (to test the mean difference) for 2 independent samples?
Write the null hypothesis (ALWAYS μ1 = μ2), then alternative hypothesis. There are no numbers involved, only symbols.
Write significance level and verify CLT (random, independent, Normal or n ≥ 25)
STAT → Tests → 4: 2-SampTTest → Pooled: No → report t-test statistic (t=) and p-value (p=)
Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
What special thing do you do for 2 dependent samples?
Do x1 - x2 to turn them into difference scores.
How do you make a confidence interval for 2 dependent samples?
Make the difference scores and enter them as a list in the calculator
Verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)
STAT → Tests → 8: T-Interval → Data → Freq = 1 → Report the interval
If entire confidence interval is positive, μ1 is significantly larger. If it’s negative, μ2 is significantly larger. If it contains 0 (-, +), there is no significant difference (μ1 - μ2 = 0).
How do you do a hypothesis test for 2 dependent samples?
Make the difference scores and enter them as a list in the calculator
Write the null hypothesis (ALWAYS μ1 = μ2), then alternative hypothesis. There are no numbers involved, only symbols.
Write significance level and verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)
STAT → Tests → 2: T-Test → Data → μ0 = 0 → Freq = 1 → Report t-test statistic (t=) and p-value (p=)
Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
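A sketch of the difference-score step and the resulting t-test statistic for matched pairs, with μ0 = 0 as in the steps above. The before/after numbers and the function name are invented; the p-value still has to come from the calculator.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(sample1, sample2):
    """Turn matched pairs into difference scores, then test mean difference = 0."""
    diffs = [a - b for a, b in zip(sample1, sample2)]   # x1 - x2 for each pair
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    t = (mean(diffs) - 0) / se                          # mu0 = 0
    return diffs, t, n - 1

# Hypothetical before/after measurements on the same 5 subjects.
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 9]
diffs, t, df = paired_t_statistic(before, after)
print(diffs, round(t, 3), df)
```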
Which method do you use for one-tailed and two-tailed testing?
One-tailed: hypothesis test
Two-tailed: Either hypothesis test or confidence interval (but confidence intervals are preferred because they give more information— both if it could or could not be a specific value AND gives plausible range of values for population parameter)