stats ch. 7, 8, and 9 flashcards (+ key to letters)

Sadistic Torture Across Two Semesters

82 Terms

New cards

Key to all letters used in stats?

p = population proportion, yeses / total population. Usually a decimal like 0.6. → On the 1-PropZTest (hypothesis test) output screen, p instead stands for the p-value.

p̂ = sample proportion, yeses / sample size. Also called p-hat. Usually a decimal like 0.6

x = number of “yeses” in the sample

n = total sample size

p₀ (also written Po or P0) = assumed population proportion in the null hypothesis. This is the value you assume before you collect any data.

H0 = null hypothesis

HA = alternative hypothesis

μ = population mean

x-bar = sample mean

σ = population standard deviation

s = sample standard deviation

α = significance level

New cards

Symbols for Statistics vs. Parameters ?

New cards

What is a population?

Group of individuals we wish to study (eg. all students at WVC).

New cards

What is a parameter?

A number that describes the whole population, such as a population proportion or population mean. Represented by p (proportion) or μ (mean). eg. proportion of all WVC students who work part-time.

New cards

What is a census?

A survey of the entire population. However, this is usually unrealistic because the population is too big.

New cards

What is a sample?

Collection of individuals taken from population of interest.

New cards

What is a statistic?

A number calculated from the sample, like a median, sample mean, or sample proportion. eg. sample proportion = number of “yeses” / sample size, represented by p-hat.

New cards

Which value can never be found, statistic or parameter?

Parameter (unless you take a census). You can find the value of a statistic by collecting data, but you can only make inferences about the parameter (generalizations about the whole population).

New cards

What is sampling bias?

You get a sample that doesn’t accurately represent the whole population. eg. you only survey a very opinionated part of the population

New cards

What is voluntary-response bias?

Sampling bias where people only respond if they feel strongly about the results (miku fans will respond to surveys about how much they like miku. non-miku fans won’t fill it out at all)

New cards

What is nonresponse bias?

Sampling bias where people asked to do the survey refuse to fill it out. (might ask for uncomfortable info.)

New cards

What is measurement bias?

Survey questions do not produce true answers b/c confusing wording or misleading questions

New cards

What questions can you ask to determine if a survey is biased?

What % of people who were asked to participate actually did so?
Did the researchers choose people to participate, or did the people themselves choose to participate?
Did the researcher leave out whole segments of the population who are likely to answer the question differently from the rest of the population? (eg. only survey northern Californians, not southern ones)

New cards

How do you get a random number from the calculator?

assign a number to each and every member of the population

MATH → PRB → 5 → randInt (1, population size, amount of results you want)

pro: minimizes bias as long as individuals are selected without replacement (ie. don’t choose the number 3 twice; if a number comes up again, skip the repeat)
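
The same idea can be sketched in Python for checking on a computer. The population size (500) and sample size (10) are made-up numbers and the function name is only illustrative; `random.sample` draws without replacement, so there are no repeats to skip.

```python
import random

def pick_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional, only for repeatable demos
    # random.sample never returns the same element twice
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical example: 500 people numbered 1-500, choose 10 of them
chosen = pick_random_sample(500, 10, seed=1)
print(sorted(chosen))
```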

New cards

What is the difference between accuracy and precision?

Accurate → how close you are to the target value → measured by how unbiased you are → fixed by getting a random sample

Precise / variation→ how close the values are to one another → measured by size of standard error (the smaller the better) → fixed by getting a larger sample

New cards

What is a sampling distribution?

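Take many random samples of the same size and calculate the same statistic (like the mean or p-hat) for each one. Then graph those values. The distribution you get is the sampling distribution of that sample statistic. It is approximately Normal when the CLT conditions are met.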

New cards

How do you calculate the population proportion based on the sampling distribution?

The mean of the sampling distribution of p-hat (the sample proportion) is equal to the population proportion.

New cards

How do you calculate shape, center, and spread for sampling distribution of a sample proportion/ population proportion?

Shape → check CLT. If all 3 conditions check, shape is Normal and you can make inferences. Else, can’t use normalcdf to calculate probability, stop calculating.

Center → mean.

Mean of sampling distribution = population proportion
Mean of sample statistic = population parameter

Spread → standard error → √[p(1-p) / n] where p stands for population proportion AKA what fraction of the population is “yes” and n is the sample size (I put this on the TI-84 under PRGM 5: STDERROR)
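
A minimal Python sketch of the spread formula on this card, for double-checking calculator results. The function name and the example numbers (p = 0.5, n = 100) are made up for illustration.

```python
import math

def standard_error_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: population proportion 0.5, samples of size 100
se = standard_error_proportion(0.5, 100)
print(se)
```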

New cards

Normal distribution vs. sampling distribution

Normal distribution (of individual values) → mean at μ (population mean) → axis on x (individual values) → spread measured by standard deviation

Sampling distribution (of a sample proportion) → mean at p (population proportion) → axis on p-hat (proportion of yeses in each sample) → spread measured by standard error → a sampling distribution is way narrower than the distribution of individual values, but centered at the same number

New cards

Standard deviation vs. standard error?

Standard deviation = variation of your sample around the mean

Standard error = you take the mean of a bunch of samples. Then you calculate the standard deviation of those means.
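
The standard error definition can be checked with a quick simulation. This is only a sketch with assumed numbers (p = 0.6, samples of size 100, 2000 samples) and a hypothetical function name: it takes a bunch of samples, finds p-hat for each, and then takes the standard deviation of those p-hats.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # each individual is a "yes" with probability p
        yeses = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(yeses / n)
    return p_hats

p_hats = simulate_sample_proportions(p=0.6, n=100, num_samples=2000)
center = statistics.mean(p_hats)     # should land near p = 0.6
spread = statistics.pstdev(p_hats)   # should land near sqrt(0.6*0.4/100)
print(center, spread)
```

The spread of the p-hats comes out close to the standard error formula, and much smaller than the spread of the individual yes/no values.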

New cards

Criteria for CLT (Central Limit Theorem) for population proportions?

CLT tells you if the sampling distribution is approximately Normal. Only if it is can you run tests on it.

Random sample
Large sample (at least 10 yeses and 10 nos)
Large population (at least 10x sample size)
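
The three conditions can be written as a small checker. This is a sketch, not part of the course materials: the function name is made up, and the randomness condition cannot be computed, so it is passed in as a yes/no you decide from the problem.

```python
def clt_conditions_for_proportion(x, n, population_size, is_random=True):
    """Return True only if all three CLT conditions on this card hold."""
    enough_yes_and_no = x >= 10 and (n - x) >= 10   # at least 10 yeses and 10 nos
    big_population = population_size >= 10 * n      # population at least 10x sample
    return is_random and enough_yes_and_no and big_population

# Hypothetical example: 60 yeses out of 100, from a population of 5000
print(clt_conditions_for_proportion(60, 100, 5000))
```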

New cards

When do you use confidence interval vs. hypothesis test?

Confidence interval when you don’t know the past value or estimate for population proportion

Hypothesis test if you already know the past value

Note that confidence intervals give more info than hypothesis tests b/c they tell you BOTH if a parameter could be that specific value AND give a plausible range of values. A hypothesis test only tells you whether or not there’s significant evidence against the null.

New cards

Point estimate vs. interval estimate

Point estimate = single number, like a sample proportion or mean, that is our “best initial guess” for the parameter

Interval estimate = interval of numbers within which the parameter value is believed to fall

New cards

What is a confidence interval?

Interval containing the most plausible values for a parameter. Written like (point estimate) ± (margin of error).

New cards

How do you calculate confidence interval on a calculator?

Verify CLT:
1. Random sample
2. Large sample (at least 10 yeses and 10 nos)
3. Large population (at least 10x sample size)
STAT → TESTS → A: 1-PropZInt
1. x = number of yeses
2. n = sample size
3. c-level = confidence level
report the interval and write a sentence interpreting the interval (We are ___ % confident that the population proportion of all ____ that _____ is between ___ % and ___ %).
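
For checking the calculator output, here is a rough Python version of the interval, (sample proportion) ± z* × (standard error), with the standard error estimated from p-hat. The function name and example numbers (60 yeses out of 100, 95%) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, confidence_level):
    """Confidence interval for one population proportion."""
    p_hat = x / n
    # critical value z* that leaves (1 - confidence_level)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical example: 60 yeses in a sample of 100, 95% confidence
low, high = one_prop_z_interval(60, 100, 0.95)
print(round(low, 4), round(high, 4))
```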

New cards

What is margin of error for population proportions?

z-score * standard error

Multiply the standard error (SE) by the z-score (critical value z*) for your confidence level to get the margin of error:

99% confidence level = 2.58 standard errors

95% confidence level = 1.96 (shortcut method: 2) standard errors

90% confidence level = 1.645 standard errors

80% confidence level = 1.28 standard errors
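
The critical values in this list come from the standard Normal curve. A short Python sketch that recomputes them (the function names are made up; `NormalDist` is from Python's standard library):

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """z* such that the middle confidence_level of the Normal curve is within ±z*."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

def margin_of_error(confidence_level, standard_error):
    """Margin of error = z* times the standard error."""
    return critical_z(confidence_level) * standard_error

for level in (0.99, 0.95, 0.90, 0.80):
    print(level, round(critical_z(level), 3))
```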

New cards

How do you know if confidence interval means the majority likes something?

ENTIRE confidence interval has to be over 50% / 0.5. Basically, even if it’s (0.49, 0.51) you can’t claim a majority because 0.49 < 0.50.

New cards

What is a confidence level?

Probability that the confidence intervals created with this process contain the true parameter. Basically: if I create a bunch of confidence intervals, what % of them capture the true value?

Does NOT apply to a single confidence interval. That one either captures the true value (100%) or doesn’t (0%).

Confidence level is a number chosen to be close to 1, most commonly 0.95.

confidence level * # of intervals = expected # of intervals that capture the true value

New cards

Correlation between confidence level, margin of error, and sample size?

Increase confidence level → increase margin of error and increase width of confidence interval. Decreasing the sample size has the same effect.

Decrease confidence level → decrease margin of error and decrease width of confidence interval. Increasing the sample size has the same effect.

New cards

If you know you want a certain margin of error, how do you find what sample size you need?

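sample size n = (z*/m)² * ¼ where z* is the critical value from the critical value table (search z-score in this flashcard deck) and m is the margin of error you want, as a decimal. The ¼ comes from assuming p = 0.5, the safest guess because it gives the biggest sample size.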

Always round up to the nearest whole number EVEN IF decimal is small (eg. 1.01 → 2)

Equation for sample size is registered as a program in the TI-84 under PRGM
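
The sample size equation on this card, sketched in Python. The function name and the example margins (3% and 5%) are made up; `math.ceil` does the "always round up" step.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence_level=0.95):
    """n = (z*/m)^2 * 1/4, rounded UP to the next whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z_star / margin) ** 2 * 0.25)

# Hypothetical examples: margins of error of 3% and 5% at 95% confidence
print(sample_size_for_proportion(0.03))
print(sample_size_for_proportion(0.05))
```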

New cards

What is the sample size short-cut formula?

Special case: you want a 95% confidence level so you can suppose critical value z* ≈ 2 (in reality it’s 1.96)

Short-cut formula: n = 1/m² where m is the margin of error you want and n is the sample size needed to get that margin of error

New cards

Which proportions can be used to draw conclusions?

Conclusions are always about the population proportion, never the sample proportion (you already know the sample proportion exactly).

New cards

What are the conditions for confidence interval for 2 populations?

Random sample OR individuals are randomly assigned & all other conditions are met. IF random assignment THEN specify for this condition: YES— BY RANDOM ASSIGNMENT
Samples are independent of one another (process of selecting 1 sample doesn’t affect selection of the other)
Large sample for BOTH (at least 10 yeses and 10 nos)
Large population for BOTH (at least 10x sample size)

New cards

How do you calculate a confidence interval for 2 proportions on the TI-84?

STAT → TESTS → B: 2-PropZInt
- x₁ = # of yeses for population 1
- n₁ = sample size population 1
- x₂ = # of yeses for population 2
- n₂ = sample size population 2
Calculate and report the interval. Search “interpret confidence interval for two populations” in this flashcard deck to find how to interpret the interval.

New cards

How do you interpret a confidence interval for two populations?

(+,+) → Population 1 is significantly larger

We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 1) than it is for (population 2).

(-,-) → Population 2 is significantly larger

We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 2) than it is for (population 1).

(-,+) → No significant difference between populations (contains 0)

We are ___ % confident that there is no significant difference in the proportion of (yeses, population 1) and the proportion of (yeses, population 2).
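
A rough Python sketch of the two-proportion interval and the three sign patterns above. The function names and example counts are assumptions for illustration; the interval is (p-hat₁ - p-hat₂) ± z* × standard error, using each sample's own p-hat.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence_level):
    """Confidence interval for p1 - p2 (difference of two proportions)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se
    diff = p1 - p2
    return diff - margin, diff + margin

def describe_interval(low, high):
    """Map the signs of the interval to the three interpretations."""
    if low > 0:
        return "population 1 is significantly larger"   # (+,+)
    if high < 0:
        return "population 2 is significantly larger"   # (-,-)
    return "no significant difference"                   # contains 0

# Hypothetical example: 60/100 yeses vs. 40/100 yeses, 95% confidence
low, high = two_prop_z_interval(60, 100, 40, 100, 0.95)
print(round(low, 4), round(high, 4), describe_interval(low, high))
```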

New cards

What do hypotheses in hypothesis testing describe?

Population parameters. NEVER sample statistics.

New cards

What are the 2 hypotheses?

Null: H₀: p = p₀

Alternative: H_A: p > p₀, p < p₀, or p ≠ p₀ (pick one)

New cards

What is a significance level?

How okay you are with making a mistake. Usually 0.05, given by alpha (α).

Is the probability of making a type I error: rejecting the null when the null is true / concluding the alternative hypothesis is true when in fact it is not true (WORST kind of error)!!

New cards

What is a test statistic?

How many standard errors the observed sample proportion is above/below the null hypothesis value. The farther it is from 0, the more evidence you have against the null. Represented by z (like z-score).

Only use if the data passes CLT.

1-proportion test statistic can be found by running 1-PropZTest on the TI-84 (DON’T use the one programmed into PRGM).

If the test stat is farther than about 2 from 0 (over 2 or under -2), it’s unusual and you can reject the null (at the usual 0.05 significance level). If it’s closer to 0, it’s not unusual and you fail to reject the null.

New cards

What is a p-value?

Probability of obtaining a test statistic as extreme or more extreme than the one we actually observed, if the null is true / how “surprising” the sample data is if the null is true. Represented by p.

Small p-value → large z-test statistic (far from 0) → data like this would be unlikely if the null were true → reject the null

Large p-value → small z-test statistic (close to 0) → data like this is pretty likely if the null is true → don’t reject the null

New cards

What is the relationship between p-value and significance level?

p < significance level → enough evidence to reject the null

p > significance level → not enough evidence. don’t reject the null

New cards

What are the 4 steps for hypothesis testing?

Write the null and alternative hypotheses
Choose a significance level and check CLT
1. Random sample
2. Large sample (at least 10 yeses and 10 nos)
3. Large population (at least 10x sample size)
STAT → TESTS → 5: 1-PropZTest → find the z (test statistic) and p (p-value)
- p₀ = hypothesized proportion from the null (as a decimal)
- x = number of “yes” in the sample
- n = sample size
- prop = (≠, <, or >) p₀
Interpret that you either reject or fail to reject the null hypothesis (is the p-value greater or less than the significance level?) Use sentence template.
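
The calculator step can be double-checked with a short Python sketch. Everything named here (the function, the "!=" spelling for ≠, the example of 60 yeses out of 100 against p₀ = 0.5) is an assumption for illustration. The standard error uses p₀ because the test assumes the null is true.

```python
import math
from statistics import NormalDist

def one_prop_z_test(p0, x, n, alternative):
    """Return (z, p_value). alternative is one of '>', '<', '!='."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == ">":              # right-tailed
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "<":            # left-tailed
        p_value = std_normal.cdf(z)
    elif alternative == "!=":           # two-tailed: both ends shaded
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be '>', '<', or '!='")
    return z, p_value

# Hypothetical example: H0: p = 0.5 vs. HA: p > 0.5, with 60 yeses out of 100
z, p_value = one_prop_z_test(0.5, 60, 100, ">")
alpha = 0.05
print(round(z, 3), round(p_value, 4), "reject H0" if p_value < alpha else "fail to reject H0")
```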

New cards

What are the “tailed” tests?

Right tailed test: Result is bigger than expected (p > Po). The right part of the normal curve is shaded, representing the p-value.

Two-tailed test: Result is not equal to what is expected (p ≠ Po). The p-value is double what it would be on right- and left- tailed tests, and is shaded on the end of both sides of the normal curve. IF you got the p-value from calculator, DON’T double it, it’s already right! ☆

Left-tailed test: Result is smaller than expected (p < Po). The left part of the normal curve is shaded, representing the p-value.

New cards

What is a sampling distribution?

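The probability distribution of a sample statistic. For sample means, it is approximately Normal even if the population distribution is skewed, as long as the CLT conditions are met (random sample, large enough sample size). Graphed on x-bar (sample means) scale, not x-scale like the population distribution.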

New cards

1-Prop ZTest vs. 2-Prop ZTest

same thing but 2-Prop has 2 population proportions, therefore 2 sample sizes and 2 sample proportions (girls vs. boys, new vs. old)

New cards

What are the null and alternative hypotheses for 2-Prop ZTest?

Null hypothesis: H₀: p₁ = p₂ AKA p₁ - p₂ = 0

Alternative hypothesis: H_A

Left-tailed test: p₁ < p₂
Right-tailed test: p₁ > p₂
Two-tailed test: p₁ ≠ p₂

*no numbers in these hypotheses! Only comparing the two proportions against each other!

New cards

How do you calculate 2-Prop ZTest?

Write the null and alternative hypotheses (search 2-Prop ZTest in this flashcard deck)
Choose a significance level and check CLT
1. Random sample (assume true if not given)
2. Samples are independent of one another (selection of one doesn’t affect selection of the other)
3. Large sample (at least 10 yeses and 10 nos)
  - Calculate pooled sample proportion: p̂ = (pop #1 yes + pop #2 yes)/(pop1 sample size + pop2 sample size)
  - For each population, do the following:
    → p̂*sample size ≥ 10
    → (1 - p̂)*sample size ≥ 10
4. NO need for large population!!
STAT → TESTS → 6: 2-PropZTest → find the z (test statistic) and p (p-value)
- x₁ = pop1 amount of “yes”
- n₁ = pop1 sample size
- x₂ = pop2 amount of “yes”
- n₂ = pop2 sample size
- p₁ = (≠, <, or >) p₂
Interpret that you either reject or fail to reject the null hypothesis (is the p-value greater or less than the significance level?) Use sentence template.
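
A Python sketch of the same test, for checking the calculator's z and p. The function name, the "!=" spelling for ≠, and the example counts are made up. The standard error uses the pooled sample proportion because the null says the two proportions are equal.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative):
    """Return (z, p_value) for H0: p1 = p2. alternative is '>', '<', or '!='."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)      # pooled sample proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == ">":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "<":
        p_value = std_normal.cdf(z)
    elif alternative == "!=":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be '>', '<', or '!='")
    return z, p_value

# Hypothetical example: 60/100 yeses in pop1 vs. 50/100 in pop2, two-tailed
z, p_value = two_prop_z_test(60, 100, 50, 100, "!=")
print(round(z, 4), round(p_value, 4))
```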

New cards

Z-Distribution vs. T-Distribution

Z-Distribution = used for population PROPORTIONS. large sample sizes. you know the population standard deviation. AKA Normal Distribution

T-Distribution = used for population MEANS. Works for small sample sizes. You don’t know the population mean or the population standard deviation (only the sample ones). Shorter and wider than Normal Distribution to account for the extra error from estimating the standard deviation with s (degrees of freedom = n - 1)

New cards

How do you calculate population mean?

Average/mean of all sample means

New cards

Spread of sample means vs. spread of the population

WAY SMALLER (standard error = σ/√n, so the bigger the sample, the narrower it gets)

New cards

What happens to sample means when you increase sample size?

Graph gets narrower (bigger sample size → better precision). Accuracy does not change (only impacted by level of bias).

New cards

What is standard ERROR (not deviation)?

Standard deviation of the sampling distribution (the distribution of the means of a bunch of samples from the population).

New cards

How do you calculate the standard error of a sample mean?

population standard deviation / √sample size (σ/√n). If you don’t know σ, substitute the sample standard deviation s.

DON’T use PRGM 5: STDERROR !! that one is for when you’re given the population proportion— but here, you’re trying to find the STDERROR of sample mean!!!

New cards

How do you calculate the mean of a sample mean?

Mean = same as population mean

New cards

How do you compute the z-test statistic?

Do 1 or 2-Prop ZTest (hypothesis test for 1 vs. 2 proportions) then look for z=

OR (sample mean - population mean) / standard error

can use PRGM 8: ZSCORE for this, but replace:

→ given number = sample mean

→ mean = population mean

→ STDDEV = standard error

New cards

What are “tailed tests”?

Describes the sampling distribution:

Right-tail test → Ha: P > P0 → “result is as extreme or more extreme than hypothesis” → Z is on the right of P0, and everything to the right of Z is shaded

Left-tail test → Ha: P < P0 → Z is on the left of P0, and everything to the left of Z is shaded

2-tail test → Ha: P ≠ P0 → Z is on both sides of P0. The shaded part is twice as big as it is on a 1-tailed graph, and everything from the left and right are shaded (NOT in between).

*on all of these, it is the standard error graph (on a scale of p-hat). P0 (null hypothesis) is the middle of the graph, where the mean on a standard deviation graph would be. the shaded part represents the p-value, the smaller the p-value, the more evidence you have to discredit the null hypothesis

New cards

What is skew?

|||IIIIIiiii—- = right skew (it’s flat on the right)

—iiiiIIIIII|||| = left skew (it’s flat on the left)

bell-shaped or uniform = no skew

New cards

What are the two distributions that look similar?

Population distribution → distribution of values from the population → has a certain shape, center, and spread → but value of its parameters are generally unknown → graphed on x-axis

Distribution of the sample (if random and large) looks the same as population distribution → has the same shape, center, and spread → described by sample statistics → also graphed on x-axis

New cards

Which flavor of distribution is not like the other girls :D

Sampling distribution → the probability distribution of a sample statistic (basically the distribution of all the sample means) → describes how close the sample stat is to the population parameter → graphed on x-bar (sample mean) axis

can look NOTHING LIKE population distribution and distribution of the sample!! also usually narrower; its standard deviation = standard error of a sample mean = population standard deviation / √sample size

Sampling distribution is approximately Normal-shaped even if the population distribution is skewed, as long as the CLT conditions check (random, sample size ≥ 25)!

New cards

What are the CLT conditions for sample means?

Random
EITHER Normal OR sample size ≥ 25 (if the distribution is unknown or not normal)
Large population (at least 10 times sample size, if sample is collected without replacement - once an observation is selected it cannot be selected again) (use sentence from the template)

New cards

How do you calculate shape, center, and spread for sampling distribution of a sample mean / population mean?

Shape → Normal, as long as the CLT conditions check!

Center → mean of the sampling distribution = population mean (μ).

Spread → standard error → σ/√n where n is sample size. If you don’t know σ (population std. dev.), you can substitute s (sample std. dev.) to calculate the standard error.

*if you’re using Spread to plug into 2nd → VARS (DISTR) → normalcdf, make sure you plug in the Standard Error and NOT the standard deviation! Use equation for spread to calculate std. error.
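
The normalcdf step can be mirrored in Python to see why the standard error goes in and not the standard deviation. This is a sketch with assumed numbers (μ = 100, σ = 15, n = 25) and a made-up function name.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(low, high, mu, sigma, n):
    """P(low < x-bar < high) using the sampling distribution N(mu, sigma/sqrt(n))."""
    se = sigma / math.sqrt(n)        # standard error, NOT the standard deviation
    dist = NormalDist(mu, se)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical example: mu = 100, sigma = 15, samples of size 25 (so SE = 3)
prob = prob_sample_mean_between(97, 103, 100, 15, 25)
print(round(prob, 4))
```

Plugging in σ = 15 instead of the standard error 3 would describe individual values, not sample means, and give a much smaller probability for the same range.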

New cards

Equation for sampling distribution of sample means

N ( μ , σ/√n)

if you don’t know σ (population std. dev.), you can substitute s (sample std. dev.) to calculate the standard error

New cards

Equation for z-score

(sample mean - population mean) / standard error

New cards

Special case of finding probability based off of sampling distribution

New cards

What is the t-statistic?

Basically like the z-score except you don’t know the population standard deviation. So you are only estimating the standard error, losing 1 degree of freedom to account for the additional error in the measurement that was estimated.

This changes the shape of the distribution, making the tails a little bigger/thicker than the Normal distribution. The new distribution isn’t Normal, it’s called the t-distribution instead and is shorter and wider. It’s still bell-shaped and centered at zero, though.

New cards

How to find t-statistic with a calculator?

T-test (STAT → TESTS → 2 → look for t=)

New cards

What factors influence the width of a confidence interval?

Narrower confidence interval ← lower confidence level (smaller t-score AKA t*, the critical value), larger sample size, or less variation

Wider confidence interval ← higher confidence level (larger t-score, the critical value), smaller sample size, or more variation

New cards

What is margin of error for population means?

t-score x standard error (VS population proportion using z-score x standard error)

New cards

Can you know the standard error for a population mean?

No. This would require you know the value of the population standard deviation, but you don’t. So, you substitute in the sample standard deviation and use that to estimate (NOT know for sure) standard error instead.

New cards

How do you calculate confidence intervals for a population mean?

Verify CLT (random sample, Normal distribution or n ≥ 25, large population at least 10x sample size)
Check if you were given the raw data or summary stats
- Raw data: STAT → Edit → enter into L1. Then STAT → Tests → 8: TInterval → DATA → List: L1 → Freq: 1 → C-Level: [enter what’s given] → Calculate
- Summary stats: STAT → Tests → 8: TInterval → STATS → enter sample mean, sample standard deviation, sample size, confidence level → Calculate
- Then report the interval you get.
Interpret the interval with the following template: We are ___% confident that the population mean (variable you’re trying to find) of all (population) is between ___ and ___ (units).

New cards

How do you calculate sample size for population means?

n = (2σ / desired margin of error)² (the 2 is the approximate critical value for 95% confidence)

*Always ROUND UP to NEXT whole number!! (eg. 72.01 → 73)

**If you don’t know the standard deviation (σ), use an estimate of the standard deviation from previous research
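
The formula on this card in Python, as a quick check. The function name and example numbers are made up; `math.ceil` handles the round-up rule.

```python
import math

def sample_size_for_mean(sigma, margin):
    """n = (2*sigma / margin)^2, rounded UP to the next whole number."""
    return math.ceil((2 * sigma / margin) ** 2)

# Hypothetical examples
print(sample_size_for_mean(15, 3))   # sigma = 15, want margin of error 3
print(sample_size_for_mean(12, 5))   # sigma = 12, want margin of error 5
```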

New cards

How do you calculate hypothesis testing for a population mean?

Hypothesize (H₀ is always μ = the hypothesized population mean)
State significance level, verify CLT (ONLY random and Normal / n ≥ 25 !!! NO large populations requirement)
STAT → Tests → 2: T-Test → STATS → hypothesized population mean (μ₀), sample mean, sample standard deviation, sample size, ≠ or < or > → report t-test statistic and p-value
Interpret if p-value is greater than or less than significance level (α). Reject null if p-value is less than α.
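
The t-test statistic itself is easy to check by hand or in Python: (sample mean - hypothesized mean) / (s/√n). This sketch uses a made-up function name and example numbers, and it only computes t and the degrees of freedom. The p-value needs the t-distribution, which Python's standard library doesn't include, so that part is left to the calculator.

```python
import math

def one_sample_t_statistic(mu0, sample_mean, sample_sd, n):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    se = sample_sd / math.sqrt(n)    # estimated standard error (uses s, not sigma)
    t = (sample_mean - mu0) / se
    return t, n - 1

# Hypothetical example: H0: mu = 50, sample mean 52, s = 4, n = 16
t, df = one_sample_t_statistic(50, 52, 4, 16)
print(t, df)
```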

New cards

How do you know if you can use a 2-sided test and if the results of the confidence interval will match with the results of the hypothesis test?

If significance level + confidence level add up to 100% (0.05 → 5% significance level + 95% confidence level).

New cards

What are dependent samples?

Dependent samples = matched pairs

Measured twice (“before and after,” same item in two stores)
Related somehow (twins, siblings, spouses)
Subjects deliberately matched to have similar characteristics (race, age)

New cards

What are independent samples?

No pairing, no connection

Both samples collected randomly

New cards

How do you make a confidence interval (to estimate the mean difference) for 2 independent samples?

Verify CLT (random, independent, Normal or both samples n ≥ 25)
Check if you’re given the summary statistics OR raw data
- Summary statistics: STAT → Tests → 0: 2-SampTInt → sample 1 mean, sample 1 st. dev, sample 1 size, sample 2 mean, sample 2 st. dev, sample 2 size, confidence level, pooled = NO!!!
- Raw data: STAT → Edit → enter data in L1 and L2 → STAT → Tests → 0: 2-SampTInt → FREQ for both = 1 → pooled is always NO!!!
- Then report the calculated interval
If entire confidence interval is positive, μ₁ is significantly larger. If it’s negative, μ₂ is significantly larger. If it contains 0 (-, +), there is no significant difference (μ₁ - μ₂ = 0).

New cards

3 interpretations for confidence interval for 2 (works for both independent and dependent) samples

New cards

How do you do a hypothesis test (to test the mean difference) for 2 independent samples?

Write the null hypothesis (ALWAYS μ₁ = μ₂), then alternative hypothesis (μ₁ ≠ or < or > μ₂). There are no numbers involved, only symbols.
Write significance level and verify CLT (random, independent, Normal or n ≥ 25)
Check if you’re given the summary statistics OR raw data
- Summary statistics: STAT → Tests → 4: 2-SampTTest → Pooled: No
- Raw data: STAT → Edit → enter data in L1 and L2 → STAT → Tests → 4: 2-SampTTest → FREQ for both = 1 → pooled is always NO!!!
- Report t-test statistic (t=) and p-value (p=)
Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
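
With Pooled set to No, the test statistic is the difference in sample means divided by an unpooled standard error. A Python sketch of that arithmetic follows; the function name and summary statistics are made up, and the degrees of freedom use the Welch-Satterthwaite approximation, which is the usual formula for the unpooled case. The p-value is left to the calculator.

```python
import math

def two_sample_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unpooled two-sample t-test of H0: mu1 = mu2."""
    a = sd1 ** 2 / n1                # variance of sample mean 1
    b = sd2 ** 2 / n2                # variance of sample mean 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximate degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical example: sample 1 (mean 10, s = 2, n = 16) vs. sample 2 (mean 8, s = 2, n = 16)
t, df = two_sample_t_statistic(10, 2, 16, 8, 2, 16)
print(round(t, 4), round(df, 2))
```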

New cards

What special thing do you do for 2 dependent samples?

Do x₁ - x₂ to turn them into “difference” scores.

New cards

How do you make a confidence interval (to estimate the mean difference) for 2 dependent samples?

Make the difference scores and enter them into STAT → Edit → L1
Verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)
STAT → Tests → 8: T-Interval → Data → Freq = 1 → Report the interval
If entire confidence interval is positive, μ₁ is significantly larger. If it’s negative, μ₂ is significantly larger. If it contains 0 (-, +), there is no significant difference (μ₁ - μ₂ = 0).
Interpret using sentence templates (search “three interpretations” in this flashcard deck, flashcard 77)

New cards

How do you do a hypothesis test (to test the mean difference) for 2 dependent samples?

Make the difference scores and enter them as a list in the calculator
Write the null hypothesis (ALWAYS μ_diff = 0, where μ_diff is the population mean of the differences), then alternative hypothesis (μ_diff ≠ or < or > 0).
- “means are not the same” → μ_diff ≠ 0
- sample 1 > sample 2 → μ_diff > 0
- sample 2 > sample 1 → μ_diff < 0
Write significance level and verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)
STAT → Tests → 2: T-Test → Data → μ₀ = 0 → Freq = 1 → Report t-test statistic (t=) and p-value (p=)
Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
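
A Python sketch of the difference-score steps, for checking the list you typed into the calculator. The function name and the before/after data are made up. It computes the differences (sample 1 - sample 2) and then a one-sample t statistic on them against 0; the p-value is left to the calculator.

```python
import math
import statistics

def paired_t_statistic(sample1, sample2):
    """Return (differences, t) for matched pairs, testing mean difference = 0."""
    diffs = [a - b for a, b in zip(sample1, sample2)]   # difference scores
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)                   # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return diffs, t

# Hypothetical example: the same 4 subjects measured before and after
before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
diffs, t = paired_t_statistic(before, after)
print(diffs, round(t, 4))
```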

New cards

Which method do you use for one-tailed and two-tailed testing?

One-tailed → “bigger or smaller” → > or < symbols → hypothesis test

Two-tailed → “by how much” → ≠ symbol → Either hypothesis test or confidence interval

Confidence intervals are PREFERRED because they give more information: both if it could or could not be a specific value AND gives plausible range of values for population parameter