stats ch. 7, 8, and 9 flashcards (+ key to letters)

Description and Tags

Sadistic Torture Across Two Semesters

77 Terms

1
New cards

Key to all letters used in stats?

p = population proportion, yeses / total population. Usually a decimal like 0.6. → alternatively, in 1-PropZTest (hypothesis test), it represents p-value.

p̂ = sample proportion, yeses / sample size. Also called p-hat. Usually a decimal like 0.6

x = amount of “yeses”

n = total sample size

Po = assumed population proportion in the null hypothesis. This is what you guess before you do the experiment.

H0 = null hypothesis

HA = alternative hypothesis

μ = population mean

x-bar = sample mean

σ = population standard deviation

s = sample standard deviation

α = significance level

2
New cards

What is a population?

Group of individuals we wish to study (eg. all students at WVC).

3
New cards

What is a parameter?

A number that describes the whole population, such as a population proportion or population mean. Represented by p (proportion) or μ (mean). eg. proportion of all WVC students who work part-time.

4
New cards

What is a census?

A survey of the entire population. However, this is usually unrealistic because the population is too big.

5
New cards

What is a sample?

Collection of individuals taken from population of interest.

6
New cards

What is a statistic?

A number calculated from the sample, like a median, sample mean (x-bar), or sample proportion. The sample proportion is number of “yeses” / sample size and is represented by p-hat.

7
New cards

Which value can never be found, statistic or parameter?

Parameter. You can find the value of a statistic by collecting data, but you can only make inferences about the parameter (generalizations about the whole population).

8
New cards

What is sampling bias?

You get a sample that doesn’t accurately represent the whole population. eg. you survey a very opinionated population

9
New cards

What is voluntary-response bias?

Sampling bias where people only respond if they feel strongly about the results (miku fans will respond to surveys about how much they like miku. non-miku fans won’t fill it out at all)

10
New cards

What is nonresponse bias?

Sampling bias where people asked to do the survey refuse to fill it out. (might ask for uncomfortable info.)

11
New cards

What is measurement bias?

Survey questions do not produce true answers b/c confusing wording or misleading questions

12
New cards

What questions can you ask to determine if a survey is biased?

  1. What % of people who were asked to participate actually did so?

  2. Did the researchers choose people to participate, or did the people themselves choose to participate?

  3. Did the researcher leave out whole segments of the population who are likely to answer the question differently from the rest of the population? (eg. only survey northern Californians, not southern ones)

13
New cards

How do you get a random number from the calculator?

assign a number to each and every member of the population

MATH → PRB → 5 → randInt (1, population size, amount of results you want)

pro: minimizes bias as long as individuals are selected without replacement (ie. don’t choose the number 3 twice, skip numbers that appear twice)
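For comparison outside the calculator, here is a minimal Python sketch of the same idea. The function name, population size, and seed are made up for illustration. It uses `random.sample`, which draws without replacement, so there are no repeated numbers to skip.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)  # the seed is only here to make the demo repeatable
    # random.sample draws WITHOUT replacement, so no ID can appear twice
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical example: 500 people in the population, choose 10 of them
ids = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(ids))
```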

14
New cards

What is the difference between accuracy and precision?

Accurate → how close you are to the target value → measured by how unbiased you are → fixed by getting a random sample

Precise / variation→ how close the values are to one another → measured by size of standard error (the smaller the better) → fixed by getting a larger sample

15
New cards

What is a sampling distribution?

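Take a bunch of samples of the same size and calculate the same statistic (like the mean or the proportion) for each one. Then graph those values. This is the sampling distribution of that sample statistic. It is approximately Normal when the CLT conditions are met.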

16
New cards

How do you calculate the population proportion based on the sampling distribution?

The mean of the sampling distribution of p-hat (the sample proportion) is equal to the population proportion p.

17
New cards

How do you calculate shape, center, and spread for sampling distribution?

Shape → check CLT. If all 3 conditions check, shape is Normal and you can make inferences. Else, can’t use normalcdf to calculate probability, stop calculating.

Center → mean.

  • Mean of sampling distribution = population proportion

  • Mean of sample statistic = population parameter

Spread → standard error → √[p(1 - p) / n] where p stands for population proportion AKA what % of the population is “yes” and n is the sample size (I put this on the TI-84 under PRGM 5: STDERROR)
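The standard error formula can be checked with a short Python sketch. The values p = 0.6 and n = 100 are made-up example numbers, and the function name is hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: population proportion 0.6, sample size 100
se = standard_error(0.6, 100)
print(round(se, 4))
```

A bigger n makes the result smaller, which matches "larger sample → better precision."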

18
New cards

Normal distribution vs. sampling distribution

Normal distribution → mean at μ (population mean) → axis on x (individual values) → calculate standard deviation

Sampling distribution → mean at p (population proportion) → axis on p-hat (sample proportion of yeses) → calculate standard error → way smaller spread than the population distribution, but centered at the population parameter

19
New cards

Standard deviation vs. standard error?

Standard deviation = variation of your sample around the mean

Standard error = you take the mean of a bunch of samples. Then you calculate the standard deviation of those means.

20
New cards

Criteria for CLT (Central Limit Theorem) for population proportions?

CLT tells you whether the sampling distribution is approximately Normal. Only if it is can you run tests on it.

  1. Random sample

  2. Large sample (at least 10 yeses and 10 nos)

  3. Large population (at least 10x sample size)

21
New cards

When do you use confidence interval vs. hypothesis test?

Confidence interval when you don’t know the past value or estimate for population proportion

Hypothesis test if you already know the past value

Note that confidence intervals give more info than hypothesis tests b/c they tell you BOTH if a parameter could be that specific value AND give a plausible range of values. A hypothesis test only tells you whether or not there’s significant evidence against the null hypothesis.

22
New cards

Point estimate vs. interval estimate

Point estimate = single number, like a sample proportion or mean, that is our “best initial guess” for the parameter

Interval estimate = interval of numbers within which the parameter value is believed to fall

23
New cards

What is a confidence interval?

Interval containing the most plausible values for a parameter. Written like (point estimate) ± (margin of error).

24
New cards

How do you calculate confidence interval on a calculator?

  1. Verify CLT:

    1. Random sample

    2. Large sample (at least 10 yeses and 10 nos)

    3. Large population (at least 10x sample size)

  2. STAT → TESTS → A: 1-PropZInt

    1. x = number of yeses

    2. n = sample size

    3. c-level = confidence level

  3. report the interval and write a sentence interpreting the interval (We are ___ % confident that the population proportion of all ____ that _____ is between ___ % and ___ %).
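The steps above can be sketched in Python as a rough cross-check of the calculator output. This assumes the usual large-sample formula p-hat ± z* × √[p-hat(1 - p-hat)/n]. The function name and the numbers (120 yeses out of 200) are made up for illustration. The random-sample and large-population conditions still have to be checked by hand.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x, n, c_level):
    """Confidence interval for one population proportion."""
    # Large-sample condition: at least 10 yeses and 10 nos
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 yeses and 10 nos")
    p_hat = x / n                                      # sample proportion
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)   # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    margin = z_star * se                               # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical example: 120 yeses in a sample of 200, 95% confidence
low, high = one_prop_z_interval(x=120, n=200, c_level=0.95)
print(f"({low:.4f}, {high:.4f})")
```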

25
New cards

What is margin of error for population proportions?

z-score * standard error

Multiply the following SE by z-score to get margin of error:

99% confidence level = 2.58 standard errors

95% confidence level = 1.96 (shortcut method: 2) standard errors

90% confidence level = 1.645 standard errors

80% confidence level = 1.28 standard errors
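The critical values in this list can be reproduced with Python's standard library instead of a table. This is only a sketch, and the function name `z_star` is made up. The idea is that a confidence level C leaves (1 - C)/2 in each tail of the standard Normal curve.

```python
from statistics import NormalDist

def z_star(c_level):
    """Critical z value that captures the middle c_level of the Normal curve."""
    return NormalDist().inv_cdf((1 + c_level) / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z* = {z_star(level):.3f}")
```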

26
New cards

How do you know if confidence interval means the majority likes something?

ENTIRE confidence interval has to be over 50% / 0.5. Basically, even if it’s (0.49, 0.51) you can’t claim a majority because 0.49 < 0.50.

27
New cards

What is a confidence level?

Probability that the confidence intervals created with this process contain the true parameter. Basically: if I create a bunch of confidence intervals, what % of them capture the true value?

Does NOT apply to a single confidence interval. That one either captures the true value (100%) or doesn’t (0%).

Confidence level is a number chosen to be close to 1, most commonly 0.95.

confidence level * # of intervals = expected # of intervals that capture the true value

28
New cards

Correlation between confidence level, margin of error, and sample size?

Increase confidence level → increase margin of error and increase width of confidence interval (same effect as decreasing the sample size)

Decrease confidence level → decrease margin of error and decrease width of confidence interval (same effect as increasing the sample size)

29
New cards

If you know you want a certain margin of error, how do you find what sample size you need?

sample size n = (z*/m)² * ¼ where z is the critical number from the critical value table (search z-score in this flashcard deck)

Always round up to the nearest whole number EVEN IF decimal is small (eg. 1.01 → 2)

Equation for sample size is registered as a program in the TI-84 under PRGM

30
New cards

What is the sample size short-cut formula?

Special case: you want a 95% confidence level so you can suppose critical value Z* ~ 2 (in reality it’s 1.96)

Short-cut formula: n = 1/m² where m is the margin of error you want and n is the sample size needed to get that margin of error
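Both sample-size formulas (the general one with z* and the 95% short-cut) can be compared in a small Python sketch. The function names and the 3% margin of error are made up for illustration. `math.ceil` does the "always round up" step.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, c_level=0.95):
    """n = (z*/m)^2 * 1/4, rounded UP to the next whole number."""
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)
    return math.ceil((z_star / margin) ** 2 * 0.25)

def sample_size_shortcut(margin):
    """95% short-cut: n = 1/m^2 (treats z* as 2), rounded UP."""
    return math.ceil(1 / margin ** 2)

# Hypothetical example: desired margin of error of 3 percentage points
print(sample_size_for_proportion(0.03))  # uses z* = 1.96
print(sample_size_shortcut(0.03))        # uses z* = 2, so slightly larger
```

The short-cut gives a slightly bigger n because it rounds 1.96 up to 2.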

31
New cards

Which proportions can be used to draw conclusions?

Population proportion. Conclusions (inferences) are always stated about the population proportion, never about the sample proportion.

32
New cards

What are the conditions for confidence interval for 2 populations?

  1. Random sample OR individuals are randomly assigned & all other conditions are met. IF random assignment THEN specify for this condition: YES— BY RANDOM ASSIGNMENT

  2. Samples are independent of one another (process of selecting 1 sample doesn’t affect selection of the other)

  3. Large sample for BOTH (at least 10 yeses and 10 nos)

  4. Large population for BOTH (at least 10x sample size)

33
New cards

How do you calculate a confidence interval for 2 proportions on the TI-84?

  1. STAT → TESTS → B: 2-PropZInt

    • x1 = # of yeses for population 1

    • n1 = sample size population 1

    • x2 = # of yeses for population 2

    • n2 = sample size population 2

  2. Calculate and report the interval. Search “interpret confidence interval for two populations” in this flashcard deck to find how to interpret the interval.
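As a rough cross-check of 2-PropZInt, here is a Python sketch that assumes the standard unpooled formula (p-hat1 - p-hat2) ± z* × √[p-hat1(1 - p-hat1)/n1 + p-hat2(1 - p-hat2)/n2]. The function name and the counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, c_level):
    """Confidence interval for p1 - p2 (difference of two proportions)."""
    p1, p2 = x1 / n1, x2 / n2                       # the two sample proportions
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)
    # Each sample contributes its own variance term (no pooling for intervals)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical example: 60/100 yeses in sample 1, 45/100 yeses in sample 2
low, high = two_prop_z_interval(60, 100, 45, 100, c_level=0.95)
print(f"({low:.4f}, {high:.4f})")
```

In this made-up example both ends are positive, which is the (+,+) case.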

34
New cards

How do you interpret a confidence interval for two populations?

(+,+) → Population 1 is significantly larger

  • We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 1) than it is for (population 2).

(-,-) → Population 2 is significantly larger

  • We are ___ % confident that the proportion of (yes) is between ___% and ___% significantly larger for (population 2) than it is for (population 1).

(-,+) → No significant difference between populations (contains 0)

  • We are ___ % confident that there is no significant difference in the proportion of (yeses, population 1) and the proportion of (yeses, population 2).

35
New cards

What do hypotheses in hypothesis testing describe?

Population parameters. NEVER sample statistics.

36
New cards

What are the 2 hypotheses?

Null: H0: p = P0

Alternative: HA

p > P0 , p < P0 , p ≠ P0

37
New cards

What is a significance level?

How okay you are with making a mistake. Usually 0.05, given by alpha (α).

It is the probability of making a type I error: rejecting the null when the null is true / concluding the alternative hypothesis is true when in fact it is not true (WORST kind of error)!!

38
New cards

What is a test statistic?

How many standard errors the observed proportion is above/below the null hypothesis value (P0). The farther it is from 0, the more evidence you have against the null. Represented by z (like z-score).

Only use if the data passes CLT.

1-proportion test statistic can be found by running 1-PropZTest on the TI-84 (DON’T use the one programmed into PRGM).

If test-stat is over 2 or under -2, it’s unusual and you can reject the null. If it’s closer to 0, not unusual and fail to reject the null.

39
New cards

What is a p-value?

How likely data like this would be if the null were true / probability of obtaining a test statistic as extreme or more extreme than the one we actually observed, assuming the null is true / “surprise” in the sample data if the null is true. Represented by p.

Small p-value → large z-test statistic → data isn’t likely to be the same as expected → reject the null

Large p-value → small z-test statistic → data is pretty likely to be the same as expected → don’t reject the null

40
New cards

What is the relationship between p-value and significance level?

p < significance level → enough evidence to reject the null

p > significance level → not enough evidence. don’t reject the null

41
New cards

What are the 4 steps for hypothesis testing?

  1. Write the null and alternative hypotheses

  2. Choose a significance level and check CLT

    1. Random sample

    2. Large sample (at least 10 yeses and 10 nos)

    3. Large population (at least 10x sample size)

  3. STAT → TESTS → 5: 1-PropZTest → find the z (test statistic) and p (p-value)

    • p0 = assumed proportion from the null hypothesis (P0)

    • x = number of “yes”

    • n = sample size

    • Prop = (≠, <, or >) P0

  4. Interpret that you either reject or fail to reject the null hypothesis (is p-value bigger or lesser than significance level?) Use sentence template.
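Step 3 above can be sketched in Python to see where z and the p-value come from. This assumes the usual formula z = (p-hat - P0) / √[P0(1 - P0)/n]. The function name, the `alternative` argument, and the numbers are made up for illustration.

```python
import math
from statistics import NormalDist

def one_prop_z_test(p0, x, n, alternative):
    """Return (z, p_value). alternative is '<', '>' or '!=' (the HA sign)."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error uses the NULL value p0
    z = (p_hat - p0) / se               # test statistic
    left_area = NormalDist().cdf(z)     # area to the left of z
    if alternative == "<":
        p_value = left_area
    elif alternative == ">":
        p_value = 1 - left_area
    else:                               # two-tailed: double the smaller tail
        p_value = 2 * min(left_area, 1 - left_area)
    return z, p_value

# Hypothetical example: H0: p = 0.5, HA: p > 0.5, 60 yeses out of 100
z, p_value = one_prop_z_test(p0=0.5, x=60, n=100, alternative=">")
alpha = 0.05
print(round(z, 3), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```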

42
New cards

What are the “tailed” tests?

Right tailed test: Result is bigger than expected (p > Po). The right part of the normal curve is shaded, representing the p-value.

Two-tailed test: Result is not equal to what is expected (p ≠ Po). The p-value is double what it would be on right- and left- tailed tests, and is shaded on the end of both sides of the normal curve. IF you got the p-value from calculator, DON’T double it, it’s already right! ☆

Left-tailed test: Result is smaller than expected (p < Po). The left part of the normal curve is shaded, representing the p-value.

43
New cards

What is a sampling distribution?

The probability distribution of a sample statistic. Approximately Normal when the CLT conditions are met, even if the population distribution is skewed. Graphed on x-bar (sample means) scale, not x-scale like the population distribution.

44
New cards

1-Prop ZTest vs. 2-Prop ZTest

same thing but 2-Prop has 2 population proportions, therefore 2 sample sizes and 2 sample proportions (girls vs. boys, new vs. old)

45
New cards

What are the null and alternative hypotheses for 2-Prop ZTest?

Null hypothesis: H0: p1 = p2 AKA p1 - p2 = 0

Alternative hypothesis: HA

  • Left-tailed test: p1 < p2

  • Right-tailed test: p1 > p2

  • Two-tailed test: p1 ≠ p2

*no numbers in these hypotheses! Only comparing the two proportions against each other!

46
New cards

How do you calculate 2-Prop ZTest?

  1. Write the null and alternative hypotheses (search 2-Prop ZTest in this flashcard deck)

  2. Choose a significance level and check CLT

    1. Random sample (assume true if not given)

    2. Samples are independent of one another (selection of one doesn’t affect selection of the other)

    3. Large sample (at least 10 yeses and 10 nos)

      • Calculate pooled sample proportion: p̂ = (pop #1 yes + pop #2 yes)/(pop1 sample size + pop2 sample size)

      • For each population, do the following:

        → p̂*sample size ≥ 10

        → (1 - p̂)*sample size ≥ 10

    4. NO need for large population!!

  3. STAT → TESTS → 6: 2-PropZTest → find the z (test statistic) and p (p-value)

    • x1 = pop1 amount of “yes”

    • n1 = pop1 sample size

    • x2 = pop2 amount of “yes”

    • n2 = pop2 sample size

    • p1 = (≠, <, or >) p2

  4. Interpret that you either reject or fail to reject the null hypothesis (is p-value bigger or lesser than significance level?) Use sentence template.
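The pooled proportion and the test statistic from the steps above can be sketched in Python. This assumes the standard pooled formula z = (p-hat1 - p-hat2) / √[p̂(1 - p̂)(1/n1 + 1/n2)], where p̂ is the pooled sample proportion. The function name and the counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative):
    """Return (z, p_value) for H0: p1 = p2. alternative is '<', '>' or '!='."""
    pooled = (x1 + x2) / (n1 + n2)      # pooled sample proportion
    # Large-sample check with the pooled proportion, for BOTH samples
    for n in (n1, n2):
        if n * pooled < 10 or n * (1 - pooled) < 10:
            raise ValueError("large-sample condition not met")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    left_area = NormalDist().cdf(z)
    if alternative == "<":
        p_value = left_area
    elif alternative == ">":
        p_value = 1 - left_area
    else:                               # two-tailed
        p_value = 2 * min(left_area, 1 - left_area)
    return z, p_value

# Hypothetical example: 60/100 vs. 45/100, two-tailed
z, p_value = two_prop_z_test(60, 100, 45, 100, alternative="!=")
print(round(z, 3), round(p_value, 4))
```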

47
New cards

Z-Distribution vs. T-Distribution

Z-Distribution = used for population PROPORTIONS. large sample sizes. you know the population standard deviation. AKA Normal Distribution

T-Distribution = used for population MEANS. small sample sizes. you don’t know the population standard deviation (only the sample standard deviation). Shorter and wider than Normal Distribution to account for extra error b/c you estimate σ with s (degrees of freedom = n - 1)

48
New cards

How do you calculate population mean?

Average of all sample means

49
New cards

Spread of sample mean vs. spread of population mean

The spread of sample means is WAY SMALLER than the spread of the population (σ/√n vs. σ)

50
New cards

What happens to sample means when you increase sample size?

Graph gets narrower (bigger sample size → better precision). Accuracy does not change (only impacted by level of bias).

51
New cards

How do you calculate the standard error of a sample mean?

population standard deviation / √n (where n is the sample size). This equals the standard deviation of all sample means.

52
New cards

How do you calculate the mean of a sample mean?

Mean = same as population mean

53
New cards

How do you compute the z-test statistic?

Do 1 or 2-Prop ZTest (depending on which is appropriate) then look for z=

54
New cards

What are “tailed tests”?

Right-tail test → Ha: P > P0 → “result is as extreme or more extreme than hypothesis” → Z is on the right of P0, and everything to the right of Z is shaded

Left-tail test → Ha: P < P0 → Z is on the left of P0, and everything to the left of Z is shaded

2-tail test → Ha: P ≠ P0 → Z is on both sides of P0. The shaded part is twice as big as it is on a 1-tailed graph, and everything from the left and right are shaded (NOT in between).

*on all of these, the graph is the sampling distribution (the standard error graph, on a scale of p-hat). P0 (the null hypothesis value) is the middle of the graph, where the mean on a standard deviation graph would be. The shaded part represents the p-value. The smaller the p-value, the more evidence you have to discredit the null hypothesis.

55
New cards

What is skew?

|||IIIIIiiii—- = right skew (it’s flat on the right)

—iiiiIIIIII|||| = left skew (it’s flat on the left)

bell-shaped or uniform = no skew

56
New cards

What are the two distributions that look similar?

Population distribution → distribution of values from the population → has a certain shape, center, and spread → but value of its parameters are generally unknown → graphed on x-axis

Distribution of the sample (if random and large) looks the same as population distribution → has the same shape, center, and spread → described by sample statistics → also graphed on x-axis

57
New cards

Which flavor of distribution is not like the other girls :D

Sampling distribution → the probability distribution of a sample statistic (basically the distribution of all the sample means) → describes how close the sample stat is to the population parameter → graphed on x-bar (sample mean) axis

looks NOTHING LIKE population distribution and distribution of the sample!! also usually narrower; its standard deviation = standard error of a sample mean = population standard deviation / √n

58
New cards

What are the CLT conditions for sample means?

  • Random

  • EITHER Normal or sample size ≥ 25

  • Large population (at least 10 times sample size) (use sentence from the template)

59
New cards

Equation for sampling distribution of sample means

N ( μ , σ/√n)
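A quick simulation can make N(μ, σ/√n) more concrete. The sketch below is only an illustration with made-up settings: the population is a right-skewed exponential distribution with μ = 1 and σ = 1, each sample has n = 25, and 5000 samples are drawn. The sample means should center near μ with a spread near σ/√n = 0.2.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    # expovariate(1.0): skewed population with mean 1 and standard deviation 1
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=25, num_samples=5000)
print(statistics.fmean(means))   # should land close to mu = 1
print(statistics.stdev(means))   # should land close to sigma / sqrt(n) = 0.2
```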

60
New cards

Equation for z-score

(sample mean - population mean) / standard error

61
New cards

How to find t-statistic with a calculator?

T-Test (STAT → TESTS → 2: T-Test → look for t=)

62
New cards

What makes a confidence interval narrower?

Larger sample size → smaller t-score AKA T* (critical value) and smaller standard error → less variation

Lessen confidence level

63
New cards

What makes a confidence interval wider?

Smaller sample size → larger t-score (critical value) and larger standard error → more variation

Increase confidence level

64
New cards

What is margin of error for population means?

t-score x standard error (VS population proportion using z-score x standard error)

65
New cards

Can you know the standard error for a population mean?

No. This would require you know the value of the population standard deviation, but you don’t. So, you substitute in the sample standard deviation and use that to estimate standard error instead.

66
New cards

How do you calculate confidence intervals for a population mean?

  1. Verify CLT (random sample, Normal distribution or n ≥ 25, large population 10x n)

  2. STAT → Tests → 8: TInterval → either Stats (if given summary stats) or Data (if given raw data in a table- Frequency is ALWAYS 1!!!). Report that interval.

  3. Interpret
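The TInterval calculation can be sketched in Python as x-bar ± t* × s/√n. Python's standard library has no t-distribution, so this sketch assumes you look up t* in a t-table and pass it in. The function name and the data are made up for illustration. With only 10 values, the population is assumed to be Normal.

```python
import math
import statistics

def t_interval(data, t_star):
    """Confidence interval for a population mean, given raw data and t*."""
    n = len(data)
    x_bar = statistics.fmean(data)      # sample mean
    s = statistics.stdev(data)          # sample standard deviation
    se = s / math.sqrt(n)               # estimated standard error
    margin = t_star * se                # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical raw data (n = 10, so df = 9)
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
# t* = 2.262 is the t-table value for 95% confidence with df = 9
low, high = t_interval(data, t_star=2.262)
print(f"({low:.3f}, {high:.3f})")
```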

67
New cards

How do you calculate sample size for population means?

n = [(2 x population standard deviation, or an estimate of it) / desired margin of error]²

*Always ROUND UP to NEXT whole number!! (eg. 72.01 → 73)
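This formula is short enough to check with a few lines of Python. The function name and the numbers (standard deviation 12, margin of error 5) are made up for illustration. The 2 in the formula is the rounded 95% critical value.

```python
import math

def sample_size_for_mean(sd, margin):
    """n = (2 * sd / m)^2, rounded UP to the next whole number."""
    return math.ceil((2 * sd / margin) ** 2)

# Hypothetical example: standard deviation 12, desired margin of error 5
print(sample_size_for_mean(sd=12, margin=5))
```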

68
New cards

How do you calculate hypothesis testing for a population mean?

  1. Hypothesize (H0 is always μ = hypothesized population mean)

  2. State significance level, verify CLT (ONLY random and Normal / n ≥ 25 !!! NO large populations requirement)

  3. STAT → TESTS → 2: T-Test → report t-test statistic and p-value

  4. Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.

69
New cards

How do you know if you can use a 2-sided test and if the results of the confidence interval will match with the results of the hypothesis test?

If significance level + confidence level add up to 100% (0.05 → 5% significance level + 95% confidence level).

70
New cards

What are dependent samples?

Dependent samples = matched pairs

  • Measured twice (“before and after,” same item in two stores)

  • Related somehow (twins, siblings, spouses)

  • Subjects deliberately matched to have similar characteristics (race, age)

71
New cards

What are independent samples?

No pairing, no connection

  • Both samples collected randomly

72
New cards

How do you make a confidence interval (to estimate the mean difference) for 2 independent samples?

  1. Verify CLT (random, independent, Normal or n ≥ 25)

  2. STAT → Tests → 0: 2-SampTInt → either Stats or Data → Pooled is always NO!!! → Report the calculated interval

  3. If entire confidence interval is positive, μ1 is significantly larger. If it’s negative, μ2 is significantly larger. If it contains 0 (-, +), there is no significant difference (μ1 - μ2 = 0).

73
New cards

How do you do a hypothesis test (to test the mean difference) for 2 independent samples?

  1. Write the null hypothesis (ALWAYS μ1 = μ2), then alternative hypothesis. There are no numbers involved, only symbols.

  2. Write significance level and verify CLT (random, independent, Normal or n ≥ 25)

  3. STAT → Tests → 4: 2-SampTTest → Pooled: No → report t-test statistic (t=) and p-value (p=)

  4. Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
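With Pooled set to No, the test statistic is usually computed with the unpooled (Welch) formula. The Python sketch below assumes that formula together with the Welch-Satterthwaite degrees of freedom. The function name and the summary statistics are made up for illustration. The p-value still needs a t-distribution (calculator or table), so the sketch stops at t and df.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) for comparing two independent sample means, unpooled."""
    v1 = s1 ** 2 / n1                   # variance of sample mean 1
    v2 = s2 ** 2 / n2                   # variance of sample mean 2
    se = math.sqrt(v1 + v2)             # standard error of (x-bar1 - x-bar2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary stats: sample 1 (mean 75, s 8, n 30), sample 2 (mean 70, s 10, n 30)
t, df = welch_t(75, 8, 30, 70, 10, 30)
print(round(t, 3), round(df, 2))
```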

74
New cards

What special thing do you do for 2 dependent samples?

Do x1 - x2 to turn them into difference scores.

75
New cards

How do you make a confidence interval for 2 dependent samples?

  1. Make the difference scores and enter them as a list in the calculator

  2. Verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)

  3. STAT → Tests → 8: TInterval → Data → Freq = 1 → Report the interval

  4. If entire confidence interval is positive, μ1 is significantly larger. If it’s negative, μ2 is significantly larger. If it contains 0 (-, +), there is no significant difference (μ1 - μ2 = 0).

76
New cards

How do you do a hypothesis test for 2 dependent samples?

  1. Make the difference scores and enter them as a list in the calculator

  2. Write the null hypothesis (ALWAYS μ1 = μ2), then alternative hypothesis. There are no numbers involved, only symbols.

  3. Write significance level and verify CLT (Random and Normal or n ≥ 25, NO large populations requirement!!)

  4. STAT → Tests → 2: T-Test → Data → μ0 = 0 → Freq = 1 → Report t-test statistic (t=) and p-value (p=)

  5. Interpret if p-value is more than or less than significance level (α). Reject null if p-value is less than α.
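The difference-score idea can be sketched in Python: subtract the pairs, then run a one-sample t calculation on the differences with μ0 = 0. The function name and the data are made up for illustration. With only 5 pairs, the differences are assumed to come from a Normal population. The p-value still comes from the calculator or a t-table.

```python
import math
import statistics

def paired_t_statistic(sample1, sample2):
    """Return (t, df) for matched pairs, testing mean difference = 0."""
    diffs = [x1 - x2 for x1, x2 in zip(sample1, sample2)]  # difference scores
    n = len(diffs)
    d_bar = statistics.fmean(diffs)                 # mean of the differences
    s_d = statistics.stdev(diffs)                   # sample SD of the differences
    t = (d_bar - 0) / (s_d / math.sqrt(n))          # null value mu0 = 0
    return t, n - 1

# Hypothetical matched pairs (same 5 subjects measured twice)
sample1 = [85, 90, 78, 92, 88]
sample2 = [80, 86, 77, 85, 84]
t, df = paired_t_statistic(sample1, sample2)
print(round(t, 3), df)
# For df = 4, the two-tailed 5% t-table critical value is 2.776
print("reject H0" if abs(t) > 2.776 else "fail to reject H0")
```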

77
New cards

Which method do you use for one-tailed and two-tailed testing?

One-tailed: hypothesis test

Two-tailed: Either hypothesis test or confidence interval (but confidence intervals are preferred because they give more information: both whether the parameter could be a specific value AND a plausible range of values for the population parameter)