1/16
one month remaining
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
chapters 10, 11, and 4
Chapter 10
I have a paper flowchart for this, please refer to it under “Chapter 10” packet for convenience
What are categorical variables?
Places people into one of several categories (young, old, yes, no, zipcode, phone #, student ID)
“Qualitative variables”
What are contingency tables?
Two-way tables that organize two categorical variables and show many times each combination of categories occurs
Var #1 in rows, Var #2 in columns (column = what you want to predict)
The numbers are “observed counts.” It does NOT make the values numerical as long as the table is [category] vs. [category].
What are expected counts?
Frequencies we would expect to see if our assumptions (null hypothesis) = true
Calculated by (row total x column total) / grand total → equation available in TI-84 under PRGM 1: EXPCTDCT
If you already have the expected rate: (sample size) x (rate in %)
What is the chi-square (x²) statistic?
Measures the amount that our expected counts differ from observed counts
Calculated by: (observed count - expected count)²/expected count + repeat for each observed count and add them all up
Always a positive number
What is the chi-square (x²) distribution?
Probability of obtaining a x² statistic as extreme or more extreme than the one observed
Right-skewed and allows only positive values. The less degrees of freedom, the more skewed it will be.
P-value is shaded to the right of the x² number on the graph.
What is a goodness of fit test?
A kind of hypothesis test that evaluates whether the results from a situation/experiment match with the expectation (null hypothesis is yes)
ONE categorical variable with 3+ categories
What are the four steps of a Goodness of Fit Test?
Write null and alternative hypothesis (H0 = population is same as expected; HA = distributions are different)
State significance level, verify random sample, independent observations, and large sample size (E ≥ 5)
STAT → Edit → enter observed counts in L1, expected counts in L2 (expected counts are given) → STAT → TESTS → D: x²GOF-Test → enter observed in L1, expected in L2 → df is (# of categories - 1) → report x² and p-value
Reject/fail to reject the null. Same criteria as hypothesis test: compare to significance level (usually 0.05)
What is a chi-square test for association?
Checks whether TWO categorical variables are independent or associated
Observational studies can determine association. ONLY experiments can determine causality.
Test of homogeneity vs. test of independence
Homogeneity: 2+ independent samples and 1 categorical response variable, or 1 sample and randomly assign the individuals to 2+ groups
Independence: 1 random sample and 2 categorical response variables
Both test for association
What are the 4 steps for a test of association?
Write null and alternative hypothesis (H0 = variables are independent/no association/no prediction ability; HA = variables are dependent/is association/ability to predict).
State significance level, verify random sample, independent observations AND samples, and smallest expected count in the matrix in [STEP 3] is at least 5
2nd → (x-1) MATRIX → EDIT → 1 → enter data → 2nd →(x-1) MATRIX → EDIT → 2 → make it look like Matrix A
Then: STAT → Tests → C: x²-test → Observed = [A], Expected = [B] → report x² and p-value
Then: 2nd → (x-1) MATRIX → 2: [B] → verify all numbers are at least 5
Reject/fail to reject the null (same criteria as hypothesis test: compare to significance level, if lesser, then reject)
What are the special characteristics and limitations of a test of association?
If both categorical variables have only 2 categories, the test of homogeneity is identical to a 2-tailed test of 2 proportions
In test of independence, df = (# rows - 1) x (# columns - 1)
CON: The tests tell us if 2 variables are associated but don’t tell us how they’re associated
What is ANOVA used for?
Testing 3 or more population means & the variance between and within them
Tests whether there is an association between a categorical and numerical variable
What is the F-statistic?
Test statistic for ANOVA test
Computed by (variation between groups/variation within groups)
Large F means more variation between groups than within them & discredits the null hypothesis of equal means
What are the 4 steps for conducting an ANOVA test?
Write null and alternative hypothesis (H0 = no association/no predictive ability/μ1 = μ2 = μ3 = … μk [all means are the same]; HA = is an association/predictive ability/at least one of the means is not equal to another).
State significance level, verify random sample, independent observations AND samples, equal variance (largest SD/smallest SD < 2) NEW!!!, normal distribution (given or each n ≥ 25)
STAT → Edit → Enter sample data in L1/L2/L3, STAT → Tests → H: Anova (L1, L2, L3) → Report f-test statistic and p-value
Then: 2nd → (x-1) MATRIX → verify all numbers are at least 5
Reject/fail to reject the null
What are the limitations of ANOVA?
Tells you one of the means is different, but not which one