one-sample t-interval for μ
x̄ ± t* (s/√n), df = n-1
one-sample t-test for μ
t = (x̄ - μ) / (s / √n), df = n - 1
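A minimal sketch of the one-sample t statistic, assuming the data are in a plain Python list. The function name `one_sample_t` and the sample values are made up for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data, used only to show the call
t, df = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu0=5.0)
print(round(t, 3), df)
```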
one-sample z-interval for p
p̂ ± z* √(p̂(1-p̂) / n)
one-sample z-test for p
z = (p̂ - p) / √(p(1-p) / n)
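The one-sample formulas for p can be sketched with the standard library. The function names are hypothetical, the counts are made up, and the p-value is assumed to be two-sided.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """p-hat ± z* sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def one_prop_z_test(successes, n, p0):
    """z statistic and two-sided p-value (assumption) for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials
print(one_prop_z_interval(60, 100))
print(one_prop_z_test(60, 100, p0=0.5))
```

Note that the interval uses p̂ in the standard error while the test uses the hypothesized p.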
two-sample t-interval for μ1- μ2
(x̄1 - x̄2) ± t* √( (s1²/n1) + (s2²/n2) )
two-sample t-test for μ1- μ2
t = (x̄1 - x̄2) / √( (s1²/n1) + (s2²/n2) )
two-sample z-interval for p1 - p2
(p̂1 - p̂2) ± z* √( (p̂1(1-p̂1) / n1) + (p̂2(1-p̂2) / n2) )
two-sample z-test for p1 - p2
z = (p̂1 - p̂2) / √( (p̂c(1-p̂c) / n1) + (p̂c(1-p̂c) / n2) ), where p̂c = (X1 + X2) / (n1 + n2)
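A short sketch of the pooled two-sample z statistic, assuming the null hypothesis is p1 - p2 = 0. The function name and the counts are hypothetical.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled (combined) proportion
    se = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p1_hat - p2_hat) / se

# Hypothetical data: 45 of 100 successes versus 30 of 100
print(round(two_prop_z(45, 100, 30, 100), 3))
```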
χ² test for homogeneity/independence
χ² = ∑ (observed - expected)² / expected, df = (rows - 1)(columns - 1)
χ² goodness of fit test
χ² = ∑ (observed - expected)² / expected, df = number of categories - 1
t-interval for slope
b ± t* SEb, df = n - 2
t-test for slope
t = (b - β) / SEb
point estimate
if a confidence interval is (A, B), the point estimate is the average of A and B, or the exact center of the confidence interval
margin of error
critical value * standard error of statistic, or B - (the point estimate) for an interval (A, B)
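Working backward from an interval (A, B) can be sketched as follows. The function name is hypothetical.

```python
def estimate_and_margin(a, b):
    """Recover the point estimate and margin of error from an interval (a, b)."""
    point_estimate = (a + b) / 2   # center of the interval
    margin = b - point_estimate    # half the interval width
    return point_estimate, margin

print(estimate_and_margin(2.0, 6.0))  # (4.0, 2.0)
```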
power
the probability a test will correctly reject the null hypothesis, given the alternative hypothesis is true
type 1 error
when the null hypothesis is true and rejected (false positive)
type 2 error
when the alternative hypothesis is true and the null hypothesis is not rejected (false negative)
interpret the confidence interval
We are C% confident that the confidence interval from [A] to [B] captures the population parameter (in context)
interpret the confidence level
In repeated random sampling with the same sample size, approximately C% of confidence intervals created will capture the population parameter.
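The meaning of the confidence level can be checked with a small simulation. This is only a sketch: it assumes a one-sample z-interval for p, a made-up true proportion of 0.5, and a fixed random seed, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(p=0.5, n=100, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals for p that capture the true p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # should land near the confidence level
```

Each individual interval either captures p or does not. The confidence level describes the long-run capture rate across many samples.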
interpret the p-value
a p-value is the probability of obtaining a test statistic as extreme or more extreme than the observed test statistic when the null hypothesis is assumed to be true
unbiased estimator
when estimating a population parameter, a statistic is unbiased if the center of the sampling distribution for the statistic is equal to the population parameter
conditions for a one-sample t-test and t-interval for μ
random: data comes from a random sample
10%: when sampling without replacement, n < 10% of the population size
normal: population distribution is normal, large sample (n > 30), or a dotplot of the sample data shows no strong skewness or outliers
conditions for a one-sample z-test and z-interval for p
random: data comes from a random sample
10%: when sampling without replacement, n < 10% of the population size
large counts: np > 10 and n(1-p) > 10 for a test, np̂ > 10 and n(1-p̂) > 10 for an interval
conditions for a two-sample t-test and t-interval for μ1 - μ2
random: data come from independent random samples or 2 groups in a randomized experiment
10%: when sampling without replacement, n < 10% of the population size for both samples
normal: for both populations, either the population distribution is normal, large sample (n > 30), or a dotplot of the sample data shows no strong skewness or outliers
conditions for a two-sample z-test and z-interval for p1 - p2
random: data come from independent random samples or 2 groups in a randomized experiment.
10%: when sampling without replacement, n < 10% of the population size for both samples
large counts: n1p̂c > 10, n1(1-p̂c) > 10, n2p̂c > 10, n2(1-p̂c) > 10 for a test, n1p̂1 > 10, n1(1-p̂1) > 10, n2p̂2 > 10, n2(1-p̂2) > 10 for an interval.
conditions for a χ² test
random: data from a random sample, separate random samples, or groups in a randomized experiment
10%: when sampling without replacement, n < 10% of the population size for all samples
large counts: all expected counts must be at least 5
conditions for a t-test or t-interval for slope
linear: true relationship between the variables is linear
independent observations, 10% condition when sampling without replacement
normal: responses vary normally around the regression line for all x-values
equal variance around the regression line for all x-values
random: data from a random sample or randomized experiment
LINER
why do we check conditions?
random: so we can generalize to the population from which the sample was selected
10%: so sampling without replacement is okay and we can use the stated formula for standard deviation
normal/large sample: so the sampling distribution is approximately normal
parameter
a number that describes the population (μ, p, σ)
statistic
a number that describes the sample (x̄, p̂, s)
population distribution
distribution of responses for every individual of the population
sample distribution
the distribution of responses for a single sample
sampling distribution
the distribution of values for the statistic for all possible samples of a given size from a given population
calculator function for one-sample t-interval for μ
T-Interval
calculator function for one-sample t-test for μ
T-Test
calculator function for a one-sample z-interval for p
1-PropZInt
calculator function for a one-sample z-test for p
1-PropZTest
what factors affect the width of a confidence interval?
decreases as n increases, increases as the confidence level increases
how do i make a decision based on a p-value?
if the p-value ≤ α, reject the null hypothesis
if the p-value > α, fail to reject the null hypothesis
what does it mean to reject the null hypothesis?
there is convincing statistical evidence to support the alternative hypothesis
what does it mean to fail to reject the null hypothesis?
there is not convincing statistical evidence to support the alternative hypothesis
what is the probability that a specific confidence interval captures the population parameter?
0 or 1, a confidence interval calculated from sample data either does or does not capture the population parameter
how to calculate expected counts in a χ² test for homogeneity/independence
(row total)(column total)/table total
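A sketch of the expected-count rule and the resulting χ² statistic for a two-way table stored as a list of rows. The function names and the table values are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of observed counts
observed = [[10, 20], [30, 40]]
print(expected_counts(observed))
print(round(chi_square_statistic(observed), 4))
```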
how to choose the right inference procedure
does the scenario describe mean(s), proportion(s), counts, or slope?
does the scenario describe one sample, two samples, or paired data?
does the scenario describe a test or a confidence interval?
describing a distribution
shape (skewed, bimodal, symmetric, etc.)
center (mean or median if the distribution is skewed)
spread (variability, standard deviation for mean and interquartile range for median)
outliers (or potential outliers if you are estimating)
outlier rule
any value that falls more than 1.5IQR above Q3 or below Q1
Lower outliers < Q1 - 1.5(IQR)
Upper outliers > Q3 + 1.5(IQR)
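The fences can be sketched in code. This assumes quartiles are found as the medians of the lower and upper halves of the sorted data, excluding the middle value when n is odd. That is one common convention, and other software may compute quartiles differently. The function names and data are hypothetical.

```python
import statistics

def outlier_fences(data):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    values = sorted(data)
    half = len(values) // 2
    q1 = statistics.median(values[:half])    # median of the lower half
    q3 = statistics.median(values[-half:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

# Hypothetical data set
print(find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```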
how can we use a graph to compare the mean and the median?
skewed left, mean < median
roughly symmetric, mean ≈ median
skewed right, mean > median
interpreting the standard deviation
gives the typical distance of the values from the mean. the more the values are spread out from the mean, the larger the standard deviation
how do we describe the relationship between two variables (like in a scatterplot)
direction - positive or negative
unusual values - outliers, influential observations
form - linear or curved
strength - weak or strong
DUFS
how to find the mean, standard deviation, and 5 number summary using a calculator
stat → edit → enter data in L1 → stat → calc → 1-Var Stats → leave FreqList blank and calculate
how to calculate a least squares regression line using a calculator
stat → edit → enter x-values in L1 and y-values in L2 → stat → calc → LinReg(a+bx) → leave FreqList blank and calculate
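The same line can also be computed by hand. This is a sketch, assuming the usual least-squares formulas b = Sxy / Sxx and a = ȳ - b·x̄. The function name and the data are hypothetical.

```python
import math
import statistics

def least_squares_line(xs, ys):
    """Return (a, b, r) for the line y-hat = a + b x and the correlation r."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # y-intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Hypothetical data
a, b, r = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 3), round(b, 3), round(r, 3))
```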
What is the IQR (interquartile range)?
the difference between the third and first quartiles. Q1 and Q3 form the boundaries for the middle 50% of values in an ordered data set.
interpret the y-intercept of the least squares regression line
the predicted value of y in context when x in context is 0 is (a)
interpret the slope of the least squares regression line
for every increase of 1 unit of x in context, the predicted output of y in context increases/decreases by (b)
properties of correlation r
unitless, always between -1 and 1, greatly affected by regression outliers
if the direction is negative, so is r. if the direction is positive, so is r.
the closer r is to -1 or 1, the stronger the relationship. the closer that r is to 0, the weaker the relationship.
gives the strength and direction of the linear relationship between 2 quantitative variables, does not apply for non-linear relationships
interpret the coefficient of determination r²
gives the percent of the variation of y in context that is explained by the least squares regression line using x = x in context
regression outlier
a point that does not follow the general trend shown in the rest of the data and has a large residual
influential point
any point that, if removed, changes the relationship substantially (creates big changes to slope and/or y-intercept)
high-leverage point
has a substantially larger or smaller x-value than the other observations have
discrete variables
can take on a countable number of values, whether they are finite or infinite
continuous variables
can take on infinitely many values, but those values cannot be counted
categorical variable
takes on values that are category names or group labels
quantitative variable
one that takes on numerical values for a measured or counted quantity
control group
a collection of experimental units that are either not given a treatment of interest or given a treatment with an inactive substance (placebo) to provide a baseline to which the treatment groups can be compared, so it can be determined if the treatments have an effect
single-blind experiment
subjects do not know which treatment they are receiving, but members of the research team do, or vice versa
double-blind experiment
neither the subjects nor the members of the research team who interact with them know which treatment a subject is receiving
explanatory variable
a variable whose levels are manipulated intentionally
response variable
an outcome that is measured after the treatments have been administered
non-random (poor) sampling methods
convenience sampling and voluntary response sampling because they do not use chance to select the individuals
experimental units
animals or objects in an experiment
subjects
humans in an experiment
nonresponse bias
selected people do not respond
undercoverage
systematically excluding people from being able to be selected
response bias
providing inaccurate responses (on purpose or by accident)
wording issues
confusing wording or question is slanted towards a particular response
can increasing the sample size correct a biased sampling method?
no, you’ll just get a bigger flawed sample.
bias
the systematic tendency to overestimate or underestimate the true population parameter
observational study
no treatment imposed
experiment
imposed treatment on experimental units or subjects
can the results be generalized to a larger population?
the results can only be generalized to the population from which the sample/subjects were randomly selected. if the sample/subjects were not randomly selected then the results can only be generalized to “people like the ones in the study”
cause and effect
if the researchers randomly assigned the subjects to treatment groups, you can make conclusions about cause and effect.
if the researchers did not randomly assign subjects to treatment groups, you cannot say that the explanatory variable caused the change in the response variable.
stratified random sample
the population is divided into separate groups (strata) based on shared attributes or characteristics (homogeneous grouping), and a simple random sample is selected from within each stratum
simple random sample (SRS)
a sample in which every group of a given size has an equal chance of being chosen
systematic random sample
sample members from a population are selected according to a random starting point and a fixed, periodic interval
confounding variable
related to the explanatory variable and influences the response variable and makes it challenging to determine cause and effect
completely randomized design
treatments are assigned to experimental units completely at random. random assignment tends to create roughly equivalent groups, so that differences in responses can be attributed to the treatments
a well designed experiment should include
comparisons of at least 2 treatment groups, one of which could be a control group
random assignment of treatments to experimental units
replication: enough experimental units in each treatment group to be able to detect a difference
control of potential confounding variables, where appropriate
matched pairs design
a special case of randomized block design. using a blocking variable, subjects are arranged in pairs matched on one or more relevant factors. every pair receives both treatments: one treatment is randomly assigned to one member of the pair, and the remaining treatment goes to the second member. alternatively, each subject may get both treatments.
randomized block design
treatments are assigned completely at random within each block. for each block, individuals are similar to each other with respect to at least one blocking variable in order to reduce variability of results within each treatment group and to eliminate the possibility of the blocking variable as a confounding variable
mean and standard deviation of a binomial distribution
μX = np
σX = √(np(1-p))
n is the number of trials, p is the probability of success
mean and standard deviation of a geometric distribution
μX = 1/p
σX = √(1-p) / p
formula for the binomial probability P(X = x)
P(X = x) = nCx · p^x · (1-p)^(n-x)
n is the number of trials, p is the probability of success, x is the number of successes
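A brief sketch of the binomial formulas using `math.comb` for nCx. The function names are hypothetical.

```python
import math

def binomial_pmf(n, p, x):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

print(binomial_pmf(10, 0.5, 5))   # 0.24609375
print(binomial_mean_sd(10, 0.5))
```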
formula for the geometric probability P(X = x)
P(X = x) = (1-p)^(x-1) · p
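A sketch of the geometric formulas, where x is the trial on which the first success occurs. The function names are hypothetical.

```python
import math

def geometric_pmf(p, x):
    """P(X = x): x - 1 failures followed by one success."""
    return (1 - p) ** (x - 1) * p

def geometric_mean_sd(p):
    """Mean 1/p and standard deviation sqrt(1 - p) / p."""
    return 1 / p, math.sqrt(1 - p) / p

print(round(geometric_pmf(0.2, 3), 4))
print(geometric_mean_sd(0.2))
```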
conditions for a binomial random variable
binary: two outcomes for each trial (success or failure)
independent: each trial is independent of the next
number of trials is a fixed number n
same probability of success for each trial p
conditions for a geometric random variable
binary: two outcomes for each trial (success or failure)
independent: each trial is independent of the next
trials until a success (not a fixed number)
same probability of success for each trial p
probability of “at least 1”
P(At least 1) = 1 - P(none)
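A sketch of the complement rule, assuming n independent trials that each succeed with the same probability p (an assumption added here so that P(none) = (1 - p)^n). The function name is hypothetical.

```python
def prob_at_least_one(p, n):
    """P(at least 1 success) = 1 - P(no successes) in n independent trials."""
    p_none = (1 - p) ** n
    return 1 - p_none

print(round(prob_at_least_one(0.1, 5), 5))
```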
law of large numbers
simulated (empirical) probabilities tend to get closer to the true probability as the number of trials increases
how can i tell if two events are independent?
P(A | B) = P(A)
P(B | A) = P(B)
P(A and B) = P(A) * P(B)
calculating conditional probability
P(A | B) = P(A and B) / P(B)
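A sketch that applies the conditional probability formula and the independence check to one fair die roll, with A = "even" and B = "4 or less" (a made-up example). `Fraction` keeps the arithmetic exact so the equality test is safe. With floats, a tolerance would be needed.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

def independent(p_a, p_b, p_a_and_b):
    """True when P(A and B) = P(A) * P(B) (exact comparison, Fractions assumed)."""
    return p_a_and_b == p_a * p_b

p_a = Fraction(3, 6)        # even: {2, 4, 6}
p_b = Fraction(4, 6)        # 4 or less: {1, 2, 3, 4}
p_a_and_b = Fraction(2, 6)  # both: {2, 4}

print(conditional(p_a_and_b, p_b))      # 1/2, the same as P(A)
print(independent(p_a, p_b, p_a_and_b)) # True
```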
mean and standard deviation of the sum of 2 independent random variables
μT = μX + μY
σT = √(σX² + σY²)
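A sketch of combining two independent random variables: means add, and variances (not standard deviations) add. The function name and the numbers are hypothetical.

```python
import math

def sum_mean_sd(mu_x, sd_x, mu_y, sd_y):
    """Mean and standard deviation of T = X + Y for independent X and Y."""
    mu_t = mu_x + mu_y
    sd_t = math.sqrt(sd_x ** 2 + sd_y ** 2)  # add variances, then take the root
    return mu_t, sd_t

print(sum_mean_sd(10, 3, 20, 4))  # (30, 5.0)
```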