R²
SSM/SST for bivariate regression
how to find p value for bivariate regression
1 - pf(f value, df model, df error)
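A minimal sketch of this p-value calculation in base R. The F statistic and degrees of freedom below are hypothetical values chosen for illustration. `pf()` returns the area below the F value, so the upper tail is what gives the p value.

```r
# Hypothetical values, for illustration only
f_value  <- 12.5   # F statistic from the regression output
df_model <- 1      # bivariate regression has one predictor
df_error <- 28     # n - 2 for a sample of 30

# pf() gives the lower-tail area, so subtract from 1 for the p value
p_value <- 1 - pf(f_value, df_model, df_error)

# Equivalent call using the lower.tail argument
p_value_alt <- pf(f_value, df_model, df_error, lower.tail = FALSE)
```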
if R² is close to 1 then
model explains the data well
R² for bivariate regression measures
the proportion of variance in Y explained by the model
SSM/SST
correlation r for bivariate regression
measures the direct linear relationship between X and Y
found by sqrt(R²) and then change the sign on it based off of the slope
σ̂_e is
estimated standard deviation of the residuals from the line of best fit
sqrt(MSE)
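A small sketch of how R², r, and the residual standard deviation relate, using base R's `lm()`. The data are a hypothetical toy set, and the variable names are illustrative only.

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10.1, 8.2, 6.9, 5.1, 3.8, 2.2)

fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)     # total sum of squares
sse <- sum(residuals(fit)^2)    # error (residual) sum of squares
ssm <- sst - sse                # model sum of squares

r_squared <- ssm / sst          # proportion of variance explained

# r is sqrt(R^2) with the sign of the slope
r <- sign(coef(fit)[["x"]]) * sqrt(r_squared)

# sqrt(MSE), where df error = n - 2 for bivariate regression
sigma_hat <- sqrt(sse / (length(y) - 2))
```

With a negative slope, as here, r comes out negative even though R² is always positive.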
calculate z score without the sample size
(observed - null) / SD
when is a z score considered unusual
when the value lies more than 2 standard deviations above or below the mean
what is a pooled proportion and when to use it
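combines the successes (x) and sample sizes (n) from both groups: (x1 + x2) / (n1 + n2). use it when finding the SE for a normal-distribution hypothesis test of a difference in proportions, since the null assumes both groups share one proportion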
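A short sketch of the pooled proportion and the standard error it feeds into, in base R. The counts are hypothetical, and the SE formula shown is the usual one for a two-proportion z test.

```r
# Hypothetical counts, for illustration only
x1 <- 45; n1 <- 100   # successes and sample size, group 1
x2 <- 30; n2 <- 100   # successes and sample size, group 2

# Pool all successes over all observations
p_pooled <- (x1 + x2) / (n1 + n2)

# Standard error of the difference in proportions under the null
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Test statistic: observed difference divided by the SE
z <- (x1 / n1 - x2 / n2) / se
```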
Standard error
measures how much a sample statistic (like the mean) would vary from sample to sample.
MANY SAMPLES
SE gets smaller when
sample sizes are large
data is less variable
A standard error of 0.050 means
a difference in proportions of about ±0.05 is what to expect from random chance variation
Standard deviation (SD)
measures spread of individual data points in a sample or population.
ONE SAMPLE
z score is used for
normal distributions
what does z score tell us
How unusual is this result if the null hypothesis were true
if close to 0, that means the result is very typical under the null
the p value is
how likely is it to see a test statistic this extreme?
is bootstrap or null distributions used for SE
bootstrap! it estimates variability. it captures the uncertainty in our estimate
what are randomization distributions also called and what are they for
null distributions and hypothesis testing
the middle 95% of values fall within
mean ± 2*SE
what type of distribution is used to get p value
randomization or null distributions
a randomization spread tells us
what we would expect if there were no real difference in whatever is being measured
what to look for when comparing bootstrap and randomization distributions
do the graphs have a similar shape and spread
bulk of data lies in the middle 95%
when p value is greater than the significance level
fail to reject the null hypothesis
when p value is smaller than the significance level
reject H0
when looking at residual plots you want to have
no patterns present in spacing and an even spread to the dots
in a residual plot if the dot is above the line it means
the model underestimated the value of the observed
in a residual plot if the dot is below the line it means
the model overestimated the value of the observed
residual is
observed value - predicted value
what can we learn from our residual plots
we can infer if the model is good or not
the model is not a good fit if the residual plot will have weird patterns and curves
what are the 4 steps to hypothesis testing
state the null and alternative hypothesis
calculate the test statistic
find the p value
draw a conclusion
middle line in a boxplot is the
median
the box length spreads from
Q1 to Q3
it represents the middle 50% of the data
if the whisker is longer on the right the distribution is
right-skewed
when multiple boxplots are side-by-side they are used to compare a
quantitative variable
a larger IQR means
The group is more spread out and is less consistent
mean compared to median
the mean is less resistant to outliers, so it gets pulled in the direction of the skew
50% of the values in a boxplot fall in
the IQR
a curved distribution will have a boxplot with
a LARGE IQR because the middle 50% spreads across a sparse middle region
how would i estimate the p value given a dotplot
locate the test statistic, then count how many dots are at or beyond it in the more extreme direction
divide that number by the total number of dots
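The counting step can be sketched in base R as a proportion of dots. The vector below stands in for the dots on a hypothetical dotplot, and the observed statistic is also made up.

```r
# Hypothetical randomization statistics, one value per dot
null_stats <- c(-0.30, -0.22, -0.15, -0.10, -0.04, 0.00,
                 0.03,  0.08,  0.12,  0.19,  0.25, 0.33)
observed <- 0.19

# One-sided (upper tail): share of dots at or beyond the observed statistic
p_upper <- mean(null_stats >= observed)

# Two-sided: share of dots at least as far from 0 in either direction
p_two_sided <- mean(abs(null_stats) >= abs(observed))
```

Taking `mean()` of a logical vector counts the TRUE values and divides by the number of dots in one step.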
r command for a randomization distribution
do(1000) * diffmean(response variable ~ shuffle(explanatory variable), data = YOUR DATA)
diffmean could also be diffprop
diffmean is used when the response variable for a randomization distribution is
quantitative
diffprop is used when the response variable for a randomization distribution is
categorical
to make a histogram of a randomization/null distribution use
gf_histogram(~ diffmean, data = YOUR RANDOMIZATION DATA)
diffmean could also be diffprop
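For comparison, a rough base R equivalent of the shuffle-and-recompute idea, without the `do()`, `shuffle()`, or `diffmean()` helpers. This is a sketch on simulated hypothetical data, and `diff_in_means` is an illustrative helper name, not a library function.

```r
set.seed(1)  # make the shuffles reproducible

# Hypothetical data: a quantitative response in two groups
group    <- rep(c("A", "B"), each = 10)
response <- c(rnorm(10, mean = 50, sd = 5), rnorm(10, mean = 55, sd = 5))

# Illustrative helper: mean of group B minus mean of group A
diff_in_means <- function(y, g) mean(y[g == "B"]) - mean(y[g == "A"])

observed <- diff_in_means(response, group)

# Shuffle the group labels 1000 times and recompute the statistic each time
null_dist <- replicate(1000, diff_in_means(response, sample(group)))

# Two-sided p value: share of shuffled statistics at least as extreme
p_value <- mean(abs(null_dist) >= abs(observed))

# hist(null_dist) would draw the histogram of the null distribution
```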
How do you use a confidence interval to estimate the p-value for a hypothesis test?
look to see if the 95% confidence interval includes 0
if it does not include 0, the result is statistically significant at 0.05
what significance level does a 95% CI correspond to
0.05 (two-sided)
if the confidence interval includes 0 then
The p-value is greater than 0.05
how is critical value found with a t distribution
qt(1 - alpha/2, df = n-1), for example qt(0.975, df = n-1) for a 95% confidence interval
degrees freedom for critical value computation for one mean
sample size -1
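A quick sketch of the critical value lookup in base R, assuming a hypothetical sample of 25 and a 95% confidence level. The point is that `qt()` takes a cumulative probability, so the confidence level has to be converted first.

```r
n          <- 25     # hypothetical sample size
conf_level <- 0.95   # desired confidence level
alpha      <- 1 - conf_level

# qt() expects the area to the left, so use 1 - alpha/2 (0.975 here)
t_star <- qt(1 - alpha / 2, df = n - 1)
```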
empirical rule
68 (1 sd), 95 (2 sd), 99.7 (3 sd)
how to find percentages in a normal distribution when given specific values
pnorm(#, mean, sd)
how to find a specific value in a normal distribution when given percentile
qnorm(percentile, mean, sd)
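A small sketch showing `pnorm()` and `qnorm()` as inverses of each other, assuming a hypothetical normal distribution with mean 100 and sd 15.

```r
mean_val <- 100   # hypothetical mean
sd_val   <- 15    # hypothetical standard deviation

# Value -> percentage: proportion of the distribution below 115
p_below <- pnorm(115, mean_val, sd_val)

# Proportion above 115 is the complement
p_above <- 1 - p_below

# Percentile -> value: the cutoff with 90% of the distribution below it
cutoff <- qnorm(0.90, mean_val, sd_val)
```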
how to find p value when given SE
(p̂ - p0) / SE
gives z score
then z score is plugged into r
pnorm(z score) for a lower tail, 1 - pnorm(z score) for an upper tail, 2 * pnorm(-abs(z score)) for two-sided
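The steps above can be sketched in base R as follows. The sample proportion, null value, and SE are hypothetical numbers picked so the z score is easy to follow.

```r
p_hat <- 0.56   # hypothetical sample proportion
p0    <- 0.50   # proportion stated in the null hypothesis
se    <- 0.03   # hypothetical standard error

z <- (p_hat - p0) / se

p_lower <- pnorm(z)              # alternative: p < p0
p_upper <- 1 - pnorm(z)          # alternative: p > p0
p_two   <- 2 * pnorm(-abs(z))    # alternative: p != p0
```

Which of the three p values applies depends on the direction of the alternative hypothesis.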
when to use one sided vs two sided
if the question points a certain direction then just use one sided
where is the randomization distribution centered?
at the value of the parameter specified in the null hypothesis
where is the bootstrap distribution centered?
at the observed sample statistic
how do i see if two events are independent?
P(A)P(B) = P(A and B)
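A tiny sketch of the independence check in base R, with hypothetical probabilities. `all.equal()` is used instead of `==` because decimal products can differ by tiny rounding errors.

```r
# Hypothetical probabilities, for illustration only
p_a       <- 0.5
p_b       <- 0.4
p_a_and_b <- 0.2

# Independent if P(A) * P(B) matches P(A and B), up to rounding
independent <- isTRUE(all.equal(p_a * p_b, p_a_and_b))
```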
when matching the boxplot with the ANOVA table look for
sum of squares residual
F
what does the sum of squares residual tell you for ANOVA
difference WITHIN groups
if large, it means the data points are more spread out within groups
larger IQR for a boxplot
A big F for ANOVA means
F compares the variance between groups to the variance within groups to see if the differences between group means are significant
a big F means at least one group mean differs significantly from the others
what does a large F value result in?
A low p-value
IQR is
Q3-Q1
qnorm(0.75, mean, sd) - qnorm(0.25, mean, sd)
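A brief sketch of the IQR of a normal distribution in base R, assuming a hypothetical mean of 100 and sd of 15. Quartiles are values, so they come from `qnorm()`, which turns a percentile into a value.

```r
mean_val <- 100   # hypothetical mean
sd_val   <- 15    # hypothetical standard deviation

q1 <- qnorm(0.25, mean_val, sd_val)   # 25th percentile
q3 <- qnorm(0.75, mean_val, sd_val)   # 75th percentile

iqr <- q3 - q1
```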
df for model for ANOVA
#groups - 1
df for residuals for ANOVA
#observations - # of groups
mean sq for groups for ANOVA
SSM/DFM
mean square for residuals ANOVA
SSE/DFE
f value for ANOVA
MSM/MSE, which is (SSM/DFM) / (SSE/DFE)
how is pvalue found for anova
1-pf(f value, df1 = ___, df2 = ___)
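The ANOVA table pieces can be put together by hand in base R, as in the sketch below. The data are hypothetical (three groups of four observations), and the variable names are illustrative.

```r
# Hypothetical data: three groups with four observations each
values <- c(5, 6, 7, 6,   8, 9, 10, 9,   4, 5, 5, 6)
groups <- factor(rep(c("a", "b", "c"), each = 4))

grand_mean  <- mean(values)
group_means <- tapply(values, groups, mean)
group_sizes <- tapply(values, groups, length)

# Sum of squares between groups (model) and within groups (residual)
ssm <- sum(group_sizes * (group_means - grand_mean)^2)
sse <- sum((values - group_means[as.character(groups)])^2)

dfm <- nlevels(groups) - 1                # number of groups - 1
dfe <- length(values) - nlevels(groups)   # observations - number of groups

msm <- ssm / dfm   # mean square for groups
mse <- sse / dfe   # mean square for residuals

f_value <- msm / mse
p_value <- 1 - pf(f_value, df1 = dfm, df2 = dfe)
```

The same numbers should appear in the table printed by `anova(lm(values ~ groups))`.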
bayes theorem
P(A|B)= P(B|A) * P(A) / P(B)
sensitivity means
P(test is positive given they have it)
specificity means
P(test is negative given they don't have it)
how to find the denominator for bayes theorem
P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
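A worked sketch of Bayes' theorem in base R for a diagnostic test, where A is "has the condition" and B is "tests positive". The sensitivity, specificity, and prevalence are hypothetical values.

```r
sensitivity <- 0.90   # hypothetical P(positive | has condition)
specificity <- 0.95   # hypothetical P(negative | no condition)
prevalence  <- 0.02   # hypothetical P(has condition)

# Denominator: P(positive) over both ways of testing positive
p_positive <- sensitivity * prevalence +
  (1 - specificity) * (1 - prevalence)

# Bayes' theorem: P(has condition | positive)
p_condition_given_positive <- sensitivity * prevalence / p_positive
```

Note that P(positive | no condition) is 1 - specificity, which is what enters the denominator.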
what does one dot on a bootstrap sample represent
the statistic computed from 1 bootstrap sample
to approximate a confidence interval using a dot plot of a bootstrap you
count the total number of dots
for a 95% CI, find 2.5% of that total and count that many dots in from each tail; the values where you stop are the endpoints
how to find p value for a hypothesis test
find observed difference
Calculate the test statistic, either t (means) or z (proportions)
then pt() or pnorm()
how to make a bootstrap distribution with R
do(1000) * diffmean(response variable ~ explanatory variable, data = resample(____))
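A rough base R sketch of the same bootstrap idea, without the `do()` or `diffmean()` helpers. It assumes two hypothetical groups and resamples with replacement within each group, which is one common variant and not necessarily what the course's command does internally.

```r
set.seed(2)  # make the resampling reproducible

# Hypothetical samples from two groups
a <- c(12, 15, 9, 20, 14, 17, 11, 16)
b <- c(18, 21, 16, 25, 19, 22, 17, 20)

observed <- mean(b) - mean(a)

# Each replicate resamples both groups with replacement
# and recomputes the difference in means (one dot per replicate)
boot_diffs <- replicate(
  1000,
  mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE))
)

se_boot <- sd(boot_diffs)                          # bootstrap standard error
ci_95   <- quantile(boot_diffs, c(0.025, 0.975))   # middle 95% of the dots
```

The bootstrap distribution is centered near the observed statistic, and the percentile interval drops 2.5% of the dots from each tail.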
Q1
(the first quartile) corresponds to the 25th percentile, or the value at which 25% of the data lies at or below this value.
Median in a boxplot
corresponds to the 50th percentile or the middle value, or the value at which 50% of the data lies at or below this value.
Q3
(the third quartile) corresponds to the 75th percentile, or the value at which 75% of the data lies at or below this value.
for a normal distribution the mean and median are
the same
z score for a normal distribution
z = (x - mean) / sd
central limit theorem tells us
the sampling distribution will be approximately normal when the sample size is large.
if you know a population sd use
normal (Z) distribution
type one error
rejects a true null
level of significance is
probability of a Type 1 error. It is the probability of rejecting the null hypothesis when the null hypothesis is true.