Central limit theorem for sample mean
When we collect a sufficiently large sample of n independent observations from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean x̄ will be nearly normal with mean μ and standard error SE = σ/√n
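A minimal sketch (the population and sample size are made up for illustration, not from the course): simulating many sample means from a skewed population to see that their distribution is roughly normal with standard error σ/√n.

```python
import numpy as np

# Simulate the sampling distribution of x-bar from a skewed population
# and compare its SD to the CLT standard error sigma / sqrt(n).
rng = np.random.default_rng(0)
pop_sd, n = 10.0, 40                      # exponential(scale=10): mean = SD = 10

sample_means = np.array([rng.exponential(scale=10.0, size=n).mean()
                         for _ in range(10_000)])

print("mean of sample means:", sample_means.mean())        # ~ population mean (10)
print("SD of sample means:  ", sample_means.std(ddof=1))   # ~ sigma / sqrt(n)
print("CLT standard error:  ", pop_sd / np.sqrt(n))        # 10 / sqrt(40) ≈ 1.58
```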
Two conditions for central limit theorem
Independent observations (random sample, <10% of pop)
Normality: if n < 30 and there are no clear outliers, we assume the sample came from a nearly normal population
If n ≥ 30 and there are no extreme outliers, we assume the sampling distribution is nearly normal
To test for a mean, we use
T distribution
What is t distribution used for
used for large and small samples because p-values are determined for each sample size using degrees of freedom (aka based on sample size)
Properties of t distribution
bell shaped
Symmetric so t is + or -
Mean = 0, SD > 1
Less peaked than the normal distribution
Different curve for each degree of freedom
Hypothesis test: one mean t-test
H0: mean = #
HA: mean ≠ #
Conditions for hypothesis test: one mean t-test
Independent observations
approximately normal
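A minimal sketch with made-up data showing how the one mean t-test runs in practice (scipy's `ttest_1samp`); the manual line mirrors the formula t = (x-bar - mu0) / (s / sqrt(n)).

```python
import numpy as np
from scipy import stats

# One-mean t-test of H0: mu = 100 against HA: mu != 100 (made-up data)
sample = np.array([104, 98, 110, 95, 102, 107, 99, 105, 101, 108])

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print("t =", t_stat, "df =", len(sample) - 1, "p =", p_value)

# Same t statistic by hand: t = (x-bar - mu0) / (s / sqrt(n))
n = len(sample)
t_manual = (sample.mean() - 100) / (sample.std(ddof=1) / np.sqrt(n))
print("manual t =", t_manual)
```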
Areas on t table get ____ as you move right
smaller
Paired data
Each observation in one set is related to or corresponds with exactly one observation in the second sample
Example of paired data
Before/after data (SAT before and after)
To analyze paired data, look at the
Difference in outcomes of each pair of observations
Hypotheses for mean of differences
H0: average difference (before - after) = 0
HA: average difference (before - after) ≠ 0
Paired t-test aka
mean of differences
Conditions for paired data mean test
Independent observations
Normal distribution
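A minimal sketch with made-up before/after scores: analyzing paired data by taking the difference of each pair and running a one-mean t-test on those differences (scipy's `ttest_rel` does the same thing).

```python
import numpy as np
from scipy import stats

# Paired t-test: H0: mean difference (before - after) = 0 (made-up data)
before = np.array([520, 480, 610, 550, 495, 570])
after  = np.array([540, 500, 605, 580, 510, 590])

diff = before - after                      # analyze the difference of each pair
t_stat, p_value = stats.ttest_rel(before, after)
print("t =", t_stat, "df =", len(diff) - 1, "p =", p_value)

# Equivalent one-mean t-test on the differences:
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
print("manual t =", t_manual)
```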
What line on test 3 formula sheet is for paired data
3
If you can’t find degrees of freedom in paired mean problem
Go to nearest number that is SMALLER (ex: if df=199, you go to 150 NOT 200)
Interpretation of inference alt vs null claim
Alt: “support”
Null: “reject”
Type 1 vs Type 2 Error
Type 1: rejecting the null when it is actually true; Type 2: failing to reject the null when it is actually false
If the CI is entirely + or entirely -, the mean difference is
Higher or lower, respectively
Mean of differences
From dependent or matched pair samples (aka paired t-test)
difference of two means
From independent samples
Look at means of both samples
Mean 1= mean 2 (mean 1-mean 2=0)
Can difference of two means have different sample sizes? What about mean of differences?
Yes
No
Conditions for 2 means
Both sample approx normal
Independent observations within the samples
Independent observations BETWEEN SAMPLES, meaning the two samples are not related
Which degrees of freedom do you use for two means
The smaller n - 1 of the two samples
How do you recognize two means
Two of everything: SD, mean, samples
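A minimal sketch with made-up data for the difference of two means from independent samples (the sample sizes can differ); scipy's `ttest_ind` is used here, which computes a Welch-approximated df rather than the course's "smaller n - 1" shortcut.

```python
import numpy as np
from scipy import stats

# Difference of two means from independent samples: H0: mu1 - mu2 = 0
group1 = np.array([23, 27, 31, 25, 28, 30, 26])
group2 = np.array([20, 22, 25, 24, 21, 23, 22, 24, 26])

# equal_var=False gives Welch's test (no equal-variance assumption);
# a pooled test would use equal_var=True instead.
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print("t =", t_stat, "p =", p_value)
```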
ANOVA
Analysis of variance
Used to compare means from two or more groups by using variance
Variance
Measures spread of data
Between vs within in anova formula is
Between is on the top (numerator); within is on the bottom (denominator)
Variability within groups in anova is also known as
Sampling error
Variability between samples vs variability within samples
Between: due to the different treatments; within: due to regular sampling error
F distribution is for
ANOVA
F distribution is always
Right skewed
Large F statistic
Higher variability between groups relative to within groups
Repeated measures
Use the same group of subjects with each treatment
One way ANOVA
Samples are compared using one category/factor/characteristic
Conditions for ANOVA
Observations should be independent within groups
Observations should be independent between groups
The observations within each group should be nearly normal
Variances across groups should be about equal
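A minimal sketch with made-up data for a one-way ANOVA comparing k = 3 group means, using scipy's `f_oneway`.

```python
import numpy as np
from scipy import stats

# One-way ANOVA: H0: mu1 = mu2 = mu3; HA: at least one mean differs
a = np.array([85, 90, 88, 92, 87])
b = np.array([78, 82, 80, 85, 79])
c = np.array([91, 95, 89, 94, 93])

f_stat, p_value = stats.f_oneway(a, b, c)
print("F =", f_stat, "p =", p_value)
```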
Post hoc
After the fact
ANOVA table: Row 1
Variability between sections
ANOVA table: Row 2
Residuals (aka error aka WITHIN)
K in Anova
Number of categories
Degrees of freedom of sections for ANOVA
K-1
Degrees of freedom in residuals in Anova
N-K
Sum sq
Sum of squares
How to find sum of squares in ANOVA
Take each difference from the mean (group means from the grand mean for between; observations from their group mean for within), square them, and add them up
Mean sq
Get by dividing the Sum Sq by its degrees of freedom
Pr(>F)
Probability of getting f or more extreme aka P VALUE
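A minimal sketch (made-up data) that builds the ANOVA table quantities by hand so the pieces line up: Sum Sq, df, Mean Sq = Sum Sq / df, F = MS between / MS within, and Pr(>F) from the F distribution.

```python
import numpy as np
from scipy import stats

groups = [np.array([85, 90, 88, 92, 87]),
          np.array([78, 82, 80, 85, 79]),
          np.array([91, 95, 89, 94, 93])]

all_obs = np.concatenate(groups)
k, n = len(groups), len(all_obs)
grand_mean = all_obs.mean()

# Sum of squares: difference, square, add up
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between, df_within = k - 1, n - k                   # K - 1 and N - K
ms_between = ss_between / df_between                   # Mean Sq = Sum Sq / df
ms_within = ss_within / df_within                      # residual Mean Sq (sampling error)

f_stat = ms_between / ms_within                        # F = MS between / MS within
p_value = stats.f.sf(f_stat, df_between, df_within)    # Pr(>F)
print("F =", f_stat, "p =", p_value)
```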
Power of the test
Probability of correctly rejecting the null (1-beta)
To increase power of a test
Increase sample size
Increase alpha (but type 1 more likely)
Decrease standard deviation
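A minimal sketch (the mean, SD, and n are assumed example values) estimating power by simulation: draw samples from the "true" alternative, test against the null, and count how often the null is correctly rejected.

```python
import numpy as np
from scipy import stats

# Power = P(reject H0 | H0 is false) = 1 - beta
rng = np.random.default_rng(0)
true_mean, null_mean, sd, n, alpha = 103, 100, 10, 30, 0.05

rejections = 0
for _ in range(5_000):
    sample = rng.normal(true_mean, sd, size=n)        # data from the true alternative
    t, p = stats.ttest_1samp(sample, popmean=null_mean)
    rejections += (p < alpha)

print("estimated power:", rejections / 5_000)
# Larger n, larger alpha, or smaller sd all push this probability up.
```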
sampling distribution would be
the distribution of many sample means (described by its mean and SD)
formal definition of sampling distribution
distribution of a statistic across an infinite number of samples
SE of sample means
SD/sq root of sample size
population vs sampling distribution
population: more spread out because its SD is larger
sampling distribution: less spread out because its SD is the SE = SD/√n
most frequently used test statistic
t
t is more accurate for
small samples
what parameter is needed for t distribution
standard deviation
as degrees of freedom increase, the t distribution
approaches standard normal (aka peak goes higher)
unique aspect of tails of t distributions
tails are thicker, meaning observations are more likely to fall beyond 2 SD from the mean
t score and z scores are similar in that
both are standardized scores measuring how far a value falls from the mean in SD (or SE) units
how to find estimated p value
Look on the df row for your test statistic (t.s.), then go straight up to the top for the p-value, which is given as an interval (like on the chi-square table)
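A minimal sketch of the exact version of that table lookup, using assumed values for t and df: software gives the precise p-value that the table only brackets.

```python
from scipy import stats

# Two-sided one-mean t-test with assumed t = 2.31 and df = 14
t_stat, df = 2.31, 14
p_value = 2 * stats.t.sf(abs(t_stat), df)   # area in both tails beyond |t|
print(p_value)                              # the t table gives a range containing this
```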
interpreting inference format
There is enough / not enough evidence to support (alt) / reject (null) the claim that _____________. Then compare the differences.
interpretation of p value
If the mean ___ is ___, the probability of getting our sample mean of _____ or more extreme is less than [p-value], which is highly unlikely/likely.
interpreting confidence interval
We are ___% confident that the true population mean is between _____ and _______
type 1 vs type 2 errors
type 1: reject null but you should’ve failed to reject
type 2: fail to reject but you should’ve rejected
how to find sample size of the mean
first formula on formula sheet (always round up)
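A minimal sketch, assuming the formula-sheet formula is the usual margin-of-error version n = (z* × σ / ME)², always rounded up; the values below are made up.

```python
import math
from scipy import stats

sigma, margin_of_error, confidence = 15, 2, 0.95       # assumed example values

z_star = stats.norm.ppf(1 - (1 - confidence) / 2)      # 1.96 for 95% confidence
n = (z_star * sigma / margin_of_error) ** 2
print(math.ceil(n))                                    # always round UP
```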
how to get two sets of data into one
subtract
Parameter for paired t test
Average difference between the reading and writing scores of all high school students.
Point estimate for paired t test
Average difference between the reading and writing scores of sampled high school students.
subscript for paired data
d
hypothesis for two means test
H0: Mean 1 - Mean 2 = 0
HA: Mean 1 - Mean 2 ≠ 0
example of interpreting beta value (beta =0.15)
if beta = 0.15, the probability of failing to reject the null hypothesis when you should have rejected it is 0.15
the power would then be 1-0.15=0.85 so probability of correctly rejecting the false null hypothesis is .85
rejecting null means you found a difference so
if you tested the hypothesis 100 times, [power %] of the tests would detect the difference
hypothesis test for ANOVA
H0: μ1 = μ2 = ... = μk
HA: At least one mean is different
how to get f in ANOVA
mean sq between/mean sq within
how to set up post hoc tests
find Bonferroni alpha (a*) => used as new alpha for comparison
steps for post hoc tests
get means and populations for each option
plug into formula with whichever two data you are comparing (have to do for all combinations!)
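A minimal sketch (made-up data) of the post hoc workflow: compute the Bonferroni alpha* = α / (number of comparisons), then compare every pair. Simple pairwise t-tests are used here; the course formula may instead pool the ANOVA's mean square error.

```python
from itertools import combinations
import numpy as np
from scipy import stats

groups = {"A": np.array([85, 90, 88, 92, 87]),
          "B": np.array([78, 82, 80, 85, 79]),
          "C": np.array([91, 95, 89, 94, 93])}

alpha = 0.05
pairs = list(combinations(groups, 2))
alpha_star = alpha / len(pairs)            # Bonferroni alpha*, used for every comparison

for g1, g2 in pairs:                       # have to do ALL combinations
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    print(g1, "vs", g2, "p =", round(p, 4), "significant?", p < alpha_star)
```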
how to get sampling error from the anova table?
mean sq from residual
overall anova conclusion
Some combinations may be significant and some may not, so state that in the conclusion and make the inference based on the data
what is linear regression?
Determines if a linear relationship exists
If one numerical variable can predict another
can linear regression show you causation
no
how to model relationship between two variables
linear regression line w/ points in (x,y) form
x vs y in linear regression
x= explanatory variable
y= response variable
points in linear regression model
usually do not fall exactly on line, instead cluster around the line
why is regression line called a regression line
Regress means to move towards a previous or less formed state, so the points would all be moving toward the line in a perfect world
correlation coefficient
described with R
scale from -1 to 1, with -1 and 1 having the strongest correlation (line) and 0 being the weakest (no correlation)
different names for regression line
line of best fit, least squares line
residuals (aka errors)
how far the points are from the line
difference between observed data and predicted data
how to calculate residual
observed – predicted (formula with e)
variables in residual formula (y and y-hat)
y= observed data.
y-hat= predicted data.
Residual > 0
The predicted value is an underestimate.
Residual < 0
The predicted value is an overestimate.
The Least Squares Method
Square all the residuals (errors) and add them up; the line of best fit minimizes this sum
purpose of least squares method
to find the best line of fit
why least square method?
• Most commonly used
• Easier to compute
• Highlights the errors
basic formula for a line
y=mx+b (m is slope and b is y intercept)
how to find slope for regression line
formula with b1=
how to find intercept for regression line
formula with b0=
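A minimal sketch with made-up (x, y) data: scipy's `linregress` gives the least-squares slope b1 and intercept b0, the manual lines reproduce them from b1 = R(sy/sx) and b0 = y-bar - b1*x-bar, and the residuals are observed - predicted.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # explanatory variable
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])    # response variable

result = stats.linregress(x, y)
b1, b0, r = result.slope, result.intercept, result.rvalue
print("slope b1 =", b1, "intercept b0 =", b0, "correlation R =", r)

# Same coefficients from the formulas: b1 = R * (sy / sx), b0 = y-bar - b1 * x-bar
b1_manual = r * (y.std(ddof=1) / x.std(ddof=1))
b0_manual = y.mean() - b1_manual * x.mean()
print("manual b1 =", b1_manual, "manual b0 =", b0_manual)

residuals = y - (b0 + b1 * x)    # positive residual => prediction was an underestimate
print(residuals)
```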
interpret the slope
For each additional % _____ in ______, we would expect the % _______ to decrease/increase (based on positive or negative) on average by ___% (slope)
interpret the intercept
When the explanatory variable (x) is 0, we would expect ___% (the intercept) for the response variable
interpretations of the intercept tend to be
unbelievable