lecture 1-17
what two main questions can we answer about a population mean (mu) using a sample?
estimation: what is the plausible range of values for mu? - this is answered with a confidence interval
hypothesis testing: could mu be a specific, particular value? - this is answered with a hypothesis test
what is the sampling distribution of the mean (Y-bar)?
It is the theoretical distribution of sample means (Y-bar) that would be obtained from taking an infinite number of samples of the same size (n) from a population
what is the formula for the standard deviation of the sampling distribution of the mean?
sigma/sqrt(n)
where sigma is the true population standard deviation
what is the standard error of the mean (SE_Y-bar)?
It is the estimate of the standard deviation of the sampling distribution of the mean. It is calculated using the sample standard deviation (s)
SE_Y-bar = s / sqrt(n)
When is the sampling distribution of the mean (Y-bar) normally distributed?
When the variable (Y) is normally distributed in the population
When the sample size (n) is large, even if the population is not normally distributed
What is the Central Limit Theorem?
The theorem states that the sum or mean of a large number of random measurements sampled from any population is approximately normally distributed
what is the correct interpretation of a 95% confidence interval?
If we were to take many samples and calculate a 95% CI for each one, 95% of those intervals would contain the true population mean (mu)
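This interpretation can be checked by simulation. A minimal R sketch, assuming a hypothetical population with mu = 10, sigma = 2 and samples of size n = 25:
```r
# Draw many samples from a known population and count how often the
# t-based 95% CI captures the true mean mu.
set.seed(1)
mu <- 10; sigma <- 2; n <- 25
covers <- replicate(10000, {
  y  <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(y) / sqrt(n)                 # standard error of the mean
  tc <- qt(0.975, df = n - 1)           # t_alpha(2),df for alpha = 0.05
  (mean(y) - tc * se) <= mu & mu <= (mean(y) + tc * se)
})
mean(covers)  # proportion of intervals containing mu; close to 0.95
```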
what is the 2SE rule of thumb?
It is a rough approximation for a 95% confidence interval calculated as Y-bar ± 2SE_Y-bar
If we know the population standard deviation (sigma), what is the formula for a 95% CI?
Y-bar ± 1.96(sigma/sqrt(n))
When we don’t know the population standard deviation (sigma) what do we use instead?
We use the sample standard deviation (s) as an estimate
What is the general formula for a confidence interval for the mean (mu) using the t-distribution?
Y-bar ± t_alpha(2),df * (s/sqrt(n))
aka
Y-bar ± t_alpha(2),df * SE_Y-bar
what is the t-statistic and why do we use it?
The t-statistic is t = (Y-bar - mu) / (s/sqrt(n))
we use it instead of the Z-statistic when we do not know the population standard deviation (sigma) and have to estimate it with the sample standard deviation (s).
what distribution does the t-statistic follow?
The Student's t-distribution
how does the t-distribution compare to the Z-distribution (standard normal)?
The t-distribution has “fatter tails” (higher probabilities at the tails) to account for the extra uncertainty from using s to estimate sigma
what parameter defines the shape of a specific t-distribution?
The degrees of freedom (df)
How are degrees of freedom calculated for a one-sample confidence interval or t-test?
df = n - 1, where n is the sample size
In the notation t_alpha(2),df, what does alpha represent for a confidence interval?
alpha = 1 - confidence level
for 95% CI, alpha = 0.05
for 99% CI, alpha = 0.01
A sample of n = 8 snakes has Y-bar = 1.375 and s = 0.324. What are the degrees of freedom (df)?
df = n - 1 = 8 - 1 = 7
For n = 8, Y-bar = 1.375, s = 0.324. What is the Standard Error (SE_Y-bar)?
SE_Y-bar = s / sqrt(n) = 0.324 / sqrt(8) = 0.115
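Continuing this snake example, the full 95% CI can be computed in R; a sketch using only the summary statistics from the cards (qt() gives the critical t-value):
```r
# Snake example: n = 8, Y-bar = 1.375, s = 0.324
n <- 8; ybar <- 1.375; s <- 0.324
df    <- n - 1                   # 7
se    <- s / sqrt(n)             # 0.115
tcrit <- qt(0.975, df = df)      # t_alpha(2),7 for a 95% CI, about 2.36
ybar + c(-1, 1) * tcrit * se     # 95% CI: roughly 1.10 to 1.65
```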
what are the four steps of hypothesis testing?
state hypotheses
calculate test statistic
compare to critical value
draw conclusions
what is the purpose of a one-sample t-test?
It compares the mean (Y-bar) of a random sample from a normal population to a hypothesized population mean mu_0
What are the null and alternative hypotheses for a one-sample t-test?
H_0: mu = mu_0 (the population mean is equal to the hypothesized value)
H_A: mu ≠ mu_0 (the population mean is not equal to the hypothesized value)
What is the formula for the one-sample t-test statistic?
t = (Y-bar - mu_0) / (s/sqrt(n)) or
t = (Y-bar - mu_0) / SE_Y-bar
what are the assumptions of one-sample t-test?
The variable is normally distributed
The sample is a random sample
We test H_0: mu = 98.6 degrees F. Our sample data: n = 24, Y-bar = 98.28, s = 0.940. Calculate the t-statistic.
t = (Y-bar - mu_0) / (s / sqrt(n)) = (98.28 - 98.6) / (0.940 / sqrt(24)) = -1.67.
The critical value for alpha = 0.05 and df = 23 is +/- 2.07. Our t-statistic is -1.67. Do we reject H_0?
No. The calculated t-statistic (-1.67) is not more extreme than the critical value (-2.07). It falls in the non-rejection region, so P > 0.05. We cannot reject the null hypothesis
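A sketch of this body-temperature example in R, using the summary statistics from the cards (with raw data y, the built-in t.test(y, mu = 98.6) performs the same test and also reports a p-value):
```r
# One-sample t-test by hand: H_0: mu = 98.6
n <- 24; ybar <- 98.28; s <- 0.940; mu0 <- 98.6
t_stat <- (ybar - mu0) / (s / sqrt(n))   # -1.67
t_crit <- qt(0.975, df = n - 1)          # 2.07 for alpha = 0.05, df = 23
abs(t_stat) > t_crit                     # FALSE: do not reject H_0
```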
The df = 129 and the critical value is +/- 1.98. Our t-statistic is -5.44. Do we reject H_0?
Yes. The calculated t-statistic (-5.44) is much more extreme than the critical value (-1.98). We reject the null hypothesis and conclude the mean body temperature is not 98.6 degrees F.
what does it mean for a statistical test to be robust?
A method is robust if the answer it gives is not sensitive to modest departures from its assumptions
which t-test is considered robust?
A one-sample t-test (and by extension, the paired t-test)
Why is the central limit theorem important for robustness?
It is one of the main reasons many of our statistical tests (like t-tests) are considered robust, as they rely on the assumption of normality, which the CLT helps satisfy for sample means.
what is the main difference between a paired design and a 2-sample design?
In a paired design, each data point in one group has a direct, one-to-one correspondence with a data point in the other group. In a 2-sample design, the two groups are independent.
What is the benefit of a paired design?
It allows us to account for a lot of extraneous variation, because each member of a pair shares much in common (e.g., the same person, the same plot of land, the same pair of twins)
what are some examples of a paired design?
measuring something before and after a stimulus on the same object
applying a treatment to one arm and a placebo to the other arm of the same person
using identical twins, with one getting a treatment and one not
splitting a plot of land in half, fertilizing one side and not the other
comparing water quality “upstream” and “downstream” from the same power plant
how does a paired t-test work?
It first calculates the difference (d) for each pair. Then it performs a one-sample t-test on that single list of differences
What are the hypotheses for a paired t-test (when testing for any difference)?
H_0: The mean difference is zero (mu_d = 0)
H_A: The mean difference is not zero (mu_d ≠ 0)
what is the formula for the t-statistic in a paired t-test?
t = (d-bar - 0) / (s_d / sqrt(n))
where “d-bar” is the mean of the differences, “s_d” is the standard deviation of the differences and “n” is the number of pairs
How are degrees of freedom (df) calculated for a paired t-test?
df = n-1, where n is the number of pairs
what are the assumptions of a paired t-test?
The pairs are sampled independently and randomly
The differences (d) are normally distributed. (The individual measurements do not have to be)
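A minimal R sketch of a paired t-test with hypothetical before/after measurements on the same five subjects; paired = TRUE makes t.test() work on the differences, exactly as described above:
```r
before <- c(12.1, 9.8, 11.4, 10.6, 13.0)   # hypothetical data
after  <- c(11.2, 9.1, 10.9, 10.8, 12.2)
t.test(after, before, paired = TRUE)        # tests H_0: mu_d = 0, df = n - 1
t.test(after - before, mu = 0)              # equivalent one-sample t-test on d
```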
What is the goal of a two-sample (unpaired) t-test?
To compare the means of a numerical variable for two independent groups.
What is the parameter of interest in a two-sample test?
The difference between the two population means (mu_1 - mu_2)
what is our estimate of the difference between the population means?
The difference between our sample means (Y-bar_1 - Y-bar_2)
What is “pooled variance” (s_p²) and why is it used?
It is a weighted average of the variances from the two samples. We use it when we assume both populations have the same variance.
What is the formula for pooled variance (s_p²)?
s_p² = (df_1 * s_1² + df_2 * s_2²) / (df_1 + df_2)
where df_1 = n_1 - 1 and df_2 = n_2 - 1
what is the formula for the standard error of the difference between two means?
SE = sqrt(s_p²/n_1 + s_p²/n_2)
How are the degrees of freedom (df) calculated for a two-sample t-test?
df = df_1 + df_2 = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2
What are the hypotheses for a two-sample t-test (when testing for any difference)?
H_0: The population means are equal
(mu_1 = mu_2), or (mu_1 - mu_2 = 0).
H_A: The population means are not equal
(mu_1 ≠ mu_2), or (mu_1 - mu_2 ≠ 0).
What is the formula for the two-sample t-test statistic?
t = ( (Y-bar_1 - Y-bar_2) - 0 ) / SE
where SE is the standard error of the difference
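A sketch of the full two-sample calculation in R with hypothetical summary statistics; when you have raw data, t.test(..., var.equal = TRUE) automates all of this:
```r
# Hypothetical group summaries
n1 <- 12; ybar1 <- 5.8; s1 <- 1.1
n2 <- 10; ybar2 <- 4.9; s2 <- 1.3
df1 <- n1 - 1; df2 <- n2 - 1
sp2 <- (df1 * s1^2 + df2 * s2^2) / (df1 + df2)   # pooled variance
se  <- sqrt(sp2 / n1 + sp2 / n2)                 # SE of the difference
t_stat <- ((ybar1 - ybar2) - 0) / se
t_stat                                           # compare to...
qt(0.975, df = df1 + df2)                        # ...the critical value
```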
What are the assumptions of a two-sample t-test?
both samples are random samples
both populations have normal distributions
the variance of both populations is equal
What is a common wrong way to compare the means of two groups?
Concluding that Group 1 is significantly different from a value but Group 2 is not, and therefore that Group 1 and Group 2 are different from each other. This is a common logic error.
when visually comparing 95% CIs for two means, when can you be sure they are significantly different?
when the 95% CIs do not overlap at all.
When visually comparing 95% CIs of two means, when can you be sure they are NOT significantly different?
When the 95% CI for one group overlaps the point estimate (the mean) of the other group
When visually comparing the 95% CIs, what does it mean if the CIs overlap but do not cross each other’s point estimate?
The difference in means is unknown. You must perform a formal hypothesis test (like a 2-sample t-test) to be sure.
What are the hypotheses for a test comparing the variances of two groups?
H_0: sigma_1² = sigma_2² (The population variances are equal)
H_A: sigma_1² ≠ sigma_2² (The population variances are not equal)
What is Levene’s test used for?
It compares the variances of two or more groups.
In the R command leveneTest(data = D, Y ~ X, center = mean), which variable is numerical and which is categorical?
The convention is numerical ~ categorical. So, Y is the numerical variable and X is the categorical (group) variable.
An R output for Levene’s test shows a p-value (“Pr(>F)”) of 0.4897. How do you interpret this?
The p-value (0.4897) is greater than 0.05. Therefore, we fail to reject the H_0 that the population variances are equal.
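A runnable sketch of the command from the earlier card, assuming the car package is installed and a data frame D with a numerical column Y and a categorical column X (all names hypothetical):
```r
library(car)                                # provides leveneTest()
leveneTest(Y ~ X, data = D, center = mean)
# If Pr(>F) > 0.05, fail to reject H_0 that the population variances are equal
```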
Why is Levene’s test preferred over the F-test for comparing variances?
The F-test is not robust. It is very sensitive to its assumption that both distributions are normal, whereas Levene’s test is more reliable.
What are the three main assumptions for t-tests?
The sample(s) are random
The populations are normally distributed
(for 2-sample t-tests) the populations have equal variances.
What are three ways to detect deviations from normality?
Histograms
Quantile plots (QQ plots)
The Shapiro-Wilk test
How do you interpret a normal quantile plot (QQ plot)?
If the points fall on a straight line, it indicates the data fits the normal distribution. Points that curve away from the line indicate a lack of fit.
On a normal QQ plot, what does a positively (right) skewed distribution look like?
The points curve up and away from the line at the high end.
On a normal QQ plot, what does a negatively (left) skewed distribution look like?
The points curve down and away from the line at the low end.
On a normal QQ plot, what does a symmetric with fat tails distribution look like?
The points curve up at the high end and down at the low end (forming a slight S-shape)
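In base R, a normal QQ plot is drawn with qqnorm() plus qqline(), which adds the reference line the points should follow if the data are normal (hypothetical data shown):
```r
y <- rnorm(50)   # hypothetical data; replace with your variable
qqnorm(y)        # sample quantiles vs. theoretical normal quantiles
qqline(y)        # reference line for a perfect normal fit
```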
What is a Shapiro-Wilk test used for?
To test statistically whether a set of data comes from a normal distribution.
What are the H_0 and H_A for a Shapiro-Wilk test?
H_0: The data are from a normal distribution.
H_A: The data are not from a normal distribution.
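In R, the test is the built-in shapiro.test(); a small p-value (< 0.05) rejects H_0 of normality (hypothetical data shown):
```r
y <- rnorm(50)     # hypothetical data
shapiro.test(y)    # reports the W statistic and a p-value
```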
What are the options, in rough order, when statistical assumptions are violated?
Ignore: If sample sizes are large, the tests are often robust
Transformations (e.g., log-transform)
Permutation tests
bootstrapping
non-parametric tests
When can we often ignore violations of normality?
When sample sizes are large. The Central Limit Theorem (CLT) states that the means of large samples are normally distributed, even if the underlying data is not.
What test is ideal if you have unequal variance between two groups?
Welch’s t-test
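In R, t.test() performs Welch's t-test by default (var.equal = FALSE); a sketch with hypothetical data-frame and variable names:
```r
t.test(Y ~ X, data = D)                    # Welch's t-test (the default)
t.test(Y ~ X, data = D, var.equal = TRUE)  # classic pooled two-sample t-test
```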
What is a data transformation?
It changes each data point by some simple mathematical formula, applying the same formula to every individual.
what is the log-transformation (e.g., Y' = ln[Y]) especially good at fixing?
Data that is skewed to the right.
What are three signs that a log-transformation might be useful?
The frequency distribution is skewed to the right
The variance seems to increase as the mean gets larger (when comparing groups)
The variable is the result of multiplying or dividing other components
What is the correct way to work with log-transformed data?
First, transform each individual data point (e.g., ln(Y)). Then calculate the mean, SD, and CI using those new transformed values.
What is the wrong way to find the mean of log-transformed data?
To calculate the mean of the original data first and then take the log of that mean (e.g., ln(Y-bar)). The log of the mean does not equal the mean of the log values.
What R function calculates the natural log (base e)?
The log() function. It can be applied to a single number or a whole vector.
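A short sketch contrasting the right and wrong approaches from the previous cards, using three hypothetical data points:
```r
y <- c(1, 10, 100)   # hypothetical right-skewed data
mean(log(y))         # correct: mean of the logs (= log(10) here)
log(mean(y))         # wrong for this purpose: log of the mean (= log(37))
```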
what is the “flaw of averages”?
The idea that designing for the “average” often means designing for no one. The mean is a useful prediction, but the probability of observing the mean value for multiple variables all at the same time is extremely low.
What does the phrase “Correlation does not imply causation” mean?
Just because two variables are associated (correlated) does not mean that one causes the other.
What is a spurious correlation?
A strong correlation that appears in a dataset purely by chance, like the correlation between Nicolas Cage films and swimming pool drownings.
What is data dredging?
The practice of performing many, many statistical tests on a dataset until you find a relationship that appears significant, even though it is just a random match
What is a confounding variable?
An unmeasured, “sneaky third” variable that is the actual cause of both of the variables you are examining, making them look correlated when they are not directly related.
What is the classic example of a confounding variable?
Shark attacks and ice cream sales are highly correlated. The confounding variable is Temperature (or season); hot weather causes both more people to swim (rising shark attacks) and more people to buy ice cream
What is the difference between an observational and experimental study?
experimental: The researcher assigns treatments randomly to individuals.
observational: The researcher does not assign the treatments; they only observe.
What is the main benefit of an experimental study?
random assignment averages out the effects of confounding variables, making it easier to determine a causal relationship.
what are the two main goals of experimental design?
reduce bias
reduce sampling error (which increases precision and power)
what are the three main design features that reduce bias?
controls
random assignment
blinding
what is a control group?
A group that is identical to the experimental group in all respects except for the treatment itself
what is a placebo?
A sham treatment given to a control group. It helps account for the placebo effect, where patients improve simply because they believe they are receiving treatment
Why is a control group important for “independent recovery”?
People often seek treatments when they feel their worst, and may improve naturally over time. A control group shows what would have happened without the treatment, providing a baseline for comparison.
What is blinding?
The process of preventing the participant (single-blind) or both the participant and the researcher (double-blind) from knowing which treatment is being administered.
Why is blinding important?
Unblinded studies often find much larger (and likely biased) effects, suggesting that knowing who got the treatment can influence the results.
What are the three main design features that reduce sampling error?
replication
balance
blocking
what is replication?
carrying out the study on multiple independent units.
If you put 4 plants in a “control” chamber and 4 plants in a “treatment” chamber, what is your true sample size (n) for each group?
n = 1 for each group. The chamber is the independent unit, not the plants. The 4 plants in each chamber are pseudo-replicates.
What is balance in experimental design?
Having nearly equal sample sizes (n_1 = n_2) in each treatment group.
Why is a balanced design more powerful?
for a fixed total sample size, the standard error is smallest (and precision is highest) when the sample sizes in each group are equal.
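A quick numerical check of this claim, holding the total sample size at n = 20 and assuming a hypothetical pooled SD of 1:
```r
sp <- 1                                              # hypothetical pooled SD
se <- function(n1, n2) sqrt(sp^2 / n1 + sp^2 / n2)   # SE of the difference
se(10, 10)   # 0.447: balanced design, smallest SE
se(15, 5)    # 0.516: unbalanced design, larger SE
```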
What is blocking?
Grouping similar experimental units together before randomly applying treatments. For example, grouping patients by “hospital” or plots of land by “field”
How does blocking increase precision?
It accounts for variation between the blocks. This removes that variation from the “noise” (error), making it easier to see the “signal” (the treatment effect).
what statistical test is used to compare the means of two groups?
two-sample t-test
what is the null hypothesis (H_0) for a two-sample t-test?
H_0: mu_1 = mu_2 (the means of the two populations are equal)
What is the problem with conducting multiple t-tests if you have more than two groups (e.g., 4 groups)?
The probability of making at least one Type I error (a false positive) becomes much greater than the significance level (alpha) you set for a single test
If you have 4 groups, how many unique pairwise comparisons are possible?
The formula is 4 choose 2 = (4 × 3) / 2 = 6.
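Both of the last two cards can be checked in R; the error-rate calculation assumes the 6 tests are independent:
```r
choose(4, 2)        # 6 unique pairwise comparisons among 4 groups
1 - (1 - 0.05)^6    # ~0.26: chance of at least one Type I error in 6 tests
```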