What is the standard error?
The SD of the sampling distribution.
What is the sampling distribution?
The distribution of a statistic (such as the sample mean) across many possible samples of a given size from a population. It tells us how much that statistic varies from sample to sample, which allows us to compare values and judge the likelihood of certain results.
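A minimal sketch of the idea, using a made-up population: draw many samples, record each sample's mean, and the standard deviation of those means is the standard error.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 100,000 values from a normal distribution
# with mean 50 and SD 10 (made-up numbers for illustration).
population = [random.gauss(50, 10) for _ in range(100_000)]

# Sampling distribution of the mean: draw many samples of size n = 25
# and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(2_000)
]

# The SD of this sampling distribution is the standard error; it should
# land near sigma / sqrt(n) = 10 / 5 = 2.
standard_error = statistics.stdev(sample_means)
print(round(standard_error, 2))
```

This is why larger samples give smaller standard errors: the means of bigger samples cluster more tightly around the population mean.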
What is an alpha level and why is it important?
The probability of finding a significant statistical result when there is actually no real-world effect is equal to the alpha level. The alpha level is important because it helps to control the rate of Type I errors, which occur when you mistakenly reject a true null hypothesis. By setting a predetermined alpha level, you establish a standard for how much evidence is needed to reject the null hypothesis, thus ensuring the reliability of your statistical analysis. It also allows for consistency and comparability across different studies or analyses.
What are Type I and Type II errors? How do they relate to rejecting and failing to reject the null hypothesis?
Type I: when the null hypothesis is true, but we conclude we should reject it. (False positive)
Type II: when the null hypothesis is false, but we fail to reject it. (False negative)
What are the differences between a within-participants design, a between-participants design, and a paired-participants design?
Within-participants: the same individuals appear in every condition (dependent measures).
Between-participants: different individuals in each condition (independent measures).
Paired-participants: different individuals in each group/condition who were not randomly assigned, but were carefully matched so that for each person in condition 1, there is an ‘identical’ person in condition 2.
How are parametric statistics different from the simulate-and-build-the-null-distribution approach we used earlier in the semester?
They use mathematical models to estimate the shape of the null distribution instead of running simulations.
What are the sources of variability in the data from an experiment and why do they matter?
Random variability from the normal everyday differences between and within individuals, and variability due to your experimental manipulation. They matter because significance testing comes down to asking whether the differences caused by the manipulation are large relative to the random differences.
What is a t statistic? How does it relate to statistical significance?
The t statistic is slightly different, depending on whether an experiment is a between-participants design, or a within-participants design.
The t statistic = (a measure of the difference between the conditions) / (a measure of the underlying variability in the population).
When you perform null hypothesis significance testing with a t statistic, you are performing a "t test"
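A hand-computed sketch of an independent-samples t statistic, using made-up scores for two groups, so the "difference over variability" structure is visible:

```python
import statistics
from math import sqrt

# Hypothetical scores for two independent groups (made-up data).
group1 = [12, 15, 14, 10, 13, 16, 11, 14]
group2 = [9, 11, 8, 12, 10, 7, 11, 9]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)

# Pooled variance: each sample variance (dividing by n - 1) weighted
# by its degrees of freedom.
var1, var2 = statistics.variance(group1), statistics.variance(group2)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t = (difference between conditions) / (estimated variability)
t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(t, 2))
```

The bigger the mean difference, or the smaller the variability, the larger t becomes, and the easier it is to reach significance.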
How do we estimate the population variance and why is the equation for population variance different from the equation for a samples variance?
We must divide by the degrees of freedom (n − 1) instead of n, because the variance of a sample systematically underestimates the variance of the population it was drawn from.
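A quick simulation (with a made-up population) showing why: averaging the divide-by-n variance over many small samples undershoots the true population variance, while dividing by n − 1 does not.

```python
import random
import statistics

random.seed(1)

# Hypothetical population with variance near 100 (SD = 10).
population = [random.gauss(0, 10) for _ in range(100_000)]
true_var = statistics.pvariance(population)  # divide-by-N population variance

# Average the two variance estimates over many small samples of n = 5.
n, trials = 5, 5_000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    s = random.sample(population, n)
    biased += statistics.pvariance(s)   # divides by n      -> underestimates
    unbiased += statistics.variance(s)  # divides by n - 1  -> about right

print(round(biased / trials, 1), round(unbiased / trials, 1), round(true_var, 1))
```

With n = 5, the divide-by-n estimate averages roughly 4/5 of the true variance, which is exactly the bias that the n − 1 correction removes.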
How do you write the full/formal results statement for t tests, and for ANOVAs? You do not have to memorize this for the in class exam, but you should be able to do it for the take-home exam.
For a t test: t(df) = t value, p = p value.
For an ANOVA: F(between-groups df, residual df) = F value, p = p value.
What are the shortcomings of null hypothesis significance testing?
Significance isn’t everything. A large p value doesn’t tell us the null must be correct, or that the null is probably correct, or how likely the null is at all. It only tells us we can’t rule out the null as a possible origin for our data. A small p value doesn’t tell us if the null is or probably is incorrect. It only tells us that we have rejected the null as a possible origin for our data. P values don’t tell us anything about research hypotheses.
What is the file drawer effect and why does it matter?
There are null results from similar studies sitting in various filing cabinets across the world... all of them suggest that Variable A may not have a causal effect on Variable B... but as soon as one person collects data with p < .05, it's written up and sent off to a journal. We don’t know how many times similar experiments may have been run without generating significant results, so any given published result likely has a higher probability of being a Type I error than the alpha level cutoff for that published study.
What is p-hacking?
Making small (or sometimes large) decisions about your statistical analysis after you've collected [some of] your data, in order to "tweak" your results so that you generate a statistically significant p value.
What is effect size and what does it tell us?
Cohen's d (effect size for a t test) is a measure of how far your manipulation of the independent variable shifted the population distribution, relative to the standard deviation of that population.
An effect size of +1 would mean that if you applied your manipulation of the IV to the entire population, you would expect to shift the mean of that population up by one standard deviation
An effect size of d = ~0 is “no effect”
An effect size of d = ~±0.2 is a “small effect”
An effect size of d = ~±0.5 is a “medium effect”
An effect size of d = ~±0.8 or larger is a “large effect”
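A minimal sketch of Cohen's d for two independent groups, using made-up control and treatment scores: the mean difference measured in units of the pooled standard deviation.

```python
import statistics
from math import sqrt

# Made-up control and treatment scores for illustration.
control = [20, 22, 19, 24, 21, 23, 20, 22]
treatment = [25, 27, 24, 29, 26, 28, 25, 27]

mean_diff = statistics.mean(treatment) - statistics.mean(control)

# Pooled SD is the yardstick: "how many SDs did the mean shift?"
n1, n2 = len(control), len(treatment)
pooled_sd = sqrt(
    ((n1 - 1) * statistics.variance(control)
     + (n2 - 1) * statistics.variance(treatment)) / (n1 + n2 - 2)
)

d = mean_diff / pooled_sd
print(round(d, 2))  # well past 0.8, so a "large effect" by the rules of thumb
```

Note that unlike a p value, d does not shrink as sample size grows; it describes the size of the shift itself.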
What are confidence intervals and what do they tell us?
Confidence intervals help us interpret/understand results. They are a tool for helping us evaluate how confident we should be in a given statistical result. A 95% CI for an average value means that if you ran the same study 100 times and calculated the average values and confidence intervals for each of those studies, then 95% of the confidence intervals would contain the true population parameter value.
M= 10.6, 95% CI [9.3, 11.9]- Relatively small interval
M= 10.6, 95% CI [-1.4, 22.6]- relatively large interval
The larger the interval, the less confident we are in the estimate; the smaller the interval, the more confident we are.
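A sketch of computing a 95% CI for a mean, with a made-up sample of 30 scores. With n = 30 the z critical value (~1.96) is a reasonable stand-in; strictly, a t critical value would be slightly larger.

```python
import random
import statistics
from math import sqrt

random.seed(7)

# Hypothetical sample of 30 scores centered near 10.6.
sample = [random.gauss(10.6, 3) for _ in range(30)]

m = statistics.mean(sample)
se = statistics.stdev(sample) / sqrt(len(sample))  # standard error of the mean

# z critical value for 95% coverage (~1.96).
z = statistics.NormalDist().inv_cdf(0.975)
lo, hi = m - z * se, m + z * se
print(f"M = {m:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Collecting more data shrinks the standard error, which shrinks the interval and raises our confidence in the estimate.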
How do we move from testing two conditions or groups to testing more than 2 conditions or groups?
ANOVA
What is an ANOVA and how does it differ from a t test?
ANOVA (analysis of variance): a test for comparing more than 2 groups. Its test statistic is F = variance between groups / variance within groups.
What are between- and within- groups variance and why do they matter?
Variance between: how spread out the group means are, i.e., how much the IV affects the DV (visually, how far apart the peaks of the groups' histograms sit).
Variance within: how spread out the scores are within each individual group (visually, the width of each histogram).
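A hand-computed sketch of the F ratio for three made-up groups, showing the between/within split directly:

```python
import statistics

# Made-up scores for three groups/conditions.
groups = [
    [4, 5, 6, 5, 4],
    [7, 8, 6, 7, 8],
    [10, 9, 11, 10, 9],
]

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total participants
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-groups variance: how spread out the group means are.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)
ms_between = ss_between / (k - 1)

# Within-groups variance: how spread out scores are inside each group.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (n - k)

F = ms_between / ms_within
print(round(F, 1))
```

When the null is true the two variance estimates are similar and F hovers near 1; group means pushed apart by a real manipulation inflate the numerator and F grows.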
How do we model the null hypothesis for between and within participants ANOVAs?
Between- The null typically states that there is no difference in the population means of the groups being compared.
Within- In a within subjects ANOVA the null hypothesis typically states that there is no difference in the population means across different time points or conditions within the same group.
What is the shape of the null distribution for an ANOVA and why is it shaped that way?
Unimodal and positively skewed; the exact shape depends on how many conditions and participants are in the study. It's shaped that way because F is a ratio of variances: it can never be negative, and under the null hypothesis values near 1 are the most likely.
What does rejecting the null hypothesis for an ANOVA tell us? What doesn’t it tell us?
Calculate the p value; if the p value is smaller than the alpha level, reject the null.
Rejecting tells us at least one of the groups is significantly different from at least one of the other groups.
It doesn't tell us where the difference is.
What is a post-hoc test?
A test performed when an ANOVA indicates that at least one group/condition is significantly different from the others: pairwise comparisons that determine where the difference is, which we couldn't tell from the ANOVA alone.
What is the problem with running post-hoc tests and how can we fix those problems?
Running multiple post-hoc tests increases the overall probability of a Type I error. We can fix this by correcting the alpha level for the number of comparisons, e.g., a Bonferroni correction, which divides the alpha level by the number of tests.
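A quick sketch of why this matters: the family-wise error rate across m independent comparisons is 1 − (1 − α)^m, which climbs quickly, and a Bonferroni correction counteracts it by testing each comparison at α / m.

```python
# With alpha = .05, the chance of at least one Type I error grows
# with the number of independent pairwise comparisons m.
alpha = 0.05
for m in (1, 3, 6, 10):
    familywise = 1 - (1 - alpha) ** m
    print(m, round(familywise, 3))

# Bonferroni correction: test each comparison at alpha / m instead,
# keeping the family-wise rate at roughly alpha overall.
m = 6  # e.g., all pairwise comparisons among 4 groups
corrected_alpha = alpha / m
print(round(corrected_alpha, 4))  # 0.0083
```

With 6 comparisons at α = .05 each, the chance of at least one false positive is already above 25%, which is why uncorrected post-hoc testing is a problem.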
What are replication and meta-analysis and how can they solve some problems with the scientific process?
Replication: running an experiment a second time to see if the results replicate, i.e., are again extreme enough to reject the null.
Meta-analysis: statistically comparing and combining the results of multiple separate studies.
Both help verify results and guard against problems like the file drawer effect and published Type I errors.
What is HARKing?
When researchers notice a relationship between variables in their data, create a hypothesis to explain that relationship, and then present that hypothesis as if it had been the initial one. Hypothesizing After the Results are Known.
What is pre-registration and how can it solve problems with the scientific process?
Publicly registering your hypotheses, methods, and analysis plan before collecting data. This lets others see what you planned and what actually happened, even though a majority of studies are never published, and it prevents p-hacking and HARKing because the analysis decisions are locked in ahead of time. Mandatory pre-registration also helps people learn from unpublished studies' mistakes and tweak their own designs accordingly.
Last year, over Thanksgiving break, I spent the night in a tent outside of Best Buy, a statistician friend of mine spent the night in front of Wal-Mart, and another colleague waited in the lines outside of Target. When each store opened, we rushed in, clipboards in hand, and stood near the cash registers collecting data about how much money the first forty customers through the line in each store spent. If we wanted to know if there was a difference in spending between the three retailers, what type of analysis should we run on this data and why?
I would set up an independent-measures (between-participants) ANOVA because there are 3 groups involved, one per retailer, and the groups/conditions are independent of each other: different customers shopped at each store.
Two tuba instructors are debating the best way to teach middle school students how to play the tuba.
One believes that starting the students with fundamental scales, arpeggios, and other mechanical exercises will lead to better playing sooner. The other believes that teaching the students using popular tunes and melodies will make learning the instrument easier and is the right way to go. They have come to you for help. They want to set up a study to settle their dispute, but they are not sure where to start.
Would you set this study up as a between-participants design or as a within-participants design? Why?
I would set it up as a between-participants design to compare the effectiveness of teaching tuba with fundamental scales vs. popular tunes: one group of students learns with each method. Each student should experience only one method, because whatever a student learned under the first method would carry over and contaminate their performance under the second; the two groups' progress can then be compared directly.
Imagine that you conducted a between-participants experiment where you collected data from 100 participants in each condition. Suppose that the test statistic, the difference between the two group means, was 22, and that the standard deviation for the data you collected was 12. Now imagine that you conducted a second between-participants experiment where you again collected data from 100 participants in each condition. Suppose that, just like in the first experiment, the difference between the two groups' means was 22, but in this case the standard deviation for the data you collected was 18. In which of the two experiments would you be more likely to reject the null hypothesis? Or would you expect similar outcomes for both experiments? Why?
You would be more likely to reject the null hypothesis in the first experiment, where the standard deviation is 12, compared to the second experiment, where the standard deviation is 18. The data in the first experiment are less variable, so the same mean difference of 22 yields a larger t statistic, making it easier to detect the effect of interest.
What is the p value?
The p value is the probability of getting a result at least as extreme as our experimental result if the null hypothesis were true.
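This definition can be made concrete with the simulate-and-build-the-null-distribution approach from earlier in the semester, here as a permutation test on made-up two-group data: shuffle the condition labels (which is what "the null is true" means) and count how often the shuffled difference is at least as extreme as the observed one.

```python
import random
import statistics

random.seed(0)

# Made-up scores from two conditions.
a = [12, 15, 14, 10, 13, 16, 11, 14]
b = [9, 11, 8, 12, 10, 7, 11, 9]
observed = statistics.mean(a) - statistics.mean(b)

# Build the null distribution by shuffling condition labels: if the
# null is true, which condition a score came from shouldn't matter.
pooled = a + b
diffs = []
for _ in range(10_000):
    random.shuffle(pooled)
    diffs.append(statistics.mean(pooled[:8]) - statistics.mean(pooled[8:]))

# Two-tailed p value: proportion of simulated results at least as
# extreme as the observed result.
p = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(p)
```

A parametric t test replaces the simulation loop with a mathematical model of this same null distribution, which is exactly the contrast drawn above.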
Where do you find the proportion of the variability in ANOVA test results?
You find eta-squared: η² = SS between / SS total. Convert the number to a percentage.
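A sketch of eta-squared, the proportion of total variability explained by the IV, using hypothetical sums of squares from an ANOVA table (made-up numbers):

```python
# Hypothetical ANOVA sums of squares (made-up values for illustration).
ss_between = 62.5
ss_within = 8.4
ss_total = ss_between + ss_within

# Eta-squared: share of total variability attributable to the IV.
eta_squared = ss_between / ss_total
print(f"{eta_squared:.1%} of the variability is explained by the IV")
```

Like Cohen's d for a t test, η² describes how big the effect is, which a p value alone cannot tell you.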