final practice
Chi-Square:
Categorical predictor (X) and categorical outcome (Y)
Observed frequencies:
the actual counts in each category from the sample
Expected frequencies
What the counts should be if there is no relationship between predictor and outcome (null hypothesis)
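As a minimal sketch of how expected frequencies follow from observed ones: under the null, each expected count is (row total × column total) / grand total. The counts and function names below are hypothetical, chosen only for illustration.

```python
def expected_frequencies(observed):
    """Expected counts if predictor and outcome are unrelated (null hypothesis)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = levels of X, columns = levels of Y
observed = [[30, 10], [20, 40]]
expected = expected_frequencies(observed)
statistic = chi_square(observed)
```

The larger the gap between observed and expected counts, the larger the chi-square statistic.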
T-Tests / ANOVA
Categorical predictor (X) and continuous outcome (Y)
Correlation / Regression
Continuous predictor (X) and continuous outcome (Y)
Correlation
Assesses how changes in X consistently predict changes in Y
(slope)
Regression coefficient (b1)
The expected change in Y for each one-unit increase in X
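A rough sketch of how the correlation and the regression coefficient can be computed from the same sums of squares, using the standard least-squares formulas. The data and the function name are made up for illustration.

```python
from statistics import mean


def slope_intercept_r(x, y):
    """Return (b1, b0, r) for a single predictor using least squares."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of products
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    b1 = sp / ss_x                 # expected change in Y per one-unit increase in X
    b0 = my - b1 * mx              # intercept
    r = sp / (ss_x * ss_y) ** 0.5  # Pearson correlation
    return b1, b0, r


# Hypothetical scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b0, r = slope_intercept_r(x, y)
```

Here b1 is in the units of Y per unit of X, while r is unit-free and bounded between -1 and 1.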
The Replication Crisis
Science often relies on confirmatory research, which may increase the chance of false results
Exploratory research
Hypothesis generating, often using data contingent on findings: important for discovery
Generating hypotheses after data collection.
Post hoc hypotheses
Data contingent
Hypothesis generating
P-values NOT interpretable
Confirmatory research
A priori hypotheses with independent data and interpretable p-values:
clear test of hypotheses stated in advance
testing the specific hypotheses you made before running the study
Data independent
Hypothesis testing
P-values interpretable
Underreporting
Reporting only the DVs that support the hypothesis, which increases the false positive rate (familywise error rate) and misrepresents exploratory research as confirmatory
Familywise error rate (FWER)
Probability of false alarms when performing multiple tests.
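A small sketch of how the familywise error rate grows with the number of tests. It assumes the tests are independent, which real studies may not satisfy; the function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false alarm across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# FWER at alpha = .05 for 1, 5, and 10 tests
rates = {k: familywise_error_rate(0.05, k) for k in (1, 5, 10)}
```

With a single test the rate is just alpha; with ten tests at alpha = .05 it is roughly .40.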
P-hacking
Researchers make decisions during analysis to get statistically significant results
HARKing (Hypothesizing After the Results are Known)
Presenting exploratory findings as confirmatory: (Don't pretend it was a hypothesis you had in advance)
Transparent methods:
Researchers disclose study details, including sample size determination, exclusions, manipulations, and measures
Transparent results
Publicly sharing the data and how it was processed and analyzed
Preregistration:
Documenting research plan in advance to prevent HARKing
Registered reports:
Submitting study plans before data collection for conditional acceptance
Factorial Between-subjects design
All factors are manipulated between participants. There are different groups. Each participant does just ONE condition or level, not all, of the IVs.
Factorial Within-subjects design
All factors are manipulated within participants. Each participant is exposed to ALL conditions or levels of the IVs. This reduces variability and thus the chance of a Type II error (not finding a significant effect), which increases statistical power.
Central Limit Theorem (CLT)
The distribution of sample means approaches normality as sample size increases (n ≥ 30)
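A simulation sketch of the idea, assuming a skewed (exponential) population with mean 1 and standard deviation 1. The population, the seed, and the number of samples are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_means(n, n_samples=2000):
    """Means of n_samples samples of size n from a skewed population."""
    return [
        mean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


means = sample_means(n=30)
center = mean(means)   # should sit near the population mean (1.0)
spread = stdev(means)  # should sit near sigma / sqrt(n)
# In a normal distribution about 68% of values fall within 1 SD of the mean
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
```

Even though individual scores are strongly skewed, the sample means cluster symmetrically around the population mean.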
Simple linear regression
A procedure for finding the best-fitting straight line for a set of data using a single predictor variable
Slope x score on effect AKA known point on graph (B1)
The general linear model (GLM) takes a continuous outcome (DV) and partitions each score into several components:
grand mean, treatment offsets, and residuals (difference between observed and predicted scores)
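A minimal sketch of that partition for a one-factor design: each score = grand mean + treatment offset + residual. The group names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical scores for two groups
groups = {"control": [2, 4, 6], "treatment": [6, 8, 10]}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = mean(all_scores)

# Treatment offset: what you add to the grand mean to get the group mean
offsets = {g: mean(scores) - grand_mean for g, scores in groups.items()}

# Residual: observed score minus predicted score (grand mean + offset)
residuals = {
    g: [s - (grand_mean + offsets[g]) for s in scores]
    for g, scores in groups.items()
}
```

Adding the three components back together recovers every original score, which is the sense in which data = model + error.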
Degrees of Freedom
When n deviations from the sample mean are used to estimate variability in the population, only n-1 are free to vary (because of the restriction that the sum of all deviations must equal zero). Dividing by n instead of n-1 underestimates variability in the population.
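A short sketch of the restriction, using made-up scores: the deviations from the sample mean always sum to zero, and dividing the sum of squares by n gives a smaller estimate than dividing by n-1.

```python
from statistics import mean, pvariance, variance

scores = [2, 4, 6, 8]  # hypothetical sample
m = mean(scores)
deviations = [s - m for s in scores]

sum_of_deviations = sum(deviations)       # always zero
divide_by_n = pvariance(scores)           # SS / n
divide_by_n_minus_1 = variance(scores)    # SS / (n - 1)
```

Because the last deviation is fixed once the first n-1 are known, only n-1 pieces of information go into the variability estimate.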
z-test
Used when the population standard deviation (sigma) is known. A z-score represents a score's deviation from the mean in standard deviation units.
t-test
The difference between the sample mean and the population mean (sampling error) divided by the standard error of the mean (the standard deviation of the sampling distribution of sample means)
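A sketch of the one-sample version of this ratio, where the standard error is estimated as s / sqrt(n). The scores and the comparison value mu are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, mu):
    """t = (sample mean - population mean) / estimated standard error."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # stdev uses n - 1
    return (mean(scores) - mu) / standard_error


scores = [12, 14, 16, 18, 20]  # hypothetical sample
t_stat = one_sample_t(scores, mu=13)
df = len(scores) - 1
```

The resulting t would then be compared against the t distribution with n-1 degrees of freedom.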
Cohen’s d
Measure of effect size for difference between two means
Expresses difference between means in standard deviation units
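One common way to compute this, sketched here under the assumption of two independent groups and a pooled standard deviation. Other variants of d exist; the data are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Hypothetical scores
d = cohens_d([12, 14, 16, 18, 20], [8, 10, 12, 14, 16])
```

A d of about 1.26 here means the group means sit a little more than one standard deviation apart.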
Type I error
No effect is present but researcher rejects the null
“False alarm” or “alpha error”
Factors affecting error rate: alpha level
Type II error
A real effect is present but the researcher fails to reject the null (retains a null that is actually false)
Not sufficient evidence to reject the null
What is a p-value?
the probability of observing a given test stat or something more extreme if the null hypothesis is true.
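A sketch of that definition for a z statistic, using the standard normal distribution from Python's `statistics` module. The function name is illustrative, and the one-tailed branch assumes the result is in the predicted direction.

```python
from statistics import NormalDist


def p_value_from_z(z, two_tailed=True):
    """Area under the null distribution at least as extreme as z."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail if two_tailed else upper_tail


p_two = p_value_from_z(1.96)                    # both tails
p_one = p_value_from_z(1.96, two_tailed=False)  # one tail only
```

The p-value is an area under the sampling distribution, which is why it can be compared directly with alpha.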
Alpha
the probability value that we use to determine which sample outcomes are considered very unlikely if the null hypothesis is true (size of the critical region)
Critical region
The region of the sampling distribution that contains the sample outcomes that are considered very unlikely if the null is true (the little tails)
Critical value
the values that define the boundaries of the critical regions
Depend on: what alpha is and whether the test is one or two tailed
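A sketch of how alpha and the number of tails set the critical value, shown for a z-test only (t critical values also depend on degrees of freedom). The function name is an illustrative choice.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Boundary of the critical region on the standard normal distribution."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


two_tailed_cutoff = critical_z(0.05)                    # alpha split across tails
one_tailed_cutoff = critical_z(0.05, two_tailed=False)  # alpha in one tail
```

Splitting alpha across two tails pushes the boundary further from the mean than putting all of alpha in one tail.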
Null hypothesis statistical testing (NHST)
The logic of hypothesis testing
Establishing a standard of evidence (i.e. a decision rule)
Computing a test statistic and evaluating by p-value
The empirical rule
tells us approximately 99.7% of a normal distribution falls within ±3 standard deviations of the mean (about 68% within ±1 and about 95% within ±2)
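The rule can be checked against the standard normal distribution. This sketch computes the proportion of the distribution within 1, 2, and 3 standard deviations of the mean; the function name is illustrative.

```python
from statistics import NormalDist


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


coverage = {k: proportion_within(k) for k in (1, 2, 3)}
```

The three proportions come out close to 68%, 95%, and 99.7%.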
Random sampling
sampling method in which you identify all the members of your population and select a random subset
Question of external validity
Random assignment
researchers place participants into groups at random
Question of internal validity
Convenience
recruit participants who are easily accessible
Your sample can be different from other groups/samples
Most common
non-probability
Quota
recruiting a specific number of participants from each group
Non-random selection method
Similar to stratified sampling but quota is NOT random
non-probability
Snowball
asking participants to invite other people to join the study
Participants help recruit each other
non-probability
Purposive
select participants on the basis of some characteristics that they share
Non-random selection method
non-probability
Simple random
equal chance of being selected (probability)
cluster sampling
identify pre-existing clusters–randomly select some of the clusters, sample everyone in the selected cluster (some of the groups) (probability)
systematic sampling
(probability) put subjects in order, pick a random starting point, then choose every nth person
Requires the sample to be ordered on some characteristic (e.g., line people up shortest to tallest, take every 5th person, and circle around the group to get the sample)
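A minimal sketch of the procedure: order the subjects, pick a random starting point within the first n positions, then take every nth person. This simple version stops at the end of the list rather than circling around; the function name and population are hypothetical.

```python
import random


def systematic_sample(ordered_population, step, rng=random):
    """Random start within the first `step` positions, then every `step`-th member."""
    start = rng.randrange(step)
    return ordered_population[start::step]


population = list(range(20))  # hypothetical IDs, already in order
sample = systematic_sample(population, step=5, rng=random.Random(1))
```

Only the starting point is random; every later pick is determined by the step size.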
stratified random
identify pre-existing groups, sample all of the groups equally
Guarantee sample has equal number (some from ALL the groups, each one)
pick one m&m of each color
Proportionate stratified random
identify pre-existing groups, sample all of the groups proportionately
e.g., sample 5% of the people in each category
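A sketch of proportionate stratified random sampling: the same fraction is drawn at random from every pre-existing group, so larger groups contribute more people. The strata, sizes, and function name are invented for illustration.

```python
import random


def proportionate_stratified_sample(strata, fraction, rng=random):
    """Randomly draw the same fraction of members from every stratum."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: 100 nurses and 40 doctors, identified by ID
strata = {"nurses": list(range(100)), "doctors": list(range(100, 140))}
sample = proportionate_stratified_sample(strata, 0.05, rng=random.Random(0))
```

Sampling 5% of each stratum yields 5 nurses and 2 doctors, mirroring their share of the population.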
Probability sampling
every individual in the population is identified–you have a way of knowing who is in the pop or not
No one has a zero chance of being selected (non-zero probability)
Selection is random
Nonprobability sampling
Sampling in which one or more of the three above criteria are not met
A researcher is interested in conducting a study on roommate relationships at UCSD. Her study involves in-person interviews with roommate pairs/groups, which is relatively costly and time-intensive. To ensure her sample is representative of UCSD students who live on campus, she randomly selects 10 dorms and recruits all of the students in those dorms for her study. What kind of sampling method has she used?
Cluster
A researcher is investigating the relationship between extraversion and social conformity. Subjects come into the lab and complete a personality assessment. Each subject is placed into one of three groups based on the assessment scores: extraverted, introverted, or neutral (neither extraverted nor introverted). All participants then complete a social conformity task in which they are asked to provide judgments of line lengths in the context of confederates who unanimously give incorrect answers. She finds that the extraverted group displays more social conformity than the introverted or neutral groups. Which research strategy has been employed in this study?
Correlational
We know that IQ is a variable that is normally distributed in a population, with a mean of 100 and a standard deviation of 15. Use the empirical rule to estimate what proportion of individuals in the population we would expect to have an IQ higher than 130.
2.5%
Why do we try to falsify the null hypothesis rather than support the alternative hypothesis?
We know what the population parameters would be under the null, but we don’t know precisely what they would be under the alternative
What is the relationship between alpha and the critical region?
Alpha is the size of the critical region
Which of the following statements about alpha and p is TRUE?
Alpha and p are both areas under the curve of the sampling distribution
If we run a hypothesis test with an alpha level of 0.05, what is the probability that we would make a type II error if the null hypothesis is true?
0%
A drug and alcohol researcher is interested in studying the effects of alcohol on learning ability of college seniors. She randomly assigns 10 students to an “alcohol group” and another 10 students to a control group. The students in the alcohol group all receive 8 oz of alcohol prior to being tested. Then all of the students are run through a learning assessment and the number of errors is recorded and compared between conditions.
Which t-test is appropriate?
independent t-test
A researcher wants to know whether a presentation on the health benefits of Brussels sprouts would change people’s attitude towards Brussels sprouts. She recruited 84 undergraduates. Before the intervention, she asked participants to rate their attitude towards Brussels sprouts. After the presentation, they rated their attitudes again. Which t-test is appropriate?
dependent t-test
What is the difference between a treatment offset and a group mean?
A treatment offset is what you add to the grand mean to get the group mean
What is the difference between a predicted score and a group mean?
A predicted score is the same thing as a group mean (no difference). Data = model (predicted score) + error (residual)
Repeated measures designs increase our power to detect a significant effect of condition because including person as a factor in the model _____
decreases the mean square error
If the null hypothesis in a one-way ANOVA is that all of the treatment offsets are equal (and equal to zero), what could we say about the groups if we reject the null hypothesis?
At least one of the groups has a different mean from the others
Prof. Ellis wants to interview undergraduate students to examine how increases in tuition shape students' mental health. He obtains his sample of students from his psych class that quarter. What kind of sampling is this?
convenience sample
You are doing research on hospital personnel: orderlies, technicians, nurses, and doctors. You want a probability sample with cases in each of the personnel categories. What strategy would you use?
stratified sampling
Robert is interested in the effects of age and emotion on memory. He conducts a study in which he manipulates participants’ emotion: participants complete a memory task once after being induced to feel happy and once after being induced to feel sad. Participants are either children (ages 8-15) or young adults (ages 18-25). He measures each subject’s memory using a recognition test paradigm. What kind of design has Robert used?
2×2 mixed design. There are two factors, age and emotion, each with two levels (children vs. young adults; happy vs. sad). It is mixed because not all factors are manipulated the same way: age is between-subjects (each participant belongs to only one age group) and emotion is within-subjects (everyone did the task under both emotions).
Karisa is interested in the effect of study music on memory. In her study, all participants studied a list of words and later took a recognition test. Karisa manipulated the kind of music participants listened to while they studied: either rap or metal, and either with or without lyrics. Each participant heard only one type of music. Which type of analysis is appropriate for this design?
Factorial ANOVA (between subjects). There are no repeated measures; no one does more than one condition.
Kaiqi is interested in how studying for the GRE in a foreign language can be improved. She manipulates whether participants study new vocabulary by seeing definitions or contextual sentences, as well as whether those definitions/sentences are presented in the native language (Chinese) or the language of the test (English). All participants study 5 new GRE words in each of the 4 conditions and are tested on all 20 words using real GRE test items. Which type of analysis is appropriate for this design?
Factorial repeated measures analysis
Why is it that repeated measures designs have more statistical power than between-subjects designs?
Because the residual (unexplained) error is smaller in a repeated measures design
A main effect of an IV can also be referred to as an overall effect of an IV.
true
A factorial design cannot have more than three independent variables
false
An interaction can be described as a difference in differences
true
A cell in a factorial design table acts as a unique condition in factorial designs
true
If a design has no interaction, there will be no main effects either.
false
If a design has no main effects, there should be no interaction
false
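The true/false items above about main effects and interactions can be checked with a small sketch on 2×2 cell means. The cell means and function name are hypothetical; a main effect is a difference between marginal means, and the interaction is a difference in differences.

```python
def main_effects_and_interaction(cells):
    """cells[i][j] = mean for level i of factor A and level j of factor B."""
    (a1b1, a1b2), (a2b1, a2b2) = cells
    main_a = (a1b1 + a1b2) / 2 - (a2b1 + a2b2) / 2  # overall effect of A
    main_b = (a1b1 + a2b1) / 2 - (a1b2 + a2b2) / 2  # overall effect of B
    interaction = (a1b1 - a2b1) - (a1b2 - a2b2)     # difference in differences
    return main_a, main_b, interaction


# Crossover pattern: no main effects, but an interaction
crossover = main_effects_and_interaction([[10, 20], [20, 10]])
# Additive pattern: two main effects, but no interaction
additive = main_effects_and_interaction([[10, 20], [15, 25]])
```

The crossover pattern shows why the absence of main effects does not rule out an interaction, and the additive pattern shows the reverse.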
A researcher conducts a study in which participants played a “dictator” game, in which they were given points and asked how many they wished to allocate to a partner (a stranger) who had no points. Participants did not gain anything by allocating points to the partner, and in fact, giving points to this partner would hurt the participants’ chances of winning the game. The researchers hypothesized that participants’ altruism (operationalized as the points they allocated to the partner) would be related to their social class. To test this hypothesis, the researchers collected information about each participant’s yearly household income (in dollars), in order to determine whether social class was predictive of altruism. What analysis is appropriate for this hypothesis test?
correlation/regression (both variables are continuous)
A researcher conducts an experiment investigating what types of activities cause boredom. She randomly assigns subjects to either read a dictionary for one hour or do a series of simple math problems for one hour. (Each subject does only one task.) The subject is then asked a simple yes/no question: “Would you rather have done nothing at all for an hour than do the task you just completed?” The researcher predicts more people will answer “yes” after doing math problems than after reading the dictionary. Which statistical test must the researcher use to test her hypothesis?
This is chi-square because both variables are categorical: the predictor (dictionary/math) and the outcome (yes/no).
A researcher conducts a study in which subjects complete a difficult search task in which they attempt to locate an X hidden among many Y’s in a grid on a computer screen. Each subject completes this task under two “color” conditions: one in which the X is the same color as the Y’s (e.g., both black) and one in which the X is a different color than the Y’s (e.g., the X is green and the Y’s are blue). The dependent variable is how long (in seconds) it takes each subject to locate the X in the two different “color” conditions. The researcher conducts a hypothesis test to determine if subjects are faster to locate the X in the “different color” condition than in the “same color” condition. Which analysis is appropriate for this hypothesis test?
This is a dependent (paired-samples) t-test because the predictor is categorical with two levels, the outcome is continuous, and each subject completes both conditions.
The familywise error rate for your study is 0.4. What could you do to reduce this value?
Adjust your alpha level to account for multiple comparisons
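One common adjustment is the Bonferroni correction, sketched here: divide the desired familywise alpha by the number of tests. The example assumes 10 independent tests, which is a hypothetical figure chosen because it produces a familywise rate near 0.4 at alpha = .05.

```python
def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test alpha that keeps the familywise rate at or below the target."""
    return familywise_alpha / n_tests


n_tests = 10  # hypothetical number of comparisons
uncorrected = 1 - (1 - 0.05) ** n_tests           # every test run at .05
per_test_alpha = bonferroni_alpha(0.05, n_tests)
corrected = 1 - (1 - per_test_alpha) ** n_tests   # every test run at .005
```

Testing each comparison at the stricter per-test alpha brings the familywise rate back under .05, at the cost of some power.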
Which of the following does not contribute to the file drawer problem?
A scientist pre-registers their experiment and makes it public regardless of the outcome
A researcher runs an experiment testing the effect of exercise on subjective well-being. Participants are randomly assigned to complete 0, 2, or 4 hours per week of any exercise of their choice for the span of one month. At the end of the month, all participants complete a subjective well-being survey. Which analysis would be appropriate to determine whether exercise affects well-being in this study?
Between-subjects ANOVA
A researcher wants to know whether the average commute time in San Diego is different from the national average commute time (perhaps because San Diego traffic is bad or because it's expensive to live close to major job centers). He surveys 1,000 people in the San Diego area and conducts a hypothesis test to determine whether the mean commute time in San Diego is longer than the national average commute time of 25.4 minutes. Which analysis is appropriate for testing this hypothesis?
one sample t-test
A researcher is studying the effects of visual imagery on memory for word pairs. She takes a sample of 40 people, and randomly assigns 20 to the imagery condition and 20 to the no-imagery condition. The subjects in the imagery condition are told to imagine a visual image of each word pair during study, while the no-imagery condition is just told to study the list of word pairs. Both groups study the list for 5 minutes, and then after a 10-minute filler task are asked to recall as many of the word pairs as possible. The dependent variable in this study is the number of word pairs correctly recalled. Which analysis is appropriate to compare the groups and test whether there is a significant effect of visual imagery on memory?
independent-samples t-test