Random Sampling!
Since we are seldom able to measure an entire population, we rely on a random sample of that population as a representative group. This allows us to infer things about the population, if the sample is truly random. Random sampling refers to the method you use to select participants so that every individual/element in the population has an equal chance of being in the sample; every subject has an equal opportunity to be selected to participate in the study.
Strictly speaking, a truly random sample would sample with replacement, because sampling without replacement increases the probability of each remaining person/element being selected on later draws (in practice, sampling without replacement is standard, and the difference is negligible when the population is large). Random sampling enhances external validity because it reduces sampling bias, which increases the likelihood of having a representative sample of the population. In turn, this allows researchers to generalize findings to the broader population with greater confidence and makes the sample more likely to reflect the characteristics of the larger group.
Ex: one would consider the population of all the university's athletes, choose a sample size (N = 30), assign a number to each athlete, and randomly select 30 numbers. You can randomly select the numbers by the lottery method, such as drawing numbers out of a hat. In this example we assume that participant characteristics are reasonably dispersed, increasing the representativeness of the sample relative to the larger population.
In practice, very few research studies use "true" random sampling because it is usually not feasible to ensure that all individuals in the population have an equal chance of being selected. For this reason, it is especially important to avoid using the term "random sample" if your study uses a nonprobability sampling method, such as convenience sampling (ex: a researcher surveying students in their own university class).
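The lottery method described above can be sketched in code. This is a minimal illustration (the roster size, seed, and function name are hypothetical) using Python's `random.sample`, which draws without replacement and gives every athlete an equal chance of selection:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement.

    random.sample gives every member of the population an equal
    chance of selection, mirroring the lottery / hat-drawing method.
    """
    rng = random.Random(seed)  # seeded only for reproducibility
    return rng.sample(population, n)

# Hypothetical roster: 120 university athletes, numbered 1-120
athletes = list(range(1, 121))
sample = simple_random_sample(athletes, 30, seed=42)
```

The seed is optional; omitting it gives a different (still equally likely) sample on each run.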
Stratified Random Sampling!
Stratified random sampling is a form of random sampling in which a population is divided into distinct, non-overlapping subgroups—called strata—based on a predetermined characteristic. For example, dividing a population into groups based on sex (e.g., male and female), age (e.g., 0-18 year olds, 19-60 year olds), education (e.g., high school diploma, bachelor's degree, graduate degree), etc.
This form of random sampling is done when you want to obtain a representative sample that has specific proportions of certain types of people/examinees. This can also be an effective sampling technique for studying how a trend or issue might differ across subgroups.
To conduct stratified sampling, researchers identify the stratification variable (age, gender, etc.), determine the sample size needed (N), divide the population into non-overlapping groups/strata, and randomly sample participants from each stratum. The different samples are then weighted to obtain an estimate that represents the population of interest.
Because individuals within a stratum are typically more similar/homogeneous to each other than to the general population, stratified random sampling reduces sampling variability and increases the precision of population estimates. It also enhances external validity, as it improves the likelihood that findings can be generalized to the broader population.
It is most effective when the stratification variable is meaningfully related to the DV. If the stratification variable has little or no correlation with the DV, the benefits of reduced variance and improved representation may not materialize.
An example of stratified random sampling would be using data from the U.S. census to pre-determine the number of participants from different racial and ethnic groups in your study so that you can ensure that your sample is representative of the population in the United States. This approach is commonly used to ensure that assessments such as the MMPI and WAIS have accurate normative samples that are representative of the U.S.
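The steps above (identify the stratification variable, divide into strata, sample randomly from each, allocate proportionally) can be sketched as follows. This is an illustrative sketch under made-up data, not a prescribed procedure; the function and population names are hypothetical:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, total_n, seed=None):
    """Proportional stratified random sampling (illustrative sketch).

    population : list of units
    stratum_of : function mapping a unit to its stratum label
    total_n    : desired total sample size, allocated in proportion
                 to each stratum's share of the population
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:                  # divide into non-overlapping strata
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # proportional allocation: stratum share of the population
        k = round(total_n * len(members) / len(population))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: 80 female and 20 male athletes
population = [(i, "F") for i in range(80)] + [(i, "M") for i in range(80, 100)]
sample = stratified_sample(population, lambda unit: unit[1], total_n=10, seed=1)
```

With these proportions, a total sample of 10 is allocated as 8 from the "F" stratum and 2 from the "M" stratum, matching the 80/20 split in the population.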
Random Assignment!
Random assignment is the process of assigning participants to different groups in an experiment so that the groups are equivalent, random, representative samples of the population. The groups should not be biased on some characteristic (i.e., age, sex, political belief), and people should be allocated to groups in such a way that the probability of each subject appearing in any of the groups is equal. If you randomly assign participants to groups, you can assume that personal differences—like intelligence, creativity, or personality traits—will be spread out fairly evenly across groups. For example, if you're doing a study on art training and some people are naturally more creative than others, random assignment increases the chance that both the experimental group and control group will include a similar mix of highly creative and less creative people. That way, creativity isn't skewing the results. This is important because it helps researchers be confident that any differences in results between the groups were caused by the experimental treatment, not by differences that already existed between the participants, thereby increasing internal validity.
Random sampling refers to how you select individuals from the population to participate in your study and concerns the source of your data (external validity), while random assignment refers to how you place those participants into groups (such as experimental vs. control).
Random sampling increases external validity, while random assignment increases internal validity.
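A minimal sketch of random assignment (the group names and participant labels are hypothetical): shuffling the participant list and then dealing participants into groups round-robin gives every subject an equal probability of landing in any group while keeping group sizes balanced:

```python
import random

def randomly_assign(participants, groups=("treatment", "control"), seed=None):
    """Randomly assign participants to groups of (near-)equal size.

    Shuffling first gives each subject an equal probability of
    appearing in any group; dealing round-robin balances sizes.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

participants = [f"P{i}" for i in range(20)]   # hypothetical participant IDs
assignment = randomly_assign(participants, seed=7)
```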
Extra: Non-random assignment
For some research questions, random assignment is not feasible and assignment to groups cannot be carried out randomly, so people are placed in groups in a non-random fashion. In such cases, we need to minimize effects of variables that affect the observed relationship between a causal variable and an outcome. Such variables are commonly called confounds or covariates. The researcher needs to attempt to determine the relevant covariates, measure them adequately, and adjust for their effects either by design or by analysis. If the effects of covariates are adjusted by analysis, the strong assumptions that are made must be explicitly stated and, to the extent possible, tested and justified. Describe methods used to attenuate sources of bias, including plans for minimizing dropouts, noncompliance, and missing data.
Non-random assignment is used when you want one group to consist only of people with a certain characteristic and the other group to consist only of people without that characteristic. For example, testing a cholesterol drug on two groups of people: group A consists of Caucasian males, group B consists of African American males. You can then see whether these groups respond statistically differently to the cholesterol drug.
Internal Validity!
Internal validity refers to the extent to which a study can confidently attribute the observed effects in the dependent variable (DV) to the manipulation of the independent variable (IV), rather than to confounding or extraneous variables. When a study has high internal validity, the results can be interpreted as being caused by the IV. In contrast, when internal validity is low, alternative explanations for the findings remain plausible, making the results uninterpretable. Researchers enhance internal validity by using techniques such as random assignment, control groups, replication, and intent-to-treat analyses. These strategies help ensure that the groups being compared are equivalent at baseline and that changes in the DV can be confidently linked to the IV. Three types of validity are typically examined: content, criterion, and construct validity.
Content validity: the degree to which the item contents actually reflect the construct of interest; is the test representative of all aspects of the construct? For example, in the PHQ-9, is it asking about all the domains that are included in depression, or is it leaving out necessary criteria and topic areas? To strengthen content validity, the construct should be well-defined and items generated based on theory and expert judgment, then items statistically examined (e.g., factor analysis) to confirm item content and overlap. Content validity is different from face validity, in which the test and construct being measured are discernible to the layperson so they understand what is being measured and feel motivated to complete it.
Criterion validity evaluates how well a test can predict a concrete outcome, or how well the results of your test approximate the results of another test. It is split into concurrent and predictive validity. Concurrent validity examines the measure's scores' relationship to another measure's scores taken at the same time, while predictive validity examines whether one's measurement scores can predict other criterion variables measured at a later time. For example, if a student takes an achievement test, you could use their GPA to see if there is a high correlation (concurrent validity). You could also use their achievement test to see how well they might do on a test such as the ACT or SAT (predictive validity).
Construct validity speaks to whether the measure being examined actually reflects the psychological construct it aims to measure. It is often used when content and criterion validity are not available. For example, from what we know about depression, does the PHQ-9 ask relevant information in how we conceptualize depression as a construct? Or is it measuring self-esteem, mood, etc.? It is made up of both convergent and discriminant validity. Convergent validity examines whether the test scores actually correlate with other measures' scores for constructs similar to what you are developing (e.g., resilience and grit), while discriminant validity examines whether the test scores do not correlate with measures they should not theoretically be related to (e.g., resilience and coffee drinking habits). These can be examined using a multi-trait multi-method matrix, where correlations with other measures and methods are calculated. Construct validity also encompasses the test content, its internal structure, its association with other test scores, the psychological processes responsible for test responses, and the test's consequences.
Compensatory Equalization of Treatment / Threats to Internal Validity!
HMTISSADC “Hey, My Turtle Is So Super Awesome, Don’t Cap”
Threats to internal validity are important to control in order to confidently determine that effects on a DV truly result from an IV. Random assignment of participants to groups is integral to increasing internal validity. Further, using control groups, replication studies, and intent-to-treat analyses can be ways to strengthen internal validity.
1) Historical threats refer to events external to the intervention that occur to all participants in their lives (e.g., natural disaster; global pandemic) and that could account for the results (e.g., greater PTSD).
2) Maturation threats refer to processes/changes within participants that occur naturally over time (e.g., growing stronger throughout the year; children becoming taller as they age). Maturation and history often go hand in hand. People grow/change over time, so an intervention needs to show effects beyond maturation.
3) Testing threats refer to changes in scores attributed to repeated assessment and familiarity with the test (e.g., participant purposefully keeping consistent responses from memory).
4) Instrumentation: Changes in the actual measuring instrument over time; Testing threats reflect the individual, but instrumentation threats typically reflect the actual measure/instrument (i.e., the way the DV is measured); Changes in the items, structure, or instructions can lead to instrumentation threat. Another type of instrumentation threat is response shifts, which are changes in a person’s values/perspectives/criteria that lead to different perceptions. So, the actual measure isn’t changing, but the way an individual interprets and responds to the item does.
5) Statistical regression threats refer to the phenomenon that occurs when utilizing participants with extreme scores; random error on subsequent testing may result in scores drifting towards the mean and away from extremes which might call into question intervention treatment. (Regression to the mean --> Statistically, an individual who achieves an extreme score on the first testing will probably score closer to the mean on the second testing. The problem here is that you need to make sure the decrease in DV is more likely due to the IV than regression.)
6) Selection bias threats refer to pre-existing differences between participants that may influence results, rather than the treatment conditions themselves. These differences exist before any experimental manipulation occurs and are due to how subjects were selected or assigned (for example, only including people who volunteered to participate). Random selection helps avoid this.
7) Attrition/Mortality threats refer to the loss of subjects when studies take more than one session. Attrition is a direct function of time: the longer a study goes, the more participants you'll lose. It can be particularly problematic if there are different rates of attrition across groups. The main concern is that participants who drop out may share specific characteristics, so treatment effects may apply only to the "survivors" of the study. Essentially, when participants who drop out differ systematically from those who stay, the remaining sample may no longer be representative, affecting the study's ability to accurately assess the relationship between variables.
8) Diffusion of treatment threats refers to instances in which control groups accidentally receive parts of the intervention, or the intervention group fails to receive all parts; this threat dilutes true treatment effects in the results. When the interventions/conditions of one group accidentally spread to another/control group (e.g., the control group and treatment group talk to each other). When the control group is exposed to the treatment, they may no longer be a true control, as they are receiving some form of the intervention. Diffusion can also lead to resentment or demoralization within the control group, as they may feel disadvantaged compared to the treatment group. Ex: a study investigating the effectiveness of a new teaching method. If the control group learns about the new method from the treatment group (perhaps through shared conversations or observing the treatment group's learning), then any observed improvements in the control group's performance could be attributed to the diffusion of treatment rather than the intended intervention
9) Compensatory equalization occurs in between-group studies when an untreated (i.e., control) group demands to receive a treatment that is the same as or equivalent to the treatment received by another group, or when the control group is treated better in some way to address the perceived inequity. Those aware of the "inequity" may give the "less desirable" condition additional services/benefits, which can influence post-test results and eliminate or cover up the treatment effect. Participants may also perform better or worse simply as a reaction to being in the control group. (E.g., giving the control group extra tutoring time to compensate for not getting the new teaching method. The tutoring could be a confounding variable, making it unclear whether the new teaching method itself is superior or whether the added tutoring is the main reason for any differences between the groups.)
• This can be combatted by using double blind studies so that the researchers do not know which group is getting the treatment.
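The statistical regression threat (item 5 above) is easy to demonstrate by simulation. In this sketch (the sample size, cutoff, and function name are arbitrary), each person's observed score is a fixed true ability plus independent measurement error; selecting people with extreme first-test scores guarantees their second-test mean drifts back toward the population mean even though no intervention occurred:

```python
import random
from statistics import mean

def regression_to_mean_demo(n=10_000, cutoff=1.5, seed=0):
    """Two noisy measurements of the same people, with no intervention.

    score = fixed true ability + independent measurement error.
    Returns the mean first- and second-test scores of the people
    whose FIRST score was extreme (above `cutoff`).
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [a + rng.gauss(0, 1) for a in ability]   # extreme scorers selected here
    test2 = [a + rng.gauss(0, 1) for a in ability]   # retest with fresh error
    extreme = [i for i, s in enumerate(test1) if s > cutoff]
    return mean(test1[i] for i in extreme), mean(test2[i] for i in extreme)

first, second = regression_to_mean_demo()
# `second` falls back toward the population mean of 0 with no treatment involved
```

A naive reading would credit the drop from `first` to `second` to a treatment; the simulation shows it happens from selection on extreme scores alone.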
External Validity & Generalizability!
External validity: the extent to which the results of an experiment can be generalized beyond the specific conditions of that experiment to other groups or settings outside of the lab and into the real world, and actually represent the population of interest. Do these findings extend beyond the sample that the research study was conducted on? External validity also requires internal validity to determine that the conclusions in the study are first trustworthy.
External validity consists of both generalizability, or whether results can be replicated under different conditions AND ecological validity, or whether results can be applied to specific settings.
Considering external validity in research is important because the interaction between the independent and dependent variables might rely on other variables or be situational to that specific study. Factors such as the characteristics of the participants (including demographic and clinical characteristics and inclusion criteria), the setting of the study, and the interventions or exposures studied can limit a finding's generalizability to other groups and situations. Subsequent studies can help answer these questions about confounding variables and interactions.
When a researcher hopes to increase the external validity of their research, they might hope to increase its generalizability to broader groups (e.g., studies with children to be applicable among teens); however, the nature of applying sample-based findings to a target population means findings may not apply to others without infinitely researching other groups. Overgeneralization can occur when findings from a restricted sample are used without proper replication studies under expanded conditions; for instance, it would be inappropriate to use a drug for people with varied ethnic backgrounds if it had only ever been tested among European individuals.
Threats to external validity
Threats to external validity might be used as criticisms of most studies but can come across as superficial without plausibility that the factor actually restricts generalizability of the result. For instance, researchers can easily criticize a study by stating the investigator did not examine an effect within an older population.
1) Sample characteristic threats refer to results relying on the demographic or natural characteristics of participants (e.g., undergraduates).
2) Narrow stimulus sampling threats refer to results being limited to the stimulus, materials, or researchers involved, limiting their extension to non-experiment conditions.
3) Reactivity threats (of experimental arrangements and of assessment) refer to participants' awareness that they are part of an experiment, which may limit generalizability to non-experimental conditions, and awareness of being assessed, which can influence how participants respond to questions (e.g., answering favorably).
4) Test sensitization threats refer to participants becoming aware of pre-test assessment procedures that influence or trigger them to respond differently upon follow-up (e.g., greater insight into experiment; “aha! I was asked xx”).
5) Multiple-Treatment interference threats refer to order effects, or the difficulty in ascertaining whether treatment effects result from separate treatments, or the order of conditions given.
6) Novelty effect threats refer to new aspects or environments in an experiment contributing to results rather than the intervention itself (e.g., new therapy setting).
Power!
Power is defined as the probability of correctly rejecting a false null hypothesis (or 1-Beta). (Null hypothesis is the hypothesis that there is no significant difference between specified groups/populations, any observed difference being due to sampling or experimental error). In other words, the likelihood of a significance test detecting an effect when there actually is one. Having enough statistical power is necessary to draw accurate conclusions about a population using sample data.
Power is influenced by (1) alpha level, (2) effect size, (3) sample size and as each of these increase, so does power.
Approximately 80% power is recommended by many researchers.
High power in a study indicates a large chance of a test detecting a true effect.
Low power means that your test only has a small chance of detecting a true effect or that the results are likely to be distorted by random and systematic error.
Alpha level: If you decrease the alpha level, it becomes harder to reject the null; the threshold moves farther out on the tail, which means we are more likely to fail to reject the null when the null hypothesis is false (the beta probability increases, so power decreases). If you increase alpha, it becomes easier to reject the null, which decreases the probability of failing to reject the null when it is false; therefore power (1 - beta) increases. However, this also increases Type I error (false positives).
If effect size increases (i.e., the distance between the centers of the distributions increases), the probability of Type II error (beta) decreases, causing power to increase.
As sample size increases, our estimation of population mean and SD gets better and variability in the sample means goes down causing power to increase as sample size increases.
You need 3 of the following 4 things to be able to run a power analysis: effect size, alpha, power, and number of participants.
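The relationships above (power rises with alpha, effect size, and sample size) can be checked with a textbook normal-approximation power function for a two-sided, two-sample test. This is a rough sketch, not a substitute for a full power-analysis tool:

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d           : standardized effect size (Cohen's d)
    n_per_group : sample size in each group
    alpha       : significance level (Type I error rate)
    Power = 1 - beta = P(reject the null | the null is false).
    The negligible far-tail rejection region is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                 # two-sided rejection threshold
    noncentrality = abs(d) * (n_per_group / 2) ** 0.5 # shift of the alternative
    return z.cdf(noncentrality - z_crit)
```

With the hypothetical values d = 0.5 and 64 participants per group at alpha = .05, this comes out near the conventional .80 benchmark, and increasing any of d, n, or alpha raises the result.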
Demand characteristics and experimenter expectancy effects!
Demand characteristics and experimenter expectancy effects are threats to construct validity (the extent to which your measure measures what it is intended to) due to their impact on results.
Demand Characteristics: Because individuals naturally attempt to understand what is happening to them and the meaning of events, participants will frequently try to understand the nature of the research and hypothesize about its goals. In doing so, participants may attempt to adjust the outcome of the research as a form of reactivity. Participants might make assumptions based on the information provided through sources of information, such as instructions, procedures, and other features of the experiment (informed consent or by deriving information from study procedures or cues). These subtle cues/demand characteristics can make participants aware of what the experimenter expects to find or how participants are expected to behave. This can change the outcome of an experiment because participants will often alter their behavior to conform to expectations. Demand characteristics can threaten construct validity if it is plausible that the extraneous cues associated with the intervention (or their change in behaviors) could explain the findings.
Example: Some participants were informed that the purpose of the study was to examine the treatment effects (reduced symptoms) of a menstrual cycle pill, and were told that the researchers wanted to look at menstrual cycle symptoms. Both groups were given placebo pills; however, the group that was informed of the purpose of the study was significantly more likely to report negative premenstrual and menstrual symptoms than participants who were unaware of the study's purpose.
Demand characteristics can be controlled by:
1) Reducing cues within the research including reducing obvious manipulations or limiting participants to single conditions (e.g., displaying ageism when presented with older and younger pictures)
2) Increasing motivation of participants by reminding them of active choice (e.g., can leave at any time)
3) Incorporate “fake good” role-playing procedures to estimate true versus purposefully fake participant responses, and 4) Separating the dependent variable from the study (e.g., deception practices that allow participants to believe the study has ended).
Experimenter expectancy effects occur when experimenters'/investigators' expectations of participant responses result in behaviors, facial expressions, or attitudes (changes in voice, posture, facial expression, delivery of instructions) that affect participants' responses and how they perform. This can be unintentional, or it can be intentional because experimenters hope to find data that support their hypotheses, or they hope that early detected patterns will result in later predicted data patterns.
These can result in 1) biased observations, in which hopes and expectations might result in biased ratings (e.g., expecting female participants to be emotional results in greater emotion ratings), or 2) influencing participant responses by treating participants differently based on experimenter assumptions (e.g., nonverbal or verbal feedback to correct answers). Experimenter expectancy effects can be reduced by utilizing detailed scripts, masking conditions from the experimenters, disallowing snooping for patterns in data, using double blind studies in which the experimenter does not know which group or condition the participant is in.
Example: The researcher may respond to participants by nodding their head, using facial expressions or body language that may influence how the participant answers or responds to the experiment.
Between-subjects versus within-subjects designs!
Within-subjects designs are studies in which participants are tested repeatedly (i.e., repeated measures design). Participants might be tested pre- (baseline) and post-treatment, be placed in both the experimental and control conditions, or tested repeatedly across time points.
Within-subjects case study designs include baseline assessment of the existing level of performance, continuous assessment (typically multiple times a week) of the same subject, examination of the pattern and stability of performance, and use of different phases of baseline and intervention. From a within-subject experimental design, one can have an ABAB design where A = baseline and B = treatment (applied after a stable baseline has been established), followed by a return to baseline (removal of treatment) and reintroduction of treatment. For example, a person might measure their sadness each day for one week, then start an intervention like exercising each day while recording their sadness, then the next week record their sadness without working out, then the next week record their sadness while working out again. This allows them to compare the effect of exercise on their sadness against their baseline from week 1.
Pros: This design is beneficial in that characteristics among individuals should remain stable, reducing error and increasing power, and a smaller sample is required.
Cons: Within-subjects designs are prone to order, practice, fatigue, carryover, and sensitization effects due to repeated testing, and are susceptible to history, maturation, and response bias (e.g., social desirability from having already participated, or practice effects).
Between-subjects designs are studies in which different groups of participants undergo different aspects of an experiment, typically only receiving one condition. Participants would be tested in either the control or experimental conditions and are usually randomly assigned to reduce error and confounding effects. Between-subjects designs can include pretest-posttest control groups (4) or intervention/control groups (3). An example of a between subjects pre-post design would be if one randomly assigned group was given a pre test, then CBT treatment for insomnia, then a post test measure. The comparison group would also be randomly assigned, undergo a pre-test, no intervention, then a post-test. Comparison between groups would reveal information about the effectiveness of CBT-i.
Pros: Prevents carryover effects; usually shorter duration per participant. Cons: Requires more participants, and individual differences between groups add error variance.
The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.
Manipulation Check!
Manipulation checks are important to ensure construct validity of experiments and interventions. A manipulation check refers to examining treatment integrity, the extent to which the treatment was conducted as intended, and checking participants' understanding of the condition to which they were exposed. Was the independent variable, experimental manipulation, or intervention actually implemented as intended? Manipulation checks can be especially helpful in situations in which the intervention had no effect, or when experiment conditions need to be especially distinct to examine appropriate levels of intervention.
For instance, researchers might want to ensure that participants in each condition are receiving the appropriate levels of intervention and manipulating only the independent variables of interest. Participants can provide self-report data to determine whether the intervention had its intended effect (e.g., did showing kitten pictures lead to more joy), or experimenters can add dependent variables to the study as manipulation checks to determine the same (e.g., in an aversion eye-gaze task, asking people to rate the powerfulness of people posing for corroboration).
This can be accomplished by manualizing treatment, training the research team, and providing continuous supervision. Further, you can have subjects complete a self-report on whether they completed the treatment/condition, have an informant report on it, or observe them doing the treatment/manipulation. An example of a manipulation check would be checking whether participants actually completed the number of sessions required under the intervention, or checking whether each group received the same treatment. The purpose is to increase internal validity and establish greater confidence that the experimental manipulation was responsible for the outcome. If a manipulation check is successful, the researcher can conclude that participants correctly perceived, interpreted, or reacted to the stimulus, and can then draw more accurate conclusions about the relationship between the independent and dependent variables.
Waitlist Control!
A control group is a group that is ideally the same as the experimental group but does not receive the treatment or experimental manipulation. A control group is used to reduce the chance of threats to internal validity (i.e., history, maturation, selection, and testing). In waitlist control groups, treatment is withheld from the control group for a period of time, after which they receive treatment; typically they serve as the control group until the experimental group finishes its course of treatment. For this design to be most effective, subjects need to be randomly assigned.
There are 3 features of wait-list controls
(1) if pretest is used, there must be no treatment between 1st and 2nd assessment period
(2) the time period from first to second assessment must correspond to the time period pre-post assessment of the treatment group
(3) wait-list controls complete the pre- and posttest assessments.
Wait-list control groups are easier to obtain than no-treatment control groups as they are simply delayed rather than withheld assistance; however, wait-list control groups need to have extra considerations for their unique needs for treatment, providing other resources, and the severity of the condition. Additionally, it is difficult to ascertain maturation and history effects for this group because natural changes over time may be large or small. There may also be ethical issues if an individual is in need of immediate treatment and the long term impact of factors like history cannot be evaluated.
Here is how a waiting-list control group is diagrammed:
R   O1   X   O2
R   O3        O4   X   O5
X = treatment or independent variable
O = observation (pre/post test) of the dependent variable
Extra: Independent Variable vs Dependent Variable
IV: Studies in psychology are usually designed to test specific hypotheses and “if-then” statements. The “if” portion refers to the IV and the “then” part refers to the dependent variable or outcome. The IV contains conditions that are varied or manipulated to produce changes or differences among conditions in a study.
1) Environmental (variations in what is done to, done with, or done by the subject, e.g., experimental vs. control group),
2) Instructional (variations in what participants are told or are led to believe),
3) Subject/individual difference (attributes or characteristics of the individual subject, e.g., gender or race). The researcher typically controls the independent variable; IVs can be quantitative or qualitative, discrete or continuous.
DV: The DV is the variable hypothesized to be affected by the IV. The aim of an experiment is to learn whether and how the DV has been affected by the IV, usually measured through behaviors. In correlational research, the DV is what one hopes to predict or explain. A DV is often a continuous variable but can be categorical. DVs are expected to be related to IVs, but a relationship cannot be determined with absolute certainty.
Extra: Counterbalancing
Counterbalancing is a method used in experimental design to control for order effects—that is, the possibility that the order in which treatments or conditions are presented might affect the results. Imagine you're testing two different treatments (A and B) to see which works better. If everyone gets A first, then B, you can't be sure if B looks better just because it came second—maybe participants were more tired, more experienced, or more motivated the second time. That’s called an order effect, and it can confound (mix up) your interpretation of which treatment actually worked better.
Counterbalancing mixes up the order in which participants receive treatments. This way, any effects caused just by the order (like doing better the second time because of practice) are spread out evenly across conditions, making it easier to isolate the actual effect of the treatment itself.
Simple counterbalancing: For 2 conditions (A and B), split participants into two groups:
Group 1: A → B
Group 2: B → A
Crossover designs: Each participant gets both treatments, but in different orders, with time in between to reduce carryover effects.
Latin Square design: Used when there are more than two conditions, this ensures that each condition appears in each position (first, second, third, etc.) an equal number of times across participants. An example of this would be to administer 123 to one group, 231 to the other, and 312 to the last group.
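The 123 / 231 / 312 pattern above is a cyclic Latin square, which can be generated for any number of conditions. A minimal sketch:

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears in each ordinal
    position (first, second, third, ...) exactly once across rows."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

orders = latin_square([1, 2, 3])
print(orders)  # [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
```

Each row is one group's presentation order, so across the three groups every condition occupies every position once, spreading order effects evenly.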
Extra: Regression to the mean
Those who score at the extremes on a measure or construct do so partly because of a rare combination of factors unlikely to occur together again; in other words, extreme scores will likely move closer to the mean upon retesting. For example, children of very tall parents were on average shorter than their parents, and children of very short parents were on average taller than their parents.
RTM occurs when the correlation between two variables is not perfect, which may be due to measurement errors and/or factors unique to each of the variables. The lower the correlation, the greater the amount of error and the greater the regression to the mean.
Another example: when you give students a test and select the bottom 5% of the students, it is statistically unlikely that that same set of students will perform as poorly on the next test. Even if only some of the students in the group perform better, the group as a whole improves. The same can be said, in reverse, for the top 5%.
How to combat RTM? Use multiple baseline measurements, and use a randomized controlled trial (RCT) with a control group.
Extra: Mediation vs Moderation
In both mediation and moderation, as outlined by Baron and Kenny (1986), a third variable plays an important role in governing the relationship between two other variables.
To claim a mediation effect, the first step is to show a significant relationship between the independent variable and the mediator (path A), then a significant relationship between the mediator and the dependent variable (path B), and then a significant relationship between the IV and DV (path C). However, others have argued that these requirements for mediation result in very low power. The final step in establishing mediation is showing that when the mediator and independent variable are used simultaneously to predict the dependent variable, the previously significant path between the independent and dependent variable is reduced, if not rendered insignificant. Mediation therefore tests the causal chain: X leads to change in the mediator, which leads to change in Y, and explains why X affects Y. Mediators tell us why something works.
Ex. SES predicts parental education levels. Parental education levels predict child reading ability. Parental education levels are shown to explain the relationship between SES and reading ability.
· Mediation: third variable accounts for the relation between the IV and the DV. So the relationship between an IV and DV is explained by the mediator on a causal path. Therefore, mediation tests the causal chain X leads to change in mediator which leads to change in Y and explains why X affects Y. Mediators tell us why something works
o Step 1 – IV predicts DV
· Caveat – not always necessary. If steps 2 & 3 are met, then the path between IV and DV is implied
o Step 2 – IV predicts mediator (have to have)
o Step 3 – Mediator predicts DV (controlling for IV)
· Others have argued that steps 1-3 alone result in very low power, so Baron and Kenny include step 4 as the final step in establishing mediation
o Step 4 - when mediator is entered into regression, the relation between IV and DV disappears (or weakens)
o Ex: Does body dissatisfaction mediate the effect of body mass index on the drive for thinness? If significant, body dissatisfaction would explain the relationship between BMI and drive for thinness.
Moderation: A third variable (moderator) affects the direction and/or strength of the relation between the IV and DV; it can influence the magnitude and/or direction of the relationship between the two variables. It does not explain why X affects Y, as in mediation, but it does tell us whether the relation between X and Y differs at different levels of the moderator, i.e., for whom or under what conditions.
Testing for moderation includes seeing whether there is an interaction between predictor x moderator that significantly predicts the outcome variable.
Ex: Is the effect of stress on depression symptoms moderated by social support? For individuals with high social support, the relationship between stress and depression weakens; for individuals with low social support, the relationship between stress and depression is amplified. Ex. Gender moderates the relationship between work experience and salary.