Lecture Notes on Internal and External Validity

Selection bias is a sample-related factor that can affect the validity of a study.
Example: starting a therapy group with the first 20 volunteers and using the remaining participants as a control group.
Confounding variable: motivation to change, since early volunteers are likely to be more motivated.
If a stop-smoking program uses early sign-ups for the intervention group and stragglers for the control, the higher motivation of early sign-ups could drive the positive findings, not the intervention itself.

Assignment bias occurs when the researcher assigns participants to specific groups, rather than using randomization.
Example: Executive monkeys study by Brady (1958).
- Monkeys had to press a lever every 20 seconds to avoid electric shocks.
- Executive monkeys pressed the lever; the control monkeys could not avoid the shocks themselves, so their fate depended on the executive monkeys.
- Executive monkeys developed ulcers, control monkeys did not.
- Flaw: Monkeys were not randomly assigned. Monkeys fast at learning shock avoidance were assigned to the executive condition, and slow learners to the control condition.
- Rival hypothesis: Fast-avoiding monkeys may have a lower pain threshold or higher emotionality.
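Most stomach ulcers are now known to be caused by the bacterium Helicobacter pylori rather than by psychological stress (a discovery by the Australian researchers Barry Marshall and Robin Warren, who won the 2005 Nobel Prize in Physiology or Medicine).
Fix: random assignment of participants to groups, so that no trait of the participants (such as learning speed) decides which condition they end up in.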
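To contrast with Brady's procedure, here is a minimal Python sketch of random assignment, assuming a simple two-group design where the groups should be as close to equal in size as possible. The function name randomly_assign and the monkey labels are hypothetical, chosen only for illustration.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups.

    Every participant has the same chance of landing in either group, so traits
    such as learning speed or emotionality are balanced across groups on average
    instead of being sorted into one condition by the researcher.
    """
    rng = random.Random(seed)      # optional seed makes the allocation reproducible
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical example: eight monkeys split into executive and control conditions
monkeys = [f"monkey_{i}" for i in range(1, 9)]
executive, control = randomly_assign(monkeys, seed=1)
print("Executive:", executive)
print("Control:  ", control)
```

The key design choice is that the shuffle, not the researcher's knowledge of how quickly each animal learned, determines group membership.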

Regression to the mean: individual scores vary around a stable mean.
People scoring very high (top quartile) at one time are likely to score lower the next time.
People scoring very low (bottom quartile) at one time are likely to score higher the next time.
Extreme scores are partly the product of chance (measurement error, a particularly good or bad day), so on retesting they tend to be less extreme.
If experimental and control groups are derived from the top and bottom quarters of an initial sample, the group means will move closer together upon repeat testing, even without intervention.
This statistical artifact can be reduced by not forming groups from the highest and lowest scorers on the pre-test.
If extreme scorers must be studied, randomly assign half of them to a control group, so that both groups regress toward the mean to the same extent.
Example: Studying the effect of therapy on depressed people. Select subjects based on high depression scores, but put half in the control group.
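The effect described above can be reproduced with a small simulation. This sketch assumes the classical test model (observed score = stable true score + independent random error) with normally distributed scores; the means, standard deviations and sample size are arbitrary choices, not values from the lecture. No intervention happens between the two tests.

```python
import random
import statistics

def simulate_regression_to_mean(n=10_000, true_sd=10.0, error_sd=10.0, seed=42):
    """Each person has a stable true score; each test adds independent noise.

    Returns the pre-test and re-test means of the top and bottom quartiles,
    where the quartiles are selected on the pre-test only.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, true_sd) for _ in range(n)]
    pre = [t + rng.gauss(0, error_sd) for t in true_scores]   # first testing
    post = [t + rng.gauss(0, error_sd) for t in true_scores]  # re-test, fresh noise

    # Rank people by pre-test score and take the extreme quarters
    order = sorted(range(n), key=lambda i: pre[i])
    q = n // 4
    bottom, top = order[:q], order[-q:]

    def mean_of(scores, idx):
        return statistics.mean(scores[i] for i in idx)

    return {
        "top": (mean_of(pre, top), mean_of(post, top)),
        "bottom": (mean_of(pre, bottom), mean_of(post, bottom)),
    }

result = simulate_regression_to_mean()
for group, (before, after) in result.items():
    print(f"{group:>6}: pre-test {before:.1f} -> re-test {after:.1f}")
```

With true_sd equal to error_sd, only half of the spread in observed scores is stable, so each extreme quartile should move roughly halfway back toward 50 on the re-test, even though nothing was done to anyone.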

Maturation: changes that happen within participants over the course of a study, as a function of time.
Participants may grow stronger or weaker with age, or they may get tired after repeated testing.
People suffering from acute depression due to a traumatic event tend to get better over time without therapy.
Confounding due to maturation can be avoided by having a control group and by randomly assigning subjects to the experimental or control group.
Example: a friend suggests drinking 2 liters of orange juice every day to cure a common cold, and the symptoms disappear after 4 days. The symptoms would have disappeared anyway, because the immune system clears the infection within that time frame.

History: events that happen between pre-testing and post-testing.
Example: Trying to get people to conserve electricity with a media campaign, but power prices increase by 33% during the study.
People might use less electricity due to the price increase, not the campaign.
Similarly, a recent or controversial event can alter a person's, or even a whole group's, attitude toward an ethnic group, which would confound a study measuring such attitudes.

Testing: the act of being observed often alters a person's behavior.
Example: Oral presentation behavior changes when talking to a crowd versus talking to a friend.
Hawthorne studies (1920s to early 1930s, at Western Electric's Hawthorne Works): researchers adjusted environmental factors such as lighting to see whether they affected productivity on the factory floor.
Productivity increased both when the lighting was made brighter and when it was dimmed.
The mere fact that the workers were being observed altered their behavior.
To counter this problem, use control groups and counterbalance the treatments (vary the order in which different participants receive them).
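Counterbalancing can be sketched in a few lines of Python. This is a minimal illustration, assuming a within-subjects design in which every participant experiences every condition; the lighting conditions and function names are hypothetical. full_counterbalance lists every possible order, while latin_square uses simple rotation so each condition appears exactly once in each position.

```python
from itertools import permutations

def full_counterbalance(conditions):
    """Every possible order of the conditions (n! orders in total)."""
    return [list(p) for p in permutations(conditions)]

def latin_square(conditions):
    """Cyclic Latin square: each condition appears once in every position."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_orders(participants, orders):
    """Cycle through the orders so each one is used about equally often."""
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

lighting = ["bright", "normal", "dim"]
for order in latin_square(lighting):
    print(order)
# ['bright', 'normal', 'dim']
# ['normal', 'dim', 'bright']
# ['dim', 'bright', 'normal']
```

Full counterbalancing quickly becomes impractical (5 conditions already give 120 orders), which is why a Latin square is often used instead. Note that this simple rotation balances position but not sequence: "bright" is always followed by "normal" here, so carry-over from one specific condition to the next is not controlled.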

Mortality (attrition): participants lose interest or fail to complete a study.
Data from dropouts often has to be excluded, as comparisons across time points are not possible.
People who drop out of weight-loss programs are typically those who are doing poorly.
This can inflate the apparent efficacy of the program, because mainly the successful participants remain at the end.
Solutions:
- Control group.
- Matched controls or yoking: recruit people into the study, number them and pair them up; if one member of a pair drops out, the paired participant's data is not used either.
- Alternatively, when someone drops out, systematically exclude a participant with the same demographic profile from the other group.
- For a weight-loss program, yoke (pair) people based on their initial weight.
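The yoking procedure for a weight-loss program can be sketched as follows. This assumes pairs are formed by sorting participants on initial weight and pairing neighbours, then randomly sending one member of each pair to the program and the other to the control group; the function names, IDs and weights are hypothetical.

```python
import random

def yoke_by_weight(participants):
    """Sort by initial weight and pair neighbours so each pair is closely matched.

    participants: list of (id, initial_weight_kg) tuples.
    Returns a list of (id_a, id_b) pairs; an odd person out is left unpaired.
    """
    ranked = sorted(participants, key=lambda p: p[1])
    return [(ranked[i][0], ranked[i + 1][0]) for i in range(0, len(ranked) - 1, 2)]

def assign_within_pairs(pairs, seed=None):
    """Randomly send one member of each pair to the program, the other to control."""
    rng = random.Random(seed)
    program, control = [], []
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a  # coin flip decides which member gets the program
        program.append(a)
        control.append(b)
    return program, control

def retained_pairs(pairs, dropouts):
    """Yoked exclusion: keep a pair only if neither member dropped out."""
    dropouts = set(dropouts)
    return [pair for pair in pairs if pair[0] not in dropouts and pair[1] not in dropouts]

people = [("P1", 92.0), ("P2", 120.5), ("P3", 95.5),
          ("P4", 118.0), ("P5", 101.0), ("P6", 99.0)]
pairs = yoke_by_weight(people)
program, control = assign_within_pairs(pairs, seed=7)
print(retained_pairs(pairs, dropouts=["P5"]))
```

Because a dropout removes the whole pair, the remaining program and control groups still start from matched initial weights, so poor performers leaving the program cannot make it look better than the control group.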

Instrumentation: changes in the measuring instruments themselves (e.g., scales drifting out of calibration).
Variation in how observers record an outcome.
Example: Different interpretations of self-injuring behavior. (Is a light tap self-injury?).
Define the target behavior precisely.
Train the observers well.
Ideally, have one person do all the observations. However, even a single observer may change how they score over the course of the study; for example, after seeing many different cases, they may become better at recognizing what counts as an incident.
Double-blind design: Useful if participants receive medication to reduce self-injury.
- Neither the participant nor the observer knows what treatment the participant is receiving.
- A third person is aware of the treatment and puts all the data together.
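The role of the third person in a double-blind design can be sketched as a coded allocation. This is a minimal illustration, assuming two treatments in equal numbers and assuming the medication containers are labelled only with the codes; the function names and code format are hypothetical. The key (code to treatment) stays with the third person, while participants and observers only ever see the codes.

```python
import random

def make_blinded_allocation(participants, treatments=("drug", "placebo"), seed=None):
    """Run by the third person only.

    Builds a balanced, randomly ordered treatment schedule and gives each
    participant an opaque code. Returns (key, codes): key maps code -> treatment
    and is kept by the third person; codes maps participant -> code and is all
    that the participant and observer ever see.
    """
    rng = random.Random(seed)
    # Balanced list of treatments, then shuffled so order reveals nothing
    schedule = [treatments[i % len(treatments)] for i in range(len(participants))]
    rng.shuffle(schedule)
    key, codes = {}, {}
    for n, (person, treatment) in enumerate(zip(participants, schedule), start=1):
        code = f"K{n:03d}"
        codes[person] = code
        key[code] = treatment
    return key, codes

def unblind(ratings_by_code, key):
    """After data collection, the third person groups observer ratings by treatment."""
    grouped = {}
    for code, rating in ratings_by_code.items():
        grouped.setdefault(key[code], []).append(rating)
    return grouped

key, codes = make_blinded_allocation(["Ann", "Ben", "Cal", "Dee"], seed=3)
# The observer records self-injury counts against codes only (hypothetical numbers)
ratings = {codes["Ann"]: 4, codes["Ben"]: 7, codes["Cal"]: 2, codes["Dee"]: 6}
print(unblind(ratings, key))
```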

Experimenter bias: the experimenter may act differently if they know who is receiving the treatment.
Example: Giving out expensive jelly beans (with a special drug to make people jump higher).
The experimenter's body language may give an indication of what the participants are receiving.
This goes back to Clever Hans, the horse that appeared to be able to count.
- The horse relied on the observer to know when to stop tapping its hoof.
- The observer would look down at the horse's hoof, and when the taps reached the correct number, they would look up, which was the cue for the horse to stop.
Triple-blind experiment: Participant, observer, and tester don't know who gets the manipulation.
- A fourth person uses a coding sheet to analyze the data.

Threats to internal validity: selection bias, regression to the mean, history, maturation, testing, mortality, instrumentation, and experimenter error.
Control groups and random assignment fix the first four.
Special control groups are useful in testing situations.
Example: Sham surgery for Parkinson's symptoms (stem cell graft).
- One group received stem cells, another had