
Psyc 333 Study Guide: Causality, Experiments, Surveys, and Culture of Honor

Understanding Causality and Randomized Experiments

  • Causality is identifying cause-and-effect relationships, which is challenging due to many influencing factors.

  • Randomized experiments infer cause-and-effect by randomly assigning participants to groups (treatment and control).

  • Random assignment ensures each participant has an equal chance to be in any group, making groups similar except for the treatment.

Why Random Assignment Helps Causality

  • Minimizing Researcher Bias: Prevents researchers from unintentionally placing certain people in specific groups.

  • Equal Distribution of Characteristics: Balances confounding variables (age, gender, etc.) among groups.

  • Internal Validity: Increases confidence that the independent variable caused the effect on the dependent variable.

  • Important Distinction:

    • Random assignment differs from random sampling.

    • Random sampling selects participants from a population for a study, aiding generalizability (external validity).

    • Random assignment places participants into experimental groups for causal control (internal validity).

    • Ideally, use both techniques for representativeness and causal control.

Random Sampling vs. Random Assignment

  • Random sampling improves external validity by generalizing results to a population.

  • Random assignment improves internal validity by ensuring groups differ only by the treatment.

Focused Statistical Tests in Experiments

  • Common tests for comparing two conditions or groups include t-tests, F-tests (with 1 degree of freedom in the numerator), and chi-square tests (with 1 degree of freedom).

  • t-test: Checks if two means are significantly different.

    • Formula for independent groups: t = {\bar{X}_1 - \bar{X}_2 \over \sqrt{{s_1^2 \over n_1} + {s_2^2 \over n_2}}}

    • \bar{X}_1 and \bar{X}_2 are the means of group 1 and group 2.

    • The numerator is the difference between group means (signal).

    • The denominator measures the spread of scores within each group (standard error of the difference).

    • s_1^2 and s_2^2 are the variances of the two groups, and n_1 and n_2 are the group sizes.

    • A larger absolute t-value indicates stronger evidence of a real difference between the means.

  • F-test: Used in ANOVA; with 1 degree of freedom in the numerator, it compares two groups.

    • When comparing two groups, F = t^2 (F statistic equals t statistic squared).

    • Formula: F = {\text{variance between groups} \over \text{variance within groups}}

    • A high F indicates a significant difference.

  • Chi-square ($\chi^2$) test: Used for categorical data, often in a 2x2 table comparison.

    • Formula: \chi^2 = \sum {(O-E)^2 \over E}

    • O is the observed count, and E is the expected count in each cell.

    • Large \chi^2 values suggest groups behave differently.

  • P-value: The probability of observing the data (or more extreme data) if there is no true effect.

    • A small p-value (typically < 0.05) indicates statistical significance.

    • Statistical significance means the effect is likely real but doesn't indicate the effect's size or importance.
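As a minimal sketch (made-up numbers, not from the course), the three statistics above can be computed directly from their formulas:

```python
import math

# Made-up scores for two groups (illustration only)
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [3.0, 4.0, 5.0, 6.0, 7.0]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    # Sample variance with n - 1 in the denominator
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Independent-groups t: mean difference (signal) over standard error (noise)
n1, n2 = len(group1), len(group2)
t = (mean(group1) - mean(group2)) / math.sqrt(
    sample_var(group1) / n1 + sample_var(group2) / n2
)

# With exactly two groups, the ANOVA F statistic equals t squared
F = t ** 2

# Chi-square for a 2x2 table: sum of (O - E)^2 / E over the four cells,
# where E comes from row total * column total / grand total
observed = [[30, 20], [10, 40]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
total = sum(row_totals)
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2) for j in range(2)
)
```

Note how F comes out exactly t², and how each expected count is built from the table's row and column totals.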

Threats to Validity in Experiments

  • Demand Characteristics: Participants alter behavior based on perceived study purpose, skewing results.

  • Expectancy Effects (Observer Expectancy): Researcher expectations unintentionally influence participants (Pygmalion/Rosenthal effect).

    • Double-blind designs prevent this.

Internal vs. External Validity

  • Internal Validity: Confidence that the independent variable caused changes in the dependent variable.

  • External Validity: Generalizability of results to other contexts/populations.

  • Random sampling improves external validity.

  • Field experiments enhance external realism.

Challenges in Conducting True Experiments

  • Randomized controlled trials (RCTs) are the gold standard but face challenges:

    • Perfect Control is Hard: Participants may seek treatment outside the study or not adhere to instructions.

    • Participant behavior can defeat design: Side effects or lack of perceived benefit can cause dropouts, skewing results.

    • Ethical Issues: Randomly assigning harmful conditions is unethical. IRBs protect participants.

    • Generalizability (External Validity): Certain individuals may refuse random assignment, limiting sample representativeness.

  • Example – Salk Polio Vaccine Trial (1955): Design issues included non-randomized parts and the “Cutter Incident” (defective vaccines).

  • Conclusion:

    • Randomized experiments are not perfect due to practical and ethical hurdles.

Reasons to Use Random Assignment (and Its Limitations)

  • Key Reasons We Insist on Random Assignment:

    • Avoiding Bias: Prevents researcher bias in group assignments.

    • Balancing Groups: Creates comparable groups on extraneous variables.

    • Statistical Validity: Justifies using statistical tests that assume random assignment for calculating meaningful p-values.

  • Limitations:

    • Ethical Constraints: Cannot ethically randomize harmful or crucial conditions.

    • Practical Constraints: Cannot randomize inherent traits (gender, personality).

    • Alternatives (Quasi-experiments and Matching): Quasi-experiments include matched groups or statistical controls such as propensity score matching.

  • Bottom Line:

    • Randomized experiments are crucial for strong causal claims but aren't always possible.

Principle of Converging Evidence and Parsimony

  • Converging Evidence: Using multiple methods with different flaws to converge on the same finding.

  • Parsimony: Preferring the simplest explanation fitting all evidence.

  • Approaching a question from many angles enhances confidence in the conclusion.

  • Each method’s strengths compensate for others’ weaknesses.

C.O.S.I.

  • Method weaknesses:

    • Construct Validity: Measuring what you intend to measure (e.g., accurately measuring honor-culture attitudes).

    • External Validity: Generalizing results to other people, places, times.

    • Statistical Conclusion Validity: Using correct statistical methods and sufficient data.

    • Internal Validity: Establishing cause-and-effect without confounds.

    • Converging evidence strengthens COSI, addressing each aspect by compensating for individual method weaknesses.

Experiments vs. Surveys: Pros, Cons, and Unique Strengths

  • Experiments (Lab Experiments):

    • Demonstrate causation through control and random assignment.

    • Observe actual behavior in real-time.

    • Offer unique access to special populations such as infants or animals who cannot verbally communicate.

    • Can produce aesthetically pleasing demonstrations and compelling narratives.

  • Surveys:

    • Collect large samples and gather subjective data across populations.

    • Cannot establish causation.

    • Rely on self-report data, which is prone to untruthfulness or limited self-knowledge.

Challenges with Self-Report Data (Surveys)

The “Unsolvable Problem” of Surveys

  • If people are unwilling to tell the truth, their answers bias the results, and researchers cannot force honesty.

Do People Have Stable Attitudes?

  • People lack stable, coherent attitudes.

  • “Non-attitudes” are flimsy, easily changed opinions.

  • “The American Voter” study (1950s) found inconsistent political opinions.

The Miracle of Aggregation (Why surveys still work in aggregate)

  • Miracle of aggregation uses the Law of Large Numbers.

  • Errors cancel out in large samples, reflecting underlying reality.

  • Even if individuals lack knowledge, the group collectively knows a lot.

  • Aggregation addresses random errors but cannot fix systematic biases.
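A quick simulation (hypothetical numbers) illustrates the miracle of aggregation: each individual answer carries a large random error, yet the average of many answers homes in on the truth:

```python
import random

random.seed(0)  # reproducible illustration

true_value = 50.0  # hypothetical true population value (e.g., % support)

def one_answer():
    # Each respondent reports the truth plus large random error,
    # but with no systematic bias in either direction
    return true_value + random.gauss(0, 20)

small_sample = [one_answer() for _ in range(10)]
large_sample = [one_answer() for _ in range(100_000)]

avg_small = sum(small_sample) / len(small_sample)
avg_large = sum(large_sample) / len(large_sample)
# Individual answers are wildly off, but avg_large lands very close to 50:
# random errors cancel out (Law of Large Numbers)
```

If the error had a systematic bias (e.g., `random.gauss(5, 20)`), no amount of aggregation would remove it, matching the caveat above.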

Attitudes vs. Behavior

  • Attitudes and behaviors are weakly correlated.

  • What people say and what they do differ.

  • Attitude-behavior study on birth control:

    • General attitude toward birth control vs. actual use of birth control pills: r = .08.

    • Attitude toward “birth control pills” specifically vs. actual use of birth control pills: r = .32.

    • Attitude about using birth control pills (more personal) vs. actual behavior of using: r = .53.

    • Attitude about using birth control pills in the next 2 years vs. actual use: r = .57.

  • Specificity matching:

    • Attitudes predict behavior better when measured specifically to the behavior.

Channel Factors

  • A channel factor, or nudge, is a small prompt that turns intention into actual behavior (Kurt Lewin).

  • Small prompts facilitate follow-through.

  • Behavior is a “tension system” of forces (intentions, social pressure, restraining forces).

  • Channel factors reduce restraining forces and/or increase pushing forces.

  • To improve attitude-behavior consistency, be specific and provide a clear “channel” for action.

Case Study: Culture of Honor and Violence in US South

  • Culture of Honor theory: Higher violence rates in the Southern US (Richard Nisbett and Dov Cohen).

What is a “Culture of Honor”?

  • People, particularly men, are sensitive to reputational slights/insults.

  • Expected to defend honor, often with violence when necessary.

  • Insults test hierarchy.

  • Honor cultures arise where the law is weak.

  • Roots in Scots-Irish herders of lawless border regions, self-reliance in justice.

  • Primary thesis of the Culture of Honor Theory: the South has more violence specifically in situations involving honor, such as violent threats or insults.

Other Factors to Consider (Secondary Theses and Alternatives)

  • Contributing factors:

    • Secondary Theses: Herding economy, lawlessness of the frontier, origins of settlers, role of Southern white history, chivalry, gender roles.

    • Alternative Explanations: Heat, poverty, education levels, historical slavery legacy.

Evidence and Data: What Supports the Culture of Honor?

  • Converging methods used: Archival data, survey data, lab experiments, policy analysis.

  • A. Archival Data (Homicide Rates):

    • The archival data consisted of violence statistics, especially homicide (murder) rates, comparing the South and the North. Homicide rates tend to be higher in the South, especially for argument- or conflict-related homicides but not for felony-related ones ("show real violence, measure it well").

    • Strengths: Real behavior, broad comparison, patterns by crime type.

    • Weaknesses: Extreme cases (tip of the iceberg), inferring culture without direct measurement.

  • B. Survey Data (Attitudes on Violence):

    • Surveyed attitudes toward violence in different situations.

    • Southerners approve of violence for self-protection and in response to insults more than Northerners do.

    • Items such as "not much of a man if you don't respond violently" indicate cultural norms in the South.

    • Directly measures honor-related attitudes and validates that the belief system exists.

    • Strengths: Measures honor beliefs (culture), large sample, controlled for factors.

    • Weaknesses: Just talk, social desirability bias, cannot prove causation.

  • C. Field Experiment (Southern honor/temper in males when threatened):

    • Conducted at the University of Michigan: Northern and Southern male students walked down a hallway, were randomly bumped and insulted by a confederate, and researchers measured their reactions.

    • Emotionally: insulted Southerners reported being very angry at the insult.

    • Cognitively: insulted Southern students completed story stems with more violent endings.

    • Physiologically: insulted Southern males showed larger increases in cortisol (a stress hormone) than Northern males.

    • Behaviorally: when the insulted student later encountered a "big guy" in close proximity in the hallway, insulted Southerners acted more domineering, supporting the idea that honor concerns are real.

    • Strengths: Controlled causal evidence, multi-level measures, large effect.

    • Weaknesses: Sample limited to university students (not a wide-ranging population); no real violence occurred.

  • D. Archival/Policy Data (Laws and Institutional Practices):

    • Examined self-defense laws, gun laws, etc.

    • Southern states have looser gun control and "stand your ground" laws and strongly support the military, all consistent with legitimized violence.

    • Southern newspaper accounts of honor-related violence are more sympathetic to perpetrators than Northern accounts.

    • Supports the idea that the culture is institutionalized not just among the people but in actual laws.

    • Not as strong as other evidence: Western states have similar laws, and not everyone agrees with them.

Putting It All Together

  • Together, the methods provide very strong converging support that an honor culture promotes violence.

  • The most parsimonious explanation: historical factors created a modern culture maintained through socialization.

  • "The goal of intervention is to reduce honor-related violence."

Thinking Statistically: Effect Size, Significance, and Practical Importance

  • When analyzing results, ask the following three questions:

    • 1. Is there an effect?

    • 2. How big is the effect?

    • 3. How important is the effect?

Statistical Significance

  • The p-value helps determine whether results are due to chance. Whether a test reaches significance depends on:

    • 1. The size of the effect

    • 2. The size of the sample

  • Larger sample sizes allow for more stable estimates

Effect Size

  • Effect size: how much something changes, or how strong a relationship is, in standardized terms.

    • Example: Cohen's d is commonly used to measure the difference between two groups:
      d = (difference between the group means) / (standard deviation within the groups)

    • Intuitively: "difference between groups (signal) / standard deviation within groups (noise)."

    • Cohen's d in simple terms: a number that tells us how far apart two group averages are, in units of the groups' standard deviation. For example, d = 0.5 means the two groups' means differ by half a standard deviation. Roughly, d = 0.2 is small, 0.5 medium, 0.8 large.
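A minimal sketch of Cohen's d with made-up scores; the pooled standard deviation is one common choice for the "within-groups" spread:

```python
import math

# Made-up example scores for two groups (illustration only)
treatment = [12.0, 14.0, 15.0, 16.0, 18.0]
control = [10.0, 11.0, 12.0, 13.0, 14.0]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    # Sample variance with n - 1 in the denominator
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Pooled standard deviation: a weighted average of the two group variances
n1, n2 = len(treatment), len(control)
pooled_sd = math.sqrt(
    ((n1 - 1) * sample_var(treatment) + (n2 - 1) * sample_var(control))
    / (n1 + n2 - 2)
)

# Cohen's d: mean difference in units of the within-groups spread
d = (mean(treatment) - mean(control)) / pooled_sd
```

Here the means differ by 3 points and the pooled spread is under 2 points, so d comes out well above the 0.8 "large" benchmark.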

Practical Importance

  • An effect of a given size can be important in one context and not another. Evaluate importance by:

    • Relative benchmarks: compare the difference to familiar contexts, such as the margin of victory in an election. A 4% swing in voter preference can be the difference between winning and losing a presidential election (so it's important in politics), even if 4% is a small effect size in other domains. Always ask: relative to what? This keeps us from getting excited about statistically significant but trivially small effects.

    • Cumulative impact over time vs. a one-shot effect: an effect can matter more when it recurs frequently, as with daily sports training. Another example is compound interest, where small daily effects accumulate: Result = P(1 + r)^n, where P is the principal, r is the growth rate per period, and n is the number of periods. At 6% annually over 60 years, $1,000 turns into about $33,000.

  • Statistical vs. narrative thinking: people (even very smart ones) struggle with probabilistic thinking; anecdotally, our brains prefer deterministic narratives.
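The compound-growth arithmetic can be checked directly with the values from the notes:

```python
# Compound growth: Result = P * (1 + r) ** n
P = 1000.0   # principal in dollars
r = 0.06     # 6% growth per year
n = 60       # number of years

result = P * (1 + r) ** n
# A modest 6% annual effect compounds into roughly $33,000 over 60 years
```

Halving r to 3% leaves under $6,000 after the same 60 years, which is why small recurring effects deserve attention.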

Key Formulas and Definitions Recap

  • Expected Value (EV), a formula from decision theory/probability:

    • EV = ÎŁ [Probability(Outcome) Ă— Value(Payoff)]

    • The problem becomes assigning values, e.g., how do you weigh the loss of a miscarriage vs. having a child with an abnormality? It is subjective, so use carefully.

  • Review questions:

    • What is random assignment and why is it important in experiments?

    • Explain C.O.S.I. and what each validity measures:

      • Construct Validity: measuring what you think you are measuring.

      • External Validity: how well results reflect real-world settings.

      • Statistical Conclusion Validity: enough data and sound analysis to support the claim.

      • Internal Validity: confounds eliminated so cause and effect can be established.

    • Differences are impactful when they accumulate over time vs. a one-shot deal. Examples: a baseball player who gets just one extra hit every two weeks bats only a few percentage points higher; compound interest (formula above).
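The expected-value formula can be sketched with hypothetical probabilities and payoffs (made-up numbers, not from the notes):

```python
# Expected value: sum of probability * payoff over all possible outcomes
# (hypothetical gamble for illustration)
outcomes = [
    (0.70, 100.0),   # 70% chance of gaining $100
    (0.30, -50.0),   # 30% chance of losing $50
]

ev = sum(prob * payoff for prob, payoff in outcomes)
# ev = 0.70 * 100 + 0.30 * (-50) = 55.0
```

The arithmetic is easy; as the recap notes, the hard part is choosing defensible payoff values in the first place.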

  1. Random sampling ensures the sample reflects the entire population, allowing for generalization of results.

  2. Nonresponse bias occurs when individuals selected for a survey do not respond, potentially skewing results.

  3. The main benefit of using the randomized response technique in surveys is that it increases the likelihood of truthful responses by protecting respondent anonymity.

  4. The Law of Large Numbers implies that the sample mean will get closer to the population mean as the sample size increases, enhancing accuracy of estimates.

  5. Standard deviation indicates how much individual data points differ from the mean, reflecting the variability or spread of the data, which is crucial for data analysis.

  6. The primary distinction between effect size and p-values is that effect size measures the magnitude of a difference, while p-values assess the statistical significance of that difference.

  7. Compound growth helps in understanding changes over time in various psychological metrics, illustrating the impact of psychological factors across different periods.

  8. Vague terms in survey questions can lead to biased responses, compromising the quality of the data collected.

  9. Priming can lead to biased or skewed results by influencing how respondents interpret and answer questions, affecting the reliability of survey results.

  10. The primary purpose of a contingency table in psychology is to display the frequency distribution of variables, aiding in the analysis of relationships between categorical data.

What is the Randomized Response Technique used for?
It helps gather truthful responses on sensitive topics by ensuring respondents' anonymity.

How does the Randomized Response Technique work in estimating cheating among students?
Students roll a die: if they roll 1 or 2, they say 'Yes'; if 3-6, they answer honestly. This method helps estimate the true proportion of students who have cheated.

How do you calculate the true proportion of students who have cheated using the observed data?
Using the equation: 0.40 = (0.333 × 1) + (0.667 × x), solve for x to find that approximately 10% of students have cheated.
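The algebra behind this estimate, as a short sketch using the numbers from the example above:

```python
# Randomized response: rolls of 1-2 (probability 1/3) force a "Yes";
# rolls of 3-6 (probability 2/3) get an honest answer.
observed_yes = 0.40        # proportion of "Yes" answers observed
p_forced_yes = 1 / 3       # probability the die forces a "Yes"
p_honest = 2 / 3           # probability the respondent answers honestly

# observed_yes = p_forced_yes * 1 + p_honest * true_rate  ->  solve for true_rate
true_rate = (observed_yes - p_forced_yes) / p_honest   # about 0.10
```

Because the researcher never knows any individual's die roll, anonymity is preserved even though the aggregate rate is recoverable.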

What does the Law of Large Numbers state?
As sample size grows, estimates (e.g., averages) get closer to the population parameters.

What is the difference between effect size and p-value?
The p-value indicates if there's a statistically significant difference, while effect size indicates the magnitude of that difference.

What is compound growth in the context of improvement over time?
It refers to the process where improvements accumulate over time, such as Sarah and Emma's running distances increasing due to daily percentage improvements.

What is the significance of a contingency table in decision-making?
It helps visualize the relationship between two categorical variables, such as test results and disease presence.

What is the probability that a person who tests positive actually has X-Disease?
Only a 16.7% chance, despite the test's accuracy, due to the low prevalence of the disease.
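The notes don't give the underlying numbers; one set of assumed values that reproduces the stated 16.7% is a 1% prevalence with a 99%-sensitive, 95%-specific test. The contingency-table logic then looks like this:

```python
# Assumed illustrative numbers (not from the notes), chosen to yield ~16.7%
prevalence = 0.01      # 1% of people actually have X-Disease
sensitivity = 0.99     # P(test positive | disease)
specificity = 0.95     # P(test negative | no disease)

# The two ways to test positive, as fractions of the whole population
true_pos = prevalence * sensitivity               # sick and caught: 0.0099
false_pos = (1 - prevalence) * (1 - specificity)  # healthy but flagged: 0.0495

# P(disease | positive test): true positives among all positives
p_disease_given_pos = true_pos / (true_pos + false_pos)   # about 0.167
```

The false positives outnumber the true positives nearly five to one because healthy people vastly outnumber sick ones, which is the whole point of the example.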