Notes on Correlation, Experimental Design, and Basic Statistics in Psychology

Correlation, Causation, and Quantitative Basics

  • Focus: moving from qualitative discussion to quantitative analysis; key goal is to determine if a relationship exists between two variables and if one can predict the other's outcome.
  • Example variables: height and weight; ask whether higher height associates with higher weight (positive correlation).
  • Core concept: the correlation coefficient (r) measures the strength and direction of a linear relationship; r is a number between
    -1 and 1: $r \in [-1, 1]$
  • Visual aid: scatter plot to display pairs of data points and assess the form of the relationship.
  • Positive correlation:
    • Both variables move in the same direction (e.g., height and weight tend to increase together).
    • Example wording: as one increases, the other tends to increase.
  • Negative correlation:
    • Variables move in opposite directions (e.g., as one increases, the other decreases).
    • Example wording: as one increases, the other decreases.
  • No correlation:
    • Data points show no clear pattern or directionality; little to no relationship.
    • Example wording: hours of sleep vs shoe size likely show no meaningful relationship; points scattered with no pattern.
  • Important nuance:
    • Negative correlation does not mean there is no connection; it means the relationship direction is inverse.
    • No correlation is not the same as a negative correlation.
  • Strength of correlation:
    • The closer |r| is to 1, the stronger the relationship; |r| near 0 indicates a weak relationship or none.
  • Example values and interpretation:
    • Happiness and depression: $r = -0.92$; strong negative correlation (as happiness increases, depression tends to decrease).
    • A correlation of $r = 0.8$ indicates a strong positive correlation; the closer $|r|$ is to 1, the stronger.
  • Important caveat: correlations do not imply causation.
    • Example: ice cream sales and shark attacks both rise in the summer; a positive correlation does not mean ice cream causes shark attacks.
    • Possible third-variable explanation: seasonality, heat, outdoor activity could influence both.
    • Always consider potential confounding/third variables rather than inferring cause-and-effect from a correlation alone.
  • Terminology and practice:
    • Correlation describes a relationship; causation requires demonstrating a cause-and-effect link via manipulation and control.
    • When there is a correlation, researchers may look for possible causal pathways, but evidence must support causality beyond correlation.
  • Quick practice problem cues:
    • If you see a correlation of $r = 0.80$ with $p = 0.01$, interpret it as a strong positive correlation that is statistically significant: if there were no real relationship, a correlation at least this strong would appear only about 1% of the time.
    • If you see a correlation of $r = 0.30$ with $p = 0.40$, the relationship is weak and not statistically significant.
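
The strength and direction described above can be computed directly. The following is a minimal sketch of the Pearson correlation coefficient in plain Python; the function name `pearson_r` and all four data lists are invented for illustration and do not come from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of each pair's deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return co_dev / sqrt(ss_x * ss_y)

# Hypothetical data, made up for this example only.
heights_in = [62, 65, 67, 70, 72, 75]
weights_lb = [120, 140, 150, 165, 180, 200]
hours_tv = [1, 2, 3, 4, 5, 6]
exam_score = [95, 88, 80, 74, 65, 60]

r_pos = pearson_r(heights_in, weights_lb)  # variables rise together
r_neg = pearson_r(hours_tv, exam_score)    # one rises as the other falls
print(f"height vs weight: r = {r_pos:.2f}")
print(f"TV hours vs exam score: r = {r_neg:.2f}")
```

The sign of the result gives the direction and its absolute value gives the strength. Neither number says anything about which variable, if either, causes the other.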

Independent and Dependent Variables; Experimental vs Correlational Design

  • Independent variable (IV): the variable the researcher manipulates or changes to observe effects.
  • Dependent variable (DV): the outcome measured by the researcher.
  • Experimental setup example: studying whether violent TV shows influence violent behavior.
    • IV: level of violence in TV programming (manipulated by researcher).
    • DV: observed violent behavior.
  • Another example: sunshine exposure and happiness.
    • IV: sunshine exposure (e.g., 30 minutes outside vs no instruction).
    • DV: happiness levels measured after exposure.
  • Experimental group vs control group:
    • Experimental group: receives the manipulated IV (e.g., new therapy, 30 minutes sunshine).
    • Control group: does not receive the manipulated IV (baseline or standard condition).
  • Random assignment (vs random sampling):
    • Random sampling: how participants are selected from a population to be in the study.
    • Random assignment: how participants are allocated to experimental vs control groups after selection.
    • The two are distinct steps: random sampling happens before the experiment begins; random assignment happens within the study to create equivalent groups.
  • Experimental examples:
    • Therapy study: Group 1 receives standard therapy (control); Group 2 receives a new therapy (experimental).
    • Sunshine study: Group A spends 30 minutes outside (experimental); Group B has no sunshine exposure (control).
  • Experimental terminology:
    • Experimental group: receives the manipulated variable.
    • Control group: does not receive the manipulated variable; serves as a baseline comparator.
  • Sampling and assignment terms comparison:
    • Random sampling: selecting participants from the population to participate.
    • Random assignment: assigning participants to groups to avoid bias after they are in the study.
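
The two steps can be sketched in code to show that they happen in sequence and do different jobs. This is an illustrative sketch only: the function name `sample_then_assign`, the population of 1,000 labels, and the group size of 40 are all assumptions made for the example.

```python
import random

def sample_then_assign(population, n_participants, seed=None):
    """Randomly sample participants, then randomly assign them to two groups."""
    rng = random.Random(seed)
    # Step 1, random sampling: decide WHO from the population is in the study.
    participants = rng.sample(population, n_participants)
    # Step 2, random assignment: decide WHICH GROUP each participant joins.
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical population of 1,000 people, identified by label.
population = [f"person_{i}" for i in range(1000)]
groups = sample_then_assign(population, n_participants=40, seed=0)
print(len(groups["experimental"]), "experimental,", len(groups["control"]), "control")
```

Sampling affects how well the results generalize to the population, while assignment affects whether the two groups are comparable before the IV is applied.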

Bias, Blinding, and Ethics in Research

  • Bias in experiments:
    • Experimenter bias: researcher's expectations might influence data collection or interpretation.
    • Participant bias: participants' awareness of the study can influence their responses or behavior.
  • Blinding to reduce bias:
    • Single-blind study: the participants do not know which condition they are in, but the researcher does.
    • This helps minimize participant expectancy effects; sometimes the researcher’s knowledge could still influence data collection.
    • Double-blind studies (not described in the transcript) blind both participants and researchers to condition to further reduce bias.
  • Informed consent:
    • Participants are informed about the study, potential risks, and what participation entails.
    • A form is read and signed before participation to document consent.
    • Ensures participants understand potential harms and benefits; protects against ethical concerns.
  • Debriefing:
    • Post-study explanation of the study's purpose and procedures to participants.
    • Important for transparency and to mitigate any potential distress caused by the study.
  • Confidentiality:
    • Protect participant information; ensure data privacy and secure handling of records.
  • General ethical issues:
    • Avoid causing unnecessary emotional distress or harm.
    • Care for participants’ well-being throughout the study.
    • Ensure proper consent, debriefing, and confidentiality.
  • Meta-analysis (overview):
    • Not an experiment; a research method that analyzes and synthesizes findings from multiple existing studies.
    • Used to answer questions by aggregating evidence across studies rather than collecting new data.

Basic Statistics for Psychology; Mean, Standard Deviation, and Significance

  • Mean (average):
    • Definition: the sum of all values divided by the number of values; a measure of the center of a data set.
    • Example data set: for 5 numbers (e.g., money in five wallets), the mean is the total amount divided by 5.
  • Standard Deviation (SD):
    • Definition: a measure of spread; how much data vary from the mean.
    • Lower SD implies data are clustered around the mean; higher SD implies more spread.
    • Notation: $\sigma$ (lowercase sigma).
    • Formula (conceptual): $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$, where $\mu$ is the mean.
  • Data point distance from the mean:
    • Measured in units of standard deviation (i.e., z-score concept, number of SDs away from the mean).
  • Normal distribution and the 68-95-99.7 rule:
    • About 68% of data fall within $1\sigma$ of the mean.
    • About 95% of data fall within $2\sigma$ of the mean.
    • About 99.7% of data fall within $3\sigma$ of the mean.
  • Applied example (height example):
    • Average height = 5'10" (70 inches) with a SD of 3 inches.
    • 68% of men are between 67 and 73 inches; 95% between 64 and 76 inches; 99.7% between 61 and 79 inches.
    • Very extreme deviations (e.g., > 3σ) are rare (about 0.3% beyond ±3σ).
  • Five-sigma rule (particle physics):
    • A result five standard deviations away from the mean is the standard for claiming a discovery in particle physics (an FDA-like bar for evidence); if there were no real effect, a fluctuation at least that large would occur only about 1 time in 3,500,000.
    • $5\sigma$ therefore expresses extremely strong evidence against random variation as the explanation for that result.
  • Statistical significance and p-values:
    • Statistical significance assesses the probability that observed results would occur if there were no real effect.
    • P-value (p): probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
    • Common threshold: p < 0.05 for statistical significance.
  • Null hypothesis (H0):
    • Proposes there is no effect or no difference between groups or conditions.
    • Rejection of H0 suggests a real effect or difference; failure to reject does not prove no effect, only insufficient evidence.
  • Practical examples used in class:
    • Hours slept vs. happiness: r = 0.8 with p = 0.01; a strong positive correlation that is statistically significant (if there were no real relationship, a correlation at least this strong would be expected only about 1% of the time).
    • Hair length vs. academic achievement: r = 0.3 with p = 0.40; weak relationship and not statistically significant (random variation alone would produce a correlation at least this strong about 40% of the time).
  • Interpreting correlations and p-values together:
    • A high |r| near 1 with a small p-value indicates a strong, statistically significant relationship.
    • A modest r with a large p-value suggests a weak or non-significant relationship.
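
The mean, standard deviation, z-score, and 68-95-99.7 figures above can be checked numerically. The sketch below uses Python's standard `statistics` module and the height example from these notes (mean 70 inches, SD 3 inches); the wallet amounts and the helper names `share_within` and `z_score` are invented for illustration, and heights are assumed to be exactly normally distributed.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical wallet amounts in dollars (made up for this example).
wallets = [10, 20, 30, 40, 50]
wallet_mean = mean(wallets)
# pstdev divides by N, matching the population formula for sigma.
wallet_sd = pstdev(wallets)
print(f"wallet mean = {wallet_mean}, SD = {wallet_sd:.2f}")

# Height example from the notes, assumed normally distributed.
heights = NormalDist(mu=70, sigma=3)

def share_within(dist, k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(heights, k):.1%}")

print("z-score of a 76-inch man:", z_score(76, 70, 3))

# One-sided tail probability beyond 5 SDs on a standard normal curve.
five_sigma_tail = 1 - NormalDist().cdf(5)
print(f"5-sigma tail: about 1 in {1 / five_sigma_tail:,.0f}")
```

The five-sigma figure printed here is the one-sided tail, which is where the "1 in 3.5 million" value comes from.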

Data Visualization: Bar Charts, Histograms, Pie Charts, and Scatterplots

  • Bar charts:
    • Use vertical axis to show frequency or value for distinct categories.
    • Each bar represents a distinct category; bars do not touch.
    • If bars touch, the chart is a histogram (continuous data on the x-axis).
    • Can have multiple variables with a legend if needed.
  • Histograms:
    • Similar appearance to bar charts, but bars touch to indicate continuous data on the x-axis.
    • Common examples: test scores, time scales (months, years).
  • Pie charts:
    • Circular graphs showing percentages; entire circle represents 100% of the data.
    • Slices represent proportions, e.g., 50% = half circle, 25% = quarter, etc.
  • Scatterplots/scattergrams:
    • Plot paired data points to examine relationships between two quantitative variables.
    • Used to visualize correlations (positive or negative).
    • See related video on correlations for more detail.
  • Operationalization and labeling:
    • Identify the variables (which are IVs and DVs) in the study question.
    • Label axes clearly; provide a title that reflects the operationalized variables.
  • Exam tip: be able to identify the appropriate graph type for a given data scenario and justify axis labels and variable roles.
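
The bar chart versus histogram distinction comes down to how the data are counted before plotting. A rough sketch, using invented data and hypothetical helper names (`category_counts`, `histogram_bins`): categories are tallied by label, while continuous values are grouped into adjacent equal-width bins.

```python
from collections import Counter

def category_counts(labels):
    """Tally distinct categories (the input to a bar chart)."""
    return dict(Counter(labels))

def histogram_bins(values, start, width, n_bins):
    """Count values in adjacent bins [lo, hi) (the input to a histogram)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical data, made up for this example only.
majors = ["psych", "bio", "psych", "art", "bio", "psych"]
test_scores = [55, 62, 67, 71, 74, 78, 83, 85, 91, 98]

major_counts = category_counts(majors)
score_bins = histogram_bins(test_scores, start=50, width=10, n_bins=5)

print("Bar chart data (separate categories):")
for label, count in major_counts.items():
    print(f"  {label:<6} {'#' * count}")

print("Histogram data (adjacent bins on a continuous axis):")
for lo, hi, count in score_bins:
    print(f"  {lo}-{hi} {'#' * count}")
```

Because each bin ends exactly where the next begins, histogram bars touch; the major categories have no such ordering or adjacency, so bar chart bars stay separate.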

Reading and Interpreting Research Scenarios (Correlational vs Experimental)

  • Correlational study example:
    • Scientists study backgrounds of children in foster care and find abuse-related histories; conclusion: correlational, since they analyze existing records without manipulating variables.
  • Experimental scenario example:
    • A physician tests a new drug by randomly assigning patients to receive the drug or not (coin flip), then follows up after 3 months.
    • This is an experimental design because there is manipulation of the IV (drug vs no drug).
  • Quick rule of thumb:
    • If the study involves manipulation of an IV to observe DV differences, it’s experimental.
    • If the study only observes existing differences without manipulation, it’s correlational.

Quiz and Study Tips (What to Review for the Exam)

  • Distinguish correlational vs experimental research projects.
  • Review random sampling vs random assignment; know how they differ and how they are used in studies.
  • Understand experimenter bias and participant bias; why blinding can help reduce bias.
  • Be comfortable with informed consent, debriefing, and confidentiality as ethical considerations.
  • Be able to identify the independent variable, dependent variable, experimental group, and control group from a scenario.
  • Be familiar with basic statistics: mean, standard deviation, and the interpretation of the 68-95-99.7 rule.
  • Know how to interpret r and p values: strength and significance; what constitutes a strong vs weak correlation; what p-values imply about chance.
  • Recognize common data-visualization types and their appropriate use (bar chart vs histogram vs pie chart vs scatterplot).
  • Practice interpreting short vignettes: determine whether the described study is correlational or experimental, and identify the key variables and group assignments.
  • Remember to review foundational figures and ideas from early psychology as mentioned in course materials; bolded terms and names often appear on quizzes.
  • Be prepared to read brief experimental descriptions and identify which part is the dependent variable, the independent variable, and the study design.

Quick Practice Scenarios (Applied Lines from Transcript)

  • Scenario: Researchers compare depression scores between an experimental group receiving a new drug and a control group receiving no drug.
    • Determine design type: experimental (random assignment/manipulation of IV by drug condition).
  • Scenario: A correlation study reports r = 0.30 with p = 0.40 between hair length and academic achievement.
    • Conclusion: weak, non-significant relationship; not statistically significant at conventional α = 0.05.
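
To see where a p-value for a correlation can come from, one option is a permutation test: shuffle one variable many times so that any real pairing is broken, and count how often the shuffled data produce a correlation at least as strong as the observed one. This is a sketch under stated assumptions, not necessarily the method used in class (textbook p-values for r usually come from a t-distribution). The function names, the number of shuffles, and both data sets are invented for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(ss_x * ss_y)

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Two-sided permutation estimate of the p-value for Pearson's r."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    ys_shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffling simulates the null hypothesis: no real pairing of x and y.
        rng.shuffle(ys_shuffled)
        # Small tolerance guards against floating-point rounding on ties.
        if abs(pearson_r(xs, ys_shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical data, made up for this example only.
hours_slept = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9]
happiness = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9]
hair_length = [1, 2, 3, 4, 5, 6, 7, 8]
achievement = [3, 1, 4, 1, 5, 9, 2, 6]

p_strong = permutation_p_value(hours_slept, happiness)
p_weak = permutation_p_value(hair_length, achievement)
print(f"sleep vs happiness: r = {pearson_r(hours_slept, happiness):.2f}, p ~ {p_strong:.4f}")
print(f"hair vs achievement: r = {pearson_r(hair_length, achievement):.2f}, p ~ {p_weak:.2f}")
```

A small p-value means shuffled data almost never match the observed correlation, so chance alone is a poor explanation; a large one means shuffled data match it often.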

Key Definitions (Glossary)

  • Correlation coefficient: $r$, measures the strength and direction of a linear relationship between two variables; $r \in [-1, 1]$.
  • Positive correlation: as one variable increases, the other also increases.
  • Negative correlation: as one variable increases, the other decreases.
  • No correlation: no discernible relationship pattern on scatter plot.
  • Independent variable (IV): the variable manipulated by the researcher.
  • Dependent variable (DV): the outcome measured by the researcher.
  • Experimental group: participants exposed to the IV manipulation.
  • Control group: participants not exposed to the IV manipulation; baseline for comparison.
  • Random sampling vs random assignment:
    • Random sampling: how participants are drawn from the population.
    • Random assignment: how participants are allocated to experimental vs control groups.
  • Bias: systematic deviation from the truth due to expectations or procedures.
  • Blinding: method to reduce bias by keeping participants (and sometimes researchers) unaware of the condition.
  • Informed consent: participants’ agreement to participate after being informed of risks and procedures.
  • Debriefing: post-study explanation to participants.
  • Confidentiality: protection of participants’ data.
  • Meta-analysis: synthesis of results across multiple studies to answer a research question.
  • Mean: the average of a data set.
  • Standard deviation: a measure of spread around the mean; how data vary from the mean, denoted $\sigma$.
  • The 68-95-99.7 rule: about 68% within $1\sigma$, about 95% within $2\sigma$, about 99.7% within $3\sigma$.
  • Statistical significance: a result is statistically significant when results at least as extreme would be unlikely if the null hypothesis were true; conventionally judged by whether the p-value falls below 0.05.
  • Null hypothesis: there is no effect or difference; no relationship between variables.
  • Five-sigma rule: $5\sigma$, indicating a very high level of evidence against chance (very small probability of random fluctuation).

Note on Graphical Conventions (From Transcript)

  • Bar charts: discrete categories; bars do not touch.
  • Histograms: continuous data; bars touch.
  • Pie charts: circular representation of percentages; 100% total circle; slices proportional to data.
  • Scattergrams/scatter plots: display pairs of data to illustrate correlations.
  • When labeling graphs: clearly identify the axes, specify the operationalized variables, and provide a descriptive title.