Notes on Correlation, Experimental Design, and Basic Statistics in Psychology

Correlation, Causation, and Quantitative Basics

Focus: moving from qualitative discussion to quantitative analysis; key goal is to determine if a relationship exists between two variables and if one can predict the other's outcome.
Example variables: height and weight; ask whether higher height associates with higher weight (positive correlation).
Core concept: correlation coefficient (r) measures the strength and direction of a linear relationship; r is a number between -1 and 1: $r \in [-1, 1]$
Visual aid: scatter plot to display pairs of data points and assess the form of the relationship.
Positive correlation:
- Both variables move in the same direction (e.g., height and weight tend to increase together).
- Example wording: as one increases, the other tends to increase.
Negative correlation:
- Variables move in opposite directions (e.g., as one increases, the other decreases).
- Example wording: as one increases, the other decreases.
No correlation:
- Data points show no clear pattern or directionality; little to no relationship.
- Example wording: hours of sleep vs shoe size likely show no meaningful relationship; points scattered with no pattern.
Important nuance:
- Negative correlation does not mean there is no connection; it means the relationship direction is inverse.
- No correlation is not the same as a negative correlation.
Strength of correlation:
- The closer |r| is to 1, the stronger the relationship; the closer |r| is to 0, the weaker the relationship.
Example values and interpretation:
- Happiness and depression: $r = -0.92$; strong negative correlation (as happiness increases, depression tends to decrease).
- A correlation of $r = 0.8$ indicates a strong positive correlation; the closer |r| is to 1, the stronger the relationship.
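To make the sign and size of r concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient by hand. The height and weight numbers are invented for illustration and are not from the class; the function name `pearson_r` is likewise just a label chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: how the two variables vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations: how much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration only.
heights_in = [62, 65, 68, 70, 73]
weights_lb = [120, 140, 155, 170, 190]

r_pos = pearson_r(heights_in, weights_lb)                 # close to +1
r_neg = pearson_r(heights_in, [-w for w in weights_lb])   # same size, opposite sign

print(f"r_pos = {r_pos:.3f}, r_neg = {r_neg:.3f}")
```

Flipping the direction of one variable flips the sign of r but leaves |r| unchanged, which is why strength is read from |r| and direction from the sign.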
Important caveat: correlations do not imply causation.
- Example: ice cream sales and shark attacks both rise in the summer; a positive correlation does not mean ice cream causes shark attacks.
- Possible third-variable explanation: seasonality, heat, outdoor activity could influence both.
- Always consider potential confounding/third variables rather than inferring cause-and-effect from a correlation alone.
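The third-variable idea can be illustrated with a small simulation. The sketch below assumes a made-up model in which daily temperature drives both ice cream sales and shark attacks, while neither affects the other; all coefficients and noise levels are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the third variable behind both outcomes.
temps = [rng.uniform(50, 95) for _ in range(200)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 10) for t in temps]
shark_attacks = [0.1 * t + rng.gauss(0, 1) for t in temps]

# Neither outcome appears in the other's formula, yet they correlate.
r_confounded = pearson_r(ice_cream_sales, shark_attacks)
print(f"r between ice cream sales and shark attacks: {r_confounded:.2f}")
```

The two outcomes come out positively correlated even though the code never lets one influence the other, which is the pattern a confounding variable produces.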
Terminology and practice:
- Correlation describes a relationship; causation requires demonstrating a cause-and-effect link via manipulation and control.
- When there is a correlation, researchers may look for possible causal pathways, but evidence must support causality beyond correlation.
Quick practice problem cues:
- If you see a correlation of $r = 0.80$ with $p = 0.01$, interpret it as a strong positive correlation that is statistically significant: if there were no real relationship, a correlation at least this strong would turn up by chance only about 1% of the time.
- If you see a correlation of $r = 0.30$ with $p = 0.40$, the relationship is weak and not statistically significant.

Independent and Dependent Variables; Experimental vs Correlational Design

Independent variable (IV): the variable the researcher manipulates or changes to observe effects.
Dependent variable (DV): the outcome measured by the researcher.
Experimental setup example: studying whether violent TV shows influence violent behavior.
- IV: level of violence in TV programming (manipulated by researcher).
- DV: observed violent behavior.
Another example: sunshine exposure and happiness.
- IV: sunshine exposure (e.g., 30 minutes outside vs no instruction).
- DV: happiness levels measured after exposure.
Experimental group vs control group:
- Experimental group: receives the manipulated IV (e.g., new therapy, 30 minutes sunshine).
- Control group: does not receive the manipulated IV (baseline or standard condition).
Random assignment (vs random sampling):
- Random sampling: how participants are selected from a population to be in the study.
- Random assignment: how participants are allocated to experimental vs control groups after selection.
- Both are distinct steps; random sampling precedes the experiment; random assignment occurs within the study to create equivalent groups.
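The two steps can be sketched in a few lines of Python. The population size, sample size, and group sizes below are arbitrary assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 participant IDs.
population = list(range(1000))

# Step 1, random sampling: decide WHO takes part in the study.
sample = rng.sample(population, 40)

# Step 2, random assignment: shuffle the sample, then split it into groups.
shuffled = sample[:]
rng.shuffle(shuffled)
experimental_group = shuffled[:20]
control_group = shuffled[20:]

print(len(sample), len(experimental_group), len(control_group))
```

Sampling affects how well the study generalizes to the population; assignment affects whether the two groups are comparable before the manipulation.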
Experimental examples:
- Therapy study: Group 1 receives standard therapy (control); Group 2 receives a new therapy (experimental).
- Sunshine study: Group A spends 30 minutes outside (experimental); Group B has no sunshine exposure (control).
Experimental terminology:
- Experimental group: receives the manipulated variable.
- Control group: does not receive the manipulated variable; serves as a baseline comparator.
Sampling and assignment terms comparison:
- Random sampling: selecting participants from the population to participate.
- Random assignment: assigning participants to groups to avoid bias after they are in the study.

Bias, Blinding, and Ethics in Research

Bias in experiments:
- Experimenter bias: researcher's expectations might influence data collection or interpretation.
- Participant bias: participants' awareness of the study can influence their responses or behavior.
Blinding to reduce bias:
- Single-blind study: the participants do not know which condition they are in, but the researcher does.
- This helps minimize participant expectancy effects; sometimes the researcher’s knowledge could still influence data collection.
- Double-blind studies (not described in the transcript) blind both participants and researchers to condition to further reduce bias.
Informed consent:
- Participants are informed about the study, potential risks, and what participation entails.
- A form is read and signed before participation to document consent.
- Ensures participants understand potential harms and benefits; protects against ethical concerns.
Debriefing:
- Post-study explanation of the study's purpose and procedures to participants.
- Important for transparency and to mitigate any potential distress caused by the study.
Confidentiality:
- Protect participant information; ensure data privacy and secure handling of records.
General ethical issues:
- Avoid causing unnecessary emotional distress or harm.
- Care for participants’ well-being throughout the study.
- Ensure proper consent, debriefing, and confidentiality.
Meta-analysis (overview):
- Not an experiment; a research method that analyzes and synthesizes findings from multiple existing studies.
- Used to answer questions by aggregating evidence across studies rather than collecting new data.

Basic Statistics for Psychology; Mean, Standard Deviation, and Significance

Mean (average):
- Definition: the sum of all values divided by the number of values; a measure of the central value of a data set.
- Example data set: for 5 numbers (e.g., money in five wallets), add the amounts and divide by 5.
Standard Deviation (SD):
- Definition: a measure of spread; how much data vary from the mean.
- Lower SD implies data are clustered around the mean; higher SD implies more spread.
- Notation: $\sigma$ (lowercase sigma).
- Formula (conceptual): $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}$ where $\mu$ is the mean and $N$ is the number of data points.
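As a worked example, the sketch below computes the mean and the population standard deviation (the square root of the average squared distance from the mean) for five wallet amounts. The dollar values are invented for illustration.

```python
import math
import statistics

# Hypothetical dollar amounts in five wallets.
wallets = [10, 20, 30, 40, 50]

# Mean: sum of the values divided by how many there are.
mu = sum(wallets) / len(wallets)

# Population SD: square root of the average squared deviation from the mean.
sigma = math.sqrt(sum((x - mu) ** 2 for x in wallets) / len(wallets))

# The standard library's population SD should agree with the hand calculation.
sigma_check = statistics.pstdev(wallets)

print(f"mean = {mu}, SD = {sigma:.2f}")
```

Here the mean is 30 and the SD is about 14.14; a data set with the same mean but values packed closer to 30 would give a smaller SD.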
Data point distance from the mean:
- Measured in units of standard deviation (i.e., z-score concept, number of SDs away from the mean).
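A z-score is simply the distance from the mean divided by the SD. The short sketch below uses the mean of 70 inches and SD of 3 inches from the height example in these notes; the function name `z_score` is a label chosen here.

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_tall = z_score(76, 70, 3)    # 76 inches is 2 SDs above the mean
z_short = z_score(67, 70, 3)   # 67 inches is 1 SD below the mean

print(z_tall, z_short)
```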
Normal distribution and the 68-95-99.7 rule:
- About 68% of data fall within $1\sigma$ of the mean.
- About 95% of data fall within $2\sigma$ of the mean.
- About 99.7% of data fall within $3\sigma$ of the mean.
Applied example (height example):
- Average height = 5'10" (70 inches) with a SD of 3 inches.
- About 68% of men are between 67 and 73 inches; about 95% are between 64 and 76 inches; about 99.7% are between 61 and 79 inches.
- Very extreme deviations (e.g., > 3σ) are rare (about 0.3% beyond ±3σ).
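The percentages in the rule can be checked against a normal distribution with the same mean and SD as the height example. This is a sketch that assumes heights are exactly normally distributed, which real data only approximate.

```python
from statistics import NormalDist

# Normal model for the height example: mean 70 inches, SD 3 inches.
heights = NormalDist(mu=70, sigma=3)

def share_within(k):
    """Fraction of the distribution within k SDs of the mean."""
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    return heights.cdf(high) - heights.cdf(low)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.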
Five-sigma rule (particle physics):
- A result that is five standard deviations away from the mean: the conventional threshold for claiming a discovery in particle physics, with a ~1 in 3,500,000 chance of a fluctuation that large arising from random variation alone.
- Expresses extremely strong evidence against random variation for that result: $5\sigma$.
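The 1 in 3,500,000 figure corresponds to the one-sided tail of a standard normal distribution beyond five SDs. A minimal check, assuming a normal model and a one-sided test:

```python
from statistics import NormalDist

# One-sided tail: chance of landing 5 or more SDs above the mean when only
# random fluctuation is at work. By symmetry this equals cdf(-5).
p_five_sigma = NormalDist().cdf(-5)
one_in = 1 / p_five_sigma

print(f"p = {p_five_sigma:.2e}, about 1 in {one_in:,.0f}")
```

Counting both tails (5 SDs above or below) would double the probability, so the quoted odds depend on the one-sided convention.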
Statistical significance and p-values:
- Statistical significance assesses how likely the observed results would be if there were no real effect.
- P-value (p): probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
- Common threshold: p < 0.05 for statistical significance.
Null hypothesis (H0):
- Proposes there is no effect or no difference between groups or conditions.
- Rejection of H0 suggests a real effect or difference; failure to reject does not prove no effect, only insufficient evidence.
Practical examples used in class:
- Correlation example with p = 0.01: strong positive correlation between hours slept and happiness; r = 0.8; p = 0.01 means a correlation at least this strong would occur only about 1% of the time if there were no real relationship, so the result is unlikely to be due to chance alone.
- Hair length vs. academic achievement: r = 0.3 with p = 0.40; weak relationship and not statistically significant (a correlation at least this strong would occur about 40% of the time through random variation alone).
Interpreting correlations and p-values together:
- A high |r| near 1 with a small p-value indicates a strong, statistically significant relationship.
- A modest r with a large p-value suggests a weak or non-significant relationship.
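One way to see what a p-value for a correlation means is a permutation test: shuffle one variable to destroy any real pairing, and count how often chance alone produces a correlation at least as strong as the observed one. This is a sketch of that idea, not the method used in class; the sleep and happiness scores are invented, and the function names are labels chosen here.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def permutation_p(xs, ys, trials=5000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)  # copy so the caller's list is not modified
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real pairing between xs and ys
        # Small tolerance guards against floating-point rounding on ties.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / trials

# Hypothetical data, invented for illustration only.
hours_slept = [5, 6, 6, 7, 7, 8, 8, 9]
happiness = [3, 4, 5, 5, 6, 7, 6, 8]

r = pearson_r(hours_slept, happiness)
p = permutation_p(hours_slept, happiness)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

With these made-up numbers, r is high and very few shuffles match it, so p is small; a weak r would be matched by many shuffles and give a large p.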

Data Visualization: Bar Charts, Histograms, Pie Charts, and Scatterplots

Bar charts:
- Use vertical axis to show frequency or value for distinct categories.
- Each bar represents a distinct category; bars do not touch.
- If bars touch, the chart is a histogram (continuous data on the x-axis).
- Can have multiple variables with a legend if needed.
Histograms:
- Similar appearance to bar charts, but bars touch to indicate continuous data on the x-axis.
- Common examples: test scores, time scales (months, years).
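The difference between the two chart types shows up in how the data are counted before plotting. The sketch below uses invented pet and test-score data: categories are counted one by one for a bar chart, while continuous scores are grouped into adjacent bins for a histogram. The bin width of 10 is an arbitrary choice.

```python
from collections import Counter

# Categorical data -> bar chart: one count per distinct category.
pets = ["dog", "cat", "dog", "fish", "cat", "dog"]  # hypothetical
bar_counts = Counter(pets)

# Continuous data -> histogram: count values falling in adjacent bins.
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93, 99]   # hypothetical
bin_width = 10
hist_counts = Counter((s // bin_width) * bin_width for s in scores)

# Text rendering of the histogram, one row per bin.
for start in sorted(hist_counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * hist_counts[start]}")
```

Because each bin ends exactly where the next begins, histogram bars touch; pet categories have no such ordering, so bar-chart bars stay separate.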
Pie charts:
- Circular graphs showing percentages; entire circle represents 100% of the data.
- Slices represent proportions, e.g., 50% = half circle, 25% = quarter, etc.
Scatterplots/scattergrams:
- Plot paired data points to examine relationships between two quantitative variables.
- Used to visualize correlations (positive or negative).
- See related video on correlations for more detail.
Operationalization and labeling:
- Identify the variables (which are IVs and DVs) in the study question.
- Label axes clearly; provide a title that reflects the operationalized variables.
Exam tip: be able to identify the appropriate graph type for a given data scenario and justify axis labels and variable roles.

Reading and Interpreting Research Scenarios (Correlational vs Experimental)

Correlational study example:
- Scientists study backgrounds of children in foster care and find abuse-related histories; conclusion: correlational, since they analyze existing records without manipulating variables.
Experimental scenario example:
- A physician tests a new drug by randomly assigning patients to receive the drug or not (coin flip), then follows up after 3 months.
- This is an experimental design because there is manipulation of the IV (drug vs no drug).
Quick rule of thumb:
- If the study involves manipulation of an IV to observe DV differences, it’s experimental.
- If the study only observes existing differences without manipulation, it’s correlational.

Quiz and Study Tips (What to Review for the Exam)

Distinguish correlational vs experimental research projects.
Review random sampling vs random assignment; know how they differ and how they are used in studies.
Understand experimenter bias and participant bias; why blinding can help reduce bias.
Be comfortable with informed consent, debriefing, and confidentiality as ethical considerations.
Be able to identify the independent variable, dependent variable, experimental group, and control group from a scenario.
Be familiar with basic statistics: mean, standard deviation, and the interpretation of the 68-95-99.7 rule.
Know how to interpret r and p values: strength and significance; what constitutes a strong vs weak correlation; what p-values imply about chance.
Recognize common data-visualization types and their appropriate use (bar chart vs histogram vs pie chart vs scatterplot).
Practice interpreting short vignettes: determine whether the described study is correlational or experimental, and identify the key variables and group assignments.
Remember to review foundational figures and ideas from early psychology as mentioned in course materials; bolded terms and names often appear on quizzes.
Be prepared to read brief experimental descriptions and identify which part is the dependent variable, the independent variable, and the study design.

Quick Practice Scenarios (Applied Lines from Transcript)

Scenario: Researchers compare depression scores between an experimental group receiving a new drug and a control group receiving no drug.
- Determine design type: experimental (the researchers manipulate the IV, drug vs no drug, and compare the groups on the DV, depression scores).
Scenario: A correlation study reports r = 0.30 with p = 0.40 between hair length and academic achievement.
- Conclusion: weak, non-significant relationship; not statistically significant at conventional α = 0.05.

Key Definitions (Glossary)

Correlation coefficient: $r$, measures the strength and direction of a linear relationship between two variables; $r \in [-1, 1]$.
Positive correlation: as one variable increases, the other also increases.
Negative correlation: as one variable increases, the other decreases.
No correlation: no discernible relationship pattern on scatter plot.
Independent variable (IV): the variable manipulated by the researcher.
Dependent variable (DV): the outcome measured by the researcher.
Experimental group: participants exposed to the IV manipulation.
Control group: participants not exposed to the IV manipulation; baseline for comparison.
Random sampling vs random assignment:
- Random sampling: how participants are drawn from the population.
- Random assignment: how participants are allocated to experimental vs control groups.
Bias: systematic deviation from the truth due to expectations or procedures.
Blinding: method to reduce bias by keeping participants (and sometimes researchers) unaware of the condition.
Informed consent: participants’ agreement to participate after being informed of risks and procedures.
Debriefing: post-study explanation to participants.
Confidentiality: protection of participants’ data.
Meta-analysis: synthesis of results across multiple studies to answer a research question.
Mean: the average of a data set.
Standard deviation: a measure of spread around the mean; how data vary from the mean, denoted $\sigma$.
The 68-95-99.7 rule: about 68% within $1\sigma$, about 95% within $2\sigma$, about 99.7% within $3\sigma$.
Statistical significance: a result is statistically significant when results at least this extreme would be unlikely if the null hypothesis were true; usually judged by whether the p-value falls below a threshold, commonly p < 0.05.
Null hypothesis: there is no effect or difference; no relationship between variables.
Five-sigma rule: $5\sigma$, indicating a very high level of evidence against chance (very small probability of random fluctuation).

Note on Graphical Conventions (From Transcript)

Bar charts: discrete categories; bars do not touch.
Histograms: continuous data; bars touch.
Pie charts: circular representation of percentages; 100% total circle; slices proportional to data.
Scattergrams/scatter plots: display pairs of data to illustrate correlations.
When labeling graphs: clearly identify the axes, specify the operationalized variables, and provide a descriptive title.