
Midsemester Exam

Lecture 1

Quick Maths Refresher

  • Operations needed: +, −, ×, ÷, x², √x only.

  • Order of operations: BOMDAS/BODMAS (Brackets → Orders → Multiply/Divide → Add/Subtract).

  • Squaring removes sign: (−3)² = 9.

  • Formula anatomy: ΣX means “add all X’s”; a formula is just a recipe.

  • Example mean formula: X̄ = ΣX / N.
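
The mean formula above (X̄ = ΣX / N) can be sketched in a few lines of Python; the list of scores is a made-up illustration:

```python
scores = [4, 7, 5, 8, 6]  # hypothetical data

N = len(scores)       # N: number of observations
total = sum(scores)   # sigma X: "add all X's"
mean = total / N      # X-bar = (sum of X) / N

print(mean)  # 6.0
```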

Why Research Methods & Statistics Matter

  • Ubiquity: clinical practice, HR, sports psych, marketing, media literacy—need to evaluate evidence.

  • Stats vaccinate against misinformation; help decide treatment efficacy, interpret polls, etc.

  • Mathematics anxiety: stats here uses Yr 8 math; software handles heavy lifting—concepts are key.

  • Careers: continual need to integrate new research; stats essential for “informed consumer” role.

Science, Psychology & Knowledge Acquisition

  • Psychology breadth: 56 APA divisions; thousands of specialized journals.

  • Peer-review process: manuscript → journal → anonymous expert reviewers → revisions → publication.

  • Knowledge sources:

    1. Personal experience

    2. Authority (good when justified)

    3. Reason (logic/syllogism)

    4. Empiricism (measurement) ← core of science.

  • Neil deGrasse Tyson quote: goal = avoid fooling ourselves (false positives/negatives).

Scientific Method Essentials
  • Science = process, not fact list; combines rationalism (theory) + empiricism (observation).

  • Key attributes:

    • Systematic observation (standardised measures)

    • Critical analysis & tentative conclusions

    • Openness (methods, data, code sharing)

    • Independence from authority (data over prestige)

    • Focus on solvable, falsifiable problems.

  • Goals: Describe, Explain, Predict, Control (e.g., cigarette warnings, traffic signs).

Theories & Hypotheses
  • Scientific theory: coherent set of statements summarising existing facts & offering explanatory mechanism (e.g., Piaget’s stages of cognitive development).

  • Must be falsifiable: specify conditions that could disprove it (e.g., invisible dragon analogy, Benjamin Rush bloodletting example).

  • Hypothesis: precise, testable prediction derived from theory (theory “places the bet”).

  • Good hypotheses: measurable variables, clear criteria, non-circular.

Principles Recap
  1. Empiricism & Objectivity – measurable, replicable.

  2. Skepticism & Critical Doubt – search for alternative explanations (Occam’s Razor: simplest that fits all data).

  3. Openness – full disclosure, replication, data/code sharing.

  4. Tentativeness – willingness to revise or discard theories with new evidence.

  5. Anti-authoritarian – claims judged solely by evidence.

Lifecycle of a Psychological Study

  1. Theory/Observation → question.

  2. Derive falsifiable hypotheses.

  3. Design study (participants, variables, procedure, measurement tools).

  4. Collect data (systematic, ethical).

  5. Analyse (stats) → descriptive & inferential results.

  6. Interpret relative to hypotheses & theory.

  7. Report (peer-review journal, open data/code).

  8. Community replication & cumulative evidence (support, refine, refute).

Lecture 2

The core aim of research design

  • Aim: to engineer logically sound procedures to test hypotheses or predictions.

  • Core requirement: the design must be able to determine whether manipulation of the IV yields the predicted effect on the DV.

  • Key concept: eliminate confounds and extraneous variability so observed differences in the DV can be attributed to the IV rather than other factors.

The sequence of a typical psychology research project (diagrammed)

  • Start with a theory.

  • Derive predictions or hypotheses from the theory.

  • Design a study to test those predictions.

  • Collect data and perform analyses.

  • Interpret results and write a report.

  • Use results to refine the theory or return to the drawing board if results are not informative.

  • Theoretical development and empirical testing proceed iteratively.

Quantitative vs qualitative research (contextual note)

  • Two broad branches of quantitative approaches: experimental and observational.

  • Qualitative research focuses on lived experience, feelings, and non-numeric data (e.g., interviews, focus groups, document analysis, indigenous knowledge traditions).

  • For the bulk of this course, the focus is on quantitative methods (experimental, quasi-experimental, correlational).

  • Acknowledge that qualitative methods have value for different kinds of knowledge but are not the emphasis here.

Quantitative approaches: experimental, quasi-experimental, correlational

  • Experimental: direct manipulation of the IV and random assignment of participants to conditions.

  • Quasi-experimental: manipulation is not possible or ethical (e.g., natural groups); assignment to groups is not random.

  • Correlational: no manipulation; examine relationships between natural variations in variables.

  • Central caution: correlation does not imply causation; experiments are designed to address causality.

Formulating questions and hypotheses

  • Hypothesis formation can start from theory or a priori hunches.

  • Example question: "Do people's memory capacities grow as they age?" (illustrative, not prescriptive for exams).

  • When designing studies, specify terms clearly to avoid ambiguity at the end of the study.

  • The null hypothesis (H0) and the alternative hypothesis (H1) are central:

    • H0: there is no relationship between the IV and the DV.

    • H1: there is a relationship between the IV and the DV.

  • The outcome of statistical tests: either fail to reject the null hypothesis (results are consistent with chance) or reject the null hypothesis (results are unlikely under chance), leading to tentative support for the alternative hypothesis.

IV (independent variable) and DV (dependent variable)

  • IV: the variable directly manipulated by the experimenter; levels are deliberately created (e.g., smoking vs non-smoking).

  • DV: the measure presumed to be affected by the IV (e.g., memory performance, doctor visits).

  • The goal is to ensure that observed DV differences are due to manipulations of the IV rather than other factors (confounds).

  • Operational definitions: precise, concrete definitions of how the variables are measured or manipulated to reduce ambiguity and wiggle room.

    • Example: memory defined as working memory measured by recalling a list of words over a 3-minute period.

    • Another example: infant imitation studied by a clearly defined criterion for tongue poking (forward tongue thrust crossing the back edge of the lower lip).

  • Converging operations: using multiple operational definitions or contexts to converge on the same conclusion about the underlying construct.

Operational definitions and converging operations (example)

  • Virginia Slaughter infant mimicry study:

    • Operational definition for a mimic response: a clear, specific criterion for a tongue poke (e.g., forward thrust crossing the back edge of the lower lip).

    • Counts: 0 = no response, 1 = partial response, 2 = full mimic response.

  • Rationale: precise definitions reduce variability and ensure consistency across observers and trials.

Variability in data: random variability, individual differences, and extraneous variables

  • Random variability (noise): inherent differences among individuals and conditions that obscure the signal (the effect of the IV on the DV).

  • Illustration: a TV signal with white noise; the signal is easier to detect when noise is reduced.

  • Individual differences: stable differences among individuals (e.g., age, IQ, personality, memory capacity, attention).

  • Situational variables: factors in the testing environment (e.g., temperature, noise, time of day, instructions, experimenter differences) that can introduce variability.

  • The goal of good design: minimize random variability and control for individual and situational differences when possible.

  • Example of constrained sampling: narrowing the sample to psychology students reduces variability in a memory task and can yield clearer group differences.

Random allocation and control of variability

  • Random allocation (random assignment): each participant has an equal and independent chance of being placed in any group (e.g., a coin flip, with a constrained randomization scheme to ensure equal group sizes).

  • Purpose: to balance groups on known and unknown factors, thus reducing systematic differences other than the IV.

  • Example: 100 participants, randomly assigned to 50 in the smoking group and 50 in the non-smoking group; randomization distributes confounds across groups.

  • Control groups: baseline group that experiences everything except the critical manipulation; used to isolate the effect of the IV.

    • Example: a distraction manipulation study where the control group is not distracted.

    • The control condition should be identical to the experimental condition in every way except the IV of interest.
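
The random-allocation example above (100 participants, 50 per group) can be sketched as follows; participant IDs and group names are hypothetical:

```python
import random

random.seed(1)                      # fixed seed only for a reproducible illustration
participants = list(range(1, 101))  # 100 hypothetical participant IDs
random.shuffle(participants)        # every ordering is equally likely

# Splitting the shuffled list in half guarantees equal group sizes
# while keeping assignment random.
smoking_group = participants[:50]
control_group = participants[50:]   # identical in every way except the manipulation
```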

Experimental design: advantages and limitations

  • Advantages:

    • Strongest evidence for causality when random assignment and manipulation are possible.

    • Allows examination of effects outside the typical range (e.g., extreme manipulations) to test assumptions about the relationship.

  • Disadvantages and limitations:

    • Random assignment is not always possible for certain variables (e.g., IQ, innate visual acuity).

    • Ethical constraints prohibit certain manipulations (e.g., instructing people to smoke 30 cigarettes/day).

    • Some questions cannot be addressed with experiments due to practicality or ethics; alternative designs are needed.

Quasi-experimental designs

  • Used when random assignment to conditions is not possible or ethical.

  • Example: compare existing smokers to non-smokers on doctor visits; groups are pre-existing and not randomly assigned.

  • Strengths: can study real-world groups and phenomena; ethical and practical feasibility when true experiments are not possible.

  • Limitations: weaker causal inferences due to potential pre-existing differences between groups not controlled by randomization.

  • Key distinctions from true experiments:

    • IV is not directly manipulated by the experimenter.

    • Participants cannot be randomly assigned to conditions.

    • We attempt to equate groups on measured variables, but unmeasured differences may still confound outcomes.

How to distinguish experimental vs quasi-experimental vs correlational designs (practical guidance)

  • If the IV has levels created by the experimenter and participants can be in all conditions via random assignment or repeated measures, it’s an experiment.

  • If the IV is measured rather than manipulated and groups are pre-existing or non-random, it’s a quasi-experiment or a correlational study depending on structure.

  • Repeated measures: participants experience multiple conditions (e.g., with and without distraction) allowing within-subject comparisons; often more sensitive than between-subject designs.

Correlational studies: strengths and weaknesses

  • Strengths:

    • Useful when random assignment is impractical or unethical.

    • Ecological validity: naturalistic and reflective of real-world variation (e.g., people’s actual smoking behavior and doctor visits).

    • Useful for exploring natural ranges of a variable (e.g., how many cigarettes smoked per day relates to doctor visits).

  • Weaknesses:

    • Cannot infer causality: association does not imply a causal link because of potential third variables (confounds) and bidirectional possibilities.

    • Susceptible to third-variable problems (e.g., risk-taking or genetic factors may influence both smoking and doctor visits).

  • Diagnostic example: ultrasound during pregnancy and birth weight; a correlation was observed historically; later research showed ultrasound is not harmful; the initial correlation was due to confounds such as maternal health and the reasons an ultrasound was ordered.

  • Concept of ecological validity: correlational designs often reflect real-world conditions more closely than tightly controlled experiments, but causality remains uncertain.

Confounding variables and extraneous variables

  • Confounding variable (confound): a variable that varies systematically with the IV and could account for observed DV differences; it threatens causal interpretation.

  • Extraneous variable: not measured or controlled but present; can contribute to variability and obscure the true relationship.

  • Examples of confounds and extraneous factors:

    • Confounded with age in an age-by-smoking study (older age could explain more illnesses regardless of smoking).

    • Time of day confounding in a sleep vs wakefulness study (morning vs afternoon testing could affect reaction times independently of sleep).

    • IQ differences that correlate with reaction time or other outcomes could confound results.

  • Consequences of confounds:

    • If confounds are present, you cannot confidently attribute DV differences to the IV.

    • Effects may cancel out, producing a null result even when an effect exists.

  • Strategies to mitigate confounds:

    • Use random assignment to balance confounds across groups.

    • Use age matching, gender matching, or other matching techniques to ensure groups are comparable on potential confounds.

    • Use repeated-measures designs where participants serve as their own controls to reduce between-subject confounding.

    • Ensure identical procedures and instructions across groups to minimize situational confounds.

    • Balance testing times (morning vs afternoon) or test all at the same time to control for time-of-day effects.
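
The balancing strategy above can be sketched as a simple allocation scheme; the group names, session labels, and participant counts are hypothetical:

```python
import itertools

# Cross every group with every testing time so that time of day
# does not vary systematically with the IV.
groups = ["sleep-deprived", "rested"]
times = ["morning", "afternoon"]

schedule = {cell: [] for cell in itertools.product(groups, times)}

participants = list(range(1, 41))  # 40 hypothetical participants
cells = list(schedule)
for i, p in enumerate(participants):
    # Cycle through the four group-by-time cells so each receives
    # an equal share (10 participants per cell).
    schedule[cells[i % len(cells)]].append(p)
```

With equal numbers of each group tested in the morning and the afternoon, any time-of-day effect is spread evenly across conditions rather than confounded with the IV.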

Practical design considerations and terminology

  • Variables:

    • Confounding variable: systematically varies with the IV and threatens internal validity.

    • Extraneous variable: not measured or controlled but can influence DV and introduce noise.

  • Two main aims in research design:

    • Eliminate confounding variables so the observed relationship (IV → DV) is interpretable.

    • Minimize random variability (noise) to enhance signal detection and statistical power.

  • Methods to reduce random variability:

    • Use controlled lab settings (dim rooms, breaks, calibrated equipment).

    • Keep task length manageable to avoid fatigue and maintain stable performance.

    • Use homogeneous participant pools when appropriate to reduce variability, while balancing external validity needs.

Example scenarios and takeaways

  • Example 1: Cigarettes and doctor visits (randomized experiment)

    • IV: Number of cigarettes smoked per day (e.g., 30 vs 0); DV: number of doctor visits per year.

    • Random assignment to smoking vs non-smoking groups helps infer causality; confounds like baseline health and age must be controlled.

    • Ethical concerns limit manipulating harmful behaviors; hence quasi-experimental or observational designs may be used in practice.

  • Example 2: Sleep and reaction time (confounded by time of day)

    • Random assignment to sleep vs no sleep groups with testing times mismatched (afternoon vs morning) creates a confound.

    • Solutions: test all at the same time, balance testing times across groups, or use a repeated-measures design.

  • Example 3: Ultrasound and birth weight

    • Early correlational finding suggested ultrasound could affect birth weight; later research identified confounds (maternal health, reasons for ultrasound).

    • Emphasizes why correlational findings require cautious interpretation and follow-up causal research where possible.

Summary of study types and decision rules

  • Experimental study:

    • IV directly manipulated by the experimenter.

    • Random assignment to conditions (balanced groups).

    • Strong causal inference when feasible.

  • Quasi-experimental study:

    • IV not manipulated; groups pre-existing or formed through non-random assignment.

    • Some causal inferences possible but weaker due to potential pre-existing differences.

  • Correlational study:

    • No manipulation of the IV; examines relationships between naturally varying variables.

    • Useful for naturalistic data and hypothesis generation but cannot establish causality.

Quick decision checklist for class papers (to assess study type)

  • Does the IV have levels created by the experimenter?

  • Can participants be in any condition (random assignment or repeated measures)?

  • Could a participant take part in all conditions? (If yes, suggests an experiment; if no, may be quasi-experimental.)

  • Is the IV manipulated and randomly allocated, or merely measured?

  • Are there control groups that isolate the critical manipulation?

What’s coming up in the course (logistics and readings)

  • Next week’s schedule and tutorials:

    • No tutorials next week due to public holiday; tutorials resume the following week.

    • In the upcoming tutorials, you will need a laptop for spreadsheet exercises (Excel on Windows, Numbers on Mac; OpenOffice or Google Sheets as alternatives).

  • Quiz updates:

    • Quiz 2 will go up and be available for a limited window (e.g., until 5 PM Monday).

    • Quiz 1 had some non-completions; students are encouraged to complete quizzes (each worth 1% of the grade; three quizzes available to drop with no impact).

  • Readings for next week: Grove Chapter 2, covering material from today and next week; practice questions on Blackboard.

  • Instructor tips:

    • Engage with the global framework of design and statistics.

    • Use converging operations and multiple contexts to support conclusions.

    • Consider ethical implications and practical limitations in study design.

  • Final note: if you have questions, email the course lecturer (address provided on Blackboard).

Key terms and formulas

  • Null hypothesis: H0: there is no relationship between the IV and DV.

  • Alternative hypothesis: H1: there is a relationship between the IV and DV.

  • Independent variable (IV): the variable manipulated by the experimenter.

  • Dependent variable (DV): the measure presumed to be affected by the IV.

  • Operational definition: precise, unambiguous definitions of how variables are measured or manipulated (e.g., memory defined as the ability to recall a list of words after 3 minutes).

  • Correlation coefficient: r (e.g., Pearson’s r), used to quantify the strength and direction of a linear relationship between two continuous variables.

  • Converging operations: using multiple operational definitions to reach the same conclusion about a construct.
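
Pearson’s r, listed above, can be computed from first principles; the two small data lists below are made up purely for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the covariance of x and y divided by
    the product of their standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

cigs_per_day = [0, 5, 10, 20, 30]  # hypothetical predictor (x-axis)
doctor_visits = [1, 2, 2, 4, 5]    # hypothetical criterion (y-axis)
r = pearson_r(cigs_per_day, doctor_visits)  # strong positive r
```

A value near +1 or −1 indicates a strong linear relationship; a value near 0 indicates little or none. The correlation itself still says nothing about causality.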

Final takeaway

  • The core aim of research design is to create logically sound procedures that allow one to test hypotheses and draw causal inferences where possible, while carefully controlling for random variability and confounding factors. The framework covers experimental, quasi-experimental, and correlational designs, each with appropriate strengths and limitations. The emphasis is on understanding the design principles well enough to apply them across contexts and real-world problems.

Reading 1


Chapter 1 – The Scientific Method

1.1 Introduction

• Everyday stereotype: psychology = Sigmund Freud, Dr Phil, clinical work.
• Reality: more than 50 APA divisions (e.g., Military, Sport, Organisational, Law, Addictions).
• Unifying feature across this diversity = commitment to the scientific method.

1.2 Psychology & the Scientific Process

• Epistemology asks, “How do we know what we know?” Five common sources:
– Personal experience / intuition ➜ prone to selective perception & over-generalisation (jogging–knee myth).
– Authority ➜ efficient but credentials & evidence often unquestioned.
– Rationalism (reason) ➜ syllogisms valid only if premises sound (e.g., “All brown dogs are friendly” vs “All men are mortal”).
– Empiricism (observation) ➜ observations filtered by senses, culture, mood, intoxication.
– Science ➜ combines reason & empirical evidence within a self-correcting method.

1.2.5 Knowledge from Science – 5 Core Attitudes

• Objectivity
• Skepticism
• Openness / open-mindedness
• Tentativeness
• Independence from authority

1.3 Principles of the Scientific Method

• Objectivity – evidence observable by anyone; use physiological indices (heart-rate, GSR) instead of self-report.
• Skepticism – “Show me the data”; Galileo dropping objects in a vacuum disproved Aristotle.
• Openness – full method disclosure, inter-observer reliability; willingness to revise interpretations.
• Tentativeness – findings always provisional (global-warming debate; Pavlov’s conditioning replicated thousands of times).
• Independence from authority – no “because I said so”.

1.4 Assumptions of Science

• Nature is lawfully organised.
• Determinism – lawful relations allow prediction (forgetting curve example).
• Concern with solvable problems – rephrase metaphysical questions into testable forms (bystander effect derived from “Are people good?”).

1.5 Goals of Psychological Science

  1. Describe behaviour (pilot altitude under-estimation at night).

  2. Explain behaviour (lack of visual cues, pseudo-explanations to avoid).

  3. Predict behaviour (night worse estimates).

  4. Control behaviour (add ground lights ⇒ improved landings).

1.6 Theories & Hypotheses

• Theory = logically organised propositions that summarise, organise, explain & predict.
– Example: Piaget’s 4 developmental stages (sensorimotor 0–2 yrs … formal operations 11+ yrs).
– Operational definitions needed (e.g., “object permanence”).
• Hypothesis = specific, testable, tentative prediction derived from theory.
– Null (H0): no difference; Alternate (H1): predicted difference.
• Sources: literature gaps, replication (exact / conceptual), serendipity.
• Falsifiability – must be possible to refute (gravity drop test analogy).
• Pitfalls: circularity, supernatural forces, ill-defined terms, lack of parsimony (Lloyd Morgan’s Canon).

1.7 Scientific vs Non-Scientific Evidence

• Criteria: empirical, objective, systematic, controlled.
• Randomised, repeatable observations resolve disputes.

1.8 Critical Evaluation Checklist

• Did study uphold objectivity, skepticism, openness, tentativeness, independence?
• Evidence test: empirical? objective? systematic? controlled?

1.9 The Scientific Process (Flow-Chart)

Theory → Hypothesis → Design → Data (describe & analyse) →
– Supported → report → replication → scientific “truth”.
– Not supported → refine / discard → new cycle.


Chapter 2 – Research & Experimental Design

2.1 Experiments & Statistics

• Experiment = systematic, objective observations under controlled conditions.
• When true experiments infeasible ➜ observational or quasi-experimental designs.

2.2 General Types of Research

2.2.1 Basic Research

• Knowledge-driven “What happens if…?” (dichotic listening → channel theory).

2.2.2 Applied Research

• Problem-solving (cell-phone use impairs simulated driving even hands-free; Strayer & Johnston, 2001).

2.2.3 Sub-types

• Qualitative – rich descriptions (interviews, case studies).
• Quantitative – numerical data analysed statistically (focus of course).

2.2.4 Experimental Variables

• Independent Variable (I.V.) – manipulated / grouping factor.
• Dependent Variable (D.V.) – measured outcome.

2.2.5 Unwanted Variables (Random Error)

• Situational (room temp, noise).
• Individual differences (IQ, motivation).
• Measurement error (experimenter lapse).
– Analogy: static on TV obscures signal.

2.2.6 Confounding Variables

• Vary systematically with I.V. ⇒ alternative explanation (drug A am vs drug B pm; fix by holding constant or counterbalancing).

2.2.7 Control Groups / Conditions

• Baseline for comparison; avoid empty controls; use placebo in clinical trials (magnetic blanket example).

2.3 Three Main Research Approaches

2.3.1 True Randomised Experiments

• Mill’s Joint Method: Agreement (If X then Y) + Difference (If not X then not Y).
• 3 requirements: ≥ 2 I.V. levels; random assignment; control confounds.
• Designs:
– Independent Groups (between-subjects). Example: robot-tickle vs self-tickle; N = 32 → 16 per group.
– Repeated Measures (within-subjects) – same Ps all conditions; counterbalance order; reduces individual-difference error; inappropriate when carry-over (learning) permanent.

2.3.2 Correlational Studies

• Observe natural covariance; scatterplot shows direction/strength; cannot infer causality (smoking–doctor-visits example, N = 200).

2.3.3 Quasi-Experiments

• Groups formed by pre-existing traits (light vs heavy smokers, age groups); no random assignment ⇒ limited causal claims.

2.4 Relationships & Causality

• 3 conditions (Shaughnessy & Zechmeister): covariation, time-order, elimination of alternatives.
• Ultrasound–birth-weight case: met covariation but failed time-order & alternative-cause tests.

2.5 Measurement

2.5.1 Scales

• Nominal (labels only: male = 0, female = 1).
• Ordinal (rank order: 1st, 2nd, 3rd in race).
• Interval (equal units, no true zero: °C, °F).
• Ratio (equal units + absolute zero: Kelvin, reaction time in ms).

2.5.2 Quality of Measurement

• Reliability – consistency (rigid ruler vs floppy ruler).
• Validity
– Face validity (looks like it measures construct).
– Predictive validity (OP score predicts uni success).
– Construct validity (IQ correlates with life outcomes in line with theory).


Illustrative Examples & Analogies

• Jogging‐knee anecdote → dangers of over-generalising personal experience.
• Galileo’s feather & boulder in vacuum → skepticism + objectivity.
• Pavlov’s dogs → replication builds confidence/law.
• Kitty Genovese → bystander effect derived from reformulating ethical question.
• Magnetic blankets infomercial → need for proper placebo control.
• Saturday-morning cartoon antenna story → random variability as TV static.


Practical / Ethical / Philosophical Implications

• Tentative nature of science demands willingness to revise policy (e.g., global warming, health guidelines).
• Ethical limits restrict true experimentation (cannot randomise people to smoke). Observational designs fill gap but curb causal claims.
• Independence from authority guards against misinformation from media/political leaders.
• Parsimony cautions against anthropomorphising animal behaviour without necessity.

Lecture 3

Goals of science (clarifying "control").

  • Four goals of science discussed previously: predict, describe, explain, and control.

  • A brief moment of confusion arose over mentioning “explain”; the speaker confirmed that describe and control are key components, and that explanation involves understanding the mechanisms behind observed relationships.

  • When scientists say they are "controlling" a variable in the context of study design, they mean reducing unwanted variability to better reveal the effect of the manipulated variables, not the everyday sense of manipulating the phenomenon to demonstrate full understanding.

  • Analogy: a stop sign controls behavior in everyday life; in experiments, control means limiting sources of variability (noise) so that any observed effect can be attributed more confidently to the manipulations rather than extraneous factors.

Key concepts in design and variability

  • Four sources of variability to manage in quantitative designs:

    • Experimental (systematic) manipulation of the IV and random assignment to conditions.

    • Noise/random variability: unpredictable fluctuations across participants or trials.

    • Individual differences: stable differences between participants that can obscure effects.

    • Confounding variables: variables that covary with the IV and DV and offer alternative explanations for observed effects.

  • Distinction of “control” meanings:

    • In theory-building, control means understanding and manipulating a phenomenon; in design, control means minimizing extraneous variability to reveal causal relationships.

  • Signal-to-noise ratio: noise (extraneous variability) can obscure the true signal (the effect of interest).

  • Situational variables (noise): context features like room lighting, sitting position, or distractions that can affect performance independent of the manipulation.

  • Measurement error: inaccuracies in how outcomes are measured, which add noise and can mask true effects.

  • Example of measurement error history: past psych experiments relied on human observers and were prone to human error; modern computer-based measures reduce this source of error, but the historical methods illustrate why measurement error matters.

Experimental designs: true experiments, quasi-experiments, and correlational designs

  • True experiments (experimental design):

    • Directly manipulate the independent variable (IV).

    • Random assignment of participants to conditions.

    • Two core features: manipulation of IV and random assignment to control for extraneous variability.

  • Quasi-experiments:

    • Also called “almost experiments” because they are not fully randomized or manipulated in a controlled way.

    • Use existing groups (e.g., smokers vs. non-smokers) when random assignment is infeasible or unethical.

    • Strengths: feasible when true experiments aren’t possible; ecological validity can be higher.

    • Limitations: higher risk of confounds and lower causal inference strength; may require careful matching, but cannot guarantee equivalence.

  • Correlational designs:

    • No random assignment; variables are measured as they occur in the world.

    • Relationships are assessed with a statistical coefficient (e.g., Pearson’s r) and scatter plots.

    • Cannot infer causality because direction of effect and third-variable confounds cannot be ruled out.

    • Useful for naturalistic testing and when random assignment is not feasible; high ecological validity but limited causal claims.

  • Key terminology in different designs:

    • Experimental design: independent variable (IV) is manipulated; dependent variable (DV) is measured.

    • Quasi-experimental design: IV-like variable is used to group participants, but groups are not created by random assignment.

    • Correlational design: predictor is the variable hypothesized to predict the other; criterion (or DV) is the outcome measured.

    • In correlational designs, the predictor is typically on the x-axis of scatter plots; the criterion on the y-axis.

    • In correlational work, terms IV and DV may be used loosely, but best practice in some courses uses predictor and criterion to reflect causal direction assumptions.

Terminology: IV, DV, predictor, and criterion

  • Independent Variable (IV): the variable that the experimenter directly manipulates in an experiment.

  • Dependent Variable (DV): the outcome measured, presumed to be influenced by the IV.

  • In quasi-experiments: the IV-like variable is used to categorize participants, not randomly assigned.

  • In correlational designs: there is no manipulation and no fixed IV/DV; the variable hypothesized to influence the other is termed the predictor, and the other variable is the criterion (DV).

  • Note on terminology in class: tutors may refer to the predictor and criterion in correlational designs; using IV/DV in correlational contexts is generally acceptable but can be flagged as imprecise by some instructors.

Noise, measurement error, and confounds: how noise can derail signals

  • Situational variables can introduce noise (e.g., flickering light, room distractions).

  • Individual differences: natural variation between participants that can obscure the effect of the IV; more critical in between-subjects designs.

  • Measurement error: inaccuracies in outcome measurement (e.g., human scoring mistakes in early psych experiments) that obscure true effects.

  • Confounding variables: variables that covary with the IV and DV, offering alternative explanations for observed effects (e.g., lighting confounds, education level differences across generations in longitudinal studies).

  • Example of confounding: if a memory task uses distractors in a room with flickering lights, it’s unclear whether observed effects are due to distraction or lighting.

  • Confounds threaten causal inference; eliminating systematic differences between groups narrows explanations to two primary possibilities:

    • Differences due to chance (random variability).

    • Differences due to the IV (the manipulation).

Hypotheses and statistical testing: null vs alternative, and what we test

  • Null hypothesis (H0, pronounced “H-nought”): there is no relationship between the IV and DV (or no difference between groups).

  • Alternative hypothesis (H1): there is a relationship or a difference; often the direction is hypothesized (e.g., distraction impairs memory).

  • Important point about hypothesis testing:

    • Statistical tests assess the null hypothesis; rejecting H0 suggests the observed results are unlikely due to chance, thus tentatively supporting the alternative.

    • We never directly test the alternative; rejection of H0 leaves us with the alternative as the plausible explanation given the data.

    • The alternative is always tentative because there are infinitely many plausible alternative explanations and we cannot test them all.

  • Practical implications for interpretation:

    • If results are significant, we say the null hypothesis is rejected; if not, we fail to reject the null.

    • The presence of variability in groups does not invalidate the null hypothesis; it reflects real-world variability that must be accounted for in design and analysis.
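The logic of testing H0 against chance can be made concrete with a small simulation. This is a minimal sketch, not course material: the scores and the `permutation_p_value` helper are hypothetical, and a permutation test is used simply because it directly asks "could chance alone produce a difference this large?"

```python
import random

def permutation_p_value(group_a, group_b, n_perms=10_000, seed=0):
    """Estimate how often a group difference at least as large as the
    observed one would arise by chance alone (the logic of testing H0)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    count = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)  # reassign scores to groups at random (H0 world)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            count += 1
    return count / n_perms

control = [14, 15, 13, 16, 15, 14]      # hypothetical memory scores
distraction = [10, 11, 9, 12, 10, 11]   # hypothetical distraction-condition scores
p = permutation_p_value(control, distraction)
```

A small p means differences this large are unlikely under H0 (chance alone), so we reject the null; a large p means we fail to reject it.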

Approaches to testing: quasi-experiments, correlational studies, and true experiments

  • Quasi-experiments revisited:

    • Use existing groups to address questions when random assignment is not feasible or ethical.

    • Examples: longitudinal “age and fluid intelligence” studies where education differences across generations can confound results; later longitudinal designs mitigate these confounds and reveal more gradual declines in fluid intelligence with age.

    • Limitation: more vulnerable to confounds; stronger causal claims require careful design and interpretation.

  • Correlational research revisited:

    • Measures a relationship between two variables without manipulating them.

    • Pros: naturalistic setting; ecologically valid.

    • Cons: cannot infer causality; third variables or reverse causation may explain observed relationships.

    • Example: ultrasounds and birth weight; more ultrasounds may be associated with lower birth weight because high-risk pregnancies prompt more ultrasounds, not because ultrasounds cause low birth weight.

  • True experiments revisited:

    • True experiments are distinguished by random assignment and controlled conditions, enabling stronger causal inferences.

    • In practice, ethical and logistical constraints often necessitate quasi-experimental or correlational designs.

    • When evaluating causal claims, the first question is whether participants were randomly assigned to conditions (random assignment is essential for strong causal claims).

    • Random assignment helps ensure equivalence of groups, reducing potential confounds and distributing individual differences and other random factors evenly.

Randomization, control, and experimental integrity

  • Random assignment vs haphazard group allocation:

    • True random assignment uses a defined random process (e.g., computer-generated random numbers) to assign participants to conditions.

    • Historically, randomization used random-number tables or physical randomization; modern practice largely uses computers with a random seed based on variable inputs (e.g., current time) to ensure unpredictability.

    • Pseudo-randomness is usually sufficient for research purposes; truly random numbers are not strictly necessary for good practice.
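A defined random process of the kind described (computer-generated, pseudo-random, seedable) might be sketched as follows; the `random_assignment` helper and the condition labels are hypothetical illustrations, not part of the course.

```python
import random

def random_assignment(participant_ids, conditions=("control", "treatment"), seed=None):
    """Randomly assign participants to conditions with equal group sizes.

    Uses Python's pseudo-random generator; leaving seed=None seeds it from
    system entropy, giving the unpredictability that is sufficient in practice.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # a defined random process, not haphazard allocation
    group_size = len(ids) // len(conditions)
    return {cond: ids[i * group_size:(i + 1) * group_size]
            for i, cond in enumerate(conditions)}

groups = random_assignment(range(1, 101), seed=42)
print({c: len(m) for c, m in groups.items()})  # → {'control': 50, 'treatment': 50}
```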

  • Why random assignment matters:

    • Avoids systematic differences between groups (e.g., illness prevalence, personality traits) that could confound results.

    • In drug trials, randomization helps mitigate placebo effects and expectation biases by equal distribution of expectations across groups.

  • Break and reminder on terminology:

    • The lecturer postpones a formal definition of “treatment” but uses the term to describe the experimental manipulation that is expected to have an effect (e.g., the distraction condition is the treatment in a memory task).

  • Practical example (randomization in practice):

    • Random allocation with group sizes up to 100 participants can be achieved using computer-generated randomization; pseudo-random seeds are fine due to sufficient unpredictability.

Independent groups (between-subjects) vs repeated measures (within-subjects) designs

  • Independent groups design (between-subjects):

    • Each participant is tested in only one condition.

    • Advantages: simple to implement; reduces carryover and learning effects within a participant; each person contributes one data point.

    • Disadvantages: higher susceptibility to random variability due to individual differences; typically less sensitive to detecting IV effects because of between-subject noise.

    • Example: tickling study where one group self-tickles and another group is tickled by a robot; random assignment balances individual differences across groups.

  • Repeated measures design (within-subjects):

    • Each participant experiences all conditions (e.g., control and treatment) and is measured in each.

    • Advantages: reduces variability due to individual differences since each person serves as their own control; increases statistical power and sensitivity to detect effects.

    • Disadvantages: susceptibility to order effects, fatigue, practice, and carryover effects that can confound results.

    • Counterbalancing as a solution: toggling the order of conditions across participants to distribute order effects evenly (e.g., ABBA design).

  • Key contrasts and implications:

    • Independent groups have more noise due to individual differences; repeated measures minimize that noise but introduce order-related confounds that counterbalancing aims to mitigate.

  • Example with the tickling task:

    • Independent groups: 16 participants per group, two separate tables of scores for self-tickling vs robot tickling.

    • Repeated measures: each participant has both conditions, yielding two scores per participant (one per condition); data are typically shown with lines connecting the two scores for each participant to illustrate within-subject changes.

  • Practical design considerations:

    • Repeated measures designs require careful planning to avoid non-equivalent conditions due to carryover or fatigue; counterbalancing (e.g., ABBA) helps balance practice and fatigue across conditions.

    • Balanced design concept: symmetry in the order of conditions across participants helps ensure that order effects do not favor one condition over another.

    • When data collection is lengthy (e.g., EEG studies), you may prefer repeated measures for sensitivity but must plan for order effects; counterbalancing is often essential.

Carryover effects, order effects, and counterbalancing

  • Order effects: outcomes in later conditions are systematically influenced by having completed earlier conditions (e.g., fatigue, learning, practice effects).

  • Carryover effects: effects of a previous condition persist and influence subsequent conditions, complicating interpretation of the current condition.

  • Counterbalancing strategies:

    • ABBA counterbalancing: half of participants do A then B, the other half do B then A; helps balance both order and carryover effects.

    • Balanced design: arranging conditions so that potential confounds are evenly distributed across order sequences.
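The counterbalancing idea (half the participants complete A then B, the other half B then A) can be sketched as below; `counterbalanced_orders` is a hypothetical helper, not a named course procedure.

```python
def counterbalanced_orders(n_participants, conditions=("A", "B")):
    """Alternate condition order across participants so each ordering
    (A-then-B vs B-then-A) is used equally often, distributing order
    and carryover effects evenly across conditions."""
    forward = list(conditions)
    reverse = list(conditions)[::-1]
    return [forward if i % 2 == 0 else reverse for i in range(n_participants)]

orders = counterbalanced_orders(4)
print(orders)  # → [['A', 'B'], ['B', 'A'], ['A', 'B'], ['B', 'A']]
```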

  • When counterbalancing cannot resolve concerns:

    • Some effects (like long-lasting learning across different teaching methods) may not be fully counterbalanced; in such cases, researchers may need to redesign the experiment or use alternative methods.

  • Data representation in repeated-measures studies:

    • For independent groups: two separate data tables by group with each row representing a participant.

    • For repeated measures: a single data table with a row per participant and columns for each condition; you can visualize with connected lines to show within-participant changes.

Real-world and classroom examples used in the lecture

  • 1999 self-tickling experiment (Blakemore, Wolpert, and Frith): within-subjects vs between-subjects control research on ticklishness using a robot that can move in two ways (predictable vs unpredictable). Design emphasizes intra-subject control and careful manipulation of the predictive element to elicit the hypothesized difference in ticklishness.

  • Spoon creativity task (hypothetical, but used to illustrate order effects): repeat task to measure creativity with different music conditions (pop vs classical); without counterbalancing, order effects would confound the effect of music type on creativity. Counterbalancing (or using two different tasks) is necessary to avoid this confound.

  • Grip-strength device study (hypothetical): random assignment to training device; potential confounder is researcher encouragement; solution is to equalize encouragement across groups or use a factorial design to study interaction effects (two-by-two design).

Data interpretation and visualization: what to look for

  • Independent groups design data illustration:

    • Separate distributions for each group; significant differences indicate potential effects of the IV but may be obscured by between-group variability.

  • Repeated measures data illustration:

    • Each participant contributes data to every condition; plots often show lines connecting a participant’s two scores to visualize within-subject changes.

  • Patterns to watch for:

    • Large within-subject consistency suggests strong treatment effects; large between-subject variability suggests potential noise that counterbalancing and proper randomization must address.

    • Non-overlapping distributions between groups in an independent groups design strengthen confidence in a treatment effect; overlapping distributions indicate higher measurement noise and lower sensitivity.

Reading, preparation, and assessment guidance

  • Reading assignments for the course progression:

    • Grove chapters 1 and 2 (required for this lecture and upcoming assessments).

    • UQ Extend modules: chapters 1 and 2 in prior weeks; chapter 3 is the novel one for this lecture.

    • Aaron textbook (6th or 7th edition acceptable) as additional context.

  • Quiz and exam guidance:

    • Quiz 3 opens in one hour and closes on Monday; covers content from this lecture and Grove readings.

    • Mid-semester exam date announced: Saturday, September 6; more information to come as the date approaches.

  • Practical study tips:

    • Focus on understanding the null hypothesis and the logic of rejecting/failing to reject it.

    • Be comfortable distinguishing between true experiments, quasi-experiments, and correlational designs; know the strengths and limitations of each.

    • Practice identifying potential confounds and proposing counterbalancing or design changes to mitigate them.

    • Review terminology (IV, DV, predictor, criterion, treatment, control) and when each term is most appropriate.

Summary of takeaways

  • The central aim of research design is to maximize the ability to detect true effects by minimizing confounding factors and random noise.

  • Experimental control is about reducing variability from situational factors, measurement error, and individual differences to isolate the effect of the IV on the DV.

  • There are three main quantitative designs with distinct trade-offs: true experiments (random assignment; strong causal inference), quasi-experiments (existing groups; ethically feasible but weaker causal claims), and correlational designs (no manipulation; high ecological validity but cannot establish causality).

  • Independent groups and repeated measures designs each have advantages and pitfalls; counterbalancing is essential in repeated measures to manage order and carryover effects.

  • Random assignment is critical in true experiments to ensure equivalence of groups and to minimize selection bias; modern practice uses computer-generated randomization with a random seed.

  • Real-world examples (tickling study, memory distraction, educational gen/longitudinal design) illustrate how these principles are applied, the kinds of confounds that can arise, and the strategies used to address them.

Lecture 4


Acknowledgement and Course Context

  • Welcome note and course focus: measurement, frequency distributions, and percentiles; gentle introduction to numbers.

  • Mid-semester exam scope: weeks 1–4; exam scheduled for Saturday, September 6 (announced on Blackboard).

  • Course trajectory: earlier weeks covered scientific process, study design, and questions in psychology; this week moves to data after collecting numbers.

  • Practical relevance: data cleaning, exploration, and plotting are essential across assignments, in honors year, and in the research process.

Measurement, Constructs, and the Philosophy of Measurement

  • Measurement goal: assign numbers to objects/observations according to consistent rules (operational definitions).

  • Constructs in psychology: psychological phenomena like anxiety, memory that are not directly observable but are labeled and studied.

  • Operational definition: boundaries/criteria to determine whether a phenomenon (construct) is present in a measured instance (e.g., infant imitation). Researchers may disagree; scientific discourse can refine definitions over time.

  • Observable phenomena and empiricism: measurement relies on observable, checkable, verifiable evidence shared openly for replication.

  • Scientific disagreement as progress: debate over definitions/methods pushes for better processes.

Variables: Types, Qualities, and Scales

  • Variable: a characteristic of interest for each individual in a population/sample (e.g., memory capacity, anxiety).

  • Qualitative vs. quantitative variables:

    • Qualitative: categories/labels (e.g., gender, eye color, political affiliation); not meaningful to compute averages.

    • Quantitative: numeric measures (e.g., height, weight, income); meaningful to apply statistics.

  • Coding and measurement rules:

    • Numbers can be used as labels (e.g., 0/1 coding for deceased/alive) but not all label-numbers support arithmetic operations.

  • Types of variables (overview, basic):

    • Discrete: whole-number values (no meaningful halves). Example: number of cars observed in a period.

    • Dichotomous: two possible values within discrete (e.g., yes/no; male/female; correct/incorrect).

    • Continuous: any value within a range (e.g., height, volume).

  • Measurement scales (order of sophistication):

    • Nominal: labels without meaningful order; e.g., color categories, political parties, jersey numbers.

    • Ordinal: ordered categories where order matters but intervals are not necessarily equal.

    • Interval: ordered with meaningful differences between values, but no meaningful zero. Example discussed: IQ differences; temperature scales like Celsius.

    • Ratio: interval properties plus a meaningful zero, allowing ratios (e.g., height, weight, Kelvin temperature).

  • Examples and nuances:

    • IQ: ordinal → interval when actual scores provided; distance between scores meaningful.

    • Temperature: Celsius is interval (differences meaningful) but lacks a true zero; Kelvin is ratio (has meaningful zero).

    • Age: often treated as ratio (meaningful zero) in many contexts; sometimes discussed as interval in teaching contexts.

  • Implications of scale choice for analysis: the chosen scale constrains which statistics and claims are valid.

  • Practical examples in measurement:

    • Eye color as nominal; cannot average eye color.

    • Height as ratio; allows means, proportions, comparisons like “twice as tall.”

  • How to report numbers: use of consistent labels and units; interpretability depends on scale properties.

Reliability and Validity of Measures

  • Reliability: consistency of a measure across time or raters.

    • Test-retest reliability: administering the same test twice should yield similar scores if the underlying trait is unchanged.

    • In practice, perfect identical scores are unrealistic due to day-to-day variation (e.g., sleep, mood).

    • Reliability is quantified via correlation between scores across occasions:

    • If scores on Test 1 and Test 2 are highly correlated, reliability is high.

    • Inter-rater reliability: when multiple raters judge the same thing (e.g., video ratings), their scores should be correlated.

    • Typical adequacy: correlations around 0.60 or higher are considered acceptable for reliability in many contexts.

    • Example: alpha waves as a biological fingerprint show very high test-retest reliability over months (almost identical scores).
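Quantifying test-retest reliability as a correlation between two testing occasions can be sketched with a hand-rolled Pearson r; the score lists below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two score lists (e.g., Test 1 vs Test 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

test1 = [12, 15, 9, 20, 14]   # hypothetical Test 1 scores
test2 = [13, 14, 10, 19, 15]  # hypothetical Test 2 (retest) scores
r = pearson_r(test1, test2)
print(round(r, 2))  # → 0.98, well above the ~0.60 adequacy benchmark
```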

  • Validity: whether a measure actually assesses the intended construct.

    • Internal validity: the extent to which observed effects are due to the manipulated variables, not confounds.

    • External validity: generalizability of results beyond the lab to real-world settings (issues with WEIRD samples: Western, Educated, Industrialized, Rich, Democratic).

    • Construct validity: whether the measure truly taps the theoretical construct (e.g., Beck Depression Inventory potentially overlapping with anxiety items; concerns about how well items map to depression construct).

    • Content/face validity: whether the measure appears to assess the intended construct on the surface (e.g., mental math tests appearing to measure math ability; head circumference appearing to measure head size, not intelligence).

    • Predictive validity: the extent to which scores on a measure predict related outcomes (e.g., ATAR predicting university performance).

  • Other validity considerations:

    • Construct validity and evolving measures: early measures may drift as constructs are better understood; poor initial alignment may be revised.

    • Content/face validity distinctions: a measure can be reliable but have low face validity if it doesn’t intuitively fit the construct.

  • Reliability vs validity relationship: a measure can be reliable but not valid; it must measure what it intends to measure to be useful.

Pilot Testing, Range Effects, and Study Design Considerations

  • Pilot testing: iterative testing of experimental design and stimuli to ensure the measurement range is appropriate.

    • Goals: avoid floor effects (too hard) and ceiling effects (too easy); ensure middle-range performance to observe differences.

    • Real-world example: quick demonstration with speed of stimulus presentation; initial results suggested adjustments to avoid near-zero performance.

  • Range effects and measurement quality:

    • Ceiling effect: all participants perform near the top; little room for differentiation.

    • Floor effect: all participants perform near the bottom; little room for differentiation.

    • Ideal measures sit in a middle range to maximize sensitivity to group differences.

  • Pilot testing as a standard in research: many published studies include extensive pilot work not visible in the final paper.

  • Study design considerations discussed earlier in the course:

    • Types of studies: experimental, randomized controlled trials; observational, quasi-experimental, correlational.

    • Randomization and control groups as tools to manage confounds.

    • Independent groups design vs. repeated measures design; counterbalancing as a method to balance potential confounds.

  • Construct-focused design notes: importance of naming and constructing meaningful constructs before measurement.

Data Presentation, Exploration, and Cleaning

  • Purpose of data presentation: to tell a clear story about results using figures and tables rather than lengthy narrative only.

  • Data are often messy: human data can include errors, non-sensical responses, and noise; cleaning is essential before analysis.

  • Data cleaning and exploration steps:

    • Inspect raw data to identify values outside plausible ranges (e.g., 0–10 scales with a value of 20).

    • Look for transcription or entry errors (e.g., too-high values in a given scale).

    • Clean data and summarize before performing analyses.
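The range-check step above can be sketched minimally; the `flag_out_of_range` helper and the raw values are hypothetical.

```python
def flag_out_of_range(values, lo=0, hi=10):
    """Split responses on a bounded scale into plausible values and
    likely entry errors (e.g., a 20 on a 0-10 scale) for inspection."""
    clean = [v for v in values if lo <= v <= hi]
    flagged = [v for v in values if not (lo <= v <= hi)]
    return clean, flagged

raw = [3, 7, 20, 5, -1, 8]  # hypothetical raw responses on a 0-10 scale
clean, flagged = flag_out_of_range(raw)
print(clean, flagged)  # → [3, 7, 5, 8] [20, -1]
```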

  • Data organization example: raw data matrix (100 students × 10 true/false questions) vs. summarized representations.

  • Summary representations help reveal patterns quickly:

    • Frequency tables: list all possible scores and the count of observations per score.

    • Frequency of 0–10 scores example: helps identify most common scores and check data integrity.

  • Frequency tables vs. variability in data:

    • With many possible scores, frequency tables become unwieldy; interval-based bins improve readability.

    • Rule of thumb: 10–20 intervals (bins) balance granularity and interpretability; 15 bins often cited as a good middle ground.

  • Interpreting frequency data:

    • Relative frequency: proportion of observations in each bin: relative frequency = f / N.

    • Cumulative frequency (CF): total observations with scores at or below a given bin.

    • Percentiles: boundaries where a given percentage of scores fall below that value.
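These three quantities (frequency, relative frequency f/N, cumulative frequency) can be computed together; the `frequency_table` helper and the scores are hypothetical.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (score, f, relative frequency f/N, cumulative frequency)."""
    n = len(scores)
    counts = Counter(scores)
    cf = 0
    rows = []
    for score in sorted(counts):
        f = counts[score]
        cf += f  # running total of observations at or below this score
        rows.append((score, f, f / n, cf))
    return rows

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical 0-10 scale scores
for score, f, rel, cf in frequency_table(data):
    print(score, f, rel, cf)
```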

  • Practical examples: weights of 72 male students; intervals like 60–64, 65–69, etc.; note about inclusive/exclusive bin definitions to avoid overlaps.

  • Why include empty/zero-edge bins: to enable certain plots (e.g., frequency polygons) that require zero values at the ends.

  • Frequency polygons and alternative plots:

    • Frequency polygon visually connects bin midpoints to show distribution shape.

    • Bar graphs for nominal data; histograms for continuous data with touching bars to show continuity.

    • Box-and-whisker plots provide information about median, interquartile range, and extremes.

  • Bar graphs vs. histograms:

    • Bar graphs: for qualitative (nominal) data; bars not touching; order is flexible to aid readability.

    • Histograms: for continuous data; bars touch to indicate continuity between bins; bin intervals matter.

  • Frequency polygons for multiple groups:

    • Example with male actual weight vs. male ideal weight; female weights and ideal weights plotted to compare distributions.

  • Telling a story with graphs:

    • Well-chosen figures reveal patterns and differences (e.g., male vs. female weight patterns and ideal vs. actual weights).

    • Graphs should be designed to convey a clear message, guiding interpretation.

  • Summary points for data presentation:

    • Sift, clean, and present data so a reader can understand at a glance.

    • Good figures prepare the data for inferential tests (e.g., verifying assumptions, handling missing data, removing outliers).

    • Choose graph types that best fit the data type (qualitative vs. quantitative) and the story you want to tell.

    • Use appropriate intervals (bins) when constructing histograms/frequency polygons.

Percentiles, Cumulative Frequencies, and Practical Calculations

  • Percentile concept: the value below which a specified percentage of scores fall.

    • 90th percentile: 90% of scores are below this value.

    • Percentiles are computed by ranking scores and locating the boundary that separates the specified percentage of data.

  • Relative frequency vs. cumulative frequency:

    • Relative frequency: proportion of the total represented by a score/bin: relative frequency = f / N.

    • Cumulative frequency (CF): sum of frequencies for all scores up to a given point.

  • Percentile calculation method:

    • Percentile rank = (CF / N) × 100.

    • To find the percentile of a given score, determine CF up to that score and divide by N, then multiply by 100.

    • Example walkthrough: with a table of scores and frequencies, CF is calculated by summing frequencies up to the target score; percentile = (CF / N) × 100.

  • Inverting percentile calculations:

    • To find the score corresponding to a given percentile p, compute CF = (p/100) × N, then locate the score bin whose cumulative frequency reaches CF.

  • Worked example (class scores):

    • Suppose a small table with scores and frequencies; N = 20; to find the 35th percentile: compute CF = 0.35 × 20 = 7; find the score with CF of 7 (e.g., a score of 23). Therefore, 35th percentile corresponds to score 23.

    • Interpretation: a student scoring 23 did better than 35% of the class.
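The percentile-rank formula and its inversion can be expressed directly; the two helpers are hypothetical, and the frequency table mirrors the worked example (N = 20, 35th percentile → score 23).

```python
def percentile_rank(score, table):
    """Percentile rank of a score: (CF at that score / N) * 100.
    `table` is a list of (score, frequency) pairs."""
    n = sum(f for _, f in table)
    cf = sum(f for s, f in table if s <= score)
    return 100 * cf / n

def score_at_percentile(p, table):
    """Invert the calculation: find the lowest score whose cumulative
    frequency reaches CF = (p / 100) * N."""
    n = sum(f for _, f in table)
    target_cf = p / 100 * n
    cf = 0
    for s, f in sorted(table):
        cf += f
        if cf >= target_cf:
            return s

# Hypothetical class table echoing the worked example: N = 20
table = [(21, 3), (22, 2), (23, 2), (24, 5), (25, 8)]
print(score_at_percentile(35, table))  # CF target = 0.35 * 20 = 7 → 23
print(percentile_rank(23, table))      # → 35.0
```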

  • More advanced example: hours of TV watched by 259 students (data from a lecture):

    • Distribution across categories (0–1, 2–3, 4–5, etc.) with cumulative frequencies calculated up to seven hours.

    • To find the percentile for seven hours, compute CF up to 7 hours and divide by 259, then multiply by 100; here, around the 63rd percentile.

  • Frequency polygon interpretation example:

    • Shade region left of a percentile boundary to visualize the proportion of data below that boundary (e.g., 63% area under the curve to the left of 7 hours).

  • Modern data practices: most computing of percentiles and other statistics is done with software, but understanding the underlying calculations is essential for intuition and debugging.

  • Summary of percentiles in reporting:

    • Percentiles provide meaningful benchmarks (e.g., “above 80% of the class”).

    • Use cum freq and N carefully to avoid misinterpretation; ensure you read the correct cell when using rearranged equations.

Graph Types and Data Storytelling: Choosing the Right Display

  • Bar graphs (qualitative data):

    • Display counts per category; y-axis scale should reflect observed counts; bars should not touch to emphasize categorical separation.

    • Ordering the categories can help readers see patterns; the order is not inherently meaningful for nominal data but can aid interpretation.

  • Histograms (quantitative data):

    • Bars touch to indicate a continuum between bins; choose bin width carefully to reveal distribution shape without over-smoothing or over-fragmentation.

  • Box-and-whisker plots: quick view of distribution shape, median, interquartile range, and extremes.

  • Frequency polygons: smooth representation of distributions by connecting bin midpoints; useful for comparing distributions (e.g., groups vs. groups).

  • Practical storytelling with plots:

    • Use a graph to illustrate differences (e.g., male actual vs. ideal weight distributions) and to compare groups (e.g., male vs. female weight patterns).

    • Good plots support the narrative of your results and help convey your claims effectively.

Practical Advice for Exam Preparation and Next Steps

  • Data workflow in research:

    • Design measure with appropriate scale; pilot test to refine range and avoid floor/ceiling effects.

    • Collect data, then clean and explore before formal analysis.

    • Create figures that tell a story; choose graphs that fit the data type and the message.

    • Prepare for inferential statistics by ensuring data meet assumptions (normality, etc.).

  • Mathematical basics to brush up for next week:

    • Σ notation: ∑X means “add up all the X values”; e.g., the mean is X̄ = ∑X / N.
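As a bridge to the Σ-notation refresher from Lecture 1, the mean formula X̄ = ∑X / N translates line for line (the values are hypothetical):

```python
scores = [4, 7, 5, 8, 6]    # hypothetical X values
total = sum(scores)         # ΣX: "add all X's"
mean = total / len(scores)  # X̄ = ΣX / N
print(total, mean)          # → 30 6.0
```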

UQ Extend

Science as a way of knowing

  • Psychology is a diverse discipline — 53 APA divisions

  • The unifying quality across all of psychology’s subdisciplines is that all psychologists try to understand behaviour using the methods of science

  • Epistemology — branch of philosophy that is concerned with the nature and scope of knowledge

  • Depending on how knowledge is acquired, it may reflect real understanding about the world or it could embody misinformation

Acquiring Knowledge

  • Personal experience — people experience things and all these experiences contribute to our knowledge of the world. This is problematic however as it is only evaluated by you and is open to influence from your own biases.

  • Authority — people appeal to authority in order to live their lives. Typically, one can verify whether these people are truly authorities; however, an authority may not actually be an expert in the relevant discipline

Scientific Method

How knowledge is gained

  • Logic — reason through problems to generate new knowledge, such as solving a maths question by applying a formula

  • Empiricism — gain knowledge through careful and objective observation (seeing, hearing, touching, etc)

  • Rationalism — formulation of hypotheses and theories

Main features

  • Systematic observation

  • Critical analysis of data, hypotheses, and theories

  • Tentative acceptance of hypotheses and theories

  • Openness and independence from authority

Theory, experiments and statistics

Goals for scientific research

  1. Describe a behaviour — to understand any type of behaviour, describe it in detail and specify the conditions under which it occurs, at whatever level of analysis is relevant

  2. Explain behaviour — why does the behaviour occur? How do environmental factors affect the behaviour? How does the presence of other people affect it?

  3. Predict behaviour — we want to know when the behaviour will occur: what are the specific cognitive, emotional, social, or environmental conditions?

  4. Control behaviour — if we can control the behaviour, we can identify and manipulate the critical factors that promote or discourage it

Theory

  • Ideas about how nature works; in psychology, theories explain why behaviour occurs the way it does

  • A fully formed theory fulfills all four goals

  • A psychological theory is a precise statement of how specific events in the world affect behaviour

    • it summarises existing knowledge on a topic

    • it outlines the relationships between the different factors involved

    • explains the phenomenon of interest; most importantly, a theory generates specific predictions about the outcomes of situations

Hypothesis

  • More focused and more tentative than a theory

  • Takes a theoretical claim and applies it to a specific setting; hypotheses are more focused than the theory, and support for the theory grows as more of these specific instances are confirmed in repeated studies

  • In a typical formal experiment, two hypotheses are proposed

    • null hypothesis — denoted H0; the other is called the alternative hypothesis, denoted H1. The null is a statement that there is no difference between the groups we are comparing, or that there is no systematic change in one variable that is tied to another variable

      • example - relationship between smoking and health is that there is no relationship between the amount people smoke and their health

    • alternative hypothesis — a statement that there is a difference between the groups we are comparing or that there is a systematic change in one variable that is tied to another variable

      • example - for smoking and health, that there is a relationship, probably that the more people smoke, the less healthy they will be

      • a drawback of the alternative hypothesis is that it cannot make an exact prediction

Statistics

  • Formal mathematical procedures that allow us to decide which of the two hypotheses to favour

  • Allow us to rule out chance as a possible reason for the pattern of results

  • A test of whether or not chance can explain the observed differences between the groups.
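The idea of ruling out chance can be made concrete with a permutation test: if the group labels made no real difference, shuffling them should often produce a difference as large as the observed one. A minimal sketch with invented scores (not data from the notes):

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate how often a chance relabelling of the scores produces a
    mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(group_a)])
                   - statistics.mean(pooled[len(group_a):]))
        if diff >= observed:
            count += 1
    return count / n_shuffles

# Invented scores for an experimental and a control group
treatment = [8, 9, 7, 8, 9, 8]
control = [5, 6, 5, 7, 6, 5]
p = permutation_p_value(treatment, control)
```

Here `p` estimates the proportion of chance relabellings that match or beat the observed difference; a small value makes "it was just chance" an implausible explanation and favours the alternative hypothesis.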

Principles of Science

  • The best method we have for generating knowledge about the universe, nature, and human behaviour is the scientific method.

  • The scientific method generates knowledge based on evidence. Faith-based knowledge and morality-based knowledge are examples of knowledge that are not generated through the scientific method.

  • Scientific claims, hypotheses, and theories are all based on evidence.

1. Objectivity

  • Evidence, when offered to support a claim or hypothesis, must be observable by any person.

  • Offering your personal thoughts or feelings as evidence is not acceptable because you are the only person who can observe them.

  • Therefore, in order to provide scientific evidence you have to be creative and think of ways to make your observations objective.

    • For example, a recent CNN article reported that some people’s phobia of flying is made worse when air disasters are reported in the media. To support that claim scientifically, we need to provide objective evidence that phobics are more anxious (which is a mental state not readily observable) than non-phobics when flying after seeing a news report of an air accident. We could measure heart rate, sweaty palms, or breathing rate, all of which are objective and tend to vary with levels of anxiety. These forms of evidence are more credible than simply asking a person to report how they feel because physiological responses are observable and measurable by anyone.

2. Skepticism

  • Science’s principle of skepticism requires that claims must be backed up with evidence and that this evidence must be carefully and critically evaluated.

  • When considering a claim that someone has made, your reflex skeptical response should be, “show me the evidence” and/or “let me see” and/or “let’s take a look”.

    • For example, the claim that heavier objects fall to the ground faster than lighter objects may sound intuitively correct without much reflection. However, we don’t know whether that claim is correct until we scrutinise it skeptically. We can thank Galileo for conducting his classic experiments to disprove the idea: his skeptical approach forced him to test the claim rather than accept it without evidence. Galileo’s main finding was that two objects of different mass, dropped simultaneously from the same height in a vacuum, reach the ground at the same time — imagine a feather and a boulder doing exactly that. The key is that they are dropped in a vacuum, which removes the effect of air resistance; this is difficult to visualise since we rarely observe objects moving in a vacuum. It is another instance of individual intuition proving an unreliable source of knowledge.

3. Openness/open-mindedness

  • When reporting their observations, scientists are required to describe the conditions under which these observations were made.

  • This includes exactly how the measurements were taken, who the participants were, and any other details relevant to the methods of acquiring the evidence.

  • It is imperative that another investigator reading the description can adequately reproduce the conditions of your observations so they can see for themselves.

  • The standard to which you should strive is to be able to report your observations so objectively that even your enemy would have to agree with you (Agnew & Pyke, 2007).

  • When different observers agree, their observations are said to be reliable.

  • When investigators ask about inter-rater or inter-observer reliability, they are asking whether different observers agree about the same observation.

4. Tentativeness

  • Scientists are never 100% certain of any finding, because they know that new evidence may come along that will force them to revise their conclusions or discard them altogether.

  • This is a difficult characteristic of science for the general public and new scientists to accept.

  • Why shouldn’t a well-executed study give the definitive answer to a question? Well, any study is only as good as the available theories, technology, and evidence.

  • In general, scientists accept that research findings are rarely 100% clear-cut and that ambiguity comes with the territory. Patience is required as the process of science weeds out erroneous conclusions and reveals the correct ones.

5. Independence from authority

  • The phrase, “because I said so” does not constitute scientific evidence. Solid, carefully collected evidence is the only authority in the scientific method.

  • Therefore, claims made from a source, no matter how reputable, must be supported by evidence.

  • Even then we interpret that evidence skeptically, evaluating how strongly it supports a claim, whether there were any errors made when collecting the evidence, and so on.

Scientific Process

Media vs. Science

  • Contentious topics in interviews often feature “gotcha” moments → entertaining but unproductive for truth-seeking.

  • Scientists use measured language (“balance of evidence supports…”, “unable to replicate…”) → reflects reality of knowledge building.

Nature of Scientific Process

  • Slow, methodical, involves blind alleys & dead ends.

  • Represented as a flowchart: idea → hypothesis → prediction → study → data collection → analysis → conclusion → replication.

Steps in the Process:

  1. Idea/Theory: Explanation of how something works (e.g., spaced learning is more effective than cramming).

  2. Hypothesis: Testable chunk of the theory.

  3. Prediction: Specific, measurable outcome (e.g., spaced practice improves learning more than cramming).

  4. Study Design:

    • Critical for reliable results.

    • Must follow principles of objectivity and empiricism.

  5. Data Collection:

    • Questionnaires, interviews, online responses, etc.

    • Raw data initially disorganized.

  6. Data Organization & Description:

    • Summarize and prepare data for analysis.

  7. Inferential Statistics:

    • Generalize from sample to real-world population.

    • Decide if hypothesis is supported.

  8. Re-evaluation:

    • If unsupported → modify, retest, or discard hypothesis.

    • If supported → publish results.

  9. Replication:

    • Exact or conceptual replications.

    • Builds confidence if findings hold.

    • Failure to replicate reduces confidence → refine or discard hypothesis.

Outcome:

  • Supported hypotheses may evolve into accepted theories.

  • Process is iterative and self-correcting.


Ethics in Psychological Research

Definition:

  • Ethics = guidelines/principles for moral & just treatment of others.

  • In research: focus on how researchers treat participants, run studies, and conduct themselves.

  • Based on Universal Declaration of Ethical Principles for Psychology.

Four Guiding Ethical Principles

Respect for the Dignity of Persons and Peoples

  • Value, acknowledge, and treat all people equally regardless of origin, beliefs, or identity.

  • Special care for vulnerable groups (e.g., children, minorities).

  • Ensure equal opportunity to be seen, heard, acknowledged.

  • Protect anonymity and confidentiality.

  • Example: Evolving gender data collection → beyond male/female binary to non-binary & open responses to show respect and inclusion.

Competent Caring for the Well-Being of Persons and Peoples

  • Aim for research findings to enhance well-being.

  • Conduct research to benefit participants or at least cause no harm.

  • Plan for and mitigate possible harm.

  • “Competent” caring → researchers must have proper training for tools/tests used.

  • Case Study – Tuskegee Syphilis Study (1932–1972):

    • 600 African-American men (399 with syphilis) misled, denied treatment (penicillin available from 1947).

    • No informed consent → participants not told study details.

    • Ethics committees now prevent such abuse (require informed consent, minimal/no harm).

Integrity

  • Conduct research with objectivity and honesty, free from self-interest or outside influence.

  • Avoid exploitation and bias in reporting.

  • Example – Grossarth-Maticek Research:

    • Linked personality types to cancer/heart disease.

    • Allegations of data falsification (e.g., reclassifying participants, duplicating data).

    • Funded by tobacco companies → possible conflict of interest.

    • Findings not replicated → likely due to falsified data.

Responsibility to Society

  • Psychology should contribute to understanding the human condition and improving well-being.

  • Researchers must:

    • Understand and follow ethical conduct.

    • Reflect on and update research practices to stay ethical.

Stanford Prison Experiment (1971)

  • Conducted at Stanford University by Philip Zimbardo.

  • Setup: Mock prison in psychology building; participants randomly assigned as prisoners or guards.

  • Payment: $15/day.

  • Role of Zimbardo: Prison superintendent.

  • Informed Consent Issues:

    • Participants given vague info, not told specifics (e.g., surprise home arrest, strip search).

    • Guards encouraged to be aggressive (no physical harm) to instill fear.

  • Ethical Concerns:

    • Prisoners who wanted to leave were told they could not.

    • Planned 2 weeks → ended after 6 days when an outsider raised concerns.

    • Zimbardo admitted losing objectivity due to his role.

    • Formal debriefing not until years later.

Variables in Research

Key Variables

  1. Independent Variable (IV):

    • Manipulated by the experimenter (e.g., age groups, drug type).

    • Sometimes cannot be directly manipulated (e.g., age).

  2. Dependent Variable (DV):

    • Measured outcome; depends on IV.

    • Example: Studying mental ability vs. age → IV = age, DV = IQ score.

Unwanted (Extraneous) Variables

  • Variables that contaminate results and obscure the relationship between IV and DV.

  1. Situational Variables:

    • Environmental factors (temperature, noise, lighting, time of day).

    • Can affect all participants differently and unpredictably.

  2. Individual Differences:

    • Natural variations between people (height, weight, motivation, anxiety).

    • Combine with situational variables to increase variability.

  3. Measurement Error:

    • Inconsistencies in recording data (e.g., misreading ruler, stopwatch error).

    • Linked to experimenter’s attention, training, or bias.

  • Effect:

    • Random variability can weaken or completely hide real relationships.

    • Example: Teaching method study → with unwanted variables removed, clear difference; with them present, results less consistent.

Confounding Variables

  • Definition: Variables that vary systematically with IV, providing an alternative explanation for results → prevents establishing causation.

  • Example:

    • Testing two drugs on rats: all Drug A rats tested in the morning, all Drug B rats in the afternoon.

    • Time of day becomes a confounding variable.

Controlling Confounding Variables:

  1. Keep constant: Test all groups under same conditions (e.g., all in morning).

  2. Counterbalance: Spread variations evenly (e.g., half of each group in morning, half in afternoon).

True Experimental Designs — Key Features

  1. At least two levels of the Independent Variable (IV)

    • One level can be absence of treatment (control group/placebo).

    • Other = presence of treatment (experimental group).

  2. Random Assignment

    • Equal chance of being in any group (coin flip, random number table, etc.).

    • Purpose: Distribute extraneous factors (motivation, ability, age, health) evenly so they don’t vary systematically with the IV.

  3. Control for Confounding Variables

    • Prevent alternative explanations for observed differences between conditions.
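Random assignment is straightforward to sketch in code. A minimal illustration with 32 hypothetical participant IDs — the point is that chance alone, not any participant trait, determines group membership:

```python
import random

def randomly_assign(participants, n_groups=2, seed=42):
    """Shuffle the participant list, then deal participants into
    groups like cards, so each person has an equal chance of
    landing in any group."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

ids = list(range(1, 33))                    # 32 hypothetical participant IDs
experimental, control = randomly_assign(ids)
# Two groups of 16; extraneous factors are spread by chance, not design
```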

Independent Groups Design (Between-Subjects)

  • Structure:

    • Two or more groups, each experiencing a different level of the IV.

    • Participants randomly assigned to one group.

    • Experimental group receives IV; control group does not.

  • Example — Tickling Experiment:

    • IV: Who does the tickling (robot vs. self).

    • DV: Ticklishness rating (1–10).

    • 32 participants → random assignment into 2 groups of 16.

    • Robot group tickled by robot; self group tickled themselves using robot arm.

    • Results: Robot tickle group generally rated higher ticklishness, but some overlap.

  • Drawbacks:

    • High variability from individual differences (e.g., natural differences in ticklishness).

    • Requires more participants than repeated measures.

Repeated Measures Design (Within-Subjects)

  • Structure:

    • Same participants tested in all conditions of the IV.

    • Fewer participants needed (e.g., 16 instead of 32 in tickling example).

    • Reduces random variability due to individual differences.

  • Order Effects:

    • Experiencing one condition may influence responses in the next.

    • Controlled via counterbalancing:

      • Half participants → Condition A then B.

      • Half participants → Condition B then A.

  • Example — Tickling Experiment:

    • All 16 participants experienced both robot and self tickling.

    • Order counterbalanced to control for order as a confound.

    • Results showed less spread in data → reduced variability from individual differences.
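Counterbalancing as described above can be sketched as a simple assignment routine (participant IDs and the 16-person count are hypothetical):

```python
import random

def counterbalance(participants, seed=1):
    """Randomly split participants so half experience order A→B and
    half B→A, spreading order effects evenly across conditions."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A_then_B": shuffled[:half],
            "B_then_A": shuffled[half:]}

orders = counterbalance(list(range(1, 17)))  # 16 hypothetical participants
# Each order group gets 8 people; order is no longer confounded with condition
```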

When Repeated Measures is NOT Suitable

  • If one condition permanently changes participant responses (e.g., learning effects).

  • Example: Comparing teaching methods for statistics → learning from first method influences performance in second method.

Summary Table

Feature

Independent Groups (Between)

Repeated Measures (Within)

Participants per Condition

Different people in each group

Same people in all conditions

Randomization Purpose

Equalize groups

Control order effects

Main Advantage

No carryover/order effects

Reduces variability from individual differences; fewer participants

Main Disadvantage

More participants needed; variability from individual differences

Risk of order/carryover effects

Key Control Method

Random assignment

Counterbalancing

Observational Designs

Two types covered:

  1. Correlational Design

  2. Quasi-Experimental Design

Why not use true randomized experiments?

  • Sometimes impractical or unethical (e.g., cannot assign people to start smoking).

Example: Smoking & Health

Hypothesis: Smoking is bad for health.

Operational definition of health: Number of doctor visits per year.

Prediction: More smoking → more doctor visits.

Correlational Study

  • Method:

    • Observe people who already smoke/don’t smoke.

    • Measure:

      • Cigarettes smoked/day (IV)

      • Doctor visits/year (DV)

    • Example: Ask 200 people about both variables.

    • Create scatter plot: each point = one person’s data.

  • Observation: Positive relationship — heavier smokers see doctors more often.

  • Key point: No variables manipulated → just observation.

  • Limitation: Cannot conclude causation.
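The positive relationship in a scatter plot can be quantified with a Pearson correlation coefficient. The notes do not give the 200 people's data, so the numbers below are invented for illustration:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation: r near +1 means the two variables rise
    together; r describes association, never causation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented survey responses: cigarettes/day vs doctor visits/year
cigs   = [0, 5, 10, 15, 20, 25, 30]
visits = [1, 2, 2, 4, 5, 6, 8]
r = pearson_r(cigs, visits)   # strongly positive for these invented data
```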

Quasi-Experimental Design

  • Similar to true experiment, but no random assignment.

  • Method:

    • Form groups based on pre-existing characteristics (e.g., smoking habits).

    • Example:

      • Light smokers: 0–10 cigarettes/day

      • Heavy smokers: 20–30 cigarettes/day

      • DV = doctor visits/year

    • Plot results: heavy smokers have more doctor visits.

  • Key difference from true experiment: Grouping based on existing traits, not random assignment.

  • Other uses:

    • Age (e.g., young vs. older adults)

    • Health conditions (e.g., high blood pressure vs. normal)

    • Income levels (e.g., wealthy vs. middle class)

Causation?

  • From correlational or quasi-experimental designs → No causal conclusion possible.

  • Unknown third variables could explain results (e.g., drinking, diet).

Conditions for Causal Inference

(Only true randomized experiments can fully meet all three)

  1. Relationship: Regular & reliable changes in one variable associated with changes in the other.

  2. Time Order: Cause precedes effect.

  3. No Other Explanations: Alternative causes ruled out (via randomization).

Summary

  • Observational Designs = Correlational studies + Quasi-experiments.

  • Correlational study: Measures 2+ variables in same group, examines relationship.

  • Quasi-experiment: Like true experiment but without random assignment.

  • Limitation: Lack of randomization → cannot infer causation.

  • Only true experiments (with randomization) allow causal claims.


Levels of Measurement

  • Four levels of measurement in research: nominal, ordinal, interval, and ratio. These levels provide increasing precision and determine which statistical analyses are appropriate.

  • The way you measure a variable constrains the kinds of analyses you can perform and test hypotheses effectively.

Qualitative vs Quantitative; Discrete vs Continuous; Dichotomous

  • Qualitative measurement captures attributes that don’t have meaningful numerical values; numbers used are labels, not magnitudes (e.g., eye colour, political leanings, countries visited).

  • Quantitative measurement records numeric values where the numbers have meaning (e.g., height in cm).

  • Qualitative variables use numbers as labels only; they do not imply magnitude (e.g., 1 = brown, 2 = blue, etc.).

  • Discrete variables can take only whole values (no intermediate values); e.g., number of chess pieces lost.

  • Dichotomous variables are a subset of discrete variables with exactly two possible values (e.g., coin toss: heads or tails).

  • Continuous variables can take an infinite number of values between any two points (e.g., millilitres of milk, grams of flour).

The four levels of measurement with the swimming race example

  • Nominal (qualitative): record whether each swimmer finished the race (Yes/No). This is a dichotomous nominal variable.

  • Ordinal: record placings (1st, 2nd, 3rd, …). There is order, but the intervals between places are not known.

  • Interval: quantify differences between scale points with meaningful intervals but no true zero (e.g., differences in seconds compared to a club record, where a negative value is possible). Distinguishing feature: equal intervals, but no absolute zero.

  • Ratio: has a meaningful absolute zero, allowing meaningful ratios between values (e.g., actual swim times in seconds). Zero means absence of the quantity.

What counts as a measurement in practice

  • Measurement can be self-reported, behavioural, or physiological.

  • Self-report: what people say about themselves (surveys, questionnaires, interviews).

  • Behavioural: what participants actually do (e.g., counts of aggressive acts, reaction time).

  • Physiological: changes in physiological activity (heart rate, hormone levels, brain blood flow).

Key issues in measurement: reliability and validity

  • Reliability: consistency or stability of a measurement across time or raters.

    • Example: a ruler giving consistent length measurements (test-retest reliability).

    • Poor reliability: measurements with high random variability (e.g., using a rubbery ruler).

    • Types include: test-retest reliability, internal consistency, and interrater reliability.

  • Validity: whether a measure actually measures what it is intended to measure (face validity, predictive validity, construct validity).

    • Face validity: the measure appears to measure what it should (expert judgment).

    • Predictive validity: how well a measurement predicts a criterion (e.g., ATAR predicting university performance).

    • Construct validity: the measure aligns with theoretical concepts (e.g., IQ tests and intelligence).

  • All measures can be evaluated for reliability and validity.

How data are organized and displayed

  • Tables and graphs help summarize data visually.

  • Frequency table: rows for each possible value; tallies show how many times each score occurs; final frequencies show counts.

  • Grouped frequency table: collapse data into equal-width intervals to manage wide ranges; aim for roughly 10–20 intervals.

  • Stem-and-leaf plot: a compromise between tables and graphs showing stems (higher units) and leaves (lower units).

  • Box-and-whisker plot: shows range (min to max), interquartile range (25th to 75th percentile), and median (50th percentile).

  • Bar graph: qualitative data (nominal) with non-touching bars; X axis lists categories, Y axis shows frequencies.

  • Histogram: quantitative data (interval/ratio) with touching bars; grouped histograms extend to grouped data.

  • Frequency polygon: line graph version of a histogram, useful for overlaying distributions (e.g., actual vs ideal weights by gender).

  • Choice of graph depends on data type and whether grouping is used; grouping can simplify wide ranges but reduces precision.

Percentiles and grouped distributions

  • Percentile: the percent of scores at or below a given score in the dataset.

  • Notation: n = number of observations; SF = simple frequency; CF = cumulative frequency.

  • Formula (for a specific score): \text{Percentile} = \frac{CF}{n} \times 100

  • To compute percentiles for a single score, rank data, compute SF and CF, then apply the formula.

  • For grouped distributions, percentile refers to the percentage of scores at or below the highest score in a group.

  • Example steps: rank data; compute SF and CF for groups; compute percentiles for groups using adjusted interpretation.

  • Example values and steps are provided in the notes (e.g., computing percentiles for a dataset with n = 20 to obtain a percentile of 65 for a score of 14).
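The single-score formula can be sketched directly in code (the ten scores below are invented for illustration, not the n = 20 dataset from the notes):

```python
def percentile_of(score, data):
    """Percentile = (CF / n) * 100, where CF is the cumulative
    frequency: the number of scores at or below the given score."""
    cf = sum(1 for x in data if x <= score)
    return cf / len(data) * 100

# Invented, already-ranked scores
scores = [2, 4, 4, 5, 7, 7, 8, 9, 9, 10]
p = percentile_of(7, scores)   # 6 of 10 scores are at or below 7
```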

Percentiles in grouped data: practical interpretation

  • When using grouped data, the percentile for a group (e.g., 25–29) is the percentage of scores at or below the highest score in that group (here, 29).

  • Total n remains the number of observations; CF/n gives the proportion, multiplied by 100 gives the percentile.

Measures of central tendency

  • Mode: most frequently occurring score; in a distribution, the tallest bar in a histogram or the peak of a frequency polygon.

  • Median: middle value when data are ordered; for odd n, the middle value; for even n, the average of the two middle values.

  • Mean (average): \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

  • Examples:

    • A small set {3, 4, 4, 5} has mean 4 and median 4.

    • In a skewed dataset, mean and median are pulled toward the tail; they can diverge from the mode.

  • Mean as balancing point: if you imagine a pile of numbers, the mean is the point where the distribution balances; extreme values pull the mean toward them.
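The {3, 4, 4, 5} example and the balancing-point idea can be checked with Python's statistics module (the extreme score of 100 is an invented addition to show how an outlier pulls the mean but not the median):

```python
import statistics

scores = [3, 4, 4, 5]                    # the small set from the notes
mean_val = statistics.mean(scores)       # (3+4+4+5)/4 = 4
median_val = statistics.median(scores)   # average of the two middle values = 4
mode_val = statistics.mode(scores)       # 4 occurs most often

# An extreme score drags the mean toward the tail but leaves the median alone
skewed = [3, 4, 4, 5, 100]
skew_mean = statistics.mean(skewed)      # jumps to 23.2
skew_median = statistics.median(skewed)  # stays at 4
```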

Measures of variability

  • Range: difference between the highest and lowest score; simplest measure but sensitive to extreme values.

    • Example: Range = max − min.

  • Variance: average of squared deviations from the mean; measures how spread out the data are.

    • Deviation score: d_i = x_i - \bar{x}. The sum of the deviations is always zero, so we square them to compute variability.

    • Sum of squares: SS = \sum_{i=1}^{n} (x_i - \bar{x})^2

    • Population variance: \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

    • Sample variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

  • Standard deviation: square root of variance; puts variability in the same units as the data.

    • Population SD: \sigma = \sqrt{\sigma^2}

    • Sample SD: s = \sqrt{s^2}

  • Notes:

    • Variance is in squared units, which can be unintuitive; SD provides a more interpretable measure in original units.

    • Examples in the notes show how variance can differ across groups (e.g., Class A variance = 81 vs Class B variance = 225).

  • Key concept: range is simple but sensitive to extremes; variance uses all scores; standard deviation is the intuitive spread in original units.
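These definitions can be verified with the statistics module; the eight scores below are invented for illustration. Note that `pvariance`/`pstdev` divide the sum of squares by N (population), while `variance`/`stdev` divide by n − 1 (sample):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # invented data, mean = 5, SS = 32

rng = max(scores) - min(scores)             # range = 9 - 2 = 7
pop_var = statistics.pvariance(scores)      # SS / N = 32 / 8 = 4
samp_var = statistics.variance(scores)      # SS / (n - 1) = 32 / 7
pop_sd = statistics.pstdev(scores)          # sqrt(4) = 2.0, in original units
samp_sd = statistics.stdev(scores)          # sqrt(32 / 7)
```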

Shape of distributions: skewness and kurtosis

  • Normal curve: bell-shaped, symmetric, with tails extending to extremes; peak is the mode when symmetric.

  • Skewness: measure of symmetry.

    • Positive skew: right tail longer; mean and median are dragged toward the right tail; example: income distributions often positively skewed due to a few very high incomes.

    • Negative skew: left tail longer; mean and median are dragged toward the left tail; example: exam scores with a cluster at the high end and a few very low scores.

  • Kurtosis: measure of spread and peakedness.

    • Leptokurtic: tall, narrow peak; scores concentrated in a narrow range, tails may extend far.

    • Platykurtic: flat or plateau-like distribution; more spread-out around the center.

  • Relationship with measures of central tendency:

    • In symmetric distributions, mean = median = mode.

    • For skewed distributions, median and mean move toward the tail; mode remains at the highest point; if mode is smallest value, positive skew is indicated; if mode is largest value, negative skew is indicated.

  • These shape characteristics influence the appropriateness of statistical techniques and interpretations.

Putting it together: practical aspects for research design

  • Use as high a level of measurement as possible because you can convert downwards but not upwards (e.g., you can convert times to placings, but you cannot recover precise times from placings).

  • The level of measurement affects which analyses are permissible and meaningful.

  • In psychology, measurements include self-report, behavioural, and physiological, each with reliability and validity considerations.
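The downward-only conversion can be shown in code: placings are recoverable from times, but times are not recoverable from placings (swimmer names and times are invented):

```python
# Ratio data (times in seconds) converted down to ordinal data (placings)
times = {"Ash": 61.2, "Blake": 59.8, "Casey": 63.5}   # hypothetical swimmers

placings = {name: rank
            for rank, (name, _) in enumerate(
                sorted(times.items(), key=lambda kv: kv[1]), start=1)}
# placings now holds only order; the time gaps are lost for good
```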

Quick reference to formulas and notation

  • Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

  • Range: \text{Range} = \max(x_i) - \min(x_i)

  • Deviation score: d_i = x_i - \bar{x}

  • Sum of squares: SS = \sum_{i=1}^{n} (x_i - \bar{x})^2

  • Variance (population): \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

  • Variance (sample): s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

  • Standard deviation (population): \sigma = \sqrt{\sigma^2}

  • Standard deviation (sample): s = \sqrt{s^2}

  • Percentile: \text{Percentile} = \frac{CF}{n} \times 100, where CF is the cumulative frequency and n is the sample size.

  • Box-and-whisker: key components are minimum, first quartile (Q1, 25th percentile), median (Q2, 50th), third quartile (Q3, 75th), maximum; the interquartile range is IQR = Q3 - Q1

  • Note: all formulas use the standard notation: n for sample size, N for population size, x_i for individual scores,
    \mu for population mean, and \bar{x} for sample mean.

Connections to examples from the transcript

  • Nominal example: finish vs not finish in a swimming race (dichotomous nominal).

  • Ordinal example: placing in a swimming race (1st, 2nd, 3rd) with no information about time gaps.

  • Interval example: seconds slower relative to club record; negative values possible when beating the record.

  • Ratio example: actual swim times in seconds; zero represents no time elapsed; meaningful ratios like 60s vs 120s.

  • Reliability examples: test-retest reliability, internal consistency, and interrater reliability; validity examples include face validity, predictive validity (ATAR), and construct validity (IQ tests).

  • Data organization examples: standard and grouped frequency tables, stem-and-leaf plots, box-and-whisker plots, bar graphs, histograms, and frequency polygons; use grouping to handle wide data ranges but accept loss of precise scores within groups.

  • Percentile example: computation steps and interpretation for a dataset, illustrating both individual-score and grouped-percentile computations.

  • Central tendency and variability: discussion of when to use mode, median, or mean, and how variability (range, variance, standard deviation) describes how spread out the data are, with emphasis on how extreme scores affect the mean but not the median.

Note: These notes summarize the key ideas, definitions, examples, and formulas from the transcript to support study and exam preparation. Be sure to understand how to apply each concept to data sets and to select appropriate analyses based on the level of measurement and distribution characteristics.

Reading 2


Displaying the Order in a Group of Numbers Using Tables and Graphs

  • Chapter framing: Statistics as a branch of mathematics focusing on organization, analysis, and interpretation of numbers; treated here as descriptive of data in psychology and related fields.

  • The Two Branches of Statistical Methods

    • Descriptive statistics: summarize and describe a group of numbers (tables, graphs, etc.).

    • Inferential statistics: draw conclusions and make inferences beyond the observed data about a larger population.

    • This chapter focuses on descriptive statistics to build intuition for later inferential methods.

  • What statistics is for psychologists

    • To describe data, test ideas, compare groups, and evaluate research reports in media.

    • Use software (e.g., SPSS) but building hands-on understanding by hand reinforces procedure.

  • The learning approach in the text

    • Small, simple numeric examples per chapter to emphasize underlying logic.

    • Emphasis on understanding steps, not just mnemonics; SPSS/end-of-chapter sections for computer-based practice.

  • Descriptive vs Inferential recap

    • Descriptive: summarize a group of numbers.

    • Inferential: generalize beyond the observed data to a larger population.

  • Key concepts introduced early

    • Variables, values, and scores; an example using a stress rating scale from 0 to 10.

    • A variable is a condition that can vary; a value is a numeric or categorical descriptor; a score is a person’s specific value.

    • Levels of measurement influence what statistics are appropriate.

Some Basic Concepts

  • Variables, Values, and Scores (example)

    • Stress level variable on a 0–10 scale; a respondent with score 6 has value 6 on the variable stress level.

    • Definitions:

    • Variable: a condition or characteristic that can have different values.

    • Value: a possible number or category that a variable can take.

    • Score: a particular person’s value on a variable.

  • Table 1 (terminology)

    • Variable: stress level; age; gender; religion; etc.

    • Value: 0,1,2,3,… or categories like male/female.

    • Score: a person’s numeric or categorical value on a variable.

  • Levels of Measurement (Table 2 summary in the text)

    • Equal-interval (numeric): differences between values reflect equal amounts measured; Examples: stress level, GPA; can be treated as numeric.

    • Rank-order / Ordinal: numeric values reflect relative ranking, not equal intervals; Examples: class standing, place finished.

    • Nominal / Categorical: values are categories with no natural order; Examples: gender, major, diagnoses.

    • A variable can be equal-interval, ratio, or ordinal/nominal depending on how it is measured.

  • Equal-interval vs Ratio scale

    • Equal-interval: differences between values are meaningful; not necessarily a true zero.

    • Ratio: has an absolute zero; allows statements about multiplicative comparisons (e.g., twice as big).

    • Examples: GPA (roughly equal-interval), stress ratings (approximate equal-interval); counts like number of siblings (ratio, has true zero).

  • Rank-order vs Nominal variables (discussion of information content)

    • Rank-order provides relative position but less information about magnitude differences.

    • Nominal provides categories with no inherent order.

  • Discrete vs Continuous variables

    • Discrete: specific values only (e.g., number of dentist visits; number of kids: 0,1,2,…).

    • Continuous: theoretically infinite values between any two values (e.g., height, time).

  • The stress example and Level of Measurement discussion

    • Stress ratings on 0–10 are treated as numeric; often approximated as equal-interval.

    • Distinctions between measurement scales affect the statistics that can be used.

  • Practice questions (quick checks in the book)

    • Identify variable, score, range, and level of measurement for various examples.

  • Box 1: Poetic/historical trivia for statistics (origin, development, and early uses).

  • Discrete vs Continuous and Levels of Measurement are revisited with examples and warnings about interpretation.

Frequency Tables

  • An Example with stress ratings (n=151 in the full study, subset shown n=30 for teaching ease)

    • Stress scores: 8,7,4,10,8,6,8,9,9,7,3,7,6,5,0,9,10,7,7,3,6,7,5,2,1,6,7,10,8,8

    • A frequency table lists each possible value and how many times it occurred.

    • Frequency = count of occurrences; Percent = frequency/total × 100.

  • Frequency table basics

    • Step 1: List all possible values from lowest to highest, including values with 0 frequency.

    • Step 2: For each score, mark its occurrence on the list.

    • Step 3: Compute frequency for each value by summing marks.

    • Step 4: Compute the percentage for each value: Percent = (Frequency ÷ N) × 100.
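The four steps above can be sketched in a few lines, using the chapter's 30 stress ratings (0–10 scale):

```python
# Minimal frequency table following the four-step procedure.
from collections import Counter

scores = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7, 3, 7, 6, 5, 0,
          9, 10, 7, 7, 3, 6, 7, 5, 2, 1, 6, 7, 10, 8, 8]

counts = Counter(scores)     # Steps 2-3: tally occurrences of each score
n = len(scores)

# Step 1: list every possible value from lowest to highest,
# including values with zero frequency.
for value in range(0, 11):
    freq = counts.get(value, 0)
    percent = freq / n * 100                 # Step 4: Percent = (Frequency / N) * 100
    print(f"{value:>2}  freq={freq:<2} percent={percent:.1f}%")
```

Running this reproduces the clustering the text describes: the value 7 has the highest frequency.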

  • Example: Stress rating table (Table 3) and its interpretation

    • The table shows that most students clustered around 7–8; few around very low values.

  • Frequency tables for nominal (categorical) variables

    • Example: Closest person in life (208 students): Family member, Nonromantic friend, Romantic partner, Other with corresponding frequencies and percentages (Table 4).

  • Another example: social interactions diary (94 students recorded their social interactions lasting 10 minutes or longer)

    • The data set consists of numeric counts and can be tabulated with the same steps.

  • Steps for a numeric variable vs nominal variable

    • Numeric: steps 1–4 as above.

    • Nominal: same four-step process, but values are categories; frequencies show how many fall into each category.

  • Grouped frequency tables

    • Used when many possible values; group adjacent values into intervals (e.g., 0–4, 5–9, etc.).

    • Interval width chosen to produce about 5–15 intervals and to have clean starting points (multiples of interval size).

    • Intervals example: In stress ratings with interval size 2, intervals 0–1, 2–3, 4–5, 6–7, 8–9, 10–11.

    • For 10-interval example with interval size 5: 0–4, 5–9, 10–14, …, 45–49.
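The grouping rule above (intervals of equal width, starting at multiples of the interval size) can be sketched for the stress data with interval size 2, producing the intervals 0–1 through 10–11:

```python
# Sketch of a grouped frequency table with interval width 2 on the 0-10 stress data.
scores = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7, 3, 7, 6, 5, 0,
          9, 10, 7, 7, 3, 6, 7, 5, 2, 1, 6, 7, 10, 8, 8]
width = 2

# Interval starts are multiples of the interval width: 0, 2, 4, 6, 8, 10.
grouped = {(start, start + width - 1): 0 for start in range(0, 12, width)}

for s in scores:
    start = (s // width) * width             # map a score to its interval's start
    grouped[(start, start + width - 1)] += 1

for (lo, hi), freq in grouped.items():
    print(f"{lo}-{hi}: {freq}")
```

Note how the grouping trades detail for simplicity: the 6–7 interval absorbs eleven scores, hiding the difference between the frequencies of 6 and 7.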

  • Grouped frequency table advantages and trade-offs

    • Pros: simpler pictures for many values; easy visualization.

    • Cons: loses some detail about frequencies within intervals.

  • Histograms

    • A histogram is a bar graph in which each bar’s height equals the frequency of a value (for grouped tables, bars are centered on the interval midpoints).

    • Nominal variables: bars are separated (bar graph style).

    • Numeric variables: bars touch (like a skyline) to emphasize continuous distribution.

  • Guidelines for building histograms from grouped data

    • If using grouped data, bottom axis should be interval midpoints.

    • Each interval’s midpoint is halfway between its start and the start of the next interval; for the last interval, use the start the next interval would have (even if no such interval exists in the data).

    • Heights correspond to frequencies; bars centered on the interval midpoints.
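A quick way to see the "skyline" idea without plotting software is a text histogram, where bar length stands in for bar height (a teaching sketch, not a method from the chapter):

```python
# Text histogram of the 30 stress ratings: one '#' per occurrence of each value.
from collections import Counter

scores = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7, 3, 7, 6, 5, 0,
          9, 10, 7, 7, 3, 6, 7, 5, 2, 1, 6, 7, 10, 8, 8]
counts = Counter(scores)

for value in range(0, 11):                   # include zero-frequency values
    freq = counts.get(value, 0)
    print(f"{value:>2} | {'#' * freq}")      # adjacent rows mimic touching bars
```

Read sideways, the output shows the unimodal shape with its peak at 7.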

  • Figure 3–4 reference points

    • Histograms comparing different tables demonstrate how grouping changes perceived distribution.

Shapes of Frequency Distributions

  • Describing the shape of a distribution

    • Unimodal: one clear peak in the frequency distribution.

    • Bimodal: two distinct high points; often indicates two subgroups.

    • Multimodal: more than two peaks.

    • Rectangular (square) distribution: frequencies roughly equal across values.

    • Symmetrical vs skewed distributions

    • Symmetrical: left and right halves mirror each other.

    • Skewed: tail on one side longer (left-skewed = negatively skewed; right-skewed = positively skewed).
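One common heuristic for the direction of skew (an added illustration, not a rule stated in the chapter) compares the mean and the median: the mean is pulled toward the long tail, so mean > median suggests positive (right) skew and mean < median suggests negative (left) skew.

```python
# Sketch: detecting skew direction by comparing mean and median.
import statistics

right_skewed = [1, 1, 2, 2, 3, 3, 4, 10]    # one extreme value makes a long right tail

mean = statistics.mean(right_skewed)
median = statistics.median(right_skewed)
print(mean, median)                          # the mean is pulled toward the tail

direction = "positive (right)" if mean > median else "negative (left)"
print(direction)
```

The heuristic is rough (it can fail for some multimodal or discrete distributions), but it matches the tail-based definition above in typical cases.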

  • Frequency polygons

    • A line-based graph connecting points corresponding to frequencies at each value; another way to visualize distributions.

  • Common shapes in psychology data

    • Many studies approximate unimodal distributions; bimodal/multimodal occur but are less common.

    • Ceiling and floor effects contribute to skewness (e.g., many scores pile up at minimum or maximum values).

  • Floor and ceiling effects (examples)

    • Floor effect: many scores pile up at the lower end (e.g., number of children with low counts).

    • Ceiling effect: many scores pile up at the upper end (e.g., test with max score).

  • Normal curve and kurtosis

    • Normal curve: bell-shaped, unimodal, symmetric; the benchmark for many statistical techniques.

    • Kurtosis: degree of peakedness or flatness relative to the normal curve; high kurtosis = a sharper peak with heavier tails; low kurtosis = a flatter shape with lighter tails.

  • Typical shapes in psychology distributions

    • Stress and social interactions distributions are often near normal but can be skewed, reflecting measurement limits or sample characteristics.

Controversy: Misleading Graphs

  • Core concern: public misuses of tables and graphs can mislead beliefs about data.

  • Common misuses discussed:

    • Unequal interval sizes in grouped data distort perception of trends (Figure 12 example: NYT graph with half-year data misleads); the fix is to use equal intervals.

    • Exaggerating proportions by not starting the vertical axis at zero (Figure 13a vs 13b); starting at zero provides a more accurate visual impression.

    • Distorting overall proportions by altering width/height relationships (Figure 14): standard 1:1.5 width-to-height ratio used for visual consistency; changing it misleads perception.

  • Guidance for fair graphs

    • Use equal interval sizes for grouped data.

    • Start vertical axes at zero when appropriate.

    • Keep aspect ratio reasonable (not wildly tall or flat) to avoid distortion.

  • Public vs scholarly use

    • Researchers often use frequency tables and histograms as initial steps; public plots can differ in style or formatting.

Frequency Tables and Histograms in Research Articles

  • Frequency tables and histograms are valuable for understanding distributions but are not always shown in articles.

  • Examples cited:

    • Hether and Murphy (2010): Ten Most Common Health Issues for Male vs Female TV Characters (Table 8); shows sex-specific frequencies and percentages.

    • Maggi, Hertzman, & Vaillancourt (2007): Adolescent smoking by age (Figure 15); young adolescents show rising rates with age; used grouped data with percentages.

    • Maggi et al. provided a histogram for age groups; some graphs included gaps or percentages instead of raw counts to facilitate comparison.

    • Maggi et al. used percentages rather than raw counts to normalize for differing group sizes.

  • Reporting norms in psychology

    • Mean and standard deviation are most common; mode and median appear less frequently but can be included to describe distribution shape.

    • Tables (like Table 5 and small descriptive tables) may include medians when relevant, especially for skewed distributions.

  • Observations about reporting practice

    • In many studies, distributions are described rather than plotted; graphs appear more often in statistics-heavy papers.

    • When graphs exist, the form may vary; nonstandard formats are common in applied journals.

Learning Aids, Summary, and Key Terms

  • Summary points highlighted in the chapter:

    • Descriptive stats summarize a distribution; central tendency focuses on a “typical” value; variability describes spread.

    • Most psychological data are numeric with roughly equal intervals; rank-order and nominal data use different statistics.

    • Frequency tables and histograms are foundational for understanding distributions; shapes include unimodal, bimodal, rectangular, and skewed; normal curves approximate many real-world distributions.

    • Misleading graphs arise from unequal intervals, nonzero baselines, and distorted aspect ratios.

    • Random vs nonrandom sampling affects generalizability; SPSS is a tool to assist in computing frequencies, histograms, and basic descriptive stats.

  • Key terms (selected):

    • statistics, descriptive statistics, inferential statistics, variable, value, score, numeric variable, equal-interval, ratio scale, rank-order, nominal, levels of measurement, discrete, continuous, frequency table, interval, grouped frequency table, histogram, frequency distribution, unimodal, bimodal, rectangular, symmetric, skewed, floor effect, ceiling effect, normal curve, kurtosis, central tendency, mean, mode, median, variance, standard deviation, sum of squares (SS), definitional formula, computational formula, population, sample, random selection, nonrandom sampling, p (probability), normal curve table (Table A-1), Z scores.

Example Worked-Out Problems (illustrative)

  • Example: Ten first-year students rated interest in graduate school on a 1–6 scale: 2,4,5,5,1,3,6,3,6,6.

    • Step 1: Make a frequency table for possible scores 1–6.

    • Step 2: Compute frequencies: 1→1, 2→1, 3→2, 4→1, 5→2, 6→3.

    • Step 3: Compute percentages: total N=10; 1:10%, 2:10%, 3:20%, 4:10%, 5:20%, 6:30%.

    • Step 4: Construct a histogram from the frequency table or grouped data.
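The four steps of this worked example can be checked directly:

```python
# Reproducing the worked example: ten graduate-school interest ratings on a 1-6 scale.
from collections import Counter

ratings = [2, 4, 5, 5, 1, 3, 6, 3, 6, 6]
counts = Counter(ratings)
n = len(ratings)

for value in range(1, 7):                    # Step 1: all possible scores, 1-6
    freq = counts.get(value, 0)              # Steps 2-3: frequencies
    print(f"{value}: freq={freq}, percent={freq / n * 100:.0f}%")   # Step 4
```

The output matches the frequencies and percentages given above (e.g., 6 occurs three times, 30%).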

  • Another worked example (mean, median, etc.)

    • Stress ratings example (30 students’ scores): ΣX = 193; mean X̄ = ΣX ÷ N = 193 ÷ 30 ≈ 6.43
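The arithmetic for this mean can be verified from the 30 stress ratings listed earlier:

```python
# Checking the mean of the 30 stress ratings: mean = sum of scores / N.
scores = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7, 3, 7, 6, 5, 0,
          9, 10, 7, 7, 3, 6, 7, 5, 2, 1, 6, 7, 10, 8, 8]

total = sum(scores)              # 193
mean = total / len(scores)       # 193 / 30
print(total, round(mean, 2))
```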
