
Overview

  • Week 3 focuses on the design of a study, building on last week’s content about study design, hypothesis, and theory-to-prediction work.

  • The first half is a recap with new examples; the second half extends to additional topics in research design.

  • This lecture emphasizes that the mid-semester exam will heavily cover design concepts such as controlling variables, reducing random variability, managing individual differences, and avoiding confounding variables.

  • Acknowledges traditional land custodians and ongoing cultural connections as context for the session.

Goals of science (clarifying "control")

  • Four goals of science discussed previously: predict, describe, explain, and control.

  • There was a moment of confusion over whether “explain” had been mentioned; the speaker confirmed that describe and control are key components, and that explanation involves understanding the mechanisms behind observed relationships.

  • When scientists say they are "controlling" a variable in the context of study design, they mean reducing unwanted variability to better reveal the effect of the manipulated variables, not the everyday sense of manipulating the phenomenon to demonstrate full understanding.

  • Analogy: a stop sign controls behavior in everyday life; in experiments, control means limiting sources of variability (noise) so that any observed effect can be attributed more confidently to the manipulations rather than extraneous factors.

Key concepts in design and variability

  • Four sources of variability to manage in quantitative designs:

    • Experimental (systematic) manipulation of the IV and random assignment to conditions.

    • Noise/random variability: unpredictable fluctuations across participants or trials.

    • Individual differences: stable differences between participants that can obscure effects.

    • Confounding variables: variables that covary with the IV and DV and offer alternative explanations for observed effects.

  • Distinction of “control” meanings:

    • In theory-building, control means understanding and manipulating a phenomenon; in design, control means minimizing extraneous variability to reveal causal relationships.

  • Signal-to-noise ratio: noise (extraneous variability) can obscure the true signal (the effect of interest).

  • Situational variables (noise): context features like room lighting, sitting position, or distractions that can affect performance independent of the manipulation.

  • Measurement error: inaccuracies in how outcomes are measured, which add noise and can mask true effects.

  • Example from the history of measurement error: early psychology experiments relied on human observers and were prone to human error; modern computer-based measures reduce this source of error, but the historical methods illustrate why measurement error matters.
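The signal-to-noise idea above can be made concrete with a small sketch. The lecture gives no formula, so the measure used here (the difference between group means divided by the pooled within-group standard deviation) is an assumption chosen for illustration, and all scores are made up.

```python
from statistics import mean, stdev


def standardized_difference(control, treatment):
    """Difference between group means divided by the pooled within-group SD."""
    n_c, n_t = len(control), len(treatment)
    pooled_variance = (
        (n_c - 1) * stdev(control) ** 2 + (n_t - 1) * stdev(treatment) ** 2
    ) / (n_c + n_t - 2)
    return (mean(treatment) - mean(control)) / pooled_variance ** 0.5


# Made-up scores: both datasets have the same 2-point difference in means.
quiet_control = [9, 10, 10, 11]      # mean 10, little spread
quiet_treatment = [11, 12, 12, 13]   # mean 12, little spread
noisy_control = [4, 8, 12, 16]       # mean 10, large spread
noisy_treatment = [6, 10, 14, 18]    # mean 12, large spread

low_noise = standardized_difference(quiet_control, quiet_treatment)
high_noise = standardized_difference(noisy_control, noisy_treatment)
print(round(low_noise, 2), round(high_noise, 2))  # 2.45 0.39
```

The signal (a 2-point difference) is identical in both cases; only the noise changes, and the same effect becomes much harder to see when scores vary widely within each group.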

Experimental designs: true experiments, quasi-experiments, and correlational designs

  • True experiments (experimental design):

    • Directly manipulate the independent variable (IV).

    • Random assignment of participants to conditions.

    • Two core features: manipulation of IV and random assignment to control for extraneous variability.

  • Quasi-experiments:

    • Also called “almost experiments” because they are not fully randomized or manipulated in a controlled way.

    • Use existing groups (e.g., smokers vs. non-smokers) when random assignment is infeasible or unethical.

    • Strengths: feasible when true experiments aren’t possible; ecological validity can be higher.

    • Limitations: higher risk of confounds and lower causal inference strength; may require careful matching, but cannot guarantee equivalence.

  • Correlational designs:

    • No random assignment; variables are measured as they occur in the world.

    • Relationships are assessed with a statistical coefficient (e.g., Pearson’s r) and scatter plots.

    • Cannot infer causality because direction of effect and third-variable confounds cannot be ruled out.

    • Useful for naturalistic testing and when random assignment is not feasible; high ecological validity but limited causal claims.

  • Key terminology in different designs:

    • Experimental design: independent variable (IV) is manipulated; dependent variable (DV) is measured.

    • Quasi-experimental design: IV-like variable is used to group participants, but groups are not created by random assignment.

    • Correlational design: predictor is the variable hypothesized to predict the other; criterion (or DV) is the outcome measured.

    • In correlational designs, the predictor is typically on the x-axis of scatter plots; the criterion on the y-axis.

    • In correlational work, the terms IV and DV are sometimes used loosely, but predictor and criterion are more precise: nothing is manipulated, and the labels mark only which variable is hypothesized to predict the other.
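As a rough illustration of the coefficient mentioned for correlational designs, the sketch below computes Pearson's r by hand. The variable names and the data (hours studied as predictor, exam score as criterion) are hypothetical and not from the lecture.

```python
from math import sqrt


def pearson_r(predictor, criterion):
    """Pearson correlation between two equally long lists of numbers."""
    if len(predictor) != len(criterion) or len(predictor) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(predictor) / len(predictor)
    mean_y = sum(criterion) / len(criterion)
    # Sum of cross-products and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    ss_x = sum((x - mean_x) ** 2 for x in predictor)
    ss_y = sum((y - mean_y) ** 2 for y in criterion)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return cross / sqrt(ss_x * ss_y)


hours_studied = [1, 2, 3, 4, 5]      # predictor (x-axis), made-up values
exam_score = [52, 55, 61, 70, 72]    # criterion (y-axis), made-up values

r = pearson_r(hours_studied, exam_score)
print(round(r, 2))  # 0.98
```

A value near 1 says only that the two variables rise together in this sample; it says nothing about which one, if either, causes the other.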

Terminology: IV, DV, predictor, and criterion

  • Independent Variable (IV): the variable that the experimenter directly manipulates in an experiment.

  • Dependent Variable (DV): the outcome measured, presumed to be influenced by the IV.

  • In quasi-experiments: the IV-like variable is used to categorize participants, not randomly assigned.

  • In correlational designs: there is no manipulation and no fixed IV/DV; the variable hypothesized to influence the other is termed the predictor, and the other variable is the criterion (DV).

  • Note on terminology in class: tutors may refer to the predictor and criterion in correlational designs; using IV/DV in correlational contexts is generally acceptable but can be flagged as imprecise by some instructors.

Noise, measurement error, and confounds: how noise can derail signals

  • Situational variables can introduce noise (e.g., flickering light, room distractions).

  • Individual differences: natural variation between participants that can obscure the effect of the IV; more critical in between-subjects designs.

  • Measurement error: inaccuracies in outcome measurement (e.g., human scoring mistakes in early psych experiments) that obscure true effects.

  • Confounding variables: variables that covary with the IV and DV, offering alternative explanations for observed effects (e.g., lighting confounds, or education level differences across generations in studies comparing age groups).

  • Example of confounding: if a memory task uses distractors in a room with flickering lights, it’s unclear whether observed effects are due to distraction or lighting.

  • Confounds threaten causal inference; eliminating systematic differences between groups narrows explanations to two primary possibilities:

    • Differences due to chance (random variability).

    • Differences due to the IV (the manipulation).

Hypotheses and statistical testing: null vs alternative, and what we test

  • Null hypothesis (H0, H_naught): there is no relationship between the IV and DV (or no difference between groups).

  • Alternative hypothesis (H1): there is a relationship or a difference; often the direction is hypothesized (e.g., distraction impairs memory).

  • Important point about hypothesis testing:

    • Statistical tests assess the null hypothesis; rejecting H0 suggests the observed results are unlikely due to chance, thus tentatively supporting the alternative.

    • We never directly test the alternative; rejection of H0 leaves us with the alternative as the plausible explanation given the data.

    • The alternative is always tentative because there are infinitely many plausible alternative explanations and we cannot test them all.

  • Practical implications for interpretation:

    • If results are significant, we say the null hypothesis is rejected; if not, we fail to reject the null.

    • The presence of variability in groups does not invalidate the null hypothesis; it reflects real-world variability that must be accounted for in design and analysis.
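The logic of testing the null hypothesis can be sketched in code. The lecture does not name a specific test, so the exact permutation test below is swapped in purely as an illustration: if H0 is true, the group labels are arbitrary, so we ask how often a relabelling of the same scores gives a difference at least as large as the one observed. The scores are made up.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided p-value from every possible regrouping of the pooled scores."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    as_extreme = 0
    regroupings = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            as_extreme += 1
        regroupings += 1
    return as_extreme / regroupings


quiet = [14, 15, 16, 17]        # made-up memory scores, no distraction
distracted = [9, 10, 11, 12]    # made-up memory scores, with distraction

p_value = exact_permutation_p(quiet, distracted)
print(round(p_value, 3))  # 0.029
```

Only 2 of the 70 possible regroupings are as extreme as the observed split, so chance alone is an unlikely explanation and H0 would be rejected at the conventional 0.05 level. The alternative is supported only tentatively, as the notes above stress.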

Approaches to testing: quasi-experiments, correlational studies, and true experiments

  • Quasi-experiments revisited:

    • Use existing groups to address questions when random assignment is not feasible or ethical.

    • Examples: “age and fluid intelligence” studies that compare different age groups, where education differences across generations can confound results; later longitudinal designs mitigate these confounds and reveal more gradual declines in fluid intelligence with age.

    • Limitation: more vulnerable to confounds; stronger causal claims require careful design and interpretation.

  • Correlational research revisited:

    • Measures a relationship between two variables without manipulating them.

    • Pros: naturalistic setting; ecologically valid.

    • Cons: cannot infer causality; third variables or reverse causation may explain observed relationships.

    • Example: ultrasounds and birth weight; more ultrasounds may be associated with lower birth weight because high-risk pregnancies prompt more ultrasounds, not because ultrasounds cause low birth weight.

  • True experiments revisited:

    • True experiments are distinguished by random assignment and controlled conditions, enabling stronger causal inferences.

    • In practice, ethical and logistical constraints often necessitate quasi-experimental or correlational designs.

    • When evaluating causal claims, the first question is whether participants were randomly assigned to conditions (random assignment is essential for strong causal claims).

    • Random assignment helps ensure equivalence of groups, reducing potential confounds and distributing individual differences and other random factors evenly.

Randomization, control, and experimental integrity

  • Random assignment vs haphazard group allocation:

    • True random assignment uses a defined random process (e.g., computer-generated random numbers) to assign participants to conditions.

    • Historically, randomization used random-number tables or physical randomization; modern practice largely uses computers with a random seed based on variable inputs (e.g., current time) to ensure unpredictability.

    • Pseudo-randomness is usually sufficient for research purposes; truly random numbers are not strictly necessary for good practice.

  • Why random assignment matters:

    • Avoids systematic differences between groups (e.g., illness prevalence, personality traits) that could confound results.

    • In drug trials, randomization helps mitigate placebo effects and expectation biases by equal distribution of expectations across groups.

  • Break and reminder on terminology:

    • The lecturer postpones a formal definition of “treatment” but uses the term to describe the experimental manipulation that is expected to have an effect (e.g., the distraction condition is the treatment in a memory task).

  • Practical example (randomization in practice):

    • Random allocation with group sizes up to 100 participants can be achieved using computer-generated randomization; pseudo-random seeds are fine due to sufficient unpredictability.
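A minimal sketch of computer-generated random assignment, assuming Python's standard pseudo-random generator is acceptable for the purpose. The function name, participant IDs, condition labels, and seed value are all hypothetical.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn."""
    rng = random.Random(seed)        # seed=None lets Python pick its own seed
    shuffled = list(participants)    # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, person in enumerate(shuffled):
        condition = conditions[position % len(conditions)]
        groups[condition].append(person)
    return groups


participants = [f"P{number:03d}" for number in range(1, 101)]  # 100 made-up IDs
groups = randomly_assign(participants, ["control", "distraction"], seed=2024)
print({condition: len(members) for condition, members in groups.items()})
# {'control': 50, 'distraction': 50}
```

Passing a fixed seed makes the allocation reproducible for record keeping; leaving it as None gives a different allocation on each run. Dealing the shuffled list out in turn keeps group sizes equal, which is a design choice here rather than a requirement of random assignment.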

Independent groups (between-subjects) vs repeated measures (within-subjects) designs

  • Independent groups design (between-subjects):

    • Each participant is tested in only one condition.

    • Advantages: simple to implement; reduces carryover and learning effects within a participant; each person contributes one data point.

    • Disadvantages: higher susceptibility to random variability due to individual differences; typically less sensitive to detecting IV effects because of between-subject noise.

    • Example: tickling study where one group self-tickles and another group is tickled by a robot; random assignment balances individual differences across groups.

  • Repeated measures design (within-subjects):

    • Each participant experiences all conditions (e.g., control and treatment) and is measured in each.

    • Advantages: reduces variability due to individual differences since each person serves as their own control; increases statistical power and sensitivity to detect effects.

    • Disadvantages: susceptibility to order effects, fatigue, practice, and carryover effects that can confound results.

    • Counterbalancing as a solution: toggling the order of conditions across participants to distribute order effects evenly (e.g., ABBA design).

  • Key contrasts and implications:

    • Independent groups have more noise due to individual differences; repeated measures minimize that noise but introduce order-related confounds that counterbalancing aims to mitigate.

  • Example with the tickling task:

    • Independent groups: 16 participants per group, two separate tables of scores for self-tickling vs robot tickling.

    • Repeated measures: each participant completes both conditions, which yields two scores per participant, one per condition; data are typically shown with lines connecting the two scores for each participant to illustrate within-subject changes.

  • Practical design considerations:

    • Repeated measures designs require careful planning to avoid non-equivalent conditions due to carryover or fatigue; counterbalancing (e.g., ABBA) helps balance practice and fatigue across conditions.

    • Balanced design concept: symmetry in the order of conditions across participants helps ensure that order effects do not favor one condition over another.

    • When data collection is lengthy (e.g., EEG studies), you may prefer repeated measures for sensitivity but must plan for order effects; counterbalancing is often essential.

Carryover effects, order effects, and counterbalancing

  • Order effects: outcomes in later conditions are systematically influenced by having completed earlier conditions (e.g., fatigue, learning, practice effects).

  • Carryover effects: effects of a previous condition persist and influence subsequent conditions, complicating interpretation of the current condition.

  • Counterbalancing strategies:

    • ABBA counterbalancing: half of participants do A then B, the other half do B then A; helps balance both order and carryover effects.

    • Balanced design: arranging conditions so that potential confounds are evenly distributed across order sequences.

  • When counterbalancing cannot resolve concerns:

    • Some effects (like long-lasting learning across different teaching methods) may not be fully counterbalanced; in such cases, researchers may need to redesign the experiment or use alternative methods.

  • Data representation in repeated-measures studies:

    • For independent groups: two separate data tables by group with each row representing a participant.

    • For repeated measures: a single data table with a row per participant and columns for each condition; you can visualize with connected lines to show within-participant changes.
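The counterbalancing described in this section (half the participants in each order) can be sketched as follows. The function name and condition labels are hypothetical, and cycling through every possible order is an assumption that generalizes the two-condition case to more conditions.

```python
import random
from collections import Counter
from itertools import cycle, permutations


def counterbalanced_orders(participants, conditions, seed=None):
    """Give each participant one condition order, using all orders equally often."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)                          # who gets which order is random
    all_orders = list(permutations(conditions))    # two conditions: A-B and B-A
    return dict(zip(shuffled, cycle(all_orders)))


participants = [f"P{number:02d}" for number in range(1, 17)]   # 16 made-up IDs
orders = counterbalanced_orders(participants, ("self", "robot"), seed=7)
print(Counter(orders.values()))
```

With 16 participants and two conditions, 8 people receive each order, so practice or fatigue from going second affects both conditions equally. The orders balance exactly only when the number of participants is a multiple of the number of possible orders.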

Real-world and classroom examples used in the lecture

  • 1999 self-tickling experiment (Blakemore, Frith, and Wolpert): research on ticklishness, used to contrast within-subjects and between-subjects control, with a robot that can move in two ways (predictable vs unpredictable). The design emphasizes within-subject control and careful manipulation of the predictive element to elicit the hypothesized difference in ticklishness.

  • Spoon creativity task (hypothetical, but used to illustrate order effects): repeat task to measure creativity with different music conditions (pop vs classical); without counterbalancing, order effects would confound the effect of music type on creativity. Counterbalancing (or using two different tasks) is necessary to avoid this confound.

  • Grip-strength device study (hypothetical): random assignment to training device; potential confounder is researcher encouragement; solution is to equalize encouragement across groups or use a factorial design to study interaction effects (two-by-two design).

Data interpretation and visualization: what to look for

  • Independent groups design data illustration:

    • Separate distributions for each group; differences between the distributions indicate potential effects of the IV, but such effects may be obscured by variability among participants.

  • Repeated measures data illustration:

    • Each participant contributes data to every condition; plots often show lines connecting a participant’s two scores to visualize within-subject changes.

  • Patterns to watch for:

    • Large within-subject consistency suggests strong treatment effects; large between-subject variability suggests potential noise that counterbalancing and proper randomization must address.

    • Non-overlapping distributions between groups in an independent groups design strengthen confidence in a treatment effect; overlapping distributions indicate higher measurement noise and lower sensitivity.
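The point that each participant serves as their own control can be illustrated with difference scores. This is only a sketch: the ratings below are made up and the condition labels are borrowed from the tickling example.

```python
from statistics import mean, stdev

# Repeated-measures layout: one row per participant, one column per condition.
ratings = {
    "P1": {"self": 2.0, "robot": 3.0},
    "P2": {"self": 5.0, "robot": 6.5},
    "P3": {"self": 8.0, "robot": 9.0},
    "P4": {"self": 3.5, "robot": 4.5},
}

self_scores = [row["self"] for row in ratings.values()]
robot_scores = [row["robot"] for row in ratings.values()]

# Subtracting within each person removes that person's overall ticklishness.
differences = [row["robot"] - row["self"] for row in ratings.values()]

print("SD of raw self scores:  ", round(stdev(self_scores), 2))
print("SD of difference scores:", round(stdev(differences), 2))
print("Mean difference:        ", mean(differences))
```

Participants differ a lot from one another in the raw scores, but their within-person changes are nearly identical, which is why the connected lines in a repeated-measures plot can show a clear effect even when the two raw distributions overlap heavily.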

Reading, preparation, and assessment guidance

  • Reading assignments for the course progression:

    • Grove chapters 1 and 2 (required for this lecture and upcoming assessments).

    • UQ Extend modules: chapters 1 and 2 in prior weeks; chapter 3 is the novel one for this lecture.

    • Aaron textbook (6th or 7th edition acceptable) as additional context.

  • Quiz and exam guidance:

    • Quiz 3 opens in one hour and closes on Monday; covers content from this lecture and Grove readings.

    • Mid-semester exam date announced: Saturday, September 6; more information to come as the date approaches.

  • Practical study tips:

    • Focus on understanding the null hypothesis and the logic of rejecting/failing to reject it.

    • Be comfortable distinguishing between true experiments, quasi-experiments, and correlational designs; know the strengths and limitations of each.

    • Practice identifying potential confounds and proposing counterbalancing or design changes to mitigate them.

    • Review terminology (IV, DV, predictor, criterion, treatment, control) and when each term is most appropriate.

Summary of takeaways

  • The central aim of research design is to maximize the ability to detect true effects by minimizing confounding factors and random noise.

  • Experimental control is about reducing variability from situational factors, measurement error, and individual differences to isolate the effect of the IV on the DV.

  • There are three main quantitative designs with distinct trade-offs: true experiments (random assignment; strong causal inference), quasi-experiments (existing groups; ethically feasible but weaker causal claims), and correlational designs (no manipulation; high ecological validity but cannot establish causality).

  • Independent groups and repeated measures designs each have advantages and pitfalls; counterbalancing is essential in repeated measures to manage order and carryover effects.

  • Random assignment is critical in true experiments to ensure equivalence of groups and to minimize selection bias; modern practice uses computer-generated randomization with a random seed.

  • Real-world examples (tickling study, memory distraction, generational education differences in studies of age and fluid intelligence) illustrate how these principles are applied, the kinds of confounds that can arise, and the strategies used to address them.


Acknowledgement and Course Context

  • Welcome note and course focus: measurement, frequency distributions, and percentiles; gentle introduction to numbers.

  • Mid-semester exam scope: weeks 1–4; scheduled for Saturday, September 6 (announced on Blackboard).

  • Course trajectory: earlier weeks covered scientific process, study design, and questions in psychology; this week moves to data after collecting numbers.

  • Practical relevance: data cleaning, exploration, and plotting are essential across assignments, in honors year, and in the research process.

Measurement, Constructs, and the Philosophy of Measurement

  • Measurement goal: assign numbers to objects/observations according to consistent rules (operational definitions).

  • Constructs in psychology: psychological phenomena like anxiety, memory that are not directly observable but are labeled and studied.

  • Operational definition: boundaries/criteria to determine whether a phenomenon (construct) is present in a measured instance (e.g., infant imitation). Researchers may disagree; scientific discourse can refine definitions over time.

  • Observable phenomena and empiricism: measurement relies on observable, checkable, verifiable evidence shared openly for replication.

  • Scientific disagreement as progress: debate over definitions/methods pushes for better processes.

Variables: Types, Qualities, and Scales

  • Variable: a characteristic of interest for each individual in a population/sample (e.g., memory capacity, anxiety).

  • Qualitative vs. quantitative variables:

    • Qualitative: categories/labels (e.g., gender, eye color, political affiliation); not meaningful to compute averages.

    • Quantitative: numeric measures (e.g., height, weight, income); meaningful to apply statistics.

  • Coding and measurement rules:

    • Numbers can be used as labels (e.g., 0/1 coding for deceased/alive) but not all label-numbers support arithmetic operations.

  • Types of variables (overview, basic):

    • Discrete: whole-number values (no meaningful halves). Example: number of cars observed in a period.

    • Dichotomous: a discrete variable with only two possible values (e.g., yes/no; male/female; correct/incorrect).

    • Continuous: any value within a range (e.g., height, volume).

  • Measurement scales (order of sophistication):

    • Nominal: labels without meaningful order; e.g., color categories, political parties, jersey numbers.

    • Ordinal: ordered categories where order matters but intervals are not necessarily equal.

    • Interval: ordered with meaningful differences between values, but no meaningful zero. Example discussed: IQ differences; temperature scales like Celsius.

    • Ratio: interval properties plus a meaningful zero, allowing ratios (e.g., height, weight, Kelvin temperature).

  • Examples and nuances:

    • IQ: moves from ordinal to interval when actual scores are provided, because the distance between scores is then meaningful.

    • Temperature: Celsius is interval (differences meaningful) but lacks a true zero; Kelvin is ratio (has meaningful zero).

    • Age: often treated as ratio (meaningful zero) in many contexts; sometimes discussed as interval in teaching contexts.

  • Implications of scale choice for analysis: the chosen scale constrains which statistics and claims are valid.

  • Practical examples in measurement:

    • Eye color as nominal; cannot average eye color.

    • Height as ratio; allows means, proportions, comparisons like “twice as tall.”

  • How to report numbers: use of consistent labels and units; interpretability depends on scale properties.
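The interval versus ratio distinction can be checked with the temperature example. This is a small sketch with made-up temperatures; the only fact relied on is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0   # made-up temperatures in degrees Celsius

# Differences are meaningful on an interval scale and survive the conversion.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on a ratio scale with a true zero.
ratio_c = warm_c / cool_c                                        # 2.0, misleading
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(round(difference_c, 2), round(difference_k, 2))
print(round(ratio_c, 3), round(ratio_k, 3))
```

The 10-degree difference is the same on both scales, but "20 °C is twice as hot as 10 °C" falls apart once the temperatures are expressed on a scale with a true zero.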

Reliability and Validity of Measures

  • Reliability: consistency of a measure across time or raters.

    • Test-retest reliability: administering the same test twice should yield similar scores if the underlying trait is unchanged.

    • In practice, perfect identical scores are unrealistic due to day-to-day variation (e.g., sleep, mood).

    • Reliability is quantified via correlation between scores across occasions:

    • If scores on Test 1 and Test 2 are highly correlated, reliability is high.

    • Inter-rater reliability: when multiple raters judge the same thing (e.g., video ratings), their scores should be correlated.

    • Typical adequacy: correlations around 0.60 or higher are considered acceptable for reliability in many contexts.

    • Example: alpha waves as a biological fingerprint show very high test-retest reliability over months (almost identical scores).

  • Validity: whether a measure actually assesses the intended construct.

    • Internal validity: the extent to which observed effects are due to the manipulated variables, not confounds.

    • External validity: generalizability of results beyond the lab to real-world settings (issues with WEIRD samples: Western, Educated, Industrialized, Rich, Democratic).

    • Construct validity: whether the measure truly taps the theoretical construct (e.g., Beck Depression Inventory potentially overlapping with anxiety items; concerns about how well items map to depression construct).

    • Content/face validity: whether the measure appears to assess the intended construct on the surface (e.g., mental math tests appearing to measure math ability; head circumference appearing to measure head size, not intelligence).

    • Predictive validity: the extent to which scores on a measure predict related outcomes (e.g., ATAR predicting university performance).

  • Other validity considerations:

    • Construct validity and evolving measures: early measures may drift as constructs are better understood; poor initial alignment may be revised.

    • Content/face validity distinctions: a measure can be reliable but have low face validity if it doesn’t intuitively fit the construct.

  • Reliability vs validity relationship: a measure can be reliable but not valid; it must measure what it intends to measure to be useful.
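Test-retest reliability as described above is a correlation between two occasions. The sketch below uses made-up scores for six people and the 0.60 benchmark mentioned in these notes; the function name is hypothetical.

```python
from math import sqrt

ACCEPTABLE_R = 0.60   # benchmark quoted in the notes above


def retest_reliability(first, second):
    """Pearson correlation between scores from two testing occasions."""
    n = len(first)
    mean_1, mean_2 = sum(first) / n, sum(second) / n
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / sqrt(ss_1 * ss_2)


# Made-up scores for the same six people, tested twice.
occasion_1 = [12, 15, 9, 20, 17, 11]
occasion_2 = [13, 14, 10, 19, 18, 10]

reliability = retest_reliability(occasion_1, occasion_2)
print(round(reliability, 2), reliability >= ACCEPTABLE_R)  # 0.96 True
```

No one scores exactly the same twice, yet the ordering of people is well preserved, which is what a high correlation captures. The same calculation applies to inter-rater reliability if the two lists hold two raters' scores for the same items.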

Pilot Testing, Range Effects, and Study Design Considerations

  • Pilot testing: iterative testing of experimental design and stimuli to ensure the measurement range is appropriate.

    • Goals: avoid floor effects (too hard) and ceiling effects (too easy); ensure middle-range performance to observe differences.

    • Real-world example: quick demonstration with speed of stimulus presentation; initial results suggested adjustments to avoid near-zero performance.

  • Range effects and measurement quality:

    • Ceiling effect: all participants perform near the top; little room for differentiation.

    • Floor effect: all participants perform near the bottom; little room for differentiation.

    • Ideal measures sit in a middle range to maximize sensitivity to group differences.

  • Pilot testing as a standard in research: many published studies include extensive pilot work not visible in the final paper.

  • Study design considerations discussed earlier in the course:

    • Types of studies: experimental, randomized controlled trials; observational, quasi-experimental, correlational.

    • Randomization and control groups as tools to manage confounds.

    • Independent groups design vs. repeated measures design; counterbalancing as a method to balance potential confounds.

  • Construct-focused design notes: importance of naming and constructing meaningful constructs before measurement.
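One way to screen pilot data for the floor and ceiling effects described in this section is sketched below. The margin (how close to the limit counts as "near") and the cutoff (what share of participants is too many) are assumptions picked for illustration, and the pilot scores are made up.

```python
def detect_range_effect(scores, min_score, max_score, margin=1, cutoff=0.5):
    """Return 'floor', 'ceiling', or 'none' for a set of pilot scores."""
    n = len(scores)
    near_floor = sum(score <= min_score + margin for score in scores) / n
    near_ceiling = sum(score >= max_score - margin for score in scores) / n
    if near_floor > cutoff:
        return "floor"      # task too hard: most scores sit at the bottom
    if near_ceiling > cutoff:
        return "ceiling"    # task too easy: most scores sit at the top
    return "none"


# Made-up pilot results on a 0-10 task at three presentation speeds.
too_fast = [0, 1, 0, 0, 2, 1, 0, 1]
too_slow = [10, 9, 10, 10, 8, 9, 10, 10]
adjusted = [4, 6, 5, 7, 3, 5, 6, 4]

for label, scores in [("too fast", too_fast), ("too slow", too_slow), ("adjusted", adjusted)]:
    print(label, "->", detect_range_effect(scores, 0, 10))
```

A check like this only flags a problem; deciding how to change the task (for example, slowing stimulus presentation) is still a judgment made during piloting.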

Data Presentation, Exploration, and Cleaning

  • Purpose of data presentation: to tell a clear story about results using figures and tables rather than lengthy narrative only.

  • Data are often messy: human data can include errors, non-sensical responses, and noise; cleaning is essential before analysis.

  • Data cleaning and exploration steps:

    • Inspect raw data to identify values outside plausible ranges (e.g., 0–10 scales with a value of 20).

    • Look for transcription or entry errors (e.g., too-high values in a given scale).

    • Clean data and summarize before performing analyses.

  • Data organization example: raw data matrix (100 students × 10 true/false questions) vs. summarized representations.

  • Summary representations help reveal patterns quickly:

    • Frequency tables: list all possible scores and the count of observations per score.

    • Frequency of 0–10 scores example: helps identify most common scores and check data integrity.

  • Frequency tables vs. variability in data:

    • With many possible scores, frequency tables become unwieldy; interval-based bins improve readability.

    • Rule of thumb: 10–20 intervals (bins) balance granularity and interpretability; 15 bins often cited as a good middle ground.

  • Interpreting frequency data:

    • Relative frequency: proportion of observations in each bin: $\text{relative frequency} = \frac{f}{N}$, where f is the bin's frequency and N is the total number of observations.

    • Cumulative frequency (CF): total observations with scores at or below a given bin.

    • Percentiles: boundaries where a given percentage of scores fall below that value.

  • Practical examples: weights of 72 male students; intervals like 60–64, 65–69, etc.; note about inclusive/exclusive bin definitions to avoid overlaps.

  • Why include empty/zero-edge bins: to enable certain plots (e.g., frequency polygons) that require zero values at the ends.

  • Frequency polygons and alternative plots:

    • Frequency polygon visually connects bin midpoints to show distribution shape.

    • Bar graphs for nominal data; histograms for continuous data with touching bars to show continuity.

    • Box-and-whisker plots provide information about median, interquartile range, and extremes.

  • Bar graphs vs. histograms:

    • Bar graphs: for qualitative (nominal) data; bars not touching; order is flexible to aid readability.

    • Histograms: for continuous data; bars touch to indicate continuity between bins; bin intervals matter.

  • Frequency polygons for multiple groups:

    • Example with male actual weight vs. male ideal weight; female weights and ideal weights plotted to compare distributions.

  • Telling a story with graphs:

    • Well-chosen figures reveal patterns and differences (e.g., male vs. female weight patterns and ideal vs. actual weights).

    • Graphs should be designed to convey a clear message, guiding interpretation.

  • Summary points for data presentation:

    • Sift, clean, and present data so a reader can understand at a glance.

    • Good figures prepare the data for inferential tests (e.g., verifying assumptions, handling missing data, removing outliers).

    • Choose graph types that best fit the data type (qualitative vs. quantitative) and the story you want to tell.

    • Use appropriate intervals (bins) when constructing histograms/frequency polygons.
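The cleaning and summarizing steps in this section can be sketched as follows. All data are made up, the function names are hypothetical, and the binning helper assumes whole-number values so that intervals such as 60–64 and 65–69 do not overlap.

```python
from collections import Counter


def split_plausible(scores, low, high):
    """Separate in-range scores from values that need checking."""
    clean = [s for s in scores if low <= s <= high]
    flagged = [s for s in scores if not (low <= s <= high)]
    return clean, flagged


def frequency_table(scores, low, high):
    """Rows of (score, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(scores)
    n = len(scores)
    rows, cumulative = [], 0
    for score in range(low, high + 1):     # list every possible score
        f = counts.get(score, 0)
        cumulative += f
        rows.append((score, f, f / n, cumulative))
    return rows


def binned_counts(values, start, width, n_bins):
    """Rows of (lower limit, upper limit, frequency) for whole-number values."""
    counts = [0] * n_bins
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]


raw_scores = [7, 8, 20, 5, 7, 9, -1, 7, 6, 8]   # made-up scores on a 0-10 quiz
clean, flagged = split_plausible(raw_scores, 0, 10)
print("flagged for checking:", flagged)          # [20, -1]

table = frequency_table(clean, 0, 10)
for score, f, rel, cf in table:
    print(score, f, round(rel, 3), cf)

weights = [61, 64, 65, 69, 70, 72, 66]          # made-up weights in kg
bins = binned_counts(weights, start=60, width=5, n_bins=3)
print(bins)   # [(60, 64, 2), (65, 69, 3), (70, 74, 2)]
```

Flagged values are reported rather than silently dropped, since an out-of-range entry may be a typing error that can be corrected from the original record.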

Percentiles, Cumulative Frequencies, and Practical Calculations

  • Percentile concept: the value below which a specified percentage of scores fall.

    • 90th percentile: 90% of scores are below this value.

    • Percentiles are computed by ranking scores and locating the boundary that separates the specified percentage of data.

  • Relative frequency vs. cumulative frequency:

    • Relative frequency: proportion of the total represented by a score/bin: $\text{rel freq} = \frac{f}{N}$

    • Cumulative frequency (CF): sum of frequencies for all scores up to a given point.

  • Percentile calculation method:

    • Percentile rank = $\frac{\text{CF}}{N} \times 100$

    • To find the percentile of a given score, determine CF up to that score and divide by N, then multiply by 100.

    • Example walkthrough: with a table of scores and frequencies, CF is calculated by summing frequencies up to the target score; percentile = (CF / N) × 100.

  • Inverting percentile calculations:

    • To find the score corresponding to a given percentile p, compute CF = (p/100) × N, then locate the score bin whose cumulative frequency reaches CF.

  • Worked example (class scores):

    • Suppose a small table with scores and frequencies; N = 20; to find the 35th percentile: compute CF = 0.35 × 20 = 7; find the score with CF of 7 (e.g., a score of 23). Therefore, 35th percentile corresponds to score 23.

    • Interpretation: a student scoring 23 did better than 35% of the class.

  • More advanced example: hours of TV watched by 259 students (data from a lecture):

    • Distribution across categories (0–1, 2–3, 4–5, etc.) with cumulative frequencies calculated up to seven hours.

    • To find the percentile for seven hours, compute CF up to 7 hours and divide by 259, then multiply by 100; here, around the 63rd percentile.

  • Frequency polygon interpretation example:

    • Shade region left of a percentile boundary to visualize the proportion of data below that boundary (e.g., 63% area under the curve to the left of 7 hours).

  • Modern data practices: most computing of percentiles and other statistics is done with software, but understanding the underlying calculations is essential for intuition and debugging.

  • Summary of percentiles in reporting:

    • Percentiles provide meaningful benchmarks (e.g., “above 80% of the class”).

    • Use the cumulative frequency (CF) and N carefully to avoid misinterpretation; make sure you read the correct cell of the table when using the rearranged equations.

Graph Types and Data Storytelling: Choosing the Right Display

  • Bar graphs (qualitative data):

    • Display counts per category; y-axis scale should reflect observed counts; bars should not touch to emphasize categorical separation.

    • Ordering the categories can help readers see patterns; the order is not inherently meaningful for nominal data but can aid interpretation.

  • Histograms (quantitative data):

    • Bars touch to indicate a continuum between bins; choose bin width carefully to reveal distribution shape without over-smoothing or over-fragmentation.

  • Box-and-whisker plots: quick view of distribution shape, median, interquartile range, and extremes.

  • Frequency polygons: smooth representation of distributions by connecting bin midpoints; useful for comparing distributions (e.g., groups vs. groups).

  • Practical storytelling with plots:

    • Use a graph to illustrate differences (e.g., male actual vs. ideal weight distributions) and to compare groups (e.g., male vs. female weight patterns).

    • Good plots support the narrative of your results and help convey your claims effectively.

Practical Advice for Exam Preparation and Next Steps

  • Data workflow in research:

    • Design measure with appropriate scale; pilot test to refine range and avoid floor/ceiling effects.

    • Collect data, then clean and explore before formal analysis.

    • Create figures that tell a story; choose graphs that fit the data type and the message.

    • Prepare for inferential statistics by ensuring data meet assumptions (normality, etc.).

  • Mathematical basics to brush up for next week:

    • Σ notation:

    • \


Distribution Shape

  • Three features to characterize a distribution: shape, central tendency, and variability.

  • Shape asks: What is the overall form of the distribution?

  • Normal distribution (bell curve, Gaussian) is a key reference shape in this course; many variables approximate it when there is enough data.

  • Visualizing shape: use histograms and frequency polygons; binning choice affects how features are seen.

  • Small to moderate samples (typical in psychology): using about 10–20 bins (e.g., ~15) is common; very large data sets allow more detailed binning.

  • Example with many data points (heights of 5,000 high school boys): using many bins reveals a smooth, continuous bell-shaped curve that matches the normal distribution.

  • Normal distributions enable a lot of math tricks and inferences; we’ll exploit this next week with Z scores.

  • Real data often depart from normality: small samples show deviations; most statistics discussed (correlations, t tests) are robust to small normality deviations.

  • Positively skewed distributions: tail extends to the right (e.g., house prices, reaction times).

  • Negatively skewed distributions: tail extends to the left (e.g., exam scores in an easy course, where most scores bunch near the top and a few trail off low).

  • Skewness affects which measures of central tendency are most appropriate; skew and ceiling/floor effects matter for interpretation.

  • If distribution is roughly symmetric and bell-shaped, mean, median, and mode are close to one another.

  • If distribution is not symmetric, middle measures diverge in informative ways (median often preferred for skewed data).

  • Preview of next steps: next week we’ll see how Z scores relate to standardization and later to t tests.

Central Tendency: Mean, Median, and Mode

  • Central tendency answers: where do most scores cluster around?

  • Mode (most frequent value)

    • Simple to identify; useful for nominal data (eye color, political preference, etc.).

    • Example: data 1-2-3-3-4-4-5-5-5-6-7-7, mode is 5; if another value ties, the distribution becomes bimodal (e.g., modes at 5 and 7).

    • Strengths: unchanged by extreme scores; represents the most common value.

    • Weaknesses: can be unstable with small samples; not informative for most statistical calculations.

    • Important note: mode is the only sensible descriptor for strictly nominal data; it cannot be used for many inferential procedures.

  • Median (middle value of ordered data; 50th percentile)

    • Calculation: order data; if odd n, the middle score; if even n, the average of the two middle scores.

    • Robust to extreme scores; good for skewed distributions (e.g., house prices).

    • Example: data with 6 scores: 10, 20, 30, 40, 50, 60 → median is the average of the 3rd and 4th values (here (30+40)/2 = 35).

    • In skewed data, the median better represents a typical value than the mean.

    • In news media, the median is often used for reported incomes or house prices because it’s less affected by extreme values.

  • Mean (arithmetic average)

    • Formula for a sample with scores $x_1, x_2, \dots, x_n$: $\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$, or more explicitly $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

    • The mean is the balancing point or fulcrum of the distribution; it uses every score in the dataset.

    • Strengths: most informative statistic; mathematically convenient; basis for many formulas and tests; tends to be relatively stable with more data.

    • Weaknesses: sensitive to extreme scores (outliers) and skewed distributions; can be a poor summary of the center when data are highly skewed.

  • Notation and population vs. sample

    • Sample: use regular Latin letters; mean is denoted as $\bar{x}$ or sometimes $m$ in this course.

    • Population: use Greek letters; the population mean is $\mu$ (mu).

    • The sample mean is an unbiased estimator of the population mean: over many repeated samples, the average of the sample means converges to the true population mean.

  • Practical guidance on choosing a measure

    • Symmetric, unimodal distributions: mean ≈ median ≈ mode; mean is often used.

    • Skewed distributions or distributions with outliers: median is often a better descriptor of a “typical” value; mode can be informative for nominal-type data but not for most numeric analyses.

    • Bimodal distributions: mode(s) are informative; mean/median may be less representative of the most typical values.

  • Examples illustrating central tendency choices

    • Salary example (skewed distribution): six salaries, one very high at the top drags the mean above most values; median (e.g., 50,500) better represents a typical salary in a skewed dataset; mode (e.g., 38,000) may reflect the most common salary but not the typical value for planning.

    • Bi-modal example (playground ages vs. parents’ ages): two modes (young and older group) suggest reporting the modes rather than the mean/median alone.

  • Summary guidance for central tendency measures

    • Mode: useful for nominal data; best when reporting “the most frequent category.”

    • Median: robust to outliers and skew; preferred for skewed distributions.

    • Mean: uses all data; most informative in symmetric distributions; sensitive to outliers; useful for further calculations and inferential statistics.
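
The three measures can be compared with Python's standard `statistics` module. The mode and median data come from the examples earlier in this section; the salary list is hypothetical (the lecture's six salaries are not given in these notes) and is chosen only so that its median is 50,500 and its mode is 38,000.

```python
import statistics

# Mode example from the notes: 5 occurs three times, more than any other value.
mode_data = [1, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7]
print(statistics.mode(mode_data))       # 5
print(statistics.multimode(mode_data))  # [5]

# Median example from the notes: even n, so average the two middle scores.
median_data = [10, 20, 30, 40, 50, 60]
print(statistics.median(median_data))   # 35.0

# Hypothetical salaries: one extreme value pulls the mean above most scores.
salaries = [38_000, 38_000, 45_000, 56_000, 61_000, 250_000]
salary_mean = statistics.mean(salaries)
salary_median = statistics.median(salaries)
print(salary_mean, salary_median)
```

In the salary list the mean lands above five of the six values, while the median stays in the middle of the bulk of the data. This is the pattern the skewed-distribution guidance describes.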

Variability: Range, Variance, and Standard Deviation

  • Variability measures describe how spread out the scores are around the center.

  • Range

    • Definition: difference between the highest and lowest score.

    • Example: two datasets with the same center can have different spreads; range can be similar even if data are very differently distributed in between.

    • Drawbacks: highly sensitive to extreme scores; provides minimal information about the distribution beyond the endpoints.

  • Deviation scores

    • Definition: deviation of each score from the mean: $d_i = x_i - \bar{x}$

    • Sum of deviations is zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

    • This zero-sum property motivates the move to squared deviations for a usable variability measure.

  • Variance

    • Definition: average of squared deviations; measures the spread in squared units.

    • Formula for a sample (as used in this course): $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$.

    • Interpretation: the average squared distance from the mean; the quantity is in units², which can be hard to interpret directly.

    • Sums of Squares (SS) shorthand: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$, so $s^2 = \frac{SS}{n}$.

    • Relationship to data: variance increases with dispersion; higher SS or higher average squared deviations ⇒ larger variance.

  • Standard deviation

    • Definition: square root of the variance; brings the metric back to original units.

    • Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.

    • Interpretation: typical distance of a score from the mean in the original units; e.g., if units are centimeters, SD is in centimeters.

  • Why use variance and standard deviation

    • Variance is a convenient stepping stone to many formulas (least squares, ANOVA, regression, etc.).

    • Standard deviation is more interpretable because it is in the same units as the data.

  • Example walkthrough (small dataset)

    • Data: $x = [2, 4, 8, 10]$, $n = 4$; $\bar{x} = \frac{2+4+8+10}{4} = 6$.

    • Deviations: $d = [2-6, 4-6, 8-6, 10-6] = [-4, -2, 2, 4]$.

    • Squared deviations: $d^2 = [16, 4, 4, 16]$.

    • Sum of squares: $SS = 40$; $s^2 = SS/n = 40/4 = 10$; $s = \sqrt{10} \approx 3.16$.

    • Interpretation: typical deviation from the mean is about 3.16 units.

  • Larger example (from the lecture)

    • Data: 10 values with mean 16 and SS = 168; $s^2 = 168/10 = 16.8$, $s = \sqrt{16.8} \approx 4.10$.

    • Note: the standard deviation of about 4.10 gives a sense of spread around the mean; scores within about 4 units of the mean make up the central portion of the data.

  • Important properties and interpretations

    • Units: variance in units²; standard deviation in original units.

    • The normal distribution has a special, well-known relationship with SD via the 68-95-99.7 rule (the next point).

    • The standard deviation is the key descriptor of variability used in many inferential techniques (e.g., confidence intervals, z-scores, t-tests) because it connects the spread to the mean in a directly interpretable way.

  • The 68-95-99.7 rule (for normally distributed data)

    • About 68% of data fall within one standard deviation of the mean: $\bar{x} \pm s$

    • About 95% within two standard deviations: $\bar{x} \pm 2s$

    • About 99.7% within three standard deviations: $\bar{x} \pm 3s$

    • This rule helps interpret how typical values lie relative to the mean in a normal distribution and underpins standardization via Z scores.
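
The three percentages in the rule can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a sketch for intuition rather than part of the lecture; the helper name `area_within` is an assumption. The three-SD figure comes out at about 99.73%, which is why the rule is usually quoted as 68-95-99.7.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```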

  • Practical implications of variability

    • Low variability around the mean means individuals are close to the mean; in school planning, you can tailor a lesson around the mean with confidence that most students perform similarly.

    • High variability means some individuals will be far from the mean; teaching, testing, or evaluation should accommodate a broader range of abilities.

    • In decision-making (e.g., selecting players, setting policies), knowing variability informs risk and planning (e.g., two players with same mean but different variability differ in reliability).
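
The small-dataset walkthrough from this section (x = [2, 4, 8, 10]) can be reproduced step by step. The sketch below uses the divide-by-n form of the variance that these notes use; the variable names are assumptions.

```python
import math

x = [2, 4, 8, 10]
n = len(x)
mean = sum(x) / n                      # 6.0
deviations = [xi - mean for xi in x]   # [-4.0, -2.0, 2.0, 4.0]
ss = sum(d ** 2 for d in deviations)   # sum of squares: 40.0
variance = ss / n                      # 10.0 (divide-by-n form)
sd = math.sqrt(variance)               # about 3.16

print(sum(deviations))                   # 0.0
print(mean, ss, variance, round(sd, 2))  # 6.0 40.0 10.0 3.16
```

Many textbooks and software defaults divide by n − 1 instead when estimating a population variance from a sample. In Python's `statistics` module, `pvariance` and `pstdev` match the divide-by-n form shown here, while `variance` and `stdev` divide by n − 1.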

Population vs. Sample; Parameters vs. Statistics

  • Population vs. sample concepts

    • Population: the entire group of interest (e.g., all psych 1040 students, all Australians, all humans).

    • Sample: a subset drawn from the population (ideally randomly) to estimate population characteristics.

  • Notation and terminology

    • Population mean: $\mu$ (mu), a parameter (the true mean of the population).

    • Sample mean: $\bar{x}$ or sometimes $m$, a statistic used to estimate the population mean.

    • The idea of an estimator: a statistic (like $\bar{x}$) used to estimate a population parameter (like $\mu$).

    • Unbiasedness of the sample mean: across repeated random samples, the average of the sample means converges to the true population mean.

  • Why sampling matters

    • In practice, you rarely measure the entire population due to cost and feasibility; random sampling provides estimates that are informative about the population.

    • The sample mean as an estimator is central to many statistical methods; its unbiasedness supports inferences about the world.
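
Unbiasedness can be illustrated with a small simulation. Everything here is hypothetical: the population is simulated (10,000 scores with mean 100 and SD 15), and the sample size (25) and number of samples (2,000) are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated scores.
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw many random samples of size 25 and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, 25))
    for _ in range(2_000)
]
mean_of_means = statistics.fmean(sample_means)

# Individual sample means scatter around mu; their average sits close to it.
print(round(mu, 2), round(mean_of_means, 2))
print(round(min(sample_means), 2), round(max(sample_means), 2))
```

Any single sample mean can miss the population mean by several points, but the misses fall on both sides and largely cancel out across samples.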

  • Population parameters vs. sample statistics in research practice

    • Population parameter examples: population mean $\mu$, population variance, etc. (unknown in most real-world cases).

    • Sample statistic examples: sample mean $\bar{x}$, sample variance, sample standard deviation, etc.

  • Real-world implications

    • Polling and market research rely on random samples to estimate population preferences (e.g., voting, consumer behavior).

    • Medical and psychology research generalizes from samples to populations with caveats about representativeness and sampling error.

Putting It All Together: Practical Takeaways and Next Steps

  • When to use which measure of central tendency

    • If data are roughly symmetric and not heavily skewed: mean is a good default; it uses all data and supports many formulas.

    • If data are skewed or have meaningful outliers: median provides a more robust “typical” value.

    • If data are nominal: mode is the primary descriptive statistic; not suitable for many calculations.

    • In bimodal distributions: report mode(s) and consider the context; mean/median can be misleading about the most typical values.

  • When to report which measure of variability

    • For many purposes, report the standard deviation because it is in the same units as the data and aligns with the mean to describe spread around the center.

    • Range can be reported for a quick, rough sense of spread but does not describe the distribution between endpoints.

  • Relationship to next topics in the course

    • We will build on Z scores (standardization) to enable comparisons across different scales and conditions.

    • Z scores underpin t tests and other inferential methods introduced later in the course.

  • Practical course guidance discussed in the lecture

    • Mid-semester exam coverage: weeks 1–4; calculators permitted (approved models via the Blackboard page).

    • Emphasis on using tutorials for assignments; tutorials often specify exactly what is required to earn full marks.

    • Recommended readings: chapter 2 of the Aaron textbook; finish problem set 2.1 (questions 1–4); extend module materials; revision for the mid-semester exam.

  • Ethical and practical notes

    • It is important to consider whether your sample is representative of the population when making inferences.

    • Random sampling helps ensure representativeness; biased samples can lead to misleading parameter estimates.

    • In applied contexts (education, health psychology), understanding central tendency and variability supports fair and effective decision-making and policy.

Key Formulas (recap)

  • Mean (sample): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

  • Variance (sample): $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

  • Standard deviation (sample): $s = \sqrt{s^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

  • Sums of squares (SS): $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$

  • Sum of deviations from the mean: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

  • Normal distribution intuition (68-95-99.7 rule):

    • Within one standard deviation, $\bar{x} - s \leq x \leq \bar{x} + s$ contains about 68% of the data; within two standard deviations contains about 95%; within three contains about 99.7%.


    Z-Scores and the Normal Distribution (Lecture Notes)

    • Acknowledgment of country: recognize the traditional owners of the lands where we meet, their ancestors and descendants, and their cultural and spiritual connections to country, acknowledging their contributions to Australian and global society and that these lands have been sites of education and research for millennia.

    • Week-to-week building: this week builds on last week's topic (distributions) and sets up next week's focus (correlations), which rely on z-scores. Correlations use z-scores in their calculation and are a core method in research and the basis of the upcoming assignment.

    • Recap: distributions have three key characteristics to describe them:

      • Shape

      • Measure of central tendency (where the center sits; the middle score)

      • Measure of spread (how spread out the scores are)

      • When a distribution is symmetric, unimodal, and bell-shaped around its mean, it is described as a normal distribution (bell curve, Gaussian). Symmetry alone is not enough: a flat (uniform) distribution is symmetric but not normal.

    • Core concept for this week: z-scores are raw scores re-expressed in units of standard deviations (SD) from the mean. They standardize scores from any distribution, enabling comparisons across different scales.

    • Connection to standard deviation (SD): we already learned how to compute SD last week. Z-scores convert raw scores into standard deviation units using that SD. This is a unit conversion, not a change in the relative position of scores.

    • Why standardization matters: converting to z-scores allows comparisons across apples and oranges (different scales), and enables precise probability calculations under the standard normal curve.

    • What you’ll learn and how it fits into the course:

      • This week builds on SDs and normality; next week covers correlations, which use z-scores and form the basis of the assignment.

      • The normal distribution is a mathematical construct that can be described with a formula, allowing calculations beyond direct empirical data, and enabling a uniform framework across many variables.

    • What is a normal distribution? a family of symmetric, bell-shaped curves where:

      • The mean, median, and mode coincide (in a perfectly normal distribution).

      • The spread is characterized by SD; different distributions can have different means and SDs.

      • With enough data, many real-world variables (e.g., height, IQ) approach this ideal shape.

    • Why can we rely on a normal shape? The central limit theorem says that the distribution of sample means approaches a normal distribution as sample size grows, and this underpins parametric tests (e.g., t-tests). If the data are roughly normally distributed or the sample size is large, many statistical procedures work well and yield inferences about populations.

    • The standard deviation as a central concept:

      • It reflects the typical distance of scores from the mean.

      • It is the unit used to express dispersion; converting data to SD units yields z-scores.

    • Units and conversions (intuition):

      • Converting to SD units is a change of units, akin to converting height between centimeters and inches. The underlying quantity doesn’t change; only the units used to express it do.

      • Example intuition: Phil’s height (in inches vs. cm) is the same measurement represented differently; the same goes for standard deviation when converting to z-scores.

    • What is a z-score? The deviation of a score from the mean, expressed in units of SD.

      • Positive z-scores: above the mean; negative z-scores: below the mean; z ≈ 0: around the mean.

      • Z-score is a standard score: it converts raw scores into standard units, enabling comparisons across distributions.

    • Notation for mean and SD (population vs. sample):

      • Population mean: μ (mu)

      • Population SD: σ (sigma)

      • Sample mean: m (often written as $\bar{x}$ or M elsewhere; here referenced as m)

      • Sample SD: s.d. (often denoted as s or s_d)

      • In practice, if a formula uses μ and σ you’re dealing with population parameters; if it uses m and s (or s.d.) you’re dealing with a sample.

    • The z-score formula (transformation to standard normal):

      • For a score x from a distribution with mean μ and SD σ: $z = \frac{x - \mu}{\sigma}$

      • Conversely, given a z-score and the distribution parameters, the original score can be recovered by: $x = z\sigma + \mu$

      • For a sample, replace μ with the sample mean m and σ with the sample SD s: $z = \frac{x - m}{s}$ and $x = zs + m$

    • Why z-scores are useful:

      • They put different distributions on a common scale (standard normal with mean 0 and SD 1).

      • They allow meaningful comparisons across different measures (e.g., test scores from different subjects).

      • They enable precise probability statements about where a score lies in its distribution, using the standard normal curve.

    • Three distributions: a demonstration of standardization across different spreads:

      • Case A: mean = 100, SD = 10, score x = 110

      • Deviation = 110 - 100 = 10; z = 10/10 = 1.

      • Case B: mean = 100, SD = 15, score x = 110

      • Deviation = 10; z = 10/15 ≈ 0.67.

      • Case C: mean = 100, SD = 25, score x = 110

      • Deviation = 10; z = 10/25 = 0.40.

      • Observation: the same raw score (110) is more unusual in Case A (smaller SD, so the 10-point deviation is a full SD) than in Case C (larger SD, so the same deviation is only 0.4 SD). Z-scores express each score's position relative to the spread of its own distribution.
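
The three cases above can be reproduced with two small helper functions. The names `z_score` and `raw_score` are assumptions made for this sketch; the numbers are the ones given in the notes.

```python
def z_score(x, mean, sd):
    """Deviation from the mean expressed in SD units."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Inverse transformation: recover the raw score from a z-score."""
    return z * sd + mean


# (mean, SD) for the three distributions; the raw score is 110 in each case.
cases = {"A": (100, 10), "B": (100, 15), "C": (100, 25)}
for label, (mean, sd) in cases.items():
    print(label, round(z_score(110, mean, sd), 2))
# A 1.0
# B 0.67
# C 0.4
```

Applying `raw_score` to each result returns 110 again, which shows that standardizing only changes the units and loses no information.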

    • Why the same score can be equally 'unusual' across distributions:

      • If two distributions have the same relative position (e.g., 1 SD above the mean) but different spreads, the z-score places them at the same relative location in standard units. This allows fair comparison of scores from different contexts.

    • Practical example: comparing two courses with different grading schemes

      • If calculus has mean 60 and SD 10, and another subject has mean 90 and SD 15, raw scores of 70 and 105 cannot be compared directly. Converting to z-scores shows that both sit exactly 1 SD above their course mean (z = (70 - 60)/10 = 1 and z = (105 - 90)/15 = 1), so the two performances are equivalent relative to each course’s distribution.

    • Real-world examples (illustrative):

      • Don Bradman (cricket) vs. Ted Williams (baseball):

      • By converting their sport-specific scores to z-scores, you can compare dominance within their peers despite different scales and sports.

      • Example approximations mentioned in the lecture: Bradman’s z-score was extremely high (described as around 4 to 5 SDs above the mean in the example). Williams also scored highly in his own distribution, but the z-score difference highlights relative outperformance within each sport.

      • Einstein’s IQ (reported around 180):

      • IQ tests are designed to have μ = 100 and σ = 15, so z = (180 - 100)/15 ≈ 5.33. Such a score is extraordinarily rare under the normal assumption.
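
To put a number on "extraordinarily rare", the upper-tail area beyond an IQ of 180 can be computed with `statistics.NormalDist`. This is a sketch that takes the normal model and the reported score at face value; real IQ distributions are not guaranteed to be normal this far into the tail.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ scale as described in the notes

z = (180 - 100) / 15
# Upper-tail probability: proportion of a normal population scoring 180 or more.
p_above = 1 - iq.cdf(180)

print(round(z, 2))       # 5.33
print(f"{p_above:.1e}")  # a probability well below one in a million
```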

    • Worked examples (practice with z-scores and back-conversion):

      • Example 1 (reverse from z to raw score):

      • Given mean μ = 55, SD σ = 3, and desired z = 1.5

      • Convert to raw score: $x = \mu + z\sigma = 55 + 1.5 \times 3 = 59.5$

      • Example 2 (reverse for another subject):

      • Given μ = 60, σ = 10, z = -0.4

      • $x = 60 + (-0.4) \times 10 = 56$

      • These show that converting to z-scores and back is straightforward and consistent.

    • The standard normal distribution and its properties:

      • If we standardize any normal distribution, we get the standard normal distribution with: $\mu_Z = 0, \quad \sigma_Z = 1$

      • The shape is preserved in relative positions; only the axis labels change.

      • The normal curve is symmetric; probabilities on one side mirror the other.

      • The area under the curve from -∞ to ∞ is 1 (i.e., 100% of data).

      • The point of inflection occurs at one SD from the mean (in a standard normal, at Z = ±1).

    • Reading and using z-tables (the practical tool):

      • A z-table provides, for a given Z:

      • The area between the mean (0) and Z: P(0 ≤ Z ≤ z)

      • The area beyond Z: P(Z ≥ z) or P(Z ≤ -z) depending on tail orientation

      • Because the standard normal is symmetric, you can obtain negative-side probabilities from positive-side values by symmetry.

      • Common reference values:

      • For Z = 1, the area between the mean and Z is 34.13% (0.3413).

      • For Z = 1.96, the tail area beyond Z is about 2.5% on one side (hence 5% two-tailed).

      • The 68-95-99.7 rule (empirical rule):

      • Approximately 68% of data lie within ±1 SD

      • Approximately 95% lie within ±2 SDs

      • Approximately 99.7% lie within ±3 SDs

      • These probabilities apply to any normally distributed variable after standardization.

    • Percentiles and z-tables in practice:

      • Percentile rank of a score is the percentage of scores below that value.

      • To find a percentile from a positive z-score, use the z-table to get the area between the mean and the z-score, and add 50% (to account for the below-mean half). For a negative z-score, subtract that area from 50% instead.

      • Example: IQ = 115 with μ = 100, σ = 15

      • z = (115 - 100)/15 = 1

      • Area between mean and Z = 34.13% → percentile = 50% + 34.13% = 84.13%

      • For very high percentiles (e.g., 98th), the z-table may not list the exact area; interpolate or use the nearest z-value (e.g., Z ≈ 2.05 for the 98th percentile in many tables).

    • Worked percentile examples and applications:

      • Mensa cutoff: 98th percentile on IQ tests. Steps illustrated:

      • The 98th percentile leaves an upper-tail area of 0.02; equivalently, the area between the mean and the cutoff is 0.98 - 0.50 = 0.48.

      • The corresponding z-score is about Z ≈ 2.05.

      • Convert back to raw score: $X = \mu + Z\sigma = 100 + 2.05 \times 15 = 130.75$

      • Between 100 and 115 on IQ (μ = 100, σ = 15):

      • z = (115 - 100)/15 = 1

      • The area between the mean and 115 is 34.13%

      • Therefore, the percentile rank of 115 (all the area to its left) is 50% + 34.13% = 84.13%

      • Top 5% scores on IQ:

      • Find z such that P(Z ≥ z) = 0.05 (tail probability)

      • z ≈ 1.64 (upper-tail cutoff)

      • Convert back to raw IQ score: $X = \mu + z\sigma = 100 + 1.64 \times 15 = 124.6$

      • Two-tailed 5% significance example:

      • For a two-tailed test with α = 0.05, each tail is 2.5%.

      • Critical Z values: Z = ±1.96.

      • Corresponding IQ cutoffs: upper ≈ 100 + 1.96×15 = 129.4; lower ≈ 100 - 1.96×15 = 70.6.

    • Practical workflow using z-scores in research:

      • Step 1: Check if the data are approximately normally distributed (or use a large enough sample via the Central Limit Theorem).

      • Step 2: Standardize scores to z-scores using $z = \frac{x - \mu}{\sigma}$.

      • Step 3: Use the standard normal curve to determine probabilities, percentiles, and relative standing via z-tables (or calculators).

      • Step 4: If needed, convert back to raw scores using $x = \mu + z\sigma$ to report concrete values.

      • Step 5: For comparisons across different measures, compare z-scores rather than raw scores.
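The workflow steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture materials: it assumes the variable is normally distributed and uses the standard library's `statistics.NormalDist` in place of a printed z-table. The function names are made up for this example.

```python
from statistics import NormalDist

# The standard normal curve (mean 0, SD 1) stands in for the printed z-table.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def to_z(x, mu, sigma):
    """Standardize a raw score: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def to_raw(z, mu, sigma):
    """Convert a z-score back to a raw score: x = mu + z * sigma."""
    return mu + z * sigma


def percentile_rank(x, mu, sigma):
    """Percentage of scores falling below x in a normal population."""
    return 100.0 * STANDARD_NORMAL.cdf(to_z(x, mu, sigma))


def cutoff_for_percentile(pct, mu, sigma):
    """Raw score sitting at the given percentile (0 < pct < 100)."""
    return to_raw(STANDARD_NORMAL.inv_cdf(pct / 100.0), mu, sigma)


print(round(percentile_rank(115, 100, 15), 2))       # IQ of 115 -> 84.13
print(round(cutoff_for_percentile(98, 100, 15), 1))  # 98th percentile -> 130.8
```

The 98th-percentile cutoff comes out slightly above the table-based 130.75 because the exact z-value is closer to 2.054 than to the rounded 2.05.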

    • Important practical implications and uses:

      • Z-scores provide a precise, standardized way to understand how far a score is from the mean, in SD units.

      • They enable meaningful comparisons across different measures and scales (e.g., cross-subject performance, different sports, or different tests).

      • They underpin probabilities and expectations about populations, enabling precise predictions and clinical cutoffs (e.g., a roughly 5% tail cutoff used to flag possible prosopagnosia).

      • They form the backbone of the standard normal curve, so results about probabilities and percentiles generalize across all normally distributed variables.

    • Real-world connections and philosophical notes:

      • The normal distribution is pervasive in psychology and natural phenomena; many variables are approximately normal, and distributions of sample means approach normality as sample size grows (the central limit theorem).

      • Statistical reasoning using z-scores is a key skill for making inferences about populations from samples.

      • The practice has ethical and practical implications when used for clinical cutoffs or decisions (e.g., diagnosing conditions, eligibility for programs). Cutoffs are based on percentile/tail criteria, and the choice of α (e.g., 0.05) reflects conventions about balancing false positives and false negatives.

    • Quick connections to next topics:

      • Next week: correlations (which rely on z-scores for computation and interpretation).

      • Correlation analysis uses standardized scores to measure relationships between variables on different scales.

    • Summary takeaways:

      • Z-scores convert raw scores into SD units, enabling unit-free comparisons and precise probability calculations under the normal curve.

      • The standard normal distribution (mean 0, SD 1) provides a universal reference for all normal distributions.

      • Use z-tables (or calculators) to obtain percentile ranks and tail probabilities, and convert back to raw scores when needed.

      • The 68-95-99.7 rule gives quick intuition about where most data lie relative to the mean in a normal distribution.
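As a quick check on the 68-95-99.7 rule, the areas can be computed directly from the standard normal curve. This is a small sketch using Python's `statistics.NormalDist`; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist


def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    z = NormalDist()  # defaults to mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * area_within(k):.2f}%")
```

The exact areas are close to, but not exactly, 68%, 95%, and 99.7%; the 95% figure corresponds more precisely to ±1.96 SDs than to ±2.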

    • Readings and prep for next lecture:

      • Reading: Chapter 3 of the textbook (this week).

      • Preparation for next week's lecture: Chapter 11 (Correlations).

      • Practice problems in UQ Extend Module 6.

    • Exam reminder (brief): details about permitted items and logistics for the upcoming exam are provided in class materials; bring pencils, ID, and any approved resources as specified.

    • Final note: the aim of this content is not just to pass an exam but to understand how distributions work in the world and how we can make precise inferences about them using z-scores and the normal distribution.


    Introduction to Hypothesis Testing, Probability, and Sampling Distributions

    This week's lecture, delivered by Josh Sabio, focuses on introducing hypothesis testing, probability, and sampling distributions. It builds upon previous lectures and introduces a cornerstone concept for inferential statistics.

    Acknowledgement of Country

    The University of Queensland acknowledges the traditional owners and their custodianship of the lands on which we meet, paying respects to their ancestors and descendants who maintain cultural and spiritual connections to country, and recognizing their valuable contributions to Australian and global society.

    Recalling the Normal Distribution

    We begin by recalling the normal distribution, a fundamental concept from previous lectures. It is useful because many variables of interest to psychologists, such as IQ, short-term memory capacity, personality traits, and even facial attractiveness, tend to be normally distributed. If a distribution is normal, we can use z-tables to determine the probability of certain values occurring within it. For example:

    • The probability that an individual's score is one standard deviation below the mean.

    • The probability that values fall between specific intervals (e.g., within one standard deviation of the mean).

    However, the utility of z-tables is limited to variables that are explicitly normally distributed. If a variable of interest is not normal, direct application of z-tables or finding the area under the curve is not possible. This week introduces a distribution that is always normal (under certain conditions), forming the foundation of inferential statistics.

    Core Concepts of Inferential Statistics

    This lecture lays the groundwork for understanding inferential statistics, covering:

    • Characteristics of populations versus samples.

    • Factors affecting sampling, such as sampling variability and sampling error.

    • The supremely important sampling distribution of the mean, including its characteristics, the standard error of the mean (SEM), and its connection to the Central Limit Theorem.

    • Practical applications in exam-style questions.

    Populations vs. Samples

    • Population: Refers to the entire group of individuals or observations that share a particular characteristic and to which researchers wish to generalize their conclusions. Its size and characteristics depend on how it's defined (e.g., all Australian citizens, all marmosets). Researchers typically aim to make conclusions about the population at large.

    • Sample: A subset of the population, chosen for research due to feasibility and affordability limitations (e.g., it's impractical to study all $30$ million Australian residents). In rare cases (e.g., studying a rare disease or a specific, small cohort like all students in a course), access to a full population might be possible.

    Notation: Parameters vs. Statistics
    • Parameters: Characteristics of a population, denoted with Greek letters.

      • Population mean: mu ($\mu$).

      • Population standard deviation: sigma ($\sigma$).

    • Statistics: Characteristics of a sample, denoted with Roman letters.

      • Sample mean: $ M $.

      • Sample standard deviation: $ S $.

    Sampling Variability and Sampling Error

    • Random Sampling: The principle that individuals are selected from a population such that each has an equal and independent chance of being chosen. This aligns with the concept of independent draws.

    • Sampling Error: The inherent difference between a randomly drawn sample's statistics and the corresponding population parameters. A sample's mean will inevitably deviate from the population's mean (e.g., a handful of balls from a basket will have a mean different from the whole basket).

    • Sampling Variability: The fact that, due to chance, two random samples drawn from the same population will have different statistics. Iteratively drawing samples will demonstrate this fluctuation (e.g., multiple petri dish samples from the same pool of bacteria will show varying bacterial counts).
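Sampling error and sampling variability are easy to see in a small simulation. The sketch below is only an illustration under stated assumptions: the population is a made-up set of 10,000 roughly normal scores (mean 100, SD 15), and the sample size of 25 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores, roughly normal (mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.mean(population)


def draw_sample_mean(n):
    """Mean of one simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(population, n))


# Sampling variability: five samples from the same population give five means.
sample_means = [draw_sample_mean(25) for _ in range(5)]

# Sampling error: each sample mean misses the population mean by some amount.
sampling_errors = [m - mu for m in sample_means]

for m, err in zip(sample_means, sampling_errors):
    print(f"M = {m:.2f}, sampling error = {err:+.2f}")
```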

    Literary Digest Example (1936 Presidential Election)

    This historical example illustrates the importance of unbiased random sampling. Literary Digest predicted a Landon victory based on $2$ million responses from $10$ million questionnaires. However, Roosevelt won. The survey was biased because it used car registries and telephone numbers during the Great Depression, inherently sampling wealthier individuals who could afford such luxuries, thus not representing the general population.

    Probability - First Principles

    Understanding probability is crucial for inferential statistics.

    • Basic Rule: For any event, the probability that it will occur plus the probability that it will not occur equals 1: $P(\text{event}) + P(\text{not event}) = 1$.

    • Exact Probabilities: These can be derived from frequency distributions through recursive computation.

    Birthday Problem Example
    • The probability that at least $2$ of $3$ people share a birthday is $P \approx 0.008$. This is computed by multiplying the probabilities that each person does not share a birthday with the previous ones ($365/365 \times 364/365 \times 363/365$, and so on for larger groups) and subtracting the product from $1$.

    • When plotted, the probability of a shared birthday rapidly increases with the number of people in a room. In a classroom of $30$ people, the probability of at least one shared birthday is approximately $70.63\%$.
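The birthday calculation can be sketched as a short loop. This illustration assumes 365 equally likely birthdays and ignores leap years; the function name is made up for the example.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_different = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


print(round(p_shared_birthday(3), 4))   # 0.0082
print(round(p_shared_birthday(30), 4))  # 0.7063
```

The complement rule does the work here: it is easier to compute the chance that everyone differs and subtract that from 1.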

    Jar of Balls Example
    • Jar 1: $10$ green, $10$ red balls ($20$ total). $P(\text{red ball}) = 10/20 = 0.5$ ($50\%$).

    • Jar 2: $19$ green, $1$ red ball ($20$ total). $P(\text{red ball}) = 1/20 = 0.05$ ($5\%$).

    • Conclusion: If we know the composition of a population, we can make statements about the probability of events. This connection is fundamental to inferential statistics.

    Certainty and Statistical Convention
    • $P = 1.0$: Absolutely certain (e.g., death and taxes).

    • $P = 0.5$: $50/50$ chance (e.g., pulling a red card from a deck).

    • $P = 0.25$: Pulling a diamond from a deck ($13 \text{ diamonds} / 52 \text{ cards} = 0.25$).

    • $P = 0.038$: Pulling a red two from a $52$-card deck ($2 \text{ red twos} / 52 \text{ cards} \approx 0.038$).

    • Convention: In statistics, $P < 0.05$ is arbitrarily adopted as the threshold for a rare event or a significant effect. This means an event occurring only $5\%$ of the time is considered unusual or truly different.

    The Gambler's Fallacy and Independent Draws
    • Independent Draws: The outcome of one trial does not influence the distribution of outcomes in the next (e.g., one coin flip doesn't affect the next). The


    Introduction to Hypothesis Testing

    • Welcome and importance of the lecture

    • This lecture builds on previous concepts, focusing on hypothesis testing as it is critical for conducting experiments and interpreting results in science.

    Overview of Topics Covered

    • Statistical inference and decision-making processes under uncertainty

    • Understanding null hypothesis and alternative hypothesis

    • Explanation of statistical significance

    • Review of sampling distributions

    • Identifying decision errors in hypothesis testing

    Concept of Statistical Inference

    • Science often deals with uncertainty when asking questions (example: Is smoking harmful?).

    • Hypothesis testing provides a framework for answering these questions, giving us likelihood rather than certainty.

    • Statistical tests can tell us how probable the observed results would be if chance alone were operating, but they do not confirm the correctness of a theory or the quality of the experiment conducted.

    Null and Alternative Hypotheses

    • Null Hypothesis (H0): Assumes that there is no effect or no difference; serves as a default position.

      • Example: "Distraction does not impair memory performance."

    • Alternative Hypothesis (H1): Represents what the researcher aims to find support for, stating that there is an effect or a difference.

      • Example: "Distraction impairs memory performance."

    • Importance of accurately formulating these hypotheses when designing experiments.

    Statistical Significance

    • Statistical Significance: A result is significant if it is unlikely to have occurred under the null hypothesis, typically denoted as $p < 0.05$.

    • The p-value is the probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true.

    • Conventionally, a p-value of less than 0.05 is accepted for statistical significance; with this threshold, there is a 5% chance of committing a Type I error when the null hypothesis is actually true.

    Sampling Distributions

    • Defined characteristics of samples: shape, central tendency (mean, median, mode), and spread (range, variance, standard deviation).

    • Sampling Distribution of the Mean: Distributions are constructed by repeatedly sampling from the population and calculating means.

    • Central Limit Theorem states that as sample size increases, the distribution of sample means approaches a normal distribution.

      • This applies regardless of the population distribution.

    • Standard Error of the Mean (SEM): Indicates how much the sample mean is expected to vary from the true population mean; defined as the population standard deviation ($\sigma$) divided by the square root of the sample size ($n$):
      $SEM = \frac{\sigma}{\sqrt{n}}$
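The Central Limit Theorem and the SEM formula can be checked with a simulation. The sketch below is an illustration only: it assumes a deliberately skewed population (exponential, with mean 1 and SD 1), and the sample size of 30 and the 5,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

N_PER_SAMPLE = 30   # size of each sample (arbitrary choice)
N_SAMPLES = 5000    # number of repeated samples (arbitrary choice)


def sample_mean(n):
    """Mean of n draws from a skewed population (exponential, mean 1, SD 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


# Build the sampling distribution of the mean by repeated sampling.
means = [sample_mean(N_PER_SAMPLE) for _ in range(N_SAMPLES)]

predicted_sem = 1.0 / math.sqrt(N_PER_SAMPLE)  # sigma / sqrt(n) with sigma = 1
observed_sd = statistics.stdev(means)

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1)")
print(f"SD of sample means:   {observed_sd:.3f} (predicted SEM {predicted_sem:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$.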

    Decision Errors in Hypothesis Testing

    • Decision Errors: Errors that can be made during hypothesis testing, specifically Type I and Type II.

      • Type I Error (False Positive): Rejecting the null hypothesis when it is true.

      • Example: Concluding a drug is effective when it actually is not. Occurs with probability $α$ (commonly set at 0.05).

      • Type II Error (False Negative): Retaining the null hypothesis when it is false.

      • Example: Concluding a drug is ineffective when it actually works. Occurs with probability $β$.

    • The balance between reducing Type I and Type II errors is essential; reducing one can often increase the other.
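The meaning of $α$ as a Type I error rate can be illustrated by simulating many experiments in which the null hypothesis is true. This is a sketch under stated assumptions: a made-up normal population (mean 100, SD 15), samples of 25, and a two-tailed z-test with critical value 1.96.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

MU, SIGMA, N = 100.0, 15.0, 25  # hypothetical population and sample size
Z_CRIT = 1.96                   # two-tailed critical value for alpha = 0.05
SEM = SIGMA / math.sqrt(N)


def one_experiment():
    """Draw one sample with H0 true; return True if the z-test rejects H0."""
    m = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    return abs((m - MU) / SEM) > Z_CRIT


n_experiments = 4000
false_positive_rate = sum(one_experiment() for _ in range(n_experiments)) / n_experiments
print(f"Proportion of false positives: {false_positive_rate:.3f}")
```

Every rejection in this simulation is a false positive, and the proportion should land near 0.05, which is what setting $α = 0.05$ means.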

    Importance of Proper Sample Size and Design

    • The size of the sample impacts the SEM; larger samples typically lead to a smaller SEM, thus increasing the likelihood of detecting true differences.

    • Proper experimental design helps mitigate bias in sampling and reduces potential errors.

    Conclusion

    • Understanding hypothesis testing, its foundations, and implications is crucial for conducting scientific research and interpreting data results clearly and accurately.

    • Emphasis on continuous learning as the concepts will be revisited in future lectures.

    • Next topics will include practical applications of t-tests in hypothesis testing and practical examples.

    • Reminder about important readings and quiz deadlines for reinforcing learning.


    Introduction and Review of Previous Lecture

    • Start of class after break

    • Reminder for students to have taken their time off during break

    • Focus: Reviewing the sampling distribution of the mean

      • Understanding means expected from a particular population

    Sampling Distribution of the Mean

    • Definition:

      • A population has a known mean ($\mu$) and standard deviation ($\sigma$).

      • From this population, one can calculate the distribution of means using sample size ($n$).

      • z-tables can be used to find probabilities for these means.

    • Focus of today's class:

      • Building on previous content regarding z-scores and introducing t-distributions.

      • Explain how today’s lesson consolidates the previous one with a small but critical adjustment.

    Consolidation of Previous Content

    • z-tests were the main focus of prior lectures:

      • Understanding the z-score for individual scores and sample means.

      • The population parameters needed for a z-test (especially $\sigma$) are rarely known for real-world phenomena.

      • Exception: IQ scores that have known means and standard deviations.

    • Transition to t-distributions:

      • Real-world data often does not provide population means or standard deviations.

      • Use of t-distributions to conduct statistical analysis based on estimated population parameters.

    Acknowledgment of Indigenous Lands

    • Pay respect to traditional owners of the land.

    Review of Key Concepts from Previous Lecture

    • Population Distribution:

      • Population mean ($\mu$).

      • Sample from population creating a distribution of means by sampling repeatedly.

      • Mean of the distribution of means is equal to the population mean ($\mu$).

      • Spread is defined by standard error of the mean (SEM):
        $SEM = \frac{\sigma}{\sqrt{n}}$.

      • SEM quantifies the error in mean estimates.

    • Hypothesis Testing:

      • Definition of the null hypothesis ($H_0$) and alternative hypothesis ($H_1$).

      • Emphasis on testing the null hypothesis and not the alternative hypothesis.

      • Errors in hypothesis testing: Type I error (false positive) and Type II error (false negative).

      • Importance of controlling confounding factors to validate hypotheses.

    Procedures for Hypothesis Testing

    • Calculating the z-score for means:

      • Given a distribution of sample means, determine the likelihood of a sample mean under the null hypothesis.

      • Typical z-score threshold for significance: ±1.96 (for a two-tailed test at the 5% level).

      • Example: If a mean of a sample falls within expected range, retain $H_0$. If outside, reject it.
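The z-test for a sample mean described above can be sketched as follows. The numbers in the usage lines are hypothetical (an IQ-style population with μ = 100 and σ = 15, and a sample of 36), and the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, z_crit=1.96):
    """Two-tailed z-test of a sample mean against a known population.

    Returns the z-score, the two-tailed p-value, and whether H0 is rejected.
    """
    sem = sigma / math.sqrt(n)        # standard error of the mean
    z = (sample_mean - mu0) / sem     # where the mean falls, in SEM units
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed, abs(z) > z_crit


# Hypothetical sample: mean of 106 from n = 36 people.
z, p, reject = z_test_mean(106, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here the SEM is 15/6 = 2.5, so a mean of 106 sits 2.4 SEMs above the null value, beyond the ±1.96 cutoff.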

    Reiteration of Key Statistical Concepts

    • Null Hypothesis ($H_0$): No difference or effect expected in the measured variables (e.g., reaction times between alcoholics and non-drinkers).

    • Alternative Hypothesis ($H_1$): Expect a difference or effect.

    • Important to establish how to statistically support ($H_1$) through careful experimental design.

    Transition to T-Distributions

    • Definition of t-distribution:

      • Similar shape to normal distribution but varies based on degrees of freedom ($df$).

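      • t-distributions account for the extra uncertainty that comes from estimating the population standard deviation from a sample, which matters most when samples are small.

      • Critical t and z values differ; critical t-values become more extreme with fewer degrees of freedom.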

    Understanding Degrees of Freedom

    • Degrees of freedom ($df$) explained:

      • $df = n - 1$ for a single sample.

      • Concept: how many values can vary independently when estimating a parameter.

    • Example of calculating degrees of freedom:

      • If $n=4$, $df$ will be $3$. One value is fixed to satisfy constraints.

    Variance and Sample Variance

    • Explanation of the variance calculation:

      • Population variance ($\sigma^2$) vs sample variance ($s^2$).

      • Sample variance is calculated with correction: $s^2 = \frac{SS}{n - 1}$ to avoid underestimating variance.

    • Mean of sample means is unbiased, but variance calculations are biased unless corrected.

    • Bessel's Correction: Using $n-1$ corrects bias, making variance an unbiased estimator.
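A small worked example makes the $n - 1$ correction concrete. The four scores below are made up for illustration; they match the $n = 4$, $df = 3$ case above.

```python
import statistics

scores = [2, 4, 4, 6]  # hypothetical sample, n = 4
n = len(scores)
mean = sum(scores) / n

# Deviations from the sample mean always sum to zero, so once n - 1 of them
# are known the last one is fixed: that is the lost degree of freedom.
deviations = [x - mean for x in scores]
ss = sum(d ** 2 for d in deviations)  # sum of squares (SS)

biased_variance = ss / n          # divides by n; tends to underestimate
sample_variance = ss / (n - 1)    # Bessel's correction: divide by df = n - 1

print(f"SS = {ss}, SS/n = {biased_variance:.3f}, SS/(n-1) = {sample_variance:.3f}")

# The standard library uses the same two formulas.
print(statistics.pvariance(scores), statistics.variance(scores))
```

With SS = 8, dividing by $n$ gives 2.0 while dividing by $n - 1$ gives about 2.667, the larger and unbiased estimate.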

    Summing Up Statistics Steps

    • Steps in statistical tests:

      1. Calculate means, standard deviations, apply hypothesis testing frameworks.

      2. Transition between t-tests and z-tests depending on known versus unknown parameters.

      3. Use proper formulae to guide through variance estimation and the implications in reporting results.

    Single Sample T-Test Explained

    • A single-sample t-test applies when the population mean is known but the population standard deviation is not.

    • Basis for using t-tests involves estimating population parameters and applying statistical testing.

      • Defined as observing if the sample mean is significantly different from the known population mean.
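A single-sample t-test can be sketched as below. The ten scores and the null value of 100 are hypothetical, and the critical value 2.262 is the standard two-tailed table value for $df = 9$ at $α = 0.05$.

```python
import math
import statistics


def one_sample_t(scores, mu0):
    """t statistic and degrees of freedom for a single-sample t-test."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)   # sample SD, computed with n - 1
    sem = s / math.sqrt(n)         # estimated standard error of the mean
    return (m - mu0) / sem, n - 1


# Hypothetical sample of 10 scores, tested against a population mean of 100.
scores = [104, 110, 98, 107, 112, 101, 109, 105, 103, 111]
t, df = one_sample_t(scores, mu0=100)

T_CRIT_DF9 = 2.262  # two-tailed critical t for df = 9, alpha = 0.05 (t-table)
print(f"t({df}) = {t:.2f}, reject H0: {abs(t) > T_CRIT_DF9}")
```

The only change from the z-test is that the sample SD replaces $\sigma$, and the cutoff comes from the t-table for the relevant degrees of freedom.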

    Repeated Measures Example

    • Example scenario:

      • Comparing conditions: Self-tickling vs experimenter-tickling.

    • Operationalization and hypothesis framing.

    • Evaluating differences between two conditions using previous statistical methods discussed.
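A repeated measures t-test reduces to a single-sample t-test on the difference scores. The sketch below uses made-up ticklishness ratings for six participants; the data, names, and rating scale are hypothetical and are not the lecture's actual values. The critical value 2.571 is the standard two-tailed table value for $df = 5$ at $α = 0.05$.

```python
import math
import statistics


def paired_t(cond_a, cond_b):
    """t statistic and df for a repeated measures (paired) t-test."""
    diffs = [b - a for a, b in zip(cond_a, cond_b)]  # one difference per person
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sem_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / sem_diff, n - 1  # H0: mean difference is 0


# Hypothetical ratings from the same six participants in both conditions.
self_tickle = [2, 3, 1, 4, 2, 3]
experimenter_tickle = [4, 5, 3, 5, 4, 6]

t, df = paired_t(self_tickle, experimenter_tickle)
T_CRIT_DF5 = 2.571  # two-tailed critical t for df = 5, alpha = 0.05 (t-table)
print(f"t({df}) = {t:.2f}, reject H0: {abs(t) > T_CRIT_DF5}")
```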

    Final Notes and Homework Suggestions

    • Close out by assigning relevant exercises from Chapter 7.

    • Preparation for next week's lecture on independent groups t-tests covering further intricacies in analysis.


    Introduction

    • This lecture is focused on the final statistical tests of the semester, transitioning into exam preparation in the upcoming weeks.

    • The speaker acknowledges the traditional owners of the land, showing respect for their heritage and contributions to societal development, emphasizing the long-standing traditions of research and learning on these lands.

    Overview of Statistical Testing Logic

    • The overall logic of the statistical tests discussed centers around measuring a mean and determining its relationship to a known population distribution.

    • Hypothesis Testing Framework:

      • When a mean falls within a specific range of values (likely area), it is assumed to originate from the hypothesized population distribution, leading to retention of the null hypothesis.

      • If the mean is in an unlikely area, the null hypothesis is rejected, suggesting the mean may come from a different population.

    • The primary goal is to minimize human bias in making scientific conclusions through strict criteria for determining significant effects.

    • Inclusion of these criteria aids in objective decision making, allowing the evaluation of means based on set statistical cutoffs (e.g., t or z cutoffs).

    Central Limit Theorem and Distribution Characteristics

    • The Central Limit Theorem asserts that regardless of a population’s distribution, the distribution of sample means will approach a normal distribution as sample size increases.

    • Key Characteristics of Distributions:

      • Shape: Determining whether the distribution is normal.

      • Center: Mean of the sampling distribution approximating the population mean.

      • Spread: Standard deviation of the distribution, which is crucial for analysis.

    • Most complexity in calculations arises from estimating or working with standard deviations, particularly when unknown.

    Statistical Tests Overview

    • Recap of statistical tests from the semester:

      • Z Test: Applicable when population standard deviation is known.

      • T Test: Used when the population standard deviation is unknown, requiring estimation.

      • Repeated Measures T Test: Involves one group measured under two conditions; the test examines the distribution of difference scores. Such designs must guard against influences like fatigue or carryover.

      • Independent Groups T Test: Compares two different groups with independent data distributions, estimating variance from each sample and pooling the estimates when the population variances are assumed equal.

    Detailed Explanation of Statistical Calculations

    • When population variance is unknown, adjustments are made, specifically using $n - 1$ for degrees of freedom in variance calculations:
      $s^2 = \frac{SS}{n - 1}$, where $SS$ is the sum of squares.

    • The critical part of hypothesis testing revolves around confirming whether your observed mean difference is significant compared to the null expectation.

    • Emphasis on using a pooled variance weighted towards individual group sizes for estimating the population variance when using independent samples.

    Independent Groups T Test Process

    • Logistics of conducting an independent t-test entail:

      • Establishing and confirming assumptions (normal distribution, homogeneity of variance, independence of observations).

      • Calculating individual group means, variances, and pooled variance (the latter being weighted based on sample sizes). This combines estimates based on degrees of freedom to draw sound conclusions on population characteristics.

      • Use variance sum law: the variance of a distribution of differences is equal to the sum of the individual variances:
        $\text{Var}(A - B) = \text{Var}(A) + \text{Var}(B)$

      • The final t-statistic is derived: $t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\text{diff}}}$, where $S_{\text{diff}}$ is the standard error of the mean difference. The results inform whether to reject the null hypothesis based on observed versus expected outcomes from samples.

    Example: Learning to Juggle

    • A practical example demonstrates an independent groups t-test applied to learning efficiency in juggling under two conditions: distributed learning (15 participants) and massed learning (20 participants).

    • Variables: the independent variable is the learning schedule (distributed vs. massed), and the dependent variable is performance, measured as catches made within the initial practice hours.

    • The hypothesis explores:

      • Null Hypothesis (H0): There is no difference in mean performance between learning schedules.

      • Alternative Hypothesis (H1): Mean performance differs between learning schedules (one schedule outperforms the other).

    • Upon calculating means for each group from observed data:

      • Distributed learning mean = 7.4 catches

      • Massed learning mean = 5.2 catches

    • Statistical Testing: The independent groups t-test used the estimated (pooled) variance and the null hypothesis to assess whether the recorded difference between group means fell outside normal sampling error, which would make the case for one approach being better than the other.
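The pooled-variance calculation for the juggling example can be sketched from summary statistics. The group sizes (15 and 20) and means (7.4 and 5.2) come from the example above, but the notes do not give the group variances, so the values 4.0 and 5.0 below are hypothetical placeholders. The critical value of about 2.035 is the two-tailed table value for $df = 33$ at $α = 0.05$.

```python
import math


def pooled_variance(var_1, n_1, var_2, n_2):
    """Average of the two sample variances, weighted by degrees of freedom."""
    df_1, df_2 = n_1 - 1, n_2 - 1
    return (df_1 * var_1 + df_2 * var_2) / (df_1 + df_2)


def independent_t(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """t statistic and df for an independent groups t-test (equal variances)."""
    s2_pooled = pooled_variance(var_1, n_1, var_2, n_2)
    # Variance sum law: the variance of the difference between two means
    # is the sum of the variances of each mean.
    se_diff = math.sqrt(s2_pooled / n_1 + s2_pooled / n_2)
    return (mean_1 - mean_2) / se_diff, n_1 + n_2 - 2


# Means and group sizes from the example; the variances are HYPOTHETICAL.
t, df = independent_t(7.4, 4.0, 15, 5.2, 5.0, 20)

T_CRIT_DF33 = 2.035  # approximate two-tailed critical t for df = 33 (t-table)
print(f"t({df}) = {t:.2f}, reject H0: {abs(t) > T_CRIT_DF33}")
```

With these placeholder variances the difference would be significant, but the actual conclusion depends on the real variances from the data.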

    Conclusion and Key Takeaways

    • The independence of the groups is crucial for valid statistical testing.

    • The speaker emphasizes the importance of correct experimental design and how poor design leads to confounding results that statistics alone cannot clarify.

    • Preparing for the exam requires focus on comprehensive understanding—working definitions and practice problems across different statistical tests.

    • The next week's material will include discussions of effect sizes and confidence intervals, critical tools for interpreting test results in practical applications.


    Overview of Current Lecture Content

    • Introduction and reminders about assignments.

      • Assignment due in three hours.

      • Expectation that all have submitted and that everything is okay.

    • Transition from discussing statistical tests to exploring additional questions in hypothesis testing.

    Disappointment in Hypothesis Testing

    • A common sentiment exists regarding hypothesis testing, specifically its limitations:

      • Main Limitation: We can only assess if a result is likely due to chance.

      • We cannot directly support the alternative hypothesis.

    • The course will cover additional methods to address significant questions such as:

      • How big is the effect likely to be?

      • Confidence intervals and their significance in understanding data sets.

    Confidence Intervals

    • Definition: A confidence interval provides a plausible range for estimates, beyond the single point estimate.

    • Application: The lecture today will discuss:

      • How to calculate confidence intervals within various contexts and statistical tests.

      • The construction involves wrapping a "buffer zone" around point estimates to account for potential error.

    Example of Confidence Intervals

    • Confidence intervals are essential for understanding variability and uncertainty:

      • For example, when taking a sample mean, we can extend the analysis to determine probable ranges where the true mean exists.

      • This leads to determining the confidence interval for sample data, ensuring estimates consider sampling errors.
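The "buffer zone" idea can be sketched as the sample mean plus or minus a critical value times the estimated standard error. The ten scores below are hypothetical, and 2.262 is the standard two-tailed table value for $df = 9$ at the 95% level.

```python
import math
import statistics


def mean_confidence_interval(scores, t_crit):
    """Confidence interval for a mean: M ± t_crit × (s / sqrt(n))."""
    n = len(scores)
    m = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # estimated standard error
    margin = t_crit * sem                          # the "buffer zone"
    return m - margin, m + margin


# Hypothetical sample of 10 scores; df = 9, so t_crit = 2.262 for a 95% CI.
scores = [104, 110, 98, 107, 112, 101, 109, 105, 103, 111]
low, high = mean_confidence_interval(scores, t_crit=2.262)
print(f"M = {statistics.mean(scores)}, 95% CI [{low:.2f}, {high:.2f}]")
```

For these scores the interval runs from about 102.71 to 109.29. A null value of 100 falls outside it, which matches what a two-tailed t-test at $α = 0.05$ would conclude for the same data.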

    Relations to Other Statistical Concepts

    • Connection Between Hypothesis Testing and Confidence Intervals:

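      • Confidence intervals carry the same decision information as p-values (a 95% interval that excludes the null value corresponds to p < 0.05) while also showing a plausible range for the estimate. Both are now standard practice in scientific reporting.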

    • Example Statement in Scientific Writing:

      • When writing journal articles, results typically include:

      • Mean improvements

      • T-value

      • P-value

      • 95% confidence interval

      • Effect size such as Cohen's d.

    Evolution of Statistical Practices

    • Discussion of historical context regarding the replication crisis in psychology statistics:

      • Approximately 15 years ago, the replication crisis highlighted that the results of many established studies could not be replicated.

      • The field is increasingly aware of statistical limitations and is transitioning towards better statistical practices.

    Effect Sizes

    • Transition into effect sizes as a concept tied closely with confidence intervals:

      • Definition: An effect size measures how substantial or impactful a result is, expressing how big an effect is relative to the variability in the data.

    Exploring Effect Sizes Further

    • Cohen's d: The most common effect size measure, standardizing effects across different studies.

    • It addresses additional concerns surrounding nuances of significance vs. meaningfulness.

    Small vs. Big Effects

    • The influence of sample size on the detection of effects:

      • A significant finding may not equate to clinical or practical importance, especially if effects are tiny (e.g., 6.33 milliseconds in a tickling condition).

      • Importance of reporting effect sizes in studies.
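The gap between significance and meaningfulness can be sketched numerically. The example below assumes two independent groups of equal size, where the t statistic equals Cohen's d times $\sqrt{n/2}$; the means, pooled SD, and group sizes are hypothetical.

```python
import math


def cohens_d(mean_1, mean_2, pooled_sd):
    """Cohen's d: the mean difference expressed in pooled SD units."""
    return (mean_1 - mean_2) / pooled_sd


def t_equal_groups(d, n_per_group):
    """t for two independent groups of equal size n: t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)


# Hypothetical tiny effect: half a point on a scale with a pooled SD of 10.
d = cohens_d(100.5, 100.0, 10.0)

for n in (20, 200, 10_000):
    t = t_equal_groups(d, n)
    # 1.96 is used as a rough cutoff; the exact critical t is larger for small n.
    print(f"n per group = {n:>6}: d = {d:.2f}, t = {t:.2f}, |t| > 1.96: {abs(t) > 1.96}")
```

The effect size stays at d = 0.05 throughout, yet the result crosses the significance threshold once the groups are large enough, which is why effect sizes are reported alongside p-values.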

    Statistical Understanding Evolution

    • Awareness that misconceptions about interpreting p-values were previously widespread.

      • Historical views equated low p-values (e.g., 0.01) with a high likelihood of the alternative hypothesis being true (incorrect).

      • Importance of establishing educational norms in statistical understanding.

    Bayesian vs. Frequentist Approaches

    • Suggestion to recognize additional approaches for analyzing data:

      • Bayesian Statistics: Emphasizes prior knowledge and is complicated, typically taught later in academic programs.

      • Estimation Approach: Suggests a focus on estimating parameters instead of relying solely on conventional p-values.

    Recap of Lecture

    • Acknowledgment of traditional owners of lecture grounds to foster a respectful learning environment.

    • Recap of confidence intervals, hypothesis testing, and effect sizes as intertwined yet crucial statistical practices for a robust understanding of data.

    • Suggestion that students should prepare for midterm evaluations by working through the exercises discussed.

    Practical Exercises with Statistical Tests

    • Application examples with confidence intervals, p-values, t-tests explored in detail:

      • Illustrating how to conduct single sample t-tests vs. independent groups t-tests.

    • Working through examples to bolster conceptual understanding of statistical concepts and applications to real-world observational data.

    Conclusion

    • Discussion of upcoming material and expectations for quizzes and course evaluations:

      • Students are encouraged to give feedback on course content and resources.


    Final Lecture Overview

    • The final lecture marks the end of the course where students will focus on exam preparation and course revision.

    Exam Expectations

    • The exam is scheduled for the 19th at 8 AM.

    • Important to confirm the exam location in advance to avoid getting lost.

    • The exam carries a weight of 45% of the overall course assessment.

    • Caution: This exam will be more challenging than the mid-semester exam.

    • The scope of the exam covers the entire course material, including prior lectures.

    Course Evaluations

    • Course evaluations are open for one more week; student participation is encouraged.

    • Evaluations help educators assess teaching effectiveness and provide feedback for improvement.

    • Positive feedback also benefits tutors who are the first point of contact for students.

    Review of Key Concepts

    Principles of Science

    • Psychology is presented as an empirical science, involving measurement and theory development.

    • Importance of skepticism in evaluating results:

      • Skepticism implies questioning results and considering alternative explanations until substantial evidence supports conclusions.

    • Introduced tentativeness in reporting findings:

      • Researchers express findings with caution, using phrases like "may cause" instead of asserting causal relationships definitively.

    • The role of openness in research:

      • Sharing methods, data, and software used for analyses promotes error correction and transparency.

    • Emphasis on anti-authoritarianism in science:

      • Results should be scrutinized regardless of the researcher's reputation or accolades.

    Research Methodology

    • The course is not just about statistics; it also covers the research methods needed for sound experimental design.

    • Types of Designs:

      • Experimental Designs: Allow manipulation of variables and the establishment of causal relationships through random allocation (independent vs. repeated measures).

      • Observational Studies: Correlational and quasi-experimental designs cannot establish causality and are subject to confounding factors.

    • Limitations of observational studies in asserting cause-effect relationships.

    Measurement Scales

    • Four measurement scales:

      • Nominal: Categorical data (counts of occurrences).

      • Ordinal: Ordered categories without defined intervals.

      • Interval: Defined intervals without a true zero (e.g., temperature).

      • Ratio Scale: Defined intervals with a true zero point (e.g., weight).

    • Knowledge of different data representation methods is crucial, especially in interpreting graphical data.

    Data Interpretation in Exam

    • Understanding of concepts like central tendency (mean, median, mode) and variability (variance, standard deviation) is essential.

    • Familiarity with normal distribution, z-scores, and their interpretation is vital for exam preparedness.

      • Example problem: Calculating IQ scores that bound the middle 95% based on a mean of 100 and standard deviation of 16:

        • Cutoffs established using z-tables (1.96 for 95% confidence).

    • Tips for questions regarding graphs include reading instructions carefully to ensure the correct data representation is utilized.
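
    The IQ example above can be worked through in a few lines. This is a minimal sketch using Python's standard library; the cutoff 1.96 is the z-table value quoted in the notes, and the variable names are illustrative.

```python
from statistics import NormalDist

mean_iq, sd_iq = 100, 16
z_crit = 1.96  # two-tailed cutoff for the middle 95%, as read from a z-table

# Convert the z cutoffs back to raw IQ scores: score = mean ± z * sd.
lower = mean_iq - z_crit * sd_iq
upper = mean_iq + z_crit * sd_iq
print(f"Middle 95% of IQ scores: {lower:.2f} to {upper:.2f}")

# Cross-check the table value against the exact normal quantile.
exact_z = NormalDist().inv_cdf(0.975)
print(f"Exact z for the 97.5th percentile: {exact_z:.4f}")
```

    With a mean of 100 and a standard deviation of 16, the cutoffs come out at 68.64 and 131.36.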

    Understanding Hypothesis Testing

    • Hypothesis testing is central in determining the significance of findings.

      • Null Hypothesis (H₀): Assumes no effect or difference.

      • Alternative Hypothesis (H₁): Represents the effect or difference researchers seek to establish.

    • Test types to differentiate:

      • Z-tests and t-tests: use a z-test when the population standard deviation is known, and a t-test when it is unknown and must be estimated from the sample.

    • Understanding Type I (false positive) and Type II (false negative) errors, the significance levels (p < 0.05), and their consequences in research outcomes.

    • Statistical power is the likelihood of detecting a true effect.

    T-Tests and Confidence Intervals

    • Familiarity with various t-tests (e.g., single sample, paired samples, independent groups), and when to apply them is crucial.

    • Understanding of confidence intervals and their application for hypothesis testing (whether the interval contains the null hypothesis).

      • Example: Calculating confidence intervals for mean CO₂ levels.

    • Knowledge of effect sizes (Cohen's d) that help interpret the magnitude of findings beyond mere significance.
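
    As a rough illustration of using a confidence interval for a hypothesis test, the sketch below builds a 95% interval for a mean and checks whether a null value falls inside it. The summary statistics and the null value are hypothetical (they are not the lecture's CO₂ data), `mean_confidence_interval` is an illustrative helper name, and the critical t value is assumed to be read from a t-table, as in the exam.

```python
from math import sqrt

def mean_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Confidence interval for a mean: mean ± t_crit * (sd / sqrt(n))."""
    margin = t_crit * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics (illustration only).
# t_crit = 2.262 is the two-tailed 95% table value for df = n - 1 = 9.
low, high = mean_confidence_interval(sample_mean=415.0, sample_sd=10.0, n=10, t_crit=2.262)

null_value = 400.0  # hypothetical value stated by the null hypothesis
print(f"95% CI: {low:.2f} to {high:.2f}")
# If the interval excludes the null value, the null hypothesis is rejected.
print("Reject H0" if not (low <= null_value <= high) else "Fail to reject H0")
```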

    Exam Format & Preparation Strategies

    • The exam format includes 44 multiple-choice questions over 2 hours.

    • Resources allowed include an unmarked dictionary, formula sheet, and statistical tables (provided during the exam).

    • Recommendations for effective exam preparation include:

      • Practice with past questions and develop familiarity with the course materials using real exam tools.

      • Engaging in self-testing to enhance retention and understanding of the content.

      • Creating personalized quiz questions as reinforcement strategies.

    Final Notes

    • A reminder about the final quiz opening shortly.

    • Students should prepare thoroughly for the upcoming exam, ensuring they understand the logistics and material covered throughout the course.

    • Open invitation for students to seek advice regarding further psychological studies and career pathways.

    Modules

    Science as a way of knowing

    • Psychology is a diverse discipline — 53

    • The unifying quality of all psychological disciplines is that all psychologists try to understand behaviour using the methods of science

    • Epistemology — branch of philosophy that is concerned with the nature and scope of knowledge

    • Depending on how knowledge is acquired, it may reflect real understanding about the world or it could embody misinformation

    Acquiring Knowledge

    • Personal experience: people experience things, and all these experiences contribute to our knowledge of the world. This is problematic, however, because the experience is evaluated only by you and is open to influence from your own biases.

    • Authority: people appeal to authority in order to live their lives. Typically, one can verify whether these people truly are authorities; however, an authority may not be an expert in the relevant discipline.

    Scientific Method

    How knowledge is gained

    • Logic — reason through problems to generate new knowledge, such as solving a maths question with a maths

    • Empiricism — gain knowledge through careful and objective observation (seeing, hearing, touching, etc)

    • Rational — formulation of hypotheses and theories

    Main features

    • Systematic observation

    • Critical analysis of data, hypotheses, and theories

    • Tentative acceptance of hypotheses and theories

    • Openness and independence from authority

    Theory, experiments and statistics

    Goals for scientific research

    1. Describe behaviour: to understand any type of behaviour, describe it in detail and give the conditions under which it occurs, on however many levels.

    2. Explain behaviour: why does the behaviour occur? How do environmental factors affect it, and how does the presence of other people affect it?

    3. Predict behaviour: we want to know when the behaviour occurs. What are the specific cognitive, emotional, social, or environmental conditions?

    4. Control behaviour (deep understanding): if we can identify and manipulate the critical factors that promote or discourage a particular behaviour, we can control it.

    Theory

    • Theories are ideas about how nature works; in psychology, theories explain why behaviour occurs the way it does

    • A fully formed theory fulfills all four goals

    • A psychological theory is a precise statement of how events in the world affect behaviour. Specifically:

      • it summarises existing knowledge on a topic

      • it outlines the relationships between the different factors involved

      • it explains the phenomenon of interest

      • most importantly, it generates specific predictions about the outcomes of situations

    Hypothesis

    • More focused and more tentative than a theory

    • Takes a theoretical claim and applies it to a specific setting. Hypotheses are more focused than the theory; the more specific instances are found to support a hypothesis across repeated studies, the more support there is for the theory.

    • In a typical formal experiment, two hypotheses are proposed

      • null hypothesis (denoted H₀): a statement that there is no difference between the groups we are comparing, or that there is no systematic change in one variable that is tied to another variable

        • example: for smoking and health, the null hypothesis is that there is no relationship between the amount people smoke and their health

      • alternative hypothesis (denoted H₁): a statement that there is a difference between the groups we are comparing, or that there is a systematic change in one variable that is tied to another variable

        • example: for smoking and health, the alternative hypothesis is that there is a relationship, probably that the more people smoke, the less healthy they will be

        • a drawback is that it cannot make an exact prediction

    Statistics

    • Formal mathematical procedures that allow us to decide which of the two hypotheses to favour

    • Allow us to rule out chance as a possible reason for the pattern of results

    • A test of whether or not chance can explain the observed differences between the groups.

    Principles of Science

    • The best method we have for generating knowledge about the universe, nature, and human behaviour is the scientific method.

    • The scientific method generates knowledge based on evidence. Faith-based knowledge and morality-based knowledge are examples of knowledge that are not generated through the scientific method.

    • Scientific claims, hypotheses, and theories are all based on evidence.

    1. Objectivity

    • Evidence, when offered to support a claim or hypothesis, must be observable by any person.

    • Offering your personal thoughts or feelings as evidence is not acceptable because you are the only person who can observe them.

    • Therefore, in order to provide scientific evidence you have to be creative and think of ways to make your observations objective.

      • For example, a recent CNN article reported that some people’s phobia of flying is made worse when air disasters are reported in the media. To support that claim scientifically, we need to provide objective evidence that phobics are more anxious (which is a mental state not readily observable) than non-phobics when flying after seeing a news report of an air accident. We could measure heart rate, sweaty palms, or breathing rate, all of which are objective and tend to vary with levels of anxiety. These forms of evidence are more credible than simply asking a person to report how they feel because physiological responses are observable and measurable by anyone.

    2. Skepticism

    • Science’s principle of skepticism requires that claims must be backed up with evidence and that this evidence must be carefully and critically evaluated.

    • When considering a claim that someone has made, your reflex skeptical response should be, “show me the evidence” and/or “let me see” and/or “let’s take a look”.

      • For example, the claim that heavier objects fall to the ground faster than lighter objects may sound intuitively correct. However, we don’t know whether that claim is correct until we scrutinise it skeptically. We can thank Galileo for conducting his classic experiments to disprove that idea. Galileo’s skeptical approach forced him to test the claim rather than accept it without evidence. His main finding was that two objects of different mass, dropped simultaneously from the same height in a vacuum, reach the ground at the same time. Imagine a feather and a boulder dropped from the same height: the key is the vacuum, which removes the effects of air resistance. This is difficult to visualise since we rarely observe objects moving in a vacuum, and it is another instance of individual intuition proving an unreliable source of knowledge.

    3. Openness/open-mindedness

    • When reporting their observations, scientists are required to describe the conditions under which these observations were made.

    • This includes exactly how the measurements were taken, who the participants were, and any other details relevant to the methods of acquiring the evidence.

    • It is imperative that another investigator reading the description can adequately reproduce the conditions of your observations so they can see for themselves.

    • The standard to which you should strive is to be able to report your observations so objectively that even your enemy would have to agree with you (Agnew & Pyke, 2007).

    • When different observers agree, their observations are said to be reliable.

    • When investigators ask about inter-rater or inter-observer reliability, they are asking whether different observers agree about the same observation.

    4. Tentativeness

    • Scientists are never 100% certain of any finding, because they know that new evidence may come along that will force them to revise their conclusions or discard them altogether.

    • This is a difficult characteristic of science for the general public and new scientists to accept.

    • Why shouldn’t a well-executed study give the definitive answer to a question? Well, any study is only as good as the available theories, technology, and evidence.

    • In general, scientists accept that research findings are rarely 100% clear-cut and that ambiguity comes with the territory. Patience is required as the process of science weeds out erroneous conclusions and reveals the correct ones.

    5. Independence from authority

    • The phrase, “because I said so” does not constitute scientific evidence. Solid, carefully collected evidence is the only authority in the scientific method.

    • Therefore, claims made from a source, no matter how reputable, must be supported by evidence.

    • Even then we interpret that evidence skeptically, evaluating how strongly it supports a claim, whether there were any errors made when collecting the evidence, and so on.

    Scientific Process

    Media vs. Science

    • Contentious topics in interviews often feature “gotcha” moments → entertaining but unproductive for truth-seeking.

    • Scientists use measured language (“balance of evidence supports…”, “unable to replicate…”) → reflects reality of knowledge building.

    Nature of Scientific Process

    • Slow, methodical, involves blind alleys & dead ends.

    • Represented as a flowchart: idea → hypothesis → prediction → study → data collection → analysis → conclusion → replication.

    Steps in the Process:

    1. Idea/Theory: Explanation of how something works (e.g., spaced learning is more effective than cramming).

    2. Hypothesis: Testable chunk of the theory.

    3. Prediction: Specific, measurable outcome (e.g., spaced practice improves learning more than cramming).

    4. Study Design:

      • Critical for reliable results.

      • Must follow principles of objectivity and empiricism.

    5. Data Collection:

      • Questionnaires, interviews, online responses, etc.

      • Raw data initially disorganized.

    6. Data Organization & Description:

      • Summarize and prepare data for analysis.

    7. Inferential Statistics:

      • Generalize from sample to real-world population.

      • Decide if hypothesis is supported.

    8. Re-evaluation:

      • If unsupported → modify, retest, or discard hypothesis.

      • If supported → publish results.

    9. Replication:

      • Exact or conceptual replications.

      • Builds confidence if findings hold.

      • Failure to replicate reduces confidence → refine or discard hypothesis.

    Outcome:

    • Supported hypotheses may evolve into accepted theories.

    • Process is iterative and self-correcting.

    Ethics in Psychological Research

    Definition:

    • Ethics = guidelines/principles for moral & just treatment of others.

    • In research: focus on how researchers treat participants, run studies, and conduct themselves.

    • Based on Universal Declaration of Ethical Principles for Psychology.

    Four Guiding Ethical Principles

    Respect for the Dignity of Persons and Peoples

    • Value, acknowledge, and treat all people equally regardless of origin, beliefs, or identity.

    • Special care for vulnerable groups (e.g., children, minorities).

    • Ensure equal opportunity to be seen, heard, acknowledged.

    • Protect anonymity and confidentiality.

    • Example: Evolving gender data collection → beyond male/female binary to non-binary & open responses to show respect and inclusion.

    Competent Caring for the Well-Being of Persons and Peoples

    • Aim for research findings to enhance well-being.

    • Conduct research to benefit participants or at least cause no harm.

    • Plan for and mitigate possible harm.

    • “Competent” caring → researchers must have proper training for tools/tests used.

    • Case Study – Tuskegee Syphilis Study (1932–1972):

      • 600 African-American men (399 with syphilis) misled, denied treatment (penicillin available from 1947).

      • No informed consent → participants not told study details.

      • Ethics committees now prevent such abuse (require informed consent, minimal/no harm).

    Integrity

    • Conduct research with objectivity and honesty, free from self-interest or outside influence.

    • Avoid exploitation and bias in reporting.

    • Example – Grossarth-Maticek Research:

      • Linked personality types to cancer/heart disease.

      • Allegations of data falsification (e.g., reclassifying participants, duplicating data).

      • Funded by tobacco companies → possible conflict of interest.

      • Findings not replicated → likely due to falsified data.

    Responsibility to Society

    • Psychology should contribute to understanding the human condition and improving well-being.

    • Researchers must:

      • Understand and follow ethical conduct.

      • Reflect on and update research practices to stay ethical.

    Stanford Prison Experiment (1971)

    • Conducted at Stanford University by Philip Zimbardo.

    • Setup: Mock prison in psychology building; participants randomly assigned as prisoners or guards.

    • Payment: $15/day.

    • Role of Zimbardo: Prison superintendent.

    • Informed Consent Issues:

      • Participants given vague info, not told specifics (e.g., surprise home arrest, strip search).

      • Guards encouraged to be aggressive (no physical harm) to instill fear.

    • Ethical Concerns:

      • Prisoners who wanted to leave were told they could not.

      • Planned 2 weeks → ended after 6 days when an outsider raised concerns.

      • Zimbardo admitted losing objectivity due to his role.

      • Formal debriefing not until years later.

    Variables in Research

    Key Variables

    1. Independent Variable (IV):

      • Manipulated by the experimenter (e.g., drug type).

      • Sometimes cannot be directly manipulated and is instead selected (e.g., age groups).

    2. Dependent Variable (DV):

      • Measured outcome; depends on IV.

      • Example: Studying mental ability vs. age → IV = age, DV = IQ score.

    Unwanted (Extraneous) Variables

    • Variables that contaminate results and obscure the relationship between IV and DV.

    1. Situational Variables:

      • Environmental factors (temperature, noise, lighting, time of day).

      • Can affect all participants differently and unpredictably.

    2. Individual Differences:

      • Natural variations between people (height, weight, motivation, anxiety).

      • Combine with situational variables to increase variability.

    3. Measurement Error:

      • Inconsistencies in recording data (e.g., misreading ruler, stopwatch error).

      • Linked to experimenter’s attention, training, or bias.

    • Effect:

      • Random variability can weaken or completely hide real relationships.

      • Example: Teaching method study → with unwanted variables removed, clear difference; with them present, results less consistent.

    Confounding Variables

    • Definition: Variables that vary systematically with IV, providing an alternative explanation for results → prevents establishing causation.

    • Example:

      • Testing two drugs on rats: all Drug A rats tested in the morning, all Drug B rats in the afternoon.

      • Time of day becomes a confounding variable.

    Controlling Confounding Variables:

    1. Keep constant: Test all groups under same conditions (e.g., all in morning).

    2. Counterbalance: Spread variations evenly (e.g., half of each group in morning, half in afternoon).

    True Experimental Designs — Key Features

    1. At least two levels of the Independent Variable (IV)

      • One level can be absence of treatment (control group/placebo).

      • Other = presence of treatment (experimental group).

    2. Random Assignment

      • Equal chance of being in any group (coin flip, random number table, etc.).

      • Purpose: Distribute extraneous factors (motivation, ability, age, health) evenly so they don’t vary systematically with the IV.

    3. Control for Confounding Variables

      • Prevent alternative explanations for observed differences between conditions.

    Independent Groups Design (Between-Subjects)

    • Structure:

      • Two or more groups, each experiencing a different level of the IV.

      • Participants randomly assigned to one group.

      • Experimental group receives the treatment; control group does not.

    • Example — Tickling Experiment:

      • IV: Who does the tickling (robot vs. self).

      • DV: Ticklishness rating (1–10).

      • 32 participants → random assignment into 2 groups of 16.

      • Robot group tickled by robot; self group tickled themselves using robot arm.

      • Results: Robot tickle group generally rated higher ticklishness, but some overlap.

    • Drawbacks:

      • High variability from individual differences (e.g., natural differences in ticklishness).

      • Requires more participants than repeated measures.
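
    The random assignment step in the tickling example above (32 participants into 2 groups of 16) could be sketched as follows. This is only an illustration: the participant IDs are hypothetical, `randomly_assign` is an assumed helper name, and the seed is fixed just to make the sketch repeatable.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out evenly across conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        # Dealing in turn keeps group sizes equal when the total divides evenly.
        groups[conditions[index % len(conditions)]].append(person)
    return groups

participants = [f"P{i:02d}" for i in range(1, 33)]  # 32 hypothetical participant IDs
groups = randomly_assign(participants, ["robot", "self"], seed=1)
print({condition: len(members) for condition, members in groups.items()})
```

    Shuffling first means every participant has the same chance of landing in either group, which is what spreads individual differences evenly across conditions.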

    Repeated Measures Design (Within-Subjects)

    • Structure:

      • Same participants tested in all conditions of the IV.

      • Fewer participants needed (e.g., 16 instead of 32 in tickling example).

      • Reduces random variability due to individual differences.

    • Order Effects:

      • Experiencing one condition may influence responses in the next.

      • Controlled via counterbalancing:

        • Half participants → Condition A then B.

        • Half participants → Condition B then A.

    • Example — Tickling Experiment:

      • All 16 participants experienced both robot and self tickling.

      • Order counterbalanced to control for order as a confound.

      • Results showed less spread in data → reduced variability from individual differences.
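
    Counterbalancing the order of conditions, as in the repeated measures tickling example, could be sketched like this. The participant IDs and the `counterbalance` helper are hypothetical, and the sketch assumes only two conditions.

```python
import random

def counterbalance(participants, seed=None):
    """Give half the participants robot-then-self and the other half self-then-robot."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # randomise who ends up in which order
    half = len(shuffled) // 2
    orders = {}
    for person in shuffled[:half]:
        orders[person] = ("robot", "self")
    for person in shuffled[half:]:
        orders[person] = ("self", "robot")
    return orders

orders = counterbalance([f"P{i:02d}" for i in range(1, 17)], seed=1)
robot_first = sum(1 for order in orders.values() if order[0] == "robot")
print(f"{robot_first} robot-first, {len(orders) - robot_first} self-first")
```

    Because each order is used equally often, any effect of going first or second is spread evenly over both conditions instead of varying systematically with the IV.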

    When Repeated Measures is NOT Suitable

    • If one condition permanently changes participant responses (e.g., learning effects).

    • Example: Comparing teaching methods for statistics → learning from first method influences performance in second method.

    Summary Table

    | Feature | Independent Groups (Between) | Repeated Measures (Within) |
    | --- | --- | --- |
    | Participants per Condition | Different people in each group | Same people in all conditions |
    | Randomization Purpose | Equalize groups | Control order effects |
    | Main Advantage | No carryover/order effects | Reduces variability from individual differences; fewer participants |
    | Main Disadvantage | More participants needed; variability from individual differences | Risk of order/carryover effects |
    | Key Control Method | Random assignment | Counterbalancing |

    Observational Designs

    Two types covered:

    1. Correlational Design

    2. Quasi-Experimental Design

    Why not use true randomized experiments?

    • Sometimes impractical or unethical (e.g., cannot assign people to start smoking).

    Example: Smoking & Health

    Hypothesis: Smoking is bad for health.

    Operational definition of health: Number of doctor visits per year.

    Prediction: More smoking → more doctor visits.

    Correlational Study

    • Method:

      • Observe people who already smoke/don’t smoke.

      • Measure:

        • Cigarettes smoked/day (IV)

        • Doctor visits/year (DV)

      • Example: Ask 200 people about both variables.

      • Create scatter plot: each point = one person’s data.

    • Observation: Positive relationship — heavier smokers see doctors more often.

    • Key point: No variables manipulated → just observation.

    • Limitation: Cannot conclude causation.
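
    The strength of the relationship in a scatter plot like this is usually summarised with a correlation coefficient. Below is a minimal sketch of Pearson's r; the eight data points are invented for illustration (the notes describe asking 200 people) and `pearson_r` is an assumed helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)

# Hypothetical responses (illustration only): one pair per person.
cigarettes_per_day = [0, 0, 5, 10, 15, 20, 25, 30]
doctor_visits_per_year = [1, 2, 2, 3, 3, 5, 4, 6]

r = pearson_r(cigarettes_per_day, doctor_visits_per_year)
print(f"r = {r:.2f}")
```

    A value of r near +1 indicates a strong positive relationship, but, as noted above, it says nothing about which variable (if either) causes the other.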

    Quasi-Experimental Design

    • Similar to true experiment, but no random assignment.

    • Method:

      • Form groups based on pre-existing characteristics (e.g., smoking habits).

      • Example:

        • Light smokers: 0–10 cigarettes/day

        • Heavy smokers: 20–30 cigarettes/day

        • DV = doctor visits/year

      • Plot results: heavy smokers have more doctor visits.

    • Key difference from true experiment: Grouping based on existing traits, not random assignment.

    • Other uses:

      • Age (e.g., young vs. older adults)

      • Health conditions (e.g., high blood pressure vs. normal)

      • Income levels (e.g., wealthy vs. middle class)

    Causation?

    • From correlational or quasi-experimental designs → No causal conclusion possible.

    • Unknown third variables could explain results (e.g., drinking, diet).

    Conditions for Causal Inference

    (Only true randomized experiments can fully meet all three)

    1. Relationship: Regular & reliable changes in one variable associated with changes in the other.

    2. Time Order: Cause precedes effect.

    3. No Other Explanations: Alternative causes ruled out (via randomization).

    Summary

    • Observational Designs = Correlational studies + Quasi-experiments.

    • Correlational study: Measures 2+ variables in same group, examines relationship.

    • Quasi-experiment: Like true experiment but without random assignment.

    • Limitation: Lack of randomization → cannot infer causation.

    • Only true experiments (with randomization) allow causal claims.

    Measurement in research

    • type of data we collect determines how we can analyse and test hypotheses

    • measuring = assigning numbers to observations

    Quantitative vs qualitative

    • quantitative: numbers have meaningful numeric value (e.g. ml milk, cm height)

    • qualitative: categories with no numeric meaning (e.g. eye colour, political leaning, countries visited)

    Numbers used as labels for qualitative

    • numbers assigned to categories are only labels (e.g. 1 = brown eyes, 2 = blue)

    • numbers do not imply magnitude

    Discrete vs continuous

    • discrete: cannot be subdivided into meaningful smaller units, no values in between (e.g. number of chess pieces lost)

    • dichotomous: special type of discrete with only two possible values (e.g. heads/tails, yes/no)

    • continuous: infinite values between points (e.g. milk measured 250, 250.2, 250.57 ml)

    Levels/scales of measurement (lowest to highest)

    Nominal

    • qualitative only

    • numbers do not show magnitude or order

    • example: finished race vs not finished (yes/no)

    • dichotomous nominal possible

    Ordinal

    • ordering is implied

    • but the distance between ranks is unknown (intervals may or may not be equal)

    • example: 1st, 2nd, 3rd place; we know order but not how much faster one is over another

    Interval

    • equal intervals between scale points

    • can have negative values

    • example: seconds slower/faster than club record (e.g. –2, –1, +3)

    Ratio

    • interval properties plus absolute zero point (cannot go below zero)

    • can talk about ratios (twice as much)

    • example: actual swim time in seconds, grams, millilitres

    Choosing level of measurement

    • consider: magnitude? equal intervals? absolute zero?

    • none → nominal

    • magnitude only → ordinal

    • magnitude + equal intervals → interval

    • magnitude + equal intervals + absolute zero → ratio
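
    The decision rule above can be written as a short function. This is a sketch only; `level_of_measurement` is a hypothetical name, and it assumes the three properties build on each other in the order listed.

```python
def level_of_measurement(magnitude, equal_intervals, absolute_zero):
    """Return the scale implied by the three yes/no questions, checked in order."""
    if not magnitude:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    if not absolute_zero:
        return "interval"
    return "ratio"

# The swimming examples from the notes:
print(level_of_measurement(False, False, False))  # finished race: yes/no
print(level_of_measurement(True, False, False))   # 1st, 2nd, 3rd place
print(level_of_measurement(True, True, False))    # seconds relative to club record
print(level_of_measurement(True, True, True))     # actual swim time in seconds
```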

    Why level of measurement matters

    • determines what statistical analyses can be used

    • aim for highest level possible

    • can convert down (ratio → ordinal) but not up (ordinal → ratio)

    Role in research design

    • selecting appropriate measurement level is critical to valid data analysis and hypothesis testing

    Types of measures in psychology

    • three main kinds:

      • self-report: questionnaires, surveys, interviews (what people say they think/do)

      • behavioural: what participants actually do (e.g. aggressive acts counted, reaction time, bar presses in rats)

      • physiological: body/brain responses (e.g. heart rate, hormones, blood flow)

    Two major issues in measurement

    • reliability

    • validity

    Reliability

    • stability/consistency of measurement

    • example: solid ruler produces near-identical results each measurement

    • unreliable example: flexible/rubber ruler = inconsistent results

    Types of reliability

    • test-retest reliability: the same test (or two similar versions) given at different times; compare the pattern of scores

    • internal consistency: compare scores on first half vs second half of test

    • interrater reliability: do different raters score the same behaviour similarly?

    Validity

    • the degree to which the measure actually measures what it claims to measure

    Types of validity

    • face validity:

      • does the measure look like it is measuring the right thing?

      • subjective judgement (e.g. math test should contain math problems)

    • predictive validity:

      • does the measure predict what it is supposed to predict?

      • example: ATAR predicting university performance

    • construct validity:

      • central to psychology

      • does the measure relate to other measures in ways theory says it should?

      • example: IQ test performance correlates with real-world indicators of intelligence (jobs, articulation, academic performance)

    Summary

    • psychologists use self-report, behavioural, and physiological measures

    • every measure should be evaluated on reliability (consistency) and validity (does it measure what it should measure)

    Organising raw data

    • first step: organise visually to get a sense of distribution

    • can use tables or graphs

    Frequency tables (ungrouped)

    • used when range of scores is small and manageable

    • steps:

      • list every possible score from lowest to highest

      • tally occurrences

      • convert tallies to frequencies

    • can convert frequencies to percentages
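
    The steps above translate fairly directly into code. The sketch below assumes whole-number scores; the quiz scores are made up for illustration and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(scores):
    """List every possible score from lowest to highest with its frequency and percentage."""
    counts = Counter(scores)  # tally occurrences
    total = len(scores)
    rows = []
    for score in range(min(scores), max(scores) + 1):
        freq = counts[score]  # scores that never occur get a frequency of 0
        rows.append((score, freq, 100 * freq / total))
    return rows

scores = [3, 5, 4, 4, 2, 5, 4, 3, 4, 5]  # hypothetical quiz scores
print("score freq percent")
for score, freq, pct in frequency_table(scores):
    print(f"{score:>5} {freq:>4} {pct:>6.1f}%")
```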

    Grouped frequency tables

    • used when range of scores is large (too many rows if ungrouped)

    • create equal-sized score intervals (rule of thumb: produce 10–20 rows)

    • tally scores into intervals

    • benefit: more manageable summary

    • drawback: loss of precision (cannot see exact individual values)

    Stem and leaf plots

    • middle form between table and graph

    • stem = higher unit (e.g. tens)

    • leaf = final digit

    • preserves individual scores while showing distribution visually

    • longest leaf row = interval with most scores
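
    The stem-and-leaf idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole-number scores; the function name and the data are hypothetical, not from the lecture.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group scores by tens (stem) and list the final digits (leaves)."""
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # e.g. 47 -> stem 4, leaf 7
        rows[stem].append(leaf)
    return dict(rows)

# Hypothetical data set (assumption, for illustration only)
scores = [23, 27, 31, 34, 34, 38, 41, 42, 45, 45, 47, 49, 52, 56, 63]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

    Every original score can be read back from the plot, and the longest row (stem 4 here) marks the interval with the most scores.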

    Box and whisker plots

    • visual summary of range and quartiles

    • whiskers = lowest and highest scores

    • box = middle 50% (interquartile range)

    • middle of box = median (50th percentile)

    • box edges = 25th and 75th percentiles

    Bar graphs

    • used for qualitative/nominal data

    • bars are spaced apart

    • X axis = categories

    • Y axis = frequencies

    • Y axis scale must accommodate highest frequency

    Histograms

    • used for quantitative data

    • bars touch (no spacing)

    • can be standard or grouped

    • grouped histogram uses same intervals as grouped frequency table

    Frequency polygons

    • histogram turned into line graph

    • plot a point for each interval frequency then connect points

    • allows overlay comparison (e.g. male vs female actual vs ideal weight)

    Choosing table/graph format

    • choose based on data type (qualitative vs quantitative)

    • choose grouping based on range of scores (larger ranges require grouping)

    • aim for clearest visual snapshot of distribution

    Purpose of percentiles

    • used to understand how a score compares to the rest of the data set

    • percentile = % of scores at or below a given score

    • helps interpret standing relative to others

    Steps for computing percentiles (individual scores)

    • start with raw data (hard to interpret directly)

    • rank order scores (highest to lowest or lowest to highest)

    • determine n (total number of observations)

    • calculate simple frequency (SF) = how many times each score occurs

    • calculate cumulative frequency (CF) = number of scores at or below the given score

    • formula: percentile = (CF ÷ n) × 100

    • example: score of 14 with CF = 13 in dataset of n = 20 → percentile = 65
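
    The steps above can be sketched as a short Python function. The data set below is hypothetical; it was chosen only so that a score of 14 has CF = 13 with n = 20, matching the example.

```python
from collections import Counter

def percentile_table(scores):
    """Return {score: (simple_freq, cumulative_freq, percentile)}."""
    n = len(scores)                       # total number of observations
    counts = Counter(scores)              # simple frequency of each score
    table = {}
    cumulative = 0
    for score in sorted(counts):          # rank order, lowest to highest
        cumulative += counts[score]       # CF = scores at or below this score
        table[score] = (counts[score], cumulative, cumulative * 100 / n)
    return table

# Hypothetical data set of n = 20 scores (assumption, for illustration only)
scores = [8, 9, 10, 10, 11, 11, 12, 12, 12, 13,
          13, 14, 14, 15, 15, 16, 16, 17, 18, 19]
table = percentile_table(scores)
sf, cf, pct = table[14]
print(f"score 14: SF = {sf}, CF = {cf}, percentile = {pct:.0f}")
```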

    Grouped frequency distributions

    • used when too many individual score rows would be required

    • scores are grouped in equal sized intervals (e.g. 0–4, 5–9)

    • simple frequency = number of scores within each interval

    • loses precision (exact values unknown, only interval counts known)

    Percentiles in grouped data

    • CF refers to total number of scores at or below the upper bound of the group

    • same formula used: (CF ÷ n) × 100

    • meaning of percentile here shifts: % of scores equal to or below the highest score in that group

    • example: group 25–29 with CF = 21 in dataset of n = 25 → percentile = 84
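
    A grouped version of the same calculation might look like the sketch below. It assumes whole-number scores and equal-width intervals; the function name, the starting point of 10 and the data are hypothetical, chosen so that the 25–29 group has CF = 21 with n = 25 as in the example.

```python
def grouped_percentiles(scores, width, start=0):
    """Return rows of (low, high, simple_freq, cumulative_freq, percentile)."""
    n = len(scores)
    top = max(scores)
    rows = []
    cumulative = 0
    low = start
    while low <= top:
        high = low + width - 1
        sf = sum(low <= s <= high for s in scores)   # count within interval
        cumulative += sf                              # at or below upper bound
        rows.append((low, high, sf, cumulative, cumulative * 100 / n))
        low += width
    return rows

# Hypothetical data set of n = 25 scores (assumption, for illustration only)
scores = [11, 12, 14, 15, 16, 17, 18, 19, 20, 20, 21, 22, 23,
          23, 24, 25, 26, 26, 27, 28, 29, 30, 31, 33, 34]
rows = grouped_percentiles(scores, width=5, start=10)
for low, high, sf, cf, pct in rows:
    print(f"{low}-{high}: SF = {sf}, CF = {cf}, percentile = {pct:.0f}")
```

    Only interval counts survive the grouping, so the percentile describes the upper bound of each group rather than any exact score.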

    Summary definitions

    • n = total number of scores in dataset

    • simple frequency = how many times a score occurs (or count within group)

    • cumulative frequency = number of scores equal to or below that score (or group upper limit)

    • percentile = percent of all scores at or below a particular score (or group upper limit)

    Once you have designed a study, the next step is to collect data, and that involves measuring our variables, the characteristics of interest. After the data is collected, the job of a researcher is to sift through the data, clean it up and present it so it is understandable at a glance and more amenable to statistical tests and analysis.

    In this module, we looked at both measurement and ways to summarise, organise and display data. Below is a short snapshot of each topic.

    1. Levels of measurement: nominal; ordinal; interval and ratio

    [Table: summary of levels of measurement]

    2. Reliability and validity

    • Psychologists employ self-report, behavioural, and physiological measures in research.

    • Every measure can be assessed on its reliability and its validity.

    • Reliability refers to the stability or consistency of a measure.

    • Validity refers to the extent to which a measure is assessing what it is meant to assess.

    3. Presenting data

    • Tables and graphs are used to provide a quick visual snapshot or summary of the data.

    • Choose the one that tells the story most effectively.

    • Consider whether to use grouping or not based on the range of scores in your data set.

    • Consider which graph is appropriate based on whether you have qualitative or quantitative data.

    4. Percentiles

    • N refers to the number of scores or observations in our data set.

    • The simple frequency of a score refers to how many times that score appears in the data set.

    • The cumulative frequency of a given score refers to the number of scores in the set that are equal to or less than that score.

    • A percentile is the percent of all scores at or below a given score in the set.

    Shape of distributions

    • the classic “normal curve” is symmetrical and bell-shaped

    • tails = ends/extremes of distribution

    • peak = highest point = most frequent score

    Skew (symmetry)

    • skew = lack of symmetry in distribution

    Positive skew

    • right tail extends further (tail goes toward larger scores on X axis)

    • example: income (few very high values → long right tail)

    Negative skew

    • left tail extends further (tail goes toward lower scores)

    • example: exam scores when most students do well and only a few do poorly

    Kurtosis (spread + peakedness)

    • kurtosis describes whether the distribution is tall/thin or flat/broad

    Leptokurtic

    • tall and narrow

    • most scores cluster tightly around centre

    • tails may extend far

    • memory trick: “leap” (tall and thin)

    Platykurtic

    • flatter, squashed

    • more scores spread out across a wider middle range

    • memory trick: “plateau” (flat top)

    Shape and central tendency

    • mode = score with highest frequency = top of curve

    • if perfectly symmetrical (normal): mean = median = mode

    • if skewed: mean and median move toward tail (direction of skew)

    Using mean/median/mode to detect skew

    • positive skew: mean and median > mode (mode smallest)

    • negative skew: mean and median < mode (mode biggest)

    Summary

    • skew = symmetry of distribution

    • kurtosis = peakedness/spread of distribution

    • both concepts matter for many statistical analyses and interpretations

    Measures of central tendency

    • statistics that identify the most representative / typical score in a dataset

    • goal = find the central point of a distribution

    Three measures

    • mode

    • median

    • mean

    Mode

    • most frequently occurring score

    • in graphs = tallest bar / highest point

    • can be unimodal, bimodal, or multimodal

    • advantage: not affected by extreme scores

    • example use: elections (party with most votes wins = modal response)

    • drawback: can have multiple modes → not always helpful for a single “typical” score

    Median

    • the middle score once data are ordered lowest → highest

    • divides dataset into top 50% and bottom 50%

    • = 50th percentile

    • odd n → exact middle score

    • even n → midpoint between two middle scores

    • advantage: not affected by extreme scores

    • example use: median house price (better indicator when extreme values exist)

    Mean (average)

    • sum of all scores ÷ number of scores

    • influenced by every score

    • disadvantage: heavily affected by extreme scores

    • can think of as the “balancing point” of distribution (extreme scores pull mean towards them)

    Summary

    • mode: unaffected by outliers, but may have many modes

    • median: unaffected by outliers, good for skewed distributions

    • mean: familiar and widely used, but distorted by extreme values
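
    As a quick illustration of how the three measures respond to an extreme score, here is a minimal sketch using Python's standard `statistics` module. The house prices are hypothetical; note that `multimode` returns a list because a data set can have more than one mode.

```python
from statistics import mean, median, multimode

# Hypothetical house prices in $000s, with one extreme value (assumption)
prices = [420, 450, 450, 480, 510, 530, 2500]

modes = multimode(prices)     # [450]  most frequent score(s)
middle = median(prices)       # 480    middle score once ordered
average = mean(prices)        # about 762.86, pulled towards the extreme score

print(modes, middle, round(average, 2))
```

    The mean sits well above the median and the mode, which is the pattern described earlier for positive skew.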

    Variability

    • how spread out scores are in a dataset

    • two sets can have same mean but very different variability → affects interpretation (e.g. teaching a class)

    Three measures of variability

    • range

    • variance

    • standard deviation

    Range

    • highest score minus lowest score

    • simplest index of variability

    • unstable because one extreme score can change it dramatically

    Variance

    • based on deviation scores

    • deviation score = individual score minus mean

    • deviations can be positive or negative → they sum to zero

    • to remove negative signs, deviations are squared

    • variance = mean of squared deviation scores

    • notation: SD² or sometimes SS/N

    • large variance = greater spread

    Standard deviation

    • square root of variance

    • expressed in original units (not squared units)

    • easier to interpret than variance

    • indicates typical distance of scores from the mean

    Summary

    • range: easy, but distorted by outliers

    • variance: stable, uses all scores, but expressed in squared units

    • standard deviation: best interpretation of spread, uses all data, in original units of measurement
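
    The variance and standard deviation steps can be traced in code. This is a sketch that follows the notes in dividing by N (the population formula); `statistics.pvariance` does the same, whereas `statistics.variance` divides by N − 1. The scores are hypothetical.

```python
from statistics import pvariance

def describe_spread(scores):
    """Range, variance (SS / N) and standard deviation of a set of scores."""
    n = len(scores)
    m = sum(scores) / n
    deviations = [x - m for x in scores]          # these sum to zero
    ss = sum(d ** 2 for d in deviations)          # sum of squared deviations
    variance = ss / n                             # mean squared deviation
    return {
        "range": max(scores) - min(scores),
        "variance": variance,
        "sd": variance ** 0.5,                    # back in original units
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
spread = describe_spread(scores)
print(spread)                        # range 7, variance 4.0, sd 2.0
```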

    In this module, we covered the different shapes of distributions and measures of central tendency and variability.

    Shape of distributions

    We considered how to evaluate the shape of a distribution of scores in terms of its symmetry and spread, starting with a bell-shaped curve called the normal curve.

    Two statistical properties of distribution shape are:

    • Skew – the symmetry of the shape (positive skew, negative skew)

    [Figure: symmetric, bimodal, positively skewed and negatively skewed distribution shapes]

    • Kurtosis – the spread and peakedness of the shape (leptokurtic, platykurtic)

    [Figure: leptokurtic and platykurtic distributions]

    Source: Dorland, W (2012) Dorland's Illustrated Medical Dictionary 32nd Edition, Saunders/Elsevier. License: Statutory Educational License

    Central tendency

    • The three measures of central tendency are the mode, the median, and the mean.

    • The mode is the most frequently occurring score in a distribution. In a frequency distribution, the mode is represented by the tallest bar in a bar graph or the highest point in a frequency polygon, for example.

    • The median is the middlemost score after the data have been arranged from lowest to highest. The median divides the data set in half so an equal number of the scores are above and below it.

    • The mean is another word for the average and is the sum of all the scores divided by the number of scores you have. We summarise it with this formula: M = ΣX / N

    [Table: advantages and disadvantages of the mode, median and mean]

    Variability

    • The range is the simplest measure of variability. It is the difference between the highest score and the lowest score in the data set. The range is easy to compute but it is affected by extreme scores in the data.

    • The variance takes every score into account and therefore is more stable than the range. The formula for variance is SD² = Σ(X − M)² / N. It is expressed in squared units and so it is not an intuitive description of variability in a data set.

    • The standard deviation has all the features of variance and the added benefit of being expressed in the original units of the data set, so it is an intuitive description of variability in a data set. The formula for standard deviation is SD = √(Σ(X − M)² / N).

    Standard scores (z-scores)

    • used to express how far above or below average a score is

    • allows comparison of different measurements on a common scale

    Purpose

    • compares an individual score to the distribution it came from

    • answers: “how many standard deviations from the mean is this score?”

    • converts different metrics into the same unit (standard deviations)

    Definition

    • z-score = (raw score – mean) ÷ standard deviation

    • sign tells direction: + = above mean, − = below mean

    • magnitude tells how far from the mean in SD units

    Examples

    • mean response time = 1.25 sec, SD = 0.25 sec

    • Bruce: 1.75 sec → +2 SD → slower than average

    • Nancy: 1.00 sec → –1 SD → faster than average

    Why they are useful

    • allow precise comparison across different tests/attributes

    • formalise the comparisons against a baseline that we make informally every day (smart, fast, musical, etc.)

    • can compare “apples vs oranges” (scores measured differently)

    Applied example

    • Larry’s marks:

      • maths: 65% (class M=50, SD=10) → z = +1.5 (well above average)

      • music: 75% (class M=60, SD=15) → z = +1 (above average but not as much)

    • interpretation: Larry stands out more in math relative to his peers than in music
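
    The worked examples above can be checked with a one-line function. This is only a sketch; the function name is a hypothetical choice, and the numbers are the ones given in the notes.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# Response-time examples (mean = 1.25 sec, SD = 0.25 sec)
bruce_z = z_score(1.75, 1.25, 0.25)   # +2.0, slower than average
nancy_z = z_score(1.00, 1.25, 0.25)   # -1.0, faster than average

# Larry's marks
maths_z = z_score(65, 50, 10)         # +1.5
music_z = z_score(75, 60, 15)         # +1.0

print(bruce_z, nancy_z, maths_z, music_z)
```

    Although Larry's raw music mark is higher, his maths z-score is larger, so maths is where he stands out most relative to his class.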

    Summary

    • z-scores show exact position of score within distribution

    • reported in standard deviation units

    • enable cross-comparison across different measures and scales

    Standard normal distribution

    • a normal distribution expressed in z-scores

    • very important in psychology because many traits approximate normality (IQ, memory scores, etc)

    Properties of normal distributions

    • unimodal and bell-shaped

    • symmetrical

    • tails approach x-axis but never touch

    • specific predictable % of scores fall within set SD units of mean

      • 50% below mean, 50% above mean

      • 34.13% between mean and +1 SD (and same between mean and –1 SD)

      • 13.59% between +1 and +2 SD

      • 2.14% between +2 and +3 SD

    Transforming any normal distribution to standard normal

    • different normal distributions can have different means and SDs

    • when raw scores are converted to z-scores → distribution becomes “standard normal”

    • standard normal always has:

      • mean = 0

      • SD = 1

    • mean becomes 0 because subtracting mean from every score centres it

    • SD becomes 1 because we divide by SD in the z-score formula

    z-tables (tables of areas)

    • show area under the standard normal curve for given z-scores

    • table usually only lists positive z-scores (symmetry means negative side is identical)

    • table columns show:

      • area between mean and that z-value

      • area above that z-value in tail

    Why this matters

    • lets us calculate exact percentage of population within any standard deviation range

    • lets us compare different measures using a common scale (z) and known proportions under curve

    Using z-tables

    • z-tables let us find % of scores below, above, between values, or find a raw value from a percentile

    • ALWAYS draw a diagram first

    • remember: normal distribution is symmetrical → 50% below mean, 50% above mean

    Example 1: percentile rank of a raw score

    • IQ mean = 100, SD = 15

    • what is percentile rank of IQ = 115?

    • convert to z: (115−100)/15 = 1

    • area between mean and z=1 = 34.13%

    • add the 50% of scores below the mean: 50% + 34.13% = 84.13%

    • percentile rank ≈ 84th percentile

    Example 2: % of scores between two values

    • IQ mean = 100, SD = 15

    • what % is between 100 and 115?

    • z = 1

    • area from mean to z=1 = 34.13%

    • answer: 34.13%

    Example 3: find the raw score that cuts off the top 5%

    • IQ mean = 100, SD = 15

    • top 5% = area beyond z

    • area between mean and boundary = 50% − 5% = 45%

    • find z where table area is ~0.4495 → z = 1.64

    • convert z back to raw: X = M + z×SD = 100 + 1.64×15 = 124.6

    • top 5% have IQ > 124.6

    Example 4: find raw scores cutting off top and bottom 2.5%

    • IQ mean = 100, SD = 16

    • area in tail 2.5% → z = ±1.96

    • lower boundary: X = 100 − 1.96×16 = 68.64

    • upper boundary: X = 100 + 1.96×16 = 131.36

    • 2.5% have IQ < 68.64 and 2.5% have IQ > 131.36
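
    The four z-table examples can be cross-checked in Python. This sketch uses `statistics.NormalDist` (Python 3.8+) in place of a printed table, so answers may differ slightly from table look-ups: the exact z for the top 5% is about 1.645 rather than the tabled 1.64, giving roughly 124.67 instead of 124.6.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Example 1: percentile rank of IQ = 115 (area at or below 115)
pct_below_115 = iq.cdf(115) * 100                      # about 84.13

# Example 2: % of scores between 100 and 115
pct_100_to_115 = (iq.cdf(115) - iq.cdf(100)) * 100     # about 34.13

# Example 3: raw score that cuts off the top 5%
top5_cutoff = iq.inv_cdf(0.95)                         # about 124.67

# Example 4: cut-offs for the top and bottom 2.5% (this example uses SD = 16)
iq16 = NormalDist(mu=100, sigma=16)
lower = iq16.inv_cdf(0.025)                            # about 68.64
upper = iq16.inv_cdf(0.975)                            # about 131.36

print(round(pct_below_115, 2), round(pct_100_to_115, 2),
      round(top5_cutoff, 2), round(lower, 2), round(upper, 2))
```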

    Summary

    • combine: z-scores + standard normal + z-table to find:

      • percentile rank of raw score

      • % between values

      • raw score for a given percentile

      • boundary cut-offs for any tail area

    • ALWAYS sketch the area and use symmetry to avoid mistakes

    In this module,  we introduced z-scores, z-tables and the normal distribution. We also looked at how z-tables can be used to work out specific areas under the standard normal curve in order to answer questions like:

    1)    What percentage of the scores fall below a given point?

    2)    What percentage of the scores fall above a given point?

    3)    What percentage of the scores fall between two values?

    4)    What is the actual value based on a percentile?

    Here is a short summary of what we covered in this module.

    z-scores

    • z-scores are a way of reporting a score’s precise position within a distribution.

    • They indicate a score’s distance from the mean in standard deviation units.

    • z-scores are a way of comparing performances that are measured in different units. In common terms, z-scores allow us to compare apples and oranges.

    The normal distribution

    • Many of the attributes we measure in psychology tend to distribute as a normal distribution.

    • Normal distributions are unimodal and bell shaped.

    • The standard normal distribution has a mean of 0 and a standard deviation of 1.

    • We can use tables to work out specific areas under the curve.

    z-scores and percentiles

    We can use z-scores, the z-tables and the standard normal distribution to:

    • determine the percentile rank of specific scores

    • determine the specific score at a given percentile rank

    • find out the percentage of scores that fall within a given range

    • find the boundary values that mark off specific ranges in the distribution.

    TIP!

    To avoid careless mistakes, draw a picture of what you are looking for and remember that the normal distribution is symmetrical with 50% of the scores below the mean and 50% above it.

    Correlation (overview)

    • describes the relationship between two variables

    • used constantly in psychology (brain activity vs behaviour, study time vs test score, etc)

    Scatterplots

    • primary visual display for correlations

    • plots two variables on one graph

    • shows pattern/trend (if any)

    • linear relationship → points cluster around a straight line

    • positive relationship → higher X goes with higher Y

    Pearson’s r (correlation coefficient)

    • numerical index capturing direction + strength of linear relationship

    • uses z-scores to compare how each person’s X score and Y score sit relative to their means

    Logic of correlation using quadrants

    • vertical line = mean of X

    • horizontal line = mean of Y

    • creates 4 quadrants

    • upper right + lower left = both z-scores same sign → positive cross-products

    • upper left + lower right = opposite signs → negative cross-products

    • large magnitudes = greater influence on r

    Computing Pearson’s r (conceptually)

    1. convert raw scores → z-scores

    2. multiply z-score pairs (cross-products)

    3. sum cross-products

    4. divide by N

    • result ranges from −1 to +1

    • +1 = perfect positive, −1 = perfect negative, 0 = no linear relationship
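
    The four conceptual steps translate directly into code. The sketch below follows the notes in dividing by N throughout (for both the SDs and the final step); the function name and the small data set are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of paired z-scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sdy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    if sdx == 0 or sdy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    zx = [(x - mx) / sdx for x in xs]                   # step 1: z-scores
    zy = [(y - my) / sdy for y in ys]
    cross_products = [a * b for a, b in zip(zx, zy)]    # step 2
    return sum(cross_products) / n                      # steps 3 and 4

# Hypothetical data: hours studied vs test score (%)
hours = [1, 2, 3, 4, 5]
marks = [40, 50, 65, 70, 90]
r = pearson_r(hours, marks)
print(round(r, 2), round(r ** 2, 2))   # r and the proportion of shared variance
```

    Because both variables are converted to z-scores first, rescaling either one (for example, minutes instead of hours) leaves r unchanged.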

    Why z-scores matter here

    • removes units of measurement

    • allows correlating different measures on different scales fairly

    • patterns remain identical after conversion, but now we can directly compute r

    Summary

    • correlation quantifies whether above-average scores in one variable go with above or below average scores in another

    • uses z-scores + cross-products to produce a unitless index (Pearson’s r)

    • sign = direction, magnitude = strength

    Correlation calculation using spreadsheets

    • dataset: 40 participants → hours studied vs test score (%)

    • first step: check data for errors or outliers using a scatterplot

      • impossible scores removed (e.g. negative hours or test scores)

      • after cleaning → 38 participants remain

    Descriptive statistics

    • mean hours studied = 3.882

    • mean test score = 57.658

    Step 1: deviation scores

    • subtract mean from each raw score

    • gives distance of each value from the mean

    Step 2: variance and standard deviation

    • square deviation scores and sum

    • divide by N (38)

    • take square root to get SD

      • SD (hours studied) = 2.005

      • SD (test score) = 22.930

    Step 3: z-scores

    • convert each deviation score to a z-score

      (deviation ÷ SD)

    Step 4: cross-products

    • multiply each pair of X and Y z-scores

    • sum of cross-products = 36.263

    Step 5: compute Pearson’s r

    • r = Σ(zx × zy) / N

    • result: r(38) = .95

    • indicates a very strong positive linear correlation between study time and test score

    Understanding Pearson’s r

    • measures how well data fit a straight line (line of best fit)

    • higher r → points cluster closer to line

    • perfect ±1 correlation = 100% shared variance, zero error variance

    Coefficient of determination (shared variance)

    • r² = proportion of shared variance

    • for r = 0.95 → r² = 0.90 → 90% shared variance

    • meaning: 90% of variability in test scores is explained by hours studied

    Reporting

    • standard reporting format:

      r(38) = .95, p < .001, r² = .90

    • includes both correlation coefficient and proportion of shared variance

    Summary

    • correlation analysis steps:

      1. clean data (remove impossible/outlier scores)

      2. compute means and SDs

      3. convert to z-scores

      4. compute cross-products

      5. sum, divide by N → get Pearson’s r

    • r² × 100 = % shared variance (coefficient of determination)

    Meaning of r (correlation coefficient)

    • magnitude of r = strength of relationship

      • ~.20–.30 = small

      • ~.60 = strong

      • 1.0 = perfect relationship

    • r² = proportion of variance explained → increases as r increases

    • sign (positive/negative) = direction, but magnitude strength interpreted the same whether + or −

    Direction interpretation

    • positive r → variables move together (↑X → ↑Y)

      • example: height ↑ → weight ↑

    • negative r → inverse relationship (↑X → ↓Y)

      • example: BAC ↑ → driving performance ↓

    • r = 0 → no linear association

    Zero correlation could mean:

    1. truly no relationship (data points random)

    2. no variation in one variable (flat line)

    Important cautions when interpreting correlation

    • correlation detects linear relationships only

      • non-linear (e.g. Yerkes-Dodson law: arousal vs performance) → r can hide real relationship → will look like “no correlation”

      • always check scatterplot first

    Range restriction problems

    • small/narrow range of one variable → r artificially smaller (underestimation)

      • example: measuring children’s height only between ages 8.5–9.5 → hides true height-age relationship

    • truncated range (only top end or bottom end) → same issue

      • example: only sampling top OP students → hides full negative relationship between OP and GPA (lower OP = better performance)

    Outliers

    • extreme scores can massively distort r

    • scatterplot review is critical before interpreting results

    Correlation ≠ causation

    • correlation shows association, NOT cause

    • possible explanations when r exists:

      • X causes Y (e.g. increasing temperature increases pressure in sealed container)

      • X and Y both caused by a third variable (e.g. sleeping with shoes on and headaches both caused by alcohol consumption)

      • spurious coincidence (e.g. mozzarella consumption correlates with engineering doctorates)

    Summary

    • r magnitude tells strength, sign tells direction

    • r² tells proportion of shared variance

    • must consider: linearity, range restriction, truncation, outliers, third variables

    • always check scatterplot → never infer causation from correlation alone

    Introducing correlation

    • Correlation captures the extent to which above-average scores on one variable go with above- or below-average scores on another variable.

    • z-scores are an index of a score’s position relative to the mean of a set of scores; they remove the units of measurement for that score.

    • Cross products refers to multiplying the z-scores of one variable with the associated z-scores of the other variable.

    • The sign of the sum of the cross products indicates the direction of the relationship.

    • Pearson’s r expresses the relationship in standard form with a maximum value of ±1.

    Calculating correlation

    • There are several steps involved in a typical correlation analysis.

    • The main analysis to arrive at the correlation coefficient for Pearson’s r uses the formula r = Σ(zx × zy) / N.

    • An additional analysis gives us the coefficient of determination, r², also referred to as the proportion of shared variance.

    • The percentage of shared variance is calculated by r² × 100.

    Interpreting correlation

    Factors that influence your interpretation of a correlation coefficient:

    1. Magnitude

    2. Direction

    3. Linearity

    4. Range restriction

    5. Range truncation

    6. Extreme scores

    7. Attribution of result.

    Probability + hypothesis testing context

    • humans are poor at intuitively judging probabilities

    • inferential statistics exist to formalise probability reasoning

    Example: coin flip vs birthday paradox

    • coin flip: 50% chance

    • only ~23 people needed in a room for ~50% chance two share a birthday

    • illustrates how unintuitive probability is
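
    The birthday figure is easy to verify. This sketch assumes 365 equally likely birthdays and independent people (no leap years, no twins); the function name is hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for k in range(people):
        # the (k+1)-th person must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1 - p_all_different

p23 = shared_birthday_probability(23)
print(round(p23, 3))   # just over 0.5
```

    The calculation works through the complement: it is easier to find the chance that everyone's birthday is different and subtract it from 1.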

    Descriptive vs inferential statistics

    • descriptive stats: describe samples

    • inferential stats: draw conclusions about population from sample

    Populations vs samples

    • population = entire group of interest (e.g. all Australians, all PD patients)

    • sample = subset of population used to estimate population characteristics

    • population parameters use Greek letters (μ = population mean, σ = population SD)

    • sample stats use Latin letters (M = sample mean, SD = sample SD)

    Probability example using marbles

    • if population composition known → can compute exact probability of selecting specific types

    • e.g. 10 red + 10 green → P(red) = .5

    • 1 red + 19 green → P(red) = .05 (rare outcome)

    Normal distribution and probability

    • can determine probability of obtaining a particular score (or score range) if distribution is normal

    • 34.13% of scores fall between mean and +1 SD (same below)

    • 95% of scores fall between ±1.96 SD

    • tails beyond ±1.96 SD = only 5% of area → very unlikely event

    Implication

    • if an observation falls in an extreme tail, it is unlikely to be due to chance

    • inferential statistics allow us to evaluate how surprising / probable an observation is given population distribution

    Summary

    • probability intuition is unreliable

    • inferential stats use probability to generalise sample results to populations

    • knowing the population distribution (particularly normal) lets us quantify how likely an observation is

    Sampling and inferential statistics

    • inferential stats = making claims about a population using a sample

    • representative sample must be random → every individual has equal + independent chance of selection

    • biased sampling → leads to misleading conclusions (e.g. polling only rich)

    Sampling error + sampling variability

    • sampling error = sample statistic differs slightly from population parameter

    • sampling variability = each sample will give slightly different values

    • even if population mean = 4, random samples may get means like 3.5, 4.5 etc

    Sampling distributions

    • if we repeatedly draw samples (e.g. sample size = 25) and compute sample mean each time → distribution of sample means forms

    • distribution of sample means becomes normal shaped

    • mean of sampling distribution = population mean

    Likely vs unlikely sample means

    • central 95% of sample means = “likely”

    • outer 5% in tails = “unlikely” (2.5% each tail)

    • means in tails are surprising if population truly has that μ

    Defining the sampling distribution fully

    To describe a distribution we need:

    1. mean

    2. SD

    3. shape

    1) Mean

    • mean of distribution of sample means = population mean (μ)

    2) SD → Standard Error of the Mean (SEM)

    • variance of sampling distribution = population variance ÷ sample size

    • take square root → SEM

    • SEM = σ / √n

    3) Shape

    • central limit theorem: sampling distribution ≈ normal if n ≥ 30 (regardless of original population shape)
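
    The three properties can be seen in a small simulation. This is a sketch only: the population (mean 70, SD 20), the sample size of 25 and the number of repeated samples are arbitrary choices, and the results vary slightly with the random seed.

```python
import random
from statistics import mean, pstdev

random.seed(1)   # fixed seed so the run is repeatable

# Simulated population with mean 70 and SD 20 (assumption, for illustration)
population = [random.gauss(70, 20) for _ in range(100_000)]

# Draw many random samples of n = 25 and record each sample mean
n = 25
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

mean_of_means = mean(sample_means)   # should sit close to 70
sd_of_means = pstdev(sample_means)   # should sit close to the SEM
sem = 20 / n ** 0.5                  # sigma / sqrt(n) = 4

print(round(mean_of_means, 2), round(sd_of_means, 2), sem)
```

    Individual sample means differ from one another (sampling variability), but their distribution centres on the population mean with a spread close to σ / √n.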

    Using SEM + z-tables example

    • population: mean = 70, SD = 20, sample size = 25

    • SEM = 20 / √25 = 20 / 5 = 4

    • central 95% cut-off = ±1.96 SEM (the SD of the sampling distribution is the SEM)

    • lower limit = 70 − (1.96×4) = 62.16

    • upper limit = 70 + (1.96×4) = 77.84

      → 95% of sample means will fall between 62.16 and 77.84
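
    The same calculation as a small helper, sketched under the assumption that the population mean and SD are known. The function name is hypothetical; 1.96 is the z value for the central 95%.

```python
from math import sqrt

def likely_region(mu, sigma, n, z=1.96):
    """Bounds containing the central 95% of sample means (for z = 1.96)."""
    sem = sigma / sqrt(n)            # standard error of the mean
    return mu - z * sem, mu + z * sem

lower, upper = likely_region(mu=70, sigma=20, n=25)
print(round(lower, 2), round(upper, 2))   # 62.16 77.84
```

    Increasing n shrinks the SEM, so the likely region narrows around the population mean.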

    Summary

    • sampling error + variability = sample stats differ from population stats and from each other

    • sampling distribution = distribution of all possible sample means

    • properties of sampling distribution:

      • mean = μ

      • shape = normal (if n ≥ 30)

      • SD = SEM = σ / √n

    • SEM + z-tables can estimate probability of observing a particular sample mean

    Hypothesis testing & sampling distributions

    • distribution of sample means = model of all possible outcomes if samples are drawn randomly from a population

    • shows what we’d expect if chance were the only factor

    • lets us objectively decide whether a given sample is likely or unlikely to represent that population

    Likely vs unlikely events

    • middle 95% of sample means = likely region

    • outer 5% (2.5% in each tail) = unlikely regions

    • “p < 0.05” → observed sample mean fell in one of these tails

    • “statistically significant” = sample mean is unlikely under chance (if population parameters are true)

    Example – IQ population

    • population: μ = 100, σ = 15

    • sample size = 9

    • SEM = σ / √n = 15 / 3 = 5

    • z = ±1.96 defines 95% boundaries

      • lower = 100 − 1.96×5 = 90.2

      • upper = 100 + 1.96×5 = 109.8

        → any sample mean between 90.2–109.8 = likely

        → means below 90.2 or above 109.8 = unlikely

    Linking to hypotheses

    • Null hypothesis (H₀): no real effect; chance explains results

      • sample mean in likely region → consistent with H₀

    • Alternative hypothesis (H₁): something beyond chance is happening

      • sample mean in tail → consistent with H₁

      • suggests sample may come from different population (e.g., different mean IQ)

    Interpretation

    • tails = unusual samples → raise suspicion of different underlying population

    • statistical test doesn’t tell why — only that it’s improbable under H₀

    • explanation comes from theory or context (e.g., schooling quality, pollution, etc.)

    Summary

    • distribution of sample means = model of expected outcomes if sampling from known population

    • define likely (central 95%) vs unlikely (tail 5%) regions using z = ±1.96 and SEM

    • sample mean within likely region → consistent with Null (fail to reject H₀)

    • sample mean in tails → consistent with Alternative (reject H₀)

    Single sample z test (consolidation)

    Concept

    • we are judging if a sample mean is likely to have come from a known population

    • we use the distribution of sample means as the model of what is expected by chance

    • if the sample mean falls in the central 95% → likely (consistent with Null)

    • if it falls in the outer 5% (top/bottom 2.5%) → unlikely (consistent with Alternative)

    Example 1 (training program)

    • known population (no-training workers): μ = 53, σ = 7

    • sample after training: N = 25, sample mean M = 48

    • calculate SEM = σ / √n = 7 / 5 = 1.4

    • then compute z = (M − μ) / SEM = (48 − 53) / 1.4 = −3.57

    • critical cut-offs = ±1.96 (α = .05, two-tailed)

    • |−3.57| > 1.96 → sample mean is in the tail → unlikely it came from no-training population

      → conclude training reduced errors (statistically significant)

    Example 2 (fake cavemen)

    • known cavemen population: μ = 142, σ = 20

    • sample: N = 16, M = 163

    • SEM = 20 / 4 = 5

    • z = (163 − 142) / 5 = 4.2

    • 4.2 > 1.96 → in tail → unlikely to be cavemen

      → sample is probably just modern humans
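
    Both worked examples follow the same recipe, so a small helper can reproduce them. This is a minimal sketch; the name `z_test` and its return format are assumptions made for illustration.

```python
from math import sqrt


def z_test(sample_mean, mu, sigma, n, z_crit=1.96):
    """Single-sample z test: return (z, reject_null)."""
    sem = sigma / sqrt(n)             # standard error of the mean
    z = (sample_mean - mu) / sem      # distance from mu in SEM units
    return z, abs(z) > z_crit         # in a tail if |z| exceeds the cut-off


z1, reject1 = z_test(48, mu=53, sigma=7, n=25)     # training example
z2, reject2 = z_test(163, mu=142, sigma=20, n=16)  # cavemen example
print(round(z1, 2), reject1)  # -3.57 True
print(round(z2, 2), reject2)  # 4.2 True
```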

    Why this matters

    • z test is the direct statistical formalisation of:

      “is this sample mean so extreme that chance is not a reasonable explanation?”

    • this exact logic underlies many later tests (single sample t, paired t, independent t)

    • if you understand this, you understand the core reasoning of inferential statistics

    Summary

    • use population μ and σ to build the sampling distribution

    • convert sample mean to a z score using SEM

    • compare to critical cut-offs (usually ±1.96)

    • reject Null if z falls in tail (p < .05)

    Introduction to hypothesis testing

    • We are very bad at judging probabilities off the top of our heads.

    • Inferential statistics are the set of rules we use to correctly evaluate probabilities.

    • Inferential statistics involves making a generalisation from a sample of data to the population we are interested in.

    • When we have sufficient information about the population we are sampling from, we can make judgments about how representative an observation is.

    Samples, populations, and the distribution of sample means

    • Sampling error and sampling variability - any sample we select from a population will have slightly different statistics than the population from which it was selected and from other samples drawn from that population.

    • The distribution of sample means is the distribution of the means of all possible samples of a given size drawn from a population.

    • The distribution of sample means has three critical features:

      • its mean is the same as the population mean

      • it is a normal shaped distribution

      • the standard error of the mean is equal to the standard deviation of the population divided by the square-root of the sample size.

    • We can use the z-tables to answer questions about probabilities associated with any given sample mean.
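
    A quick simulation can make the first and third features concrete. The sketch below assumes a normal population with the IQ-style values used earlier (μ = 100, σ = 15, samples of 9); the number of simulated samples and the seed are arbitrary choices, and the normal shape itself is not checked here.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA, N = 100, 15, 9     # IQ-style population, samples of size 9
rng = random.Random(0)        # fixed seed so the run is repeatable

# Draw many samples and keep only each sample's mean.
sample_means = [
    sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    for _ in range(20_000)
]

print(round(mean(sample_means), 1))    # close to MU
print(round(stdev(sample_means), 1))   # close to SIGMA / sqrt(N)
print(SIGMA / sqrt(N))                 # 5.0
```

    The spread of the simulated means is far smaller than the population SD of 15, which is the point of the standard error formula.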

    Using the sampling distribution

    • The distribution of sample means is a model of expected samples drawn from a population we know about.

    • Conceptually, we set up regions in the distribution delimiting values of sample means we would deem likely and those we would deem unlikely.

    • After describing the distribution of sample means – by declaring its mean, shape and standard error of the mean, we employ critical z-score values and the standard error of the mean to determine the likeliness of our sample.

    • A likely sample is consistent with the Null hypothesis

    • An unlikely sample is consistent with the Alternative hypothesis.

    Single sample z-test

    • These two examples are of a legitimate statistical test, the z test. This test is used often in psychological research.

    • The logic of this test forms the basis of several other statistical tests like the single sample t-test, the dependent groups t-test and even the independent groups t-test.

    • If you followed all these steps and understand the concepts, you are well prepared for the statistics we are going to explore.

    • If you don't fully understand the concepts, please go back and revise.

    Null vs Alternative Hypotheses

    Why we need them

    • the Alternative hypothesis (the “interesting” one) is always vague in real research

      (“sleep deprivation slows reaction time”, “studying increases grades”)

    • you can’t test vague — you need a precise prediction

    • the Null Hypothesis gives us a precise prediction: ZERO difference

    Null Hypothesis (H₀)

    • predicts that nothing special is happening

    • the sample is from the known population

    • any difference between sample mean and population mean is just sampling error

    • the mean of the sample = the mean of the known population (within random fluctuation)

    Alternative Hypothesis (H₁)

    • predicts there IS some effect / difference

    • sample mean is NOT from that known population

    • the sample mean came from a different population (with a different mean)

    How this connects to the distribution of sample means

    • we use the sampling distribution as a model of what chance produces under H₀

    • central 95% (green) = likely → retain H₀

    • extreme 5% (tails) = unlikely → reject H₀ and support H₁

    Example (alcoholism and reaction time)

    • population non-alcoholics: μ = 375ms

    • H₀: alcoholics’ mean = 375ms (sample comes from that distribution)

    • H₁: alcoholics’ mean ≠ 375ms (sample is from a different distribution)

    Summary

    • H₁ is the idea/theory we care about — but it’s not mathematically testable as stated

    • H₀ is crafted to be mathematically exact (no difference = 0)

    • hypothesis testing uses the Null Hypothesis as the model to test against:

      • sample mean in central region → consistent with H₀

      • sample mean in tails → inconsistent with H₀ → evidence for H₁

    Key steps in hypothesis testing

    context

    • research Q: do chronic alcoholics have slower reaction times than non-alcoholics?

    • known population (non-alcoholics): mean = 375ms, SD = 120

    • sample of 16 alcoholics: mean = 475ms

    hypotheses

    • H₁ = alcoholics differ (slower) — vague

    • H₀ = alcoholics same as population (mean difference = zero) — precise + testable

    logic

    • always start by assuming H₀ true

    • build model of what sample means look like if H₀ is true

    • compute probability of getting our sample mean under H₀

    decision system

    • sample mean in central 95% of sampling distribution → probability high → retain H₀ → difference due to chance

    • sample mean in outer 2.5% tails → probability low → reject H₀ → difference not due to chance → support H₁

    working the example

    • SEM = √(120² / 16) = 30

    • z = (475 − 375) / 30 = 3.33

    • critical z ±1.96 (5% cutoff)

    conclusion

    • z = 3.33 is beyond 1.96 → very unlikely under H₀ → reject H₀

    • therefore adopt H₁: alcoholics have slower reaction times

    summary

    • assume H₀

    • compute probability of the observed sample under H₀

    • high probability → retain H₀

    • low probability → reject H₀ and accept H₁

    • this is exactly what the z-test does
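
    The "compute probability under H₀" step can be made explicit by turning z into a two-tailed probability. This is a sketch using Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 375, 120       # non-alcoholic population (ms)
n, sample_mean = 16, 475   # sample of chronic alcoholics
alpha = 0.05

sem = sigma / sqrt(n)                   # 30.0
z = (sample_mean - mu) / sem            # about 3.33
# Two-tailed probability of a sample mean at least this extreme under H0
p = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject H0" if p < alpha else "retain H0"
print(round(z, 2), decision)  # 3.33 reject H0
```

    Comparing p with alpha gives the same decision as comparing z with ±1.96; the two are different views of the same cut-off.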

    errors + power in hypothesis testing

    two realities

    • reality 1 → H₀ is TRUE (no real effect)

    • reality 2 → H₀ is FALSE (there IS a real effect)

    two decisions

    • retain H₀

    • reject H₀

    correct decisions

    • retain H₀ when H₀ true

    • reject H₀ when H₀ false (this = POWER)

    errors

    • Type I error = reject H₀ when H₀ actually true

      • false positive

      • controlled by the alpha level (the cut-off that p is compared against)

      • alpha .05 means a 5% chance of a Type I error when H₀ is true

      • alpha .01 lowers Type I error risk to 1%

    • Type II error = retain H₀ when H₀ actually false

      • false negative

      • happens when real effect exists but test fails to detect it

    visual

    • H₀ and H₁ distributions overlap

    • if sample mean falls in overlap where it is not “extreme enough” → Type II error

    power

    • power = correctly rejecting H₀ when H₀ is false

    • power = ability to detect a real effect

    alpha + trade-off

    • raising alpha → more power → but more Type I errors

    • lowering alpha → fewer Type I errors → but more Type II errors → lower power

    effect size influence

    • larger effect size → distributions further apart → less overlap → less Type II error → more power

    sample size influence

    • larger sample → smaller standard error → bigger z values → more likely to reject H₀ → more power

    summary

    • Type I = false alarm

    • Type II = miss

    • alpha sets Type I rate

    • power = ability to detect real effect

    • ↑ alpha = ↑ power but ↑ Type I

    • ↓ alpha = ↓ Type I but ↓ power

    • ↑ effect size + ↑ sample size → ↑ power
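
    The three influences on power can be checked numerically for a single-sample z test. The sketch below is illustrative only: the function name `z_test_power` and the "true" means of 105 and 110 are hypothetical values chosen to show the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed single-sample z test when the true
    population mean is mu1 rather than the null value mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true effect in SEM units
    # Probability that z lands beyond either cut-off when H0 is false
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


base = z_test_power(100, 105, sigma=15, n=9)
print(base < z_test_power(100, 105, sigma=15, n=36))             # True: larger n
print(base < z_test_power(100, 110, sigma=15, n=9))              # True: larger effect
print(base > z_test_power(100, 105, sigma=15, n=9, alpha=0.01))  # True: stricter alpha
```

    When `mu1` equals `mu0` the function returns alpha itself, which is the Type I error rate.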

    Null and alternative hypothesis

    • The distribution of sample means and the definition of likely and unlikely sample means are directly linked to testing hypotheses.

    • A testable hypothesis must make a precise prediction that can be evaluated.

    • The Alternative Hypothesis (H₁) is usually the hypothesis that motivates a study. However, it is too imprecise to be tested directly.

    • The Null Hypothesis (H₀) is useful because it makes a precise prediction – ZERO, no relationship between the IV and DV, nothing special about our sample.

    • Therefore, the Null hypothesis is the critical hypothesis when we are testing hypotheses.

    Null hypothesis statistical tests

    • We worked through the computational steps of testing the hypothesis that chronic alcoholics have a slower reaction than the general non-alcoholic population.

    • Our first step is to assume the null hypothesis is true.

    • We then evaluate the probability of observing our sample if the null hypothesis is true.

      • If the probability is high, we retain the null hypothesis

      • If the probability is low, we reject the null hypothesis and adopt the alternative hypothesis

    • We tested the likelihood of the sample with a z-test.

    Decision errors

    • When conducting hypothesis testing, we can make correct decisions or errors

    • We can incorrectly reject (Type I) or retain (Type II) the null hypothesis.

    • The power of a test is the probability of correctly rejecting the null hypothesis (and inferring that our alternative hypothesis may be correct) when the null hypothesis is in fact false.

    • Attempting to maximise the power of a test is a juggling act between minimising two forms of error.

    • Power can also be influenced by effect size and sample size.

    t test intro

    foundation

    • z test → compares sample mean to population mean

    • uses population SD to compute standard error of mean (SEM)

    • SEM = how far a sample mean typically falls from the population mean

    SEM calculation

    • σ² / N → then square root

    • or → σ / √N

    • same outcome

    hypothesis testing with z

    • find SEM

    • compute z = (sample mean – population mean) / SEM

    • compare to critical ±1.96 (for α = .05, 2-tailed)

    • if z beyond ±1.96 = reject H₀

    real world problem

    • we almost never know the population SD

    • cannot compute “true” SEM

    • solution = estimate

    t test

    • replace σ with S (sample estimates)

    • t = (sample mean – population mean) / estimated SEM

    • S = estimate → because we now estimate population variance + SEM

    estimating variance

    • sample mean is unbiased

    • sample variance is biased if divide by N

    • fix = divide by N-1 → inflates slightly → corrects bias

    • N-1 = degrees of freedom (df)

    • df = how many scores are free to vary given a fixed mean
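
    A small numeric illustration of the N versus N−1 denominators, using a made-up set of five scores (the data are hypothetical and only there to show the arithmetic):

```python
from statistics import mean

scores = [2, 4, 4, 6, 9]   # hypothetical sample, for illustration only
n = len(scores)
m = mean(scores)
ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations

biased = ss / n            # divides by N: tends to underestimate
unbiased = ss / (n - 1)    # divides by df = N - 1

print(ss, biased, unbiased)  # 28 5.6 7.0
```

    The N−1 version is always the larger of the two, and the gap shrinks as N grows.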

    evaluate t

    • same logic as z

    • but t distributions change shape depending on df

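    • low df → flatter peak and heavier tails than the normal curve (often loosely called platykurtic, although heavy tails strictly mean positive excess kurtosis)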

    • as df ↑ → t distribution → approaches normal → critical t → approaches 1.96

    critical values

    • t critical depends on df

    • example df = 15, α = .05 (two-tailed) → t critical ≈ ±2.131

    • need t observed to exceed ±2.131 to reject H₀

    summary

    • t test = same logic as z but uses estimated SD

    • z = population SD known

    • t = population SD unknown

    • t distribution changes with df

    • small samples → need larger t to reject H₀

    single-sample t test worked example

    purpose

    • use t when you want to test whether a sample mean is significantly different from a known comparison mean

    • same logic as z test

    • BUT population SD unknown → need to estimate variance + SEM

    example: CO₂ 1970 vs 2000

    • 1970 CO₂ = comparison mean

    • sample from year 2000 = sample mean

    • have sum of squares for the sample

    steps

    1. estimated population variance

    • sample variance underestimates population variance

    • fix bias by dividing by N−1 (df)

    • SS / (N−1)

    • = 72600 / 24 = 3025

    2. estimated standard error of mean (SEM)

    • SEM = √(estimated population variance / N)

    • = √(3025 / 25)

    • = √121

    • = 11

    3. compute t

    • t = (sample mean – comparison mean) / estimated SEM

    • t obtained = 3.18

    4. critical value

    • t distribution shape varies by df

    • low df → flatter, heavier-tailed distribution → bigger critical values

    • df = N−1 = 24

    • t critical (α=.05 2-tailed) = 2.064

    5. decision

    • t obtained = 3.18 > 2.064

    • reject H₀

    • conclude sample mean is unlikely to come from distribution centred at 1970 levels

    • interpret directionally: CO₂ 2000 > CO₂ 1970

    reporting (APA)

    • report direction of effect + means

    • report statistic as: t(24) = 3.18, p < .05

    summary

    • calculate estimated variance using SS/(N−1)

    • derive estimated SEM

    • compute t

    • compare to t critical using df

    • reject or retain H₀

    • report t, df, p, and direction of difference
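
    The arithmetic in steps 1, 2 and 5 can be reproduced as below. This is a sketch from the summary values in the notes; the sample and comparison means are not given here, so t obtained is taken as reported rather than recomputed.

```python
from math import sqrt

ss, n = 72600, 25            # sum of squares and sample size from the notes
df = n - 1

est_variance = ss / df                 # step 1: SS / (N - 1)
est_sem = sqrt(est_variance / n)       # step 2: sqrt(variance / N)

t_obtained = 3.18            # as reported in the notes
t_critical = 2.064           # t table, df = 24, alpha = .05 two-tailed

reject_null = abs(t_obtained) > t_critical   # step 5: decision
print(est_variance, est_sem, reject_null)    # 3025.0 11.0 True

# Since t = difference / SEM, the reported t implies a mean difference
# of roughly t_obtained * est_sem, i.e. about 35 units.
implied_difference = t_obtained * est_sem
```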

    repeated measures (dependent means) t test

    what it is

    • used when the same participants provide 2 scores

    • e.g. pre-post, or 2 conditions per participant

    • statistic is based on difference scores (condition1 − condition2)

    why we use difference scores

    • we are not comparing 2 separate group means

    • we are comparing: mean difference between 2 conditions vs 0

    • H₀: mean difference = 0

    • H₁: mean difference ≠ 0

    example: tickling robot

    • participants are tickled twice

    • condition A = self-controlled robot

    • condition B = experimenter-controlled robot

    • DV = milliseconds until pulling away

    • direction must be defined: in this case = self tickle minus experimenter tickle

    • positive numbers = self tickle tolerated longer

    steps

    • compute difference score per participant

    • compute SS of these difference scores

    • estimated population variance = SS / (N−1)

    • SEM = √(estimated variance / N) based on difference scores

    • t = (mean difference – 0) / SEM

    • compare t obtained vs t critical using df = N−1

    • if |t obtained| > t critical → reject H₀

    example results given

    • mean difference = 6.33 ms

    • estimated population variance = 26.06

    • SEM calculated from that

    • t obtained = 3.04

    • df = 5

    • t critical at α=.05 (two-tailed) = 2.571

    • 3.04 > 2.571 → reject H₀

    • interpretation: participants tolerate tickling longer when they control it

    reporting (APA)

    • report both condition means and the direction

    • specify this is a dependent means t test

    • report t(df) and p value

    • e.g.: t(5)=3.04, p<.05
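
    The steps for the dependent means test can be sketched as a short function working on difference scores. The function name `dependent_t` and the three pairs of scores are made up for illustration; the final lines check the t value against the summary figures in the notes (N = 6 is inferred from df = 5).

```python
from math import sqrt
from statistics import mean, variance


def dependent_t(cond_a, cond_b):
    """Dependent means t test on paired scores; returns (t, df).
    Differences are taken as cond_a - cond_b."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    est_variance = variance(diffs)        # SS / (N - 1) of the differences
    sem = sqrt(est_variance / n)
    return (mean(diffs) - 0) / sem, n - 1


t, df = dependent_t([5, 7, 9], [4, 5, 6])   # made-up scores
print(round(t, 2), df)  # 3.46 2

# Check against the summary values in the notes.
t_notes = 6.33 / sqrt(26.06 / 6)
print(round(t_notes, 2))  # 3.04
```

    Swapping the order of the two conditions flips the sign of t but not its size, which is why the direction of subtraction has to be stated.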

    An introduction to t-tests

    • When comparing a sample mean to a population mean we can use a z-test if we have both the population mean and its standard deviation or variance

    • When we don’t have a population standard deviation or variance we need to calculate an estimate of it

    • The estimated population variance is calculated with a denominator of N − 1 (the degrees of freedom) to adjust for an underestimating bias.

    • A t-test is a form of z-test that uses an estimated standard error of the mean based on an estimate of the population standard deviation or variance

    • Results of a t-test are evaluated against critical values on t distributions that differ based on degrees of freedom

    Single sample t-tests

    • When conducting a single sample t-test we need to calculate an estimated standard error of the mean from our sample data.

    • We do this by using information from our sample, such as the sum of squares.

    • Using our degrees of freedom of N – 1 we look up our t critical value in the t table.

    • We compare our t obtained score to the t critical value and the regions it defines for retaining or rejecting the null hypothesis.

    • We determine if we can reject the null hypothesis and find support for the alternative hypothesis that our sample mean differs significantly from the population mean.

    • When reporting a single sample t-test result be sure to report the means, the direction of the difference between them and t obtained with associated df and p values.

    Dependent means t-test

    • A single sample t-test can be adapted to test the difference between two dependent means obtained from a repeated measures design.

    • Difference scores are calculated for each participant and these difference scores become the data used to calculate the estimated population variance and the estimated standard error of the mean.

    • The mean of the sampling distribution used as a comparison point will be zero, if the null hypothesis assumes no differences between means.

    • It is important to bear in mind the direction in which the difference scores are calculated to ensure results are interpreted correctly.

    • When reporting dependent means t-test results, means for both conditions should be reported along with the direction of the difference between them, the t value and its associated degrees of freedom and p value.

    independent groups t-test (logic)

    what it is

    • used when comparing two separate groups (different people in each group)

    • IV has 2 levels, DV is measured once in each group

    • groups can come from true random assignment OR quasi assignment

    • key = independent observations

    hypotheses

    • H₀ = population means are equal → difference = 0

    • H₁ = population means differ → difference ≠ 0 (or specific direction if one-tailed)

    conceptual model

    • like all other tests: logic is built around the sampling distribution

    • but this time → sampling distribution = distribution of differences between sample means

    • under H₀:

      • the 2 population distributions overlap (same mean)

      • the 2 sampling distributions of means overlap

      • therefore mean of “difference between means” = 0

    effect direction example (tickling)

    • if robot tickling actually produces higher tickle ratings than self tickling → the 2 population means differ

    • differences between means (MA − MB) would not centre at 0

    • if H₀ is true → difference distribution centres exactly at 0

    the computational complication

    • we do not have 1 sample (like single sample)

    • we do not have paired scores (like dependent t)

    • we have 2 independent samples and we must estimate population variance using both

    • therefore we need to compute a combined pooled variance based on both samples’ data

    • then we use that pooled variance to get the estimated standard error of the difference between means

    summary

    • independent groups t compares 2 group means drawn from different people

    • H₀ predicts a difference of exactly 0

    • H₁ predicts a non-zero difference

    • the sampling distribution used is the distribution of differences between sample means

    • key computation = estimate standard error from both groups (pooled)

    • inference: if the obtained difference is unlikely under H₀ → reject H₀ and conclude group means differ in population

    independent groups t-test worked example (distributed vs massed skateboarding practice)

    design + context

    • quasi-experiment (participants self-selected which condition)

    • independent groups (different people in each condition)

    • IV = learning schedule

      distributed: 1 hour × 8 days (n=15)

      massed: 8 hours in 1 day (n=20)

    • DV = number of half-pipe reps they could do at test

    • distributed mean = 7.4

    • massed mean = 5.2

    • observed difference = 2.2 reps

    hypotheses

    • H₁: mean distributed ≠ mean massed

    • H₀: mean distributed = mean massed (difference = 0)

    logic

    • if H₀ is true → populations overlap → sampling distributions overlap → sampling distribution of mean differences is centred on 0

    • we compare our obtained difference (2.2) to this model to see if it is likely or unlikely under chance

    assumptions

    • normal populations

    • homogeneity of variance (population variances roughly equal)

    • independence of observations (different people in each group)

    computational steps (condensed conceptual)

    • compute estimated population variance separately for each group

    • combine these two estimates into a pooled variance (weighted by df)

    • compute variance of each group’s sampling distribution (divide pooled variance by that group’s n)

    • sum these two to get variance of sampling distribution of the difference

    • square-root = standard error of the difference

    • compute t = (difference between sample means) / (standard error)

    result

    • t obtained = 4.23

    • df total = (n₁ − 1) + (n₂ − 1) = 33

    • critical t ≈ 2.04 (α = .05, 2-tailed)

    inference

    • 4.23 > 2.04 → difference very unlikely due to chance → reject H₀

    interpretation + reporting (APA style)

    • distributed learning produced significantly more reps than massed learning

    • t(33) = 4.23, p < .05

    summary

    • independent groups t compares 2 separate group means (different people)

    • must pool variances to estimate population variance

    • the pooled variance drives the standard error calculation

    • if obtained t > critical t → reject H₀ → support H₁
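
    The pooling and standard error steps can be sketched as two small functions. The separate group variances are not listed in these notes, so the first call uses hypothetical variances purely to show the df weighting; the second part uses the pooled variance of 2.34 quoted in the effect size section. With the unrounded standard error the ratio comes out slightly below the 4.23 reported above, which appears to rest on the rounded value 0.52.

```python
from math import sqrt


def pooled_variance(var1, n1, var2, n2):
    """Combine two variance estimates, weighting each by its df."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * var1 + df2 * var2) / (df1 + df2)


def se_difference(pooled_var, n1, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(pooled_var / n1 + pooled_var / n2)


# Hypothetical variances: the larger group pulls the pooled value its way.
print(pooled_variance(2.0, 11, 5.0, 21))  # 4.0

n_dist, n_massed = 15, 20
pooled = 2.34                              # pooled variance quoted in the notes
se = se_difference(pooled, n_dist, n_massed)
t = (7.4 - 5.2) / se
df_total = (n_dist - 1) + (n_massed - 1)
print(round(se, 2), round(t, 1), df_total)  # 0.52 4.2 33
```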

    confidence intervals (CI) — key idea

    • same computations as hypothesis testing but used to estimate what the population mean probably is

    • instead of only saying “sample is / is not representative of X population,” we estimate the range where the true population mean likely sits

    • a point estimate = a single number (sample mean)

    • an interval estimate = that sample mean ± margin of error

    CI logic (visual)

    • 95% CI uses the central region of the sampling distribution (middle 95%)

    • mathematically → sample mean ± (critical value × standard error)

    • for z-based CI → critical value = 1.96

    • for t-based CI → critical value = tcrit from table (depends on df)

    example 1 (alcoholism reaction time)

    • sample of 16 alcoholics: M = 475ms

    • known general population mean = 375ms, sd = 120ms, SEM = 30

    • margin of error = 1.96 × 30 = 58.8

    • 95% CI = 475 ± 58.8 = [416.2, 533.8]

    • interpretation = we are 95% confident the true mean RT for chronic alcoholics is between 416.2 and 533.8ms

    • because entire CI is above 375ms → same inference as z-test: reject H₀

    example 2 (distributed vs massed practice in skateboarding)

    • observed mean difference = 2.2 reps

    • t-critical (df = 33, using the table row for df = 30) ≈ 2.042

    • standard error of difference = 0.52

    • margin of error = 2.042 × 0.52 ≈ 1.06

    • 95% CI = 2.2 ± 1.06 = [1.14, 3.26]

    • interpretation = we are 95% confident the true benefit of distributed learning is between 1.14 and 3.26 more reps

    • CI does NOT contain 0 → same inference as independent groups t-test: reject H₀
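
    Both intervals follow the same formula, estimate ± critical value × standard error, so one helper covers them. This is a minimal sketch; the function name is illustrative and the t critical value is passed in as rounded in the notes (≈ 2.04) rather than computed.

```python
def confidence_interval(estimate, critical_value, standard_error):
    """Return (lower, upper) = estimate +/- critical value x standard error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


# Example 1: z-based interval for the alcoholics' mean reaction time (ms)
lo1, hi1 = confidence_interval(475, 1.96, 30)
print(round(lo1, 1), round(hi1, 1))   # 416.2 533.8
print(not (lo1 <= 375 <= hi1))        # True: 375 lies outside, reject H0

# Example 2: t-based interval for the difference in reps
lo2, hi2 = confidence_interval(2.2, 2.04, 0.52)
print(round(lo2, 2), round(hi2, 2))   # 1.14 3.26
print(not (lo2 <= 0 <= hi2))          # True: 0 lies outside, reject H0
```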

    summary

    • confidence intervals use the same logic and maths as hypothesis tests

    • 95% CI gives a plausible range of true population values

    • if a 95% CI does not include the null value (usually 0) → reject H₀

      and this will always match the conclusion of a .05 hypothesis test

    effect size (Cohen’s d) — core concept

    • after we find a statistical difference → we need to know how big that difference actually is

    • effect size answers: “how far apart are the two populations?”

    • Cohen’s d = standardized effect size measure

    interpreting Cohen’s d visually

    • small d → lots of overlap between populations

    • medium d → moderate overlap

    • large d → little overlap

    Cohen’s d formula (concept)

    • (mean of population 2 − mean of population 1) / population SD

    • if we only have sample data → we estimate SD and use that estimate

    example: positive information → attractiveness

    • no info group: M = 200, SD = 48

    • positive info group: M = 220

    • d = (220 − 200) / 48 = .42 (medium-ish)

    • if positive group was M = 210 → d = .21 (half as big)

    benefit of Cohen’s d

    • like z → allows comparison across studies even when scales differ

    • one study used 0–400 scale, another used 1–10 scale → d standardizes them

    general guideline (Cohen, 1988)

    • d ≈ .20 = small

    • d ≈ .50 = medium

    • d ≥ .80 = large

    computing Cohen’s d after t-tests

    • same logic, but we need sample-estimated SD instead of population SD

    dependent (repeated) example — tickle robot

    • mean difference = 6.33

    • estimated pop SD = sqrt(SS / df) = sqrt(130.33 / 5) ≈ 5.1

    • d = 6.33 / 5.1 ≈ 1.24 (large)

    independent groups example — skateboarding

    • distributed − massed = 2.2 reps

    • pooled variance = 2.34 → sqrt = 1.53

    • d = 2.2 / 1.53 ≈ 1.44 (very large)
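
    The three effect sizes above use the same ratio, so they can be reproduced with a one-line helper. A sketch with an illustrative function name; the inputs are the values given in the notes.

```python
from math import sqrt


def cohens_d(mean_difference, sd):
    """Standardised effect size: mean difference in SD units."""
    return mean_difference / sd


print(round(cohens_d(220 - 200, 48), 2))           # 0.42 attractiveness
print(round(cohens_d(6.33, sqrt(130.33 / 5)), 2))  # 1.24 tickle robot
print(round(cohens_d(2.2, sqrt(2.34)), 2))         # 1.44 skateboarding
```

    Because d is expressed in SD units, the three values can be compared directly even though the raw scales (ratings, milliseconds, reps) differ.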

    summary

    • Cohen’s d is a standardized measure of how far apart two means are

    • allows effect comparison across different scales + different studies

    • most journals require reporting effect size (APA manual requirement)

    • add it alongside p-values to properly describe your result’s practical importance

    Independent groups design

    • Independent groups t-tests are appropriate for designs that have two groups of observations, e.g. a true randomised experiment with two groups, or a quasi-experiment with two groups.

    • In conducting this test, we assume the null hypothesis is true. This is visualised as the underlying populations of our two samples as completely overlapping with the same mean.

    • The resulting sampling distributions also overlap under the null hypothesis.

    • The distribution of differences between means has a mean of zero, which gives us a precise number against which to test our observed difference between the means of the two groups in our study.

    • The major computation in this analysis is finding the standard error of the distribution of differences between means.

    Example: Learning the half-pipe

    • The computations for the independent groups t-test require us to make two estimates of the population variance based on the two independent samples.

    • These two estimates are combined into a single pooled estimate.

    • The pooled variance is used to compute the variance of the sampling distribution of the mean for each of the conditions.

    • These variances are added together to get the estimated variance of the sampling distribution of differences between means.

    • Taking the square root of this estimated variance gives us the standard error of the distribution of differences between means.

    • To get the observed t statistic, we divide the observed difference between the means of the Distributed and the Massed groups by this standard error.

    • We compare our result with the critical value in the t-tables associated with the df closest to the df Total of our experiment without going over.

    • If the observed t statistic is larger than the critical value, we reject the null hypothesis and adopt the alternative hypothesis.

    Confidence intervals

    • Confidence intervals are an informative complement to the null hypothesis tests of z-tests, single sample t-tests, repeated measures/dependent means t-tests, and independent groups t-tests.

    • They are based on the same computations as the null hypothesis tests but use the information to generate an estimate of the mean of the population the sample came from.

    • Interpreting 95% confidence intervals leads us to the same conclusion as a null hypothesis test with an alpha level of 5% or 0.05.

    Effect size

    • Cohen's d is a standardised measure of effect size.

    • It is an informative adjunct to a statistical test (e.g. repeated measures t-test; independent groups t-test).

    • Can further support a statistically significant result.

    • Cohen's d enables comparison of effect sizes across studies that have used different measurements with different means and standard deviations.

    • The APA Publication Manual standard is to provide some measure of effect size in addition to the results of a significance test.

    Readings

    Grove — Chapter 1

    1.1 Introduction

    • Common perception of psychology often limited to mental disorders and therapy (e.g., Freud, Dr. Phil).

    • Psychology is a broad discipline: the American Psychological Association has over 50 divisions (Stanovich, 2007).

    • Examples of divisions: General psychology, Military psychology, Teaching psychology, Exercise and Sport psychology, Organisational psychology, Psychology and Law, Addictions, Study of Men and Masculinity.

    • Despite diversity, unifying feature: psychology embraces the scientific method to seek knowledge and truth about behavior.

    1.2 Psychology and the Scientific Process

    1.2.1 Knowledge from Personal Experience

    • Often, people do not know the source of their knowledge.

    • Example: Beliefs about jogging causing knee problems may come from anecdotal evidence (e.g., a friend’s injury).

    • Problem: Overgeneralization from limited cases and selective perception (focusing on negative incidents, ignoring positive).

    • Personal experience is biased and subjective; knowledge from it is unreliable and unverified.

    1.2.2 Knowledge from Authority

    • We rely on experts or authorities (lecturers, mechanics) to simplify complex knowledge.

    • Trust is placed on qualifications and institutional evaluation.

    • Problem: Some authorities (politicians, broadcasters) are accepted without scrutiny of their expertise or evidence sources.

    • Authority figures can be biased or mistaken just like anyone else.

    1.2.3 Rationalism: Knowledge from Reason

    • Knowledge can come from reasoning, e.g., mathematical proofs and logical syllogisms.

    • Example: If all brown dogs are friendly and Bonzo is a brown dog, then Bonzo is friendly.

    • Limitation: Validity depends entirely on the soundness of initial assumptions.

    • Rationalism is reliable only if premises are true.

    1.2.4 Empiricism: Knowledge from Observation

    • “Seeing is believing” is problematic: observations are subjective, influenced by sensory differences, culture, mood, intoxication.

    • Different observers may describe the same event differently.

    1.2.5 Knowledge from Science

    • All previous methods (experience, authority, reason, observation) are fallible and prone to error.

    • The scientific method is the best way to generate sound, reliable knowledge based on evidence.

    • Scientific claims must be supported by evidence, not faith or morality.

    • Five basic principles of science:

      • Objectivity

      • Skepticism

      • Openness / Open-mindedness

      • Tentativeness

      • Independence from authority

    1.3 Principles of the Scientific Method

    1.3.1 Objectivity

    • Evidence must be observable by anyone, not just personal feelings or thoughts.

    • Example: Measuring anxiety by physiological responses (heart rate, sweating) rather than self-report.

    1.3.2 Skepticism

    • Science requires claims to be supported by evidence and critically evaluated.

    • Reflexive response: “Show me the evidence.”

    • No acceptance of claims based solely on authority or intuition.

    • Example: Galileo disproved the intuition that heavier objects fall faster by testing it experimentally.

    1.3.3 Openness / Open-mindedness

    • Scientists must fully report methods and conditions so others can replicate studies.

    • Different interpretations or conclusions by others are acceptable if based on evidence.

    • Inter-rater reliability: agreement among different observers.

    • Must be willing to accept new evidence and revise beliefs accordingly, even if ideas seem extraordinary.

    1.3.4 Tentativeness

    • Scientific knowledge is provisional and subject to revision with new data.

    • No scientific finding is 100% certain.

    • Confidence increases with repeated, replicated evidence (e.g., Pavlov’s classical conditioning).

    • Ambiguity is expected, especially early in research fields (example: ongoing debates about global warming).

    1.3.5 Independence from Authority

    • Authority is irrelevant unless claims are supported by solid evidence.

    • Even reputable sources must be scrutinized skeptically.

    1.4 Assumptions of Science

    1.4.1 Nature is Lawfully Organised

    • There are finite, discoverable rules explaining natural phenomena, including human behavior.

    • This does not necessarily negate free will; humans show patterns and consistencies (e.g., language acquisition, altruism).

    1.4.2 Science Assumes Determinism

    • Knowledge of rules allows prediction of behavior in given contexts.

    • Psychological rules predict behavior of most people most of the time (not necessarily every individual).

    • Example: Memory loss follows predictable patterns over time.

    1.4.3 Science is Concerned with Solvable Problems

    • Only questions answerable through objective evidence are scientifically valid.

    • Questions like “Is there life after death?” or “Are people essentially good?” cannot be answered scientifically as phrased.

    • However, rephrasing questions to be measurable can make them scientific (e.g., “Are lone individuals more likely to help someone in distress than groups?”).

    • Example: The “bystander effect” showed individuals are more likely to help when alone than in groups, challenging intuition.

    1.5 Goals of the Science of Psychology

    Psychology seeks to understand behavior through four key goals:

    1.5.1 Describe Behavior

    • First step: identify when and where behavior occurs.

    • Example: Pilots underestimate altitude more at night, resulting in short landings.

    • Description can be quantitative (distance from touchdown point, number of altitude adjustments) or qualitative.

    1.5.2 Explain Behavior

    • Provide causes or reasons for observed behavior.

    • Avoid pseudo-explanations (circular reasoning).

    • Example: Pilots underestimate altitude at night due to lack of visual information, not because estimates are simply “unreliable.”

    1.5.3 Predict Behavior

    • Use description and explanation to predict when behavior will or will not occur.

    • Example: Predict pilots will perform worse at night; if data contradicts this, revise the explanation.

    1.5.4 Control Behavior

    • Use understanding to influence or modify behavior.

    • Example: Improving night landings by adding ground lights to increase visual information.

    • Successful control validates deeper understanding.

    1.6 Scientific Hypotheses and Non-Scientific Theories

    1.6.1 What Is a Theory?

    • Definition: A theory is a logically organized set of propositions (claims, statements, assertions) that:

      • Summarizes existing knowledge on a topic

      • Organizes that knowledge into specific relationships between variables/factors

      • Explains, at some level, the phenomenon of interest

      • Makes specific predictions about outcomes in situations relevant to the theory

    • Role in Psychology:

      • Theories explain why behavior occurs as it does

      • Theories are precise statements about how world events affect behavior

      • Include assumptions and concepts that must be clearly defined to be tested and understood

    • Example: Piaget’s Developmental Theory

      • Children develop through 4 stages, each with distinct cognitive/physical abilities

        • Stage 1 (0-2 years): Object permanence, intentional actions

        • Stage 2 (2-7 years): Use language to represent objects, classify by single features

        • Stage 3 (7-11 years): Logical thinking about objects/events

        • Stage 4 (11+ years): Abstract thinking, hypothesis testing, future and ideological thought

      • Requires operational definitions: precise criteria for testing components (e.g., what counts as "understanding logic")

    1.6.2 What Is a Hypothesis?

    • Definition: A hypothesis is a focused, tentative explanation for behavior, derived from a theory.

    • Relationship to Theory:

      • Hypotheses test specific parts of broader theories.

      • Example: Testing if children 7-11 understand conservation of volume (a facet of Piaget’s theory)

    • Experiment Example:

      • Children given two identical short fat glasses of water; one is poured into a tall skinny glass

      • Prediction: Younger children (2-7) fail conservation (think tall glass has more water), older children (7-11) pass

    • Types of Hypotheses in Experiments:

      • Research (Alternative) Hypothesis (H1): Predicts that a difference exists between groups (e.g., older children perform better)

      • Null Hypothesis (H0): Predicts no difference; any observed difference is due to chance or unrelated variables

    1.6.3 Where Do Hypotheses Come From?

    • Generating Hypotheses:

      • Can be intimidating for new researchers

      • Eureka moments exist but most come from systematic reading and study

    • Process:

      • Read published research on a topic to become familiar with terminology, methods, and theories

      • Identify gaps or shortcomings in existing research that suggest new hypotheses

      • Example: Previous alcohol-driving study measured errors; new hypothesis could focus on reaction time for deeper insight

    • Replication:

      • Exact replication: Repeat study exactly to confirm original results (increases confidence or questions prior findings)

      • Conceptual replication: Test the same hypothesis but with modified methods to explore robustness (e.g., measuring reaction times as well as errors)

    • Analogy:

      • Exact replication = asking same witness repeatedly for consistent testimony

      • Conceptual replication = asking different witnesses for corroboration

    1.6.4 Falsifiability

    • Core Principle:

      • Scientific theories/hypotheses must be falsifiable — they can be proven false by evidence

      • Non-falsifiable theories stall scientific progress (dead ends)

    • Example:

      • Theory of gravity predicts objects always fall down when dropped

      • If an object were observed floating up, theory would be falsified

    • Laws vs Theories:

      • Laws are theories that have withstood extensive attempts to falsify

      • Strictly, science has only theories; none are absolute laws

    • Analogy:

      • Theory = structural steel tested under stress

      • Each test strengthens or weakens confidence in the theory

      • Failing a test prompts modification or replacement of theory

    1.6.5 Circular Hypotheses

    • Problem:

      • Hypotheses that explain an event by restating the event itself are circular and uninformative

    • Examples:

      • "The boy is distractible because he has attention deficit disorder" (distractibility = disorder)

      • "Financial crisis caused by economic panic" (crisis = panic)

    1.6.6 Hypotheses Containing Non-Scientific Ideas or Forces

    • Issue:

      • Hypotheses involving concepts outside objective observation are untestable

    • Example:

      • Saying violent acts are caused by satanic possession is not testable scientifically, as "satan" is not objectively measurable

    1.6.7 Hypotheses Containing Ill-Defined Terms

    • Problem:

      • Without clear definitions, hypotheses cannot be tested

    • Example:

      • "A man tried to assassinate the president because he was mentally disturbed" is untestable without a clear, measurable definition of "mentally disturbed"

    1.6.8 Good Theories and Hypotheses Are Parsimonious

    • Parsimony: The simplest explanation that accounts for all observations is preferred

    • Approach:

      • Start with simple hypotheses explaining phenomena

      • Add complexity only when necessary due to new data

    • Example: Conway Lloyd Morgan’s Principle:

      • Avoid attributing higher mental faculties to animals when simpler explanations (like trial-and-error) suffice

      • E.g., dog opening gate explained by trial and error, not logical reasoning

    • Parsimonious explanations are generally favored over more complex ones if equally effective

    1.7 Scientific and Non-Scientific Evidence

    • Research Studies:

      • Ask and answer scientific questions

      • Two main types: (1) Experiments, (2) Observational studies

      • Must define:

        • Participants (who is studied)

        • Situations/settings for observation

        • Measurement methods for behavior

    • Criteria for Compelling Scientific Evidence:

      • Observations must be:

        • Objective: Free from bias and personal influence; results replicable regardless of observer

        • Systematic: Conducted step-by-step with a clear method (e.g., varying conditions in flight simulator to test pilot performance)

        • Controlled: Extraneous variables are held constant or eliminated to isolate factors of interest (e.g., controlling wind, cloud cover in pilot studies)

    1.7.1 Evidence Must Be Empirical

    • Based on observable phenomena that can be independently verified

    • Disputes about interpretation resolved by gathering more observations

    1.7.2 Observations Must Be Objective

    • Must not be influenced by the researcher’s beliefs or expectations

    • True objectivity means consistent results regardless of observer

    1.7.3 Observations Must Be Systematic

    • Observations follow a clear, consistent procedure

    • Example: Testing pilot’s altitude judgment under varying levels of visual ground detail systematically

    1.7.4 Observations Must Be Controlled

    • Control or hold constant all variables other than the one being studied

    • Example: Using a flight simulator to control environmental factors affecting pilot’s altitude estimate

    1.8 Critically Evaluating Evidence

    • Scientific Attitude Essentials:

      • Maintain an objective stance

      • Be skeptical of others’ arguments and evidence

      • Stay open-minded to new and surprising evidence

      • Be tentative and cautious about your own claims

      • Avoid undue influence by authorities when conducting or interpreting research

    • Initial Step in Evaluation:

      • Assess whether the researcher adhered to these principles above

    • Quality of Evidence Criteria:

      • Is the evidence empirical?

      • Is the evidence objective?

      • Is the evidence systematic?

      • Was it acquired under controlled circumstances?

    • Consequences of Failure:

      • Failing any of these criteria weakens confidence in the study’s conclusions

      • Some failures are more damaging than others (to be discussed further in the next chapter)

    1.9 The Scientific Process

    • Overview:

      • The scientific process can be visualized as a flow chart with feedback loops

      • Starts with an idea or theory explaining a phenomenon

        • Example: Children progress through four cognitive developmental stages

    • Step 1: Generate Hypotheses

      • Smaller, testable predictions derived from the theory

      • Example: 7-11-year-olds understand conservation; 2-7-year-olds do not

    • Step 2: Design a Study

      • Develop a carefully planned experiment to test hypotheses

      • Study design is critical; poor design leads to uninterpretable results

      • Chapter 2 focuses entirely on study design principles

    • Step 3: Collect and Organize Data

      • Raw data from questionnaires, interviews, tests, etc., initially unorganized

      • Summarize, organize, and describe the data to understand group relationships

    • Step 4: Data Analysis (Inferential Statistics)

      • Perform statistical tests to infer whether hypotheses are supported

      • This stage determines the reasonableness of the hypothesis as an explanation

    • Step 5: Interpret Results and Revise

      • If data support the hypothesis: proceed to write a report and publish findings

      • If data do not support the hypothesis: revise or abandon it, then retest with a new hypothesis

      • This iterative revision is part of scientific progress

    • Step 6: Replication by Others

      • Studies may be replicated exactly or conceptually by other researchers

      • Consistent replication increases confidence and moves hypothesis toward theory status

      • Failure to replicate reduces confidence, possibly leading to hypothesis rejection or modification

    2.2.3 Sub-Types of Research

    Two Broad Categories Based on Goals

    • Basic Research – aims to increase knowledge for its own sake.

    • Applied Research – aims to solve practical problems.

    Two Broad Categories Based on Methods

    1. Qualitative Research

      • Focuses on descriptive, non-numerical data.

      • Examples:

        • Detailed interviews.

        • Classroom interaction recordings.

        • In-depth case studies (e.g., rare conditions, brain injury).

      • Produces extended descriptive narratives.

      • Typically does not use statistical analysis.

    2. Quantitative Research

      • Involves measurements and numerical data.

      • Analyzed using descriptive and inferential statistics.

      • Primary focus of this course.

    2.2.4 Experimental Variables

    Independent Variable (I.V.)

    • Definition: The variable manipulated by the experimenter.

    • Assumed to cause changes in the dependent variable.

    • Example: In a driving-simulation study, the I.V. is whether participants perform an additional mental task while driving.

    • Levels of I.V.:

      • Level 1: No additional task (control).

      • Level 2: Additional task (treatment).

    • Can be:

      • Directly manipulated (e.g., task type).

      • Naturally occurring and grouped (e.g., age, gender, IQ, height).

    • Must have at least two values.

    • Often includes a zero (no treatment) and non-zero (treatment) condition.

    Dependent Variable (D.V.)

    • Definition: The variable measured at each level of the I.V.

    • Depends on the I.V. exposure.

    • Example: Driving performance scores after word-generation tasks.

    • Examples of D.V. measurements:

      • Reaction time (seconds) when I.V. is age.

      • Test performance when I.V. is teaching method.

    2.2.5 Unwanted Variables

    Purpose of Control

    • Goal: Determine if a true relationship exists between I.V. and D.V.

    • Challenge: Other factors (extraneous variables) can obscure results.

    • Need to identify and minimize these influences.

    Random Variables

    • Definition: Unsystematic influences on the D.V. that are not due to the I.V.

    • Three main sources:

      • Situational variables.

      • Individual differences.

      • Measurement error.

    • Effect: Obscures the I.V.–D.V. relationship.

    Situational Variables

    • Characteristics of the testing environment.

    • Examples: Room temperature, noise level, lighting, time of day.

    • Can degrade or improve performance in unpredictable ways.

    • Example: Teaching method study – noisy/hot rooms may obscure differences in test performance.

    • Best practice: Optimize and hold situational variables constant.

    Individual Differences

    • People vary in traits such as height, IQ, motivation, anxiety, concentration.

    • Even within the same experimental condition, individuals differ in D.V. results.

    • Can interact with situational variables to amplify variability.

    • Reduction strategies:

      • Select participants with similar characteristics relevant to the study.

      • Example: Use only psychology students in a statistics learning study to ensure uniform background.

    Measurement Error

    • Differences due to experimenter performance, attention, or equipment.

    • Examples:

      • Misreading measurements.

      • Inconsistent stopwatch timing.

      • Missing behaviours in observation.

    • Reduction strategies:

      • Automate measurement where possible (e.g., computer-based reaction time tests).

      • Standardize instructions (script or recorded message).

    Extraneous Variables and Their Effects

    • Situational variables, individual differences, and measurement error have non-systematic effects.

    • Can inflate or deflate scores randomly, weakening consistency.

    • Example: Table 2.1 shows that removing random variables clarifies the I.V.–D.V. relationship, while their presence obscures it.

    Analogy: Random Variability as TV Static

    • Random variability is like TV static obscuring a clear picture.

    • The "signal" is the relationship between I.V. and D.V.

    • More variability = harder to detect the signal.

    • Controlling variability = tuning the channel for a clear picture.

    • Goal: Minimize random variability and eliminate confounding variables.

    Experimental Setup – Blakemore et al. (1999)

    • Participants sat at a table with right palm facing up.

    • A robot arm’s tip lightly touched the participant’s right palm.

    • Participant’s left hand held a robot control arm.

    • Conditions:

      • Self-tickle: Participant controlled robot arm with left hand, moving over 2 cm distance at ~2 strokes/sec.

      • Robot-tickle: Robot arm movements controlled by computer, mimicking self-tickle movement parameters.

    • After each trial, participants rated ticklishness on a 1–10 scale.

    Independent Groups Design

    • Two levels of Independent Variable (IV): self-tickle vs. robot-tickle.

    • Dependent Variable (DV): ticklishness rating (1–10).

    • Participants randomly assigned to one of the two groups.

    • Hypothesis: Higher ticklishness ratings when tickled by robot (external source) vs. self.

    • Data plotted with:

      • X-axis: condition (self-tickle, robot-tickle)

      • Y-axis: ticklishness rating

      • Robot-tickle scores generally higher but with overlap in ratings between groups.

    Role of Inferential Statistics

    • Visual trends can suggest an effect but overlap in data points means differences could be due to chance.

    • Inferential tests determine whether observed differences are statistically significant.

    Limitations of Independent Groups Design

    • High variability due to individual differences in ticklishness.

    • Ideal scenario would involve identical participants to remove variability — not feasible in reality.

    • Random variability can obscure effects of the IV.

    Strategies to Minimize Variability in Independent Groups

    • Use a homogeneous sample (e.g., similar ages to control for changes in sensitivity with age).

    • Keep experimental environment constant (room, lighting, temperature, time of day).

    • Use the same experimenter or identical pre-recorded instructions.

    Repeated Measures Design (Within-Subjects)

    • Same participants tested in all conditions — reduces variability from individual differences.

    • For the same number of ratings per condition, only 16 participants are needed (each tested in both conditions) vs. 32 in an independent groups design (16 per group).

    • Participants randomly assigned to order of conditions to control for order effects:

      • Half: robot-tickle first, then self-tickle.

      • Half: self-tickle first, then robot-tickle.

    • Counterbalancing ensures order effects do not confound results.
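
    The counterbalancing scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `counterbalanced_orders`, the participant IDs 1 to 16, and the fixed seed are assumptions made for the example, not part of the original study's procedure.

```python
import random

def counterbalanced_orders(participant_ids, seed=0):
    """Randomly split participants so half receive each order of conditions."""
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(ids)           # random assignment to an order
    half = len(ids) // 2
    orders = {}
    for pid in ids[:half]:
        orders[pid] = ("robot-tickle", "self-tickle")
    for pid in ids[half:]:
        orders[pid] = ("self-tickle", "robot-tickle")
    return orders

orders = counterbalanced_orders(range(1, 17))  # 16 hypothetical participants
robot_first = sum(1 for order in orders.values() if order[0] == "robot-tickle")
print(robot_first, len(orders) - robot_first)  # 8 8
```

    Shuffling before splitting means that who ends up in which order is left to chance, while the split itself guarantees that each order is used equally often.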

    Benefits of Repeated Measures

    • Reduces random variability due to individual differences.

    • Tighter clustering of data points in results (less spread).

    • Example: The same participant’s ratings across conditions are more consistent than the ratings of two different people in the same condition.
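
    A small numerical sketch may help make this benefit concrete. The ratings below are invented for illustration (they are not data from the tickle study): six hypothetical participants differ a lot from one another, but each person's change from self-tickle to robot-tickle is similar.

```python
import statistics

# Hypothetical ticklishness ratings (1-10) for six participants tested in both conditions.
self_tickle = [2, 5, 3, 7, 4, 6]
robot_tickle = [4, 6, 6, 9, 5, 9]

# Spread between different people within a single condition.
sd_between = statistics.pstdev(self_tickle)

# Spread of each person's own change from self-tickle to robot-tickle.
differences = [r - s for r, s in zip(robot_tickle, self_tickle)]
sd_within = statistics.pstdev(differences)

print(f"SD across people (self-tickle): {sd_between:.2f}")
print(f"SD of within-person differences: {sd_within:.2f}")
```

    In this made-up example the within-person differences vary much less than the raw ratings do, which is the sense in which a repeated measures design removes individual differences from the comparison.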

    Limitations of Repeated Measures

    • Inappropriate if exposure to one condition permanently or temporarily alters performance in the other condition.

    • Example: Learning tasks — once learned, cannot be “unlearned” for subsequent conditions.

    • Certain drug studies or skill-based experiments require independent groups.

    Key Points

    • Independent groups: more susceptible to variability; larger sample size needed.

    • Repeated measures: preferred when possible, as it reduces variability and sample size requirements.

    • Counterbalancing is essential in repeated measures to prevent order effects from biasing results.

    Independent Groups Design

    • Different participants in each condition.

    • More variability due to individual differences.

    • Requires larger sample sizes.

    • No risk of order effects.

    • Appropriate when one condition might affect the other (learning, drug effects, etc.).

    Repeated Measures Design

    • Same participants in all conditions.

    • Less variability — individual differences controlled.

    • Smaller sample size needed.

    • Requires counterbalancing to avoid order effects.

    • Not suitable if exposure to one condition permanently alters performance in another.

    Displaying the Order in a Group of Numbers Using Tables and Graphs

    Why Organize Numbers?

    • Raw data lists are often overwhelming and obscure patterns.

    • Organizing numbers into tables and graphs allows clearer visualization of patterns, distributions, and relationships.

    • Goal: make sense of data by revealing order in what appears chaotic.


    Frequency Tables

    • Definition: A frequency table shows how often each value (or range of values) occurs in a dataset.

    • Steps to create:

      1. List all possible values from highest to lowest.

      2. Mark frequency (f): Count how many times each value occurs.

      3. Calculate percentage (%): Divide each frequency by total number of scores, then multiply by 100.

    • Advantages:

      • Provides a structured overview of the distribution.

      • Easy to spot common and rare values.

      • Forms the foundation for constructing graphs.
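
    The three steps for building a frequency table can be sketched as follows. The scores and the helper name `frequency_table` are assumptions made for the example; the sketch also assumes integer scores so that "all possible values" can be listed.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, frequency, percent), highest value first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    # Step 1: list every possible value from highest to lowest (including f = 0).
    for value in range(max(scores), min(scores) - 1, -1):
        f = counts.get(value, 0)             # Step 2: frequency of this value
        percent = round(100 * f / n, 1)      # Step 3: frequency / N * 100
        rows.append((value, f, percent))
    return rows

ratings = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]  # hypothetical integer scores
table = frequency_table(ratings)
for value, f, percent in table:
    print(f"{value:>5} {f:>3} {percent:>6.1f}%")
```

    Values that never occur (here 2, 4, and 5) still get a row with a frequency of 0, which keeps gaps in the distribution visible.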


    Grouped Frequency Tables

    • Used when data spans a wide range or has many distinct values.

    • Values are grouped into intervals (e.g., test scores 90–99, 80–89).

    • Rules for grouping:

      • Intervals must be of equal size (e.g., 10 points each).

      • No overlapping intervals.

      • Every value must fit into one and only one interval.

    • Purpose: Simplifies large datasets, making trends clearer.
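
    As a rough sketch of the grouping rules, the function below sorts scores into equal-width, non-overlapping intervals. The test scores, the function name, and the default width of 10 are illustrative assumptions; the sketch also assumes non-negative integer scores.

```python
def grouped_frequency_table(scores, width=10):
    """Return rows of (lower, upper, frequency), highest interval first."""
    counts = {}
    for x in scores:
        lower = (x // width) * width  # each score falls in exactly one interval
        counts[lower] = counts.get(lower, 0) + 1
    lowest = (min(scores) // width) * width
    highest = (max(scores) // width) * width
    return [
        (lower, lower + width - 1, counts.get(lower, 0))
        for lower in range(highest, lowest - 1, -width)
    ]

test_scores = [93, 87, 85, 72, 78, 91, 66, 84, 79, 88]  # hypothetical test scores
grouped = grouped_frequency_table(test_scores)
for lower, upper, f in grouped:
    print(f"{lower}-{upper}: {f}")
```

    Because every interval starts at a multiple of the width, the intervals are equal in size and cannot overlap.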


    Histograms

    • Definition: A bar graph representing a frequency distribution.

    • Key features:

      • X-axis: Variable values (e.g., scores).

      • Y-axis: Frequencies.

      • Bars: Touch each other (unlike bar charts for categories), indicating continuous data.

    • Interpretation:

      • Shape of distribution becomes visible (e.g., normal, skewed, uniform).

      • Easy to see modes, clustering, and spread of scores.


    Frequency Polygons

    • Definition: A graph that uses points connected by straight lines to show frequency.

    • Steps to create:

      • Plot frequencies above each value or midpoint of intervals.

      • Connect the dots with lines.

    • Advantages:

      • Easier comparison of two or more groups on the same graph.

      • Highlights trends more smoothly than histograms.


    Shapes of Distributions

    • Normal Distribution (bell curve): Symmetrical, unimodal, most scores around the middle.

    • Skewed Distributions:

      • Positively skewed (right-skewed): Long tail on right; common in income data (many low/mid values, few very high).

      • Negatively skewed (left-skewed): Long tail on left; less common, e.g., easy tests where most score high.

    • Bimodal or Multimodal: More than one peak in frequency; suggests multiple subgroups.

    • Rectangular/Uniform: Roughly equal frequency across all values.


    Stem-and-Leaf Plots

    • Definition: A method to display data while retaining original values.

    • Structure:

      • Stem: First digit(s) of numbers.

      • Leaf: Last digit of numbers.

    • Advantages:

      • Maintains raw data visibility.

      • Useful for small datasets.

      • Quick way to see distribution shape and spread.
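
    A stem-and-leaf plot is simple enough to build by hand, but a short sketch shows the structure. The scores and the helper name `stem_and_leaf` are assumptions for the example, and the sketch assumes non-negative integers.

```python
def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot

scores = [93, 87, 85, 72, 78, 91, 66, 84, 79, 88]  # hypothetical test scores
plot = stem_and_leaf(scores)
for stem, leaves in sorted(plot.items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

    Each printed row reads as a list of original scores (stem 8 with leaves 4 5 7 8 is 84, 85, 87, 88), so no raw values are lost.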


    The Importance of Visualization

    • Tables summarize precise numbers.

    • Graphs provide an immediate impression of patterns.

    • Together, they help identify outliers, clustering, distribution shape, and trends.

    • These representations are essential groundwork before moving into descriptive or inferential statistics.

    Central Tendency and Variability

    Orientation

    • Purpose: describe and summarise groups of scores using single “typical” values (central tendency) and “spread” (variability).

    • Representative value: mean (primary), with mode and median as alternatives.

    • Variability measures: variance and standard deviation.

    • Statistical formulas are “recipes”; symbols must be understood before use.


    Central Tendency

    Concepts

    • Central tendency: the middle or typical value of a distribution.

    • Three measures: mean, median, mode.

    • Choice of measure depends on scale of measurement, distribution shape, and presence of outliers.

    Mean (arithmetic average)

    • Definition: sum of all scores divided by number of scores.

    • Notation and formula:

      • ( M ) (preferred in psych articles; sometimes (\bar{X})).

      • ( \sum ) = “sum of”.

      • ( X ) = scores; ( N ) = number of scores.

      • ( M = \dfrac{\sum X}{N} ).

    • Interpretation:

      • Balance point of the distribution (teeter-totter analogy).

      • Total distance above the mean equals total distance below.

      • Need not be an observed score; can be decimal even when all ( X ) are integers.

    • Worked examples (as given):

      • Dreams (10 students): ( M = 6 ).

      • Stress ratings (30 students): ( M = 6.43 ) (rounded to two decimals beyond original precision).

      • Social interactions (94 students): ( M = 17.39 ).

    Steps to compute the mean

    • Add all scores: compute ( \sum X ).

    • Divide by ( N ).
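
    The two steps can be written out directly. The ten scores below are hypothetical, chosen only so that the result matches the ( M = 6 ) quoted for the dreams example; the sketch also checks the balance-point property.

```python
def mean(scores):
    """M = (sum of X) / N."""
    return sum(scores) / len(scores)

dreams = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]  # hypothetical scores chosen so that M = 6
m = mean(dreams)
deviations = [x - m for x in dreams]

print(m)                                  # 6.0
print(sum(d for d in deviations if d > 0))   # total distance above the mean: 11.0
print(-sum(d for d in deviations if d < 0))  # total distance below the mean: 11.0
```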

    Mode

    • Definition: most frequent single value in the distribution.

    • Identification:

      • Highest frequency in a frequency table; peak of a histogram.

    • Properties and cautions:

      • Can differ from mean/median in skewed or irregular distributions.

      • Can remain unchanged when many scores shift—often a poor overall summary for numerical data.

      • Appropriate for nominal variables (e.g., most common religion).

    Median

    • Definition: middle score when scores are ordered low→high.

    • Even ( N ): median is the average of the two middle scores.

    • Steps:

      • Order scores.

      • Locate middle position: ((N+1)/2); if fractional, take the two adjacent scores and average.

    • Robustness:

      • Resistant to outliers; useful when extreme scores distort the mean.

    • Example (reaction times):

      • Mean inflated by one very long time; median better captures typical performance.
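
    The median steps and the reaction-time point can be sketched together. The reaction times (in milliseconds) are invented for illustration, with one deliberately extreme value.

```python
def median(scores):
    """Middle score of the ordered list; average of the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

reaction_times_ms = [410, 380, 450, 400, 430, 2900]  # hypothetical; 2900 is the outlier
mean_rt = sum(reaction_times_ms) / len(reaction_times_ms)
median_rt = median(reaction_times_ms)

print(round(mean_rt, 2))  # 828.33
print(median_rt)          # 420.0
```

    With ( N = 6 ) the middle position is ( (6+1)/2 = 3.5 ), so the median is the average of the 3rd and 4th ordered scores (410 and 430). The single long time roughly doubles the mean but leaves the median close to the typical response.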

    Comparing mean, median, mode

    • Skewed left (negative skew): typically mean < median < mode.

    • Skewed right (positive skew): typically mode < median < mean.

    • Normal (perfectly symmetric, unimodal): mean = median = mode.

    • Use guidelines:

      • Mean: equal-interval/ratio variables without extreme outliers; dominant in psychology.

      • Median: rank-order variables; when outliers or skew make the mean unrepresentative.

      • Mode: nominal variables; rarely used for numerical data.

    Illustrative controversy (interpretation risk)

    • Partner-number preferences:

      • Means suggested huge male–female difference (e.g., 64.3 vs 2.8).

      • Medians and modes both near 1 for men and women, revealing strong skew driven by few extreme male responses.

      • Lesson: focusing only on the mean can misrepresent skewed distributions.
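
    The same lesson can be shown with a small made-up data set (these numbers are illustrative only and are not the responses from the study described above). One extreme response is enough to pull the mean far from the median and mode.

```python
import statistics

# Hypothetical responses: most people answer 1, one person answers 100.
responses = [0, 1, 1, 1, 1, 1, 2, 2, 3, 100]

print(statistics.mean(responses))    # 11.2
print(statistics.median(responses))  # 1.0
print(statistics.mode(responses))    # 1
```

    Reporting only the mean here would suggest a typical answer near 11, even though nine of the ten responses are 3 or lower.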


    Variability

    Concepts

    • Variability: the spread of scores around the mean.

    • Distributions can share the same mean but differ in spread; conversely, different means can have similar spread.

    • Two main descriptive measures: variance and standard deviation.

    Variance

    • Definition (definitional form): average of the squared deviations from the mean.

      • Deviation score: ( X - M ).

      • Squared deviation: ( (X - M)^2 ).

    • Formulae:

      • ( SD^2 = \dfrac{\sum (X - M)^2}{N} ) (definitional).

      • ( SS = \sum (X - M)^2 ) (sum of squares); thus ( SD^2 = SS/N ).

      • Historical computational shortcut (less instructive): ( SD^2 = \dfrac{\sum X^2 - \dfrac{(\sum X)^2}{N}}{N} ).

    • Interpretation:

      • Larger when scores are more spread out.

      • Rarely reported descriptively because units are squared and less intuitive.

    Standard Deviation

    • Definition: positive square root of the variance.

    • Formulae:

      • ( SD = \sqrt{SD^2} = \sqrt{\dfrac{\sum (X - M)^2}{N}} = \sqrt{\dfrac{SS}{N}} ).

    • Interpretation:

      • Roughly the average distance of scores from the mean in original units.

      • Commonly reported alongside the mean.

    • Sensitivity:

      • Influenced by outliers; a single extreme score can substantially increase ( SD ).

    Worked examples (as given)

    • Dreams (10 scores; ( M=6 )):

      • ( SS = 66 ), ( SD^2 = 6.60 ), ( SD = 2.57 ).

    • Social interactions (94 scores; ( M=17.39 )):

      • ( SS = 12{,}406.44 ), ( SD^2 = 131.98 ), ( SD = 11.49 ).

    Practical tips (error-checking)

    • Deviation scores sum to ~0 (apart from rounding). If not, re-check work.

    • Do not take ( \sqrt{SS} ) directly; divide by ( N ) first to get variance, then take the square root.
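
    The definitional formulas and both error checks can be sketched as below. The ten scores are hypothetical, chosen so that they reproduce the summary values quoted for the dreams example (( M = 6 ), ( SS = 66 )); the function names are assumptions for the sketch.

```python
import math

def sum_of_squares(scores):
    """SS = sum of (X - M)^2."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def variance(scores):
    """SD^2 = SS / N (descriptive form used in these notes)."""
    return sum_of_squares(scores) / len(scores)

def standard_deviation(scores):
    """SD = square root of the variance."""
    return math.sqrt(variance(scores))

dreams = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]  # hypothetical scores with M = 6 and SS = 66
m = sum(dreams) / len(dreams)

print(sum(x - m for x in dreams))             # deviation scores sum to 0.0
print(sum_of_squares(dreams))                 # 66.0
print(round(variance(dreams), 2))             # 6.6
print(round(standard_deviation(dreams), 2))   # 2.57
```

    Taking the square root of ( SS ) directly would give about 8.12 rather than 2.57, which is the mistake the second tip warns against.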

    Definitional vs computational formulas

    • Definitional: directly embodies the meaning (builds intuition); recommended for learning.

    • Computational: algebraically equivalent shortcut from pre-computer era; less transparent, mainly historical interest now.
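
    For readers who want to see why the two forms agree, the short derivation below expands the definitional sum of squares and substitutes ( M = \sum X / N ). It uses only the symbols already defined in these notes.

```latex
\begin{align*}
SS &= \sum (X - M)^2 \\
   &= \sum X^2 - 2M \sum X + N M^2 \\
   &= \sum X^2 - \frac{2(\sum X)^2}{N} + \frac{(\sum X)^2}{N}
      \qquad \text{since } M = \frac{\sum X}{N} \\
   &= \sum X^2 - \frac{(\sum X)^2}{N}, \\[4pt]
SD^2 &= \frac{SS}{N} = \frac{\sum X^2 - \dfrac{(\sum X)^2}{N}}{N}.
\end{align*}
```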

    Why variability matters in psychology

    • Core research aim: explain differences among people or conditions (e.g., stress differences explained by math experience; social interactions explained by traits like extraversion or by gender).

    • Many inferential methods partition variability to test hypotheses.

    Dividing by ( N ) vs ( N-1 )

    • Descriptive statistics for a specific sample or group: this chapter uses ( SS/N ).

    • Estimating population variance from a sample (common in research reporting and software): ( SS/(N-1) ).

    • Expect ( SD ) from software (e.g., SPSS/Excel default options) to reflect ( N-1 ) unless configured otherwise.
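
    Python's standard `statistics` module offers both versions, which makes the difference easy to see. The scores are hypothetical (ten values with ( SS = 66 )); `pvariance`/`pstdev` divide by ( N ), while `variance`/`stdev` divide by ( N-1 ).

```python
import statistics

dreams = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]  # hypothetical scores with SS = 66, N = 10

# Descriptive form used in these notes: SS / N
print(round(statistics.pvariance(dreams), 2))  # 6.6
print(round(statistics.pstdev(dreams), 2))     # 2.57

# Population estimate from a sample: SS / (N - 1)
print(round(statistics.variance(dreams), 2))   # 7.33
print(round(statistics.stdev(dreams), 2))      # 2.71
```

    The ( N-1 ) version is always slightly larger, and the gap shrinks as ( N ) grows, so a small mismatch between hand calculations and software output is usually this choice rather than an arithmetic error.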


    Reporting in Research Articles

    Typical practice

    • Means and standard deviations are routinely reported (text or tables).

    • Medians and variances are only occasionally reported.

    • Example (social media use):

      • Mean minutes per day was higher for MySpace than for Facebook; the higher SD for MySpace indicates greater variability.

    • Example (doctoral program applicants):

      • Means often exceed medians, signalling right-skewed distributions (few programs with very high counts).


    Controversy: “Tyranny of the Mean”

    Core argument

    • Overreliance on averages can obscure meaningful individual patterns.

    • Skinner’s critique: averaging can produce curves that represent no actual individual.

    • Qualitative and single-case traditions:

      • Emphasise in-depth observation/interview to discover categories before quantifying.

      • Blended approach advocated: explore qualitatively, then measure quantitatively.

    • Cultural concern (Jung/Von Franz):

      • “Statistical mood” may erode sense of uniqueness; numbers can dull qualitative meaning of human outcomes.


    Stereotype Threat, Equity, and Math Performance (context box)

    Key points

    • Stereotype threat: situational activation of negative group stereotypes can depress performance.

    • Evidence:

      • Informing women that men do better on a test lowers women’s scores; removing the prompt eliminates differences.

      • Similar effects for African Americans, Latinos, low-SES students.

      • Societal gender equality correlates with reduced gender gaps and more women with high mathematical talent.

    • Mechanism:

      • Stereotypes consume working memory resources; attitudes alone are insufficient protection.

    • Practical implications:

      • Reframe beliefs (“women can do math as well as men”); reduce test-threat cues; over-prepare to buffer memory demands.

      • Empowerment: quantitative literacy is an enabling skill across careers.


    Learning Aids (from the chapter’s in-text guidance)

    Tips for success

    • Know symbols before applying formulas; treat formulas as recipes.

    • Round with two more decimal places than the original data where needed.

    • Check deviation-sum ≈ 0; sanity-check magnitude (e.g., an SD larger than the whole scale is a red flag).


    Summary (concise)

    • Central tendency summarises “where” scores are (mean/median/mode); variability summarises “how spread out” they are (variance/SD).

    • Mean is foundational and widely used; median for skew/outliers or ordinal data; mode for nominal data.

    • Variance quantifies average squared spread; standard deviation puts spread back into original units.

    • Reporting convention in psychology: ( M ) and ( SD ) almost always; medians/variances sometimes.

    • Beware misleading means in skewed data; consider full distribution and alternative summaries.

    Key Ingredients for Inferential Statistics

    Overview

    • Purpose: Introduces foundational concepts for inferential statistics, bridging descriptive data and generalisation to populations.

    • Focus areas: Z scores, the normal curve, sample vs population, and probability.

    • Application: Enables psychologists to infer conclusions beyond immediate research participants.


    Z Scores

    Concept and Purpose

    • Describe where an individual score lies within a distribution.

    • Z score = number of standard deviations a score is above or below the mean.

    • Converts scores from raw units into standardised units.

    • Positive Z = above mean; Negative Z = below mean.

    Formula and Components

    • Formula:

      ( Z = \dfrac{X - M}{SD} )

      • ( X ): raw score

      • ( M ): mean

      • ( SD ): standard deviation

    • Reversed formula (to convert back):

      ( X = (Z)(SD) + M )

    Worked Examples

    • Jerome’s “morning person” score: ( X = 5 ), ( M = 3.40 ), ( SD = 1.47 ).

      ( Z = (5 - 3.40)/1.47 = +1.09 ).

    • Ashley’s score: ( X = 2 ), ( M = 3.40 ), ( SD = 1.47 ).

      ( Z = (2 - 3.40)/1.47 = -0.95 ).

    Interpretation

    • Z scores express relative position and distance from the mean in standard deviation units.

    • Facilitate comparisons across different measures.

    • Useful for identifying extreme scores or percentile positions.

    Converting Between Z and Raw Scores

    • From Z to raw: multiply Z by SD, then add mean.

      • e.g., ( X = 1.5(4) + 12 = 18 ).

    • From raw to Z: subtract mean, divide by SD.
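
    Both conversions can be sketched as two small functions. The function names are illustrative; the numbers are the worked examples from these notes.

```python
def z_score(x, mean, sd):
    """Raw score -> Z: subtract the mean, divide by the SD."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Z -> raw score: multiply by the SD, then add the mean."""
    return z * sd + mean


jerome = z_score(5, 3.40, 1.47)
ashley = z_score(2, 3.40, 1.47)
print(round(jerome, 2), round(ashley, 2))  # 1.09 -0.95

print(raw_score(1.5, 12, 4))  # 18.0
```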

    Distribution Properties of Z Scores

    • Mean of all Z scores = 0.

    • Standard deviation of all Z scores = 1.

    • Shape of distribution remains unchanged after conversion.
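
    These properties can be checked numerically. The sketch below uses hypothetical scores and the SS/N form of the SD (Python's `pstdev`), matching the descriptive convention used earlier in these notes.

```python
from statistics import mean, pstdev

scores = [1, 2, 3, 3, 4, 5, 5, 6]  # hypothetical raw scores

m = mean(scores)
sd = pstdev(scores)  # SD computed with SS / N

z_scores = [(x - m) / sd for x in scores]

z_mean = mean(z_scores)
z_sd = pstdev(z_scores)
print(round(z_mean, 10), round(z_sd, 10))

# The ordering of scores is untouched, so the shape is preserved.
assert sorted(range(len(scores)), key=lambda i: scores[i]) == \
       sorted(range(len(scores)), key=lambda i: z_scores[i])
```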


    The Normal Curve

    Definition

    • A theoretical, bell-shaped, symmetrical, unimodal distribution.

    • Serves as a reference model for natural and psychological variables.

    • Often referred to as the Gaussian distribution (named after Gauss, although first derived by Abraham de Moivre).

    Why It Occurs in Nature

    • Random influences combine to produce values clustering around the mean.

    • The central limit theorem: when an outcome is the sum of many small, independent random effects, its distribution tends toward a normal curve.

    • Applies to biological, behavioural, and social phenomena (e.g., reaction time, IQ, height).

    Key Percentages

    • 68% of scores within ±1 SD.

    • 95% of scores within ±2 SD.

    • 99.7% of scores within ±3 SD.

      (Known as the Empirical Rule.)

    Range from Mean    % of Scores
    ±1 SD              68%
    ±2 SD              95%
    ±3 SD              99.7%

    Example:

    • IQ: ( M = 100 ), ( SD = 15 )

      • 85–115 = 68%

      • 70–130 = 95%

    Using Z Scores with the Normal Curve

    • Z scores locate raw scores on the normal curve.

    • Probabilities/percentages can be derived for any Z value.

    • Example: Z = +1 → 34% between mean and +1 SD; 16% above +1 SD.
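
    The rounded percentages above can be recovered from the standard normal curve. This is a minimal sketch using `statistics.NormalDist` from Python's standard library; the exact values are about 68.3%, 95.4%, and 99.7%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def pct_within(k):
    """Percentage of scores within plus or minus k SD of the mean."""
    return 100 * (std_normal.cdf(k) - std_normal.cdf(-k))


for k in (1, 2, 3):
    print(f"within {k} SD: {pct_within(k):.1f}%")

pct_mean_to_1 = 100 * (std_normal.cdf(1) - 0.5)  # roughly 34%
pct_above_1 = 100 * (1 - std_normal.cdf(1))      # roughly 16%
print(f"{pct_mean_to_1:.1f}% {pct_above_1:.1f}%")
```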


    The Normal Curve Table (Z Table)

    Purpose

    • Lists percentages associated with each Z score.

    • Columns:

      • Z score

      • % Mean to Z

      • % in Tail

    • Symmetrical → positive and negative Z values mirror each other.

    • For Z = 0.64:

      • % Mean to Z = 23.89%

      • % in Tail = 26.11%

      • Total above mean = 50%.

    Using the Z Table

    • To find the percentage above/below a score:

      1. Convert raw → Z.

      2. Sketch distribution and shade region of interest.

      3. Estimate range using 50–34–14 rule.

      4. Lookup exact value in Z table.

      5. Add or subtract 50% as needed depending on side of mean.

    Reverse Lookup

    • To find Z from a known percentage:

      • Locate nearest % in Tail or % Mean to Z in table.

      • Read corresponding Z.

      • Example: top 30% → % in Tail = 30.15% → Z = +0.52.


    Applying the Normal Curve

    Examples

    • IQ = 125

      ( Z = (125 - 100)/15 = +1.67 ).

      Tail area = 4.75%. → 4.75% score higher.

    • IQ = 95

      ( Z = -0.33 ).

      % Mean to Z (12.93%) + the 50% above the mean = 62.93% score higher.

    Finding Raw Scores from Percentages

    • Top 5% → ( Z = +1.64 ), ( X = (1.64)(15) + 100 = 124.6 ).

    • Top 55% → ( Z = -0.13 ), ( X = (-0.13)(15) + 100 = 98.05 ).

    • Middle 95% → ( Z = ±1.96 ), range = 70.6–129.4.
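
    The table lookups in the IQ examples can be approximated in code. This sketch assumes the same IQ distribution (M = 100, SD = 15); Z is rounded to two decimals where the notes read it from the table, and the unrounded inverse lookups differ slightly from table values (for example, about 124.7 rather than 124.6 for the top 5%).

```python
from statistics import NormalDist

std_normal = NormalDist()            # Z distribution
iq = NormalDist(mu=100, sigma=15)    # IQ distribution used in the notes

# Percentage scoring above IQ 125 (Z rounded as in a printed table).
z_125 = round((125 - 100) / 15, 2)                  # 1.67
pct_above_125 = 100 * (1 - std_normal.cdf(z_125))

# Percentage scoring above IQ 95.
z_95 = round((95 - 100) / 15, 2)                    # -0.33
pct_above_95 = 100 * (1 - std_normal.cdf(z_95))

# Reverse lookups: raw scores from percentages.
top_5_cutoff = iq.inv_cdf(0.95)
middle_95 = (iq.inv_cdf(0.025), iq.inv_cdf(0.975))

print(round(pct_above_125, 2), round(pct_above_95, 2))
print(round(top_5_cutoff, 1), [round(x, 1) for x in middle_95])
```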


    Samples and Populations

    Definitions

    • Population: Entire group of interest (e.g., all voters, all psychology students).

    • Sample: Subset drawn for study.

    • Purpose: Make inferences about population from sample.

    Why Use Samples

    • Entire populations impractical to test.

    • Samples allow manageable data collection and generalisation.

    Sampling Methods

    • Random selection: each individual has equal chance of inclusion (unbiased, ideal).

    • Haphazard selection: convenience-based (biased, common in psychology).

    • Cluster/multistage sampling: complex probability-based design (used in large-scale surveys).

    Polling Example (Gallup)

    • Quota sampling (used up to and including the 1948 election) → biased prediction (Dewey vs Truman).

    • Modern polls use probability sampling to reduce bias.

    • Still face issues: nonresponse, exclusion of mobile-only users.

    Population Parameters vs Sample Statistics

    Concept               Population    Sample
    Mean                  μ (mu)        M
    Standard Deviation    σ (sigma)     SD
    Variance              σ²            SD²

    • Parameters: actual (unknown) population values.

    • Statistics: computed from sample; estimates parameters.


    Probability

    Definition

    • Likelihood or proportion of a specific outcome occurring.

    • Formula:

      ( p = \dfrac{\text{successful outcomes}}{\text{total outcomes}} )

    • Range: 0 ≤ p ≤ 1.

    Interpretations

    1. Long-run (relative frequency): proportion expected over many trials.

      e.g., coin → 0.5 heads.

    2. Subjective: degree of belief or confidence (e.g., “95% sure the restaurant is open”).

    Examples

    • Coin flip: p(heads) = 1/2 = 0.5.

    • Rolling ≤3 on a die: 3/6 = 0.5.

    • Random senior from 200 students (30 seniors): 30/200 = 0.15.

    Probability Symbols

    • ( p ): probability.

    • ( p < .05 ): less than 5% chance (used in statistical significance).


    Probability and the Normal Curve

    Relationship

    • Normal curve = probability distribution.

    • Percentages under the curve = probabilities.

      • Between mean and +1 SD → p = 0.34.

      • Between -1.96 and +1.96 → p = 0.95.

    • Used to determine likelihood that a score occurs by chance.

    Probability, Samples, and Populations

    • Probabilities indicate how likely a score this extreme would be if the sample were drawn from a given population.

    • Low-probability scores suggest the sample may come from a different population.

      • e.g., sample score = 4, population μ = 10, σ = 3 → Z = (4 − 10)/3 = −2 → p very low (about 2% of scores fall this low or lower).
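
    The example with a score of 4 can be worked through numerically. This is a minimal sketch that assumes the population is normal, which the example implies but does not state.

```python
from statistics import NormalDist

population = NormalDist(mu=10, sigma=3)  # values from the example
score = 4

z = (score - population.mean) / population.stdev
p_this_low = population.cdf(score)  # probability of a score of 4 or lower

print(z, round(p_this_low, 4))
```

    A probability of roughly .02 is what makes it reasonable to suspect that the score comes from a different population.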


    Advanced: Probability Rules

    Addition Rule (“or” rule)

    • For mutually exclusive events:

      ( p(A \text{ or } B) = p(A) + p(B) )

    • e.g., roll of die: p(3 or 5) = 1/6 + 1/6 = 1/3.

    Multiplication Rule (“and” rule)

    • For independent events:

      ( p(A \text{ and } B) = p(A) \times p(B) )

    • e.g., 2 coin flips → p(2 heads) = 0.5 × 0.5 = 0.25.
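
    Both rules can be checked by listing every equally likely outcome and counting. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Addition rule: one roll of a fair die, outcome 3 or 5 (mutually exclusive).
die = range(1, 7)
p_3_or_5 = Fraction(sum(1 for face in die if face in (3, 5)), len(die))
assert p_3_or_5 == Fraction(1, 6) + Fraction(1, 6)

# Multiplication rule: two independent coin flips, both heads.
flips = list(product("HT", repeat=2))  # HH, HT, TH, TT
p_two_heads = Fraction(sum(1 for f in flips if f == ("H", "H")), len(flips))
assert p_two_heads == Fraction(1, 2) * Fraction(1, 2)

print(p_3_or_5, p_two_heads)  # 1/3 1/4
```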

    Conditional Probability

    • ( p(A|B) ): probability of A given B occurs.

    • e.g., p(woman | College A) = 0.5; p(woman | College B) = 0.6.


    Controversies

    “Is the Normal Curve Really Normal?”

    • Many real-world distributions deviate from normality.

    • Micceri (1989): none of 440 psychological measures perfectly normal.

    • Deviations due to ceiling/floor effects, skewness, kurtosis.

    • Yet traditional statistical methods remain robust under moderate deviations (Sawilowsky & Blair, 1992).

    • Nonparametric methods (distribution-free) are alternatives.

    Using Nonrandom Samples

    • Psychology often uses convenience samples (students, volunteers).

    • Justification: relationships between variables often generalise.

    • Other fields (sociology, medicine) emphasise random representativeness.

    • Example: Morgenstern et al. (2009) obesity study—checked representativeness and response rates.

    • Researchers acknowledge sampling limitations (e.g., Heyman et al., 2001).


    Research Applications

    In Articles

    • Z scores seldom reported directly.

    • Normal curve mentioned when describing score distributions.

    • Sampling methods explained in methodology (response rate, representativeness).

    • Probability shown in results as p-values (e.g., p < .05 or p < .01).


    Summary

    • Z scores standardise raw data and locate scores within a distribution.

    • Normal curve describes the ideal symmetrical distribution underlying many statistical tests.

    • Samples vs populations: inferential logic based on estimating population parameters from samples.

    • Probability quantifies uncertainty and underlies inferential reasoning.

    • Statistical significance (p < .05) arises from these probability principles.

    Correlation

    Core Concept

    • Correlation = the statistical description of the relationship between two equal-interval numerical variables.

    • Shows whether high values on X tend to occur with high values on Y (positive), or with low values on Y (negative), or no systematic pattern (zero correlation).

    • Correlation is about association — not proof of cause.

    Graphing the Relationship: Scatter Diagrams

    • Scatter plot: horizontal axis = predictor/cause variable (if known/theorised), vertical axis = outcome/criterion variable.

    • Each dot = one person’s score on both variables.

    • Roughly straight line pattern = linear correlation.

    • Shapes:

      • positive linear: up and right; highs with highs, lows with lows.

      • negative linear: down and right; highs with lows, lows with highs.

      • curvilinear: systematic but not straight (e.g., U-shape, inverted U).

      • no correlation: dots have no pattern at all.

    Linear vs Curvilinear

    • Linear = described by a single straight line.

    • Curvilinear = relationship exists but is not linear; a linear correlation coefficient will underestimate this relationship.

    • Example: idealisation vs marital satisfaction → too much idealisation reduces satisfaction (inverted U shape).

    Strength of a Correlation

    • Strength = how close dots cluster to the line.

    • Perfect = r = +1 or r = −1 (zero scatter around a line).

    • Weak = dots widely dispersed, line poorly represents pattern.

    Correlation Coefficient (r)

    • r = numerical index of linear correlation (range −1 to +1).

    • Sign indicates direction (+ positive, − negative).

    • Magnitude indicates strength (absolute size).

    • Computed by converting raw scores to Z scores, computing cross-products, summing them, dividing by N.
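
    The computation described in the last bullet can be sketched directly. The data are hypothetical, and the SDs use the SS/N form so that dividing the summed cross-products by N gives r.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """r = sum of Z-score cross-products divided by N."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # SDs computed with SS / N
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


hours = [1, 2, 3, 4, 5]   # hypothetical predictor scores
score = [2, 4, 5, 4, 5]   # hypothetical outcome scores

r = pearson_r(hours, score)
print(round(r, 2))

# Perfect linear relationships give r = +1 or r = -1.
r_pos = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
r_neg = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])
```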

    Significance Testing for Correlation

    • Null hypothesis: true population correlation = 0.

    • r is transformed to a t-value to test significance.

    • df = N − 2.

    • Same logic as other inferential tests.
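
    The notes do not give the transformation itself. A commonly used form is t = r√(N − 2) / √(1 − r²), sketched below with hypothetical values of r and N. The cutoff would still come from a t table with df = N − 2.

```python
from math import sqrt


def t_from_r(r, n):
    """Convert a correlation to a t score with df = N - 2 (standard formula,
    not quoted from the notes)."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df


t, df = t_from_r(0.50, 27)  # hypothetical r and N
print(round(t, 2), df)
```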

    Correlation ≠ Causation

    • Three causality directions always possible:

      • X causes Y

      • Y causes X

      • third variable causes both

    • Only a true experiment (random assignment) can rule out alternative directions.

    Statistical vs Research Design “Correlation”

    • “Correlation” as a statistic = Pearson r (formula).

    • “Correlational research” = any non-experimental study (surveys, observation etc). A correlational study can be analysed with or without Pearson r.

    Interpretation Issues

    r²: Proportion of Variance Explained

    • Square of r = proportionate reduction in error / proportion of variance accounted for.

    • A correlation twice as large in r is not twice as strong in r²: r = .20 gives r² = .04, whereas r = .40 gives r² = .16, four times as much variance accounted for.

    Restriction of Range

    • Using limited range on one variable suppresses the correlation.

    • Example: only testing high scorers on an aptitude test → relationship to job performance seems artificially weak.

    Unreliability of Measurement

    • Noise/inaccuracy reduces r.

    • If measures are poor, true correlation will be underestimated.

    Outliers

    • Single extreme X/Y combination can massively distort r.

    • Outlier can change r from large to near zero or even switch sign.

    Curvilinear Solutions: Spearman’s rho

    • If relationship is monotonic but not linear, convert to ranks → compute Spearman rho.

    • Less sensitive to outliers.

    • Does not require equal interval measurement.
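
    The rank conversion can be sketched as follows. The data are hypothetical, and the simple ranking helper assumes there are no tied scores (ties would need averaged ranks).

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson r from Z-score cross-products."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / len(xs)


def ranks(values):
    """Rank from 1 (lowest) to N (highest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rho = Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly not linear

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

    Because y always rises with x, rho is 1 even though the linear r falls short of 1.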

    Effect Size and Power

    • Cohen guidance: r = .10 small; r = .30 medium; r = .50 large.

    • But these conventions are contested.

    • Power depends on expected effect size; moderate correlations require sizeable N to achieve .80 power.

    Real-World Importance of Small r

    • Even very small correlations can have major real practical consequences if event is serious or sample is huge (e.g., aspirin and heart attack reduction; r = −.034 but huge real effects on mortality).

    Correlation Matrices

    • Common format in journal articles.

    • Table format presenting correlations between multiple variables, often only lower triangle shown.

    • Asterisks indicate statistical significance.

    Intro to Hypothesis Testing

    Core Definition and Purpose

    • Hypothesis testing = formal procedure to decide if sample results support a theory / innovation about a population.

    • Hypothesis = prediction derived from observation, prior research, or theory.

    • Theory = explanatory principles about psychological processes from which specific predictions are derived.

    • Essential frame: sample → inference → population.

    Cognitive Difficulty

    • Logic is counter-intuitive — requires multiple new abstractions at once.

    • Psychological research relies on this logic for almost all inferential conclusions.


    Core Logic

    • Ask: how likely is the result if the opposite of what we predict is actually true?

    • If that probability is sufficiently low, reject the null hypothesis.

    • If rejecting the null, we indirectly support the research hypothesis.

    • If not extreme enough, outcome is inconclusive, not support for the null.


    Key Terms

    • Research hypothesis (alternative): specifies predicted relationship/difference between populations.

    • Null hypothesis: predicts no difference or the opposite direction.

    • Comparison distribution: the reference population distribution if the null is true.

    • Cutoff sample score (critical value): threshold Z value beyond which result is too unlikely under null.

    • Statistically significant: outcome extreme enough to reject null under chosen significance level.


    Five Steps of Hypothesis Testing

    • Restate question as research + null hypothesis about populations.

    • Determine comparison distribution characteristics (mean, SD, shape).

    • Determine cutoff sample score (Z critical) for chosen significance level.

    • Determine sample score on comparison distribution (compute Z).

    • Decide: reject or fail to reject null.


    One-Tailed vs Two-Tailed

    • Directional hypothesis = one-tailed test.

      • region of rejection only in predicted direction.

      • less extreme critical cutoff.

      • downside: if effect occurs in opposite direction, cannot call significant.

    • Nondirectional hypothesis = two-tailed test.

      • tests for extreme results in both directions.

      • more conservative (critical values more extreme).

    • Convention: most researchers default to two-tailed unless explicitly justified otherwise.


    Significance Levels

    • Conventional α levels: .05 and .01.

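    • One-tailed .05 = Z crit +1.64 or −1.64 (whichever matches the predicted direction).

    • Two-tailed .05 = Z crit ±1.96.

    • One-tailed .01 = Z crit +2.33 or −2.33 (whichever matches the predicted direction).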

    • Two-tailed .01 = Z crit ±2.58.
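
    These cutoffs can be derived from the normal curve rather than memorised. The sketch below uses `statistics.NormalDist`; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()


def z_critical(alpha, tails):
    """Positive cutoff Z. A two-tailed test splits alpha across both tails."""
    return std_normal.inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    for tails in (1, 2):
        print(alpha, tails, round(z_critical(alpha, tails), 2))
```

    For a one-tailed test the sign of the cutoff follows the predicted direction; for a two-tailed test both the positive and negative values apply.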


    Interpretation Boundaries

    • Reject null → results are statistically significant; supports research hypothesis.

    • Fail to reject → results are inconclusive; cannot claim null is true.

    • Words “prove” or “true” must not be used about research findings.


    Examples

    • Baby vitamin example:

      • prediction: earlier walking.

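      • cutoff: Z ≤ −2 (roughly the most extreme 2% in the predicted tail, which is stricter than the one-tailed 5% cutoff of −1.64).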

      • sample baby walked at 6 months: Z = −2.67 → reject null.

    • $10 million happiness example:

      • prediction: happier.

      • cutoff (5%): Z ≥ +1.64.

      • sample Z = +1 → not enough → inconclusive.
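
    The final decision step in both examples reduces to one comparison. This is a minimal sketch for one-tailed tests, using the cutoffs and sample Z scores given in these notes; the function name is illustrative.

```python
def decide(z_sample, z_cutoff):
    """One-tailed decision. The sign of the cutoff gives the predicted direction."""
    if z_cutoff < 0:
        extreme = z_sample <= z_cutoff   # prediction: lower scores
    else:
        extreme = z_sample >= z_cutoff   # prediction: higher scores
    return "reject null" if extreme else "inconclusive"


baby = decide(z_sample=-2.67, z_cutoff=-2)        # earlier walking predicted
happiness = decide(z_sample=1.0, z_cutoff=1.64)   # greater happiness predicted
print(baby, "|", happiness)
```

    The function never returns "accept null": a result that misses the cutoff is treated as inconclusive.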


    Controversy: Banning Significance Tests

    • Critics argue logical problems, misuse, misinterpretation, arbitrary cutoffs.

    • Bayesian methods can compute odds of H1 vs H0 directly (Bayes factor).

    • Bayes factor shows p values between .05 and .01 often provide weak evidence.

    • Consensus to date: keep significance tests but use carefully.

    • APA: significance tests remain allowed but must not be misused.


    Reporting in Research Articles

    • Authors typically report significance, test statistic symbol (t, F, χ²), and p level.

    • Tables often mark significant differences with asterisks.

    • Two-tailed is assumed unless stated otherwise.

    • Increasingly common to report exact p values (e.g., p = .03).


    Summary Concept

    • Hypothesis testing = rejection logic.

    • We support theories by showing data are unlikely under the “no effect” assumption.

    • Statistical significance ≠ proof.

    • Failure to reject null ≠ evidence null is true.

    Making Sense of Statistical Significance & Power

    Decision errors in hypothesis testing

    • Decision errors happen because we infer about populations from samples; the procedure limits but cannot eliminate error.

    • Two error types:

      • Type I error (α): Rejecting the null when it is actually true; its probability equals the significance level you set. Researchers sometimes lower α (e.g., .001) to reduce this risk, but there is a cost.

      • Type II error (β): Failing to reject the null when the research hypothesis is actually true; practically worrying because effective interventions might be missed.

    • You can’t know post-hoc that you made either error; significance levels act as “insurance,” but pushing α very low increases the risk of Type II error.

    • Trade-off: Protecting against one error type usually raises the other; standard compromises are α = .05 or .01.

    Alpha, beta, and terminology

    • α (alpha): Probability of a Type I error; same as the test’s significance level.

    • β (beta): Probability of a Type II error.

    Statistical power

    • Definition: Probability of correctly rejecting the null when the research hypothesis is true; equivalently, 1 − β.

    • Determinants:

      • Effect size / population variability: A larger predicted mean difference or a smaller population SD (either gives a larger effect size) → less overlap between the distributions of means → higher power.

      • Sample size (N): Larger N narrows the distribution of means independently of effect size → higher power. Sample size affects power but is separate from effect size.

      • Alpha level: More lenient α (e.g., .10 or .20) increases power but raises Type I risk; stricter α reduces power but protects against Type I error.

    • Illustration: Increasing N (e.g., from 64 to 500) can raise power dramatically (e.g., from ~37% to ~99%) by shrinking the standard error and reducing curve overlap.
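
    The effect of N on power can be sketched for a single-sample Z test. The code assumes a one-tailed test at α = .05 with a predicted mean difference of 8 and SD = 48 (the figures from the planning example in these notes); those settings are assumptions about how the quoted percentages were obtained.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def power_one_tailed(mean_diff, sd, n, alpha=0.05):
    """Power of a one-tailed, single-sample Z test with known SD."""
    se = sd / sqrt(n)                       # SD of the distribution of means
    z_cutoff = std_normal.inv_cdf(1 - alpha)
    # Where the cutoff falls on the research-hypothesis distribution.
    z_on_research_dist = z_cutoff - mean_diff / se
    return 1 - std_normal.cdf(z_on_research_dist)


power_64 = power_one_tailed(8, 48, 64)
power_500 = power_one_tailed(8, 48, 500)
print(round(power_64, 2), round(power_500, 2))
```

    Under these assumptions the results land close to the approximate figures quoted above; small differences reflect table rounding.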

    Planning studies with power

    • Why plan for power: Too-low power makes significance unlikely even when the effect is real; researchers should avoid underpowered designs.

    • Figuring required N: Start from a target power (commonly 80%), expected effect size (or mean difference) and known/assumed SD to compute N. Example: with predicted mean difference of 8 and SD = 48, about N = 222 achieves 80% power (single-sample mean scenario).

    • In practice, researchers use power software or online calculators to invert the power steps and solve for N.
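
    Inverting the power steps for the example above gives a closed-form estimate of N. The sketch assumes a one-tailed, single-sample Z test at α = .05, which is the setting under which the figure of about 222 is recovered.

```python
from statistics import NormalDist

std_normal = NormalDist()


def required_n(mean_diff, sd, power=0.80, alpha=0.05):
    """Approximate N for a one-tailed, single-sample Z test."""
    d = mean_diff / sd                        # effect size
    z_alpha = std_normal.inv_cdf(1 - alpha)   # cutoff on the null distribution
    z_power = std_normal.inv_cdf(power)       # distance needed for target power
    return ((z_alpha + z_power) / d) ** 2


n = required_n(mean_diff=8, sd=48)
print(round(n, 1))
```

    In practice the result is rounded up to the next whole participant, and a two-tailed test would need a larger N.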

    Practical guidance

    • Choosing α: Use conventional .05 unless there’s a strong reason to shift; lowering α (e.g., .001) buys Type I protection at the cost of more Type II errors.

    • When Type II risk matters most: Applied settings (e.g., clinical) where missing a real effect means withholding a beneficial treatment.

    • Do not conflate effect size with sample size; both influence power, but via different mechanisms and imply different design choices.

    Introduction to t-tests — core idea

    • Up until now, hypothesis testing examples assumed you knew the population variance.

    • In real psychology research, you almost never know the population variance.

    • You usually have only samples.

    • Solution = t tests.


    When do you use a t test?

    • Use a Z test when population variance is known (rare).

    • Use a t test when population variance is unknown (normal/real research case).

    • Most psychological research = comparing two sets of scores with unknown variance.


    Two main t-test types in this chapter

    • Single sample t test: 1 sample mean vs a known population mean (population variance unknown).

    • Dependent means t test: same people measured twice → create difference scores → test mean difference ≠ 0.


    Single sample t test — why, and what changes?

    • Same logic as Z test.

    • Two new complications:

      • Must estimate population variance from sample → use unbiased estimate (divide by N−1).

      • Once you estimate variance → distribution is NOT normal anymore → becomes t distribution.


    Unbiased estimate + degrees of freedom

    • Sample variance computed as SS/N underestimates the population variance on average.

    • Correction: divide SS by N − 1 → this denominator = df.

    • df = scores free to vary once you fix the sample mean → N − 1.


    Distribution of means under t

    • Get estimated population variance → divide by N → get variance of distribution of means.

    • Square root = standard deviation of distribution of means (S_M).

    • Shape = t distribution with df = N − 1.

    • t distribution = normal-like but with heavier tails, so extreme scores are more likely than under the normal curve.

    • Lower df = heavier tails → more extreme cutoffs.

    • As df → ∞, t approaches normal.


    Testing with t — same 5 steps, one change

    • Computationally identical to Z procedure, but:

      • cutoff comes from t table

      • sample test statistic is t not z

    • t = (M_sample − μ_comparison) / S_M
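
    The single-sample steps can be sketched as follows. The scores and comparison mean are hypothetical; the cutoff would still be read from a t table with the returned df.

```python
from math import sqrt
from statistics import mean, variance


def single_sample_t(scores, mu):
    """t for one sample mean against a known population mean."""
    n = len(scores)
    s2 = variance(scores)    # unbiased estimate: SS / (N - 1)
    s_m = sqrt(s2 / n)       # SD of the distribution of means (S_M)
    t = (mean(scores) - mu) / s_m
    return t, n - 1          # t score and degrees of freedom


scores = [5, 3, 6, 2, 7, 6, 7, 4, 2, 5]  # hypothetical sample, N = 10
t, df = single_sample_t(scores, mu=4)
print(round(t, 2), df)
```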


    Why dependent means t test exists

    • Real research design commonly uses repeated measures:

      • before vs after therapy

      • beloved vs neutral face in fMRI

    • Two scores per participant → create difference score for each participant.

    • Then perform a single sample t test on the difference scores.

    • Null mean for difference scores = 0.
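
    The same procedure applied to difference scores looks like this. The before and after scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

before = [10, 12, 9, 14, 11]   # hypothetical scores, same five people
after = [13, 14, 10, 17, 12]

# One difference score per participant.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
s2 = variance(diffs)           # SS / (N - 1) of the difference scores
s_m = sqrt(s2 / n)             # SD of the distribution of mean differences
t = (mean(diffs) - 0) / s_m    # null hypothesis: mean difference = 0
df = n - 1

print(diffs, round(t, 2), df)
```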


    When to use dependent means t test

    • Unknown population variance.

    • Repeated measures / paired samples / matched pairs.

    • Use difference scores.

    • Population 2 mean assumed = 0 (no change baseline).


    Repeated measures = higher power

    • Because individual differences cancel out within person.

    • SD of difference scores tends to be smaller.

    • Smaller denominator → bigger effect size d → more power.

    • BUT repeated measures without a control group is methodologically weak (confounds: maturation, time effects, practice, attrition).


    Reporting in articles

    • Standard format: t(df) = value, p = …

    • One-tailed or two-tailed reported if relevant.

    • Means reported for each condition.

    t test for independent means — core purpose

    • Use when comparing two separate groups of people.

    • Population variances unknown → must estimate → therefore use t not z.

    • Common design in psychology: experimental vs control groups.


    comparison distribution = distribution of differences between means

    • NOT difference scores (dependent means = difference scores).

    • Independent means = 2 separate samples → focus is mean₁ – mean₂.

    • Logic: build 2 distributions of means → then build distribution of the difference between those means.


    mean of this comparison distribution

    • If H₀ is true → μ₁ = μ₂ → difference between means = 0.

    • Therefore mean of the distribution of differences between means = 0.


    estimating population variance

    • Must assume equal population variances (homogeneity of variance).

    • Estimate variance separately for each sample.

    • Combine them → pooled variance (weighted average by df).


    pooled variance

    • More weight goes to the sample with larger df.

    • Pooled variance MUST lie between the two individual estimates.


    variance + SD of distribution of differences between means

    • Step 1: pooled variance.

    • Step 2: divide by N₁ → variance of distribution of means₁.

    • Step 3: divide by N₂ → variance of distribution of means₂.

    • Step 4: add those two → variance of distribution of difference.

    • Step 5: square root → SD of distribution of difference.


    df for t test for independent means

    • df_total = df₁ + df₂

    • df₁ = N₁ − 1

    • df₂ = N₂ − 1


    test statistic

    • t = (M₁ − M₂)/SD_difference
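
    The pooled-variance steps, df, and t score can be combined in one sketch. The two groups are hypothetical, and the function assumes equal population variances as stated above.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """t test for independent means using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    df1, df2 = n1 - 1, n2 - 1
    s2_1, s2_2 = variance(group1), variance(group2)      # each SS / (N - 1)
    pooled = (df1 * s2_1 + df2 * s2_2) / (df1 + df2)     # weighted by df
    var_diff = pooled / n1 + pooled / n2                 # variance of difference
    sd_diff = sqrt(var_diff)
    t = (mean(group1) - mean(group2)) / sd_diff
    return t, df1 + df2, pooled


experimental = [6, 8, 7, 9, 10]  # hypothetical scores
control = [4, 5, 6, 5]           # hypothetical scores

t, df, pooled = independent_t(experimental, control)
print(round(t, 2), df, round(pooled, 3))
```

    With unequal group sizes the pooled estimate sits closer to the variance of the larger group, but always between the two separate estimates.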


    assumptions

    • Each population = approx normal (t robust to moderate violation).

    • Population variances assumed equal (homogeneity).

    • Scores must be independent — no matching, no clustering, no nesting.


    effect size (independent means)

    • d = (μ₁ − μ₂)/σ (planned)

    • estimated d after study = (M₁ − M₂)/S_pooled


    power (independent means)

    • Same concept, but power depends on:

      • effect size

      • α

      • N

      • equal N gives more power than unequal N.

    • If N unequal → use harmonic mean to approximate power equivalence.


    the “too many t tests” problem

    • Running many t tests inflates Type I error dramatically.

    • With α = .05 and every null hypothesis true, about five significant results out of 100 comparisons are expected by chance alone.

    • Issue is real; solutions exist (e.g., Bonferroni, etc.).
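
    The size of the problem can be sketched numerically. The familywise calculation assumes the tests are independent and that every null hypothesis is true; the function names are illustrative.

```python
def familywise_error(alpha, k):
    """Chance of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test alpha that keeps the overall Type I risk at or below alpha."""
    return alpha / k


alpha, k = 0.05, 100
expected_false_positives = alpha * k
print(expected_false_positives)                 # 5.0
print(round(familywise_error(alpha, k), 3))     # close to 1
print(bonferroni_alpha(alpha, k))               # stricter per-test level
```

    The Bonferroni correction trades Type I protection for power, since each individual test now has to clear a much stricter cutoff.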


    how it appears in articles

    • Usually: M, SD reported for each group + t(df) = value, p < .05


    table-level conceptual comparison

    • single sample t → compare 1 sample mean to known μ

    • dependent means t → compare same people twice (difference scores)

    • independent means t → compare 2 groups of different people