T3 Notes
Exam logistics
- Mid-semester exam date and format
- Saturday, September 6 at 2 PM.
- Fully invigilated and in person.
- Covers content from weeks 1–4 (all material covered thus far, including lectures and textbook readings assigned in weeks 1–4).
- Tutorial content will not be assessed; only lecture content and assigned textbook readings matter.
- Worth 20% of the overall course grade.
- Exam format: 40 multiple-choice questions (MCQs); each MCQ contributes 0.5% to the final grade.
Case study overview (research design exercise)
- Study design overview (as described in the session):
- Participants divided into two relationship-status groups:
- Newly coupled participants (in a new relationship).
- Continually coupled participants (medium-to-long term relationships).
- Measures collected:
- Participants rated their actual partner on four traits: physical attractiveness, vitality, status/resources, and warmth/trustworthiness.
- Participants also rated their ideal partner (their preference) on the same four traits.
- Discrepancy score for each participant:
- Reflects the difference between how they rated their actual partner and how they rated their ideal partner.
- Popularly framed as: D = Rating_actual - Rating_ideal (magnitude |D| reflects discrepancy).
- Main finding: discrepancy scores were on average lower in continually coupled participants than in newly coupled participants.
- Interpretation given: people in longer relationships calibrate their ideal partner to be closer to their actual partner; relationship status affects ideal partner preferences.
- Baseline ratings of actual partners were similar between the two groups, meaning the difference lay in the ideals, not the actuals.
- Implication: over time, people’s preferences may recalibrate to match their current partner’s traits.
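The discrepancy score described above can be sketched in code. A minimal Python sketch with hypothetical ratings (all trait values below are invented for illustration, not taken from the study):

```python
# Sketch of the discrepancy score: D = rating_actual - rating_ideal per trait;
# the mean of |D| across the four traits gives one score per participant.

def discrepancy(actual, ideal):
    """Mean absolute actual-ideal difference across traits."""
    assert len(actual) == len(ideal)
    return sum(abs(a - i) for a, i in zip(actual, ideal)) / len(actual)

# Traits: attractiveness, vitality, status/resources, warmth/trustworthiness
newly_coupled = discrepancy(actual=[5, 6, 4, 7], ideal=[7, 7, 6, 7])
continually_coupled = discrepancy(actual=[5, 6, 4, 7], ideal=[5, 6, 5, 7])

print(newly_coupled)        # 1.25
print(continually_coupled)  # 0.25
```

Note that the actual-partner ratings are identical in both calls, mirroring the finding that the groups differed in their ideals, not their actuals.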
Critical-thinking prompts (group discussion and critique)
- Task given to students:
- For about 10 minutes at the table (or individually): identify sources of systematic and nonsystematic variability in this study.
- Propose alternative explanations for the results.
- If flaws are identified, discuss possible redesigns to obtain more robust conclusions.
- Example points raised in discussion (from the transcript):
- Systematic variability:
- Group assignment (continually vs newly coupled) is not time-defined; variability in group composition beyond simple status (e.g., years together) could confound results.
- Nonsystematic variability:
- What participants actually value may be influenced by momentary mood, context, or recent experiences rather than stable trait preferences.
- Alternative explanations discussed:
- Survivorship bias: those whose partners already closely match their ideals may be more likely to stay together and thus be in the continually coupled group.
- Cognitive dissonance or justification effects: long-term partners might adjust their reported ideals to maintain consistency with their relationship, or to rationalize staying in the relationship.
- Age effects: older participants in longer relationships might have different, potentially more realistic, preferences.
- Regression toward the mean or measurement noise in trait ratings.
- Redesign ideas proposed:
- Add a third group (e.g., long-term but not extremely long-term) or measure duration as a continuous variable (years together) and analyze with regression.
- A longitudinal design: recruit new couples, obtain baseline discrepancy (actual vs ideal) at dating onset, and follow over time to observe how discrepancy changes and whether initial discrepancy predicts relationship continuity.
- Longitudinal design would help address survivorship bias and test whether lower baseline discrepancy predicts relationship persistence, while also examining whether preferences recalibrate over time.
- Instructor’s reflections (summarized):
- Reiterated main alternative explanations (e.g., survivorship bias, cognitive dissonance, age effects).
- Emphasized that the study’s design cannot conclusively determine whether preferences shift due to relationship status or due to preexisting alignment between partner and self.
- Suggested longitudinal study as the best approach to disentangle these factors, acknowledging time and cost barriers in practice.
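The redesign that treats relationship duration as a continuous predictor could be analyzed with simple linear regression, as suggested above. A minimal Python sketch with hypothetical data (all values invented for illustration):

```python
import numpy as np

# Hypothetical: years together (continuous predictor) vs discrepancy score.
years = np.array([0.2, 0.5, 1.0, 2.0, 4.0, 7.0, 10.0])
discrepancy = np.array([2.1, 1.9, 1.6, 1.4, 0.9, 0.7, 0.5])

# Simple linear regression: discrepancy = slope * years + intercept
slope, intercept = np.polyfit(years, discrepancy, deg=1)
r = np.corrcoef(years, discrepancy)[0, 1]

# A negative slope would indicate discrepancy shrinking with duration.
print(f"slope = {slope:.3f}, r = {r:.2f}")
```

Regression on a continuous duration variable avoids the arbitrary cut-point between "newly" and "continually" coupled groups.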
Constructs, variables, and operationalization (conceptual groundwork)
Core idea: constructs vs variables
- Construct: abstract concept (e.g., cognitive flexibility).
- Operationalization: concrete measurement of the construct as a variable (e.g., reaction times in a task-switching paradigm).
- Example from lecture: cognitive flexibility in older adults measured via a task-switching paradigm; construct = cognitive flexibility; operationalized as reaction times on switch vs non-switch trials.
- Importance of clarity in operationalization: enables replication and ensures readers understand exactly what was measured and how.
- There can be multiple valid ways to operationalize a construct; no single “right” method.
Scales of measurement (classification of variables)
Four primary scales with key features and examples:
- Nominal scale:
- Definition: Categories without any mathematical ordering or relationship.
- Example: Type of drug in a trial (paracetamol vs placebo).
- Other examples: Ethnicity categories; eye color (brown, blue, green).
- Mathematical implication: no meaningful arithmetic operations between categories.
- Ordinal scale:
- Definition: Ordered categories, but intervals between adjacent categories are not guaranteed to be equal.
- Example: Finishing position in a race (1st, 2nd, 3rd, …) or Likert-type scales (strongly disagree to strongly agree).
- Important nuance: treated as ordinal in theory, but often analyzed as interval in practice due to numeric labeling.
- Interval scale:
- Definition: Ordered with equal intervals between categories, but no true zero point.
- Example: Dates on a calendar (differences are meaningful; e.g., 2020 vs 2021 shows a 1-year interval; but there is no true zero year).
- Ratio scale:
- Definition: Ordered with equal intervals and a true zero point, allowing meaningful ratios.
- Examples: Dosage of a drug (0 mg means no drug), reaction times (0 ms means no time taken).
Special notes and nuanced examples from the lecture:
- Time-of-day as a variable can be represented across different scales depending on measurement choice:
- Nominal: day vs night.
- Ordinal: order of times (dawn, noon, afternoon, evening).
- Interval: standard 12-hour clock without absolute zero.
- Ratio: 24-hour time with midnight as a true zero point, enabling meaningful ratios (e.g., 14:00 is twice as far from midnight as 07:00).
- Eye color could theoretically be treated as a ratio measure if one used a continuous brightness metric, though typically it is treated as nominal.
- Socioeconomic status can be nominal or ordinal depending on how it’s measured (income vs a ranked status ladder).
- Self-reported happiness and other Likert-type scales are typically treated as ordinal in theory, though often analyzed as interval in practice.
- IQ scores and age are nuanced: IQ is often treated as at least ordinal and sometimes interval, but has interpretive complexity (non-uniform intervals across the scale; validity of zero is not clear); age generally treated as ratio but can be binned to yield ordinal or interval representations.
- Temperature example: Celsius is interval; Kelvin is ratio; both measure temperature but differ in zero-point interpretation.
- Final note on measurement: the same construct can be represented differently across scales; responsible researchers choose the scale that best fits the study design and analysis.
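The Celsius/Kelvin example can be made concrete: ratios are only interpretable when the scale has a true zero. A small Python sketch using the standard Celsius-to-Kelvin conversion:

```python
# Interval vs ratio: ratios are only meaningful with a true zero point.
def c_to_k(celsius):
    """Convert Celsius (interval) to Kelvin (ratio)."""
    return celsius + 273.15

# "20 C is twice as hot as 10 C" is wrong: the Celsius zero is arbitrary.
print(20 / 10)                  # 2.0 -- arithmetically valid, physically meaningless
print(c_to_k(20) / c_to_k(10))  # ~1.035 -- the meaningful ratio on the Kelvin scale
```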
Validity and reliability (quality of measurements)
Validity: the extent to which a measurement actually measures what it claims to measure.
- Internal validity: how well causal conclusions about relationships between variables can be drawn from the study design (focus on confounds and nuisance variance).
- External (ecological) validity: generalizability of results beyond the current setting or sample.
- Construct validity: whether the test actually measures the intended construct (strongly related to the adequacy of the construct measurement).
- Predictive validity: whether scores on a measurement co-vary with a future criterion it should predict.
- Content/face validity: whether the test appears to measure the intended construct; often considered the least critical form of validity.
Concrete examples from meditation/anxiety study to illustrate validity categories:
- Internal validity (good): Random assignment to a meditation vs control group would reduce systematic differences between groups; absence of confounds would strengthen causal inferences.
- Internal validity (poor): Self-selection into the meditation group (volunteering based on motivation) could confound results.
- External validity (good): Include participants from multiple schools, ages, prior meditation experiences.
- External validity (poor): Include only first-year psychology students from a single university.
- Construct validity (good): Use multiple indicators of anxiety (self-report, behavioral observations, physiological measures like heart rate or skin conductance).
- Construct validity (poor): Rely on a single self-report item to assess anxiety.
- Predictive validity (good): Reductions in anxiety scores predict fewer real-life anxiety symptoms later (panic attacks, etc.).
- Predictive validity (poor): Short-term anxiety reductions do not translate into real-world improvements over time.
- Face validity (good): A measurement that clearly asks about anxiety (e.g., “How anxious do you feel right now?”) is easy for participants to interpret.
- Face validity (poor): Using obscure physiological measures that participants don’t associate with anxiety may reduce perceived relevance, though it could be useful in some deception-prone contexts.
- Practical note: Face validity is often less important than construct validity; it can be strategically manipulated in some contexts (e.g., to prevent participants from altering responses).
Reliability (stability/consistency of a measure):
- Test-retest reliability: administer the same measure more than once and assess whether results correlate across administrations.
- Inter-rater reliability: different raters’ scores correlate; essential when observations are involved.
- Good reliability examples:
- Using a well-validated, standardized measure (e.g., Beck Depression Inventory) with known test-retest reliability.
- Trained, standardized observers rating aggressive behaviors with detailed criteria.
- Poor reliability examples:
- A newly developed depression inventory with highly variable scores from day to day due to mood, fatigue, or time of day.
- Untrained, unstandardized raters giving divergent scores because of personal bias.
- Conceptual relationship between validity and reliability:
- It is possible to have good reliability but poor validity (consistent but not measuring the intended construct).
- It is uncommon to have good validity with poor reliability; if reliability is near zero, there is essentially no meaningful information to rely on (like measuring with a spaghetti tape measure).
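Test-retest reliability is typically quantified as the correlation between scores from two administrations of the same measure. A minimal Python sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores for six participants at two administrations.
time1 = np.array([12, 18, 9, 22, 15, 30])
time2 = np.array([13, 17, 10, 21, 16, 28])

# Test-retest reliability: Pearson correlation between administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")  # values near 1 indicate a stable measure
```

The same computation applies to inter-rater reliability, with the two arrays holding two raters' scores instead of two time points.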
Methodological notes about reporting and interpretation:
- The method section should be written in past tense because the study has been completed.
- Distinguish between validity concepts and reliability concepts when evaluating a study.
Method section structure (how to write up a study)
- Four subsections, with suggested order and core content:
- Participants (must be included and described first)
- Include total number of participants (n), who they were, and how they were selected.
- Describe participation incentives (voluntary, paid, or other), and relevant demographic variables (age, gender, etc.).
- Example template (one-paragraph):
- "Participants (N = 57; 40 female, 14 male, 3 non-binary) were undergraduates enrolled in three small sections of a third-year biology course at the University of Western Australia. They participated voluntarily; ages ranged from 18 to 32, with the mean and standard deviation reported. Gender and other demographics were reported as counts or percentages."
- Practical formatting note: spell out numbers that begin a sentence; write numbers 10 and above as numerals (with some special-case exceptions); means and standard deviations are always reported as numerals.
- Design (structure of the study)
- State the overall design: experimental vs observational; independent groups vs repeated measures; cross-sectional vs longitudinal; etc.
- For the example in the transcript: observational and correlational; not experimental; cross-sectional rather than longitudinal; no grouping by preexisting criteria beyond natural group status.
- Name the key constructs (e.g., conscientiousness, anxiety) and describe how you operationalize them (e.g., Big Five conscientiousness subscale; summed item scores).
- A concise example template: "Participants were assessed in a two-condition setup with repeated measures; construct X measured via Y; Z measured via W."
- Procedure
- Detail step-by-step how the study was carried out, including who administered it, participant instructions, and data collection method.
- Example from the transcript: online questionnaire completed in tutoring class; instructions provided by tutor; written task instructions; response sheets submitted; debriefing provided.
- Emphasize replication: the write-up should allow exact replication from the description.
- Materials
- Describe each scale used for the study (even those not central to the hypotheses); include number of items per scale and per subscale; provide at least one example item per scale (not per subscale).
- Note about reverse scoring where applicable and why it matters.
- Mention that readers should be directed to the appendix for full scale details (scales used, items, etc.).
- Practical tip: for this course’s assignment, you should describe every scale used, even those not analyzed; this is more exhaustive than a typical published paper.
- General guidance and style notes:
- Use past tense consistently.
- Keep each subsection concise (often a single paragraph per subsection).
- The four subsections together should provide enough detail for a reader to replicate the study exactly, including the specific measures and procedures used.
- The Materials subsection will typically be longer than the other sections because it includes descriptions of multiple scales and example items.
- An appendix section may include full scales, item-level details, and scoring rules; refer readers there from the Materials subsection.
Practical examples and templates (quick-reference for writing)
- Participant example template (one-paragraph):
- "Participants (N = 57; 40 female, 14 male, 3 non-binary) were undergraduates enrolled in three sections of a third-year biology course at the University of Western Australia. Participation was voluntary. Ages ranged from 18 to 32 (M = 25.3, SD = 3.7). Gender breakdown was reported as counts."
- Design example (one-paragraph):
- "Design: observational, correlational. No experimental manipulation. The study examined relationships between variables X and Y within a single sample. Constructs: X was measured by [scale or items], Y by [scale or items]."
- Procedure example (one-paragraph):
- "Procedures: participants completed online questionnaires in their first tutorial class. The tutor provided instructions; written task prompts were supplied; responses were collected via a digital response sheet; debriefing followed the session."
- Materials example (one-page overview, longer for real study):
- "Scales administered included Scale A (XX items; Cronbach’s α = .XX), Scale B (YY items; Cronbach’s α = .YY), and Scale C (ZZ items; Cronbach’s α = .ZZ). One sample item per scale: A: 'I feel [construct-related item]'; B: '[item]'; C: '[item]'. Half of the items on Scale A and Scale B were reverse-scored. Scores were summed to yield a total for each scale. Appendix A lists all items and subscales."
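The reverse scoring and Cronbach's α mentioned in the Materials template can be sketched directly. The functions and responses below are hypothetical illustrations (a 4-item scale with 1-5 responses), not part of the assignment:

```python
import numpy as np

def reverse_score(item, scale_min=1, scale_max=5):
    """Reverse a Likert response: 1 <-> 5, 2 <-> 4 on a 1-5 scale."""
    return scale_max + scale_min - item

def cronbach_alpha(items):
    """Cronbach's alpha; items is a participants x items matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses; item 4 is reverse-worded.
raw = np.array([
    [4, 5, 4, 2],
    [2, 2, 3, 4],
    [5, 4, 5, 1],
    [3, 3, 3, 3],
])
raw[:, 3] = reverse_score(raw[:, 3])  # recode the reversed item before summing
totals = raw.sum(axis=1)              # summed scale totals per participant
alpha = cronbach_alpha(raw)
```

This illustrates why reverse scoring matters: summing without recoding item 4 would cancel out the very consistency that α is meant to capture.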
Closing notes and next steps
- The instructor’s closing remark: next week will cover data analysis in Excel and calculating the standard deviation.
- Key takeaway: the upcoming session will build practical data analysis skills to complement the conceptual foundations reviewed here.
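As a preview of next week's topic, the sample standard deviation (what Excel's STDEV.S computes) can be written out from its definition. A minimal Python sketch with invented values:

```python
import math

def sample_sd(values):
    """Sample standard deviation (n - 1 denominator, like Excel's STDEV.S)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

print(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # ~2.14
```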