Comprehensive Notes on Research Methods in Relationship Science

Data collection in Relationship Science

  • Focus on observable behavior and interaction as data sources
  • Examples of observable data include:
    • Facial expressions
    • Who is speaking and what they’re saying
    • How they’re saying it (tone, prosody)
  • Observational coding schemes (highly structured coding of behavior) used to quantify interactions
  • Example study approach described:
    • Bring couples into a lab or natural setting
    • Prime them to discuss a typical topic
    • Record the interaction for later coding
    • Code movement of face, tone of voice, and interaction dynamics
    • Analyze how participants respond to each other in real time (e.g., face changes, snapping back, calming the conflict)
  • Observational methods provide a rich, contextual picture but are labor-intensive and costly
  • Two broad data collection approaches discussed:
    • Single-instance observational data
    • Repeated observations or longitudinal designs
  • Example: measure a couple before and after a relationship-education program using the same observational method
  • Contrast with experience sampling (ESM):
    • Participants carry a device that records short verbal samples when prompted
    • Records snippets of lived experiences throughout the day, then data are coded
  • The purpose of ESM is to capture data that match the research question (e.g., college students’ conversations after a big football game)
  • Preschool observational work described:
    • Some early pilot studies described as watching many children for short periods (e.g., 10 seconds per child) and collecting thousands of data points to measure sociability
  • Observational data are typically collected in naturalistic settings and in short bursts, but can also be collected in lab settings
  • Common questions about observational data:
    • Is a single observation enough, or should we collect multiple sessions?
    • How does study design (lab vs natural setting) influence findings?

Potential downsides and limitations of observational data

  • Reactivity effect (Hawthorne effect): participants may behave differently because they know they’re being observed
  • Narrow window: a single observation may not generalize to typical behavior
  • Limited access to internal states: data are restricted to observable behaviors; private states (thoughts, feelings) can only be inferred, not measured directly
  • High cost and effort: training, reliability checks, and coding time are substantial
  • Reliability challenges:
    • Raters need consistent coding across observers
    • Requires rigorous training and calibration to reach acceptable reliability
    • Time and cost escalate with larger samples
  • Mitigation strategies:
    • Mixed methods: combine observational data with self-report data
    • Use standardized coding schemes and training materials
    • Use multiple observational instances to increase reliability
  • Technological considerations:
    • Video/audio recording equipment, secure storage, and data management
    • Synchronization of multiple data streams (e.g., video plus audio transcripts)
  • Ethical considerations in observational studies discussed within Belmont Report framework (see Ethics section)

Experience Sampling (ESM) in relationship science

  • ESM samples lived experiences repeatedly across time rather than relying on a single observation
  • Typical setup:
    • Portable recording device (e.g., handheld recorder or smartphone app)
    • Automatic prompts or user-initiated recordings
    • Audio is captured only once the participant starts talking; the device then stops recording
  • Advantages:
    • Captures variability across moods, contexts, conflicts, and interactions
    • Reduces reliance on retrospective reporting
  • Limitations:
    • Participant burden and potential nonresponse bias
    • Data are momentary and require careful interpretation
  • Examples:
    • After a significant social event (e.g., football game), what topics arise and how do partners interact?
    • Across a school day, what interactions occur with peers or partners?
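
The "automatic prompts" in the typical setup above could be scheduled in many ways. The sketch below shows one possibility, not a procedure taken from these notes: it splits the waking day into equal blocks and draws one random prompt time per block, so samples are spread across the day. The function name, the 08:00 to 22:00 window, and the six prompts are all illustrative assumptions.

```python
import random
from datetime import datetime, timedelta

def prompt_schedule(start, end, n_prompts, seed=None):
    """Return one random prompt time inside each of n_prompts equal blocks."""
    rng = random.Random(seed)
    block = (end - start) / n_prompts  # length of each block as a timedelta
    return [
        start + block * i + timedelta(seconds=rng.uniform(0, block.total_seconds()))
        for i in range(n_prompts)
    ]

# Hypothetical sampling window: 08:00 to 22:00 on one day, six prompts.
start = datetime(2024, 1, 15, 8, 0)
end = datetime(2024, 1, 15, 22, 0)
schedule = prompt_schedule(start, end, n_prompts=6, seed=1)

for t in schedule:
    print(t.strftime("%H:%M"))
```

Drawing one prompt per block keeps prompts from clustering, which may help with the participant-burden limitation noted above.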

Other data collection modalities

  • Self-report data (surveys, questionnaires): prompts participants to report thoughts, feelings, behaviors
  • Physiological data: body-based indicators of relationship processes
    • Heart rate, respiration, cortisol (stress hormone), skin conductance (sweat)
    • Brain activity (e.g., EEG/other measures) in some studies
    • Hormone levels can be measured non-invasively (e.g., salivary cortisol)
  • Archival and secondary data: using existing data sources
    • Public records (marriage/divorce decrees, birth records)
    • Health records, governmental datasets
    • Online data (social media posts, forums, Reddit) to infer relationship processes
  • Example of archival/social-media data use:
    • Analyzing why people call off marriages by coding Reddit posts without directly asking participants

Key distinctions among data types

  • Natural environments vs lab settings: data can be collected in either; ecological validity often higher in natural settings
  • Single-instance vs repeated measures: longitudinal data allow observing change over time
  • Depth vs breadth: observational coding yields rich, detailed data; self-report can cover broader constructs
  • Inter-method fit: researchers often combine data types (mixed methods) to address limitations of any single approach

Reliability in relationship science data

  • Reliability refers to consistency of measurement
  • Three forms highlighted:
    • Internal reliability (internal consistency): consistency of responses within a measure
    • External reliability (test-retest reliability): stability of scores across time or conditions
    • Inter-rater reliability: consistency between different coders or raters

Internal reliability

  • Focused on how consistently multiple items measure the same construct
  • Example: a depression scale with 10 items
  • Internal consistency means respondents answer related items in a coherent way
  • Common statistic: Cronbach’s alpha, \alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N} \sigma_i^2}{\sigma_T^2}\right)
    • N = number of items
    • \sigma_i^2 = variance of item i
    • \sigma_T^2 = variance of the total score
  • High internal reliability suggests items cohere well; not sufficient alone for validity
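
As a minimal sketch of how Cronbach’s alpha could be computed from the definition above (N items, item variances \sigma_i^2, total-score variance \sigma_T^2), the following uses only the Python standard library. The response data are made up for illustration, and sample variance (n - 1 denominator) is assumed for both the item and total-score variances.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: list of respondents, each a list of N item scores."""
    n_items = len(item_scores[0])
    # Variance of each item across respondents (sigma_i^2).
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(n_items)]
    # Variance of each respondent's total score (sigma_T^2).
    total_var = variance([sum(resp) for resp in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Because respondents who score high on one item tend to score high on the others in this toy data set, alpha comes out close to 1.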

External reliability (test-retest)

  • Measures stability of scores over time
  • Approach: administer the same measure to the same participants at two or more time points
  • Ideal outcome: scores remain reasonably stable when the underlying construct is stable
  • Note: some constructs (e.g., mood) may legitimately change over time; interpretation depends on construct

Inter-rater reliability

  • Important for observational and coding data
  • Gauges whether different raters code the same behavior similarly
  • Common statistics:
    • Cohen’s kappa: \kappa = \frac{P_o - P_e}{1 - P_e}, where P_o is observed agreement and P_e is chance agreement
    • Intraclass correlation (ICC) for continuous ratings
  • Achieving high inter-rater reliability requires training, calibration, and clear coding manuals
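
To make the kappa formula concrete, here is a small sketch for two coders who each assign one category per interaction segment. The category labels and codes are invented. P_e is computed in the usual way, from the product of the two coders’ marginal proportions for each category.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement P_o: share of segments coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement P_e: sum over categories of the product of marginals.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for 10 segments (positive, negative, neutral affect).
coder_1 = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neu"]
coder_2 = ["pos", "neg", "neu", "pos", "neu", "pos", "neu", "neg", "neg", "neu"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on 8 of 10 segments (P_o = 0.80), but kappa is lower than 0.80 because some of that agreement would be expected by chance.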

Validity in relationship science data

  • Validity concerns whether a measure accurately captures what it is intended to measure
  • Three forms highlighted:
    • Convergent validity: different measures of the same construct yield similar results
    • Divergent (discriminant) validity: measures of different constructs do not highly correlate
    • Face validity: the measure appears to assess the intended construct at face value

Convergent validity

  • If two separate depression scales yield high correlation for the same individuals, they demonstrate convergent validity
  • Conceptual idea: different methods converge on the same underlying construct

Divergent validity

  • Measures of related but distinct constructs should not be perfectly correlated
  • Example: depression vs. relationship satisfaction should not perfectly track together, though some overlap may exist

Face validity

  • The items or tasks should seem appropriate for the construct being measured
  • Poor face validity example: measuring depression with only questions about eating habits
  • Face validity matters for participant understanding and engagement, but it is not sufficient for overall validity

Reliability vs validity relationship

  • Reliability is a prerequisite for validity: a measure that is not reliable cannot be valid
  • The reverse does not hold: a measure can be reliable but not valid (consistently wrong)
  • Example visuals (described conceptually):
    • Highly reliable but far off target (consistent but biased)
    • On-target but inconsistent (accurate on average but noisy)
    • Neither reliable nor valid (scattered and off-target)
  • Practical takeaway: ensure both reliability and validity when interpreting data
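
The three target patterns above can be illustrated with a small simulation. This is only a sketch under assumed numbers: a true score of 50, a bias of 10 points, and noise standard deviations of 1 or 10 are arbitrary choices. Bias stands in for a validity problem and spread for a reliability problem.

```python
import random
from statistics import mean, stdev

def simulate(true_score, bias, noise_sd, n=1000, seed=0):
    """Draw n repeated measurements with a constant bias and random noise."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, noise_sd) for _ in range(n)]

true_score = 50
reliable_biased = simulate(true_score, bias=10, noise_sd=1)   # consistent but off target
unbiased_noisy = simulate(true_score, bias=0, noise_sd=10)    # on target on average, noisy
noisy_biased = simulate(true_score, bias=10, noise_sd=10)     # scattered and off target

for label, scores in [
    ("reliable but biased", reliable_biased),
    ("unbiased but noisy", unbiased_noisy),
    ("noisy and biased", noisy_biased),
]:
    print(f"{label}: mean error = {mean(scores) - true_score:.2f}, spread = {stdev(scores):.2f}")
```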

Ethics and responsible research with human participants

  • Research with humans requires ethical consideration and oversight
  • Belmont Report (1979): foundational document outlining ethical principles for human subjects research
    • Respect for persons: acknowledge autonomy; protect those with diminished autonomy; informed consent
    • Beneficence: do no harm; maximize benefits; minimize risks
    • Justice: fairness in distribution of research burdens and benefits
  • Institutional Review Board (IRB): independent group that evaluates proposed research to protect participants
    • If a study poses risk or ethical concerns, IRB can require modifications or reject the study
  • Historical caution: Tuskegee syphilis study highlighted the dangers of unethical research and the need for IRB oversight
  • Informed consent: written and/or verbal documentation detailing a study’s purpose, procedures, risks, benefits, duration, compensation, data handling, and right to withdraw
  • Participant rights: voluntary participation, right to withdraw at any time without penalty, ability to ask questions and receive answers
  • Ethical conduct in practice:
    • Be transparent about study goals and potential harm
    • Ensure fair treatment and respect for participants
    • Avoid coercion and undue influence
    • Protect privacy and data confidentiality
    • Provide debriefing and resources if discussing sensitive topics

Data quality, limitations, and responsible interpretation

  • Data quality matters: poor data lead to incorrect conclusions
  • Data cleaning: process of identifying and correcting or removing noisy or erroneous data
    • Examples: misreporting (e.g., “three kids” but only two checked), duplicated records, missing values
  • Limitations and transparency:
    • All studies have limitations; good papers acknowledge them and outline their impact
    • Be cautious about overgeneralizing from single studies or small samples
  • Being a good consumer of research:
    • Check measurement definitions (what was measured and how)
    • Assess reliability and validity evidence
    • Look for explicit limitations and potential biases
    • Consider the appropriateness of the data type for the research question
    • Examine ethical considerations (IRB approval, consent, data handling)
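
The data-cleaning examples mentioned above (misreporting, duplicated records, missing values) could be screened with a few simple checks. The record layout and field names below (`num_children`, `child_ages`, `satisfaction`) are hypothetical and chosen only to mirror those examples. Real cleaning rules depend on the study.

```python
# Hypothetical survey records; field names are illustrative assumptions.
records = [
    {"id": 1, "num_children": 3, "child_ages": [2, 5], "satisfaction": 4},   # says 3, lists 2
    {"id": 2, "num_children": 1, "child_ages": [7], "satisfaction": None},   # missing value
    {"id": 2, "num_children": 1, "child_ages": [7], "satisfaction": None},   # duplicate of id 2
    {"id": 3, "num_children": 0, "child_ages": [], "satisfaction": 5},
]

def clean(records):
    """Drop duplicate ids, then separate clean records from flagged ones."""
    seen_ids = set()
    kept, flagged = [], []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # duplicated record: keep only the first occurrence
        seen_ids.add(rec["id"])
        problems = []
        if rec["num_children"] != len(rec["child_ages"]):
            problems.append("child count mismatch")
        if rec["satisfaction"] is None:
            problems.append("missing satisfaction")
        if problems:
            flagged.append((rec["id"], problems))
        else:
            kept.append(rec)
    return kept, flagged

kept, flagged = clean(records)
print("kept ids:", [r["id"] for r in kept])
print("flagged:", flagged)
```

Flagging rather than silently deleting problem records keeps the cleaning decisions visible, which supports the transparency points above.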

Mixed methods and real-world applications

  • Mixed methods combine observational, self-report, physiological, and archival data to provide a fuller picture
  • The choice of method depends on the research question and practical constraints
  • Real-world example from the module:
    • An assignment where students discuss whether a self-report survey on satisfaction is appropriate for a longitudinal study
    • Emphasis on thinking through the data type, reliability, and interpretation
  • Practical research example discussed in class:
    • Studying toddler-satisfaction or partner-report correlations; identifying X and Y variables and their directional relationship
    • Using correlation concepts to interpret whether higher scores on one measure align with higher scores on another
  • Broader relevance:
    • These methods inform relationship education programs, clinical practice, and understanding of relationship dynamics in everyday life
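
For the correlation idea in the class example above (do higher scores on X go with higher scores on Y?), here is a minimal sketch that computes Pearson’s r as the covariance divided by the product of the standard deviations. The X and Y values are invented partner-report scores.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance of X and Y.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations of X and Y.
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical scores: X = one partner's report, Y = the other partner's report.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A positive r means higher X tends to go with higher Y, a negative r means the opposite, and values near 0 indicate little linear association. None of these by itself shows that X causes Y.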

Quick glossary and key terms (with notes)

  • Experience Sampling (ESM): real-time data collection through momentary reports
  • Internal reliability: consistency of items within a measure; often quantified by Cronbach’s alpha (\alpha)
  • External reliability: stability of scores over time (test-retest)
  • Inter-rater reliability: agreement among coders or raters (e.g., Cohen’s kappa (\kappa), ICC)
  • Convergent validity: different measures of the same construct yield similar results
  • Divergent validity: measures of different constructs do not correlate strongly
  • Face validity: items appear to measure the intended construct at face value
  • Belmont Report: ethical principles for human subjects research (Respect for Persons, Beneficence, Justice)
  • IRB: institutional review board that reviews research proposals to protect participants
  • Hawthorne effect: behavioral changes due to awareness of being observed
  • Data cleaning: process of correcting or removing inaccurate data
  • Correlation: statistical association between two variables; often summarized by r = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}
  • Causality and longitudinal design: longitudinal data help establish temporal ordering and can support tentative causal inferences, though establishing causality requires careful analysis
  • Archival data: using existing records or datasets rather than collecting new data
  • Mixed methods: integrating qualitative and quantitative approaches to address research questions

Example prompts and reflection (to practice exam-style thinking)

  • If you observe high inter-rater reliability (ICC or Cohen’s kappa), what does that imply about your observational coding scheme?
  • How would you determine whether a depression scale used in a relationship study has adequate convergent validity with another established depression measure?
  • What ethical steps would you take to study a topic involving intimate partner violence or high-stress conflict, and how would you communicate potential risks to participants?
  • Given a short-term cortisol measure during a conflict task, how would you interpret elevated cortisol with respect to the participant’s subjective report of stress?
  • When would experience sampling be preferred over a single in-lab observational session, and why?

Final takeaway

  • High-quality relationship science relies on a thoughtful combination of data sources, rigorous reliability and validity checks, and strict ethical conduct
  • Use transparent reporting of methods, acknowledge limitations, and integrate multiple data types to form robust conclusions
  • Being a critical consumer of research includes scrutinizing measurement quality, data handling, and the broader implications for individuals and communities

Additional note from the instructor's example

  • There is an ongoing study in the department on attachments and adult brain responses to images; students were invited to consider this study as an example of ethical research and data interpretation in practice