Comprehensive Notes on Research Methods in Relationship Science

Data collection in Relationship Science

Focus on observable behavior and interaction as data sources
Examples of observable data include:
- Facial expressions
- Who is speaking and what they’re saying
- How they’re saying it (tone, prosody)
Observational coding schemes (highly structured coding of behavior) used to quantify interactions
Example study approach described:
- Bring couples into a lab or natural setting
- Prompt them to discuss a typical topic (e.g., a common disagreement)
- Record the interaction for later coding
- Code movement of face, tone of voice, and interaction dynamics
- Analyze how participants respond to each other in real time (e.g., face changes, snapping back, calming the conflict)
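Once an interaction has been coded, "how partners respond to each other" is often summarized by asking how likely one partner's code is to be followed by a particular code from the other partner. The sketch below computes such lag-1 response probabilities. It assumes a hypothetical list of alternating coded turns, and its code labels are placeholders rather than any published coding scheme.

```python
from collections import Counter

# Hypothetical coded discussion: each turn is (speaker, code).
# The code labels are illustrative placeholders, not a published coding scheme.
turns = [
    ("A", "neutral"), ("B", "neutral"),
    ("A", "criticism"), ("B", "defensive"),
    ("A", "criticism"), ("B", "validation"),
    ("A", "neutral"), ("B", "humor"),
    ("A", "criticism"), ("B", "defensive"),
]

def partner_transitions(turns):
    """Count (speaker's code -> partner's next code) pairs, i.e., lag-1 responses."""
    counts = Counter()
    for (spk1, code1), (spk2, code2) in zip(turns, turns[1:]):
        if spk1 != spk2:  # only count turns where the partner is responding
            counts[(code1, code2)] += 1
    return counts

def response_prob(counts, given, then):
    """Estimate P(partner responds with `then` | speaker just showed `given`)."""
    total = sum(n for (g, _), n in counts.items() if g == given)
    return counts[(given, then)] / total if total else 0.0

counts = partner_transitions(turns)
print(response_prob(counts, "criticism", "defensive"))
```

In this toy data, criticism is followed by a defensive reply in 2 of 3 partner responses. Real sequential analyses also compare such probabilities against base rates.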
Observational methods provide a rich, contextual picture but are labor-intensive and costly
Two broad data collection approaches discussed:
- Single-instance observational data
- Repeated observations or longitudinal designs
Example: measure a couple before and after a relationship-education program using the same observational method
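With the same observational measure at both time points, the basic pre/post comparison is a per-couple change score. The sketch below uses made-up coded counts (e.g., hostile behaviors in one discussion) for five hypothetical couples. A change score by itself does not show that the program caused the change; that requires a comparison group.

```python
from statistics import mean, stdev

# Hypothetical coded scores for the same five couples, before and after the program
# (e.g., number of hostile behaviors coded in one 10-minute discussion).
pre = [12, 9, 15, 7, 11]
post = [8, 9, 10, 6, 7]

# Paired design: compute change within each couple, then summarize.
changes = [after - before for before, after in zip(pre, post)]
print(changes)          # per-couple change (negative = fewer hostile behaviors)
print(mean(changes))    # average change across couples
print(stdev(changes))   # how much change varies across couples
```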
Contrast with experience sampling (ESM):
- Participants carry a device that records short verbal samples when prompted
- Records snippets of lived experiences throughout the day, then data are coded
The purpose of ESM is to capture data that fit the research question (e.g., college students’ conversations after a big football game)
Preschool observational work described:
- Some early pilot studies watched many children for short periods (e.g., 10 seconds per child), collecting thousands of data points to measure sociability
Observational data are typically collected in naturalistic settings in short bursts, but they can also be collected in lab settings
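The thousands of brief observations can be turned into one sociability score per child. One simple approach is to use the proportion of brief scans in which the child was interacting with a peer. The sketch below assumes hypothetical records of the form (child_id, was_interacting). It is not the pilot studies' actual scoring rule.

```python
from collections import defaultdict

# Hypothetical scan-sampling records: (child_id, was_interacting_with_peer).
# Each record stands for one brief (~10-second) observation of one child.
observations = [
    ("c1", True), ("c2", False), ("c3", True),
    ("c1", True), ("c2", False), ("c3", False),
    ("c1", False), ("c2", True), ("c3", True),
    ("c1", True), ("c2", False), ("c3", True),
]

def sociability_scores(observations):
    """Proportion of scans in which each child was interacting with a peer."""
    social = defaultdict(int)
    total = defaultdict(int)
    for child, interacting in observations:
        total[child] += 1
        social[child] += interacting  # True counts as 1, False as 0
    return {child: social[child] / total[child] for child in total}

scores = sociability_scores(observations)
print(scores)
```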
Common questions about observational data:
- Is a single observation enough, or should we collect multiple sessions?
- How does study design (lab vs natural setting) influence findings?

Potential downsides and limitations of observational data

Reactivity effect (Hawthorne effect): participants may behave differently because they know they’re being observed
Narrow window: a single observation may not generalize to typical behavior
Limited access to internal states: data are restricted to observable behavior; private thoughts and feelings can only be inferred indirectly
High cost and effort: training, reliability checks, and coding time are substantial
Reliability challenges:
- Raters need consistent coding across observers
- Requires rigorous training and calibration to reach acceptable reliability
- Time and cost escalate with larger samples
Mitigation strategies:
- Mixed methods: combine observational data with self-report data
- Use standardized coding schemes and training materials
- Use multiple observational instances to increase reliability
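Averaging across observation sessions raises reliability, and one standard way to quantify this is the Spearman-Brown prophecy formula. The notes do not name it; it is brought in here as an illustration. The formula gives the predicted reliability of the average of $k$ sessions as $\frac{k\rho}{1 + (k-1)\rho}$, where $\rho$ is single-session reliability. It assumes the sessions are parallel measurements of the same behavior, which real sessions only approximate.

```python
def spearman_brown(single_reliability, k):
    """Predicted reliability of the average of k parallel observation sessions."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)

# Hypothetical single-session reliability of 0.5
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.5, k), 3))
```

The gains shrink as $k$ grows, which is one reason researchers weigh extra sessions against the coding cost noted above.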
Technological considerations:
- Video/audio recording equipment, secure storage, and data management
- Synchronization of multiple data streams (e.g., video plus audio transcripts)
Ethical considerations in observational studies discussed within Belmont Report framework (see Ethics section)

Experience Sampling (ESM) in relationship science

ESM samples lived experiences across time rather than relying on a single observation
Typical setup:
- Portable recording device (e.g., handheld recorder or smartphone app)
- Automatic prompts or user-initiated recordings
- In some setups, recording is triggered when the participant starts talking and stops automatically afterward
Advantages:
- Captures variability across moods, contexts, conflicts, and interactions
- Reduces reliance on retrospective reporting
Limitations:
- Participant burden and potential nonresponse bias
- Data are momentary and require careful interpretation
Examples:
- After a significant social event (e.g., football game), what topics arise and how do partners interact?
- Across a school day, what interactions occur with peers or partners?
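A common way to implement "automatic prompts" is to draw random prompt times within a daily window. Participant burden and nonresponse can then be tracked as a compliance rate (prompts answered / prompts sent). The sketch below is illustrative only. The 9:00-21:00 window, six prompts per day, and function names are hypothetical assumptions, not a particular ESM app's settings.

```python
import random
from datetime import datetime, timedelta

def random_prompt_times(day_start, day_end, n_prompts, seed=None):
    """Draw n_prompts random times between day_start and day_end, in order."""
    rng = random.Random(seed)
    window = (day_end - day_start).total_seconds()
    offsets = sorted(rng.uniform(0, window) for _ in range(n_prompts))
    return [day_start + timedelta(seconds=s) for s in offsets]

def compliance_rate(n_prompts, n_responded):
    """Share of prompts that produced usable data; low values flag nonresponse."""
    return n_responded / n_prompts if n_prompts else 0.0

# Hypothetical sampling window for one day
start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 21, 0)
for t in random_prompt_times(start, end, n_prompts=6, seed=42):
    print(t.strftime("%H:%M"))
print(compliance_rate(6, 4))
```

Random timing keeps participants from anticipating prompts. A low compliance rate is a warning sign that the captured moments may not represent the whole day.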

Other data collection modalities

Self-report data (surveys, questionnaires): prompts participants to report thoughts, feelings, behaviors
Physiological data: body-based indicators of relationship processes
- Heart rate, respiration, cortisol (stress hormone), skin conductance (sweat)
- Brain activity (e.g., EEG or other neural measures) in some studies
- Hormone levels can be measured non-invasively (e.g., saliva cortisol)
Archival and secondary data: using existing data sources
- Public records (marriage/divorce decrees, birth records)
- Health records, governmental datasets
- Online data (social media posts, forums, Reddit) to infer relationship processes
Example of archival/social-media data use:
- Analyzing why people call off marriages by coding Reddit posts without directly asking participants
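Coding archival text means mapping each post onto categories. Studies like the one described typically rely on trained human coders or validated dictionaries. The sketch below shows only the basic idea with a keyword codebook. The categories, keywords, and posts are hypothetical, and plain substring matching will miss paraphrases and can produce false hits.

```python
# Hypothetical codebook: category -> keywords (illustrative, not a validated dictionary).
CODEBOOK = {
    "infidelity": ["cheated", "affair"],
    "finances": ["money", "debt"],
    "family_pressure": ["parents", "in-laws"],
}

def code_post(text, codebook=CODEBOOK):
    """Return the set of categories whose keywords appear in the post text."""
    lowered = text.lower()
    return {category for category, keywords in codebook.items()
            if any(keyword in lowered for keyword in keywords)}

# Made-up example posts
posts = [
    "We called it off after I found out he cheated.",
    "Her parents hated me and the debt was piling up.",
    "Just realized we wanted different things.",
]
for post in posts:
    print(sorted(code_post(post)))
```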

Key distinctions among data types

Natural environments vs lab settings: data can be collected in either; ecological validity often higher in natural settings
Single-instance vs repeated measures: longitudinal data allow observing change over time
Depth vs breadth: observational coding yields rich, detailed data; self-report can cover broader constructs
Inter-method fit: researchers often combine data types (mixed methods) to address limitations of any single approach

Reliability in relationship science data

Reliability refers to consistency of measurement
Three forms highlighted:
- Internal reliability (internal consistency): consistency of responses within a measure
- External reliability (test-retest reliability): stability of scores across time or conditions
- Inter-rater reliability: consistency between different coders or raters

Internal reliability

Focused on how consistently multiple items measure the same construct
Example: a depression scale with 10 items
Internal consistency means respondents answer related items in a coherent way
Common statistic: Cronbach’s alpha, $\alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N} \sigma_i^2}{\sigma_T^2}\right)$
- $N$ = number of items
- $\sigma_i^2$ = variance of item $i$
- $\sigma_T^2$ = variance of the total score
High internal reliability suggests items cohere well; not sufficient alone for validity
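The alpha formula can be computed directly from a participants-by-items table. The sketch below follows the formula term by term. The responses are made-up scores for four hypothetical participants on a three-item scale. Population variance is used for both $\sigma_i^2$ and $\sigma_T^2$; any consistent choice of variance gives the same ratio.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one row per participant, each row holding N item scores."""
    n_items = len(responses[0])
    # sigma_i^2: variance of each item across participants
    item_vars = [pvariance([row[i] for row in responses]) for i in range(n_items)]
    # sigma_T^2: variance of participants' total scores
    total_var = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item scale answered by 4 participants
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1]]
print(round(cronbach_alpha(responses), 3))
```

Here participants who score high on one item also score high on the others, so alpha comes out close to 1.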

External reliability (test-retest)

Measures stability of scores over time
Approach: administer the same measure to the same participants at two or more time points
Ideal outcome: scores remain reasonably stable when the underlying construct is stable
Note: some constructs (e.g., mood) may legitimately change over time; interpretation depends on construct
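Test-retest reliability is commonly summarized as the correlation between scores from the two administrations. The sketch below implements the correlation formula $r = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$ directly. The time-1 and time-2 scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """r = cov(X, Y) / (sigma_X * sigma_Y), using population moments throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Hypothetical scores for the same five participants at two time points
time1 = [30, 25, 40, 35, 20]
time2 = [32, 24, 38, 36, 22]
print(round(pearson_r(time1, time2), 3))
```

A high $r$ means participants keep roughly the same rank order across administrations. As noted above, whether that stability is expected depends on the construct.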

Inter-rater reliability

Important for observational and coding data
Gauges whether different raters code the same behavior similarly
Common statistics:
- Cohen’s kappa: $\kappa = \frac{P_o - P_e}{1 - P_e}$, where $P_o$ is observed agreement and $P_e$ is agreement expected by chance
- Intraclass correlation (ICC) for continuous ratings
Achieving high inter-rater reliability requires training, calibration, and clear coding manuals
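The kappa formula can be computed from two raters' codes for the same observations. $P_e$ comes from each rater's marginal code frequencies. The sketch below assumes each observation receives exactly one categorical code. The two rating lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning one categorical code per observation."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must code the same non-empty set of observations")
    n = len(rater1)
    # P_o: proportion of observations on which the raters chose the same code
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # P_e: agreement expected by chance, from each rater's code frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[code] / n) * (c2[code] / n) for code in set(c1) | set(c2))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes from two trained raters for the same 10 moments
r1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]
r2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]
print(round(cohens_kappa(r1, r2), 3))
```

The raters agree on 80% of moments, but both use "pos" 60% of the time, so chance agreement is already 52%. Kappa credits only the agreement beyond that.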

Validity in relationship science data

Validity concerns whether a measure accurately captures what it is intended to measure
Three forms highlighted:
- Convergent validity: different measures of the same construct yield similar results
- Divergent (discriminant) validity: measures of different constructs do not highly correlate
- Face validity: the measure appears to assess the intended construct at face value

Convergent validity

If two separate depression scales yield high correlation for the same individuals, they demonstrate convergent validity
Conceptual idea: different methods converge on the same underlying construct

Divergent validity

Measures of related but distinct constructs should not correlate highly
Example: depression vs. relationship satisfaction should not perfectly track together, though some overlap may exist

Face validity

The items or tasks should seem appropriate for the construct being measured
Poor face validity example: measuring depression with only questions about eating habits
Face validity matters for participant understanding and engagement, but it is not sufficient for overall validity

Reliability vs validity relationship

Reliability is a prerequisite for validity: a measure that is not reliable cannot be valid
A measure can be reliable but not valid (consistently wrong)
Example visuals (described conceptually):
- Highly reliable but far off target (consistent but biased)
- Centered on the target on average but inconsistent (accurate on average, yet individual scores are too noisy to trust)
- Neither reliable nor valid (scattered and off-target)
Practical takeaway: ensure both reliability and validity when interpreting data
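The target visuals can be mimicked with a small simulation that treats each score as a true value plus bias plus random noise. Bias shows up as mean error (a validity problem); noise shows up as spread (a reliability problem). The true score of 50 and the bias/noise values below are hypothetical.

```python
import random
from statistics import mean, stdev

TRUE_SCORE = 50  # hypothetical true level of the construct

def simulate(bias, noise, n=1000, seed=0):
    """Observed score = true score + systematic bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_SCORE + bias + rng.gauss(0, noise) for _ in range(n)]

patterns = {
    "reliable and valid": (0, 1),
    "reliable, not valid": (15, 1),
    "on target on average, unreliable": (0, 15),
    "neither": (15, 15),
}
for label, (bias, noise) in patterns.items():
    scores = simulate(bias, noise)
    print(f"{label:35s} mean error={mean(scores) - TRUE_SCORE:6.2f}  spread={stdev(scores):5.2f}")
```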

Ethics and responsible research with human participants

Research with humans requires ethical consideration and oversight
Belmont Report (1979): foundational document outlining ethical principles for human subjects research
- Respect for persons: acknowledge autonomy; protect those with diminished autonomy; informed consent
- Beneficence: do not harm; maximize benefits; minimize risks
- Justice: fairness in distribution of research burdens and benefits
Institutional Review Board (IRB): independent group that evaluates proposed research to protect participants
- If a study poses risk or ethical concerns, IRB can require modifications or reject the study
Historical caution: Tuskegee syphilis study highlighted the dangers of unethical research and the need for IRB oversight
Informed consent: written and/or verbal documentation detailing a study’s purpose, procedures, risks, benefits, duration, compensation, data handling, and right to withdraw
Participant rights: voluntary participation, right to withdraw at any time without penalty, ability to ask questions and receive answers
Ethical conduct in practice:
- Be transparent about study goals and potential harm
- Ensure fair treatment and respect for participants
- Avoid coercion and undue influence
- Protect privacy and data confidentiality
- Provide debriefing and resources if discussing sensitive topics

Data quality, limitations, and responsible interpretation

Data quality matters: poor data lead to incorrect conclusions
Data cleaning: process of identifying and correcting or removing noisy or erroneous data
- Examples: misreporting (e.g., reporting three children but listing only two), duplicated records, missing values
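The three problems listed can be screened for with simple checks. The sketch below assumes hypothetical survey records with made-up field names (`id`, `n_children`, `child_ages`). It keeps the first copy of duplicated records and sets inconsistent or missing entries aside for review rather than silently "correcting" them. Decisions about flagged records should be documented.

```python
# Hypothetical survey records; field names are illustrative only.
records = [
    {"id": 1, "n_children": 3, "child_ages": [4, 7]},   # count and list disagree
    {"id": 2, "n_children": 2, "child_ages": [1, 5]},
    {"id": 2, "n_children": 2, "child_ages": [1, 5]},   # duplicate of id 2
    {"id": 3, "n_children": None, "child_ages": []},    # missing value
]

def clean(records):
    """Drop duplicate ids; flag missing or inconsistent records for review."""
    seen, kept, flagged = set(), [], []
    for rec in records:
        if rec["id"] in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(rec["id"])
        if rec["n_children"] is None:
            flagged.append((rec["id"], "missing n_children"))
        elif rec["n_children"] != len(rec["child_ages"]):
            flagged.append((rec["id"], "count does not match listed children"))
        else:
            kept.append(rec)
    return kept, flagged

kept, flagged = clean(records)
print([rec["id"] for rec in kept])
print(flagged)
```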
Limitations and transparency:
- All studies have limitations; good papers acknowledge them and outline their impact
- Be cautious about overgeneralizing from single studies or small samples
Being a good consumer of research:
- Check measurement definitions (what was measured and how)
- Assess reliability and validity evidence
- Look for explicit limitations and potential biases
- Consider the appropriateness of the data type for the research question
- Examine ethical considerations (IRB approval, consent, data handling)

Mixed methods and real-world applications

Mixed methods combine observational, self-report, physiological, and archival data to provide a fuller picture
The choice of method depends on the research question and practical constraints
Real-world example from the module:
- An assignment where students discuss whether a self-report survey on satisfaction is appropriate for a longitudinal study
- Emphasis on thinking through the data type, reliability, and interpretation
Practical research example discussed in class:
- Studying toddler-satisfaction or partner-report correlations; identifying X and Y variables and their directional relationship
- Using correlation concepts to interpret whether higher scores on one measure align with higher scores on another
Broader relevance:
- These methods inform relationship education programs, clinical practice, and understanding of relationship dynamics in everyday life

Quick glossary and key terms (with notes)

Experience Sampling (ESM): real-time data collection through momentary reports
Internal reliability: consistency of items within a measure; often quantified by Cronbach’s alpha $\alpha$
External reliability: stability of scores over time (test-retest)
Inter-rater reliability: agreement among coders or raters (e.g., Cohen’s kappa $\kappa$, ICC)
Convergent validity: different measures of the same construct yield similar results
Divergent validity: measures of different constructs do not correlate strongly
Face validity: items appear to measure the intended construct at face value
Belmont Report: ethical principles for human subjects research (Respect for Persons, Beneficence, Justice)
IRB: institutional review board that reviews research proposals to protect participants
Hawthorne effect: behavioral changes due to awareness of being observed
Data cleaning: process of correcting or removing inaccurate data
Correlation: statistical association between two variables; often summarized by $r = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$
Causality and longitudinal design: longitudinal data establish temporal order, which supports but does not by itself prove causal inference; causal claims require careful analysis
Archival data: using existing records or datasets rather than collecting new data
Mixed methods: integrating qualitative and quantitative approaches to address research questions

Example prompts and reflection (to practice exam-style thinking)

If you observe a high inter-rater reliability (ICC or Cohen’s kappa), what does that imply about your observational coding scheme?
How would you determine whether a depression scale used in a relationship study has adequate convergent validity with another established depression measure?
What ethical steps would you take to study a topic involving intimate partner violence or high-stress conflict, and how would you communicate potential risks to participants?
Given a short-term cortisol measure during a conflict task, how would you interpret elevated cortisol with respect to the participant’s subjective report of stress?
When would experience sampling be preferred over a single in-lab observational session, and why?

Final takeaway

High-quality relationship science relies on a thoughtful combination of data sources, rigorous reliability and validity checks, and strict ethical conduct
Use transparent reporting of methods, acknowledge limitations, and integrate multiple data types to form robust conclusions
Being a critical consumer of research includes scrutinizing measurement quality, data handling, and the broader implications for individuals and communities

Additional note from the instructor's example

There is an ongoing study in the department on attachment and adults’ brain responses to images; students were invited to consider it as an example of ethical research and data interpretation in practice