What is validity?
Assesses whether the scale is measuring what it is supposed to be measuring.
Example: Are the items on a self-esteem scale actually measuring self-esteem?
What is content validity?
The test adequately covers the content relevant to the construct.
Assessed through expert ratings of test content.
Example: In educational testing, does the test cover all 5 chapters or just 2?
Considered "logical" rather than statistical.
This means that you can't run a statistical test to establish content validity; you rely on expert ratings instead.
What are the threats to content validity?
Two principles of content validity, each with a corresponding threat:
Principle 1: The test should not include content irrelevant to the construct.
Threat: "Construct-Irrelevant Content" (the test includes content irrelevant to the construct).
Principle 2: The test should include content representing the full range of the construct.
Threat: "Construct Under-representation" (the test fails to include content representing the full range of the construct).
What is the difference between content validity and face validity?
Face Validity:
Related to content validity.
The degree to which a measure appears to be related to a specific construct, as judged by non-experts (e.g., test takers).
Not considered a “real” measure of validity because it doesn’t provide evidence to support conclusions.
Considered “logical” rather than statistical.
This means that you can't run a statistical test to assess face validity.
What is criterion-related validity, and what are its types?
Tells us how well a test corresponds to an established criterion.
Assessed using a correlation.
Administer your test alongside a gold-standard test from the field, then correlate the two sets of scores (see the sketch at the end of this card).
Predictive Validity:
"Forecasting function" → longitudinal.
Correlation between a predictor and a criterion.
Predictor variable is measured before criterion variable
Example: college entrance test (taken in high school) predicting college GPA.
Concurrent Validity:
Simultaneously administered → cross-sectional.
Correlation between two variables at the same time point
Example: a college entrance test (taken in high school) correlating with high school GPA, both measured at the same time.
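A minimal sketch of this correlation in Python (using SciPy; the scores below are hypothetical):

```python
from scipy.stats import pearsonr

# Hypothetical scores: a new test and an established gold-standard
# criterion, both measured on the same ten participants.
new_test = [12, 15, 9, 22, 18, 14, 20, 11, 17, 16]
criterion = [48, 55, 40, 70, 62, 50, 68, 45, 60, 58]

r, p_value = pearsonr(new_test, criterion)
print(f"criterion-related validity: r = {r:.2f}, p = {p_value:.3f}")
```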
What is predictive validity?
Predictive Validity:
Type of criterion-related validity
"Forecasting function" → longitudinal.
Correlation between a predictor and a criterion.
Predictor variable is measured before criterion variable
Example: college entrance test (taken in high school) predicting college GPA.
What is concurrent validity?
Concurrent Validity:
Type of criterion-related validity
Simultaneously administered → cross-sectional.
Correlation between two variables at the same time point
Example: a college entrance test (taken in high school) correlating with high school GPA, both measured at the same time.
What is construct-related validity, and what are its types?
Construct-Related Validity: Assesses whether a test measures the theoretical construct it is intended to measure.
Convergent Evidence:
Similar to criterion-related validity.
Assesses correlation between two measures that are theorized to be related.
Can be positively or negatively correlated.
Example: A new anxiety scale correlating with an existing measure of anxiety.
Divergent/Discriminant Evidence:
Assesses correlation between two measures that are theorized to be unrelated.
Does not mean negative correlation; it means no meaningful relationship.
Example: A test of mathematical ability should not correlate with a measure of extroversion.
What is convergent evidence?
Convergent Evidence:
Type of construct-related validity
Similar to criterion-related validity.
Assesses correlation between two measures that are theorized to be related.
Can be positively or negatively correlated.
Example: A new anxiety scale correlating with an existing measure of anxiety.
What is divergent/discriminant evidence?
Divergent/Discriminant Evidence:
Type of construct-related validity
Assesses correlation between two measures that are theorized to be unrelated.
Does not mean negative correlation; it means no meaningful relationship.
Example: A test of mathematical ability should not correlate with a measure of extroversion.
What are multi-method approaches for construct validity?
Multi-method approaches integrate different measurement methods to strengthen construct validity.
Multiple Informants:
Different people report on the same individual.
Example: For young children, researchers may use parent and teacher reports.
Example: For adolescents or adults, peer evaluations can supplement self-reports.
Multiple Methods:
Different types of data collection.
Example: A researcher observing social interactions to validate a self-report on shyness.
Example: Measuring heart rate as a physiological indicator of anxiety.
What is a validity coefficient, and how is it interpreted?
A validity coefficient is a correlation coefficient that indicates how well a test predicts a criterion.
Acceptable coefficient: r = .30 or higher.
Percentage of variation explained:
Squared value of the correlation coefficient (r²).
Example: If r = .40, then 16% of the variation in the criterion can be explained by the test.
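A minimal sketch of this interpretation in Python (the .30 convention and the r = .40 example come from this card; the helper function is hypothetical):

```python
def evaluate_validity_coefficient(r: float) -> None:
    """Interpret a validity coefficient using the conventions above."""
    r_squared = r ** 2
    verdict = "acceptable" if r >= 0.30 else "below the .30 convention"
    print(f"r = {r:.2f} ({verdict})")
    print(f"r² = {r_squared:.2f}: {r_squared:.0%} of criterion variance explained")

evaluate_validity_coefficient(0.40)  # r² = 0.16, i.e. 16% of the variation
```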
What are some considerations for evaluating validity coefficients?
Sample composition – Ensure the measure has been validated in the population/sample you are testing.
Example: Using an anxiety scale validated for 7-8-year-olds in a study on 3-5-year-olds may not be appropriate.
Sample size – Smaller samples may inflate correlation coefficients.
Restricted ranges – A limited range in predictor or outcome variables can lower the validity coefficient.
Generalizability – Consider whether the original validation study results apply to your sample.
Example: A measure validated with parent reports may not generalize if you use teacher reports.
Differential predictions – The test's predictive power may vary across different groups or conditions.
What is the difference between reliability and validity?
Reliability = Consistency (Does the test produce stable and consistent results?)
Validity = Accuracy (Does the test measure what it is supposed to measure?)
A test can be reliable but not valid (e.g., a broken scale consistently gives the wrong weight).
A test cannot be valid without being reliable (if a test is not consistent, it cannot be accurate).
What is behavioral observation?
Moves beyond questionnaire-based data.
Involves observing a participant or group of participants directly in real-world settings or controlled environments.
What are the contexts of behavioral coding?
Laboratory or Home:
Lab: Controlled environment, often with video recording.
Issue: Participants may act differently because they know they are being recorded.
Naturalistic:
Uncontrolled: Observing in settings like classrooms, hospitals, parks.
Live, in vivo coding: More challenging as it requires multiple researchers and real-time observation without playback.
Video recorded: Using tools like GoPro for unobtrusive observation.
Why use behavioral observation? What are its pros?
Ecological validity: Observing behavior in natural settings increases relevance to real-world situations.
Assess construct in young children: Some children may be too young to understand or respond to questionnaires.
Limit self-report bias: Reduces reliance on self-reports, which can be biased or inaccurate.
Multi-method approach: Can contribute to establishing construct validity when combined with other methods.
Why might you NOT use behavioral observation? What are its cons?
Expensive/resource intensive: Requires equipment, multiple observers, and potentially a large setup.
Time intensive: Observing and coding behavior takes significant time and effort.
What is frequency coding in behavioural observation?
Frequency Coding: Measures how often a certain behaviour occurs during an observation period.
Example: How many times does a parent praise their child during a three-minute play session?
Coded Behaviour: 16 instances of praise observed.
16 instances ÷ 3 minutes = 5.33 instances per minute.
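The rate calculation above as a minimal Python sketch (values taken from the praise example):

```python
# Frequency coding: convert a raw count into a rate per minute.
instances = 16                      # coded instances of praise
minutes = 3                         # length of the play session
rate = instances / minutes
print(f"{rate:.2f} instances per minute")  # 5.33
```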
What is duration coding in behavioural observation?
Duration Coding: Measures how long a certain behaviour occurs during an observation period.
Example:
How long did a participant smile during a one-minute social interaction?
Coded Behaviour: 43 seconds of smiling.
Proportion: 43 seconds ÷ 60 seconds = 71.67%.
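The proportion calculation above as a minimal Python sketch (values taken from the smiling example):

```python
# Duration coding: express total duration as a proportion of the period.
smiling_seconds = 43
period_seconds = 60
proportion = smiling_seconds / period_seconds
print(f"{proportion:.2%} of the interaction spent smiling")  # 71.67%
```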
What is interval coding in behavioural observation?
Interval Coding: Measures whether a behaviour occurs during predetermined intervals.
Example: A researcher divides a one-minute task into six 10-second epochs.
Coded Behaviour:
Dichotomous: Is the behaviour present or absent?
Continuous: Is the behaviour occurring at low, medium, or high levels?
Integrates intensity: measures the strength or frequency of the behaviour.
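A minimal sketch of dichotomous interval coding in Python, assuming hypothetical event timestamps and the six 10-second epochs from the example:

```python
# Split a 60-second task into six 10-second epochs and code each epoch
# dichotomously: 1 if the behaviour occurred in that epoch, else 0.
event_times = [3.2, 14.8, 17.1, 41.0]    # hypothetical observation times (s)
epoch_length = 10
n_epochs = 6

codes = [0] * n_epochs
for t in event_times:
    epoch = min(int(t // epoch_length), n_epochs - 1)
    codes[epoch] = 1                      # behaviour present in this epoch

print(codes)  # [1, 1, 0, 0, 1, 0]
```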
What is global coding in behavioural observation?
Global Coding: Provides an overall impression of the behaviour across the entire observation period, rather than focusing on specific intervals or epochs.
How do you develop a coding scheme for behavioral observation?
What do you want to measure?
Example: Number of times a student asks for help during a 50-minute tutorial.
How do you define the behavior?
Example: What counts as "help"? Does it include raising a hand, asking the TA, or asking peers? Clear guidelines are necessary.
For abstract behaviors (e.g., shyness):
Theoretically, what behaviors capture shyness?
Verbal hesitancy
Gaze aversion
Body orientation
What is inter-rater reliability?
Inter-Rater Reliability: Ensures that different raters (or coders) interpret behaviors in the same way.
Blinded raters need to overlap on about 15% of the cases to establish reliability.
Example:
You have 200 videos to code and two coders.
Each coder should code a subset of 30 overlapping videos to assess agreement.
How is inter-rater reliability established using the Kappa statistic?
Kappa Statistic: Assesses the level of agreement among raters.
Range:
1 = Perfect agreement
0 = Agreement no better than expected by chance
-1 = Less agreement than would be expected by chance
Kappa Value:
Higher than .70: Excellent agreement
.40 to .70: Fair to Good (Acceptable)
Less than .40: Poor agreement
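A minimal sketch of computing Cohen's kappa with scikit-learn (the two raters' dichotomous codes below are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Codes from two blinded raters on the same 12 overlapping cases
# (1 = behaviour present, 0 = absent).
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # ≈ .66: fair to good (acceptable)
```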
What should you do if you have a low Kappa statistic?
Clarify your coding scheme to ensure consistency and reduce ambiguity.
Train coders more extensively to improve their understanding and consistency in applying the coding criteria.
What is internal consistency, and how is it used in behavioral coding?
Internal Consistency: Measures the consistency of items within a test or coding scheme.
Can be used for interval coding, where epochs are treated as "items."
Cronbach's alpha is used, just like in questionnaires, to assess how well the items (or epochs) correlate with each other.
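A minimal sketch of Cronbach's alpha over epochs, computed directly from the standard formula (the coding matrix below is hypothetical):

```python
import numpy as np

# Rows = participants, columns = epochs treated as "items"
# (e.g., continuous interval codes: 1 = low, 2 = medium, 3 = high).
codes = np.array([
    [2, 3, 2, 3, 2, 3],
    [1, 1, 2, 1, 1, 2],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 1],
    [2, 2, 3, 2, 2, 2],
])

k = codes.shape[1]                          # number of epochs ("items")
item_vars = codes.var(axis=0, ddof=1)       # variance of each epoch
total_var = codes.sum(axis=1).var(ddof=1)   # variance of participants' totals
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```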
Can we establish convergent validity for these measures?
Yes, because the correlation coefficient is greater than .30, the conventional cutoff for an acceptable validity coefficient.
Are these coders in agreement?
Yes, because kappa is above .40 (fair-to-good agreement) for each pair of coders.