Lec 7 - Measurement Validity and Reliability


73 Terms

1
New cards

Psychometrics

  • Also called clinimetrics

  • Concerned with the development, construction, and validation of measurement tools

  • Determines whether a tool possesses useful and accurate measurement properties

2
New cards

Reliability

  • Also known as reproducibility, repeatability, or dependability

  • Extent to which a measure produces consistent results, free from error, between repeated measurements, assuming the underlying condition has not changed.

3
New cards

Measurement error

  • The difference between the true value and the observed value

  • Error = variation without true change

  • Inconsistency is always expected as no measurement is perfectly reliable

4
New cards

Regression toward the mean

  • Closely linked with reliability

  • Phenomenon in which extreme scores on a pretest are expected to move closer to the mean on the posttest

  • Most likely seen when a less reliable measure is used

5
New cards

Minimal Detectable Change

  • The amount of change in a variable that must be achieved before we can be confident that error does not account for the entire measured difference

  • The lower the reliability of a measure, the higher its minimal detectable change
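
The card gives no formula, so the sketch below assumes one common convention: MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − r). The function names, the SD of 10, and the reliability values are illustrative only. It shows that a less reliable measure needs a larger change before error can be ruled out.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement (assumed formula: SD * sqrt(1 - r))."""
    return sd * math.sqrt(1.0 - reliability)

def mdc95(sd: float, reliability: float) -> float:
    """Minimal detectable change at 95% confidence (assumed convention)."""
    return 1.96 * math.sqrt(2.0) * sem(sd, reliability)

# Hypothetical values: same spread of scores, two different reliabilities.
mdc_reliable = mdc95(sd=10.0, reliability=0.90)
mdc_unreliable = mdc95(sd=10.0, reliability=0.60)

print(round(mdc_reliable, 2), round(mdc_unreliable, 2))
```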

6
New cards

Systematic Error

  • Constant and predictable error

  • Occurs consistently in one direction

  • Does not affect reliability, but may affect validity since it biases the results

7
New cards

Random error

  • Unpredictable and due to chance

  • Caused by:

    • Instrument precision issues

    • Fatigue or inconsistency of the tester

    • Participant issues

    • Environmental fluctuations

  • Leads to both overestimation and underestimation

  • To reduce random error, record multiple trials and take the average to allow the positive and negative errors to cancel out

  • Reliability focuses on the degree of random error in measurement.
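
A minimal sketch of why averaging trials helps, using made-up numbers. The true value and the five trial scores are hypothetical, and the errors were chosen to cancel exactly; with real data they would cancel only approximately.

```python
from statistics import mean

TRUE_VALUE = 50.0  # hypothetical true score (unknown in practice)

# Hypothetical trials with random errors of +2, -3, +1, -1, +1.
trials = [52.0, 47.0, 51.0, 49.0, 51.0]

# Error of each single trial versus error of the averaged score.
single_trial_errors = [t - TRUE_VALUE for t in trials]
worst_single_error = max(abs(e) for e in single_trial_errors)
error_of_mean = mean(trials) - TRUE_VALUE

print(worst_single_error)  # 3.0
print(error_of_mean)       # 0.0
```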

8
New cards

Classical Theory

  • Theories

  • Observed scores = True score + Random Error, (X = T + E)

  • Treats all errors as random

9
New cards

Generalizability Theory

  • Theory

  • Identifies the specific sources of error

  • Why inconsistencies occur and how to improve by adjusting testing conditions.

10
New cards

True Score Model

  • Both classical and generalizability theory fall under this

  • Assumes that every observed score is composed of the true score and error components.

  • A “true” value exists behind every observed measurement

11
New cards

Hawthorne Effect

  • People enhance their performance because they know that they are being tested

12
New cards

Test-Retest Reliability

  • Assesses the stability of an instrument across repeated measures on at least two different occasions

    • Stability: the ability to obtain the same results over repeated administrations, assuming no change in the variable

  • Used in experiments where raters are minimally involved such as self-reported questionnaires

  • Testing and carryover effects usually manifest as systematic errors, creating unidirectional changes

13
New cards

Test-Retest Interval

  • Interval must be close enough to avoid genuine changes

  • Interval must be far enough to avoid learning, fatigue, and memory effects

  • Dependent on the variables being tested

14
New cards

Carryover Effect

  • Initial trial can encourage practice or learning that can alter or enhance the subsequent trials

  • Pre-test trials can be done to neutralize this possibility

15
New cards

Testing Effects

  • The test or procedure is responsible for changes in variables

16
New cards

Intra-rater Reliability

  • Stability of the data obtained by one rater across two or more trials performed in a single occasion

  • Carryover effect is not typically an issue; trials follow each other immediately

  • Rater bias may arise due to differences in rater characteristics

    • Raters can do blind scoring so that they will not see the scores during initial trials

    • Creating objective rating criteria is integral to reducing rater bias

17
New cards

Inter-rater Reliability

  • Variation in measurement between two or more raters who measure the same group of subjects at least once

  • Ideally, raters score simultaneously; if not, video or other recordings are used

  • Raters must be independent and must not influence one another in scoring, to avoid bias

  • Intra-rater reliability shall be established first

  • Bias may arise due to differences in rater characteristics

18
New cards

Internal Consistency

  • Used for instruments that have a set of questions intended to measure various aspects of a knowledge or construct

  • Degree of homogeneity of test items within an instrument

  • Focuses on the items’ consistency with one another

  • Assessed using Cronbach’s coefficient alpha (α)

19
New cards

Alternate Forms

  • Uses alternate versions of the same tool

  • Common in standardized educational testing

  • Utilized when two equivalent versions of a test or tool are needed

20
New cards

Split-half Reliability

  • Assesses the correlation between each subject’s scores on the two halves of a test

  • The test is split into two sets of items that are redundant or parallel to each other, and the results of the two sets are compared to assess reliability

21
New cards

Variance

  • Measure of variability among scores in a sample

    • Dispersed sample scores = larger variance

    • Similar sample scores = smaller variance

22
New cards

True Score Variance

  • Primary source of variance

  • Caused by individual differences in behavior

  • When we measure, there is a true score

23
New cards

Error Variance

  • Primary source of variance

  • Caused by different sources of measurement errors

24
New cards

Estimate of Variance


Reliability = true score variance (T) / [true score variance (T) + error variance (E)]

  • Used when measuring reliability

  • The ratio of true score variance to the total variance

  • Higher error variance, lower reliability
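
The ratio above can be sketched in a few lines. The function name and the variance values are hypothetical and chosen only to show that more error variance lowers the coefficient.

```python
def reliability_ratio(true_var: float, error_var: float) -> float:
    """Reliability as true score variance over total variance, T / (T + E)."""
    total_var = true_var + error_var
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    return true_var / total_var

# Hypothetical variances: same true score variance, increasing error variance.
print(reliability_ratio(8.0, 2.0))  # 0.8
print(reliability_ratio(8.0, 8.0))  # 0.5
```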

25
New cards

Reliability Coefficients

  • Estimates of reliability vary depending on the type of reliability being analyzed

  • Agreement vs Consistency

  • It is recommended to use the reliability coefficient that analyzes both agreement and consistency

  • The type of statistics used will depend on the level of measurement of the variables

26
New cards

Agreement

  • Whether the scores in two sets of data actually match each other

    • Data can be correlated and still not agree with each other

27
New cards

Consistency

  • Based on measurements of correlation

    • Correlation, the degree of association between two sets of data

    • Usually bi-variate, only analyzes 2 sets of data

28
New cards

Relative Reliability

  • Reflect true variance as the proportion of the total variance in a set of scores

  • Unitless

29
New cards

Absolute Reliability

  • Indicate how much of a measure value is likely to be due to error

  • Expressed in original units of your test and measure

30
New cards

Weighted Kappa

  • Used when data is Nominal or Ordinal

  • Represents the average rate of agreement for an entire set of yes or no responses

  • Reliability estimates:

    • <0.60 = poor

    • 0.60 to 0.80 = moderate

    • >0.80 = good
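
As a rough illustration, here is a sketch of the simpler unweighted (Cohen's) kappa for two raters giving yes or no ratings; the weighted version additionally gives partial credit for near misses on ordinal scales and is not shown. Kappa compares observed agreement with the agreement expected by chance. The function name and the ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - chance) / (1 - chance)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of subjects")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    chance = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - chance) / (1.0 - chance)

# Hypothetical ratings for 10 subjects; the raters disagree on two of them.
a = ["yes"] * 5 + ["no"] * 5
b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
kappa = cohen_kappa(a, b)
print(round(kappa, 2))  # 0.6
```

Here the raters agree on 80% of subjects, but half of that agreement would be expected by chance, so kappa falls to 0.6, the boundary between poor and moderate on the scale above.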

31
New cards

Intraclass Correlation Coefficient

  • Used when data is Interval or Ratio

  • Reflect both degree of correspondence and agreement among ratings

  • Used as the reliability coefficient for test-retest and rater reliability

  • Different models are used depending on:

    • Raters

    • Kind of reliability being measured

    • Generalizability of findings

  • Reliability estimates: 

    • <0.50 = poor

    • 0.50 - 0.75 = moderate

    • >0.75 = good
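
The card does not say which model to use, so the sketch below assumes two common single-rater forms from a two-way layout: ICC(2,1) for absolute agreement and ICC(3,1) for consistency. The function name and the scores are hypothetical. The example has one rater scoring every subject exactly 2 points higher, so the ratings are perfectly consistent but do not agree.

```python
def icc_single_rater(scores):
    """Return (ICC(2,1) absolute agreement, ICC(3,1) consistency).

    scores: one row per subject, one column per rater, no missing values.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 raters, no missing scores")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency

# Hypothetical data: rater 2 always scores 2 points higher than rater 1.
scores = [[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
agreement, consistency = icc_single_rater(scores)
print(round(agreement, 3), round(consistency, 3))  # 0.556 1.0
```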

32
New cards

Standard Error of Measurement

  • Absolute Reliability Index of Test-Retest reliability

  • Measures response stability, or the stability of the instrument’s score over time

  • Gives magnitude of measurement error

  • Commonly used when the stability of response is questioned, and for labile constructs

  • Used to form confidence intervals around an observed score

  • Standard Deviation (SD) of measurement errors reflects the reliability of the response

    • ↑ SD = ↓ reliability 

    • ↓ SD = ↑ reliability
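
A minimal sketch of forming a confidence interval around an observed score. It assumes the common formula SEM = SD × √(1 − r) and a 95% interval of observed ± 1.96 × SEM, neither of which is stated on the card; the names and numbers are hypothetical.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM from the sample SD and a reliability coefficient (assumed formula)."""
    return sd * math.sqrt(1.0 - reliability)

def score_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Confidence interval around an observed score; z = 1.96 gives about 95%."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical test: SD of 10 points, reliability of 0.84, observed score of 50.
lower, upper = score_interval(observed=50.0, sd=10.0, reliability=0.84)
print(round(lower, 2), round(upper, 2))  # 42.16 57.84
```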

33
New cards

Cronbach’s Coefficient Alpha

  • Measures internal consistency

  • Can be used if item scores are dichotomous or when there are more than 2 response choices

  • Reliability estimates:

    • <0.50 = Unacceptable 

    • 0.50 - 0.60 = Poor 

    • 0.60 - 0.70 = Questionable

    • 0.70 - 0.80 = Acceptable 

    • 0.80 - 0.90 = Good 

    • >0.90 = Excellent
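
The card gives no formula; the sketch below assumes the standard one, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The function name and the questionnaire responses are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))            # one tuple of scores per item
    totals = [sum(row) for row in responses]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))

# Hypothetical 3-item questionnaire answered by 5 respondents.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # 0.918
```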

34
New cards

Validity

  • Results of research are only useful to the extent that they can be accurately and confidently interpreted

  • Seen at a broader research perspective or from a psychometric perspective

35
New cards

Internal Validity

  • Degree of confidence that the causal relationship being tested is trustworthy, accurate, and not influenced by other factors or variables

  • Results obtained are attributable to or are a function of the manipulated variables

36
New cards

External Validity

  • Extent to which results from a study can be applied or generalized

37
New cards

Maturation

  • Threats to Internal Validity

  • Changes that occur as time passes

  • Occur in participants (particularly long-term) during the course of the study that are not part of the study methods

38
New cards

Testing

  • Threats to Internal Validity

  • Effects of taking a pretest on performance in the posttest

39
New cards

Confounding Factors

  • Threats to Internal Validity

  • Factors that influence the causal relationship being tested in the study

  • Unexpected and not previously identified

40
New cards

Selection

  • Threats to Internal Validity

  • Participants in the groups being compared differ significantly

  • Outcomes may be due to pre-existing differences between the groups, rather than differences due to the intervention

  • Significant for experimental or quasi-experimental designs

41
New cards

Dropouts

  • Threats to Internal Validity

  • Loss of participants from a study due to withdrawal of participation or decision to stop participating in the study

  • If it is caused by the experimental treatment, the internal validity is threatened

42
New cards

Regression towards the Mean

  • Threats to Internal Validity

  • When extreme scores on the first test tend to regress towards the average score on a second test

43
New cards

Instrumentation

  • Threats to Internal Validity

  • Changes and differences in the instrument, observer, testers, and procedures impact the outcome

  • Differences in how the outcome is measured

44
New cards

Social Interaction

  • Threats to Internal Validity

  • Influence when there is interaction between the participants in different groups

45
New cards

John Henry Effect

  • participants in one group attempt to do better than the other group because they are aware that they are being compared

46
New cards

Effect of Testing

  • Threats to External Validity

  • Administration of the test may affect performance or response of participants

  • Results may not be generalized to contexts where pre-testing will not occur

47
New cards

Multiple Treatment Interference

  • Threats to External Validity

  • It is difficult to ensure that the particular intervention produces the outcome

  • It is challenging to control the effects of other prior treatments

48
New cards

Selection-Treatment Interaction

  • Threats to External Validity

  • Characteristics of the participants interact with the aspects of the treatment

  • Happens when samples or participants do not represent the bigger population or group

  • The sample participants of the group shall be representative of the general population we want our result to be applied to

49
New cards

Effects of Experimental Arrangements

  • Threats to External Validity

  • Difficult to generalize for non-experimental arrangements if the effect is attributable to the experimental arrangement

  • Important for research in a highly controlled setting

  • Highly controlled settings = many prerequisites to the real world

  • Threatens replicability of conditions

50
New cards

Discrimination

  • Purposes of a test

  • Distinguish between individuals or groups on an underlying dimension or phenomenon when no external criterion is available to validate it

  • Distinguish the presence or absence of an attribute or condition

51
New cards

Evaluation

  • Purposes of a test

  • Measures the magnitude of longitudinal change in an individual or group on the dimension of interest

52
New cards

Prediction

  • Purposes of a test

  • Classifies people into a set of predefined categories

  • Determine if an individual has been classified correctly or not

53
New cards

Validity Estimates

  • Can be measured through Pearson’s r and Spearman’s ρ (rho). 

  • Demonstrates the strength of the linear relationship between two variables. 

  • Varies from –1 to 0 to +1, depending on the variables being compared
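
A minimal sketch of both coefficients in plain Python, with hypothetical function names and data. Spearman's ρ is computed here as Pearson's r on the ranks, with tied values given their average rank. The example data rise together but not along a straight line, so ρ reaches +1 while r falls slightly short.

```python
import math

def pearson_r(x, y):
    """Pearson's r: strength of the linear relationship between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical scores: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))  # 0.981 1.0
```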

54
New cards

Strength

  • Considerations in interpreting validity estimates

  • Magnitude of the relationship

  • Very rare for a perfect correlation (+1 or –1) to occur

  • Standards:

    • Criterion

    • Construct


55
New cards

Direction

  • Considerations in interpreting validity estimates

  • Negative, opposite direction

    • As one increases, the other decreases 

  • Positive, same direction

    • As one increases or decreases, so does the other

56
New cards

Face Validity

  • Instrument appears to test what it is supposed to measure and it is a plausible method to do so

  • Highly subjective, least rigorous

  • Either an instrument has face validity, or not

  • Easily established for tests that require direct observation

57
New cards

Content Validity

  • Extent to which items in an instrument address and sample relevant aspects within the concept being measured or assessed

  • Subjective, based on the review of a panel of experts in a field

  • Tests should cover all parts of the concept and reflect the relative importance of each part

  • To establish this, concept should be clearly defined

58
New cards

Criterion-Related Validity

  • Ability of a tool to predict results obtained on an external criterion or gold standard; where no gold standard exists, a reference standard or other acceptable criterion is used

  • Outcomes from the instrument can be used as a substitute measure to the gold standard

  • Correlation between the target test and the standard is high

  • Used when measuring abstract variables

59
New cards

Concurrent Validity

  • Measure reflects the same behavior as the criterion measure

    • If the new tool is potentially more efficient, it is proposed as alternative

  • New tool and criterion measure are taken at the same time

60
New cards

Predictive Validity

  • Provides a basis for predicting outcomes or future behavior

    • Usually used to assess risks, prognosticate, and set long-term goals

  • Measure will be a valid predictor of some future criterion score

  • Target test is given at one session, followed by a period of time, after which a criterion score is taken

61
New cards

Construct Validity

  • Assesses the ability of an instrument to:

    • Measure an abstract concept

    • Support underlying theoretical assumptions and context

    • Assess the meaning of a construct

  • Difficult to establish as validity estimates have a lower cut-off

62
New cards

Convergent Validity

  • The instrument yields similar results with other measures meant to assess the same construct or underlying phenomenon

63
New cards

Divergent Validity

  • The instrument demonstrates different results with other measures that are believed to assess different characteristics

64
New cards

Sensitivity

  • Ability of a test to obtain a positive finding when the condition is actually present

  • Positive Predictive Value

    • Probability that the disease is present when a test is positive

65
New cards

Specificity

  • Ability to obtain a negative finding when the condition is actually absent

  • Negative Predictive Value

    • Probability that the disease is not present when a test is negative
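
Sensitivity, specificity, and the two predictive values can all be read off a 2×2 table of test results against true condition. The sketch below uses hypothetical counts and a hypothetical function name.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from true/false positives and negatives."""
    return {
        # Of those with the condition, the share who test positive.
        "sensitivity": tp / (tp + fn),
        # Of those without the condition, the share who test negative.
        "specificity": tn / (tn + fp),
        # Of those who test positive, the share who have the condition.
        "ppv": tp / (tp + fp),
        # Of those who test negative, the share who do not have the condition.
        "npv": tn / (tn + fn),
    }

# Hypothetical study: 100 people with the condition, 100 without.
result = diagnostic_accuracy(tp=80, fp=10, fn=20, tn=90)
print({name: round(value, 3) for name, value in result.items()})
# {'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.889, 'npv': 0.818}
```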

66
New cards

Change Score

  • Difference between the outcome and the initial scores

  • Used to determine the change in an individual’s performances

  • Provides a basis for inferences about differences in the magnitude of change between individuals

67
New cards

Responsiveness to Change

  • Ability of an instrument to detect minimal change over time

68
New cards

Minimal Clinically Important Difference

  • Smallest difference in a measured variable that signifies an important difference in a patient’s condition

    • A statistically significant change in outcomes may not necessarily be clinically significant

    • Effectiveness of interventions must be based on relevance and clinical importance, not just statistical significance

  • Made within the context of the study, outcome measures used, population and setting, and design of the study

69
New cards

Level of Measurement

  • Issues Affecting Validity of Change Scores

  • Type of data obtained affects the ability of an instrument to demonstrate change

  • Accurate computations of change can only be made if the data available are either interval or ratio

70
New cards

Reliability

  • Issues Affecting Validity of Change Scores

  • Related to the concept of measurement error

  • Pre-test and post-test scores may be different because of random errors

  • An important precondition for application of change scores

71
New cards

Stability

  • Issues Affecting Validity of Change Scores

  • Labile Variables

    • Changing and fluctuating consistently

    • May not demonstrate change as a function of improvement following treatment because the change may be due to variable instability

72
New cards

Baseline

  • Floor Effect

    • When a client obtains low scores at baseline, deterioration may not be demonstrated

  • Ceiling Effect

    • When a client obtains high scores at baseline, further improvement may not be demonstrated

73
New cards

Clinical Utility

  • Practicality of administration

  • Involves:

    • Clarity of Instructions

    • Format

    • Ease of Administration

      • Involves the time required to complete, administer, score, and interpret the instrument

    • Expertise needed by assessors

    • Cost-effectiveness