Reliability and Validity



Research Final pt. 2


30 Terms

1

validity

accuracy

2

reliability

consistency

3

relative reliability

the degree to which individuals maintain their position or ranking relative to one another across repeated measurements in a sample. In other words, do you get the same score on multiple occasions, and does each rater arrive at the same score? It reflects the consistency of the relationships between scores.

4

intraclass correlation coefficient (ICC)

used to assess test-retest reliability, intra-rater and inter-rater reliability, and intra-subject reliability

  • uses interval or ratio data

  • ranges from 0.00 to 1.00, is unitless, and higher values indicate greater reliability. Low values most often come from disagreement between raters or between test-retest scores, or from scores being too homogeneous (not enough variance). There is no specific cutoff required, though; that is up to the researcher/clinician.
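
A minimal Python sketch of how one ICC can be computed, here ICC(2,1) (two-way random effects, absolute agreement, single rating) from ANOVA mean squares. The data matrix and variable names are hypothetical, not from this card set:

import numpy as np

# Hypothetical ratings: rows = subjects (n), columns = raters or trials (k)
scores = np.array([
    [8.0, 7.5, 8.2],
    [6.1, 6.0, 6.4],
    [9.0, 8.8, 9.1],
    [5.5, 5.9, 5.7],
    [7.2, 7.0, 7.4],
])
n, k = scores.shape
grand_mean = scores.mean()
row_means = scores.mean(axis=1)   # per-subject means
col_means = scores.mean(axis=0)   # per-rater means
# Two-way ANOVA sums of squares and mean squares
ss_rows = k * ((row_means - grand_mean) ** 2).sum()
ss_cols = n * ((col_means - grand_mean) ** 2).sum()
ss_error = ((scores - grand_mean) ** 2).sum() - ss_rows - ss_cols
ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_error = ss_error / ((n - 1) * (k - 1))
# ICC(2,1): agreement of a single rating, generalizing to similar raters
icc_2_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
print(round(icc_2_1, 3))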

5

Excellent score

>.90

6

Good score

>.75

7

Moderate score

.50-.75

8

Poor score

<.50

9

2 sets of scores

the ICC can assess reliability across two or more (≥ 2) sets of scores

10

Form 1

a single rating is required from each subject

11

Form k

each subject completes multiple trials, and their score is the mean of those trials

12

random effect

generalizes the outcome to similar raters (subjects are usually treated as a random effect); a fixed effect means our raters are the only raters of interest (results are not generalized)

13

Model 1

raters are chosen randomly, and subjects may be assessed by different sets of raters (rarely applied)

14

Model 2

each subject is assessed by the same set of raters, who are considered "randomly" chosen from a larger population (most common)

15

Model 3

each subject is assessed by the same set of raters, but the raters are fixed and are the only raters of interest (mixed model)

16

Classify (model, form)

reported as ICC(model, form), e.g., ICC(2,k), except that k is replaced by the actual number of trials averaged under Form k (e.g., ICC(2,3) for the mean of three trials)

17

absolute reliability

the degree to which a measurement gives consistent results in absolute terms, typically quantified by the amount of error in repeated measurements; in other words, how close repeated measurements are to each other

  • measured with the standard error of measurement

18

standard error of measurement (SEM)

quantifies error in a measurement tool or process, telling how much an observed score is likely to differ from a “true” score
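
A small hedged sketch of the usual SEM calculation, SEM = SD × √(1 − reliability); the SD and ICC values below are assumed example numbers, not from this card set:

import math

sd_scores = 4.0   # assumed standard deviation of observed scores
icc = 0.85        # assumed reliability coefficient (e.g., test-retest ICC)
sem = sd_scores * math.sqrt(1 - icc)   # SEM, in the same units as the measure
print(round(sem, 2))                   # about 1.55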

19

agreement

the extent to which different raters or tools provide the same result when assessing the same subjects. It ensures reliability, especially when working with categorical data.

  • assessed with the kappa statistic (κ)

20

kappa statistic

proportion of agreement between raters beyond what would be expected by chance
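
Kappa compares observed agreement (p_o) with chance agreement (p_e): κ = (p_o − p_e) / (1 − p_e). Below is a minimal Python sketch using scikit-learn's cohen_kappa_score; the two raters' labels are hypothetical:

from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical ratings of the same 8 subjects by two raters
rater_a = ["normal", "impaired", "normal", "impaired", "normal", "normal", "impaired", "normal"]
rater_b = ["normal", "impaired", "normal", "normal", "normal", "normal", "impaired", "impaired"]
kappa = cohen_kappa_score(rater_a, rater_b)   # agreement beyond chance
print(round(kappa, 2))   # 1.0 = perfect agreement, 0 = chance-level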

21

kappa statistic ranges (-1 to +1)

>.8 = excellent

0.6-0.8 = substantial

0.4-0.6 = moderate

<.4 = poor to fair

22

weighted kappa

used when categories are on an ordinal scale; disagreements are penalized more heavily when the difference between categories is larger (e.g., mild vs. severe counts as worse than mild vs. moderate)
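
Continuing the sketch above, scikit-learn's cohen_kappa_score accepts a weights argument ("linear" or "quadratic") for ordinal categories; the severity codes below are hypothetical:

from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings: 0 = mild, 1 = moderate, 2 = severe
rater_a = [0, 1, 2, 1, 0, 2, 1, 0]
rater_b = [0, 2, 2, 1, 0, 1, 1, 1]
# Quadratic weights penalize mild-vs-severe disagreement more than mild-vs-moderate
weighted_kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(round(weighted_kappa, 2))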

23

internal consistency

asks whether the different items within a single questionnaire are related to each other; a high value means the items are all measuring the same underlying construct

important for diagnostic consistency, tool validation, and treatment planning

24

measured with Cronbach's alpha (α)

ranges from 0 to 1, with >.7 considered acceptable and >.9 excellent; a value that is too high (>.95) may indicate item redundancy
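
A minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the questionnaire responses below are invented for illustration:

import numpy as np

# Hypothetical questionnaire data: rows = respondents, columns = items
items = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)       # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 2))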

25

limits of agreement (LoA)

assess agreement between measurement tools or methods by quantifying the differences between paired measurements

  • calculated from the mean and SD of the differences between the two methods

  • if the limits are narrow, agreement is good; if wide, the methods may not agree well enough for practical use

    • important for comparing devices, determining the interchangeability of tools/methods, and evaluating the reliability of measures
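
A hedged sketch of the Bland-Altman limits of agreement, mean difference ± 1.96 × SD of the differences; the paired measurements below are hypothetical:

import numpy as np

# Hypothetical paired measurements of the same subjects by two methods/devices
method_a = np.array([12.1, 15.3, 9.8, 11.0, 14.2, 10.5])
method_b = np.array([12.5, 15.0, 10.4, 11.3, 13.8, 11.0])
diffs = method_a - method_b
bias = diffs.mean()            # mean difference (systematic bias)
sd_diff = diffs.std(ddof=1)    # SD of the differences
lower, upper = bias - 1.96 * sd_diff, bias + 1.96 * sd_diff
print(round(lower, 2), round(upper, 2))   # narrow limits suggest good agreement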

26

MDC (minimal detectable change)

tells whether an observed change in a patient's score is meaningful and not due to variability or measurement error, i.e., whether a true clinical change has occurred

  • related to the SEM, and reported in the same units as the measurement tool

  • useful for tracking progress, evaluating interventions, and clinical decision-making
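
A hedged numeric sketch of a common form, MDC95 = 1.96 × √2 × SEM (the SEM value is carried over from the assumed SEM example above):

import math

sem = 1.55                        # assumed SEM, in the same units as the tool
mdc_95 = 1.96 * math.sqrt(2) * sem
print(round(mdc_95, 2))           # changes larger than this likely exceed measurement error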

27

MCID (minimal clinically important difference)

tells whether a change is large enough to have practical importance for the patient's well-being or function, i.e., whether the intervention is meaningful to the patient

  • bridges the gap between measurable outcomes and subjective satisfaction

  • reported in the same units as the measurement tool

    • methods: anchor-based, distribution-based, and combination

  • useful for assessing intervention effectiveness, setting goals, and interpreting research for clinical meaningfulness rather than statistical significance alone

28

anchor-based

compares the change in a measurement (e.g., a pain scale) to an external anchor reported by the patient (e.g., rating themselves as "much better")

29

distribution-based

uses statistical calculations such as effect size or the SEM to obtain a rough estimate of the MCID based on data variability
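
A rough sketch under stated assumptions: commonly cited distribution-based estimates include 0.5 × baseline SD and 1 × SEM; the SD and ICC values below are assumed example numbers:

import math

sd_baseline = 4.0                         # assumed SD of baseline scores
icc = 0.85                                # assumed reliability
sem = sd_baseline * math.sqrt(1 - icc)
mcid_half_sd = 0.5 * sd_baseline          # 0.5 * SD heuristic
mcid_one_sem = sem                        # 1 * SEM heuristic
print(round(mcid_half_sd, 2), round(mcid_one_sem, 2))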

30

combination

using both anchor-based and distribution-based methods improves the accuracy and relevance of the MCID estimate