Research Final pt. 2
validity
accuracy; the degree to which an instrument measures what it is intended to measure
reliability
consistency; the degree to which a measurement is repeatable under the same conditions
relative reliability
the degree to which individuals maintain their position or ranking relative to one another across repeated measurements in a sample. Basically: do you get the same score multiple times, and does each rater come up with the same score? It is the consistency of the relationships between scores.
intraclass correlation coefficient
used to assess test-retest reliability, intra-rater and inter-rater reliability, and intra-subject reliability
requires interval or ratio data
ranges 0.00-1.00, is unitless, and higher is more reliable. Low values most often come from disagreement between raters or between test-retest scores, or from scores being too homogeneous (not enough variance). There is no specific cutoff required, though; that is up to the researcher/clinician.
Excellent score
>.90
Good score
.75-.90
Moderate score
.50-.75
Poor score
<.50
2 sets of scores
the ICC can assess reliability across more than …..
Form 1
single rating required from each subject
Form k
each subject does multiple trials and their score is the mean of the k ratings
random effect
results generalize to other, similar raters (subjects are usually considered a random effect); fixed effect = our raters are the only raters of interest (no generalization)
Model 1
raters are chosen randomly, and some subjects are assessed by different sets of raters (rarely applied)
Model 2
each subject is assessed by the same set of raters, who are considered randomly chosen from a larger population (most common)
Model 3
each subject is assessed by the same set of raters, but the raters are fixed and are the only raters of interest (mixed model)
Classify (model, form)
for example, (2,k), except that k is replaced by the actual number of ratings averaged in the k form (e.g., (2,3) when three trials are averaged)
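To make the (model, form) notation concrete, here is a minimal Python sketch of ICC(2,1): two-way random effects (Model 2), single rating (Form 1), computed from the Shrout & Fleiss mean squares. The ratings matrix and the function name icc_2_1 are hypothetical, not from the course material. For Form k (the score is the mean of the k ratings), only the denominator changes: the (k - 1) * MSE term and the k multiplier drop out.

```python
# Minimal sketch of ICC(2,1): two-way random effects, single rating,
# absolute agreement (Shrout & Fleiss). Example data are hypothetical.
import numpy as np

def icc_2_1(ratings):
    """ratings: n_subjects x k_raters array of interval/ratio scores."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)   # one mean per subject
    col_means = y.mean(axis=0)   # one mean per rater

    ss_rows = k * ((row_means - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_error = ((y - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# 5 subjects each scored by the same 3 raters (hypothetical numbers)
scores = [[9, 10, 9], [6, 7, 8], [8, 8, 9], [7, 6, 6], [10, 9, 10]]
print(round(icc_2_1(scores), 2))   # ~0.80, a "good" ICC by the cutoffs above
```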
absolute reliability
the degree to which a measurement gives consistent results in absolute terms, typically quantified by the amount of error in repeated measurements. How close repeated measurements are to each other.
measured with standard error of measurement
standard error of measurement (SEM)
quantifies error in a measurement tool or process, telling how much an observed score is likely to differ from a “true” score
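As a quick illustration, the SEM is commonly computed as SEM = SD x sqrt(1 - reliability coefficient), where the reliability coefficient is often an ICC; the SD and ICC values below are hypothetical.

```python
# Minimal sketch of SEM = SD * sqrt(1 - reliability); values are hypothetical.
import math

sd_scores = 5.0   # SD of the sample's scores, in the tool's own units
icc = 0.90        # reliability coefficient (e.g., test-retest ICC)

sem = sd_scores * math.sqrt(1 - icc)
print(round(sem, 2))   # ~1.58 points of expected error around a "true" score
```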
agreement
extent to which different raters or tools provide the same result when assessing the same subjects. Ensures reliability, especially when working with categorical data
measured with the kappa statistic (κ)
kappa statistic
proportion of agreement between raters beyond what would be expected by chance
kappa statistic ranges (-1 to +1)
>0.8 = excellent
0.6-0.8 = substantial
0.4-0.6 = moderate
<0.4 = poor to fair
weighted kappa
used when categories are on an ordinal scale; penalizes disagreements more heavily when the difference between categories is larger (e.g., mild vs. severe is a worse disagreement than mild vs. moderate)
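For the unweighted case, kappa is just observed agreement corrected for chance agreement, kappa = (p_o - p_e) / (1 - p_e). The sketch below works this out for two hypothetical raters; a weighted kappa would additionally down-weight near-miss disagreements on the ordinal scale before computing agreement.

```python
# Minimal sketch of Cohen's (unweighted) kappa; the rating lists are hypothetical.
from collections import Counter

rater_a = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
rater_b = ["mild", "moderate", "moderate", "severe", "moderate", "mild"]

n = len(rater_a)
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# chance agreement: product of the raters' marginal proportions, summed over categories
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
categories = set(rater_a) | set(rater_b)
p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 2))   # ~0.74, "substantial" by the ranges above
```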
internal consistency
important for diagnostic consistency, tool validation, and treatment planning
are the different items within a single questionnaire related to each other? A high value means the items are actually measuring the same underlying thing
measured with Cronbach's alpha (α)
ranges 0-1, with >0.7 acceptable and >0.9 excellent; too high (>0.95) may indicate redundant items
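A minimal sketch of Cronbach's alpha from the usual item-variance formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score); the respondent-by-item matrix is hypothetical.

```python
# Minimal sketch of Cronbach's alpha; the response matrix is hypothetical.
import numpy as np

# rows = respondents, columns = questionnaire items (e.g., 1-5 Likert answers)
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))   # ~0.94: excellent, approaching the redundancy zone
```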
limits of agreement (LoA)
assesses agreement between measurement tools or methods by quantifying the differences between paired measurements
calculated from the mean and SD of the differences between the two methods
if narrow, the methods agree well; if wide, they may not agree enough for practical use
important for comparing devices, determining the interchangeability of tools/methods, and evaluating the reliability of measures
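A minimal Bland-Altman style sketch: the 95% limits of agreement are the mean difference between the two methods plus or minus 1.96 times the SD of those differences. The paired measurements are hypothetical.

```python
# Minimal sketch of 95% limits of agreement between two methods; data are hypothetical.
import numpy as np

method_a = np.array([10.2, 12.5, 9.8, 11.1, 13.0, 10.6])
method_b = np.array([10.0, 12.9, 9.5, 11.4, 12.6, 10.9])

diffs = method_a - method_b
bias = diffs.mean()            # systematic difference (bias) between methods
sd_diff = diffs.std(ddof=1)

lower, upper = bias - 1.96 * sd_diff, bias + 1.96 * sd_diff
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

Whether the resulting interval is narrow enough is a clinical judgment made in the measurement's own units, not a fixed statistical cutoff.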
MDC (minimal detectable change)
tells whether an observed change in a patient's score is a real change and not due to variability or measurement error. Tells if there is true clinical improvement
-related to SEM, and reported in same units as the measurement tool
-good for tracking progress, evaluating interventions, and clinical decision-making
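Assuming the commonly used formula MDC95 = 1.96 * sqrt(2) * SEM (the sqrt(2) accounts for error in both the baseline and follow-up measurements), here is a short sketch; the SEM value is hypothetical, carried over from the SEM sketch above.

```python
# Minimal sketch of MDC95 = 1.96 * sqrt(2) * SEM; the SEM value is hypothetical.
import math

sem = 1.58                        # in the measurement tool's own units
mdc95 = 1.96 * math.sqrt(2) * sem
print(round(mdc95, 2))            # a change must exceed ~4.38 points to count as real
```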
MCID (minimal clinically important difference)
tells whether a change is large enough to have practical importance for the patient's well-being or function. Tells if the intervention is meaningful to the patient.
bridges gap between measurable outcomes and subjective satisfaction
reported in same units as the measurement tool
methods: anchor-based, distribution-based, and combination
good for assessing intervention effectiveness, setting goals, and interpreting research for clinical meaningfulness instead of only statistical significance
anchor-based
compares the change in a measurement (e.g., a pain scale) to an external anchor reported by the patient (e.g., a global rating of change such as "much better")
distribution based
uses statistical calculations such as effect size or the SEM to get a rough estimate of the MCID based on data variability (see the sketch after these methods)
combination
using both approaches improves accuracy and relevance
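As a rough illustration of the distribution-based approach mentioned above, two common rules of thumb are 0.5 * SD of baseline scores and 1 * SEM; the exact rule varies by study, and all numbers below are hypothetical.

```python
# Minimal sketch of two distribution-based MCID estimates; numbers are hypothetical.
import math

baseline_sd = 5.0
icc = 0.90
sem = baseline_sd * math.sqrt(1 - icc)

mcid_half_sd = 0.5 * baseline_sd     # effect-size-based rule of thumb
mcid_one_sem = sem                   # SEM-based rule of thumb
print(round(mcid_half_sd, 2), round(mcid_one_sem, 2))
```

An anchor-based estimate would instead pick the score change that best separates patients who report being "much better" from those who do not.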