Psychometrics
Also called clinimetrics
Concerned with the development, construction, and validation of measurement tools
Determines whether a tool possesses useful and accurate measurement properties
Reliability
Also known as reproducibility, repeatability, or dependability
Extent to which a measure produces consistent results, free from error, between repeated measurements, assuming the underlying condition has not changed.
Measurement error
The difference between the true value and the observed value
Error = variation without true change
Some inconsistency is always expected, as no measurement is perfectly reliable
Regression toward the mean
Closely linked with reliability
Phenomenon in which extreme scores on a pretest tend to move closer to, or regress toward, the mean on the posttest
Most likely seen when a less reliable measure is used
Minimal Detectable Change
The amount of change in a variable that must be achieved before we can be confident that error does not account for the entire measured difference
The lower a measure's reliability, the higher its minimal detectable change
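One common formulation (a standard sketch, assuming a 95% confidence level and using the standard error of measurement, SEM, defined later in these notes):

```latex
\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}
```

Here 1.96 is the z-value for 95% confidence, and the factor of √2 accounts for error being present in both the test and the retest.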
Systematic Error
Constant and predictable error
Occurs consistently in one direction
Does not affect reliability, but may affect validity since it biases the results
Random error
Unpredictable and due to chance
Caused by:
Instrument precision issues
Fatigue or inconsistency of the tester
Participant issues
Environmental fluctuations
Leads to both overestimation and underestimation
To reduce random error, record multiple trials and take the average to allow the positive and negative errors to cancel out
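As a rough illustration, a minimal Python sketch (with hypothetical values) showing how averaging repeated trials lets random errors cancel out:

```python
import random

TRUE_VALUE = 50.0  # hypothetical true score

def measure():
    # One trial = true value + random error centered on zero
    return TRUE_VALUE + random.gauss(0, 2.0)

single_trial = measure()
mean_of_trials = sum(measure() for _ in range(10)) / 10

print(f"single trial:      {single_trial:.2f}")
print(f"mean of 10 trials: {mean_of_trials:.2f}")  # typically closer to 50.0
```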
Reliability focuses on the degree of random error in measurement.
Classical Test Theory
Theories
Observed score = True score + Random error (X = T + E)
Treats all errors as random
Generalizability Theory
Theory
Identifies the specific sources of error
Explains why inconsistencies occur and how to improve reliability by adjusting testing conditions
True Score Model
Both classical and generalizability theory fall under this
Assumes that every observed score is composed of the true score and error components.
A “true” value exists behind every observed measurement
Hawthorne Effect
People alter or enhance their performance because they know they are being observed or tested
Test-Retest Reliability
Assesses the stability of an instrument through repeated measurements on at least two different occasions
Stability: the ability to obtain the same results over repeated administrations, assuming the variables have not changed
Used in experiments where raters are minimally involved such as self-reported questionnaires
Testing and carryover effects usually manifest as systematic errors, creating unidirectional changes
Test-Retest Interval
Interval must be short enough that genuine changes do not occur between tests
Interval must be long enough to avoid learning, fatigue, and memory effects
Dependent on the variables being tested
Carryover Effect
Initial trial can encourage practice or learning that can alter or enhance the subsequent trials
Pre-test trials can be done to neutralize this possibility
Testing Effects
The test or procedure is responsible for changes in variables
Intra-rater Reliability
Stability of the data obtained by one rater across two or more trials performed on a single occasion
Carryover effect is not typically an issue; trials follow each other immediately
Rater bias may arise due to differences in rater characteristics
Raters can do blind scoring so that they do not see their scores from the initial trials
Creating objective rater criteria is integral to negating rater bias
Inter-rater Reliability
Variation in measurement between two or more raters who measure the same group of subjects at least once
Measurements are ideally taken simultaneously; if not, videos or recordings are used
Raters must be independent and must not influence one another's scoring, to avoid bias
Intra-rater reliability should be established first
Bias may arise due to differences in rater characteristics
Internal Consistency
Used for instruments that have a set of questions intended to measure various aspects of the same body of knowledge or construct
Degree of homogeneity of test items within an instrument
Focuses on the consistency of the items with one another
Assessed using the Cronbach’s coefficient alpha (α)
Alternate Forms
Uses alternate versions of the same tool
Common in standardized educational testing
Utilized if two equivalent versions of a test or tool are needed
Split-half Reliability
Assesses the correlation between a subject's scores on the two halves of a test
The test is divided into two sets that are redundant or parallel to each other; the results are then correlated to assess reliability
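Because each half is only half the length of the full test, the half-test correlation is commonly adjusted upward with the Spearman-Brown prophecy formula (a standard correction):

```latex
r_{\text{full}} = \frac{2\, r_{\text{half}}}{1 + r_{\text{half}}}
```

where r_half is the correlation between the two halves.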
Variance
Measure of variability among scores in a sample
Dispersed sample scores = larger variance
Similar sample scores = smaller variance
True Score Variance
Primary source of variance
Caused by individual differences in behavior
Reflects the assumption that a true score underlies every measurement
Error Variance
Primary source of variance
Caused by different sources of measurement errors
Estimate of Variance
Reliability = true score variance (T) / [true score variance (T) + error variance (E)]
Used when measuring reliability
The ratio of true score variance to the total variance
Higher error variance, lower reliability
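In notation, with σ_T² the true score variance and σ_E² the error variance, this ratio is:

```latex
\text{reliability} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

which makes explicit why higher error variance yields lower reliability.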
Reliability Coefficients
Estimates of reliability vary depending on the type of reliability being analyzed
Agreement vs Consistency
It is recommended to use the reliability coefficient that analyzes both agreement and consistency
The type of statistics used will depend on the level of measurement of the variables
Agreement
The extent to which scores in two sets of data match each other
Data can be highly correlated yet still not agree; e.g., if one rater consistently scores 5 points higher than another, the scores correlate perfectly but never agree
Consistency
Based on measures of correlation
Correlation: the degree of association between two sets of data
Usually bivariate, analyzing only two sets of data
Relative Reliability
Reflects true variance as a proportion of the total variance in a set of scores
Unitless
Absolute Reliability
Indicates how much of a measured value is likely to be due to error
Expressed in the original units of the test or measure
Weighted Kappa
Used when data is Nominal or Ordinal
Represents the average rate of agreement for an entire set of yes or no responses
Reliability estimates:
<0.60 = poor
0.60 to 0.80 = moderate
>0.80 = good
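The underlying (unweighted) kappa statistic is agreement corrected for chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed proportion of agreement and p_e is the agreement expected by chance; the weighted version additionally gives partial credit for near-agreement on ordinal scales.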
Intraclass Correlation Coefficient
Used when data is Interval or Ratio
Reflects both the degree of correspondence and agreement among ratings
Used as a test-retest and rater reliability coefficient
Different models are used depending on:
Raters
Kind of reliability being measured
Generalizability of findings
Reliability estimates:
<0.50 = poor
0.50 - 0.75 = moderate
>0.75 = good
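As one illustration, the Shrout and Fleiss two-way random-effects, single-measure model (often written ICC(2,1)) is computed from a two-way ANOVA:

```latex
\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \frac{k}{n}\,(MS_C - MS_E)}
```

where MS_R, MS_C, and MS_E are the mean squares for subjects (rows), raters (columns), and error, with n subjects and k raters.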
Standard Error of Measurement
An absolute reliability index for test-retest reliability
Measures response stability, i.e., the stability of the instrument's scores over time
Gives magnitude of measurement error
Commonly used when the stability of a response is questioned, and for labile constructs
Used to form confidence intervals around an observed score
Standard Deviation (SD) of measurement errors reflects the reliability of the response
↑ SD = ↓ reliability
↓ SD = ↑ reliability
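A common way to estimate the SEM (assuming a test-retest reliability coefficient r, such as an ICC, and the sample standard deviation SD):

```latex
\mathrm{SEM} = SD \times \sqrt{1 - r}
```

A 95% confidence interval around an observed score is then approximately the observed score ± 1.96 × SEM.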
Cronbach’s Coefficient Alpha
Measures internal consistency
Can be used if item scores are dichotomous or when there are more than 2 response choices
Reliability estimates:
<0.50 = Unacceptable
0.50 - 0.60 = Poor
0.60 - 0.70 = Questionable
0.70 - 0.80 = Acceptable
0.80 - 0.90 = Good
>0.90 = Excellent
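For reference, the standard formula for coefficient alpha on a test of k items:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
```

where σ_i² is the variance of item i and σ_X² is the variance of the total scores.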
Validity
Results of research are only useful to the extent that they can be accurately and confidently interpreted
Viewed from a broader research perspective or from a psychometric perspective
Internal Validity
Degree of confidence that the causal relationship being tested is trustworthy, accurate, and not influenced by extraneous factors or variables
Results obtained are attributable to or are a function of the manipulated variables
External Validity
Extent to which results from a study can be applied or generalized
Maturation
Threats to Internal Validity
Changes that occur as time passes
Occur in participants during the course of the study (particularly long-term studies) and are not part of the study methods
Testing
Threats to Internal Validity
Effects of a pretest on performance in the posttest
Confounding Factors
Threats to Internal Validity
Factors that influence the causal relationship being tested in the study
Unexpected and not previously identified
Selection
Threats to Internal Validity
Participants in the groups being investigated differ significantly
May be due to pre-existing differences between the groups, rather than differences due to the intervention
Particularly significant for experimental or quasi-experimental designs
Dropouts
Threats to Internal Validity
Loss of participants from a study due to withdrawal of participation or decision to stop participating in the study
If it is caused by the experimental treatment, the internal validity is threatened
Regression towards the Mean
Threats to Internal Validity
When extreme scores on the first test tend to regress towards the average score on a second test
Instrumentation
Threats to Internal Validity
Changes and differences in the instrument, observer, testers, and procedures impact the outcome
Differences in how the outcome is measured
Social Interaction
Threats to Internal Validity
Influence when there is interaction between the participants in different groups
John Henry Effect
Participants in one group attempt to do better than the other group because they are aware that they are being compared
Effect of Testing
Threats to External Validity
Administration of the test may affect performance or response of participants
Results may not generalize to contexts where pre-testing does not occur
Multiple Treatment Interference
Threats to External Validity
It is difficult to ensure that the particular intervention produces the outcome
It is challenging to control the effects of other prior treatments
Selection-Treatment Interaction
Threats to External Validity
Characteristics of the participants interact with the aspects of the treatment
Happens when samples or participants do not represent the bigger population or group
The sample should be representative of the general population to which we want our results to apply
Effects of Experimental Arrangements
Threats to External Validity
Difficult to generalize to non-experimental settings if the effect is attributable to the experimental arrangement itself
Important for research conducted in highly controlled settings
Highly controlled settings impose many prerequisites that do not hold in the real world
Threatens replicability of conditions
Discrimination
Purposes of a test
Distinguishes between individuals or groups on an underlying dimension or phenomenon when there is no external criterion to validate it
Distinguish the presence or absence of an attribute or condition
Evaluation
Purposes of a test
Measures the magnitude of longitudinal change in an individual or group on the dimension of interest
Prediction
Purposes of a test
Classifies people into a set of predefined categories
Determine if an individual has been classified correctly or not
Validity Estimates
Can be measured through Pearson’s r and Spearman’s ρ (rho).
Demonstrates the strength of the linear relationship between two variables.
Varies from –1 to 0 to +1, depending on the variables being compared
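A minimal Python sketch (hypothetical scores, assuming SciPy is available) computing both estimates:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical scores from a new tool and a criterion measure
new_tool = [12, 15, 17, 20, 22, 25, 28, 30]
criterion = [14, 14, 18, 19, 24, 24, 29, 31]

r, p_r = pearsonr(new_tool, criterion)       # linear relationship
rho, p_rho = spearmanr(new_tool, criterion)  # rank-order relationship

print(f"Pearson's r:    {r:.2f} (p = {p_r:.3f})")
print(f"Spearman's rho: {rho:.2f} (p = {p_rho:.3f})")
```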
Strength
Considerations in interpreting validity estimates
Magnitude of the relationship
Very rare for a perfect correlation (±1) to occur
Standards differ by the type of validity being assessed:
Criterion
Construct
Direction
Considerations in interpreting validity estimates
Negative: opposite direction; as one increases, the other decreases
Positive: same direction; as one increases or decreases, so does the other
Face Validity
Instrument appears to test what it is supposed to measure and appears to be a plausible method for doing so
Highly subjective, least rigorous
An instrument either has face validity or it does not
Easily established for tests that require direct observation
Content Validity
Extent to which items in an instrument address and sample relevant aspects within the concept being measured or assessed
Subjective, based on the review of a panel of experts in a field
Tests should cover all parts of the concept and reflect the relative importance of each part
To establish this, concept should be clearly defined
Criterion-Related Validity
Ability of a tool to predict results obtained on an external criterion or gold standard (or, if none exists, a reference standard or acceptable criterion)
Outcomes from the instrument can be used as a substitute measure to the gold standard
Correlation between the target test and the standard is high
Used when measuring abstract variables
Concurrent Validity
Measure reflects the same behavior as the criterion measure
If the new tool is potentially more efficient, it is proposed as an alternative
New tool and criterion measure are taken at the same time
Predictive Validity
Provides a basis for predicting outcomes or future behavior
Usually used to assess risks, prognosticate, and set long-term goals
Measure will be a valid predictor of some future criterion score
Target test is given at one session, followed by a period of time, after which a criterion score is taken
Construct Validity
Assesses the ability of an instrument to:
Measure an abstract concept
Support underlying theoretical assumptions and context
Assess the meaning of a construct
Difficult to establish as validity estimates have a lower cut-off
Convergent Validity
The instrument yields similar results with other measures meant to assess the same construct or underlying phenomenon
Divergent Validity
The instrument demonstrates different results with other measures that are believed to assess different characteristics
Sensitivity
Ability of a test to obtain a positive finding when the condition is actually present
Positive Predictive Value
Probability that the disease is present when a test is positive
Specificity
Ability to obtain a negative finding when the condition is actually absent
Negative Predictive Value
Probability that the disease is not present when a test is negative
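These four indices follow from the standard 2x2 table of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN):

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} \qquad
\text{Specificity} = \frac{TN}{TN + FP} \qquad
\text{PPV} = \frac{TP}{TP + FP} \qquad
\text{NPV} = \frac{TN}{TN + FN}
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of the condition in the tested population.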
Change Score
Difference between the outcome and the initial scores
Used to determine the change in an individual’s performances
Provides a basis for inferences related to differences in the magnitude of change between individuals
Responsiveness to Change
Ability of an instrument to detect minimal change over time
Minimal Clinically Important Difference
Smallest difference in a measured variable that signifies an important difference in a patient’s condition
A statistically significant change in outcomes may not necessarily be clinically significant
Effectiveness of interventions must be based on relevance and clinical importance, not just statistical significance
Made within the context of the study, outcome measures used, population and setting, and design of the study
Level of Measurement
Issues Affecting Validity of Change Scores
Type of data obtained affects the ability of an instrument to demonstrate change
Accurate computations of change can only be made if the data available are either interval or ratio
Reliability
Issues Affecting Validity of Change Scores
Related to the concept of measurement error
Pre-test and post-test scores may be different because of random errors
An important precondition for application of change scores
Stability
Issues Affecting Validity of Change Scores
Labile Variables
Variables that change and fluctuate constantly
May not demonstrate change as a function of improvement following treatment because the change may be due to variable instability
Baseline
Floor Effect
When a client obtains low scores at baseline, deterioration may not be demonstrated
Ceiling Effect
When a client obtains high scores at baseline, further improvement may not be demonstrated
Clinical Utility
Practicality of administration
Involves:
Clarity of Instructions
Format
Ease of Administration
Involves the time required to complete the test, administer and score it, and interpret the results
Expertise needed of assessors
Cost-effectiveness