CA1 - Final Term
Reliability
Consistency, accuracy, and dependability of test results.
Classical Test Score Theory
Assumes that each person has a true score that would be obtained if there were no errors in measurement.
Classical Test Score Theory
A person’s observed score is made up of:
True score → their actual ability or knowledge
Error score → random influences like guessing or mistakes
Formula: Observed Score = True Score + Error
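In symbols (X, T, and E are the conventional labels; the numbers below are invented purely for illustration):

```latex
X = T + E \qquad \text{e.g. } 85 = 82 + 3 \quad \text{(true score 82, random error } +3\text{)}
```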
Systematic error
A consistent, predictable influence on test scores that can usually be identified and corrected.
Random error
An unpredictable fluctuation in the measurement process that is difficult to detect or remove, making it harder to estimate the true score.
Domain Sampling Method
Considers the problem created by using only a limited sample of items from the larger domain of all possible items.
What is the mantra on reliability?
The more items, the higher the reliability.
Item Response Theory
Focuses on how items across a range of difficulty can be used to assess an individual’s ability.
Individual’s ability
Refers to how skilled or knowledgeable an individual is.
Item difficulty
Refers to how hard a test question is, usually measured by the proportion of people who answered it correctly.
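A common way to formalize the link between ability and difficulty is the one-parameter (Rasch) IRT model, where θ is the person’s ability and b is the item’s difficulty:

```latex
P(\text{correct} \mid \theta, b) = \frac{1}{1 + e^{-(\theta - b)}}
```

When ability exactly matches difficulty (θ = b), the probability of a correct answer is 0.50.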
Item branching
A way of administering test questions in which the next item depends on your previous answer, making the test adaptive.
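A minimal Python sketch of item branching, assuming a simple rule (harder item after a correct answer, easier after a wrong one); the item pool and rule are invented for illustration:

```python
# Minimal adaptive-testing sketch: move up the difficulty ladder after a
# correct answer, down after a wrong one. All item content is made up.
ITEMS = [
    ("2 + 2 = ?", "4"),      # easy
    ("12 * 12 = ?", "144"),  # medium
    ("17 * 23 = ?", "391"),  # hard
]

def run_adaptive_test(responses, start_level=1):
    """Walk the difficulty ladder based on each response; return levels seen."""
    level, path = start_level, []
    for response in responses:
        _question, answer = ITEMS[level]
        path.append(level)
        # Branch: harder item if correct, easier if not (clamped to the pool).
        if response == answer:
            level = min(level + 1, len(ITEMS) - 1)
        else:
            level = max(level - 1, 0)
    return path

print(run_adaptive_test(["144", "391", "400"]))  # prints [1, 2, 2]
```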
Test-Retest Reliability
Refers to the consistency of test results when the same test is given to the same group of people at two different times.
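In practice this is usually estimated as the Pearson correlation between the two administrations; a minimal sketch with invented scores:

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test (scores below are invented).
import numpy as np

time1 = np.array([80, 75, 90, 60, 85, 70])  # first administration
time2 = np.array([82, 73, 88, 65, 84, 72])  # same people, retested later

r = np.corrcoef(time1, time2)[0, 1]  # off-diagonal entry is the correlation
print(f"test-retest reliability: r = {r:.2f}")
```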
Parallel Forms Reliability
Compares two equivalent forms of a test that measure the same attributes.
Internal Consistency
Refers to how well the items (questions) on a test measure the same idea or skill.
Split‑Half Reliability
The test is split into two halves.
Reliability is estimated by comparing scores from each half.
The Spearman-Brown formula is used to adjust reliability for the reduced number of items.
Without this correction, the estimate runs low because each half contains only half the items.
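The correction itself, where r_half is the correlation between the two halves (the second form generalizes to lengthening a test by a factor of n):

```latex
r_{\text{full}} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}}
\qquad
r_{\text{new}} = \frac{n\,r}{1 + (n - 1)\,r}
```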
Kuder-Richardson 20
Used for dichotomous items (questions scored as simply right or wrong, e.g., true/false).
Assumes items vary in difficulty (easy, medium, hard).
Most tests naturally have items of varying difficulty, so KR-20 is the default unless equal difficulty can be justified.
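The formula, where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 − p_i, and σ² the variance of total scores:

```latex
KR_{20} = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2}\right)
```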
Kuder-Richardson 21
Also for dichotomous items.
Assumes all items have the same level of difficulty (must be justified).
Simpler to compute but less precise.
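The formula replaces the item-by-item sum in KR-20 with the mean total score x̄, which is what makes it simpler but less precise:

```latex
KR_{21} = \frac{k}{k - 1}\left(1 - \frac{\bar{x}\,(k - \bar{x})}{k\,\sigma^2}\right)
```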
Cronbach’s Coefficient Alpha
Used for polytomous items (questions with more than two possible response values, not just right/wrong).
Commonly applied to Likert‑scale items.
Estimates how consistently items measure the same construct when responses can vary in degree.
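The formula, where σ_i² is the variance of item i and σ_T² the variance of total scores; with dichotomous items it reduces to KR-20:

```latex
\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_T^2}\right)
```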
Interrater Reliability
Consistency of judges/raters evaluating the same behavior.
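Not named in these cards, but a standard way to quantify interrater agreement is Cohen’s kappa, where p_o is observed agreement and p_e is agreement expected by chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```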
Validity
The degree to which a test measures what it purports to measure.
Criterion Validity
How well test scores correspond to a particular criterion (an external standard or outcome).
Criterion Test
A well‑established psychological test that is already known to measure the construct correctly.
Used as a benchmark when developing new tests (e.g., comparing a new intelligence test to an existing one).
If both tests give similar results, it shows they measure the same thing.
Criterion Data
Any readily available information that can serve as a standard for comparison.
Predictive Validity
Refers to how well a test can forecast future performance or outcomes.
There is a time gap between taking the test and observing the results.
Using entrance exam scores to predict a student’s GPA in their fourth year.
Concurrent Validity
Refers to how well a test’s results agree with a criterion test or criterion data that measure the same construct at the same time.
Unlike predictive validity, no time gap between the test and the criterion is needed.
Shows that the test and the criterion are related and produce similar results.
Content Validity
Adequacy of representation of the conceptual domain the test is designed to cover.
Experts judge the validity of test items.
Construct Validity
Refers to how well a test truly measures the abstract concept it claims to measure.
Needed when measuring intangible traits (e.g., intelligence, anxiety, motivation).
Strongly based on theoretical frameworks and psychological models.
Harder to establish because it requires proof that the theory holds through research and evidence.
Convergent Validity
Refers to how well your test relates to measures of the same construct, as predicted by existing theory.
Shows that your test is measuring the same concept as other established measures.
If two tests measure the same construct, their results should be strongly related.
Divergent Validity
Refers to how well your test is not related to a different, theoretically distinct construct (also called discriminant validity).
Proves that your test is measuring something unique, not overlapping with unrelated traits.
If two constructs are theoretically different, your test should not correlate with measures of the other.
Face Validity
Refers to whether a test appears to measure what it is supposed to measure, just by looking at it.
It’s about appearance and impression, not statistical proof.
Utility
Practicality or usefulness of the test.
Not a psychometric property.
It is relative and subjective, depending on the situation and the people involved.
What is the mantra of psychometric properties?
A test can be reliable but not valid, but a test cannot be valid unless it is reliable.
What is the minimum reliability standard for basic research?
A reliability coefficient of at least 0.70.
What is the minimum reliability standard for clinical research?
A reliability coefficient of at least 0.90.