Flashcards covering key concepts on reliability in psychometrics, including types of reliability, error, correlation, and measurement theories, based on Week 4 lecture notes.
What is reliability in the context of psychological tests?
Reliability refers to how much we can trust the test to measure approximately the same way each time, indicating precision or consistency.
What are the two ways reliability might be used?
Reliability might be used to refer to the reliability/precision of test scores (consistency in general) or as a reliability coefficient (statistical evaluation).
Can a test be reliable but not valid?
Yes, a test can be reliable but not valid.
What are the components of an observed assessment score according to Classical Test Theory?
An observed assessment score (X) equals a 'true' score (T), which cannot be measured directly, plus measurement error (E), which can be estimated: X = T + E.
What is random error in testing?
Random error is unsystematic variation from sources such as testing room conditions, administrator error, or issues with the measurement tool, and it lowers the reliability of a test.
What is systematic error in testing?
Systematic error is when a single source always increases or decreases a score by the same amount (e.g., a scale off by 2 pounds, a proctor giving too few minutes), and it does not lower the reliability of the test.
List factors that can contribute to measurement error in a test.
Factors that can contribute to measurement error include test length, homogeneity of questions, test-retest interval, test administration, scoring, and cooperation of test takers.
When might factors typically considered error variance be classified as true variance?
If the purpose of testing is to measure fluctuations in a characteristic (e.g., mood), day-to-day changes would be part of true variance; if interested in a permanent characteristic, those fluctuations would be error variance.
What does the Pearson Product-Moment Correlation take into account?
It considers not only a person’s position in the group but also the amount of their deviation above or below the group mean.
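A minimal Python sketch of how the Pearson product-moment correlation is built from each person's deviation above or below the group mean (the paired scores below are made-up illustrations, not lecture data):

```python
from math import sqrt

def pearson_r(x, y):
    """Correlate paired scores via deviations from each group mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]  # deviation above/below the mean
    dev_y = [yi - mean_y for yi in y]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return covariation / sqrt(sum(dx**2 for dx in dev_x) *
                              sum(dy**2 for dy in dev_y))

# Hypothetical test and retest scores for five examinees:
test1 = [88, 92, 75, 83, 95]
test2 = [86, 94, 78, 80, 96]
print(round(pearson_r(test1, test2), 2))  # 0.95 -> strong reliability
```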
What is the formula for reliability (rxx) in terms of variance?
Reliability (rxx) = true-score variance (σ²t) / observed-score variance (σ²o).
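A small numeric sketch of this variance ratio, using made-up variance components chosen to echo the .85 example later in these cards:

```python
# rxx = true-score variance / observed-score variance (illustrative numbers).
true_var = 85.0     # variance due to the trait itself
error_var = 15.0    # variance due to measurement error
observed_var = true_var + error_var  # CTT: observed = true + error
rxx = true_var / observed_var
print(rxx)  # 0.85 -> 85% of score variance reflects the true trait
```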
What does observed-score variance include?
Observed-score variance includes measurement error.
What does it mean for a correlation to be significant at the 1% or .01 level?
It means there is no more than a 1-in-100 probability of obtaining such a correlation by chance if the population correlation were 0, supporting the conclusion that the two variables are truly correlated.
What range of reliability coefficients typically indicates strong reliability?
Coefficients in the .80s or .90s typically mean strong reliability.
What does test-retest reliability measure?
It measures the stability of test scores over time.
What is considered error in test-retest reliability?
Error consists of random fluctuations of performance from one test session to the other.
Why should the time interval between test-retest administrations be reported?
Because correlations decrease as the interval lengthens; a shorter interval (generally less than 6 months, and shorter still for children because of rapid developmental changes) is preferred for measures of stability.
When might alternate forms of a test be useful?
They are useful for evaluating improvement, when someone seeks a second opinion and a test was recently administered, or after a 'spoiled' administration of an initial test.
What is 'item sampling' or 'content sampling' in the context of alternate forms?
It refers to error variance resulting from the specific selection of items in each form; it reflects the extent to which scores depend on factors specific to that selection rather than on the trait being measured.
What are the limitations of alternate forms reliability?
If the behavior is subject to a large practice effect, alternate forms will reduce but not eliminate it. Also, the forms must be equivalent in content range, length, and difficulty.
What does split-half reliability measure?
It measures the internal consistency of the content sampling.
For what types of tests can split-half reliability be used?
It can only be used for homogeneous tests; for heterogeneous tests, it should be evaluated for each subtest.
What is the purpose of the Spearman-Brown correction in split-half reliability?
Splitting a test in half effectively shortens it, which lowers reliability; the Spearman-Brown formula corrects for this by estimating the reliability of the full-length test from the half-test correlation.
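The lecture notes do not spell out the formula, but the standard Spearman-Brown prophecy formula is r_new = n·r / (1 + (n − 1)·r); a minimal Python sketch:

```python
def spearman_brown(r_half, n=2):
    """Predict reliability when a test is lengthened by a factor of n.

    For split-half reliability, n=2: the half-test correlation is
    stepped up to the reliability of the full-length test.
    """
    return (n * r_half) / (1 + (n - 1) * r_half)

# Hypothetical half-test correlation of .70:
print(round(spearman_brown(0.70), 2))  # 0.82 for the full-length test
```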
Which statistical methods are used for split-half reliability when questions have binary (T/F, right/wrong) vs. multiple answers?
KR-20 is used for binary questions, while Cronbach's Alpha is used when questions can have multiple answers.
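A minimal sketch of Cronbach's Alpha; KR-20 is the special case of the same formula when every item is scored 0/1. The item responses below are hypothetical:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each giving all examinees' scores."""
    k = len(item_scores)  # number of items
    item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-examinee totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Three hypothetical binary items answered by four examinees
# (0/1 scoring, so alpha here equals KR-20):
items = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [1, 1, 0, 0]]
print(round(cronbach_alpha(items), 2))  # ~0.63
```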
What does interrater reliability assess?
It assesses whether evaluators or observers agree, typically used for more subjective or creative measures.
What type of data is diagnosis considered?
Diagnosis is considered nominal data.
What statistical measure is used to calculate agreement between two raters for nominal data?
Cohen's Kappa is used for nominal data to measure agreement between two raters.
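A minimal sketch of Cohen's Kappa, which compares observed agreement with the agreement expected by chance from each rater's category proportions (κ = (p_o − p_e) / (1 − p_e)); the diagnoses below are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: proportion of cases where both raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

a = ["depression", "anxiety", "anxiety", "depression", "anxiety"]
b = ["depression", "anxiety", "depression", "depression", "anxiety"]
print(round(cohens_kappa(a, b), 2))  # ~0.62
```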
What is the range of reliability coefficients?
Reliability coefficients range from 0 (no reliability) to 1 (perfect reliability).
If a test has a reliability coefficient of .85, what does it mean?
It means 85% of the variance in test scores depends on the true variance of the trait measured, and 15% depends on error variance.
List ways to increase test reliability.
Ways to increase test reliability include increasing the number of questions, improving question quality (clarity, homogeneity), decreasing the interval between administrations (cautious of practice effects), administering the test in a standardized manner, careful scoring, and ensuring test takers' cooperation.
What type of error variance is associated with 'time sampling'?
Time sampling is the source of error variance assessed by test-retest reliability.
What type of error variance is associated with 'content sampling'?
Content sampling is the source of error variance assessed by alternate-forms (immediate administration) and split-half reliability.
What is the Standard Error of Measurement (SEM)?
It is an estimate of the amount of measurement error in an individual test score.
What is the formula for calculating SEM?
SEM = SD * sqrt(1 - rxx), where SD is the standard deviation and rxx is the reliability coefficient.
If an IQ test has an SD of 15 and a reliability coefficient of .89, what is the SEM?
SEM = 15 * sqrt(1 - .89) = 15 * sqrt(.11) ≈ 15 * 0.33 ≈ 5.
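The same calculation as a runnable Python sketch:

```python
from math import sqrt

def standard_error_of_measurement(sd, rxx):
    """SEM = SD * sqrt(1 - rxx)."""
    return sd * sqrt(1 - rxx)

# IQ test from the flashcard example: SD = 15, reliability = .89
sem = standard_error_of_measurement(15, 0.89)
print(round(sem, 2))  # ~4.97, i.e. about 5 IQ points of error
```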
How does SEM compare to the reliability coefficient for interpreting individual scores versus comparing tests?
SEM is more appropriate for interpreting individual scores, while the reliability coefficient is better for comparing the reliability of different tests.
What is a confidence interval in the context of psychological testing?
It is the range of scores we are confident will contain the 'true' score.
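A minimal sketch, assuming the common convention of observed score ± z × SEM (z = 1.96 for 95% confidence); the observed score of 110 is hypothetical:

```python
from math import sqrt

def confidence_interval(observed, sd, rxx, z=1.96):
    sem = sd * sqrt(1 - rxx)  # standard error of measurement
    return observed - z * sem, observed + z * sem

# Hypothetical IQ score of 110 on the test above (SD = 15, rxx = .89):
low, high = confidence_interval(110, 15, 0.89)
print(round(low), round(high))  # ~100 to ~120, likely to contain the true score
```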
How does Generalizability Theory differ from Classical Test Theory?
Classical Test Theory focuses on random measurement error, whereas Generalizability Theory allows for the evaluation of both random and systematic error.
Summarize the primary methods for estimating test reliability, including their administration procedures and the statistical formulas used.
| Reliability Method | Test Administration | Formula / Statistical Measure |
|---|---|---|
| Test-Retest | Administer the same test to the same group at two different times. | Pearson Product-Moment Correlation |
| Alternate Forms | Administer two different but equivalent forms of a test to the same group, either immediately or after a time interval. | Pearson Product-Moment Correlation |
| Split-Half | Administer a single test once, then divide the items into two equivalent halves. | Pearson Product-Moment Correlation corrected by the Spearman-Brown formula; for internal consistency, use KR-20 (binary items) or Cronbach's Alpha (multi-point items). |
| Interrater | Two or more independent raters evaluate the same performance, behavior, or product. | Cohen's Kappa (κ) for nominal data (two raters), or other agreement coefficients. |