Flashcards covering key concepts on reliability in psychometrics, including types of reliability, error, correlation, and measurement theories, based on Week 4 lecture notes.
What is reliability in the context of psychological tests?
Reliability refers to how much we can trust the test to measure approximately the same way each time, indicating precision or consistency.
What are the two ways reliability might be used?
Reliability might be used to refer to the reliability/precision of test scores (consistency in general) or as a reliability coefficient (statistical evaluation).
Can a test be reliable but not valid?
Yes, a test can be reliable but not valid.
What are the components of an observed assessment score according to Classical Test Theory?
An observed assessment score (X) equals a 'true' score (T), which cannot be measured directly, plus measurement error (E), which can be estimated: X = T + E.
What is random error in testing?
Random error is unsystematic variation from sources such as testing room conditions, administrator error, or issues with the measurement tool, and it lowers the reliability of a test.
What is systematic error in testing?
Systematic error is when a single source always increases or decreases a score by the same amount (e.g., a scale off by 2 pounds, a proctor giving too few minutes), and it does not lower the reliability of the test.
List factors that can contribute to measurement error in a test.
Factors that can contribute to measurement error include test length, homogeneity of questions, test-retest interval, test administration, scoring, and cooperation of test takers.
When might factors typically considered error variance be classified as true variance?
If the purpose of testing is to measure fluctuations in a characteristic (e.g., mood), day-to-day changes would be part of true variance; if interested in a permanent characteristic, those fluctuations would be error variance.
What does the Pearson Product-Moment Correlation take into account?
It considers not only a person’s position in the group but also the amount of their deviation above or below the group mean.
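A minimal Python sketch of how the Pearson product-moment correlation is built from each person's deviation above or below the group mean (the paired scores below are made-up illustrations, not lecture data):

```python
from math import sqrt

def pearson_r(x, y):
    """Correlate paired scores via deviations from each group mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]  # deviation above/below the mean
    dev_y = [yi - mean_y for yi in y]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return covariation / sqrt(sum(dx**2 for dx in dev_x) *
                              sum(dy**2 for dy in dev_y))

# Hypothetical test and retest scores for five examinees:
test1 = [88, 92, 75, 83, 95]
test2 = [86, 94, 78, 80, 96]
print(round(pearson_r(test1, test2), 2))  # 0.95 -> strong reliability
```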
What is the formula for reliability (rxx) in terms of variance?
Reliability (rxx) = true-score variance (σ²t) / observed-score variance (σ²o).
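A small numeric sketch of this variance ratio, using made-up variance components chosen to echo the .85 example later in these cards:

```python
# rxx = true-score variance / observed-score variance (illustrative numbers).
true_var = 85.0     # variance due to the trait itself
error_var = 15.0    # variance due to measurement error
observed_var = true_var + error_var  # CTT: observed = true + error
rxx = true_var / observed_var
print(rxx)  # 0.85 -> 85% of score variance reflects the true trait
```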
What does observed-score variance include?
Observed-score variance includes measurement error.
What does it mean for a correlation to be significant at the 1% or .01 level?
It means there is no more than a 1-in-100 probability of obtaining such a correlation by chance if the population correlation were 0, supporting the conclusion that the two variables are truly correlated.
What range of reliability coefficients typically indicates strong reliability?
Coefficients in the .80s or .90s typically mean strong reliability.
What does test-retest reliability measure?
It measures the stability of test scores over time.
What is considered error in test-retest reliability?
Error consists of random fluctuations of performance from one test session to the other.
Why should the time interval between test-retest administrations be reported?
Because correlations decrease as the interval lengthens; a shorter interval (generally less than 6 months, and shorter still for children because of rapid developmental changes) is preferred for measures of stability.
When might alternate forms of a test be useful?
They are useful for evaluating improvement, when someone seeks a second opinion and a test was recently administered, or after a 'spoiled' administration of an initial test.
What is 'item sampling' or 'content sampling' in the context of alternate forms?
It refers to error variance resulting from the specific selection of items in each form; it reflects the extent to which scores depend on factors specific to that selection rather than on the trait being measured.
What are the limitations of alternate forms reliability?
If the behavior is subject to a large practice effect, alternate forms will reduce but not eliminate it. Also, the forms must be equivalent in content range, length, and difficulty.
What does split-half reliability measure?
It measures the internal consistency of the content sampling.
For what types of tests can split-half reliability be used?
It can only be used for homogeneous tests; for heterogeneous tests, it should be evaluated for each subtest.
What is the purpose of the Spearman-Brown correction in split-half reliability?
Splitting a test in half effectively shortens it, which lowers reliability; the Spearman-Brown formula corrects for this by estimating the reliability of the full-length test from the half-test correlation.
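The lecture notes do not spell out the formula, but the standard Spearman-Brown prophecy formula is r_new = n·r / (1 + (n − 1)·r); a minimal Python sketch:

```python
def spearman_brown(r_half, n=2):
    """Predict reliability when a test is lengthened by a factor of n.

    For split-half reliability, n=2: the half-test correlation is
    stepped up to the reliability of the full-length test.
    """
    return (n * r_half) / (1 + (n - 1) * r_half)

# Hypothetical half-test correlation of .70:
print(round(spearman_brown(0.70), 2))  # 0.82 for the full-length test
```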
Which statistical methods are used for split-half reliability when questions have binary (T/F, right/wrong) vs. multiple answers?
KR-20 is used for binary questions, while Cronbach's Alpha is used when questions can have multiple answers.
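A minimal sketch of Cronbach's Alpha; KR-20 is the special case of the same formula when every item is scored 0/1. The item responses below are hypothetical:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each giving all examinees' scores."""
    k = len(item_scores)  # number of items
    item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-examinee totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Three hypothetical binary items answered by four examinees
# (0/1 scoring, so alpha here equals KR-20):
items = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [1, 1, 0, 0]]
print(round(cronbach_alpha(items), 2))  # ~0.63
```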
What does interrater reliability assess?
It assesses whether evaluators or observers agree, typically used for more subjective or creative measures.
What type of data is diagnosis considered?
Diagnosis is considered nominal data.
What statistical measure is used to calculate agreement between two raters for nominal data?
Cohen's Kappa is used for nominal data to measure agreement between two raters.
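A minimal sketch of Cohen's Kappa, which compares observed agreement with the agreement expected by chance from each rater's category proportions (κ = (p_o − p_e) / (1 − p_e)); the diagnoses below are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: proportion of cases where both raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

a = ["depression", "anxiety", "anxiety", "depression", "anxiety"]
b = ["depression", "anxiety", "depression", "depression", "anxiety"]
print(round(cohens_kappa(a, b), 2))  # ~0.62
```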
What is the range of reliability coefficients?
Reliability coefficients range from 0 (no reliability) to 1 (perfect reliability).
If a test has a reliability coefficient of .85, what does it mean?
It means 85% of the variance in test scores depends on the true variance of the trait measured, and 15% depends on error variance.
List ways to increase test reliability.
Ways to increase test reliability include increasing the number of questions, improving question quality (clarity, homogeneity), decreasing the interval between administrations (cautious of practice effects), administering the test in a standardized manner, careful scoring, and ensuring test takers' cooperation.
What type of error variance is associated with 'time sampling'?
Time sampling is the source of error variance assessed by test-retest reliability.
What type of error variance is associated with 'content sampling'?
Content sampling is the source of error variance assessed by alternate-forms (immediate administration) and split-half reliability.
What is the Standard Error of Measurement (SEM)?
It is an estimate of the amount of measurement error in an individual test score.
What is the formula for calculating SEM?
SEM = SD * sqrt(1 - rxx), where SD is the standard deviation and rxx is the reliability coefficient.
If an IQ test has an SD of 15 and a reliability coefficient of .89, what is the SEM?
SEM = 15 * sqrt(1 - .89) = 15 * sqrt(.11) ≈ 15 * 0.33 ≈ 5.
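The same calculation as a runnable Python sketch:

```python
from math import sqrt

def standard_error_of_measurement(sd, rxx):
    """SEM = SD * sqrt(1 - rxx)."""
    return sd * sqrt(1 - rxx)

# IQ test from the flashcard example: SD = 15, reliability = .89
sem = standard_error_of_measurement(15, 0.89)
print(round(sem, 2))  # ~4.97, i.e. about 5 IQ points of error
```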
How does SEM compare to the reliability coefficient for interpreting individual scores versus comparing tests?
SEM is more appropriate for interpreting individual scores, while the reliability coefficient is better for comparing the reliability of different tests.
What is a confidence interval in the context of psychological testing?
It is the range of scores we are confident will contain the 'true' score.
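A minimal sketch, assuming the common convention of observed score ± z × SEM (z = 1.96 for 95% confidence); the observed score of 110 is hypothetical:

```python
from math import sqrt

def confidence_interval(observed, sd, rxx, z=1.96):
    sem = sd * sqrt(1 - rxx)  # standard error of measurement
    return observed - z * sem, observed + z * sem

# Hypothetical IQ score of 110 on the test above (SD = 15, rxx = .89):
low, high = confidence_interval(110, 15, 0.89)
print(round(low), round(high))  # ~100 to ~120, likely to contain the true score
```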
How does Generalizability Theory differ from Classical Test Theory?
Classical Test Theory focuses on random measurement error, whereas Generalizability Theory allows for the evaluation of both random and systematic error.
Summarize the primary methods for estimating test reliability, including their administration procedures and the statistical formulas used.
| Reliability Method | Test Administration | Formula / Statistical Measure |
|---|---|---|
| Test-Retest | Administer the same test to the same group at two different times. | Pearson Product-Moment Correlation |
| Alternate Forms | Administer two different but equivalent forms of a test to the same group, either immediately or after a time interval. | Pearson Product-Moment Correlation |
| Split-Half | Administer a single test once, then divide the items into two equivalent halves. | Pearson Product-Moment Correlation corrected by the Spearman-Brown formula; for internal consistency, use KR-20 (binary items) or Cronbach's Alpha (multi-point items). |
| Interrater | Two or more independent raters evaluate the same performance, behavior, or product. | Cohen's Kappa (κ) for nominal data (two raters), or other agreement coefficients. |