Reliability II

Inter-Item Consistency

Refers to the degree of correlation among all the items on a scale

An index of inter-item consistency is useful in assessing the homogeneity of the test
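One simple index of inter-item consistency is the mean of all pairwise correlations between items. The sketch below is only an illustration: the `ratings` data and both function names are hypothetical, and other indexes could be used instead.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_inter_item_correlation(scores):
    """Average correlation over every pair of items.

    scores: one row per testtaker, one column per item.
    """
    items = list(zip(*scores))            # transpose to one tuple per item
    pairs = list(combinations(items, 2))  # every distinct pair of items
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical data: 5 testtakers x 3 items, each rated 1-5
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]

print(round(mean_inter_item_correlation(ratings), 2))
```

A value near 1 suggests the items rise and fall together, which is what a homogeneous scale would be expected to show.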

Test Homogeneity

Tests are said to be homogeneous if they contain items that measure a single trait
In contrast to test homogeneity, heterogeneity describes the degree to which a test measures different factors, that is, more than one trait
Because a homogeneous test samples a relatively narrow content area, it is expected to show more inter-item consistency than a heterogeneous test
Test homogeneity is desirable because it allows relatively straightforward test-score interpretation
Testtakers with the same score on a homogeneous test probably have similar abilities in the area tested
Testtakers with the same score on a more heterogeneous test may have quite different abilities

Kuder-Richardson Formula KR-20

Where test items are highly homogeneous, KR-20 and split-half reliability estimates will be similar

KR-20 is the statistic of choice for determining the internal consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items)

Many modifications of KR-20 have been proposed. The most popular is coefficient alpha
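KR-20 is commonly written as (k / (k - 1)) * (1 - Σpq / σ²), where k is the number of items, p is the proportion of testtakers passing an item, q = 1 - p, and σ² is the variance of the total test scores. A minimal sketch, assuming population variance and a small hypothetical set of right/wrong responses:

```python
def kr20(responses):
    """KR-20 for a matrix of 0/1 item scores.

    responses: one row per testtaker, one column per item (1 = right, 0 = wrong).
    """
    n = len(responses)       # number of testtakers
    k = len(responses[0])    # number of items

    # p = proportion passing each item; q = 1 - p is the proportion failing
    p = [sum(col) / n for col in zip(*responses)]
    sum_pq = sum(pi * (1 - pi) for pi in p)

    # Population variance of the total test scores
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical data: 5 testtakers x 4 dichotomous items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 2))  # 0.8
```

Here Σpq = 0.8 and the total-score variance is 2, so KR-20 = (4/3) * (1 - 0.4) = 0.8.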

Coefficient Alpha

Coefficient alpha is appropriate for use on tests containing nondichotomous items
It is the preferred statistic for obtaining an estimate of internal consistency reliability
Coefficient alpha typically ranges in value from 0 to 1, helping to answer the question of how similar sets of data are
Similarity is gauged, in essence, on a scale from 0 (absolutely no similarity) to 1 (perfectly identical)
Values of alpha above .90 may be “too high” and indicate redundancy in the items
Coefficient alpha provides a measure that is loosely equivalent to the average of all possible split-half reliability coefficients
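Coefficient alpha is usually computed as (k / (k - 1)) * (1 - Σσᵢ² / σ²), where σᵢ² is the variance of item i and σ² is the variance of the total scores. The sketch below assumes population variances and uses hypothetical 1-5 ratings; the function names are illustrative only.

```python
def pvar(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def coefficient_alpha(scores):
    """Coefficient alpha for a score matrix.

    scores: one row per testtaker, one column per item (any numeric scoring).
    """
    k = len(scores[0])                                   # number of items
    item_vars = [pvar(col) for col in zip(*scores)]      # variance of each item
    total_var = pvar([sum(row) for row in scores])       # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 testtakers x 3 items, each rated 1-5
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]

print(round(coefficient_alpha(ratings), 3))
```

Because the variance of a 0/1 item equals pq, the same function applied to dichotomous items gives the KR-20 value.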

Internal Consistency and Testtaker Characteristics

All indexes of reliability, coefficient alpha among them, provide an index that is a characteristic of a particular group of test scores, not the test itself

If a new group of testtakers is sufficiently different from the group of testtakers on whom the reliability studies were done, the reliability coefficient may not be the same as the previously reported one

Inter-Scorer Reliability

The degree of agreement or consistency between two or more scorers
If the reliability coefficient is very high, the prospective test user knows that test scores can be derived in a systematic, consistent way
Coefficient of inter-scorer reliability = the degree of consistency among scorers in the scoring of a test
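One common way to express a coefficient of inter-scorer reliability is the correlation between the scores two scorers assign to the same set of responses. The sketch below assumes that approach; the two score lists are hypothetical, and other agreement indexes exist.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: two scorers independently rate the same five essays
scorer_a = [8, 6, 9, 4, 7]
scorer_b = [7, 6, 9, 5, 8]

inter_scorer_r = pearson(scorer_a, scorer_b)
print(round(inter_scorer_r, 2))
```

A coefficient close to 1 indicates that the two scorers rank and space the essays in much the same way.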

Nature of the Test – Homogeneity VS Heterogeneity

A test is said to be homogeneous in items if it is functionally uniform throughout
Tests designed to measure one factor, such as one ability or one trait, are expected to be homogeneous in items. For such tests it is reasonable to expect a high degree of internal consistency
By contrast, if the test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability

Nature of the Test – Dynamic VS Static

A dynamic characteristic is a trait, state, or ability presumed to be ever changing as a function of situational and cognitive experiences (e.g. the dynamic characteristic of anxiety)
In the case of dynamic characteristics, the best estimate of reliability could be obtained from an internal consistency measure, because real change in the characteristic between two testing sessions would be counted as error
A static characteristic is a trait, state, or ability presumed to be relatively unchanging (e.g. intelligence). In this instance, either the test-retest or alternate forms method would be appropriate

Nature of the Test – Restriction VS Inflation of Range

If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher
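The effect of restriction of range can be seen by computing the same correlation on a full sample and on a subsample that covers only the top of the score range. The data below are hypothetical and built deterministically (a linear trend plus a fixed error pattern), so this is only a sketch of the idea.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y follows x with a small, repeating error pattern
x = list(range(20))
error = [2, -2, 1, -1] * 5
y = [xi + ei for xi, ei in zip(x, error)]

# Full range of x
r_full = pearson(x, y)

# Restricted range: keep only testtakers scoring 15 or higher on x
kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= 15]
r_restricted = pearson([xi for xi, _ in kept], [yi for _, yi in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

The error pattern is the same size in both cases, but in the restricted sample it is large relative to the remaining spread in x, so the correlation drops.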

Nature of the Test – Speed VS Power Tests

When a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a power test

By contrast, a speed test generally contains items of uniform level of difficulty so that when given generous time limits, all testtakers should be able to complete all the test items correctly

The time limit on a speed test is established so that few if any of the testtakers will be able to complete the entire test

Score differences on a speed test are therefore based on performance speed because items attempted tend to be correct

Reliability of Speed Tests

A reliability estimate of a speed test should be based on performance from two independent testing periods using one of the following:
• test-retest reliability
• alternate-forms reliability
• split-half reliability from two separately timed half tests

Because the reliability of a speed test should reflect the consistency of response speed, it should not be computed from a single administration of the test with a single time limit

If a speed test is administered once and some measure of internal consistency is computed, such as a Kuder-Richardson formula or a split-half correlation, the result will be a spuriously high reliability coefficient
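The spuriously high coefficient is easy to reproduce. On a pure speed test, a testtaker who reaches n items gets roughly n/2 odd-numbered and n/2 even-numbered items right, so the two halves are almost identical by construction. A sketch with hypothetical completion counts on a 40-item test, assuming every attempted item is correct:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

TEST_LENGTH = 40

# Hypothetical data: number of items each testtaker reached before time ran out
items_completed = [10, 14, 17, 20, 25, 31]

# Attempted items are scored 1 (correct); items not reached are scored 0
rows = [[1] * n + [0] * (TEST_LENGTH - n) for n in items_completed]

# Odd-even split taken from this single, singly timed administration
odd_half = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
even_half = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...

r_halves = pearson(odd_half, even_half)
print(round(r_halves, 3))
```

The near-perfect correlation says nothing about whether the same testtakers would work at the same speed on another occasion, which is why separately timed halves or two testing periods are needed.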

Nature of the Test – Criterion Referenced Tests

Criterion-referenced tests are designed to provide an indication of where a testtaker stands with respect to some criterion, such as an educational or a vocational objective

Scores on criterion-referenced tests tend to be interpreted in pass/fail or “master/failed-to-master” terms

How different the scores are from one another is seldom a focus of interest. The critical issue for the user of a mastery test is whether or not a certain criterion score has been achieved. Therefore, traditional procedures for estimating reliability, which depend on variability among scores, are usually not appropriate