Reliability: Psychological Testing and Assessment


Last updated 12:06 AM on 5/16/26


62 Terms


Classical test theory

States that if we use X to represent an observed score, T to represent a true score, and E to represent error, then the observed score is the sum of the true score and error: X = T + E
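A minimal Python sketch of the X = T + E model under assumed conditions: true scores with mean 100 and SD 15, and independent random error with SD 5 (all numbers are hypothetical). It also computes reliability as the proportion of observed-score variance that is true variance, the proportion a test developer tries to maximize.

```python
import random
import statistics

# Hypothetical simulation: each person has a fixed true score T, and the
# administration adds independent random error E, so X = T + E.
random.seed(42)
N_PEOPLE = 5000

true_scores = [random.gauss(100, 15) for _ in range(N_PEOPLE)]   # T
errors = [random.gauss(0, 5) for _ in range(N_PEOPLE)]           # E
observed = [t + e for t, e in zip(true_scores, errors)]          # X = T + E

var_t = statistics.pvariance(true_scores)
var_x = statistics.pvariance(observed)

# Reliability as the share of observed-score variance that is true variance.
# With these assumed SDs the population value is 15**2 / (15**2 + 5**2) = 0.9.
reliability = var_t / var_x
print(f"var(T) = {var_t:.1f}, var(X) = {var_x:.1f}, reliability = {reliability:.3f}")
```

The sample estimate lands close to 0.9 but not exactly on it, because the simulated errors are a finite sample.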


Systematic error

Is less likely to affect score consistency; it is associated with the test itself and applies equally to all test takers, so it does not change the variability of the score distribution or affect reliability


Random error

Affects test performance more; it is not associated with the test itself
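A small simulation contrasting the two kinds of error, using hypothetical data: true scores with mean 50 and SD 10, a constant +5-point bias as the systematic error, and normally distributed noise with SD 8 as the random error. The `pearson` helper is written out by hand so the sketch does not depend on a particular Python version.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
true_scores = [random.gauss(50, 10) for _ in range(2000)]

# Systematic error: the same hypothetical +5-point bias for every testtaker.
with_systematic = [t + 5 for t in true_scores]

# Random error: a different, unpredictable amount for each testtaker.
with_random = [t + random.gauss(0, 8) for t in true_scores]

for label, scores in [("systematic", with_systematic), ("random", with_random)]:
    print(f"{label:>10}: variance = {statistics.pvariance(scores):7.1f}, "
          f"r with true score = {pearson(true_scores, scores):.3f}")
```

Adding the same constant to everyone leaves the variance and the correlation with the true score unchanged, while random error inflates the variance and weakens that correlation.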


Reliability

Refers to the consistency of the findings or results of the psychological research under study


Reliability

Refers to the trustworthiness of a measure, yielding the same results across multiple applications to the same sample


Test construction, test administration, test scoring and interpretation, other sources

Sources of error variance


Test construction

One source of variance during this stage is item sampling or content sampling, terms that refer to variation among items within a test as well as to variation among items between tests


Test construction

From the perspective of a test developer, a challenge in test development is to maximize the proportion of true variance from relevant differences and to minimize the proportion of error variance from irrelevant or random differences


Test administration

Sources of error variance that occur during this stage may influence the testtaker’s attention or motivation; the testtaker’s reactions to those influences are the source of error variance


Test environment, test taker variables, examiner related variables

Other sources of error under test administration are:


Test environment

The room temperature, the level of lighting, and the amount of ventilation and noise, for instance


Test taker variables

Emotional problems, physical discomfort, lack of sleep, and the effects of food, drugs, or medication


Examiner related variables

Examiner’s physical appearance and manners, the presence or absence of an examiner, head nodding, eye movements, and nonverbal gestures


Test scoring and interpretation

The advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences in many tests


Test scoring and interpretation

If subjectivity is involved in scoring, then the scorer or rater can be a source of error variance


Other sources

Females may underreport abuse because of fear, shame, or social desirability factors and overreport abuse if they are seeking help


Other sources

Males may underreport abuse because of embarrassment and social desirability factors and overreport abuse if they are attempting to justify the report


Test retest reliability, parallel and alternate forms reliability, internal consistency estimate of reliability/inter item consistency, inter scorer reliability

Types of reliability estimates


Test retest reliability

One way of estimating the reliability of a measuring instrument is by using the same instrument to measure the same thing at two points in time; it is obtained by correlating pairs of scores from the same people on two different administrations of the same test
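The correlation described here can be sketched in a few lines of Python. The scores for six people at two administrations are hypothetical, and `pearson` is a hand-written Pearson correlation.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores of the same six people on two administrations of one test.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

r_test_retest = pearson(time_1, time_2)
print(f"test retest reliability estimate: r = {r_test_retest:.3f}")  # r = 0.964
```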


Test retest reliability

Is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait; it is also appropriate for measures of reaction time and perceptual judgments


Test retest reliability

The passage of time can be a source of error variance: the longer the time that passes, the greater the likelihood that the reliability coefficient will be lower; contributing factors include practice, memory, fatigue, and motivation


Parallel and alternate forms reliability

Evaluates the degree of the relationship between various forms of a test, such as two different forms


Parallel forms

__ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal; in theory, the means of scores obtained on these forms correlate equally with the true score, and scores obtained on them correlate equally with other measures; forms A and B correlate highly, so this approach is more reliable
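Only some of these conditions are observable. The sketch below checks the observed means and variances of two forms and their correlation; the true-score conditions cannot be verified from observed scores alone. `looks_parallel` is a hypothetical helper, its tolerance of 0.5 is an arbitrary assumption, and the scores are made up.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def looks_parallel(form_a, form_b, tol=0.5):
    """Hypothetical helper: rough check that two forms have equal observed means and variances."""
    same_mean = abs(statistics.mean(form_a) - statistics.mean(form_b)) <= tol
    same_variance = abs(statistics.pvariance(form_a) - statistics.pvariance(form_b)) <= tol
    return same_mean and same_variance


# Hypothetical scores of the same five people on Form A and Form B.
form_a = [10, 14, 18, 22, 26]
form_b = [14, 10, 18, 26, 22]

print("equal means and variances:", looks_parallel(form_a, form_b))   # True
print(f"correlation between forms: r = {pearson(form_a, form_b):.2f}")  # r = 0.80
```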


Alternate forms

Are simply different versions of a test that have been constructed so as to be parallel; although they do not meet the requirements for parallel forms, they are designed to be equivalent with respect to variables such as content and level of difficulty; forms A and B correlate less strongly, but the correlation is still significant


Alternate forms

Minimizes the effect of memory for the content of a previously administered form of the test; certain traits are presumed to be relatively stable in people over time; developing alternate forms is time-consuming and expensive


Internal consistency estimate of reliability/ inter item consistency

A reliability estimate of a test can still be obtained without developing an alternate form of the test or administering the test twice to the same people; it entails an evaluation of the internal consistency of the test items


Split half reliability

Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; equivalent halves can be difficult to construct


Split half reliability

Acceptable ways to split a test include randomly assigning items to one or the other half of the test, using odd-even reliability, or dividing the test by content so that each half contains items equivalent in difficulty and content


Split half reliability

It is not recommended to simply divide the test in the middle, because the two halves may then differ in the amount of fatigue, the amount of test anxiety, and item placement in the test


Odd even reliability

Assign odd numbered items to one half of the test and even numbered items to the other half
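An odd-even split can be sketched as follows, assuming hypothetical right/wrong (1/0) responses from five testtakers on an eight-item test. The result is the correlation between the two half-test scores, which is the quantity the split half cards describe.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical responses: one row per testtaker, one column per item (items 1..8).
responses = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
]

# Items are numbered from 1, so odd-numbered items sit at even 0-based indexes.
odd_half = [sum(row[0::2]) for row in responses]    # [3, 3, 1, 4, 1]
even_half = [sum(row[1::2]) for row in responses]   # [3, 2, 1, 4, 1]

r_split_half = pearson(odd_half, even_half)
print(f"odd-even correlation: r = {r_split_half:.3f}")  # r = 0.943
```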


Inter item consistency

Refers to the degree of correlation among all the items on a scale; it is calculated from a single administration of a single form of a test and is useful in assessing the homogeneity of the test (whether it measures a single trait); the more homogeneous a test is, the more inter item consistency it can be expected to have
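One simple way to summarize "correlation among all the items" is the mean of the correlations between every pair of items. This is a sketch with hypothetical 1 to 5 ratings from six respondents on four items; averaging pairwise correlations is an illustrative choice, not the only index of inter item consistency.

```python
import itertools
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ratings: each inner list is one item's scores across six respondents.
items = [
    [4, 5, 2, 3, 1, 4],
    [4, 4, 2, 3, 2, 5],
    [5, 4, 1, 3, 1, 4],
    [3, 5, 2, 2, 1, 4],
]

# Correlate every pair of items, then average the pairwise correlations.
pair_rs = [pearson(a, b) for a, b in itertools.combinations(items, 2)]
mean_inter_item_r = statistics.mean(pair_rs)
print(f"{len(pair_rs)} item pairs, mean inter-item r = {mean_inter_item_r:.2f}")
```

A high average, as in this deliberately homogeneous example, suggests the items tap a single trait.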


Heterogeneity

Nonhomogeneity; the degree to which a test measures different factors


Kuder-Richardson formulas

G. Frederic Kuder and M. W. Richardson developed their own measures for estimating reliability, primarily for dichotomous items, as a replacement for split half reliability; no equal variances


KR 20

Where test items are highly homogeneous, __ and split half reliability estimates are similar; __ is the statistic of choice for determining the inter item consistency of dichotomous items, primarily those items that can be scored right or wrong; if the items are more heterogeneous, __ will yield lower reliability estimates
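KR 20 is usually written as r = (k / (k - 1)) * (1 - Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 - p, and σ² is the variance of total scores. A minimal sketch with hypothetical 0/1 data; it uses the population variance of total scores to match p·q, and some texts use the n - 1 version instead, which gives a slightly different value.

```python
import statistics


def kr20(responses):
    """KR 20 for 0/1 item scores (rows = testtakers, columns = items)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = statistics.pvariance(totals)   # variance of total scores
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n   # proportion answering item j right
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of five testtakers to eight right/wrong items.
responses = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
]
print(f"KR 20 = {kr20(responses):.3f}")  # KR 20 = 0.773
```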


KR 21

Is used if there is reason to assume that all the test items have approximately the same degree of difficulty


KR 21

The formula for this has become outdated in an era of calculators and computers; in the past, it was sometimes used to estimate KR 20 because it is simpler and requires fewer calculations
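KR 21 needs only the number of items k, the mean M, and the variance σ² of total scores: r = (k / (k - 1)) * (1 - M(k - M) / (kσ²)). That is why it was quicker by hand. A sketch with hypothetical totals from an eight-item right/wrong test:

```python
import statistics


def kr21(total_scores, k):
    """KR 21 from total scores alone, assuming all k items are of about equal difficulty."""
    m = statistics.mean(total_scores)
    var_total = statistics.pvariance(total_scores)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


# Hypothetical total scores of five testtakers on an 8-item test.
totals = [6, 5, 2, 8, 2]
print(f"KR 21 = {kr21(totals, k=8):.3f}")  # KR 21 = 0.732
```

When item difficulties actually differ, KR 21 comes out lower than KR 20 for the same data, so the equal-difficulty assumption matters.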


Coefficient alpha

Developed by Cronbach and subsequently elaborated on by others, may be thought of as the mean of all possible split half correlations


Coefficient alpha

In contrast to KR 20, this is appropriate for use on tests containing nondichotomous (polytomous) items, such as items with no right or wrong answers


Coefficient alpha

The preferred statistic for obtaining an estimate of internal consistency reliability; it is widely used as a measure of reliability because it requires only one administration of the test


Coefficient alpha

Typically ranges in value from 0 (no similarity) to 1 (perfectly identical); because negative values of alpha are theoretically impossible, if one is obtained under rare circumstances, alpha can be reported as zero
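Coefficient alpha is commonly computed as α = (k / (k - 1)) * (1 - Σσᵢ² / σ²), where σᵢ² are the item variances and σ² is the variance of total scores. A minimal sketch assuming hypothetical 1 to 5 ratings; `reported_alpha` is a hypothetical helper that reports a negative value as zero, as the card recommends.

```python
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha; rows are testtakers, columns are items (any numeric scoring)."""
    k = len(scores[0])
    item_variances = [statistics.pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = statistics.pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def reported_alpha(scores):
    """Report a (theoretically impossible) negative alpha as zero."""
    return max(0.0, cronbach_alpha(scores))


# Hypothetical 1-5 ratings: six respondents (rows) on four items (columns).
likert = [
    [4, 4, 5, 3],
    [5, 4, 4, 5],
    [2, 2, 1, 2],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
    [4, 5, 4, 4],
]
print(f"alpha = {reported_alpha(likert):.3f}")
```

On items scored 0/1, each item variance equals p·q, so this formula reduces to KR 20.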


Inter scorer reliability

Also referred to as judge reliability, observer reliability, and inter rater reliability; it is the degree of agreement or consistency between two or more scorers or judges with regard to a particular measure
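Two simple ways to quantify this for two scorers are exact percent agreement and the correlation between their scores. The sketch assumes hypothetical rubric scores (1 to 4) that two judges gave the same eight essays; `percent_agreement` is a hypothetical helper name.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which two scorers assign exactly the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical scores from two judges on the same eight essays.
judge_1 = [3, 4, 2, 2, 1, 3, 4, 2]
judge_2 = [3, 4, 2, 3, 1, 3, 3, 2]

print(f"exact agreement: {percent_agreement(judge_1, judge_2):.0%}")  # 75%
print(f"consistency (r): {pearson(judge_1, judge_2):.2f}")
```

Agreement counts identical scores only, while the correlation also credits judges who rank the essays similarly.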


Nature of tests

Considerations concerning the purpose and use of a reliability coefficient are those concerning the nature of the test itself


Test items are homogeneous or heterogeneous in nature, the characteristic, ability, or trait being measured is presumed to be dynamic or static, range of test scores is or is not restricted, the test is a speed or power test, the test is or is not criterion referenced

Some things we have to consider as to whether:


Homogeneous

A test is said to be this in items if it is functionally uniform throughout; tests designed to measure one factor, such as one ability or one trait, are expected to be __ in items, and internal consistency is expected to be high


Heterogeneous

If the test is __ in items, test retest reliability is expected to be high


Dynamic characteristic

A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences; its reliability is best estimated through internal consistency


Static characteristic

A trait, state, or ability presumed to be relatively unchanging, such as intelligence; its reliability is best estimated by the test retest or alternate forms method


Restriction

If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be low


Inflation

If the variance of either variable in a correlational analysis is inflated by the sampling procedure then the resulting correlation coefficient tends to be high
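A simulation of restriction, with assumed data: two standardized variables built to correlate about 0.6 in the full sample. Keeping only people in the top 20% on one variable restricts its variance, and the correlation drops. Read in reverse, moving from the restricted group to the full, wider-ranging sample is what inflation looks like.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(7)
n = 10000
x = [random.gauss(0, 1) for _ in range(n)]
# y shares variance with x; its population correlation with x is 0.6 by construction.
y = [0.6 * xi + 0.8 * random.gauss(0, 1) for xi in x]

full_r = pearson(x, y)

# Restriction: keep only people whose x score is in the top 20%.
cutoff = sorted(x)[int(0.8 * n)]
kept = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
restricted_r = pearson([p[0] for p in kept], [p[1] for p in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```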


Power test

No time limit, or one long enough to allow testtakers to attempt all items; items range from easy to so difficult that no testtaker is able to answer all of them


Speed test

Contains items of a uniform level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly


Criterion referenced test

Is designed to provide an indication of where a testtaker stands with respect to some variable or criterion; it is used frequently to gauge achievement or mastery, and scores on these tests tend to be interpreted in pass-fail terms


Classical test theory, domain sampling theory, generalizability theory, item response theory

Reliability theories


Classical test theory/true score theory

The test is the unit of analysis; it is expressed by the formula X = T + E, where the observed score X equals the true score T plus error E


True score

The ability to be measured is not always evident because it is covered by error


Confidence interval

Since we cannot measure the actual ability directly, we can estimate its __: the range within which the true score is likely to lie
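One standard classical test theory way to build such a range uses the standard error of measurement, SEM = SD * sqrt(1 - r), and takes the observed score plus or minus z SEMs (z = 1.96 for roughly 95% confidence). The sketch assumes that approach; the observed score, SD, and reliability are hypothetical, and `true_score_interval` is a hypothetical helper name.

```python
import math


def true_score_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score likely to contain the true score (SEM-based)."""
    sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
    return observed - z * sem, observed + z * sem


low, high = true_score_interval(observed=110, sd=15, reliability=0.91)
print(f"95% interval for the true score: {low:.2f} to {high:.2f}")  # 101.18 to 118.82
```

The more reliable the test, the smaller the SEM and the narrower the interval.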


Domain sampling theory

The true score is conceived of as the score on a universe (potentially infinite domain) of items; a small number of items is representative of the larger number of items; the theory controls for sampling error in the selection of items; the more items, the more reliable the test; internal consistency estimates are perhaps the most compatible with this theory
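A simulation of "more items, more reliable" under simplifying assumptions: every item is modeled as the person's true score plus independent error (SD 2, so a single item is a weak measure), and reliability is estimated with coefficient alpha. `simulate_test` is a hypothetical helper, and the sample sizes are arbitrary.

```python
import random
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha; rows are people, columns are items."""
    k = len(scores[0])
    item_variances = [statistics.pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = statistics.pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def simulate_test(n_items, n_people=2000, error_sd=2.0, seed=0):
    """Hypothetical test: each item score = person's true score + fresh random error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        true_score = rng.gauss(0, 1)
        rows.append([true_score + rng.gauss(0, error_sd) for _ in range(n_items)])
    return rows


alpha_5 = cronbach_alpha(simulate_test(5))
alpha_20 = cronbach_alpha(simulate_test(20))
print(f"5 items: alpha = {alpha_5:.2f}; 20 items: alpha = {alpha_20:.2f}")
```

Sampling more items from the same domain averages out more item-level error, so the 20-item version comes out clearly more reliable.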


Generalizability theory

Developed by Lee Cronbach; a person’s test scores vary from testing to testing because of variables in the testing situation or bias, and the model can account for more sources of error


Item response theory/latent trait theory

The test items are the unit of analysis; a test can be reliable with fewer items


Item branching

Calibrates difficulty of items depending on the testtaker’s performance
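The idea can be illustrated with a deliberately simple rule: after a correct answer, present an item one difficulty level higher; after an incorrect answer, one level lower. This staircase is an assumption for illustration. Operational adaptive tests typically choose the next item from an IRT ability estimate. `branch` and the simulated testtaker are hypothetical.

```python
def branch(respond, start_level, min_level, max_level, n_items):
    """Simple up/down branching: harder after a correct answer, easier after an error."""
    level = start_level
    history = []
    for _ in range(n_items):
        correct = respond(level)
        history.append((level, correct))
        if correct:
            level = min(level + 1, max_level)
        else:
            level = max(level - 1, min_level)
    return history


# Hypothetical testtaker who answers items up to difficulty level 4 correctly.
def respond(level):
    return level <= 4


history = branch(respond, start_level=3, min_level=1, max_level=7, n_items=6)
print(history)
# [(3, True), (4, True), (5, False), (4, True), (5, False), (4, True)]
```

The presented difficulty quickly settles around the boundary of what this testtaker can answer.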


Item difficulty in IRT

The attribute of not being easily solved or comprehended


Item discrimination in IRT

The degree to which an item differentiates among people with higher or lower levels of what is being measured
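One common way IRT formalizes these two item properties is the two-parameter logistic (2PL) model: P(correct | θ) = 1 / (1 + exp(-a(θ - b))), where b is difficulty and a is discrimination. The sketch assumes that model, omitting the 1.7 scaling constant some texts include, and the item parameters are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL item characteristic curve.

    theta: testtaker's level of the measured trait
    a: item discrimination (slope); b: item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: (label, discrimination a, difficulty b)
items = [("easy, flat", 0.5, -1.0), ("hard, steep", 2.0, 1.0)]
for label, a, b in items:
    curve = [round(p_correct(theta, a, b), 2) for theta in (-2, -1, 0, 1, 2)]
    print(f"{label:>11}: {curve}")
```

A higher b shifts the curve toward higher trait levels (harder item), and a higher a makes the curve steeper, so the item separates people just below b from people just above it more sharply.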