C5 reliability cohen

Last updated 2:06 AM on 4/23/25

114 Terms

New cards

Dependability or consistency

Reliability is a synonym for _______ or ______

New cards

Reliability

______ refers to consistency in measurement

New cards

Reliability coefficient

0-1

______ is a statistic that quantifies reliability, ranging from ____ (not at all reliable) to ____ (perfectly reliable)

New cards

Error

______ has a broader meaning, referring both to preventable mistakes and to aspects of measurement imprecision that are inevitable.

New cards

False! With extremely accurate measurement devices, fluctuations due to measurement error are still present but might be trivially small.

True or false: With extremely accurate measurement devices, fluctuations due to measurement error are no longer present, not even trivially small ones.

New cards

True

True or false: true scores can never be observed directly, they are a useful fiction that allows us to understand the concept of reliability more deeply

New cards

averaging many measurements

At best, we can approximate true scores by ______

New cards

First: time elapses between measurements

Second: the act of measurement can alter what is being estimated

Unfortunately, when measuring something repeatedly, two influences interfere with

accurate measurement. First, ________. Some psychological

variables are in constant flux, such as mood, alertness, and motivation. Thus, the true score

a moment ago might differ markedly from the true score a moment from now. Second, _______

New cards

carryover effects

Measurement processes that alter what is measured are termed ______

New cards

Practice effects

In ability tests, ________ are carryover effects in which the test itself provides an opportunity

to learn and practice the ability being measured.

New cards

Fatigue effects

_____ are carryover effects in which repeated testing reduces overall mental energy or motivation to perform on a test.

New cards

True score

If we could rewind time and measure the quantity repeatedly without carryover effects, the long-term average of those estimates would equal the _____

New cards

True

True or false: true scores can only be approximated.

New cards

standard error of measurement

The standard deviation of those repeated measurements is called the _________, which represents the typical distance from an observed score to the true score
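
The idea of a true score and a standard error of measurement can be illustrated with a small simulation. This is only a sketch: it assumes normally distributed measurement error and no carryover effects, and the true score of 100, the error SD of 5, and the function name are illustrative choices, not values from the source.

```python
import random
import statistics

def simulate_repeated_measurements(true_score, sem, n_measurements, seed=0):
    """Draw observed scores as true score plus random, normally distributed error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_score + rng.gauss(0, sem) for _ in range(n_measurements)]

# Hypothetical person: true score 100, standard error of measurement 5.
observed = simulate_repeated_measurements(true_score=100.0, sem=5.0,
                                          n_measurements=10_000)

# The long-run average of the observed scores approximates the true score.
mean_observed = statistics.fmean(observed)

# The SD of the repeated measurements approximates the standard error of measurement.
sd_observed = statistics.pstdev(observed)

print(round(mean_observed, 1), round(sd_observed, 1))
```

With many measurements the mean settles near the true score, while any single observed score typically sits about one standard error of measurement away from it.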

New cards

measurement instrument

Confusingly, the true score is not necessarily the truth. By definition, a true score is tied to the _______ used

New cards

Construct score

If you are interested in the truth independent of measurement, you are not looking for

the so-called true score, but what psychologists call the _______

New cards

Construct

______ is a theoretical variable we believe exists, such as depression, agreeableness, or reading ability

New cards

Construct score

Identical

_______ is a person's standing on a theoretical variable independent of any particular measurement.

If we could create tests that perfectly measured theoretical constructs, the true score and the construct score would be ______

New cards

Reliable, valid

_______ tests give scores that closely approximate true scores.

______ tests give scores that closely approximate construct scores

New cards

reliable but not valid.

A deeply flawed test that gives consistent measurements is ______

New cards

True score; T

The ______ is the long-term average of many measurements free of carryover effects. We will symbolize it as ___

New cards

Observed score; X

When we take a measurement, that measurement is called an _____, which will be symbolized as ____

New cards

E

This amount of measurement error will be symbolized by the letter ______

New cards

X = T + E

The observed score X is related to the true score T and the measurement error score E with this famous equation:

New cards

True score

Measurement error

We would like to be able to describe how much the observed score is influenced by the ______ and how much the observed score is determined by ______

New cards

variability of test scores

Because we cannot view the true scores or the error scores directly, we need an indirect method of estimating their influence. We can indirectly estimate how much the true score influences the observed score by measuring the ______

New cards

Variance

A statistic useful in describing sources of test score variability is the _____—the standard deviation squared

New cards

Variance

This statistic is useful because it can be broken into components. If we measured many people on a test, their scores would differ from each other in part because they have different true scores and in part because of measurement error.

New cards

True variance

Error variance

Variance from true differences is _______, and variance from irrelevant, random sources is ______

New cards

$\sigma^2 = \sigma_t^2 + \sigma_e^2$

If $\sigma^2$ represents the total observed variance, its relation to the true variance and the error variance can be expressed as _______

New cards

Reliability

refers to the proportion of the total variance attributed to true variance.
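
A short simulation can show how total variance splits into true variance and error variance, and how reliability falls out as their ratio. This is a sketch under stated assumptions: true scores and errors are drawn independently from normal distributions, and the SDs of 15 and 5 are arbitrary illustrative values. In real testing the true scores and errors are never observed directly; they are visible here only because the data are simulated.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
n_people = 20_000

# Hypothetical population: true scores T and independent random errors E.
true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
errors = [rng.gauss(0, 5) for _ in range(n_people)]

# Observed score for each person: X = T + E.
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)   # true variance
var_error = statistics.pvariance(errors)       # error variance
var_total = statistics.pvariance(observed)     # total observed variance

# Reliability: proportion of total variance attributable to true variance.
reliability = var_true / var_total

print(round(var_total, 1), round(var_true + var_error, 1), round(reliability, 3))
```

Under these assumed SDs the theoretical reliability is 225 / (225 + 25) = 0.90, and the simulated value should land close to that.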

New cards

Stable, consistent

Because true differences are assumed to be _______, they are presumed to yield ______ scores on repeated administrations of the same test as well as on equivalent forms of test

New cards

systematic or random

Measurement error can be ______ or ______ (sometimes referred to as "noise").

New cards

Random error

consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process

New cards

Random error

Sometimes referred to as "noise," this source of error fluctuates from one testing situation to another with no discernible pattern that would systematically raise or lower scores

New cards

Random errors; cancel

______ increase or decrease test scores unpredictably. On average and in the long run, these errors tend to _____ each other out

New cards

Systematic errors

do not cancel each other out because they influence test scores in a consistent direction.

they either consistently inflate scores or consistently deflate scores.

New cards

systematic error

Once a ________ becomes known,

it becomes predictable—as well as fixable.

Note that this error does not

affect score consistency

New cards

Bias

The technical term for the degree to which a

measure predictably overestimates or underestimates a quantity is _____

New cards

Bias

refers to the degree to which systematic error

influences the measurement.
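
The contrast between random and systematic error can be sketched with a simulated scale. Everything here is assumed for illustration: a true weight of 70, random error with an SD of 2, and a miscalibrated scale that always reads 3 units too high.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility
true_weight = 70.0
n_readings = 5_000

# Scale A: random error only (noise with no consistent direction).
random_only = [true_weight + rng.gauss(0, 2) for _ in range(n_readings)]

# Scale B: same noise plus a systematic error of +3 on every reading.
biased = [true_weight + 3.0 + rng.gauss(0, 2) for _ in range(n_readings)]

# Bias: how far the long-run average sits from the true value.
bias_random = statistics.fmean(random_only) - true_weight
bias_systematic = statistics.fmean(biased) - true_weight

# Consistency: spread of the readings around their own mean.
spread_random = statistics.pstdev(random_only)
spread_systematic = statistics.pstdev(biased)

print(round(bias_random, 2), round(bias_systematic, 2))
print(round(spread_random, 2), round(spread_systematic, 2))
```

The random errors average out to roughly zero bias, while the systematic error shifts every reading in the same direction. The spread of the two scales is about the same, which is why systematic error leaves score consistency untouched.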

New cards

test construction,

administration,

scoring, and/or interpretation

Sources of error variance include (4)

New cards

item sampling or content sampling,

Test construction

terms that refer to variation among items within a test as well as to variation among items between tests

It is under what source of error variance?

New cards

Test construction - item sampling or content sampling

The extent to which a testtaker's score

is affected by the content sampled on a test and by the way the content is sampled (i.e.,

the way in which the item is constructed) is a source of error variance

New cards

True variance

Error variance

From the perspective of a test creator, a challenge in test development is to maximize the proportion of the total variance that is _______ and to minimize the proportion of the total variance that is

_____.

New cards

Sampling error

The error in such research may be a result of _______—the extent to which the population of voters in the study actually was representative of voters in the election. The researchers may not have

gotten it right with respect to demographics, political party affiliation, or other factors related to

the population of voters.

New cards

Methodological error

The error in such research may be a result of sampling error—the extent to which the population of voters in the study actually was representative of voters in the election. The researchers may not have gotten it right with respect to demographics, political party affiliation, or other factors related to the population of voters. Alternatively, the researchers may have gotten such factors right but simply did not include enough people in their sample to draw the conclusions that they did.

This

situation brings us to another type of error, called _____

New cards

nonsystematic error

Potential sources of ______ error in such an

assessment situation include forgetting, failing to notice abusive behavior, and misunderstanding

instructions regarding reporting.

New cards

Systematic error

underreporting or overreporting of

perpetration of abuse also may contribute to _____

New cards

Test-retest reliability

_______ is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
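
As a rough sketch of the computation, a test-retest estimate is a Pearson r between the two sets of scores. The scores below are made up for six hypothetical people, and `pearson_r` is a helper written for this example rather than a function from any particular library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on two administrations.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

The same calculation applies to an alternate-forms estimate; only the source of the second set of scores changes (a different form instead of a second administration).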

New cards

Test-retest reliability

______ is appropriate when evaluating the reliability of a test that purports to measure something

that is relatively stable over time, such as a personality trait. If the characteristic being measured is assumed to fluctuate over time, there would be little sense in assessing the reliability of the test with this method.

New cards

Test-retest

evaluation of a _______ reliability estimate must extend to a consideration of possible intervening factors between test

administrations.

New cards

test-retest reliability

An estimate of ______ may be most appropriate in gauging the reliability of tests that employ outcome measures such as reaction time or perceptual judgments (including

discriminations of brightness, loudness, or taste).

New cards

alternate-forms or parallel-forms reliability of the test

If you have ever taken a makeup exam in which the questions were not all the same as on the

test initially given, you have had experience with different forms of a test. And if you have

ever wondered whether the two forms of the test were really equivalent, you have wondered

about the _____ or ______

New cards

coefficient of equivalence

The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the ____

New cards

True

True or false: Although frequently used interchangeably, there is a difference between the terms alternate forms and parallel forms

New cards

Parallel forms

______ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.

In theory, the means of scores obtained on parallel forms correlate equally with the true score. More practically, scores obtained on parallel tests correlate equally with other measures.

New cards

parallel forms reliability

refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal

New cards

alternate forms

are simply different versions of a test that have been constructed so as to be parallel

New cards

False: Although they do not meet the requirements for

the legitimate designation "parallel," alternate forms of a test

are typically designed to be equivalent with respect to variables such as content and level of difficulty

True or false: Since they meet the requirements for the legitimate designation "parallel," ALTERNATE forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty

New cards

Alternate forms reliability

refers to an estimate of the extent to which

these different forms of the same test have been affected by item sampling error, or other error

New cards

alternate forms

Estimating ________ reliability is straightforward: Calculate the correlation between scores from a representative sample of individuals who have taken both tests.

New cards

Test-retest

(1) Two test administrations

with the same group are required, and

(2) test scores may be affected by factors such as

motivation, fatigue, or intervening events such as practice, learning, or therapy

Obtaining estimates of alternate-forms reliability and parallel-forms reliability is similar

in two (2) ways to obtaining an estimate of _____ reliability:

New cards

internal consistency estimate of reliability or as an estimate of inter-item consistency.

An evaluation of the internal consistency of the test items is referred to as an _____ or as an _____

New cards

Split-Half Reliability Estimates

The Spearman-Brown formula

Inter-item consistency

Coefficient alpha

Kuder-Richardson 20

different methods of obtaining internal consistency estimates of reliability (5).

New cards

split-half reliability

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

It is a useful measure of reliability

when it is impractical or undesirable to assess reliability with two tests or to administer a test

twice (because of factors such as time or expense)

New cards

Step 1. Divide the test into equivalent halves.

Step 2. Calculate a Pearson r between scores on the two halves of the test.

Step 3. Adjust the half-test reliability using the Spearman-Brown formula (discussed shortly).

The computation of a coefficient of split-half

reliability generally entails three (3) steps:
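
The three steps can be sketched in code. This example assumes an odd-even split and uses a small made-up matrix of right/wrong (1/0) responses for six hypothetical test takers. The helper names are invented for illustration, and the Spearman-Brown step uses the two-halves form of the formula, 2r / (1 + r).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Rows are people, columns are items 1..6 (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Step 1: divide the test into halves (odd-numbered vs. even-numbered items).
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

# Step 2: Pearson r between scores on the two halves.
r_half = pearson_r(odd_half, even_half)

# Step 3: Spearman-Brown adjustment to estimate full-length reliability.
r_full = (2 * r_half) / (1 + r_half)

print(round(r_half, 3), round(r_full, 3))
```

The adjusted value is higher than the half-test correlation because each half is only half as long as the full test, and shorter tests tend to be less reliable.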

New cards

False! The items should be split randomly. (Own)

True or false: Dividing the test in the middle is recommended in split-half methods.

New cards

odd-even reliability

Another acceptable way to split a test is to assign odd-numbered items to one half of

the test and even-numbered items to the other half. This method yields an estimate of split-half

reliability that is also referred to as _______. Yet another way to split a test is to

divide the test by content so that each half contains items equivalent with respect to content

and difficulty

New cards

The Spearman-Brown formula

allows a test developer or user to estimate internal consistency reliability from a correlation between two halves of a test.

New cards

The Spearman-Brown formula

Because the reliability of a test is affected by its length, a formula like _____ is necessary for estimating the reliability

of a test that has been shortened or lengthened

New cards

r(xy)

In the SB formula, _____ is equal to the Pearson r in the original-length test

New cards

r(hh)

In the SB formula, _____ is equal to the Pearson r between scores on the two halves of the test

New cards

Spearman-Brown formula

Parallel tests

_______ can be used to see how the sum of many parallel tests becomes more reliable as the number of tests increases (it could also be used to determine

the number of items needed to attain a desired level of reliability). When a single test has a low reliability, many _______ must be combined to achieve high levels of reliability
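
Both uses of the formula can be sketched with the general Spearman-Brown form, r_SB = n * r_xy / (1 + (n - 1) * r_xy), where n is the factor by which the test is lengthened (or shortened, if n is below 1). The function names and the example reliabilities of 0.5 and 0.8 are assumptions made for illustration.

```python
def spearman_brown(r_xy, n):
    """Predicted reliability when a test with reliability r_xy is made n times as long."""
    return (n * r_xy) / (1 + (n - 1) * r_xy)

def length_factor_needed(r_xy, r_target):
    """Factor by which the test must be lengthened to reach r_target
    (the Spearman-Brown formula solved for n)."""
    return (r_target * (1 - r_xy)) / (r_xy * (1 - r_target))

# Doubling a test whose reliability is 0.5.
doubled = spearman_brown(0.5, 2)

# How much longer must that test be to reach a reliability of 0.8?
factor = length_factor_needed(0.5, 0.8)

# Sanity check: lengthening by that factor should reach the target.
reached = spearman_brown(0.5, factor)

print(round(doubled, 3), round(factor, 2), round(reached, 3))
```

A low starting reliability drives the required factor up quickly, which is why many parallel tests (or many more items) are needed when a single test is unreliable.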

New cards

Inter-item consistency

Kuder and Richardson (1937)

refers to the degree of correlation among all the

items on a scale. A measure of it is calculated from a single administration of a single form of a test.

developed by _____

New cards

coefficient alpha

Cronbach

r(α)

0-1

may be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula

developed by_____

Its symbol is _____

Typically ranges in value from ____-____
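
In practice coefficient alpha is usually computed from item variances rather than by averaging split-half correlations. A minimal sketch, assuming the common variance form alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), with a made-up response matrix and a hypothetical function name:

```python
import statistics

def cronbach_alpha(responses):
    """Coefficient alpha from a people-by-items matrix of item scores."""
    k = len(responses[0])  # number of items
    # Variance of each item across people.
    item_variances = [
        statistics.pvariance([row[j] for row in responses]) for j in range(k)
    ]
    # Variance of people's total scores.
    total_variance = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Rows are people, columns are six items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because these illustrative items are scored right/wrong, the same calculation matches what the Kuder-Richardson 20 formula gives for dichotomous items. With rating-scale items only the variance form above applies.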

New cards

Loadings

coefficients with the Greek letter lambda (λ). These

coefficients are called _______, and they represent the strength of the relationship between the

true score and the observed scores

New cards

inter-scorer reliability

Variously referred to as scorer reliability, judge reliability, observer reliability, and interrater

reliability, _______ is the degree of agreement or consistency between

two or more scorers (or judges or raters) with regard to a particular measure

New cards

internal consistency

For a test designed for a single administration only, an estimate of _______ would be the reliability measure of choice

New cards

transient error

a source of error attributable to variations in the testtaker's feelings, moods, or mental state over time.

New cards

1) homogeneous or heterogeneous items

2) dynamic or static traits/characteristics

3) restricted or unrestricted range

4) speed or power test

5) criterion-referenced or not

Considerations in the nature of the test (5)

New cards

homogeneous

Recall that a test is said to be _______

in items if it is functionally uniform throughout

New cards

dynamic characteristic

is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive

experiences

(Ex: anxiety)

New cards

static characteristic

is a trait, state, or ability presumed to be relatively unchanging

(Ex: intelligence)

New cards

Power test

if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a _____

New cards

Speed test

generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.

New cards

(1) test-retest reliability,

(2) alternate-forms reliability, or

(3) split-half reliability from two separately timed half tests

A reliability estimate of a speed test should be based on performance from two independent

testing periods using one of the following (3)

New cards

Criterion-referenced test

is designed to provide an indication

of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective

New cards

Criterion-referenced tests

tend to contain material that has been mastered in hierarchical fashion. For example, the would-be pilot masters on-ground skills before attempting to master in-flight skills. Scores on criterion-referenced tests tend to be interpreted in pass-fail (or, perhaps more accurately, "master-failed-to-master") terms, and any scrutiny of performance on individual items tends to be for diagnostic and

remedial purposes

New cards

Variability

A measure of reliability, therefore, depends on the ______ of the test scores: how different

the scores are from one another

New cards

Also decrease

As individual differences (and the variability) decrease, a traditional measure of reliability

would _______, regardless of the stability of individual performance.
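
The effect of reduced variability can be sketched by simulation. The setup is assumed for illustration: each person gets two parallel measurements with the same error SD of 5, and the "restricted" group keeps only people whose true scores fall between 95 and 105. Selecting on true scores is a simplification that is only possible because the data are simulated.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

rng = random.Random(2024)  # fixed seed for reproducibility
true_scores = [rng.gauss(100, 15) for _ in range(20_000)]

# Two parallel measurements per person; measurement precision is identical.
form_a = [t + rng.gauss(0, 5) for t in true_scores]
form_b = [t + rng.gauss(0, 5) for t in true_scores]

# Reliability estimate in the full, highly variable group.
full_r = pearson_r(form_a, form_b)

# Same people, same error, but only a narrow band of individual differences.
keep = [i for i, t in enumerate(true_scores) if 95 <= t <= 105]
restricted_r = pearson_r([form_a[i] for i in keep], [form_b[i] for i in keep])

print(round(full_r, 2), round(restricted_r, 2))
```

Each person is measured just as precisely in both groups, yet the correlation-based estimate drops sharply in the restricted group because there is little true variance left relative to error variance.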

New cards

classical test theory (CTT) or true score (or

classical) model of measurement

______ or _______ is the most widely used and accepted model in the psychometric literature today—rumors of its demise have been greatly exaggerated

New cards

True score

a value that according to CTT genuinely reflects

an individual's ability (or trait) level as measured by a particular test. Let's emphasize here

that this value is indeed test dependent

New cards

true

True or false: according to CTT, A person's "true score" on one intelligence test, for

example, can vary greatly from that same person's "true score" on another intelligence test.

New cards

False! It is the other way around; the assumptions of CTT are simpler.

True or false: The assumptions of CTT are more difficult to meet than those of IRT

New cards

CTT

The advantage of ______ over any other model of measurement has to do with its compatibility

and ease of use with widely used statistical techniques

New cards

Domain sampling theory (modified today: generalizability theory)

A viable alternative to CTT from the 1950s

New cards

domain sampling theory

seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.

New cards

Domain of behavior

the universe of items that could conceivably measure that behavior, can be thought of as a hypothetical construct: one that shares certain characteristics with (and is measured by) the sample of items that make up the test

New cards

Lee J. Cronbach (1970) and his colleagues

generalizability theory

universe

Proposed by _______

______ is based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation. Instead of conceiving of all variability in a person's scores as error, (blank 1) encouraged test developers and researchers to describe the details of the particular test situation or ______ leading to a specific test score

New cards

Universe score (Mp)

True score

According to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained. This test score is the ______ symbolized by ____ and it is analogous to _____ in CTT