Dependability or consistency
Reliability is a synonym for _______ or ______
Reliability
______ refers to consistency in measurement
Reliability coefficient
0-1
______ is a statistic that quantifies reliability, ranging from ____ (not at all reliable) to ____ (perfectly reliable)
Error
______ has a broader meaning, referring both to preventable mistakes and to aspects of measurement imprecision that are inevitable.
False! With extremely accurate
measurement devices, fluctuations due to measurement error are still present but might be
trivially small.
True or false: With extremely accurate
measurement devices, fluctuations due to measurement error are no longer present,
however trivially small.
True
True or false: true scores can never be observed directly, they are a useful fiction that allows us to understand the concept of reliability more deeply
averaging many measurements
At best, we can approximate true scores by ______
time elapses between measurements
the act of measurement can alter what is being estimated
Unfortunately, when measuring something repeatedly, two influences interfere with
accurate measurement. First, ________. Some psychological
variables are in constant flux, such as mood, alertness, and motivation. Thus, the true score
a moment ago might differ markedly from the true score a moment from now. Second, _______
carryover effects
Measurement processes that alter what is measured are termed ______
Practice effects
In ability tests, ________ are carryover effects in which the test itself provides an opportunity
to learn and practice the ability being measured.
Fatigue effects
_____ are carryover effects in which repeated testing reduces overall mental energy or motivation to perform on a test.
True score
If we could rewind time and measure the quantity repeatedly without carryover effects, the long-term average of those estimates would equal the _____
True
True or false: true scores can only be approximated.
standard error of measurement
The standard deviation of those repeated measurements is called the _________, which represents the typical distance from an observed score to the true score
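A minimal sketch of the two cards above, assuming one person with a fixed true score and normally distributed measurement error. The true score of 100, the error SD of 5, and the function name are illustrative assumptions, not values from the source.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_measurements, seed=0):
    """Simulate repeated measurements of one person with no carryover effects."""
    rng = random.Random(seed)
    # Each observed score is the true score plus an independent random error.
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_measurements)]


scores = simulate_observed_scores(true_score=100.0, error_sd=5.0, n_measurements=10_000)

# The long-run average of the measurements approximates the true score.
approx_true_score = statistics.fmean(scores)

# The standard deviation of the repeated measurements approximates
# the standard error of measurement.
approx_sem = statistics.stdev(scores)

print(f"approximate true score: {approx_true_score:.2f}")
print(f"approximate standard error of measurement: {approx_sem:.2f}")
```

In this simulation the average lands close to 100 and the standard deviation close to 5, which is the sense in which a true score can only be approximated.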
measurement instrument
Confusingly, the true score is not necessarily the truth. By definition, a true score is tied to the _______ used
Construct score
If you are interested in the truth independent of measurement, you are not looking for
the so-called true score, but what psychologists call the _______
Construct
______ is a theoretical variable we believe exists, such as depression, agreeableness, or reading ability
Construct score
Identical
_______ is a person's standing on a theoretical variable independent of any particular measurement.
If we could create tests that perfectly measured theoretical constructs, the true score and the construct score would be ______
Reliable, valid
_______ tests give scores that closely approximate true scores.
______ tests give scores that closely approximate construct scores
reliable but not valid.
A deeply flawed test that gives consistent measurements is ______
True score
T
the long-term average of many measurements free of carryover effects. We will symbolize it as ___
Observed score
X
When we take a measurement, that measurement is called an _____, which will be symbolized as ____
E
This amount of measurement error will be symbolized by the letter ______
X = T + E
The observed score X is related to the true score T and the measurement error score E with this famous equation:
True score
Measurement error
We would like to be able to describe how much the observed score is influenced by the ______ and how much the observed score is determined by ______
variability of test scores
Because we cannot view the true scores or the error scores directly, we need an indirect method of estimating their influence. We can indirectly estimate how much the true score influences the observed score by measuring the ______
Variance
A statistic useful in describing sources of test score variability is the _____—the standard deviation squared
Variance
This statistic is useful because it can be broken into components. If we measured many people on a test, their scores would differ from each other in part because they have different true scores and in part because of measurement error.
True variance
Error variance
Variance from true differences is _______, and variance from irrelevant, random sources is ______
$\sigma^2 = \sigma_t^2 + \sigma_e^2$
If $\sigma^2$ represents the total observed variance, its relation with the true variance and the error
variance can be expressed as _______
Reliability
refers to the proportion of the total variance attributed to true variance.
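The variance cards above can be illustrated with a small simulation. This is a sketch under assumed values (true scores with SD 15, errors with SD 5, errors independent of true scores); none of these numbers come from the source.

```python
import random
import statistics

rng = random.Random(42)
n_people = 20_000

# Assumed population: true scores (T) and independent random errors (E).
true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
errors = [rng.gauss(0, 5) for _ in range(n_people)]

# Observed score X = T + E for each person.
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_error = statistics.pvariance(errors)
var_observed = statistics.pvariance(observed)

# Reliability: proportion of total variance attributable to true variance.
reliability = var_true / var_observed

print(f"true variance     : {var_true:.1f}")
print(f"error variance    : {var_error:.1f}")
print(f"observed variance : {var_observed:.1f}")
print(f"reliability       : {reliability:.3f}")
```

With these assumed values the observed variance is approximately the sum of the two components, and the reliability comes out near 225 / (225 + 25) = 0.90.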
Stable, consistent
Because true differences are assumed to be _______, they are presumed to yield ______ scores on repeated administrations of the same test as well as on equivalent forms of tests
systematic or random
Measurement error can be ______ or ______ (sometimes referred to as "noise").
Random error
consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process
Random error
Sometimes referred to as "noise," this source of error fluctuates from one testing situation to another with no discernible pattern that would systematically raise or lower scores
Cancel
Random errors increase or decrease test scores unpredictably. On average and in the long run, these errors tend to _____ each other out
Systematic errors
do not cancel each other out because they influence test scores in a consistent direction.
they either consistently inflate scores or consistently deflate scores.
systematic error
Once a ________ becomes known,
it becomes predictable—as well as fixable.
Note that this error does not
affect score consistency
Bias
The technical term for the degree to which a
measure predictably overestimates or underestimates a quantity is _____
Bias
refers to the degree to which systematic error
influences the measurement.
test construction,
administration,
scoring, and/or interpretation
Sources of error variance include (3)
Item sampling or content sampling
Test construction
terms that refer to variation among items within a test as well as to variation among items between tests
It is under what source of error variance?
Test construction - item sampling or content sampling
The extent to which a testtaker's score
is affected by the content sampled on a test and by the way the content is sampled (i.e.,
the way in which the item is constructed) is a source of error variance
True variance
Error variance
From the perspective of a test creator, a challenge in test development is to maximize the proportion of the total variance that is _______ and to minimize the proportion of the total variance that is
_____.
Sampling error
The error in such research may be a result of _______—the extent to which the population of voters in the study actually was representative of voters in the election. The researchers may not have
gotten it right with respect to demographics, political party affiliation, or other factors related to
the population of voters.
Methodological error
The error in such research may be a result of sampling error—the extent to which the population of voters in the study actually was representative of voters in the election. The researchers may not have gotten it right with respect to demographics, political party affiliation, or other factors related to the population of voters. Alternatively, the researchers may have gotten such factors right but simply did not include enough people in their sample to draw the conclusions that they did.
This
situation brings us to another type of error, called _____
nonsystematic error
Potential sources of ______ error in such an
assessment situation include forgetting, failing to notice abusive behavior, and misunderstanding
instructions regarding reporting.
Systematic error
underreporting or overreporting of
perpetration of abuse also may contribute to _____
Test-retest reliability
_______ is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
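As a rough illustration of the card above, the sketch below correlates pairs of scores from two administrations of the same test. The six pairs of scores are made up for the example, and `pearson_r` is a hand-written helper, not a library function.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people on two administrations.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time1, time2)
print(f"test-retest reliability estimate: {test_retest_r:.3f}")
```

The same helper applies to alternate-forms reliability: replace the second administration with scores on the other form.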
Test-retest reliability
______ is appropriate when evaluating the reliability of a test that purports to measure something
that is relatively stable over time, such as a personality trait. If the characteristic being measured is assumed to fluctuate over time, there would be little sense in assessing the reliability of the test using this method
Test-retest
evaluation of a _______ reliability estimate must extend to a consideration of possible intervening factors between test
administrations.
test-retest reliability
An estimate of ______ may be most appropriate in gauging the reliability of tests that employ outcome measures such as reaction time or perceptual judgments (including
discriminations of brightness, loudness, or taste).
alternate-forms or parallel-forms reliability of the test
If you have ever taken a makeup exam in which the questions were not all the same as on the
test initially given, you have had experience with different forms of a test. And if you have
ever wondered whether the two forms of the test were really equivalent, you have wondered
about the _____ or ______
coefficient of equivalence
The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the ____
True
True or false: Although frequently used interchangeably, there is a difference between the terms alternate forms and parallel forms
Parallel forms
______ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.
In theory, the means of scores obtained on (blank 1) correlate equally with the true score. More practically, scores obtained on these tests correlate equally with other measures
parallel forms reliability
refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal
alternate forms
are simply different versions of a test that have been constructed so as to be parallel
False: Although they do not meet the requirements for
the legitimate designation "parallel," alternate forms of a test
are typically designed to be equivalent with respect to variables such as content and level of difficulty
True or false: Since they meet the requirements for the legitimate designation "parallel," ALTERNATE forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty
Alternate forms reliability
refers to an estimate of the extent to which
these different forms of the same test have been affected by item sampling error, or other error
alternate forms
Estimating ________ reliability is straightforward: Calculate the correlation between scores from a representative sample of individuals who have taken both tests.
Test-retest
(1) Two test administrations
with the same group are required, and
(2) test scores may be affected by factors such as
motivation, fatigue, or intervening events such as practice, learning, or therapy
Obtaining estimates of alternate-forms reliability and parallel-forms reliability is similar
in two (2) ways to obtaining an estimate of _____ reliability:
internal consistency estimate of reliability, estimate of inter-item consistency
An evaluation of the internal consistency of the test items is referred to as an _____ or as an _____
Split-Half Reliability Estimates
The Spearman-Brown formula
Inter-item consistency
Coefficient alpha
Kuder-Richardson 20
different methods of obtaining internal consistency estimates of reliability (5).
split-half reliability
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
It is a useful measure of reliability
when it is impractical or undesirable to assess reliability with two tests or to administer a test
twice (because of factors such as time or expense)
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula (discussed shortly).
The computation of a coefficient of split-half
reliability generally entails three (3) steps:
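The three steps can be sketched as follows. The response matrix is invented for illustration, the odd-even split is just one acceptable way to form halves, and the two-half Spearman-Brown adjustment used here is the standard form r_SB = 2 * r_hh / (1 + r_hh).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown adjusted estimate)."""
    # Step 1: divide the test into halves (odd- vs. even-numbered items).
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    # Step 2: Pearson r between scores on the two halves.
    r_hh = pearson_r(odd_half, even_half)
    # Step 3: adjust the half-test correlation to full-test length.
    r_sb = 2 * r_hh / (1 + r_hh)
    return r_hh, r_sb


# Hypothetical data: rows are testtakers, columns are six items scored 0/1.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_hh, r_sb = split_half_reliability(responses)
print(f"half-test correlation: {r_hh:.3f}")
print(f"Spearman-Brown adjusted reliability: {r_sb:.3f}")
```

The adjusted value is larger than the raw half-test correlation because each half is only half as long as the full test.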
False! Items should be assigned to the halves randomly. (Own)
True or false: Dividing the test in the middle is recommended in split-half methods
odd-even reliability
Another acceptable way to split a test is to assign odd-numbered items to one half of
the test and even-numbered items to the other half. This method yields an estimate of split-half
reliability that is also referred to as _______. Yet another way to split a test is to
divide the test by content so that each half contains items equivalent with respect to content
and difficulty
The Spearman-Brown formula
allows a test developer or user to estimate internal consistency reliability from a correlation between two halves of a test.
The Spearman-Brown formula
Because the reliability of a test is affected by its length, a formula like _____ is necessary for estimating the reliability
of a test that has been shortened or lengthened
r(xy)
In the SB formula, _____ is equal to the Pearson r in the original-length test
r(hh)
In the SB formula, _____ is equal to the Pearson r between the two halves of the test
Spearman-Brown formula
Parallel tests
_______ can be used to see how the sum of many parallel tests becomes more reliable as the number of tests increases (it could also be used to determine
the number of items needed to attain a desired level of reliability). When a single test has a low reliability, many _______ must be combined to achieve high levels of reliability
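A short sketch of both uses described above, based on the general Spearman-Brown form r_SB = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened. The function names and the example reliabilities (0.50 and 0.90) are assumptions for illustration.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is lengthened n-fold."""
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r_current, r_desired):
    """Lengthening factor n needed to move from r_current to r_desired.

    Obtained by solving the Spearman-Brown formula for n.
    """
    return r_desired * (1 - r_current) / (r_current * (1 - r_desired))


# Doubling a test whose reliability is 0.50.
doubled = spearman_brown(0.50, 2)

# How many parallel tests of reliability 0.50 must be combined to reach 0.90?
factor = length_factor_needed(0.50, 0.90)

print(f"reliability after doubling: {doubled:.3f}")
print(f"lengthening factor needed for 0.90: {factor:.1f}")
```

Under these assumed values, nine parallel tests with a reliability of 0.50 each would have to be combined to reach 0.90, which shows why a low-reliability test needs many parallel tests.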
Inter-item consistency
Kuder and Richardson (1937)
refers to the degree of correlation among all the
items on a scale. A measure of it is calculated from a single administration of a single form of a test.
developed by _____
coefficient alpha
Cronbach
r(a)
0-1
may be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
developed by _____
Its symbol is _____
Typically ranges in value from ____-____
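Coefficient alpha is usually computed from item variances rather than by averaging every possible split. The sketch below uses the standard computational formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the 5-person, 4-item rating data are invented for the example.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha for a matrix with rows = testtakers, columns = items."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across testtakers.
    item_variances = [statistics.variance(column) for column in zip(*item_scores)]
    # Variance of the total (summed) scores.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(responses)
print(f"coefficient alpha: {alpha:.3f}")
```

Because the invented items rise and fall together across people, alpha for this data set is high (about 0.96).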
Loadings
coefficients with the Greek letter lambda (λ). These
coefficients are called _______, and they represent the strength of the relationship between the
true score and the observed scores
inter-scorer reliability
Variously referred to as scorer reliability, judge reliability, observer reliability, and interrater
reliability, _______ is the degree of agreement or consistency between
two or more scorers (or judges or raters) with regard to a particular measure
internal consistency
For a test designed for a single administration only, an estimate of _______ would be the reliability measure of choice
transient error
a source of error attributable to variations in the testtaker's feelings, moods, or mental state over time.
1) homogeneous or heterogeneous items
2) dynamic or static characteristic
3) restricted or unrestricted range of scores
4) speed or power test
5) criterion-referenced or not
Considerations in the nature of the test (5)
homogeneous
Recall that a test is said to be _______
in items if it is functionally uniform throughout
dynamic characteristic
is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive
experiences
(Ex: anxiety)
static characteristic
is a trait, state, or ability presumed to be relatively unchanging
(Ex: intelligence)
Power test
if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a _____
Speed test
generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
(1) test-retest reliability,
(2) alternate-forms reliability, or
(3) split-half reliability from two separately timed half tests
A reliability estimate of a speed test should be based on performance from two independent
testing periods using one of the following (3)
Criterion-referenced test
is designed to provide an indication
of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective
Criterion-referenced tests
tend to contain material that has been mastered in hierarchical fashion. For example, the would-be pilot masters on-ground skills before attempting to master in-flight skills. Scores on criterion-referenced tests tend to be interpreted in pass-fail (or, perhaps more accurately, "master-failed-to-master") terms, and any scrutiny of performance on individual items tends to be for diagnostic and
remedial purposes
Variability
A measure of reliability, therefore, depends on the ______ of the test scores: how different
the scores are from one another
Also decrease
As individual differences (and the variability) decrease, a traditional measure of reliability
would _______, regardless of the stability of individual performance.
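The effect of reduced variability can be seen in a simulation. This is a sketch under assumed values: true scores with SD 15, two administrations with the same error SD of 5 for everyone, and a "restricted" group defined here as people whose true score falls between 95 and 105. None of these numbers come from the source.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


rng = random.Random(7)
n_people = 20_000

true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
# Two administrations: individual stability (error SD) is identical for everyone.
first = [t + rng.gauss(0, 5) for t in true_scores]
second = [t + rng.gauss(0, 5) for t in true_scores]

full_r = pearson_r(first, second)

# Keep only people who differ very little from one another.
keep = [i for i, t in enumerate(true_scores) if 95 <= t <= 105]
restricted_r = pearson_r([first[i] for i in keep], [second[i] for i in keep])

print(f"reliability estimate, full range      : {full_r:.3f}")
print(f"reliability estimate, restricted range: {restricted_r:.3f}")
```

Each person is measured just as precisely in both groups, yet the correlation-based estimate drops sharply in the restricted group because there is less true variance left to detect.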
classical test theory (CTT) or true score (or
classical) model of measurement
______ or _______ is the most widely used and accepted model in the psychometric literature today—rumors of its demise have been greatly exaggerated
True score
a value that according to CTT genuinely reflects
an individual's ability (or trait) level as measured by a particular test. Let's emphasize here
that this value is indeed test dependent
true
True or false: according to CTT, A person's "true score" on one intelligence test, for
example, can vary greatly from that same person's "true score" on another intelligence test.
False! It is the other way around; CTT is the simpler one
True or false: The assumptions of CTT are more difficult to meet than those of IRT
CTT
The advantage of ______ over any other model of measurement has to do with its compatibility
and ease of use with widely used statistical techniques
Domain sampling theory (modified today: generalizability theory)
A viable alternative to CTT, dating from the 1950s
domain sampling theory
seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Domain of behavior
the universe of items that could conceivably measure that behavior, can be thought of as a hypothetical construct: one that shares certain characteristics with (and is measured by) the sample of items that make up the test
Lee J. Cronbach (1970) and his colleagues
generalizability theory
universe
Proposed by _______
______ is based on the idea that a person's test scores vary from testing to testing because of
variables in the testing situation. Instead of conceiving of all variability in a person's scores
as error, (blank 1 person) encouraged test developers and researchers to describe the details of the particular test situation or ______ leading to a specific test score
Universe score (Mp)
True score
According to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained. This test score is the ______ symbolized by ____ and it is analogous to _____ in CTT