reliability
refers to consistency in measurement
reliability coefficient
is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
error
refers to the component of the observed test score that does not have to do with the testtaker’s ability
variance
a statistic useful in describing sources of test score variability;
the standard deviation squared
true variance
variance from true differences
error variance
variance from irrelevant, random sources
reliability
refers to the proportion of the total variance attributed to true variance
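The ratio above can be sketched numerically; the variance figures here are hypothetical.

```python
# Classical test theory: observed variance = true variance + error variance,
# and reliability is the proportion of total variance that is true variance.
# The figures below are hypothetical.
true_var = 8.0                     # assumed true score variance
error_var = 2.0                    # assumed error variance
total_var = true_var + error_var   # total observed variance
reliability = true_var / total_var
print(reliability)  # 0.8
```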
measurement error
refers to, collectively, all the factors associated with the process of measuring some variable, other than the variable being measured;
can be categorized as being either systematic or random
random error
a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
systematic error
a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
item sampling or content sampling
refers to variation among items within a test as well as to variation among items between tests
(1) test construction (2) administration (3) scoring (4) interpretation (5) sampling error (6) methodological error
what are the different sources of error variance?
test-retest reliability
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
coefficient of stability
referred to as the estimate of test-retest reliability when the interval between testing is greater than six months
coefficient of equivalence
the degree of the relationship between various forms of a test; can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability
parallel forms
these exist when the means and the variances of observed test scores are equal
parallel forms reliability
refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when the means and variances of observed test scores are equal
alternate forms
are simply different versions of a test that have been constructed so as to be parallel;
typically designed to be equivalent with respect to variables such as content and level of difficulty
alternate forms reliability
refers to an estimate of the extent to which different forms of the same test have been affected by item sampling error or other error
internal consistency estimate of reliability or estimate of inter-item consistency
evaluation of the internal consistency of the test items
split-half reliability
is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once;
it is a useful measure of reliability when it is impractical or undesirable
to assess reliability with two tests or to administer a test twice
odd-even reliability
a method of splitting a test by assigning odd-numbered items to one half of the test and even-numbered items to the other half
Spearman-Brown formula
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test;
a specific application of a more general formula to estimate the reliability of a test
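As a minimal sketch, the general Spearman-Brown formula is r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened; with n = 2 it corrects a split-half correlation up to full test length. The half-test correlation below is hypothetical.

```python
# Spearman-Brown formula (general form). The input correlation is made up.

def spearman_brown(r, n=2.0):
    """Estimate the reliability of a test n times as long as the original."""
    return n * r / (1 + (n - 1) * r)

r_half = 0.70                  # correlation between two half-tests (assumed)
full = spearman_brown(r_half)  # n = 2: correct a split-half r to full length
print(round(full, 3))          # 0.824
```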
inter-item consistency
refers to the degree of correlation among all the items on a scale
homogeneity
an index of inter-item consistency is useful in assessing the _________ of the test
homogeneity
is the degree to which a test measures a single factor;
is the extent to which items in a scale are unifactorial
heterogeneity
describes the degree to which a test measures different factors
Kuder-Richardson formula 20
used where test items are highly homogeneous;
the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong
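A minimal sketch of KR-20 for right/wrong items, using the formula (k / (k - 1)) * (1 - sum(p*q) / total variance), where p is the proportion passing each item and q = 1 - p; the response matrix below is made up.

```python
# Kuder-Richardson formula 20 for dichotomous (0/1) items. Data are hypothetical.

def kr20(item_scores):
    k = len(item_scores[0])                        # number of items
    n = len(item_scores)                           # number of testtakers
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion passing item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

scores = [  # 4 testtakers x 4 items, scored right (1) or wrong (0)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(scores), 3))  # 0.667
```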
coefficient alpha
developed by Cronbach;
the mean of all possible split-half correlations, corrected by Spearman-Brown formula
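Coefficient alpha generalizes KR-20 to items that are not scored dichotomously: alpha = (k / (k - 1)) * (1 - sum of item variances / total score variance). A sketch with hypothetical Likert-type ratings:

```python
# Cronbach's coefficient alpha. The rating matrix is hypothetical.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    k = len(item_scores[0])  # number of items
    item_vars = sum(variance([row[i] for row in item_scores]) for i in range(k))
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

ratings = [  # 4 testtakers x 3 Likert-type items
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(ratings), 3))  # 0.973
```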
average proportional distance method
a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
inter-scorer reliability
is the degree of agreement or consistency between two or more scorers with regard to a particular measure
coefficient of inter-scorer reliability
the simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation, which is the __________
dynamic characteristic
is a state, trait, or ability presumed to be ever-changing as a function of situational and cognitive experiences
static characteristic
trait, state, or ability presumed to be relatively unchanging such as intelligence
restriction of range or restriction of variance
an important issue in using and interpreting a coefficient of reliability: if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, the resulting correlation coefficient tends to be lower
inflation of range or inflation of variance
what is the opposite of restriction of range or restriction of variance?
power test
a test on which the time limit is long enough to allow testtakers to attempt all items, but some items are so difficult that no testtaker is able to obtain a perfect score
speed test
generally contains items of uniform level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly
criterion-referenced test
designed to provide an indication of where a testtaker stands with respect to some variable or criterion
classical test theory
referred to as the true score model of measurement
true score
a value that according to classical test theory genuinely reflects an individual’s ability level as measured by a particular test
domain sampling theory
proponents of this theory seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score;
a test’s reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
generalizability theory
is based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation
universe
the particular test situation
facets
the universe is described in terms of its ______, which include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration
universe score
as Cronbach noted, it is analogous to a true score in the true score model
generalizability study
examines how generalizable scores from a particular test are if the test is administered in different situations
coefficients of generalizability
the influence of particular facets on the test score is represented by this;
these coefficients are similar to reliability coefficients in the true score model
decision study
involves application of information from the generalizability study;
developers examine the usefulness of test scores in helping the test user make decisions
latent-trait theory
Because so often the psychological or educational construct being measured is physically unobservable (stated another way, is latent) and because the construct being measured may be a trait (it could also be something else, such as an ability), a synonym for IRT in the academic literature is
item-response theory
provides a way to model the probability that a person with X amount of ability will be able to perform at a level of Y
discrimination
signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured
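A hedged illustration using the two-parameter logistic (2PL) IRT model, where a is an item's discrimination and b its difficulty; the parameter values below are made up.

```python
# 2PL item response function: probability that a person of ability theta
# answers an item correctly. a = discrimination, b = difficulty (hypothetical).
import math

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

print(round(p_correct(theta=0.0, a=1.0, b=0.0), 2))  # 0.5: ability equals difficulty
print(round(p_correct(theta=1.0, a=2.0, b=0.0), 2))  # 0.88: higher a, sharper separation
```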
dichotomous test items
test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions
polytomous test items
test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct
Rasch model
is a reference to an IRT model with very specific assumptions about the underlying distribution;
each item on the test is assumed to have an equivalent relationship with the construct being measured by the test
Georg Rasch
a Danish mathematician who developed the Rasch model
standard error of measurement
is the tool used to estimate or infer the extent to which an observed score deviates from a true score
standard error of a score
the standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests
confidence interval
a range or band of test scores that is likely to contain the true score
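The standard error of measurement and a confidence interval can be sketched as SEM = sd * sqrt(1 - r), with a 95% band of the observed score plus or minus 1.96 * SEM; the test statistics below are hypothetical.

```python
# Standard error of measurement and a 95% confidence interval around an
# observed score. sd and r are assumed, not from any real test.
import math

sd = 15.0        # standard deviation of the test (assumed)
r = 0.91         # reliability coefficient (assumed)
observed = 100   # observed score

sem = sd * math.sqrt(1 - r)
lower, upper = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2))                       # 4.5
print(round(lower, 1), round(upper, 1))    # 91.2 108.8
```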
standard error of the difference
used in making comparisons between scores;
a statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
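A minimal sketch: when the two tests being compared share the same standard deviation, the standard error of the difference can be computed as sd * sqrt(2 - r1 - r2); all values below are hypothetical.

```python
# Standard error of the difference between two scores, assuming both tests
# have the same standard deviation. Reliabilities are made up.
import math

sd = 15.0
r1, r2 = 0.90, 0.85   # reliability of each test (assumed)
sed = sd * math.sqrt(2 - r1 - r2)
# A difference larger than about 1.96 * sed is significant at the .05 level:
print(round(sed, 2))         # 7.5
print(round(1.96 * sed, 2))  # 14.7
```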