Reliability coefficient
An index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
Error
the component of the observed test score that does not have to do with the testtaker’s ability
Variance
A statistic useful in describing sources of test score variability
the standard deviation squared
True variance
Variance from true differences
Error variance
Variance from irrelevant, random sources
Relationship of variances can be expressed as
$\sigma^2 = \sigma^2_{th} + \sigma^2_{e}$
Reliability
refers to the proportion of the total variance attributed to true variance
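The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming the true and error variances are already known (in practice they are estimated); the function name and the sample values are hypothetical.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Return the proportion of total variance that is true variance."""
    # Total variance is the sum of true variance and error variance.
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical values: 80 units of true variance, 20 units of error variance.
print(reliability_coefficient(80.0, 20.0))  # 0.8
```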
Measurement error
collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured
Random error
a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
Systematic error
refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Item sampling / content sampling
One source of variance during test construction
refers to variation among items within a test as well as to variation among items between tests
Test-retest method
evaluates the reliability of a measuring instrument by using the same instrument to measure the same thing at two points in time
Test-retest reliability
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
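A test-retest estimate is a correlation between two sets of scores from the same people. The sketch below assumes the Pearson correlation is the coefficient used and that the two lists are in the same testtaker order; `pearson_r` and the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    # Sum of cross-products and sums of squared deviations.
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five testtakers on two administrations.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
print(pearson_r(first_administration, second_administration))
```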
Coefficient of stability
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the ____________
Coefficient of equivalence
The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the ____________
Parallel forms
_______________ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.
Parallel forms reliability
refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal
Alternate forms
are simply different versions of a test that have been constructed so as to be parallel
Alternate forms reliability
refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error
Internal consistency estimate of reliability or estimate of inter-item consistency
Deriving this type of estimate entails an evaluation of the internal consistency of the test items.
Split-half reliability
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
Odd-even reliability
Another acceptable way to split a test is to assign odd-numbered items to one half of the test and even-numbered items to the other half. This method yields an estimate of split-half reliability that is also referred to as ____________
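The odd-even split itself can be sketched as follows. This assumes each testtaker's responses are stored as a list of item scores in item order, with item 1 first; the function name and the data are hypothetical. Correlating the two lists of half scores would then give the split-half estimate.

```python
def odd_even_half_scores(item_scores):
    """Return (odd_half_totals, even_half_totals), one total per testtaker."""
    odd_totals = []
    even_totals = []
    for row in item_scores:
        # Items are numbered from 1, so list index 0 holds item 1 (an odd item).
        odd_totals.append(sum(row[0::2]))
        even_totals.append(sum(row[1::2]))
    return odd_totals, even_totals


# Hypothetical right/wrong (1/0) responses of three testtakers to six items.
responses = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
odd_half, even_half = odd_even_half_scores(responses)
print(odd_half, even_half)  # [2, 3, 1] [2, 2, 0]
```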
Spearman-Brown Formula
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test
Can also be used to determine the number of items needed to attain a desired level of reliability
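Both uses of the Spearman-Brown formula can be sketched briefly. The general form is r_SB = n·r / (1 + (n − 1)·r), where r is the existing correlation and n is the factor by which the test is lengthened (n = 2 when correcting a half-test correlation). The function names are hypothetical, and the second function simply solves the same formula for n.

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability of a test lengthened by a factor of n."""
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r_current: float, r_desired: float) -> float:
    """Factor by which a test must be lengthened to reach r_desired."""
    if not (0 < r_current < 1 and 0 < r_desired < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return r_desired * (1 - r_current) / (r_current * (1 - r_desired))


# A half-test correlation of .50 corrected to full length (about .67).
print(spearman_brown(0.5))
# Lengthening factor needed to move from .50 to .80 (about 4).
print(length_factor_needed(0.5, 0.8))
```

Multiplying the current number of items by the returned factor gives the approximate number of comparable items needed.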
Inter-item consistency
refers to the degree of correlation among all the items on a scale.
Homogeneity
is the degree to which a test measures a single factor.
the extent to which items in a scale are unifactorial.
Heterogeneity
describes the degree to which a test measures different factors.
A _____________ (or nonhomogeneous) test is composed of items that measure more than one trait.
Kuder-Richardson Formula 20 or KR-20
is the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong
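The card does not state the formula; the usual form is KR-20 = (k / (k − 1)) · (1 − Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 − p, and σ² is the variance of total scores. The sketch below assumes items are scored 1 (right) or 0 (wrong) and uses the population variance (dividing by N); the function name and the data are hypothetical.

```python
def kr20(item_scores):
    """KR-20 for a list of per-testtaker lists of 0/1 item scores."""
    n_people = len(item_scores)
    k = len(item_scores[0])
    if n_people < 2 or k < 2:
        raise ValueError("need at least 2 testtakers and 2 items")
    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people
        sum_pq += p * (1 - p)
    # Population variance of the total test scores.
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores must vary")
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses of four testtakers to three items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses))  # 0.75
```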
Coefficient Alpha
Developed by Cronbach
may be thought of as the mean of all possible split-half correlations, corrected by the Spearman–Brown formula
appropriate for use on tests containing nondichotomous items
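The card does not give the computing formula; a common form is α = (k / (k − 1)) · (1 − Σσᵢ² / σ²), where k is the number of items, σᵢ² is the variance of each item, and σ² is the variance of total scores. The sketch below assumes that form and uses the same variance denominator (N) for items and totals, so the ratio is unaffected by that choice; the function names and the data are hypothetical.

```python
def _population_variance(values):
    """Variance with N in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of per-testtaker lists of item scores."""
    n_people = len(item_scores)
    k = len(item_scores[0])
    if n_people < 2 or k < 2:
        raise ValueError("need at least 2 testtakers and 2 items")
    # Sum of the individual item variances.
    sum_item_var = sum(
        _population_variance([row[j] for row in item_scores]) for j in range(k)
    )
    # Variance of the total scores.
    var_total = _population_variance([sum(row) for row in item_scores])
    if var_total == 0:
        raise ValueError("total scores must vary")
    return (k / (k - 1)) * (1 - sum_item_var / var_total)


# Hypothetical ratings (1 to 5) from four testtakers on three items.
ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
print(coefficient_alpha(ratings))
```

With right/wrong items scored 1 and 0, this computation gives the same value as KR-20.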
Average Proportional Distance (APD)
a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
Inter-scorer reliability
the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
is often used when coding nonverbal behavior
Coefficient of inter-scorer reliability
Perhaps the simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation. This correlation coefficient is referred to as the ____________
Dynamic Characteristics
a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
Static characteristics
a trait, state, or ability presumed to be relatively unchanging, such as intelligence
Power test
When a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a _________
Speed test
generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
Criterion-referenced test
designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective
True score
a value that according to classical test theory genuinely reflects an individual’s ability (or trait) level as measured by a particular test
Domain sampling theory
seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
Generalizability theory
is based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation
Facets
include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration
Universe score
analogous to a true score in the true score model.
Generalizability study
examines how generalizable scores from a particular test are if the test is administered in different situations
Coefficient of generalizability
The influence of particular facets on the test score is represented by ___________________
Decision study
developers examine the usefulness of test scores in helping the test user make decisions
Discrimination
signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured
Dichotomous test items
test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions
Polytomous test items
test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct
Rasch model
is a reference to an IRT model with very specific assumptions about the underlying distribution
Standard error of measurement
the tool used to estimate or infer the extent to which an observed score deviates from a true score
provides a measure of the precision of an observed test score
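The card does not state the formula; the usual one is σ_meas = σ·√(1 − r), where σ is the standard deviation of test scores and r is the reliability coefficient. A minimal sketch under that assumption, with a hypothetical function name and sample values:

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


# Hypothetical test with SD = 15 and reliability = .91 (SEM is about 4.5).
print(standard_error_of_measurement(15, 0.91))
```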
Standard error of a score
an index of the extent to which one individual’s scores vary over tests presumed to be parallel.
Confidence interval
a range or band of test scores that is likely to contain the true score.
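One common way to build such a band is observed score ± z × SEM, where z comes from the normal curve for the chosen confidence level. The sketch below assumes measurement error is normally distributed and that the SEM is already known; the function name and sample values are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(observed_score: float, sem: float, level: float = 0.95):
    """Return (lower, upper) bounds of the band around an observed score."""
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    # Two-tailed z value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return observed_score - z * sem, observed_score + z * sem


# Hypothetical observed score of 100 with an SEM of 4.5.
lower, upper = confidence_interval(100, 4.5)
print(round(lower, 2), round(upper, 2))  # 91.18 108.82
```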
Standard error of the difference
a statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
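The card does not give the formula; a common one is σ_diff = σ·√(2 − r₁ − r₂), which assumes both scores are on the same scale with the same standard deviation σ and have reliabilities r₁ and r₂. A minimal sketch under those assumptions, with a hypothetical function name and sample values:

```python
from math import sqrt


def standard_error_of_difference(sd: float, r1: float, r2: float) -> float:
    """SE of the difference between two scores on the same scale."""
    for r in (r1, r2):
        if not 0 <= r <= 1:
            raise ValueError("reliabilities must lie between 0 and 1")
    return sd * sqrt(2 - r1 - r2)


# Hypothetical tests sharing SD = 15, with reliabilities .90 and .85.
se_diff = standard_error_of_difference(15, 0.90, 0.85)
print(se_diff)
# A difference larger than about 1.96 * se_diff would be significant
# at the .05 level, assuming normally distributed measurement error.
print(1.96 * se_diff)
```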