RELIABILITY
refers to the consistency of the findings or results of
a psychological study.
RELIABILITY
refers to the trustworthiness of a measure,
yielding the same results across multiple
applications to the same sample.
TEST CONSTRUCTION
involves item sampling or content sampling, terms that refer
to variation among items within a test as well as to
variation among items between tests.
TEST ADMINISTRATION
conditions during administration may influence the testtaker’s
attention or motivation. The testtaker’s
reactions to those influences are a source of
error variance.
TEST ENVIRONMENT
the room temperature, the level
of lighting, and the
TEST TAKER VARIABLES
Ex: emotional problems,
physical discomfort, lack of sleep, and the effects of
food, drugs or any medication.
EXAMINER RELATED VARIABLES
examiner’s physical
appearance and manners, the presence or absence
of an examiner; head nodding, eye movements,
and non-verbal gestures.
TEST SCORING AND INTERPRETATION
computer-scorable items
have virtually eliminated error variance
caused by scorer differences in many tests.
OTHER SOURCES
Females may underreport abuse because of fear,
shame, or social desirability factors and
overreport abuse if they are seeking help. Males may underreport abuse because of
embarrassment and social desirability factors
and overreport abuse if they are attempting to
justify the report.
TEST RETEST RELIABILITY
one way of estimating the reliability of a measuring
instrument is by using the same instrument to
measure the same thing at two points in time. It is obtained by correlating pairs of scores from the same people on two different administrations of
the same test.
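A minimal sketch of the correlation step described above, assuming a Pearson correlation is used. The scores and the helper name `pearson_r` are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores: the same five people, same test, two administrations.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]

# The test-retest estimate is the correlation between the paired scores.
test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 3))
```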
PARALLEL AND ALTERNATE FORMS
Evaluates the degree of the relationship
between various forms of a test.
PARALLEL FORMS
exist when for each form
of the test, the means and the variances of
observed test scores are equal.
ALTERNATE FORMS
simply different versions
of a test that have been constructed so as to
be parallel.
INTERNAL CONSISTENCY ESTIMATE OF RELIABILITY
also called an estimate of inter-item consistency;
a reliability estimate of a test that can be
obtained without developing an alternate form
of the test or administering the test twice to
the same people.
SPLIT HALF RELIABILITY
obtained by correlating two pairs of
scores obtained from equivalent halves
of a single test administered once.
ODD EVEN RELIABILITY
assign odd
numbered items to one half of the
test and even-numbered items to the
other half
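As a rough illustration of the odd-even split, the sketch below scores each half separately and correlates the two sets of half scores. The 0/1 response matrix is invented, and the result is only the correlation between halves, not a full-length reliability figure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one row per testtaker, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Items 1, 3, 5 sit at list indices 0, 2, 4; items 2, 4, 6 at indices 1, 3, 5.
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

half_test_r = pearson_r(odd_totals, even_totals)
print(odd_totals, even_totals, round(half_test_r, 3))
```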
INTER ITEM CONSISTENCY
refers to the degree of correlation among all the items on a scale. It is calculated from a single administration of a single form of a test. An index of inter-item consistency is useful in
assessing the homogeneity of the test, that is, the
degree to which it measures a single trait.
G FREDERIC KUDER & M W RICHARDSON
developed their own measures for estimating
reliability primarily for dichotomous items
that replaced split-half reliability; no equal
variances.
KR 20
the statistic of choice for
determining the inter-item consistency of
dichotomous items, primarily those items that
can be scored right or wrong (such as multiple-
choice items).
KR 21
used if there is reason to assume that all
the test items have approximately the same
degree of difficulty.
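A sketch of how KR 20 and KR 21 might be computed, assuming the usual textbook formulas and the population variance of total scores (some sources divide by n - 1 instead). The response matrix and function names are hypothetical.

```python
def total_score_stats(responses):
    """Return (number of items, mean total score, population variance of totals)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / len(totals)
    var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)
    return k, mean_total, var_total

def kr20(responses):
    """KR 20: uses each item's proportion correct (p) and incorrect (q = 1 - p)."""
    k, _, var_total = total_score_stats(responses)
    n = len(responses)
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def kr21(responses):
    """KR 21: shortcut that assumes all items are about equally difficult."""
    k, mean_total, var_total = total_score_stats(responses)
    return (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))

# Hypothetical right/wrong data: one row per testtaker, one column per item.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(kr20(responses), 3), round(kr21(responses), 3))
```

In this made-up data the items differ in difficulty, so the equal-difficulty assumption does not hold and KR 21 comes out lower than KR 20.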
COEFFICIENT ALPHA
Developed by Cronbach and subsequently
elaborated on by others.
COEFFICIENT ALPHA
appropriate for use on tests containing non-
dichotomous/polytomous items (no right or
wrong answers).
COEFFICIENT ALPHA
preferred statistic for obtaining an
estimate of internal consistency reliability.
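A minimal sketch of coefficient alpha for non-dichotomous items, assuming the standard formula based on item variances and total-score variance (population variances used throughout). The ratings and the function name are invented for illustration.

```python
from statistics import pvariance

def coefficient_alpha(responses):
    """Cronbach's alpha from a matrix of item scores (rows = people)."""
    k = len(responses[0])
    items = list(zip(*responses))            # one tuple of scores per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical 1-5 ratings: four respondents, three items.
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [2, 2, 3],
    [5, 4, 5],
]
alpha = coefficient_alpha(ratings)
print(round(alpha, 3))
```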
INTER SCORER RELIABILITY
also referred to as scorer reliability, judge reliability,
observer reliability, and inter-rater reliability. It is the degree of agreement or consistency
between two or more scorers (or judges or raters)
with regard to a particular measure.
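One simple index of agreement between two scorers is the proportion of cases they score identically. The sketch below assumes categorical ratings and made-up data; a correlation between the two scorers' numeric scores is another common choice.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which two raters gave the same rating."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Hypothetical ratings of the same five responses by two scorers.
rater_a = ["pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)
```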
NATURE OF TESTS
closely related to considerations concerning the purpose and use
of a reliability coefficient are those concerning
the nature of the test itself.
HOMOGENEOUS
a test is homogeneous if it is
functionally uniform throughout; such tests are designed
to measure one factor, such as one ability or one
trait.
DYNAMIC CHARACTERISTICS
a trait, state, or
ability presumed to be ever-changing as a
function of situational and cognitive experiences;
reliability is best estimated through internal consistency.
STATIC CHARACTERISTICS
a trait, state, or
ability presumed to be relatively unchanging,
such as intelligence; reliability is best estimated by the
test-retest or the alternate-forms method.
RESTRICTION
if the variance of either variable in
a correlational analysis is restricted by the
sampling procedure used, then the resulting
correlation coefficient tends to be low.
INFLATION
if the variance of either variable in a
correlational analysis is inflated by the sampling
procedure, then the resulting correlation
coefficient tends to be high.
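The effect of restricting variance can be seen with a small invented data set: the correlation over the full range of x is fairly high, but it drops when only the top scorers on x are sampled. The numbers and names below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores: y tracks x with an alternating +1.5 / -1.5 disturbance.
x = list(range(1, 11))
y = [value + (1.5 if value % 2 == 1 else -1.5) for value in x]

full_r = pearson_r(x, y)

# Restrict the sample to people scoring 7 or higher on x.
kept = [(a, b) for a, b in zip(x, y) if a >= 7]
restricted_r = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(round(full_r, 3), round(restricted_r, 3))
```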
POWER TEST
has no time limit (or one long enough) to allow testtakers to
attempt all items; items range from easy to
difficult, and some are so difficult that no
testtaker is able to answer all of them.
SPEED TEST
contains items of a uniform level of
difficulty so that, when given a generous time
limit, all testtakers should be able to complete
all the test items correctly.
CRITERION-REFERENCED TEST
designed to
provide an indication of where a testtaker stands
with respect to some variable / criterion, used
frequently to gauge achievement or mastery.
RELIABILITY THEORIES
Classical Test Theory / True Score Theory
Domain Sampling Theory
Generalizability Theory
Item Response Theory / Latent-Trait Theory
CLASSICAL TEST THEORY
the test is the unit of analysis. It is explained by the formula X = T + E, where X is the observed score, T is the true score, and E is the error.
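A common way to connect X = T + E to reliability, assuming error is uncorrelated with the true score (a standard classical test theory assumption that the card does not state), is through variances. The symbol r_XX is introduced here only for illustration.

```latex
\sigma_X^2 = \sigma_T^2 + \sigma_E^2,
\qquad
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2}
```

Read this way, the reliability coefficient is the proportion of observed-score variance attributable to true-score variance.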
TRUE SCORE
the ability to be measured is not
always evident because it is covered by error.
CONFIDENCE INTERVAL
the range of scores within which the true score is likely to lie.
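One common way to build such a range, sketched here under assumptions the card does not state, uses the standard error of measurement (SEM = SD × sqrt(1 - reliability)) and a normal-curve multiplier. The score, SD, and reliability values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Range around an observed score; z = 1.96 gives roughly 95% confidence."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD = 15, reliability = .91, observed score = 100.
low, high = confidence_interval(observed=100, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))
```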
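TRUE SCORE
in domain sampling theory, equal to the score on the whole universe (an effectively infinite domain) of items.
DOMAIN SAMPLING THEORY
a test is a sample of items from a universe (domain); the true score is the score over that whole universe / infinity of items.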
GENERALIZABILITY THEORY
developed by Lee Cronbach; holds that a person’s test
scores vary from testing to testing because of
variables in the testing situation (bias,
judgments).
ITEM RESPONSE / LATENT TRAIT THEORY
The test items are the unit of analysis. A test can be reliable even with fewer items.
ITEM BRANCHING
calibrates difficulty of items
depending on the testtaker’s performance.
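A toy sketch of the branching idea: move to a harder item after a correct answer and an easier one after an incorrect answer. The 1 to 10 difficulty scale, the one-step rule, and the function name are assumptions for illustration, not how any particular adaptive test works.

```python
def next_difficulty(current, answered_correctly, lowest=1, highest=10):
    """Pick the next item's difficulty level from the last response."""
    step = 1 if answered_correctly else -1
    # Clamp so the level never leaves the available range of items.
    return max(lowest, min(highest, current + step))

# Hypothetical testtaker starting at level 5: right, right, wrong, right.
level = 5
for correct in [True, True, False, True]:
    level = next_difficulty(level, correct)
print(level)
```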
ITEM DIFFICULTY IN IRT
the attribute of not
being easily solved, or comprehended.
ITEM DISCRIMINATION IN IRT
the degree to
which an item differentiates among people with
higher or lower levels of what is being
measured.
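Difficulty and discrimination can be pictured with a two-parameter logistic item curve, one common IRT model (chosen here as an assumption; the cards do not name a model). The parameter names and values are illustrative.

```python
from math import exp

def prob_correct(theta, difficulty, discrimination):
    """Probability of a correct response under a two-parameter logistic model."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))

# When ability (theta) equals the item's difficulty, the probability is 0.5.
at_difficulty = prob_correct(theta=0.0, difficulty=0.0, discrimination=1.0)

# A more discriminating item separates low and high ability more sharply.
gap_low = prob_correct(1.0, 0.0, 0.5) - prob_correct(-1.0, 0.0, 0.5)
gap_high = prob_correct(1.0, 0.0, 2.0) - prob_correct(-1.0, 0.0, 2.0)
print(at_difficulty, round(gap_low, 3), round(gap_high, 3))
```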