Reliability
the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternate forms of the test, or on retesting
reliability coefficient
an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
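As a rough illustration of the ratio described above, the sketch below assumes the classical decomposition in which total variance is the sum of true variance and error variance. The function name and the sample numbers are hypothetical.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Proportion of total variance that is true-score variance.

    Assumes total variance = true variance + error variance.
    """
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical test: 80 units of true variance, 20 units of error variance.
example_reliability = reliability_coefficient(80.0, 20.0)
print(round(example_reliability, 2))
```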
Perfect reliability indicating redundancy
1.0
≥ 0.9 for clinical decisions; between 0.8 and 0.9 for general use
What is a good reliability?
True score
An individual's actual score on a variable being measured, as opposed to the score the individual obtained on the measure itself.
Carryover effects
occur when participants' experience in one condition affects their behavior in another condition of a study
Practice effects
Improvements in performance resulting from opportunities to perform a behavior repeatedly so that baseline measures can be obtained.
Test Sophistication
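An increase in scores that comes from familiarity with the test and experience in taking it, rather than from a change in the trait being measured.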
Fatigue effects
Repeated testing reduces overall mental energy or motivation to perform on a test.
Construct score
A person's standing on a theoretical variable independent of any particular measurement.
Variance
The standard deviation squared
Describing sources of test score variability
True variance
variance from true differences
Error variance
The amount of variability among the scores caused by chance or uncontrolled variables.
Measurement Error
an error that occurs when there is a difference between the information desired by the researcher and the information provided by the measurement process
Random error
Unpredictable, inconsistent fluctuation in scores caused by chance factors; it does not push scores in a consistent direction, but it lowers reliability
Systematic error
Error that shifts all measurements in a standardized way. Decreases accuracy. Can result in bias
Test environment
The conditions under which the test is administered (for example room temperature, lighting, ventilation, and noise), which can be a source of error variance.
Testtaker Variables
Personal factors affecting test performance.
Examiner-related Variables
physical appearance, demeanor, eye contact are examples of _________
Test Retest Reliability
a method for determining the reliability of a test by comparing a test taker's scores on the same test taken on separate occasions
Coefficient of Stability
An estimate of test-retest reliability obtained during time intervals of six months or longer
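A test-retest estimate is usually obtained by correlating scores from the two administrations. The sketch below assumes a Pearson correlation is the appropriate statistic; the function name and the scores are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five test takers on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```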
Parallel Forms
a method of establishing the reliability of a measurement instrument by correlating scores on two different but equivalent versions of the same instrument
Alternate Forms
if a teacher gives out multiple forms of an exam with different questions, the overall scores should be similar for each form
Immediate Form
Administered at the same time.
Delayed Form
Interval between both administrations.
Split-Half Reliability
A measure of reliability in which a test is split into two parts and an individual's scores on both halves are compared.
Spearman-Brown Formula
Used to estimate internal consistency reliability from a correlation between two halves of a test
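A minimal sketch of the Spearman-Brown correction, using its general form r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened (n = 2 when stepping up from a half-test correlation). The function and parameter names are illustrative.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate full-length reliability from a correlation between parts.

    length_factor=2 corresponds to the split-half case.
    """
    denominator = 1 + (length_factor - 1) * r_half
    if denominator == 0:
        raise ValueError("correlation makes the formula undefined")
    return length_factor * r_half / denominator


# Hypothetical half-test correlation of 0.70.
stepped_up = spearman_brown(0.70)
print(round(stepped_up, 3))
```

The corrected value is higher than the half-test correlation because, other things equal, longer tests tend to be more reliable.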
Coefficient Alpha
A measure of internal-consistency reliability that is the average of all possible split-half coefficients resulting from different splittings of the scale items
For Non-dichotomous items
Answers how similar sets of data are
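Coefficient alpha is commonly computed as α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below assumes sample variances and a small hypothetical data set of five respondents answering three rating items.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one list of item scores per respondent."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings: rows are respondents, columns are items.
ratings = [
    [3, 4, 3],
    [4, 5, 5],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```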
Kuder-Richardson Formula
Used to calculate interitem consistency when items are dichotomous (yes/no, true/false)
KR20
Dichotomous items with varying levels of difficulty
KR21
dichotomous items; all the test items have approximately the same degree of difficulty.
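The two Kuder-Richardson formulas can be compared on the same data. This sketch assumes the usual textbook forms, KR-20 = k/(k − 1) · (1 − Σpq / σ²) and KR-21 = k/(k − 1) · (1 − M(k − M) / (kσ²)), with population variance of total scores; the response matrix is hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses):
    """responses: one list of 0/1 item scores per test taker."""
    k = len(responses[0])
    n = len(responses)
    total_variance = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion passing item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


def kr21(responses):
    """Shortcut that treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    total_variance = pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * total_variance))


# Hypothetical answers: items get harder from left to right.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
kr20_value = kr20(answers)
kr21_value = kr21(answers)
print(round(kr20_value, 3), round(kr21_value, 3))
```

Because the items here differ in difficulty, KR-21 comes out lower than KR-20, which is why KR-20 is preferred in that situation.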
Average Proportional Distance (APD)
a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
interrater reliability
the amount of agreement in the observations of different raters who witness the same behavior
Kappa statistics
Agreement statistics used when ratings are nominal (categorical) data.
Cohen's Kappa
Used to measure the level of agreement between two raters or judges only
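Cohen's kappa corrects observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a small sketch with hypothetical yes/no ratings from two raters; the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten cases.
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```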
Fleiss' Kappa
Determine the level of agreement between two or more raters
Kendall's W
Used for ranking or ordinal data
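Kendall's W (coefficient of concordance) can be sketched as W = 12S / (m²(n³ − n)), where m raters each rank n objects and S is the sum of squared deviations of the rank sums from their mean. This version assumes no tied ranks; the rankings are hypothetical.

```python
def kendalls_w(rankings):
    """rankings: one list of ranks (1..n, no ties) per rater."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least 2 raters and 2 ranked objects")
    rank_sums = [sum(rater[i] for rater in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical: three judges rank four essays.
judge_ranks = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
w = kendalls_w(judge_ranks)
print(round(w, 3))
```

W ranges from 0 (no agreement) to 1 (identical rankings from every rater).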
Homogeneous Test
A test that measures only one trait or characteristic.
Heterogeneous Test
a test that measures more than one trait or characteristic
Dynamic Test
tests based on Vygotsky's theory that emphasize potential rather than past learning
Static Test
an individual is assessed at a given point in time, and the results of a test are used to determine what the person can and cannot do on his or her own
Speed Tests
large number of relatively easy items in limited test period
Power Tests
reflects the level of difficulty of items the test takers answer correctly
Criterion-Referenced Tests
Tests where the student's performance is compared to a standard or criterion. The student's score is not based on how he/she compared with other students, but rather on how the student did as measured by the criteria or standards. Criterion-referenced test will yield such scores as percentages or number of correct answers.
Classical Test Theory
Each testtaker has a true score on a test that would be obtained but for the action of measurement error.
Domain Sampling Theory
Estimate the extent to which specific sources of variation under defined conditions are contributing to the test scores.
Generalizability Theory
based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation
Item Response Theory (IRT)
a mathematical approach to choosing test items in which the probability of a positive response to an item is determined by the person's estimated position on the underlying trait being measured, as well as by characteristics of the item
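To make the idea concrete, the sketch below uses the two-parameter logistic (2PL) model, one common IRT model chosen here as an assumption: P(θ) = 1 / (1 + e^(−a(θ − b))), where θ is the person's trait level, b is item difficulty, and a is item discrimination. The parameter values are hypothetical.

```python
from math import exp


def prob_positive_response(theta: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function: probability of a positive response."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


# Hypothetical item with difficulty 0.0 and discrimination 1.2.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(prob_positive_response(theta, 1.2, 0.0), 3))
```

When the trait level equals the item difficulty, the probability is 0.5; a larger discrimination value makes the curve steeper around that point.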
The person who has ability 1 would be able to perform the ability 2
Explain IRT
Latent-Trait Theory
Another name for IRT
Item discrimination
the degree to which a test item is able to correctly differentiate test-takers who vary according to the construct measured by the test.
Polytomous Item
A test item for which more than two outcomes are possible, such as "disagree," "neutral," and "agree."
Dichotomous Item
A test item with exactly two possible outcomes, such as true/false or yes/no.
Confidence Interval
a range of values so defined that there is a specified probability that the value of a parameter lies within it.
likely to contain true scores
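One common way to build such an interval around an observed score uses the standard error of measurement, SEM = SD · √(1 − reliability), which is not defined in this set and is brought in here as an assumption. The sketch below uses hypothetical values and a z of 1.96 for roughly 95% confidence.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Band around an observed score that is likely to contain the true score."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin


# Hypothetical: observed score 100, SD 15, reliability 0.91.
lower, upper = score_confidence_interval(100, 15, 0.91)
print(round(lower, 2), round(upper, 2))
```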
can aid a test user in determining how large a difference should be before it is considered statistically significant
Standard Error of the Difference
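A minimal sketch of the standard error of the difference, assuming both scores are on the same scale with a shared standard deviation: SE_diff = SD · √(2 − r₁ − r₂), where r₁ and r₂ are the reliabilities of the two tests. The numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_difference(sd: float, reliability_1: float, reliability_2: float) -> float:
    """SE of the difference between two scores on the same scale."""
    return sd * sqrt(2 - reliability_1 - reliability_2)


# Hypothetical: two tests with SD 15 and reliability 0.92 each.
se_diff = standard_error_of_difference(15, 0.92, 0.92)
print(round(se_diff, 2))
# A difference larger than about 1.96 * se_diff would be treated as
# significant at roughly the 0.05 level.
```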
the standard deviation of the differences between predicted and observed values (the errors of prediction)
Standard Error of Estimate
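For a single predictor, the standard error of estimate is often written as SE_est = SD_y · √(1 − r²), where SD_y is the standard deviation of the criterion and r is the validity coefficient. A short sketch with hypothetical values:

```python
from math import sqrt


def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    """Typical size of prediction errors when predicting the criterion."""
    if not -1 <= validity <= 1:
        raise ValueError("validity coefficient must be between -1 and 1")
    return sd_criterion * sqrt(1 - validity ** 2)


# Hypothetical: criterion SD of 10 and validity coefficient of 0.60.
se_est = standard_error_of_estimate(10, 0.60)
print(round(se_est, 2))
```

With a validity of 0 the prediction error is as large as the criterion SD; with a validity of 1 it drops to zero.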
Validity
A judgment or estimate of how well a test measures what it is supposed to measure
≥ 0.35
What validity coefficient is considered acceptable?
Face Validity
extent to which respondents can tell what the items are measuring
Content validity
The degree to which the content of a test is representative of the domain it's supposed to cover.
Test blueprint
A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, etc.
Underrepresentation
failure to capture needed components
Overrepresentation
disproportionately higher incidence or greater presence of a characteristic than expected; may be desired to ensure inclusion of minority groups; impacts generalizability of findings as proportions do not match what would be found typically or generally
Construct Validity
The ability of a test to represent the underlying construct (the theory developed to organize and explain some aspects of existing knowledge and observations).
Irrelevant variance
Variance in test scores caused by factors unrelated to the construct being measured.
Method of Contrasted groups
Demonstrate that scores on the test vary in a predictable way as a function of membership in a group.
Divergent
Constructs are not expected to correlate
Convergent
constructs are expected to correlate
Factor Analysis
Statistical tool used to analyze interrelationships among constructs
Identify the factor/s in common between test scores on sub-scales within a particular test
Factor loading
Conveys info about the extent to which the factor determines the test score or scores.
Criterion-Related Validity
Evaluates test based on an external source
Concurrent Validity
Extent to which test scores may be used to estimate an individual's present standing on a criterion
Predictive Validity
The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior.
Validity coefficient
correlation coefficient between a test score (predictor) and a performance measure (criterion)
Incremental validity
the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
Criterion contamination
Occurs when the criterion measure includes aspects of performance that are not part of the job or when the measure is affected by construct-irrelevant factors.
Leniency Error
occurs when ratings of all employees fall at the high end of the scale
Rating Error
Intentional or unintentional misuse of the scale.
Severity Error
Rater is strict in scoring.
Central Tendency Error
Rater's rating would tend to cluster in the middle of the rating scale.
Halo effect
tendency of an interviewer to allow positive characteristics of a client to influence the assessments of the client's behavior and statements
Normative sample
a group of individuals who were given the test to identify standards of performance at specific age levels
Norm
Test performance data of a particular group of test takers that are designed for use as a reference when evaluating and interpreting individual test scores
Norming
Deriving norms
Percentile Norms
Raw data from a test's standardization sample converted to percentile form.
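One way to convert a raw score to percentile form is to find the percentage of scores in the standardization sample that fall below it. Other definitions (for example, counting half of the tied scores) exist, so treat this as one possible convention; the sample is hypothetical.

```python
def percentile_rank(score: float, normative_scores) -> float:
    """Percentage of normative-sample scores that fall below the given score."""
    if not normative_scores:
        raise ValueError("normative sample must not be empty")
    below = sum(1 for s in normative_scores if s < score)
    return 100.0 * below / len(normative_scores)


# Hypothetical standardization sample of raw scores.
norm_sample = [50, 60, 70, 80, 90]
rank_of_70 = percentile_rank(70, norm_sample)
print(rank_of_70)
```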
percentage correct
the distribution of raw scores, the number of items that were answered correctly multiplied by 100 and divided by the total number of items
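The arithmetic in this definition is short enough to sketch directly; the function name and the numbers are illustrative.

```python
def percentage_correct(items_correct: int, total_items: int) -> float:
    """Items answered correctly, multiplied by 100 and divided by total items."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return items_correct * 100 / total_items


# Hypothetical: 42 of 50 items answered correctly.
pct = percentage_correct(42, 50)
print(pct)
```

Unlike a percentile, this value describes performance on the items themselves, not standing relative to other test takers.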
Developmental Norms
Developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life
Age norms
age equivalent scores; indicate the average performance of different test takers who were at various ages at the time the test was administered
Grade norms
Indicate the average test performance of testtakers in a given school grade
National Norms
Norms derived from a standardization sample that was nationally representative of the population
National Anchor Norms
An equivalency table for scores on two nationally standardized tests designed to measure the same thing
Subgroup Norms
Normative sample can be segmented by any criteria initially used in selecting subjects for the sample.
Local Norms
provide normative information with respect to the local population's performance on some test
Expectancy Data
provide an indication of the likelihood that a test taker will score within some interval of scores on a criterion measure (for example passing, acceptable, or failing)
Taylor-Russell tables
Provide an estimate of the extent to which including a particular test in the selection system will improve selection, based on the test's validity coefficient, the selection ratio, and the base rate.
Selection ratio
Numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired.
Base rate
Percentage of current employees who are considered successful.
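Both quantities are simple proportions. The sketch below expresses them as fractions between 0 and 1 (multiply by 100 for a percentage); the function names and figures are hypothetical.

```python
def selection_ratio(positions_to_fill: int, applicants: int) -> float:
    """Number of people to be hired divided by number available to be hired."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return positions_to_fill / applicants


def base_rate(successful_employees: int, current_employees: int) -> float:
    """Proportion of current employees considered successful."""
    if current_employees <= 0:
        raise ValueError("current_employees must be positive")
    return successful_employees / current_employees


# Hypothetical: 10 openings for 100 applicants; 60 of 100 employees successful.
ratio = selection_ratio(10, 100)
rate = base_rate(60, 100)
print(ratio, rate)
```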
Naylor-Shine Tables
Entails obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures.
Brogden-Cronbach-Gleser Formula
Used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument.
Utility gain
Estimate of the benefit of using a particular test.
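A sketch of the utility-gain calculation, assuming the commonly cited textbook form of the Brogden-Cronbach-Gleser formula: utility gain = (N)(T)(r_xy)(SD_y)(Z̄_m) − (N)(C), where the first N is the number of applicants selected, T is average tenure in years, r_xy is the validity coefficient, SD_y is the standard deviation of job performance in dollars, Z̄_m is the mean standardized test score of those selected, the second N is the number of applicants tested, and C is the cost of testing per applicant. All figures below are hypothetical.

```python
def bcg_utility_gain(
    n_selected: int,
    tenure_years: float,
    validity: float,
    sd_performance_dollars: float,
    mean_z_selected: float,
    n_tested: int,
    cost_per_applicant: float,
) -> float:
    """Dollar value of the benefit minus the cost of testing."""
    benefit = (
        n_selected * tenure_years * validity
        * sd_performance_dollars * mean_z_selected
    )
    cost = n_tested * cost_per_applicant
    return benefit - cost


# Hypothetical: 10 hires, 2-year tenure, validity 0.40, SD_y $10,000,
# mean z of 1.0 among those hired, 100 applicants tested at $50 each.
gain = bcg_utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 50)
print(round(gain, 2))
```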