Fundamental vocabulary and key concepts from the 2014 edition of 'Standards for Educational and Psychological Testing' by AERA, APA, and NCME.
Test
A device or procedure in which a sample of an examinee’s behavior in a specified domain is obtained and subsequently evaluated and scored using a standardized process.
Assessment
A broad process that integrates test information with information from other sources, such as inventories, interviews, or the individual’s social and health history.
Validity
The degree to which evidence and theory support the interpretations of test scores for proposed uses of tests; the most fundamental consideration in test development.
Construct
The concept or characteristic—such as mathematics achievement, general cognitive ability, or self-esteem—that a test is designed to measure.
Construct Underrepresentation
The degree to which a test fails to capture important aspects of the intended construct, resulting in a narrowed meaning of test scores.
Construct-Irrelevant Variance
The degree to which test scores are affected by processes and factors extraneous to the test’s intended purpose, such as reading difficulty on a math test.
Reliability/Precision
The consistency of test scores across replications of a testing procedure, such as over different tasks, occasions, or raters.
Standard Error of Measurement (SEM)
An indicator of a lack of consistency in the scores generated by a testing procedure for a specific population; a relatively large SEM indicates low reliability.
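In classical test theory, the SEM can be computed from the score standard deviation and the reliability coefficient as SD × √(1 − reliability). A minimal sketch, using hypothetical values (SD of 15, reliability of 0.91):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical score scale: SD = 15, reliability = 0.91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))  # 4.5
```

A higher reliability shrinks the SEM toward zero, which matches the definition above: large SEM, low reliability.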
True Score
The hypothetical average score for a person over an infinite set of replications of the testing procedure in classical test theory.
Differential Item Functioning (DIF)
Occurs when different groups of test takers with similar overall ability have, on average, systematically different probabilities of responding correctly to a specific item.
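The key idea in DIF is conditioning on ability before comparing groups. A toy sketch with entirely hypothetical response records: among examinees matched on the same total score, the item's correct-response rate is compared across groups.

```python
from collections import defaultdict

# Hypothetical (group, total_score, item_correct) records
responses = [
    ("A", 20, 1), ("A", 20, 1), ("A", 20, 0),
    ("B", 20, 1), ("B", 20, 0), ("B", 20, 0),
]

rates = defaultdict(lambda: [0, 0])  # group -> [correct count, n]
for group, total, correct in responses:
    if total == 20:  # condition on matched overall ability
        rates[group][0] += correct
        rates[group][1] += 1

for group, (correct, n) in sorted(rates.items()):
    print(group, round(correct / n, 2))  # A 0.67, B 0.33
```

With real data, operational DIF methods (e.g., Mantel-Haenszel) aggregate such comparisons across many matched score levels.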
Validity Generalization
The degree to which validity evidence based on test-criterion relations can be applied to a new situation without further study, often estimated via meta-analysis.
Fairness
Responsiveness to individual characteristics and testing contexts so that test scores yield valid interpretations for intended uses for all individuals and subgroups.
Accessibility
The notion that all examinees should have an unobstructed opportunity to demonstrate their standing on the construct being measured, regardless of irrelevant characteristics.
Universal Design
An approach to test development that seeks to maximize accessibility by minimizing construct-irrelevant features from the outset of the design process.
Accommodation
A relatively minor change to test presentation, format, administration, or response procedures that maintains the original construct and results in comparable scores.
Modification
A change in test content or administration that affects the construct measured by the test, leading to scores that differ in meaning from the original test.
Test Specifications
Broad documentation including the test purpose, intended uses, content, format, length, psychometric characteristics, administration, and scoring rules.
Scoring Rubric
Detailed rules for evaluating performance on extended-response items that specify the criteria for each score level.
Equating
A statistical process for relating scores from alternate forms of a test so that the scale scores can be used interchangeably.
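One simple equating method is linear equating, which maps Form X scores onto the Form Y scale by matching the two forms' means and standard deviations. A minimal sketch with hypothetical form statistics:

```python
def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Place a Form X score on the Form Y scale by matching means and SDs."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Hypothetical forms: X has mean 55, SD 10; Y has mean 58, SD 12
print(linear_equate(60, mean_x=55, sd_x=10, mean_y=58, sd_y=12))  # 64.0
```

Operational equating designs (e.g., equipercentile methods with anchor items) are more elaborate, but the goal is the same: scores from alternate forms become interchangeable.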
Vertical Scaling
Methods used to place scores from different levels of a test on a single scale to facilitate inferences about growth or development over time.
Cut Scores
Specific values that divide the score range into categories, such as 'pass/fail' or 'basic/proficient/advanced,' to aid in classification.
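Applying cut scores is just a lookup: each score falls into the category bounded by the nearest cuts. A minimal sketch with hypothetical cut values and labels (the "below basic" label and the cut points 40/60/80 are illustrative, not from the Standards):

```python
import bisect

CUTS = (40, 60, 80)  # hypothetical cut scores
LABELS = ("below basic", "basic", "proficient", "advanced")

def classify(score):
    """Map a score into a performance category via its position among the cuts."""
    return LABELS[bisect.bisect_right(CUTS, score)]

print(classify(59))  # basic
print(classify(60))  # proficient
```

Note the boundary convention: `bisect_right` places a score equal to a cut into the higher category; the opposite convention is equally defensible and should be stated explicitly in scoring documentation.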
Informed Consent
Agreement from test takers that they understand the reasons for testing, the types of tests used, and the likely consequences of their test results.
Job Analysis
The collection of information about job duties, tasks, and responsibilities used as the basis for defining the content domain for employment testing.
Credentialing
A generic term for licensure and certification processes that identify practitioners who have met standards of competence in an occupation.
Secondary Data Analysis
The analysis of data that was previously collected for a different purpose than the current evaluation or policy study.
Accountability Index
A number or label created by combining scores and other information (such as graduation rates) to inform performance-based rewards or sanctions for institutions.
Matrix Sampling
A procedure in which multiple short test forms are assigned to different subsamples of test takers to represent a broad domain without requiring any single person to take the full test.
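The assignment logic behind matrix sampling can be sketched in a few lines: rotate short forms across examinees so the full item pool is covered by the sample even though no one sees every item. Names and counts here are hypothetical.

```python
def assign_forms(examinee_ids, n_forms):
    """Cyclically assign each examinee one of n_forms short test forms."""
    return {eid: i % n_forms for i, eid in enumerate(examinee_ids)}

assignment = assign_forms(["s1", "s2", "s3", "s4", "s5"], n_forms=3)
print(assignment)  # {'s1': 0, 's2': 1, 's3': 2, 's4': 0, 's5': 1}
```

Because every form is taken by some subsample, domain-level estimates can be aggregated across forms, while individual-level scores are generally not comparable across examinees who took different forms.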
Opportunity to Learn
The extent to which examinees have been exposed to the instruction or knowledge required for the content and skills targeted by a test.