Standards for Educational and Psychological Testing

Description and Tags

Fundamental vocabulary and key concepts from the 2014 edition of 'Standards for Educational and Psychological Testing' by AERA, APA, and NCME.

Last updated 9:27 PM on 5/1/26

28 Terms

1. Test

A device or procedure in which a sample of an examinee’s behavior in a specified domain is obtained and subsequently evaluated and scored using a standardized process.

2. Assessment

A broad process that integrates test information with information from other sources, such as inventories, interviews, or the individual’s social and health history.

3. Validity

The degree to which evidence and theory support the interpretations of test scores for proposed uses of tests; the most fundamental consideration in test development.

4. Construct

The concept or characteristic—such as mathematics achievement, general cognitive ability, or self-esteem—that a test is designed to measure.

5. Construct Underrepresentation

The degree to which a test fails to capture important aspects of the intended construct, resulting in a narrowed meaning of test scores.

6. Construct-Irrelevant Variance

The degree to which test scores are affected by processes and factors extraneous to the test’s intended purpose, such as reading difficulty on a math test.

7. Reliability/Precision

The consistency of test scores across replications of a testing procedure, such as over different tasks, occasions, or raters.

8. Standard Error of Measurement (SEM)

An indicator of a lack of consistency in the scores generated by a testing procedure for a specific population; a relatively large SEM indicates low reliability.
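
In classical test theory the SEM is tied to reliability by the identity SEM = SD × √(1 − reliability). A minimal Python sketch (the scale values are illustrative, not from the Standards):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from classical test theory:
    SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Example: a scale with SD = 15 and reliability .91
print(round(sem(15, 0.91), 1))  # → 4.5
```

Note how perfect reliability (1.0) yields an SEM of zero, while lower reliability inflates the SEM, consistent with "a relatively large SEM indicates low reliability."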

9. True Score

The hypothetical average score for a person over an infinite set of replications of the testing procedure in classical test theory.
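
The "infinite set of replications" idea can be illustrated by simulating the classical model X = T + E, where each observed score is the true score plus random error; the numbers below are hypothetical:

```python
import random

random.seed(0)
TRUE_SCORE = 50.0  # hypothetical examinee true score (T)
ERROR_SD = 4.0     # hypothetical error standard deviation

# Observed score = true score + random error (X = T + E);
# averaging many replications recovers T.
observed = [TRUE_SCORE + random.gauss(0, ERROR_SD) for _ in range(100_000)]
mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))
```

The mean of the simulated observed scores converges on the true score as the number of replications grows.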

10. Differential Item Functioning (DIF)

Occurs when different groups of test takers with similar overall ability have, on average, systematically different probabilities of responding correctly to a specific item.

11. Validity Generalization

The degree to which validity evidence based on test-criterion relations can be applied to a new situation without further study, often estimated via meta-analysis.

12. Fairness

Responsiveness to individual characteristics and testing contexts so that test scores yield valid interpretations for intended uses for all individuals and subgroups.

13. Accessibility

The notion that all examinees should have an unobstructed opportunity to demonstrate their standing on the construct being measured, regardless of irrelevant characteristics.

14. Universal Design

An approach to test development that seeks to maximize accessibility by minimizing construct-irrelevant features from the outset of the design process.

15. Accommodation

A relatively minor change to test presentation, format, administration, or response procedures that maintains the original construct and results in comparable scores.

16. Modification

A change in test content or administration that affects the construct measured by the test, leading to scores that differ in meaning from the original test.

17. Test Specifications

Broad documentation including the test purpose, intended uses, content, format, length, psychometric characteristics, administration, and scoring rules.

18. Scoring Rubric

Detailed rules for evaluating performance on extended-response items that specify the criteria for each score level.

19. Equating

A statistical process for relating scores from alternate forms of a test so that the scale scores can be used interchangeably.
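
One simple member of this family is linear equating, which matches the mean and standard deviation of an alternate form to the reference form; a sketch with made-up score samples:

```python
from statistics import mean, stdev

def linear_equate(y, form_y_scores, form_x_scores):
    """Map a Form Y raw score onto the Form X scale by matching
    the two forms' means and standard deviations (linear equating)."""
    my, sy = mean(form_y_scores), stdev(form_y_scores)
    mx, sx = mean(form_x_scores), stdev(form_x_scores)
    return mx + (sx / sy) * (y - my)

form_x = [40, 45, 50, 55, 60]  # reference form (hypothetical scores)
form_y = [38, 43, 48, 53, 58]  # slightly harder alternate form

# A score at the Form Y mean maps onto the Form X mean.
print(linear_equate(48, form_y, form_x))  # → 50.0
```

Operational equating designs (anchor items, equipercentile methods) are more elaborate, but the goal is the same: scale scores from either form can be used interchangeably.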

20. Vertical Scaling

Methods used to place scores from different levels of a test on a single scale to facilitate inferences about growth or development over time.

21. Cut Scores

Specific values that divide the score range into categories, such as 'pass/fail' or 'basic/proficient/advanced,' to aid in classification.
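
Classification against cut scores is a simple lookup; a sketch assuming a hypothetical 0-100 scale where a score at a cut falls into the higher category:

```python
from bisect import bisect_right

CUT_SCORES = [40, 60, 80]  # hypothetical cuts on a 0-100 scale
LABELS = ["below basic", "basic", "proficient", "advanced"]

def classify(score: float) -> str:
    """Count how many cut scores the examinee met or exceeded;
    that count indexes the performance category."""
    return LABELS[bisect_right(CUT_SCORES, score)]

print(classify(72))  # → proficient
```

Setting the cut values themselves is a standard-setting judgment (e.g., Angoff or bookmark procedures); the code only applies cuts that have already been chosen.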

22. Informed Consent

Agreement from test takers that they understand the reasons for testing, the types of tests used, and the likely consequences of their test results.

23. Job Analysis

The collection of information about job duties, tasks, and responsibilities used as the basis for defining the content domain for employment testing.

24. Credentialing

A generic term for licensure and certification processes that identify practitioners who have met standards of competence in an occupation.

25. Secondary Data Analysis

The analysis of existing data originally collected for a purpose other than the current evaluation or policy study.

26. Accountability Index

A number or label created by combining scores and other information (such as graduation rates) to inform performance-based rewards or sanctions for institutions.

27. Matrix Sampling

A procedure in which multiple short test forms are assigned to different subsamples of test takers to represent a broad domain without requiring any single person to take the full test.
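
The logic can be sketched by partitioning an item pool into short forms and rotating forms across examinees; the pool size and form count below are invented for illustration:

```python
import random

ITEM_POOL = list(range(1, 61))  # 60 items spanning the full domain
N_FORMS = 6                     # six short forms of 10 items each

# Shuffle, then deal items round-robin into short forms so every
# item appears on exactly one form.
random.seed(1)
random.shuffle(ITEM_POOL)
forms = [ITEM_POOL[i::N_FORMS] for i in range(N_FORMS)]

def assign_form(examinee_index: int) -> list[int]:
    """Spiral assignment: examinee k receives form k mod N_FORMS."""
    return forms[examinee_index % N_FORMS]
```

Each examinee answers only 10 items, yet across the subsamples every one of the 60 items is administered, supporting domain-level (not individual-level) score estimates.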

28. Opportunity to Learn

The extent to which examinees have been exposed to the instruction or knowledge required for the content and skills targeted by a test.