A. Psychometric Properties and Principles

Last updated 5:55 PM on 6/2/26

New cards

Reliability

the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternate forms of the test, or on retesting

New cards

reliability coefficient

an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
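Written out in classical test theory notation, the ratio is shown below. The symbols are a conventional choice, not taken from this deck: σ²_T is true variance, σ²_E is error variance, and σ²_X is total observed variance. For example, with a true variance of 16 and an error variance of 4, the coefficient is 16 / 20 = 0.80.

```latex
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```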

New cards

Perfect reliability indicating redundancy

1.0

New cards

≥ 0.90 for clinical use; 0.80 to 0.90 for general (non-clinical) use

What is a good reliability coefficient?

New cards

True score

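The score an individual would obtain if the measure were free of measurement error (in classical test theory, the long-run average of scores over many independent administrations), as opposed to the observed score the individual actually obtained.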

New cards

Carryover effects

occur when participants' experience in one condition affects their behavior in another condition of a study

New cards

Practice effects

Improvements in performance resulting from repeated opportunities to perform a behavior; on retesting, scores may rise because of the earlier administration rather than a change in the trait measured.

New cards

Test Sophistication

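An increase in scores resulting from prior experience with the test or with testing in general, rather than from a change in the ability being measured.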

New cards

Fatigue effects

Repeated testing reduces overall mental energy or motivation to perform on a test.

New cards

Construct score

A person's standing on a theoretical variable independent of any particular measurement.

New cards

Variance

The standard deviation squared; useful in describing sources of test score variability.

New cards

True variance

Variance from true differences among testtakers.

New cards

Error variance

The amount of variability among the scores caused by chance or uncontrolled variables.
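The idea that total variance is true variance plus error variance can be checked with a small simulation. This is a sketch under assumed values, not data from any real test. True scores are drawn with SD 4 and random error with SD 2, so the population reliability is 16 / (16 + 4) = 0.80. The function name `simulate_reliability` is illustrative.

```python
import random
import statistics

def simulate_reliability(n: int = 50_000, true_sd: float = 4.0,
                         error_sd: float = 2.0, seed: int = 0) -> float:
    """Simulate observed = true + error and return var(true) / var(observed)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Random error: mean zero, drawn independently of the true score
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

# Population value is 16 / (16 + 4) = 0.80; the simulated ratio should be close
print(round(simulate_reliability(), 3))
```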

New cards

Measurement Error

an error that occurs when there is a difference between the information desired by the researcher and the information provided by the measurement process

New cards

Random error

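A source of error caused by unpredictable fluctuations and inconsistencies in other variables in the measurement process (sometimes called "noise"); it does not shift scores in any consistent direction.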

New cards

Systematic error

Error that shifts all measurements in a consistent direction or by a constant amount. It decreases accuracy and can result in bias, but it does not reduce the consistency (reliability) of scores.

New cards

Test environment

Conditions of the testing situation (e.g., room temperature, lighting, noise, ventilation) that can contribute error variance during test administration.

New cards

Testtaker Variables

Personal factors affecting test performance (e.g., emotional problems, physical discomfort, lack of sleep, effects of drugs or medication).

New cards

Examiner-related Variables

physical appearance, demeanor, eye contact are examples of _________

New cards

Test-Retest Reliability

a method for determining the reliability of a test by comparing a test taker's scores on the same test taken on separate occasions
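Test-retest reliability is usually computed as the Pearson correlation between the two sets of scores. The sketch below uses hypothetical scores for five testtakers. `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for the same five testtakers on two occasions
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(round(pearson_r(time1, time2), 3))
```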

New cards

Coefficient of Stability

An estimate of test-retest reliability obtained when the interval between administrations is greater than six months.

New cards

Parallel Forms

a method of establishing the reliability of a measurement instrument by correlating scores on two different but equivalent versions of the same instrument

New cards

Alternate Forms

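Different versions of a test constructed to be equivalent in content and level of difficulty, without necessarily meeting the stricter requirements of parallel forms. Example: if a teacher gives out multiple forms of an exam with different questions, the overall scores should be similar for each form.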

New cards

Immediate Form

Both forms are administered in the same session, one right after the other.

New cards

Delayed Form

A time interval separates the administrations of the two forms.

New cards

Split-Half Reliability

A measure of reliability in which a test is split into two parts and an individual's scores on both halves are compared.

New cards

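Spearman-Brown Formula

Used to estimate internal consistency reliability from the correlation between two halves of a test; more generally, it estimates the reliability of a test that is lengthened or shortened by any factor.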
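A minimal sketch of split-half reliability with the Spearman-Brown correction is shown below. It assumes an odd-even split, which is one common way of dividing a test. The prophecy formula is r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes. The function names are illustrative.

```python
import math

def spearman_brown(r: float, n: float = 2.0) -> float:
    """Predicted reliability when test length is multiplied by n."""
    return n * r / (1 + (n - 1) * r)

def split_half_reliability(item_scores: list[list[float]]) -> float:
    """Odd-even split-half reliability, stepped up to full test length.

    item_scores: one row per testtaker, one column per item.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    mo, me = sum(odd) / len(odd), sum(even) / len(even)
    sxy = sum((a - mo) * (b - me) for a, b in zip(odd, even))
    sxx = sum((a - mo) ** 2 for a in odd)
    syy = sum((b - me) ** 2 for b in even)
    r_half = sxy / math.sqrt(sxx * syy)
    # Each half is half as long as the full test, so correct with n = 2
    return spearman_brown(r_half, 2.0)

print(round(spearman_brown(0.6), 2))  # → 0.75
```

Correcting with n = 2 reflects the fact that each half is only half as long as the full test. Without the correction, the half-test correlation would underestimate full-test reliability.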

New cards

Coefficient Alpha

A measure of internal-consistency reliability that is the average of all possible split-half coefficients resulting from different splittings of the scale items

Used for non-dichotomous items (e.g., rating-scale items)

Helps answer how similar sets of data are
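The standard computing formula for coefficient alpha is α = k/(k − 1) · (1 − Σσ²_item / σ²_total), where k is the number of items. The sketch below applies it to hypothetical Likert-type data. Population variances are used throughout; the ratio is the same with sample variances as long as the choice is consistent. `cronbach_alpha` is an illustrative name.

```python
import statistics

def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Coefficient alpha; rows are testtakers, columns are items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))
    item_var_sum = sum(statistics.pvariance(col) for col in columns)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical: 4 testtakers x 3 items scored 1-5
data = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 2, 3]]
print(round(cronbach_alpha(data), 3))
```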

New cards

Kuder-Richardson Formula

Used to calculate interitem consistency when items are dichotomous (yes/no, true/false)

New cards

KR20

Dichotomous items with varying levels of difficulty

New cards

KR21

dichotomous items; all the test items have approximately the same degree of difficulty.
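A sketch of both Kuder-Richardson formulas for 0/1-scored items follows, using population variances. For KR-20, the term p·q sums each item's proportion correct times proportion incorrect. KR-21 replaces that sum with a value computed from the mean total score, which is exact only when all items share the same difficulty. When difficulties vary, KR-21 comes out lower than KR-20. Function names are illustrative.

```python
import statistics

def kr20(responses: list[list[int]]) -> float:
    """KR-20 for 0/1 items; rows are testtakers, columns are items."""
    k, n = len(responses[0]), len(responses)
    total_var = statistics.pvariance([sum(row) for row in responses])
    pq_sum = 0.0
    for item in zip(*responses):
        p = sum(item) / n  # proportion answering this item correctly
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_var)

def kr21(responses: list[list[int]]) -> float:
    """KR-21: assumes all items have roughly equal difficulty."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = statistics.fmean(totals)
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * total_var))

# Hypothetical 4 testtakers x 3 items of unequal difficulty
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(answers), 2), round(kr21(answers), 2))  # → 0.75 0.6
```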

New cards

Average Proportional Distance (APD)

a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
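The sketch below follows the commonly described steps for APD. First, take the absolute difference between every pair of item responses. Then average those differences. Finally, divide the average by the number of response options minus one. The step of averaging per-testtaker values across testtakers is an assumption of this sketch, and so is the function name `apd`. Lower values indicate greater internal consistency.

```python
from itertools import combinations

def apd(responses: list[list[int]], n_options: int) -> float:
    """Average proportional distance; rows are testtakers, columns are items.

    Assumption: per-testtaker mean absolute pairwise difference,
    averaged over testtakers, then divided by (n_options - 1).
    """
    per_person = []
    for row in responses:
        diffs = [abs(a - b) for a, b in combinations(row, 2)]
        per_person.append(sum(diffs) / len(diffs))
    mean_diff = sum(per_person) / len(per_person)
    return mean_diff / (n_options - 1)

# One hypothetical testtaker answering three 7-point items: 4, 5, 3
print(round(apd([[4, 5, 3]], n_options=7), 2))  # → 0.22
```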

New cards

interrater reliability

the amount of agreement in the observations of different raters who witness the same behavior

New cards

Kappa statistics

Statistics used to measure interrater agreement for nominal (categorical) data, corrected for the agreement expected by chance.

New cards

Cohen's Kappa

Used to measure the level of agreement between two raters or judges only
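Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each rater's category proportions: κ = (p_o − p_e) / (1 − p_e). The sketch below uses hypothetical yes/no ratings. It is undefined when p_e = 1, which happens when both raters use a single category for everything. `cohens_kappa` is an illustrative name.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between exactly two raters (nominal data)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equally long, non-empty rating lists")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal proportions
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))  # → 0.5
```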

New cards

Fleiss' Kappa

Determines the level of agreement among two or more raters
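The sketch below computes Fleiss' kappa from a subjects-by-categories table of rater counts. It assumes every subject is rated by the same number of raters. Per-subject agreement is the proportion of agreeing rater pairs, and chance agreement comes from the overall category proportions. The input layout and function name are illustrative choices.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa.

    counts[i][j] = number of raters who put subject i in category j;
    every subject must be rated by the same number of raters (m >= 2).
    """
    n_subjects = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    # Per-subject agreement: proportion of rater pairs that agree
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * m)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 3 raters classify 3 subjects into 2 categories
print(round(fleiss_kappa([[3, 0], [0, 3], [2, 1]]), 2))  # → 0.55
```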

New cards

Kendall's W

Kendall's coefficient of concordance; measures agreement among raters who rank the same set of objects (ordinal data)
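Kendall's W is computed from the rank sums each object receives across m raters: W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the rank sums from their mean. The sketch below assumes there are no tied ranks and omits the tie correction. The function name is illustrative.

```python
def kendalls_w(rankings: list[list[int]]) -> float:
    """Kendall's coefficient of concordance (no tie correction).

    rankings[r][i] = rank (1..n) that rater r gave to object i.
    """
    m = len(rankings)      # number of raters
    n = len(rankings[0])   # number of objects ranked
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

print(kendalls_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # → 1.0 (full agreement)
print(kendalls_w([[1, 2, 3], [2, 3, 1], [3, 1, 2]]))  # → 0.0 (no agreement)
```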

New cards

Homogeneous Test

A test that measures only one trait or characteristic.

New cards

Heterogeneous Test

a test that measures more than one trait or characteristic

New cards

Dynamic Test

tests based on Vygotsky's theory that emphasize potential rather than past learning

New cards

Static Test

an individual is assessed at a given point in time, and the results of a test are used to determine what the person can and cannot do on his or her own

New cards

Speed Tests

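A large number of relatively easy items of uniform difficulty administered within a limited time, so that few testtakers finish and scores reflect speed of performance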

New cards

Power Tests

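A test with a time limit long enough for testtakers to attempt all items, in which some items are so difficult that no testtaker is expected to obtain a perfect score; the score reflects the level of difficulty of the items answered correctly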

New cards

Criterion-Referenced Tests

Tests where the student's performance is compared to a standard or criterion. The student's score is not based on how he or she compares with other students, but on how the student performed as measured by the criteria or standards. Criterion-referenced tests yield scores such as percentages or numbers of correct answers.

New cards

Classical Test Theory

Each testtaker has a true score on a test that would be obtained but for the action of measurement error.

New cards

Domain Sampling Theory

Estimate the extent to which specific sources of variation under defined conditions are contributing to the test scores.

New cards

Generalizability Theory

based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation

New cards

Item Response Theory (IRT)

a mathematical approach to choosing test items in which the probability of a positive response to an item is determined by the person's estimated position on the underlying trait being measured, as well as by characteristics of the item

New cards

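A person whose ability is at a given level should be able to succeed on items at or below that level of difficulty; the probability of a correct response depends on the person's trait level relative to the item's characteristics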

Explain IRT
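One widely used IRT model is the two-parameter logistic (2PL) model, sketched below. It shows how the probability of a correct response depends on both the person's trait level (theta) and the item's characteristics (difficulty b and discrimination a). Some presentations add a scaling constant of about 1.7 or a guessing parameter (the 3PL model); both are omitted here. Setting a = 1 for every item gives a one-parameter (Rasch-type) model. The function name is illustrative.

```python
import math

def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """Two-parameter logistic (2PL) item response function.

    theta: testtaker's standing on the latent trait
    b: item difficulty (trait level at which success probability is 0.5)
    a: item discrimination (how steeply probability rises around b)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Same item (b = 0.0, a = 1.5): probability of success rises with trait level
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, b=0.0, a=1.5), 3))
```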

New cards

Latent-Trait Theory

Another name for IRT

New cards

Item discrimination

the degree to which a test item is able to correctly differentiate test-takers who vary according to the construct measured by the test.

New cards

Polytomous Item

A test item for which more than two outcomes are possible, such as "disagree," "neutral," and "agree."

New cards

Dichotomous Item

A test item with only two possible outcomes (e.g., true/false, correct/incorrect).

New cards

Confidence Interval

A range of values so defined that there is a specified probability that the value of a parameter lies within it; in testing, a range of scores likely to contain the testtaker's true score.
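In classical test theory, the band around an observed score is usually built from the standard error of measurement, SEM = SD·√(1 − r_xx), as observed ± z·SEM. The sketch below assumes a hypothetical scale with SD = 15, reliability .91, and the 95% multiplier z = 1.96. The function names are illustrative.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Band around an observed score that is likely to contain the true score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical: observed score 106 on a scale with SD = 15, reliability = .91
low, high = confidence_interval(106, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))
```

Higher reliability shrinks the SEM and therefore narrows the band. At a reliability of 1.0 the interval collapses to the observed score itself.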

New cards

can aid a test user in determining how large a difference should be before it is considered statistically significant

Standard Error of the Difference
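For two scores expressed on a common scale with standard deviation SD, the standard error of the difference is SD·√(2 − r₁ − r₂). This equals the square root of the sum of the two squared SEMs. The sketch below assumes hypothetical subtests with reliabilities .90 and .80, and an illustrative function name. A difference larger than about 1.96 × SED would be significant at the .05 level (two-tailed).

```python
import math

def sed(sd: float, r1: float, r2: float) -> float:
    """Standard error of the difference between two scores on a common scale."""
    return sd * math.sqrt(2 - r1 - r2)

# Hypothetical subtests on a common scale (SD = 10), reliabilities .90 and .80
diff_se = sed(10, 0.90, 0.80)
print(round(diff_se, 2), round(1.96 * diff_se, 2))
```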

New cards

refers to the standard error of the difference between predicted and observed criterion values, i.e., how far predictions typically fall from actual scores

Standard Error of Estimate
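A common form of the standard error of estimate is SD_criterion·√(1 − r²), where r is the validity coefficient. This is the large-sample version; some texts add a small-sample correction factor. The sketch below assumes a hypothetical criterion SD of 10 and validity of .60. The function name is illustrative.

```python
import math

def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    """Typical size of prediction errors when predicting the criterion."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

print(round(standard_error_of_estimate(10, 0.60), 2))  # → 8.0
```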

New cards

Validity

A judgment or estimate of how well a test measures what it is supposed to measure

New cards

≥ 0.35

What validity coefficient is generally considered very beneficial?

New cards

Face Validity

extent to which respondents can tell what the items are measuring

New cards

Content validity

The degree to which the content of a test is representative of the domain it's supposed to cover.

New cards

Test blueprint

A plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, etc.

New cards

Underrepresentation

Failure of a test to capture important components of the construct it is meant to measure

New cards

Overrepresentation

disproportionately higher incidence or greater presence of a characteristic than expected; may be desired to ensure inclusion of minority groups; impacts generalizability of findings as proportions do not match what would be found typically or generally

New cards

Construct Validity

The ability of a test to represent the underlying construct (the theory developed to organize and explain some aspects of existing knowledge and observations).

New cards

Irrelevant variance

Score variance attributable to factors other than the construct being measured (construct-irrelevant variance).

New cards

Method of Contrasted groups

Demonstrate that scores on the test vary in a predictable way as a function of membership in a group.

New cards

Divergent

Divergent (discriminant) evidence: test scores are not expected to correlate with measures of theoretically unrelated constructs

New cards

Convergent

Convergent evidence: test scores are expected to correlate with measures of the same or related constructs

New cards

Factor Analysis

A statistical technique used to analyze interrelationships among a set of variables (e.g., test or item scores)

Used to identify the factor(s) in common between test scores on subscales within a particular test

New cards

Factor loading

Conveys info about the extent to which the factor determines the test score or scores.

New cards

Criterion-Related Validity

Evaluates a test by relating its scores to an external criterion (a standard against which the test is judged)

New cards

Concurrent Validity

Extent to which test scores may be used to estimate an individual's present standing on a criterion

New cards

Predictive Validity

The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior.

New cards

Validity coefficient

correlation coefficient between a test score (predictor) and a performance measure (criterion)

New cards

Incremental validity

the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use

New cards

Criterion contamination

Occurs when the criterion measure includes aspects of performance that are not part of the job or when the measure is affected by construct-irrelevant factors.

New cards

Leniency Error

occurs when ratings of all employees fall at the high end of the scale

New cards

Rating Error

A judgment resulting from the intentional or unintentional misuse of a rating scale.

New cards

Severity Error

A rater's tendency to be overly strict in scoring, so that ratings cluster at the low end of the scale.

New cards

Central Tendency Error

A rater's ratings tend to cluster in the middle of the rating scale.

New cards

Halo effect

tendency of an interviewer to allow positive characteristics of a client to influence the assessments of the client's behavior and statements

New cards

Normative sample

a group of individuals who were given the test to identify standards of performance at specific age levels

New cards

Norm

Test performance data of a particular group of test takers that are designed for use as a reference when evaluating and interpreting individual test scores

New cards

Norming

The process of deriving norms

New cards

Percentile Norms

Raw data from a test's standardization sample converted to percentile form.

New cards

percentage correct

Refers to the distribution of raw scores; specifically, the number of items answered correctly, multiplied by 100 and divided by the total number of items
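The sketch below contrasts a percentile rank with percentage correct. A percentile rank locates a raw score within a normative sample, while percentage correct depends only on the testtaker's own answers. It assumes the "percentage of the sample scoring below" convention for percentile rank; some conventions also count half of the tied scores. The norm data and function names are hypothetical.

```python
from bisect import bisect_left

def percentile_rank(raw: float, norm_scores: list[float]) -> float:
    """Percentage of the normative sample scoring below `raw`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw)  # count of scores strictly below raw
    return 100 * below / len(ordered)

def percentage_correct(n_correct: int, n_items: int) -> float:
    """Items answered correctly, multiplied by 100, divided by total items."""
    return n_correct * 100 / n_items

norms = [12, 15, 18, 18, 20, 22, 25, 27, 30, 33]  # hypothetical sample
print(percentile_rank(22, norms))   # → 50.0
print(percentage_correct(22, 40))   # → 55.0
```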

New cards

Developmental Norms

Norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life

New cards

Age norms

age equivalent scores; indicate the average performance of different test takers who were at various ages at the time the test was administered

New cards

Grade norms

Indicate the average test performance of testtakers in a given school grade

New cards

National Norms

Norms derived from a standardization sample that was nationally representative of the population

New cards

National Anchor Norms

An equivalency table for scores on two nationally standardized tests designed to measure the same thing

New cards

Subgroup Norms

Normative sample can be segmented by any criteria initially used in selecting subjects for the sample.

New cards

Local Norms

provide normative information with respect to the local population's performance on some test

New cards

Expectancy Data

provide an indication that a test taker will score within some interval of scores on a criterion measure - passing, acceptable, failing

New cards

Taylor-Russell Tables

Provide an estimate of the extent to which including a particular test in the selection system will improve selection, based on the test's validity coefficient, the selection ratio, and the base rate.

New cards

Selection ratio

Numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired.

New cards

Base rate

Percentage of current employees who are considered successful.

New cards

Naylor-Shine Tables

Entails obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures.

New cards

Brogden-Cronbach-Gleser Formula

Used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument.

New cards

Utility gain

Estimate of the benefit of using a particular test.
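One common form of the Brogden-Cronbach-Gleser calculation multiplies five quantities: number selected × average tenure × validity coefficient × SD of job performance in dollars × mean standardized test score of those selected. It then subtracts the total cost of testing. Presentations differ on whether the cost term uses the number of applicants tested or the number selected. This sketch assumes the number tested, and all figures in the example are hypothetical. The function and parameter names are illustrative.

```python
def bcg_utility_gain(n_selected: int, tenure_years: float, validity: float,
                     sd_y: float, mean_z_selected: float,
                     n_tested: int, cost_per_applicant: float) -> float:
    """Brogden-Cronbach-Gleser estimate of utility gain in dollars.

    benefit = n_selected * tenure * r_xy * SD_y * mean Z of those selected
    cost    = n_tested * cost per applicant (assumption: all applicants tested)
    """
    benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
    cost = n_tested * cost_per_applicant
    return benefit - cost

# Hypothetical: hire 10 of 100 applicants, 2-year tenure, r = .40,
# SD_y = $10,000, mean Z of hires = 1.0, testing cost $50 per applicant
print(bcg_utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 50))  # → 75000.0
```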