Validity
The agreement between a test score or measure and the quality it is believed to measure
Standards for Educational and Psychological Testing
Organized into three parts: foundations, operations, and applications
Face validity
Not an actual category of validity evidence; the measure merely looks like it has validity, a judgment made without systematic evidence
Content related evidence for validity
Evidence that shows the test adequately covers the content it is supposed to measure
Construct underrepresentation
Failure to capture important components of a construct
Construct irrelevant variance
Occurs when scores are influenced by factors irrelevant to the construct being measured
Criterion related evidence for validity
Evidence that supports the test's ability to predict or correlate with external criteria
Construct related evidence for validity
Evidence that supports the underlying theoretical construct being measured by the test
Predictive evidence
How well a test predicts a future outcome (e.g., the SAT predicting college performance)
Effect of restricted range
Occurs when most of the data points fall within a small or limited range of values, which attenuates observed correlations
Concurrent validity
Evaluating whether a new test or questionnaire provides results consistent with an existing measure administered at about the same time
Convergent evidence
Evidence that a measure correlates well with other tests believed to measure the same construct
Discriminant evidence
A test should have low correlations with measures of unrelated constructs
Criterion-referenced test
A test whose items are designed to match specific instructional objectives
Relationship between reliability and validity
A test can be reliable without being valid, but it cannot be valid without being reliable
Item format
The way in which questions or statements are presented in a test, such as true-false, multiple-choice, or other polytomous formats
Dichotomous format
A format in which each item provides two alternatives (e.g., true or false), one of which is correct
Polytomous format
A format in which each item has more than two alternatives; multiple choice is the most common example
Distractors
The incorrect choices in multiple-choice items that test takers can select
Correction for guessing
A formula used to adjust multiple-choice test scores to account for points likely obtained by random guessing
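The standard correction can be sketched as score = R − W/(k − 1), where R is the number right, W the number wrong, and k the number of options per item. A minimal illustration (the function name is mine):

```python
def corrected_score(num_right, num_wrong, num_options):
    """Penalize wrong answers by the expected rate of lucky guesses:
    corrected = R - W / (k - 1). Omitted items are simply not counted."""
    return num_right - num_wrong / (num_options - 1)

# 30 right and 12 wrong on a four-option exam: 30 - 12/3 = 26 points.
score = corrected_score(30, 12, 4)
```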
Omitted responses
Answers left blank or not attempted by test takers; these are typically not penalized in correction-for-guessing formulas
Random guessing
Selecting answers to multiple-choice items without any knowledge of the correct answer, which may or may not be advantageous depending on whether a correction formula is used
Speeded tests
Tests with time constraints, where the correction-for-guessing formula may include only items attempted, so that random guessing and leaving items blank have the same expected effect on the score
Elimination method
A strategy in which test takers eliminate obviously incorrect alternatives in multiple-choice items, increasing their chances of choosing the right answer
Likert format
A scale on which respondents rate statements along a continuum such as strongly disagree to strongly agree
Reverse scoring
Reversing the scoring of negatively worded items to maintain consistency in scale construction
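For a Likert-type item, the usual arithmetic is reversed = (low + high) − raw. A short sketch, assuming a 1-to-5 scale by default:

```python
def reverse_score(raw, low=1, high=5):
    """Flip a rating on a Likert-type scale: reversed = low + high - raw,
    so a 1 becomes a 5 and a 4 becomes a 2 on a 1-5 scale."""
    return low + high - raw
```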
Category format
Similar to the Likert format, but with a greater number of choices
Endpoints
The extreme values or labels of a category scale, which respondents tend to avoid; a potential source of response bias
Context effect
The phenomenon in which ratings on a category-format scale change depending on the context or grouping in which the stimuli are presented
Optimal number of categories
The number of response categories in a category-format scale varies with the respondent's level of involvement; a moderate number is considered sufficient for most rating tasks
Visual analogue scale
A method in which respondents mark a point on a line between two labeled endpoints to indicate their response
Confidence intervals
A statistical method used to calculate a range of values that is likely to contain a population parameter
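A common case is the interval for a population mean, mean ± z × SEM. A minimal sketch using the normal approximation (sample data and function name are mine):

```python
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the population mean: mean +/- z * SEM,
    where SEM = sample SD / sqrt(n) and z = 1.96 for 95% coverage."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / n ** 0.5
    return mean - z * sem, mean + z * sem

scores = [98, 102, 100, 97, 103, 101, 99, 100]
low, high = mean_confidence_interval(scores)  # interval centered on 100
```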
Adjective checklist
A method commonly used in personality measurement in which subjects receive a list of adjectives and indicate how characteristic each one is of them
Q sort
A technique that increases the number of response categories by having subjects sort statements into nine piles according to how well each describes them
Forced choice format
Item formats that require subjects to choose among given alternatives
Checklists
A format, less popular in recent years, in which subjects respond to a list of items (e.g., by marking those that apply)
Item writing
The process of creating test items, including selecting the appropriate format, wording, and response choices
All of the above
A response option commonly advised against in item writing, as it can be problematic and lead to confusion in multiple-choice questions
Item analysis techniques
Methods used to evaluate the effectiveness and quality of test items after they have been administered, including measures of reliability, difficulty, and discrimination
Precise language
The use of clear, specific, unambiguous wording in test items to ensure they accurately assess the intended trait or knowledge
Subject matter knowledge
A deep understanding of the content and concepts being tested, needed to create accurate and effective test items
Item difficulty
In the context of a test that measures achievement or ability, item difficulty is defined by the proportion of people who answer the particular item correctly
Optimal difficulty level
The ideal level of difficulty for test items, usually halfway between 100% correct and the level of success expected by chance
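The "halfway between perfect and chance" rule is simple arithmetic; a one-line sketch:

```python
def optimal_difficulty(num_options):
    """Halfway between a perfect pass rate (1.0) and the chance success
    rate (1 / number of options per item)."""
    return (1.0 + 1 / num_options) / 2

# Four-option multiple choice: chance is .25, so the target is about .625.
```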
Discriminability
A measure of the value of test items, assessing the extent to which individuals who perform well on a specific item also perform well on the entire test
Extreme group method
A method for assessing an item's discriminability by comparing the performance of individuals who did well on the overall test with that of those who did poorly
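The comparison is usually a discrimination index: proportion passing the item in the top group minus the proportion in the bottom group. A sketch, assuming the common top/bottom-27% convention:

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Extreme group method: proportion passing the item among top scorers
    minus the proportion among bottom scorers (27% groups by convention)."""
    n = max(1, int(len(total_scores) * fraction))
    # Indices of examinees ordered from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    p_low = sum(item_correct[i] for i in order[:n]) / n
    p_high = sum(item_correct[i] for i in order[-n:]) / n
    return p_high - p_low

# An item passed only by the top half of ten examinees discriminates perfectly.
d = discrimination_index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], list(range(1, 11)))
```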
Point biserial method
An approach to evaluating the discriminability of test items by finding the correlation between performance on a specific item and overall test performance
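The point-biserial formula is r = (M1 − M0)/SD × sqrt(pq), where M1 and M0 are the mean total scores of those who passed and failed the item, p is the pass rate, and q = 1 − p. A small sketch:

```python
import statistics

def point_biserial(item, totals):
    """Correlation between a dichotomous (0/1) item and total test scores:
    r_pb = (M1 - M0) / SD * sqrt(p * q), with p the item's pass rate."""
    p = sum(item) / len(item)
    m1 = statistics.mean(t for t, x in zip(totals, item) if x == 1)
    m0 = statistics.mean(t for t, x in zip(totals, item) if x == 0)
    sd = statistics.pstdev(totals)
    return (m1 - m0) / sd * (p * (1 - p)) ** 0.5

r = point_biserial([0, 0, 1, 1], [1, 2, 3, 4])  # item tracks total scores
```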
Item characteristic curve
A graph of the relationship between performance on an item (the proportion of examinees who answer it correctly) and overall test performance or ability
Item discriminability
The extent to which individuals who perform well on a specific item also perform well on the entire test
Difficulty and discriminability
An item's difficulty level is essential to its usefulness; items should ideally have difficulty levels between 30% and 70%
Item selection
The final version of a test should be assembled with both item difficulty and discriminability in mind
Item response Theory
A newer approach to test construction that models the probability of getting specific items correct as a function of the individual's ability level
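The simplest IRT model, the one-parameter (Rasch) model, makes the probability of a correct response a logistic function of the gap between ability and item difficulty. A minimal sketch:

```python
import math

def p_correct(ability, difficulty):
    """One-parameter (Rasch) logistic model: the probability of a correct
    response rises as ability exceeds item difficulty."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# When ability equals difficulty, the examinee has a 50/50 chance.
```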
Computer adaptive testing
A significant application of IRT that allows for personalized assessments by selecting items matched to the examinee's estimated ability
Measurement precision
The choice of test design affects measurement precision across ability levels; computer adaptive testing offers the advantage of maintaining consistent measurement precision across a defined range of ability levels