test development
umbrella term for all that goes into the process of creating a test
test conceptualization
test construction
test tryout
item analysis
test revision
five stages of the process of developing a test
test conceptualization
the test begins with a test developer's decision to create a test that measures a particular construct in a particular way
Preliminary Questions
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take this test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training will be required of test users for administering or interpreting the test?
What types of responses will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as the result of an administration of this test?
How will meaning be attributed to scores on this test?
norm-referenced tests
compare a person's performance to others' performances
designed to rank individuals in relation to one another
goal: to see how a test-taker’s score compares to the average in a specific group
criterion-referenced tests
designed to measure whether a person has mastered a specific skill or met a particular standard, regardless of how others perform
scaling
the process of setting rules for assigning numbers in measurement
the process by which measuring a device is designed and calibrated and by which numbers-scale-values are assigned to different amounts of trait, attitude, or characteristics being measured
rating scale
grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker
summative scale
type of rating scale where the final score is obtained by adding up the individual ratings or responses given by a testtaker across multiple items or questions
likert scale
each item presents the testtaker with five or seven response alternatives, usually on an agree/disagree or approve/disapprove continuum
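Since Likert items are usually scored summatively, a minimal sketch may help; the item names, ratings, and the reverse-keyed item below are invented for illustration only.

```python
# Hypothetical illustration: scoring a 5-point Likert (summative) scale.
responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}  # 1 = strongly disagree ... 5 = strongly agree
reverse_keyed = {"item4"}  # items worded in the opposite direction (invented)

def likert_total(responses, reverse_keyed, points=5):
    total = 0
    for item, rating in responses.items():
        # Reverse-keyed items are flipped so that a higher score always means "more of the trait".
        total += (points + 1 - rating) if item in reverse_keyed else rating
    return total

print(likert_total(responses, reverse_keyed))  # 4 + 2 + 5 + 5 = 16
```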
unidimensional rating
only one dimension is presumed to underlie the ratings
multidimensional rating
more than one dimension is thought to guide the testtaker’s responses
method of paired comparisons
presents testtakers with pairs of stimuli (such as statements, behavior, or options) and asks them to choose which option they believe is more justifiable or more preferable based on a particular criterion
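A simple (assumed) way to score a paired-comparisons task is to give each stimulus a point every time it is chosen; the stimuli and choices below are invented.

```python
# Hedged sketch: scoring the method of paired comparisons by counting preferences.
from collections import Counter

stimuli = ["A", "B", "C"]                                      # invented stimuli
choices = [("A", "B", "A"), ("A", "C", "C"), ("B", "C", "C")]  # (pair..., chosen)

scores = Counter({s: 0 for s in stimuli})
for left, right, chosen in choices:
    scores[chosen] += 1            # the preferred stimulus earns the credit

print(scores.most_common())        # [('C', 2), ('A', 1), ('B', 0)] -> C is most preferred
```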
comparative scaling
entails judgments of a stimulus in comparison with every other stimulus on the scale; for example, testtakers might be asked to sort a set of cards from most justifiable to least justifiable
categorical scaling
stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
Guttman scale
items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured. All respondents who agree with the stronger statements of the attitude will also agree with the milder statements
scalogram analysis
an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses
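A minimal sketch of the cumulative logic behind a Guttman scale: once items are ordered from mildest to strongest, endorsing a stronger item should imply endorsing every milder one. The response patterns below are invented.

```python
# Hedged sketch: does a 0/1 response pattern fit a perfect Guttman (cumulative) pattern?
def is_guttman_pattern(responses):
    """True if, with items ordered mild -> strong, the pattern looks like 1,...,1,0,...,0."""
    seen_zero = False
    for endorsed in responses:
        if endorsed and seen_zero:
            return False           # stronger item endorsed after a milder one was rejected
        if not endorsed:
            seen_zero = True
    return True

print(is_guttman_pattern([1, 1, 1, 0]))  # True: fits the cumulative model
print(is_guttman_pattern([1, 0, 1, 0]))  # False: violates it
```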
item pool
the reservoir or well from which items will or will not be drawn for the final version of the test
item format
variables such as the form, plan, structure, arrangement, and layout of individual test items
selected response format
require testtakers to select a response from a set of alternative responses
multiple choice
has three elements: a stem, a correct alternative or option, and several incorrect alternatives or options variously referred to as distractors or foils
matching item
testtaker is presented with two columns: premises on the left and responses on the right. The testtaker's task is to determine which response is best associated with which premise.
true-false item
the most familiar binary-choice item. Requires the testtaker to indicate whether the statement is or is not a fact
constructed-response format
requires testtakers to supply or to create the correct answer, not merely to select it
completion item
requires examinee to provide a word or phrase that completes a sentence
the short answer
testtaker can respond briefly, that is, with a short answer
essay
a test item that requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation
item bank
a relatively large and easily accessible collection of test questions
computerized adaptive testing (CAT)
refers to an interactive, computer administered test-taking process wherein items presented to the testtakers are based in part on the testtaker’s performance on previous items
floor effect
refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured.
ceiling effect
refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured.
item branching
the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
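A minimal, hypothetical sketch of item branching in a CAT-style administration: a correct answer routes the testtaker to a harder item, an incorrect answer to an easier one. The item bank, difficulty values, and response pattern are invented.

```python
# Hedged sketch of simple item branching over an ordered item bank.
item_bank = sorted(
    [{"id": i, "difficulty": d} for i, d in enumerate([0.2, 0.35, 0.5, 0.65, 0.8])],
    key=lambda item: item["difficulty"],
)

def next_item_index(current_index, was_correct):
    # Move up the difficulty ladder after a correct response, down after an incorrect one.
    step = 1 if was_correct else -1
    return max(0, min(len(item_bank) - 1, current_index + step))

idx = 2                                   # start near the middle of the bank
for was_correct in [True, True, False]:   # invented response pattern
    idx = next_item_index(idx, was_correct)
print(item_bank[idx])                     # the item presented next
```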
Cumulative Scoring Model
the higher the score, the higher the testtaker is on the ability or trait being measured. Each correct response adds to the overall score, which indicates the testtaker’s proficiency in a particular area. Example: A test on math contains 10 questions. For every correct answer, the testtaker earns one point. The more correct answers they provide, the higher their score, indicating a higher level of math proficiency.
Class or Category Scoring
the testtaker’s responses are grouped into specific categories or classes. The testtaker earns credit towards being placed in a particular category, which groups people with similar response patterns
Ipsative Scoring
This model differs from both cumulative and class scoring. It compares a testtaker’s scores on different scales within the same test, rather than comparing to others or a general population.
TEST TRYOUT
the next step is to try it out on a group of people similar to the target audience for the test.
the number of people used for the test tryout is important. A general rule is to have at least 5 to 10 people per test item (for example, a 50-item test would call for roughly 250 to 500 participants). The more people involved, the more accurate the results will be. If too few people are used, the findings may not be reliable and could include "phantom factors" that are not real.
item analysis
general term for a set of methods used to evaluate test items and determine their quality, including how well they differentiate between high and low scorers, and to identify items that might need to be revised or removed to improve the overall reliability and validity of the assessment. It is one of the most important aspects of test construction.
item difficulty index
It is a form of item analysis used to assess how difficult items are.
obtained by calculating the proportion of the total number of test-takers who answered the item correctly.
the generally preferred range of item difficulty is from 0.3 to 0.7
items should have a variety of difficulty levels because a good test discriminates at many levels.
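A short sketch of the calculation described above: the difficulty index (p) for each item is the proportion of testtakers answering it correctly. The 0/1 response matrix below is invented.

```python
# Sketch of computing item-difficulty indices from a scored response matrix.
responses = [            # rows = testtakers, columns = items (1 = correct, 0 = incorrect)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

n_takers = len(responses)
p_values = [sum(col) / n_takers for col in zip(*responses)]
print(p_values)  # [0.75, 0.75, 0.25, 1.0]; items near 0 or 1 tell us little about individual differences
```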
Optimal average item difficulty
the best balance of question difficulty in a test, helping it fairly measure and rank test-takers while avoiding questions that are too easy or too hard.
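One commonly cited rule of thumb (an added assumption, not stated in the notes above) places the optimal average difficulty midway between the chance success rate and 1.00.

```python
# Hedged rule of thumb: optimal_p = (chance success rate + 1.0) / 2.
def optimal_difficulty(n_options):
    chance = 1.0 / n_options          # probability of guessing the item correctly
    return (chance + 1.0) / 2

print(optimal_difficulty(4))  # 0.625 for a four-option multiple-choice item
print(optimal_difficulty(2))  # 0.75 for a true-false item
```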
ITEM-RELIABILITY INDEX
provides an indication of the internal consistency of a test; the higher this index, the greater the test's internal consistency.
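A minimal sketch, assuming the usual computation: the item-reliability index is the item's standard deviation multiplied by the item-total correlation (the item-validity index described below is analogous, but uses the correlation with a criterion measure instead). The scores are invented, and statistics.correlation requires Python 3.10+.

```python
# Hedged sketch: item-reliability index = item SD * item-total correlation.
import statistics

item_scores  = [1, 0, 1, 1, 0, 1]          # 0/1 scores on one item (invented)
total_scores = [38, 22, 35, 40, 25, 33]    # total test scores for the same testtakers (invented)

s_item = statistics.pstdev(item_scores)
r_item_total = statistics.correlation(item_scores, total_scores)
print(s_item * r_item_total)               # item-reliability index
```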
Factor Analysis and Inter-Item Consistency
a statistical tool useful in determining whether items on a test appear to be measuring the same thing(s).
can be useful in the test interpretation process, especially when comparing the constellation of responses to the items from two or more groups.
item-validity index
is a statistic designed to provide an indication of the degree to which a test item is measuring what it purports to measure. The higher the item-validity index, the greater the test’s criterion-related validity
Factor Analysis
is a statistical procedure that measures how different test items are related to each other.
Factor Loading
The loadings represent the relationship between the item and the variable being measured.
A common standard is a factor loading of at least .50.
Items (indicators) for which this criterion is not satisfied may be removed
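A small illustration of applying the .50 loading cutoff mentioned above; the item names and loadings are invented.

```python
# Hypothetical sketch: flag items whose factor loading falls below the .50 standard.
loadings = {"item1": 0.72, "item2": 0.48, "item3": 0.61, "item4": 0.33}

retained = {item: l for item, l in loadings.items() if l >= 0.50}
flagged  = {item: l for item, l in loadings.items() if l < 0.50}

print(retained)  # {'item1': 0.72, 'item3': 0.61}
print(flagged)   # candidates for revision or removal
```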
constructs
are NOT directly observable
NEGATIVE DISCRIMINATION
happens when more students in the lower group than in the upper group select the right answer to an item.
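A sketch of the underlying arithmetic, assuming the common item-discrimination index d = (proportion correct in the upper group) minus (proportion correct in the lower group); the counts below are invented.

```python
# Hedged sketch: item-discrimination index d = p_upper - p_lower.
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    return upper_correct / upper_n - lower_correct / lower_n

print(discrimination_index(18, 20, 9, 20))   #  0.45 -> item separates the groups well
print(discrimination_index(8, 20, 14, 20))   # -0.30 -> negative discrimination, item needs review
```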
Guessing
how test-takers may guess on multiple-choice items, which can artificially inflate scores. Some scoring methods attempt to correct for guessing, but there is no universally accepted approach
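One traditional correction-for-guessing formula (one of several approaches, as noted above, and not stated in the notes themselves) subtracts a fraction of the wrong answers: corrected = right - wrong / (options - 1). The numbers below are invented.

```python
# Hedged sketch of a classic correction-for-guessing formula.
def corrected_score(right, wrong, options_per_item):
    return right - wrong / (options_per_item - 1)

print(corrected_score(right=30, wrong=12, options_per_item=4))  # 30 - 12/3 = 26.0
```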
Omitted Responses
Items that are left blank can be problematic. The test developer must decide whether to treat them as incorrect, exclude them from scoring, or interpret them differently based on the testing context.
Test Speededness
If a test is speeded (meaning many test-takers don’t finish), items near the end may appear more difficult than they actually are. This can distort item statistics, making it important to differentiate between genuine difficulty and time constraints
Differential Item Functioning (DIF)
Some items may function differently for different demographic groups even when test-takers have the same underlying ability level. Identifying and addressing DIF is crucial for ensuring fairness and reducing bias in testing
Qualitative Item Analysis
Statistical analyses don’t capture all potential item flaws. The book highlights methods like "think-aloud" protocols, expert reviews, and focus groups to evaluate how test-takers interpret items and whether wording or content needs improvement
Item Response Theory (IRT) Considerations
Using item characteristic curves (ICCs), test developers can analyze how well an item differentiates between high- and low-ability test-takers. This provides a more detailed look at item performance than classical test theory
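A hedged sketch of one common ICC form, the two-parameter logistic (2PL) model: the probability of a correct response as a function of ability (theta). The discrimination (a) and difficulty (b) values are invented.

```python
# Hedged sketch of a 2PL item characteristic curve.
import math

def icc_2pl(theta, a, b):
    # a = discrimination, b = difficulty; returns P(correct | theta)
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, a=1.2, b=0.0), 2))
# A steeper curve (larger a) means the item discriminates more sharply around its difficulty b.
```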
test revision
is a systematic process aimed at improving the quality, reliability, and validity of a test. It involves reevaluating items, modifying content, and ensuring the test aligns with current standards and practices
Process of Test Revision
The steps in test revision mirror those in new test development
Test Conceptualization
Defining the purpose and objectives of the revision.
Test Construction
Making necessary changes to content, format, or structure
Test Tryout
Administering the revised test to a sample group
Item Analysis
Evaluating item performance, reliability, and validity.
Final Refinements
Adjusting the test based on findings before finalizing
psychological assessment process
Reason for Referral — conduct intake interview — choose tests to be included in the test battery — administer the tests — score and interpret each test results — communicate findings in a psychological evaluation report — intervention
reason for referral
–the nature of the client's problem should be clarified in order to determine which tests/assessment tools need to be administered.
–Note: Don’t forget the Informed Consent
conduct the intake interview
–Valuable information is gained through interviewing.
–When it's for a child, interviews are conducted not only with the child, but also with parents, teachers, and other individuals familiar with the child.
–Interviews are more open and less structured than formal testing and give those being interviewed an opportunity to convey information in their own words.
Choose Tests to be Included in the Test Battery
–Usually includes a test of cognitive capacity (e.g. SB-5, WISC, CFIT), achievement test/ aptitude test (SAT, WIAT, DAT), behavioral assessments (Child Behavioral Checklist [CBCL], Student Behavior Survey [SBS], Folstein Mini-Mental Exam), structured/projective personality techniques.
–Consider availability and expertise (Level A, B, C)
Administer the Tests
Build Rapport
Make sure that the child/client understands the instructions
Speak clearly when dictating test items
Give short breaks in between tests or subtests
Take note of significant behavioral observations
Administer the Tests
Behavior Observations
Attitude towards testing/ examiner
Verbal and non-verbal behavior
Attention and Concentration
Other significant observation (sensory / motor disturbances etc.)
Score and Interpret Each Test's Results
Make sure all items are completely administered
Exercise caution in scoring and interpreting test results
If you’re not sure, you can always refer back to the test manual
Integrate all Test Results
After Step 5, the findings should be integrated to write the results (together with the information from interviews, observations, etc.)
Consistent observations from different sources, and findings that manifest across several tests, are deemed reliable
Communicate Findings in a Psychological Evaluation Report
Identifying data and reason for referral
Family and personal history
Behavior observations
Tests administered and results (intellectual, socio-emotional, adaptive functioning)
Summary and recommendations
Set feedback sessions with parent/counselor/ person referred
social facilitation
we tend to act like the models around us.
Judgmental or evaluative statements
are particularly likely to inhibit the interviewee. Being judgmental means evaluating the thoughts, feelings, or actions of another. When we use such terms as good, bad, excellent, terrible, disgusting, disgraceful, and stupid, we make evaluative statements.
avoid probing statements
These demand more information than the interviewee wishes to provide voluntarily. The most common way to phrase a probing statement is to ask a question that begins with “Why?” Asking “Why?” tends to place others on the defensive.
verbatim playback
the interviewer simply repeats the interviewee’s last response
Paraphrasing and restatement responses
are also interchangeable with the interviewee's response. A paraphrase tends to be more similar to the interviewee's response than a restatement, but both capture the meaning of the interviewee's response.
restatement
in a restatement, there is an introduction of new words, and in paraphrasing, there is a substitution of key words
Summarizing and clarification statements
go just beyond the interviewee’s response
summarizing
the interviewer pulls together the meaning of several interviewee responses
clarification
serves to clarify the interviewee’s response
empathy or understanding response
This response communicates that the interviewer understands how the interviewee feels
To establish a positive atmosphere
interviewers begin with an open-ended question followed by understanding statements that capture the meaning and feeling of the interviewee’s communication
statements to avoid in an unstructured interview
judgmental or evaluative statements
probing statements
hostile responses
false reassurance
Level-one responses.
responses bear little or no relationship to the interviewee's response (the two are really talking only to themselves).
Level-two responses
the response communicates a superficial awareness of the meaning of a statement. The individual who makes a level-two response never quite goes beyond his or her own limited perspective. Such responses impede the flow of communication.
Level-three responses
interchangeable with the interviewee's statement; this is the minimum level of responding that can help the interviewee. Paraphrasing, verbatim playback, clarification statements, and restatements are all examples of level-three responses.
Level-four and level-five responses
not only provide accurate empathy but also go beyond the statement given. In a level-four response, the interviewer adds "noticeably" to the interviewee's response. In a level-five response, the interviewer adds "significantly" to it. We recommend that beginning interviewers learn to respond at level three before going on to the more advanced levels.
active listening
is the foundation of good interviewing skills for many different types of interviews
confrontation
a statement that points out a discrepancy or inconsistency. Carkhuff (1969) distinguished among three types: (1) a discrepancy between what the person is and what he or she wants to become, (2) a discrepancy between what the person says about him- or herself and what he or she does, and (3) a discrepancy between the person’s perception of him- or herself and the interviewer’s experience of the person
case history
a biographical sketch—one often needs to ask specific questions. Case history data may include a chronology of major events in the person’s life, a work history, a medical history, and a family history.
obtaining a case history
the interviewer often takes a developmental approach, examining an individual’s entire life, beginning with infancy or the point at which the given type of history is first relevant. The purpose of obtaining a case history is to understand individuals’ background so that one can accurately interpret individual test scores
mental status examination
is used primarily to diagnose psychosis, brain damage, and other major mental health problems. Its purpose is to evaluate a person suspected of having neurological or emotional problems in terms of variables known to be related to these problems
E. L. Thorndike (1920)
labeled this tendency to judge specific traits on the basis of a general impression the halo effect. People apparently tend to generalize judgments from a single limited experience.
halo effects
occur when the interviewer forms a favorable or unfavorable early impression
general standoutishness.
people tend to judge on the basis of one outstanding characteristic; Hollingworth (1922) was the first to label this error.
cross-ethnic, cross-cultural, and cross-class interviewing
another potential source of error in the interview
variables to consider in selecting tests in relation to prospective test takers
variables related to test medium
variables related to test format
variables related to the language of test items
Overview/Introduction
Includes the background of the test, its purpose, its applicability and scope
test development
The format of the test and the scales/subscales it contains
psychometric soundness
Reliability evidence, Validity evidence
Directions for Administration
Verbatim instructions on how to administer the test, and general considerations for test users
Scoring and Interpretation Guide
How to score, Interpretation of scores, Profile sheet guidelines, Answer Key
norms
Tables of norms from different norm groups