Psychological Assessment (Laboratory)
Constructs: concepts; unseen processes postulated to explain behavior
not straightforward or simple to measure
cannot be observed directly
describes the behaviors and internal processes that make up that construct, along with how it relates to other variables
Operational Definition: a definition of a variable in terms of precisely how it is to be measured
Operationalization: the process of developing indicators or items for measuring these constructs
to develop a psychological test, use a theory of the target construct
key to this is that there must be a rational link between the items’ content and the definition and understanding of the construct
Unidimensional Construct: expected to have a single underlying dimension; measured using a single measure or test
Multidimensional Construct: consists of 2 or more underlying dimensions
Conceptualization: the mental process by which fuzzy and imprecise constructs (concepts) and their constituent components are defined in concrete and precise terms
The process of understanding what is included and what is excluded in the concept
Test Items: units that make up a test and the means through which samples of test taker’s behavior are gathered
It follows that the overall quality of the test depends primarily on the quality of the items that make it up, although the number of items in a test, and their sequencing or position within the test, are also matters of fundamental importance
Item Analysis: a general term that refers to all the techniques used to assess the characteristics of test items and evaluate their quality during the process of test development and test construction
involves both qualitative and quantitative procedures
Qualitative Item Analysis Procedures
rely on the judgments of reviewers concerning the substantive and stylistic characteristics of items as well as their accuracy and fairness
appropriateness of item content and format to the purpose of the test and the populations for whom the test is designed
clarity of expression
grammatical correctness
adherence to some basic rules for writing items that have evolved over time
Quantitative Item Analysis
involves a variety of statistical procedures designed to ascertain the psychometric characteristics of items based on the responses obtained from the samples used in the process of test development
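As a rough illustration of quantitative item analysis, the sketch below computes two commonly used item statistics for items scored 1 or 0: item difficulty (the proportion of test takers answering correctly) and item discrimination (here, the correlation between an item and the total of the remaining items). The notes do not name specific statistics, so the choice of these two, the function names, and the trial data are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    responses: one row per test taker, each row a list of 0/1 item scores.
    Assumes every item and every rest-score has some variance.
    """
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Rest score: total score with the current item left out.
        rest = [sum(row) - row[j] for row in responses]
        difficulty = sum(item) / len(item)
        discrimination = pearson(item, rest)
        results.append((difficulty, discrimination))
    return results

# Hypothetical trial administration: 5 test takers, 4 items.
trial = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
stats = item_statistics(trial)
for number, (p, r) in enumerate(stats, start=1):
    print(f"item {number}: difficulty={p:.2f}, discrimination={r:.2f}")
```

In this made-up data set the last item correlates negatively with the rest of the test, which is the kind of result that would normally flag an item for review or replacement.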
the constructs or knowledge domains that the test will assess
the type of population with which the test will be used
the objectives of the items to be developed, within the framework of the test’s purpose
the concrete means through which the behavior samples will be gathered and scored
the last point includes decisions about the method of administration, the format of the test item stimuli and responses, and the scoring procedures to be used
after these issues are decided and a preliminary plan for the test is made, the process of test development usually involves the following steps:
Generating the item pool by writing or otherwise creating the test items, as well as the administration and scoring procedures to be used
Submitting the item pool to reviewers for qualitative item analysis, and revising or replacing items as needed
Trying out the items that have been generated and reviewed on samples that are representative of the population for whom the test is intended
Evaluating the results of trial administrations of the item pool through quantitative item analysis and additional qualitative analysis
Adding, deleting, and/or modifying items as needed, on the basis of both qualitative and quantitative item analysis
Conducting additional trial administrations for the purpose of checking whether item statistics remain stable across different groups -- cross-validation -- until a satisfactory set of items is obtained
Standardizing or fixing the length of the test and the sequencing of items, as well as the administration and scoring procedures to be used in the final form of the test, on the basis of the foregoing analyses
Administering the test to a new sample of individuals -- carefully selected to represent the population of test takers for whom the test is intended -- in order to develop normative data or performance criteria, indexes of test score reliability and validity, as well as item-level statistics for the final version of the test
Publishing the test in its final form, along with an administration and scoring manual, accompanying documentation of standardization data, reliability and validity studies, and the materials needed for test administration and scoring
Selected-Response (Objective) Items: close-ended in nature
they present a limited number of alternatives from which the test taker must choose
in ability tests, items of this type include multiple-choice, true-false, ranking, and matching items, as well as items that call for a rearrangement of the options provided
in personality tests, objective items may be either dichotomous or polytomous
Dichotomous Items: require a choice between 2 alternatives
Polytomous Items: present the test taker with 3 or more alternative responses to a statement
These alternatives are typically scaled in terms of degree of acceptance (e.g., like, indifferent, or dislike), intensity of agreement (e.g., from strongly agree to strongly disagree), frequency (e.g., from never to very often), and so forth -- with the midpoint usually signifying a neutral, uncertain, or middle-of-the-road position
Forced-Choice Items: objective items that require test takers to choose which one of 2 or more alternatives is most or least characteristic of them
This kind of item is used mainly in multidimensional personality inventories to control for the tendency of test takers to respond in the direction they perceive as more socially desirable
Advantages
objective items are by far the most popular and frequently used type of test item
Their advantages derive from the ease and objectivity with which they can be scored, which result in significant time savings and enhance test score reliability
make efficient use of testing time because more of them can be administered within any given time period than is the case with constructed-response items
Although they can also be administered individually, most tests that use selected-response items are intended for group testing
All the responses to objective items can easily and reliably be transformed into a numerical scale for scoring purposes, a fact that greatly simplifies the quantitative analysis of these items
in ability tests, correct and incorrect answers are usually assigned values of 1 and 0, respectively; occasionally, variations such as 2, 1, or 0 are available for partial credit
in personality tests, dichotomous items are also scored 1 or 0, depending on whether or not the test taker’s response is in the direction of the construct that the test is designed to assess
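The scoring rules above can be sketched in a few lines. This is only an illustration: the function names, the answer key, and the partial-credit map are hypothetical and are not taken from any particular test.

```python
def score_ability_item(response, key, partial=None):
    """Score one ability item.

    Without a partial-credit map: 1 for the keyed answer, 0 otherwise.
    With a map (e.g., {"c": 2, "b": 1}): credit is looked up per response,
    and any response not listed earns 0.
    """
    if partial is not None:
        return partial.get(response, 0)
    return 1 if response == key else 0

def score_personality_item(response, keyed_direction):
    """Score one dichotomous personality item.

    Returns 1 when the response (e.g., True/False) is in the direction
    of the construct being assessed, 0 otherwise.
    """
    return 1 if response == keyed_direction else 0

# Hypothetical three-item ability test scored 1/0.
answers = ["b", "d", "a"]
key = ["b", "c", "a"]
total = sum(score_ability_item(r, k) for r, k in zip(answers, key))

# Hypothetical partial-credit item: "c" is fully correct, "b" partly correct.
partial_map = {"c": 2, "b": 1}
partial_score = score_ability_item("b", "c", partial=partial_map)

# Hypothetical personality item keyed in the True direction.
trait_score = score_personality_item(True, keyed_direction=True)
print(total, partial_score, trait_score)
```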
Disadvantages
more susceptible than constructed-response items to certain problems
the possibility of correct guessing
incorrect answers to objective items can easily occur as a result of haste, inattention, carelessness, malingering, or other chance factors unrelated to the test taker’s level of knowledge or ability in the area covered by the item
test-taking response sets can be intentionally or unintentionally misleading
carelessly written multiple-choice items, in particular, often include alternatives that are
grammatically incompatible with the stem of the item
susceptible to various interpretations due to imprecise wording
selected-response items are clearly less flexible than constructed-response items with regard to the possible range of responses
they offer no opportunity for assessing characteristics that may be special or unique to an individual test taker or that lie outside the range of alternatives provided
Constructed-Response Items: open-ended in nature
may involve writing samples, free oral responses, performances of any kind, and products of all sorts
the most common types of constructed-response items are essay questions and fill-in-the-blank items
directions for administering constructed-response tests should include stipulations on matters such as
time limits
medium, manner, or length of the required response
whether access to materials or instruments pertinent to the test (e.g., textbooks, calculators, computers, etc.) is permitted
interviews, biographical data questionnaires, and behavioral observations are tools for the assessment of personality that often rely on open-ended responses
in personality testing proper, the use of constructed responses is limited mainly to projective techniques
Advantages
provide richer samples of the behavior of examinees and allow for their unique characteristics to emerge
open-ended items offer a wider range of possibilities and more creative approaches to testing and assessment than selected-response items
Disadvantages
related to score reliability, and as a consequence, to validity as well
scoring constructed responses, both in ability and personality tests, is always a more time-consuming and complex matter than scoring selected responses because some degree of subjectivity is invariably necessary
there is always the possibility that a response will be evaluated differently by different scorers due to its uniqueness or to some other factor
test length, response length
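One way to see how scorer disagreement affects constructed-response scores is to compare two scorers' ratings of the same responses. The sketch below computes simple percent agreement and Cohen's kappa, which corrects agreement for chance. Kappa is not mentioned in the notes; it is brought in here only as one commonly used index, and the ratings are invented for illustration.

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of responses given the same score by both scorers."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two scorers, corrected for chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each scorer's marginal proportions,
    # summed over all score categories.
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical essay scores (0, 1, or 2) from two independent scorers.
scorer_1 = [2, 1, 0, 2, 1, 0, 2, 1]
scorer_2 = [2, 1, 0, 1, 1, 0, 2, 2]

agreement = percent_agreement(scorer_1, scorer_2)
kappa = cohens_kappa(scorer_1, scorer_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Kappa comes out lower than raw agreement because some matches would be expected even if the scorers assigned scores at random.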
Measurement: the assignment of numerals to objects or events according to rules
good research in psychology and social psychology depends on good measurement
understanding the concepts behind scaling helps us understand how numerical values can be assigned in psychological measurement
in psychological measurement, we use numerals to represent an individual’s level of a psychological attribute
it is therefore true that numerals can represent psychological attributes in many different ways
these different ways can be described in the 3 properties of numerals: identity, order, and quantity
Identity: the most fundamental form of measurement is the ability to reflect “sameness” vs “differentness”
all people within a category must satisfy the property of identity
all people within a category must be “identical” with respect to the feature reflected by the category
in this case, numerals have no actual mathematical value
when making categorical differentiations between people, the distinction between people of different categories represents a difference in quality rather than quantity
Order: indicates the rank of people relative to each other along some dimension
when numerals are used to indicate order, they essentially serve as labels
Quantity: when numerals have the property of quantity, they also provide information about the magnitude of differences between people
“0” is a strange number as it has various meanings
absolute zero
arbitrary zero
knowing what 0 means is essential in psychological measurement
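Why the meaning of 0 matters can be shown with a standard non-psychological example: temperature. Celsius has an arbitrary zero, whereas Kelvin has an absolute zero. The example is not from the notes and is used only as an assumed illustration: differences between scores are the same on both scales, but ratios ("twice as much") are only meaningful when zero is absolute.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (arbitrary zero) to Kelvin (absolute zero)."""
    return celsius + 273.15

c1, c2 = 10.0, 20.0
k1, k2 = celsius_to_kelvin(c1), celsius_to_kelvin(c2)

# Differences are preserved across the two scales.
diff_celsius = c2 - c1
diff_kelvin = k2 - k1

# Ratios are not: 20 C looks like "twice" 10 C only because of where
# the Celsius zero happens to sit.
ratio_celsius = c2 / c1
ratio_kelvin = k2 / k1

print(f"difference: {diff_celsius:.2f} vs {diff_kelvin:.2f}")
print(f"ratio: {ratio_celsius:.3f} vs {ratio_kelvin:.3f}")
```

The same caution applies to psychological scores: a score of 0 on most scales is arbitrary, so a score of 20 cannot be read as twice as much of the attribute as a score of 10.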
a scale’s response format refers to the way in which items are presented and responses are obtained
includes the Likert scale and semantic differential scales
must consider the number of response options available. A minimum of 2 is required but having more has pros and cons as well
the use of midpoints is a common consideration in scale construction
Midpoints are presented with terms such as “neutral” or “neither agree nor disagree”, often achieved through an odd number of response options
may have pros and cons also
Some researchers might want to accommodate respondents who have no opinion about the item or who don’t know what their true perspectives are
avoid using neutral options as “I don’t know” responses
These responses might mean more than just lack of knowledge or opinion
It is better to focus on the simplicity, clarity, and breadth of the psychological dimension
In constructing a psychological scale, one should consider at least 2 issues regarding the consistency of response options across items
a scale’s items should have an equal number of response options
the logical order of the response options should be consistent across items
Item content must reflect the intended psychological variables
the breadth of the variable must be reflected in the scale’s content
number of items should be considered for each construct to be measured, with each having its own set of items and receiving its own score; depends on several issues
all else being equal, longer scales tend to have higher reliability
broadly defined constructs may require more items than narrowly defined constructs
consider the context of administration or time-sensitive contexts
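The link between scale length and reliability is often illustrated with the Spearman-Brown prophecy formula, which is not named in the notes and is brought in here as an assumption. It predicts the reliability of a lengthened or shortened scale, provided the added or removed items are comparable (parallel) to the existing ones. The starting reliability of 0.70 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing scale length.

    reliability: reliability of the current scale (between 0 and 1).
    length_factor: new length divided by current length
                   (2 = doubled, 0.5 = halved).
    Assumes the items added or removed are parallel to the rest.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

current = 0.70                            # hypothetical 10-item scale
doubled = spearman_brown(current, 2)      # predicted for 20 comparable items
halved = spearman_brown(current, 0.5)     # predicted for 5 comparable items

print(f"5 items: {halved:.2f}, 10 items: {current:.2f}, 20 items: {doubled:.2f}")
```

The gain from adding items shrinks as reliability rises, which is one reason length has to be weighed against testing time and respondent fatigue.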
items and instructions should be relatively clear and easily understood
should entail little cognitive effort
avoid psychological (technical) jargon, double negatives, and double-barreled items (items reflecting 2 separate ideas)
as a general rule, scales should be “balanced” by including positively-keyed and negatively-keyed items
negatively-keyed items must be reverse-scored before item scores are combined
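Reverse scoring can be sketched as follows, assuming a 1-to-5 response format: a response is flipped by subtracting it from the sum of the lowest and highest options, so 1 becomes 5, 2 becomes 4, and the midpoint stays the same. The function names, the response range, and the sample responses are hypothetical.

```python
def reverse_score(response, low=1, high=5):
    """Flip a response on a low..high scale (e.g., 1 <-> 5, 2 <-> 4)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    return low + high - response

def scale_score(responses, negatively_keyed, low=1, high=5):
    """Sum item responses after reverse-scoring the negatively-keyed items.

    responses: one test taker's responses, in item order.
    negatively_keyed: set of 0-based positions of negatively-keyed items.
    """
    return sum(
        reverse_score(r, low, high) if i in negatively_keyed else r
        for i, r in enumerate(responses)
    )

# Hypothetical 4-item scale; the 2nd and 4th items are negatively keyed.
responses = [5, 1, 4, 2]
negatively_keyed = {1, 3}
total = scale_score(responses, negatively_keyed)
print(total)
```

Without the reversal, strong agreement with a positively-keyed item and strong disagreement with a negatively-keyed item would cancel each other out in the total even though both point to a high level of the attribute.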