[Psychomet] Validity, Bias, and Fairness

FINALS PERIOD

WEEK 10: VALIDITY, BIAS, & FAIRNESS


SYNONYMS OF “BIAS” OR “BIASED”

  • Bigotry

  • Favoritism

  • Leaning towards another

  • Preference

  • Prejudice

  • Intolerance

  • Tilted toward one side or in the opposite direction

  • Unfairness


Test Bias

  • For psychometricians, bias is a factor inherent in a test that systematically prevents accurate, impartial measurement (Cohen and Swerdlik, 2018)


When do we say that test bias exists?

  • When test items are “easier” for one group of people than for another

  • When people with certain qualities perform better than those without such qualities, particularly when this quality is NOT RELATED to the variable being measured

  • Reflects a systematic variation in test scores

  • Remember “systematic variance”? The members of one group score higher on a certain variable (i.e., they all score uniformly high on that variable);

  • In the same manner, the members of another group score lower on the same variable
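
A first, rough check for the pattern described above is to compare how often each group answers each item correctly. The sketch below is only an illustration: the function name, the group labels "A" and "B", and the response data are all hypothetical. A gap on an item does not prove bias by itself, since the groups may truly differ on the variable being measured; it only flags items that deserve a closer look.

```python
def proportion_correct_by_group(responses):
    """responses: list of (group, item_scores), where item_scores is a list of 0/1.
    Returns {group: [proportion correct on each item]}."""
    sums = {}
    counts = {}
    for group, scores in responses:
        if group not in sums:
            sums[group] = [0] * len(scores)
            counts[group] = 0
        for i, score in enumerate(scores):
            sums[group][i] += score
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Hypothetical responses of eight test takers to three items (1 = correct).
responses = [
    ("A", [1, 1, 1]),
    ("A", [1, 1, 0]),
    ("A", [1, 0, 1]),
    ("A", [1, 1, 1]),
    ("B", [1, 0, 1]),
    ("B", [1, 0, 0]),
    ("B", [1, 0, 1]),
    ("B", [1, 1, 1]),
]

p = proportion_correct_by_group(responses)
# Gap per item: positive means the item was easier for group A.
gaps = [round(a - b, 2) for a, b in zip(p["A"], p["B"])]
print(p)     # {'A': [1.0, 0.75, 0.75], 'B': [1.0, 0.25, 0.75]}
print(gaps)  # [0.0, 0.5, 0.0]
```

In this made-up data only the second item is noticeably easier for one group, so it is the item a reviewer would examine first.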


Categories of Test Bias

  1. Construct Validity Bias

  • Refers to whether a test accurately measures what it was designed to measure

  • Ex: on an intelligence test like the SRA Verbal Test (which has items that make use of words in English), students who are still in the process of learning English may score low on the test. The low score would suggest low intellectual ability, when in fact it may reflect their weak English-language skills.


  2. Content Validity Bias

  • Occurs when the content of a test is comparatively more difficult for one group than for others

  • It can occur when members of a subgroup, such as various minority groups, have not been given the same opportunity to learn the material being tested (hence they would score low when the test contains such items)

  • It can occur when scoring is unfair to a group (ex: the answers that would make sense in one group’s culture are deemed incorrect). Among the Japanese, for instance, abasement (or the tendency to readily admit one’s fault) is perceived to be positive, so they would be expected to score high on that trait.

  • It can occur when questions are worded in ways that are unfamiliar to certain groups because of linguistic or cultural differences.

  • Item-selection bias: subcategory of this bias; the use of individual test items that are more suited to one group’s language and cultural experiences


  3. Predictive-validity Bias (bias in criterion-related validity)

  • Refers to a test’s accuracy in predicting how well a certain group will perform in the future 

  • Ex: a test would be considered “unbiased” if it predicted future academic and test performance equally well for all groups of students
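
One way to picture "predicting equally well for all groups" is to fit a single prediction line to everyone and then look at the average prediction error within each group. The sketch below assumes a simple least-squares line and uses hypothetical records of (group, test score, later performance); the function names and numbers are illustrative only, not a standard procedure from the source. If the test predicted equally well for both groups, each group's mean residual would be close to zero.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual_by_group(records):
    """records: list of (group, test_score, later_performance).
    Fits one common line, then averages (actual - predicted) per group."""
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = {}
    for group, x, y in records:
        residuals.setdefault(group, []).append(y - (intercept + slope * x))
    return {g: sum(v) / len(v) for g, v in residuals.items()}


# Hypothetical data: same test scores, but group B later performs better.
records = [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 3),
    ("B", 1, 2), ("B", 2, 3), ("B", 3, 4),
]

bias = mean_residual_by_group(records)
print(bias)  # {'A': -0.5, 'B': 0.5}
```

A negative mean residual means the common line over-predicts that group's later performance; a positive one means it under-predicts. In this made-up data the test under-predicts group B.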


Other factors that can give rise to test bias

  1. If the test developer is not demographically or culturally representative of the intended test takers, test items may reflect inadvertent bias

  • EX: if test developers are predominantly white, upper-middle class males, the resulting test could, due to cultural oversights, advantage demographically similar test takers and disadvantage others

  2. Norm-referenced tests may be biased if the “norming process” does not include representative samples of all the tested subgroups

  • EX: if the test developers do not include linguistically, culturally, and socioeconomically diverse students in the initial comparison groups (which are used to determine the norms used in the test), the resulting test could potentially disadvantage excluded groups

  3. Certain test formats may have an inherent bias toward some groups, at the expense of others

  • EX: evidence suggests that timed, multiple-choice tests may favor certain styles of thinking more characteristic of males than females, such as a willingness to risk guessing the right answer or questions that reflect black-and-white logic rather than nuanced logic

  4. The choice of language in test questions can introduce bias

  • EX: test items may use idiomatic cultural expressions, such as “an old flame” or “an apples-and-oranges comparison,” that are unfamiliar to some test takers (e.g., apples and oranges may not be common in some places in the country)

  5. Tests may be considered biased if they include references to cultural details that are not familiar to particular groups

  • EX: those from tropical countries may never have experienced winter, snow, or snow-related phenomena, and may therefore not understand items that refer to them

EXAMPLES OF RATING ERRORS 

(Cohen and Swerdlik, 2018)

  1. Leniency/Generosity Error

  • An error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading

  2. Severity Error

  • An error in rating wherein the rater becomes overly strict and gives low ratings

  3. Central Tendency Error

  • Rater is reluctant to give extremely high or low ratings; ratings cluster at the middle of the continuum

  • Note: these three errors are also called distribution errors or restriction-of-range rating errors

  • One remedy for these errors is to use rankings, a procedure that requires the rater to measure individuals against one another instead of against an absolute scale. By using rankings instead of ratings, the rater (now the “ranker”) is forced to select first, second, third choices, and so forth

  • Another remedy is to provide raters with a list of specific competencies to be evaluated, as well as guidance on how each competency should be evaluated.


  4. Halo Effect

  • Tendency for a rater to give a particular ratee a higher rating than he/she objectively deserves because of the rater’s failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior
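
The distribution errors described above (leniency, severity, and central tendency) leave a trace in the shape of a rater's score distribution. The sketch below summarizes each rater's ratings on an assumed 1-to-5 scale; the function name, the scale, and the ratings are hypothetical. The summary is only a screening aid: a high mean may simply mean the ratees were good, so it cannot confirm an error by itself.

```python
from statistics import mean, pstdev


def describe_rater(ratings, scale_min=1, scale_max=5):
    """Summarize one rater's ratings on a scale_min..scale_max scale."""
    midpoint = (scale_min + scale_max) / 2
    return {
        "mean": mean(ratings),
        "sd": pstdev(ratings),               # small spread: restricted range
        "shift": mean(ratings) - midpoint,   # > 0 lenient side, < 0 severe side
    }


# Hypothetical ratings given by three raters to the same six ratees.
lenient = describe_rater([5, 5, 4, 5, 4, 5])
severe = describe_rater([1, 2, 1, 1, 2, 1])
central = describe_rater([3, 3, 3, 3, 3, 3])

print(lenient["shift"] > 0)  # True: ratings sit above the scale midpoint
print(severe["shift"] < 0)   # True: ratings sit below the scale midpoint
print(central["sd"])         # 0.0: every ratee received the middle rating
```

All three hypothetical raters show a restricted range, which is why rankings (forcing a first, second, third choice) are offered as a remedy.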


ADDRESSING RATING ERRORS

  • Training programs to familiarize raters with common rating errors and sources of rater bias have shown promise in reducing rating errors and increasing measures of reliability and validity. 

  • Lecture, role playing, discussion, watching oneself on videotape, and computer simulation of different situations are some of the many techniques that could be brought to bear in such training programs


TEST FAIRNESS

  • The extent to which a test is used in an impartial, just, and equitable way (Cohen & Swerdlik, 2018)

  • Has to do with the appropriate use of test scores, and it is a social, philosophical, or perhaps legal term that represents one’s value judgment (Furr & Bacharach, 2014)

  • A test may be valid, but it can be used fairly or unfairly. The issue of test fairness, by itself, leads to a lot of debates and arguments

TEST BIAS AND TEST FAIRNESS COMPARED

Test bias is closely related to the issue of test fairness, i.e., do the social applications of test results have consequences that unfairly advantage or disadvantage certain groups of students?

  • College-admissions exams often raise concerns about both test bias and test fairness, given their significant role in determining access to institutions of higher education, especially elite colleges and universities. EX: female students tend to score lower than males (possibly because of gender bias in test design), even though female students tend to earn higher grades in college on average (which possibly suggests evidence of predictive-validity bias)


  • There is evidence of a consistent connection between family income and scores on college-admissions exams, with higher-income students, on average, outscoring lower-income students

  • The fact that students can boost their scores considerably with tutoring or test coaching adds to the perception of socioeconomic unfairness, given that test preparation classes and services may be prohibitively expensive for many students. (Concerns about bias and unfairness are one contributing factor in a trend toward “test-optional” or “test-flexible” collegiate admissions policies.)


Can Test Bias and (lack of) Test Fairness be avoided?


  • Very much like measurement error, some degree of bias and unfairness in testing may be unavoidable. The inevitability of test bias and unfairness is among the reasons that many test developers and testing experts caution against making important decisions based on a single test result

  • Given the fact that test results continue to be widely used when making important decisions, test developers and experts have identified a number of strategies that can reduce, if not eliminate, test bias and unfairness:


  1. Striving for diversity in test-development staffing, and training test developers and scorers to be aware of the potential for cultural, linguistic, and socioeconomic bias

  2. Having test materials reviewed by experts trained in identifying cultural bias and by representatives of culturally and linguistically diverse subgroups

  3. Ensuring that norming processes and sample sizes used to develop norm-referenced tests are inclusive of diverse subgroups and large enough to constitute a representative sample

  4. Eliminating items that produce the largest racial and cultural performance gaps, and selecting items that produce the smallest gaps, a technique known as “the golden rule” (this particular strategy may be logistically difficult to achieve, however, given the number of racial, ethnic, and cultural groups that may be represented in any given testing population)

  5. Screening for and eliminating items, references, and terms that are more likely to be offensive to certain groups

  6. Translating tests into a test taker’s native language or using interpreters to translate test items

  7. Including more “performance-based” items to limit the role that language and word choice play in test performance

  8. Using multiple assessment measures to determine academic achievement and progress, and avoiding the use of test scores, in exclusion of other information, to make important decisions about students

  • Note: these recommendations are said to be more appropriate for tests in the educational setting, although they may be applied to other settings (e.g., industrial, clinical, etc.) as well
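
Strategy 4 in the list above (“the golden rule”) can be sketched as a simple selection step. The example assumes the performance gap for each item has already been computed for just two groups; the item names, gap values, and function name are hypothetical. With many groups there would be many gaps per item, which is the logistical difficulty the strategy itself notes.

```python
def select_smallest_gap_items(item_gaps, k):
    """item_gaps: {item: difference in proportion correct between two groups}.
    Returns the k items whose gap is smallest in absolute size."""
    ranked = sorted(item_gaps, key=lambda item: abs(item_gaps[item]))
    return ranked[:k]


# Hypothetical gaps (group A minus group B) for five candidate items.
item_gaps = {
    "item1": 0.02,
    "item2": 0.30,
    "item3": -0.05,
    "item4": 0.15,
    "item5": -0.25,
}

kept = select_smallest_gap_items(item_gaps, 3)
print(kept)  # ['item1', 'item3', 'item4']
```

The two items with the largest gaps (item2 and item5) are the ones dropped, regardless of which group they favor.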