[Psychomet] Validity, Bias, and Fairness

FINALS PERIOD

WEEK 10: VALIDITY, BIAS, & FAIRNESS

SYNONYMS OF “BIAS” OR “BIASED”

Bigotry
Favoritism
Leaning towards another
Preference
Prejudice
Intolerance
Tilted toward one side or in the opposite direction
Unfairness

Test Bias

For psychometricians, bias is a factor inherent in a test that systematically prevents accurate, impartial measurement (Cohen and Swerdlik, 2018)

When do we say that test bias exists?

When test items are “easier” for one group of people than for another
When people with certain qualities perform better than those without such qualities, particularly when this quality is NOT RELATED to the variable being measured
Reflects a systematic variation in test scores

Remember “systematic variance”? The members of one group score higher on a certain variable (i.e., they all score uniformly high on that variable);
In the same manner, the members of another group score lower on the same variable

Categories of Test Bias

Construct Validity Bias

Refers to whether a test accurately measures what it was designed to measure

Ex: on an intelligence test like the SRA Verbal Test (which has items that make use of words in English), students who are still in the process of learning English may score low in the test, suggesting their low intellectual ability, when in fact what the test may possibly reflect is their weak English-language skills.

Content Validity Bias

Occurs when the content of a test is comparatively more difficult for one group than for others

It can occur when members of a subgroup, such as various minority groups, have not been given the same opportunity to learn the material being tested (hence they would score low when the test contains such items)
It can occur when scoring is unfair to a group (e.g., the answers that would make sense in one group’s culture are deemed incorrect). Among the Japanese, for instance, abasement (or the tendency to readily admit one’s fault) is perceived to be positive, so they would be expected to score high on that trait.
It can occur when questions are worded in ways that are unfamiliar to certain groups because of linguistic or cultural differences.

Item-selection bias: subcategory of this bias; the use of individual test items that are more suited to one group’s language and cultural experiences

Predictive-validity Bias (bias in criterion-related validity)

Refers to a test’s accuracy in predicting how well a certain group will perform in the future

Ex: a test would be considered “unbiased” if it predicted future academic and test performance equally well for all groups of students
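
One way to picture what "predicting equally well for all groups" could mean in practice is to fit a separate prediction line for each group and compare the two. The sketch below is only an illustration: the scores, the group labels, and the helper `fit_line` are hypothetical and do not come from the lecture or from any real test.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (assumption): admission-test scores and later grade averages.
group_a_test = [10, 20, 30, 40]
group_a_gpa = [2.0, 2.5, 3.0, 3.5]
group_b_test = [10, 20, 30, 40]
group_b_gpa = [2.5, 3.0, 3.5, 4.0]

slope_a, intercept_a = fit_line(group_a_test, group_a_gpa)
slope_b, intercept_b = fit_line(group_b_test, group_b_gpa)

# If the test predicts equally well for both groups, the two lines
# should be close to each other in both slope and intercept.
print("Group A: slope", round(slope_a, 3), "intercept", round(intercept_a, 3))
print("Group B: slope", round(slope_b, 3), "intercept", round(intercept_b, 3))
```

With these made-up numbers the two lines have the same slope but different intercepts (about 1.5 versus 2.0), so a single common prediction line would underpredict Group B's later grades. A pattern of that kind is what a check for predictive-validity bias would look for.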

Other factors that can give rise to test bias

If the test developer is not demographically or culturally representative of the intended test takers, test items may reflect inadvertent bias

EX: if test developers are predominantly white, upper-middle class males, the resulting test could, due to cultural oversights, advantage demographically similar test takers and disadvantage others

Norm-referenced tests may be biased if the “norming process” does not include representative samples of all the tested subgroups

EX: if the test developers do not include linguistically, culturally, and socioeconomically diverse students in the initial comparison groups (which are used to determine the norms used in the test), the resulting test could potentially disadvantage excluded groups

Certain test formats may have an inherent bias toward some groups, at the expense of others

EX: evidence suggests that timed, multiple-choice tests may favor certain styles of thinking more characteristic of males than females, such as a willingness to risk guessing the right answer or questions that reflect black-and-white logic rather than nuanced logic

The choice of language in test questions can introduce bias

EX: idiomatic cultural expressions, such as “an old flame” or “an apples-and-oranges comparison”, may be unfamiliar to some test takers (e.g., apples and oranges may not be common in some places in the country)

Tests may be considered biased if they include references to cultural details that are not familiar to particular groups

EX: those from tropical countries may never have experienced winter, snow, or a snow-related phenomenon, and may therefore not understand items that refer to them

EXAMPLES OF RATING ERRORS

(Cohen and Swerdlik, 2018)

Leniency/Generosity Error

An error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading

Severity Error

An error in rating wherein the rater becomes overly strict and gives low ratings

Central Tendency Error

Rater is reluctant to give extremely high or low ratings; ratings cluster at the middle of the continuum

Note: these errors are also called distribution errors or restriction-of-range rating errors

One remedy to address these errors is to use rankings, a procedure that requires the rater to measure individuals against one another instead of against an absolute scale. By using rankings instead of ratings, the rater (now the “ranker”) is forced to select first, second, third choices, and so forth
Another remedy is to provide raters with a list of specific competencies to be evaluated, as well as guidance on how each competency should be evaluated.
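
The three distribution errors above can be pictured in terms of where a rater's ratings sit on the scale and how spread out they are. The sketch below is a rough illustration only: the 1-to-5 scale, the cutoffs `shift` and `narrow`, the function `describe_rater`, and all the ratings are assumptions made up for this example, not rules from Cohen and Swerdlik.

```python
from statistics import mean, pstdev

# Assumed rating scale for this example: 1 (lowest) to 5 (highest).
SCALE_MIN, SCALE_MAX = 1, 5
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2


def describe_rater(ratings, shift=1.0, narrow=0.5):
    """Label a rater's pattern using arbitrary, illustrative cutoffs."""
    average = mean(ratings)
    spread = pstdev(ratings)
    if average - MIDPOINT >= shift:
        return "possible leniency"          # ratings pile up at the high end
    if MIDPOINT - average >= shift:
        return "possible severity"          # ratings pile up at the low end
    if spread <= narrow:
        return "possible central tendency"  # ratings cluster near the middle
    return "no obvious pattern"


# Hypothetical ratings given by four raters to the same six ratees.
ratings_by_rater = {
    "rater_1": [5, 5, 4, 5, 4, 5],
    "rater_2": [1, 2, 1, 2, 1, 1],
    "rater_3": [3, 3, 3, 3, 4, 3],
    "rater_4": [1, 3, 5, 2, 4, 3],
}

for rater, ratings in ratings_by_rater.items():
    print(rater, "->", describe_rater(ratings))
```

A flag here is only a prompt to look closer: a group of ratees could genuinely deserve uniformly high (or low, or middling) ratings, so the pattern alone does not prove a rating error.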

Halo Effect

Tendency for a rater to give a particular ratee a higher rating than he/she objectively deserves because of the rater’s failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior

ADDRESSING RATING ERRORS

Training programs to familiarize raters with common rating errors and sources of rater bias have shown promise in reducing rating errors and increasing measures of reliability and validity.
Lecture, role playing, discussion, watching oneself on videotape, and computer simulation of different situations are some of the many techniques that could be brought to bear in such training programs

TEST FAIRNESS

The extent to which a test is used in an impartial, just, and equitable way (Cohen & Swerdlik, 2018)
Has to do with the appropriate use of test scores, and it is a social, philosophical, or perhaps legal term that represents one’s value judgment (Furr & Bacharach, 2014)
A test may be valid, but it can be used fairly or unfairly. The issue of test fairness, by itself, leads to a lot of debates and arguments

TEST BIAS AND TEST FAIRNESS COMPARED

Test bias is closely related to the issue of test fairness, i.e., do the social applications of test results have consequences that unfairly advantage or disadvantage certain groups of students?

College-admissions exams often raise concerns about both test bias and test fairness, given their significant role in determining access to institutions of higher education, especially elite colleges and universities. EX: female students tend to score lower than males (possibly because of gender bias in test design), even though female students tend to earn higher grades in college on average (which possibly suggests evidence of predictive-validity bias)

There is evidence of a consistent connection between family income and scores on college-admissions exams, with higher-income students, on average, outscoring lower-income students
The fact that students can boost their scores considerably with tutoring or test coaching adds to the perception of socioeconomic unfairness, given that test preparation classes and services may be prohibitively expensive for many students. (Concerns about bias and unfairness are one contributing factor in a trend toward “test-optional” or “test-flexible” collegiate admissions policies.)

Can Test Bias and (lack of) Test Fairness be avoided?

Very much like measurement error, some degree of bias and unfairness in testing may be unavoidable. The inevitability of test bias and unfairness is among the reasons that many test developers and testing experts caution against making important decisions based on a single test result
Given the fact that test results continue to be widely used when making important decisions, test developers and experts have identified a number of strategies that can reduce, if not eliminate, test bias and unfairness…

Striving for diversity in test-development staffing, and training test developers and scorers to be aware of the potential for cultural, linguistic, and socioeconomic bias
Having test materials reviewed by experts trained in identifying cultural bias and by representatives of culturally and linguistically diverse subgroups
Ensuring that norming processes and sample sizes used to develop norm-referenced tests are inclusive of diverse subgroups and large enough to constitute a representative sample
Eliminating items that produce the largest racial and cultural performance gaps, and selecting items that produce the smallest gaps, a technique known as “the golden rule” (This particular strategy may be logistically difficult to achieve, however, given the number of racial, ethnic, and cultural groups that may be represented in any given testing population.)
Screening for and eliminating items, references, and terms that are more likely to be offensive to certain groups
Translating tests into a test taker’s native language or using interpreters to translate test items
Including more “performance-based” items to limit the role that language and word choice play in test performance
Using multiple assessment measures to determine academic achievement and progress, and avoiding the use of test scores, in exclusion of other information, to make important decisions about students

Note: these recommendations are meant to be more appropriate for tests in the educational setting, although they may be applied to other settings (e.g., industrial, clinical, etc.) as well
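
The “golden rule” strategy in the list above (keep the items with the smallest group performance gaps) can be sketched as a simple ranking of items by gap size. Everything in this sketch is hypothetical: the item names, the group labels, the pass rates, and the helpers `performance_gap` and `select_items` are assumptions for illustration, and the gap is taken here as the difference between the highest and lowest group pass rate.

```python
# Hypothetical proportion of test takers in each group who answered
# each item correctly (made-up numbers for illustration only).
item_pass_rates = {
    "item_1": {"group_a": 0.80, "group_b": 0.78},
    "item_2": {"group_a": 0.75, "group_b": 0.50},
    "item_3": {"group_a": 0.60, "group_b": 0.65},
    "item_4": {"group_a": 0.90, "group_b": 0.70},
}


def performance_gap(rates_by_group):
    """Gap between the best- and worst-performing group on one item."""
    rates = list(rates_by_group.values())
    return max(rates) - min(rates)


def select_items(pass_rates, keep):
    """Return the `keep` item names with the smallest performance gaps."""
    ranked = sorted(pass_rates, key=lambda item: performance_gap(pass_rates[item]))
    return ranked[:keep]


selected = select_items(item_pass_rates, keep=2)
print("Items kept:", selected)
```

Because `performance_gap` takes the maximum and minimum across however many groups are listed, the same sketch extends to more than two groups, which also hints at why the strategy becomes harder to satisfy as the number of groups grows.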