Utility

What is utility? In the context of testing and assessment, utility refers to the usefulness or practical value of a test in improving efficiency.

  • Testing – anything from a single test to a large-scale testing program that employs a battery of tests.

  • Utility – can also refer to the usefulness or practical value of a training program or intervention, although the focus here is primarily on the utility of testing.

    • A test’s utility is often linked to its reliability (how consistent it is) and validity (how well it measures what it claims to measure).

  • A good test should not only be accurate but also practical and beneficial in real situations.

Factors that affect a test’s utility

  1. Psychometric soundness - refers to the reliability and validity of a test; a test is psychometrically sound if it is consistent and measures what it purports to measure

  2. Costs - refers to disadvantages, losses, or expenses in both economic and noneconomic terms.

    • When conducting a test, funds must be allocated for the particular test itself; a supply of blank test protocols; and computerized test processing, scoring, and interpretation from the test publisher or an independent service.

  3. Benefits - refers to profits, gains, or advantages that can be derived from the successful implementation of a test, including improved decision-making, enhanced recruitment processes, and better understanding of individual capabilities.

What is utility analysis? Utility analysis is a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about an assessment tool's usefulness and/or practical value.

  • This is a way to evaluate whether a test, training program or intervention is worth the cost. It helps decision-makers determine if a particular tool is effective and practical by weighing the benefits against the expenses.

  • Since utility analysis is a broad term, it includes different techniques, ranging from complex mathematical models to simple comparisons like "Which test gives us more value for our money?"

How is a utility analysis conducted? The way a utility analysis is conducted depends on the goal. One common method uses expectancy data, which helps predict outcomes based on test scores.

  • Expectancy data - This method organizes test results into an expectancy table—a simple chart showing the chances of a test-taker performing at different levels, such as passing, acceptable, or failing.

    • For example, if a company is testing a new hiring exam, an expectancy table can show that higher test scores are linked to better job performance. If the data show that the test accurately predicts success, the company may decide to use it permanently to improve hiring decisions and boost productivity. (A short computational sketch appears after this list of terms.)

  • Top-down selection - the process of awarding available positions to applicants whereby the highest scorer is awarded the first position, the next highest scorer the next position, and so on until all the positions are filled.

  • Hit - a correct classification; it indicates that the predictor successfully predicted performance on the criterion. (A qualified driver is hired; an unqualified driver is not hired.)

  • Miss - an incorrect classification; it indicates that the predictor failed to predict performance on the criterion. (A qualified driver is not hired, or an unqualified driver is hired.)

  • Hit rate - The proportion of people that an assessment tool accurately identifies as possessing or exhibiting a particular trait, ability, behavior, or attribute

  • Miss rate - The proportion of people that an assessment tool inaccurately identifies as possessing or exhibiting a particular trait, ability, behavior, or attribute

  • False positive - A specific type of miss whereby an assessment tool falsely indicates that the testtaker possesses or exhibits a particular trait, ability, behavior, or attribute.

  • False negative - A specific type of miss whereby an assessment tool falsely indicates that the testtaker does not possess or exhibit a particular trait, ability, behavior, or attribute
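
To make these terms concrete, here is a minimal Python sketch, using made-up pass/fail and job-performance outcomes, that tallies hits, misses, false positives, and false negatives, and derives a simple expectancy figure: the chance of on-the-job success given a passing score. All data and variable names are invented for illustration.

```python
# Minimal sketch: tabulating expectancy data and hit/miss rates
# from hypothetical (passed_test, performed_successfully) outcomes.

outcomes = [
    (True, True), (True, True), (True, False),     # test passers
    (False, False), (False, False), (False, True), # test failers
]

true_pos  = sum(p and s for p, s in outcomes)          # hit: hired, succeeded
true_neg  = sum(not p and not s for p, s in outcomes)  # hit: rejected, would have failed
false_pos = sum(p and not s for p, s in outcomes)      # miss: hired, failed
false_neg = sum(not p and s for p, s in outcomes)      # miss: rejected, would have succeeded

total = len(outcomes)
hit_rate  = (true_pos + true_neg) / total
miss_rate = (false_pos + false_neg) / total

# A simple expectancy figure: chance of success given a passing score
expectancy_pass = true_pos / (true_pos + false_pos)

print(f"hit rate:          {hit_rate:.2f}")
print(f"miss rate:         {miss_rate:.2f}")
print(f"P(success | pass): {expectancy_pass:.2f}")
```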

In 1939, H. C. Taylor and J. T. Russell published tables that can be used as an aid for personnel directors in their decision-making chores: the Taylor-Russell tables provide an estimate of the extent to which the inclusion of a particular test in the selection system will improve selection. Their purpose, in simpler terms, is to estimate how much a test improves hiring decisions.

So, how does it work? It uses three factors:

  1. Test validity (How well the test predicts job performance)

  2. Selection ratio - a numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired. (How many people are hired vs. how many apply)

  3. Base rate - the percentage of people hired under the existing system for a particular position (The percentage of current employees who are successful)

*The higher the test validity and the lower the selection ratio, the more a test helps in hiring.
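
One informal way to see why higher validity and a lower selection ratio help is to simulate the Taylor-Russell situation: generate test and performance scores correlated at the assumed validity, hire the top fraction of test scorers, and compare the success rate among those hired with the base rate. This is only a sketch; the validity, selection ratio, and base rate values below are arbitrary, and the published tables should be consulted in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

validity = 0.45         # assumed correlation between test and job performance
selection_ratio = 0.20  # assumed fraction of applicants hired
base_rate = 0.50        # assumed fraction successful under the existing system
n = 100_000

# Bivariate normal test/performance scores with correlation = validity
cov = [[1.0, validity], [validity, 1.0]]
test, performance = rng.multivariate_normal([0, 0], cov, size=n).T

# "Successful" = performance above the criterion cutoff implied by the base rate
criterion_cut = np.quantile(performance, 1 - base_rate)
successful = performance >= criterion_cut

# Top-down selection: hire the top `selection_ratio` of test scorers
test_cut = np.quantile(test, 1 - selection_ratio)
hired = test >= test_cut

print(f"base rate (no test):      {successful.mean():.2f}")
print(f"success rate among hired: {successful[hired].mean():.2f}")
```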

What are its advantages and limitations?

  • Advantage: easy to use for estimating hiring success.

  • Its limitations include:

    • The relationship between the predictor (the test) and the criterion (performance rating) must be linear.

    • The difficulty of identifying a criterion score that separates “successful” and “unsuccessful” employees.

The limitations of the Taylor-Russell tables were addressed in an alternative set of tables developed in 1965 by Naylor and Shine: the Naylor-Shine tables compare the average job performance scores of those selected with the test versus those selected without it.

How It Works:

  • Measures the difference in average performance scores between employees selected using a test vs. those not selected using it.

  • Helps HR decide if a test actually improves hiring decisions.

The advantage of the Naylor-Shine table is that it is more precise than the Taylor-Russell table. However, it requires more historical data and additional calculations.
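
Under normality assumptions, the quantity the Naylor-Shine tables report, the average criterion (performance) gain among those selected, can be approximated with a standard closed-form relationship: the validity coefficient multiplied by the ordinate of the normal curve at the selection cutoff, divided by the selection ratio. The sketch below applies that relationship with assumed values; it is an approximation, not a substitute for the tables themselves.

```python
from scipy.stats import norm

validity = 0.45         # assumed test-criterion correlation
selection_ratio = 0.20  # assumed proportion of applicants selected

# Cutoff on the standardized predictor that selects the top 20%
z_cut = norm.ppf(1 - selection_ratio)

# Mean standardized predictor score of those selected
mean_z_selected = norm.pdf(z_cut) / selection_ratio

# Mean standardized criterion (performance) score of those selected;
# without the test, the expected value is 0 (the applicant-pool average)
mean_criterion_gain = validity * mean_z_selected

print(f"average performance gain (in SD units): {mean_criterion_gain:.2f}")
```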

The Brogden-Cronbach-Gleser (BCG) Formula is a way to estimate the financial impact of selecting better employees using a test or assessment. It helps organizations determine how much money they can save or gain by improving their hiring decisions.

Think of it this way: If you hire better people, they perform better, which leads to higher productivity, efficiency, and profits. The BCG formula calculates the increase in value that comes from using a test to choose better candidates.

  • Utility Gain – an estimate of the benefit of using a particular test or selection method

  • Productivity Gain – an estimated increase in work output

It takes into account:

  • Validity of the test – How well the test predicts job performance.

  • Standard deviation of performance – How much employees differ in their performance.

  • Number of hires – How many people are hired using the test.

  • Improvement in average score – The difference in performance between those selected with and without the test.
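
A commonly cited form of the BCG utility gain multiplies the number of hires, their average tenure, the test's validity, the standard deviation of job performance in monetary terms, and the average standardized test score of those hired, then subtracts the total cost of testing. The sketch below plugs invented figures into that form; every number is an assumption used purely for illustration.

```python
# Minimal sketch of a Brogden-Cronbach-Gleser-style utility gain,
# using invented numbers purely for illustration.

n_hired = 10            # number of people hired using the test (assumed)
tenure_years = 2.0      # average time hires stay on the job (assumed)
validity = 0.45         # test validity (assumed)
sd_performance = 8_000  # SD of job performance in dollars (assumed)
mean_z_hired = 1.0      # average standardized test score of those hired (assumed)
n_applicants = 50       # applicants tested to make those hires (assumed)
cost_per_applicant = 25 # cost of testing one applicant (assumed)

utility_gain = (n_hired * tenure_years * validity * sd_performance * mean_z_hired
                - n_applicants * cost_per_applicant)

print(f"estimated utility gain: ${utility_gain:,.0f}")
```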

Decision Theory in psychological testing helps employers make better hiring decisions by using statistics to determine which selection methods work best. The goal is to maximize job performance and minimize hiring mistakes.

Key Ideas from Cronbach & Gleser (1965):

  • Types of Decision Problems: Understanding different hiring challenges.

  • Selection Strategies: Using tests in one-step or multi-step hiring processes.

  • Test Utility Factors: How test accuracy, selection ratio, and costs impact hiring success.

  • Adaptive Treatment: Instead of fitting people into rigid job roles, adjust job expectations based on applicants' skills.

Decision Theory in Hiring

A hiring test divides candidates into four groups based on whether they pass or fail and whether they actually perform well on the job:

  • True Positives – Passed the test and performed well.

  • True Negatives – Failed the test and would have underperformed.

  • False Positives – Passed the test but failed on the job. (Bad hire)

  • False Negatives – Failed the test but could have done well. (Missed opportunity)

Example: If 90% of applicants are hired, false positives (bad hires) will be high. If only 5% are hired, false negatives (missed talent) will be high.
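
A small simulation with assumed values makes this tradeoff concrete: with an imperfectly valid test, a lenient selection ratio produces many false positives, while a very strict one produces many false negatives.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
validity = 0.45  # assumed test-performance correlation

cov = [[1.0, validity], [validity, 1.0]]
test, perf = rng.multivariate_normal([0, 0], cov, size=n).T
qualified = perf >= np.median(perf)  # top half would succeed on the job

for selection_ratio in (0.90, 0.05):
    hired = test >= np.quantile(test, 1 - selection_ratio)
    false_pos = np.mean(hired & ~qualified)   # hired but underperform
    false_neg = np.mean(~hired & qualified)   # rejected but could have done well
    print(f"SR={selection_ratio:.2f}  false positives={false_pos:.2f}  "
          f"false negatives={false_neg:.2f}")
```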

Why Test Utility Matters

  • Test utility measures how valuable a hiring test is in terms of improving hiring success and saving money.

  • Studies found that using intelligence tests instead of unstructured interviews boosted productivity and cut payroll costs.

Even though decision-theory-based hiring could improve efficiency and save money, many companies avoid it because it is complex to apply, it carries legal risks (discrimination lawsuits), and they prefer to stick to traditional hiring methods (interviews, resumes).

Some Practical Considerations

  1. The pool of job applicants: There will always be a large number of job applicants waiting to be evaluated and fill positions. However, some jobs require unique skills or demand great sacrifices, making them less appealing to many candidates. Additionally, the availability of job seekers fluctuates with economic conditions—there tend to be more applicants during periods of high unemployment and fewer when employment rates are high.

  2. The complexity of the job: The more complex the job, the more people differ in how well or poorly they do that job.

  3. The cut score in use: a reference point, derived as a result of a judgment, used to divide a set of data into two or more classifications (some inference is then made on the basis of the classification).

Different Types of Cut Scores

  1. Relative cut score - a reference point, in a distribution of test scores used to divide a set of data into two or more classifications, that is set based on norm-related considerations rather than on the relationship of test scores to a criterion; also referred to as a norm-referenced cut score.

  2. Fixed cut score - in contrast to a norm-referenced cut score, this is typically set with reference to a judgment concerning the minimum level of proficiency required to be included in a particular classification; also referred to as an absolute cut score.

  3. Multiple cut score - refers to the use of two or more cut scores with reference to one predictor for the purpose of categorizing testtakers.

    • Related to multiple cut scores is the concept of “hurdles”: a sequential or staged evaluation process in which candidates must meet specific cut scores at each stage to proceed, often called the “multiple hurdle approach.”

    • The compensatory model of selection, unlike the multiple hurdle approach (where candidates must meet a minimum score at each stage), allows for a balanced evaluation of overall performance: a high score in one area can offset a lower score in another. (A comparison sketch follows this list.)
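
The sketch below contrasts the two approaches using made-up subtest scores, cut scores, and weights: the multiple-hurdle screen rejects a candidate who misses any stage cutoff, while the compensatory composite lets a strong area offset a weak one.

```python
# Hypothetical candidate scores on three stages/subtests (0-100 scale)
candidate = {"cognitive": 85, "work_sample": 58, "interview": 90}

# Multiple hurdle approach: must clear every stage's cut score (assumed cuts)
hurdles = {"cognitive": 60, "work_sample": 60, "interview": 60}
passes_hurdles = all(candidate[s] >= cut for s, cut in hurdles.items())

# Compensatory model: weighted composite compared to one overall cut score
weights = {"cognitive": 0.4, "work_sample": 0.3, "interview": 0.3}
composite = sum(candidate[s] * w for s, w in weights.items())
passes_compensatory = composite >= 70  # assumed composite cut score

print(f"multiple hurdles: {'pass' if passes_hurdles else 'reject'}")
print(f"compensatory (composite={composite:.1f}): "
      f"{'pass' if passes_compensatory else 'reject'}")
```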

Methods for Setting Cut Scores

  • Angoff Method – a method for setting fixed cut scores that can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability

  • Known Groups Method – entails collecting data on the predictor of interest from groups known to possess, and known not to possess, the trait, attribute, or ability of interest; also referred to as the method of contrasting groups

  • Item-Mapping Method – a technique that has found application in setting cut scores for licensing examinations

  • Bookmark Method – more typically used in academic applications; begins with the training of experts with regard to the minimal knowledge, skills, and/or abilities that testtakers should possess in order to “pass”

  • Method of Predictive Yield – a technique for setting cut scores that takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores

  • Discriminant Analysis (discriminant function analysis) – used to shed light on the relationship between identified variables (such as scores on a battery of tests) and two (and in some cases more) naturally occurring groups (such as persons judged to be successful at a job and persons judged unsuccessful at a job).
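
As a hedged illustration of how discriminant analysis might be applied here, the sketch below fits scikit-learn's LinearDiscriminantAnalysis to invented test-battery scores from employees already judged successful or unsuccessful, then classifies a hypothetical new applicant. The data, scores, and variable names are all assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up battery scores (rows: people, columns: two subtests)
scores = np.array([
    [78, 82], [85, 90], [74, 80], [88, 84],  # judged successful on the job
    [55, 60], [62, 58], [50, 65], [58, 52],  # judged unsuccessful
])
successful = np.array([1, 1, 1, 1, 0, 0, 0, 0])

lda = LinearDiscriminantAnalysis()
lda.fit(scores, successful)

applicant = np.array([[70, 72]])  # hypothetical new applicant
print("predicted group (1 = successful):", lda.predict(applicant)[0])
print("estimated P(successful):", lda.predict_proba(applicant)[0, 1])
```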
