UTILITY
- Refers to how useful a test is
- Refers to the practical value of using a test to aid in decision making
Factors That Affect a Test’s Utility
- Psychometric Soundness: reliability and validity of a test
o Test is said to be psychometrically sound for a particular purpose if reliability and
validity coefficients are acceptably high
o Index of utility: the practical value of the information derived from scores on the
test
o Test scores are said to have utility if their use in a particular situation helps us to
make better decisions—better, that is, in the sense of being more cost-effective
o The higher the criterion-related validity of test scores for making a particular
decision, the higher the utility of the test is likely to be
- Costs: disadvantages, losses, or expenses in both economic and noneconomic terms
o Allocate funds to purchase (1) a particular test, (2) a supply of blank test
protocols, and (3) computerized test processing, scoring, and interpretation from
the test publisher or some independent service
o Costs of testing may come in the form of (1) payment to professional personnel
and staff associated with test administration, scoring, and interpretation, (2)
facility rental, mortgage, and/or other charges related to the usage of the test
facility, and (3) insurance, legal, accounting, licensing, and other routine costs of
doing business
o Costs may be offset by revenue, such as fees paid by testtakers (clinics) or funds the test user obtains from government grants or private donations (research organizations)
o Noneconomic costs are far more subtle and can be far more significant, even when cost cutting improves the economic picture
- Benefits: profits, gains, or advantages
o Financial returns, in dollars and cents, that a successful testing program can yield
o Good working environment
o Beneficial to society at large
Utility Analysis
- Beyond the calculations, other, less definable elements (such as prudence, vision, and, for lack of a better or more technical term, common sense) must be ever-present in the process
- Family of techniques that entail a cost–benefit analysis designed to yield information
relevant to a decision about the usefulness and/or practical value of a tool of
assessment
- Umbrella term covering various possible methods, each requiring different kinds of input data and yielding different kinds of output
- May be undertaken for the purpose of evaluating whether the benefits of using a test
(or training program or intervention) outweigh the costs
- Endpoint of a utility analysis is typically an educated decision about which of many
possible courses of action is optimal
- Decisions about testing may include:
o Whether one test is preferable to another
o Whether one assessment tool is preferable to another
o Whether to add tests to those already in use
o Whether no testing at all is preferable
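- Decisions about training programs or interventions may include:
o Whether one training program is preferable to another
o Whether one intervention is preferable to another
o Whether to add elements to, or remove elements from, an existing training program
o Whether no training at all is preferable
o Whether no intervention is preferable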
Hits and Misses
- Hit: a correct classification
- Miss: an incorrect classification
- Hit Rate: The proportion of people that an assessment tool accurately identifies as
possessing or exhibiting a particular trait, ability, behavior, or attribute
- Miss Rate: The proportion of people that an assessment tool fails to identify accurately with respect to possessing or exhibiting a particular trait, ability, behavior, or attribute
- False Positive: A specific type of miss whereby an assessment tool falsely indicates that
the testtaker possesses or exhibits a particular trait, ability, behavior, or attribute
- False Negative: A specific type of miss whereby an assessment tool falsely indicates that
the testtaker does not possess or exhibit a particular trait, ability, behavior, or attribute
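The outcomes defined above can be tallied from a set of classification decisions. A minimal sketch, assuming each testtaker is recorded as a pair (flagged by the test, actually has the attribute) and treating the hit rate as the proportion of all classifications that are correct; the class and function names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    true_positives: int   # hit: test says "has attribute", and the person does
    true_negatives: int   # hit: test says "lacks attribute", and the person does
    false_positives: int  # miss: test says "has attribute", but the person does not
    false_negatives: int  # miss: test says "lacks attribute", but the person has it

    @property
    def total(self) -> int:
        return (self.true_positives + self.true_negatives
                + self.false_positives + self.false_negatives)

    @property
    def hit_rate(self) -> float:
        # proportion of all classifications that are correct (assumed definition)
        return (self.true_positives + self.true_negatives) / self.total

    @property
    def miss_rate(self) -> float:
        return (self.false_positives + self.false_negatives) / self.total


def classify_outcomes(pairs):
    """pairs: iterable of (flagged_by_test, actually_has_attribute) booleans."""
    tp = tn = fp = fn = 0
    for flagged, has_attribute in pairs:
        if flagged and has_attribute:
            tp += 1
        elif not flagged and not has_attribute:
            tn += 1
        elif flagged:
            fp += 1
        else:
            fn += 1
    return Outcomes(tp, tn, fp, fn)


# Hypothetical data: five testtakers
result = classify_outcomes([(True, True), (True, False), (False, False),
                            (False, True), (True, True)])
print(result.hit_rate, result.miss_rate)  # 0.6 0.4
```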
How Is a Utility Analysis Conducted?
- Expectancy data: Provide an indication of the likelihood that a testtaker will score within
some interval of scores on a criterion measure
o e.g., criterion scores grouped into intervals such as passing, acceptable, or failing
o Taylor-Russell tables: provide an estimate of the extent to which inclusion of a
particular test in the selection system will improve selection
▪ Provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test’s validity, the selection ratio used (a numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired), and the base rate (the percentage of people hired under the existing system who are considered successful)
▪ Used to determine the increase over current procedures
▪ Limitation: the relationship between the predictor (the test) and the criterion (e.g., rated job performance) must be linear
▪ Limitation: it may be difficult to identify a criterion score that separates “successful” from “unsuccessful” employees
o Naylor-Shine tables: obtaining the difference between the means of the selected
and unselected groups to derive an index of what the test (or some other tool of
assessment) is adding to already established procedures
▪ Determines the increase in average score on some criterion measure
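The Taylor-Russell and Naylor-Shine tables can both be reproduced from one normal-theory model, sketched here under stated assumptions: test and criterion standard scores are bivariate normal with correlation equal to the validity coefficient (the linearity requirement noted above), applicants are hired top-down, and "success" means scoring above the criterion point implied by the base rate. taylor_russell() integrates that model numerically (Simpson's rule) to get the proportion of hires expected to succeed; naylor_shine() returns the expected mean criterion z score of those selected, computed as r_xy times the normal ordinate at the cut divided by the selection ratio. Function names and parameters are hypothetical, and the results are model values, not figures copied from the published tables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def taylor_russell(validity, selection_ratio, base_rate, steps=4000):
    """Proportion of selected applicants expected to be successful.

    Assumes bivariate normal predictor/criterion scores and top-down selection.
    """
    if not 0 <= validity < 1:
        raise ValueError("validity must be in [0, 1)")
    if not (0 < selection_ratio < 1 and 0 < base_rate < 1):
        raise ValueError("selection ratio and base rate must be in (0, 1)")
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # predictor cut (z units)
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # criterion "success" line (z)
    resid_sd = (1 - validity ** 2) ** 0.5

    def integrand(x):
        # density of predictor score x times P(criterion > y_cut | x)
        p_success = 1 - STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)
        return STD_NORMAL.pdf(x) * p_success

    # Simpson's rule from the cut to 10 SDs above it (steps must be even)
    upper = x_cut + 10
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    joint = total * h / 3  # P(selected and successful)
    return joint / selection_ratio


def naylor_shine(validity, selection_ratio):
    """Expected mean criterion z score of the selected group."""
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)
    mean_predictor_z = STD_NORMAL.pdf(x_cut) / selection_ratio
    return validity * mean_predictor_z
```

For example, with validity .50, a selection ratio of .50, and a base rate of .50, taylor_russell() returns about .67 (the exact bivariate-normal value is 2/3), compared with the .50 success rate expected without the test.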
- Brogden-Cronbach-Gleser (BCG) formula: calculates the dollar amount of a utility gain resulting from the use of a particular selection instrument under specified conditions
o Utility gain: estimate of the benefit (monetary or otherwise) of using a particular
test or selection method
$$\text{Utility gain} = (N)(T)(r_{xy})(SD_y)(\bar{Z}_m) - (N)(C)$$
o $N$ = number of applicants selected per year
o $T$ = average length of time in the position (i.e., tenure)
o $r_{xy}$ = (criterion-related) validity coefficient for the given predictor and criterion
o $SD_y$ = standard deviation of performance (in dollars) of employees
o $\bar{Z}_m$ = mean (standardized) score on the test for selected applicants
o The second term, $(N)(C)$, represents the cost of testing
o $C$ = cost of testing each applicant
o One recommended way to estimate $SD_y$ is to set it equal to 40% of the mean salary for the job
o Productivity gain: estimated increase in work output
o To estimate productivity gain, replace $SD_y$ with $SD_p$, the standard deviation of productivity (work output)
$$\text{Productivity gain} = (N)(T)(r_{xy})(SD_p)(\bar{Z}_m) - (N)(C)$$
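Once its inputs are estimated, the BCG formula is simple arithmetic. A minimal sketch follows; the function names and example figures (10 hires per year, 2-year tenure, r_xy = .40, a $50,000 mean salary, Z_m = 1.0, $50 per applicant) are hypothetical, and SD_y is estimated with the 40% rule mentioned above. The productivity-gain version has the same structure, with SD_p passed in place of SD_y.

```python
def sd_y_forty_percent_rule(mean_salary):
    """Rough estimate of SD_y: 40% of the mean salary for the job."""
    return 0.40 * mean_salary


def bcg_utility_gain(n_selected, tenure, validity, sd_y, mean_z_selected,
                     cost_per_applicant):
    """Brogden-Cronbach-Gleser utility gain: (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C)."""
    benefit = n_selected * tenure * validity * sd_y * mean_z_selected
    cost_of_testing = n_selected * cost_per_applicant
    return benefit - cost_of_testing


# Hypothetical inputs for illustration only
sd_y = sd_y_forty_percent_rule(50_000)  # 20,000 dollars
gain = bcg_utility_gain(10, 2, 0.40, sd_y, 1.0, 50)
print(gain)  # 159500.0
```

Note that a test with zero validity yields a negative gain equal to the cost of testing, which is why validity drives utility.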
- Decision theory and test utility
o Decision theory: provides guidelines for setting optimal cutoff scores
o Employers are reluctant to use decision-theory-based strategies in their hiring
practices because of the complexity of their application and the threat of legal
challenges
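In decision-theory terms, a cutoff is chosen by weighing the consequences of each kind of outcome. The sketch below works under hypothetical assumptions: it scans candidate cutoffs on past data where each person's score and eventual success are known, charges an assumed cost for every false positive and false negative, and keeps the cutoff with the lowest total cost. best_cutoff(), the cost values, and the data are illustrative, not a procedure prescribed by decision theory itself.

```python
def best_cutoff(scores, successful, cost_false_positive, cost_false_negative):
    """Return (cutoff, total_cost) for the lowest-cost cutoff.

    A person is selected when score >= cutoff. Candidates are the observed
    scores plus one value above the maximum (i.e., select no one).
    """
    candidates = sorted(set(scores)) + [max(scores) + 1]
    best = None
    for cut in candidates:
        cost = 0.0
        for score, ok in zip(scores, successful):
            selected = score >= cut
            if selected and not ok:
                cost += cost_false_positive   # selected but unsuccessful
            elif not selected and ok:
                cost += cost_false_negative   # rejected but would have succeeded
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best


# Hypothetical past data
scores = [40, 50, 60, 70, 80, 90]
successful = [False, False, True, False, True, True]
print(best_cutoff(scores, successful, 1, 1))  # (60, 1.0)
print(best_cutoff(scores, successful, 5, 1))  # (80, 1.0)
```

Raising the assumed cost of a false positive moves the cutoff up, showing how the relative seriousness of the two kinds of misses drives the cut score.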
Some Practical Considerations
- The pool of job applicants: many utility models assume there will be a ready supply of viable applicants from which to choose and fill positions
o There are certain jobs, however, that require such unique skills or demand such
great sacrifice that there are relatively few people who would even apply, let
alone be selected
o The pool of possible job applicants for a particular type of position may vary with the economic climate
o Not everyone offered a position will accept it, even if found to be a qualified candidate; how many actually would accept affects the test’s utility
- The complexity of the job: the more complex the job, the more people differ on how
well or poorly they do that job
- The cut score in use
o Relative cut score (also called a norm-referenced cut score): reference point set based on norm-related considerations, that is, with reference to the performance of a group (or some target segment of a group), rather than on the relationship of test scores to a criterion (e.g., hiring only the top 10% of scorers)
o Fixed cut score: reference point—in a distribution of test scores used to divide a
set of data into two or more classifications—that is typically set with reference
to a judgment concerning a minimum level of proficiency required to be included
in a particular classification. Also called absolute cut scores
o Multiple cut scores: use of two or more cut scores with reference to one
predictor for the purpose of categorizing testtakers.
o Compensatory model of selection: assumption made that high scores on one
attribute can, in fact, “balance out” or compensate for low scores on another
attribute
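The kinds of cut score above can be contrasted in a few lines. A minimal sketch with hypothetical function names: relative_cut() keeps the top fraction of a group (norm-referenced), passes_fixed_cut() compares a score with a judged minimum (absolute), categorize() applies multiple cut scores to one predictor, and compensatory_score() forms a weighted sum so that a high score on one attribute can offset a low score on another. The grade labels, weights, and data are assumptions for illustration.

```python
def relative_cut(scores, top_fraction):
    """Norm-referenced cut: lowest score among the top `top_fraction` scorers.

    Ties at the cut score may admit more people than the target fraction.
    """
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return ranked[k - 1]


def passes_fixed_cut(score, minimum):
    """Fixed (absolute) cut: compare with a judged minimum proficiency level."""
    return score >= minimum


def categorize(score, cuts, labels):
    """Multiple cut scores on one predictor; cuts must be in ascending order."""
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut scores")
    for cut, label in zip(cuts, labels):
        if score < cut:
            return label
    return labels[-1]


def compensatory_score(scores, weights):
    """Weighted sum: strength on one attribute can balance out weakness on another."""
    return sum(weights[name] * value for name, value in scores.items())


grades = ["F", "D", "C", "B", "A"]  # hypothetical labels
print(categorize(85, [60, 70, 80, 90], grades))  # B
equal = {"written": 0.5, "interview": 0.5}       # hypothetical weights
print(compensatory_score({"written": 80, "interview": 60}, equal))  # 70.0
print(compensatory_score({"written": 60, "interview": 80}, equal))  # 70.0
```

Under the compensatory model the two hypothetical candidates are indistinguishable, even though their profiles are reversed.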
Methods for Setting Cut Scores
- Angoff Method: can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability. For personnel selection, experts estimate how testtakers with at least minimal competence for the position would answer the test items; for trait questions, an expert panel makes judgments concerning the way a person with that trait, attribute, or ability would respond to test items. In both cases, the judgments of the experts are averaged to yield cut scores for the test
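A minimal sketch of one common item-level form of the Angoff procedure, assuming each judge rates, for every item, the probability that a minimally competent testtaker would answer it correctly. Each judge's ratings are summed to give that judge's expected raw score for such a testtaker, and the judges' values are averaged, mirroring the averaging described above. The function name and ratings are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """ratings[j][i]: judge j's estimated probability that a minimally
    competent testtaker answers item i correctly.

    Returns the panel's cut score on the raw-score scale.
    """
    judge_cuts = [sum(judge) for judge in ratings]  # expected raw score per judge
    return mean(judge_cuts)                          # averaged across judges


# Hypothetical panel: 3 judges rating a 4-item test
ratings = [
    [0.9, 0.6, 0.5, 0.8],
    [0.8, 0.7, 0.4, 0.9],
    [0.7, 0.5, 0.6, 0.8],
]
print(round(angoff_cut_score(ratings), 2))  # 2.73 (out of 4 items)
```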
- Known Groups Method/Method of contrasting groups: entails collection of data on the
predictor of interest from groups known to possess, and not to possess, a trait,
attribute, or ability of interest
o Determination of where to set the cutoff score is inherently affected by the composition of the contrasting groups (no standard set of guidelines exists for choosing contrasting groups)
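One simple way to turn contrasting-group data into a cut score is to choose the score that misclassifies the fewest members of the two known groups. The sketch assumes higher scores indicate the trait and that a testtaker is classified as having it when score >= cut; the function name and data are hypothetical. Because the answer depends entirely on which people were placed in each group, the caveat about group composition applies directly.

```python
def known_groups_cut(with_trait, without_trait):
    """Return (cut, errors) for the cut that misclassifies the fewest people.

    with_trait / without_trait: predictor scores from the two known groups.
    """
    candidates = sorted(set(with_trait) | set(without_trait))
    best_cut, best_errors = None, None
    for cut in candidates:
        errors = (sum(s < cut for s in with_trait)       # trait group scored below cut
                  + sum(s >= cut for s in without_trait))  # non-trait group at/above cut
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical contrasting groups
print(known_groups_cut([12, 15, 17, 18, 20], [5, 7, 9, 11, 13]))  # (12, 1)
```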
- IRT-Based Methods: each item is associated with a particular level of difficulty. To “pass” the test, the testtaker must answer items that are deemed to be above some minimum level of difficulty, which is determined by experts and serves as the cut score
o Item-mapping method: entails the arrangement of items in a histogram, with
each column in the histogram containing items deemed to be of equivalent
value. Judges who have been trained regarding minimal competence required
for licensure are presented with sample items from each column and are asked
whether or not a minimally competent licensed individual would answer those
items correctly about half the time. If so, that difficulty level is set as the cut
score; if not, the process continues until the appropriate difficulty level has been
selected
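o Bookmark method: Experts trained on the minimal knowledge, skills, and/or abilities required are given a booklet of items, one item per page, arranged in ascending order of difficulty. Each expert places a “bookmark” between the two pages (that is, the two items) deemed to separate testtakers who have acquired the minimal knowledge, skills, and/or abilities from those who have not. The bookmark serves as the cut score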
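A minimal sketch of turning experts' bookmarks into a single cut score, assuming item difficulties have already been estimated on the IRT ability (theta) scale. Each bookmark is recorded as the number of ordered items placed before it; that expert's cut is taken as the midpoint between the difficulties of the two items flanking the bookmark, and the panel cut is the median across experts. The midpoint and median rules and the function name are assumptions for illustration; operational bookmark procedures usually locate the cut through a response-probability criterion instead.

```python
from statistics import median


def bookmark_cut(item_difficulties, bookmarks):
    """item_difficulties: IRT difficulty (theta scale) of each item.
    bookmarks: for each expert, the number of ordered items placed before
    the bookmark (it sits between items k-1 and k, 0-based).
    """
    ordered = sorted(item_difficulties)  # the ordered item booklet
    expert_cuts = []
    for k in bookmarks:
        if not 0 < k < len(ordered):
            raise ValueError("a bookmark must fall between two items")
        # assumed rule: midpoint of the two items flanking the bookmark
        expert_cuts.append((ordered[k - 1] + ordered[k]) / 2)
    return median(expert_cuts)  # assumed rule: median across experts


# Hypothetical six-item booklet and three experts
difficulties = [0.9, -1.5, 0.3, -0.8, 1.6, -0.2]
print(round(bookmark_cut(difficulties, [3, 4, 3]), 2))  # 0.05
```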
- Other Methods
o Method of predictive yield: technique for setting cut scores that takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
o Discriminant analysis (discriminant function analysis): typically used to shed
light on the relationship between identified variables and two naturally occurring
groups
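The three ingredients of the method of predictive yield can be combined in a short sketch, assuming a fixed probability that any applicant offered the job accepts it and using the observed applicant scores as the score distribution. The number of offers needed is the number of positions divided by the acceptance rate (rounded up), and the cut score is the score of the lowest-ranked applicant who would receive an offer. The function name, acceptance rate, and scores are hypothetical.

```python
import math


def predictive_yield_cut(applicant_scores, positions, acceptance_rate):
    """Return (cut_score, offers_needed) so expected acceptances fill the positions."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    offers = math.ceil(positions / acceptance_rate)  # offers needed
    ranked = sorted(applicant_scores, reverse=True)
    if offers > len(ranked):
        raise ValueError("not enough applicants to fill the positions")
    return ranked[offers - 1], offers  # lowest score that still gets an offer


# Hypothetical pool of ten applicants, three openings, 50% acceptance
scores = [88, 92, 75, 81, 69, 95, 84, 78, 90, 72]
print(predictive_yield_cut(scores, 3, 0.5))  # (81, 6)
```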