WHITE PAPER:
BUILDING OUR APPROACH TO
PERFORMANCE RATINGS
JAN 2022
PERFORMANCE RATINGS
Performance Rating /pəˈfɔːm(ə)ns ˈreɪtɪŋ/ noun.
1. A classification, score or ranking that is assigned to an individual to
represent the degree to which they display a particular
characteristic, e.g. performance, competency, potential, etc.
2. An item of HR data
“Without data, you’re just another person with an opinion”
W Edwards Deming
Management consultant, writer and academic
2
3 CRITICAL QUESTIONS
1. Do our performance ratings measure what they claim to measure?
Considerations:
• Construct validity
• Content validity
2. How do we ensure ratings are accurate and free from spurious error?
Considerations:
• Validity and reliability
• Minimise bias
3. Is our performance rating process efficient for the business?
Considerations:
• Business value (benefit:cost ratio)
• User fatigue
3
THE IMPORTANCE OF DATA QUALITY
Without accurate and reliable HR data the results of any analysis, no
matter how sophisticated the model, will be flawed and will result in
faulty decision-making
Garbage in → Sophisticated HR analytics / data modelling → Garbage out
4
HR DATA – 3 ESSENTIAL REQUIREMENTS
1. Reliability
“Consistency” – in the data over time and between different sources
• It is important that the data does not vary due to the spurious (non-causal) effects of data being
collected at different points in time or from different sources
• E.g. Do different people (managers, direct reports, peers) give different ratings when assessing the
same employee’s performance?
2. Validity
“Accuracy” – does the data measure what it’s supposed to measure?
• Construct validity – do variations in the data explain differences in the phenomenon (e.g. employee
performance) being assessed? Or are the differences the results of spurious effects and biases, e.g.
‒ Recency and primacy effects
‒ Leniency and severity effects
• Content validity – does the data capture all aspects of the dimension (e.g. performance) to be
measured?
3. Efficiency
“Value” – is the data worth collecting?
• The ratio of benefits/costs of collecting and measuring the data
‒ Benefits of collecting the data e.g. improved decision-making and performance, increased
employee accountability and engagement
‒ Time required to collect the data
‒ Costs ($) and other resources required to collect and manage the data
‒ “Fatigue” – impact on employees and managers in providing the data
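The reliability question above (do different people give different ratings to the same employee?) can be checked with a simple correlation between two rating sources. The sketch below is illustrative only: the ratings are hypothetical, and a Pearson correlation is just one of several possible consistency measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings of the same six employees from two sources (1-5 scale).
manager_ratings = [4, 3, 5, 4, 2, 4]
peer_ratings = [4, 4, 4, 3, 3, 5]

inter_rater_r = pearson(manager_ratings, peer_ratings)
print(f"Manager vs. peer correlation: {inter_rater_r:.2f}")
```

A value close to 1 would suggest the two sources rank employees consistently; a low value suggests that much of the variation comes from who is doing the rating.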
5
WHAT DO PERFORMANCE RATINGS REALLY MEASURE?
• Accurate and reliable ratings will measure the true ‘effect’ (job performance), whilst
minimising variance due to ‘error’ (non-performance factors)
• However, extensive research shows that performance ratings can be a poor indicator
of an employee’s true performance at work and should be interpreted with care
• A recent HBR article* showed that less than 40% of the variance in performance
ratings was explained by the employee’s performance
[Figure: breakdown of the variance in performance ratings. Chart label: actual performance accounts for just 38% of the rating]
• Actual employee performance (‘effect’) in the job accounts for less than 40% of the
variance in performance ratings
• Over 60% of the variance in performance ratings is accounted for by individual managers’
and raters’ peculiarities of perception (‘error’)
* See Appendix 2 on slide 17
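The split between ‘effect’ and ‘error’ can be made concrete with a small variance decomposition. This is a minimal sketch under stated assumptions: the effect sizes are hypothetical, every rater rates every employee, and each rating is simply a baseline plus a performance effect plus a rater effect. It is not the method used in the cited article.

```python
from statistics import pvariance

# Hypothetical 'true' performance effects for three employees and
# idiosyncratic rater effects for three raters (rating-scale points).
performance_effects = [-0.4, 0.0, 0.4]
rater_effects = [-0.5, 0.0, 0.5]
baseline = 3.5

# Every rater rates every employee: rating = baseline + performance + rater effect.
ratings = [baseline + p + r for p in performance_effects for r in rater_effects]

# In this fully crossed design the two sources of variance add up exactly.
total_variance = pvariance(ratings)
effect_share = pvariance(performance_effects) / total_variance
error_share = pvariance(rater_effects) / total_variance

print(f"Variance due to performance ('effect'): {effect_share:.0%}")
print(f"Variance due to rater effects ('error'): {error_share:.0%}")
```

With these illustrative numbers, rater effects that are only slightly larger than the performance effects are enough to push the ‘effect’ share below 40%.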
6
MULTIRATER/360° FEEDBACK
• 360 or multirater feedback is a form of performance assessment where feedback is sought
from multiple sources and ‘levels’, e.g. peers, direct reports, line managers, clients, etc
• 360 feedback is most commonly used in personal development planning to build a ‘rich’
picture of an individual’s strengths and areas for growth. In this respect, qualitative comments
from raters can be more valuable than purely numeric ratings
• The use of 360 feedback ratings for ‘high stakes’ performance management is prone to a
number of difficulties, and research has found that only one-third of the variance in such
ratings is attributable to the ratee’s performance. See slide 16
• The majority of variation is due to spurious factors including a desire to preserve good
relationships, status, limited information and attribution biases that can be exacerbated given
the implications of the 360-performance data for reward and promotion decisions. See slide 8
• The implementation of a firm-wide 360 feedback programme can have significant implications
for organisational resources in terms of managers’ and participants’ time, administration,
and ensuring that the feedback is effectively communicated and used by participants
7
SOURCES OF ERROR
There is a wide variety of ‘errors’ that can affect the accuracy and reliability of performance
ratings. Some of the most common error effects include:
Leniency & Severity
• The tendency to evaluate all people as either outstanding or poor
• Results in inflated or suppressed ratings that do not represent accurate assessments of performance
Status & Relationships
• Tendency to inflate or suppress ratings to preserve good superior-subordinate relations
• Avoiding ‘difficult truths’ to preserve the status quo in personal relationships at work
Halo & Horns
• The tendency to make inappropriate generalisations from one aspect of a person’s job performance
• This is due to being influenced by one outstanding characteristic or event, either positive or negative
Attribution Bias
• Cognitive bias, often unconscious, that leads a person to make judgements about others based on their
own interpretation of the world, rather than the reality of the situation
• Fundamental attribution error – overemphasising the role of personality whilst minimising situational factors
Recency & Primacy
• Giving excessive weight to more recent or earlier events in the performance period
• This unduly influences the assessment of the person’s performance at other times during the
performance period
Other
• Inconsistent goals – unclear, inconsistent goals resulting in ‘garbage in, garbage out’
• Limited Information – the rater has insufficient experience of the person being rated to give an
accurate assessment
• Central Tendency – tendency to evaluate every person in the middle of the scale
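Some of these error effects leave a visible trace in the rating data itself. The sketch below screens each rater's ratings for possible leniency, severity and central tendency. The data, the thresholds and the `flag_rater` helper are all hypothetical; a flag is a prompt for a conversation, not evidence of bias.

```python
from statistics import mean, pstdev

# Hypothetical ratings (1-5 scale) given by each manager to their team.
ratings_by_rater = {
    "Manager A": [5, 5, 4, 5, 5],  # consistently high
    "Manager B": [3, 3, 3, 3, 3],  # everyone in the middle
    "Manager C": [2, 4, 3, 5, 3],  # uses the full range
}

# Illustrative thresholds only; assumptions, not validated cut-offs.
SCALE_MIDPOINT = 3
MEAN_GAP = 0.75    # distance from the overall mean that prompts a review
MIN_SPREAD = 0.5   # standard deviation below which ratings look 'bunched'

all_ratings = [r for ratings in ratings_by_rater.values() for r in ratings]
overall_mean = mean(all_ratings)


def flag_rater(ratings):
    """Return a list of possible error effects suggested by one rater's ratings."""
    flags = []
    rater_mean = mean(ratings)
    if rater_mean - overall_mean > MEAN_GAP:
        flags.append("possible leniency")
    elif overall_mean - rater_mean > MEAN_GAP:
        flags.append("possible severity")
    bunched = pstdev(ratings) < MIN_SPREAD
    if bunched and abs(rater_mean - SCALE_MIDPOINT) < MIN_SPREAD:
        flags.append("possible central tendency")
    return flags


for rater, ratings in ratings_by_rater.items():
    print(rater, flag_rater(ratings))
```

Effects such as halo, recency or attribution bias cannot be detected this way, because they distort individual ratings rather than a rater's overall pattern.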
8
RESEARCH FINDINGS
• Extensive research* has been conducted into the subject of performance ratings in
work settings
• This shows that, at best, performance ratings are an imprecise measure of employee
performance. In many cases, ‘true’ job performance explains less than 50% of the variance in ratings
• As such, users should be aware of their considerable limitations and avoid using
ratings with ‘False Precision’
• At the same time, we need to avoid ‘throwing the baby out with the bath water’
• Ratings are a valuable HR data point and - used with care and thoughtfulness - can
play an important role in management decision-making regarding CarVal’s most
important asset – our people
* See Appendix 1 on slide 16
9
WHAT TO DO – IMPLICATIONS FOR PRACTICE
1. Avoid ‘False Precision’ - recognise the inherent limitations of HR data
2. Cross-validate with other more manifest data – e.g. P&L data, completion rates, cost
savings, etc
3. Raise awareness and recognise the potential errors and bias inherent in the rating
process
4. Work to optimise validity and reliability, whilst recognising the limitations of HR
data
5. Invest time and effort in goal setting to ensure that individual performance is
assessed against clear and relevant standards
6. Avoid using 360 feedback to generate performance ratings; reserve it for
personal development
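Cross-validation (point 2 above) can start with something as simple as comparing how employees rank on their rating with how they rank on a manifest measure. The sketch below uses hypothetical employees, ratings and cost-savings figures, and an illustrative rank-gap threshold, to list the cases where the two disagree most.

```python
# Hypothetical data: (performance rating on a 1-5 scale, cost savings in $k).
employees = {
    "Employee A": (4.5, 120),
    "Employee B": (4.0, 300),
    "Employee C": (3.5, 250),
    "Employee D": (3.0, 90),
    "Employee E": (2.5, 60),
}


def rank_descending(scores):
    """Map each name to its rank (1 = highest); assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


rating_rank = rank_descending({name: r for name, (r, _) in employees.items()})
savings_rank = rank_descending({name: s for name, (_, s) in employees.items()})

# Illustrative threshold: review anyone whose two ranks differ by 2+ places.
RANK_GAP = 2
to_review = [
    name
    for name in employees
    if abs(rating_rank[name] - savings_rank[name]) >= RANK_GAP
]

print("Ratings and manifest data disagree for:", to_review)
```

A disagreement does not show which measure is wrong; it only identifies where the rating deserves a second look.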
10
RATING DISTRIBUTIONS – DON’T ASSUME NORMALITY!
The Assumption vs. The Expectation vs. The Reality!
The Assumption
It is commonly assumed that job performance is normally distributed
The Expectation
However, research shows human performance is actually more accurately modelled as a power law
distribution and not a normal distribution.
This follows the notion that the small proportion of ‘top talent’ contribute disproportionately to
the business’ output and success; akin to Pareto's law
This does not imply that most people are ‘poor’ performers – just that relatively very few people
are ‘outstanding’ performers. The vast majority of the workforce are acceptable – just not stars
The Reality
However, in reality, performance ratings tend to be significantly negatively skewed – with the
majority of employees being rated ‘above average’ – the opposite of what one would expect!
This clearly challenges the fundamental validity of performance ratings
[Charts: one distribution curve per panel, horizontal axis running from lower performance to higher performance]
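The three shapes can be compared numerically. The sketch below is illustrative only: the normal parameters, the Pareto shape and the rating counts are all hypothetical choices, and `top_share` and `skewness` are helper names introduced here. It contrasts how much of total output the top 20% contribute under a normal and a power law model, and then checks the skew of a set of ratings bunched at the top of the scale.

```python
from statistics import NormalDist, mean, pstdev


def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the top `top_fraction` of values."""
    ordered = sorted(values, reverse=True)
    cutoff = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:cutoff]) / sum(ordered)


def skewness(values):
    """Population skewness: negative when the long tail is on the low side."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Evenly spaced quantiles give a deterministic 'sample' from each model.
n = 1000
quantiles = [(i + 0.5) / n for i in range(n)]

# The assumption: normally distributed output (hypothetical mean 100, sd 15).
normal_output = [NormalDist(100, 15).inv_cdf(q) for q in quantiles]

# The expectation: power law (Pareto) output; shape 1.5 is an illustrative choice.
PARETO_SHAPE = 1.5
pareto_output = [(1 - q) ** (-1 / PARETO_SHAPE) for q in quantiles]

# The reality: hypothetical counts of ratings on a 1-5 scale, bunched at the top.
rating_counts = {1: 2, 2: 5, 3: 18, 4: 45, 5: 30}
ratings = [score for score, count in rating_counts.items() for _ in range(count)]

print(f"Top 20% share of output, normal model: {top_share(normal_output):.0%}")
print(f"Top 20% share of output, power law model: {top_share(pareto_output):.0%}")
print(f"Skewness of ratings: {skewness(ratings):.2f}")
```

Under the power law model the top fifth account for well over their proportionate share of output, whereas the ratings are skewed in the opposite direction, with most people rated above the scale midpoint.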
11
BALANCING BENEFITS AND COSTS
• There is a compromise to be struck when investing in any performance
management system
• Any improvements in the accuracy and completeness of the system’s outputs
need to be considered against the associated costs
• Critical question – how do we optimise the benefits whilst minimising the
costs to maintain the active engagement of the business and our people?
Benefits Costs
• Accuracy • Time
• Reliability • Fatigue
• Completeness • Demotivation
12
RECOMMENDATIONS
1. Training
• Importance of agreeing robust and accurate (‘SMART’) goals from the outset
• Raise awareness of bias/errors in the rating process
• How to manage a successful performance review discussion
2. Avoid ‘false precision’
• Recognise the inherent limitations of rating data
• Avoid using ratings based on precise decimal places (e.g. 3.8 vs. 3.9) for decision-making
• Don’t rank across groups/depts/teams based on decimal data – this compounds biases and errors
3. Cross-validate with other performance data
• Don’t look at ratings in isolation
• Cross-validate with other more objective and manifest performance data, e.g. P&L, AUM, etc
• Integrate with work on Project Metrique
4. Calibration reviews
• Managers work together to share, challenge and cross-validate each other’s work in goal-setting
and performance reviews and ratings
5. 360° feedback
• Adopt on a pilot basis for developmental purposes
• Avoid using 360° ratings for performance rating purposes
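The warning about decimal places can be illustrated with the standard error of measurement from classical test theory, SEM = SD × √(1 − reliability). The sketch below is a rough illustration under stated assumptions: the rating spread of 0.7 is hypothetical, and the reliability of 0.52 is an illustrative value in the region of the inter-rater figures reported in Appendix 1.

```python
from math import sqrt


def standard_error_of_measurement(rating_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return rating_sd * sqrt(1 - reliability)


# Hypothetical spread of ratings on a 1-5 scale and an illustrative reliability.
RATING_SD = 0.7
RELIABILITY = 0.52

sem = standard_error_of_measurement(RATING_SD, RELIABILITY)

# Standard error of the difference between two independently measured ratings.
se_difference = sem * sqrt(2)

# Is a gap of 0.1 (e.g. 3.8 vs. 3.9) larger than roughly two standard errors?
gap = 3.9 - 3.8
is_distinguishable = gap > 1.96 * se_difference

print(f"Standard error of measurement: {sem:.2f}")
print(f"Standard error of a difference: {se_difference:.2f}")
print(f"Gap of {gap:.1f} distinguishable from noise: {is_distinguishable}")
```

Under these assumptions, two ratings would need to differ by more than a full scale point before the gap could be separated from measurement error, which is why ranking on the first decimal place is unsafe.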
13
REMEMBER…
…it’s people that manage people, not
data and systems!
14
APPENDICES
APPENDIX 1 – RESEARCH FINDINGS
• Job performance and its assessment have been extensively researched and investigated over
many decades. Some of the most widely cited studies are shown below.
• One of the most common research designs is to assess inter-rater reliability as a proxy for
construct validity. This involves asking different raters (e.g. managers, peers, direct reports, etc)
to provide their performance ratings of the same person working on the same work task
Researchers: Scullen et al (2000)
Study: Upward performance feedback ratings to managers. 2 large data sets (n = 4,492)
Key Findings:
• Less than half the variation (42% on average) in performance ratings was due to job performance
• Idiosyncratic rater effects, not related to employee performance (‘error’), accounted for 58% of
the variance in ratings
Researchers: Viswesvaran et al (1996)
Study: Meta-analysis of the inter-rater reliability of manager ratings and peer ratings
Key Findings:
• Managers: inter-rater reliability of ratings, α = 0.56
• Peers: inter-rater reliability of ratings, α = 0.45
Researchers: Conway and Huffcutt (1997)
Study: Meta-analysis (n = 28,999) of the inter-rater reliability of subordinate, supervisor, and
peer ratings of job performance
Key Findings:
• Managers: inter-rater reliability of ratings, α = 0.51
• Peers: inter-rater reliability of ratings, α = 0.39
• Subordinates: inter-rater reliability of ratings, α = 0.27
Researchers: Rothstein (1990)
Study: Very large study (n = 9,975) across 70 companies
Key Findings:
• Inter-rater reliabilities (α) of between 0.51 and 0.55
Note: Cronbach’s alpha (α) as a measure of reliability:
α ≥ 0.9          Excellent
0.9 > α ≥ 0.8    Good
0.8 > α ≥ 0.7    Acceptable
0.7 > α ≥ 0.6    Questionable
0.6 > α ≥ 0.5    Poor
α < 0.5          Unacceptable
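To read the findings above against this scale, the bands can be written as a small lookup. The thresholds and the reliability values come from this appendix; the `alpha_band` function name and the dictionary labels are introduced here for illustration.

```python
def alpha_band(alpha):
    """Classify a reliability coefficient using the bands in the note above."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Good"
    if alpha >= 0.7:
        return "Acceptable"
    if alpha >= 0.6:
        return "Questionable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"


# Inter-rater reliabilities reported in the studies listed in this appendix.
reported = {
    "Viswesvaran et al (1996), managers": 0.56,
    "Viswesvaran et al (1996), peers": 0.45,
    "Conway and Huffcutt (1997), managers": 0.51,
    "Conway and Huffcutt (1997), peers": 0.39,
    "Conway and Huffcutt (1997), subordinates": 0.27,
}

for study, alpha in reported.items():
    print(f"{study}: {alpha:.2f} -> {alpha_band(alpha)}")
```

None of the reported values reaches the ‘Questionable’ band: manager ratings fall in ‘Poor’, and peer and subordinate ratings in ‘Unacceptable’.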
16
APPENDIX 2 - SELECTED REFERENCES & FURTHER READING
Buckingham M & Goodall A (2015) ‘Reinventing Performance Management’, Harvard Business Review, April 2015.
McKinsey & Co (2020) ‘How to be great at people analytics’, downloaded from [Link]
organizational-performance/our-insights
London M, Mone E & Scott J (2004) ‘Performance Management and Assessment: Methods for Improved Rater Accuracy and Goal Setting’, Human
Resource Management, Winter 2004.
Aguinis H & Bradley K (2015) ‘The secret sauce for organizational success: Managing and producing star performers’, Organizational Dynamics, 44.
Viswesvaran C, Ones D & Schmidt F (1996) ‘Comparative analysis of the reliability of job performance ratings’, Journal of Applied Psychology, 81.
Conway J & Huffcutt A (1997) ‘Psychometric properties of multisource performance ratings: A meta-analysis of subordinate, supervisor, peer, and
self-ratings’, Human Performance, 10.
Rothstein H (1990) ‘Interrater reliability of job performance ratings: growth to asymptote level with increasing opportunity to observe’, Journal of
Applied Psychology, 75.
Hoffman B, Lance C & Bynum B (2010) ‘Rater source effects are alive and well’, Personnel Psychology, 63.
Salgado F & Moscoso S (2019) ‘Meta-Analysis of Interrater Reliability of Supervisory Performance Ratings: Effects of Appraisal Purpose, Scale Type
and Range Restriction’, Frontiers in Psychology, 10.
Adler S, Campion M, Colquitt A, Grubb A, Murphy K, Ollander-Krane R & Pulakos E (2016) ‘Getting rid of performance ratings: Genius or folly?’,
Industrial and Organizational Psychology, 9.
McKinsey & Co (2020) ‘How effective goal setting motivates employees’, downloaded from [Link]
and-organizational-performance/our-insights/the-organization-blog
Greguras G & Robie C (1998) ‘A new look at within-source interrater reliability of 360-degree feedback ratings’ Journal of Applied Psychology, 83.
Pachidi S (2021) ‘People and Organisational Effectiveness: People Analytics’, materials from Cambridge University Executive Programme
Scullen S, Mount M & Goff M (2000) ‘Understanding the latent structure of job performance ratings’ Journal of Applied Psychology, 85.
17
APPENDIX 3 – HOW HR DATA MAY DIFFER FROM OTHER BUSINESS DATA
A lot of the HR data that we are interested in concerns the ‘H’ – the human – in HR. Unlike
other business data, it focuses on abstract phenomena that cannot be directly measured,
e.g. job performance, potential for promotion, etc
Latent variables
‒ Typical HR data
‒ Latent variables are not directly observable, rather they are assessed by their effects on other
variables that are observable
‒ A lot of HR data are latent variables, e.g. employee performance, engagement, potential,
managerial effectiveness, etc
vs.
Manifest variables
‒ Typical business performance data
‒ Manifest variables can be directly observed and measured. As such they are typically more
readily quantified
‒ Examples include P&L, expenses, AUM, ROI, payback period, etc
“False Precision” – the tendency to treat HR data in the same way as more manifest business
data and to ignore those factors that can undermine its validity and reliability
18