Measuring Information Retrieval Effectiveness

Uploaded by

Dawit Sebhat

Retrieval Effectiveness

• Evaluation of IR systems
• Relevance judgement
• Performance measures
• Recall, Precision
• Single-valued measures
• User centred measures
Why System Evaluation?
• Any system needs validation and verification
–Verification: check whether the system is built right
–Validation: check whether it is the right system
• It provides the ability to measure the difference between
IR systems
–How well do our search engines work?
–Is system A better than B? Under what conditions?
• Evaluation drives what to study
–Identify techniques that work well and do not work
–There are many retrieval models/algorithms. Which one is the
best?
–What is the best component for:
• Index term selection (tokenization, stop-word removal,
stemming, normalization…)
• Term weighting (TF, IDF, TF*IDF, P(R|D)…)
• Similarity measures (cosine, Euclidean, string editing…)
Evaluation Criteria
What are the main evaluation measures to check the
performance of an IR system?
• Efficiency
– Time and space complexity
❑ Speed in terms of retrieval time, indexing time, query
processing time
❑ The space taken by corpus vs. index file
• Index size: determine Index/corpus size ratio
• Is there a need for compression?
• Effectiveness
–How capable is the system of retrieving relevant documents
from the collection (precision & recall)?
–Is system X better than other systems?
–User satisfaction: How “good” are the documents that are
returned as a response to user query?
–Relevance of results to meet information need of users
Types of Evaluation Strategies
•User-centered evaluation
– Given several users, and at least two retrieval
systems
• Have each user try the same task on both systems
• Measure which system works the “best” for the users’
information needs
• How to measure user satisfaction?

•System-centered evaluation
– Given documents, queries, and relevance
judgments
• Try several variations of the system
• Measure which system returns the “best” hit list
The Notion of Relevance Judgment
• Relevance is the measure of a correspondence
existing between a document and query.
–Construct document - query matrix as determined by:
(i) the user who posed the retrieval problem;
(ii) an external judge;
(iii) information specialist
–Is the relevance judgment made by a user the same as the one
made by an external judge?

• Relevance judgment is usually:

– Subjective: depends upon a specific user’s judgment.
– Situational: relates to user’s current needs.
– Cognitive: depends on human perception and behavior.
– Dynamic: changes over time.
Measuring Retrieval Effectiveness
Metrics often used to evaluate the effectiveness of the system:

                 Relevant                 Irrelevant
Retrieved        A                        B (“Type one error”)
Not retrieved    C (“Type two error”)     D

• Retrieval of documents may result in:

–False positive (false drop, or error of commission): some
irrelevant documents may be retrieved by the system as relevant.
–False negative (error of omission): some relevant documents
may be treated as irrelevant and not retrieved by the system.
–For many applications a good index should not permit any
false negatives, but may permit a few false positives.
Measuring Retrieval Effectiveness
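                 Relevant    Not relevant
Retrieved        A           B
Not retrieved    C           D

Collection size = A+B+C+D
Relevant = A+C
Retrieved = A+B

$$\text{Recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|} = \frac{A}{A+C}$$

$$\text{Precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|} = \frac{A}{A+B}$$

[Figure: Venn diagram of the collection. The overlap of the Relevant and Retrieved sets is labelled “Relevant + Retrieved”; the area outside both sets is labelled “Irrelevant + Not Retrieved”.]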
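
The set definitions of recall and precision can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `precision_recall` and the sample document ids are assumptions made for the example. An undefined measure (empty denominator) is returned as `None`.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query from two collections of doc ids.

    Hypothetical helper for illustration; None marks an undefined value.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # A: relevant AND retrieved
    precision = hits / len(retrieved) if retrieved else None  # A / (A + B)
    recall = hits / len(relevant) if relevant else None       # A / (A + C)
    return precision, recall


# Made-up ids: 4 documents retrieved, 3 relevant, 2 in common.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 6])
print(precision, recall)
```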

• When is precision important? When is recall important?

Example
Assume that there are a total of 10 relevant documents.
Ranking   Document   Relevance   Recall   Precision
1         Doc. 50    R           0.10     1.00
2         Doc. 34    NR          0.10     0.50
3         Doc. 45    R           0.20     0.67
4         Doc. 8     NR          0.20     0.50
5         Doc. 23    NR          0.20     0.40
6         Doc. 16    NR          0.20     0.33
7         Doc. 63    R           0.30     0.43
8         Doc. 119   R           0.40     0.50
9         Doc. 21    NR          0.40     0.44
10        Doc. 80    R           0.50     0.50
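
The recall and precision columns of a ranked example like this one can be recomputed with a short sketch. The function name `precision_recall_at_ranks` is an assumption for illustration; the relevance pattern (R, NR, R, NR, NR, NR, R, R, NR, R) and the total of 10 relevant documents come from the example.

```python
def precision_recall_at_ranks(judgments, total_relevant):
    """Return a list of (rank, recall, precision) for a ranked result list.

    judgments: list of booleans, True if the document at that rank is relevant.
    total_relevant: number of relevant documents in the whole collection.
    """
    rows = []
    hits = 0
    for rank, is_relevant in enumerate(judgments, start=1):
        if is_relevant:
            hits += 1
        rows.append((rank, hits / total_relevant, hits / rank))
    return rows


# Relevance pattern of the ten ranked documents in the example.
judgments = [True, False, True, False, False, False, True, True, False, True]
rows = precision_recall_at_ranks(judgments, total_relevant=10)
for rank, recall, precision in rows:
    print(f"{rank:2d}  recall={recall:.2f}  precision={precision:.2f}")
```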
Graphing Precision and Recall
• Plot each (recall, precision) point on a graph
• Recall is a non-decreasing function of the number of
documents retrieved,
–Precision usually decreases (in a good system)
• Precision/Recall tradeoff
– Can increase recall by retrieving many documents (down to
a low level of relevance ranking),
•but many irrelevant documents would be fetched, reducing precision
– Can get high recall (but low precision) by retrieving all
documents for all queries
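[Figure: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1). High precision and high recall: the ideal. High precision, low recall: returns relevant documents but misses many useful ones too. High recall, low precision: returns most relevant documents but includes lots of junk.]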
Need for Interpolation
•Two issues:
–How do you compare performance across
queries?
–Is the sawtooth shape intuitive of what’s going on?
[Figure: sawtooth-shaped precision-recall curve; x-axis: Recall (0 to 1), y-axis: Precision (0 to 1)]

Solution: Interpolation!
Interpolate a precision value for each standard recall level
Interpolation
• It is a general form of precision/recall calculation
• Precision changes w.r.t. recall (it is not a fixed point)
– It is an empirical fact that on average as recall increases,
precision decreases
• Interpolate precision at 11 standard recall levels:
– $r_j \in \{0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$,
where $j = 0, \ldots, 10$
• The interpolated precision at the j-th standard recall
level is the maximum known precision at any recall
level between the j-th and (j + 1)-th level:

$$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$$
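
A minimal sketch of 11-point interpolation follows. Note the assumption: instead of restricting the maximum to the interval between two neighbouring levels, it uses the widely used variant that takes the maximum precision at any observed recall at or above the level, which avoids empty intervals when observed points are sparse. The function name `interpolate_11pt` is hypothetical, and levels beyond the highest observed recall are set to 0.

```python
def interpolate_11pt(points):
    """Interpolate precision at recall levels 0.0, 0.1, ..., 1.0.

    points: observed (recall, precision) pairs for one query.
    Assumed variant: P(r_j) = max precision over all points with recall >= r_j.
    """
    levels = [j / 10 for j in range(11)]
    interpolated = []
    for level in levels:
        # Small tolerance guards against floating-point noise in recall values.
        candidates = [p for r, p in points if r >= level - 1e-9]
        interpolated.append(max(candidates) if candidates else 0.0)
    return interpolated


# (recall, precision) at each relevant document of the ranked example:
# relevant documents at ranks 1, 3, 7, 8, 10 with 10 relevant in total.
points = [(1 / 10, 1 / 1), (2 / 10, 2 / 3), (3 / 10, 3 / 7),
          (4 / 10, 4 / 8), (5 / 10, 5 / 10)]
curve = interpolate_11pt(points)
print([round(p, 2) for p in curve])
```

The dip at recall 0.3 (precision 0.43) disappears in the interpolated curve because a later point (recall 0.4, precision 0.5) has higher precision.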
Result of Interpolation

0.8

0.6
Precision

0.4

0.2

0
0 0.1 0.2 0.3 0.4 0.5
Recall
Exercise
• Let total number of relevant documents = 6, compute recall and
precision for each cut off point n:
n    doc #   relevant   Recall   Precision
1    588     x          0.167    1
2    589     x          0.333    1
3    576
4    590     x          0.5      0.75
5    986
6    592     x          0.667    0.667
7    984
8    988
9    578
10   985
11   103
12   591
13   772     x          0.833    0.38
14   990
Missing one relevant document. Never reach 100% recall
Interpolating a Recall/Precision Curve: Exercise
[Figure: blank axes for plotting the exercise; x-axis: Recall (0.2 to 1.0), y-axis: Precision (0.2 to 1.0)]
Computing Recall/Precision Points: Exercise 2
Let total # of relevant docs = 6
Check each new recall point:
n    doc #   relevant   Recall         Precision
1    588     x          R=1/6=0.167    P=1/1=1
2    576
3    589     x          R=2/6=0.333    P=2/3=0.667
4    342
5    590     x          R=3/6=0.5      P=3/5=0.6
6    717
7    984
8    772     x          R=4/6=0.667    P=4/8=0.5
9    321     x          R=5/6=0.833    P=5/9=0.556
10   498
11   113
12   628
13   772
14   592     x          R=6/6=1.0      P=6/14=0.429
Interpolating a Recall/Precision Curve: Exercise 2
[Figure: blank axes for plotting the exercise; x-axis: Recall (0.2 to 1.0), y-axis: Precision (0.2 to 1.0)]
Interpolating across queries
• For each query, calculate precision at 11 standard
recall levels
• Compute average precision at each standard recall
level across all queries.
• Plot average precision/recall curves to evaluate
overall system performance on a document/query
corpus.

• Average precision favors systems which produce
relevant documents high in rankings
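
Averaging across queries can be sketched as follows, assuming each query has already been reduced to 11 interpolated precision values. The function name `average_curves` and the two curves are made-up illustrations, not measured results.

```python
def average_curves(curves):
    """Average per-query interpolated precision at each standard recall level.

    curves: list of equal-length lists, one list of precision values per query.
    """
    if not curves:
        raise ValueError("need at least one query")
    n_levels = len(curves[0])
    if any(len(curve) != n_levels for curve in curves):
        raise ValueError("all curves must use the same recall levels")
    return [sum(curve[j] for curve in curves) / len(curves)
            for j in range(n_levels)]


# Hypothetical interpolated curves for two queries (levels 0.0 to 1.0).
query_a = [1.0, 1.0, 0.8, 0.6, 0.6, 0.5, 0.4, 0.4, 0.2, 0.2, 0.0]
query_b = [1.0, 0.5, 0.4, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
averaged = average_curves([query_a, query_b])
print([round(p, 2) for p in averaged])
```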
Single-valued measures
• Single value measures: calculate a single value for
each query to evaluate performance
– Mean Average Precision at seen relevant
documents
• Typically average performance over a large set of
queries.
– R-Precision
• Precision at rank R, where R is the number of relevant documents
MAP (Mean Average Precision)
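• Computing the mean of the average precision over more than one query:

$$\text{MAP} = \frac{1}{n} \sum_{Q_i} \frac{1}{|R_i|} \sum_{D_j \in R_i} \frac{j}{r_{ij}}$$

– $r_{ij}$ = rank of the j-th relevant document for $Q_i$
– $|R_i|$ = number of relevant documents for $Q_i$
– $n$ = number of test queries
• E.g. Assume there are 3 relevant documents for Query 1 and 2
for Query 2. Calculate MAP.
Relevant docs. retrieved (rank)   Query 1   Query 2
1st rel. doc.                     1         4
2nd rel. doc.                     5         8
3rd rel. doc.                     10        -

$$\text{MAP} = \frac{1}{2}\left[\frac{1}{3}\left(\frac{1}{1} + \frac{2}{5} + \frac{3}{10}\right) + \frac{1}{2}\left(\frac{1}{4} + \frac{2}{8}\right)\right] = \frac{1}{2}\,(0.567 + 0.25) \approx 0.41$$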
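
The MAP formula can be checked with a small sketch. The function names `average_precision` and `mean_average_precision` are assumptions for illustration; the ranks are those of the two-query example (relevant documents at ranks 1, 5, 10 and at ranks 4, 8).

```python
def average_precision(relevant_ranks, num_relevant):
    """Average precision for one query.

    relevant_ranks: ranks at which relevant documents were retrieved.
    num_relevant: |R_i|, the number of relevant documents for the query.
    """
    ranks = sorted(relevant_ranks)
    # The j-th relevant document found at rank r contributes precision j / r.
    return sum(j / r for j, r in enumerate(ranks, start=1)) / num_relevant


def mean_average_precision(queries):
    """queries: list of (relevant_ranks, num_relevant) pairs, one per query."""
    return sum(average_precision(ranks, n) for ranks, n in queries) / len(queries)


queries = [([1, 5, 10], 3), ([4, 8], 2)]
map_score = mean_average_precision(queries)
print(round(map_score, 3))  # 0.408
```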
R- Precision
• Precision at the R-th position in the ranking of results
for a query, where R is the total number of relevant
documents.
– Calculate precision after R documents are seen
– Can be averaged over all queries
R = # of relevant docs = 6
n    doc #   relevant
1    588     x
2    589     x
3    576
4    590     x
5    986
6    592     x
7    984
8    988
9    578
10   985
11   103
12   591
13   772     x
14   990
R-Precision = 4/6 = 0.67
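
R-Precision for a ranked list can be sketched as below. The function name `r_precision` is hypothetical; the relevant ranks (1, 2, 4, 6, 13) and R = 6 are taken from the table.

```python
def r_precision(judgments, num_relevant):
    """Precision after the first R documents, where R = num_relevant.

    judgments: list of booleans in rank order, True if the document is relevant.
    """
    top_r = judgments[:num_relevant]
    return sum(top_r) / num_relevant


# Ranks marked relevant (x) in the table; 14 documents retrieved, R = 6.
relevant_ranks = {1, 2, 4, 6, 13}
judgments = [n in relevant_ranks for n in range(1, 15)]
print(round(r_precision(judgments, 6), 2))  # 0.67
```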
F-Measure
• One measure of performance that takes into
account both recall and precision.
• Harmonic mean of recall and precision:

$$F = \frac{2PR}{P + R} = \frac{2}{\frac{1}{R} + \frac{1}{P}}$$
• Compared to arithmetic mean, both need to be high
for harmonic mean to be high.
• What if no relevant documents exist?
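
A short worked comparison shows why the harmonic mean needs both values to be high. The numbers P = 0.9 and R = 0.1 are made up for illustration:

```latex
\[
\text{Arithmetic mean} = \frac{P + R}{2} = \frac{0.9 + 0.1}{2} = 0.5,
\qquad
F = \frac{2PR}{P + R} = \frac{2 \times 0.9 \times 0.1}{0.9 + 0.1} = 0.18
\]
```

The arithmetic mean hides the poor recall, while F stays close to the smaller of the two values.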

E Measure
• Associated with Van Rijsbergen
• Allows user to specify importance of recall and
precision
• It is a parameterized F measure: a variant of the F
measure that lets the user weight the emphasis placed on
precision versus recall:

$$E = \frac{(1 + \beta^2) P R}{\beta^2 P + R} = \frac{1 + \beta^2}{\frac{\beta^2}{R} + \frac{1}{P}}$$

• Value of β controls trade-off:

– β = 1: Equal weight for precision and recall (E=F).
– β > 1: Weight recall more. It emphasizes recall.
– β < 1: Weight precision more. It emphasizes precision.
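
The effect of β can be seen in a small sketch that follows the formula as given on this slide. Note that van Rijsbergen's E is often written elsewhere as 1 minus this quantity; the code keeps the slide's convention. The function name `weighted_f`, the zero-value convention and the sample values P = 0.9, R = 0.1 are assumptions for illustration.

```python
def weighted_f(precision, recall, beta=1.0):
    """Parameterized F measure: (1 + b^2) P R / (b^2 P + R).

    Returns 0.0 when both precision and recall are 0 (an assumed convention
    that avoids division by zero).
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p, r = 0.9, 0.1
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {weighted_f(p, r, beta):.3f}")
```

With high precision and low recall, the score rises toward the precision value when β < 1 and falls toward the recall value when β > 1.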
Problems with both precision and recall
• Number of irrelevant documents in the collection is not taken
into account.
• Recall is undefined when there is no relevant document in
the collection.
• Precision is undefined when no document is retrieved.

Other measures
• Noise = retrieved irrelevant docs / retrieved docs
• Silence/Miss = non-retrieved relevant docs / relevant docs
– Noise = 1 – Precision; Silence = 1 – Recall

$$\text{Miss} = \frac{|\{\text{Relevant}\} \cap \{\text{NotRetrieved}\}|}{|\{\text{Relevant}\}|}$$

$$\text{Fallout} = \frac{|\{\text{Retrieved}\} \cap \{\text{NotRelevant}\}|}{|\{\text{NotRelevant}\}|}$$
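
These measures can be computed directly from the four contingency counts A, B, C and D. The sketch below is illustrative only; the function name `other_measures` and the counts are made up, and undefined values are returned as `None`.

```python
def other_measures(a, b, c, d):
    """Noise, silence (miss) and fallout from contingency-table counts.

    a: relevant and retrieved        b: irrelevant and retrieved
    c: relevant and not retrieved    d: irrelevant and not retrieved
    """
    noise = b / (a + b) if (a + b) else None      # 1 - precision
    silence = c / (a + c) if (a + c) else None    # 1 - recall
    fallout = b / (b + d) if (b + d) else None    # share of irrelevant docs retrieved
    return {"noise": noise, "silence": silence, "fallout": fallout}


# Hypothetical collection of 1000 documents.
measures = other_measures(a=20, b=30, c=10, d=940)
print(measures)
```

Unlike precision and recall, fallout depends on D, so it takes the number of irrelevant documents in the collection into account.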
Programming Assignment (Due date: ____)
• Design an IR system for one of the local languages following the
principles discussed in class.
– Form a group having up to three members
1. Construct inverted file (vocabulary file & posting file)
– Taking an N-document corpus, generate content-bearing index terms and
organize them using an inverted file indexing structure; include frequencies
(TF, DF, CF) & position/location information of terms in each document.
2. Vector space retrieval model
– Implement a vector space model that retrieves relevant documents in
ranked order
3. Test your system using five queries (with three to six words)
and report its performance
• Required: write a publishable report that has an abstract (½ page),
introduction, problem statement & objective (1 page), literature
review (2 pages), methods used & architecture of your system (1
page) & experimentation (test results & findings) (2-3 pages),
concluding remarks with one basic recommendation (1 page) and
references (1 page).
