
Evaluating Information Retrieval Systems

This chapter focuses on the evaluation of Information Retrieval (IR) systems, emphasizing the importance of assessing relevance judgment and comparing performance measures like recall and precision. It discusses formative and summative evaluations, the significance of user-centered design, and the measurable quantities for evaluating IR systems. Additionally, it highlights the complexities of defining relevance and the subjective nature of user judgments in the context of information retrieval.

Information Retrieval and Storage

Chapter Six
Evaluation of IR systems
Target Group –IT 3rd year students

Injibara, Ethiopia
Chapter Objectives

– At the end of this chapter, you will be able to:
• Evaluate IR systems
• Assess relevance judgments
• Compare performance measures (Recall, Precision, etc.)
Evaluation of IR Systems

• Evaluation is the process of systematically collecting data that informs us about what it is like for a particular user or group of users to use a product/system for a particular task in a certain type of environment.

• Formative evaluation is done at different stages of development to check that the product meets users’ needs.
o Part of the user-centered design approach
o Supports design decisions at various stages
o May test parts of the system or alternative designs


Evaluation Workflow

[Figure: evaluation workflow. Labels: Information Need (IN), Query, IR Retrieval, Docs, Evaluation, leading either to "IN satisfied" or to "Improve Query?"]
Evaluation of IR Systems Cont’d

• Summative evaluation assesses the quality of a finished product/information.
o May test the usability or the output quality
o May compare competing systems
• Why do we evaluate an IR system?
o To provide relevant information to information users
o To retrieve information quickly
o To use comparative techniques/methods to assess the system
o To identify relevant and non-relevant information


Evaluation of IR Systems Cont’d

There are six main measurable quantities when evaluating an information retrieval system:
1. The coverage of the collection, that is, the extent to which the system includes relevant matter
2. The time lag, that is, the average interval between the time the search request is made and the time an answer is given
3. The form of presentation of the output
4. The effort involved on the part of the user in obtaining answers to his search requests
5. The recall of the system, that is, the proportion of relevant material actually retrieved in answer to a search request
6. The precision of the system, that is, the proportion of retrieved material that is actually relevant

• Quantities (1)-(4) are readily assessed. Recall and precision are harder to assess; together they measure what is known as the effectiveness of the retrieval system.

 Therefore, effectiveness is used to mean a measure of the ability of the

system to retrieve relevant documents while at the same time holding back

non-relevant one. It is assumed that the more effective the system the more it

will satisfy the user.


Evaluation of IR Systems Cont’d

• In classic information retrieval, the performance of an Information Retrieval system is evaluated by assessing recall and precision.

• The goal is to return both high-relevance and high-quality (in other words, valuable) pages.

• The final question is what technique should be used for evaluation. It is important to note that the technique of measuring retrieval effectiveness has been largely influenced by the particular retrieval strategy adopted and the form of its output.

• For example, when the output is a ranking of documents, an obvious parameter such as rank position is immediately available for control.
How Can an IR System Be Evaluated?

 Represent the user’s information problem (the query)

 Represent (surrogate) and organize (classify) the contents of

the knowledge resource

 Compare query to surrogates (predict relevance)

 Present results to the user for interaction/judgment


Relevance as Factor in Evaluation of Information Retrieval

 Relevance of the returned results indicates how appropriate the results are in
satisfying your information need

 Relevance is a relation between the person and the information object(s),


and is dependent upon user’s interpretation, so prediction of relevance (or
appropriateness) is inherently uncertain

 Relevance of the retrieved documents is a measure of the evaluation.

 To place information retrieval on a systematic basis, we need repeatable


criteria to evaluate how effective a system is in meeting the information
needs of the user of the system.
Relevance as a measure

 This proves to be very difficult with a human in the loop. It proves hard to
define:

 The task that the human is attempting

 The criteria to measure success


 Relevancy, from a human standpoint, is:
 Subjective: Depends upon a specific user’s judgment.

 Situational: Relates to user’s current needs.

 Cognitive: Depends on human perception and behavior.

 Dynamic: Changes over time.


Relevance as a measure Cont’d

Relevance is a subjective judgment and may include:

Being on the proper subject.


Being timely (recent information).
Being authoritative (from a trusted source).
Satisfying the goals of the user and his/her intended use
of the information (information need).
IR System Components

• Text Operations form index words (tokens).
o Stop word removal
o Stemming
• Indexing constructs an inverted index of word-to-document pointers.
• Searching retrieves documents that contain a given query token from the inverted index.
• Ranking scores all retrieved documents according to a relevance metric.
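
The text operations, indexing, and searching components can be sketched in a few lines of Python. This is a minimal illustration only: the stop-word list is a tiny hypothetical one, `stem` is a toy suffix stripper standing in for a real stemming algorithm, and the function names are assumptions, not part of the course material.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in"}  # toy list (assumption)

def stem(word):
    """Toy stemmer: strips one trailing 's' (stand-in for a real stemmer)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def tokenize(text):
    """Text operations: lowercase, split on non-letters, drop stop words, stem."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(docs):
    """Indexing: map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query_word):
    """Searching: look up the documents for a single query token."""
    tokens = tokenize(query_word)
    return index.get(tokens[0], set()) if tokens else set()

docs = {1: "The evaluation of retrieval systems",
        2: "Indexing documents in a retrieval system"}
index = build_index(docs)
print(sorted(search(index, "systems")))  # documents that contain the stem "system"
```

Ranking is left out here; it would order the returned set by some relevance score.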
IR System Components Cont’d

 User Interface: manages interaction with the user:

 Query input and document output.

 Relevance feedback.

 Visualization of results.

 Query Operations: transform the query to improve retrieval:

 Query expansion using a thesaurus.

 Query transformation using relevance feedback.


Precision and Recall

 Precision and recall measure the results of a single query using a specific
search system applied to a specific set of documents.

 Matching methods:

 Precision and recall are single numbers.

 Ranking methods:

 Precision and recall are functions of the rank order.

 If information retrieval were perfect ...

 Every document relevant to the original information need would be ranked


above every other document.

 With ranking, precision and recall are functions of the rank order.
Precision and Recall Cont’d

• Precision: the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
o The ability to retrieve top-ranked documents that are mostly relevant.

• Recall: the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.
o The ability of the search to find all of the relevant items in the corpus.
Measuring Recall and Precision

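                 Relevant   Not relevant
Retrieved           A            B          Retrieved = A+B
Not retrieved       C            D
                 Relevant = A+C             Collection size = A+B+C+D

$$\text{Precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|} = \frac{A}{A+B}$$

$$\text{Recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|} = \frac{A}{A+C}$$

[Figure: Venn diagram of the Relevant and Retrieved sets. The overlap is "Relevant + Retrieved"; the area outside both sets is "Not Relevant + Not Retrieved".]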

• When is precision important? When is recall important?


Fallout and Silence

• Noise = retrieved irrelevant docs / retrieved docs

• Silence/Miss = non-retrieved relevant docs / relevant docs

Noise = 1 – Precision;
Silence = 1 – Recall

$$\text{Miss} = \frac{|\{\text{Relevant}\} \cap \{\text{NotRetrieved}\}|}{|\{\text{Relevant}\}|}$$

$$\text{Fallout} = \frac{|\{\text{Retrieved}\} \cap \{\text{NotRelevant}\}|}{|\{\text{NotRelevant}\}|}$$
Specificity and Effectiveness

• Specificity
o Specificity is defined as the fraction of non-relevant documents that are not retrieved.

Specificity = D/(B + D), where D = documents that are not relevant and not retrieved
B + D = total number of non-relevant documents

• Effectiveness
o Effectiveness is defined as recall plus specificity minus one

Ei = (Recall + Specificity) - 1
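
As a rough sketch, all of the measures above can be computed from the four cells A, B, C, D of the contingency table in "Measuring Recall and Precision". The function name and the counts below are hypothetical. Specificity is taken here as D/(B + D), the share of non-relevant documents that are not retrieved, and the sketch assumes no denominator is zero.

```python
def contingency_measures(a, b, c, d):
    """a: relevant & retrieved, b: non-relevant & retrieved,
    c: relevant & not retrieved, d: non-relevant & not retrieved."""
    precision = a / (a + b)
    recall = a / (a + c)
    specificity = d / (b + d)
    return {
        "precision": precision,
        "recall": recall,
        "noise": 1 - precision,      # retrieved irrelevant / retrieved
        "silence": 1 - recall,       # non-retrieved relevant / relevant
        "fallout": b / (b + d),      # retrieved non-relevant / non-relevant
        "specificity": specificity,
        "effectiveness": recall + specificity - 1,
    }

# Hypothetical counts, for illustration only.
measures = contingency_measures(a=3, b=3, c=2, d=2)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these definitions, fallout and specificity always sum to one.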
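Precision and Recall Cont’d

[Figure: precision (vertical axis, 0 to 1) plotted against recall (horizontal axis). The ideal is the corner where both are high. High precision with low recall: returns relevant documents but misses many useful ones too. High recall with low precision: returns most relevant documents but includes lots of junk.]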
Example 1

• Documents available: D1,D2,D3,D4,D5,D6,D7,D8,D9,D10


• Relevant: D1, D4, D5, D8, D10
• Query to search engine retrieves: D2, D4, D5, D6, D8, D9

                 relevant       not relevant
retrieved        D4, D5, D8     D2, D6, D9
not retrieved    D1, D10        D3, D7


Precision and Recall – Contingency Table

                 Retrieved             Not retrieved
Relevant         w = 3                 x = 2                     Relevant = w+x = 5
Not relevant     y = 3                 z = 2                     Not relevant = y+z = 5
                 Retrieved = w+y = 6   Not retrieved = x+z = 4

Total documents N = w+x+y+z = 10

• Precision: P = w / (w+y) = 3/6 = 0.5
• Recall: R = w / (w+x) = 3/5 = 0.6
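
The numbers in Example 1 can be checked with Python sets. This is only a sketch; the variable names are chosen for illustration.

```python
collection = {f"D{i}" for i in range(1, 11)}
relevant = {"D1", "D4", "D5", "D8", "D10"}
retrieved = {"D2", "D4", "D5", "D6", "D8", "D9"}

# The four cells of the contingency table.
w = relevant & retrieved                  # relevant and retrieved
x = relevant - retrieved                  # relevant, not retrieved
y = retrieved - relevant                  # retrieved, not relevant
z = collection - relevant - retrieved     # neither

precision = len(w) / len(retrieved)
recall = len(w) / len(relevant)
print(sorted(w), precision, recall)
```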
Example 2

Let the total number of relevant documents = 6. Compute recall and precision at each cutoff point n:

n    doc #   relevant   Recall   Precision
1    588     x          0.167    1
2    589     x          0.333    1
3    576
4    590     x          0.5      0.75
5    986
6    592     x          0.667    0.667
7    984
8    988
9    578
10   985
11   103
12   591
13   772     x          0.833    0.38
14   990

One relevant document is missing from the ranking, so recall never reaches 100%.

Precision = 1/1 = 1, 2/2 = 1, 3/4 = 0.75, 4/6 = 0.667, 5/13 = 0.38
Recall = 1/6 = 0.167, 2/6 = 0.333, 3/6 = 0.5, 4/6 = 0.667, 5/6 = 0.833
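
The cutoff-by-cutoff calculation in Example 2 can be sketched as a short loop over the ranked list. The function name is hypothetical; the ranks of the relevant documents (1, 2, 4, 6, 13) and the total of 6 relevant documents come from the table above.

```python
def precision_recall_at_cutoffs(relevance_flags, total_relevant):
    """Return (n, recall, precision) at each rank n where a relevant doc appears."""
    rows = []
    hits = 0
    for n, is_relevant in enumerate(relevance_flags, start=1):
        if is_relevant:
            hits += 1
            rows.append((n, hits / total_relevant, hits / n))
    return rows

relevant_ranks = {1, 2, 4, 6, 13}
flags = [n in relevant_ranks for n in range(1, 15)]
rows = precision_recall_at_cutoffs(flags, total_relevant=6)
for n, recall, precision in rows:
    print(f"n={n:2d}  recall={recall:.3f}  precision={precision:.3f}")
```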
Example 3: Recall and Precision with Exact Matching

• Collection of 10,000 documents, 50 on a specific topic. The ideal search finds these 50 documents and rejects all others. The actual search identifies 25 documents; 20 are relevant but 5 are on other topics.

• Precision: 20/25 = 0.8 (80% of hits were relevant)

• Recall: 20/50 = 0.4 (40% of relevant were found)


R-Precision

• Precision at the R-th position in the ranking of results for a query that has R relevant documents.

R = # of relevant docs = 6

n    doc #   relevant
1    588     x
2    589     x
3    576
4    590     x
5    986
6    592     x
7    984
8    988
9    578
10   985
11   103
12   591
13   772     x
14   990

R-Precision = 4/6 = 0.67
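
A minimal sketch of R-precision in Python, reusing the ranking above (relevant documents at ranks 1, 2, 4, 6, and 13, with R = 6). The function name is an assumption for illustration.

```python
def r_precision(relevance_flags, total_relevant):
    """Precision over the top R results, where R = total number of relevant docs."""
    top_r = relevance_flags[:total_relevant]
    return sum(top_r) / total_relevant

relevant_ranks = {1, 2, 4, 6, 13}
flags = [n in relevant_ranks for n in range(1, 15)]
print(round(r_precision(flags, total_relevant=6), 2))
```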
F-Measure

• One measure of performance that takes into account both recall and precision.
• Harmonic mean of recall and precision:

$$F = \frac{2PR}{P + R} = \frac{2}{\frac{1}{R} + \frac{1}{P}}$$

P = precision
R = recall

• Compared to the arithmetic mean, both need to be high for the harmonic mean to be high.
• F ∈ [0, 1]
• F = 1 when all retrieved documents are relevant and all relevant documents are retrieved
• F = 0 when no relevant documents have been retrieved
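
The contrast between the harmonic and the arithmetic mean is easy to see numerically. The sketch below uses a hypothetical `f_measure` helper and made-up precision/recall values; returning 0 when both inputs are 0 is a convention assumed here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0, by convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A system with perfect precision but very low recall (hypothetical values).
p, r = 1.0, 0.1
print("arithmetic mean:", (p + r) / 2)
print("F-measure:      ", round(f_measure(p, r), 3))
```

The arithmetic mean still looks respectable, while F stays close to the weaker of the two values.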
E-Measure

• A variant of the F-measure that allows weighting the emphasis on precision versus recall:

$$E = \frac{(1 + \beta^2) P R}{\beta^2 P + R} = \frac{1 + \beta^2}{\frac{\beta^2}{R} + \frac{1}{P}}$$

• The value of β controls the trade-off:
o β = 1: Equally weight precision and recall (E = F).
o β > 1: Weight recall more.
o β < 1: Weight precision more.


Mean Average Precision (MAP)
 Average precision at each retrieved relevant document

 Relevant documents not retrieved contribute zero to score

 Average Precision: Average of the precision values at the points at which each
relevant document is retrieved.

 Ex2: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633

 Mean Average Precision: Average of the average precision value for a set of
queries.

 If a relevant document never gets retrieved, we assume the precision


corresponding to that relevant doc to be zero

 MAP assumes user is interested in finding many relevant documents for each
query
[Figure: Example – Average Precision]
[Figure: Example – MAP]
MAP (Mean Average Precision)
• Computing the mean average precision for more than one query:

$$\text{MAP} = \frac{1}{n} \sum_{Q_i} \frac{1}{|R_i|} \sum_{D_j \in R_i} \frac{j}{r_{ij}}$$

– r_ij = rank of the j-th relevant document for Qi
– |Ri| = # rel. doc. for Qi
– n = # test queries
o E.g.

Relevant docs. retrieved   Query 1   Query 2
1st rel. doc.              1         4
2nd rel. doc.              5         8
3rd rel. doc.              10

$$\text{MAP} = \frac{1}{2}\left[\frac{1}{3}\left(\frac{1}{1} + \frac{2}{5} + \frac{3}{10}\right) + \frac{1}{2}\left(\frac{1}{4} + \frac{2}{8}\right)\right]$$
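
The two-query example can be evaluated with a short Python sketch. The function names are assumptions for illustration; each query is given as the ranks of its retrieved relevant documents plus its total number of relevant documents, so relevant documents that are never retrieved contribute zero.

```python
def average_precision(relevant_ranks, total_relevant):
    """Sum of j / r_j over retrieved relevant docs, divided by all relevant docs."""
    ranks = sorted(relevant_ranks)
    return sum(j / r for j, r in enumerate(ranks, start=1)) / total_relevant

def mean_average_precision(queries):
    """queries: list of (relevant_ranks, total_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)

queries = [([1, 5, 10], 3),   # Query 1
           ([4, 8], 2)]       # Query 2
print(round(mean_average_precision(queries), 3))
```

The same `average_precision` function applied to Example 2 (ranks 1, 2, 4, 6, 13 with 6 relevant documents) gives roughly 0.63.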
Problems of Precision and Recall

Problems with both precision and recall:


 Number of irrelevant documents in the collection is not taken into
account.

 Recall is undefined when there is no relevant document in the


collection.

 Precision is undefined when no document is retrieved.

 Should average over large corpus/query ensembles

 Need human relevance judgments

 Heavily skewed by corpus/authorship


Summary

 In this chapter, you have learned about Evaluation of


IR systems, Relevance judgment and Performance
measures (Recall, Precision, etc.)
Information Retrieval and Storage

Chapter Seven
Query Languages and Query Operations
Target Group –IT 3rd year students

Injibara, Ethiopia
Chapter Objectives

– At the end of this chapter, you will be able to explain:


 Keyword-based queries

 Query Formulation

 Relevance feedback

 Query expansion
Keyword – based queries

 Queries are combinations of words. The document collection is searched


for documents that contain these words.

 Word queries are intuitive, easy to express and provide fast ranking.

 A word is a sequence of letters terminated by a separator (period, comma,


blank, etc).

 Definition of letter and separator is flexible; e.g., hyphen could be defined


as a separator. Usually, “trivial words”(such as “a”, “the”, or “of”) are
ignored.
Basic Queries

 Single-word queries:

 A query is a single word

 Simplest form of query.

 All documents that include this word are retrieved.

 Documents may be ranked by the frequency of this word in the


document.
Phrase Queries

 A query is a sequence of words treated as a single unit.

 Also called “literal string” or “exact phrase” query.


 Phrase is usually surrounded by quotation marks.
 All documents that include this phrase are retrieved.
 Usually, separators (commas, colons, etc.) and “trivial words” (e.g., “a”,
“the”, or “of”) in the phrase are ignored.
 In effect, this query is for a set of words that must appear in sequence.
 Allows users to specify a context and thus gain precision.
Example: “United States of America”.
Multiple-Word Queries

 A query is a set of words (or phrases).

 Two interpretations:
 A document is retrieved if it includes any of the query words.
 A document is retrieved if it includes each of the query words.
 Documents may be ranked by the number of query words they contain.
 A document containing n query words is ranked higher than a document containing m < n
query words.
Multiple-Word Queries Cont’d

 Documents containing all the query words are ranked at the top.
 Documents containing only one query word are ranked at bottom.
 Frequency counts may still be used to break ties among documents that
contain the same query words.

Example:

 The phrase "Venetian blind” finds documents that discuss Venetian blinds.
 The set(Venetian, blind) finds in addition documents that discuss blind
Venetians.
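
The ranking rule for multiple-word queries can be sketched as follows, using the "any of the query words" interpretation. The function name and the sample documents are hypothetical: documents are ordered first by how many distinct query words they contain, then by total frequency to break ties.

```python
def rank_documents(docs, query_words):
    """Rank docs by number of distinct query words matched, then by frequency."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        matched = sum(1 for q in query_words if q in words)
        frequency = sum(words.count(q) for q in query_words)
        if matched:  # "any" interpretation: at least one query word present
            scored.append((matched, frequency, doc_id))
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [doc_id for _, _, doc_id in scored]

docs = {
    "d1": "venetian blind shop sells every venetian blind style",
    "d2": "a blind venetian merchant",
    "d3": "venetian glass",
    "d4": "garden tools",
}
print(rank_documents(docs, ["venetian", "blind"]))
```

Both d1 and d2 contain the full set (Venetian, blind); only a phrase query would tell them apart.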
Proximity Queries

 Proximity queries: restrict the distance within a document between two search
terms.
 Important for large documents in which the two search words may appear in
different contexts.
 Proximity specifications limit the acceptable occurrences and hence increase the
precision of the search.
 General Format: Word1 within m units of Word2.
 Unit may be character, word, paragraph, etc.
Examples:
 united within 5 words of american: Finds documents that discuss “United Airlines
and American Airlines” but not “United States of America and the American dream”.
 Nuclear within 0 paragraphs of Science: Finds documents that discuss “Nuclear”
and “Science” in the same paragraph.
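
A word-level proximity test can be sketched in Python as below. The function name is hypothetical, and the distance is assumed to be the difference between word positions after simple whitespace tokenization.

```python
def within_words(text, word1, word2, m):
    """True if some occurrence of word1 is within m word positions of word2."""
    tokens = [t.strip('.,;:!?"').lower() for t in text.split()]
    pos1 = [i for i, t in enumerate(tokens) if t == word1]
    pos2 = [i for i, t in enumerate(tokens) if t == word2]
    return any(abs(i - j) <= m for i in pos1 for j in pos2)

print(within_words("United Airlines and American Airlines", "united", "american", 5))
print(within_words("United States of America and the American dream",
                   "united", "american", 5))
```

Under this counting rule the first text matches and the second does not, in line with the first example above.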
Boolean Queries

• Boolean queries: describe the information needed by relating multiple words with Boolean operators.
• Operators: and, or, except
• except corresponds to and not
• Semantics: For each query word w a corresponding set Dw is constructed that includes the documents that contain w.
• The Boolean expression is then interpreted as an expression on the corresponding document sets with corresponding set operators:
o and indicates intersection
o or indicates union
o except indicates set difference
Boolean Queries Cont’d

 The use of except prevents creation of very large answers: not B computes
all the documents that do not include B(complement), whereas A except B
limits the universe to the documents that include A.

 Precedence: except, and, or; use parentheses to override; process left-to-


right among operators with the same precedence.

 Examples:

1. computer or server except mainframe
• Select all documents that discuss computers, or documents that discuss servers but do not discuss mainframes.
2. (computer or server) except mainframe
• Select all documents that discuss computers or servers, but do not select any documents that discuss mainframes.
3. computer except (server or mainframe)
• Select all documents that discuss computers and do not discuss either servers or mainframes.
 Classical Boolean systems do not rank documents:
 A document either satisfies the query (and is retrieved)
 or it does not satisfy the query (and is not retrieved).
 The Boolean formalism is not simple for users without
training in mathematics.
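
The three example queries map directly onto Python set operations (`|` for or, `&` for and, `-` for except). The document ids in this toy index are made up for illustration.

```python
# Hypothetical inverted index: word -> set of ids of documents containing it.
index = {
    "computer": {1, 2, 3},
    "server": {2, 4},
    "mainframe": {3, 4, 5},
}
C, S, M = index["computer"], index["server"], index["mainframe"]

q1 = C | (S - M)    # computer or server except mainframe (except binds tighter)
q2 = (C | S) - M    # (computer or server) except mainframe
q3 = C - (S | M)    # computer except (server or mainframe)

print(sorted(q1), sorted(q2), sorted(q3))
```

Each document is either in the result set or not, which is why classical Boolean retrieval produces no ranking.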
Weighted Multiple-Word Queries

 Each of the words is assigned a different weight, expressing the relative


importance of the word within the request. A query is then a set of word-
weight pairs: (k1, w1), …, (kn, wn).

 The ranking of a document is the sum of the weights for the query words
that it satisfies. Example:
• Query: (A, 0.8), (B, 0.5), (C, 0.3); Document 1: (A, B, D); Document 2: (A, C, D)
 Ranking of Document 1: 0.8+0.5 = 1.3
 Ranking of Document 2: 0.8+0.3 = 1.1

 Each document includes two words from the query, but Document1 is ranked higher
because it includes more important words.
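
The same calculation as a short Python sketch; the `score` helper is a hypothetical name, and the weights and documents are those of the example.

```python
query = {"A": 0.8, "B": 0.5, "C": 0.3}
documents = {"Document 1": {"A", "B", "D"},
             "Document 2": {"A", "C", "D"}}

def score(doc_terms, query_weights):
    """Sum the weights of the query words that the document contains."""
    return sum(w for term, w in query_weights.items() if term in doc_terms)

for name, terms in documents.items():
    print(name, round(score(terms, query), 2))
```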
Weighted Boolean Queries

 Each word in a Boolean query is associated with a weight.

 A document with A and B satisfies this query better than a


document with A and C (without such weights, both documents
satisfy the query equally).

 Example: Two documents indexed by four terms:

 Document 1 = A 0.2, B 0.5, C 0.6, D 0

 Document 2 = A 0.7, B 0.4, C 0.1, D 0.8

 Query: (A and B) or (C and D)


Weighted Boolean Queries Cont’d

              A AND B   C AND D   RELEVANCE: (A AND B) OR (C AND D)
DOCUMENT 1    0.2       0         0.2
DOCUMENT 2    0.4       0.1       0.4
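
The values in the table are consistent with taking the minimum weight for AND and the maximum for OR. Assuming that rule (the slides do not state it explicitly), the table can be reproduced as:

```python
doc1 = {"A": 0.2, "B": 0.5, "C": 0.6, "D": 0.0}
doc2 = {"A": 0.7, "B": 0.4, "C": 0.1, "D": 0.8}

def relevance(w):
    """(A and B) or (C and D), with AND as min and OR as max (assumed rule)."""
    a_and_b = min(w["A"], w["B"])
    c_and_d = min(w["C"], w["D"])
    return max(a_and_b, c_and_d)

print(relevance(doc1), relevance(doc2))
```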


Weighted Queries With Similarity

 When interpreting queries, some models demote documents that


include keywords that were not requested.
 Example: Assume the vector model with the cosine measure and
the simple case that both documents and queries use binary values.
Consider these two documents and a query:
 d1 = (0, 1, 0, 1, 0), d2= (0, 1, 1, 1, 0), q= (0, 1, 0, 1, 0)
 sim(q, d1) = 1.0, sim(q, d2) = 0.82
 d2 is demoted because it includes an extra keyword not requested
by q.
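
The two similarity values can be checked with a small cosine function. This is a plain-Python sketch with a hypothetical helper name; the vectors are the ones given above.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

d1 = (0, 1, 0, 1, 0)
d2 = (0, 1, 1, 1, 0)
q = (0, 1, 0, 1, 0)
print(round(cosine(q, d1), 2), round(cosine(q, d2), 2))
```

For d2 the dot product is still 2, but the extra term raises the vector length from √2 to √3, which lowers the score to 2/√6.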
Natural Language

 Using natural language for querying is very attractive.


 Example: “Find all the documents that discuss campaign finance
reforms, including documents that discuss violations of campaign
financing regulations.
 Do not include documents that discuss campaign contributions by
the gun and the tobacco industries”.
 Natural language queries are converted to a formal language for
processing against a set of documents.
 Such translation requires intelligence and is still a challenge
Natural Language Cont’d

 Pseudo NL processing: System scans the prose and extracts


recognized terms and Boolean connectors. The grammaticality of
the text is not important.
 Often used by WWW search engines.
 Problem: Recognize the negation in the search statement (“Do not
include...”).
 Compromise: Users enter natural language clauses connected
with Boolean operators.
 In the above example: “campaign finance reforms” or “violations
of campaign financing regulations" and not “campaign
contributions by the gun and the tobacco industries”.
Query Formulation

 Query Formulation is simply the process by which a user or searcher defines his/her
information needs.

 This means that in every query formulation technique there is a human in the loop.
From very simple or narrow queries to extremely complex or broad queries, there
must be a person to define the information need in the form of a query.

 Query formulation is an essential part of successful information search and retrieval.


It is typically based on search keys given by the user and is a major step in the
complex process of information search.

• Formulating effective queries for web information search therefore poses a huge challenge to users, all the more so because the web is used by a diverse population with varying levels of expertise.
Query Formulation Cont’d

Information search consists of four main steps:


 Problem identification,
 Need articulation,
 Query formulation, and
 Results evaluation.
The process of information search is affected by:
 Environment (e.g., the database and the search topic),
 User or searcher (e.g., online search experience),
 Search process (e.g., commands used), and
 Search outcome variables (e.g., precision and recall).
Factors Affecting Query Formulation

• Three main factors affect query formulation:

 Media expertise

 Including familiarity with the search environment, search engine expertise,


computer expertise and expertise in information retrieval

 In media (web) expertise, the more experienced web user the searcher is, the
more likely he/she is to use a “straight to information” search style (or narrow
query) rather than a broader “navigating to information” style (or broad
query).
Factors Affecting Query Formulation Cont’d

Domain expertise

 Domain expertise presumably helps people in query formulation


by giving them a possibility to use either more terms in their
queries (synonyms), or possibly fewer, but more accurate terms.

 Thus, domain expertise is not directly expected to lead to longer


queries, but the quality of the selected terms is expected to be
high.
Factors Affecting Query Formulation Cont’d

 Type of search task


 Search task is divided into three broad categories: fact-finding,
exploratory, and comprehensive search tasks.
 Fact-finding: the source of information is not a key issue, but
precision of the result set is a key issue for efficient search.
 Exploratory search tasks: the searcher’s aim is to obtain a general
idea of the search topic or possibly to retrieve a couple of documents
as an example. So here, high precision of the result set is not
necessarily the most important thing.
 When the task is to find as many documents as possible on a given
topic, then Comprehensive search task is indicated. In this case, the
recall should be as high as possible for the search to be successful.
Relevance Feedback

 Relevance feedback is one of the techniques for improving retrieval

effectiveness.

 Relevance feedback is a feature of some information retrieval systems.

 The idea behind relevance feedback is to take the results that are initially

returned from a given query and to use information about whether or not

those results are relevant to perform a new query.

 We can usefully distinguish between three types of feedback: explicit

feedback, implicit feedback, and blind or "pseudo" feedback.


Relevance Feedback Cont’d

Explicit feedback
 Explicit feedback is obtained from assessors of relevance indicating
the relevance of a document retrieved for a query.
 This type of feedback is defined as explicit only when the assessors (or
other users of a system) know that the feedback provided is interpreted
as relevance judgments.
 Users may indicate relevance explicitly using a binary or graded
relevance system.
 Binary relevance feedback indicates that a document is either relevant
or irrelevant for a given query.
Relevance Feedback Cont’d

 Graded relevance feedback indicates the relevance of a document to a query

on a scale using numbers, letters, or descriptions (such as "not relevant",

"somewhat relevant", "relevant", or "very relevant").

 Graded relevance may also take the form of a cardinal ordering of documents

created by an assessor; that is, the assessor places documents of a result set in

order of (usually descending) relevance.

 An example of this would be the SearchWiki feature implemented by Google

on their search website.


Relevance Feedback Cont’d

Implicit feedback
 Implicit feedback is inferred from user behavior, such as noting which
documents they do and do not select for viewing, the duration of time spent
viewing a document, or page browsing or scrolling actions.
 The key differences of implicit relevance feedback from that of explicit
include :
 The user is not assessing relevance for the benefit of the IR system, but only
satisfying their own needs and
 The user is not necessarily informed that their behavior (selected documents)
will be used.
Relevance Feedback Cont’d

 An example of this is the Surf Canyon browser extension, which advances

search results from later pages of the result set based on both user interaction

(clicking an icon) and time spent viewing the page linked to in a search result.

Blind feedback

 Pseudo relevance feedback, also known as blind relevance feedback, provides

a method for automatic local analysis.

 It automates the manual part of relevance feedback, so that the user gets

improved retrieval performance without an extended interaction.


Relevance Feedback Cont’d

 The method is to do normal retrieval to find an initial set of most relevant

documents, to then assume that the top "k" ranked documents are relevant,

and finally to do relevance feedback as before under this assumption.
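
One very simple way to picture blind feedback is sketched below. It assumes the initial ranking already exists as a list of document texts and uses raw term counts from the top k documents to pick new query terms. Real systems typically use more careful term weighting, and the function name and sample texts are hypothetical.

```python
from collections import Counter

def blind_feedback(query_terms, ranked_docs, k=2, n_new_terms=1):
    """Assume the top-k ranked docs are relevant and add their most frequent
    non-query terms to the query (simplistic term selection, for illustration)."""
    counts = Counter()
    for text in ranked_docs[:k]:
        counts.update(w for w in text.lower().split() if w not in query_terms)
    new_terms = [term for term, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

ranked_docs = [
    "nuclear waste storage risk",         # rank 1 in the initial retrieval
    "nuclear storage accident storage",   # rank 2
    "cooking recipes",                    # rank 3, outside the top k
]
print(blind_feedback(["nuclear"], ranked_docs, k=2, n_new_terms=1))
```

The expanded query would then be run again, with no judgment required from the user.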

• In summary, the three types of relevance feedback are:

 Explicit feedback: users explicitly mark relevant and irrelevant documents

 Implicit feedback: system attempts to infer user intentions based on

observable behavior

 Blind feedback: feedback in absence of any evidence, explicit or otherwise


Query Expansion

o In relevance feedback, users give additional input (relevant/non-relevant) on

documents, which is used to reweight terms in the documents

o In query expansion, users give additional input (good/bad search term) on

words or phrases

o Query expansion is the process of reformulating the original query to improve

retrieval performance in information retrieval operations.


Query Expansion Cont’d

 In the context of web search engines, query expansion involves evaluating a


user's input (what words or other types of data were typed into the search
query area) and expanding the search query to match additional documents.

 This is because the original query does not always give satisfactory results.

 Typical query expansion involves techniques such as:

 Finding synonyms of words, and searching for the synonyms as well

 Finding all the various morphological forms of words by stemming each


word in the search query
Query Expansion Cont’d

 Fixing spelling errors and automatically searching for the corrected form or
suggesting it in the results

 Re-weighting the terms in the original query

The purpose of query expansion is to make the query resemble


more closely the relevant documents and thus, to retrieve those
relevant documents.
Query expansion means adding or deleting terms from the original
query or even changing terms. This can be done using
information from relevance feedback with relevant documents
identified manually by the user or by assuming the top-ranked
documents from an initial ranking are relevant.
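
Synonym-based expansion, one of the techniques listed above, can be sketched with a lookup table. The thesaurus entries and the function name here are hypothetical; a real system would draw synonyms from a curated thesaurus or from feedback documents.

```python
# Hypothetical thesaurus, for illustration only.
THESAURUS = {
    "storage": ["disposal"],
    "risk": ["hazard"],
    "accident": ["incident"],
}

def expand_query(terms, thesaurus):
    """Return the original terms plus their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["storage", "risk"], THESAURUS))
```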
Query Expansion Cont’d

Query formulation generally has three levels:

• Conceptual level
• Linguistic level
• String level

Let us assume that the following is a query task or request:

Storage of radioactive waste produced in nuclear power plants, examples of risks and accidents.
Query Expansion Cont’d

 At the conceptual level, about four facets and seven concepts can
be recognized from this request. Therefore a typical query “plan”
can be as follows:

nuclear power plants AND radioactive waste AND storage AND


(risk OR accident)

 At the string level, terms are replaced by search keys. The query is
expressed with syntax of the query language (the linguistic level). So
sample query is first formulated into the Boolean query structure
(depending on the information retrieval model used).

 With query expansion, synonyms of the terms are added to the query.
Summary

• In this chapter, you have learned about Keyword-based queries, Query formulation, Relevance feedback, and Query expansion.
Information Retrieval and Storage
Chapter Eight
Current Issues in IR
Target Group –IT 3rd year students

Injibara, Ethiopia
Contents
Current Issues in IR
• Research in IR (Multimedia Retrieval, Web Retrieval, Question Answering, etc.)


Current IR Research Trends

 Information Retrieval deals with uncertainty and vagueness in information


systems.
• Uncertain representations of the semantics of objects (text, images,…)
• Vague specifications of information needs (iterative querying)
• Focuses on web retrieval, question answering, and multimedia retrieval
• Focuses on models, methods, and systems for information properties and access methods:
o Media, Structure, Heterogeneity, Access methods
Current IR Research Trends Cont’d

Information Media            Information Structure
• Text                       • Unstructured
• Facts                      • Semi-structured (XML)
• 2D: graphics, images       • Fully structured
• Speech                     • Hyperlinked (Web)
• Video
• 3D
Current IR Research Trends Cont’d
Information Heterogeneity
• Language: multilingual
• Media: multimedia
• Heterogeneous structures
• Heterogeneous services

Access Methods
• Ad-hoc retrieval: one-time queries (e.g., Web search)
• Filtering/Routing: constant search profile (e.g., spam filtering)
• Categorization/Clustering: group documents into predefined classes/adaptive clusters
Current IR Research Trends Cont’d

 Global information access

o “Satisfy human information needs through natural, efficient interaction with an automated system that leverages worldwide structured and unstructured data in any language.”
 Media semantics

 Exploiting structure

 Heterogeneous structures and services


Current IR Research Trends Cont’d

 Contextual Retrieval
– Combine search technologies and knowledge about query and user
context into a single framework in order to provide the most
appropriate answer for a user’s information needs.

• Consideration of time, social and work context

• Major chance for improving IR quality

• Promises significant quality improvements

• Requires close cooperation between research and


industry
