
Information Retrieval System MCQs and Concepts

The document contains multiple choice questions, fill-in-the-blanks, and matching exercises related to Information Retrieval Systems and indexing techniques. It covers topics such as precision, recall, search capabilities, stemming algorithms, and the differences between Information Retrieval Systems and Database Management Systems. The exercises are designed to test knowledge and understanding of key concepts in the field.

Uploaded by Ayush Astiker
Copyright © All Rights Reserved

Unit I: Set 1 - Objective Questions

Multiple Choice Questions (MCQs)

1. What is the primary objective of an Information Retrieval System from a user's perspective?

a) To store the maximum number of documents.
b) To provide the fastest possible query execution time.
c) To minimize the overhead for a user to find needed information.
d) To ensure 100% recall for every search.

2. If a search returns 100 items, of which 20 are relevant to the user's need, what is the
precision of the search?

a) 100%
b) 80%
c) 20%
d) Cannot be determined.

3. What is a key difference between an Information Retrieval System and a Database Management System (DBMS)?

a) Only a DBMS can store numeric data.
b) IR systems are older than DBMS.
c) An IR system is optimized for "fuzzy" text, while a DBMS is optimized for "structured" data.
d) IR systems cannot use Boolean logic.

4. The search “United States of America” is an example of which search capability?

a) Proximity Search
b) Fuzzy Search
c) Term Masking
d) Contiguous Word Phrase

5. Which browse capability presents search results in order of their potential relevance to the
user?

a) Zoning
b) Ranking
c) Highlighting
d) Vocabulary Browse

6. The process of refining a search to locate additional items of interest is called:

a) Canned Query
b) Iterative Search
c) Zoning
d) Selective Dissemination

7. The function that dynamically compares newly received items against standing user profiles is known as:

a) Document Database Search
b) Automatic File Build
c) Selective Dissemination of Information (Mail)
d) Item Normalization
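The Selective Dissemination of Information function in Question 7 can be pictured as a filter that runs on every newly received item. The following is a minimal Python sketch, assuming each standing profile is simply a set of terms that must all appear in the item (a Boolean AND); `tokenize`, `disseminate`, and the sample profiles are hypothetical, and real profiles usually support richer query logic.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disseminate(item_text, profiles):
    """Return, sorted, the users whose profile terms all occur in the new item.

    Assumption for this sketch: a profile is a set of terms combined with AND.
    """
    tokens = tokenize(item_text)
    return sorted(user for user, terms in profiles.items() if terms <= tokens)


# Hypothetical standing profiles, checked against each incoming item.
profiles = {
    "alice": {"stemming", "recall"},
    "bob": {"inverted", "file"},
}

print(disseminate("Stemming usually improves recall at some cost to precision.", profiles))
# ['alice']
```

Unlike a retrospective search, the profiles here stay fixed while the items change: each new arrival is compared against every stored profile once.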

Fill in the Blanks

8. The two major measures commonly associated with the performance of information systems
are precision and recall.

9. The functional process of parsing an item into logical sub-divisions such as Title, Author, and
Abstract is called Zoning.

10. A Stop List is used to save system resources by eliminating words that have little value as a
searchable token, such as "the," "is," or "an."

11. The search capability that allows a user to locate spellings of words similar to the entered
term, compensating for errors, is known as a Fuzzy Search.

12. In an Information Retrieval System, the smallest complete unit that is processed and
manipulated by the system is referred to as an item or document.
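A stop list (item 10) and a fuzzy search (item 11) can both be sketched in a few lines of Python. This is an illustration under stated assumptions: the tiny `STOP_LIST` is hypothetical, and fuzzy matching is done here with Levenshtein edit distance, which is one common way to find similar spellings; the text does not prescribe a specific method. `processing_tokens`, `edit_distance`, and `fuzzy_search` are illustrative names.

```python
import re

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_LIST = {"the", "is", "an", "a", "of", "and", "to", "in"}


def processing_tokens(text, stop_list=STOP_LIST):
    """Split text into lowercase word tokens and drop stop-list words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stop_list]


def edit_distance(s, t):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def fuzzy_search(term, vocabulary, max_edits=1):
    """Return vocabulary words within max_edits of a possibly misspelled term."""
    return sorted(w for w in vocabulary if edit_distance(term, w) <= max_edits)


tokens = processing_tokens("The index is an inverted file of the collection")
print(tokens)                             # ['index', 'inverted', 'file', 'collection']
print(fuzzy_search("colection", tokens))  # ['collection']
```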

Match the Following

13. Match the search capability with its correct description.

Search Capability:
a) Proximity Search
b) Term Masking
c) Thesaurus Expansion

Description:
i) Expands a search term using a semantic hierarchy to include related concepts.
ii) Restricts the distance allowed within an item between two search terms.
iii) Uses a wildcard character (e.g., comput*) to find all variants of a word.


Answer:

a) - ii)
b) - iii)
c) - i)
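Term masking, proximity search, and contiguous word phrases (Question 4 and Match 13) can be sketched over a tokenized item, where each word's position is its list index. This is a minimal Python illustration, not any particular system's implementation. `mask_terms`, `within_proximity`, and `contains_phrase` are hypothetical helpers, and term masking here relies on the standard-library `fnmatch` module, which also treats `?` and `[...]` as wildcards.

```python
import fnmatch
import re


def tokenize(text):
    """Lowercase word tokens in document order; a token's position is its index."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mask_terms(pattern, vocabulary):
    """Term masking: expand a '*' wildcard pattern against the vocabulary."""
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t, pattern))


def within_proximity(tokens, term1, term2, max_distance):
    """Proximity search: True if the terms occur within max_distance words of each other."""
    pos1 = [i for i, t in enumerate(tokens) if t == term1]
    pos2 = [i for i, t in enumerate(tokens) if t == term2]
    return any(abs(i - j) <= max_distance for i in pos1 for j in pos2)


def contains_phrase(tokens, phrase):
    """Contiguous word phrase: the phrase's words appear adjacent and in order."""
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


doc = tokenize("The computer lab near the United States of America embassy "
               "uses a minicomputer and computing clusters")
vocab = set(doc)
print(mask_terms("comput*", vocab))    # ['computer', 'computing']
print(mask_terms("*computer", vocab))  # ['computer', 'minicomputer']
print(within_proximity(doc, "computer", "embassy", 5))    # False (they are 8 words apart)
print(contains_phrase(doc, "United States of America"))  # True
```

A contiguous word phrase is the strictest proximity case: distance 1 between each neighbouring pair, in a fixed order.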

Unit I: Set 2 - Objective Questions


Multiple Choice Questions (MCQs)

1. Which of the following systems is most focused on structured data, decision support technologies, and "data mining"?

a) Information Retrieval System
b) Digital Library
c) Data Warehouse
d) Selective Dissemination of Information System

2. A major concern for Digital Libraries that is often ignored by Information Retrieval Systems is:

a) The use of ranking to display results.
b) The legal aspects of copyright and intellectual property rights.
c) The ability to handle full-text documents.
d) The need for a search function.

3. The search capability that uses a query term like *computer to find terms such as
minicomputer is known as a:

a) Prefix Search
b) Suffix Search
c) Fuzzy Search
d) Proximity Search

4. Which browse capability indicates why an item was selected by visually marking the search
terms within the retrieved document?

a) Ranking
b) Zoning
c) Highlighting
d) Vocabulary Browse

5. A query that is saved by a user to be retrieved and executed during a later session is called a:

a) Boolean Query
b) Natural Language Query
c) Iterative Search
d) Canned Query

6. According to the text, which Boolean operator is implemented using set intersection to find
items containing all specified terms?

a) OR
b) NOT
c) AND
d) XOR

7. The functional view of an Information Retrieval System includes four major processes. Which process is responsible for creating searchable data structures like processing tokens and their characterizations?

a) Selective Dissemination of Information
b) Item Normalization
c) Document Database Search
d) Index Database Search
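Question 6 notes that AND is implemented with set intersection; OR and NOT map onto set union and set difference in the same way. The Python sketch below assumes hypothetical postings for a five-item collection (`POSTINGS`, `ALL_ITEMS`) and illustrative helper names; it shows the set logic only, not how a real system stores or merges its lists.

```python
# Hypothetical postings for a five-item collection: term -> set of item IDs.
POSTINGS = {
    "stemming": {1, 2, 4},
    "recall": {2, 3, 4},
    "precision": {3, 4, 5},
}
ALL_ITEMS = {1, 2, 3, 4, 5}


def items_with(term):
    """Return the set of items containing term (empty if the term is unknown)."""
    return POSTINGS.get(term, set())


def boolean_and(a, b):
    """AND: items containing both terms (set intersection)."""
    return a & b


def boolean_or(a, b):
    """OR: items containing either term (set union)."""
    return a | b


def boolean_not(a, universe=ALL_ITEMS):
    """NOT: items that lack the term (difference from the whole collection)."""
    return universe - a


print(sorted(boolean_and(items_with("stemming"), items_with("recall"))))    # [2, 4]
print(sorted(boolean_or(items_with("stemming"), items_with("precision"))))  # [1, 2, 3, 4, 5]
print(sorted(boolean_and(items_with("stemming"),
                         boolean_not(items_with("precision")))))            # [1, 2]
```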

Fill in the Blanks

8. In the context of the textbook, the term "user" is defined as an end user who has minimal
knowledge of computers and technical fields.

9. Recall is a non-calculable metric in operational systems because the denominator, the Number of Possible Relevant items in the database, is unknown.

10. The Boolean operator NOT is used to exclude items that contain a specific term.

11. The process of allowing a user to see the alphabetically sorted list of all unique words in the
database is called Vocabulary Browse.

12. The formula for Precision is the Number Retrieved Relevant divided by the Number Total
Retrieved.
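The precision formula in item 12 and the recall definition in item 9 can be written as two small Python functions. This sketch reuses the figures from Set 1, Question 2 (100 retrieved, 20 relevant). The recall call assumes, purely hypothetically, that 50 relevant items exist in the database, because that number is normally unknown in an operational system.

```python
def precision(num_retrieved_relevant, num_total_retrieved):
    """Precision = Number Retrieved Relevant / Number Total Retrieved."""
    if num_total_retrieved == 0:
        raise ValueError("precision is undefined when nothing is retrieved")
    return num_retrieved_relevant / num_total_retrieved


def recall(num_retrieved_relevant, num_possible_relevant):
    """Recall = Number Retrieved Relevant / Number Possible Relevant.

    The denominator is usually unknown outside test collections where
    relevance has been judged, so this is rarely computable in practice.
    """
    if num_possible_relevant == 0:
        raise ValueError("recall is undefined when there are no relevant items")
    return num_retrieved_relevant / num_possible_relevant


print(precision(20, 100))  # 0.2, i.e. 20%
print(recall(20, 50))      # 0.4, assuming (hypothetically) 50 relevant items exist
```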

Match the Following

13. Match the functional process from the system overview with its primary purpose.

Functional Process:
a) Document Database Search
b) Index Database Search
c) Automatic File Build

Primary Purpose:
i) Allows users to create personal or public files that logically store items, similar to a library card catalog.
ii) Provides retrospective search capability against all items ever received by the system.
iii) Automatically processes incoming items to extract potential index data, creating "Candidate Index Records."


Answer:

a) - ii)
b) - i)
c) - iii)

Unit II: Set 1 - Objective Questions

Multiple Choice Questions (MCQs)


1. What was the primary goal of the MARC (MAchine Readable Cataloging) project initiated by the Library of Congress?

a) To invent the first commercial search engine.
b) To create a hardware text search processor.
c) To standardize the structure, content, and coding of bibliographic records for computerization.
d) To develop the first automatic stemming algorithm.

2. In the context of the indexing process, what does Exhaustivity refer to?

a) The preciseness of the index terms used.
b) The extent to which the different concepts in an item are indexed.
c) The process of linking multiple index terms together.
d) The use of a controlled vocabulary for all index terms.

3. If an indexer has to decide whether to use the term "processor," "microcomputer," or "Pentium," they are making a decision about the ______ of indexing.

a) Exhaustivity
b) Specificity
c) Linkage
d) Weighting

4. What is the main characteristic of a weighted indexing system?

a) It considers every word in the document to be an index term of equal value.
b) It relies exclusively on manual indexing by human experts.
c) It attempts to assign a value to an index term based on its importance in representing a concept in the document.
d) It does not require a searchable data structure.

5. What is the primary modern goal of using stemming algorithms in an Information Retrieval System?

a) To significantly reduce secondary storage requirements.
b) To improve the precision of search results.
c) To improve recall by mapping morphological variants to a single stem.
d) To correct spelling errors in the original documents.

6. Which stemming algorithm is based on a set of condition-and-action rules applied in steps, such as removing "sses" and replacing it with "ss"?

a) Successor Stemmer
b) Dictionary Look-up Stemmer
c) Porter Algorithm
d) N-Gram Stemmer

7. How does Information Extraction differ from general Automatic Indexing?

a) Information Extraction aims to understand the entire document, while indexing does not.
b) Information Extraction focuses on pulling out specific, pre-defined types of information to populate a database, rather than representing all concepts.
c) Information Extraction can only be performed by humans.
d) Automatic indexing never uses natural language processing, whereas extraction always does.
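The "sses" to "ss" rule in Question 6 belongs to the first step (Step 1a) of the Porter Algorithm. The Python sketch below implements only that step: the four suffix rules are tried longest-first and only the first match fires. The later Porter steps, which add measure-based conditions, are deliberately left out, so this is not a complete Porter stemmer; `porter_step_1a` is an illustrative name.

```python
# Porter Step 1a rules as (suffix, replacement), checked longest suffix first.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (unchanged)
    ("s", ""),       # cats     -> cat
]


def porter_step_1a(word):
    """Apply the first Step 1a rule whose suffix matches; only one rule fires."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats", "stemming"]:
    print(w, "->", porter_step_1a(w))
```

The "ss" to "ss" rule looks redundant, but it exists so that words such as "caress" match it before the bare "s" rule can strip their final letter.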

Fill in the Blanks

8. The term used to describe the mapping of multiple morphological variants to a single representation (stem) is conflation.

9. Postcoordination is the process of coordinating index terms at search time, for example, by
using the "AND" operator.

10. The most common data structure used in Information Retrieval Systems, which consists of a dictionary and inversion lists, is the inverted file structure.

11. According to Luhn's theory, the significance of a concept in an item is directly proportional to the frequency of the word associated with it.

12. The process of creating term linkages at the time of index creation is known as precoordination.
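The inverted file structure in item 10 pairs a dictionary of unique tokens with an inversion list for each token. The following Python sketch builds both from a hypothetical three-item collection. It stores an in-item frequency alongside each item ID, since frequency is the kind of evidence a Luhn-style weighting scheme (item 11) could later use. `build_inverted_file` and the sample items are illustrative, and real systems add stop lists, stemming, and compressed on-disk storage.

```python
import re
from collections import defaultdict


def build_inverted_file(items):
    """Build an inverted file from {item_id: text}.

    Returns (dictionary, inversion_lists): the dictionary is the alphabetically
    sorted list of unique tokens; each inversion list holds (item_id, frequency)
    pairs sorted by item ID.
    """
    counts = defaultdict(dict)  # token -> {item_id: frequency}
    for item_id, text in items.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            counts[token][item_id] = counts[token].get(item_id, 0) + 1
    dictionary = sorted(counts)
    inversion_lists = {t: sorted(counts[t].items()) for t in dictionary}
    return dictionary, inversion_lists


# Hypothetical three-item collection.
items = {
    1: "Stemming improves recall",
    2: "Precision and recall measure retrieval",
    3: "Recall recall recall",
}
dictionary, lists = build_inverted_file(items)
print(dictionary)       # ['and', 'improves', 'measure', 'precision', 'recall', 'retrieval', 'stemming']
print(lists["recall"])  # [(1, 1), (2, 1), (3, 3)]
```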

Match the Following

13. Match the concept with its correct definition.

Concept:
a) Stemming
b) Information Extraction
c) Inverted File

Definition:
i) A data structure that stores, for each term, a list of documents in which it appears.
ii) A process that reduces different forms of a word (e.g., "computing," "computer") to a common root.
iii) A process focused on extracting specific facts from text to fill slots in a template or database.


Answer:

a) - ii)
b) - iii)
c) - i)

Unit II: Set 2 - Objective Questions

Multiple Choice Questions (MCQs)

1. Which automatic indexing technique determines a canonical set of abstract concepts from a text collection and uses them as a basis for indexing, also known as Latent Semantic Indexing?

a) Indexing by Term (Statistical)
b) Natural Language Processing Indexing
c) Indexing by Concept
d) Total Document Indexing

2. In the evaluation of Information Extraction systems (as in the MUC conferences), what does the metric "Overgeneration" measure?

a) The amount of correct data extracted vs. the amount available.
b) The accuracy of the extracted information.
c) The amount of irrelevant information that is extracted.
d) The speed at which the system can process a single item.

3. The Kstem algorithm, used in the INQUERY system, is an example of which type of stemmer?

a) Affix Removal Stemmer
b) Dictionary Look-up Stemmer
c) Successor Stemmer
d) N-Gram Stemmer

4. Which stemming method is based on analyzing word and morpheme boundaries by calculating the number of distinct letters that follow a word's prefix?

a) Porter Algorithm
b) Paice/Husk Algorithm
c) Successor Stemmer
d) Kstem Algorithm

5. What is a primary advantage of automatic indexing when compared to manual indexing?

a) The ability to perform concept abstraction.
b) The ability to judge the value of information.
c) Consistency and predictability in the index term selection process.
d) Lower initial hardware and software costs.

6. According to research on automatic document summarization, which type of feature has been found to give better results for identifying important sentences?

a) Location-based heuristics (e.g., position in a paragraph).
b) Frequency-based features (e.g., thematic word counts).
c) Sentence length.
d) The number of uppercase words.

7. A potential negative consequence of applying stemming is a decrease in ______, because words with different meanings may be conflated into the same stem.

a) Recall
b) Precision
c) Storage Space
d) Processing Speed
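The successor stemmer in Question 4 counts how many distinct letters follow each prefix of a word across a corpus, and treats a spike in that count as a likely morpheme boundary. This Python sketch uses a hypothetical toy corpus and the simplest "peak" criterion: cut after a prefix whose successor variety is greater than that of both neighbouring prefixes. Other segmentation rules (cutoff values, entropy, complete-word checks) exist and may give different splits. End-of-word is not counted as a successor here, and `successor_variety` and `peak_segment` are illustrative names.

```python
# Hypothetical toy corpus used to measure successor varieties.
CORPUS = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]


def successor_variety(prefix, corpus):
    """Count distinct letters that follow `prefix` in corpus words (end-of-word ignored)."""
    return len({w[len(prefix)] for w in corpus
                if w.startswith(prefix) and len(w) > len(prefix)})


def peak_segment(word, corpus):
    """Split `word` after every prefix whose successor variety is a local peak."""
    # varieties[i] belongs to the prefix word[:i + 1]
    varieties = [successor_variety(word[:i], corpus) for i in range(1, len(word) + 1)]
    cuts = [i + 1 for i in range(1, len(varieties) - 1)
            if varieties[i] > varieties[i - 1] and varieties[i] > varieties[i + 1]]
    segments, start = [], 0
    for cut in cuts:
        segments.append(word[start:cut])
        start = cut
    segments.append(word[start:])
    return varieties, segments


print(peak_segment("readable", CORPUS))
# ([3, 2, 1, 3, 1, 1, 1, 0], ['read', 'able'])
```

The first segment ("read") would serve as the stem. The quality of the split depends heavily on the corpus, which is why this approach needs a reasonably large vocabulary.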

Fill in the Blanks

8. In an inverted file system, the dictionary is the alphabetically sorted list of all unique
processing tokens, which points to the inversion lists.

9. The probabilistic approach to indexing and retrieval based on evidential reasoning is the
Bayesian model.

10. A key reason proper nouns and acronyms are challenging for stemmers is that they should typically not be stemmed.

11. The simple automatic indexing method where all words in an item are used as potential index terms is called total document indexing.

12. The series of evaluation conferences for Information Extraction sponsored by DARPA (ARPA) are known as the Message Understanding Conferences (MUC).

Match the Following


13. Match the stemming methodology with its description.

Stemming Methodology:
a) Porter Algorithm
b) Dictionary Look-up
c) Successor Stemmer

Description:
i) Uses a dictionary to check if a stemmed root is a valid word, avoiding errors like stemming "factorial" to "factory."
ii) Analyzes prefixes to determine morpheme boundaries based on the number of unique characters that can follow.
iii) An affix-removal stemmer that applies a set of ordered rules to remove common suffixes from words.


Answer:

a) - iii)
b) - i)
c) - ii)
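The dictionary look-up idea in Match 13 (checking that a stemmed result is a real word, so "factorial" is not turned into "factory") can be sketched by pairing suffix rules with a lexicon check. In this Python sketch, the mini-lexicon `LEXICON` and the rules in `SUFFIX_RULES` are hypothetical. They are not Kstem's actual dictionary or rules, and the "ial" to "y" rule is deliberately aggressive to show the kind of error the lexicon prevents. `naive_stem` is included only for contrast.

```python
# Hypothetical mini-lexicon; a real dictionary stemmer uses a large general lexicon.
LEXICON = {"factory", "factorial", "compute", "computer", "walk"}

# Hypothetical suffix rules as (suffix, replacement), tried in order.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ial", "y"),  # deliberately aggressive: would map factorial -> factory
    ("ing", ""),
    ("ing", "e"),
    ("s", ""),
]


def naive_stem(word):
    """Apply the first matching rule with no dictionary check (for contrast)."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def dictionary_stem(word, lexicon=LEXICON):
    """Keep words already in the lexicon; otherwise accept the first rule
    output that is a lexicon entry, falling back to the original word."""
    if word in lexicon:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return word


print(naive_stem("factorial"))       # factory  (the error described in i)
print(dictionary_stem("factorial"))  # factorial
print(dictionary_stem("factories"))  # factory
print(dictionary_stem("computing"))  # compute
```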
