Q1. What is Information Retrieval (IR)?
A) Storing data in databases
B) Finding structured data only
C) Finding unstructured documents that satisfy an information need
D) Designing computer networks
Q2. Which of the following is NOT a key component of IR?
A) Indexing
B) Querying
C) Sorting
D) Matching and ranking
Q3. What is the main purpose of indexing?
A) To delete irrelevant documents
B) To organize data for efficient searching
C) To display results to the user
D) To translate queries
Q4. A query in IR is:
A) A database table
B) A user request for information
C) A ranked list
D) A document
Q5. Matching and ranking are used to:
A) Store documents
B) Create metadata
C) Order results by relevance
D) Compress files
Q6. In IR, documents are usually considered:
A) Highly structured
B) Completely numeric
C) Unstructured text
D) Encrypted files
Q7. What does the Bag of Words (BoW) model ignore?
A) Word frequency
B) Word order
C) Document length
D) Stop words
Q8. The Boolean retrieval model is based on:
A) Probabilities
B) Neural networks
C) Logical operators (AND, OR, NOT)
D) Machine learning
Q9. Under the Bag of Words model, re-ordering words in a document:
A) Changes its main meaning
B) Destroys the topic
C) Does not affect the main idea
D) Makes it unreadable
Q10. Given the following term-document matrix:
Term / Doc   D1   D2   D3
data         1    0    1
mining       1    1    0
retrieval    0    1    1
Which documents satisfy the query:
data AND mining?
Answer: D1
Q11. Using the same matrix, which documents satisfy:
retrieval OR data?
Answer: D1, D2, D3
Q12. Which documents satisfy:
mining AND NOT data?
Answer: D2
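The answers to Q10 through Q12 can be checked with a short sketch that applies AND, OR, and NOT column by column to the matrix rows. The dictionary layout and the names `docs`, `matrix`, and `matching_docs` are illustrative assumptions, not part of the questions.

```python
# Term-document incidence matrix from Q10 (one row per term, columns D1..D3).
docs = ["D1", "D2", "D3"]
matrix = {
    "data":      [1, 0, 1],
    "mining":    [1, 1, 0],
    "retrieval": [0, 1, 1],
}

def matching_docs(bits):
    """Return the names of the documents whose bit is 1."""
    return [doc for doc, bit in zip(docs, bits) if bit]

# Boolean operators applied position by position to the 0/1 rows.
q10 = [a & b for a, b in zip(matrix["data"], matrix["mining"])]        # data AND mining
q11 = [a | b for a, b in zip(matrix["retrieval"], matrix["data"])]     # retrieval OR data
q12 = [a & (1 - b) for a, b in zip(matrix["mining"], matrix["data"])]  # mining AND NOT data

print(matching_docs(q10))  # ['D1']
print(matching_docs(q11))  # ['D1', 'D2', 'D3']
print(matching_docs(q12))  # ['D2']
```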
Q13. Construct a term-document incidence matrix for these documents:
D1: "data mining techniques"
D2: "information retrieval systems"
D3: "data retrieval"
Term          D1   D2   D3
data          1    0    1
mining        1    0    0
techniques    1    0    0
information   0    1    0
retrieval     0    1    1
systems       0    1    0
Q14. Explain the difference between a True Positive and a False Positive in IR.
True Positive: a relevant document that is retrieved.
False Positive: an irrelevant document that is retrieved.
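The two definitions can be illustrated with set operations. The `retrieved` and `relevant` sets below are hypothetical example data invented for this sketch, not results from the quiz.

```python
# Hypothetical example data: what the system returned for a query,
# and which documents are actually relevant to that query.
retrieved = {"D1", "D2", "D3"}
relevant = {"D1", "D3", "D4"}

true_positives = retrieved & relevant   # retrieved and relevant
false_positives = retrieved - relevant  # retrieved but not relevant

print(sorted(true_positives))   # ['D1', 'D3']
print(sorted(false_positives))  # ['D2']
```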
Q15. Why is the Bag of Words model useful in information retrieval?
The Bag of Words model is useful because it simplifies text representation: each document is
reduced to the words it contains (and, optionally, how often each occurs), ignoring word order.
This allows fast comparison between documents and queries.
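A minimal sketch of this idea, assuming lowercase whitespace tokenization (the helper name `bag_of_words` is hypothetical): two texts with the same words in a different order produce the same bag.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())

a = bag_of_words("data mining techniques")
b = bag_of_words("techniques mining data")

# Word order is discarded, so the two representations are equal.
print(a == b)  # True
```

`Counter` keeps word frequencies as well; using `set(text.lower().split())` instead would record word presence only.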
Q16. Write a program that takes the following documents and builds a term-document
matrix.
Documents:
D1 = "data mining techniques"
D2 = "information retrieval systems"
D3 = "data retrieval"
Output should be a matrix like:
Term D1 D2 D3
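One possible solution sketch for Q16, assuming lowercase whitespace tokenization and terms listed in order of first appearance. The function name `build_matrix` and the dictionary layout are choices made for this example.

```python
docs = {
    "D1": "data mining techniques",
    "D2": "information retrieval systems",
    "D3": "data retrieval",
}

def build_matrix(docs):
    """Map each term to a list of 0/1 flags, one flag per document."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    vocab = []
    for words in tokens.values():
        for word in words:
            if word not in vocab:  # keep first-appearance order
                vocab.append(word)
    return {term: [int(term in tokens[name]) for name in docs] for term in vocab}

matrix = build_matrix(docs)

# Print the header row, then one row per term.
print("Term".ljust(12) + " ".join(docs))
for term, row in matrix.items():
    print(term.ljust(12) + "  ".join(str(bit) for bit in row))
```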
Q17. Write a function that extracts all unique terms (vocabulary) from a list of
documents.
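One possible solution sketch for Q17, assuming lowercase whitespace tokenization and an alphabetically sorted result. The name `extract_vocabulary` is a choice made for this example.

```python
def extract_vocabulary(documents):
    """Return the sorted list of unique terms across all documents."""
    vocab = set()
    for text in documents:
        vocab.update(text.lower().split())
    return sorted(vocab)

documents = [
    "data mining techniques",
    "information retrieval systems",
    "data retrieval",
]
print(extract_vocabulary(documents))
# ['data', 'information', 'mining', 'retrieval', 'systems', 'techniques']
```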
Q18. Write a function that takes a query like
"data AND retrieval" and returns the documents that satisfy it.
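One possible solution sketch for Q18. It assumes operators are written in uppercase (AND, OR, NOT), evaluates the query strictly left to right without parentheses or operator precedence, and uses the three documents from Q16. The name `boolean_search` is a choice made for this example.

```python
docs = {
    "D1": "data mining techniques",
    "D2": "information retrieval systems",
    "D3": "data retrieval",
}

def boolean_search(query, docs):
    """Evaluate a query left to right; NOT negates the term that follows it."""
    index = {name: set(text.lower().split()) for name, text in docs.items()}
    all_docs = set(docs)
    result, op, negate = None, "AND", False
    for token in query.split():
        if token in ("AND", "OR"):
            op = token
        elif token == "NOT":
            negate = True
        else:
            # Documents that contain this term.
            hits = {name for name, words in index.items() if token.lower() in words}
            if negate:
                hits = all_docs - hits
                negate = False
            if result is None:
                result = hits
            elif op == "AND":
                result = result & hits
            else:
                result = result | hits
    return sorted(result or [])

print(boolean_search("data AND retrieval", docs))  # ['D3']
```

With these documents only D3 contains both terms, so the query returns D3 alone.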
Q19. Write a program that checks if a document matches a query using AND
logic.
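A minimal sketch of the AND check, assuming the query is a list of terms separated by the uppercase keyword AND and that matching is case-insensitive on whole words. The name `matches_and` is a choice made for this example.

```python
def matches_and(document, query):
    """Return True if the document contains every term of an AND query."""
    words = set(document.lower().split())
    terms = [t.lower() for t in query.split() if t != "AND"]
    return all(term in words for term in terms)

print(matches_and("data retrieval", "data AND retrieval"))          # True
print(matches_and("data mining techniques", "data AND retrieval"))  # False
```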
Q20. Print results as: Relevant documents: D1, D3
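A small sketch of the output format. The helper name `format_results` is hypothetical, and printing "none" for an empty result is an assumption, since the question does not specify that case.

```python
def format_results(doc_names):
    """Build the result line in the format shown in the question."""
    if not doc_names:
        return "Relevant documents: none"  # assumed wording for no matches
    return "Relevant documents: " + ", ".join(doc_names)

print(format_results(["D1", "D3"]))  # Relevant documents: D1, D3
```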