ARTIFICIAL INTELLIGENCE
GRADE 10
Unit 6: Natural Language Processing
Multiple Choice Questions (MCQ)
1. Which application of NLP helps in understanding the emotions that people
express in their feedback?
(a) Virtual assistants (b) Sentiment analysis
(c) Text classification (d) Automatic summarization
2. Which of the following is used for finding the frequency of words in some
given text sample?
(a) Stemming (b) Lemmatisation
(c) Bag of words (d) None of the above
3. Machine translation feature converts _____.
(a) One language to another
(b) Human language to machine language
(c) Any human language to Programming
(d) Machine language to human language
4. Which of the following comes under NLP?
(a) Chatbots (b) Price comparison websites
(c) Facial recognition (d) All of the above
5. Chatbots are AI systems which
(a) Interact with humans through text or speech
(b) Are able to offer round the clock responses and handle multiple queries
simultaneously
(c) Both (a) and (b)
(d) Neither (a) nor (b)
6. What do we call the process of dividing a string into component words?
(a) Regression (b) Word Tokenisation
(c) Classification (d) Clustering
7. Sentence segmentation is the _____ step for building the NLP model.
(a) First (b) Second (c) Third (d) Fourth
8. Which of these is not a stopword?
(a) This (b) Things (c) Is (d) Do
9. What is the lemma of the word “Making”?
(a) Mak (b) Make (c) Making (d) Maker
10. Which algorithm results in two things, a vocabulary of words and the
frequency of the words in the corpus?
(a) Sentence segmentation (b) Tokenisation
(c) Bag of words (d) Text normalisation
11. Which of the following is the type of data used by NLP applications?
(a) Images (b) Numerical data (c) Graphical data (d) Text and Speech
12. Assertion (A): Stemming is a technique used to reduce an inflected word
down to its word stem.
Reason (R): For example, the words “programming,” “programmer,” and
“programs” can all be reduced down to the common word stem “program”.
(a) Both A and R are correct and R is the correct explanation of A
(b) Both A and R are correct but R is not the correct explanation of A
(c) A is correct but R is not correct
(d) A is not correct but R is correct
13. Assertion (A): TF-IDF is a natural language processing (NLP) technique that’s
used to evaluate the importance of different words in a sentence.
Reason (R): It’s useful in text classification and for helping a machine learning
model read words.
(a) Both A and R are correct and R is the correct explanation of A
(b) Both A and R are correct but R is not the correct explanation of A
(c) A is correct but R is not correct
(d) A is not correct but R is correct
14. _____________ is the process of converting a word to its actual root form as
per the language.
(a) Tokenization (b) Stemming (c) Lemmatization (d) Segmentation
Short answer type questions:
1. What is the meaning of syntax and semantics in NLP?
Answer: Syntax refers to the grammatical structure of a sentence. Semantics
refers to the meaning of the sentence.
2. What is the difference between stemming and lemmatization?
Answer: Stemming is a technique used to extract the base form of words by
removing affixes from them, much like cutting the branches of a tree down to
its stem. For example, the stem of the words eating, eats and eaten is eat.
The stem need not be a meaningful word.
Lemmatization also groups together different forms of the same word, but the
root form it returns (the lemma) is always a meaningful word of the language.
In search queries, lemmatization allows end users to query any version of a
base word and get relevant results.
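The contrast can be illustrated with a toy sketch. The suffix list and the small lemma dictionary below are hypothetical and chosen only for this example; real stemmers and lemmatizers use much larger rule sets and dictionaries.

```python
# Toy illustration only, not a real stemmer or lemmatizer.

SUFFIXES = ("ing", "en", "s")  # hypothetical affix list for this sketch


def toy_stem(word):
    """Chop the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-dictionary mapping word forms to their lemma.
LEMMAS = {
    "eating": "eat",
    "eats": "eat",
    "eaten": "eat",
    "making": "make",
    "studies": "study",
}


def toy_lemmatize(word):
    """Look the word up; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["eating", "eats", "eaten", "making", "studies"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

For "making" the toy stemmer returns "mak", which is not a word, while the dictionary lookup returns the lemma "make".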
3. What is a document vector table?
Answer: A document vector table is used while implementing the Bag of Words
algorithm. In a document vector table, the header row contains the vocabulary
of the corpus and each of the other rows corresponds to one document. Each
cell records how many times that word occurs in the document, so a word that
occurs once is represented by 1 and an absent word by 0.
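A minimal sketch of building such a table is given below. The three documents are a hypothetical mini-corpus, assumed to be already normalised. Each cell holds the word's count in that document, which is simply 1 or 0 when no word is repeated.

```python
from collections import Counter

# Hypothetical mini-corpus, already normalised to lower case.
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]

tokenised = [doc.split() for doc in documents]

# Header row: every distinct word, in order of first appearance.
vocabulary = []
for tokens in tokenised:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

# One row per document; each cell is that word's count in the document.
table = []
for tokens in tokenised:
    counts = Counter(tokens)
    table.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in table:
    print(row)
```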
4. What do you mean by corpus?
Answer: A corpus is the whole body of textual data, taken from all the
documents together, that we work on. In Text Normalization, several steps are
applied to this corpus to reduce the text to a simpler form.
5. Differentiate between a script-bot and a smart-bot. (Any 2 differences)
Answer:
Script-bot
• A scripted chatbot doesn’t carry even a glimpse of AI.
• Script-bots are easy to make.
Smart-bot
• Smart-bots are built on NLP and ML.
• Smart-bots are comparatively difficult to make.
6. What is inverse document frequency?
Answer: Document frequency is the number of documents in which the word
occurs, irrespective of how many times it occurs in those documents. For
inverse document frequency, we put the total number of documents in the
numerator and the document frequency in the denominator. For example, if the
word “AMAN” occurs in 2 of the 3 documents in a corpus, its document
frequency is 2 and its inverse document frequency is 3/2.
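The calculation can be sketched as follows. The three documents are a hypothetical corpus, chosen so that "aman" occurs in two of them, and the function names are illustrative.

```python
# Hypothetical three-document corpus in which "aman" occurs in two documents.
documents = [
    "aman and anil are stressed",
    "aman went to a therapist",
    "anil went to download a health chatbot",
]


def document_frequency(word, docs):
    """Number of documents that contain the word at least once."""
    return sum(1 for doc in docs if word in doc.split())


def inverse_document_frequency(word, docs):
    """Total number of documents divided by the document frequency.

    Assumes the word occurs in at least one document.
    """
    return len(docs) / document_frequency(word, docs)


print(document_frequency("aman", documents))          # 2
print(inverse_document_frequency("aman", documents))  # 1.5, i.e. 3/2
```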
7. What is the significance of converting the text into a common case?
Answer: Converting text to a common case is one of the steps of Text
Normalization. After the removal of stop words, we convert the whole text
into one case, preferably lower case. This ensures that the machine, which is
case-sensitive, does not treat the same word as different words just because
the letter case differs.
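A small illustration, using a hypothetical token list: before conversion a case-sensitive comparison sees four different words, and after conversion it sees one.

```python
# Four spellings of the same word that differ only in letter case.
tokens = ["Hello", "HELLO", "hello", "HeLLo"]

# A case-sensitive comparison treats them as four distinct words.
print(len(set(tokens)))   # 4

# After converting to lower case they collapse into a single word.
lowered = [token.lower() for token in tokens]
print(len(set(lowered)))  # 1
```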
8. Mention some applications of Natural Language Processing.
Answer: Some applications of Natural Language Processing are:
• Sentiment Analysis
• Chatbots & Virtual Assistants
• Text Classification
• Text Extraction
• Machine Translation
• Text Summarization
• Market Intelligence
• Auto-Correct
9. What are stop words? Explain with the help of examples.
Answer: “Stop words” are the most common words in a language, like “the”,
“a”, “on”, “is”, “all”. These words do not carry important meaning and are
usually removed from texts. Stop words can be removed using the Natural
Language Toolkit (NLTK), a suite of libraries and programs for symbolic and
statistical natural language processing.
10. Explain the concept of Bag of Words.
Answer: Bag of Words is a Natural Language Processing model that helps in
extracting features from text for use in machine learning algorithms. In Bag
of Words, we count the occurrences of each word and construct the vocabulary
for the corpus. The model simply creates a set of vectors containing the
counts of word occurrences in each document (for example, in each review),
and these vectors are easy to interpret. Bag of Words gives us two things:
i) a vocabulary of words for the corpus, and ii) the frequency of these words
(the number of times each has occurred in the whole corpus).
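The two outputs can be sketched with Python's standard `collections.Counter`. The three short reviews are a hypothetical corpus, assumed to be already normalised.

```python
from collections import Counter

# Hypothetical corpus of three short, already normalised reviews.
reviews = [
    "good movie",
    "not a good movie",
    "did not like",
]

# Count every word across the whole corpus.
frequency = Counter(word for review in reviews for word in review.split())

# i) The vocabulary: each distinct word, in order of first appearance.
vocabulary = list(frequency)

# ii) The frequency of each vocabulary word in the whole corpus.
print(vocabulary)
print(dict(frequency))
```

Here "good", "movie" and "not" each occur twice in the corpus, and the remaining words occur once.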
11. Identify any 2 stop words in the given sentence:
Pollution is the introduction of contaminants into the natural environment that
cause adverse change. The three types of pollution are air pollution, water
pollution and land pollution.
Answer: The stop words in the given text are: is, the, of, into, that, are, and.
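The answer above can be checked with a short sketch. The stop-word set here is a small hand-picked list assumed for this example. Libraries such as NLTK provide much longer lists.

```python
import re

# Small hand-picked stop-word list, assumed for this sketch only.
STOP_WORDS = {"a", "all", "and", "are", "into", "is", "of", "on", "that", "the"}

text = (
    "Pollution is the introduction of contaminants into the natural "
    "environment that cause adverse change. The three types of pollution "
    "are air pollution, water pollution and land pollution."
)

# Lower-case the text and keep only alphabetic tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Collect each stop word once, in order of first appearance.
found = []
for token in tokens:
    if token in STOP_WORDS and token not in found:
        found.append(token)

# The tokens that remain after stop-word removal.
content_words = [token for token in tokens if token not in STOP_WORDS]

print(found)
print(content_words)
```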
Long Answer Questions
1. What are the steps of Text Normalization? Explain them briefly.
Answer: In Text Normalization, the text is reduced to a simpler form through
the following steps.
i) Sentence Segmentation - Under sentence segmentation, the whole corpus is
divided into sentences. Each sentence is treated as a separate piece of data,
so the whole corpus is reduced to a list of sentences.
ii) Tokenisation - After segmenting the sentences, each sentence is further
divided into tokens. Token is the term used for any word, number or special
character occurring in a sentence. Under tokenisation, every word, number
and special character is considered separately and each of them becomes a
separate token.
iii) Removing Stop words, Special Characters and Numbers - In this step, the
tokens which are not necessary are removed from the token list.
iv) Converting text to a common case - After the stop words are removed, we
convert the whole text into one case, preferably lower case. This ensures
that the machine, which is case-sensitive, does not treat the same word as
different words just because the letter case differs.
v) Stemming - In this step, the remaining words are reduced to their root
words. In other words, stemming is the process in which the affixes of words
are removed and the words are converted to their base form.
vi) Lemmatization - In lemmatization, the word we get after affix removal
(known as the lemma) is always a meaningful one.
With this we have normalized our text to tokens, which are the simplest forms
of the words present in the corpus. Now it is time to convert the tokens into
numbers. For this, we would use the Bag of Words algorithm.
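Steps i) to v) above can be sketched as a small pipeline. The stop-word list, the suffix rules and the sample corpus are hypothetical and kept deliberately short. Step vi) is left out because lemmatization needs a dictionary of the language.

```python
import re

STOP_WORDS = {"a", "and", "are", "is", "the", "to"}  # hypothetical short list
SUFFIXES = ("ing", "ed", "s")                        # hypothetical toy stemming rules


def segment_sentences(corpus):
    """Step i: split the corpus after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", corpus)
    return [part.strip() for part in parts if part.strip()]


def tokenise(sentence):
    """Step ii: every word, number and special character is a separate token."""
    return re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)


def remove_noise(tokens):
    """Step iii: drop stop words, special characters and numbers."""
    return [t for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]


def to_common_case(tokens):
    """Step iv: convert every token to lower case."""
    return [t.lower() for t in tokens]


def toy_stem(token):
    """Step v: strip the first matching suffix (toy rule, not a real stemmer)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalise(corpus):
    """Run steps i to v and return one list of stems per sentence."""
    result = []
    for sentence in segment_sentences(corpus):
        tokens = to_common_case(remove_noise(tokenise(sentence)))
        result.append([toy_stem(token) for token in tokens])
    return result


corpus = "Aman and Anil are walking to school. They played 2 games!"
print(normalise(corpus))
```

For the sample corpus this gives two token lists, one per sentence, with "walking", "played" and "games" reduced to "walk", "play" and "game".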
2. Through a step-by-step process, calculate TFIDF for the given corpus and
mention the word(s) having highest value.
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
Answer: Term frequency is the frequency of a word in one document. Term
frequency can easily be read from the document vector table, because that
table records the frequency of each vocabulary word in each document.
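The remaining steps can be sketched in code. This sketch assumes the commonly taught formula TFIDF(W) = TF(W) × log10(IDF(W)), with IDF(W) = total number of documents / document frequency of W. The base-10 logarithm is an assumption, as the text above does not state the formula. Punctuation is dropped and the text is converted to lower case before counting.

```python
import math
import re

documents = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Lower-case each document and keep only alphabetic tokens.
tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]

# Vocabulary in order of first appearance.
vocabulary = []
for tokens in tokenised:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

n_docs = len(documents)

# Document frequency: number of documents containing each word.
df = {w: sum(1 for tokens in tokenised if w in tokens) for w in vocabulary}

# Inverse document frequency: total documents / document frequency.
idf = {w: n_docs / df[w] for w in vocabulary}

# TF-IDF per document, under the assumed formula TF * log10(IDF).
tfidf = [
    {w: tokens.count(w) * math.log10(idf[w]) for w in vocabulary}
    for tokens in tokenised
]

highest = max(value for row in tfidf for value in row.values())
top_words = sorted(
    {w for row in tfidf for w, value in row.items() if math.isclose(value, highest)}
)

print(df)
print(round(highest, 3), top_words)
```

Under these assumptions, "mumbai" and "famous" occur in 3 of the 4 documents and score lowest among the words present, about 0.125. The words that occur in only one document ("is", "i", "am", "in") share the highest value, log10(4), which is about 0.602.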
3. What are the different applications of NLP which are used in real-life
scenario?
Answer: Some of the applications of NLP used in real-life scenarios are:
a. Automatic Summarization – Automatic summarization is useful for gathering
data from social media and other online sources, as well as for summarizing
the meaning of documents and other written materials. It is particularly
relevant when producing a summary of a news story or blog post, because it
removes redundancy across different sources and increases the diversity of
the content gathered.
b. Sentiment Analysis – The aim of sentiment analysis is to detect sentiment
in posts, even when emotion is not directly expressed or when several
opinions appear in the same post. Businesses use natural language processing
tools like sentiment analysis to better understand what internet users are
saying about their goods and services.
c. Text Classification – Text classification enables you to classify a document
and organize it, making it easier to find the information you need or to carry
out certain tasks. Spam screening in email is one example of how text
classification is used.
d. Virtual Assistants – These days, digital assistants like Google Assistant,
Cortana, Siri, and Alexa play a significant role in our lives. Not only can we
communicate with them, but they can also make our lives easier. With access
to our data, they can help us make notes about our tasks, place calls, send
messages, and much more.