Module 1: Introduction to NLP -
Questions and Answers
Q1. What is Natural Language Processing (NLP)?
Natural Language Processing (NLP) is a subfield of artificial intelligence and
linguistics that focuses on enabling computers to understand, interpret, and
generate human language. It combines computational linguistics with machine
learning and deep learning techniques to process language data.
Q2. Mention any two stages of NLP.
Two key stages of NLP are:
1. Morphological and lexical analysis: Processing of word structures.
2. Syntactic analysis: Analyzing grammatical structure.
Q3. What are ambiguities in NLP?
Ambiguities in NLP refer to situations where language elements can have multiple
interpretations. Common types include lexical, syntactic, and semantic ambiguities.
Q4. List any two applications of NLP.
Two applications of NLP are:
1. Machine translation (e.g., Google Translate)
2. Sentiment analysis (e.g., analysing public opinion on social media).
Q5. What is the role of grammar in NLP?
Grammar in NLP helps define the structural rules of a language, guiding parsing and
understanding of syntax to ensure meaningful language processing.
Q6. Explain the origin and history of Natural Language Processing (NLP).
The origin of NLP can be traced back to the 1950s, when the focus was on machine
translation from Russian into English during the Cold War. The Georgetown-IBM
experiment in 1954 was an early milestone. In the 1960s and 1970s, rule-based
systems were prominent, heavily influenced by Chomsky's generative grammar. The
1980s saw statistical methods gain popularity with corpus-based studies, and in the
1990s, machine learning started transforming NLP. With the advent of deep learning
and neural networks in the 2010s, NLP capabilities expanded dramatically, enabling
complex tasks like contextual translation, summarization, and conversational
agents. Modern NLP relies on transformer models, such as BERT and GPT, to improve
context understanding and language generation.
Q7. Discuss the various stages in NLP.
The main stages in NLP include:
1. **Lexical Analysis**: Breaking down sentences into tokens or words.
2. **Syntactic Analysis**: Checking for grammatical structure.
3. **Semantic Analysis**: Deriving meanings from words and sentences.
4. **Discourse Integration**: Understanding sentence-to-sentence meaning in a text.
5. **Pragmatic Analysis**: Interpreting language based on context.
Each stage ensures that language input is converted into structured data for
machines to process and respond meaningfully.
Q8. Describe the types of ambiguities in English and Indian regional languages.
Ambiguities in NLP include:
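1. **Lexical Ambiguity**: A word has multiple meanings (e.g., 'bat' as an animal or
sports gear).
2. **Syntactic Ambiguity**: Sentence structure allows multiple interpretations
(e.g., 'I saw the man with the telescope': who has the telescope?).
3. **Semantic Ambiguity**: The sentence as a whole has more than one possible
meaning even when its structure is clear (e.g., 'She broke the record').
4. **Pragmatic Ambiguity**: The intended meaning depends on situational context
(e.g., 'Can you pass the salt?' as a request rather than a question about ability).
Indian regional languages add challenges due to their relatively free word order,
rich morphology, and compound verbs, making parsing and translation more complex.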
Q9. Explain the challenges faced in NLP.
NLP faces several challenges:
1. **Ambiguity** in language interpretation.
2. **Data sparsity** for low-resource languages.
3. **Multilinguality** and code-switching issues.
4. **Context understanding** and sarcasm detection.
5. **Morphological richness** in some languages.
These make tasks like translation, summarization, and information retrieval complex
and demand robust models and large annotated datasets.
Q10. Elaborate on the major applications of NLP in real-world scenarios.
Major applications of NLP include:
1. **Machine Translation**: Tools like Google Translate use NLP to translate between
languages.
2. **Text Summarization**: Condensing long texts into short, informative
summaries.
3. **Sentiment Analysis**: Analysing emotions from reviews or social media.
4. **Information Retrieval**: Search engines use NLP to return relevant results.
5. **Question Answering Systems**: AI assistants like Siri or Alexa interpret and
respond to queries.
These applications improve automation, customer service, and accessibility in
digital platforms.
10 marks
Q11. Discuss the history and evolution of NLP, highlighting its key milestones and the
role of language and knowledge in language processing.
Answer: Natural Language Processing (NLP) began in the 1950s, driven by the
desire to enable machines to understand and generate human language. A key
milestone was the Georgetown-IBM experiment in 1954, which translated Russian
sentences into English and served as an early public demonstration of machine
translation. In the 1960s, systems like ELIZA, a rule-based chatbot, showcased
early NLP capabilities by simulating conversations. The 1980s saw a shift toward
statistical methods, with the introduction of probabilistic models for language
processing. Through the 1990s and 2000s, statistical and machine learning
techniques, such as Hidden Markov Models, became prominent, followed by the deep
learning revolution in the 2010s, with transformer models (e.g., BERT) approaching
human performance on some benchmarks for tasks like question answering.
Language in NLP refers to the structured system of communication (syntax,
semantics, and pragmatics) that machines must process. Knowledge encompasses
world knowledge (facts, concepts) and linguistic knowledge (grammar, vocabulary),
which are essential for understanding context and meaning. For example, in
sentence processing, knowledge of grammar helps parse sentences, while world
knowledge helps disambiguate meanings (e.g., “bank” as a financial institution vs.
a riverbank). Together, language and knowledge enable machines to perform complex
tasks like translation, summarization, and dialogue generation, making them
integral to NLP’s evolution.
Q12. Explain the stages of NLP in detail, providing examples for each stage and their
significance in language processing.
Answer: The five stages of NLP form a pipeline for processing human language:
1. **Lexical Analysis**: This stage breaks text down into words or tokens. It
includes tasks like tokenization (splitting "I am happy" into "I", "am", "happy")
and stemming (reducing "running" to "run"). It is significant because it prepares
raw text for further processing.
2. **Syntactic Analysis**: This stage parses the grammatical structure of
sentences. For example, in "The cat sleeps," it identifies "The cat" as the subject
and "sleeps" as the verb. It checks that the sentence is grammatically well formed
and helps in understanding sentence structure.
3. **Semantic Analysis**: This stage focuses on meaning. For example, in "I saw a
bat," it determines whether "bat" refers to an animal or a piece of sports
equipment based on context. It is crucial for understanding intent and resolving
ambiguity.
4. **Discourse Analysis**: This stage examines the broader context across
sentences. For example, in "John went to the store. He bought milk," it links "He"
to "John." It helps in understanding narratives or conversations.
5. **Pragmatic Analysis**: This stage interprets the intended meaning based on
context. For example, in "Can you pass the salt?" it recognizes the sentence as a
request, not a question about ability. It ensures practical understanding of
language use.
These stages are interconnected, enabling machines to process language
comprehensively, from raw text to meaningful interpretation.
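The lexical analysis stage can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the three-entry suffix list are assumptions made for this example, and a real stemmer (such as the Porter stemmer) applies many more rules.

```python
import re

# Hypothetical, deliberately tiny suffix list used only for this illustration.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into word tokens, dropping spaces and punctuation."""
    return re.findall(r"[A-Za-z']+", text)


def naive_stem(token):
    """Strip one known suffix, then undo a doubled final consonant (runn -> run)."""
    token = token.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "is" are untouched.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return token


print(tokenize("I am happy"))  # ['I', 'am', 'happy']
print([naive_stem(t) for t in tokenize("The cat sleeps while running")])
# ['the', 'cat', 'sleep', 'while', 'run']
```

Rule-based stripping like this is fast but crude, which is one reason later stages cannot assume that every stem is a real dictionary word.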
Q13. Discuss the challenges of NLP, focusing on ambiguity and context understanding,
and suggest possible solutions to overcome these challenges.
Answer: NLP faces several challenges, with ambiguity and context understanding
being among the most significant.
1. **Ambiguity**: Language is inherently ambiguous. Lexical ambiguity occurs when
words have multiple meanings (e.g., “bark” as a dog’s sound or tree covering).
Syntactic ambiguity arises in sentences like “I saw the man with the telescope,”
which could mean either that the man had the telescope or that the speaker used it
to see him. Semantic ambiguity occurs when the meaning is unclear, as in “She broke
the record” (a physical object or a milestone?).
2. **Context Understanding**: NLP struggles with context-dependent meanings. For
instance, sarcasm (“Great, I love being stuck in traffic!”) is hard to detect
without tone or cultural knowledge. Similarly, pronouns (e.g., “He left” after
mentioning multiple people) require context to resolve references.
Solutions: To tackle ambiguity, advanced models like transformers (e.g., BERT) use
context to disambiguate meanings by analyzing surrounding words. Word sense
disambiguation (WSD) techniques, which map words to their meanings using
knowledge bases like WordNet, also help. For context understanding, incorporating
world knowledge (e.g., via knowledge graphs) and training models on diverse
datasets can improve performance. Additionally, multimodal approaches (combining
text with audio or visual data) can help detect sarcasm or emotions. Fine-tuning
models on domain-specific data (e.g., legal or medical texts) also enhances context
awareness.
While these solutions mitigate challenges, achieving human-level understanding
remains an ongoing research area in NLP.
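One classic knowledge-based WSD technique is the simplified Lesk algorithm, which picks the sense whose dictionary gloss overlaps most with the words around the ambiguous word. The sketch below is only an illustration of that idea: the two glosses for "bark" are hand-written for this example (a real system would take them from WordNet), and the stopword list is an assumption.

```python
import re

# Hypothetical mini sense inventory; real systems would read glosses from WordNet.
SENSES = {
    "bark": {
        "dog sound": "the short loud sound made by a dog",
        "tree covering": "the tough outer covering of the trunk and branches of a tree",
    },
}

# Assumed stopword list, kept short for the example.
STOPWORDS = {"a", "an", "the", "of", "and", "at", "was", "by"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the sense listed first
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bark", "The dog gave a loud bark at the stranger"))  # dog sound
print(simplified_lesk("bark", "The bark of the old tree was rough"))        # tree covering
```

Because the method only counts exact word overlap, it fails when the context shares no words with any gloss, which is where context-aware models such as BERT tend to do better.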
Q14. Describe the applications of NLP, with a detailed explanation of how sentiment
analysis and question-answering systems work, including their real-world uses.
Answer: NLP has diverse applications, including machine translation, text
summarization, sentiment analysis, information retrieval, and question-answering
systems.
1. **Sentiment Analysis**: This application determines the emotional tone of text
(positive, negative, or neutral). It works by tokenizing text and then either
combining sentiment scores assigned to individual words (e.g., "happy" as positive,
"sad" as negative) or using machine learning models (e.g., LSTM or BERT) to
classify the overall sentiment. For example, in the sentence “I love this product,”
the model identifies “love” as a positive word and classifies the sentiment as
positive. In real-world use, businesses apply sentiment analysis to customer
reviews on platforms like Amazon, helping them understand user satisfaction and
improve products. Social media monitoring also uses it to gauge public opinion on
events or topics.
2. **Question-Answering Systems**: These systems provide direct answers to user
queries. They work by understanding the question’s intent, searching a knowledge
base or corpus, and extracting relevant information. For example, in “What is the
capital of France?” the system identifies “capital” as the relation being asked
about and “France” as the entity, and retrieves “Paris.” Modern BERT-based systems
use attention mechanisms to focus on the relevant parts of the text. In real-world
applications, question answering powers virtual assistants like Siri or Alexa,
enabling users to get quick answers (e.g., “What’s the weather today?”). It is also
used in customer support chatbots and educational tools for automated tutoring.
Other applications like machine translation (e.g., Google Translate) enable
cross-lingual communication, while text summarization helps condense articles for
quick reading. These applications demonstrate NLP’s transformative impact across
industries.
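The word-scoring approach to sentiment analysis can be illustrated with a small lexicon-based classifier. This is a sketch under stated assumptions: the lexicon, the negator list, and the one-word negation rule are all invented for this example, and trained models such as LSTM or BERT classifiers learn these signals from labelled data instead.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"love": 1, "happy": 1, "great": 1, "amazing": 1,
           "sad": -1, "awful": -1, "hate": -1}
# Assumed negators that flip the polarity of the word that follows them.
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Classify text as positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        # Flip polarity when the previous token is a negator ("not happy").
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this product"))  # positive
print(sentiment("I am not happy"))       # negative
print(sentiment("The box arrived"))      # neutral
```

A lexicon like this cannot handle sarcasm or words missing from the list, which is the main motivation for the model-based approaches mentioned above.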
Q15. Explain the significance of grammar in language processing and its impact on NLP
tasks, with examples from English language processing.
Answer: Grammar is a foundational element in language processing, providing the
structural rules that govern how words combine to form sentences. In NLP, grammar
is crucial for tasks like syntactic analysis, which ensures that sentences are parsed
correctly. For example, in the English sentence “The dog barks,” grammar identifies
“The dog” as the noun phrase (subject) and “barks” as the verb phrase (action),
enabling the machine to understand the sentence’s structure. This parsing is
essential for applications like machine translation, where incorrect grammar can
lead to mistranslations (e.g., when the text to be translated reads “She run fast”
instead of “She runs fast”).
Grammar also impacts tasks like text generation and question answering. In text
generation, grammar ensures that the output is coherent (e.g., maintaining
subject-verb agreement in “They are happy” vs. “He is happy”). In question
answering, grammar helps interpret questions like “Who is running?” by identifying
the subject and action, ensuring the system retrieves the correct answer. Without
proper grammar handling, NLP systems can produce erroneous or unintelligible
results. For instance, chatbots might misinterpret user inputs if they fail to parse
grammatical nuances, such as passive voice in “The ball was kicked by John.” Thus,
grammar is integral to achieving accurate and meaningful language processing in
NLP.
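To make the parsing and agreement examples concrete, here is a toy Python parser for a three-rule grammar (S -> NP VP, NP -> DET NOUN | PRON, VP -> VERB with an optional adverb). The lexicon, tag names, and the `parse` function are assumptions made for this illustration; real parsers use far larger grammars or learned models.

```python
# Hypothetical toy lexicon: word -> (part of speech, grammatical number).
LEXICON = {
    "the": ("DET", None),
    "dog": ("NOUN", "sg"),
    "dogs": ("NOUN", "pl"),
    "she": ("PRON", "sg"),
    "they": ("PRON", "pl"),
    "barks": ("VERB", "sg"),
    "bark": ("VERB", "pl"),
    "runs": ("VERB", "sg"),
    "run": ("VERB", "pl"),
    "fast": ("ADV", None),
}


def parse(sentence):
    """Return (subject, verb, agrees), or None if the sentence does not fit the grammar."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(w, ("UNK", None)) for w in words]
    pos = [tag for tag, _ in tags]
    if pos and pos[-1] == "ADV":  # the optional adverb is not needed for agreement
        pos = pos[:-1]
    if pos == ["DET", "NOUN", "VERB"]:
        subject_len = 2
    elif pos == ["PRON", "VERB"]:
        subject_len = 1
    else:
        return None
    subject_number = tags[subject_len - 1][1]  # number of the head noun or pronoun
    verb_number = tags[subject_len][1]
    subject = " ".join(words[:subject_len])
    return subject, words[subject_len], subject_number == verb_number


print(parse("The dog barks."))  # ('the dog', 'barks', True)
print(parse("She runs fast"))   # ('she', 'runs', True)
print(parse("She run fast"))    # ('she', 'run', False)
print(parse("Barks the dog"))   # None
```

The third call shows how a system could flag the agreement error in “She run fast” before passing the sentence on to translation or generation.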
Q16. Discuss the challenges of NLP in Indian Regional Languages, focusing on linguistic
diversity and resource scarcity, and propose solutions to address these issues.
Answer: Indian Regional Languages (IRLs) present unique challenges for NLP due to
their linguistic diversity and resource scarcity. India's Constitution recognizes 22
scheduled languages, alongside hundreds of other languages and dialects, each with
distinct grammar, syntax, and vocabulary. For example, Hindi (an Indo-Aryan
language written in Devanagari) and Tamil (a Dravidian language written in the
Tamil script) differ vastly in morphology: Hindi uses postpositions (e.g., “ghar
mein” for “in the house”), while Tamil relies on agglutinative suffixes (e.g.,
“veettil” for the same). This diversity complicates the development of universal
NLP models, as algorithms trained on English often fail to generalize to them.
Resource scarcity is another major challenge. Unlike English, which has abundant
datasets (e.g., Wikipedia, Common Crawl), IRLs like Bengali or Kannada lack
large-scale annotated corpora for training models. For instance, POS tagging for
Marathi is difficult due to limited tagged datasets, and machine translation for
Assamese suffers from a lack of parallel corpora. Additionally, code-mixing (e.g.,
“I like khana very much” in Hinglish) and script variations (e.g., Hindi written in
Roman script) further complicate processing.
Solutions:
1. Develop shared resources through collaborative efforts, such as creating
multilingual corpora for IRLs (e.g., the AI4Bharat initiative).
2. Use transfer learning by fine-tuning models pretrained on high-resource
languages like English for low-resource languages.
3. Apply unsupervised learning techniques, such as word embeddings trained on raw
text, to overcome the lack of labeled data.
4. Encourage community-driven efforts to crowdsource data, such as translating
Wikipedia articles into IRLs.
These strategies can help bridge the gap and improve NLP capabilities for Indian
languages.
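A common first step when handling script variation is to detect which scripts a piece of text uses. The Python sketch below does this with the standard `unicodedata` module by reading the script name from each character's Unicode name. The function names and the three-script list are assumptions for this example, and the approach is only a rough heuristic.

```python
import unicodedata

# Scripts this illustration recognizes; everything else is labelled "OTHER".
KNOWN_SCRIPTS = ("DEVANAGARI", "TAMIL", "LATIN")


def script_of(char):
    """Return a coarse script label taken from the character's Unicode name."""
    name = unicodedata.name(char, "")
    for script in KNOWN_SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def scripts_in(text):
    """Count letters and combining marks per script, skipping spaces, digits, punctuation."""
    counts = {}
    for char in text:
        is_mark = unicodedata.category(char).startswith("M")  # vowel signs, virama
        if not (char.isalpha() or is_mark):
            continue
        label = script_of(char)
        counts[label] = counts.get(label, 0) + 1
    return counts


def is_mixed_script(text):
    """True when the text contains characters from more than one script."""
    return len(scripts_in(text)) > 1


print(sorted(scripts_in("घर mein")))                # ['DEVANAGARI', 'LATIN']
print(sorted(scripts_in("வீட்டில்")))                 # ['TAMIL']
print(is_mixed_script("I like khana very much"))    # False
```

The last call illustrates a limitation: romanized Hinglish uses only Latin characters, so script detection alone cannot reveal code-mixing, and word-level language identification is needed instead.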
Q17. Explain the role of knowledge in NLP and how it integrates with language
processing to enhance applications like machine translation and information
retrieval.
Answer: Knowledge in NLP encompasses both linguistic knowledge (grammar,
syntax, semantics) and world knowledge (facts, cultural norms, common sense).
Linguistic knowledge helps machines parse and generate language, while world
knowledge enables understanding of context and meaning. For example, linguistic
knowledge allows a system to parse “The Eiffel Tower is tall” by recognizing the
subject (“Eiffel Tower”) and predicate (“is tall”), while world knowledge confirms
that the Eiffel Tower is a landmark in Paris.
In machine translation, knowledge enhances accuracy. For instance, translating
“I’m feeling blue” from English to French requires recognizing “blue” as an idiom
for sadness, not a color, resulting in “Je me sens triste” instead of a literal
translation. Linguistic knowledge ensures grammatical correctness in the target
language, such as gender agreement between French nouns and their adjectives.
Without integrating both types of knowledge, translations can be nonsensical or
culturally inappropriate.
In information retrieval, knowledge improves relevance. For a query like “Who
invented the telephone?” the system uses world knowledge to identify Alexander
Graham Bell as the answer and linguistic knowledge to understand the question’s
structure. Knowledge graphs (e.g., linking “telephone” to “Bell” and “invention”)
further refine search results by connecting related concepts. For example, Google’s
Knowledge Graph uses such connections to provide quick answers. By integrating
knowledge with language processing, NLP systems achieve more accurate,
context-aware results in applications like translation and retrieval.
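The interplay between the two kinds of knowledge can be sketched with a miniature knowledge graph in Python. Everything here is hypothetical and kept tiny for illustration: the triples stand in for world knowledge, the question patterns stand in for linguistic knowledge, and the relation names are invented for this example.

```python
import re

# World knowledge: a hypothetical knowledge graph of (subject, relation, object) triples.
TRIPLES = [
    ("telephone", "invented_by", "Alexander Graham Bell"),
    ("Eiffel Tower", "located_in", "Paris"),
    ("France", "capital", "Paris"),
]

# Linguistic knowledge: assumed question shapes, each mapped to a relation.
PATTERNS = [
    (re.compile(r"who invented the (.+)\?", re.IGNORECASE), "invented_by"),
    (re.compile(r"what is the capital of (.+)\?", re.IGNORECASE), "capital"),
    (re.compile(r"where is the (.+)\?", re.IGNORECASE), "located_in"),
]


def answer(question):
    """Match the question to a relation, then look the entity up in the graph."""
    for pattern, relation in PATTERNS:
        match = pattern.fullmatch(question.strip())
        if match:
            entity = match.group(1).lower()
            for subject, rel, obj in TRIPLES:
                if rel == relation and subject.lower() == entity:
                    return obj
    return None  # unknown question shape or entity not in the graph


print(answer("Who invented the telephone?"))     # Alexander Graham Bell
print(answer("What is the capital of France?"))  # Paris
print(answer("Who invented the radio?"))         # None
```

The final call returns `None` because the graph holds no fact about the radio, showing that correct parsing alone is not enough without the matching world knowledge.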
Q18. Compare and contrast the applications of NLP in sentiment analysis and text
summarization, discussing their methodologies and practical significance in
real-world scenarios.
Answer: Sentiment analysis and text summarization are two key NLP applications
with distinct goals, methodologies, and real-world uses.
1. **Sentiment Analysis**: This application identifies the emotional tone of text
(positive, negative, neutral). Its methodology involves tokenizing text and then
either assigning sentiment scores to words (e.g., “great” as +1, “awful” as -1) or
using models like BERT to classify the overall sentiment. For example, in “The
movie was amazing,” the system detects “amazing” as positive and labels the
sentiment accordingly. Practically, sentiment analysis is used by companies to
analyze customer feedback on platforms like Twitter or Yelp, helping them gauge
brand perception. For instance, a company might discover negative sentiment about a
product launch and address it promptly.
2. **Text Summarization**: This application condenses a document into a shorter
version while retaining key ideas. It uses extractive methods (selecting important
sentences) or abstractive methods (generating new sentences). For example,
summarizing a news article might involve extracting sentences like “A new law was
passed” or generating a summary like “The government passed a new law.” Models
like T5 or BART are often used for abstractive summarization. In the real world,
text summarization is used in news apps to provide quick article summaries or in
research to condense academic papers for faster reading.
Comparison: Both applications process text, but sentiment analysis focuses on
emotion detection, while text summarization focuses on content reduction.
Sentiment analysis often uses classification models, whereas summarization may
use sequence-to-sequence models.
Contrast: Sentiment analysis outputs a label (e.g., positive), while text
summarization outputs a condensed text. Both are significant: sentiment analysis
for understanding opinions, and summarization for managing information overload.
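The extractive method can be illustrated with a simple frequency-based summarizer in Python. This is a minimal sketch rather than how production systems work: the stopword list, the scoring rule (average word frequency per sentence), and the sample text are all assumptions for this example, and abstractive models such as T5 or BART generate new sentences instead.

```python
import re
from collections import Counter

# Assumed stopword list, kept short for the example.
STOPWORDS = {"a", "an", "the", "was", "is", "of", "to", "and", "in", "on", "it"}


def content_words(text):
    """Return the lowercase non-stopword tokens of the text, in order."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Keep the n sentences whose words are most frequent in the whole text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


article = ("The government passed a new plastic law. "
           "The law bans plastic bags. "
           "Critics were surprised.")
print(summarize(article))  # The law bans plastic bags.
```

Unlike the sentiment classifier, which returns a single label, this function returns text taken verbatim from the input, which reflects the contrast between the two applications described above.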