Natural Language Processing
What is NLP?
NLP stands for Natural Language Processing, a field that combines computer science, the study of
human language, and artificial intelligence. It is the technology that machines use to understand,
analyse, manipulate, and interpret human languages. It helps developers organize knowledge for
performing tasks such as translation, automatic summarization, Named Entity Recognition (NER),
speech recognition, relationship extraction, and topic segmentation.
Advantages of NLP
o NLP lets users ask questions about any subject and get a direct response within seconds.
o NLP offers exact answers to questions; it does not return unnecessary or unwanted
information.
o NLP helps computers communicate with humans in their own languages.
o It is very time efficient.
o Many companies use NLP to improve the efficiency and accuracy of documentation processes
and to identify information in large databases.
Disadvantages of NLP
NLP has the following disadvantages:
o NLP may fail to capture context.
o NLP is unpredictable.
o NLP may require more keystrokes.
o NLP systems adapt poorly to new domains and have limited functionality, which is why an NLP
system is typically built for a single, specific task.
Components of NLP
NLP has the following two components:
1. Natural Language Understanding (NLU)
Natural Language Understanding (NLU) helps the machine to understand and analyse human
language by extracting the metadata from content such as concepts, entities, keywords, emotion,
relations, and semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken and
written language.
NLU involves the following tasks:
o Mapping the given input into a useful representation.
o Analyzing different aspects of the language.
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts computerized data into a
natural language representation. It mainly involves text planning, sentence planning, and text
realization.
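The three NLG stages named above can be illustrated with a minimal template-based sketch. The weather record, its field names, and all function names are hypothetical and chosen only for illustration; practical NLG systems are considerably more elaborate.

```python
# Minimal template-based NLG sketch. The record fields and function
# names are assumptions made for this example only.

def plan_text(record):
    """Text planning: choose which facts to mention."""
    return {key: record[key] for key in ("city", "condition", "temp_c")}

def plan_sentence(facts):
    """Sentence planning: arrange the chosen facts into a sentence frame."""
    return {
        "subject": "the weather in " + facts["city"],
        "condition": facts["condition"],
        "temperature": f"{facts['temp_c']} degrees Celsius",
    }

def realize(frame):
    """Text realization: produce the final grammatical sentence."""
    sentence = (f"{frame['subject']} is {frame['condition']} "
                f"with a temperature of {frame['temperature']}.")
    return sentence[0].upper() + sentence[1:]

def generate(record):
    """Run the three stages in order on one data record."""
    return realize(plan_sentence(plan_text(record)))

print(generate({"city": "Agra", "temp_c": 31, "condition": "sunny"}))
# prints: The weather in Agra is sunny with a temperature of 31 degrees Celsius.
```

The input is non-linguistic data (a dictionary) and the output is a natural language sentence, which is the direction of conversion that distinguishes NLG from NLU.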
Difference between NLU and NLG
NLU | NLG
NLU is the process of reading and interpreting language. | NLG is the process of writing or generating language.
It produces non-linguistic outputs from natural language inputs. | It produces natural language outputs from non-linguistic inputs.
Applications of NLP
NLP has the following applications:
1. Question Answering
Question Answering focuses on building systems that automatically answer the questions asked by
humans in a natural language.
2. Spam Detection
Spam detection is used to detect unwanted e-mails and keep them from reaching a user's inbox.
3. Sentiment Analysis
Sentiment Analysis is also known as opinion mining. It is used on the web to analyse the attitude,
behaviour, and emotional state of the sender. This application is implemented through a combination
of NLP (Natural Language Processing) and statistics: it assigns a value to the text (positive,
negative, or neutral) and identifies the mood of the context (happy, sad, angry, etc.).
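As a rough illustration of assigning a positive, negative, or neutral value to text, the sketch below counts words from two small word lists. The word lists and the function name are hypothetical; real sentiment systems rely on much larger lexicons or trained statistical models.

```python
# Lexicon-based sentiment sketch. POSITIVE and NEGATIVE are tiny,
# hypothetical word lists used only for illustration.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "poor", "sad", "hate", "terrible"}

def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone!"))   # positive
print(sentiment("The battery is terrible."))   # negative
print(sentiment("The phone arrived today."))   # neutral
```

A word-counting approach like this ignores negation and context ("not good" would count as positive), which is one reason statistical methods are combined with NLP in practice.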
4. Machine Translation
Machine translation is used to translate text or speech from one natural language to another natural
language.
5. Spelling correction
Word processing and presentation software, such as Microsoft Word and PowerPoint, uses NLP to
detect and correct spelling errors.
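One simple way to approach spelling correction is to suggest the vocabulary word that is most similar to the misspelled one. The sketch below uses `difflib.get_close_matches` from the Python standard library; the vocabulary, the cutoff value, and the function name are assumptions made for this example, and this is not a description of how any commercial word processor works.

```python
import difflib

# A tiny, hypothetical vocabulary; a real spell checker uses a full dictionary.
VOCABULARY = ["language", "processing", "natural", "machine", "translation"]

def correct(word):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    candidate = word.lower()
    if candidate in VOCABULARY:
        return candidate
    # cutoff=0.7 is an arbitrary similarity threshold chosen for this sketch.
    matches = difflib.get_close_matches(candidate, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else word

print(correct("langauge"))     # language
print(correct("proccessing"))  # processing
print(correct("xyz"))          # xyz (no close match, left unchanged)
```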
6. Speech Recognition
Speech recognition is used for converting spoken words into text. It is used in applications such as
mobile devices, home automation, video recovery, dictation to Microsoft Word, voice biometrics, voice
user interfaces, and so on.
7. Chatbot
Implementing chatbots is one of the important applications of NLP. Chatbots are used by many
companies to provide customer chat services.
8. Information extraction
Information extraction is one of the most important applications of NLP. It is used for extracting
structured information from unstructured or semi-structured machine-readable documents.
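A very small example of turning unstructured text into structured fields is shown below. It uses regular expressions to pull out e-mail addresses and ISO-style dates; the patterns are deliberately simplified and the function name is hypothetical. Practical information extraction systems usually combine such patterns with trained models.

```python
import re

# Simplified patterns for illustration; they do not cover every valid
# e-mail address or date format.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract(text):
    """Return the e-mail addresses and YYYY-MM-DD dates found in text."""
    return {"emails": EMAIL.findall(text), "dates": DATE.findall(text)}

note = "Contact asha@example.com before 2024-03-15 or write to support@example.org."
print(extract(note))
# {'emails': ['asha@example.com', 'support@example.org'], 'dates': ['2024-03-15']}
```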
9. Natural Language Understanding (NLU)
It converts large sets of text into more formal representations, such as first-order logic structures,
that are easier for computer programs to manipulate than natural language itself.
Phases of NLP
NLP has the following five phases:
1. Lexical and Morphological Analysis
The first phase of NLP is lexical analysis. This phase scans the input text as a stream of
characters and converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
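The division of text into paragraphs, sentences, and words can be sketched with a few regular expressions. The splitting rules below (blank lines separate paragraphs; a period, question mark, or exclamation mark followed by whitespace ends a sentence) are simplifying assumptions, and the function name is hypothetical.

```python
import re

def lexical_analysis(text):
    """Split text into paragraphs, then sentences, then words.

    Returns a nested list: paragraphs -> sentences -> words.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Split after sentence-ending punctuation that is followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        # Keep only runs of letters, digits, and apostrophes as words.
        result.append([re.findall(r"[A-Za-z0-9']+", s) for s in sentences])
    return result

text = "NLP is fun. It is useful!\n\nParsing comes next."
print(lexical_analysis(text))
# [[['NLP', 'is', 'fun'], ['It', 'is', 'useful']], [['Parsing', 'comes', 'next']]]
```

Rules this simple break on abbreviations such as "Dr." or "e.g.", which is one reason practical tokenizers use more careful rules or trained models.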
2. Syntactic Analysis (Parsing)
Syntactic Analysis is used to check grammar and word arrangement, and to show the relationships
among the words.
Example: Agra goes to the Poonam
In the real world, the sentence "Agra goes to the Poonam" does not make any sense, so it is rejected
by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
4. Discourse Integration
In Discourse Integration, the meaning of a sentence depends upon the sentences that precede it, and
it also influences the meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect by
applying a set of rules that characterize cooperative dialogues.
For example: "Open the door" is interpreted as a request instead of an order.
What is Lexical Analysis?
Lexical analysis is the process of converting a sequence of characters in a source code file into a
sequence of tokens that can be more easily processed by a compiler or interpreter. It is often the first
phase of the compilation process and is followed by syntax analysis and semantic analysis.
• During lexical analysis, the source code is scanned character by character and grouped into
tokens based on the rules of the programming language. These tokens represent the basic
building blocks of the program’s syntax, such as keywords, identifiers, punctuation, and
constants.
• The lexical analyzer, also known as a lexer or tokenizer, is responsible for performing lexical
analysis.
• The output of the lexical analysis phase is a stream of tokens that can be more easily
processed by the syntax analyzer, which is responsible for checking the program for correct
syntax and structure.
• Lexical analysis is an important step in the compilation process because it ensures that the
source code is properly formatted and that the tokens it generates can be easily understood
and processed by the compiler or interpreter.
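The character-to-token conversion described above can be sketched with a regular-expression lexer. The token categories and the tiny keyword set below belong to a made-up toy language, not to any real programming language, and the names are chosen only for this example.

```python
import re

# Token categories for a hypothetical toy language. Order matters:
# KEYWORD must be tried before IDENT so that "if" is not read as an identifier.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("PUNCT", r"[(){};]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Scan source left to right and return a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at position {match.start()}")
        tokens.append((kind, text))
    return tokens

print(tokenize("if (x > 10) return x;"))
# [('KEYWORD', 'if'), ('PUNCT', '('), ('IDENT', 'x'), ('OP', '>'), ('NUMBER', '10'),
#  ('PUNCT', ')'), ('KEYWORD', 'return'), ('IDENT', 'x'), ('PUNCT', ';')]
```

The resulting token list is what a syntax analyzer would consume next; the lexer itself only reports characters it cannot classify and does not check the structure of the program.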
What is Syntax Analysis?
Syntax analysis, also known as parsing, is the process of analyzing a string of symbols, either in
natural language or in a computer language, according to the rules of formal grammar. It involves
checking whether a given input is correctly structured according to the syntax of the language.
• In natural language processing, syntax analysis is used to analyze and understand the
structure of sentences in a language. It involves identifying the parts of speech (nouns, verbs,
adjectives, etc.), determining the relationships between the words (such as subject-verb
agreement), and constructing a parse tree that represents the hierarchical structure of the
sentence.
• In computer science, syntax analysis is an important phase in the process of compiling a
program. It involves checking the source code of a program to ensure that it follows the
correct syntax of the programming language in which it is written.
• Syntax errors, such as missing brackets or incorrect use of keywords, are identified and
reported during this phase and must be corrected before the program can be successfully
compiled.
• Syntax analysis is a crucial step in understanding and interpreting the meaning of the text,
whether it is written in a natural language or a computer language.
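To make the idea of checking structure and building a parse tree concrete, here is a minimal recursive-descent parser for a toy arithmetic grammar. The grammar (numbers, `+`, `*`, and brackets) and the tuple-based tree format are assumptions made for this sketch; it illustrates the technique rather than any particular compiler or NLP parser.

```python
import re

def parse(expr):
    """Parse an arithmetic expression into a nested-tuple parse tree.

    Grammar (toy example):
        expression := term ('+' term)*
        term       := factor ('*' factor)*
        factor     := NUMBER | '(' expression ')'
    Characters outside digits, brackets, '+', '*' are ignored by this sketch.
    """
    tokens = re.findall(r"\d+|[()+*]", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expression():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expression()
            if peek() != ")":
                raise SyntaxError("missing closing bracket")
            advance()
            return node
        if token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("2 + 3 * (4 + 5)"))  # ('+', 2, ('*', 3, ('+', 4, 5)))
```

Calling `parse("(2 + 3")` raises `SyntaxError` with the message "missing closing bracket", which mirrors how syntax errors are reported during parsing.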
Applications of Lexical Analysis:
• Compilers: Lexical analysis is an important part of the compilation process, as it converts the
source code of a program into a stream of tokens that can be more easily processed by the
compiler.
• Interpreters: Lexical analysis is also used in interpreters, which execute a program directly
from its source code without the need for compilation.
• Text editors: Many text editors use lexical analysis to highlight keywords and other elements
of the source code in different colors, making it easier for programmers to read and
understand the code.
• Code analysis tools: Lexical analysis is used by tools that analyze the source code of a
program for errors, security vulnerabilities, and other issues.
• Natural language processing: Lexical analysis is also used in natural language processing
(NLP) to break down natural language text into individual words and phrases that can be
more easily processed by NLP algorithms.
• Information retrieval: Lexical analysis is used in information retrieval systems, such as search
engines, to index and search for documents based on the words they contain.
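The information retrieval use case in the list above can be sketched as an inverted index: each word maps to the set of documents that contain it. The document collection, the tokenization rule, and the function names are hypothetical; real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {
    1: "Lexical analysis produces tokens",
    2: "Syntax analysis builds a parse tree",
    3: "Tokens feed the parser",
}
index = build_index(docs)
print(search(index, "analysis"))        # {1, 2}
print(search(index, "lexical tokens"))  # {1}
```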
Applications of Syntax Analysis:
• Natural language processing: Syntax analysis is used in natural language processing to
analyze and understand the structure of sentences in a language. It helps identify the parts
of speech, determine the relationships between the words, and construct a parse tree that
represents the hierarchical structure of the sentence.
• Information extraction: Syntax analysis can be used to extract structured information from
unstructured text, such as identifying names, dates, and locations in a news article or
extracting product details from an online shopping website.
• Machine translation: Syntax analysis is an important step in the process of machine
translation, as it helps to identify the structure and meaning of sentences in the source
language and translate them accurately into the target language.
• Computer science: In computer science, syntax analysis is an important phase in the process
of compiling a program. It checks the source code of a program to ensure that it follows the
correct syntax of the programming language in which it is written.
• Text analytics: Syntax analysis can be used in text analytics to extract insights and
information from large volumes of text data. For example, it can be used to identify common
themes or trends in customer reviews or to classify text documents based on their content.
What is Syntactic Processing?
Syntactic processing is the process of analyzing the grammatical structure of a sentence to
understand its meaning. This involves identifying the different parts of speech in a sentence, such
as nouns, verbs, adjectives, and adverbs, and how they relate to each other in order to give proper
meaning to the sentence.
Let’s start with an example to understand Syntactic Processing:
• New York is the capital of the United States of America.
• Is the United States of America the of New York capital.
If we observe closely, both sentences contain the same set of words, but only the first one is
grammatically correct and has a proper meaning. If we approach both sentences with lexical
processing techniques alone, we can't tell the difference between the two sentences.
This is where syntactic processing techniques come in: they help us understand the
relationships between the individual words in a sentence.
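The point about the two example sentences can be checked directly. The sketch below builds a bag-of-words count for each sentence, a typical word-level representation that discards word order; the helper name is hypothetical.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count lowercase words, ignoring word order and the final period."""
    return Counter(word.strip(".").lower() for word in sentence.split())

s1 = "New York is the capital of the United States of America."
s2 = "Is the United States of America the of New York capital."

print(bag_of_words(s1) == bag_of_words(s2))  # True: same words, same counts
print(s1.lower() == s2.lower())              # False: the word order differs
```

Because the two bags are identical, any technique that looks only at word counts treats the sentences as the same; telling them apart requires information about word order and structure.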
Difference between Lexical Processing and Syntactic Processing
Lexical processing aims at data cleaning and feature extraction, using techniques such
as lemmatization, stopword removal, and correction of misspelled words. In syntactic
processing, by contrast, the aim is to understand the role played by each word in the
sentence and the relationships among the words, and to parse the grammatical structure of the
sentence in order to understand its proper meaning.
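A minimal sketch of the lexical processing side is shown below: it lowercases the text, removes stopwords, and strips a few common suffixes. The stopword list is a small hypothetical sample, and the suffix stripping is a crude stand-in for real lemmatization, which needs a dictionary or a trained model.

```python
# Small, hypothetical stopword list used only for this example.
STOPWORDS = {"the", "is", "a", "an", "of", "on", "and", "to", "in", "are"}

def crude_stem(word):
    """Strip a common suffix if at least three letters remain (not true lemmatization)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def lexical_features(text):
    """Lowercase, drop punctuation and stopwords, then reduce words to crude stems."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [crude_stem(w) for w in words if w and w not in STOPWORDS]

print(lexical_features("The dogs are playing in the park."))
# ['dog', 'play', 'park']
```

The output is a cleaned list of features with no record of which word was the subject or how the words relate, which is exactly the information syntactic processing is meant to recover.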
How Does Syntactic Processing Work?
To understand how syntactic processing works, let's again start with an example.
Consider the sentence “The cat sat on the mat.” Syntactic processing would involve
identifying important components in the sentence, such as “cat” as a noun, “sat” as a verb, “on” as
a preposition, and “mat” as a noun. It would also involve understanding that “cat” is the subject of
the sentence and “mat” is the object of the preposition “on”.
Syntactic processing involves a series of steps, including tokenization, part-of-speech tagging, parsing,
and semantic analysis.
Tokenization is the process of breaking up a sentence into individual words or tokens. Part-of-speech
(PoS) tagging involves identifying the part of speech of each token. Parsing is the process of
analyzing the grammatical structure of a sentence, including identifying the subject, verb, and object.
Semantic analysis involves understanding the meaning of the sentence in context.
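The first three steps can be sketched for the sentence "The cat sat on the mat." as follows. The five-word lexicon, the tag names, and the parsing rule are all hypothetical simplifications for this one sentence; real taggers and parsers are rule-based or statistical systems covering a full vocabulary and grammar.

```python
import re

# Hypothetical hand-written lexicon covering only the example sentence.
LEXICON = {"the": "DET", "cat": "NOUN", "mat": "NOUN", "sat": "VERB", "on": "PREP"}

def tokenize(sentence):
    """Tokenization: split the sentence into lowercase word tokens."""
    return re.findall(r"[A-Za-z]+", sentence.lower())

def pos_tag(tokens):
    """PoS tagging: look each token up in the lexicon ('UNK' if missing)."""
    return [(token, LEXICON.get(token, "UNK")) for token in tokens]

def parse(tagged):
    """Naive parsing rule: the first noun before the verb is the subject, and
    the first noun after a preposition is the object of that preposition."""
    result = {"subject": None, "verb": None, "preposition": None, "object": None}
    for word, tag in tagged:
        if tag == "NOUN" and result["verb"] is None and result["subject"] is None:
            result["subject"] = word
        elif tag == "VERB" and result["verb"] is None:
            result["verb"] = word
        elif tag == "PREP" and result["preposition"] is None:
            result["preposition"] = word
        elif tag == "NOUN" and result["preposition"] is not None and result["object"] is None:
            result["object"] = word
    return result

tagged = pos_tag(tokenize("The cat sat on the mat."))
print(tagged)
# [('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('on', 'PREP'), ('the', 'DET'), ('mat', 'NOUN')]
print(parse(tagged))
# {'subject': 'cat', 'verb': 'sat', 'preposition': 'on', 'object': 'mat'}
```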
There are several different techniques used in syntactic processing, including rule-based methods,
statistical methods, and machine learning algorithms. Each technique has its own strengths and
weaknesses, and the choice of technique depends on the specific task and the available data.
In the subsequent lessons, we will learn the concept of parsing and different parsing techniques, PoS
Tagging, and semantic analysis.
Why Is Syntactic Processing Important in NLP?
Syntactic processing is a crucial component of many NLP tasks, including machine translation,
sentiment analysis, and question-answering. Without accurate syntactic processing, it is difficult for
computers to understand the underlying meaning of human language.
Syntactic processing also plays an important role in text generation, such as in chatbots or
automated content creation. By understanding the grammatical structure of a sentence, computers
can generate more natural and fluent textual content.