
NLP Study Notes

The document provides a comprehensive overview of Natural Language Processing (NLP), detailing its definition, evolution, applications, and key concepts such as tokenization, stemming, and lemmatization. It outlines the NLP pipeline, emphasizing the importance of text preprocessing and various techniques for text representation. Additionally, it discusses the role of NLP in AI and its practical applications in areas like spam filtering, chatbots, and sentiment analysis.

Uploaded by Rahul Singh
Copyright
© All Rights Reserved

Natural Language Processing

Complete Study Notes with Examples


NLP_class_notes | All 28 Topics Covered
Simple Language • Clear Examples • Exam Ready
1. Introduction to NLP
Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that helps
computers understand, interpret, and generate human language — like English, Hindi, or any
other language.
Think of it this way: Computers only understand numbers (0s and 1s). Human language is full of
words, emotions, context, and meaning. NLP is the bridge between human language and
computers.

🗣 Real-life Example: When you talk to Siri or Google Assistant and say "Set an alarm for 7am," the
assistant understands your words and takes action. That's NLP in action!

Role of NLP in Artificial Intelligence


NLP makes AI systems smarter by giving them the ability to process text and speech. Without
NLP, AI would be like a person who can calculate but cannot read or understand a sentence.
• Enables machines to read and understand text
• Allows voice assistants to respond to commands
• Powers search engines to find relevant results
• Helps computers write human-like text

Computational Linguistics
Computational Linguistics is the scientific study of language using computers. It combines
computer science with linguistics (the study of language). It helps us understand grammar,
meaning, and structure of language through computer programs.

Evolution of NLP
• Era: Rule-based NLP (1950s)
Programmers manually wrote rules like: 'If the sentence has the word NOT, it is negative.'
Very rigid — small changes broke everything.
• Era: Statistical NLP (1990s)
Instead of hand-written rules, systems used math and statistics, learning from large amounts of text
data. More flexible than rule-based systems.
• Era: Machine Learning NLP (2000s)
Computers started learning patterns automatically from examples. Less manual effort
required.
• Era: Neural Networks in NLP (2010s)
Inspired by the human brain. Networks of connected nodes processed language more
accurately. Deep learning took NLP to a new level.
• Era: Transformer Architecture (2017+)
A design built around attention, introduced in 2017, that quickly became the standard. Models like BERT and GPT
(both 2018) are based on Transformers. These models understand context extremely well.

💡 Key Takeaway: The later the era (the lower in this list), the smarter and more powerful the systems.
Transformers are the current state of the art.

2. Applications of NLP
NLP is used everywhere in real life. Here are the key applications:
• Application: Spam Filtering
Your email system reads emails and decides if they are spam. It looks for patterns like 'Free
money!' or 'Urgent offer!'
• Application: Algorithmic Trading (News Analysis)
Computers read financial news and make stock buy/sell decisions automatically based on
sentiment.
• Application: Question Answering Systems
Systems like ChatGPT and Google Search answer your questions by understanding them.
Example: 'What is the capital of France?' → 'Paris'
• Application: Text Summarization
Automatically shortens long documents into brief summaries. News apps use this to give you
3-line summaries of articles.
• Application: Machine Translation
Google Translate converts text from one language to another. It must understand the
meaning, not just swap words.
• Application: Chatbots
Customer service bots on websites. They understand your problem and give automated
replies.
• Application: Speech Recognition
Converting spoken words to text. Used in dictation apps, voice search, and smart speakers.
• Application: Sentiment Analysis
Deciding if a review is positive, negative, or neutral. Used by companies to track customer
opinions on Twitter/Amazon.
• Application: Named Entity Recognition (NER)
Identifying names, places, dates, organizations in text. Example: In 'Elon Musk founded Tesla
in 2003', NER finds: Person=Elon Musk, Org=Tesla, Date=2003
3. NLP Pipeline / Workflow
The NLP Pipeline is the complete step-by-step process for building an NLP system. Think of it
like a factory assembly line — raw text goes in, useful insights come out.

🔧 Pipeline Steps: Raw Text → Preprocessing → Text Representation → Feature Extraction → Model Training → Deployment → Evaluation → Improvement

• Step 1: Text Input & Data Collection
Gather raw text data. Example: Scraping tweets, collecting product reviews, importing news
articles.
• Step 2: Text Preprocessing
Clean and prepare the text. Remove noise, normalize words, tokenize. (Covered in detail in
Topic 4)
• Step 3: Text Representation
Convert text to numbers (vectors) so the computer can process it. Methods: Bag of Words,
TF-IDF, Word Embeddings.
• Step 4: Feature Extraction / Feature Engineering
Select the most important features (patterns) from the data. Example: Word frequency,
bigrams, POS tags.
• Step 5: Model Selection and Training
Choose a model (e.g., Naive Bayes, an LSTM, or BERT) and train it on your labeled data.
• Step 6: Model Deployment
Put the trained model into production so real users can use it.
• Step 7: Evaluation and Optimization
Measure model performance using metrics like accuracy, precision, recall, and perplexity.
• Step 8: Iteration and Improvement
Fix errors, retrain, and improve the model continuously.

4. Text Preprocessing
Text Preprocessing is cleaning and transforming raw text into a form that computers can work
with efficiently. Raw text is messy — it has typos, punctuation, symbols, irrelevant words, and
inconsistencies.

📝 Before Preprocessing: "Hey!! Check THIS out... The PRODUCT is AMAZING!!! 😍 #Love It @brand"

✅ After Preprocessing: "check product amazing love"


Why is Preprocessing Important?
• Reduces noise that confuses the model
• Makes words consistent (e.g., 'Running' and 'running' treated the same)
• Reduces vocabulary size → faster and more efficient models
• Often improves model accuracy (how much depends on the task and the model)

Preprocessing Techniques
Lowercasing
Convert all text to lowercase so 'Apple', 'APPLE', 'apple' are treated as the same word.

Example: 'Hello World' → 'hello world'

Punctuation Removal
Remove commas, periods, exclamation marks, etc. They usually don't carry meaning.

Example: 'Hello, World!' → 'Hello World'

Special Character Removal


Remove @mentions, #hashtags, URLs, emojis unless they are relevant to your task.

Example: 'Visit [Link] @now!' → 'Visit now'

Tokenization (basic intro here, full detail in Topic 5)
Split text into smaller pieces. A paragraph becomes sentences. A sentence becomes words.

Example: 'I love NLP.' → ['I', 'love', 'NLP', '.'] (the period becomes its own token unless punctuation was removed first)

Stop Word Removal


Stop words are very common words that carry little meaning: 'the', 'is', 'a', 'and', 'of'. Removing
them reduces noise.

Example: 'The cat is on the mat' → ['cat', 'mat'] (after removing stop words)

Dimensionality Reduction
Reduce the number of features (words) to make the model simpler. Done via stemming,
lemmatization, or PCA techniques.
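The preprocessing steps above can be chained into one small function. This is a minimal sketch in plain Python using only the standard `re` module; the stop-word list is a hypothetical, deliberately short one, and keeping the hashtag word (dropping only the `#`) follows the Before/After example rather than a fixed rule.

```python
import re

# Small illustrative stop-word list (an assumption; real lists such as NLTK's are much longer).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "this", "it", "out", "on", "to"}

def preprocess(text):
    """Apply the techniques above, in order, and return a list of clean tokens."""
    text = text.lower()                            # lowercasing
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove @mentions entirely
    text = text.replace("#", " ")                  # keep the hashtag word, drop the '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)       # remove punctuation, emojis, other symbols
    tokens = text.split()                          # simple whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal

raw = "Hey!! Check THIS out... The PRODUCT is AMAZING!!! 😍 #Love It @brand"
clean = preprocess(raw)
print(clean)  # ['hey', 'check', 'product', 'amazing', 'love']
```

With this short list 'hey' survives, while the notes' example also discards it; real pipelines tune the stop list to the task. The `[^a-z0-9\s]` filter keeps only ASCII letters and digits, so it would also delete text in other scripts such as Hindi.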
5. Tokenization
Tokenization is the process of breaking text into smaller units called tokens. A token can be a
word, sentence, subword, or even a character.

🔑 Simple Definition: Tokenization = Splitting text into pieces (tokens)

Types of Tokenization
Sentence Tokenization
Breaks a paragraph into individual sentences.

Example: Input: 'I love NLP. It is fascinating!' → Output: ['I love NLP.', 'It is fascinating!']

Word Tokenization
Breaks a sentence into individual words.

Example: Input: 'I love NLP' → Output: ['I', 'love', 'NLP']

Tokenization in Python (using NLTK)
NLTK (Natural Language Toolkit) is a popular Python library for NLP.

Python Code:
    import nltk
    from nltk.tokenize import word_tokenize, sent_tokenize
    nltk.download('punkt')  # one-time model download; recent NLTK releases need 'punkt_tab' instead
    text = 'Hello World. NLP is fun.'
    words = word_tokenize(text)  # ['Hello', 'World', '.', 'NLP', 'is', 'fun', '.']
    sents = sent_tokenize(text)  # ['Hello World.', 'NLP is fun.']

💡 Key Takeaway: Tokenization is almost always the FIRST step in any NLP pipeline after text
collection.

6. Stemming
Stemming is the process of reducing a word to its root (stem) by cutting off word endings (suffixes) and
sometimes prefixes. The resulting stem may not always be a real word.

📌 Examples: running → run | studies → studi | happiness → happi | played → play

Types of Stemmers
Porter Stemmer
The most popular stemmer. Applies a series of suffix-stripping rules. Fast and widely used.
Example: caresses → caress | flies → fli | agreed → agre

Lancaster Stemmer
More aggressive than Porter. Produces shorter stems but can over-stem (trim too much).

Example: eating → eat | generously → gen

Snowball Stemmer
An improved version of Porter (its English stemmer is also known as Porter2). Works for multiple languages. Generally considered slightly more accurate than Porter.

Pros and Cons of Stemming


• Pro: Fast — computationally cheap
• Pro: Simple to implement
• Con: Can produce non-real words (e.g., 'studies' → 'studi')
• Con: No understanding of meaning or context
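To see why stems like 'studi' appear, here is a deliberately crude suffix-stripping stemmer. It is a hypothetical toy, loosely modelled on the first steps of Porter's rules, not the Porter algorithm itself; for real work NLTK provides `PorterStemmer`, `LancasterStemmer` and `SnowballStemmer` in `nltk.stem`.

```python
# Ordered (suffix, replacement) rules; the first rule that applies wins.
# A hypothetical toy rule set, loosely modelled on Porter's first steps.
RULES = [
    ("sses", "ss"),  # caresses  -> caress
    ("ies", "i"),    # studies   -> studi   (not a real word)
    ("ness", ""),    # happiness -> happi
    ("ing", ""),     # running   -> runn (then undoubled below)
    ("ed", ""),      # played    -> play
    ("s", ""),       # cats      -> cat
]
MIN_STEM = 3  # never cut a word down to fewer than 3 letters ('sing' stays 'sing')

def crude_stem(word):
    word = word.lower()
    for suffix, repl in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) + len(repl) >= MIN_STEM:
            stem = word[: -len(suffix)] + repl
            # After -ing/-ed, undouble a final consonant (runn -> run), but keep
            # a double l, s or z as Porter does (falling -> fall).
            if suffix in ("ing", "ed") and stem[-1] == stem[-2] and stem[-1] not in "aeiouylsz":
                stem = stem[:-1]
            return stem
    return word

for w in ["running", "studies", "happiness", "played", "caresses", "sing"]:
    print(w, "->", crude_stem(w))
```

The rules know nothing about meaning: they strip '-ies' from 'studies' whether or not the result is a word, which is exactly the first "Con" listed above.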

7. Lemmatization
Lemmatization is also about reducing words to their root form, but it's smarter than stemming —
it uses a dictionary and grammar rules to ensure the result is always a valid real word.

📌 Examples: running → run | better → good | studies → study | geese → goose

How Lemmatization Works


It uses vocabulary databases (like WordNet) and considers the Part of Speech (verb, noun,
adjective) to find the correct base form.

POS-aware Example: 'better' as adjective → 'good' (stemming would give 'better' unchanged)
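The dictionary-plus-POS idea can be sketched as a lookup keyed by (word, part of speech). The tiny `LEXICON` below is hypothetical and stands in for a real database such as WordNet; only the shape of the lookup is the point.

```python
# Hypothetical mini-lexicon standing in for WordNet: (word, POS) -> lemma.
LEXICON = {
    ("running", "VERB"): "run",
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("geese", "NOUN"): "goose",
    ("better", "ADJ"): "good",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}

def lemmatize(word, pos="NOUN"):
    """Return the dictionary form for (word, pos); unknown pairs come back unchanged (lowercased)."""
    return LEXICON.get((word.lower(), pos), word.lower())

for word, pos in [("geese", "NOUN"), ("better", "ADJ"), ("meeting", "NOUN"), ("meeting", "VERB")]:
    print(word, pos, "->", lemmatize(word, pos))
```

The default `pos="NOUN"` mirrors NLTK's `WordNetLemmatizer`, which also treats words as nouns unless told otherwise, so 'running' only becomes 'run' when the verb tag is passed.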

Tools for Lemmatization


• WordNet Lemmatizer (NLTK) — uses Princeton's WordNet database
• spaCy Lemmatizer: faster, more modern, context-aware

Stemming vs Lemmatization Comparison
Feature              Stemming     Lemmatization
Speed                Faster       Slower
Accuracy             Lower        Higher
Valid words?         Not always   Always
Context aware?       No           Yes
Dictionary needed?   No           Yes

8. Regular Expressions (Regex)


A Regular Expression (Regex) is a sequence of characters that defines a search pattern. It's
like a very powerful 'Find & Replace' tool that can search for complex patterns in text.

🔍 Analogy: Think of regex as a super-powered search bar. Instead of searching for just 'phone', you
can search for any 10-digit number pattern.

Common Regex Syntax


Symbol   Meaning                                   Example
[abc]    Character set: matches a, b, or c         [aeiou] matches vowels
[a-z]    Range: matches any lowercase letter       [0-9] matches any digit
^        Start of string / negation inside [ ]     ^Hello matches 'Hello world'
|        Alternation (OR)                          cat|dog matches 'cat' or 'dog'
?        Optional (0 or 1 occurrence)              colou?r matches 'color' and 'colour'
.        Any single character                      c.t matches 'cat', 'cut', 'cot'
*        Zero or more                              go* matches 'g', 'go', 'goo'
+        One or more                               go+ matches 'go', 'goo' (not 'g')
{n}      Exactly n times                           \d{3} matches exactly 3 digits (\d = any digit)
$        End of string                             end$ matches 'the end'

Regex Functions in Python


• re.findall(pattern, text): Find all matches and return them as a list
• re.match(pattern, text): Check if the pattern matches at the START of the string
• re.search(pattern, text): Search for the pattern ANYWHERE in the text
• re.sub(pattern, replacement, text): Replace matches with new text
• re.compile(pattern): Pre-compile a pattern for reuse (useful when the same pattern is applied many times)
Python Example:
    import re
    text = 'Call me at 9876543210 or 8765432109'
    numbers = re.findall(r'\d{10}', text)  # Output: ['9876543210', '8765432109']
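The other functions in the list behave as follows. This sketch uses only Python's standard `re` module (`re.findall`, `re.match`, `re.search`, `re.sub`, `re.compile`); the masking pattern is an illustrative choice, and its capture group `( )` and backreference `\1` go slightly beyond the syntax table.

```python
import re

text = "Call me at 9876543210 or 8765432109"

# findall: every non-overlapping match, as a list of strings.
numbers = re.findall(r"\d{10}", text)

# match only tries the START of the string; search scans the whole string.
at_start = re.match(r"\d{10}", text)          # None, because text starts with 'Call'
first = re.search(r"\d{10}", text).group()    # '9876543210'

# sub with a capture group: keep the last 4 digits, mask the first 6.
masked = re.sub(r"\d{6}(\d{4})", r"XXXXXX\1", text)

# compile once, reuse many times (the '?' row of the syntax table).
colour = re.compile(r"colou?r")
colour_hits = [colour.fullmatch(w) is not None for w in ["color", "colour", "colr"]]

print(numbers, at_start, first, masked, colour_hits, sep="\n")
```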

9. Part of Speech (POS) Tagging


POS Tagging is the process of labeling each word in a sentence with its grammatical role — is it
a noun, verb, adjective, etc.?

Example: Input: 'The quick brown fox jumps over the lazy dog' Output: The(DT) quick(JJ) brown(JJ)
fox(NN) jumps(VBZ) over(IN) the(DT) lazy(JJ) dog(NN)

Parts of Speech
• Noun (NN) — Person, place, thing: 'dog', 'city', 'book'
• Verb (VB) — Action or state: 'run', 'is', 'jumped'
• Adjective (JJ) — Describes a noun: 'quick', 'beautiful', 'old'
• Pronoun (PRP) — Replaces a noun: 'he', 'she', 'it', 'they'
• Adverb (RB) — Describes a verb/adjective: 'quickly', 'very', 'never'
• Preposition (IN) — Shows relationship: 'in', 'on', 'at', 'by'
• Conjunction (CC) — Connects clauses: 'and', 'but', 'or'
• Determiner (DT) — 'the', 'a', 'an', 'this'

Word Ambiguity Problem


One word can have multiple POS depending on context. POS taggers use surrounding context
to decide the correct tag.

Example: 'book' can be: Noun → 'I read a book' OR Verb → 'Book a ticket'
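A quick way to see the ambiguity problem is a most-frequent-tag baseline: tag every word with the tag it had most often in training, ignoring context. The three-sentence training set below is hypothetical and uses Penn Treebank tags; real taggers such as NLTK's `pos_tag` are trained on large annotated corpora and use the surrounding words.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged training corpus (hypothetical, Penn Treebank tags).
TRAIN = [
    [("I", "PRP"), ("read", "VBD"), ("a", "DT"), ("book", "NN")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("old", "JJ")],
    [("Book", "VB"), ("a", "DT"), ("ticket", "NN")],
]

def train_most_frequent_tag(sentences):
    """Map each lowercased word to the tag it carried most often."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(tokens, model, default="NN"):
    """Tag each token independently; unknown words get the default tag."""
    return [(t, model.get(t.lower(), default)) for t in tokens]

model = train_most_frequent_tag(TRAIN)
print(tag(["Book", "a", "ticket"], model))  # [('Book', 'NN'), ('a', 'DT'), ('ticket', 'NN')]
```

Because 'book' was a noun twice and a verb once in training, the baseline tags 'Book' in 'Book a ticket' as NN, which is wrong; looking at the surrounding words is what fixes this.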

Application: Text-to-Speech
POS tags help determine pronunciation. 'record' as noun = REH-cord, as verb = re-CORD.

10. Text Representation


Computers only understand numbers. Text Representation converts text into numerical vectors
so machine learning models can process it.

🎯 Core Idea: Every word or document is represented as a list of numbers (a vector). Ideally, similar words
should get similar vectors.
Traditional Methods
• Bag of Words (BoW) — Count word frequencies
• TF-IDF — Weight words by importance
• N-Grams — Capture word sequences

Modern Embedding Methods


• Word2Vec — Learns word meanings from context
• GloVe — Global word co-occurrence vectors
• FastText — Works on word parts (good for rare words)
• ELMo: Context-dependent embeddings
• BERT — Bidirectional transformer embeddings
• GPT — Generative transformer model

11. Bag of Words (BoW)


Bag of Words is the simplest way to represent text numerically. It counts how many times each
word appears in a document and ignores word order.

Step-by-Step Example:
Doc 1: 'cat sat on mat'
Doc 2: 'cat sat on hat'
Vocabulary: [cat, sat, on, mat, hat]
Doc 1 vector: [1, 1, 1, 1, 0]
Doc 2 vector: [1, 1, 1, 0, 1]

Limitations of Bag of Words


• Ignores word order: 'dog bites man' and 'man bites dog' get the same vector (wrong!)
• No understanding of meaning or context
• Creates very large, sparse vectors for large vocabularies
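The cat/mat example can be reproduced with a few lines of standard-library Python. This is a minimal sketch that assumes whitespace tokenization and orders the vocabulary by first appearance (libraries often sort it alphabetically instead).

```python
from collections import Counter

def build_vocab(docs):
    """Vocabulary in order of first appearance across all documents."""
    vocab = []
    for doc in docs:
        for word in doc.split():
            if word not in vocab:
                vocab.append(word)
    return vocab

def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in doc (order is ignored)."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

docs = ["cat sat on mat", "cat sat on hat"]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
print(vocab)    # ['cat', 'sat', 'on', 'mat', 'hat']
print(vectors)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]

# The word-order limitation: both sentences get exactly the same vector.
v2 = build_vocab(["dog bites man"])
same_vector = bow_vector("dog bites man", v2) == bow_vector("man bites dog", v2)
print(same_vector)  # True
```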

12. TF-IDF
TF-IDF stands for Term Frequency–Inverse Document Frequency. It's smarter than BoW
because it gives higher weight to important, rare words and lower weight to common words.

Term Frequency (TF)


How often a word appears in a single document.

Formula: TF(word) = (Number of times word appears in doc) / (Total words in doc)
Inverse Document Frequency (IDF)
How rare a word is across ALL documents. Rare words get a higher IDF score.

Formula: IDF(word) = log(Total documents / Documents containing the word)

TF-IDF Combined
Formula: TF-IDF = TF × IDF

Intuition: The word 'cricket' is common in sports articles but rare overall → high TF-IDF in sports
docs. The word 'the' appears everywhere → low TF-IDF.

💡 Key Takeaway: TF-IDF is great for search engines and document classification tasks.
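The three formulas translate directly into code. This sketch assumes the natural logarithm (the notes do not fix a base; changing it only rescales every score) and a hypothetical three-document corpus. Library implementations can differ: scikit-learn's `TfidfVectorizer`, for example, smooths the IDF by default, so its numbers will not match these exactly.

```python
import math

def tf(word, doc):
    """Term frequency: occurrences of word / total words in this document."""
    return doc.count(word) / len(doc)

def idf(word, corpus):
    """log(total documents / documents containing word), using the natural log."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / df)  # raises ZeroDivisionError if no document contains word

def tf_idf(word, doc, corpus):
    return tf(word, doc) * idf(word, corpus)

# Hypothetical tokenized corpus of three short documents.
corpus = [
    "the cricket match was exciting".split(),
    "the stock market fell today".split(),
    "the weather is sunny today".split(),
]
print(tf_idf("the", corpus[0], corpus))      # 0.0, since 'the' appears in every document
print(tf_idf("cricket", corpus[0], corpus))  # (1/5) * ln(3), about 0.22
```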

13. N-Gram Models


N-Grams are sequences of N consecutive words. They capture word order and context that Bag
of Words misses.

Example Sentence: "I love natural language processing"

• Unigram (N=1): ['I', 'love', 'natural', 'language', 'processing']
• Bigram (N=2): ['I love', 'love natural', 'natural language', 'language processing']
• Trigram (N=3): ['I love natural', 'love natural language', 'natural language processing']

Applications of N-Grams
• Spelling correction — 'teh' surrounded by other words can be corrected to 'the'
• Speech recognition — predicting the next word improves accuracy
• Machine translation — maintaining phrase structure
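Generating N-grams is just sliding a window of size N over the token list; this minimal sketch reproduces the unigram, bigram and trigram lists for the example sentence.

```python
def ngrams(tokens, n):
    """Every run of n consecutive tokens, joined with spaces (empty if n > len(tokens))."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
for n in (1, 2, 3):
    print(n, ngrams(tokens, n))
```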

14. Language Modeling


A Language Model assigns a probability to a sequence of words. In other words, it predicts:
'How likely is this sentence to appear in real language?'

🎯 Goal: Assign probability P(sentence) to any given sentence.

Example: P('I love NLP') should be HIGH (natural sentence).
P('NLP love I') should be LOW (unnatural order).
Joint Probability
The probability of an entire sentence is the joint probability of all words occurring together.

Formula: P(w1, w2, w3, ..., wn) = P(w1) × P(w2|w1) × P(w3|w1,w2) × ... × P(wn|w1...wn-1)

15. Chain Rule in Language Models


The Chain Rule breaks down the joint probability of a sentence into a product of conditional
probabilities. This makes it easier to compute.

Chain Rule Formula: P(A, B, C) = P(A) × P(B|A) × P(C|A,B)

Sentence Example: P('I love NLP') = P(I) × P(love|I) × P(NLP|I, love) Meaning: Probability of 'I' first,
then 'love' given 'I', then 'NLP' given 'I love'.

The problem: For long sentences, computing P(wn|w1...wn-1) requires knowing ALL previous
words — this is computationally expensive. This leads to the Markov Assumption.

16. Markov Assumption


The Markov Assumption is a simplification: instead of looking at ALL previous words to predict
the next word, we only look at the LAST word (or last few words).

Full Context (hard): P(wi | w1, w2, ..., wi-1) ← depends on ALL previous words

Markov Assumption (simple): P(wi | wi-1) ← depends ONLY on the previous word

Real-life Analogy: Predicting the next word in 'I went to the ___'. A bigram (first-order Markov) model looks only
at 'the' and might predict 'store', 'park', or 'gym'. It ignores the 'I went to' context, but this is often a good-enough approximation.

💡 Key Takeaway: The Markov Assumption makes language models computationally tractable. It's
the foundation of N-gram models.

17. Bigram Language Model


A Bigram Language Model applies the Markov Assumption with N=2. It predicts each word
based only on the immediately preceding word.

Bigram Probability: P(wi | wi-1) = Count(wi-1, wi) / Count(wi-1)

Worked Example:
Corpus: 'I love NLP. I love Python. I study NLP.'
P(NLP | love) = Count('love NLP') / Count('love') = 1/2 = 0.5
P(Python | love) = Count('love Python') / Count('love') = 1/2 = 0.5

Sentence Probability Using Bigram


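Example: P('I love NLP') = P(I) × P(love|I) × P(NLP|love) = 0.33 × 0.67 × 0.5 ≈ 0.11
Here P(I) = 3/9 ≈ 0.33 (3 of the 9 corpus words are 'I') and P(love|I) = Count('I love') / Count('I') = 2/3 ≈ 0.67.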
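The worked example can be checked with exact fractions. This sketch follows the notes' conventions (split sentences on '.', words on whitespace, use the unigram P(I) for the first word); many textbook models instead start from a sentence-start symbol, which would change the first factor.

```python
from collections import Counter
from fractions import Fraction

corpus = "I love NLP. I love Python. I study NLP."
# Split into sentences on '.', then into words on whitespace.
sentences = [s.split() for s in corpus.split(".") if s.strip()]

unigrams = Counter(w for s in sentences for w in s)
# Bigrams are counted inside each sentence only.
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total = sum(unigrams.values())  # 9 words

def p_unigram(w):
    return Fraction(unigrams[w], total)

def p_bigram(w, prev):
    """P(w | prev) = Count(prev, w) / Count(prev); prev must occur in the corpus."""
    return Fraction(bigrams[(prev, w)], unigrams[prev])

def p_sentence(words):
    """P(w1) x P(w2|w1) x ... x P(wn|wn-1), matching the example above."""
    p = p_unigram(words[0])
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_bigram("NLP", "love"))               # 1/2
print(p_sentence(["I", "love", "NLP"]))      # 1/9, about 0.11
print(p_sentence(["I", "study", "Python"]))  # 0: 'study Python' never occurs
```

The last line shows the zero-probability problem: a single unseen bigram makes the whole sentence probability 0.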

18. N-gram Limitations

Data Sparsity
For longer N-grams (trigrams, 4-grams), many combinations simply never appear in the training
data, giving them a probability of 0. This creates the 'zero probability problem'.

Problem: If 'blue suede shoes' never appears in training data, P = 0, even though it's a valid phrase.

Large Vocabulary Problems


For a vocabulary of V words there are V^N possible N-grams, so the count grows exponentially with N and becomes
huge as V grows. A vocabulary of 50,000 words already has 50,000² = 2.5 billion possible bigrams!

Long-Distance Dependencies
N-gram models fail to capture relationships between words that are far apart in a sentence.

Example: 'The trophy that the man who won the race picked up is shiny.' An N-gram model (a trigram model sees
only the two previous words) cannot connect 'trophy' with 'shiny', which are 10 words apart.

19. Evaluation of Language Models


How do we measure if a language model is good or bad? There are two main approaches:

Intrinsic Evaluation
Measure performance directly on a held-out test dataset using mathematical metrics. No
external task needed.
• Most common metric: Perplexity (Topic 20)
• Measures how well the model predicts unseen text

Extrinsic Evaluation
Evaluate the model's performance on an actual downstream task.
Examples: Does using this language model improve the accuracy of a machine translation system?
Does it make a speech recognition system better?

💡 Key Takeaway: A model with lower perplexity doesn't always perform better on real tasks.
Extrinsic evaluation is the ultimate test.

20. Perplexity
Perplexity is the main metric for evaluating language models. It measures how 'confused' or
'surprised' a model is when it sees new text.

Simple Intuition: A model that easily predicts the next word has LOW perplexity. A model that is
often wrong has HIGH perplexity.

Perplexity Formula
Formula: PP(W) = P(w1, w2, ..., wN)^(-1/N), where W is the test set and N is the number of words.

Interpretation
• Perplexity = 10 means the model is as confused as if choosing uniformly from 10 words
• Lower perplexity = better model
• A perplexity of 1 would mean the model perfectly predicts every word (impossible in
practice)

Exam Tip: LOWER PERPLEXITY = BETTER MODEL. Always remember this!
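Perplexity is easiest to compute in log space, because multiplying many small probabilities underflows to 0. This sketch takes the model's probability for each test word (given its history) and applies PP = P(w1, ..., wN)^(-1/N).

```python
import math

def perplexity(word_probs):
    """PP = P(w1..wN) ** (-1/N), computed as exp(-mean log p) to avoid underflow.

    word_probs holds the model's probability for each test word given its
    history, so their product is the joint probability P(w1..wN).
    """
    n = len(word_probs)
    log_p = sum(math.log(p) for p in word_probs)  # math.log(0) raises ValueError (zero-probability problem)
    return math.exp(-log_p / n)

# A model that is always uniform over 10 words:
print(perplexity([0.1] * 20))        # about 10
# The bigram example: P('I love NLP') = 1/3 * 2/3 * 1/2 = 1/9 with N = 3.
print(perplexity([1/3, 2/3, 1/2]))   # about 2.08
```

The second call reuses the bigram example: 1/9 spread over 3 words gives 9^(1/3) ≈ 2.08.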

21. Entropy
Entropy comes from Information Theory (Claude Shannon, 1948). It measures the average
amount of information (or uncertainty) in a probability distribution.

Intuition: A fair coin flip has entropy = 1 bit (perfectly uncertain: 50/50). A biased coin (99% heads) has
entropy close to 0 (very predictable).

Shannon Entropy Formula


Formula: H(X) = -Σ P(x) × log₂ P(x), summed over all possible outcomes x.
Example: Fair die (6 sides): H = -6 × (1/6 × log₂(1/6)) ≈ 2.58 bits

Relationship with Perplexity


Connection: PP = 2^H, where H is the model's average per-word cross-entropy (in bits) on the test set. If H is high (the model is uncertain), perplexity is high too.

💡 Key Takeaway: Entropy and Perplexity are deeply connected. High entropy → high perplexity →
worse model.
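The Shannon entropy formula and the coin and die examples can be checked directly. This sketch measures entropy in bits (log base 2) and skips zero-probability outcomes, whose contribution is taken as 0.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), skipping p == 0 terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])      # 1 bit
biased_coin = entropy([0.99, 0.01])  # about 0.08 bits
fair_die = entropy([1/6] * 6)        # about 2.58 bits
print(fair_coin, biased_coin, fair_die)
print(2 ** fair_die)                 # about 6
```

For a uniform distribution, 2^H equals the number of outcomes, which is why a perplexity of 10 reads as "choosing uniformly among 10 words".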

22. Word Embeddings


Word Embeddings represent words as dense numerical vectors in a high-dimensional space
where similar words are close together. This is much smarter than Bag of Words!

Famous Example: King - Man + Woman ≈ Queen.
The vector arithmetic works because embeddings capture semantic relationships!

Why Embeddings?
• BoW treats every word as independent — embeddings capture word relationships
• 'cat' and 'kitten' are close in embedding space, while in BoW (one-hot) space every pair of distinct words is equally unrelated
• Capture analogies: Paris:France :: Tokyo:Japan
• Dense vectors (50-300 numbers) vs sparse BoW vectors (thousands of zeros)
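Closeness between embeddings is usually measured with cosine similarity. The 4-dimensional vectors below are hand-made toys (hypothetical; real embeddings are learned from text and have 50-300 dimensions), built so that the King - Man + Woman analogy works out, which keeps the mechanics easy to follow.

```python
import math

# Hand-made toy vectors (hypothetical). Dimensions loosely mean:
# [royalty, maleness, animal, fruit].
EMB = {
    "king":   [0.9, 0.8, 0.1, 0.0],
    "queen":  [0.9, 0.1, 0.1, 0.0],
    "man":    [0.1, 0.8, 0.1, 0.0],
    "woman":  [0.1, 0.1, 0.1, 0.0],
    "cat":    [0.0, 0.4, 0.9, 0.0],
    "kitten": [0.0, 0.3, 0.95, 0.0],
    "banana": [0.0, 0.0, 0.0, 1.0],
}

def cosine(u, v):
    """cos(u, v) = u.v / (|u| |v|); 1 means same direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), excluding a, b and c themselves."""
    target = [x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c])]
    candidates = (w for w in EMB if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(EMB[w], target))

print(cosine(EMB["cat"], EMB["kitten"]))  # about 0.99
print(cosine(EMB["cat"], EMB["banana"]))  # 0.0
print(cosine([1, 0], [0, 1]))             # 0.0: one-hot 'cat' vs 'kitten' in BoW
print(analogy("king", "man", "woman"))    # queen
```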

23. Types of Embeddings

Word Embeddings
Each word gets its own vector. The same word always has the same vector regardless of
context.
• Word2Vec — Predict words from context (or context from words)
• GloVe — Uses global co-occurrence statistics to learn vectors
• FastText — Breaks words into character n-grams; handles unknown words well

Sentence Embeddings
Represents a full sentence as a single vector. Captures the overall meaning of the sentence.

Use case: Finding similar sentences, semantic search, FAQ matching


Document Embeddings
Represents an entire document as a vector. Useful for comparing documents, clustering, or
classification.

Use case: Classifying news articles, finding duplicate documents

24. Word2Vec
Word2Vec is a neural network-based technique that learns word embeddings by training on a
large text corpus. It has two architectures:

CBOW (Continuous Bag of Words)


CBOW predicts the TARGET word given the surrounding CONTEXT words.

Example: Sentence: 'I love __ language processing'
Context words: ['I', 'love', 'language', 'processing']
Task: Predict the missing word → 'natural'

• Faster to train
• Better for frequent words

Skip-Gram
Skip-Gram is the OPPOSITE of CBOW. It predicts the CONTEXT words given the TARGET
word.

Example: Target word: 'natural' Task: Predict context → ['I', 'love', 'language', 'processing']

• Slower to train but more accurate
• Better for rare words

Feature       CBOW                 Skip-Gram
Direction     Context → Target     Target → Context
Speed         Faster               Slower
Rare words    Less accurate        More accurate
Best for      Frequent words       Rare/specialized words
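Both architectures learn from the same sliding-window data; they differ only in which side is the input. This sketch builds that training data for the example sentence with a window of 2 words on each side. The network training itself is not shown (libraries such as gensim's `Word2Vec` handle it and choose CBOW or Skip-Gram through their `sg` parameter).

```python
def cbow_pairs(tokens, window=2):
    """(context words, target) for every position: the CBOW training examples."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]    # up to `window` words before
        right = tokens[i + 1:i + 1 + window]   # up to `window` words after
        pairs.append((left + right, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """(target, one context word) pairs: the Skip-Gram training examples."""
    return [(target, c) for context, target in cbow_pairs(tokens, window) for c in context]

tokens = "I love natural language processing".split()
cbow = cbow_pairs(tokens)
skipgram = skipgram_pairs(tokens)

print(cbow[2])       # (['I', 'love', 'language', 'processing'], 'natural')
print(skipgram[:3])  # [('I', 'love'), ('I', 'natural'), ('love', 'I')]
```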

25. Word2Vec Training Steps


Here is the complete training pipeline for Word2Vec:
• Prepare corpus — collect large amounts of text data
• Tokenization — split text into words
• Create context windows: for each word, treat the surrounding words as its context (window
size of, e.g., 2 or 5)
• Train CBOW or Skip-Gram neural network on (context, target) pairs
• Extract the weight matrix — each row is a word's embedding vector
• Compute word similarity — use cosine similarity between vectors

Cosine Similarity (illustrative values): sim('king', 'queen') = cos(vector_king, vector_queen) ≈ 0.85 (very similar)
sim('king', 'banana') ≈ 0.1 (very different)

26. Parsing
Parsing is the process of analyzing the grammatical structure of a sentence or piece of text to
understand its meaning and how components relate to each other.

Analogy: Parsing is like diagramming a sentence in grammar class — identifying the subject, verb,
object, and how they connect.

27. Types of Parsing

1. Syntactic Parsing
Analyzes the grammatical (syntax) structure of a sentence. It builds a parse tree showing
subject, verb, object relationships.

Example: Sentence: 'The cat sat on the mat'
Parse tree:
  S → NP + VP
  NP → 'The cat'
  VP → 'sat' + PP
  PP → 'on the mat'
Subject = cat, Verb = sat, Location = mat

2. Semantic Parsing
Goes beyond grammar — it extracts the MEANING from a sentence. Used in question
answering and dialogue systems.

Example: Input: 'What will be the weather of Pilani tomorrow?'
Semantic output:
  Intent: get_weather
  Location: Pilani
  Date: tomorrow

3. Code Parsing
Converts programming code (Python, Java, etc.) into a machine-readable representation like
Abstract Syntax Trees (AST). Used in compilers and IDEs.
4. Data Parsing
Extracts and interprets structured data from formats like JSON, XML, and CSV.

Example: JSON: {"name": "Alice", "age": 25} Parsed: name = Alice, age = 25
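Code parsing and data parsing both have ready-made parsers in Python's standard library: `ast` for Python source and `json` for JSON. The one-line program `total = price * 2` is an arbitrary example.

```python
import ast
import json

# Code parsing: Python's own parser turns source text into an Abstract Syntax Tree.
tree = ast.parse("total = price * 2")
assign = tree.body[0]                    # the single statement in the module
print(type(assign).__name__)             # Assign
print(assign.targets[0].id)              # total
print(type(assign.value.op).__name__)    # Mult

# Data parsing: JSON text becomes a Python dict.
record = json.loads('{"name": "Alice", "age": 25}')
print(record["name"], record["age"])     # Alice 25
```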

28. Types of Semantic Parsing

Shallow Semantic Parsing


Also called Semantic Role Labeling (SRL). Identifies WHO did WHAT to WHOM, WHEN,
WHERE — without full sentence understanding.

Example: 'John gave Mary a book in the library yesterday.'
  Agent (who): John
  Action: gave
  Recipient: Mary
  Object: book
  Location: library
  Time: yesterday

Deep Semantic Parsing


Creates a complete formal representation of meaning, often as a logical form or knowledge
graph. Full structured understanding of the sentence.

Example: Input: 'Every student likes some teacher.'
Logical form: ∀x[student(x) → ∃y[teacher(y) ∧ likes(x,y)]]

Neural Semantic Parsing


Uses deep learning to do semantic parsing automatically from training examples. These models
learn the mapping from sentences to meaning representations.
• LSTM (Long Short-Term Memory) — processes sequences with memory
• Transformers — attention-based, state-of-the-art for NLP
• BERT — pre-trained bidirectional transformer, fine-tuned for parsing
• GPT — generative transformer, can produce structured outputs

💡 Key Takeaway: Neural semantic parsing is currently the dominant approach. Models like GPT-4 and
BERT can parse many complex sentences into structured representations with high accuracy, although results depend on the task and the training data.

End of NLP Study Notes • All 28 Topics Covered


NLP_class_notes | Good luck on your exams!
