
NLP Lab Programs

The document contains several Python programs for various natural language processing tasks using the NLTK library. It includes implementations for tokenization, stop word removal, stemming, word analysis, word generation, word sense disambiguation, part-of-speech tagging, morphological analysis, n-grams generation, and bigram smoothing. Each program is accompanied by sample code and explanations for clarity.

Uploaded by

Vemula Naresh


1. Write a Python program to perform the following tasks on text

a) Tokenization b) Stop word Removal

Program:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download required NLTK data
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in newer NLTK versions
nltk.download('stopwords')

text = "Natural Language Processing helps computers understand human language."

# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Stop word removal (compare in lowercase, since the NLTK list is lowercase)
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]

print("Filtered Tokens:", filtered)
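If NLTK's data packages cannot be downloaded (for example, on an offline lab machine), the same two steps can be approximated with the standard library alone. This is a minimal sketch, assuming a regular-expression tokenizer and a small hand-picked stop-word list (`STOP_WORDS` below is illustrative, not NLTK's English list), so its output will differ from `word_tokenize` on contractions and other edge cases.

```python
import re

# Illustrative stop-word list (a small subset; NLTK's English list is much larger)
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to",
              "in", "on", "for", "with", "it", "this", "that"}


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep tokens whose lowercase form is not in the stop-word set."""
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    tokens = simple_tokenize("The cat is sitting on the mat.")
    print("Tokens:", tokens)
    print("Filtered Tokens:", remove_stop_words(tokens))
```

For "The cat is sitting on the mat." this keeps `['cat', 'sitting', 'mat', '.']`; like the NLTK version, punctuation survives stop-word removal and would need a separate filter.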

2. Write a Python program to implement the Porter stemmer algorithm for stemming

Program:
# ------------------------------------------------------------
# Porter Stemmer Implementation using NLTK
# ------------------------------------------------------------

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Download punkt (needed for tokenization)
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in newer NLTK versions

# Sample text
text = ("The children are playing happily in the garden while their "
        "parents are watching them.")

# Tokenization
tokens = word_tokenize(text)

# Create Porter Stemmer object
ps = PorterStemmer()

# Apply stemming
stemmed_words = [ps.stem(word) for word in tokens]

print("Original Tokens:", tokens)
print("Stemmed Words:", stemmed_words)

3. Write a Python program for a) Word Analysis b) Word Generation

Program:
A.
# Simple Word Analysis Program

prefixes = ["un", "re", "in", "im", "dis"]

suffixes = ["ing", "ed", "s", "er", "ness", "ly"]

word = input("Enter a word: ").strip().lower()

found_prefix = ""
found_suffix = ""

# Check prefix
for p in prefixes:
    if word.startswith(p):
        found_prefix = p
        break

# Check suffix, longest first, so "ness" wins over "s" in "happiness"
for s in sorted(suffixes, key=len, reverse=True):
    if word.endswith(s):
        found_suffix = s
        break

# Find root by stripping the affixes found above
root = word
if found_prefix:
    root = root[len(found_prefix):]
if found_suffix:
    root = root[:-len(found_suffix)]

print("\n--- Word Analysis ---")
print("Prefix :", found_prefix if found_prefix else "None")
print("Root   :", root)
print("Suffix :", found_suffix if found_suffix else "None")
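The program above strips any matching affix, so short words can lose too much: for "red" it reports prefix "re", suffix "ed", and an empty root. One possible refinement, sketched below with a hypothetical `analyze_word` helper and an assumed minimum root length of 3 characters, accepts an affix only if enough of the word remains after removing it.

```python
PREFIXES = ["un", "re", "in", "im", "dis"]
SUFFIXES = ["ing", "ed", "s", "er", "ness", "ly"]


def analyze_word(word, prefixes, suffixes, min_root=3):
    """Return (prefix, root, suffix); an empty string means no affix.

    An affix is accepted only if at least min_root characters remain
    after removing it (min_root=3 is an assumed threshold).
    """
    word = word.lower()
    prefix = ""
    # Longer affixes are tried first so the most specific match wins
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_root:
            prefix = p
            break
    rest = word[len(prefix):]
    suffix = ""
    for s in sorted(suffixes, key=len, reverse=True):
        if rest.endswith(s) and len(rest) - len(s) >= min_root:
            suffix = s
            break
    # Slice by length so an empty suffix leaves rest unchanged
    root = rest[:len(rest) - len(suffix)]
    return prefix, root, suffix


if __name__ == "__main__":
    for w in ["replaying", "red", "unhappiness"]:
        print(w, "->", analyze_word(w, PREFIXES, SUFFIXES))
```

`analyze_word("replaying", PREFIXES, SUFFIXES)` returns `('re', 'play', 'ing')` and "red" is left whole. Roots are still only string-level guesses: "unhappiness" gives the root "happi", because no spelling rule restores the "y".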

B.
# Simple Word Generation Program

prefixes = ["un", "re", "dis"]

suffixes = ["ing", "ed", "s", "er"]

root = input("Enter root word: ")

generated_words = []

# Add prefixes
for p in prefixes:
    generated_words.append(p + root)

# Add suffixes
for s in suffixes:
    generated_words.append(root + s)

print("\n--- Generated Words ---")

for w in generated_words:
    print(w)
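Plain concatenation produces forms such as "bakeing" and "bakeed". The minimal sketch below assumes a single simplified English spelling rule: drop a final "e" (but not "ee") before a suffix that starts with a vowel. The `add_suffix` and `generate_words` helpers are hypothetical and ignore other rules such as consonant doubling ("run" + "ing" gives "runing"). Like the program above, the sketch does not check its output against a dictionary, so non-words can still appear.

```python
VOWELS = "aeiou"


def add_suffix(root, suffix):
    """Attach a suffix, dropping a final 'e' before a vowel-initial suffix
    (bake + ing -> baking). Words ending in 'ee' keep both letters
    (see + ing -> seeing). Other spelling rules are not handled."""
    if (root.endswith("e") and not root.endswith("ee")
            and suffix and suffix[0] in VOWELS):
        return root[:-1] + suffix
    return root + suffix


def generate_words(root, prefixes=("un", "re", "dis"),
                   suffixes=("ing", "ed", "s", "er")):
    """Return the prefixed forms followed by the suffixed forms of root."""
    return [p + root for p in prefixes] + [add_suffix(root, s) for s in suffixes]


if __name__ == "__main__":
    for w in generate_words("bake"):
        print(w)
```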

4. Create a sample list of at least 5 words with ambiguous senses and write a Python program to implement WSD

Program:
"""
WSD using NLTK + WordNet (Lesk-like overlap method).
- Requires: nltk
- Downloads the required NLTK data (WordNet, tokenizer models, stop words).
"""

import string

import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Ensure required NLTK data is available (already-installed packages are skipped)
for pkg in ["wordnet", "omw-1.4", "punkt", "punkt_tab", "stopwords"]:
    nltk.download(pkg, quiet=True)

STOP = set(stopwords.words("english"))
PUNCT = set(string.punctuation)
STEMMER = PorterStemmer()

def normalize_tokens(text):
    """Tokenize, lowercase, remove stopwords/punctuation/1-char tokens, and stem."""
    tokens = word_tokenize(text.lower())
    clean = []
    for t in tokens:
        if t in PUNCT:
            continue
        if t in STOP:
            continue
        if len(t) < 2:
            continue
        clean.append(STEMMER.stem(t))
    return clean

def synset_signature(syn):
    """
    Build a signature (set of normalized tokens) for a synset.
    Uses: definition, examples, lemma names, and the lemma names of the
    immediate hypernyms.
    """
    sig = []
    # definition
    sig += normalize_tokens(syn.definition())
    # examples
    for ex in syn.examples():
        sig += normalize_tokens(ex)
    # lemma names (split underscores, e.g. "river_bank")
    for lemma in syn.lemmas():
        for part in lemma.name().split("_"):
            if part:
                sig.append(STEMMER.stem(part.lower()))
    # add hypernyms' lemma names to give some extra context
    for hyper in syn.hypernyms():
        for lemma in hyper.lemmas():
            for part in lemma.name().split("_"):
                if part:
                    sig.append(STEMMER.stem(part.lower()))
    return set(sig)

def lesk_wsd(target_word, sentence):
    """
    Lesk-like WSD for target_word in sentence.
    Returns (best_synset, overlap_score), or None if the sentence has no
    usable context words or the word has no WordNet synsets.
    """
    # tokens from context (normalized)
    context = normalize_tokens(sentence)
    if not context:
        return None

    # Get candidate synsets for the target
    candidates = wn.synsets(target_word)
    if not candidates:
        return None

    best_syn = None
    best_score = -1

    for syn in candidates:
        signature = synset_signature(syn)
        # overlap between context and signature
        score = len(set(context) & signature)
        # strict '>' keeps the earlier synset on ties; WordNet lists a
        # word's synsets roughly from most to least frequent sense
        if score > best_score:
            best_score = score
            best_syn = syn

    return best_syn, best_score

# --- Sample list of ambiguous words and demo sentences ---

SAMPLES = [
("bank", "I deposited my paycheck at the bank yesterday."),
("bank", "The canoe was pulled up on the muddy bank of the river."),
("bat", "A bat flew out of the cave at dusk."),
("bat", "He gripped the cricket bat and ran to the crease."),
("plant", "The power plant was shut down after the accident."),
("plant", "She put the new plant on the balcony and watered it."),
("lead", "Old pipes often contain lead which is harmful."),
("lead", "She will lead the project next month."),
("bass", "He likes to fish for bass in the lake."),
("bass", "Turn up the bass on that song; I love the low end."),
]

if __name__ == "__main__":
    print("NLTK + WordNet Lesk-style WSD Demo\n" + "-" * 36)
    for word, sent in SAMPLES:
        result = lesk_wsd(word, sent)
        if result is None or result[0] is None:
            print(f"Word: {word}\n Sentence: {sent}\n -> No sense found.\n")
            continue
        syn, score = result
        # present info
        print(f"Word: {word}\n Sentence: {sent}")
        print(f" -> Predicted synset: {syn.name()} (score={score})")
        print(f" Definition : {syn.definition()}")
        print(f" Examples : {syn.examples()}")
        print(f" Lemmas : {', '.join(syn.lemma_names())}\n")
Easy:
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# download once (uncomment if needed)
# nltk.download("wordnet"); nltk.download("punkt"); nltk.download("punkt_tab")

sentence = input("Enter sentence: ")
word = input("Enter ambiguous word: ")

# lesk(context_tokens, ambiguous_word) returns a WordNet Synset, or None
sense = lesk(word_tokenize(sentence), word)
if sense:
    print("Predicted sense:", sense.name())
    print("Definition:", sense.definition())
else:
    print("No sense found.")
Best:
# Sample list of ambiguous words with multiple meanings
ambiguous_words = {
    "bank": {
        "sense1": "A financial institution that handles money and provides financial services.",
        "sense2": "The side of a river or a stream."
    },
    "bat": {
        "sense1": "A flying mammal.",
        "sense2": "A piece of equipment used in sports like baseball to hit the ball."
    },
    "bark": {
        "sense1": "The outer covering of a tree.",
        "sense2": "The sound made by a dog."
    },
    "match": {
        "sense1": "A competition or game.",
        "sense2": "A device used to start a fire."
    },
    "bore": {
        "sense1": "A person or thing that is dull and uninteresting.",
        "sense2": "To make a hole in something using a tool."
    }
}

# Context cue words for each sense of each word. Keeping the cues per word
# stops a cue such as "river" from selecting sense2 of an unrelated word.
sense_cues = {
    "bank": {"sense1": ["money", "financial"], "sense2": ["river", "water"]},
    "bat": {"sense1": ["cave", "flew", "mammal"], "sense2": ["sport", "hit", "ball"]},
    "bark": {"sense1": ["tree", "trunk"], "sense2": ["dog"]},
    "match": {"sense1": ["game", "team", "won"], "sense2": ["fire", "light", "strike"]},
    "bore": {"sense1": ["dull", "boring"], "sense2": ["hole", "drill"]}
}

# Function to perform Word Sense Disambiguation
def disambiguate_word(word, context):
    context = context.lower()

    if word not in ambiguous_words:
        return "Word not found in ambiguous words list."

    senses = ambiguous_words[word]

    # Choose the first sense whose cue words occur (as substrings) in the context
    for sense_key, cues in sense_cues[word].items():
        if any(cue in context for cue in cues):
            sense = senses[sense_key]
            return f"The word '{word}' in context '{context}' refers to: {sense}"

    return "Could not determine the sense of the word based on the context."


context1 = "The bank is located on the river."
context2 = "I need to go to the bank to withdraw some money."
context3 = "The dog barked loudly in the yard."
context4 = "He hit the ball with a bat during the game."
context5 = "The movie was so boring, I was about to fall asleep."
context6 = "We need to bore a hole into the wall."

print(disambiguate_word("bank", context1))
print(disambiguate_word("bank", context2))
print(disambiguate_word("bark", context3))
print(disambiguate_word("bat", context4))
print(disambiguate_word("bore", context5))
print(disambiguate_word("bore", context6))

5. Install the NLTK toolkit and perform stemming

Program:
# Install NLTK first with:  pip install nltk
import nltk  # succeeds only if the installation worked
from nltk.stem import PorterStemmer

# Initialize the stemmer
stemmer = PorterStemmer()
# List of words to stem
words = ["running", "flies", "easily", "fairly", "crying", "happiness",
         "playing"]
# Apply stemming and display results
for word in words:
    print(f"Original: {word} --> Stemmed: {stemmer.stem(word)}")

6. Create a sample list of at least 10 words with their POS tags and find the POS for any given word
Program:
# POS tagging without NLTK data (exam-safe)

pos_dictionary = {
"run": "VB",
"beautiful": "JJ",
"quickly": "RB",
"computer": "NN",
"play": "VB",
"jump": "VB",
"happy": "JJ",
"india": "NNP",
"walked": "VBD",
"singing": "VBG",
"teacher": "NN",
"dogs": "NNS"
}

# Sample words
words = [
"run", "beautiful", "quickly", "computer", "play",
"jump", "happy", "India", "walked", "singing",
"teacher", "dogs"
]

print("POS Tags for the Sample Words:")

for w in words:
    # Lookup is case-insensitive; unknown words default to "NN" (noun)
    tag = pos_dictionary.get(w.lower(), "NN")
    print(f"{w} → {tag}")

# User input
word = input("\nEnter a word to find its POS: ").strip()
print(f"The POS tag for '{word}' is: {pos_dictionary.get(word.lower(), 'NN')}")
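The dictionary tagger labels every unknown word "NN". A slightly better fallback, sketched here as an assumption rather than a standard NLTK component, guesses a Penn Treebank tag from common English word endings before defaulting to "NN". `POS_DICTIONARY` and `SUFFIX_RULES` are illustrative, and the rules are heuristics that mis-tag words such as "bring" (VBG) or "family" (RB).

```python
# Known words (a subset of the lab's dictionary)
POS_DICTIONARY = {"run": "VB", "beautiful": "JJ", "quickly": "RB", "computer": "NN"}

# Ordered (suffix, tag) heuristics; the first matching ending wins,
# so "ness" is checked before the bare plural "s"
SUFFIX_RULES = [
    ("ing", "VBG"),
    ("ed", "VBD"),
    ("ly", "RB"),
    ("ful", "JJ"),
    ("ous", "JJ"),
    ("ness", "NN"),
    ("s", "NNS"),
]


def guess_pos(word):
    """Dictionary lookup first, then suffix rules, then default to NN."""
    w = word.lower()
    if w in POS_DICTIONARY:
        return POS_DICTIONARY[w]
    for suffix, tag in SUFFIX_RULES:
        # require at least two characters before the suffix ("is" stays NN)
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return tag
    return "NN"


if __name__ == "__main__":
    for w in ["Quickly", "jumping", "painted", "kindness", "cats", "table"]:
        print(f"{w} → {guess_pos(w)}")
```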

7. Write a Python program to

a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing
a) Perform Morphological Analysis using NLTK library
# morph_analysis_simple.py
# Simple morphological analysis using NLTK
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

# downloads (safe to call repeatedly); the *_tab and *_eng packages are
# needed by newer NLTK versions
for pkg in ["punkt", "punkt_tab", "averaged_perceptron_tagger",
            "averaged_perceptron_tagger_eng", "wordnet", "omw-1.4"]:
    nltk.download(pkg, quiet=True)

ps = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def pos_to_wordnet_tag(tag):
    """Map a Penn Treebank tag to the WordNet POS constant the lemmatizer expects."""
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('N'):
        return wordnet.NOUN
    if tag.startswith('R'):
        return wordnet.ADV
    return None

def analyze(text):
    tokens = word_tokenize(text)
    tagged = pos_tag(tokens)
    results = []
    for tok, tag in tagged:
        wn_tag = pos_to_wordnet_tag(tag)
        # without a POS argument, WordNetLemmatizer treats the word as a noun
        lemma = lemmatizer.lemmatize(tok, wn_tag) if wn_tag else lemmatizer.lemmatize(tok)
        results.append({
            'token': tok,
            'pos': tag,
            'porter_stem': ps.stem(tok),
            'lemma': lemma
        })
    return results

# demo
if __name__ == "__main__":
    s = "The runners were running quickly toward the finishing line."
    for item in analyze(s):
        print(item)
b) Generate n-grams using NLTK N-Grams library
# generate_ngrams_simple.py
# Simple n-gram generation using NLTK

import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams
from collections import Counter

nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)  # needed by newer NLTK versions

def generate_ngrams(text, n=2, pad=False):
    tokens = word_tokenize(text)
    if pad:
        # mark sentence boundaries with n-1 start and end symbols
        tokens = ['<s>'] * (n - 1) + tokens + ['</s>'] * (n - 1)
    grams = list(ngrams(tokens, n))
    freq = Counter(grams)
    return grams, freq

# demo
if __name__ == "__main__":
    text = "I love natural language processing and I love coding."
    for n in (1, 2, 3):
        grams, freq = generate_ngrams(text, n=n, pad=True)
        print(f"\nFirst 10 {n}-grams:", grams[:10])
        print("Most common:", freq.most_common(5))
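An n-gram generator only slides a window of length n over the token list. This minimal standard-library sketch, assuming whitespace tokenization instead of `word_tokenize`, mirrors the `generate_ngrams` function above (including the optional `<s>`/`</s>` padding) and can be used to check its output on small inputs; `make_ngrams` is a hypothetical name.

```python
from collections import Counter


def make_ngrams(tokens, n, pad=False):
    """Return the list of n-gram tuples, optionally padded with <s> and </s>."""
    tokens = list(tokens)
    if pad:
        tokens = ["<s>"] * (n - 1) + tokens + ["</s>"] * (n - 1)
    # zip over n staggered copies of the list yields consecutive windows
    return list(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    tokens = "I love coding and I love NLP".split()
    bigrams = make_ngrams(tokens, 2)
    print(bigrams)
    print(Counter(bigrams).most_common(1))
```

Here the bigram `('I', 'love')` occurs twice, so `most_common(1)` returns `[(('I', 'love'), 2)]`. A sequence shorter than n yields no n-grams unless it is padded.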
c) Implement N-Grams Smoothing
# bigram_addk_simple.py
# Simple bigram model with add-k smoothing

import nltk
from nltk.tokenize import word_tokenize
from collections import Counter

nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)  # needed by newer NLTK versions

class BigramAddKSmoother:
    def __init__(self, k=1.0):
        self.k = k
        self.unigram_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        self.total_unigrams = 0

    def fit(self, texts):
        """Count lowercase unigrams and bigrams in each text."""
        for text in texts:
            tokens = [t.lower() for t in word_tokenize(text)]
            self.total_unigrams += len(tokens)
            self.unigram_counts.update(tokens)
            self.bigram_counts.update(zip(tokens, tokens[1:]))
            self.vocab.update(tokens)

    def prob(self, w1, w2):
        """ P(w2 | w1) with add-k smoothing """
        w1 = w1.lower()
        w2 = w2.lower()
        V = len(self.vocab)
        count_bigram = self.bigram_counts.get((w1, w2), 0)
        count_w1 = self.unigram_counts.get(w1, 0)
        denom = count_w1 + self.k * V
        return (count_bigram + self.k) / denom if denom > 0 else 0.0

# demo
if __name__ == "__main__":
    corpus = [
        "I love natural language processing",
        "I love coding in Python",
        "Natural language processing is fun",
        "Python makes coding easy"
    ]
    model = BigramAddKSmoother(k=1.0)  # k=1 -> Laplace
    model.fit(corpus)

    pairs = [("i", "love"), ("love", "natural"), ("natural", "language"),
             ("python", "makes"), ("unknown", "word")]
    for a, b in pairs:
        print(f"P({b}|{a}) = {model.prob(a, b):.4f}")
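The class above divides by the unigram count of `w1`, which also counts occurrences of `w1` at the end of a sentence, where no bigram follows it. For such words (for example "processing" in this corpus) the smoothed probabilities over the vocabulary sum to slightly less than 1. The minimal standard-library sketch below assumes lowercase whitespace tokenization and uses the number of bigrams that start with `w1` (its "history" count) as the denominator, which keeps each conditional distribution normalized. The function names are hypothetical.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count bigrams, history counts (bigrams starting with each word), and the vocabulary."""
    bigrams, history, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            history[w1] += 1
    return bigrams, history, vocab


def add_k_prob(w1, w2, bigrams, history, vocab, k=1.0):
    """P(w2 | w1) = (C(w1 w2) + k) / (C(w1 .) + k * V); Counter gives 0 for unseen keys."""
    V = len(vocab)
    return (bigrams[(w1, w2)] + k) / (history[w1] + k * V)


if __name__ == "__main__":
    corpus = ["I love natural language processing", "I love coding in Python",
              "Natural language processing is fun", "Python makes coding easy"]
    bigrams, history, vocab = train_bigrams(corpus)
    print(f"P(love|i) = {add_k_prob('i', 'love', bigrams, history, vocab):.4f}")
    total = sum(add_k_prob("processing", w, bigrams, history, vocab) for w in vocab)
    print(f"Sum over vocabulary for 'processing' = {total:.4f}")
```

"i" is never sentence-final, so both versions give P(love|i) = 3/14; they differ only for words such as "processing" that end a sentence.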

8. Convert a text file to an audio file and an audio file to text (NLTK has no speech support, so the programs below use the gTTS and SpeechRecognition libraries)
A) Text to audio
Program:
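# Install with:  pip install gTTS
from gtts import gTTS

# read text from file ("[Link]" is the path of the input text file)
with open("[Link]", "r", encoding="utf-8") as f:
    text = f.read()

# convert to speech (gTTS sends the text to Google's text-to-speech
# service, so an internet connection is required)
tts = gTTS(text=text, lang='en')

tts.save("output.mp3")
print("Audio saved as output.mp3")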

B) Audio to text
Program:
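# Install with:  pip install SpeechRecognition
import speech_recognition as sr

r = sr.Recognizer()

# AudioFile reads WAV, AIFF and FLAC files ("[Link]" is the input file path)
with sr.AudioFile("[Link]") as source:
    audio = r.record(source)  # read the entire file

try:
    # recognize_google sends the audio to Google's web API (internet required)
    text = r.recognize_google(audio)
    print("Text:", text)
except sr.UnknownValueError:
    print("Speech could not be understood.")
except sr.RequestError as e:
    print("Could not reach the recognition service:", e)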

Common questions

Implementing smoothing in n-gram models means adjusting the probabilities estimated from raw counts so that unseen n-grams receive some probability mass, which makes the model more reliable on new text. The document describes add-k smoothing, in which a positive constant k is added to every n-gram count (and k times the vocabulary size to the denominator) to avoid zero probabilities; Laplace smoothing is the special case k = 1. Smoothing is necessary because an unsmoothed model assigns zero probability to any sentence containing an unseen word combination, which makes it brittle when handling diverse text inputs.
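In symbols, for a bigram model with vocabulary size V and training counts C(·), add-k smoothing estimates the conditional probability below (k = 1 gives Laplace smoothing). As a worked check against program 7c's four-sentence corpus, which has 12 distinct lowercase word types: "i" occurs twice and is followed by "love" both times, while the unseen history "unknown" falls back to a uniform 1/V.

```latex
P_{\text{add-}k}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\,w_i) + k}{C(w_{i-1}) + kV}

P_{\text{add-}1}(\text{love} \mid \text{i}) = \frac{2 + 1}{2 + 12} = \frac{3}{14} \approx 0.2143
\qquad
P_{\text{add-}1}(\text{word} \mid \text{unknown}) = \frac{0 + 1}{0 + 12} = \frac{1}{12} \approx 0.0833
```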

The core tasks in text preprocessing with NLTK are tokenization, stop word removal, and stemming. Tokenization breaks the text into individual words or tokens. Stop word removal filters out very common words, such as 'the' or 'is', that carry little meaning on their own. Stemming reduces inflected or derived words to a common stem, which need not be a dictionary word (the Porter stemmer maps 'studies' to 'studi'). These steps prepare the text for further natural language processing and analysis.

Normalizing tokens helps word sense disambiguation by reducing words to a common form, so that the context and the sense signatures can be matched more accurately. The WSD program lowercases tokens, removes stop words, punctuation, and one-character tokens, and applies Porter stemming. This reduces noise and increases genuine overlap between context tokens and synset signatures (for example, 'deposited' and 'deposit' both stem to 'deposit'), which leads to more precise disambiguation.

N-grams play a central role in language modeling: they are used to estimate the probability of a word sequence, which is essential in tasks such as speech recognition and predictive text. The document implements them with NLTK by tokenizing text and generating n-grams, optionally padded with <s> and </s> symbols to mark sentence boundaries. It also counts n-gram frequencies and applies smoothing so that unseen n-grams do not receive zero probability.

The Lesk algorithm for word sense disambiguation compares a 'signature' for each possible sense of a word with the word's context in the sentence. In the document's program, a sense's signature contains the normalized tokens of its WordNet definition and example sentences, its lemma names, and the lemma names of its immediate hypernyms. The sense whose signature shares the most normalized tokens with the context is chosen as the most appropriate meaning. The program uses NLTK and WordNet; the shorter variant calls NLTK's built-in nltk.wsd.lesk function, which follows the same overlap idea.
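The overlap idea can be seen without WordNet. This is a minimal sketch of simplified Lesk, assuming a tiny hand-written sense inventory (`SENSES` below is illustrative, not WordNet glosses), a small illustrative stop list, and no stemming. Each gloss becomes a set of words, and the sense sharing the most words with the sentence wins, with ties going to the first sense listed.

```python
import re

# Illustrative sense inventory: target word -> {sense label: short gloss}
SENSES = {
    "bank": {
        "bank.finance": "an institution where people deposit money and take out loans",
        "bank.river": "sloping land beside a river or stream",
    },
    "bass": {
        "bass.fish": "an edible fish caught in a lake or river",
        "bass.music": "the low end of the musical range in a song",
    },
}

STOP = {"a", "an", "the", "of", "in", "or", "and", "at", "on", "to",
        "where", "i", "my", "he", "she", "that", "for"}


def words(text):
    """Lowercase alphabetic words with stop words removed, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def simplified_lesk(target, sentence):
    """Return (best_sense_label, overlap), or (None, 0) for a word not in SENSES."""
    context = words(sentence) - {target}  # the target itself is not evidence
    best, best_score = None, -1
    for label, gloss in SENSES.get(target, {}).items():
        score = len(context & words(gloss))
        if score > best_score:  # strict '>' keeps the first-listed sense on ties
            best, best_score = label, score
    return (best, best_score) if best else (None, 0)


if __name__ == "__main__":
    print(simplified_lesk("bank", "I went to the bank to deposit money."))
    print(simplified_lesk("bass", "Turn up the bass on that song; I love the low end."))
```

Without stemming, "deposited" would not match "deposit", which is why the full program normalizes both the context and the signatures.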

The document demonstrates word generation with a Python program that attaches prefixes and suffixes to a root word to create new word forms. This mirrors simple morphological processes and can help produce derived words for language models and automated generation tasks. Because the program does not check its output against a lexicon, some generated forms (for example, 'unrun' from the root 'run') are not real words. Practical uses include vocabulary exercises, generating test data for NLP models, and augmenting lexicons for natural language applications.

The document demonstrates audio-text conversion in both directions: text-to-audio with the gTTS library, and audio-to-text with the SpeechRecognition library (imported as speech_recognition) using Google's web speech API. The first program reads text from a file and saves it as an MP3; the second transcribes speech from a WAV file into text. Both services require an internet connection. Potential applications include automated customer service, accessibility tools for visually impaired users, and faster data entry.

POS tagging helps text processing by identifying each word's grammatical function, which supports syntactic parsing and information extraction. POS tags also separate different uses of the same word form, such as 'deal' as a verb versus a noun. The document's sample program demonstrates POS tagging with a hand-built dictionary that maps words to Penn Treebank tags and defaults to 'NN' for unknown words; tags like these are used in applications such as text-to-speech and grammar checking.

Prefixes and suffixes matter in morphological analysis because they change the meaning or grammatical function of a root word (un- negates 'happy'; -ed marks the past tense). The document demonstrates this with a Python program that identifies a word's prefix and suffix and strips them to isolate the root. This kind of analysis helps in understanding word structure, which is useful for tasks such as word sense disambiguation and lexical analysis.

The Porter stemmer strips suffixes from words to reduce them to their stems. Its purpose in natural language processing is to normalize words so that different forms of a word can be treated as equivalent, which benefits tasks such as information retrieval and text classification. The algorithm applies its rewrite rules in a fixed sequence of steps; each rule fires only if its conditions on the remaining stem hold, chiefly the stem's measure m, the number of vowel-consonant sequences it contains.
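The measure m that Porter's conditions check comes from writing a word as [C](VC)^m[V], where C and V are maximal runs of consonants and vowels; m counts the VC pairs. The minimal sketch below computes m and implements Step 1a (the plural rules SSES → SS, IES → I, SS → SS, S → removed). It is not a full stemmer. Following Porter's definition, "y" counts as a vowel when it follows a consonant.

```python
def is_consonant(word, i):
    """Porter's definition: not a/e/i/o/u, and 'y' is a consonant only
    when it is the first letter or follows a vowel."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def step1a(word):
    """Porter Step 1a: rewrite plural endings, checking the longest first."""
    if word.endswith("sses"):
        return word[:-2]      # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]      # ponies -> poni
    if word.endswith("ss"):
        return word           # caress -> caress
    if word.endswith("s"):
        return word[:-1]      # cats -> cat
    return word


if __name__ == "__main__":
    for w in ["tree", "trouble", "troubles"]:
        print(w, "m =", measure(w))
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", step1a(w))
```

Later steps use m as a guard, for example removing certain suffixes only when the remaining stem has m > 0 or m > 1, so that short words are not over-stemmed.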
