#WEEK 01
Write a Python program to perform the following tasks on text:
a) Tokenization b) Stop Word Removal
Aim : To implement tokenization and stop word removal on a given text using Python in order to
preprocess textual data for Natural Language Processing (NLP) applications.
Tools: Python, NLTK
Procedure:
1. Install NLTK using pip install nltk.
2. Write a Python script to tokenize text into words/sentences.
3. Implement stop word removal using the NLTK stopword corpus.
4. Execute the program and analyze the output.
#CODE
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download the tokenizer models and the stop word list (needed once)
nltk.download('punkt_tab')
nltk.download('stopwords')
text = "Natural Language Processing is a branch of Artificial Intelligence."
# Step 1: Tokenization
tokens = word_tokenize(text)
print("Tokens:")
print(tokens)
# Step 2: Stop word removal
stop_words = set(stopwords.words('english'))
# Compare in lowercase so capitalized stop words (e.g. "The") are also removed
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print("\nTokens after Stop Word Removal:")
print(filtered_tokens)
#OUTPUT
Tokens:
['Natural', 'Language', 'Processing', 'is', 'a', 'branch', 'of', 'Artificial', 'Intelligence', '.']
Tokens after Stop Word Removal:
['Natural', 'Language', 'Processing', 'branch', 'Artificial', 'Intelligence', '.']
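The following sketch shows roughly what the two steps do. It is written in plain Python and assumes a simple regular-expression tokenizer and a small hand-picked stop-word set. `simple_tokenize`, `remove_stop_words` and `STOP_WORDS` are hypothetical names. Neither the regex nor the set reproduces NLTK's `word_tokenize` rules or its English stop-word list.

```python
import re

# Hypothetical, tiny stop-word set for illustration; NLTK's English list is much longer.
STOP_WORDS = {"is", "a", "of", "the", "an", "and", "to", "in"}

def simple_tokenize(text):
    # A token is either a run of letters/digits/apostrophes or a single
    # punctuation mark; whitespace separates tokens and is discarded.
    return re.findall(r"[A-Za-z0-9']+|[^\w\s]", text)

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    # Compare in lowercase so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in stop_words]

text = "Natural Language Processing is a branch of Artificial Intelligence."
tokens = simple_tokenize(text)
print(tokens)
print(remove_stop_words(tokens))
```

Unlike `word_tokenize`, this regex keeps a contraction such as "Don't" as a single token.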
#WEEK 02
Install the NLTK toolkit and perform stemming
• Aim : To implement a simplified, rule-based version of the Porter Stemming algorithm.
• Tools: Python, NLTK
#CODE
class PorterStemmer:
    # Simplified suffix stripper inspired by Porter's algorithm. Unlike the real
    # algorithm it checks no vowel or measure conditions, so it can over-strip
    # (e.g. "running" -> "runn").
    def __init__(self):
        self.vowels = "aeiou"
        # Suffix groups kept for reference; the step methods hard-code their rules
        self.suffixes = {
            1: ["s", "es", "ed", "ing"],
            2: ["ly", "er", "ment"],
            3: ["iest", "ness", "ful", "ous"]
        }
    def is_vowel(self, ch):
        return ch in self.vowels
    def step1(self, word):
        # Step 1: plural endings
        if word.endswith("sses"):
            return word[:-2]  # sses -> ss
        if word.endswith("ied") or word.endswith("ies"):
            return word[:-2]  # ied/ies -> i
        if word.endswith("s") and not word.endswith("ss"):
            return word[:-1]  # remove plural s
        return word
    def step2(self, word):
        # Step 2: verb endings -ing and -ed
        if word.endswith("ing"):
            return word[:-3]
        if word.endswith("ed"):
            return word[:-2]
        return word
    def step3(self, word):
        # Step 3: derivational endings -ness, -ful, -ous
        if word.endswith("ness"):
            return word[:-4]
        if word.endswith("ful"):
            return word[:-3]
        if word.endswith("ous"):
            return word[:-3]
        return word
    def stem(self, word):
        word = word.lower()
        word = self.step1(word)
        word = self.step2(word)
        word = self.step3(word)
        return word
ps = PorterStemmer()
words = ["running", "happiness", "easily", "jumping", "fairly", "savings", "flies"]
for word in words:
    print(f"Original: {word}, Stemmed: {ps.stem(word)}")
#OUTPUT
Original: running, Stemmed: runn
Original: happiness, Stemmed: happi
Original: easily, Stemmed: easily
Original: jumping, Stemmed: jump
Original: fairly, Stemmed: fairly
Original: savings, Stemmed: sav
Original: flies, Stemmed: fli
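The output "runn" shows what happens when a rule has no conditions attached. The following sketch covers Porter's step 1b for "-ing" and "-ed". Under Porter's conditions, the suffix is removed only if the remaining stem contains a vowel. A doubled final consonant other than l, s or z is then reduced to one letter. The sketch assumes that only a, e, i, o and u count as vowels. It omits the "-eed" rule and the m=1 / *o clean-up rule. `step1b` and its helpers are hypothetical names, not NLTK functions.

```python
VOWELS = set("aeiou")

def contains_vowel(stem):
    # Porter's *v* condition, simplified: 'y' is not treated as a vowel here.
    return any(ch in VOWELS for ch in stem)

def ends_double_consonant(stem):
    # Porter's *d condition: the stem ends in two identical consonants.
    return len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS

def step1b(word):
    """Partial Porter step 1b: (*v*) ING -> '' and (*v*) ED -> '', plus clean-up."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            if not contains_vowel(stem):
                return word  # e.g. "sing": the stem "s" has no vowel
            # Clean-up rules that run only after "ing"/"ed" was removed
            for end, repl in (("at", "ate"), ("bl", "ble"), ("iz", "ize")):
                if stem.endswith(end):
                    return stem[:-len(end)] + repl
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]  # runn -> run, hopp -> hop
            return stem
    return word

for w in ["running", "jumping", "hopping", "falling", "sing", "conflated"]:
    print(w, "->", step1b(w))
```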
#WEEK 03
Write Python programs for:
• Word Analysis
• Word Generation
AIM: To write Python programs for:
a) Word Analysis – to analyze words in a given text
b) Word Generation – to generate new words from a root word using affix (prefix and suffix) rules
Tools: Python, NLTK
Procedure:
1. Write a function to analyze words based on their frequency and linguistic features.
2. Implement a function to generate words based on affix rules.
#CODE
import nltk
from collections import Counter
from nltk.tokenize import word_tokenize
nltk.download('punkt_tab')
text = "This is a sample text for word frequency analysis. Analysis is important."
tokens = word_tokenize(text)
word_freq = Counter(tokens)
print("Word Frequencies:", word_freq)
#OUTPUT
Word Frequencies: Counter({'is': 2, '.': 2, 'This': 1, 'a': 1, 'sample': 1, 'text': 1, 'for': 1, 'word': 1,
'frequency': 1, 'analysis': 1, 'Analysis': 1, 'important': 1})
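The counts above treat "Analysis" and "analysis" as different words and count the full stop as a word. The following sketch gives a slightly richer analysis. It lowercases the text and keeps alphabetic tokens only. It also records two illustrative features for each word: its length and its last three letters. `analyze_words` and the feature names are hypothetical.

```python
import re
from collections import Counter

def analyze_words(text):
    # Lowercase and keep alphabetic runs only, so "Analysis" and "analysis"
    # are counted together and punctuation such as "." is ignored.
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(words)
    # Per-word features: frequency, length and the final three letters.
    features = {
        w: {"count": c, "length": len(w), "suffix": w[-3:]}
        for w, c in freq.items()
    }
    return freq, features

text = "This is a sample text for word frequency analysis. Analysis is important."
freq, features = analyze_words(text)
print(freq.most_common(2))
print(features["analysis"])
```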
#CODE
# Word Generation using Prefix and Suffix
root_word = input("Enter a root word: ")
prefixes = ["un", "re", "pre"]
suffixes = ["ing", "ed", "ly"]
print("\nGenerated Words:")
for p in prefixes:
    print(p + root_word)  # prefix + root
for s in suffixes:
    print(root_word + s)  # root + suffix (plain concatenation, no spelling rules)
#OUTPUT
Enter a root word: happy
Generated Words:
unhappy
rehappy
prehappy
happying
happyed
happyly
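Plain concatenation produces forms such as "happyly". The following sketch adds two simplified English spelling rules:

- A final "y" after a consonant becomes "i".
- A final "e" is dropped before a vowel-initial suffix.

An optional word list filters out non-words. `add_suffix`, `generate` and the tiny lexicon in the usage line are hypothetical, and real English morphology has many more rules and exceptions.

```python
PREFIXES = ["un", "re", "pre"]
SUFFIXES = ["ing", "ed", "ly"]

def add_suffix(root, suffix):
    # Rule 1 (simplified): consonant + "y" becomes "i" before a suffix that does
    # not start with "i" (happy + ly -> happily, but happy + ing -> happying).
    if (len(root) > 1 and root.endswith("y") and root[-2] not in "aeiou"
            and not suffix.startswith("i")):
        return root[:-1] + "i" + suffix
    # Rule 2 (simplified): drop a final "e" before a vowel-initial suffix
    # (make + ing -> making, hope + ed -> hoped).
    if root.endswith("e") and suffix[0] in "aeiou":
        return root[:-1] + suffix
    return root + suffix

def generate(root, lexicon=None):
    candidates = [p + root for p in PREFIXES] + [add_suffix(root, s) for s in SUFFIXES]
    # Optionally keep only candidates that appear in a known word list.
    if lexicon is not None:
        candidates = [w for w in candidates if w in lexicon]
    return candidates

print(generate("happy"))
print(generate("happy", lexicon={"unhappy", "happily", "happier"}))
```

The rules fix spelling only. Forms such as "rehappy" and "happied" still pass unless a lexicon rejects them.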
#WEEK 04
Create a sample list of at least 5 words with ambiguous senses and write a Python program to
implement WSD.
• Aim : To implement Word Sense Disambiguation (WSD) using the Lesk algorithm in
Python with the help of the NLTK WordNet corpus, and to identify the correct meaning of
an ambiguous word based on its context in a sentence.
Tools: Python, WordNet (NLTK)
Procedure
1. Import the required modules: lesk from nltk.wsd and wordnet from nltk.corpus.
2. Define a sentence containing an ambiguous word whose meaning depends on context.
3. Specify the target word to be disambiguated.
4. Split the sentence into individual words to form the context.
5. Apply the lesk() function by passing the context words and the target word.
6. Obtain the most appropriate WordNet synset returned by the Lesk algorithm.
7. Display the identified sense along with its definition.
#CODE
import nltk
from nltk.wsd import lesk
from nltk.corpus import wordnet
nltk.download('wordnet')
sentences = [
("The bank will not be open until tomorrow.", "bank"),
("He hit the ball with a bat.", "bat"),
("The plant produces electricity.", "plant"),
("The crane lifted the heavy container.", "crane"),
("She bought a new mouse for her laptop.", "mouse")
]
for sentence, word in sentences:
    # The context is the whitespace-split sentence; lesk() returns a WordNet synset or None
    sense = lesk(sentence.split(), word)
    print("Sentence:", sentence)
    print("Ambiguous Word:", word)
    print("Best Sense:", sense)
    print("Definition:", sense.definition() if sense else "No sense found")
    print("-" * 60)
#OUTPUT
Sentence: The bank will not be open until tomorrow.
Ambiguous Word: bank
Best Sense: Synset('deposit.v.02')
Definition: put into a bank account
------------------------------------------------------------
Sentence: He hit the ball with a bat.
Ambiguous Word: bat
Best Sense: Synset('squash_racket.n.01')
Definition: a small racket with a long handle used for playing squash
------------------------------------------------------------
Sentence: The plant produces electricity.
Ambiguous Word: plant
Best Sense: Synset('plant.v.06')
Definition: put firmly in the mind
------------------------------------------------------------
Sentence: The crane lifted the heavy container.
Ambiguous Word: crane
Best Sense: Synset('grus.n.01')
Definition: a small constellation in the southern hemisphere near Phoenix
------------------------------------------------------------
Sentence: She bought a new mouse for her laptop.
Ambiguous Word: mouse
Best Sense: Synset('mouse.v.02')
Definition: manipulate the mouse of a computer
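NLTK's lesk() chooses the synset whose WordNet definition shares the most words with the context. Here the context is the raw split() output, with capitalization and trailing punctuation intact. That may help explain why some senses above, such as plant.v.06, do not fit.

The following sketch shows the same core idea in plain Python as a simplified Lesk with stop-word filtering. A hypothetical two-sense inventory for "bank" stands in for WordNet. `SENSES`, `STOP` and `simplified_lesk` are illustrative names only.

```python
import re

# Hypothetical mini sense inventory; a real system would use WordNet glosses.
SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits and opens accounts for customers",
        "bank.river": "sloping land beside a body of water such as a river",
    },
}

# Small illustrative stop-word set so that words like "the" do not count as overlap.
STOP = {"a", "an", "the", "of", "and", "or", "that", "for", "to",
        "such", "as", "at", "on", "he", "she"}

def bag(text):
    # Lowercased alphabetic words minus stop words.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}

def simplified_lesk(sentence, word):
    # Returns (best sense, overlap); (None, -1) if the word has no senses.
    context = bag(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context & bag(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

print(simplified_lesk("He sat on the bank of the river and watched the water.", "bank"))
print(simplified_lesk("She opens accounts at the bank for deposits.", "bank"))
```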
#WEEK 05
Create a sample list of at least 10 words, perform POS tagging on it, and find the POS of any given word
Aim : To create a sample list of words and perform Part of Speech (POS) tagging using the NLTK
toolkit in Python and to find the POS tag for any given word.
Tools: Python, NLTK
Procedure
1. Install the NLTK library using pip.
2. Import the required NLTK modules in Python.
3. Download necessary NLTK data packages such as tokenizer and POS tagger.
4. Create a sample list of at least 10 words.
5. Use the pos_tag() function to assign POS tags to each word in the list.
6. Display the words along with their corresponding POS tags.
7. Provide a given word and find its POS tag using the same POS tagging function.
8. Observe and verify the output.
#CODE
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
words = [
"running", "dog", "beautiful", "quickly", "eat",
"computer", "happy", "students", "write", "very"
]
pos_tags = nltk.pos_tag(words)
print("Word\t\tPOS Tag")
print("------------------------")
for word, tag in pos_tags:
    print(f"{word}\t\t{tag}")
given_word = "running"
tag = nltk.pos_tag([given_word])
print(f"\nGiven Word: {given_word}")
print(f"POS Tag : {tag[0][1]}")
#OUTPUT
Word POS Tag
------------------------
running VBG
dog NN
beautiful JJ
quickly RB
eat VBP
computer NN
happy JJ
students NNS
write VBP
very RB
Given Word: running
POS Tag : VBG
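pos_tag() relies on a trained averaged-perceptron model (the averaged_perceptron_tagger_eng resource downloaded above), so it can be hard to see where a tag comes from. The following sketch is a much cruder baseline built on a few hand-written suffix rules. These rules are hypothetical, not the ones used by any NLTK model. The baseline shows why suffixes are useful evidence and also where they fail: "very" falls through to the default NN. NLTK's RegexpTagger accepts a similar list of (pattern, tag) pairs.

```python
import re

# Hypothetical suffix rules; the first pattern that matches decides the tag.
RULES = [
    (r".*ing$", "VBG"),            # gerund / present participle
    (r".*ly$", "RB"),              # adverb
    (r".*(ful|ous|able)$", "JJ"),  # adjective
    (r".*ed$", "VBD"),             # past-tense verb
    (r".*s$", "NNS"),              # plural noun
]

def suffix_tag(word, default="NN"):
    for pattern, tag in RULES:
        if re.match(pattern, word.lower()):
            return tag
    return default  # no rule matched: guess singular noun

words = ["running", "dog", "beautiful", "quickly", "students", "very"]
for w in words:
    print(w, suffix_tag(w))
```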
#WEEK 06
Install the NLTK toolkit and perform stemming
• Aim : To perform stemming with NLTK's built-in implementation of the Porter Stemming algorithm.
• Tools: Python, NLTK
#CODE
from nltk.stem import PorterStemmer
ps = PorterStemmer()
words = ["running", "flies", "jumps", "easily", "fairly"]
stemmed_words = [ps.stem(word) for word in words]
print("Stemmed Words:", stemmed_words)
#OUTPUT
Stemmed Words: ['run', 'fli', 'jump', 'easili', 'fairli']
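The stems 'easili' and 'fairli' come from Porter's step 1c. In Porter's original definition, that step turns a final y into i when the preceding stem contains a vowel. NLTK's default mode uses a slightly different condition, but it gives the same result for these words.

Most other Porter rules are gated by the measure m. The measure counts the vowel-consonant sequences in a stem of the form [C](VC)^m[V]. The following sketch implements both definitions, following Porter's rule that y counts as a vowel after a consonant. `cv_pattern`, `measure` and `step1c` are hypothetical helper names, not part of NLTK's API.

```python
def cv_pattern(word):
    # 'v' for a vowel, 'c' for a consonant; 'y' is a vowel when it follows
    # a consonant (Porter's definition).
    pattern = []
    for i, ch in enumerate(word):
        if ch in "aeiou" or (ch == "y" and i > 0 and pattern[-1] == "c"):
            pattern.append("v")
        else:
            pattern.append("c")
    return "".join(pattern)

def measure(stem):
    # m in [C](VC)^m[V]: count each place where a vowel is followed by a consonant.
    return cv_pattern(stem).count("vc")

def step1c(word):
    # Original Porter step 1c: (*v*) Y -> I
    if word.endswith("y") and "v" in cv_pattern(word[:-1]):
        return word[:-1] + "i"
    return word

for w in ["tree", "trouble", "oaten", "private"]:
    print(w, measure(w))
for w in ["easily", "fairly", "happy", "sky"]:
    print(w, step1c(w))
```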
#WEEK 07
1. Perform morphological analysis using the NLTK library
Aim : To perform lemmatization using the WordNet Lemmatizer in Python and convert inflected
words into their base (dictionary) forms using the appropriate Part-of-Speech (POS) tags.
Tools: Python, NLTK
Procedure:
1. Install and import NLTK library in Python.
2. Download the required datasets, such as wordnet.
3. Input the words for analysis.
4. Perform lemmatization using WordNet Lemmatizer with proper POS mapping.
5. Display the output lemmas.
#CODE
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
# Pair each word with its POS; wordnet.ADJ, wordnet.VERB and wordnet.NOUN are 'a', 'v' and 'n'
words = [("better", wordnet.ADJ), ("running", wordnet.VERB), ("wolves", wordnet.NOUN)]
for word, pos in words:
    print(lemmatizer.lemmatize(word, pos=pos))
#OUTPUT
good
run
wolf
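In practice the POS usually comes from a tagger rather than being typed by hand. The tagger's Penn Treebank tags, such as JJR, VBG and NNS, must then be mapped to the single-letter codes that lemmatize() accepts: 'a', 'v', 'n' and 'r', the values of wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV. The following sketch assumes the common first-letter convention with a noun default. `penn_to_wordnet` is a hypothetical helper name.

```python
def penn_to_wordnet(tag):
    # Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS code.
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # nouns and everything else; lemmatize() also defaults to "n"

# Tags written out by hand here; in the lab they would come from nltk.pos_tag(tokens).
tagged = [("better", "JJR"), ("running", "VBG"), ("wolves", "NNS"), ("quickly", "RB")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
```

Each pair can then be passed to the lemmatizer as `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.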
2. Generate n-grams using the NLTK ngrams utility
#CODE
import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
nltk.download('punkt_tab')  # tokenizer models used by word_tokenize
text = input("Enter the text: ")
n = int(input("Enter the value of n: "))
tokens = word_tokenize(text)
generated_ngrams = list(ngrams(tokens, n))
print(f"\n{n}-grams are:")
for gram in generated_ngrams:
    print(gram)
#OUTPUT
Enter the text: Swarm intelligence is inspired by nature
Enter the value of n: 3
3-grams are:
('Swarm', 'intelligence', 'is')
('intelligence', 'is', 'inspired')
('is', 'inspired', 'by')
('inspired', 'by', 'nature')
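ngrams() slides a window of width n across the token list. A text of k tokens therefore yields k - n + 1 n-grams; here that is 6 - 3 + 1 = 4. The following sketch builds the same sliding window in plain Python with zip. `make_ngrams` is a hypothetical name, and it reproduces only the default, unpadded behavior of nltk.util.ngrams.

```python
def make_ngrams(tokens, n):
    # zip the list with copies of itself shifted by 1, 2, ..., n-1 positions;
    # zip stops at the shortest copy, giving len(tokens) - n + 1 tuples.
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "Swarm intelligence is inspired by nature".split()
for gram in make_ngrams(tokens, 3):
    print(gram)
```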
3. Implement n-gram smoothing (Laplace / add-one smoothing for bigrams)
#CODE
import nltk
from nltk.util import ngrams
from collections import Counter
corpus = [
"I love natural language processing",
"I love machine learning",
"natural language processing is fun"
]
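tokens = []
bigrams = []
for sentence in corpus:
    words = sentence.lower().split()
    tokens.extend(words)
    # Collect bigrams per sentence so that no bigram spans two sentences
    bigrams.extend(ngrams(words, 2))
vocab = set(tokens)
V = len(vocab)  # vocabulary size (9 for this corpus)
bigram_counts = Counter(bigrams)
unigram_counts = Counter(tokens)
def bigram_probability(w1, w2):
    # Laplace (add-one) smoothed P(w2 | w1)
    bigram_count = bigram_counts[(w1, w2)]
    unigram_count = unigram_counts[w1]
    probability = (bigram_count + 1) / (unigram_count + V)
    return probability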
print("P(love | I) =", bigram_probability("i", "love"))
print("P(language | love) =", bigram_probability("love", "language"))
print("P(fun | machine) =", bigram_probability("machine", "fun"))
#OUTPUT
P(love | I) = 0.2727272727272727
P(language | love) = 0.09090909090909091
P(fun | machine) = 0.1
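bigram_probability() applies Laplace (add-one) smoothing, where V is the vocabulary size (9 distinct words in this corpus). Written out, with the first printed value worked through from the corpus counts C(i love) = 2 and C(i) = 2:

```latex
P_{\text{Laplace}}(w_2 \mid w_1) = \frac{C(w_1 w_2) + 1}{C(w_1) + V}

P(\text{love} \mid \text{i}) = \frac{2 + 1}{2 + 9} = \frac{3}{11} \approx 0.2727
```

An unseen bigram such as (machine, fun) still receives (0 + 1) / (1 + 9) = 0.1 instead of zero.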
#WEEK 08
Use Python speech libraries to convert an audio file to text and a text file to audio.
Aim: To implement text-to-audio and audio-to-text conversion using the Python libraries pyttsx3 and
SpeechRecognition.
Procedure
1. Install required Python libraries: pyttsx3 and SpeechRecognition.
2. Import the required modules in Python.
3. Create a function to convert text to speech using pyttsx3.
4. Create another function to convert speech to text using SpeechRecognition.
5. Use a microphone to capture speech input.
6. Run the program and observe the audio output and recognized text.
#CODE
# Install the libraries first (in a terminal, not inside Python):
#   pip install pyttsx3 SpeechRecognition pyaudio
import pyttsx3
import speech_recognition as sr
def text_to_audio(text):
    engine = pyttsx3.init()
    engine.say(text)     # queue the text
    engine.runAndWait()  # speak it and wait until finished
def audio_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio
        print("Speak something...")
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)  # Google Web Speech API (needs internet)
    except sr.UnknownValueError:
        return "Sorry, I could not understand the audio."
    except sr.RequestError:
        return "Network error."
text_to_audio("NLP is interesting.")
result = audio_to_text()
print("You said:", result)
#OUTPUT
The program speaks the sentence:
“NLP is interesting.”
It then listens on the microphone, converts the spoken words into text, and prints them after “You said:”.
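The task title also mentions files. The following file-based sketch uses pyttsx3's save_to_file() and SpeechRecognition's AudioFile and record(). The function names and the file names in the usage line are hypothetical.

The audio format written by save_to_file() depends on the platform's speech driver, while AudioFile reads WAV, AIFF and FLAC. On some systems a conversion step may therefore be needed. The imports sit inside the functions so the definitions load even where the packages are not installed.

```python
def text_file_to_audio_file(text_path, audio_path):
    """Read a UTF-8 text file and save it as speech using pyttsx3."""
    import pyttsx3  # imported here so the module loads without the optional dependency
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    engine = pyttsx3.init()
    engine.save_to_file(text, audio_path)
    engine.runAndWait()  # the file is written when the queued command runs

def audio_file_to_text(audio_path):
    """Transcribe a WAV/AIFF/FLAC file with SpeechRecognition's Google Web Speech backend."""
    import speech_recognition as sr
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return "Sorry, I could not understand the audio."
    except sr.RequestError:
        return "Network error."
```

To round-trip a text file through speech, call `text_file_to_audio_file("input.txt", "output.wav")` and then `print(audio_file_to_text("output.wav"))`. The recognition step needs an internet connection.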