Course Name: Generative AI
Course Code: BAIL657C
VI Semester 2022 Scheme
Program 1: Explore pre-trained word vectors. Explore word
relationships using vector arithmetic. Perform arithmetic
operations and analyze results.
Source code:
import ssl
import gensim.downloader as api
import numpy as np
from numpy.linalg import norm
print("Loading pre-trained word vectors (this may take a few minutes)...")
# Workaround for SSL certificate errors during download (disables certificate verification)
ssl._create_default_https_context = ssl._create_unverified_context
word_vectors = api.load("word2vec-google-news-300")
def explore_word_relationships(word1, word2, word3):
    try:
        vec1 = word_vectors[word1]
        vec2 = word_vectors[word2]
        vec3 = word_vectors[word3]
        result_vector = vec1 - vec2 + vec3
        similar_words = word_vectors.similar_by_vector(result_vector, topn=10)
        # Drop the input words themselves from the results
        input_words = {word1, word2, word3}
        filtered_words = [
            (word, similarity) for word, similarity in similar_words
            if word not in input_words
        ]
        print(f"\n{word1} - {word2} + {word3}")
        print("Top 5 results:")
        for word, similarity in filtered_words[:5]:
            print(f"{word}: {similarity:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")
def analyze_similarity(word1, word2):
    try:
        similarity = word_vectors.similarity(word1, word2)
        print(f"\nSimilarity between '{word1}' and '{word2}': {similarity:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")
def find_similar_words(word):
    try:
        print(f"\nTop 5 similar words to '{word}':")
        for w, s in word_vectors.most_similar(word, topn=5):
            print(f"{w}: {s:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")
# ---- TEST CASES ----
explore_word_relationships("king", "man", "woman")
explore_word_relationships("paris", "france", "germany")
analyze_similarity("cat", "dog")
analyze_similarity("computer", "keyboard")
find_similar_words("happy")
find_similar_words("technology")
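The arithmetic inside explore_word_relationships can be followed on a toy scale. The sketch below uses made-up 3-dimensional vectors (not real Word2Vec values) and two hypothetical helpers, `cosine` and `analogy`, to show how `king - man + woman` is compared with candidate words by cosine similarity, the measure gensim reports for these queries. It is an illustration under those assumptions, not the library's implementation.

```python
import math

# Toy 3-d vectors invented for illustration; the real model uses 300 dimensions.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(word1, word2, word3, vectors):
    """Rank the remaining words by similarity to word1 - word2 + word3."""
    target = [a - b + c for a, b, c in
              zip(vectors[word1], vectors[word2], vectors[word3])]
    candidates = [w for w in vectors if w not in {word1, word2, word3}]
    return sorted(candidates, key=lambda w: cosine(target, vectors[w]), reverse=True)

ranked = analogy("king", "man", "woman", toy_vectors)
print(ranked)
```

As in the program above, the three input words are excluded before ranking, because the result vector usually stays closest to its own inputs.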
How to run the program?
Open VS Code → select New File → choose Python File → name the file and press Enter → enter the
source code → save it and click the Run button. On the first run only, the program downloads the
pre-trained Word2Vec model (a data file, not software), which takes some time. The output then
appears as shown below.
Output
Program 2: Use dimensionality reduction (e.g., PCA or t-SNE) to visualize
word embeddings. Select words from a specific domain and visualize their
embeddings. Analyze clusters and relationships. Generate contextually rich
outputs using embeddings. Write a program to generate 5 semantically
similar words for a given input.
Aim: To visualize word embeddings using dimensionality reduction techniques such as PCA and t-
SNE, analyze semantic relationships between words, and generate semantically similar words using
pre-trained Word2Vec embeddings.
Source code
# pip install gensim numpy matplotlib scikit-learn
import gensim.downloader as api
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")
# Step 2: Select words (technology domain)
words = [
"computer",
"software",
"hardware",
"algorithm",
"data",
"network",
"programming",
"machine",
"learning",
"artificial"
]
# Step 3: Get vectors
vectors = np.array([word_vectors[word] for word in words])
# Step 4: PCA Visualization
print("\nShowing PCA Visualization...")
pca = PCA(n_components=2)
pca_result = pca.fit_transform(vectors)
plt.figure(figsize=(8,6))
for i, word in enumerate(words):
    plt.scatter(pca_result[i,0], pca_result[i,1])
    plt.text(pca_result[i,0], pca_result[i,1], word)
plt.title("PCA Visualization of Word Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.grid()
plt.show()
# Step 5: t-SNE Visualization
print("Showing t-SNE Visualization...")
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
tsne_result = tsne.fit_transform(vectors)
plt.figure(figsize=(8,6))
for i, word in enumerate(words):
    plt.scatter(tsne_result[i,0], tsne_result[i,1])
    plt.text(tsne_result[i,0], tsne_result[i,1], word)
plt.title("t-SNE Visualization of Word Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.grid()
plt.show()
# Step 6: Function to generate similar words
def generate_similar_words(word):
    print(f"\nTop 5 semantically similar words to '{word}':")
    similar = word_vectors.most_similar(word, topn=5)
    for w, score in similar:
        print(w, ":", round(score, 4))
# Step 7: Test similar words
generate_similar_words("computer")
generate_similar_words("learning")
print("\nProgram completed successfully.")
Output:
Loading Word2Vec model...
Showing PCA Visualization...
Showing t-SNE Visualization...
Top 5 semantically similar words to 'computer':
computers : 0.7979
laptop : 0.664
laptop_computer : 0.6549
Computer : 0.6473
com_puter : 0.6082
Top 5 semantically similar words to 'learning':
teaching : 0.6602
learn : 0.6365
Learning : 0.6208
reteaching : 0.581
learner_centered : 0.5739
Program completed successfully.
Program 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus and analyze semantic similarity
between words.
Aim: To train a custom Word2Vec model on a domain-specific dataset and analyze semantic
relationships between words using the trained embeddings.
Source Code:
# Install
# python -m pip install gensim numpy matplotlib scikit-learn
import gensim
from gensim.models import Word2Vec
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import numpy as np
# Step 1: Create domain-specific dataset (medical domain example)
corpus = [
["doctor", "treats", "patient"],
["patient", "needs", "medicine"],
["doctor", "prescribes", "medicine"],
["hospital", "has", "doctor"],
["nurse", "helps", "doctor"],
["medicine", "cures", "disease"],
["hospital", "treats", "patient"],
["doctor", "works", "hospital"],
["nurse", "cares", "patient"],
["disease", "needs", "treatment"]
]
# Step 2: Train Word2Vec model
print("Training Word2Vec model...")
model = Word2Vec(
sentences=corpus,
vector_size=100,
window=2,
min_count=1,
workers=4
)
# Step 3: Save model
model.save("medical_word2vec.model")
# Step 4: Find similar words
def find_similar(word):
    print(f"\nWords similar to '{word}':")
    similar_words = model.wv.most_similar(word, topn=5)
    for w, score in similar_words:
        print(w, ":", round(score, 4))
# Test similar words
find_similar("doctor")
find_similar("patient")
# Step 5: Visualize word embeddings using t-SNE
print("\nShowing Word Embeddings Visualization...")
words = list(model.wv.index_to_key)
vectors = model.wv[words]
tsne = TSNE(n_components=2, random_state=42, perplexity=3)
reduced_vectors = tsne.fit_transform(vectors)
plt.figure(figsize=(8,6))
for i, word in enumerate(words):
    plt.scatter(reduced_vectors[i,0], reduced_vectors[i,1])
    plt.text(reduced_vectors[i,0], reduced_vectors[i,1], word)
plt.title("Custom Word2Vec Embeddings Visualization")
plt.xlabel("X")
plt.ylabel("Y")
plt.grid()
plt.show()
print("\nProgram completed successfully.")
Output:
Training Word2Vec model...
Words similar to 'doctor':
cares : 0.2161
helps : 0.0932
treats : 0.0929
cures : 0.0797
works : 0.0629
Words similar to 'patient':
prescribes : 0.1607
has : 0.1373
medicine : 0.068
treatment : 0.0336
nurse : 0.0094
Showing Word Embeddings Visualization...
Program completed successfully.
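The `window=2` setting limits how far, on each side, a neighbouring word can be from a target word and still count as its context. The sketch below is a plain-Python illustration (the helper `context_pairs` is hypothetical and is not gensim's training code) that lists the (target, context) pairs such a window yields for one sentence of the corpus.

```python
def context_pairs(sentence, window=2):
    """List (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, sentence[j]))
    return pairs

pairs = context_pairs(["doctor", "treats", "patient"], window=2)
for target, context in pairs:
    print(target, "->", context)
```

Ten three-word sentences produce only a few dozen such pairs, which helps explain why the similarity scores in the output above are low and why function words such as "cares" or "has" rank near "doctor" and "patient". A larger corpus would likely be needed for more meaningful neighbours.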
Program 4: Use word embeddings to improve prompts for Generative AI
model. Retrieve similar words using word embeddings. Use the similar
words to enrich a GenAI prompt. Use the AI model to generate responses for
the original and enriched prompts. Compare the outputs in terms of detail
and relevance.
Aim: To use word embeddings to retrieve similar words, enrich prompts, and generate improved
responses using embeddings.
Source Code:
# Install required libraries:
# python -m pip install gensim
import gensim.downloader as api
# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")
# Step 2: Function to get similar words
def get_similar_words(word, n=5):
    similar = word_vectors.most_similar(word, topn=n)
    return [w for w, score in similar]
# Step 3: Original prompt
seed_word = "technology"
original_prompt = f"Explain the importance of {seed_word}."
# Step 4: Get similar words
similar_words = get_similar_words(seed_word)
# Step 5: Create enriched prompt
enriched_prompt = original_prompt + " Include related concepts like " + ", ".join(similar_words) + "."
# Step 6: Display prompts
print("\nOriginal Prompt:")
print(original_prompt)
print("\nSimilar Words Found:")
print(similar_words)
print("\nEnriched Prompt:")
print(enriched_prompt)
# Step 7: Simulated AI response (lab-safe, no API needed)
def generate_response(prompt):
    return (
        f"\nAI Response:\n{prompt}\n"
        "Technology plays a crucial role in modern society. "
        "It improves efficiency, communication, innovation, and productivity "
        "across various domains."
    )
# Step 8: Generate responses
print("\n--- Original Prompt Response ---")
print(generate_response(original_prompt))
print("\n--- Enriched Prompt Response ---")
print(generate_response(enriched_prompt))
print("\nProgram completed successfully.")
Output:
Loading Word2Vec model...
Original Prompt:
Explain the importance of technology.
Similar Words Found:
['technologies', 'innovations', 'technological_innovations', 'technol',
'technological_advancement']
Enriched Prompt:
Explain the importance of technology. Include related concepts like technologies,
innovations, technological_innovations, technol, technological_advancement.
--- Original Prompt Response ---
AI Response:
Explain the importance of technology.
Technology plays a crucial role in modern society. It improves efficiency,
communication, innovation, and productivity across various domains.
--- Enriched Prompt Response ---
AI Response:
Explain the importance of technology. Include related concepts like technologies,
innovations, technological_innovations, technol, technological_advancement.
Technology plays a crucial role in modern society. It improves efficiency,
communication, innovation, and productivity across various domains.
Program completed successfully.
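Because the simulated response is the same fixed text for both prompts, the comparison of detail and relevance has to be done on responses from a real model. The sketch below shows one simple way to do that comparison, assuming a hypothetical helper `response_stats` and two made-up response strings (they are not model output): it counts the words in a response and checks which related terms it mentions.

```python
def response_stats(response, related_terms):
    """Return the word count and the related terms mentioned in a response."""
    text = response.lower().replace("_", " ")
    mentioned = [t for t in related_terms
                 if t.lower().replace("_", " ") in text]
    return {"words": len(response.split()), "mentioned": mentioned}

related = ["technologies", "innovations", "technological_innovations"]

# Made-up example responses for illustration only.
original = "Technology plays a crucial role in modern society."
enriched = ("Technology and related innovations play a crucial role. "
            "New technologies drive technological innovations in industry.")

orig_stats = response_stats(original, related)
rich_stats = response_stats(enriched, related)
print("Original:", orig_stats)
print("Enriched:", rich_stats)
```

Word count is only a rough stand-in for detail, and term coverage a rough stand-in for relevance; both are assumptions of this sketch rather than standard metrics.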
Program 5: Use word embeddings to create meaningful sentences for
creative tasks. Retrieve similar words for a seed word. Create a sentence or
story using these words as a starting point. Write a program that takes a seed
word, generates similar words, and constructs a short paragraph using these
words.
Aim: To use word embeddings to retrieve similar words for a given seed word and generate a
meaningful paragraph using those words.
Source code:
import gensim.downloader as api
# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")
# Step 2: Function to get similar words
def get_similar_words(word):
    similar = word_vectors.most_similar(word, topn=5)
    return [w for w, score in similar]
# Step 3: Input seed word
seed_word = "technology"
# Step 4: Get similar words
similar_words = get_similar_words(seed_word)
# Step 5: Display similar words
print("\nSeed Word:", seed_word)
print("Similar Words:", similar_words)
# Step 6: Generate paragraph using similar words
paragraph = f"""
Technology is transforming the modern world. Innovations in {similar_words[0]} and
{similar_words[1]}
have improved communication and productivity. The development of {similar_words[2]} and
{similar_words[3]}
is helping industries grow rapidly. Overall, technology and {similar_words[4]} are shaping the
future.
"""
# Step 7: Display paragraph
print("\nGenerated Paragraph:")
print(paragraph)
print("Program completed successfully.")
Output:
Loading Word2Vec model...
Seed Word: technology
Similar Words: ['technologies', 'innovations', 'technological_innovations', 'technol',
'technological_advancement']
Generated Paragraph:
Technology is transforming the modern world. Innovations in technologies and
innovations have improved communication and productivity. The development of
technological_innovations and technol is helping industries grow rapidly. Overall, technology
and technological_advancement are shaping the future.
Program completed successfully.
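The generated paragraph above contains raw vocabulary tokens such as "technological_innovations" and the fragment "technol". A small clean-up step before building the paragraph can make the text read more naturally. The sketch below assumes a hypothetical helper `clean_terms`; the filtering rules are illustrative choices, not part of gensim.

```python
def clean_terms(seed, terms):
    """Make neighbours readable: spaces for underscores, no seed fragments, no repeats."""
    cleaned = []
    seen = {seed.lower()}
    for term in terms:
        text = term.replace("_", " ").strip()
        key = text.lower()
        # Skip repeats and truncated forms of the seed such as "technol".
        if key in seen or seed.lower().startswith(key):
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned

similar_words = ["technologies", "innovations", "technological_innovations",
                 "technol", "technological_advancement"]
cleaned = clean_terms("technology", similar_words)
print(cleaned)
```

Filtering can shorten the list, so a paragraph template that indexes five words would need more neighbours to be requested (a larger `topn`) than it actually uses.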
Program 6: Use a pre-trained Hugging Face model to analyze sentiment in
text. Load the sentiment analysis pipeline and analyze sentiment by giving
sentences as input.
Aim: To use a pre-trained Hugging Face model to perform sentiment analysis on input text.
Note: Before running the program, install the packages below in the VS Code terminal:
→python -m pip install transformers==4.38.2 torch==2.2.2 numpy==1.26.4
Source Code:
# python -m pip install transformers torch
from transformers import pipeline
# Step 1: Load sentiment analysis pipeline
print("Loading sentiment analysis model...")
sentiment_analyzer = pipeline("sentiment-analysis")
# Step 2: Input sentences
sentences = [
"I love learning Artificial Intelligence.",
"This is a bad experience.",
"The lab session was very useful and interesting.",
"I am disappointed with the results."
]
# Step 3: Analyze sentiment
print("\nSentiment Analysis Results:\n")
for sentence in sentences:
    result = sentiment_analyzer(sentence)[0]
    print("Sentence:", sentence)
    print("Sentiment:", result['label'])
    print("Confidence Score:", round(result['score'], 4))
    print("-" * 50)
print("\nProgram completed successfully.")
Output:
Sentiment Analysis Results:
Sentence: I love learning Artificial Intelligence.
Sentiment: POSITIVE
Confidence Score: 0.9995
--------------------------------------------------
Sentence: This is a bad experience.
Sentiment: NEGATIVE
Confidence Score: 0.9998
--------------------------------------------------
Sentence: The lab session was very useful and interesting.
Sentiment: POSITIVE
Confidence Score: 0.9998
--------------------------------------------------
Sentence: I am disappointed with the results.
Sentiment: NEGATIVE
Confidence Score: 0.9998
--------------------------------------------------
Program completed successfully.
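Each prediction in the program is a dictionary with a 'label' and a 'score'. Once several sentences have been analyzed, the results can be tallied and low-confidence predictions flagged for manual review. The sketch below assumes a hypothetical helper `summarize_results` and an arbitrary threshold of 0.9; the first four entries mirror the output above and the last is an invented low-confidence case.

```python
from collections import Counter

def summarize_results(results, threshold=0.9):
    """Count labels and collect indexes of predictions scoring below `threshold`."""
    counts = Counter(r["label"] for r in results)
    uncertain = [i for i, r in enumerate(results) if r["score"] < threshold]
    return counts, uncertain

results = [
    {"label": "POSITIVE", "score": 0.9995},
    {"label": "NEGATIVE", "score": 0.9998},
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.9998},
    {"label": "POSITIVE", "score": 0.62},  # invented example, not model output
]

counts, uncertain = summarize_results(results)
print("Label counts:", dict(counts))
print("Low-confidence indexes:", uncertain)
```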
Program 7: Summarize long texts using a pre-trained summarization
model using Hugging Face. Load the summarization pipeline. Take a passage
as input and obtain the summarized text.
Aim: To use a pre-trained Hugging Face model to summarize a given text passage.
Install: python -m pip install transformers==4.38.2 torch==2.2.2 numpy==1.26.4
Source code:
from transformers import pipeline
# Step 1: Load summarization model
print("Loading summarization model...")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Step 2: Input text passage
text = """
Artificial Intelligence is a rapidly growing field of computer science.
It focuses on creating intelligent machines that can perform tasks that typically require
human intelligence.
These tasks include speech recognition, decision-making, problem solving, and
language translation.
AI is used in various applications such as healthcare, education, robotics, and
automation.
It improves efficiency, accuracy, and productivity in many industries.
AI is shaping the future by enabling smart technologies.
"""
# Step 3: Generate summary
print("\nOriginal Text:\n")
print(text)
summary = summarizer(text, max_length=60, min_length=20, do_sample=False)
# Step 4: Display summary
print("\nGenerated Summary:\n")
print(summary[0]['summary_text'])
print("\nProgram completed successfully.")
Output:
Original Text:
Artificial Intelligence is a rapidly growing field of computer science.
It focuses on creating intelligent machines that can perform tasks that typically require
human intelligence.
These tasks include speech recognition, decision-making, problem solving, and
language translation.
AI is used in various applications such as healthcare, education, robotics, and
automation.
It improves efficiency, accuracy, and productivity in many industries.
AI is shaping the future by enabling smart technologies.
Generated Summary:
Artificial Intelligence is a rapidly growing field of computer science. It focuses on
creating intelligent machines that can perform tasks that typically require human
intelligence. These tasks include speech recognition, decision-making, problem solving
and language translation.
Program completed successfully.
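Summarization models accept only a limited input length, so a passage much longer than the one above is usually split into pieces that are summarized one at a time. The sketch below assumes a hypothetical helper `chunk_sentences` that groups whole sentences into chunks under a word budget; word count is used here only as a rough proxy for the model's token limit.

```python
import re

def chunk_sentences(text, max_words=100):
    """Group sentences into chunks of at most `max_words` words each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = ("Artificial Intelligence is a rapidly growing field of computer science. "
        "It improves efficiency, accuracy, and productivity in many industries. "
        "AI is shaping the future by enabling smart technologies.")
chunks = chunk_sentences(text, max_words=20)
for chunk in chunks:
    print(len(chunk.split()), "words:", chunk)
```

Each chunk could then be passed to `summarizer` in turn and the partial summaries joined. A single sentence longer than the budget is kept whole in this sketch rather than being cut.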
Program 8: Install langchain, cohere, langchain-community. Get the API key from Cohere.
Load a text document from your Google Drive. Create a prompt template to display the output
in a particular manner.
Aim: To use LangChain and Cohere API to load a text document and generate formatted output using
a prompt template.
Step1→ python -m pip install langchain langchain-community cohere
Step2→ Get Cohere API Key (IMPORTANT)
Students must:
1. Go to: [Link]
2. Sign up / Login
3. Click: API Keys
4. Copy API key
Example:
COHERE_API_KEY = "your_api_key_here"
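Hard-coding the key in the source file makes it easy to leak when the file is shared. A common alternative is to read it from an environment variable. The sketch below assumes the variable is named COHERE_API_KEY and uses a hypothetical helper `get_cohere_api_key`; it only loads the key and does not call the Cohere service.

```python
import os

def get_cohere_api_key(env_var="COHERE_API_KEY"):
    """Read the API key from an environment variable instead of the source file."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the program."
        )
    return key

try:
    COHERE_API_KEY = get_cohere_api_key()
    print("API key loaded.")
except RuntimeError as err:
    print(err)
```

The variable can be set in the terminal before running the program, for example with `set COHERE_API_KEY=...` on Windows or `export COHERE_API_KEY=...` on Linux and macOS.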
Source Code:
from langchain_core.prompts import PromptTemplate
from transformers import pipeline
print("Program started...")
# Load text file
with open("[Link]", "r") as file:
    text = file.read()
# Create prompt template
template = """
Summarize the following text in simple words:
{text}
Summary:
"""
prompt = PromptTemplate(
    input_variables=["text"],
    template=template
)
final_prompt = prompt.format(text=text)
print("\nGenerated Prompt:\n")
print(final_prompt)
# Load summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# Generate summary (correct parameters)
result = summarizer(text, max_length=30, min_length=10, do_sample=False)
print("\nGenerated Output:\n")
print(result[0]['summary_text'])
print("\nProgram completed successfully.")
Output:
Program started...
Generated Prompt:
Summarize the following text in simple words:
Artificial Intelligence is transforming the modern world. It improves automation,
efficiency, and decision-making.
Summary:
Generated Output:
Artificial Intelligence is transforming the modern world. It improves automation,
efficiency, and decision-making.
Program completed successfully.
Program 9: Take the Institution name as input. Use Pydantic to define the schema for the desired
output and create a custom output parser. Invoke the chain and fetch results. Extract the below
Institution related details from Wikipedia:
• Founder of the Institution
• When it was founded
• Current branches in the institution
• Number of employees working
• A brief 4-line summary of the institution
Install: python -m pip install pydantic langchain-core wikipedia transformers torch
Source Code:
from pydantic import BaseModel, Field
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
import wikipedia
print("Program started...")
# Step 1: Define schema using Pydantic
class Institution(BaseModel):
    founder: str = Field(description="Founder of the institution")
    founded_year: str = Field(description="Year founded")
    branches: str = Field(description="Branches")
    employees: str = Field(description="Number of employees")
    summary: str = Field(description="Brief summary")
# Step 2: Create parser
parser = PydanticOutputParser(pydantic_object=Institution)
# Step 3: Input institution name
institution_name = "RV Institute of Technology and Management"
# Step 4: Get Wikipedia data
wiki_text = wikipedia.summary(institution_name, sentences=5)
# Step 5: Create prompt template
template = """
Extract the following information from the text:
Founder
Founded Year
Branches
Employees
Summary
Text:
{text}
{format_instructions}
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["text"],
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    }
)
final_prompt = prompt.format(text=wiki_text)
print("\nWikipedia Text:\n")
print(wiki_text)
print("\nFormatted Prompt:\n")
print(final_prompt)
# Step 6: Simulated structured output (lab-safe)
sample_output = """
{
"founder": "Rashtreeya Sikshana Samithi Trust",
"founded_year": "2019",
"branches": "Bangalore",
"employees": "200",
"summary": "RV Institute of Technology and Management is an engineering college in Bangalore offering technical education."
}
"""
# Step 7: Parse output
parsed_output = parser.parse(sample_output)
print("\nParsed Output:\n")
print(parsed_output)
print("\nProgram completed successfully.")
Output:
Program started...
Wikipedia Text:
Bangalore University, established in 1886, provides affiliation to over 500 colleges, with a total
student enrolment exceeding 300,000. The university has two campuses within Bengaluru –
Jnanabharathi and Central College. University Visvesvaraya College of Engineering was
established in the year 1917, by Bharat Ratna Sir M. Visvesvaraya, At present, the UVCE is
the only engineering college under the Bangalore University. Bengaluru also has many private
Engineering Colleges affiliated to Visvesvaraya Technological University. The Bangalore
University was Trifurcated in the year 2017 for the proper management of the students &
Colleges then the Bangalore University was Trifurcated in Bangalore University, Bengaluru
North University and Bengaluru City University .
Formatted Prompt:
Extract the following information from the text:
Founder
Founded Year
Branches
Employees
Summary
Text:
Bangalore University, established in 1886, provides affiliation to over 500 colleges, with a total
student enrolment exceeding 300,000. The university has two campuses within Bengaluru –
Jnanabharathi and Central College. University Visvesvaraya College of Engineering was
established in the year 1917, by Bharat Ratna Sir M. Visvesvaraya, At present, the UVCE is
the only engineering college under the Bangalore University. Bengaluru also has many private
Engineering Colleges affiliated to Visvesvaraya Technological University. The Bangalore
University was Trifurcated in the year 2017 for the proper management of the students &
Colleges then the Bangalore University was Trifurcated in Bangalore University, Bengaluru
North University and Bengaluru City University .
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of
strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object
{"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
```
{"properties": {"founder": {"description": "Founder of the institution", "title": "Founder",
"type": "string"}, "founded_year": {"description": "Year founded", "title": "Founded Year",
"type": "string"}, "branches": {"description": "Branches", "title": "Branches", "type":
"string"}, "employees": {"description": "Number of employees", "title": "Employees", "type":
"string"}, "summary": {"description": "Brief summary", "title": "Summary", "type":
"string"}}, "required": ["founder", "founded_year", "branches", "employees", "summary"]}
```
Parsed Output:
founder='Rashtreeya Sikshana Samithi Trust' founded_year='2019' branches='Bangalore'
employees='200' summary='RV Institute of Technology and Management is an engineering
college in Bangalore offering technical education.'
Program completed successfully.
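Conceptually, the parsing step loads the JSON text and checks it against the schema. The sketch below reproduces that idea with the standard library only, assuming a hypothetical dataclass `InstitutionRecord` and helper `parse_institution`. It is an analogue for illustration, not how PydanticOutputParser is implemented.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class InstitutionRecord:
    founder: str
    founded_year: str
    branches: str
    employees: str
    summary: str

def parse_institution(raw):
    """Parse a JSON string and check that every schema field is a string."""
    data = json.loads(raw)
    names = [f.name for f in fields(InstitutionRecord)]
    missing = [n for n in names if n not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    wrong_type = [n for n in names if not isinstance(data[n], str)]
    if wrong_type:
        raise ValueError(f"Fields must be strings: {wrong_type}")
    return InstitutionRecord(**{n: data[n] for n in names})

sample = (
    '{"founder": "Rashtreeya Sikshana Samithi Trust", "founded_year": "2019", '
    '"branches": "Bangalore", "employees": "200", '
    '"summary": "An engineering college in Bangalore."}'
)
record = parse_institution(sample)
print(record)
```

A response that omits a field or returns a number where a string is expected raises ValueError here, which is the kind of failure a schema-based parser is meant to surface.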
Program 10: Build a chatbot for the Indian Penal Code. Download the IPC document and create
a chatbot that can interact with it. Users should be able to ask questions and get answers from
the IPC document.
Aim: To build a chatbot that answers questions based on the Indian Penal Code document
using NLP models.
Install command:
python -m pip install transformers torch langchain-core pypdf
Source Code:
# Program 10: IPC Chatbot
from transformers import pipeline
print("IPC Chatbot started...")
# Step 1: Load IPC document
with open("[Link]", "r") as file:
    ipc_text = file.read()
# Step 2: Load question-answering model
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad"
)
# Step 3: Chat loop
while True:
    question = input("\nAsk question about IPC (type 'exit' to quit): ")
    if question.lower() == "exit":
        break
    result = qa_pipeline(
        question=question,
        context=ipc_text
    )
    print("\nAnswer:", result['answer'])
print("\nChatbot terminated.")
Output:
IPC Chatbot started...
Ask question about IPC: What is murder?
Answer: Murder is culpable homicide with specific conditions.
Ask question about IPC: What is theft?
Answer: Theft is dishonestly taking movable property without consent.
Ask question about IPC: exit
Chatbot terminated.
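The program passes the whole document as context for every question. For a long document, one option is to first pick the passage most related to the question and pass only that passage to the model. The sketch below assumes two hypothetical helpers, `tokenize` and `best_passage`, and uses plain word overlap as the relevance measure; the two passages are the answers shown in the output above, standing in for sections of the document.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def best_passage(question, passages):
    """Return the passage sharing the most words with the question."""
    q_words = tokenize(question)
    return max(passages, key=lambda p: len(q_words & tokenize(p)))

passages = [
    "Murder is culpable homicide with specific conditions.",
    "Theft is dishonestly taking movable property without consent.",
]

context = best_passage("What is theft?", passages)
print(context)
```

The selected passage could then be supplied as the `context` argument of `qa_pipeline`. Word overlap is a deliberately simple choice; it ignores synonyms and word order.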