AIET/IQAC/Aca/24-25/CFF
AKASH INSTITUTE OF ENGINEERING AND TECHNOLOGY
DEVANAHALLI BENGALURU- 562110
GENERATIVE AI LAB MANUAL [BAIL657C]
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
By,
Ms. Meghana
Assistant Professor
Department of Computer Science and Engineering (CSE, ISE, AIML, AIDS, DS)
AKASH INSTITUTE OF ENGINEERING & TECHNOLOGY
Experiment 1: Exploring Pre-trained Word Vectors and Word
Relationships Using Vector Arithmetic.
Objective:
• To understand pre-trained word vectors and how they represent words as numbers in a
continuous space.
• To explore word relationships using vector arithmetic.
• To perform arithmetic operations on word vectors and analyze the results using simple examples.
In this experiment, we will learn about pre-trained word vectors and how they help us represent words in
a way that computers can understand. These vectors capture the meaning and context of words. For
example, the word "apple" can be represented as a set of numbers that encode its meaning. Words with
similar meanings will have similar vectors.
We will also explore vector arithmetic, which is a way to perform mathematical operations on these word
vectors to discover relationships between words.
Example:
If you subtract the vector for "cat" from the vector for "kitten" and add the vector for "dog," you get a
vector close to the one for "puppy," the word for a young dog.
What Are Pre-trained Word Vectors?
Pre-trained word vectors are created by training models on large text datasets. Each word is mapped to a
numerical vector, typically with 100 to 300 dimensions, which captures the meaning and context of the
word.
Why Use Pre-trained Word Vectors?
• Efficient: No need to train a model from scratch.
• Context-Aware: Similar words are close to each other in the vector space.
• Useful for NLP Tasks: Such as translation, sentiment analysis, and question-answering.
Example:
The word "banana" might be represented as a vector like this:
[0.4, -0.7, 0.1, ..., 0.9]
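To make "similar vectors" concrete, the sketch below measures cosine similarity, a common way to compare word vectors. The 4-dimensional vectors are made up for illustration and are not taken from any pre-trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors, for illustration only
toy_vectors = {
    "banana": [0.4, -0.7, 0.1, 0.9],
    "mango": [0.5, -0.6, 0.2, 0.8],
    "laptop": [-0.8, 0.3, 0.9, -0.1],
}

fruit_score = cosine_similarity(toy_vectors["banana"], toy_vectors["mango"])
other_score = cosine_similarity(toy_vectors["banana"], toy_vectors["laptop"])
print("banana vs mango :", round(fruit_score, 3))
print("banana vs laptop:", round(other_score, 3))
```

With these toy values the two fruits score close to 1, while "banana" and "laptop" score below 0, which is the pattern real embeddings tend to show for related and unrelated words.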
Vector Arithmetic in Word Vectors
Vector arithmetic allows us to perform mathematical operations on word vectors. By adding or subtracting
vectors, we can reveal hidden relationships between words.
Example:
The relationship "cub is to lion as puppy is to dog" links a young animal to its adult form. It can be
written as the following equation:
Vector("cub") ≈ Vector("lion") - Vector("adult") + Vector("young")
Word Relationships with Real-Time Examples
Example 1: Animal Relationships
• Vector("kitten") - Vector("cat") + Vector("dog") ≈ Vector("puppy")
Example 2: Fruit Relationships
• Vector("orange") - Vector("fruit") + Vector("tropical") ≈ Vector("mango")
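The arithmetic above can be sketched without downloading a model. The example below uses hand-made 3-dimensional vectors whose axes loosely stand for "feline", "canine", and "young"; both the vectors and the helper name analogy are assumptions for illustration. A library such as Gensim normalizes vectors and works with far more dimensions, so this only shows the idea.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(vectors, positive, negative):
    """Return the word closest to sum(positive) - sum(negative), excluding the input words."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

# Hand-made toy vectors: [feline, canine, young]
toy_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [1.0, 0.0, 1.0],
    "dog": [0.0, 1.0, 0.0],
    "puppy": [0.0, 1.0, 1.0],
    "tree": [0.2, 0.1, 0.0],
}

answer = analogy(toy_vectors, positive=["kitten", "dog"], negative=["cat"])
print("kitten - cat + dog ≈", answer)  # prints: kitten - cat + dog ≈ puppy
```

Excluding the input words from the candidates matters: without it, the nearest word to the result is often one of the words that went into the equation.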
Sample Program: Exploring Animal and Fruit Relationships
# Install Gensim if not already installed
!pip install gensim
# Load pre-trained GloVe vectors (100-dimensional)
from gensim.downloader import load
word_vectors = load('glove-wiki-gigaword-100') # Automatically downloads the model on first use
# Example 1: Animal relationship (kitten is to cat as puppy is to dog)
result = word_vectors.most_similar(positive=['kitten', 'dog'], negative=['cat'], topn=1)
print("Result of 'kitten - cat + dog':", result[0][0]) # Expected output: 'puppy' or a related word
# Example 2: Fruit relationship (orange is a fruit, mango is a tropical fruit)
result = word_vectors.most_similar(positive=['orange', 'tropical'], negative=['fruit'], topn=1)
print("Result of 'orange - fruit + tropical':", result[0][0]) # Expected output: 'mango' or a related word
OUTPUT
Experiment 2: Visualizing Word Embeddings and Generating
Semantically Similar Words.
Objective:
• To visualize word embeddings using dimensionality reduction techniques like PCA or t-SNE.
• To select 10 words from a specific domain (e.g., sports, technology) and analyze the clusters and
relationships between them.
• To generate contextually rich outputs by finding semantically similar words using pre-trained word
embeddings.
Dimensionality Reduction for Word Embeddings
Word embeddings like GloVe or Word2Vec represent words in high-dimensional spaces (usually 100 to
300 dimensions). Dimensionality reduction techniques help us visualize these high-dimensional
embeddings in a 2D or 3D space. This makes it easier to observe clusters and relationships between words.
Techniques:
1. Principal Component Analysis (PCA): A linear method to reduce dimensions while preserving
maximum variance.
2. t-SNE (t-Distributed Stochastic Neighbour Embedding): A non-linear method that captures local
structure and forms better clusters for visualization.
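To show what "preserving maximum variance" means, the sketch below finds the first principal component of a few made-up 3-dimensional points using power iteration on the covariance matrix, then projects the points onto it. This is a teaching sketch only: the data and the function name first_principal_component are assumptions, and in practice a library routine such as sklearn.decomposition.PCA would be used instead.

```python
import math

def first_principal_component(points, iterations=50):
    """Estimate the direction of maximum variance by power iteration."""
    n, dim = len(points), len(points[0])
    # Centre the data so that variance is measured around the mean
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centred = [[p[d] - mean[d] for d in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim)
    cov = [[sum(row[i] * row[j] for row in centred) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    # Repeated multiplication by the covariance matrix pulls the vector
    # towards the eigenvector with the largest eigenvalue
    direction = [1.0] * dim
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    return centred, direction

# Made-up 3-dimensional "embeddings" that vary mostly along one diagonal
points = [[1.0, 1.1, 0.0], [2.0, 1.9, 0.1], [3.0, 3.2, 0.0], [4.0, 3.9, 0.1]]
centred, direction = first_principal_component(points)

# Project each point onto the principal direction to get a 1-D coordinate
coords = [sum(c * d for c, d in zip(row, direction)) for row in centred]
print("Principal direction:", [round(x, 3) for x in direction])
print("1-D coordinates    :", [round(x, 3) for x in coords])
```

The third axis barely changes across the points, so the principal direction gives it almost no weight. Dropping such low-variance directions is how PCA reduces dimensions with little loss of information.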
Real-Time Visualization and Semantic Similarity Generation
Step 1: Visualize 10 Words from a Specific Domain
We will select 10 words from the technology domain and visualize their embeddings using t-SNE.
Step 2: Generate 5 Semantically Similar Words for a Given Input
Given an input word, we will use pre-trained word vectors to find the 5 most semantically similar words.
Program
# Install required libraries
!pip install gensim matplotlib scikit-learn numpy
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.downloader import load
import numpy as np # Import NumPy for array conversion
# Load pre-trained word vectors (GloVe - 100 dimensions)
word_vectors = load('glove-wiki-gigaword-100')
# Select 10 words from the "technology" domain (ensure words exist in the model)
tech_words = ['computer', 'internet', 'software', 'hardware', 'network', 'data', 'cloud', 'robot', 'algorithm',
              'technology']
tech_words = [word for word in tech_words if word in word_vectors.key_to_index]
# Extract word vectors and convert to a NumPy array
vectors = np.array([word_vectors[word] for word in tech_words])
# Reduce dimensions using t-SNE
# Perplexity must be smaller than the number of samples, so it is reduced to suit the small sample size
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_vectors = tsne.fit_transform(vectors)
# Plot the 2D visualization
plt.figure(figsize=(10, 6))
for i, word in enumerate(tech_words):
    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1], label=word)
    plt.text(reduced_vectors[i, 0] + 0.02, reduced_vectors[i, 1] + 0.02, word, fontsize=12)
plt.title("t-SNE Visualization of Technology Words")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.legend()
plt.show()
# Generate 5 semantically similar words for a given input word
input_word = 'computer'
if input_word in word_vectors.key_to_index:
    similar_words = word_vectors.most_similar(input_word, topn=5)
    print(f"5 words similar to '{input_word}':")
    for word, similarity in similar_words:
        print(f"{word} (similarity: {similarity:.2f})")
else:
    print(f"'{input_word}' is not in the vocabulary.")
OUTPUT
Experiment 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus (e.g., legal, medical) and analyze
how embeddings capture domain-specific semantics.
Objective:
1. Train a custom Word2Vec model on a small domain-specific dataset (medical text).
2. Analyze how the embeddings capture domain-specific word relationships.
3. Generate similar words for a given input to observe how the model learned from the
domain-specific data.
Program
# Install required library
!pip install gensim
from gensim.models import Word2Vec
# Step 1: Create a small dataset (list of medical-related word lists)
medical_data = [
    ["patient", "doctor", "nurse", "hospital", "treatment"],
    ["cancer", "chemotherapy", "radiation", "surgery", "recovery"],
    ["infection", "antibiotics", "diagnosis", "disease", "virus"],
    ["heart", "disease", "surgery", "cardiology", "recovery"]
]
# Step 2: Train a Word2Vec model
model = Word2Vec(sentences=medical_data, vector_size=10,
                 window=2, min_count=1, workers=1, epochs=50)
# Step 3: Find similar words for a given input word
input_word = "patient"
if input_word in model.wv.key_to_index:
    similar_words = model.wv.most_similar(input_word, topn=3)
    print(f"3 words similar to '{input_word}':")
    for word, similarity in similar_words:
        print(f"{word} (similarity: {similarity:.2f})")
else:
    print(f"'{input_word}' is not in the vocabulary.")
Output:
What This Code Does:
1. Creates a small medical dataset using lists of related words.
2. Trains a Word2Vec model to learn relationships between these words.
3. Finds 3 words similar to the input word, showing how well the model captures relationships.
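The window parameter decides which neighbouring words count as context during training. The sketch below lists the (target, context) pairs that a window of 2 produces for one of the word lists, in the skip-gram style. It is a simplified illustration with a hypothetical helper name (context_pairs); Gensim's actual training differs in detail, for example it uses the CBOW variant by default and may shrink the window randomly.

```python
def context_pairs(sentence, window):
    """Return (target, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, sentence[j]))
    return pairs

sentence = ["patient", "doctor", "nurse", "hospital", "treatment"]
pairs = context_pairs(sentence, window=2)
for target, context in pairs:
    print(target, "->", context)
print("Total pairs:", len(pairs))  # prints: Total pairs: 14
```

"patient" is paired with "doctor" and "nurse" but not with "hospital", which lies three positions away. Words that share many context words across the dataset end up with similar vectors.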
Experiment 4: Use word embeddings to improve prompts for Generative
AI model. Retrieve similar words using word embeddings. Use the similar
words to enrich a GenAI prompt. Use the AI model to generate responses
for the original and enriched prompts. Compare the outputs in terms of
detail and relevance.
When interacting with Generative AI models (like GPT), the quality of the output often depends on how
well the input prompt is framed. Enhancing prompts using word embeddings helps improve the model's
understanding and provides more contextually rich and detailed responses.
Here’s how we can enhance prompts using Word2Vec embeddings:
Use Word Embeddings:
Word embeddings represent words as vectors in a continuous vector space. Words with similar meanings
have similar vector representations. For example, the word "AI" might be similar to "machine learning"
or "artificial intelligence."
Retrieve Similar Words:
By training or using pre-trained word embeddings, we can find words that are semantically close to the
original prompt. These similar words help make the prompt richer.
Example:
Original Prompt: "Explain the impact of AI on technology."
Enriched Prompt: "Explain the impact of AI, machine learning, deep learning, and data science on
technology."
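The enrichment shown in this example can be sketched as a small string operation. The helper names join_terms and enrich are hypothetical, and the related terms are hand-picked here; in a full solution they would come from a word-embedding lookup.

```python
def join_terms(terms):
    """Join terms as 'a', 'a and b', or 'a, b, and c'."""
    if len(terms) == 1:
        return terms[0]
    if len(terms) == 2:
        return " and ".join(terms)
    return ", ".join(terms[:-1]) + ", and " + terms[-1]

def enrich(prompt, term, similar_terms):
    """Replace the first occurrence of `term` with the term followed by its related terms."""
    return prompt.replace(term, join_terms([term] + similar_terms), 1)

original = "Explain the impact of AI on technology."
related = ["machine learning", "deep learning", "data science"]  # hand-picked for illustration
enriched = enrich(original, "AI", related)

print("Original:", original)
print("Enriched:", enriched)
```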
Generate Responses:
Use a Generative AI model (e.g., OpenAI GPT) to generate responses for both the original and enriched
prompts.
Comparison: The enriched prompt will usually yield a more detailed and relevant response.
Program
# Step 1: Pre-defined dictionary of words and their similar terms (static word embeddings)
word_embeddings = {
"ai": ["machine learning", "deep learning", "data science"],
"data": ["information", "dataset", "analytics"],
"science": ["research", "experiment", "technology"],
"learning": ["education", "training", "knowledge"],
"robot": ["automation", "machine", "mechanism"]
}
# Step 2: Function to find similar words using the static dictionary
def find_similar_words(word):
    if word in word_embeddings:
        return word_embeddings[word]
    else:
        return []
# Step 3: Function to enrich a prompt with similar words
def enrich_prompt(prompt):
    words = prompt.lower().split()
    enriched_words = []
    for word in words:
        # Strip punctuation so that "science." still matches the key "science"
        similar_words = find_similar_words(word.strip(".,!?;:"))
        if similar_words:
            enriched_words.append(f"{word} ({', '.join(similar_words)})")
        else:
            enriched_words.append(word)
    return " ".join(enriched_words)
# Step 4: Original prompt
original_prompt = "Explain AI and its applications in science."
# Step 5: Enrich the prompt using similar words
enriched_prompt = enrich_prompt(original_prompt)
# Step 6: Print the original and enriched prompts
print("Original Prompt:")
print(original_prompt)
print("\nEnriched Prompt:")
print(enriched_prompt)
OUTPUT
Experiment 5: Use word embeddings to create meaningful sentences for
creative tasks. Retrieve similar words for a seed word. Create a sentence or
story using these words as a starting point. Write a program that: Takes a
seed word. Generates similar words. Constructs a short paragraph using
these words.
Program
#Step 1: Pre-defined dictionary of words and their similar terms
word_embeddings = {
"adventure": ["journey", "exploration", "quest"],
"robot": ["machine", "automation", "mechanism"],
"forest": ["woods", "jungle", "wilderness"],
"ocean": ["sea", "waves", "depths"],
"magic": ["spell", "wizardry", "enchantment"]
}
#Step 2: Function to get similar words for a seed word
def get_similar_words(seed_word):
    if seed_word in word_embeddings:
        return word_embeddings[seed_word]
    else:
        return [] # An empty list signals that no similar words were found
#Step 3: Function to create a short paragraph using seed word and similar words
def create_paragraph(seed_word):
    similar_words = get_similar_words(seed_word)
    if not similar_words:
        return f"Sorry, I couldn't find similar words for '{seed_word}'."
    #Construct a short story using the seed word and similar words
    paragraph = (
        f"Once upon a time, there was a great {seed_word}. "
        f"It was full of {', '.join(similar_words[:-1])}, and {similar_words[-1]}. "
        f"Everyone who experienced this {seed_word} always remembered it as a remarkable tale."
    )
    return paragraph
#Step 4 : Input a seed word
seed_word = "adventure"
story = create_paragraph(seed_word)
#Step 5: Generate and print the Paragraph
print("Generated paragraph:")
print(story)
OUTPUT
What This Program Does:
1. Uses a static dictionary of word embeddings to find similar words for a given seed word.
2. Constructs a short paragraph using the seed word and its similar words.
3. Prints the paragraph, creating a small story based on the seed words.
Experiment 6: Use a pre-trained Hugging Face model to analyze
sentiment in text. Assume a real-world application. Load the
sentiment analysis pipeline. Analyze the sentiment by giving
sentences as input.
PROGRAM:
#Step 1: Install and import the necessary library
!pip install transformers
from transformers import pipeline
#Step 2: Load the sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis")
#Step 3: Define sample sentences for analysis
sentences = [
    "I love using this product! It makes my life so much easier.",
    "The service was terrible, & I'm very disappointed.",
    "It's an average experience, nothing special but not bad either."
]
#Step 4: Analyze the sentiment for each sentence
for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print("Sentence:", sentence)
    print("Sentiment:", result[0]["label"], "(score:", round(result[0]["score"], 2), ")")
    print()
OUTPUT
What This Program Does:
1. Loads a pre-trained Hugging Face model for sentiment analysis.
2. Analyzes the sentiment of sample sentences.
3. Prints the sentiment label (POSITIVE or NEGATIVE for the default model) along with a confidence score.
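As far as we know, the default English checkpoint loaded by the sentiment pipeline is a binary classifier, so a mixed sentence such as the third sample still receives POSITIVE or NEGATIVE. One possible workaround, sketched below under that assumption, is to treat low-confidence predictions as NEUTRAL. The function name to_three_way and the 0.75 threshold are hypothetical. The sample dictionaries are hand-written in the shape the pipeline returns and are not real model output.

```python
def to_three_way(result, threshold=0.75):
    """Map a {'label', 'score'} prediction to POSITIVE, NEGATIVE, or NEUTRAL."""
    if result["score"] < threshold:
        return "NEUTRAL"  # the model is not confident either way
    return result["label"]

# Illustrative predictions only; real scores depend on the model and the input
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.62},
]

labels = [to_three_way(r) for r in sample_results]
print(labels)  # prints: ['POSITIVE', 'NEGATIVE', 'NEUTRAL']
```

The threshold is a design choice that should be tuned on labelled examples from the target application. A model trained with an explicit neutral class is usually the better option when one is available.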
Experiment 7: Summarize long texts using a pre-trained Hugging Face
summarization model. Load the summarization pipeline.
Take a passage as input and obtain the summarized text.
!pip uninstall transformers -y
!pip install transformers==4.40.2
#Step 1: Import the Hugging Face pipeline
from transformers import pipeline
#Step 2: Load the Summarization pipeline
summarizer = pipeline("summarization")
#Step 3: Input a long passage for Summarization
long_text = """Artificial Intelligence (AI) is transforming various industries by automating tasks,
improving efficiency, and enabling new capabilities. In the healthcare sector, AI is used for disease
diagnosis, personalized medicine, and drug discovery. In the business world, AI-powered systems are
optimizing customer service, fraud detection, and supply chain management. AI's impact on everyday
life is significant, from smart assistants to recommendation systems in streaming platforms. As AI
continues to evolve, it promises even greater advancements in fields like education, transportation, and
environmental sustainability."""
# Step 4: Summarize the input passage
summary = summarizer(long_text, max_length=50, min_length=20,
do_sample=False)[0]["summary_text"]
#Step 5: Print the Summarized text
print("Summarized Text:")
print(summary)
OUTPUT
What This Program Does:
1. Uses Hugging Face's pipeline("summarization") to load a pre-trained summarization model.
2. Processes a long text passage and reduces it to a concise summary.
3. Prints the summarized version, which highlights the key points.
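The max_length and min_length arguments bound the summary length in tokens, so fixed values may not suit passages of very different sizes. The sketch below derives the two bounds from the word count of the input. The helper summary_bounds and its ratios are assumptions for illustration, and the word count is only a rough proxy for the token count the model actually uses.

```python
def summary_bounds(text, max_ratio=0.5, min_ratio=0.2, floor=10):
    """Suggest (min_length, max_length) from the number of words in `text`."""
    n_words = len(text.split())
    max_length = max(floor, int(n_words * max_ratio))
    # Keep min_length strictly below max_length and at least 1
    min_length = max(1, min(int(n_words * min_ratio), max_length - 1))
    return min_length, max_length

passage = " ".join(["word"] * 100)  # stand-in for a 100-word passage
min_len, max_len = summary_bounds(passage)
print(min_len, max_len)  # prints: 20 50
```

The two values could then be passed as min_length and max_length when calling the summarizer in place of the fixed 20 and 50.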
Experiment 8: Install langchain, cohere (for key), langchain-community.
Get the api key (By logging into Cohere and obtaining the cohere key).
Load a text document from your google drive. Create a prompt template
to display the output in a particular manner.
Step-by-Step Explanation
• Install necessary libraries: We will install langchain, cohere, and langchain-community.
• Set up the Cohere API: Obtain your Cohere API key by logging into Cohere's platform.
• Load a text document from Google Drive.
• Create a Langchain Prompt Template to process the document and return the result in a particular
format.
PROGRAM:
!pip install langchain langchain-cohere langchain-community
!pip install langchain-core
import os
from langchain_core.prompts import PromptTemplate
from langchain_cohere import ChatCohere
os.environ["COHERE_API_KEY"] = "Your Cohere key"
file_path = "C:/Users/LAB1/Desktop/[Link]" # replace with the path to your text file
with open(file_path, "r", encoding="utf-8") as file:
    text_data = file.read()
prompt_template = """
You are an AI assistant.
Analyze the following text and provide:
1. Summary
2. Key Points
3. Conclusion
Text:
{text}
"""
prompt = PromptTemplate.from_template(prompt_template)
llm = ChatCohere()
chain = prompt | llm
result = chain.invoke({"text": text_data})
print(result.content)
OUTPUT:
What This Program Does:
1. Opens a text document (sample_text.txt) from the path stored in file_path. When running in Google
Colab, mount Google Drive first and point file_path at the file on the mounted drive.
2. Reads the document's content and prepares it for processing.
3. Uses LangChain’s PromptTemplate to create a structured request for analysis.
4. The Cohere LLM processes the text and returns a summary, key points, and a conclusion.
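A prompt template is a string with named placeholders that are filled in at run time. The sketch below mimics that substitution with Python's built-in str.format instead of LangChain, so it runs without an API key. The sample sentence is made up, and LangChain's own formatting may differ in detail.

```python
template = """You are an AI assistant.
Analyze the following text and provide:
1. Summary
2. Key Points
3. Conclusion
Text:
{text}
"""

# Fill the placeholder named "text" with the document content
filled = template.format(text="LangChain composes prompts and models into chains.")
print(filled)

# A placeholder written with inner spaces is a different field name,
# so supplying "text" does not satisfy it
try:
    "{ text }".format(text="hello")
    placeholder_ok = True
except KeyError as err:
    placeholder_ok = False
    print("Unknown placeholder:", err)
```

The second part shows why the placeholder should be written as {text} with no spaces inside the braces: the name in the template must match the key supplied when the chain is invoked.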
Experiment 9: Take the Institution name as input. Use Pydantic to define
the schema for the desired output and create a custom output parser.
Invoke the Chain and Fetch Results. Extract the below Institution related
details from Wikipedia: The founder of the Institution. When it was
founded. The current branches in the institution. How many employees
are working in it. A brief 4-line summary of the institution.
PROGRAM
In Anaconda Prompt
# Step 1: Install required libraries
# conda activate nlp_env
# pip install langchain langchain-core langchain-community langchain-cohere pydantic wikipedia-api
In Jupyter Notebook
# Step 2: Imports
from langchain_cohere import ChatCohere
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from pydantic import BaseModel
import wikipediaapi
# Step 3: Pydantic schema
class InstitutionDetails(BaseModel):
founder: str
founded: str
branches: str
employees: str
summary: str
# Step 4: Wikipedia function
def fetch_wikipedia_summary(institution_name):
    wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        user_agent="InstitutionInfoBot/1.0 (contact: youremail@[Link])"
    )
    page = wiki_wiki.page(institution_name)
    if page.exists():
        return page.summary
    else:
        return "No information available on Wikipedia for this institution."
# Step 5: Prompt template (IMPROVED for better summary)
prompt_template = """
Extract the following information from the given text:
- Founder
- Founded (year)
- Current branches
- Number of employees
- A clear 4-line brief summary (STRICTLY 4 lines)
Text: {text}
Provide the information EXACTLY in this format:
Founder: <founder>
Founded: <founded>
Branches: <branches>
Employees: <employees>
Summary:
<line1>
<line2>
<line3>
<line4>
"""
prompt = PromptTemplate.from_template(prompt_template)
# Step 6: Input
institution_name = input("Enter the name of the institution: ")
# Step 7: Fetch Wikipedia data
wiki_text = fetch_wikipedia_summary(institution_name)
# Step 8: Setup Cohere (Replace with your NEW API key)
cohere_api_key = "Your cohere API Key"
llm = ChatCohere(cohere_api_key=cohere_api_key)
# Step 9: Create chain
chain = prompt | llm | StrOutputParser()
# Step 10: Run the chain
response = chain.invoke({"text": wiki_text})
# Step 11: Improved parsing (handles multi-line summary)
try:
    lines = response.split("\n")
    data = {}
    current_key = None
    for line in lines:
        if ":" in line:
            key, value = line.split(":", 1)
            current_key = key.strip().lower()
            data[current_key] = value.strip()
        else:
            if current_key:
                data[current_key] += " " + line.strip()
    details = InstitutionDetails(
        founder=data.get("founder", ""),
        founded=data.get("founded", ""),
        branches=data.get("branches", ""),
        employees=data.get("employees", ""),
        summary=data.get("summary", "")
    )
    print("\nInstitution Details:")
    print(f"Founder: {details.founder}")
    print(f"Founded: {details.founded}")
    print(f"Branches: {details.branches}")
    print(f"Employees: {details.employees}")
    print(f"Summary: {details.summary}")
except Exception as e:
    print("Error parsing response:", e)
OUTPUT:
Enter the name of the institution: Amazon
Institution Details:
Founder: Jeff Bezos
Founded: 1994
Branches: Over 75 fulfillment centers and 25 sortation centers in North America, and numerous
international locations (exact number not specified in the text)
Employees: Approximately 1,200,000 (as of 2021, not specified in the text)
Summary: Amazon is an American multinational technology company focusing on e-commerce, cloud
computing, digital streaming, and AI. Founded by Jeff Bezos in 1994, it's a global leader.
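A parser that splits on every colon will treat a summary line such as "Note: ..." as a new field. The sketch below shows one way to avoid this by accepting only the expected field names as keys. It uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without extra packages. The names InstitutionRecord and parse_details, and the sample response, are made up for illustration.

```python
from dataclasses import dataclass

FIELDS = ("founder", "founded", "branches", "employees", "summary")

@dataclass
class InstitutionRecord:  # hypothetical stand-in for the Pydantic schema
    founder: str = ""
    founded: str = ""
    branches: str = ""
    employees: str = ""
    summary: str = ""

def parse_details(response):
    """Parse 'Key: value' lines, appending unrecognised lines to the current field."""
    data = {}
    current = None
    for line in response.splitlines():
        head, sep, tail = line.partition(":")
        key = head.strip().lower()
        if sep and key in FIELDS:
            current = key
            data[current] = tail.strip()
        elif current and line.strip():
            # Continuation line, even if it happens to contain a colon
            data[current] = (data[current] + " " + line.strip()).strip()
    return InstitutionRecord(**data)

# Made-up model response, used only to exercise the parser
sample = """Founder: Jane Doe
Founded: 1900
Branches: Engineering, Science
Employees: 250
Summary:
A fictional institute used only to test the parser.
Note: this line contains a colon but is still part of the summary.
"""

record = parse_details(sample)
print(record.founder)
print(record.summary)
```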
Experiment 10: Build a chatbot for the Indian Penal Code. We'll start by
downloading the official Indian Penal Code document, and then we'll
create a chatbot that can interact with it. Users will be able to ask
questions about the Indian Penal Code and have a conversation with it.
PROGRAM:
# Install once
import sys
!{sys.executable} -m pip install sentence-transformers chromadb
from sentence_transformers import SentenceTransformer
import numpy as np
# Load IPC file
with open(r" YOUR FILE PATH ", "r", encoding="utf-8") as f:
    text = f.read()
# Split into chunks
chunks = text.split("\n\n")
# Load embedding model (downloaded on first use, then cached for offline use)
model = SentenceTransformer("all-MiniLM-L6-v2")
# Convert to vectors
embeddings = model.encode(chunks)
# Chat loop
print("Offline IPC Chatbot")
print("Ask a question (type 'exit' to stop):")
while True:
    query = input("\nYour question: ")
    if query.lower() == "exit":
        break
    query_vec = model.encode([query])[0]
    # Find most similar chunk
    similarities = np.dot(embeddings, query_vec)
    best_match = chunks[np.argmax(similarities)]
    print("Answer:", best_match)
OUTPUT:
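Splitting on blank lines can produce empty chunks and very short ones, such as a lone section heading, which make poor answers. The sketch below drops empty pieces and merges short pieces into the text that follows them. The helper name clean_chunks, the 40-character minimum, and the sample text are assumptions for illustration, not part of the program above.

```python
def clean_chunks(text, min_chars=40):
    """Split on blank lines, drop empty pieces, and merge short pieces into the next one."""
    chunks = []
    pending = ""
    for piece in text.split("\n\n"):
        piece = " ".join(piece.split())  # collapse line breaks and repeated spaces
        if not piece:
            continue
        pending = f"{pending} {piece}".strip()
        if len(pending) >= min_chars:
            chunks.append(pending)
            pending = ""
    if pending:  # keep any short trailing text
        chunks.append(pending)
    return chunks

# Placeholder text, not taken from the Indian Penal Code
sample_text = (
    "Heading A\n\n"
    "The first paragraph is long enough to stand on its own as a chunk.\n\n\n\n"
    "Heading B\n\n"
    "The second paragraph is also long enough to be kept as a separate chunk."
)

cleaned = clean_chunks(sample_text)
for chunk in cleaned:
    print("-", chunk)
```

Each heading stays attached to the paragraph it introduces, so a question that matches the heading retrieves the full section text.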