Gen Ai Lab Manual Programs

The document outlines a course on Generative AI, detailing various programs that utilize word embeddings, including exploring pre-trained word vectors, visualizing embeddings with PCA and t-SNE, training custom Word2Vec models, and using embeddings to enhance prompts for Generative AI. Each program includes source code examples and aims to analyze semantic relationships, generate similar words, and improve AI responses. Additionally, it covers sentiment analysis and text summarization using pre-trained models from Hugging Face.

Uploaded by

sxcupsjpg
Copyright
© All Rights Reserved

Course Name: Generative AI

Course Code: BAIL657C


VI Semester 2022 Scheme
Program 1: Explore pre-trained word vectors. Explore word
relationships using vector arithmetic. Perform arithmetic
operations and analyze results.
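The arithmetic behind this program can be sketched without downloading the model. The example below is a minimal illustration that uses made-up 3-dimensional vectors (the values and the tiny vocabulary are assumptions chosen for demonstration, not real Word2Vec weights). It computes `king - man + woman` and ranks the remaining words by cosine similarity.

```python
import math

# Toy 3-d "embeddings": [royalty, maleness, femaleness]. Invented values.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(word1, word2, word3, vectors):
    """Rank words by similarity to word1 - word2 + word3."""
    target = [
        a - b + c
        for a, b, c in zip(vectors[word1], vectors[word2], vectors[word3])
    ]
    excluded = {word1, word2, word3}  # input words are not useful answers
    scored = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = analogy("king", "man", "woman", toy_vectors)
for word, score in ranking:
    print(f"{word}: {score:.4f}")
```

With these toy values the top-ranked word is "queen", because subtracting the "maleness" direction and adding the "femaleness" direction moves the "king" vector toward it.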
Source code:
import ssl

import gensim.downloader as api

# Workaround for SSL certificate errors during the download. It disables
# certificate verification, so use it only on a trusted network.
ssl._create_default_https_context = ssl._create_unverified_context

print("Loading pre-trained word vectors (this may take a few minutes)...")
word_vectors = api.load("word2vec-google-news-300")


def explore_word_relationships(word1, word2, word3):
    """Print the words closest to word1 - word2 + word3."""
    try:
        result_vector = (
            word_vectors[word1] - word_vectors[word2] + word_vectors[word3]
        )
        similar_words = word_vectors.similar_by_vector(result_vector, topn=10)

        # The input words usually rank near the top, so drop them.
        input_words = {word1, word2, word3}
        filtered_words = [
            (word, similarity)
            for word, similarity in similar_words
            if word not in input_words
        ]

        print(f"\n{word1} - {word2} + {word3}")
        print("Top 5 results:")
        for word, similarity in filtered_words[:5]:
            print(f"{word}: {similarity:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")


def analyze_similarity(word1, word2):
    """Print the cosine similarity between two words."""
    try:
        similarity = word_vectors.similarity(word1, word2)
        print(f"\nSimilarity between '{word1}' and '{word2}': {similarity:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")


def find_similar_words(word):
    """Print the five nearest neighbours of a word."""
    try:
        print(f"\nTop 5 similar words to '{word}':")
        for w, s in word_vectors.most_similar(word, topn=5):
            print(f"{w}: {s:.4f}")
    except KeyError as e:
        print(f"Word not found: {e}")


# ---- TEST CASES ----
explore_word_relationships("king", "man", "woman")
explore_word_relationships("paris", "france", "germany")

analyze_similarity("cat", "dog")
analyze_similarity("computer", "keyboard")

find_similar_words("happy")
find_similar_words("technology")
How to run the program?
Open VS Code → select New File → choose Python File → name the file → enter the
source code → save it and click the Run button. On the first run only, the program downloads
the pre-trained Word2Vec model (a data file, not software), which takes some time. The output
then appears as shown below.
Output
Program 2: Use dimensionality reduction (e.g., PCA or t-SNE) to visualize
word embeddings. Select words from a specific domain and visualize their
embeddings. Analyze clusters and relationships. Generate contextually rich
outputs using embeddings. Write a program to generate 5 semantically
similar words for a given input.
Aim: To visualize word embeddings using dimensionality reduction techniques such as PCA and t-
SNE, analyze semantic relationships between words, and generate semantically similar words using
pre-trained Word2Vec embeddings.

Source code
# pip install gensim numpy matplotlib scikit-learn

import gensim.downloader as api
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")

# Step 2: Select words (technology domain)
words = [
    "computer", "software", "hardware", "algorithm", "data",
    "network", "programming", "machine", "learning", "artificial",
]

# Step 3: Get vectors (one 300-dimensional row per word)
vectors = np.array([word_vectors[word] for word in words])

# Step 4: PCA Visualization
print("\nShowing PCA Visualization...")
pca = PCA(n_components=2)
pca_result = pca.fit_transform(vectors)

plt.figure(figsize=(8, 6))
for i, word in enumerate(words):
    plt.scatter(pca_result[i, 0], pca_result[i, 1])
    plt.text(pca_result[i, 0], pca_result[i, 1], word)
plt.title("PCA Visualization of Word Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.grid()
plt.show()

# Step 5: t-SNE Visualization
# perplexity must be smaller than the number of words (10 here).
print("Showing t-SNE Visualization...")
tsne = TSNE(n_components=2, perplexity=3, random_state=42)
tsne_result = tsne.fit_transform(vectors)

plt.figure(figsize=(8, 6))
for i, word in enumerate(words):
    plt.scatter(tsne_result[i, 0], tsne_result[i, 1])
    plt.text(tsne_result[i, 0], tsne_result[i, 1], word)
plt.title("t-SNE Visualization of Word Embeddings")
plt.xlabel("Component 1")
plt.ylabel("Component 2")
plt.grid()
plt.show()

# Step 6: Function to generate similar words
def generate_similar_words(word):
    print(f"\nTop 5 semantically similar words to '{word}':")
    similar = word_vectors.most_similar(word, topn=5)
    for w, score in similar:
        print(w, ":", round(score, 4))

# Step 7: Test similar words
generate_similar_words("computer")
generate_similar_words("learning")

print("\nProgram completed successfully.")


Output:
Loading Word2Vec model...

Showing PCA Visualization...


Showing t-SNE Visualization...

Top 5 semantically similar words to 'computer':


computers : 0.7979
laptop : 0.664
laptop_computer : 0.6549
Computer : 0.6473
com_puter : 0.6082

Top 5 semantically similar words to 'learning':


teaching : 0.6602
learn : 0.6365
Learning : 0.6208
reteaching : 0.581
learner_centered : 0.5739

Program completed successfully.
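
The PCA step in Program 2 projects the 300-dimensional vectors onto the directions along which the data varies most. As a rough illustration of what "direction of greatest variance" means, the sketch below finds only the first principal component of a few toy 2-D points using power iteration in plain Python. The function name `first_principal_component` and the data are assumptions made for this example; scikit-learn's `PCA` is computed differently (via SVD) and returns as many components as requested.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, 1-D projected coordinates)."""
    n, d = len(points), len(points[0])
    # PCA works on data centered at the mean of each column.
    means = [sum(row[j] for row in points) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: multiply by the covariance matrix and renormalize.
    # The vector converges to the eigenvector with the largest eigenvalue,
    # provided the starting vector is not orthogonal to it.
    direction = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(d)) for a in range(d)]
        length = math.sqrt(sum(x * x for x in product))
        if length == 0.0:  # the data has no variance at all
            break
        direction = [x / length for x in product]
    projected = [sum(r[j] * direction[j] for j in range(d)) for r in centered]
    return direction, projected

# Toy points lying roughly along the line y = x.
points = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]
direction, projected = first_principal_component(points)
print("First component:", [round(x, 3) for x in direction])
print("1-D coordinates:", [round(x, 3) for x in projected])
```

Because the toy points lie close to a diagonal line, the recovered direction has two nearly equal positive entries, and the projected coordinates preserve the ordering of the points along that line.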


Program 3: Train a custom Word2Vec model on a small dataset. Train
embeddings on a domain-specific corpus and analyze semantic similarity
between words.

Aim: To train a custom Word2Vec model on a domain-specific dataset and analyze semantic
relationships between words using the trained embeddings.

Source Code:
# Install
# python -m pip install gensim numpy matplotlib scikit-learn

import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

# Step 1: Create domain-specific dataset (medical domain example)
corpus = [
    ["doctor", "treats", "patient"],
    ["patient", "needs", "medicine"],
    ["doctor", "prescribes", "medicine"],
    ["hospital", "has", "doctor"],
    ["nurse", "helps", "doctor"],
    ["medicine", "cures", "disease"],
    ["hospital", "treats", "patient"],
    ["doctor", "works", "hospital"],
    ["nurse", "cares", "patient"],
    ["disease", "needs", "treatment"],
]

# Step 2: Train Word2Vec model
print("Training Word2Vec model...")
model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # dimensionality of each word vector
    window=2,         # context words considered on each side
    min_count=1,      # keep every word, even if it appears once
    workers=4,
)

# Step 3: Save model
model.save("medical_word2vec.model")

# Step 4: Find similar words
def find_similar(word):
    print(f"\nWords similar to '{word}':")
    similar_words = model.wv.most_similar(word, topn=5)
    for w, score in similar_words:
        print(w, ":", round(score, 4))

# Test similar words
find_similar("doctor")
find_similar("patient")

# Step 5: Visualize word embeddings using t-SNE
print("\nShowing Word Embeddings Visualization...")

words = list(model.wv.index_to_key)
vectors = model.wv[words]

tsne = TSNE(n_components=2, random_state=42, perplexity=3)
reduced_vectors = tsne.fit_transform(vectors)

plt.figure(figsize=(8, 6))
for i, word in enumerate(words):
    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1])
    plt.text(reduced_vectors[i, 0], reduced_vectors[i, 1], word)
plt.title("Custom Word2Vec Embeddings Visualization")
plt.xlabel("X")
plt.ylabel("Y")
plt.grid()
plt.show()

print("\nProgram completed successfully.")


Output:

Training Word2Vec model...

Words similar to 'doctor':


cares : 0.2161
helps : 0.0932
treats : 0.0929
cures : 0.0797
works : 0.0629

Words similar to 'patient':


prescribes : 0.1607
has : 0.1373
medicine : 0.068
treatment : 0.0336
nurse : 0.0094

Showing Word Embeddings Visualization...

Program completed successfully.
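
The `window=2` setting in Program 3 controls which neighbouring words count as context for each target word. The sketch below is a simplified illustration of that idea in plain Python. `context_pairs` is a hypothetical helper, not part of gensim, and gensim's real training differs in its details (for instance, its default objective is CBOW rather than explicit target/context pairs).

```python
def context_pairs(sentence, window=2):
    """List (target, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, target in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, sentence[j]))
    return pairs

sentence = ["doctor", "treats", "patient"]
for pair in context_pairs(sentence, window=2):
    print(pair)
```

With three-word sentences and a window of 2, every word is paired with every other word in its sentence. A corpus of ten such sentences therefore gives the model very few examples to learn from, which fits the low similarity scores in the output above.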


Program 4: Use word embeddings to improve prompts for Generative AI
model. Retrieve similar words using word embeddings. Use the similar
words to enrich a GenAI prompt. Use the AI model to generate responses for
the original and enriched prompts. Compare the outputs in terms of detail
and relevance.

Aim: To use word embeddings to retrieve similar words, enrich prompts, and generate improved
responses using embeddings.

Source Code:
# Install required libraries:
# python -m pip install gensim

import gensim.downloader as api

# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")

# Step 2: Function to get similar words
def get_similar_words(word, n=5):
    similar = word_vectors.most_similar(word, topn=n)
    return [w for w, score in similar]

# Step 3: Original prompt
seed_word = "technology"
original_prompt = f"Explain the importance of {seed_word}."

# Step 4: Get similar words
similar_words = get_similar_words(seed_word)

# Step 5: Create enriched prompt
enriched_prompt = (
    original_prompt
    + " Include related concepts like "
    + ", ".join(similar_words)
    + "."
)

# Step 6: Display prompts
print("\nOriginal Prompt:")
print(original_prompt)

print("\nSimilar Words Found:")
print(similar_words)

print("\nEnriched Prompt:")
print(enriched_prompt)

# Step 7: Simulated AI response (lab-safe, no API needed).
# The reply text is fixed, so only the echoed prompt differs between runs.
def generate_response(prompt):
    return (
        f"\nAI Response:\n{prompt}\n"
        "Technology plays a crucial role in modern society. It improves "
        "efficiency, communication, innovation, and productivity across "
        "various domains."
    )

# Step 8: Generate responses
print("\n--- Original Prompt Response ---")
print(generate_response(original_prompt))

print("\n--- Enriched Prompt Response ---")
print(generate_response(enriched_prompt))

print("\nProgram completed successfully.")

Output:
Loading Word2Vec model...

Original Prompt:
Explain the importance of technology.

Similar Words Found:


['technologies', 'innovations', 'technological_innovations', 'technol',
'technological_advancement']

Enriched Prompt:
Explain the importance of technology. Include related concepts like technologies,
innovations, technological_innovations, technol, technological_advancement.

--- Original Prompt Response ---

AI Response:
Explain the importance of technology.
Technology plays a crucial role in modern society. It improves efficiency,
communication, innovation, and productivity across various domains.

--- Enriched Prompt Response ---

AI Response:
Explain the importance of technology. Include related concepts like technologies,
innovations, technological_innovations, technol, technological_advancement.
Technology plays a crucial role in modern society. It improves efficiency,
communication, innovation, and productivity across various domains.

Program completed successfully.
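
The neighbour list in the Program 4 output contains underscore-joined phrases and a truncated fragment ("technol"), which end up verbatim in the enriched prompt. One possible refinement, sketched below, is to tidy the words before building the prompt. The helpers `clean_terms` and `enrich_prompt` are hypothetical names introduced for this example, and the filtering rules are assumptions, not part of the lab program.

```python
def clean_terms(seed, terms):
    """Tidy raw neighbour words before placing them in a prompt."""
    seed_lower = seed.lower()
    cleaned = []
    for term in terms:
        text = term.replace("_", " ").lower().strip()
        # Skip the seed itself and truncated fragments of it (e.g. "technol").
        if not text or seed_lower.startswith(text):
            continue
        if text not in cleaned:  # drop duplicates, keep original order
            cleaned.append(text)
    return cleaned

def enrich_prompt(prompt, terms):
    """Append the related terms to the prompt, if there are any."""
    if not terms:
        return prompt
    return f"{prompt} Include related concepts like {', '.join(terms)}."

# Neighbour words copied from the output above.
neighbours = [
    "technologies", "innovations", "technological_innovations",
    "technol", "technological_advancement",
]
terms = clean_terms("technology", neighbours)
print(enrich_prompt("Explain the importance of technology.", terms))
```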


Program 5: Use word embeddings to create meaningful sentences for
creative tasks. Retrieve similar words for a seed word. Create a sentence or
story using these words as a starting point. Write a program that takes a seed
word, generates similar words, and constructs a short paragraph using these
words.

Aim: To use word embeddings to retrieve similar words for a given seed word and generate a
meaningful paragraph using those words.

Source code:
import gensim.downloader as api

# Step 1: Load pre-trained Word2Vec model
print("Loading Word2Vec model...")
word_vectors = api.load("word2vec-google-news-300")

# Step 2: Function to get similar words
def get_similar_words(word):
    similar = word_vectors.most_similar(word, topn=5)
    return [w for w, score in similar]

# Step 3: Input seed word
seed_word = "technology"

# Step 4: Get similar words
similar_words = get_similar_words(seed_word)

# Step 5: Display similar words
print("\nSeed Word:", seed_word)
print("Similar Words:", similar_words)

# Step 6: Generate paragraph using similar words
paragraph = f"""
Technology is transforming the modern world. Innovations in {similar_words[0]} and {similar_words[1]}
have improved communication and productivity. The development of {similar_words[2]} and {similar_words[3]}
is helping industries grow rapidly. Overall, technology and {similar_words[4]} are shaping the future.
"""

# Step 7: Display paragraph
print("\nGenerated Paragraph:")
print(paragraph)

print("Program completed successfully.")

Output:
Loading Word2Vec model...

Seed Word: technology
Similar Words: ['technologies', 'innovations', 'technological_innovations', 'technol', 'technological_advancement']

Generated Paragraph:

Technology is transforming the modern world. Innovations in technologies and innovations
have improved communication and productivity. The development of technological_innovations and technol
is helping industries grow rapidly. Overall, technology and technological_advancement are shaping the future.

Program completed successfully.
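
The template in Program 5 indexes exactly five similar words, so it would raise an `IndexError` if fewer were available. A minimal sketch of a more tolerant variant follows. `join_naturally` and `build_paragraph` are hypothetical helpers introduced for this example, and the sentence wording is an assumption, not the text used in the lab program.

```python
def join_naturally(words):
    """Join words as 'a', 'a and b', or 'a, b, and c'."""
    words = [w.replace("_", " ") for w in words]
    if len(words) <= 2:
        return " and ".join(words)
    return ", ".join(words[:-1]) + ", and " + words[-1]

def build_paragraph(seed, similar_words):
    """Build a short paragraph from a seed word and any number of related words."""
    opening = f"{seed.capitalize()} is transforming the modern world."
    if not similar_words:
        return opening
    related = join_naturally(similar_words)
    return f"{opening} Advances in {related} are helping industries grow rapidly."

print(build_paragraph("technology", ["technologies", "innovations", "technological_innovations"]))
```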


Program 6: Use a pre-trained Hugging Face model to analyze sentiment in
text. Load the sentiment analysis pipeline and analyze sentiment by giving
sentences as input.

Aim: To use a pre-trained Hugging Face model to perform sentiment analysis on input text.
Note: Before running the program, install the packages below in the VS Code terminal:
python -m pip install transformers==4.38.2 torch==2.2.2 numpy==1.26.4
Source Code:
from transformers import pipeline

# Step 1: Load sentiment analysis pipeline
# (no model is named, so the library's default sentiment model is downloaded)
print("Loading sentiment analysis model...")
sentiment_analyzer = pipeline("sentiment-analysis")

# Step 2: Input sentences
sentences = [
    "I love learning Artificial Intelligence.",
    "This is a bad experience.",
    "The lab session was very useful and interesting.",
    "I am disappointed with the results.",
]

# Step 3: Analyze sentiment
print("\nSentiment Analysis Results:\n")

for sentence in sentences:
    # The pipeline returns a list with one result dict per input.
    result = sentiment_analyzer(sentence)[0]

    print("Sentence:", sentence)
    print("Sentiment:", result["label"])
    print("Confidence Score:", round(result["score"], 4))
    print("-" * 50)

print("\nProgram completed successfully.")


Output:
Sentiment Analysis Results:

Sentence: I love learning Artificial Intelligence.


Sentiment: POSITIVE
Confidence Score: 0.9995
--------------------------------------------------
Sentence: This is a bad experience.
Sentiment: NEGATIVE
Confidence Score: 0.9998
--------------------------------------------------
Sentence: The lab session was very useful and interesting.
Sentiment: POSITIVE
Confidence Score: 0.9998
--------------------------------------------------
Sentence: I am disappointed with the results.
Sentiment: NEGATIVE
Confidence Score: 0.9998
--------------------------------------------------

Program completed successfully.
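
Each result from the Program 6 pipeline is a dictionary with a `label` and a `score`. The sketch below shows one way such results could be post-processed, here by marking predictions whose confidence falls below a cut-off. It runs on hard-coded results in that same shape (scores copied from the output above) so that no model download is needed. The helper name `flag_uncertain` and the 0.9 threshold are assumptions made for illustration.

```python
def flag_uncertain(sentences, results, threshold=0.9):
    """Pair each sentence with its prediction and mark low-confidence ones."""
    report = []
    for sentence, result in zip(sentences, results):
        report.append({
            "sentence": sentence,
            "label": result["label"],
            "score": round(result["score"], 4),
            "uncertain": result["score"] < threshold,
        })
    return report

# One result dict per sentence, in the same order as the inputs.
sentences = [
    "I love learning Artificial Intelligence.",
    "This is a bad experience.",
]
results = [
    {"label": "POSITIVE", "score": 0.9995},
    {"label": "NEGATIVE", "score": 0.9998},
]

report = flag_uncertain(sentences, results)
for row in report:
    print(row)
```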


Program 7: Summarize long texts using a pre-trained summarization
model using Hugging Face. Load the summarization pipeline. Take a passage
as input and obtain the summarized text.

Aim: To use a pre-trained Hugging Face model to summarize a given text passage.
Install: python -m pip install transformers==4.38.2 torch==2.2.2 numpy==1.26.4
Source code:
from transformers import pipeline

# Step 1: Load summarization model


print("Loading summarization model...")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Step 2: Input text passage


text = """
Artificial Intelligence is a rapidly growing field of computer science.
It focuses on creating intelligent machines that can perform tasks that typically require
human intelligence.
These tasks include speech recognition, decision-making, problem solving, and
language translation.
AI is used in various applications such as healthcare, education, robotics, and
automation.
It improves efficiency, accuracy, and productivity in many industries.
AI is shaping the future by enabling smart technologies.
"""

# Step 3: Generate summary


print("\nOriginal Text:\n")
print(text)

summary = summarizer(text, max_length=60, min_length=20, do_sample=False)

# Step 4: Display summary


print("\nGenerated Summary:\n")
print(summary[0]['summary_text'])

print("\nProgram completed successfully.")


Output:
Original Text:

Artificial Intelligence is a rapidly growing field of computer science.


It focuses on creating intelligent machines that can perform tasks that typically require
human intelligence.
These tasks include speech recognition, decision-making, problem solving, and
language translation.
AI is used in various applications such as healthcare, education, robotics, and
automation.
It improves efficiency, accuracy, and productivity in many industries.
AI is shaping the future by enabling smart technologies.

Generated Summary:

Artificial Intelligence is a rapidly growing field of computer science. It focuses on
creating intelligent machines that can perform tasks that typically require human
intelligence. These tasks include speech recognition, decision-making, problem solving
and language translation.

Program completed successfully.
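
Program 7 summarizes a short passage in a single call. Models of this kind accept only a limited number of input tokens, so a common approach for longer documents is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below covers only the splitting step. `chunk_sentences` is a hypothetical helper, word count is used as a rough stand-in for token count, and the default of 400 words is an assumed value, not a documented model limit.

```python
import re

def chunk_sentences(text, max_words=400):
    """Group whole sentences into chunks of at most `max_words` words each.

    A single sentence longer than `max_words` becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

demo = "One two three. Four five six. Seven eight nine."
for chunk in chunk_sentences(demo, max_words=6):
    print(repr(chunk))
```

Each chunk could then be passed to `summarizer(...)` in turn, as in Step 3 of the program.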


Program 8: Install langchain, cohere, langchain-community. Get the API key from Cohere.
Load a text document from your Google Drive. Create a prompt template to display the output
in a particular manner.

Aim: To use LangChain and Cohere API to load a text document and generate formatted output using
a prompt template.

Step1→ python -m pip install langchain langchain-community cohere transformers torch

Step2→ Get Cohere API Key (IMPORTANT)


Students must:

1. Go to: [Link]
2. Sign up / Login
3. Click: API Keys
4. Copy API key

Example:

COHERE_API_KEY = "your_api_key_here"
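
Writing the key directly into a source file makes it easy to share by accident. A minimal sketch of an alternative, assuming the key has been stored in an environment variable named `COHERE_API_KEY` (the variable name and the helper `load_api_key` are illustrative choices, not requirements of Cohere or LangChain):

```python
import os

def load_api_key(name="COHERE_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key

# Example use, once the variable has been set in the terminal:
# COHERE_API_KEY = load_api_key()
```

The variable can be set for the current terminal session before running the program, for example with `export COHERE_API_KEY=...` on Linux or macOS and `set COHERE_API_KEY=...` in the Windows command prompt.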

Source Code:
from langchain_core.prompts import PromptTemplate
from transformers import pipeline

print("Program started...")

# Load text file (replace "[Link]" with the path to your text file)
with open("[Link]", "r") as file:
    text = file.read()

# Create prompt template
template = """
Summarize the following text in simple words:

{text}

Summary:
"""

prompt = PromptTemplate(
    input_variables=["text"],
    template=template
)

final_prompt = prompt.format(text=text)

print("\nGenerated Prompt:\n")
print(final_prompt)

# Load summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Generate summary. The model receives the raw text, not the formatted
# prompt, because it is a summarization model rather than an instruction-
# following one.
result = summarizer(text, max_length=30, min_length=10, do_sample=False)

print("\nGenerated Output:\n")
print(result[0]['summary_text'])

print("\nProgram completed successfully.")

Output:
Program started...

Generated Prompt:

Summarize the following text in simple words:

Artificial Intelligence is transforming the modern world. It improves automation,


efficiency, and decision-making.

Summary:

Generated Output:

Artificial Intelligence is transforming the modern world. It improves automation,


efficiency, and decision-making.

Program completed successfully.


Program 9: Take the Institution name as input. Use Pydantic to define the schema for the desired
output and create a custom output parser. Invoke the chain and fetch results. Extract the below
Institution related details from Wikipedia:

• Founder of the Institution
• When it was founded
• Current branches in the institution
• Number of employees working
• A brief 4-line summary of the institution

Install: python -m pip install pydantic langchain-core wikipedia transformers torch

Source Code:

from pydantic import BaseModel, Field
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
import wikipedia

print("Program started...")

# Step 1: Define schema using Pydantic
class Institution(BaseModel):
    founder: str = Field(description="Founder of the institution")
    founded_year: str = Field(description="Year founded")
    branches: str = Field(description="Branches")
    employees: str = Field(description="Number of employees")
    summary: str = Field(description="Brief summary")

# Step 2: Create parser
parser = PydanticOutputParser(pydantic_object=Institution)

# Step 3: Input institution name
institution_name = "RV Institute of Technology and Management"

# Step 4: Get Wikipedia data
wiki_text = wikipedia.summary(institution_name, sentences=5)

# Step 5: Create prompt template
template = """
Extract the following information from the text:
Founder
Founded Year
Branches
Employees
Summary

Text:
{text}

{format_instructions}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["text"],
    partial_variables={
        "format_instructions": parser.get_format_instructions()
    },
)

final_prompt = prompt.format(text=wiki_text)

print("\nWikipedia Text:\n")
print(wiki_text)

print("\nFormatted Prompt:\n")
print(final_prompt)

# Step 6: Simulated structured output (lab-safe).
# It must be a complete JSON object, braces included, for the parser to accept it.
sample_output = """
{
  "founder": "Rashtreeya Sikshana Samithi Trust",
  "founded_year": "2019",
  "branches": "Bangalore",
  "employees": "200",
  "summary": "RV Institute of Technology and Management is an engineering college in Bangalore offering technical education."
}
"""

# Step 7: Parse output
parsed_output = parser.parse(sample_output)

print("\nParsed Output:\n")
print(parsed_output)

print("\nProgram completed successfully.")


Output:
Program started...

Wikipedia Text:

Bangalore University, established in 1886, provides affiliation to over 500 colleges, with a total
student enrolment exceeding 300,000. The university has two campuses within Bengaluru –
Jnanabharathi and Central College. University Visvesvaraya College of Engineering was
established in the year 1917, by Bharat Ratna Sir M. Visvesvaraya, At present, the UVCE is
the only engineering college under the Bangalore University. Bengaluru also has many private
Engineering Colleges affiliated to Visvesvaraya Technological University. The Bangalore
University was Trifurcated in the year 2017 for the proper management of the students &
Colleges then the Bangalore University was Trifurcated in Bangalore University, Bengaluru
North University and Bengaluru City University .

Formatted Prompt:

Extract the following information from the text:

Founder

Founded Year

Branches

Employees

Summary

Text:

Bangalore University, established in 1886, provides affiliation to over 500 colleges, with a total
student enrolment exceeding 300,000. The university has two campuses within Bengaluru –
Jnanabharathi and Central College. University Visvesvaraya College of Engineering was
established in the year 1917, by Bharat Ratna Sir M. Visvesvaraya, At present, the UVCE is
the only engineering college under the Bangalore University. Bengaluru also has many private
Engineering Colleges affiliated to Visvesvaraya Technological University. The Bangalore
University was Trifurcated in the year 2017 for the proper management of the students &
Colleges then the Bangalore University was Trifurcated in Bangalore University, Bengaluru
North University and Bengaluru City University .

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of
strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}

the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object
{"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:

```

{"properties": {"founder": {"description": "Founder of the institution", "title": "Founder",


"type": "string"}, "founded_year": {"description": "Year founded", "title": "Founded Year",
"type": "string"}, "branches": {"description": "Branches", "title": "Branches", "type":
"string"}, "employees": {"description": "Number of employees", "title": "Employees", "type":
"string"}, "summary": {"description": "Brief summary", "title": "Summary", "type":
"string"}}, "required": ["founder", "founded_year", "branches", "employees", "summary"]}

```

Parsed Output:

founder='Rashtreeya Sikshana Samithi Trust' founded_year='2019' branches='Bangalore'


employees='200' summary='RV Institute of Technology and Management is an engineering
college in Bangalore offering technical education.'

Program completed successfully.
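
The parsing step in Program 9 amounts to two checks: the text must be valid JSON, and it must contain every field the schema requires. The sketch below imitates those checks with the standard library only, as a rough stand-in for illustration. `InstitutionRecord` and `parse_institution` are hypothetical names, and this is not how Pydantic or LangChain are implemented.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class InstitutionRecord:
    founder: str
    founded_year: str
    branches: str
    employees: str
    summary: str

def parse_institution(raw):
    """Parse a JSON string and check that it has the expected string fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    expected = [f.name for f in fields(InstitutionRecord)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    for name in expected:
        if not isinstance(data[name], str):
            raise ValueError(f"Field {name!r} must be a string")
    return InstitutionRecord(**{name: data[name] for name in expected})

# Values copied from the parsed output above.
raw = json.dumps({
    "founder": "Rashtreeya Sikshana Samithi Trust",
    "founded_year": "2019",
    "branches": "Bangalore",
    "employees": "200",
    "summary": "An engineering college in Bangalore offering technical education.",
})
record = parse_institution(raw)
print(record.founder, record.founded_year)
```

A string that lists the fields without the surrounding braces is not a JSON object, so `json.loads` rejects it before any field check runs.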


Program 10: Build a chatbot for the Indian Penal Code. Download the IPC document and create
a chatbot that can interact with it. Users should be able to ask questions and get answers from
the IPC document.

Aim: To build a chatbot that answers questions based on the Indian Penal Code document
using NLP models.

Install command:

python -m pip install transformers torch langchain-core pypdf

Source Code:

# Program 10: IPC Chatbot
from transformers import pipeline

print("IPC Chatbot started...")

# Step 1: Load IPC document (replace "[Link]" with the path to your IPC text file)
with open("[Link]", "r") as file:
    ipc_text = file.read()

# Step 2: Load question-answering model
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

# Step 3: Chat loop
while True:
    question = input("\nAsk question about IPC (type 'exit' to quit): ")

    if question.lower() == "exit":
        break

    # The model extracts the answer span from the supplied context.
    result = qa_pipeline(
        question=question,
        context=ipc_text,
    )

    print("\nAnswer:", result["answer"])

print("\nChatbot terminated.")

Output:

IPC Chatbot started...

Ask question about IPC: What is murder?

Answer: Murder is culpable homicide with specific conditions.

Ask question about IPC: What is theft?

Answer: Theft is dishonestly taking movable property without consent.

Ask question about IPC: exit

Chatbot terminated.
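
Program 10 passes the entire document as context for every question, which can be slow for a text as long as the IPC. One common refinement is to pick the most relevant passage first and hand only that passage to the question-answering model. The sketch below illustrates the selection step with simple word overlap. `split_passages` and `best_passage` are hypothetical helpers, and the two passages are invented placeholder sentences, not text from the IPC.

```python
import re

def split_passages(text, size=80):
    """Split text into consecutive passages of about `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def best_passage(question, passages):
    """Return the passage that shares the most distinct words with the question."""
    question_words = set(re.findall(r"[a-z]+", question.lower()))

    def overlap(passage):
        passage_words = set(re.findall(r"[a-z]+", passage.lower()))
        return len(question_words & passage_words)

    return max(passages, key=overlap)

# Placeholder passages for demonstration only.
passages = [
    "Whoever commits theft shall be punished.",
    "Whoever commits murder shall be punished with death.",
]
print(best_passage("What is the punishment for murder?", passages))
```

The selected passage would then replace `ipc_text` as the `context` argument in the chat loop. Word overlap is a crude relevance measure; it is used here only to keep the example self-contained.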
