
Micro Genai

The document outlines various tasks involving word embeddings, including exploring pre-trained vectors, visualizing embeddings, training a custom Word2Vec model, and using embeddings to enhance prompts for Generative AI models. It includes code examples for performing these tasks using libraries like Gensim and Transformers, as well as techniques for sentiment analysis and text summarization. The document emphasizes the application of word embeddings in different domains and creative tasks.

Uploaded by

chiragmohite02
Copyright
© All Rights Reserved

1. Explore pre-trained word vectors. Explore word relationships using vector arithmetic. Perform arithmetic operations and analyze results.

!pip install gensim

from gensim.downloader import load

print("Loading pre-trained GloVe model (50 dimensions)...")
model = load("glove-wiki-gigaword-50")

def explore_word_vectors():
    # king - man + woman: gender analogy
    result = model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
    print("\n king-man+woman=", result[0][0])
    print("Similarity Score:", result[0][1])

    # paris - france + italy: capital-of-country analogy
    result = model.most_similar(positive=['paris', 'italy'], negative=['france'], topn=1)
    print("\n paris-france+italy=", result[0][0])
    print("Similarity Score:", result[0][1])

    # Nearest neighbours of a single word
    result = model.most_similar(positive=['programming'], topn=5)
    print("\n Top 5 words similar to 'programming':")
    for word, similarity in result:
        print(word, similarity)

    # king - adult + young: age analogy
    result = model.most_similar(positive=['king', 'young'], negative=['adult'], topn=1)
    print("\n king-adult+young=", result[0][0])
    print("Similarity Score:", result[0][1])

explore_word_vectors()
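To make the vector arithmetic concrete, here is a minimal sketch of what an analogy query computes: add the positive vectors, subtract the negative ones, and return the remaining word whose vector has the highest cosine similarity to the result. The 3-dimensional vectors and the `analogy` helper are invented for illustration. Gensim's own `most_similar` also normalizes the vectors first, which this sketch omits.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real GloVe vectors are learned from text and have 50+ dimensions.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(positive, negative, vectors=TOY_VECTORS):
    """Return (word, score) closest to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # Query words are excluded from the candidates, as gensim does
    exclude = set(positive) | set(negative)
    candidates = [(w, cosine(target, v)) for w, v in vectors.items()
                  if w not in exclude]
    return max(candidates, key=lambda pair: pair[1])

word, score = analogy(positive=["king", "woman"], negative=["man"])
print(word, round(score, 4))
```

With these toy values the query vector is [0.9, 0.0, 0.9], which lies closest to "queen". On real embeddings the nearest word depends on the training corpus and is not guaranteed to be the intuitive answer.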
2. Use dimensionality reduction (e.g., PCA or t-SNE) to visualize word embeddings for Q 1. Select 10 words from a specific
domain (e.g., sports, technology) and visualize their embeddings. Analyze clusters and relationships. Generate contextually rich
outputs using embeddings. Write a program to generate 5 semantically similar words for a given input.

!pip install gensim matplotlib scikit-learn

import gensim.downloader as api
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

model = api.load("glove-wiki-gigaword-50")

# Ten words from the technology domain
words = ["computer", "software", "hardware", "internet", "network", "data", "ai", "programming", "algorithm", "cloud"]
vectors = [model[word] for word in words]

# Project the 50-dimensional vectors onto their first two principal components
pca = PCA(n_components=2)
reduced_vectors = pca.fit_transform(vectors)

plt.figure(figsize=(8, 6))
for i, word in enumerate(words):
    plt.scatter(reduced_vectors[i][0], reduced_vectors[i][1])
    plt.text(reduced_vectors[i][0] + 0.01, reduced_vectors[i][1] + 0.01, word)
plt.title("PCA Visualization of Technology Word Embeddings")
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.show()

# Five semantically similar words for a given input
similar_words = model.most_similar("programming", topn=5)
print("Top 5 similar words to 'programming':")
for word, score in similar_words:
    print(word, score)
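For readers who want to see what the PCA step does to the vectors, the following is a rough sketch of a 2-D projection written directly with NumPy: center the data, take the singular value decomposition, and project onto the first two principal directions. The `pca_2d` helper and the 4-dimensional sample points are made up for illustration. scikit-learn's `PCA` applies an additional sign convention, so its axes may be mirrored relative to this output.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    # PCA operates on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Made-up 4-dimensional points that vary mostly along the first axis
points = [
    [2.0,  0.0, 1.0, 0.0],
    [4.0,  0.1, 1.0, 0.0],
    [6.0, -0.1, 1.0, 0.0],
    [8.0,  0.0, 1.0, 0.0],
]
reduced = pca_2d(points)
print(reduced.shape)  # one 2-D coordinate pair per input point
```

Because the first component captures the direction of greatest variance, words that sit far apart along the horizontal axis of the plot differ more strongly than words separated only vertically.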
3. Train a custom Word2Vec model on a small dataset. Train embeddings on a domain-specific corpus (e.g., legal, medical) and
analyze how embeddings capture domain-specific semantics.

!pip install gensim

from gensim.models import Word2Vec

# Small medical-domain corpus
sentences = [
    "the doctors examined the patient",
    "the patient was diagnosed with diabetes",
    "the doctor prescribed medicine",
    "medicinal treatment improves patient health",
    "the hospital provides medical care",
    "nurses assist the doctor during treatment",
    "the diagnosis helps in treatment planning"
]

# Word2Vec expects each sentence as a list of tokens
tokenized_sentences = [sentence.split() for sentence in sentences]

# min_count=1 keeps every word, since the corpus is tiny
model = Word2Vec(sentences=tokenized_sentences, vector_size=50, window=5, min_count=1, workers=4)

print("\n Words similar to 'doctor':")
similar_words = model.wv.most_similar("doctor", topn=5)
for word, score in similar_words:
    print(word, score)

print("\n Words similar to 'treatment':")
similar_words = model.wv.most_similar("treatment", topn=5)
for word, score in similar_words:
    print(word, score)
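The `window` parameter controls which neighbouring words count as context for each word during training. As a simplified illustration, the sketch below lists the (center, context) pairs a fixed window produces for one of the corpus sentences. The `context_pairs` helper is hypothetical and is not part of Gensim, which also varies the effective window size during training.

```python
def context_pairs(tokens, window=2):
    """List (center, context) pairs within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the doctor prescribed medicine".split()
for center, context in context_pairs(tokens, window=1):
    print(center, "->", context)
```

Words that repeatedly share contexts end up with similar vectors. With a corpus of only seven sentences there are very few such pairs, so the similarity scores printed by the training script should be read as a demonstration of the workflow rather than as reliable domain semantics.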
4. Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words using word embeddings. Use the
similar words to enrich a GenAI prompt. Use the AI model to generate responses for the original and enriched prompts. Compare
the outputs in terms of detail and relevance.

!pip install numpy sentence_transformers scikit-learn transformers

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Small climate-related vocabulary to draw enrichment terms from
corpus = ["global warming", "greenhouse effect", "carbon emissions", "fossil fuels", "rising temperatures", "extreme weather", "sea-level rises", "climate crisis", "environmental impact", "sustainability", "renewable energy", "deforestation", "methane emission", "carbon footprint"]
corpus_embeddings = embedding_model.encode(corpus)

def get_similar_terms(query, top_k=5):
    query_embedding = embedding_model.encode([query])
    similarities = cosine_similarity(query_embedding, corpus_embeddings)[0]
    # Indices of the top_k highest similarities, best first
    top_indices = similarities.argsort()[-top_k:][::-1]
    return [(corpus[i], similarities[i]) for i in top_indices]

original_prompt = "Explain climate change"
similar_terms = get_similar_terms("climate change", top_k=5)

print("Top Similar Terms:")
for word, score in similar_terms:
    print(f"{word} (Similarity: {score:.4f})")

similar_word_list = [word for word, score in similar_terms]
enriched_prompt = f"{original_prompt}. Include discussion of " + ", ".join(similar_word_list) + "."

print("\n Enriched Prompt: \n")
print(enriched_prompt)

# FLAN-T5 is an encoder-decoder model, so in transformers 4.x it runs under the
# text2text-generation task; device=0 assumes a GPU is available
generator = pipeline("text2text-generation", model="google/flan-t5-large", device=0)

original_response = generator(original_prompt, max_length=300)[0]["generated_text"]
enriched_response = generator(enriched_prompt, max_length=300)[0]["generated_text"]

print("\n______Original Response______\n")
print(original_response)
print("\n______Enriched Response______\n")
print(enriched_response)

print("\nWord Count Comparison:")
print("\nOriginal:", len(original_response.split()))
print("Enriched:", len(enriched_response.split()))
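Word count measures detail only loosely and says nothing about relevance. One simple complementary check, sketched below, is to count how many of the enrichment terms each response actually mentions. The `term_coverage` helper and the two sample responses are invented for illustration and are not model output. Substring matching is also a crude proxy for relevance.

```python
def term_coverage(response, terms):
    """Return (fraction of terms mentioned, list of terms found) for a response."""
    text = response.lower()
    found = [term for term in terms if term.lower() in text]
    fraction = len(found) / len(terms) if terms else 0.0
    return fraction, found

# Placeholder inputs, written by hand for this example
terms = ["global warming", "carbon emissions", "fossil fuels"]
plain = "Climate change is a long-term shift in weather patterns."
rich = ("Climate change is driven by global warming, which results from "
        "carbon emissions released by burning fossil fuels.")

for name, response in [("Original", plain), ("Enriched", rich)]:
    fraction, found = term_coverage(response, terms)
    print(f"{name}: {fraction:.0%} of terms mentioned {found}")
```

In the script above, the same check could be applied by passing `original_response`, `enriched_response`, and `similar_word_list`.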
5. Use word embeddings to create meaningful sentences for creative tasks. Retrieve similar words for a seed word. Create a
sentence or story using these words as a starting point. Write a program that: Takes a seed word. Generates similar words.
Constructs a short paragraph using these words.

!pip install gensim

import gensim.downloader as api

print("Loading model...")
model = api.load("glove-wiki-gigaword-50")

seed_word = input("Enter a seed word:").lower()

if seed_word in model:
    similar_words = model.most_similar(seed_word, topn=5)
    print("\nSimilar Words:")
    words_list = []
    for word, score in similar_words:
        print(f"{word} (similarity: {score:.4f})")
        words_list.append(word)

    # Build a short paragraph around the seed word and its neighbours
    paragraph = (
        f"{seed_word.capitalize()} is connected with "
        + ", ".join(words_list[:-1])
        + f", and {words_list[-1]}. "
        f"These elements together define the essence of {seed_word}."
    )
    print("\nGenerated Paragraph:")
    print(paragraph)
else:
    print("Seed word not found in the model.")
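The paragraph template can be separated from the embedding lookup so that it is easy to test and reuse. The sketch below shows one way to do this. The `join_words` and `build_paragraph` helpers are hypothetical names, and the sample word list is hand-picked rather than taken from the GloVe model.

```python
def join_words(words):
    """Join words as an English list: 'a', 'a and b', or 'a, b, and c'."""
    words = list(words)
    if not words:
        return ""
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"

def build_paragraph(seed_word, similar_words):
    """Fill the fixed two-sentence template with the seed word and its neighbours."""
    return (
        f"{seed_word.capitalize()} is connected with {join_words(similar_words)}. "
        f"These elements together define the essence of {seed_word}."
    )

# Hand-picked placeholder words, not actual model output
print(build_paragraph("ocean", ["sea", "waters", "coast"]))
```

Handling the one-word and two-word cases separately keeps the sentence grammatical if fewer neighbours are requested than the five used in the script.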


6. Use a pre-trained Hugging Face model to analyze sentiment in text. Assume a real-world application. Load the sentiment
analysis pipeline. Analyze the sentiment of a set of input sentences.

!pip install transformers

from transformers import pipeline

# With no model specified, the pipeline loads its default sentiment model
sentence_pipeline = pipeline("sentiment-analysis")

# Example customer reviews
input_sentences = [
    "The new phone I bought is absolutely amazing!",
    "Worst customer service ever. I'm never coming back.",
    "The experience was average, nothing special.",
    "Fast delivery and the packaging was perfect.",
    "The product broke within two days. Very disappointed."
]

results = sentence_pipeline(input_sentences)

print("Sentiment Analysis Results:\n")
for sentence, result in zip(input_sentences, results):
    print(f"Sentence: {sentence}")
    print(f"Predicted Sentiment: {result['label']}, Confidence Score: {result['score']:.2f}\n")
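In a real-world setting such as review monitoring, the per-sentence results are usually aggregated. The sketch below tallies the labels and flags low-confidence predictions for manual review. It assumes each result is a dict with `label` and `score` keys, as the pipeline returns. The `summarize_sentiment` helper, the 0.75 threshold, and the sample scores are invented for illustration.

```python
from collections import Counter

def summarize_sentiment(results, min_confidence=0.75):
    """Count labels and list the indices of predictions below min_confidence."""
    counts = Counter(result["label"] for result in results)
    uncertain = [i for i, result in enumerate(results)
                 if result["score"] < min_confidence]
    return counts, uncertain

# Made-up results in the pipeline's output format (not real model output)
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.62},
]
counts, uncertain = summarize_sentiment(sample_results)
print(dict(counts))
print("Needs review:", uncertain)
```

A neutral sentence such as "The experience was average, nothing special." is a likely candidate for the review list when the model only distinguishes positive from negative.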


7. Summarize long texts using a pre-trained Hugging Face summarization model. Load the summarization pipeline.
Take a passage as input and obtain the summarized text.

!pip install transformers sentencepiece -q

from transformers import pipeline

# T5 is an encoder-decoder model; in transformers 4.x it runs under the
# text2text-generation task rather than text-generation
summarizer = pipeline("text2text-generation", model="t5-small")

text = """
The Industrial Revolution changed societies from farming-based to industrial economies.
Factories, machines and steam engines increased production and improved transportation.
"""

# The "summarize: " prefix tells T5 which task to perform
summary = summarizer(
    "summarize: " + text,
    max_length=60,
    min_length=30,
    do_sample=False
)

print(summary[0]['generated_text'])
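The sample passage is only two sentences long, whereas the task targets long texts. Summarization models accept a limited input length, so a long document is commonly split into chunks that are summarized one at a time. The sketch below groups whole sentences into chunks under a word budget. The `chunk_text` helper and the `max_words` value are hypothetical, and word counts are only a rough stand-in for the model's token limit.

```python
import re

def chunk_text(text, max_words=200):
    """Group whole sentences into chunks of at most roughly max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

demo = "One two three. Four five six. Seven eight."
for chunk in chunk_text(demo, max_words=6):
    print(repr(chunk))
```

Each chunk could then be passed to the summarizer with the "summarize: " prefix and the partial summaries joined. A single sentence longer than the budget is kept whole in its own chunk, so the limit is approximate.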
