Generative AI Applications
Ram N Sangwan
• Text Based Applications
• Image-based Applications
• Video Generation
• Audio Applications
• Generative AI Ecosystem
Text Based Applications
Customer Operations
• Automated customer service based on the customer’s product suite, experience, and language
• Real-time AI call scripts based on conversation history and caller context
Marketing
• Content generation for e-commerce, SEO-optimized articles, etc.
Sales
• Custom sales outreach based on interaction history and prospect profile, freeing up sales representatives' time
• Virtual sales representatives that guide prospects through the offerings all the way to a sale
Product Development
• Analysis, cleaning, and labeling of large volumes of data, such as user feedback, market trends, logs
• Coding assistant to speed up development, refactoring, and systems integration
Generative AI Applications - Command
Command: Cohere's LLM for text generation
• Command is trained to follow user commands and to be immediately useful in practical business
applications
Example Use Cases of Command
In-depth analysis of documents
• Command can write product descriptions, help draft emails, suggest example press release copy, and
much more.
Generative AI - Language Translation
Code snippet using Hugging Face's transformers library to perform a simple translation from English to
French:
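# Requires: pip install transformers sentencepiece torch
from transformers import pipeline

# Load an English-to-French translation pipeline backed by the Helsinki-NLP model.
translator = pipeline('translation_en_to_fr', model='Helsinki-NLP/opus-mt-en-fr')
english_text = "Hello, how are you?"
# The pipeline returns a list with one dict per input sentence.
french_translation = translator(english_text)
print(french_translation[0]['translation_text'])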
• The model argument specifies which model to use for
translation.
• 'Helsinki-NLP/opus-mt-en-fr' is a popular choice for
English to French translation.
Image-Based Applications
Image Synthesis
• DALL-E 3 and Stable Diffusion are two of the most advanced AI systems
designed for image synthesis from textual descriptions.
• They use different architectures but share the common goal of converting
text prompts into detailed images.
DALL-E 3
• Generative Process:
• DALL-E creates images from the ideas expressed in user prompts. It matches words with pictures
it has learned from, then iteratively adjusts colors to turn random noise into an image.
• Refinement and Selection:
• DALL-E begins with millions of pixels arranged randomly as 'noise'.
• This 'noise' is a mechanical starting point, like a digital canvas.
• DALL-E then enters an iterative process, adjusting the colors in the noise to match the
prompt.
• Output:
• A selection of images that closely match the text description.
Stable Diffusion
Known for its efficiency and ability to generate high-quality images quickly.
• Latent Diffusion Process:
• Operates on the concept of a latent space, where images are represented in a compressed form.
• The model starts from noise in this latent space and denoises it step by step, guided by the
textual description.
• Text Conditioning:
• The model is conditioned on the text through an encoder that represents the textual
description in a form that guides the generation process.
• Iterative Refinement:
• Stable Diffusion also refines the image over several steps.
• User Interaction:
• Users can provide additional input during the generation process to steer the outcome towards
a desired result, making it somewhat interactive.
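Text conditioning of this kind is commonly implemented with cross-attention: the image latents supply the queries, and the encoded text supplies the keys and values. The sketch below is a toy, pure-Python illustration of that single operation. The vectors are made up for the example, and the learned projection matrices used in real models are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """For each query, average the values with weights softmax(q . k / sqrt(d))."""
    dim = len(keys[0])
    width = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return outputs

# Hypothetical data: two latent positions (queries) attend over three text tokens.
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

conditioned = cross_attention(latent_queries, text_keys, text_values)
print(conditioned)
```

Each output row is a blend of the text values, weighted toward the tokens whose keys best match that latent position. This is how the prompt can steer different regions of the image differently.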
Diffusion Models – How they Work
[Figure: latent diffusion architecture. In pixel space, an encoder ε maps the image X to a latent vector Z, and a decoder D maps the denoised latent back to the generated image X'. In latent space, the forward diffusion process Q(Zt | Zt-1) adds noise step by step until ZT; a denoising U-Net εθ with skip connections and cross-attention layers (Q, K, V) reverses it from Zt to Zt-1. Conditioning inputs (semantic map, text, representations, images) pass through a text/image transformer τθ and enter the U-Net through cross-attention.]
Denoising Network to recover the Original Latent Vector
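The forward diffusion step Q(Zt | Zt-1) from the diagram can be sketched in a few lines. This is a minimal illustration, assuming a linear variance schedule; the beta endpoints, the step count, and the function names are assumptions for the example, not values taken from the slides. Each step shrinks the latent slightly and mixes in Gaussian noise, so that after many steps almost none of the original signal remains.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule; the endpoints are illustrative."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_step(z_prev, beta, rng):
    """Sample z_t from q(z_t | z_{t-1}) = N(sqrt(1 - beta) * z_{t-1}, beta * I)."""
    keep = math.sqrt(1.0 - beta)
    noise_scale = math.sqrt(beta)
    return [keep * z + noise_scale * rng.gauss(0.0, 1.0) for z in z_prev]

def signal_fraction(betas):
    """Product of (1 - beta_t): the fraction of z_0's variance left after all steps."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return alpha_bar

rng = random.Random(0)
betas = linear_betas(1000)
z = [1.0, -0.5, 0.25, 2.0]           # toy latent vector z_0
for beta in betas:
    z = forward_step(z, beta, rng)   # z_1, z_2, ..., z_T

print(len(z), signal_fraction(betas))
```

The denoising network is trained to undo these steps one at a time, which is why generation can start from pure noise.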
Common Characteristics
• Training Data: The models are trained on
large datasets of images and their
associated textual descriptions, allowing
them to understand the relationship
between text and visual content.
• Prompt Dependency: The quality of the
generated image heavily depends on the
clarity of the text prompt.
• Creative Applications: Used for concept art,
design, illustration, and even educational
purposes.
Try DALL-E 3 and Stable Diffusion at [Link]
Audio Applications
Speech synthesis and Music generation
• Text-to-speech (TTS) technologies like Google WaveNet and Amazon Polly are examples of
advanced speech synthesis systems.
• They convert text into lifelike spoken audio, using deep learning techniques to produce natural-
sounding voices that closely mimic human speech patterns.
Try at [Link]
Video Generation - OpenAI Sora
Try it at [Link]
• OpenAI revealed Sora to the world on February 15, 2024.
• The announcement included examples of generated videos and a research paper, shared on X.
• In an interview with The Wall Street Journal, OpenAI chief technology officer Mira Murati said
that Sora would be available "this year" (2024) and that it "could be a few months."
Video Generation
• Meta's Make-A-Video generates video content from text prompts. Realistic synthetic
video of this kind raises ethical concerns similar to those raised by DeepFakes.
• DeepFakes are manipulated media in which a person's likeness is replaced
with someone else's, often using ML techniques such as Generative
Adversarial Networks.
[Link]
Video Generation - Kling AI
Kling AI in action here
On June 6, 2024, Kuaishou launched Kling AI, described as China's first domestically developed large video generation model.
Features
• Generate videos up to 2 minutes long at 30fps.
• Built on a Diffusion Transformer architecture, with deep understanding of text-video semantics.
The Ethical Implications of Deep-Fake Technology
It's important to understand the ethical implications of Deep-Fake technology:
• Misinformation
• Consent Issues
• Legal Implications
• Privacy Invasion
• Mental Health Impact
Generative AI Ecosystem Understanding
Generative AI Ecosystem
Command R:
• Instruction-following conversational model that performs
language tasks at a higher quality, more reliably, and with a
longer context than previous models. Used for code
generation, RAG, tool use and agents.
• Supports 10 key languages
• Max input tokens: 128k
• Max output tokens: 4096
• API Endpoint: /chat
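As a rough sketch of how the limits above might be respected in client code, the helper below builds a JSON body for a /chat call and caps the requested output length at 4096 tokens. The field names (model, message, max_tokens) and the model id "command-r" are assumptions for illustration; check them against the provider's current API reference before use. No network call is made here.

```python
import json

MAX_OUTPUT_TOKENS = 4096  # output limit quoted on the slide

def build_chat_request(message, max_tokens=512, model="command-r"):
    """Build a JSON body for a /chat call; field names and model id are assumptions."""
    if not message.strip():
        raise ValueError("message must not be empty")
    # Clamp the requested output length into the range the model accepts.
    capped = max(1, min(max_tokens, MAX_OUTPUT_TOKENS))
    return {"model": model, "message": message, "max_tokens": capped}

body = build_chat_request("Summarize our Q3 support tickets.", max_tokens=10_000)
print(json.dumps(body, indent=2))
```

Capping on the client side avoids a rejected request when a caller asks for more output than the model supports.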
Command R+: (New)
• Command R+ is a RAG-optimized model designed to tackle
enterprise-grade workloads.
• RAG with citations to reduce hallucinations
• Multilingual coverage in 10 key languages
Cohere Command:
• A highly performant generation model.
• Use this model when you're optimizing for accuracy in tasks
such as text extraction and sentiment analysis.
• Draft marketing copy, emails, blog posts, and product
descriptions, then review and use them.
Cohere Embed-English:
• Generate embeddings from text based on various
parameters.
• Embeddings can be used for estimating semantic
similarity between two sentences, choosing a sentence
which is most likely to follow another sentence, or
categorizing user feedback.
• Outputs from the Classify endpoint can be used for
any classification or analysis task.
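Semantic similarity between two embeddings is typically measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors in place of real model output, so the sentences and numbers are purely illustrative; in practice the vectors would come from an embedding model and have hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three pieces of user feedback.
embeddings = {
    "The delivery was late.": [0.9, 0.1, 0.0, 0.2],
    "My order arrived after the promised date.": [0.8, 0.2, 0.1, 0.3],
    "I love the new color options.": [0.1, 0.9, 0.3, 0.0],
}

query = "The delivery was late."
others = [sentence for sentence in embeddings if sentence != query]
# Rank the remaining sentences from most to least similar to the query.
ranked = sorted(
    others,
    key=lambda s: cosine_similarity(embeddings[query], embeddings[s]),
    reverse=True,
)
print(ranked[0])
```

Sentences about the same topic end up with vectors pointing in similar directions, so the other delivery complaint ranks above the unrelated comment.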
Cohere Aya:
• This is a multilingual research model from Cohere For
AI (c4ai-aya)
• Supports 21 languages
• API Endpoint /generate
AutoGPT:
• AutoGPT is an open-source, community-developed project
that uses an LLM such as GPT-4 as an autonomous
agent.
• Given a goal, it breaks the goal into sub-tasks and works
through them in a loop with little human input.
ChatGPT:
• ChatGPT is a large language model developed by
OpenAI that can understand and generate human-
like text based on user input.
• It has been trained on a vast amount of data from
the internet and can be used for various tasks such
as chatbots, customer service, and content
generation.
Azure OpenAI:
• Azure OpenAI Service is a managed cloud service from
Microsoft that provides access to OpenAI's
pre-trained language models, such as those behind ChatGPT.
• It also includes tools for customizing models
and integrating them into applications.
Hugging Face: (Platform)
• Hugging Face develops and provides access to
pre-trained language models.
• Models are trained on large amounts of text data
and can be used for various NLP tasks such as
sentiment analysis, named entity recognition, and
machine translation.
LaMDA:
• LaMDA is a large language model developed by
Google (a subsidiary of Alphabet) that can
understand and generate human-like text based on
user input.
LLaMA-3:
• LLaMA-3 is an LLM released by Meta.
• Refined post-training processes significantly lower false
refusal rates, improve response alignment, and boost
diversity in model answers.
• Elevates capabilities like reasoning, code generation, and
instruction following.
• LLaMA-3 is available in 8-billion and 70-billion-parameter sizes.
• Meta launched Llama-3.1 in 405B, 70B, and 8B sizes on July 23,
2024.
DALL-E 3:
• DALL-E 3 is an AI program created by OpenAI that creates
images from textual descriptions.
• Using more than 10-20 billion parameters, it interprets
natural language inputs and generates the corresponding
image.
[Figure: An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula. Created using DALL-E 2.]
Thank You