Generative AI Application Overview

The document outlines various generative AI applications across text, image, video, and audio domains, detailing their functionalities and use cases in customer operations, marketing, sales, and product development. It discusses models like DALL-E 3 and Stable Diffusion for image synthesis, as well as advancements in video generation and ethical implications of technologies like DeepFakes. Additionally, it highlights the generative AI ecosystem, including models from Cohere, OpenAI, and Hugging Face, emphasizing their capabilities and applications in diverse fields.

Generative AI Applications

Ram N Sangwan

• Text Based Applications
• Image-based Applications
• Video Generation
• Audio Applications
• Generative AI Ecosystem
Text Based Applications

Customer Operations
• Automated customer service based on the customer’s product suite, experience, and language
• Real-time AI call scripts based on conversation history and caller context

Marketing
• Content generation for e-commerce and SEO-optimized articles

Sales
• Custom sales outreach based on interaction history and prospect profile, freeing up sales representatives' time
• Virtual sales representatives that guide prospects through the offerings to a sale

Product Development
• Analysis, cleaning, and labeling of large volumes of data, such as user feedback, market trends, logs
• Coding assistant to speed up development, refactoring, and systems integration
Generative AI Applications - Command
Command: an LLM from Cohere for text generation
• Command is trained to follow user commands and to be immediately useful in practical business applications.
Example Use Cases of Command
• In-depth analysis of documents
• Command can write product descriptions, help draft emails, suggest example press release copy, and much more.
Generative AI - Language Translation
The following code snippet uses Hugging Face's transformers library to perform a simple translation from English to French:

# Requires the transformers package with a backend such as PyTorch.
from transformers import pipeline

# Build an English-to-French translation pipeline with an explicit model.
translator = pipeline('translation_en_to_fr', model='Helsinki-NLP/opus-mt-en-fr')
english_text = "Hello, how are you?"
# The pipeline returns a list with one dict per input text.
french_translation = translator(english_text)

print(french_translation[0]['translation_text'])

• The model argument lets you specify which model to use for translation.
• 'Helsinki-NLP/opus-mt-en-fr' is a popular choice for English to French translation.
Image-Based Applications
Image Synthesis

• DALL-E 3 and Stable Diffusion are two of the most advanced AI systems
designed for image synthesis from textual descriptions.
• They use different architectures but share the common goal of converting
text prompts into detailed images.
DALL-E 3
• Generative Process:
• DALL-E creates images from the ideas in a user's prompt. It matches the words with pictures it has learned from, then iteratively adjusts colors to turn noise into an image.
• Refinement and Selection:
• DALL-E begins with millions of pixels arranged randomly as 'noise'.
• This noise is a mechanical starting point, like a digital canvas.
• DALL-E then enters an iterative process, adjusting the colors in the noise to match the prompt.
• Output:
• A selection of images that closely match the text description.
Stable Diffusion
Stable Diffusion is known for its efficiency and its ability to generate high-quality images quickly.
• Latent Diffusion Process:
• Operates on the concept of a latent space, where images are represented in a compressed form.
• The model progressively denoises this latent representation, starting from noise and guided by the textual description.
• Text Conditioning:
• The model is conditioned on the text through an encoder that represents the textual
description in a form that guides the generation process.
• Iterative Refinement:
• Stable Diffusion also refines the image over several steps.
• User Interaction:
• Users can provide additional input during the generation process to steer the outcome towards
a desired result, making it somewhat interactive.
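
In latent diffusion models such as Stable Diffusion, text conditioning is commonly implemented with cross-attention: the image latents supply the queries, and the encoded text supplies the keys and values. The sketch below illustrates only that mechanism in plain Python. The 2-dimensional vectors and the function names are made up for illustration; a real model uses learned projections and much larger dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys/values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity between this latent position and every text token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted mix of the text token values.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy data: two latent positions attend to three text tokens.
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

outputs, weights = cross_attention(latent_queries, text_keys, text_values)
```

Each row of weights shows how strongly one latent position draws on each text token, which is how the prompt steers the denoising steps.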
Diffusion Models – How they Work
[Figure: latent diffusion architecture. In pixel space, an encoder maps the image x to a latent vector z, and a decoder D maps the denoised latent back to the generated image x'. In latent space, the forward diffusion process q(z_t | z_{t-1}) adds noise step by step until z_T; a denoising U-Net ε_θ with skip connections and cross-attention layers (Q, K, V) reverses it, going from z_t to z_{t-1}. Conditioning inputs (semantic map, text, representations, images) pass through a text/image transformer τ_θ and enter the U-Net through cross-attention.]
Denoising Network to recover the Original Latent Vector
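
The forward diffusion process and the recovery of the original latent vector can be sketched with a toy example. This is a minimal illustration, not Stable Diffusion's actual code. It assumes the standard DDPM closed form z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε and a linear noise schedule; the schedule values, the 4-dimensional latent, and the function names are all illustrative assumptions. In a real model the noise estimate comes from the denoising U-Net; here the true noise is reused to show that a perfect estimate recovers the latent exactly.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(z0, alpha_bar, rng):
    """Sample z_t from q(z_t | z_0): shrink the latent, add Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps

def recover_latent(zt, eps_pred, alpha_bar):
    """Invert the forward step given a noise estimate (the U-Net's job)."""
    return [(z - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for z, e in zip(zt, eps_pred)]

rng = random.Random(0)
alpha_bars = make_alpha_bars(num_steps=1000)
z0 = [0.5, -1.2, 0.3, 0.9]                          # hypothetical latent
zt, eps = forward_diffuse(z0, alpha_bars[499], rng)
z0_hat = recover_latent(zt, eps, alpha_bars[499])   # perfect noise estimate
```

By the last step the cumulative factor ᾱ_T is close to zero, so z_T is almost pure noise, which is why generation can start from random noise.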
Common Characteristics
• Training Data: The models are trained on
large datasets of images and their
associated textual descriptions, allowing
them to understand the relationship
between text and visual content.
• Prompt Dependency: The quality of the
generated image heavily depends on the
clarity of the text prompt.
• Creative Applications: Both models are used
for concept art, design, illustration, and even
educational purposes.

Try DALL-E 3 and Stable Diffusion at [Link]


Audio Applications
Speech Synthesis and Music Generation
• Text-to-speech (TTS) technologies like Google WaveNet and Amazon Polly are examples of advanced speech synthesis systems.
• They convert text into lifelike spoken audio, using deep learning techniques to produce natural-sounding voices that closely mimic human speech patterns.

Try at [Link]
Video Generation - OpenAI Sora
Try it at [Link]

• OpenAI revealed Sora to the world on February 15, 2024.


• OpenAI shared examples of generated videos and a research paper on X.

In an interview with The Wall Street Journal, OpenAI chief technology officer Mira Murati said that Sora would be available “this year” (2024) and that it “could be a few months.”
Video Generation
• Meta's Make-A-Video generates realistic video content from text prompts; like
DeepFake technology, such synthetic video raises ethical concerns.
• DeepFakes are manipulated media in which a person's likeness is replaced
with someone else's, often using ML techniques such as Generative
Adversarial Networks.

Video Generation - Kling AI
Kling AI in action here

On June 6, 2024, Kuaishou launched Kling AI, described as the first large video generation model developed in China.
Features
• Generates videos up to 2 minutes long at 30 fps.
• Built on a Diffusion Transformer architecture, with a deep understanding of text-video semantics.
The Ethical Implications of Deep-Fake Technology
It's important to understand the ethical implications of Deep-Fake technology:
• Misinformation
• Consent Issues
• Legal Implications
• Privacy Invasion
• Mental Health Impact


Generative AI Ecosystem Understanding
Generative AI Ecosystem
Command R:
• Instruction-following conversational model that performs
language tasks at a higher quality, more reliably, and with a
longer context than previous models. Used for code
generation, RAG, tool use, and agents.
• Supported languages: 10 key languages
• Max input tokens: 128k
• Max output tokens: 4096
• API endpoint: /chat

Command R+ (New):
• Command R+ is a RAG-optimized model designed to tackle
enterprise-grade workloads.
• RAG with citations to reduce hallucinations
• Multilingual coverage in 10 key languages
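
The idea behind RAG with citations can be sketched without any model call: retrieve the passages most relevant to a question, number them, and ask the model to answer only from those numbered sources. The example below is a toy illustration in plain Python, not Cohere's API. The word-overlap scoring, the sample documents, and the function names are assumptions made for this sketch; production systems typically retrieve with embeddings.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc["text"])), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_grounded_prompt(query, sources):
    """Number the sources so the answer can cite them as [n]."""
    lines = ["Answer using only the sources below and cite them as [n]."]
    for i, doc in enumerate(sources, start=1):
        lines.append(f"[{i}] ({doc['id']}) {doc['text']}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)

# Hypothetical knowledge base.
documents = [
    {"id": "policy", "text": "Refunds are issued within 14 days of purchase."},
    {"id": "shipping", "text": "Standard shipping takes 3 to 5 business days."},
    {"id": "hours", "text": "Support is open Monday to Friday."},
]
query = "How many days do refunds take?"
sources = retrieve(query, documents)
prompt = build_grounded_prompt(query, sources)
print(prompt)
```

Because every claim in the answer can be traced to a numbered source, unsupported statements are easier to spot, which is how citations help reduce hallucinations.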
Cohere Command:
• A highly performant generation model.
• Use this model when you're optimizing for accuracy in tasks
such as text extraction and sentiment analysis.
• Draft marketing copy, emails, blog posts, and product
descriptions, then review and use them.

Cohere Embed-English:
• Generate embeddings from text based on various
parameters.
• Embeddings can be used for estimating semantic
similarity between two sentences, choosing a sentence
which is most likely to follow another sentence, or
categorizing user feedback.
• Outputs from the Classify endpoint can be used for
any classification or analysis task.
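
Semantic similarity between embeddings is usually measured with cosine similarity, and a simple way to categorize user feedback is to pick the label of the most similar labelled example. The sketch below shows both in plain Python. The 3-dimensional vectors are invented for illustration; real embedding models return vectors with hundreds or thousands of dimensions, and the function names here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(embedding, labelled_examples):
    """Return the label of the labelled example closest to the embedding."""
    best = max(labelled_examples,
               key=lambda example: cosine_similarity(embedding, example[1]))
    return best[0]

# Hypothetical embeddings; in practice these come from an embedding model.
examples = [
    ("praise", [0.9, 0.1, 0.0]),
    ("complaint", [0.1, 0.9, 0.2]),
]
feedback_embedding = [0.2, 0.8, 0.1]

label = classify(feedback_embedding, examples)
print(label)
```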
Cohere Aya:
• A multilingual research model from Cohere For
AI (c4ai-aya)
• Supports 21 languages
• API endpoint: /generate

AutoGPT:
• AutoGPT is an open-source, community-developed project
that uses a large language model such as GPT-4 as an
autonomous agent.
• Given a goal, it breaks the goal into sub-tasks and works
through them by chaining model calls with little human input.

ChatGPT:
• ChatGPT is a conversational AI system developed by
OpenAI, built on large language models, that can
understand and generate human-like text based on user input.
• It has been trained on a vast amount of data from
the internet and can be used for various tasks such
as chatbots, customer service, and content
generation.
Azure OpenAI:
• Azure OpenAI is a cloud service from Microsoft
that provides access to OpenAI's pre-trained
language models, such as those behind ChatGPT.
• It also includes tools for customizing models
and integrating them into applications.
Hugging Face: (Platform)
• Hugging Face develops and provides access to
pre-trained language models.
• Models are trained on large amounts of text data
and can be used for various NLP tasks such as
sentiment analysis, named entity recognition, and
machine translation.

LaMDA:
• LaMDA is a large language model developed by
Google that can understand and generate
human-like text based on user input.
LLaMA-3:
• LLaMA-3 is an LLM released by Meta.
• Refined post-training processes significantly lower false
refusal rates, improve response alignment, and boost
diversity in model answers.
• Elevates capabilities like reasoning, code generation, and
instruction following.
• LLaMA-3 is available in 8-billion and 70-billion-parameter sizes.
• Meta launched Llama-3.1 in 405B, 70B, and 8B sizes on July 23, 2024.
DALL-E 3:
• DALL-E 3 is an AI program created by OpenAI that creates
images from textual descriptions.
• Using more than 10-20 billion parameters, it interprets
natural language inputs and generates the corresponding image.
[Image: An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula – created using DALL-E 2]
Thank You
