Understanding Generative AI: Uses & Models

Copyright
© All Rights Reserved

Generative AI

Introduction
Generative AI
Generative AI refers to artificial intelligence systems that can create
new content, such as images, music, text, videos or even entire virtual
environments.
These systems learn patterns from existing data and generate new
data that resembles the training data.
Generative AI is not brand-new: it first appeared in chatbots in the 1960s.
But it was not until 2014, with the introduction of generative
adversarial networks (GANs), a type of machine learning algorithm
proposed by Ian Goodfellow, that generative AI could create
convincingly authentic images, videos and audio of real people.
On the one hand, this newfound capability has opened up opportunities
that include better movie dubbing and rich educational content.
On the other hand, it has raised concerns about deepfakes (digitally forged images or
videos) and harmful cybersecurity attacks on businesses.
The rapid advances in so-called large language models (LLMs), i.e.,
models with billions or even trillions of parameters, have opened a new
era in which generative AI models can write engaging text, paint
photorealistic images and even create somewhat entertaining sitcoms on
the fly.
Moreover, innovations in multimodal AI enable teams to generate content
across multiple types of media, including text, graphics and video.
This is the basis for tools like Dall-E that automatically create images from a
text description or generate text captions from images.
How does generative AI work?
Generative AI starts with a prompt that could be in the form of a
text, an image, a video, a design, musical notes, or any input
that the AI system can process.
Various AI algorithms then return new content in response to
the prompt. Content can include essays, solutions to problems,
or realistic fakes created from pictures or audio of a person.
After an initial response, you can also customize the results with
feedback about the style, tone and other elements you want the
generated content to reflect.
Generative AI vs. AI
Generative AI focuses on creating new and original content, chat responses,
designs, synthetic data or even deepfakes.
It's particularly valuable in creative fields and for novel problem-solving, as it can
autonomously generate many types of new outputs.
Generative AI, as noted above, relies on neural network techniques such as
transformers, GANs and VAEs.
Generative AI often starts with a prompt that lets a user or data source submit a
starting query or data set to guide content generation.
This can be an iterative process to explore content variations.
Traditional AI algorithms, on the other hand, often follow a predefined set of
rules to process data and produce a result.
Both approaches have their strengths and weaknesses depending on the problem
to be solved, with generative AI being well-suited for tasks involving NLP and
calling for the creation of new content, and traditional algorithms more
effective for tasks involving rule-based processing and predetermined
outcomes.
What are use cases for generative AI?
Some of the use cases for generative AI include the following:
Implementing chatbots for customer service and technical support.
Deploying deepfakes for mimicking people or even specific individuals.
Improving dubbing for movies and educational content in different languages.
Writing email responses, dating profiles, resumes and term papers.
Creating photorealistic art in a particular style.
Improving product demonstration videos.
Suggesting new drug compounds to test.
Designing physical products and buildings.
Optimizing new chip designs.
Writing music in a specific style or tone.
Use cases for generative AI, by industry
Here are some ways generative AI applications could impact different industries:
Finance can watch transactions in the context of an individual's history to build
better fraud detection systems.
Legal firms can use generative AI to design and interpret contracts, analyze
evidence and suggest arguments.
Manufacturers can use generative AI to combine data from cameras, X-ray and
other metrics to identify defective parts and the root causes more accurately and
economically.
Film and media companies can use generative AI to produce content more
economically and translate it into other languages with the actors' own voices.
The medical industry can use generative AI to identify promising drug candidates
more efficiently.
Architectural firms can use generative AI to design and adapt prototypes more
quickly.
Gaming companies can use generative AI to design game content and levels.
What are the benefits of generative AI?
Some of the potential benefits of implementing generative AI include
the following:
Automating the manual process of writing content.
Reducing the effort of responding to emails.
Improving the response to specific technical queries.
Creating realistic representations of people.
Summarizing complex information into a coherent narrative.
Simplifying the process of creating content in a particular style.
What are the limitations of generative AI?
Here are some of the limitations to consider when implementing or
using a generative AI:
It does not always identify the source of content.
It can be challenging to assess the bias of original sources.
Realistic-sounding content makes it harder to identify inaccurate
information.
It can be difficult to understand how to tune for new circumstances.
Results can gloss over bias, prejudice and hatred.
What are some examples of generative AI tools?
Generative AI tools exist for various modalities, such as text, imagery, music,
code and voices. Some popular AI content generators to explore include the
following:
Text generation tools include GPT, Jasper, AI-Writer and Lex.
Image generation tools include Dall-E 2, Midjourney and Stable Diffusion.
Music generation tools include Amper, Dadabots and MuseNet.
Code generation tools include CodeStarter, Codex, GitHub Copilot and
Tabnine.
Voice synthesis tools include Descript, Listnr and [Link].
AI chip design tool companies include Synopsys, Cadence, Google and Nvidia.
What are Dall-E, ChatGPT and Gemini?
ChatGPT, Dall-E and Gemini (formerly Bard) are popular generative AI interfaces.

ChatGPT:
GPT-1 (2018): Introduced basic text generation capabilities.
GPT-2 (2019): Improved contextual understanding and longer
responses.
GPT-3 (2020): Became a powerhouse with 175 billion parameters,
enabling creative tasks like writing essays and coding.
GPT-4 (2023): Added multimodal capabilities, processing both text
and images for enhanced accuracy and versatility.
GPT-5 (2025): Added many more functions.
Google Gemini (2023):
Gemini (formerly Bard) was initially built on a lightweight version of Google's LaMDA family of large language
models.
o Combines multiple data types like text, images, and audio for comprehensive
insights.
o Enhanced the performance of Bard, Google’s AI chatbot.
o Focuses on real-time problem-solving and personalized experiences.
Google later moved the chatbot to PaLM 2, a more advanced LLM,
which allowed it to be more efficient and visual in its response to user
queries.
Microsoft Copilot:
o Integrated with Microsoft 365 tools like Word, Excel, and Teams to
assist with everyday tasks.
o Provides contextual suggestions, real-time collaboration, and
automated document creation.
o Continuously evolves to offer better workflow automation.
Dall-E:
Dall-E was trained on a large data set of images and their associated text
descriptions.
It is an example of a multimodal AI application that identifies
connections across multiple media, such as vision, text and audio.
In this case, it connects the meaning of words to visual elements. It
was built using OpenAI's GPT implementation in 2021.
Dall-E 2, a second, more capable version, was released in 2022. It
enables users to generate imagery in multiple styles driven by user
prompts.
Generative AI Models
Generative AI models combine various AI algorithms to represent and process content.
Generative AI is a rapidly growing field with various models emerging, each with its
unique strengths and ideal use cases.
At their core, generative models work by capturing the patterns and structure within data,
whether it’s images, text, music, or any other form.
By understanding these patterns, they can then generate new, similar data that is often
realistic.
However, despite this common working principle, generative models differ
significantly in their architecture, training, capabilities, and variations.
For example, to generate text, various natural language processing techniques transform
characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and
actions, which are represented as vectors using multiple encoding techniques.
Similarly, images are transformed into various visual elements, also expressed as vectors.
Some popular types of generative models include:
Variational Auto-encoders (VAEs),
Generative Adversarial Networks (GANs),
and Recurrent Neural Networks (RNNs).
Generative vs. Discriminative Models
At their core, generative models are a class of
machine-learning models designed to learn the
underlying patterns in data.
This data can be audio, text, or visuals like
images and videos.
When the model learns those patterns and
their distribution, it can generate new
data.
However, the way this works contrasts with
discriminative models, which are the types of AI
models trained for tasks like regression and
classification: they learn to predict a label or value from an input
rather than to model how the data itself is distributed.
Encoder-Decoder Model
The Encoder-Decoder architecture is an RNN framework designed for
sequence-to-sequence tasks.
In this setup, the Encoder processes an input sequence and produces
a context vector (latent variables), which encapsulates the
information from the input.
The Decoder then uses this context vector to generate an output
sequence.
This architecture is commonly applied in areas such as machine
translation, text summarization, and speech recognition.
Variational Autoencoder (VAE)
VAEs are a type of generative model that combines neural networks with
probabilistic graphical models.
They encode input data into a lower-dimensional latent space and then
decode it back to reconstruct the original data.
Working: VAEs consist of two main components: an encoder and a
decoder.
The encoder compresses the input into a latent-space representation
(context vector), and the decoder reconstructs the data from this latent
space.
VAEs introduce a probabilistic approach by modeling the latent variables as probability distributions.
The learning process of an auto-encoder involves learning how to
compress the data while minimizing the reconstruction error.
This is useful for de-noising images, feature extraction, and
image reconstruction.
Steps
Encoding: Maps input data to a distribution over latent variables. The input data x
is passed through the encoder (a neural network) to obtain parameters of a
probability distribution (mean μ and variance σ²).
Sampling: From the distribution defined by μ and σ², a latent variable z is
sampled.
Latent Variables (z): Represents compressed information about the input data.
Decoding: The latent variable z is fed into the decoder (another neural network)
to produce a reconstruction x′ that aims to approximate the original input x.
Loss Calculation: The loss is a combination of reconstruction loss (how well x′
matches x) and a regularization term (Kullback-Leibler divergence) to ensure that
the latent space follows a standard normal distribution.
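The sampling and loss steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the encoder is assumed to output the log-variance (log σ²) rather than σ² itself, the reconstruction loss is assumed to be squared error, and the function names are hypothetical, not taken from any library.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Sampling step: z = mu + sigma * eps, with eps drawn from N(0, I)."""
    sigma = np.exp(0.5 * log_var)          # log_var is log(sigma^2) (assumption)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction loss (squared error, an assumption) plus the KL term."""
    reconstruction = np.sum((x - x_recon) ** 2)
    return reconstruction + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)             # arbitrary seed
mu = np.array([0.5, -1.0])                 # hypothetical encoder outputs
log_var = np.array([0.0, -2.0])
z = sample_latent(mu, log_var, rng)        # latent variable fed to the decoder
```

The KL term is zero only when μ = 0 and σ² = 1, which is what pulls the latent space toward a standard normal distribution.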
Note: VAEs are a probabilistic take on auto-encoders, mapping
the image to a probability distribution rather than a single point.
This gives VAEs the ability to generate images, although they tend to produce blurry and
less diverse results, and they can be resource-intensive for high-resolution images.
Figure: Input Data (x) -> Encoder (NN, probabilistic encoder) -> (μ, σ²) -> Sample z from latent space (latent vector) -> Decoder (NN, probabilistic decoder) -> Reconstructed Data (x′)

Generative Adversarial Network (GAN)
GANs consist of two neural networks, the generator and the discriminator, which are
trained together in a competitive process.
The generator creates fake data, while the discriminator attempts to distinguish
between real and fake data.
Steps:
Generator Input: The generator takes random noise z as input and produces fake data
G(z).
Discriminator Evaluation: The discriminator receives both real data x and generated
data G(z) and outputs a probability D(x) indicating whether the data is real (1) or fake (0).
Training: The generator aims to maximize the probability of the discriminator
misclassifying fake data, while the discriminator aims to minimize misclassification. This
creates a feedback loop, improving both networks over time.
The two networks are trained simultaneously: the generator aims to fool the
discriminator, while the discriminator aims to become better at distinguishing real from
fake data.
• Both the discriminator and generator learn the features of the dataset, but the
discriminator also learns to distinguish between the features.
• The generator then transforms a random noise vector into a new image, using the
features it has learned.
• The generated image is sent to the discriminator, which judges whether the image is
fake or real; this judgment is the feedback used to update the generator.
• Training ends when the discriminator is no longer able to distinguish
between the generated images and the training data.
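The competing objectives described above can be written as two loss functions. The sketch below assumes the usual binary cross-entropy formulation and the "non-saturating" generator loss; the function names and the example probabilities are made up for illustration, and no networks are actually trained here.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy: real samples should score near 1, generated ones near 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: low when D scores the fakes as real."""
    return -np.mean(np.log(d_fake + eps))

# Hypothetical discriminator outputs D(x) on real data and D(G(z)) on fakes.
confident_d = discriminator_loss(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
fooled_d = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

When the discriminator can no longer tell real from fake, it outputs 0.5 everywhere, and its loss sits well above the loss of a confident discriminator.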

Figure: Noise Vector (z) -> Generator -> Discriminator <- Real Data; the Discriminator outputs Real or Fake.
Implementation of GAN
Generative Adversarial Networks (GANs) are one of the
most important developments in deep learning.
GANs take advantage of two deep learning concepts: adversarial
training and gradient descent with backpropagation.
Adversarial training refers to the idea that two neural networks can
learn by competing with each other: one network acts as an
adversary that tries to fool the other network into thinking its output is actual
data, while the other network tries to distinguish between real
and fake data.
Gradient descent with backpropagation refers to adjusting weights to
minimize errors, which is what this system does when it learns how to
differentiate between fake and real data.
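As a minimal illustration of "adjusting weights to minimize errors", the sketch below runs gradient descent on a single weight. The toy data, learning rate and number of steps are arbitrary choices for the example; a real GAN applies the same update rule to every weight of both networks, with the gradients obtained by backpropagation.

```python
import numpy as np

# Toy data generated by y = 2x; the "network" is a single weight w.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0
learning_rate = 0.01
for _ in range(200):
    error = w * x - y                      # prediction error on each sample
    grad = 2.0 * np.mean(error * x)        # derivative of mean squared error w.r.t. w
    w -= learning_rate * grad              # step against the gradient
```

After the loop, w has moved from 0 toward the value that generated the data.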
Generative Adversarial Network Frameworks
Several frameworks provide tools and libraries for implementing and training GANs,
including:
TensorFlow: TensorFlow is an open-source machine learning framework developed by
Google. It provides various tools and libraries for implementing and training GANs,
including the [Link]. You can use the GAN layer to build a GAN model in just a few
lines of code.
PyTorch: PyTorch is an open-source machine learning framework developed by Facebook (now Meta).
It provides tools and libraries for implementing and training GANs, including the
[Link] class, which you can use to build custom GAN models.
Keras: Keras is an open-source deep learning library that provides a high-level API for
building and training deep learning models. A GAN is typically built in Keras by combining
a generator model and a discriminator model in a custom training loop.
Chainer: Chainer is an open-source deep-learning framework developed by Preferred
Networks. It provides tools and libraries for implementing and training GANs, including
the [Link] and [Link]. Discriminator classes, which can be
used to build custom GAN models.
GANLab: GANLab is a web-based tool that allows users to experiment with GANs in a
visual, interactive environment. It provides a simple, drag-and-drop interface for building
and training GANs without the need to write any code.
Sequential Data
Sequential data consists of information organized in a specific order,
where the sequence is meaningful.
This type of data includes time series, text, audio, DNA, and music.
Analyzing sequential data often requires techniques such as time
series analysis and sequence modelling, using machine learning
models like Recurrent Neural Networks (RNNs) and Long Short-Term
Memory networks (LSTMs).
Why not Multi-Layer Perceptron?
Multilayer Perceptrons (MLPs) are designed to process fixed-size
inputs, treating each input as an independent data point without
considering any sequential or time-based relationships.
Due to this limitation, MLPs cannot capture patterns that depend on
the order of the data, making them unsuitable for time series
analysis.
In contrast, Recurrent Neural Networks (RNNs) are specifically
designed to handle sequential information through their recurrent
connections, making them a more suitable choice for tasks involving
time series data.
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of neural network
designed for processing sequential data.
It features loops that allow information to be retained across time
steps, making it effective at capturing temporal patterns.
RNNs can also generate sequences based on learned patterns from the
training data.
This capability makes RNNs particularly useful for applications such as
time series forecasting, speech recognition, and natural language
processing.
The main and most important feature of an RNN is its hidden state,
which remembers some information about a sequence.
The state is also referred to as the memory state since it remembers the
previous input to the network.
This memory of previous inputs makes RNNs suitable for tasks like text
generation or music composition.
An RNN uses the same parameters for each input, as it performs the same
task on all the inputs or hidden layers to produce the output.
This reduces the number of parameters, unlike other neural
networks.
More advanced variants, like Long Short-Term Memory (LSTM) and
Gated Recurrent Unit (GRU) networks, have been developed to
overcome the limitations of traditional RNNs, such as difficulty in
learning long-term dependencies.
Key Features of RNNs
Feedback loops
RNNs use feedback loops to process data, which allows information to
persist. This effect is often described as memory.
Sequential connections
RNNs establish sequential connections between nodes in a unidirectional
fashion. This allows previous outputs to be used as input for subsequent
nodes.
Mimics human data conversion
RNNs mimic how humans perform sequential data conversions. For
example, an RNN can be trained to translate text from one language to
another.
Computational power
RNNs have high computational power and can accurately represent
complex behaviors.
Types of RNN Based on Cardinality (Input-Output Number)
Different types of Recurrent Neural Networks (RNNs) can be categorized
based on input-output cardinality:
One-to-One (1:1): This is a standard feed-forward neural network used
for non-sequential data.
Many-to-One (N:1): This type processes multiple inputs to produce a
single output, such as in sentiment analysis.
One-to-Many (1:N): This setup uses a single input to generate multiple
outputs, such as in image captioning.
Many-to-Many (N:N): This configuration produces one output for each of
multiple inputs, as in frame-by-frame video analysis.
Many-to-Many (N:M): This flexible structure allows for varying sequence
lengths in both inputs and outputs, which is common in machine
translation.
Types of RNNs (Based on Structure)
Long Short-Term Memory (LSTM): LSTMs are a type of RNN designed to
remember information for long periods.
They use special units called memory cells that can maintain information in
memory for long durations.
LSTMs are effective for tasks like time series prediction and natural
language processing.
Gated Recurrent Unit (GRU): GRUs are similar to LSTMs but with a simpler
structure.
They use gating mechanisms to control the flow of information, making
them faster to train and sometimes more efficient for certain tasks.
GRUs are often used in similar applications as LSTMs, such as speech
recognition and machine translation.
Character Prediction: This refers to RNNs used for predicting the next character in
a sequence.
These models are trained on text data and can generate text one character at a
time, making them useful for tasks like text generation and auto-completion.
Stacked RNNs: Stacked RNNs consist of multiple layers of RNNs stacked on top of
each other.
This architecture allows the model to learn more complex patterns by capturing
different levels of abstraction.
They are commonly used in tasks that require deep understanding, such as
language modelling and sequence-to-sequence tasks.
Bidirectional RNNs: These RNNs process sequences in both forward and backward
directions.
By having access to both past and future contexts, bidirectional RNNs can better
understand the entire sequence.
They are particularly useful in tasks like speech recognition and text classification,
where context is important.
Note: These various types of RNNs can be combined or adapted for specific use
cases, depending on the requirements of the task at hand.
Steps involved in Working of RNN
Input Sequence: The input is a sequence of data points (e.g., words in a
sentence).
RNN Cell Processing: Processes inputs in sequences, updating the hidden
state based on the current input and the previous hidden state.
At each time step t, the RNN takes an input xt and combines it with the
previous hidden state ht-1 to update its hidden state ht.
Output Generation: Produces an output based on the current hidden state,
which can be used for tasks like predicting the next item in a sequence.
The updated hidden state ht is used to produce an output, which could be
the next item in the sequence.
Training: The model is trained to minimize the difference between the
predicted outputs and the actual data points in the sequence.
The architecture of an RNN

Figure: Shorthand notation often used for RNNs, and unfolded notation for RNNs.
Matrices Wx, Wy, Wh are the weights of the RNN architecture (input, output and hidden layers), which
are shared throughout the entire network.
A breakdown of the architecture
The green blocks are called hidden states.
The blue circles, defined by the vector a within each block, are
called hidden nodes or hidden units, where the number of nodes is
decided by the hyper-parameter d.
Similar to activations in MLPs, think of each green block as an activation
function that acts on each blue node.
Vector h is the output of the hidden state after the activation function
has been applied to the hidden nodes.
As you can see, at time t the architecture takes into account what
happened at t-1 by including the h from the previous hidden state as well
as the input x at time t.
This allows the network to account for information from previous inputs
that are sequentially behind the current input.
It is important to note that the zeroth vector (h0) will always start as a
vector of 0’s because the algorithm has no information preceding the first
element in the sequence.
For example, the hidden state at t=2 takes as input the output h from t=1
and the input x at t=2.
RNN Equations

These are the only three equations that we need.


The hidden nodes are a combination of the previous state’s output weighted by the
weight matrix Wh and the input x weighted by the weight matrix Wx.
The tanh function is the activation function, symbolized by the green block.
The output of the hidden state is the activation function applied to the hidden
nodes.
To make a prediction, we take the output from the current hidden state and
weight it by the weight matrix Wy with a softmax activation.
It’s also important to understand the dimensions of all the variables floating
around. In general for predicting a sequence:

Where
•k is the dimension of the input vector xᵢ
•d is the number of hidden nodes
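Written out from the description above, the three equations and the dimensions of their variables could take the following form. This is a reconstruction rather than the original notation: the labels a_t (hidden nodes) and ŷ_t (prediction) follow the text, bias terms are left out because the text does not mention them, and the prediction is assumed to have the same dimension k as the input, as in next-character prediction.

```latex
\begin{aligned}
a_t &= W_h\, h_{t-1} + W_x\, x_t \\
h_t &= \tanh(a_t) \\
\hat{y}_t &= \operatorname{softmax}(W_y\, h_t)
\end{aligned}
\qquad
x_t \in \mathbb{R}^{k},\;
a_t, h_t \in \mathbb{R}^{d},\;
W_x \in \mathbb{R}^{d \times k},\;
W_h \in \mathbb{R}^{d \times d},\;
W_y \in \mathbb{R}^{k \times d},\;
\hat{y}_t \in \mathbb{R}^{k}
```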
How does RNN work?
The Recurrent Neural Network consists of multiple fixed activation
function units, one for each time step.
Each unit has an internal state which is called the hidden state of the
unit.
This hidden state signifies the past knowledge that the network
currently holds at a given time step.
This hidden state is updated at every time step to signify the change
in the knowledge of the network about the past.
The hidden state is updated using the following recurrence relation.
The formula for calculating the current state:
$h_t = f(h_{t-1}, x_t)$
where,
$h_t$ - current state
$h_{t-1}$ - previous state
$x_t$ - input state
Formula for applying the activation function (tanh):
$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$
where,
$W_{hh}$ - weight at recurrent neuron
$W_{xh}$ - weight at input neuron
The formula for calculating output:
$y_t = W_{hy} h_t$
where,
$y_t$ - output
$W_{hy}$ - weight at output layer
These parameters are updated using Backpropagation.
However, since an RNN works on sequential data, we use a
variant of backpropagation known as backpropagation
through time (BPTT).
Training through RNN
A single time step of the input is provided to the network.
The network then calculates its current state using the current input and the previous state.
The current ht becomes ht-1 for the next time step.
One can go through as many time steps as the problem requires, combining the information
from all the previous states.
Once all the time steps are completed, the final current state is used to calculate
the output.
The output is then compared to the actual output, i.e., the target output, and the
error is generated.
The error is then back-propagated through the network to update the weights, and
hence the network (RNN) is trained using backpropagation through time.
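The training procedure above can be sketched in NumPy as follows. Several details are assumptions made for the example: the output is computed only from the final state (as the steps describe), the error is measured with a squared-error loss, bias terms are omitted, and the sizes, seed, learning rate and function names are arbitrary. It is a sketch of backpropagation through time, not a production implementation.

```python
import numpy as np

def forward(xs, W_xh, W_hh, W_hy):
    """Run the recurrence over all time steps; the output uses the final state."""
    hs = [np.zeros(W_hh.shape[0])]           # h_0 starts as a vector of zeros
    for x in xs:
        hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ x))
    return hs, W_hy @ hs[-1]

def bptt(xs, target, W_xh, W_hh, W_hy):
    """Squared-error loss and its gradients via backpropagation through time."""
    hs, y = forward(xs, W_xh, W_hh, W_hy)
    dy = y - target                          # dL/dy for L = 0.5 * ||y - target||^2
    loss = 0.5 * np.sum(dy ** 2)
    dW_hy = np.outer(dy, hs[-1])
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    dh = W_hy.T @ dy                         # error arriving at the final state
    for t in range(len(xs), 0, -1):          # walk backwards through time
        da = dh * (1.0 - hs[t] ** 2)         # derivative of tanh
        dW_xh += np.outer(da, xs[t - 1])     # same weights are reused at every step,
        dW_hh += np.outer(da, hs[t - 1])     # so their gradients accumulate
        dh = W_hh.T @ da                     # pass the error to the previous state
    return loss, dW_xh, dW_hh, dW_hy

rng = np.random.default_rng(0)               # arbitrary seed and sizes
W_xh = rng.normal(scale=0.5, size=(3, 2))
W_hh = rng.normal(scale=0.5, size=(3, 3))
W_hy = rng.normal(scale=0.5, size=(1, 3))
xs = [rng.normal(size=2) for _ in range(5)]  # a made-up sequence of 5 inputs
target = np.array([0.5])

losses = []
for _ in range(200):
    loss, dW_xh, dW_hh, dW_hy = bptt(xs, target, W_xh, W_hh, W_hy)
    losses.append(loss)
    W_xh -= 0.1 * dW_xh
    W_hh -= 0.1 * dW_hh
    W_hy -= 0.1 * dW_hy
```

Because the weights are shared across time steps, each step of the backward loop adds its contribution to the same three gradient matrices.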
A trivial example (a very simple and basic illustration of a
concept)
Take the word “dogs,” where we want to train an RNN to predict the
letter “s” given the letters “d”-“o”-“g”. The architecture above would look
like the following:
To keep this example simple, we’ll use 3 hidden nodes in our RNN (d=3).
The dimensions for each of our variables are as follows:

Here k = 4, because our input x is a 4-dimensional one-hot vector for the letters in
“dogs.”
Forward Propagation
Let’s see how a forward propagation would work at time t=1. First,
we have to calculate the hidden nodes a, then apply the
activation function to get h, and finally calculate the prediction.
At t=1
To make the example concrete, I’ve initialized random weights for
the matrices Wx, Wy, and Wh to provide an example with
numbers.

At t=1, our RNN would predict the letter “d” given the input “d”. This doesn’t
make sense, but that’s ok because we’ve used untrained random weights.
This was just to show the workflow of a forward pass in an RNN.
At t=2 and t=3, the workflow would be analogous except that
the vector h from t-1 would no longer be a vector of 0’s, but a
vector of non-zeros based on the inputs before time t. (As a
reminder, the weight matrices Wx, Wh, and Wy remain the
same for t=1,2, and 3. )

It’s important to note that while the RNN can output a prediction
at every single time step, it isn’t necessary. If we were just
interested in the letter after the input “dog” we could just take
the output at t=3 and ignore the others.
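The forward pass for “dog” can be reproduced with a short NumPy sketch. The weights here come from an arbitrary random seed, so they are not the numbers used in the example above, and the letter ordering in the one-hot encoding is an assumption; with untrained weights the predicted letters carry no meaning.

```python
import numpy as np

letters = ["d", "o", "g", "s"]               # assumed ordering of the one-hot vector
k, d = len(letters), 3                       # k = 4 input dimensions, d = 3 hidden nodes

def one_hot(letter):
    v = np.zeros(k)
    v[letters.index(letter)] = 1.0
    return v

def softmax(v):
    e = np.exp(v - np.max(v))                # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)               # arbitrary seed: untrained random weights
Wx = rng.normal(size=(d, k))
Wh = rng.normal(size=(d, d))
Wy = rng.normal(size=(k, d))

h = np.zeros(d)                              # h0 is a vector of zeros
predictions = []
for letter in "dog":                         # t = 1, 2, 3 with the same Wx, Wh, Wy
    a = Wh @ h + Wx @ one_hot(letter)        # hidden nodes
    h = np.tanh(a)                           # output of the hidden state
    y_hat = softmax(Wy @ h)                  # probabilities over the 4 letters
    predictions.append(letters[int(np.argmax(y_hat))])

next_letter = predictions[-1]                # keep only the output at t=3
```

Taking `predictions[-1]` corresponds to ignoring the outputs at t=1 and t=2 and using only the prediction made after the full input “dog” has been seen.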
Advantages
• An RNN carries information from previous inputs forward through time, which makes it useful
for time series prediction. Long Short-Term Memory (LSTM) networks extend this memory to
longer sequences.
• Recurrent neural networks can also be combined with convolutional layers to extend the
effective pixel neighborhood.
Some Common Problems Associated with RNN (Disadvantages)
• Exploding gradients: Gradients grow larger at every step as they are propagated back
through time, so weight updates become very large and training becomes unstable.
• Vanishing gradients: The values of a gradient become too small, causing the model to stop
learning or take too long.
• Complex training process: Processing data sequentially can make the training process
tedious.
• Difficulty with long sequences: RNNs have a harder time remembering
information as the sequence gets longer.
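The vanishing and exploding gradient problems can be illustrated with a toy calculation. During backpropagation through time the error signal is multiplied by the recurrent weight matrix once per time step; the sketch below assumes a deliberately simple matrix (a scaled identity) and ignores the tanh derivative, so it shows the trend rather than the behavior of a real network.

```python
import numpy as np

def gradient_norm_through_time(scale, steps=50, d=3):
    """Norm of an error signal after `steps` multiplications by W_h transposed."""
    W_h = scale * np.eye(d)                  # toy recurrent matrix (assumption)
    grad = np.ones(d)
    for _ in range(steps):
        grad = W_h.T @ grad                  # one step back in time
    return np.linalg.norm(grad)

vanishing = gradient_norm_through_time(0.5)  # shrinks toward zero
exploding = gradient_norm_through_time(1.5)  # grows very large
```

With a scale below 1 the signal from early time steps all but disappears, and with a scale above 1 it blows up, which is why long sequences are hard for plain RNNs.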
Summary
VAEs focus on encoding data into a latent space and decoding it back,
incorporating probabilistic approaches.
GANs involve two competing networks (generator and discriminator)
to produce realistic data.
RNNs are specialized for handling sequential data, generating outputs
based on previous context.
These models are foundational in generative AI, each with unique
strengths suited for different applications.
