
DL Decode

The document discusses Recurrent Neural Networks (RNNs), which are specialized neural networks designed for processing sequential data. RNNs utilize internal memory to remember historical inputs, allowing them to predict future scenarios based on previous information. The document also covers concepts such as unfolding computational graphs, advantages of RNNs, and methods for computing gradients using back-propagation through time.
Copyright
© All Rights Reserved


Deep Learning
DECODE: A Guide for Engineering Students

Unit III

3 Recurrent Neural Networks

3.1: Basics of Recurrent Neural Networks

Q.1 What are recurrent neural networks?

Ans. Recurrent neural networks (RNNs) are a class of neural networks for processing sequential data.

RNNs are specialized for processing a series of values $x^{(1)}, \dots, x^{(\tau)}$, just as convolutional networks are specialized for processing a grid of values X, such as an image. RNNs are designed to recognize the sequential characteristics in data and use patterns to predict the next likely scenario.

Unlike other neural networks, an RNN has an internal memory that enables it to remember historical input; this allows it to make decisions by considering the current input alongside what it has learned from previous inputs.

Q.2 Explain unfolding computational graphs.

Ans. The structure of a set of computations, such as those involved in mapping inputs and parameters to outputs and loss, can be formalized using a computational graph. This section explains the idea of unfolding a recursive or recurrent computation into a computational graph with a repeated structure, typically corresponding to a chain of events. Unfolding this graph results in the sharing of parameters across a deep network structure.

Take the classical form of a dynamical system, for instance

$s^{(t)} = f(s^{(t-1)}; \theta)$ ...(Q.2.1)

where $s^{(t)}$ is called the state of the system.

Equation (Q.2.1) is recurrent because the definition of s at time t refers back to the same definition at time t - 1.

For a finite number of time steps $\tau$, the graph can be unfolded by applying the definition $\tau - 1$ times. For example, if we unfold equation (Q.2.1) for $\tau = 3$ time steps, we obtain

$s^{(3)} = f(s^{(2)}; \theta)$ ...(Q.2.2)
$= f(f(s^{(1)}; \theta); \theta)$ ...(Q.2.3)

Unfolding the equation by repeatedly applying the definition in this way yields an expression that does not involve recurrence. Such an expression can now be represented by a conventional directed acyclic computational graph.

Fig. Q.2.1 shows the unfolded computational graph of equations (Q.2.1) and (Q.2.3).

Fig. Q.2.1 The classical dynamical system given by equation (Q.2.1), illustrated as an unfolded computational graph

Each node represents the state at some time t, and the function f maps the state at time t to the state at time t + 1. The same parameters (the same value of $\theta$ used to parameterize f) are applied at every time step.

As another example, let us consider a dynamical system driven by an external signal $x^{(t)}$:

$s^{(t)} = f(s^{(t-1)}, x^{(t)}; \theta)$ ...(Q.2.4)

where we see that the state now contains information about the whole past sequence.

There are several ways of constructing recurrent neural networks. Any function that involves recurrence may be seen as a
recurrent neural network, much as practically any function can be regarded as a feedforward neural network.

Recurrent neural networks frequently use equation (Q.2.5) or a related equation to specify the values of their hidden units. To show that the state is the network's hidden units, we now rewrite equation (Q.2.4) using the variable h as the state:

$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$ ...(Q.2.5)

Typical RNNs include additional architectural features, such as output layers that read data from the state h to make predictions, as shown in Fig. Q.2.2.

When the recurrent network is trained to perform a task that involves forecasting the future from the past, it typically learns to use $h^{(t)}$ as a kind of lossy summary of the task-relevant aspects of the past sequence of inputs up to t. This summary is necessarily lossy, since it maps an arbitrary-length sequence $(x^{(t)}, x^{(t-1)}, \dots, x^{(2)}, x^{(1)})$ to a fixed-length vector $h^{(t)}$. Depending on the training criterion, this summary may retain some aspects of the past sequence with greater precision than others. For instance, if the RNN is used in statistical language modelling, which typically predicts the next word given the previous words, it might not be necessary to store all of the information in the input sequence up to time t, only enough to predict the rest of the sentence.

The most demanding situation is when we require $h^{(t)}$ to be rich enough to allow one to approximately recover the input sequence, as in autoencoder frameworks.

Fig. Q.2.2 A recurrent network with no outputs. This recurrent network simply processes information from the input x by incorporating it into the state h, which is passed forward through time. (Left) Circuit diagram; the black square indicates a delay of one time step. (Right) The same network as an unfolded computational graph, where each node is now associated with one particular time instance.

Q.3 List the advantages of unfolding process.

Ans.: Advantages:

1. Regardless of the sequence length, the learned model always has the same input size, because it is specified in terms of a transition from one state to another rather than in terms of a variable-length history of states.

2. It is possible to use the same transition function f with the same parameters at every time step.

Q.4 Explain architecture of recurrent neural network.

Ans.: Fig. Q.4.1 shows architecture of recurrent neural network.

Fig. Q.4.1 Architecture of recurrent neural network (input layer, recurrent network of neurons, output layer (classes))

A recurrent neural network is a class of artificial neural networks that contain a network-like series of nodes, each with a directed (one-way) connection to every other node. These nodes can be classified as input, output, or hidden. Input nodes receive
data from outside the network, hidden nodes modify the input data, and output nodes provide the intended results.

RNNs can be used, for example, to predict the next word in a sentence. Here, information about the previous words is needed to predict the next word in the sentence. An RNN can solve this problem with the help of hidden layers, which are one of its most important and distinctive features. These hidden layers remember previous information for use in later steps.

The recurrent neural network scans through the data from left to right. RNNs are very powerful because they combine two properties:

1. Distributed hidden state that allows them to store a lot of information about the past efficiently.

2. Non-linear dynamics that allows them to update their hidden state in complicated ways.

With enough neurons and time, RNNs can compute anything that can be computed by your computer.

Q.5 How to compute the gradient in a recurrent neural network?

Ans.: Computing the gradient through a recurrent neural network is straightforward. One simply applies the generalized back-propagation algorithm to the unrolled computational graph. No specialized algorithms are required. Back-propagation applied to the unrolled graph is called the back-propagation through time (BPTT) algorithm. The gradients obtained by back-propagation can then be used with any general-purpose gradient-based technique to train an RNN.

To give an understanding of how the BPTT algorithm behaves, we give an example of how to compute gradients by BPTT for the RNN equations above (equation (Q.2.1) and equation (Q.2.4)). The nodes of our computational graph include the parameters U, V, W, b and c as well as the sequence of nodes indexed by t for $x^{(t)}$, $h^{(t)}$, $o^{(t)}$ and $L^{(t)}$. For each node N, we need to compute the gradient $\nabla_N L$ recursively, based on the gradient computed at the nodes that follow it in the graph.

We start the recursion with the nodes immediately preceding the final loss:

$\frac{\partial L}{\partial L^{(t)}} = 1$ ...(Q.5.1)

In this derivation, we assume that the outputs $o^{(t)}$ are used as the argument to the softmax function to obtain the vector $\hat{y}^{(t)}$ of probabilities over the output. We also assume that the loss is the negative log-likelihood of the true target $y^{(t)}$ given the input so far.

The gradient $\nabla_{o^{(t)}} L$ on the outputs at time step t, for all i, t, is as follows:

$(\nabla_{o^{(t)}} L)_i = \frac{\partial L}{\partial o_i^{(t)}} = \frac{\partial L}{\partial L^{(t)}} \frac{\partial L^{(t)}}{\partial o_i^{(t)}} = \hat{y}_i^{(t)} - \mathbf{1}_{i = y^{(t)}}$ ...(Q.5.2)

We work our way backward, starting from the end of the sequence. At the final time step $\tau$, $h^{(\tau)}$ has only $o^{(\tau)}$ as a descendant, so its gradient is simple:

$\nabla_{h^{(\tau)}} L = V^{\top} \nabla_{o^{(\tau)}} L$ ...(Q.5.3)

We can then iterate backward in time to back-propagate gradients through time, from $t = \tau - 1$ down to $t = 1$, noting that $h^{(t)}$ (for $t < \tau$) has as descendants both $o^{(t)}$ and $h^{(t+1)}$. Its gradient is thus given by

$\nabla_{h^{(t)}} L = \left(\frac{\partial h^{(t+1)}}{\partial h^{(t)}}\right)^{\top} (\nabla_{h^{(t+1)}} L) + \left(\frac{\partial o^{(t)}}{\partial h^{(t)}}\right)^{\top} (\nabla_{o^{(t)}} L)$ ...(Q.5.4)
$= W^{\top} \mathrm{diag}\left(1 - (h^{(t+1)})^2\right) (\nabla_{h^{(t+1)}} L) + V^{\top} (\nabla_{o^{(t)}} L)$ ...(Q.5.5)

where $\mathrm{diag}(1 - (h^{(t+1)})^2)$ denotes the diagonal matrix containing the elements $1 - (h_i^{(t+1)})^2$. This is the Jacobian of the hyperbolic tangent associated with hidden unit i at time t + 1.

Once the gradients on the internal nodes of the computational graph are obtained, we can obtain the gradients on the parameter nodes. Because the parameters are shared across many time steps, we must take care when denoting calculus operations involving these variables. The equations we wish to implement use the bprop method, which computes the contribution of a single edge in the computational graph to the gradient.

The $\nabla_W f$ operator used in calculus, however, takes into account the contribution of W to the value of f due to all edges in the computational graph. To resolve this ambiguity, we introduce dummy variables $W^{(t)}$ that are defined to be copies of W, with each $W^{(t)}$ used only at time step t. We then use $\nabla_{W^{(t)}}$ to denote the contribution of the weights at time step t to the gradient.

Using this notation, the gradient on the remaining parameters is given by:

$\nabla_c L = \sum_t \left(\frac{\partial o^{(t)}}{\partial c}\right)^{\top} \nabla_{o^{(t)}} L = \sum_t \nabla_{o^{(t)}} L$ ...(Q.5.6)

$\nabla_b L = \sum_t \left(\frac{\partial h^{(t)}}{\partial b^{(t)}}\right)^{\top} \nabla_{h^{(t)}} L = \sum_t \mathrm{diag}\left(1 - (h^{(t)})^2\right) \nabla_{h^{(t)}} L$ ...(Q.5.7)

$\nabla_V L = \sum_t \sum_i \frac{\partial L}{\partial o_i^{(t)}} \nabla_V o_i^{(t)} = \sum_t (\nabla_{o^{(t)}} L) \, h^{(t)\top}$ ...(Q.5.8)

$\nabla_W L = \sum_t \sum_i \frac{\partial L}{\partial h_i^{(t)}} \nabla_{W^{(t)}} h_i^{(t)}$ ...(Q.5.9)

$= \sum_t \mathrm{diag}\left(1 - (h^{(t)})^2\right) (\nabla_{h^{(t)}} L) \, h^{(t-1)\top}$ ...(Q.5.10)

$\nabla_U L = \sum_t \sum_i \frac{\partial L}{\partial h_i^{(t)}} \nabla_{U^{(t)}} h_i^{(t)}$ ...(Q.5.11)

$= \sum_t \mathrm{diag}\left(1 - (h^{(t)})^2\right) (\nabla_{h^{(t)}} L) \, x^{(t)\top}$ ...(Q.5.12)

We do not need to compute the gradient with respect to $x^{(t)}$ for training, because it has no parameters as ancestors in the computational graph defining the loss.

Q.6 What is a directed graphical model in RNN?

Ans.: In the example recurrent network we have developed so far, the losses $L^{(t)}$ were cross-entropies between training targets $y^{(t)}$ and outputs $o^{(t)}$. In principle, almost any loss may be used with a recurrent network, just as with a feedforward network. The loss should be chosen based on the task.

As with a feedforward network, we usually wish to interpret the output of the RNN as a probability distribution, and we usually use the cross-entropy associated with that distribution to define the loss.

Mean squared error, for instance, is the cross-entropy loss associated with a unit Gaussian output distribution, just as with a feedforward network.

When we use a predictive log-likelihood training objective, as in equation (Q.2.4), we train the RNN to estimate the conditional distribution of the next sequence element $y^{(t)}$ given the past inputs.

This may mean that we maximize the log-likelihood

$\log p(y^{(t)} \mid x^{(1)}, \dots, x^{(t)})$ ...(Q.6.1)

or, if the model includes connections from the output at one time step to the next time step,

$\log p(y^{(t)} \mid x^{(1)}, \dots, x^{(t)}, y^{(1)}, \dots, y^{(t-1)})$ ...(Q.6.2)
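As a brief aside before the discussion of Q.6 continues, the BPTT equations (Q.5.2) to (Q.5.12) can be sketched in NumPy. This is only an illustrative sketch: it assumes the usual forward pass $h^{(t)} = \tanh(b + W h^{(t-1)} + U x^{(t)})$, $o^{(t)} = c + V h^{(t)}$, $\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})$, which the text uses implicitly but does not write out, and the function names `forward` and `bptt` are hypothetical.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def forward(params, xs, ys, h0):
    """Run the assumed tanh/softmax RNN; return total loss, states, probabilities."""
    U, V, W, b, c = params
    hs, yhats, loss = [h0], [], 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(b + W @ hs[-1] + U @ x)
        yhat = softmax(c + V @ h)
        loss -= np.log(yhat[y])          # negative log-likelihood of the target
        hs.append(h)
        yhats.append(yhat)
    return loss, hs, yhats

def bptt(params, xs, ys, h0):
    """Back-propagation through time for the assumed RNN above."""
    U, V, W, b, c = params
    loss, hs, yhats = forward(params, xs, ys, h0)
    gU, gV, gW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    gb, gc = np.zeros_like(b), np.zeros_like(c)
    grad_h_next = np.zeros_like(h0)      # contribution from step t+1 (zero at the last step)
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        grad_o = yhats[t].copy()
        grad_o[ys[t]] -= 1.0                   # (Q.5.2)
        grad_h = V.T @ grad_o + grad_h_next    # (Q.5.3) and (Q.5.5)
        grad_a = (1.0 - h ** 2) * grad_h       # diag(1 - h^2) applied to grad_h
        gc += grad_o                           # (Q.5.6)
        gb += grad_a                           # (Q.5.7)
        gV += np.outer(grad_o, h)              # (Q.5.8)
        gW += np.outer(grad_a, h_prev)         # (Q.5.10)
        gU += np.outer(grad_a, xs[t])          # (Q.5.12)
        grad_h_next = W.T @ grad_a             # passed back to step t-1
    return loss, (gU, gV, gW, gb, gc)
```

The returned gradients can be handed to any gradient-based optimizer; comparing them against finite differences of `forward` is a simple way to check the recursion.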
Decomposing the joint probability over the sequence of y values as a series of one-step probabilistic predictions is one way to capture the full joint distribution across the whole sequence. When we do not feed past y values as inputs that condition the next-step prediction, the directed graphical model contains no edges from any $y^{(i)}$ in the past to the current $y^{(t)}$. In this case, the outputs y are conditionally independent given the sequence of x values. When we do feed the actual y values (not their prediction, but the actual observed or generated values) back into the network, the directed graphical model contains edges from all $y^{(i)}$ values in the past to the current $y^{(t)}$ value.

Fig. Q.6.1 Fully connected graphical model for a sequence $y^{(1)}, y^{(2)}, \dots, y^{(t)}, \dots$. Every past observation $y^{(i)}$ may influence the conditional distribution of some $y^{(t)}$ (for $t > i$), given the previous values. Parameterizing the graphical model directly according to this graph (as in Equation 6) might be very inefficient, with an ever-growing number of inputs and parameters for each element of the sequence. RNNs obtain the same full connectivity but with efficient parameterization, as seen in Fig. Q.6.2.

Fig. Q.6.2

As a simple example, let us consider the case where the RNN models only a sequence of scalar random variables $Y = \{y^{(1)}, \dots, y^{(\tau)}\}$, with no additional inputs x. The input at time step t is simply the output at time step t - 1. The RNN then defines a directed graphical model over the y variables. We parameterize the joint distribution of these observations using the chain rule for conditional probabilities:

$P(Y) = P(y^{(1)}, \dots, y^{(\tau)}) = \prod_{t=1}^{\tau} P(y^{(t)} \mid y^{(t-1)}, y^{(t-2)}, \dots, y^{(1)})$ ...(Q.6.3)

where the right-hand side of the bar is empty for t = 1, of course. Hence the negative log-likelihood of a set of values $\{y^{(1)}, \dots, y^{(\tau)}\}$ according to such a model is

$L = \sum_t L^{(t)}$ ...(Q.6.4)

where

$L^{(t)} = -\log P(\mathrm{y}^{(t)} = y^{(t)} \mid y^{(t-1)}, y^{(t-2)}, \dots, y^{(1)})$ ...(Q.6.5)

Q.7 Explain advantages and disadvantages of RNN.

Ans.: Advantages

a) An RNN can process inputs of any length.

b) An RNN is designed to remember information over time, which is very helpful in any time series predictor.
c) Even if the input size is larger, the model size does not increase.

d) The weights can be shared across the time steps.

e) RNNs can use their internal memory to process arbitrary series of inputs, which is not the case with feedforward neural networks.

Disadvantages

a) Due to its recurrent nature, the computation is slow.

b) Training of RNN models can be difficult.

c) RNNs are prone to problems such as exploding and vanishing gradients.

Q.8 Why is an RNN called recurrent?

Ans. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. As already noted, they have a "memory" which captures information about what has been calculated so far.

Q.9 What is difference between RNN and CNN?

Ans.

Sr. No. | RNN | CNN
1. | RNN is applicable for time series and sequential data. | CNN is applicable for spatial data like images.
2. | RNN has no restriction on the length of inputs and outputs. | CNN has finite inputs and finite outputs.
3. | RNN is primarily used for speech and text analysis. | CNN can be used for video and image processing.
4. | RNN works on loops to handle sequential data. | CNN has a feedforward network.
5. | While training the model, RNN uses back-propagation through time to compute the gradients. | While training the model, CNN uses simple back-propagation.

3.2 Types of Recurrent Neural Networks

Q.10 Explain types of recurrent neural networks.

Ans.

1. One-to-one: This neural network is used for a fixed-size input and a fixed-size output, for example image classification. It is also known as the vanilla network and is usually characterized by a single input, such as a word or an image, with the output produced as a single token value. All traditional neural networks fall into this category.

2. One-to-many: A single input is used to create multiple outputs. A popular application of one-to-many is music generation.

3. Many-to-one: Several inputs are used to create a single output. An example is sentiment analysis: the input is a movie review (multiple words) and the output is the sentiment associated with the review.

4. Many-to-many: Several inputs are used for generating several outputs. Named entity recognition is a well-known example of this category.

Fig. Q.10.1 shows types of RNN.
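The difference between the many-to-one and many-to-many types can be sketched with a plain tanh RNN. This is a minimal illustration under assumed names (`rnn_states`, `many_to_many`, `many_to_one`) and an assumed output layer $V h + c$; none of these come from the text.

```python
import numpy as np

def rnn_states(xs, W, U, b):
    """Hidden state after each input of the sequence xs."""
    h, states = np.zeros(W.shape[0]), []
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)
        states.append(h)
    return states

def many_to_many(xs, W, U, b, V, c):
    """One output per input step (for example, a tag for every word)."""
    return [V @ h + c for h in rnn_states(xs, W, U, b)]

def many_to_one(xs, W, U, b, V, c):
    """A single output read from the last hidden state (for example, sentiment)."""
    return V @ rnn_states(xs, W, U, b)[-1] + c
```

Both variants share the same recurrent computation; they differ only in which hidden states are read out.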
Fig. Q.10.1 Types of RNN: (a) One-to-one, (b) One-to-many, (c) Many-to-one, (d) Many-to-many

Q.11 What is difference between RNN and feed-forward neural networks?

Ans.: In a feed-forward neural network, the information moves in one direction only: from the input layer, through the hidden layers, to the output layer. The information moves straight through the network.

Feed-forward neural networks have no memory of the input they receive and are bad at predicting what is coming next. Because a feed-forward network only considers the current input, it has no notion of order in time. It simply cannot remember anything about what happened in the past except its training.

In an RNN the information cycles through a loop. When it makes a decision, it considers the current input and also what it has learned from the inputs it received previously.

Fig. Q.11.1 (a) Recurrent Neural Network (b) Feed-forward Neural Network

3.3: Long Short-Term Memory Network

Q.12 Explain memoryless models for sequences.

Ans.:

1. Autoregressive models: Predict the next term in a sequence from a fixed number of previous terms using "delay taps".

Fig. Q.12.1 (inputs: input(t-2), input(t-1))

2. Feed-forward neural nets: These generalize autoregressive models by using one or more layers of non-linear hidden units.

Fig. Q.12.2 (inputs: input(t-2), input(t-1); hidden layer)

Q.13 What are the major obstacles of RNNs?

Ans.: RNNs face two types of challenges: exploding gradient and vanishing gradient.

A gradient is used to measure the changes in the output of a function when the inputs are slightly modified. If you consider
the gradient as the slope of a function, then a higher gradient signifies a steeper slope.

This helps a model to learn faster. Similarly, if the slope is zero then the model will stop the learning process. A gradient indicates the change in error with regard to a change in the weights.

Exploding gradient: This is the scenario in which the algorithm assigns extremely high values to the weights because the gradients have grown very large.

Vanishing gradient: The second challenge, the vanishing gradient, occurs when the values assigned are too small. This causes the model to stop learning or to need much more processing time to produce a result. This problem has been tackled in recent times with the introduction of the concept of LSTM.

Q.14 How can you overcome the challenges of vanishing and exploding gradients?

Ans.: Vanishing gradients can be overcome with the ReLU activation function, LSTM and GRU.

Exploding gradients can be overcome with truncated BPTT, clipping gradients at a threshold, and RMSprop to adjust the learning rate.

Q.15 Explain long short-term memory.

Ans. Long short-term memory (LSTM) is responsible for memory extension. LSTM units form the building blocks for the layers of an RNN. The purpose of LSTM is to enable RNNs to memorize inputs for an extended period.

Fig. Q.15.1 shows block diagram of LSTM.

Due to the existence of memory, an LSTM can read, write and delete information from its memory, much like a personal computer. The gated cell in an LSTM network decides whether an input should be stored or erased depending upon the importance of the information, which is expressed through weights.

Over time, the algorithm learns the importance of the information more precisely. The gates of an LSTM are divided into the input gate, the forget gate and the output gate.

Fig. Q.15.1 Block diagram of LSTM (input, input gate, forget gate, output gate, state with self-loop, output)

The LSTM cell manipulates input information with three gates.

a) Input gate controls the intake of new information.

b) Forget gate determines which part of the cell state is kept and which is discarded. This is decided by the sigmoid function.

c) Output gate determines what part of the cell state to output.
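The three gates can be sketched as a single LSTM time step. This is a minimal NumPy sketch that assumes the commonly used gate equations (sigmoid gates, tanh candidate, additive cell update); the stacked weight layout and the name `lstm_step` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * n_hidden, n_input + n_hidden); its rows are stacked as
    [input gate, forget gate, output gate, candidate] (an assumed layout).
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:n])            # input gate: how much new information to take in
    f = sigmoid(z[n:2 * n])        # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate values to write into the cell
    c = f * c_prev + i * g         # self-loop on the cell state
    h = o * np.tanh(c)             # output of the cell
    return h, c
```

When the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is how the cell can hold information over an extended period.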
3.4 Encoder Decoder Architectures

Q.16 Draw and explain encoder decoder architectures.

Ans.: An RNN may be trained to map an input sequence to an output sequence that is not necessarily of the same length. This occurs in many applications, such as speech recognition, machine translation or question answering, where the input and output sequences in the training set are typically not of the same length.

The input to the RNN is frequently referred to as the "context". We want to produce a representation of this context, C. The context C may be a vector or a sequence of vectors that summarizes the input sequence $X = (x^{(1)}, \dots, x^{(n_x)})$.

The concept is relatively straightforward: (1) An encoder, reader, or input RNN processes the input sequence. The context C is generally a simple function of the encoder's final hidden state. (2) A decoder, writer, or output RNN is conditioned on that fixed-length vector to produce the output sequence $Y = (y^{(1)}, \dots, y^{(n_y)})$.

Fig. Q.16.1 Example of an encoder-decoder or sequence-to-sequence RNN architecture that can learn to produce an output sequence $(y^{(1)}, \dots, y^{(n_y)})$ from an input sequence $(x^{(1)}, \dots, x^{(n_x)})$. It is made up of an encoder RNN that reads the input sequence and a decoder RNN that creates the output sequence. The final hidden state of the encoder RNN is used to compute a typically fixed-size context variable C, which represents a semantic summary of the input sequence and is given as input to the decoder RNN.

The two RNNs in a sequence-to-sequence architecture are trained jointly to maximize the average of $\log P(y^{(1)}, \dots, y^{(n_y)} \mid x^{(1)}, \dots, x^{(n_x)})$ over all the pairs of x and y sequences in the training set. The final state $h^{(n_x)}$ of the encoder RNN is generally used as the representation C of the input sequence that is provided as input to the decoder RNN.

The decoder RNN is just a vector-to-sequence RNN. A vector-to-sequence RNN can accept input in at least two different ways. The input may be given as the RNN's starting state, or it may be connected to the hidden units at each time step. Both of these approaches can be combined.

The hidden layer of the encoder does not need to have the same size as that of the decoder.

This design has a clear limitation when the context C generated by the encoder RNN has a size that is too small to adequately summarize a long sequence. As opposed to being a fixed-size vector,
it has been suggested to make C a variable-length sequence. An attention mechanism that can learn to link the components of the sequence C to those of the output sequence was also included.

3.5: Recursive Neural Networks

Q.17 Write short note on recursive neural networks.

Ans.: Recursive neural networks represent another generalization of recurrent networks, with a different kind of computational graph, which is structured as a deep tree rather than the chain-like structure of RNNs. Fig. Q.17.1 depicts the computational graph for a recursive network.

Recursive networks have been effectively used in computer vision, natural language processing, and processing data structures as input to neural nets.

Recursive nets have a number of distinct advantages over recurrent nets, including the ability to substantially lower the depth (defined as the number of compositions of nonlinear operations) from $\tau$ to $O(\log \tau)$ for a sequence of the same length $\tau$, which may be helpful in handling long-term dependencies. A balanced binary tree is an example of a tree structure that is independent of the data.

Fig. Q.17.1 The computational graph of a recursive network is a generalization of that of the recurrent network from a chain to a tree. A variable-size sequence $(x^{(1)}, \dots, x^{(t)})$ can be mapped to a fixed-size representation (the output o) with a fixed set of parameters (the weight matrices U, V, W). The figure shows a supervised learning case in which a target y is given and is associated with the entire sequence.

In some application domains, external methods can suggest the appropriate tree structure. For instance, when processing sentences in natural language, the recursive network's tree structure can be fixed to match the sentence's parse tree as supplied by a natural language parser. The ideal situation would be for the learner to independently discover and infer the tree structure that is optimal for every given input.
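The reduction in depth from $\tau$ to $O(\log \tau)$ can be illustrated with a balanced binary tree over the inputs. This is a minimal sketch under assumptions: a single shared composition $\tanh(W [\text{left}; \text{right}] + b)$ is applied at every internal node, and the names `compose` and `recursive_encode` are hypothetical.

```python
import numpy as np

def compose(left, right, W, b):
    """Shared composition applied at every internal node of the tree."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def recursive_encode(leaves, W, b):
    """Reduce a list of leaf vectors pairwise, as in a balanced binary tree.

    Returns the root vector and the number of composition levels used.
    """
    level, depth = list(leaves), 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1], W, b)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:        # an unpaired element is carried up unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth
```

For a sequence of 8 leaf vectors this uses 3 levels of composition, whereas a chain-structured RNN applies its transition once per element of the sequence.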
The recursive net idea has a wide range of possible variations. The data can be associated with a tree structure, with the inputs and targets associated with individual nodes of the tree. The computation performed by each node does not have to be the conventional artificial neuron computation. When concepts are represented by continuous vectors, tensor operations and bilinear forms can be used, which have previously been found useful for modelling interactions between concepts.

END

Unit IV

4 Autoencoders

4.1: Undercomplete Autoencoders

Q.1 What is autoencoder?

Ans.: An autoencoder is a special type of neural network that is trained to copy its input to its output.

For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower dimensional latent representation, then decodes the latent representation back to an image.

An autoencoder learns to compress the data while minimizing the reconstruction error. An autoencoder is an unsupervised learning technique. It is an artificial neural network used to learn encodings of unlabeled data, that is, for the task of representation learning.

Q.2 Explain properties of Autoencoder.

Ans.:

Data-specific: Autoencoders are only able to meaningfully compress data similar to what they have been trained on. Since they learn features specific to the given training data, they are different from a standard data compression algorithm like gzip.

Lossy: The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation.

Unsupervised: Autoencoders are considered an unsupervised learning technique since they don't need explicit labels to train on.

Q.3 Explain architecture of Autoencoder.

Ans.: Fig. Q.3.1 shows architecture of Autoencoder.

Autoencoders are a specific type of feedforward neural network trained to copy the input to the output. A bottleneck is imposed in the
network to represent a compressed knowledge of the original input. The input is compressed into a lower-dimensional code, and the output is then reconstructed from this representation. The code is also called the latent-space representation, which is a compact "summary" or "compression" of the input.

Fig. Q.3.1 Architecture of autoencoder (input, encoder, code, decoder, output)

An autoencoder consists of 3 components: encoder, code and decoder. The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.

As an autoencoder is a special case of a feedforward network, training techniques used for feedforward neural networks, such as minibatch gradient descent following gradients computed by back-propagation, can be used for training.

Both the encoder and decoder are fully-connected feedforward neural networks. The code is a single layer of an ANN with a dimensionality chosen by the user. The number of nodes in the code layer is a hyperparameter that we set before training the autoencoder.

An autoencoder learns to copy its inputs to its outputs under some constraints: for example, limiting the dimensionality of the latent space, or adding noise to the inputs.

Q.4 Which hyperparameters must be set before training the Autoencoders?

Ans.: There are four hyperparameters that must be set before training the autoencoders. They are as follows.

1. Code size: It is the number of nodes in the middle layer. The smaller the size, the greater the compression.

2. Number of layers: The autoencoder can be as deep as we like, not counting the input and output.

3. Number of nodes per layer: The number of nodes per layer decreases with each subsequent layer of the encoder, and increases back in the decoder. Also, the layer structure of the decoder is symmetric to that of the encoder.

4. Loss function: Mean squared error or binary cross-entropy can be used as the loss function. Cross-entropy is used if the input values are in the range [0, 1]; otherwise mean squared error is used.

Q.5 List the types of autoencoder.

Ans.: The different types of autoencoders are as follows:

1. Undercomplete autoencoders

2. Sparse autoencoders

3. Contractive autoencoders

4. Denoising autoencoders

5. Variational autoencoders

4.2 Regularized Autoencoders

Q.6 Explain Sparse Autoencoder with its advantages and disadvantages.

Ans.: Sparse autoencoders have more hidden nodes than input nodes. They can still discover important features from the data. A generic sparse autoencoder is visualized with the shading of a node corresponding to its level of activation.
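Before going further into sparse autoencoders, the basic encoder, code and decoder structure from Q.3, together with the code size and loss function hyperparameters from Q.4, can be made concrete with a small sketch. It assumes one tanh encoder layer, one linear decoder layer, mean squared error, and plain gradient descent on a single input rather than minibatches; all names (`init_autoencoder`, `train_step`, and so on) are hypothetical.

```python
import numpy as np

def init_autoencoder(n_input, code_size, rng):
    """One-layer encoder and one-layer decoder, symmetric around the code."""
    return {
        "W_enc": rng.normal(scale=0.1, size=(code_size, n_input)),
        "b_enc": np.zeros(code_size),
        "W_dec": rng.normal(scale=0.1, size=(n_input, code_size)),
        "b_dec": np.zeros(n_input),
    }

def encode(p, x):
    """Compress the input into the code (latent representation)."""
    return np.tanh(p["W_enc"] @ x + p["b_enc"])

def decode(p, h):
    """Reconstruct the input using only the code."""
    return p["W_dec"] @ h + p["b_dec"]

def mse(x, x_hat):
    return np.mean((x - x_hat) ** 2)

def train_step(p, x, lr=0.1):
    """One gradient-descent step on the reconstruction error of one input.

    Returns the reconstruction error measured before the update.
    """
    h = encode(p, x)
    x_hat = decode(p, h)
    d_out = 2.0 * (x_hat - x) / x.size      # gradient of MSE w.r.t. x_hat
    d_h = p["W_dec"].T @ d_out              # back-propagate into the code
    d_a = (1.0 - h ** 2) * d_h              # through the tanh of the encoder
    p["W_dec"] -= lr * np.outer(d_out, h)
    p["b_dec"] -= lr * d_out
    p["W_enc"] -= lr * np.outer(d_a, x)
    p["b_enc"] -= lr * d_a
    return mse(x, x_hat)
```

Choosing `code_size` smaller than `n_input` gives the bottleneck described above; repeated calls to `train_step` reduce the reconstruction error on the training input.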
p Learnin
Autoemcodes 4utoencoders
Fig. Q.6.1 shows a simple single-layer sparse autoencoder with equal numbers of inputs ($x$), outputs ($\hat{x}$) and hidden nodes ($h$).

[Figure: single-layer sparse autoencoder]
Fig. Q.6.1 Single-layer sparse autoencoder

• A sparsity constraint is introduced on the hidden layer. This is to prevent the output layer from simply copying the input data.
• Sparsity may be obtained by additional terms in the loss function during the training process, either by comparing the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but the strongest hidden unit activations.

Advantages
1. Sparse autoencoders have a sparsity penalty, a value close to zero but not exactly zero. The sparsity penalty is applied on the hidden layer in addition to the reconstruction error. This prevents overfitting.
2. They take the highest activation values in the hidden layer and zero out the rest of the hidden nodes.

Disadvantages
1. For it to work, it is essential that the individual nodes of a trained model which activate are data dependent, and that different inputs result in activations of different nodes through the network.

Q.7 Discuss about Denoising Autoencoders.
Ans.: • An autoencoder can learn useful representations by changing the reconstruction error term of the cost function rather than adding a penalty $\Omega$ to the cost function.
• In contrast to a traditional autoencoder that minimizes some loss function as given in equation (Q.7.1), the denoising autoencoder (DAE) minimizes the loss function given by equation (Q.7.2).

$L(x, g(f(x)))$   ...(Q.7.1)

where L is a loss function that penalizes $g(f(x))$ for being dissimilar from $x$, e.g. the $L^2$ norm of their difference.

$L(x, g(f(\tilde{x})))$   ...(Q.7.2)

where $\tilde{x}$ is a copy of $x$ corrupted by adding some form of noise.
• This keeps the autoencoder from copying the input to the output without learning features about the data.
• Denoising autoencoders thus provide yet another example of how useful properties can emerge as a byproduct of minimizing reconstruction error.
• They are also an example of how overcomplete, high-capacity models may be used as autoencoders as long as care is taken to prevent them from learning the identity function.

4.3: Stochastic Encoders and Decoders

Q.8 Write short note on Stochastic Encoders and Decoders.
Ans.: • Autoencoders are feedforward networks and use the same loss functions and output units that are used in traditional feedforward networks.
• For designing the output units and the loss function of a feedforward network, an output distribution p(y | x) is defined and the negative log-likelihood -log p(y | x) is minimized, where y is a vector of targets, e.g. class labels.


• But in an autoencoder, the target as well as the input is x, and still the same strategy can be applied. So by using the same strategy as in a feedforward network, we can assume that for a given code h, the decoder is providing a conditional distribution $p_{decoder}(x \mid h)$.
• The autoencoder can then be trained by minimizing $-\log p_{decoder}(x \mid h)$, where the exact form of the loss function depends on the form of $p_{decoder}$.
• Similar to traditional feedforward networks, linear output units are used to parameterize the mean of a Gaussian distribution for real-valued x. The negative log-likelihood yields a mean squared error criterion in this case. Binary x values correspond to a Bernoulli distribution whose parameters are given by a sigmoid output unit, discrete x values correspond to a softmax distribution, and so on.
• Given h, the output variables are treated as conditionally independent so that evaluation of the probability distribution is inexpensive. For modeling outputs with correlations, mixture density outputs can be used.
• The encoding function f(x) can be generalized to an encoding distribution $p_{encoder}(h \mid x)$, as shown in Fig. Q.8.1. Fig. Q.8.1 shows the structure of a simple stochastic autoencoder. Here both encoder and decoder involve some noise injection. The output of the encoder and decoder can be seen as sampled from a distribution, $p_{encoder}(h \mid x)$ for the encoder and $p_{decoder}(x \mid h)$ for the decoder.

[Figure: code h between the encoder $p_{encoder}(h \mid x)$ and the decoder $p_{decoder}(x \mid h)$]
Fig. Q.8.1 The structure of a stochastic autoencoder

• Any latent variable model $p_{model}(h, x)$ defines a stochastic encoder

$p_{encoder}(h \mid x) = p_{model}(h \mid x)$   ...(Q.8.1)

and a stochastic decoder

$p_{decoder}(x \mid h) = p_{model}(x \mid h)$   ...(Q.8.2)

• In general, the encoder and decoder distributions need not necessarily be conditional distributions compatible with a unique joint distribution $p_{model}(x, h)$.

4.4: Denoising Autoencoders

Q.9 Define Denoising Autoencoders.
Ans.: Denoising autoencoders are a stochastic version of standard autoencoders that reduces the risk of learning the identity function. Denoising autoencoders attempt to get around this risk by introducing noise, i.e. randomly corrupting the input so that the autoencoder must then "denoise" or reconstruct the original input.

Q.10 What is use of denoising autoencoder ?
Ans.: Denoising autoencoder helps:
a) The hidden layers of the autoencoder learn more robust filters
b) Reduce the risk of overfitting in the autoencoder
c) Prevent the autoencoder from learning a simple identity function

Q.11 Explain Denoising Autoencoders.
Ans.: Fig. Q.11.1 shows denoising autoencoder.

[Figure: input -> add noise to the input image -> feed corrupted input into autoencoder -> measure reconstruction loss against original image]
Fig. Q.11.1 Denoising autoencoder
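The pipeline of Fig. Q.11.1 (add noise, feed the corrupted input through the autoencoder, measure the loss against the original) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the Gaussian noise, the one-layer `tanh` encoder, the linear decoder, the mean squared error and all sizes are assumptions chosen for brevity, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    # Corruption step: additive Gaussian noise is an assumed choice of noise.
    return x + rng.normal(0.0, noise_std, size=x.shape)

def encode(x, W, b):
    # Hypothetical one-layer encoder f.
    return np.tanh(x @ W + b)

def decode(h, V, c):
    # Hypothetical linear decoder g.
    return h @ V + c

def denoising_loss(x, W, b, V, c, noise_std=0.3):
    x_tilde = corrupt(x, noise_std)               # add noise to the input
    x_hat = decode(encode(x_tilde, W, b), V, c)   # feed the corrupted input through
    return float(np.mean((x_hat - x) ** 2))       # compare with the ORIGINAL x

# Toy batch: 4 examples with 6 features and a code of size 3 (illustrative sizes).
x = rng.normal(size=(4, 6))
W, b = 0.1 * rng.normal(size=(6, 3)), np.zeros(3)
V, c = 0.1 * rng.normal(size=(3, 6)), np.zeros(6)
loss = denoising_loss(x, W, b, V, c)
```

The key point is the last line of `denoising_loss`: the reconstruction is scored against the clean `x`, not against the corrupted `x_tilde`, which is what stops the network from getting away with the identity function.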

• The denoising autoencoder (DAE) receives a corrupted data point as input. It is trained to predict the original, uncorrupted data point as the output.
• Fig. Q.11.2 illustrates the training procedure of DAE. A corruption process $C(\tilde{x} \mid x)$ is introduced. It represents a conditional distribution over corrupted samples $\tilde{x}$, given a data sample $x$.

[Figure: computational graph with the corruption process $C(\tilde{x} \mid x)$]
Fig. Q.11.2 The computational graph of the cost function for a denoising autoencoder

• The autoencoder then learns a reconstruction distribution $p_{reconstruct}(x \mid \tilde{x})$ estimated from training pairs $(x, \tilde{x})$ as follows:
1. Sample a training example $x$ from the training data.
2. Sample a corrupted version $\tilde{x}$ from $C(\tilde{x} \mid x = x)$.
3. Use $(x, \tilde{x})$ as a training example for estimating the autoencoder reconstruction distribution $p_{reconstruct}(x \mid \tilde{x}) = p_{decoder}(x \mid h)$, with $h$ the output of encoder $f(\tilde{x})$ and $p_{decoder}$ typically defined by a decoder $g(h)$.
• Gradient-based approximate minimization (such as minibatch gradient descent) can be performed on the negative log-likelihood $-\log p_{decoder}(x \mid h)$.
• As long as the encoder is deterministic, the denoising autoencoder is a feedforward network. So it can be trained using exactly the same techniques as any other feedforward network. DAE performs stochastic gradient descent on the following expectation:

$-\mathbb{E}_{x \sim \hat{p}_{data}(x)} \, \mathbb{E}_{\tilde{x} \sim C(\tilde{x} \mid x)} \log p_{decoder}(x \mid h = f(\tilde{x}))$   ...(Q.11.1)

where $\hat{p}_{data}(x)$ is the training distribution.

4.5: Contractive Autoencoders

Q.12 Write short note on Contractive Autoencoders.
Ans.: • The main goal of Contractive Autoencoder (CAE) is to have a robust learned representation that is less sensitive to small variation in the data.
• A penalty term is applied to the loss function so as to make the representation robust.
• In order to make the derivatives of f as small as possible, the contractive autoencoder introduces an explicit regularizer on the code h = f(x).
• The penalty term is the Frobenius norm of the Jacobian matrix, which is calculated with respect to the input for the hidden layer. The squared Frobenius norm of the Jacobian matrix is the sum of squares of all its elements. This is shown in Fig. Q.12.1.

$L = \|x - g(f(x))\| + \lambda \|J_h(x)\|_F^2$

Fig. Q.12.1 Loss function with penalty term - Frobenius norm of the Jacobian matrix

• Contractive autoencoder is similar to denoising autoencoder in the sense that, in the presence of small Gaussian noise, the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that maps x to r = g(f(x)).
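The contractive penalty of Fig. Q.12.1 can be made concrete for one simple case. The sketch below assumes a single sigmoid hidden layer h = sigmoid(Wx + b), for which the Jacobian of h with respect to x has the closed form diag(h(1 - h)) W; the linear decoder, the squared reconstruction error and the value of `lam` are illustrative assumptions, not something the text prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    # Squared Frobenius norm of the Jacobian dh/dx for h = sigmoid(W x + b).
    h = sigmoid(W @ x + b)
    dh = h * (1.0 - h)        # derivative of the sigmoid at each hidden unit
    J = dh[:, None] * W       # Jacobian, shape (n_hidden, n_inputs)
    return float(np.sum(J ** 2))

def cae_loss(x, W, b, V, c, lam=0.1):
    # Reconstruction error plus lam times the contractive penalty.
    h = sigmoid(W @ x + b)
    r = V @ h + c             # hypothetical linear decoder g
    return float(np.sum((x - r) ** 2)) + lam * contractive_penalty(x, W, b)

# Toy values (illustrative only): 3 inputs, 2 hidden units.
x = np.array([0.2, -0.4, 0.7])
W = np.array([[0.5, -0.3, 0.1],
              [0.2, 0.4, -0.6]])
b = np.zeros(2)
V, c = W.T.copy(), np.zeros(3)
penalty = contractive_penalty(x, W, b)
total = cae_loss(x, W, b, V, c)
```

Because the penalty only involves derivatives at the training point itself, it targets infinitesimal perturbations of the input, in contrast to the finite-sized noise injected by a denoising autoencoder.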
• This means, in case of denoising autoencoders the reconstruction function resists small but finite-sized perturbations of the input, while in contractive autoencoders the feature extraction function resists infinitesimal perturbations of the input.
• CAE surpasses results obtained by regularizing autoencoders using weight decay or by denoising. CAE is a better choice compared to denoising autoencoder to learn useful feature extraction.
• The model is encouraged to learn an encoding where similar inputs have similar encodings. So the model is forced to learn how to contract a neighborhood of inputs into a smaller neighborhood of outputs.
• Fig. Q.12.2 indicates how the derivative of the reconstructed data (i.e. slope) is essentially zero for local neighborhoods of input data.
• This can be achieved by penalizing the instances where a small change in the input leads to a large change in the encoding space. For this the loss term should penalize large derivatives of hidden layer activations w.r.t. the input training instances.

[Figure: training examples x, the learned reconstruction function, and the linear identity function (perfect reconstruction)]
Fig. Q.12.2 Slope of the reconstructed data

• The regularization loss term used is the squared Frobenius norm $\|A\|_F$ of the Jacobian matrix J for the hidden layer activations w.r.t. the input observations. A Frobenius norm is an $L^2$ norm for a matrix. The Jacobian matrix represents all first-order partial derivatives of a vector valued function.

$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$   ...(Q.12.1)

• These values can be calculated using equation (Q.12.2), for m observations and n hidden layer nodes.

$J = \begin{bmatrix} \frac{\partial a_1^{(h)}(x)}{\partial x_1} & \cdots & \frac{\partial a_1^{(h)}(x)}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial a_n^{(h)}(x)}{\partial x_1} & \cdots & \frac{\partial a_n^{(h)}(x)}{\partial x_m} \end{bmatrix}$   ...(Q.12.2)

• The loss function can be defined by equation (Q.12.3).

$L(x, \hat{x}) + \lambda \sum_i \left\| \nabla_x a_i^{(h)}(x) \right\|^2$   ...(Q.12.3)

where $\nabla_x a_i^{(h)}(x)$ is the gradient field of hidden layer activations w.r.t. the input x, summed over all i training examples.

4.6 Applications of Autoencoders

Q.13 Explain how the dimensionality reduction feature of autoencoder is useful in information retrieval task.
Ans.: • Information retrieval is the task of searching the entries in a database that resemble a query entry.
• Dimensionality reduction benefits the task of information retrieval.
• In case of certain types of low dimensional data, search can become more efficient due to dimensionality reduction. As one of the applications of autoencoders is dimensionality reduction, they can be applied to information retrieval using semantic hashing.
• By training the dimensionality reduction algorithm, a low dimensional and binary code is produced. All the database entries are stored in a hash table that maps binary code vectors to entries.
• Using this hash table, information retrieval can be performed by returning all database entries having the same binary code as the query. By flipping some bits from the encoding of the query, less similar entries can also be searched very efficiently.
• This approach of information retrieval is suitable for textual and image data.
• Typically an encoding function with sigmoids on the final layer is used for producing binary codes for semantic hashing. The sigmoid units must be saturated to nearly 0 or nearly 1. For this, additive noise is injected before the sigmoid function during training and its magnitude should increase over time.
• So in order to preserve as much information as possible against the noise, the network must increase the magnitude of the inputs to the sigmoid function, until saturation occurs.

Q.14 Explain any two application of Autoencoders.
Ans.: 1. Anomaly detection
• A neural network trained with a specific dataset learns the data it commonly 'sees' and represents the input dataset. This network can also represent the difference between input and output and tell us when it 'sees' unusual data.
• Autoencoders can be used in such systems where it is difficult to describe unusual or anomalous data. Undercomplete autoencoders are used for anomaly detection.
• If an autoencoder is trained on a specific image dataset, say "D", then it is supposed to reconstruct an image from D as it is, with low reconstruction loss.
• But the autoencoder cannot perform the reconstruction task for images which are not present in the training dataset, because the latent attributes will not be adapted by the network for any unseen image. As a result the reconstruction loss for such an image will be very high. So by setting an appropriate threshold it can be easily identified as an anomaly or unusual image.
• Due to this, autoencoders are good at powering anomaly detection systems.

2. Image denoising
• Image denoising is performed to obtain the proper information about the content of an image. Denoising autoencoders can be used for this.
• Denoising autoencoders do not search for noise in the image; instead they extract the image from the noisy data fed as input to them by learning its representations. Then the noise-free image is formed by decompressing these representations.
• Denoising autoencoders can perform efficient and highly accurate image denoising and can be used for denoising complex images that could not be denoised using traditional methods.

END...
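The anomaly detection idea from Q.14 (low reconstruction loss on familiar data, high loss on unseen data, then a threshold) can be illustrated with a small sketch. To keep it short and runnable, the undercomplete autoencoder is replaced by a stand-in: a linear encoder/decoder obtained from an SVD of the training data, which is an assumption and not the trained neural network the text has in mind. The synthetic data, the code size of 2 and the 99th-percentile threshold are likewise illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: 200 points lying close to a 2-D subspace of a 10-D space.
basis = rng.normal(size=(2, 10))
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

# Stand-in for an undercomplete autoencoder with a 2-unit code:
# the top-2 right singular vectors act as tied encoder/decoder weights.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]                          # shape (2, 10)

def reconstruction_error(x):
    code = (x - mean) @ components.T         # encode to 2 numbers
    x_hat = code @ components + mean         # decode back to 10 numbers
    return np.sum((x - x_hat) ** 2, axis=-1)

# Threshold chosen from the training errors (99th percentile is an assumed rule).
threshold = np.percentile(reconstruction_error(train), 99)

def is_anomaly(x):
    return reconstruction_error(x) > threshold

# A point drawn far away from the training subspace.
outlier = mean + 5.0 * rng.normal(size=10)
flagged = bool(is_anomaly(outlier))
```

In practice the threshold is a tuning choice: a lower percentile catches more anomalies but also flags more of the ordinary data.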

Unit V

5
Representation Learning

5.1: Greedy Layer-Wise Unsupervised Pre-training

Q.1 What is Representation Learning ?
Ans.: Representation learning is concerned with training machine learning algorithms to learn useful representations.
• Deep neural networks can be considered representation learning models that typically encode information which is projected into a different subspace. These representations are then usually passed on to a linear classifier to, for instance, train a classifier.
• Representation learning can be divided into:
a) Supervised representation learning: Learning representations on task A using annotated data, which are then used to solve task B.
b) Unsupervised representation learning: Learning representations on a task in an unsupervised way. These are then used to address downstream tasks, reducing the need for annotated data when learning new tasks.

Q.2 What is Greedy Algorithm ?
Ans.: Greedy algorithms break a problem into many components, then solve for the optimal version of each component in isolation. Unfortunately, combining the individually optimal components is not guaranteed to yield an optimal complete solution.

Q.3 Write and explain an algorithm for Greedy Layer-Wise Unsupervised Pretraining.
Ans.: Greedy Layer-Wise Unsupervised Pretraining relies on a single-layer representation learning algorithm. Each layer is pretrained using unsupervised learning, taking the output of the previous layer and producing as output a new representation of the data, whose distribution is hopefully simpler.

Given: an unsupervised feature learning algorithm L, which takes a training set of examples and returns an encoder or feature function f. X is the raw input data.

    f <- Identity function
    X~ <- X
    for k = 1, ..., m do
        f(k) <- L(X~)
        f <- f(k) o f
        X~ <- f(k)(X~)
    end for
    if fine-tuning then
        f <- T(f, X, Y)
    end if
    Return f

• Greedy: Optimize each piece of the solution independently, one piece at a time.
• Layer-Wise: The independent pieces are the layers of the network. Training proceeds one layer at a time, training the k-th layer while keeping the previous ones fixed.
• Unsupervised: Each layer is trained with an unsupervised representation learning algorithm.

Q.4 When and why does unsupervised pretraining ideas work ?
Ans.:
1. Idea 1: Choice of initial parameters for a deep NN can have a significant regularizing effect on the model.
• It remains possible that pretraining initializes the model in a location that would otherwise be inaccessible. For example: a region surrounded by areas where the cost function varies so
much from one example to another that mini-batches give only a very noisy estimate of the gradient, or a region surrounded by areas where the Hessian matrix is so poorly conditioned that gradient descent methods must use very small steps.
• We cannot characterize exactly what aspects of the pretrained parameters are retained during the supervised training stages.
2. Idea 2: Learning about the input distribution can help with learning about the mapping from input to output.
• Some features that are useful for the unsupervised task may also be useful for the supervised task. This is not yet understood at a mathematical, theoretical level. Many aspects of this approach are highly dependent on the specific models used.
• For example, if we wish to add a linear classifier on top of pretrained features, the features must make the underlying classes linearly separable. This is another reason that simultaneous supervised and unsupervised learning can be preferable.

Q.5 Why it is called Greedy layer-wise pretraining ?
Ans.: • Greedy because it is a greedy algorithm that optimizes each piece of the solution independently.
• Layer-wise because the independent pieces are the layers of the network and training proceeds one layer at a time.
• Pretraining because it is only a first step before a joint training algorithm is applied to fine-tune all layers together.

Q.6 Explain disadvantage of Unsupervised Pretraining.
Ans.:
1. Unsupervised pretraining does not offer a clear way to adjust the strength of the regularization arising from the unsupervised stage. When we perform unsupervised and supervised learning simultaneously, instead of using the pretraining strategy, there is a single hyperparameter, usually a coefficient attached to the unsupervised cost, that determines how strongly the unsupervised objective will regularize the supervised model.
2. Two separate training phases each have their own hyperparameters. The performance of the second phase cannot be predicted during the first phase, so there is a long delay between proposing hyperparameters for the first phase and being able to update them using feedback from the second phase. Most principled approach: use validation set error in the supervised phase to select hyperparameters of the pretraining phase.

5.2 Transfer Learning and Domain Adaption

Q.7 Define Transfer learning and domain adaptation.
Ans.: Transfer learning and domain adaptation refer to the situation where what has been learned in one setting is exploited to improve generalization in another setting.

Q.8 What is transfer learning ? Explain its types.
Ans.: Transfer learning, multitask learning and domain adaptation can be achieved via representation learning when there exist features that are useful for different settings or tasks, corresponding to underlying factors that appear in more than one setting.
Two extreme forms of transfer learning:
1. One-shot learning: Only one example of the transfer task is given for one-shot learning. It is possible because the representation learns to cleanly separate the underlying classes during the first stage. During the transfer learning stage, only one labeled example is needed to infer the label of many possible test examples that all cluster around the same point in representation space.
5.3 Distributed Representation

Q.9 What is distributed representation ? Explain Symbolic representation.
Ans.: Symbolic representation: The input is associated with a single symbol or category. If there are n symbols in the dictionary, one can imagine n feature detectors, each corresponding to the detection of the presence of the associated category. In that case only n different configurations of the representation space are possible, carving n different regions in input space. It is also called one-hot representation.

Q.10 What is Nondistributed representations ? Explain example of it.
Ans.: Nondistributed representations may contain many entries but without significant meaningful separate control over each entry.
Examples of learning algorithms based on nondistributed representations are:
a. Clustering methods, including the k-means algorithm: Each input is assigned to exactly one cluster.
b. K Nearest neighbor: If k > 1, multiple values describe each input, but they cannot be controlled separately from each other, so this does not qualify as a true distributed representation.
c. Decision tree: Only one leaf is activated when the input is given.
d. Gaussian mixture and mixtures of experts: Each input is represented with multiple values, but those values cannot readily be controlled separately from each other.

5.4 Variants of CNN : DenseNet

Q.11 Write short note on Dense Block.
Ans.: • A standard ConvNet uses several convolutions to extract high level characteristics from the input picture.
• Identity mapping is suggested in ResNet to promote gradient propagation. It uses element-by-element addition. It may be thought of as an algorithm that hands a state from one ResNet module to another.
• Each layer in DenseNet receives extra inputs from all levels that came before it and transmits its own feature-maps to all layers that come after it. Concatenation is used. Each layer receives "collective knowledge" from the levels that came before it.
• Fig. Q.11.1 shows the DenseNet block.

[Figure: dense block in which each layer adds k channels; k : Growth rate]
Fig. Q.11.1 Dense Block in DenseNet with growth rate k

• Each layer receives feature maps from all layers that came before it, allowing for a more compact and thin network with fewer channels. The extra number of channels for each layer is the growth rate k.
• Therefore, it has greater memory and processing efficiency.
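The channel bookkeeping of a dense block (every layer sees the concatenation of all earlier feature maps and contributes k new channels) can be sketched as follows. The real composition layer (BN-ReLU-3x3 Conv) is replaced here by a hypothetical stand-in, a random 1x1 channel mixing followed by ReLU, because only the concatenation pattern and the growth of the channel count are being illustrated; the function name and sizes are assumptions.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """x has shape (channels, height, width); returns the concatenated output."""
    features = [x]
    for _ in range(num_layers):
        # Each layer receives ALL earlier feature maps, concatenated along channels.
        inp = np.concatenate(features, axis=0)
        # Hypothetical stand-in for BN-ReLU-Conv: random 1x1 channel mixing + ReLU,
        # producing exactly `growth_rate` new channels.
        w = rng.normal(size=(growth_rate, inp.shape[0]))
        new = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)
        features.append(new)
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 8, 8))     # 16 input channels, 8 x 8 feature map
out = dense_block(x0, num_layers=4, growth_rate=12, rng=rng)
out_channels = out.shape[0]          # 16 + 4 * 12
```

With 16 input channels, 4 layers and growth rate k = 12, the block ends with 16 + 4 x 12 = 64 channels, while each individual layer only had to produce 12 of them.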

Q.12 Draw and explain DenseNet Architecture.
Ans.: • Basic DenseNet composition layer

[Figure: composition layer]
Fig. Q.12.1 Composition layer

Pre-Activation Batch Norm (BN), ReLU and 3 x 3 Conv are performed for each composition layer, with output feature maps of k channels, for example, to transform x0, x1, x2 and x3 to x4. The Pre-Activation ResNet came up with this concept.

• DenseNet-B (Bottleneck layers)
BN-ReLU-1 x 1 Conv is carried out prior to BN-ReLU-3 x 3 Conv to lessen the complexity and size of the model.

[Figure: bottleneck layer with k channels, 4k channels and k channels]
Fig. Q.12.2 DenseNet-B

• Multiple Dense Blocks with Transition Layers
The transition layers between two adjacent dense blocks are 1 x 1 Conv and 2 x 2 average pooling.
Within the dense block, feature map sizes are uniform, making it simple to concatenate them.
A softmax classifier is applied once a global average pooling is completed at the conclusion of the final dense block. (Refer Fig. Q.12.3)

[Figure: dense blocks separated by convolution and pooling layers]
Fig. Q.12.3 Multiple dense blocks
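A rough way to see why the bottleneck of DenseNet-B can lessen model size is to count convolution weights with and without the 1 x 1 step. The sketch below is a back-of-the-envelope calculation, not part of the original text: it assumes the 1 x 1 convolution produces 4k channels (the width shown in Fig. Q.12.2), ignores biases and batch-norm parameters, and uses hypothetical function names.

```python
def direct_params(c_in, k):
    # Weights of a single 3x3 convolution mapping c_in channels to k channels.
    return 3 * 3 * c_in * k

def bottleneck_params(c_in, k, width=4):
    # 1x1 convolution down to width*k channels, then 3x3 convolution to k channels.
    return 1 * 1 * c_in * (width * k) + 3 * 3 * (width * k) * k

k = 32
wide_direct = direct_params(512, k)          # many concatenated input channels
wide_bottleneck = bottleneck_params(512, k)
narrow_direct = direct_params(64, k)         # few input channels
narrow_bottleneck = bottleneck_params(64, k)
```

Under these assumptions the bottleneck only pays off once the concatenated input is wide enough (more than 7.2k channels for a width of 4k), which is exactly the situation deep inside a dense block where many earlier feature maps have been concatenated.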

• DenseNet-BC (Further Compression)
The transition layer produces $\theta m$ output feature-maps if a dense block has m feature-maps, where $0 < \theta \le 1$ is referred to as the compression factor.
The quantity of feature-maps across transition layers remains constant when $\theta = 1$. A DenseNet with $\theta < 1$ is referred to as DenseNet-C, and $\theta = 0.5$ is used in the experiment.
The model is known as DenseNet-BC when both the bottleneck and transition layers with $\theta < 1$ are implemented.
DenseNets with/without B/C and with various numbers of layers L and growth rates k are also trained.

Q.13 Why do we need DenseNets ?
Ans.: DenseNet was specially developed to counter the decline in accuracy caused by the vanishing gradient in deep neural networks: due to the long distance between input and output layers, the information vanishes before reaching its destination.

Q.14 List the advantages of DenseNets.
Ans.: Advantages :
1. Parameter efficiency : Every layer adds only a limited number of parameters, e.g. only about 12 kernels are learned per layer.
2. Implicit deep supervision : Improved flow of gradient through the network. Feature maps in all layers have direct access to the loss function and its gradient.

END...

Unit VI

6
Applications of Deep Learning

6.1 Overview of Deep Learning Applications : Image Classification

Q.1 How is deep learning applied to computer vision tasks ?
Ans.: With the help of convolutional neural networks, deep learning is able to perform the following tasks:
a) Object recognition
b) Face recognition
c) Motion detection
d) Pose estimation
e) Semantic segmentation
• Object recognition (detection): Nowadays AI is able to recognize both static and dynamically moving objects with 99 % accuracy. In general, it is a matter of dividing the image into fragments and letting algorithms find the similarities to one of the existing objects in order to assign it to one of the classes. Classification plays an important role in this process and the success of object recognition largely depends on the richness of the object database.
• Face recognition is the identification of a specific person known to the system from the database.
• Motion detection: Motion detection is a key part of any surveillance system. This may be used to trigger an alarm, send a notification to someone, or simply record the event for later analysis. One way to detect motion is by using a motion detector, which detects changes between frames of an image sequence. The simplest form of motion detection is thresholding the difference between frames.
• Pose estimation : Human pose recognition is a challenging computer vision task due to the wide variety of human shapes and appearance, difficult illumination and crowded scenery. For these

tasks, photographs, image sequences, depth images, or skeleton data from motion capture devices are used to estimate the location of human joints.
• Semantic segmentation is a type of deep learning that attempts to classify each pixel in an image into one of several classes, such as road, sky or grass. These labels are then used during training so that when new images are processed they can also be segmented into these categories based on what they look like compared with previously seen pictures.

Q.2 What is image classification ?
Ans.: Image classification is the task of categorizing and assigning labels to groups of pixels or vectors within an image dependent on particular rules. The categorization law can be applied through one or multiple spectral or textural characterizations.
Image classification techniques are mainly divided into two categories : Supervised and unsupervised image classification techniques.

Q.3 Explain supervised and unsupervised classification.
Ans.: 1. Supervised classification : Supervised image classification methods use previously classified reference samples in order to train the classifier and subsequently classify new, unknown data. Therefore, the supervised classification technique is the process of visually choosing samples of training data within the image and allocating them to pre-chosen categories, including vegetation, roads, water resources and buildings. This is done to create statistical measures to be applied to the overall image.
2. Unsupervised classification : Unsupervised image classification technique is a fully automated method that does not leverage training data. This means machine learning algorithms are used to analyze and cluster unlabeled datasets by discovering hidden patterns or data groups without the need for human intervention.

Q.4 What is image classification in deep learning ?
Ans.: • Image classification is where a computer can analyse an image and identify the 'class' the image falls under. For example, input an image of a sheep. Image classification is the process of the computer analyzing the image and telling you it is a sheep.
• Early image classification relied on raw pixel data. This meant that computers would break down images into individual pixels. The problem is that two pictures of the same thing can look very different. They can have different backgrounds, angles, poses, etc. This made it quite the challenge for computers to correctly 'see' and categorise images.
• Image classification with deep learning most often involves convolutional neural networks, or CNNs. In CNNs, the nodes in the hidden layers don't always share their output with every node in the next layer.
• Deep learning allows machines to identify and extract features from images. This means they can learn the features to look for in images by analyzing lots of pictures.

[Figure: what the computer sees: 82 % cat, 15 % dog, 2 % hat, 1 % mug]
Fig. Q.4.1
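Class percentages such as those in Fig. Q.4.1 are typically obtained by passing the raw scores of the network's last layer through a softmax, which turns them into probabilities that sum to 1. The sketch below shows only that final step; the class names follow the figure, while the raw scores are made-up illustrative numbers rather than the output of a real model.

```python
import numpy as np

def softmax(scores):
    # Subtracting the maximum keeps the exponentials numerically stable.
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

classes = ["cat", "dog", "hat", "mug"]
scores = np.array([4.0, 2.3, 0.3, -0.4])    # hypothetical raw network outputs
probs = softmax(scores)                      # one probability per class
predicted = classes[int(np.argmax(probs))]   # class with the highest probability
```

The predicted class is simply the one with the largest probability; the remaining probabilities indicate how confident the model is in the alternatives.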
• An image is analyzed in the form of pixels. An image is an array of pixels, where the size of the matrix depends on the resolution of the image. The image classification task is done by grouping pixels into specified categories referred to as classes.
• The image is segregated into its most prominent features, giving the classifier algorithm an idea about the class the image may belong to. Thus the feature extraction process is the most important step in image classification. Also, the data fed to the algorithm plays an important role, particularly in supervised image classification techniques.

Q.5 List the application areas of image classification.
Ans.: Image classification forms the basis of computer vision problems.
1. In self-driving cars, image classification can be used to detect traffic lights, trees, vehicles, people around, etc.
2. In healthcare, it can be used to analyze medical images and depict the symptoms of illness.
3. With ubiquitous technologies such as AI and IoT, huge amounts of data in the form of images, video and speech are generated. Image and video data posted by persons can be used in recommendation systems for online shopping, places to visit, etc.

Q.6 How image is classified using convolutional neural network ?
Ans.: Fig. Q.6.1 shows CNN for image classification.

[Figure: Convolution -> Pooling -> Fully-connected, with outputs Cat : 0.7, Dog : 0.1, ..., Tiger : 0.02]
Fig. Q.6.1

• CNN layers can be of four main types : Convolution, ReLU, pooling and fully-connected layer.
1. Convolution Layer : A convolution is the simple application of a filter to an input that results in an activation. The convolution layer has a set of trainable filters that have a small receptive range but extend through the full depth of the data provided. Convolution layers are the major building blocks used in convolutional neural networks.
2. ReLU Layer : ReLU layers, also known as rectified linear unit layers, apply an activation function that introduces non-linearity by replacing negative values with zero. Models that have these layers are easier to train and often produce more accurate results.
3. Pooling Layer : This layer collects the results of the neurons in the layer preceding it and processes this data. The primary task of a pooling layer is to lower the number of factors being considered and give a streamlined output.
4. Fully-Connected Layer : This layer is the final output layer for CNN models; it flattens the input received from the layers before it and gives the result.

6.2 : Social Network Analysis

Q.7 What is a social network ?
Ans.: A social network is a group of collaborating and/or competing individuals or entities that are related to each other. A social network is formally defined as a set of social actors, or nodes, members that are connected by one or more types of relations.

Q.8 What is social network analysis ?
Ans.: Social Network Analysis (SNA) is the study of social relations among a set of actors.
Q.9 List the principles of social network analysis.
Ans.:
1. Actors and their actions are viewed as interdependent rather than independent, autonomous units.
2. Relational ties (linkages) between actors are channels for transfer or "flow" of resources (either material or non-material).
3. Network models focusing on individuals view the network structure environment as providing opportunities for or constraints on individual action.
4. Network models conceptualize structure (social, economic, political and so forth) as lasting patterns of relations among actors.

Q.10 Explain terminology used in social network analysis.
Ans.: Terminology used in social network analysis is actor, relational tie, dyad, triad, subgroup, group and relation.
• Actor : An actor is a discrete individual, corporate, or collective social unit. Examples : People in a group, departments within a corporation, public service agencies in a city, nation-states in a world system.
• Relational tie : Actors are linked to one another by social ties. A tie establishes a linkage between a pair of actors.
• Dyad : It is a tie between two actors and consists of a pair of actors and the tie(s) between them.
• Triad : Triples of actors and associated ties. A subset of three actors and the tie(s) among them.
• Subgroup : A subgroup of actors is defined as any subset of actors and all ties among them.
• Group : A group is the collection of all actors on which ties are to be measured.
• Relation : It is the collection of ties of a specific kind among members of a group. Example : The set of friendships among pairs of children in a classroom.
• A network can be categorized by the nature of the sets of actors and the properties of the ties among them. The number of modes in a network refers to the number of distinct kinds of social entities in the network.
• One-mode networks are a single set of actors. Two-mode networks focus on two sets of actors, or one set of actors and one set of events.

Q.11 What is social network analysis ? Explain.
Ans.: • Social Network Analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs and other connected information/knowledge entities. The term "social network" was introduced by Barnes in 1954.
• SNA is the study of social relations among a set of actors. The methods of data collection in network analysis are aimed at collecting relational data in a reliable manner.
• Data collection is typically carried out using standard questionnaires and observation techniques that aim to ensure the correctness and completeness of network data.
• SNA is based on an assumption of the importance of relationships among interacting units.
• The social network perspective encompasses theories, models and applications that are expressed in terms of relational concepts or processes.
• The nodes in the network are the people and groups while the links show relationships or flows between the nodes.

• SNA provides both a visual and a mathematical analysis of human relationships.
• The advantage of social network analysis is that, unlike many other methods, it focuses on interaction. Network analysis allows us to examine how the configuration of networks influences how individuals and groups, organizations, or systems function.
• Features of social network analysis: structural intuition, systematic relational data, graphic representation and mathematical or computational models.
• Social network analysis:
a) Refers to the set of actors and the ties among them.
b) Views characteristics of the social units as arising out of structural or relational processes, or focuses on properties of the relational systems themselves.
c) Includes concepts and information on relationships among units in a study.
d) The task is to understand properties of the social (economic or political) structural environment, and
e) How these structural properties influence observed characteristics and associations among characteristics.
f) Relational ties among actors are primary and attributes of actors are secondary.
g) Each individual has ties to other individuals, each of whom in turn is tied to a few, some, or many others and so on.

Q.12 Explain different applications of social network analysis.
Ans.:
• Social Network Analysis (SNA) is an important and valuable tool for knowledge extraction from massive and unstructured data. A social network provides a powerful abstraction of the structure and dynamics of diverse kinds of inter-personal connection and interaction.
• Facebook is a social networking service and website that connects people with other people and shares data between people. A user can create a personal profile, add other users as friends, exchange data, and create and join common interest communities.
• Twitter is a networking and microblogging service. The users of Twitter can exchange text-based posts called tweets. A tweet was originally limited to 140 characters but can be augmented by pictures or audio recordings. The main concept of Twitter was to build a social network formed by friends and followers. Friends are people whom you follow; followers are those who follow you.
• The role of social networks in labor markets deserves attention for at least two reasons: first, because of the central role networks play in disseminating information about job openings, they play a critical role in determining whether labor markets function efficiently; and second, because network structure ends up having implications for things like human capital investment as well as inequality.
• SNA primarily focuses on applying analytic techniques to the relationships between individuals and groups and investigating how those relationships can be used to infer additional information about the individuals and groups.
• SNA is used in a variety of domains. For example, business consultants use SNA to identify the effective relationships between workers that enable work to get done; these relationships often differ from the connections seen in an organizational chart.
• Law enforcement personnel have used social networks to analyze terrorist networks and criminal networks. The capture of Saddam Hussein was facilitated by social network analysis: military officials constructed a network containing Hussein's tribal and family links, allowing them to focus on individuals who had close ties to Hussein.
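The vocabulary of Q.10 (actors, ties, dyads, triads) and the Twitter notion of friends and followers can be made concrete with a small sketch. The follow relation below is invented purely for illustration, and a "closed triad" (every pair tied) is only one possible reading of the triad definition.

```python
from itertools import combinations

# Hypothetical follow relation: (follower, followed). All names are illustrative.
follows = {
    ("ana", "ben"), ("ben", "ana"),
    ("ana", "cy"), ("cy", "ben"),
    ("dee", "ana"),
}

# Actors are the nodes of the network.
actors = sorted({a for pair in follows for a in pair})

def friends(user):
    """People the user follows (outgoing ties)."""
    return sorted(v for u, v in follows if u == user)

def followers(user):
    """People who follow the user (incoming ties)."""
    return sorted(u for u, v in follows if v == user)

def tied(a, b):
    """True if at least one directed tie links a and b."""
    return (a, b) in follows or (b, a) in follows

# Dyads: pairs of actors linked by at least one tie.
dyads = [(a, b) for a, b in combinations(actors, 2) if tied(a, b)]

# Closed triads: triples of actors in which every pair is tied.
triads = [t for t in combinations(actors, 3)
          if all(tied(a, b) for a, b in combinations(t, 2))]

print(friends("ana"), followers("ana"))
print(dyads)
print(triads)
```

This is a one-mode network (a single set of actors); a two-mode network would need a second node set, for example events.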
Q.13 How does deep learning help social network analysis ?
Ans.:
• For social network analysis, the primary step is to encode the network data into low dimensional representations (network embeddings). Applications like classification, clustering, link prediction etc. are possible based on network representation learning.
• Deep neural networks can be used to learn these low dimensional representations from network data. Three categories of models are:
1) Look-up table based models
2) Autoencoder based models
3) Graph Convolutional Network (GCN) based models

1. Models with embedding look-up tables
• Look-up tables can be used for network representation learning instead of using multiple layers of nonlinear transformations. The node index is directly mapped to its corresponding representation vector using the look-up table.
• A look-up table can be implemented using a matrix. Each row of the matrix corresponds to the representation of one node.
• The main building blocks of the model with an embedding look-up table are shown in Fig. Q.13.1. Sampling and modeling are the two key components of this approach.
[Fig. Q.13.1: Network G with adjacency matrix A → Sample target observations from G → Generate node embeddings that preserve observations]
Fig. Q.13.1 Building blocks of models with embedding look-up tables

2. Autoencoder based models
• The two neural network modules of an autoencoder are: i) Encoder and ii) Decoder.
• Fig. Q.13.2 shows the autoencoder based network representation. The function of the encoder is to map the features of each node into a latent space. Information about the network is then reconstructed by the decoder from this latent space.
• The size of the hidden representation layer is usually small as compared to the input/output layer. The non-linear network structure is captured by this compressed representation. The output is decoded from these low dimensional representations.
• Minimizing the reconstruction error between the input and the decoded output is the main objective function of the autoencoder.
• The autoencoder is fed with the rows of the proximity matrix S ∈ R^(|V| × |V|) so as to learn and generate embeddings Z ∈ R^(|V| × D) at the hidden layer.
[Fig. Q.13.2: Input S → Encoder → Decoder → reconstructed S]
Fig. Q.13.2 Autoencoder based network representation

Q.14 What are graph convolutional approaches ?
Ans.:
• Graph Convolutional Network (GCN) is an approach for semi-supervised learning on graph-structured data. It is based on an efficient variant of convolutional neural networks which operate directly on graphs.
• The choice of convolutional architecture is motivated via a localized first-order approximation of spectral graph convolutions.
• The model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes.
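The text above does not give the GCN propagation rule. As a minimal sketch, assume the rule commonly used in the GCN literature, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, I adds self-loops and D is the degree matrix of A + I. The toy graph, feature matrix and random weights below are illustrative only; in practice W is learned.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # node degrees (incl. self-loop)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalisation
    return np.maximum(A_norm @ H @ W, 0.0)      # aggregate neighbours, then ReLU

# Toy undirected path graph 0 - 1 - 2 (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)                                   # one-hot node features
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                     # untrained weights, 2-D embeddings

Z = gcn_layer(A, H, W)
print(Z.shape)
```

Each row of Z mixes a node's own features with those of its direct neighbours, which is how the hidden representation encodes both local graph structure and node features.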

6.3: Speech Recognition

Q.15 What is speech recognition ?
Ans.: Speech recognition is the ability of a machine or program to identify words spoken aloud and convert them into readable text.

Q.16 How to formulate Automatic Speech Recognition (ASR) ?
Ans.:
• The main goal of an ASR system is to transform an audio input signal x = (x_1, x_2, ..., x_T) with a specific length T into a sequence of words or characters y = (y_1, y_2, ..., y_N), y_n ∈ V, where V is the vocabulary.
• Fig. Q.16.1 shows Automatic Speech Recognition.
[Fig. Q.16.1 blocks: input sound, pre-processing, feature extraction, classification model, language model (LM), prediction]
Fig. Q.16.1 ASR
• Processing steps for an ASR system are pre-processing, feature extraction, classification and language modeling.
• The pre-processing step aims to improve the audio signal by improving the signal-to-noise ratio, reducing the noise and filtering the signal.
• The features that are used for ASR are extracted with a specific number of values or coefficients, which are generated by applying various methods on the input. Feature extraction techniques are Mel-Frequency Cepstral Coefficients (MFCCs) and Discrete Wavelet Transform (DWT).
• The classification model aims to find the spoken text which is contained in the input signal. It takes the features extracted in the feature extraction step and generates the output text.
• The Language Model (LM) is an important module as it captures the grammatical rules or the semantic information of a language. Language models are important in order to recognize the output token from the classification model as well as to make corrections on the output text.

Q.17 How do recurrent neural networks support speech recognition ?
Ans.:
• RNNs perform computations on the time sequence since their current hidden state is dependent on all the previous hidden states. More specifically, they are designed to model time-series signals as well as capture long-term and short-term dependencies between different time-steps of the input.
• Fig. Q.17.1 shows a bidirectional RNN used for automatic speech recognition.
[Fig. Q.17.1: Bidirectional RNN for ASR]
Fig. Q.17.1 Bidirectional RNN for ASR
• The input sequence is x = (x_1, x_2, ..., x_T), the hidden sequence is h = (h_1, h_2, ..., h_T) and the output sequence is y = (y_1, y_2, ..., y_T).
• RNNs compute the sequence of hidden vectors as:
h_t = H(W_xh x_t + W_hh h_(t-1) + b_h)
y_t = W_hy h_t + b_y
where W are the weights, b are the bias vectors and H is the nonlinear function.
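The recurrence in Q.17 can be traced in a few lines of NumPy. This is a sketch of the unidirectional case only. It assumes H = tanh and h_0 = 0, neither of which is specified above, and the sizes (13 features per frame, 8 hidden units, 4 output classes) and random weights are placeholders.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Return hidden states (T, n_hid) and outputs (T, n_out) for one sequence."""
    h = np.zeros(W_hh.shape[0])                    # h_0 = 0 (assumption)
    hs, ys = [], []
    for x_t in xs:                                 # one step per input frame
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # h_t = H(W_xh x_t + W_hh h_(t-1) + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)                  # y_t = W_hy h_t + b_y
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
T, n_in, n_hid, n_out = 5, 13, 8, 4                # illustrative sizes
xs = rng.normal(size=(T, n_in))                    # stand-in for per-frame features
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
print(hs.shape, ys.shape)
```

A bidirectional RNN would run a second copy of this loop over the reversed sequence and combine the two hidden states at each time step.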

• Bidirectional RNNs process the input in both the forward and backward directions, with a hidden state vector for each direction, so they use both the past and the future context. This is why a bidirectional RNN is used instead of a unidirectional RNN for speech recognition.
• Both feed forward and recurrent neural networks can be used for frame-wise classification of the audio input signal. For this, an alignment between the input audio and the corresponding transcribed output is needed. This can be done using Hidden Markov Models (HMM) or the Connectionist Temporal Classification (CTC) loss.
• CTC is an objective function. The alignment between the input speech signal and the output sequence of words is computed using CTC.

6.4: Recommender System

Q.18 What is recommender systems ? Explain in detail.
Ans.:
• Recommender systems are a way of suggesting like or similar items and ideas to a user's specific way of thinking. Recommender systems are widely used on the Web for recommending products and services to users.
• Recommender systems try to automate aspects of a completely different information discovery model where people try to find other people with similar tastes and then ask them to suggest new things.
• The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them. Suggestions for books on Amazon, or movies on Netflix, are real-world examples of the operation of industry strength recommender systems.
• Fig. Q.18.1 shows the recommender systems concept.
[Fig. Q.18.1 labels: Buy, Similar, Recommend]
Fig. Q.18.1 Recommendation systems
• Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves.
• Recommender systems predict the preference of the user for these items, which could be in the form of a rating or response. When more data becomes available for a customer profile, the recommendations become more accurate.
• There are a variety of applications for recommendations including movies (e.g. Netflix), consumer products (e.g. Amazon or similar on-line retailers), music (e.g. Spotify), or news, social media, online dating and advertising.

Recommendation process
• Every recommendation system follows a specific process in order to produce product recommendations.
• The recommendation approaches can be classified based on the information sources they use. Three possible sources of information can be identified as input for the recommendation process.
• The available sources are the user data (demographics), the item data (keywords, genres) and the user-item ratings.
1. Collection: Data collected can be explicit (ratings and comments on products) or implicit (page views, order history etc.).
2. Storing: The type of data used to create recommendations can help the user decide the kind of storage to use: NoSQL database, object storage, or standard SQL database.
3. Analyzing: The recommender system finds items with similar user engagement data after analysis.
4. Filtering: This is the last step, where data gets filtered to access the relevant information required to provide recommendations to the user. To enable this, the user will need to choose an algorithm suiting the recommendation system.

Q.19 What are challenges of recommender system ?
Ans.: Following are the challenges for building recommender systems:
1. Huge amounts of data, tens of millions of customers and millions of distinct catalog items.
2. Results are required to be returned in real time.
3. New customers have limited information.
4. Old customers can have a glut of information.
5. Customer data is volatile.

Q.20 Explain various types of recommender system.
Ans.: In general, there are five types of recommender system:
1. Collaborative recommender system is a system that produces its result based on past ratings of users with similar preferences.
2. Content based recommender system is a system that produces its result based on the similarity of the content of the documents or items.
3. Knowledge based recommender system is a system that produces its result based on additional and means-end knowledge.
4. Demographic based recommender system: This type of recommendation system categorizes users based on a set of demographic classes. This algorithm requires market research data to fully implement. The main benefit is that it doesn't need a history of user ratings.
5. Hybrid recommender systems combine various inputs and different recommendation strategies to take advantage of the synergy among them.

Q.21 What are the advantages and disadvantages of collaborative filtering ?
Ans.: Advantages
1. A collaborative filtering application recommends interesting or popular information as judged by the community.
2. A collaborative filtering system can make more personalized recommendations by analyzing information from your past activity or the history of other users of similar taste.
Disadvantages
1. Many commercial recommender systems are based on large datasets. As a result, the user-item matrix used for collaborative filtering could be extremely large and sparse, which brings about challenges in the performance of the recommendation.
2. As the numbers of users and items grow, traditional CF algorithms will suffer serious scalability problems.
3. Gray sheep refers to the users whose opinions do not consistently agree or disagree with any group of people and thus do not benefit from collaborative filtering.

4. A collaborative filtering system doesn't automatically match content to one's preferences.

Q.22 Explain types of collaborative filtering.
Ans.: There are two types of collaborative filtering algorithms: user based and item based.

1. User based
• User-based collaborative filtering algorithms work off the premise that if a user (A) has a similar profile to another user (B), then A is more likely to prefer things that B prefers when compared with a user chosen at random.
• The assumption is that users with similar preferences will rate items similarly. Thus missing ratings for a user can be predicted by first finding a neighborhood of similar users and then aggregating the ratings of these users to form a prediction.
• The neighborhood is defined in terms of similarity between users, either by taking a given number of most similar users (k nearest neighbors) or all users within a given similarity threshold. Popular similarity measures for CF are the Pearson correlation coefficient and the Cosine similarity.
• For example, a collaborative filtering recommendation system for television tastes could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes).
• Note that these predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average score for each item of interest, for example based on its number of votes.
• User-based CF is a memory-based algorithm which tries to mimic word-of-mouth by analyzing rating data from many individuals.
• The two main problems of user-based CF are that the whole user database has to be kept in memory and that expensive similarity computation between the active user and all other users in the database has to be performed.

2. Item-based collaborative filtering
• Item-based CF is a model-based approach which produces recommendations based on the relationship between items inferred from the rating matrix. The assumption behind this approach is that users will prefer items that are similar to other items they like.
• The model-building step consists of calculating a similarity matrix containing all item-to-item similarities using a given similarity measure. Popular measures are again Pearson correlation and Cosine similarity. All pair-wise similarities are stored in an n × n similarity matrix S.
• Item-based collaborative filtering has become popularized due to its use by YouTube and Amazon to provide recommendations to users. This algorithm works by building an item-to-item matrix which defines the relationship between pairs of items. When a user indicates a preference for a certain type of item, the matrix is used to identify other items with similar characteristics that can also be recommended.
• Item-based CF is more efficient than user-based CF since the model is relatively small (N × k) and can be fully pre-computed. Item-based CF is known to produce only slightly inferior results compared to user-based CF, and higher order models which take the joint distribution of sets of items into account are possible. Furthermore, item-based CF is successfully applied in large scale recommender systems (e.g., by [Link]).
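Both variants from Q.22 can be sketched on a tiny rating matrix. The ratings below are made up, 0 marks a missing rating, and cosine similarity is computed on the raw rows and columns (zeros included), which is a simplification. The function names are illustrative rather than taken from any library.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(a, b):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_user_based(R, user, item, k=2):
    """Similarity-weighted mean rating of `item` over the k most similar users who rated it."""
    candidates = [(cosine(R[user], R[v]), v)
                  for v in range(R.shape[0])
                  if v != user and R[v, item] > 0]
    neighbours = sorted(candidates, reverse=True)[:k]   # k nearest neighbours
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return 0.0
    return sum(s * R[v, item] for s, v in neighbours) / total

def item_similarity(R):
    """n x n matrix S of pairwise item-to-item cosine similarities."""
    n = R.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = cosine(R[:, i], R[:, j])
    return S

p = predict_user_based(R, user=0, item=2)   # user 0 has not rated item 2
S = item_similarity(R)
print(round(p, 3))
print(np.round(S, 2))
```

The user-based function needs the whole rating matrix at prediction time, while S can be computed once offline, which reflects the memory-based versus model-based distinction made above.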


Q.23 Draw and explain architecture of content based recommendation.
Ans.:
• Content based recommenders refer to such approaches that provide recommendations by comparing representations of content describing an item (descriptions of documents and products) to representations of content that interests the user. These approaches are sometimes also referred to as content based filtering.
• Content based recommendation systems try to recommend items similar to those a given user has liked in the past.
• In a movie recommendation application, a movie may be represented by such features as specific actors, director, genre, subject matter, etc.
• The user's interest or preference is also represented by the same set of features, called the user profile.
• Recommendations are made by comparing the user profile with candidate items expressed in the same set of features. The top-k best matched or most similar items are recommended to the user.
• The simplest approach to content-based recommendation is to compute the similarity of the user profile with each item.

High level architecture of content-based recommender systems
• Fig. Q.23.1 shows the high level architecture of content-based recommender systems.
[Fig. Q.23.1 components: information source, content analyzer, structured item representation, represented items, feedback, user training examples, profile learner, user profiles, filtering component, active user]
Fig. Q.23.1 High level architecture of content-based recommender systems

1. Content Analyzer
• Extracts the features (keywords, n-grams) from the source.
• Converts items from unstructured to structured form.
• The data is stored in the repository of represented items.
2. Profile Learner
• Builds the user profile.
• Updates the profile using the data in the Feedback repository.
3. Filtering Component
• Matches the user profile with the actual items to be recommended.
• Uses different strategies.
• Users have no detailed knowledge of the collection makeup and the retrieval environment. Most users often need to reformulate their queries to obtain the results of their interest.

Q.24 Explain advantages and disadvantages of content based filtering.
Ans.: Advantages
1. User Independence: Recommends only the items that interest the user.
2. Transparency: Recommendation is based on the item features, and the content features can be listed explicitly.
3. New Item: Helps in recommending new items that are not yet rated by other users.

Drawbacks
1. The user will never be recommended different items.
2. Business cannot be expanded, as the user does not try a different type of product.
3. Overspecialization: Recommends only those items that score highly with the user profile.
4. Cold Start Problem: For a new user, many systems don't have the historical usage information needed to recommend items.

Q.25 Explain difference between collaborative filtering and content based filtering.
Ans.:
Sr. No. | Collaborative Filtering | Content Filtering
1 | Collaborative-filtering systems focus on the relationship between users and items. | Content-based systems focus on properties of items.
2 | Example: Netflix movie recommendations | Example: [Link] music recommendations
3 | Pro: Does not assume access to side information about items | Con: Assumes access to side information about items
4 | Cannot recommend new items | It can recommend new items
5 | Item features are inferred from ratings. | Matches the item features with user-provided preferences.
6 | Con: Does not work on new items that have no ratings | Pro: Got a new item to add ? No problem, just be sure to include the side information.

6.5: Natural Language Processing

Q.26 What is natural language processing ?
Ans.:
• Natural Language Processing (NLP) describes the interaction between human language and computers. It is a technology that many people use daily and has been around for years, but is often taken for granted.
• A few examples of NLP that people use every day are: spell check, autocomplete, voice text messaging, spam filters, related keywords on search engines, and Siri, Alexa, or Google Assistant.

Q.27 Explain phases of natural language processing.
Ans.: NLP has two phases: 1) Data preprocessing and 2) Algorithm development.
i. Data preprocessing: In preprocessing, features in the text data are highlighted so as to make it suitable to analyze and process by machine. Preprocessing can be done by tokenization, stop word removal, lemmatization and part-of-speech tagging.
ii. Algorithm development: After preprocessing the data, an NLP algorithm is developed to process it. Two main types of algorithms used for NLP are:
1. Rule based system: It uses carefully designed linguistic rules of a language.
2. Machine learning based system: It uses statistical methods. Models learn to perform the task from the training data provided. NLP algorithms can design their own rules by using a combination of machine learning, neural networks and deep learning through repeated processing and learning.

Q.28 Explain convolutional neural network based framework for NLP.
Ans.:
• CNN can be used to constitute words or n-grams for extracting high level features. Fig. Q.28.1 shows the CNN based framework.

• Here the words are transformed into vector representations through a look-up table. This results in the word embedding approach, where the weights are learned during training of the network.
[Fig. Q.28.1 layers, bottom to top: input sentence W_1 ... W_N, feature map, convolution layer, max-pool over time, fully connected layer, softmax classification]
Fig. Q.28.1 CNN based framework for NLP
• The steps to perform sentence modeling with CNN are as follows:
1. Sentences are tokenized into words. These are then transformed into a word embedding matrix of dimension 'd'. This forms the input embedding layer.
2. Convolutional filters are applied to this input layer of word embeddings to produce a feature map.
3. Then max pooling is applied to each filter. This reduces the dimensionality of the output and produces a fixed length output. Thus the final sentence representation is created.
• The drawback of CNN is that it cannot handle long distance contextual information and is also inefficient in preserving the sequential order of context. Therefore Recurrent Neural Networks (RNN) are more suitable for this.

Q.29 Explain recurrent neural network based framework for NLP.
Ans.:
• RNNs are effective for sequential data processing. In an RNN, computation is recursively applied to each instance of the input sequence using the previously computed results. The recurrent unit is sequentially fed with the sequences, represented by fixed size vectors of tokens. The RNN based framework is shown in Fig. Q.29.1.
[Fig. Q.29.1: RNN and its unfolded form over time steps]
Fig. Q.29.1 RNN based framework for NLP
• The advantage of RNN is that it can memorize the results of previous computation and utilize that information in the current computation. So it is possible to model context dependencies in inputs of arbitrary length with RNN, and a proper composition of the input can be created.
• Mainly RNNs are used in different NLP tasks like:
1. Natural language generation (e.g. image captioning, machine translation, visual question answering)
2. Word-level classification (e.g. Named Entity Recognition (NER))
3. Language modeling
4. Semantic matching
5. Sentence-level classification (e.g. sentiment polarity)
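The three sentence-modeling steps listed under Q.28 (tokenize and embed, convolve, max-pool over time) can be sketched as follows. Everything here is illustrative: the whitespace tokenizer, the tiny vocabulary, the embedding dimension d = 4, the window of 2 words and the 3 filters are assumptions, and the weights are random, whereas in the framework described above they are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(sentence):
    """Step 1a: naive whitespace tokenization (a real system would do more)."""
    return sentence.lower().split()

def sentence_vector(tokens, vocab, E, filters):
    """Steps 1b-3: embed, convolve over word windows, ReLU, max-pool over time."""
    X = E[[vocab[t] for t in tokens]]             # (n_words, d) embedding look-up
    h = filters.shape[1]                          # window size in words
    n = X.shape[0] - h + 1                        # number of window positions
    feats = np.empty((n, filters.shape[0]))
    for i in range(n):
        window = X[i:i + h]                       # (h, d) slice of the sentence
        # one feature per filter: sum of element-wise products with the window
        feats[i] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    feats = np.maximum(feats, 0.0)                # ReLU feature map
    return feats.max(axis=0)                      # max over time: (n_filters,)

sentences = ["the movie was really good", "bad plot"]
words = sorted({t for s in sentences for t in tokenize(s)})
vocab = {w: i for i, w in enumerate(words)}       # word -> row of the look-up table

d = 4
E = rng.normal(size=(len(vocab), d))              # embedding look-up table (untrained)
filters = rng.normal(size=(3, 2, d))              # 3 filters over 2-word windows

v1 = sentence_vector(tokenize(sentences[0]), vocab, E, filters)
v2 = sentence_vector(tokenize(sentences[1]), vocab, E, filters)
print(v1.shape, v2.shape)
```

Both sentences map to vectors of the same length even though they contain different numbers of words, which is the fixed-length output that step 3 refers to. A fully connected layer and softmax would follow for classification.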
Solved Model Question Paper [End Sem]
Deep Learning
B.E. (IT) Semester VII (As Per 2019 Pattern)
Time : 2½ Hours] [Maximum Marks : 70
N.B. :
i) Attempt Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.
ii) Neat diagrams must be drawn wherever necessary.
iii) Figures to the right side indicate full marks.
iv) Assume suitable data, if necessary.

Q.1 a) What is a directed graphical model in RNN ? (Refer Q.6 of Chapter - 3) [6]
b) Explain types of recurrent neural networks. (Refer Q.10 of Chapter - 3) [6]
c) Draw and explain encoder decoder architectures. (Refer Q.16 of Chapter - 3) [6]
OR
Q.2 a) Explain unfolding computational graphs. (Refer Q.2 of Chapter - 3) [6]
b) Explain memoryless models for sequences. (Refer Q.12 of Chapter - 3) [6]
c) Write short note on recursive neural networks. (Refer Q.17 of Chapter - 3) [6]

Q.3 a) Explain Denoising Autoencoders. (Refer Q.11 of Chapter - 4) [6]
b) Write short note on Stochastic Encoders and Decoders. (Refer Q.8 of Chapter - 4) [6]
c) Explain any two applications of Autoencoders. (Refer Q.14 of Chapter - 4)
OR
Q.4 a) Explain Sparse Autoencoder with its advantages and disadvantages. (Refer Q.6 of Chapter - 4)
b) Write short note on Contractive Autoencoders. (Refer Q.12 of Chapter - 4) [5]
c) What is autoencoder ? Explain properties of Autoencoder. (Refer Q.1 and Q.2 of Chapter - 4) [7]

Q.5 a) Draw and explain DenseNet Architecture. (Refer Q.12 of Chapter - 5) [6]
b) Write short note on Dense Block. (Refer Q.11 of Chapter - 5) [6]
c) Write and explain an algorithm for Greedy Layer-Wise Unsupervised Pretraining. (Refer Q.3 of Chapter - 5) [4]
OR
Q.6 a) What is distributed representation ? Explain symbolic representation. What is nondistributed representation ? Explain an example of it. (Refer Q.9 and Q.10 of Chapter - 5) [8]
b) What is transfer learning ? Explain its types. (Refer Q.8 of Chapter - 5) [6]
c) When and why does unsupervised pretraining work ? (Refer Q.4 of Chapter - 5) [4]

Q.7 a) Explain supervised and unsupervised classification. (Refer Q.3 of Chapter - 6) [6]
b) How is deep learning applied to computer vision tasks ? (Refer Q.1 of Chapter - 6) [6]
c) How to formulate Automatic Speech Recognition (ASR) ? (Refer Q.16 of Chapter - 6)
OR
Q.8 a) What is recommender systems ? Explain in detail. (Refer Q.18 of Chapter - 6) [6]
b) What is social network analysis ? Explain. (Refer Q.11 of Chapter - 6) [8]
c) List the application areas of image classification. (Refer Q.5 of Chapter - 6) [3]
