
Deep Learning RK New

The document discusses the development and training of deep learning algorithms that can understand and describe scenes in natural language, infer semantic concepts, and model complex behaviors. It highlights the challenges of deep learning, such as the need for large labeled datasets and the limitations of back-propagation methods, while also presenting solutions like Deep Belief Networks. Additionally, it covers various types of neural networks, their applications, and the limitations of deep learning, including data availability and hardware dependence.

Uploaded by

nehacharles2001

A Motivational Task: Percepts 🡺 Concepts

• Create algorithms
• that can understand scenes and describe them in natural language
• that can infer semantic concepts to allow machines to interact with humans using these concepts
• Requires creating a series of abstractions
• Image (Pixel Intensities) 🡺 Objects in Image 🡺 Object Interactions 🡺 Scene Description
• Deep learning aims to automatically learn these abstractions with little supervision
Courtesy: Yoshua Bengio, Learning Deep Architectures for AI

Deep Visual-Semantic Alignments
for Generating Image Descriptions
(Karpathy, Fei-Fei; CVPR 2015)

"two young girls are playing with lego toy."
"boy is doing backflip on wakeboard."
"construction worker in orange safety vest is working on road."
"man in black shirt is playing guitar."

[Link]

Challenge in Modelling Complex Behaviour
• Too many concepts to learn
• Too many object categories
• Too many ways of interaction between object categories
• Behaviour is a highly varying function of underlying factors
• f: L 🡺 V
• L: latent factors of variation
• low dimensional latent factor space
• V: visible behaviour
• high dimensional observable space
• f: highly non-linear function

Example: Learning the Configuration Space of a Robotic Arm

How do We Train Deep Architectures?
• Inspiration from mammal brain
• Multiple Layers of “neurons” (Rumelhart et al 1986)
• Train each layer to compose the representations of the previous layer to learn a higher level abstraction
• Ex: Pixels 🡺 Edges 🡺 Contours 🡺 Object parts 🡺 Object categories
• Local Features 🡺 Global Features
• Train the layers one-by-one (Hinton et al 2006)
• Greedy strategy

Multilayer Perceptron with Back-propagation
First deep learning model (Rumelhart, Hinton, Williams 1986)
[Figure: input vector 🡺 hidden layers 🡺 outputs. Compare outputs with correct answer to get error signal; back-propagate error signal to get derivatives for learning.]
Source: Hinton’s 2009 tutorial on Deep Belief Networks
Drawbacks of Back-propagation based Deep Neural Networks
• They are discriminative models
• Get all the information from the labels
• And the labels don’t carry much information
• Need a substantial amount of labeled data
• Gradient descent with random initialization leads to poor local minima
Hand-written digit recognition
• Classification of MNIST hand-written digits
• 10 digit classes
• Input image: 28x28 gray scale
• 784 dimensional input
A Deeper Look at the Problem

• One hidden layer with 500 neurons
• => 784 * 500 + 500 * 10 ≈ 0.4 million weights
• Fitting a model that best explains the training data is an optimization problem in a 0.4 million dimensional space
• It’s almost impossible for gradient descent with random initialization to arrive at the global optimum
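
The weight count above can be checked with a few lines of Python. This is a minimal sketch; the helper name `count_weights` and the optional bias count are illustrative additions, not part of the slides.

```python
def count_weights(layer_sizes, include_biases=False):
    """Count the parameters of a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # one weight per connection
        if include_biases:
            total += n_out         # one bias per receiving unit
    return total


# 784 inputs (28x28 pixels), 500 hidden units, 10 digit classes
mnist_mlp = [784, 500, 10]
print(count_weights(mnist_mlp))                       # 397000, i.e. about 0.4 million
print(count_weights(mnist_mlp, include_biases=True))  # 397510
```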
A Solution – Deep Belief Networks (Hinton et al. 2006)
[Figure: a very high-dimensional parameter space. From a random initial position, back-propagation is very slow and often gets stuck at poor local minima. Fast unsupervised pre-training instead leads to the pre-trained network weights, from which slow fine-tuning (using back-propagation) reaches a good solution.]


A Solution – Deep Belief Networks (Hinton et al. 2006)

• Before applying back-propagation, pre-train the network as a series of generative models
• Use the weights of the pre-trained network as the initial point for the traditional back-propagation
• This leads to quicker convergence to a good solution
• Pre-training is fast; fine-tuning can be slow


Quick Check: MLP vs DBN on MNIST (error rates)
• MLP (1 Hidden Layer)
• 1 hour: 2.18%
• 14 hours: 1.65%

• DBN
• 1 hour: 1.65%
• 14 hours: 1.10%
• 21 hours: 0.97%
Intel QuadCore 2.83GHz, 4GB RAM
MLP: Python :: DBN: Matlab
Intermediate Representations in Brain
• Disentanglement of factors of variation underlying the data
• Distributed Representations
• Activation of each neuron is a function of multiple features of the previous layer
• Feature combinations of different neurons are not necessarily mutually exclusive
• Sparse Representations
• Only 1-4% neurons are active at a time
[Figure: Localized Representation vs. Distributed Representation]

Local vs. Distributed in Input Space
• Local Methods
• Assume smoothness prior
• g(x) = f(g(x1), g(x2), …, g(xk))
• {x1, x2, …, xk} are neighbours of x
• Require a metric space
• A notion of distance or similarity in the input space
• Fail when the target function is highly varying
• Examples
• Nearest Neighbour methods
• Kernel methods with a Gaussian kernel
• Distributed Methods
• No assumption of smoothness 🡺 No need for a notion of similarity
• Ex: Neural networks
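
As an illustration of a local method, here is a minimal k-nearest-neighbour sketch in plain Python. It takes the combining function f in g(x) = f(g(x1), …, g(xk)) to be a simple average and the metric to be Euclidean distance; both choices, and the name `knn_predict`, are assumptions made for this example.

```python
import math


def knn_predict(x, train_points, train_values, k=3):
    """Predict g(x) as the average of g over the k nearest training points.

    Relies on a distance in the input space (Euclidean here), which is
    exactly the metric-space requirement of local methods.
    """
    by_distance = sorted(
        (math.dist(x, point), value)
        for point, value in zip(train_points, train_values)
    )
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Toy 1-D data sampled from y = x^2
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [0.0, 1.0, 4.0, 9.0]

# The two nearest neighbours of 1.1 are 1.0 and 2.0, so the estimate is (1 + 4) / 2
print(knn_predict((1.1,), points, values, k=2))  # 2.5
```

The estimate only uses nearby training points, which is why such methods need many examples when the target function varies quickly.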
Multi-task Learning

Source: [Link]
Desiderata for Learning AI
• Ability to learn complex, highly-varying functions
• Ability to learn multiple levels of abstraction with little human input
• Ability to learn from a very large set of examples
• Training time linear in the number of examples
• Ability to learn from mostly unlabeled data
• Unsupervised and semi-supervised
• Multi-task learning
• Sharing of representations across tasks
• Fast predictions

What is Deep Learning?

• Deep Learning is a subset of Machine Learning that uses mathematical functions to map the input to the output.
• These functions can extract non-redundant information or patterns from the data, which enables them to form a relationship between the input and the output.
• This is known as learning, and the process of learning is called training.
Deep Learning
• Modern deep learning models use artificial neural
networks or simply neural networks to extract
information.
• These neural networks are made up of a simple
mathematical function that can be stacked on top of
each other and arranged in the form of layers, giving
them a sense of depth, hence the term Deep Learning.
• Deep learning can also be thought of as an approach to
Artificial Intelligence, a smart combination of hardware
and software to solve tasks requiring human
intelligence.
Deep Learning

•Deep Learning was first theorized in the 1980s,
but it has only become useful recently because:
– It requires large amounts of labeled data
– It requires significant computational power (high
performing GPUs)
Why Deep Learning?
Deep Learning vs. Machine Learning

• On the downside, deep learning is computationally expensive compared to classical machine learning, which also means that it requires a lot of time to process.
• Deep Learning and Machine Learning are both capable of different types of learning: Supervised Learning (labeled data), Unsupervised Learning (unlabeled data), and Reinforcement Learning.
•But their usefulness is usually determined by
• Classical machine learning requires data preprocessing and feature engineering, which involves human intervention.
• The neural networks in deep learning can extract features themselves, so far less manual feature engineering is required.
• Deep Learning can process unstructured data.
• Deep Learning is usually based on representation learning, i.e., finding and extracting vital information or patterns that represent the entire dataset.
• Deep learning is computationally expensive and time-consuming.
How does Deep Learning work?
• Deep Neural Networks have multiple layers of interconnected artificial neurons or nodes that are stacked together.
• Each of these nodes has a simple mathematical function, usually a linear function followed by a non-linear activation, that performs extraction and mapping of information.
• There are three types of layers in a deep neural network: the input layer, hidden layers, and the output layer.
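
The layer structure described above can be sketched as a forward pass in plain Python. This is only an illustration: the 2-3-1 network, its hand-picked weights, and the helper names `dense`, `sigmoid`, and `identity` are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One layer: every node computes activation(weighted sum of inputs + bias)."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node
hidden_w = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 1.0, 1.0]]
out_b = [0.0]

x = [1.0, 1.0]                                   # input layer
hidden = dense(x, hidden_w, hidden_b, sigmoid)   # hidden layer
output = dense(hidden, out_w, out_b, identity)   # output layer
print(hidden, output)
```

Stacking more `dense` calls between the input and the output is what makes the network deeper.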
Programming Patterns
Deep Learning vs. Machine Learning
Deep Learning: Applications
Types of Neural Network

• Artificial Neural Network
• Convolutional Neural Network
• Recurrent Neural Network
• Generative Adversarial Network
CNN
• Convolutional Neural Networks, or CNNs, are primarily used for tasks related to computer vision or image processing.
• CNNs are extremely good at modeling spatial data such as 2D or 3D images and videos.
• They can extract features and patterns within an image, enabling tasks such as image classification or object detection.
RNN
• Recurrent Neural Networks, or RNNs, are primarily used to model sequential data, such as text, audio, or any type of data that represents a sequence or time.
• They are often used in tasks related to natural language processing (NLP).
GAN
• Generative adversarial networks, or GANs, are frameworks used for tasks related to unsupervised learning.
• This type of network essentially learns the structure and patterns of the data in a way that lets it generate new examples similar to those in the original dataset.
Transformers

• Transformers are a newer class of deep learning model used mostly for tasks that involve modeling sequential data, such as NLP.
• They are often more powerful than RNNs and are replacing them in many tasks.
• Recently, transformers have also been applied to computer vision tasks, where they are proving competitive with, and sometimes more effective than, traditional CNNs.
Deep Learning: Limitations

• Data availability
• The complexity of the model
• Lacks global generalization
• Incapable of Multitasking
• Hardware dependence
Deep Learning: Limitations

• Data availability
– Deep learning models require a lot of data to
learn the representation, structure, distribution,
and pattern of the data.
– If there isn't enough varied data available, then
the model will not learn well and will lack
generalization (it won't perform well on unseen
data).
– The model can only generalize well if it is trained
on large amounts of data.
Deep Learning: Limitations

• The complexity of the model
– Designing a deep learning model is often a trial-and-error process.
– A simple model is most likely to underfit, i.e. not
able to extract information from the training set, and
a very complex model is most likely to overfit, i.e.,
not able to generalize well on the test dataset.
– Deep learning models will perform well when their
complexity is appropriate to the complexity of the
data.
Deep Learning: Limitations

• Lacks global generalization
– A simple neural network can have thousands to tens of thousands of parameters.
– The idea of global generalization is that all the
parameters in the model should cohesively update
themselves to reduce the generalization error or test
error as much as possible. However, because of the
complexity of the model, it is very difficult to achieve
zero generalization error on the test set.
– Hence, the deep learning model will always lack global
generalization which can at times yield wrong results.
Deep Learning: Limitations

• Incapable of Multitasking
– Deep neural networks are incapable of
multitasking.
– These models can only perform targeted tasks, i.e.,
process data on which they are trained. For
instance, a model trained on classifying cats and
dogs will not classify men and women.
– Furthermore, applications that require reasoning
or general intelligence are completely beyond
what the current generation’s deep learning
techniques can do, even with large sets of data.
Deep Learning: Limitations

• Hardware dependence
– As mentioned before, deep learning models are
computationally expensive.
– These models are so complex that a normal CPU cannot train them in a practical amount of time.
– Instead, multicore high-performing graphics processing units (GPUs) and tensor processing units (TPUs) are needed to train these models effectively in a shorter time.
– Although these processors save time, they are
expensive and use large amounts of energy.
Source: [Link]/deep-learning-courses
Machine Learning

Input: X Output: Y

Label: “motorcycle”
Why is it hard?
You see this

But the camera sees this:


[Figure, shown in three frames: Raw Image 🡺 Representation (pixel 1, pixel 2) 🡺 Learning Algorithm. Scatter plot of Cars vs. “Non”-Cars with axes pixel 1 and pixel 2.]
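Better Feature Representation?
[Figure: the raw pixel 1 / pixel 2 representation is replaced by “handles” and “wheel” features before the Learning Algorithm. Scatter plot of Cars vs. “Non”-Cars with axes wheel and handles.]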
Feature Representations

Expert Knowledge!

Source: Feature representations in computer vision (Honglak Lee)

Deep Learning: learn representations!

Source: Lee [Link].
So, 1. what exactly is deep learning?

And, 2. why is it generally better than other methods on image, speech and certain other types of data?

The short answers

1. ‘Deep Learning’ means using a neural network with several layers of nodes between input and output

2. the series of layers between input & output do feature identification and processing in a series of stages, just as our brains seem to.

hmmm… OK, but:
3. multilayer neural networks have been around for 25 years. What’s actually new?

we have always had good algorithms for learning the weights in networks with 1 hidden layer

but these algorithms are not good at learning the weights for networks with more hidden layers

what’s new is: algorithms for training many-layer networks

[Link]
Single Unit: Input, weights, activation function, output

[Figure: a single unit with bias input x0 (weight w0) and inputs x1, x2 (weights w1, w2) producing the output f(x)]

f(x) = g(w0 x0 + w1 x1 + w2 x2)

Activation functions:
1. Linear
2. Sigmoid
3. Tanh
4. ReLU
5. Softmax, etc.
[Link]
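
The single unit f(x) = g(w0 x0 + w1 x1 + w2 x2) and the listed activation functions can be sketched in plain Python. The weights and inputs below are arbitrary values chosen for illustration, and the bias input x0 is assumed to be fixed at 1.

```python
import math


def linear(z):
    return z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def softmax(zs):
    """Softmax acts on a whole vector of pre-activations, not on one unit."""
    shift = max(zs)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def unit_output(inputs, weights, g):
    """f(x) = g(w0*x0 + w1*x1 + w2*x2 + ...)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return g(z)


x = [1.0, 1.4, 2.7]    # x0 (bias input, assumed 1), x1, x2
w = [0.5, -1.0, 0.5]   # w0, w1, w2 (illustrative values)

for name, g in [("linear", linear), ("sigmoid", sigmoid),
                ("tanh", math.tanh), ("relu", relu)]:
    print(name, unit_output(x, w, g))

print("softmax", softmax([1.0, 2.0, 3.0]))
```

Here the weighted sum is 0.5 - 1.4 + 1.35 = 0.45 (up to floating-point rounding), so the linear and ReLU outputs coincide while sigmoid and tanh squash the value.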
A dataset
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0
etc …
Train the deep neural network

[Link]
Initialize with random weights

[Figure: the first training instance (1.4, 2.7, 1.9) is fed into the network, which outputs 0.7; the target class is 0, so Error = 0.7]

Compare with the target output

[Link]
Adjust weights based on error

[Figure: the same instance (1.4, 2.7, 1.9), output 0.7, target 0, Error = 0.7]

Repeat this thousands, maybe millions of times, each time taking a random training instance and making slight weight adjustments.
Algorithms for weight adjustment are designed to make changes that will reduce the error.

[Link]
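
The loop described above (random weights, pick a random instance, compare with the target, adjust slightly) can be sketched as follows. The slides do not name a specific update rule, so this example assumes a single sigmoid unit trained with the delta rule, weight change = learning rate × error × input; the learning rate, step count, and function names are likewise illustrative.

```python
import math
import random

# The four rows shown in the slides: three fields and a class label.
DATASET = [
    ([1.4, 2.7, 1.9], 0),
    ([3.8, 3.4, 3.2], 0),
    ([6.4, 2.8, 1.7], 1),
    ([4.1, 0.1, 0.2], 0),
]


def predict(weights, bias, fields):
    z = bias + sum(w * x for w, x in zip(weights, fields))
    return 1.0 / (1.0 + math.exp(-z))


def mean_abs_error(weights, bias):
    """Average |output - target| over the dataset."""
    return sum(abs(predict(weights, bias, f) - t) for f, t in DATASET) / len(DATASET)


def train(steps=20000, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initialize with small random weights
    weights = [rng.uniform(-0.1, 0.1) for _ in range(3)]
    bias = rng.uniform(-0.1, 0.1)
    initial = (list(weights), bias)
    for _ in range(steps):
        fields, target = rng.choice(DATASET)       # take a random training instance
        error = predict(weights, bias, fields) - target
        # Slight adjustment in the direction that reduces the error (assumed delta rule)
        weights = [w - learning_rate * error * x for w, x in zip(weights, fields)]
        bias -= learning_rate * error
    return initial, (weights, bias)


initial, final = train()
print("mean error before:", mean_abs_error(*initial))
print("mean error after: ", mean_abs_error(*final))
```

Each individual step changes the weights only a little; the error drops because many such steps accumulate.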
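Inputs and Outputs
[Figure: a 256 × 256 matrix (input X) is passed through the DL model to produce a 4-element vector (output Y); a second diagram maps elements of a set X onto elements of a set Y]

With deep learning, we are searching for a surjective (or onto) function f from a set X to a set Y.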
Learning Principle
[Figure, shown in three frames: inputs x1 … xn from the dataset pass through the network to give an output/prediction, which is compared with the target output; the error is the difference between the two. The three frames show error values of 5, 15, and 2.5.]
(Image Credit: NVIDIA Deep Learning Institute)
Supervised Deep Learning with Neural Networks
[Figure: a network with Input, Hidden Layers, and Output. The detail “From one layer to the next” shows inputs X1, X2, X3 with weights W1, W2, W3 feeding a node Y3.]
f is the activation function, Wi is the weight, and bi is the bias.
Training - Minimizing the Loss
The loss function L is defined with regard to the weights and biases.
The weight update is computed by moving a step in the opposite direction of the cost gradient.
Iterate until L stops decreasing.
[Figure: inputs X1, X2, X3 with parameters (W1, b1), (W2, b2), (W3, b3) feed the output Y2, which is used to compute the loss L]
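
The equations on these slides did not survive as text. One common formulation that is consistent with the wording above is sketched here; the mean-squared-error loss, the learning rate \eta, and the sample index n are assumptions for illustration, not necessarily the forms used in the original slides.

```latex
% Output of one unit: activation f applied to the weighted sum of inputs plus bias
y = f\Big(\sum_{i} W_i x_i + b\Big)

% Assumed loss: mean squared error between predictions \hat{y}^{(n)} and targets y^{(n)}
L(W, b) = \frac{1}{N} \sum_{n=1}^{N} \big(\hat{y}^{(n)} - y^{(n)}\big)^2

% Gradient descent: step against the gradient with learning rate \eta
W_i \leftarrow W_i - \eta \, \frac{\partial L}{\partial W_i},
\qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
```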


Convolution in 2D

(Image Credit: Applied Deep Learning | Arden Dertat)


Convolution Kernel

(Image Credit: Applied Deep Learning | Arden Dertat)


Convolution on Image

Image Credit: Deep Learning Methods for Vision | CVPR 2012 Tutorial
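
The sliding-kernel operation shown in these figures can be sketched in plain Python. This minimal version assumes stride 1 and no padding, and, like most deep learning libraries, it does not flip the kernel (strictly speaking, a cross-correlation). The function name and the toy image are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x4 image with a vertical edge between the second and third columns
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# 2x2 kernel that responds to a dark-to-bright transition from left to right
kernel = [
    [-1, 1],
    [-1, 1],
]

print(conv2d_valid(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The same kernel is applied at every position, so the edge is detected wherever it occurs in the image.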
Activation Functions

Image Credit: [Link]


Introducing Non-Linearity (ReLU)

Image Credit: Deep Learning Methods for Vision | CVPR 2012 Tutorial
Max Pooling

(Image Credit: Applied Deep Learning | Arden Dertat)


Pooling - Max-Pooling and Sum-Pooling

Image Credit: Deep Learning Methods for Vision | CVPR 2012 Tutorial
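
Max-pooling and sum-pooling differ only in how each window is reduced to one value. A minimal sketch, assuming non-overlapping square windows (stride equal to the window size); the function name `pool2d` and the example feature map are illustrative.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Down-sample by reducing each non-overlapping size x size window to one value."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(pool2d(feature_map, size=2, reduce=max))  # [[6, 8], [3, 4]]
print(pool2d(feature_map, size=2, reduce=sum))  # [[15, 21], [8, 8]]
```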
Convolutional Neural Networks
A convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that explicitly assumes that the inputs are images, which allows us to encode certain properties into the architecture.

LeNet-5 Architecture (Image Credit: [Link])


CIFAR 10 and Convolutional Neural Network

CIFAR 10 dataset:
50,000 training images
10,000 testing images
10 categories (classes)

Accuracies from different methods:
Human: ~94%
Whitening K-means: 80%
……
Deep CNN: 95.5%

[Link]
Deep Convolutional Neural Networks on CIFAR10

[Figure: network diagram with blocks labeled Convolution2D, MaxPooling2D, Convolution2D, MaxPooling2D, Dropout, Fully-connected, and output]

Convolutional Layer: filters work on every part of the image; therefore, they are searching for the same feature everywhere in the image.

[Figure: input image 🡺 convolutional output]

[Link]
MaxPooling: usually present after the convolutional layer. It provides a down-sampling of the convolutional output.

[Figure: convolutional output 🡺 MaxPooling (2,2)]

[Link]
Dropout: randomly drop units along with their connections during training. It helps to learn more robust features by reducing complex co-adaptations of units, and it also alleviates overfitting.

Srivastava et al. Dropout: A Simple way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research
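
A minimal sketch of dropout applied to one layer's activations, in plain Python. It uses the "inverted" variant, which rescales the surviving units during training so that nothing needs to change at test time; this is a common implementation choice and an assumption here, since Srivastava et al. describe scaling the weights at test time instead. The function name and parameters are illustrative.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Randomly zero units during training; pass activations through otherwise."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Kept units are scaled by 1/keep_prob so the expected activation is unchanged
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
layer_output = [1.0, 2.0, 3.0, 4.0]
print(dropout(layer_output, drop_prob=0.5, rng=rng))    # some units zeroed, the rest doubled
print(dropout(layer_output, training=False))            # unchanged at test time
```

Because a different random subset of units is dropped on every training step, no unit can rely on a specific partner being present.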
Fully-connected layer (dense): each node is fully connected to all input nodes, and each node computes a weighted sum of all input nodes. It has a one-dimensional structure. It helps to classify the input pattern using the high-level features extracted by previous layers.

[Figure: input, hidden, and output nodes of a fully-connected network]

[Link]
Why Do GPUs Matter in Deep Learning?

[Figure: running time without GPU vs. running time with GPU]

With a GPU, the run is 733/27 ≈ 27.1 times faster than without a GPU.

Again, WHY GPUs?

1. Every set of weights can be stored as a matrix (m,n)
2. GPUs are made to do common parallel problems fast. All similar calculations are done at the same time, which greatly boosts performance in parallel computations.

Blankevoort T., Neural Networks and Deep Learning, slides
