INTRODUCTION TO DEEP LEARNING
Index
• What is Deep Learning
• Why Deep Learning
• Layers in Deep Neural Network
• The Architecture of Neural Networks
• Machine Learning vs Deep Learning
• How Machine Learning Works
• How Deep Learning Works
• Deep Learning Applications
• Deep Learning Process
• Deep Learning Models
• CNN
• Operations of CNN
• RNN
• ANN
• Biological Neuron
• Artificial Neuron
• Biological Neuron vs Artificial Neuron
• Perceptrons
• Perceptron Learning Rule
• Types of Perceptron Models
• Perceptron Function
• Inputs of a Perceptron
• Activation Functions of Perceptron
• Output of Perceptron
• Error in Perceptron
• Perceptron: Decision Function
• Bias Unit
• Activation Functions
• Types of Activation Functions
• Step Function
• Sigmoid Function
• ReLU
• Leaky ReLU
• Tanh Function
• Softmax Function
• Activation Functions at a Glance
What is Deep Learning?
• Deep learning is a machine learning technique that teaches computers to do what comes naturally to
humans: learn by example.
• For example, deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign.
• In deep learning, a computer model learns to perform classification tasks directly from images, text, or
sound.
• Deep learning models can discover the relevant features on their own, requiring little guidance from the
programmer.
• Deep learning is implemented with neural networks, which are inspired by biological neurons (brain cells).
Why Deep Learning Matters
• Accuracy. Deep learning achieves recognition accuracy at higher levels than ever before. This helps
consumer electronics meet user expectations and is crucial for safety-critical applications like driverless cars.
• Deep learning requires large amounts of labeled data. For example, driverless car development requires
millions of images and thousands of hours of video.
Layers in Deep Neural Network
• Most deep learning methods use neural network architectures, which is why deep learning models are
often referred to as deep neural networks.
• The term “deep” usually refers to the number of hidden layers in the neural network. Traditional neural
networks only contain 2-3 hidden layers, while deep networks can have as many as 150.
Input Layer: This layer accepts the data and passes it to the rest of the network.
Hidden Layer: The second type of layer is the hidden layer. A neural network has one or more hidden layers.
Hidden layers are mainly responsible for the strong performance of neural networks and for their ability to
model complex relationships. They perform several functions at once, such as transforming the data and
automatically creating features.
Output Layer: The output layer holds the result, or output, of the problem. For example, raw images are passed
to the input layer, and the prediction is received from the output layer.
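The flow from the input layer through a hidden layer to the output layer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the layer sizes, the weight values and the helper names `dense`, `relu` and `identity` are all hypothetical, and a real network learns its weights from data instead of having them written in by hand.

```python
# Minimal forward pass: input layer -> one hidden layer -> output layer.
# All weights and biases below are hypothetical values chosen for illustration.

def dense(inputs, weights, biases, activation):
    """Fully connected layer: each neuron takes a weighted sum of all inputs plus its bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

W_hidden = [[0.2, -0.5, 1.0],   # weights of hidden neuron 1
            [0.7, 0.1, -0.3]]   # weights of hidden neuron 2
b_hidden = [0.0, 0.1]
W_out = [[1.0, -1.0]]           # one output neuron reading both hidden neurons
b_out = [0.5]

x = [1.0, 2.0, 3.0]                        # input layer: raw values passed on unchanged
h = dense(x, W_hidden, b_hidden, relu)    # hidden layer: transformed features
y = dense(h, W_out, b_out, identity)      # output layer: the network's result
print("hidden:", h, "output:", y)
```

Each layer only ever sees the previous layer's output, which is why stacking more hidden layers lets later layers build on features computed by earlier ones.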
Architectures
Shallow Neural Network:
• The Shallow neural network has only one hidden layer between the input and output.
Deep Neural Networks:
• A deep neural network has several hidden layers between the input and output layers. This added depth
makes deep networks highly proficient at modeling and processing non-linear relationships.
Machine Learning vs Deep Learning
• In broad terms, deep learning is a subset of machine learning, and machine learning is a subset
of artificial intelligence.
How Machine Learning Works?
• The working of machine learning models can be understood by the example of identifying the
image of a cat or dog.
• To identify this, the ML model takes images of both cats and dogs as input, extracts the different
features of images such as shape, height, nose, eyes, etc., applies the classification algorithm, and
predicts the output.
How Deep Learning Works
• We can understand the working of deep learning with the same example of identifying cat vs.
dog.
• The deep learning model takes the images as input and feeds them directly to the algorithm, without any
manual feature extraction step.
• The images pass through the layers of the artificial neural network, which predict the final output.
• In the table below, we summarize the difference between machine
learning and deep learning.
Deep Learning Applications
• Self-Driving Cars
• Voice-Controlled Assistants
• Automatic Image Caption Generation
• Automatic Machine Translation
• Automatic Game Playing
• Language Translations
Deep Learning Process
• Deep neural networks provide state-of-the-art accuracy in many tasks, from object detection to speech
recognition. They learn automatically, without predefined knowledge explicitly coded by programmers.
• A neural network builds knowledge as a hierarchy: each layer represents a deeper level of knowledge. A neural
network with four layers can learn more complex features than one with two layers.
The learning occurs in two phases:
• The first phase applies a nonlinear transformation to the input and produces a statistical model as output.
• The second phase improves the model using a mathematical method based on derivatives (gradients).
The neural network repeats these two phases hundreds to thousands of times until it reaches a tolerable level
of accuracy. Each repetition of the two phases is called an iteration.
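The two learning phases can be illustrated with a single sigmoid neuron trained by gradient descent. This is a sketch under stated assumptions: the toy dataset (logical OR), the cross-entropy loss, the learning rate of 0.5, the 1000 iterations and the helper names are choices made for illustration, not details given above. Phase one is the forward pass through the nonlinearity; phase two uses the derivative of the loss to update the weights; one pass through both phases over the dataset is one iteration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy dataset: logical OR, as (inputs, target) pairs
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.5         # learning rate (assumed)

def predict(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def mean_loss():
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, t in data:
        p = predict(x)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(data)

initial_loss = mean_loss()
for iteration in range(1000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, t in data:
        p = predict(x)        # phase 1: nonlinear transformation of the input
        err = p - t           # phase 2: d(loss)/dz for sigmoid + cross-entropy
        grad_w[0] += err * x[0]
        grad_w[1] += err * x[1]
        grad_b += err
    n = len(data)
    # update step: move each parameter against its average derivative
    w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
    b -= lr * grad_b / n

final_loss = mean_loss()
labels = [1 if predict(x) > 0.5 else 0 for x, _ in data]
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, predicted labels {labels}")
```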
Deep Learning Models
Some popular deep-learning models are:
• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)
• Autoencoders
• Classic Neural Networks, etc.
What is CNN?
• Convolutional Neural Networks (ConvNets or CNNs) are a category of neural networks that have proven
very effective in areas such as image recognition and classification.
• ConvNets have been successful in identifying faces, objects, and traffic signs apart from powering vision
in robots and self-driving cars.
• Using CNNs for deep learning is popular due to three important factors:
• CNN eliminates the need for manual feature extraction—the features are learned directly by the CNN.
• CNN produces highly accurate recognition results.
• CNN can be retrained for new recognition tasks, enabling you to build on pre-existing networks.
Operations of CNN
There are four main operations in the ConvNets:
• Convolution
• Non-linearity (ReLU)
• Pooling or Sub Sampling
• Classification (Fully Connected Layer)
• Convolution puts the input images through a set of convolutional filters, each of which activates certain
features from the images.
• Rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and
maintaining positive values. This is sometimes referred to as activation because only the activated features are
carried forward into the next layer.
• Pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map but retains
the most important information. Spatial Pooling can be of different types: Max, Average, Sum, etc.
• Fully Connected Layer: The term “Fully Connected” implies that every neuron in the previous layer is connected
to every neuron on the next layer.
These operations are repeated over tens or hundreds of layers, with each layer learning to identify different
features.
Example of a network with many convolutional layers. Filters are applied to each training
image at different resolutions, and the output of each convolved image is used as the input to
the next layer.
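The convolution, ReLU and pooling operations can be sketched on a tiny grayscale image using plain Python lists. Assumptions: the 5×5 image, the 2×2 filter and the function names are hypothetical and hand-picked (a CNN learns its filter values during training), and, like most deep learning libraries, the sketch computes cross-correlation (the filter is not flipped) with stride 1 and no padding. Pooling uses a 2×2 window with stride 2.

```python
# Toy CNN operations on plain Python lists (hypothetical image and filter).

def convolve2d(image, kernel):
    """'Valid' 2-D convolution, stride 1, no padding (computed as cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + c] * kernel[a][c]
                 for a in range(kh) for c in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu_map(feature_map):
    """Non-linearity: negative values become 0, positive values are kept."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Max pooling with a size x size window and stride equal to size."""
    return [[max(feature_map[i + a][j + c] for a in range(size) for c in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# 5x5 image: a bright vertical stripe in columns 1 and 2
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# 2x2 filter that responds to a dark-to-bright change from left to right
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)   # 4x4
activated = relu_map(feature_map)         # 4x4
pooled = max_pool(activated)              # 2x2
print(feature_map[0], activated[0], pooled)
```

The filter produces a positive response at the stripe's left edge and a negative one at its right edge; ReLU keeps only the positive response, and pooling halves each spatial dimension while keeping the strongest activation in each window.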
Recurrent Neural Networks (RNNs)
• Recurrent Neural Networks (RNNs) are a type of neural network in which the output from the previous step is
fed as input to the current step.
• The main and most important feature of an RNN is its hidden state, which remembers some information about a
sequence.
• This hidden state gives the RNN a "memory" of what has been calculated so far.
• An RNN uses the same parameters for each input, because it performs the same task on every input (time step)
to produce the output.
• This parameter sharing reduces the number of parameters compared with networks that use separate weights
for each input position.
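Formula for calculating the current state:
$h_t = f(h_{t-1}, x_t)$, where $h_t$ is the current state, $h_{t-1}$ the previous state and $x_t$ the input at the current step.
Formula for applying the activation function (tanh):
$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$, where $W_{hh}$ is the weight on the previous (recurrent) state and $W_{xh}$ the weight on the current input.
Formula for calculating the output:
$y_t = W_{hy} h_t$, where $y_t$ is the output and $W_{hy}$ the weight at the output layer.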
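A single RNN layer can be sketched with the commonly used update $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$ and output $y_t = W_{hy} h_t$. Note how the same three weight matrices are reused at every time step. This is a minimal sketch: the weights, the input sequence and the helper names `matvec` and `rnn_forward` are hypothetical, and bias terms are omitted.

```python
import math

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, W_hh, W_xh, W_hy, h0):
    """Run a simple RNN over the sequence xs, reusing the same weights at every step."""
    h = h0
    outputs = []
    for x in xs:
        # current state from previous state and current input: h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_hh, h), matvec(W_xh, x))]
        # output at this step: y_t = W_hy h_t
        outputs.append(matvec(W_hy, h))
    return outputs, h

# Hypothetical 1-dimensional weights and a 3-step input sequence
W_hh, W_xh, W_hy = [[0.5]], [[1.0]], [[2.0]]
outputs, h_last = rnn_forward([[1.0], [0.0], [0.0]], W_hh, W_xh, W_hy, h0=[0.0])
print(outputs, h_last)
```

Even though the second and third inputs are 0, their outputs are non-zero: the hidden state carries information from the first input forward, which is the "memory" described above.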
Artificial Neural Network(ANN)
• The Artificial Neural Network (ANN) is a deep learning method inspired by the biological neural networks of
the human brain.
• ANNs were developed in an attempt to replicate how the human brain works.
• The human brain can think about and analyze any task in a given situation.
But how can a machine think like that?
• For this purpose, an artificial brain known as a neural network was designed. A neural network
is made up of many perceptrons.
• A perceptron is an algorithm for supervised learning of binary classifiers. It enables neurons to learn,
processing the elements of the training set one at a time.
Biological Neuron
• A human brain has billions of neurons.
• Neurons are interconnected nerve cells in the human brain that are involved in processing and
transmitting chemical and electrical signals.
• Dendrites are branches that receive information from other neurons.
• The cell body (soma), which contains the nucleus, processes the information received from the dendrites.
• The axon is a cable-like fiber that the neuron uses to send information. A synapse is the connection between
an axon and the dendrites of another neuron.
Artificial Neuron
• An artificial neuron, also known as a perceptron, is the basic unit of a neural network.
Each artificial neuron has the following main functions:
• Takes inputs from the input layer
• Weighs them separately and sums them up
• Passes this sum through a nonlinear function to produce an output
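The three functions of an artificial neuron map onto a few lines of Python. This is a sketch: the function name `artificial_neuron` and the input, weight and bias values are hypothetical, and the sigmoid is just one possible choice of nonlinear function.

```python
import math

def artificial_neuron(inputs, weights, bias):
    # 1. weigh each input separately and sum them up (plus a bias term)
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # 2. pass the sum through a nonlinear function (sigmoid, chosen for illustration)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs arriving from the input layer
print(artificial_neuron([1.0, 0.5], [0.4, -0.8], bias=0.0))   # weighted sum is 0, so output is 0.5
print(artificial_neuron([1.0, 2.0], [0.5, 0.25], bias=0.0))
```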
Biological Neuron vs. Artificial Neuron
• The biological neuron is analogous to artificial neurons in the following terms:
Biological Neuron | Artificial Neuron
Cell Nucleus (Soma) | Node
Dendrites | Input
Synapse | Weights or interconnections
Axon | Output
Perceptrons
• A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect
features or business intelligence in the input data.
• Perceptron was introduced by Frank Rosenblatt in 1957.
The perceptron (neuron) consists of 4 parts:
• Input values or one input layer
We pass input values to a neuron using this layer. It might be something as simple as a collection of
array values. It is similar to a dendrite in biological neurons.
• Weights and Bias
Weights are a collection of array values that are multiplied by the respective input values. We then
take a sum of all these multiplied values which is called a weighted sum. Next, we add a bias value to
the weighted sum to get the final value for prediction by our neuron.
• Activation Function
The activation function decides whether or not a neuron is fired. It decides which of the two output
values should be generated by the neuron.
• Output Layer
The output layer gives the final output of a neuron which can then be passed to other neurons in
the network or taken as the final output value.
Perceptron Learning Rule
• The Perceptron Learning Rule states that the algorithm automatically learns the optimal weight coefficients.
The input features are then multiplied by these weights to determine whether the neuron fires.
• The perceptron receives multiple input signals; if their weighted sum exceeds a certain threshold, it outputs
a signal, otherwise it does not.
• In supervised learning and classification, this can be used to predict the class of a sample.
Types of Perceptron Models
Single Layer Perceptron Model:
• The main objective of the single-layer perceptron model is to classify linearly separable data with binary
outcomes.
Multi-Layered Perceptron Model:
• A multi-layer perceptron model has the same basic structure as a single-layer perceptron model, but with one
or more hidden layers.
• The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages
as follows:
• Forward Stage: activations flow from the input layer, through the hidden layers, and terminate at the output
layer.
• Backward Stage: weight and bias values are adjusted according to the model's requirements. In this stage, the
error between the actual output and the desired output is propagated backward, starting at the output layer and
ending at the input layer.
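The forward and backward stages can be sketched for a tiny multi-layer perceptron with 2 inputs, 2 sigmoid hidden neurons and 1 sigmoid output, using squared error on a single example. Assumptions: the architecture, loss, weights, learning rate and function names are hypothetical. To confirm the backward stage is correct, the sketch compares one backpropagated gradient with a finite-difference estimate and then checks that one gradient-descent step lowers the error.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward stage: activations flow from the input layer to the output layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(x, t, W1, b1, W2, b2):
    """Squared error between the actual output y and the desired output t."""
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - t) ** 2

def backward(x, t, W1, b1, W2, b2):
    """Backward stage: the output error is propagated back toward the input layer."""
    h, y = forward(x, W1, b1, W2, b2)
    delta_out = (y - t) * y * (1 - y)          # error signal at the output neuron
    grad_W2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    # each hidden neuron receives the output error through its outgoing weight
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2

# Hypothetical weights and a single training example
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, -0.1]
W2, b2 = [0.7, -0.5], 0.2
x, t = [1.0, 0.5], 1.0

gW1, gb1, gW2, gb2 = backward(x, t, W1, b1, W2, b2)

# Sanity check: compare one backpropagated gradient with a finite-difference estimate
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (loss(x, t, W1_plus, b1, W2, b2) - loss(x, t, W1_minus, b1, W2, b2)) / (2 * eps)

# Adjust weights and biases: one gradient-descent step with learning rate 0.5
lr = 0.5
W1_new = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W1, gW1)]
b1_new = [b - lr * g for b, g in zip(b1, gb1)]
W2_new = [w - lr * g for w, g in zip(W2, gW2)]
b2_new = b2 - lr * gb2

before = loss(x, t, W1, b1, W2, b2)
after = loss(x, t, W1_new, b1_new, W2_new, b2_new)
print(f"backprop {gW1[0][1]:.8f} vs numeric {numeric:.8f}; loss {before:.5f} -> {after:.5f}")
```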
Perceptron Function
• A perceptron is a function that multiplies its input “x” by the learned weight coefficients and generates an
output value “f(x)”:
$$f(x) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } \mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{m} w_i x_i$$
In the equation given above:
“w” = vector of real-valued weights
“b” = bias (an element that shifts the decision boundary away from the origin without depending on any
input value)
“x” = vector of input values
“m” = number of inputs to the perceptron
The output can be represented as “1” or “0”. It can also be represented as “1” or “-1”, depending on which
activation function is used.
Inputs of a Perceptron
• A perceptron accepts inputs, scales them by certain weight values, then applies a transformation function to
output the final result. The image below shows a perceptron with a Boolean output.
• A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It has only two
values: Yes and No, or True and False. The summation function “∑” multiplies each input $x_i$ by its weight $w_i$
and then adds the products:
$$\sum_{i=1}^{m} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_m x_m$$
Activation Functions of Perceptron
• The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether
the output of the weighting function is greater than zero.
The step function is triggered (outputs 1) when the neuron output is above a certain value; otherwise it outputs
0. The sign function outputs +1 or -1 depending on whether the neuron output is greater than zero. The sigmoid is
an S-curve that outputs a value between 0 and 1.
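The three perceptron activation functions named above can be written as small Python functions. This is a sketch: the function names are hypothetical, and treating an input of exactly 0 as "not greater than zero" (step outputs 0, sign outputs -1) is one convention among several.

```python
import math

def step(z, threshold=0.0):
    """Step function: 1 if z is above the threshold, else 0."""
    return 1 if z > threshold else 0

def sign(z):
    """Sign function: +1 if z > 0, else -1."""
    return 1 if z > 0 else -1

def sigmoid(z):
    """Sigmoid: S-curve with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  step={step(z)}  sign={sign(z):+d}  sigmoid={sigmoid(z):.3f}")
```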
Output of Perceptron
Perceptron with a Boolean output:
Inputs: $x_1 \dots x_n$
Output: $o(x_1, \dots, x_n)$
Weights: $w_i$ => contribution of input $x_i$ to the perceptron output;
$w_0$ => bias or threshold
If $w_0 + w_1 x_1 + \dots + w_n x_n > 0$, the output is +1, else -1:
$$o(x_1, \dots, x_n) = \operatorname{sgn}(w_0 + w_1 x_1 + \dots + w_n x_n)$$
The neuron is triggered only when the weighted input reaches a certain threshold value.
An output of +1 specifies that the neuron is triggered; an output of -1 specifies that it did not get triggered.
“sgn” stands for the sign function, with output +1 or -1.
Error in Perceptron
• In the Perceptron Learning Rule, the predicted output is compared with the known output. If it does not
match, the error is propagated backward to allow weight adjustment to happen.
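The perceptron learning rule is commonly written as $w_i \leftarrow w_i + \eta\,(t - o)\,x_i$, where $t$ is the known (target) output, $o$ the predicted output and $\eta$ a learning rate; the bias is updated the same way with an input of 1. The sketch below trains a perceptron with +1/-1 outputs on the logical AND function. Assumptions: the dataset, $\eta = 0.1$, the epoch limit and the helper names `predict` and `train_perceptron` are hypothetical.

```python
def predict(x, w, b):
    """Perceptron output: +1 if the weighted sum plus bias is above 0, else -1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else -1

def train_perceptron(samples, lr=0.1, epochs=20):
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(x, w, b)   # 0 if correct, +2 or -2 if wrong
            if error != 0:
                # move the weights (and bias) in the direction that reduces the error
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:   # a full pass with no errors: training has converged
            break
    return w, b

# Hypothetical training set: logical AND with outputs +1 / -1
and_gate = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(and_gate)
print("weights:", w, "bias:", b)
print("predictions:", [predict(x, w, b) for x, _ in and_gate])
```

Because AND is linearly separable, the loop stops once an epoch passes with no errors; on data that is not linearly separable (such as XOR), a single perceptron never reaches zero errors.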
Perceptron: Decision Function
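A decision function φ(z) of the perceptron is defined to take a linear combination of the x and w vectors.
The value z in the decision function is given by:
$$z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m$$
The decision function is +1 if z is greater than a threshold θ, and -1 otherwise:
$$\phi(z) = \begin{cases} 1 & \text{if } z > \theta \\ -1 & \text{otherwise} \end{cases}$$
This is the perceptron algorithm.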
Bias Unit
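• For simplicity, the threshold θ can be brought to the left side of the inequality and represented as
$w_0 x_0$, where $w_0 = -\theta$ and $x_0 = 1$.
The value $w_0$ is called the bias unit.
The decision function then becomes:
$$z = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \mathbf{w}^T \mathbf{x}, \qquad \phi(z) = \begin{cases} 1 & \text{if } z > 0 \\ -1 & \text{otherwise} \end{cases}$$
Output: The figure shows how the decision function squashes $\mathbf{w}^T \mathbf{x}$ to either +1 or -1 and how it can be
used to discriminate between two linearly separable classes.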
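The bias-unit trick can be checked numerically: comparing z = w·x against a threshold θ gives the same decision as prepending $w_0 = -\theta$ to the weights and $x_0 = 1$ to the inputs and comparing against 0. The weights, threshold and function names below are hypothetical; integer values are used so that both computations are exact and agree even on the boundary.

```python
def decide_with_threshold(x, w, theta):
    """phi(z) = +1 if z = w.x is greater than theta, else -1."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > theta else -1

def decide_with_bias_unit(x, w, theta):
    """Same rule with the threshold folded in as a bias unit: w0 = -theta, x0 = 1."""
    w_aug = [-theta] + list(w)
    x_aug = [1] + list(x)
    z = sum(wi * xi for wi, xi in zip(w_aug, x_aug))
    return 1 if z > 0 else -1

# Hypothetical integer weights and threshold, checked on a grid of integer inputs
w, theta = [2, 3], 4
grid = [[x1, x2] for x1 in range(-3, 4) for x2 in range(-3, 4)]
assert all(decide_with_threshold(x, w, theta) == decide_with_bias_unit(x, w, theta) for x in grid)
print(decide_with_bias_unit([1, 1], w, theta), decide_with_bias_unit([0, 1], w, theta))
```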
Activation Functions
• The activation function decides whether a neuron should be activated, based on the weighted sum of its
inputs plus a bias.
• A neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an
activation function to get an output.
Y = ∑ (weights * inputs) + bias
• Y can take any value from -infinity to +infinity. So, we have to bound the output to get the desired
prediction or generalized results.
Y = Activation function(∑ (weights * inputs) + bias)
• So, we pass the neuron's value Y through an activation function to bound its output values.
Types of Activation Functions –
• The activation function f can take several different forms:
Step Function:
• The step function is one of the simplest activation functions. We choose a threshold value, and if the
net input x is greater than the threshold, the neuron is activated.
• Mathematically, with a threshold of 0:
$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
A neuron or perceptron activates only when the value of x exceeds the threshold value 0. Otherwise, it
outputs 0.
Sigmoid Function:
• The main reason we use the sigmoid function is that its output lies between 0 and 1. It is therefore
especially useful for models that must predict a probability as output, since a probability always lies in
the range 0 to 1.
• The sigmoid activation function maps an input in the range (-∞, ∞) to the range (0, 1).
The sigmoid function curve is S-shaped.
The equation for the sigmoid function is
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
ReLU (Rectified Linear Unit):
• ReLU is currently one of the most widely used activation functions.
• The main advantage of the ReLU function over other activation functions is that it does not activate all
the neurons at the same time: if the input is negative, ReLU converts it to zero and the neuron is not
activated.
The equation for the ReLU function is
$$f(x) = \max(0, x)$$
Leaky ReLU:
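• Leaky ReLU is a variant of ReLU.
• Instead of defining the function as 0 for x less than 0, we define it as a small linear
component of x.
The Leaky ReLU function is defined as:
$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$$
where α is a small constant, such as 0.01.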
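ReLU and Leaky ReLU are one-line functions in Python. In this sketch the function names are hypothetical, and the negative-side slope `alpha = 0.01` is an assumed (commonly used) default rather than a value given above.

```python
def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through unchanged."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: a small slope alpha (assumed 0.01) for negative inputs instead of 0."""
    return x if x > 0 else alpha * x

inputs = [-3.0, -0.5, 0.0, 2.0]
print([relu(v) for v in inputs])
print([leaky_relu(v) for v in inputs])
```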
Tanh (Hyperbolic Tangent) Function:
• Tanh is similar to the sigmoid activation function and often works better in hidden layers.
• Unlike the sigmoid function, which maps input values between 0 and 1, tanh maps values
between -1 and 1.
• The advantage is that negative inputs are mapped strongly negative and zero inputs are
mapped to zero in the tanh graph.
• Tanh is also sigmoidal (S-shaped).
The equation for the tanh function is
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
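The tanh formula can be checked against Python's `math.tanh` and against the identity tanh(x) = 2σ(2x) − 1, which shows that tanh is a rescaled, zero-centered sigmoid. The helper names are hypothetical; the direct formula overflows for very large |x|, so a library function such as `math.tanh` is the safer choice in practice.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_formula(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x); overflows for very large |x|."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # the formula, the library function and the rescaled sigmoid all agree (up to rounding)
    print(f"{x:+.1f}  {tanh_from_formula(x):+.4f}  {math.tanh(x):+.4f}  {2 * sigmoid(2 * x) - 1:+.4f}")
```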
Softmax Function:
• The softmax function generalizes the (logistic) sigmoid function and is handy for classification problems.
• It is usually used when there are multiple classes. Softmax exponentiates the output for each class and
divides by the sum of these exponentials, so each class receives a value between 0 and 1 and the values sum to 1.
• The softmax function is typically used in the output layer of a classifier, where we want the
probabilities that define the class of each input.
The equation for the softmax function is
$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K$$
where K is the number of classes.
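Softmax can be sketched in a few lines of Python. The function name and the example scores are hypothetical. One common implementation detail, assumed here rather than taken from the text above, is subtracting the largest score before exponentiating: it cancels out in the ratio, so the probabilities are unchanged, but it keeps `math.exp` from overflowing on large inputs.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that are each in (0, 1) and sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]     # exponentiate each (shifted) score
    total = sum(exps)
    return [e / total for e in exps]             # divide by the sum of the exponentials

probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for three classes
print([round(p, 3) for p in probs])
print(softmax([1000.0, 1000.0]))   # large scores do not overflow: [0.5, 0.5]
```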
Activation Functions at a Glance
Thank You!