Understanding Artificial Neural Networks
(SITA3007)
UNIT – II
Unit II: Basic concepts - Single layer Perceptron - Multilayer Perceptron - Supervised and Unsupervised learning - Deep learning algorithms - Back propagation Networks - Performance Issues.
Introduction
The term "Artificial Neural Network" is derived from the biological neural networks that make up the structure of the human brain. Just as the human brain has neurons that are interconnected with one another, artificial neural networks have neurons that are interconnected across the various layers of the network. These neurons are known as nodes.
Fig 1
The given figure illustrates the typical diagram of Biological Neural Network.
Fig 2
The dendrites of a biological neural network correspond to the inputs of an artificial neural network, the cell nucleus corresponds to a node, the synapses correspond to the weights, and the axon corresponds to the output.
Relationship between Biological neural network and artificial neural network:
An Artificial Neural Network is a technique in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up a human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is built by programming computers to behave like simple interconnected brain cells.
There are roughly 86 billion neurons in the human brain, a figure often rounded up to 100 billion. Each neuron has somewhere between 1,000 and 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can retrieve more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.
We can understand the artificial neural network with an example. Consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate, which takes two inputs. If one or both of the inputs are "On," then the output is "On." If both inputs are "Off," then the output is "Off." Here the output depends only on the input, in a fixed way. Our brain does not work like this: the relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."
Fig 3
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
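Hidden Layer:
The hidden layer sits between the input and output layers. It performs the intermediate calculations that extract hidden features and patterns from the input.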
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results
in output that is conveyed using this layer.
Parallel processing capability:
Artificial neural networks have the numerical strength to perform more than one task simultaneously.
Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, the data is stored across the whole network. The disappearance of a few pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge:
After training, an ANN may produce output even from incomplete data. The loss of performance depends on the significance of the missing data.
Having a memory distribution:
For an ANN to be able to adapt, it is important to choose the examples and to train the network towards the desired output by showing it these examples. The success of the network depends directly on the chosen instances; if an event cannot be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is found through experience and trial and error.
Unrecognized behavior of the network:
This is the most significant issue of ANNs. When an ANN produces a solution, it gives no insight into why or how it reached that solution, which decreases trust in the network.
Hardware dependence:
By their structure, artificial neural networks need processors with parallel processing power, so their realization depends on suitable hardware.
Difficulty of presenting the problem to the network:
ANNs work with numerical data, so problems must be converted into numerical values before being presented to an ANN. The representation chosen directly affects the performance of the network, and it relies on the user's abilities.
Unknown duration of training:
Training stops when the error is reduced to a specific value, and reaching this value does not guarantee optimum results.
An Artificial Neural Network is best represented as a weighted directed graph in which the artificial neurons form the nodes. The connections between neuron outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from an external source, such as a pattern or an image, in the form of a vector. These inputs are denoted mathematically by x(n), where n indexes the inputs.
Fig 4
Afterward, each input is multiplied by its corresponding weight (these weights are the details used by the artificial neural network to solve a specific problem). In general terms, the weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
A bias is added to the weighted sum so that the output can be non-zero even when the weighted sum is zero, and to shift the system's response. The bias can be treated as an extra input that is always 1 and has its own weight. The total of the weighted inputs is unbounded: with real-valued inputs and weights it can lie anywhere between negative and positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function.
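The computation described above (weighted sum, plus bias, passed through an activation function) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name neuron_output, the example weights, and the choice of a sigmoid activation are assumptions made for the example, not part of the unit text.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs, plus bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum + bias)

# Hypothetical example values: both inputs are zero, so the weighted sum is zero.
x = [0.0, 0.0]
w = [0.4, 0.6]
zero_bias_out = neuron_output(x, w, 0.0, sigmoid)  # sigmoid(0) = 0.5
shifted_out = neuron_output(x, w, 1.0, sigmoid)    # the bias shifts the response upward
```

With zero inputs the weights have no effect, so only the bias can move the output away from sigmoid(0); this is the role of the bias described above.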
The activation function is the transfer function used to map this total to the desired output. There are different kinds of activation function, but they are primarily either linear or non-linear. Some commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal functions.
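As a rough illustration of the three activation functions named above, the sketch below defines each one in plain Python. The function names, the threshold of 0 for the binary function, and the slope parameter of the linear function are assumptions chosen for the example.

```python
import math

def binary_step(z, threshold=0.0):
    """Binary activation: output 1 once the input reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def linear(z, slope=1.0):
    """Linear activation: the output is proportional to the input."""
    return slope * z

def tanh_sigmoid(z):
    """Tan hyperbolic activation: an S-shaped curve bounded by -1 and 1."""
    return math.tanh(z)

# Evaluate each function on the same few sample inputs.
samples = [-2.0, 0.0, 2.0]
binary_outputs = [binary_step(z) for z in samples]
linear_outputs = [linear(z) for z in samples]
tanh_outputs = [tanh_sigmoid(z) for z in samples]
```

The binary and tanh functions are non-linear and bounded, while the linear function is unbounded, which matches the linear versus non-linear split described above.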
There are various types of Artificial Neural Networks (ANN). Each is modelled on the way neurons and networks in the human brain function, and performs its tasks in a similar way. Most artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their intended tasks, for example segmentation or classification.
Feedback ANN:
In this type of ANN, the output is fed back into the network to arrive at the best-evolved results internally. According to the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections make use of feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network consisting of an input layer, an output layer, and at least one layer of neurons. By assessing its output against its input, the strength of the network can be observed from the group behavior of the connected neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
Perceptron model
The Perceptron model is regarded as one of the best and simplest types of Artificial Neural Network. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.
Fig 5
Fig 6
o Activation Function:
This is the final and an important component, which determines whether the neuron will fire or not. The activation function can be considered primarily as a step function. Common types of activation function are:
Sign function
Step function, and
Sigmoid function
Fig 7
The data scientist chooses the activation function based on the problem statement and the desired outputs. The activation function may differ (e.g., Sign, Step, and Sigmoid) between perceptron models, depending on whether the learning process is slow or suffers from vanishing or exploding gradients.
Fig 8
Perceptron models are divided into two types.
1. Single-layer Perceptron model
2. Multi-layer Perceptron model
"A single-layer perceptron can learn only linearly separable patterns."
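To make the idea of learning a linearly separable pattern concrete, here is a minimal sketch of the classic perceptron learning rule applied to the OR gate. The function names, the zero initial weights, the learning rate of 1, and the strict threshold at zero are all assumptions made for this illustration.

```python
def step(z):
    """Step activation: fire (1) only for a strictly positive net sum."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    """Net sum of inputs and weights plus bias, passed through the step function."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, targets, learning_rate=1, epochs=20):
    """Perceptron rule: nudge weights by learning_rate * error * input."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            error = t - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly, so stop
            break
    return weights, bias

# The OR gate is linearly separable, so a single-layer perceptron can learn it.
or_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_targets = [0, 1, 1, 1]
weights, bias = train_perceptron(or_inputs, or_targets)
predictions = [predict(weights, bias, x) for x in or_inputs]
```

The same loop does not converge for XOR, because no single straight line separates its two classes.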
Fig 9
Multi-Layered Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model has the same basic structure but has a greater number of hidden layers.
The multi-layer perceptron model is trained with the Backpropagation algorithm, which executes in two stages as follows:
o Forward Stage: Activations are computed starting from the input layer in the forward stage and terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer.
Hence, a multi-layer perceptron model can be considered as multiple artificial neural network layers stacked together, in which the activation function does not remain linear, unlike in a single-layer perceptron model. Instead of being linear, the activation function can be sigmoid, TanH, ReLU, etc.
A multi-layer perceptron model has greater processing power and can process linear and non-
linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT,
XNOR, NOR.
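As an illustration of why the extra layer matters, the sketch below builds XOR, which a single-layer perceptron cannot represent, from two hidden step neurons and one output neuron. The weights are set by hand rather than learned, and the function names and threshold values are assumptions chosen for this example.

```python
def step(z):
    """Step activation: 1 for a strictly positive net sum, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    """Two-layer perceptron with hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)       # hidden neuron 1 behaves like OR
    h_and = step(x1 + x2 - 1.5)      # hidden neuron 2 behaves like AND
    return step(h_or - h_and - 0.5)  # output: OR is true but AND is not

xor_outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

In practice the weights would be found by training rather than chosen by hand; the point here is only that one hidden layer is enough to represent a pattern that is not linearly separable.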
Supervised Learning
Supervised learning is a type of machine learning that learns from labeled data. Labeled data is data that has been tagged with a correct answer or classification.
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. We teach or train the machine using well-labelled data, which means some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data), and the supervised learning algorithm uses what it learned from the training data (the set of training examples) to produce a correct outcome for them.
For example, a labeled dataset of images of elephants, camels and cows would have each image tagged with either "Elephant", "Camel" or "Cow".
Fig 10
Unsupervised Learning
Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine is therefore left to find the hidden structure in unlabeled data by itself.
For example, you can use unsupervised learning to examine animal data that has been gathered and distinguish between several groups according to the traits and actions of the animals. These groupings might correspond to various animal species, allowing you to categorize the creatures without depending on labels that already exist.
Fig 11
Unsupervised learning allows the model to discover patterns and relationships in
unlabeled data.
Clustering algorithms group similar data points together based on their inherent
characteristics.
Feature extraction captures essential information from the data, enabling the model to
make meaningful distinctions.
Label association assigns categories to the clusters based on the extracted patterns and
characteristics.
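The clustering step listed above can be illustrated with k-means, one common clustering algorithm. The sketch below is a minimal one-dimensional version under stated assumptions: the data values, the starting centroids, and the function name kmeans_1d are all hypothetical and chosen only for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Hypothetical unlabeled measurements that fall into two natural groups.
measurements = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, centroids = kmeans_1d(measurements, centroids=[0.0, 6.0])
```

No labels are supplied: the grouping comes only from how close the values are to each other, which is the sense in which the model discovers structure by itself.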
Model testing: with supervised learning we can test our model; with unsupervised learning we cannot test our model.
Deep learning algorithms run data through several layers of neural networks, which are sets of decision-making networks trained to serve a task. Each layer passes a simplified representation of the data on to the next layer. Most traditional machine learning algorithms work fairly well on datasets with up to a few hundred features or columns. Whether the data set is structured or unstructured, traditional machine learning tends to fail on inputs such as a simple 800x1000 RGB image, because the number of features becomes too large. It becomes quite unfeasible for a traditional machine learning algorithm to handle such depths. This is where deep learning comes in.
Convolutional Neural Networks (CNNs), popularly known as ConvNets, consist of several layers and are used mainly for image processing and object detection. The architecture was developed in 1998 by Yann LeCun and was first called LeNet. Back then, it was developed to recognize digits and zip code characters. CNNs are widely used for identifying satellite images, medical image processing, time series forecasting, and anomaly detection.
CNNs process the data by passing it through multiple layers and extracting features with convolution operations. The convolutional layer applies filters to the input to produce feature maps, and a Rectified Linear Unit (ReLU) then rectifies each feature map. The pooling layer takes these rectified feature maps as its input. Pooling is a down-sampling operation that reduces the dimensions of the feature map. The resulting 2-D arrays are then flattened into a single, long, continuous, linear vector. The next layer, called the Fully Connected Layer, takes the flattened vector obtained from the pooling layer as input and identifies the image by classifying it.
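The pipeline described above (convolution, ReLU, pooling, flattening) can be traced on a tiny example. The sketch below uses plain Python lists; the 5x5 image, the 2x2 edge-detecting kernel, and the function names are assumptions made for illustration. The convolution is implemented as a sliding dot product without flipping the kernel, which is how CNN layers are commonly implemented.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Rectify the feature map: negative values become zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Down-sample by keeping the maximum of each size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    """Turn the 2-D array into one long vector for the fully connected layer."""
    return [v for row in fmap for v in row]

# Hypothetical 5x5 image containing a bright vertical band.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

feature_map = conv2d_valid(image, kernel)  # 4x4, with negative values at the right edge
rectified = relu(feature_map)              # negatives removed
pooled = max_pool(rectified)               # 2x2 after down-sampling
flat = flatten(pooled)                     # vector of length 4
```

The flattened vector is what a fully connected layer would take as input for classification; that final layer is omitted here to keep the sketch short.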
Fig 12
Long Short-Term Memory networks (LSTMs) are a type of Recurrent Neural Network (RNN) designed to learn and adapt to long-term dependencies. Memorizing and recalling past data over long periods is their default behavior. LSTMs are designed to retain information over time, and hence they are widely used in time series prediction, because they can retain memory of previous inputs. This ability comes from their chain-like structure, in which four interacting layers communicate with each other in different ways. Besides time series prediction, they can be used for speech recognition, pharmaceutical development, and composing music loops.
LSTMs work in a sequence of steps. First, they forget irrelevant details from the previous state. Next, they selectively update certain cell-state values, and finally they output certain parts of the cell state. Below is the diagram of their operation.
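The three steps above (forget, selectively update, output) correspond to the gates of the standard LSTM cell. The following is a minimal scalar sketch of one time step, assuming the usual gate equations; the parameter names in the dictionary (wf, uf, bf, and so on) are hypothetical labels chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps hypothetical parameter names to values."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input (update) gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate values
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g  # step 1 and 2: forget old state, add selected update
    h = o * math.tanh(c)    # step 3: expose part of the cell state as output
    return h, c

names = ["wf", "uf", "bf", "wi", "ui", "bi", "wg", "ug", "bg", "wo", "uo", "bo"]
neutral = {name: 0.0 for name in names}  # every gate sits at 0.5
h_half, c_half = lstm_step(1.0, 0.0, 2.0, neutral)

# A strongly negative forget-gate bias makes the cell drop its previous state.
forgetful = dict(neutral, bf=-50.0)
h_forget, c_forget = lstm_step(1.0, 0.0, 2.0, forgetful)
```

With all parameters at zero the forget gate is 0.5, so half of the previous cell state survives; pushing the forget-gate bias far negative wipes the cell state, which is the "forget irrelevant details" step in isolation.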
Fig 13
Recurrent Neural Networks (RNNs) contain directed connections that form a cycle, which allows the output of a previous step to be fed in as input to the current step. Earlier inputs are thereby absorbed into the network's internal memory for a period of time; LSTMs, described above, are the RNN variant built to strengthen this memorization ability. RNNs are mostly used for image captioning, time series analysis, handwriting recognition, and machine translation.
RNNs work as follows: if the current time is t, the output from time t-1 is fed in alongside the input at time t. Next, the output at time t is fed in at time t+1. This process is repeated for input of any length. RNNs also store historical information, and the model size does not increase with the input size. RNNs look something like this when unfolded.
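The unfolding over time described above can be written as a short loop. This is a minimal sketch of a scalar recurrent unit, assuming a tanh activation; the function name rnn_forward and the weight values are hypothetical and used only for illustration.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Unrolled scalar RNN: the state at t depends on x_t and the state at t-1."""
    h = h0
    states = []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h + b)  # previous output fed back in
        states.append(h)
    return states

# The same three parameters are reused at every step, whatever the sequence length.
sequence = [1.0, 0.0, 0.0]
states = rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0)

# Removing the recurrent weight removes the memory of the first input.
no_memory = rnn_forward(sequence, w_x=0.5, w_h=0.0, b=0.0)
```

With a non-zero recurrent weight, the first input still influences the state two steps later even though the later inputs are zero; with the recurrent weight set to zero, that influence disappears.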
Fig 14
Generative Adversarial Networks (GANs) are deep learning algorithms used to generate new instances of data that resemble the training data. A GAN usually consists of two components: a generator that learns to generate fake data, and a discriminator that learns to tell this fake data apart from real data. Over time, GANs have gained wide usage: they are used to sharpen astronomical images and to simulate gravitational lensing for dark-matter research. They are also used in video games to upscale 2D textures by recreating them at higher resolutions such as 4K, and for creating realistic cartoon characters, rendering human faces, and rendering 3D objects.
GANs work with both fake data and real data. During training, the generator produces different kinds of fake data, and the discriminator quickly learns to recognize it as fake. The GAN then sends these results back to the generator and the discriminator to update them. Consider the below image to visualize the functioning.
Fig 15
Radial Basis Function Networks (RBFNs) are a specific type of feed-forward neural network that uses radial basis functions as activation functions. They consist of three layers, namely the input layer, the hidden layer, and the output layer, and are mostly used for time-series prediction, regression, and classification.
RBFNs perform these tasks by measuring how similar an input is to examples from the training data set. An input vector is fed to the input layer, which has one neuron per component of the vector and passes the data on to the hidden layer. The hidden layer contains neurons with Gaussian transfer functions, each with its own center; a neuron's output decreases as the distance between the input and the neuron's center increases. The output layer forms a linear combination of these radial basis outputs to generate the final output. Consider the given image below to understand the process thoroughly.
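The hidden and output layers described above can be sketched directly. In this minimal example the Gaussian form, the width parameter, the two centers, and the output weights are all assumptions chosen for illustration; a real RBFN would fit the centers and weights from training data.

```python
import math

def gaussian_rbf(x, center, width=1.0):
    """Gaussian transfer function: 1 at the center, decaying with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbfn_output(x, centers, output_weights, width=1.0):
    """Output layer: a linear combination of the hidden Gaussian activations."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical network with two hidden neurons centered at (0, 0) and (1, 1).
centers = [(0.0, 0.0), (1.0, 1.0)]
output_weights = [1.0, -1.0]

at_center = gaussian_rbf((1.0, 1.0), centers[1])  # input sits on the center
nearby = gaussian_rbf((0.5, 0.5), centers[1])
far_away = gaussian_rbf((0.0, 0.0), centers[1])
score_near_first = rbfn_output((0.1, 0.0), centers, output_weights)
score_near_second = rbfn_output((0.9, 1.0), centers, output_weights)
```

With these weights, the sign of the output indicates which center the input is closer to, which is a simple form of the similarity-based classification described above.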
Fig 16
Multilayer Perceptrons (MLPs) are the base of deep learning technology. They belong to the class of feed-forward neural networks and have several layers of perceptrons, each with an activation function. MLPs have connected input and output layers, one of each, with at least one hidden layer between the two. MLPs are mostly used to build image recognition, speech recognition, and machine translation software.
The working of MLPs starts by feeding the data to the input layer. The neurons in the layers form a graph whose connections pass signals in one direction only. Weights exist on the connections between the input layer and the hidden layer. MLPs use activation functions to determine which nodes are ready to fire. These activation functions include the tanh function, sigmoid, and ReLU. MLPs are trained to learn the correlation between inputs and outputs, so that the desired output is achieved from the given data set. See the below image to understand better.
Fig 17
Self-Organizing Maps (SOMs) were invented by Teuvo Kohonen for data visualization: they use self-organizing artificial neural networks to reduce the dimensions of data. Data visualization of this kind addresses problems that humans cannot easily visualize, because the data is generally high-dimensional; automating it reduces human involvement and hence error.
SOMs start by initializing the weights of the different nodes and then choosing a random vector from the given training data. They examine each node to find which node's weights are most like the input vector. The winning node is called the Best Matching Unit (BMU). SOMs then find the neighborhood of the BMU, and the number of neighbors reduces over time. The closer a node is to the BMU, the more its weights change towards the sample vector. Multiple iterations are carried out so that no node close to the BMU is missed. One example is the RGB color combinations that we use in our daily tasks. Consider the below image to understand how they function.
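One training step of the procedure above (find the BMU, then pull it and its neighbors towards the sample) can be sketched as follows. The sketch assumes a one-dimensional row of three nodes holding RGB weights, a fixed neighborhood radius, and a fixed learning rate; these values and the function names are hypothetical simplifications for illustration.

```python
import math

def best_matching_unit(nodes, sample):
    """Index of the node whose weights are closest to the sample vector."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))

def som_step(nodes, sample, learning_rate=0.5, radius=1):
    """Move the BMU and its grid neighbors towards the sample."""
    bmu = best_matching_unit(nodes, sample)
    updated = []
    for i, weights in enumerate(nodes):
        if abs(i - bmu) <= radius:  # node lies inside the BMU's neighborhood
            weights = [w + learning_rate * (s - w) for w, s in zip(weights, sample)]
        updated.append(list(weights))
    return bmu, updated

# Hypothetical row of three nodes initialised to pure red, green, and blue.
nodes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sample = [0.9, 0.1, 0.0]  # a reddish color drawn from the training data

bmu, updated = som_step(nodes, sample)
neighbor_moved_closer = math.dist(updated[1], sample) < math.dist(nodes[1], sample)
```

A full SOM would repeat this step for many samples while shrinking the radius and the learning rate over time, as described above.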
Fig 18
Deep Belief Networks (DBNs) are generative models that have multiple layers of latent, stochastic variables. The latent variables have binary values and are called hidden units. DBNs are built from Restricted Boltzmann Machine (RBM) layers stacked over each other, with each layer communicating with the previous and the next layer. DBNs are used in applications like video and image recognition as well as capturing motion data.
DBNs are trained with greedy algorithms. They learn layer by layer, using a top-down approach to generate weights. DBNs run steps of Gibbs sampling on the top two hidden layers. A sample is then drawn from the visible units using a model that follows the ancestral sampling method. DBNs learn the values of the latent variables in every layer with a bottom-up pass.
Fig 19
Restricted Boltzmann Machines (RBMs) accept inputs and translate them into a set of numbers, so that the inputs are encoded in the forward pass. RBMs take into account the weight of every input. The backward pass takes this set of numbers, together with the individual weights, and translates it back into reconstructed inputs. These values are then passed to the visible layer, where activation is carried out and the reconstructed output is generated. To understand this process, consider the below image.
Fig 20
Autoencoders
Autoencoders are a special type of neural network in which the inputs and outputs are usually identical. They were designed primarily to solve problems related to unsupervised learning. Autoencoders are neural networks trained to replicate the data, which is why the input and output are generally the same. They are used for tasks like pharmaceutical discovery, image processing, and population prediction.
Autoencoders consist of three components, namely the encoder, the code, and the decoder. Autoencoders are structured so that they receive inputs and transform them into a different representation, and then attempt to reconstruct the original input from it as accurately as possible. They do this by encoding the image or input into a reduced size. If the image is not clearly visible, it is passed to the neural network for clarification. The clarified image is termed the reconstructed image, and it resembles the original image as closely as possible. To understand this process, see the below-provided image.
Fig 21
Backpropagation
Backpropagation is one of the important concepts of a neural network. Our task is to classify our data as well as possible. For this, we have to update the weights and biases, but how can we do that in a deep neural network? In the linear regression model, we use gradient descent to optimize the parameters. Similarly, here we also use the gradient descent algorithm, with the gradients supplied by Backpropagation.
For a single training example, the Backpropagation algorithm calculates the gradient of the error function, where the error is written as a function of the neural network's weights. Backpropagation algorithms are a set of methods used to efficiently train artificial neural networks following a gradient descent approach that exploits the chain rule.
The main feature of Backpropagation is the iterative, recursive and efficient method through which it calculates the updated weights, improving the network until it is able to perform the task for which it is being trained. Backpropagation requires the derivatives of the activation functions to be known at network design time.
Fig 22
Input values
x1=0.05
x2=0.10
Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55
Bias values
b1=0.35 b2=0.60
Target values
T1=0.01
T2=0.99
Forward Pass
To find the value of H1, we multiply the input values by their weights and add the bias:
H1=x1*w1+x2*w2+b1
H1=0.05*0.15+0.10*0.20+0.35
H1=0.3775
H2=x1*w3+x2*w4+b1
H2=0.05*0.25+0.10*0.30+0.35
H2=0.3925
Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2. Each hidden sum is first passed through the sigmoid function 1/(1+e^-H), which gives 0.593269992 for H1 and 0.596884378 for H2; these activated values are the ones used below.
To find the value of y1, we multiply the activated outputs of H1 and H2 by their weights and add the bias:
y1=H1*w5+H2*w6+b2
y1=0.593269992*0.40+0.596884378*0.45+0.60
y1=1.10590597
y2=H1*w7+H2*w8+b2
y2=0.593269992*0.50+0.596884378*0.55+0.60
y2=1.2249214
Passing y1 and y2 through the same sigmoid function gives final outputs of 0.75136507 and 0.772928465. Our target values are 0.01 and 0.99, so the outputs do not match the target values T1 and T2.
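The forward pass can be checked numerically with a short script. This is a sketch under two assumptions that the numbers in this example are consistent with: the activation is the logistic sigmoid, and the total error is the sum of half squared differences between targets and outputs. The variable names (net_h1, out_h1, and so on) are chosen for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values taken from the worked example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# Hidden layer: weighted sum plus bias, then sigmoid.
net_h1 = x1 * w1 + x2 * w2 + b1
net_h2 = x1 * w3 + x2 * w4 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer: same pattern, using the activated hidden values.
net_y1 = out_h1 * w5 + out_h2 * w6 + b2
net_y2 = out_h1 * w7 + out_h2 * w8 + b2
out_y1, out_y2 = sigmoid(net_y1), sigmoid(net_y2)

# Total error: half the squared difference, summed over both outputs.
e_total = 0.5 * (t1 - out_y1) ** 2 + 0.5 * (t2 - out_y2) ** 2

print(f"H1={net_h1:.4f} H2={net_h2:.4f}")
print(f"y1={net_y1:.8f} y2={net_y2:.7f}")
print(f"E_total={e_total:.9f}")
```

The printed sums agree with the hand calculation, and the total error agrees with the figure of 0.298371109 quoted later in this section.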
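Now, we will find the total error, which is half the squared difference between each target and the corresponding final output, summed over both outputs: Etotal = 1/2*(T1 - out1)^2 + 1/2*(T2 - out2)^2, where out1 and out2 are the sigmoid-activated values of y1 and y2. For the values above, Etotal = 0.298371109.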
From equation two, it is clear that we cannot partially differentiate it with respect to w5 because w5 does not appear in it. We split equation one into multiple terms using the chain rule so that we can easily differentiate it with respect to w5:
dEtotal/dw5 = (dEtotal/dout1) * (dout1/dy1) * (dy1/dw5), where out1 = 1/(1+e^-y1) is the final output of the first output neuron.
Now, we calculate each term one by one to differentiate Etotal with respect to w5.
Putting the value of e^-y in equation (5)
So, we put these values in equation (3) to find the final result.
Now, we calculate the updated weight w5new by gradient descent with learning rate η: w5new = w5 - η*(dEtotal/dw5). The values below correspond to η = 0.5.
w5new=0.35891648
w6new=0.408666186
w7new=0.511301270
w8new=0.561370121
Backward pass at Hidden layer
From equation (2), it is clear that we cannot partially differentiate it with respect to w1 because w1 does not appear in it. We split equation (1) into multiple terms so that we can easily differentiate it with respect to w1.
Now, we calculate each term one by one to differentiate Etotal with respect to w1.
We calculate the partial derivative of the total net input to H1 with respect to w1 in the same way as we did for the output neuron.
So, we put these values in equation (13) to find the final result.
Now, we calculate the updated weight w1new with the same update formula that was used for w5new.
In the same way, we calculate w2new, w3new, and w4new, and this gives us the following values
w1new=0.149780716
w2new=0.19956143
w3new=0.24975114
w4new=0.29950229
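The whole backward pass can be reproduced with a short script. This is a sketch, not the notes' own derivation: it assumes sigmoid activations, the half-squared-error loss, a learning rate of 0.5 (inferred from the listed weights rather than stated in the text), and that the biases are left unchanged. The function name backprop_step and the delta variable names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, w_hidden, w_out, b1, b2, targets, lr):
    """One gradient-descent update for the 2-2-2 network (biases unchanged)."""
    x1, x2 = x
    w1, w2, w3, w4 = w_hidden
    w5, w6, w7, w8 = w_out

    # Forward pass.
    out_h1 = sigmoid(x1 * w1 + x2 * w2 + b1)
    out_h2 = sigmoid(x1 * w3 + x2 * w4 + b1)
    out_y1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)
    out_y2 = sigmoid(out_h1 * w7 + out_h2 * w8 + b2)

    # Output-layer deltas: dE/dout * dout/dnet (chain rule).
    d_y1 = (out_y1 - targets[0]) * out_y1 * (1.0 - out_y1)
    d_y2 = (out_y2 - targets[1]) * out_y2 * (1.0 - out_y2)

    # Hidden-layer deltas use the ORIGINAL output weights, not the updated ones.
    d_h1 = (d_y1 * w5 + d_y2 * w7) * out_h1 * (1.0 - out_h1)
    d_h2 = (d_y1 * w6 + d_y2 * w8) * out_h2 * (1.0 - out_h2)

    # Gradient-descent updates: w_new = w - lr * dE/dw.
    new_out = (w5 - lr * d_y1 * out_h1, w6 - lr * d_y1 * out_h2,
               w7 - lr * d_y2 * out_h1, w8 - lr * d_y2 * out_h2)
    new_hidden = (w1 - lr * d_h1 * x1, w2 - lr * d_h1 * x2,
                  w3 - lr * d_h2 * x1, w4 - lr * d_h2 * x2)
    return new_hidden, new_out

new_hidden, new_out = backprop_step(
    x=(0.05, 0.10),
    w_hidden=(0.15, 0.20, 0.25, 0.30),
    w_out=(0.40, 0.45, 0.50, 0.55),
    b1=0.35, b2=0.60,
    targets=(0.01, 0.99),
    lr=0.5,  # assumed: chosen because it reproduces the listed weights
)
print("w1..w4:", [round(w, 9) for w in new_hidden])
print("w5..w8:", [round(w, 9) for w in new_out])
```

Under these assumptions the script reproduces the updated weights listed for both the output layer and the hidden layer.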
We have updated all the weights. We found an error of 0.298371109 on the network when we fed forward the inputs 0.05 and 0.1. After the first round of Backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, the output neurons generate 0.015912196 and 0.984065734, i.e., close to our target values of 0.01 and 0.99, when we feed forward 0.05 and 0.1.
Performance Issues
Common performance issues in neural networks include overfitting (where the model learns the training data too well and performs poorly on new data), underfitting (when the model is too simple to capture complex patterns in the data), vanishing/exploding gradients (problems with gradient updates during training due to very large or very small values), slow convergence (taking a long time to reach optimal performance), data imbalance (uneven distribution of classes in the data), and high computational cost (requiring significant resources to train large models).
Overfitting:
● Symptoms: High accuracy on training data, low accuracy on test data.
● Causes: Complex model, insufficient training data, lack of regularization techniques.
● Mitigation: Data augmentation, regularization techniques (L1/L2), dropout layers
Underfitting:
● Symptoms: Low accuracy on both training and test data.
● Causes: Too simple model, not enough training data.
● Mitigation: Increase model complexity, add more layers or neurons
Vanishing/Exploding Gradients:
● Symptoms: Gradients becoming very small or very large during backpropagation,
hindering learning.
● Causes: Deep network architecture, inappropriate activation functions
● Mitigation: Use activation functions like ReLU, gradient clipping techniques
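The effect described in these bullets can be illustrated numerically. The sketch below makes a simplifying assumption: it looks only at the product of activation derivatives across layers and ignores the weights. It also shows one common form of gradient clipping (rescaling by the gradient norm). The function names are hypothetical.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its largest possible value is 0.25 (at z = 0)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, z, depth):
    """Product of `depth` activation derivatives (weights ignored for simplicity)."""
    return grad_fn(z) ** depth

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector if its Euclidean norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]

sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, depth=10)  # 0.25 ** 10
relu_chain = chained_gradient(relu_grad, 1.0, depth=10)        # stays at 1.0
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)               # norm 5 scaled down to 1
```

Even in the best case for the sigmoid, ten layers shrink the gradient by a factor of about a million, while ReLU passes it through unchanged for positive inputs; clipping addresses the opposite problem by capping gradients that have grown too large.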
Data Imbalance:
● Symptoms: Model performs poorly on minority classes
● Causes: Uneven distribution of classes in the dataset
● Mitigation: Oversampling, undersampling, cost-sensitive learning
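Of the mitigations listed for data imbalance, random oversampling is the simplest to sketch: minority-class examples are duplicated at random until every class is as large as the biggest one. The function name, the fixed seed, and the toy data below are assumptions made for illustration.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, label in zip(samples, labels) if label == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

# Hypothetical imbalanced dataset: four examples of class "a", one of class "b".
samples = ["a1", "a2", "a3", "a4", "b1"]
labels = ["a", "a", "a", "a", "b"]
new_samples, new_labels = random_oversample(samples, labels)
balanced_counts = Counter(new_labels)
```

Oversampling should be applied to the training split only; duplicating examples before splitting would leak copies of the same example into the test data.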