
Understanding Artificial Neural Networks

Uploaded by

tejaforyou5
Copyright
© All Rights Reserved

COMPUTATIONAL INTELLIGENCE

(SITA3007)
UNIT – II

ARTIFICIAL NEURAL NETWORKS

Unit II: Basic concepts - Single layer Perceptron - Multilayer Perceptron - Supervised and Unsupervised
learning - Deep learning algorithms - Back propagation Networks - Performance Issues.
Introduction

The term "Artificial Neural Network" is derived from the biological neural networks that make
up the structure of the human brain. Just as the human brain has neurons interconnected with
one another, artificial neural networks have neurons that are interconnected across the
various layers of the network. These neurons are known as nodes.

Fig 1

The given figure illustrates the typical diagram of Biological Neural Network.

Fig 2

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell
nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Relationship between Biological neural network and artificial neural network:

Biological Neural Network | Artificial Neural Network
------------------------- | -------------------------
Dendrites                 | Inputs
Cell Nucleus              | Nodes
Synapse                   | Weights
Axon                      | Output

An Artificial Neural Network is a technique in the field of Artificial Intelligence that attempts
to mimic the network of neurons that makes up the human brain, so that computers can
understand things and make decisions in a human-like manner. An artificial neural network is
designed by programming computers to behave like interconnected brain cells.

There are roughly 86 billion neurons in the human brain (a figure often rounded to 100 billion).
Each neuron has somewhere between 1,000 and 100,000 connection points to other neurons. In
the human brain, data is stored in a distributed manner, and we can retrieve more than one
piece of this data from memory in parallel when necessary. We can say that the human brain is
made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate
that takes inputs and gives an output, such as an "OR" gate, which takes two inputs. If one or
both of the inputs are "On," the output is "On." If both inputs are "Off," the output is "Off."
Here the output depends only on the input, through a fixed rule. Our brain does not work this
way: the relationship between outputs and inputs keeps changing because the neurons in our
brain are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a
neural network consists of. A neural network is made up of a large number of artificial
neurons, termed units, arranged in a sequence of layers. Let us look at the various types of
layers available in an artificial neural network.

An Artificial Neural Network primarily consists of three layers:

Fig 3

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer sits in between the input and output layers. It performs all the calculations
to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally results
in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes their weighted sum, and adds a bias.
This computation is represented in the form of a transfer function.

The weighted total is then passed as an input to an activation function to produce the
output. Activation functions decide whether a node should fire or not. Only the nodes that fire
pass their signal on toward the output layer. There are various activation functions available,
and the choice depends on the sort of task we are performing.
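The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`weighted_sum`, `step`, `neuron_output`), the example weights, and the choice of a simple threshold activation are assumptions made here, not something fixed by these notes.

```python
def weighted_sum(inputs, weights, bias):
    """Transfer function: sum of input * weight products, plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def step(z, threshold=0.0):
    """Threshold activation: the node fires (1) if z reaches the threshold, else 0."""
    return 1 if z >= threshold else 0


def neuron_output(inputs, weights, bias):
    """One artificial neuron: weighted sum followed by the activation function."""
    return step(weighted_sum(inputs, weights, bias))


# Hypothetical example values chosen only for illustration.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, -0.6, 0.2]
bias = -0.4

z = weighted_sum(inputs, weights, bias)  # 0.5 + 0.0 + 0.2 - 0.4, about 0.3
print(z, neuron_output(inputs, weights, bias))
```

Because the weighted total here is positive, the node fires; with all inputs at zero only the negative bias remains and the node stays off.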
Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks have the numerical strength to perform more than one task
simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, the information an ANN
learns is stored across the whole network. The disappearance of a few pieces of data in one
place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete input data. The loss of
performance depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to choose the examples and to teach the
network the desired output by showing it these examples. The success of the network is
directly related to the chosen instances: if the event cannot be shown to the network in all
its aspects, the network can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output, and this
feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks. The
appropriate network structure is accomplished through experience, trial, and error.

Unrecognized behavior of the network:

This is the most significant issue of ANNs. When an ANN produces a solution, it does not
provide insight into why or how it arrived at it. This decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in line with their
structure. Their realization therefore depends on suitable hardware.

Difficulty of showing the issue to the network:

ANNs work with numerical data. Problems must be converted into numerical values before
being introduced to an ANN. The representation chosen here directly impacts the performance
of the network, and it relies on the user's abilities.

The duration of the network is unknown:

Training stops when the error is reduced to a specific value, and reaching this value does not
guarantee optimum results.

Artificial neural networks, which entered the world of science in the mid-20th century, are
developing exponentially. So far we have looked at the pros of artificial neural networks and
the issues encountered in the course of their use. It should not be overlooked that the cons of
ANNs, which are a flourishing branch of science, are being eliminated one by one, while their
pros are increasing day by day. This means that artificial neural networks will progressively
become an irreplaceable and increasingly important part of our lives.

How do artificial neural networks work?

An Artificial Neural Network is best represented as a weighted directed graph, where the
artificial neurons form the nodes. The connections between neuron outputs and neuron
inputs can be viewed as directed edges with weights. The Artificial Neural Network receives
the input signal from an external source, such as a pattern or an image, in the form of a
vector. These inputs are denoted mathematically by x(n) for each of the n inputs.
Fig 4

Afterward, each input is multiplied by its corresponding weight (these weights are the
details the artificial neural network uses to solve a specific problem). In general terms,
the weights represent the strength of the interconnections between neurons inside the
artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is zero, a bias is added to make the output non-zero, or otherwise to
scale up the system's response. The bias can be treated as an extra input whose value is fixed
at 1 and which has its own weight. The total of the weighted inputs is not bounded and can
grow arbitrarily large. To keep the response within the limits of the desired value, a certain
maximum value is benchmarked, and the total of the weighted inputs is passed through the
activation function.

The activation function refers to the set of transfer functions used to achieve the desired
output. There are different kinds of activation functions, but they are primarily either linear
or non-linear. Some of the commonly used activation functions are the binary, linear, and
tan hyperbolic sigmoidal activation functions.
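The activation functions named above can be written down compactly. The following is a minimal sketch using only the Python standard library; the function names and the default threshold and slope are illustrative choices, and the logistic `sigmoid` is included alongside `tanh` because the two are closely related.

```python
import math


def binary_step(z, threshold=0.0):
    """Binary activation: 1 if the input reaches the threshold, otherwise 0."""
    return 1.0 if z >= threshold else 0.0


def linear(z, slope=1.0):
    """Linear activation: the output is proportional to the input."""
    return slope * z


def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the range (0, 1).

    Note: math.exp overflows for very large negative z; this is fine for a sketch.
    """
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Tan hyperbolic sigmoid: squashes any real input into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, binary_step(z), linear(z), round(sigmoid(z), 4), round(tanh(z), 4))
```

The binary function makes a hard fire or no-fire decision, the linear function leaves the weighted sum unbounded, and the two sigmoidal functions keep the response within fixed limits.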

Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANN). Depending on which neuron and
network functions of the human brain they model, artificial neural networks perform tasks in
a similar way. Most artificial neural networks have some similarities with their more complex
biological counterpart and are very effective at their intended tasks, for example
segmentation or classification.

Feedback ANN:

In this type of ANN, the output is fed back into the network to reach the best-evolved results
internally. According to the University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections make use of feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network consisting of an input layer, an output layer,
and at least one layer of neurons. By assessing its output against its input, the strength of
the network can be observed from the group behavior of the connected neurons, and the
output is decided. The primary advantage of this network is that it learns to evaluate and
recognize input patterns.
Perceptron model

The Perceptron is a Machine Learning algorithm for supervised learning of binary
classification tasks. Further, a Perceptron is also understood as an artificial neuron or neural
network unit that helps to detect certain computations on input data in business intelligence.

The Perceptron model is also regarded as one of the best and simplest types of Artificial
Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can
consider it a single-layer neural network with four main parameters, i.e., input values,
weights and bias, net sum, and an activation function.

Basic Components of Perceptron

Frank Rosenblatt invented the perceptron model as a binary classifier that contains three
main components. These are as follows:

Fig 5
Fig 6

o Input Nodes or Input Layer:


This is the primary component of Perceptron which accepts the initial data into the system for
further processing. Each input node contains a real numerical value.

o Weight and Bias:


The weight parameter represents the strength of the connection between units. This is another
important parameter of the Perceptron. The weight is directly proportional to the influence of
the associated input neuron in deciding the output. Further, the bias can be considered as
the intercept in a linear equation.

o Activation Function:
This is the final important component, which helps to determine whether the neuron will
fire or not. In a perceptron, the activation function can be considered primarily as a step
function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function
Fig 7

The data scientist chooses the activation function based on the problem statement and the
desired outputs. The activation function may differ (e.g., Sign, Step, and Sigmoid) between
perceptron models, depending on whether the learning process is slow or has vanishing or
exploding gradients.

How does Perceptron work?

In Machine Learning, the Perceptron is considered a single-layer neural network that consists
of four main parameters: input values (input nodes), weights and bias, net sum, and an
activation function. The perceptron model begins by multiplying each input value by its
weight, then adds these products together to create the weighted sum. This weighted sum is
then applied to the activation function 'f' to obtain the desired output. In the perceptron
this activation function is typically a step function.

Fig 8
Perceptron models are divided into two types.

1. Single-layer Perceptron Model
2. Multi-layer Perceptron model

Single Layer Perceptron Model:


This is one of the simplest types of Artificial Neural Network (ANN). A single-layered
perceptron model consists of a feed-forward network and includes a threshold transfer
function inside the model. The main objective of the single-layer perceptron model is to
classify linearly separable objects with binary outcomes.

A single-layer perceptron model has no prior recorded data, so it begins with randomly
assigned values for the weight parameters. It then sums all the weighted inputs. If the total
is more than a pre-determined threshold value, the model is activated and outputs +1.

If the outcome matches the desired value, the performance of the model is considered
satisfactory and the weights do not change. However, discrepancies between the actual and
the desired output can arise when multiple input values are fed into the model. Hence, to
obtain the desired output and minimize errors, the weights must be adjusted.

"Single-layer perceptron can learn only linearly separable patterns."
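The weight adjustment described above can be illustrated with the classic perceptron learning rule. The sketch below makes several assumptions that are not part of these notes: weights start at zero rather than at random values (so the run is reproducible), outputs are 0/1 rather than +1, the threshold is folded into a bias term, and the learning rate and epoch count are arbitrary. The names `predict`, `train_perceptron`, `OR_GATE`, and `XOR_GATE` are hypothetical.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, learning_rate=0.1, epochs=20):
    """Perceptron rule: nudge weights toward the target whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly, so stop early
            break
    return weights, bias


OR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, gate in (("OR", OR_GATE), ("XOR", XOR_GATE)):
    w, b = train_perceptron(gate)
    correct = sum(predict(w, b, x) == t for x, t in gate)
    print(name, "correct:", correct, "of", len(gate))
```

The OR gate is linearly separable, so the rule settles on weights that classify all four cases. XOR is not linearly separable, so no single-layer perceptron can get all four cases right, whatever the training settings.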

Fig 9
Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model has the same basic
structure but adds one or more hidden layers.

The multi-layer perceptron model is trained with the Backpropagation algorithm, which
executes in two stages as follows:

o Forward Stage: In the forward stage, activations are computed starting from the input
layer and ending at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the
model's requirement. In this stage, the error between the actual output and the desired
output is propagated backward, starting at the output layer and ending at the input layer.
Hence, a multi-layered perceptron model can be considered as a neural network with multiple
layers in which the activation function does not remain linear, unlike in a single-layer
perceptron model. Instead of a linear function, the activation function can be a sigmoid,
TanH, ReLU, etc.
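The two stages can be sketched for a very small network. Everything specific here is an assumption made for illustration: a 2-2-1 layout with sigmoid units, a squared-error loss of 0.5 * (output - target)^2, full-batch gradient descent, hand-picked starting weights, and the names `forward`, `gradients`, `loss`, and `train`. It is a sketch of the idea, not a tuned implementation, and it does not promise that XOR is fully learned in the given number of epochs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward stage: input layer -> hidden layer -> output unit."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def gradients(params, x, target):
    """Backward stage for one sample: push the output error back toward the input."""
    _, _, w_out, _ = params
    hidden, output = forward(params, x)
    # Error signal at the output unit (the sigmoid derivative is y * (1 - y)).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal passed back to each hidden unit through its outgoing weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    g_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    g_w_out = [delta_out * h for h in hidden]
    return g_w_hidden, delta_hidden, g_w_out, delta_out


def loss(params, data):
    """Mean of 0.5 * (output - target)^2 over the data set."""
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def train(params, data, learning_rate=0.5, epochs=2000):
    """Repeat forward and backward stages, stepping weights against the averaged gradient."""
    n = len(data)
    for _ in range(epochs):
        w_hidden, b_hidden, w_out, b_out = params
        grads = [gradients(params, x, t) for x, t in data]
        w_hidden = [[w - learning_rate * sum(g[0][j][i] for g in grads) / n
                     for i, w in enumerate(row)] for j, row in enumerate(w_hidden)]
        b_hidden = [b - learning_rate * sum(g[1][j] for g in grads) / n
                    for j, b in enumerate(b_hidden)]
        w_out = [w - learning_rate * sum(g[2][j] for g in grads) / n
                 for j, w in enumerate(w_out)]
        b_out = b_out - learning_rate * sum(g[3] for g in grads) / n
        params = (w_hidden, b_hidden, w_out, b_out)
    return params


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
# Hypothetical starting weights: (hidden weights, hidden biases, output weights, output bias).
initial = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.1], [0.7, -0.6], 0.05)
trained = train(initial, XOR)
print("loss before:", loss(initial, XOR), "loss after:", loss(trained, XOR))
```

The point of the sketch is the flow of information: activations move forward layer by layer, and the error signal moves backward through the same weights to decide how each weight and bias should change.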

A multi-layer perceptron model has greater processing power and can process linear and non-
linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT,
XNOR, NOR.
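As a small illustration of the claim about logic gates, XOR can be built from two layers of threshold units. The weights below are hand-picked for this sketch rather than learned, and the names `step`, `neuron`, and `xor_mlp` are hypothetical.

```python
def step(z):
    """Threshold activation: 1 if z is non-negative, else 0."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias):
    """One threshold unit: weighted sum plus bias, passed through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(a, b):
    """Two hidden units (OR and NAND) feeding one output unit (AND)."""
    h_or = neuron((a, b), (1, 1), -0.5)      # fires if a OR b
    h_nand = neuron((a, b), (-1, -1), 1.5)   # fires unless a AND b
    return neuron((h_or, h_nand), (1, 1), -1.5)  # fires only if both hidden units fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Each hidden unit draws one straight-line boundary, and the output unit combines the two, which is what a single-layer perceptron cannot do on its own.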

Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear problems.


o It works well with both small and large input data.
o It helps us to obtain quick predictions after the training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In Multi-layer perceptron, computations are difficult and time-consuming.


o In multi-layer Perceptron, it is difficult to predict how much each independent variable
affects the dependent variable.
o The model functioning depends on the quality of the training.

Supervised and Unsupervised learning

Supervised learning is a type of machine learning algorithm that learns from labeled data.
Labeled data is data that has been tagged with a correct answer or classification.
Supervised learning, as the name indicates, has the presence of a supervisor as a teacher.
Supervised learning is when we teach or train the machine using data that is well-labelled,
which means the data is already tagged with the correct answer. After that, the machine is
provided with a new set of examples (data), so that the supervised learning algorithm, having
analysed the training data (the set of training examples), produces a correct outcome for them.
For example, a labeled dataset of images of Elephant, Camel and Cow would have each image
tagged with either “Elephant”, “Camel” or “Cow”.

Fig 10

o Supervised learning involves training a machine from labeled data.
o Labeled data consists of examples with the correct answer or classification.
o The machine learns the relationship between inputs (fruit images) and outputs (fruit
labels).
o The trained machine can then make predictions on new, unlabeled data.

Types of Supervised Learning

Supervised learning is classified into two categories of algorithms:


Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
Classification: A classification problem is when the output variable is a category, such as
“Red” or “blue”, “disease” or “no disease”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is
already tagged with the correct answer.
Regression
Regression is a type of supervised learning that is used to predict continuous values, such as
house prices, stock prices, or customer churn rates. Regression algorithms learn a function
that maps from the input features to the output value.
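As a sketch of what "learning a function" means in the simplest case, the following fits a straight line to one input feature using ordinary least squares. The data (house sizes and prices) is made up for illustration, and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the learned mapping to a new input."""
    return slope * x + intercept


# Hypothetical, noise-free training data: price = 3 * size.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
print(slope, intercept, predict(90.0, slope, intercept))
```

Real data is noisy, so the fitted line would not pass through every point; the same procedure still returns the line with the smallest total squared error.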
Some common regression algorithms include:
o Linear Regression
o Polynomial Regression
o Support Vector Machine Regression
o Decision Tree Regression
o Random Forest Regression
Classification
Classification is a type of supervised learning that is used to predict categorical values, such as
whether a customer will churn or not, whether an email is spam or not, or whether a medical
image shows a tumor or not. Classification algorithms learn a function that maps from the input
features to a probability distribution over the output classes.
Some common classification algorithms include:
o Logistic Regression
o Support Vector Machines
o Decision Trees
o Random Forests
o Naive Bayes
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that the model is
accurate and generalizable. There are a number of different metrics that can be used to
evaluate supervised learning models, but some of the most common ones include:
For Regression
o Mean Squared Error (MSE): MSE measures the average squared difference between the
predicted values and the actual values. Lower MSE values indicate better model
performance.
o Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the
standard deviation of the prediction errors. Similar to MSE, lower RMSE values indicate
better model performance.
o Mean Absolute Error (MAE): MAE measures the average absolute difference between
the predicted values and the actual values. It is less sensitive to outliers compared to
MSE or RMSE.
o R-squared (Coefficient of Determination): R-squared measures the proportion of the
variance in the target variable that is explained by the model. Higher R-squared values
indicate better model fit.
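The four regression metrics above can be computed directly from their definitions. This is a minimal sketch with made-up numbers; the function name `regression_metrics` and the dictionary keys are illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE and R-squared from paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,  # assumes the actual values are not all equal
    }


# Hypothetical values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For this data the errors are -1, 0, 1, 0, so MSE and MAE both come to 0.5 and R-squared to 0.9.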
For Classification
o Accuracy: Accuracy is the percentage of predictions that the model makes correctly. It is
calculated by dividing the number of correct predictions by the total number of
predictions.
o Precision: Precision is the percentage of positive predictions that the model makes that
are actually correct. It is calculated by dividing the number of true positives by the total
number of positive predictions.
o Recall: Recall is the percentage of all positive examples that the model correctly
identifies. It is calculated by dividing the number of true positives by the total number of
positive examples.
o F1 score: The F1 score combines precision and recall into a single number. It is calculated
by taking the harmonic mean of precision and recall.
o Confusion matrix: A confusion matrix is a table that shows the number of predictions for
each class, along with the actual class labels. It can be used to visualize the performance
of the model and identify areas where the model is struggling.
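For a binary problem, all of the classification metrics above follow from the four cells of the confusion matrix. The sketch below uses made-up labels; the function names and the convention that label 1 is the positive class are assumptions for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```

Here there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out at 0.75.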
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems, including:
o Spam filtering: Supervised learning algorithms can be trained to identify and classify
spam emails based on their content, helping users avoid unwanted messages.
o Image classification: Supervised learning can automatically classify images into different
categories, such as animals, objects, or scenes, facilitating tasks like image search,
content moderation, and image-based product recommendations.
o Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing
patient data, such as medical images, test results, and patient history, to identify
patterns that suggest specific diseases or conditions.
o Fraud detection: Supervised learning models can analyze financial transactions and
identify patterns that indicate fraudulent activity, helping financial institutions prevent
fraud and protect their customers.
o Natural language processing (NLP): Supervised learning plays a crucial role in NLP tasks,
including sentiment analysis, machine translation, and text summarization, enabling
machines to understand and process human language effectively.
Advantages of Supervised learning
o Supervised learning allows collecting data and produces data output from previous
experiences.
o Helps to optimize performance criteria with the help of experience.
o Supervised machine learning helps to solve various types of real-world computation
problems.
o It performs classification and regression tasks.
o It allows estimating or mapping the result to a new sample.
o We have complete control over choosing the number of classes we want in the training
data.
Disadvantages of Supervised learning
o Classifying big data can be challenging.
o Training for supervised learning needs a lot of computation time.
o Supervised learning cannot handle all complex tasks in Machine Learning.
o It requires a labelled data set.
o It requires a training process.

Unsupervised Learning

Unsupervised learning is a type of machine learning that learns from unlabeled data. This
means that the data does not have any pre-existing labels or categories. The goal of
unsupervised learning is to discover patterns and relationships in the data without any explicit
guidance.
Unsupervised learning is the training of a machine using information that is neither classified
nor labeled, allowing the algorithm to act on that information without guidance. Here the
task of the machine is to group unsorted information according to similarities, patterns, and
differences without any prior training on the data.
Unlike supervised learning, no teacher is provided, which means no labeled training is given to
the machine. The machine therefore has to find the hidden structure in unlabeled data by
itself.
For example, you can use unsupervised learning to examine animal data that has been gathered
and distinguish between several groups according to the traits and actions of the animals.
These groupings might correspond to various animal species, allowing you to categorize the
creatures without depending on labels that already exist.

Fig 11
o Unsupervised learning allows the model to discover patterns and relationships in
unlabeled data.
o Clustering algorithms group similar data points together based on their inherent
characteristics.
o Feature extraction captures essential information from the data, enabling the model to
make meaningful distinctions.
o Label association assigns categories to the clusters based on the extracted patterns and
characteristics.

Types of Unsupervised Learning

Unsupervised learning is classified into two categories of algorithms:

o Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
o Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
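Clustering
Clustering is a type of unsupervised learning that is used to group similar data points
together. Centroid-based clustering algorithms such as K-means work by iteratively assigning
data points to their nearest cluster center and moving each center to the mean of its points,
so that points end up close to their own cluster center and far from points in other clusters.
Clustering approaches fall into four broad categories:
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
6. Gaussian Mixture Models (GMMs)
7. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Strictly speaking, items 3 to 5 are dimensionality-reduction techniques rather than clustering
algorithms; they are commonly listed alongside clustering because they are also unsupervised.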
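To make the iterative idea behind K-means concrete, here is a minimal sketch in plain Python. It makes simplifying assumptions that a production implementation would not: the first k points are used as the starting centers (instead of a random or smarter initialization), and an empty cluster simply keeps its old center. The names `squared_distance` and `k_means` and the sample points are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points to the nearest center and re-centering."""
    centers = [tuple(p) for p in points[:k]]  # simplification: first k points as centers
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

Even though both starting centers lie in the same group, the assignment and update steps pull one center toward the second group within a few iterations.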
Association rule learning
Association rule learning is a type of unsupervised learning that is used to identify patterns
in data. Association rule learning algorithms work by finding relationships between different
items in a dataset.
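Rules of the form "people that buy X also tend to buy Y" are usually scored with two quantities, support and confidence. The sketch below computes both for a made-up set of shopping baskets; it illustrates the measures only and is not an implementation of a rule-mining algorithm. The names `support`, `confidence`, and `baskets` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets containing bread also contain milk (confidence 0.75).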
Some common association rule learning algorithms include:
o Apriori Algorithm
o Eclat Algorithm
o FP-Growth Algorithm
Evaluating Unsupervised Learning Models
Evaluating unsupervised learning models is an important step in ensuring that the model is
effective and useful. However, it can be more challenging than evaluating supervised learning
models, as there is no ground truth data to compare the model’s predictions to.
There are a number of different metrics that can be used to evaluate unsupervised learning
models, but some of the most common ones include:
o Silhouette score: The silhouette score measures how well each data point is clustered
with its own cluster members and separated from other clusters. It ranges from -1 to
1, with higher scores indicating better clustering.
o Calinski-Harabasz score: The Calinski-Harabasz score measures the ratio between the
variance between clusters and the variance within clusters. It ranges from 0 to
infinity, with higher scores indicating better clustering.
o Adjusted Rand index: The adjusted Rand index measures the similarity between two
clusterings. It equals 1 for identical clusterings, is close to 0 for random labelings, and
can be negative, with higher scores indicating more similar clusterings.
o Davies-Bouldin index: The Davies-Bouldin index measures the average similarity
between clusters. It ranges from 0 to infinity, with lower scores indicating better
clustering.
o F1 score: The F1 score is the harmonic mean of precision and recall, two metrics that
are commonly used in supervised learning to evaluate classification models. When
ground-truth labels are available for comparison, the F1 score can also be used to
evaluate unsupervised learning models, such as clustering models.
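As one worked example of these metrics, the silhouette score can be computed directly from pairwise distances. The sketch below assumes Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster scores 0; the names `euclidean` and `silhouette_score` and the sample points are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points.

    a = mean distance to the other points in the same cluster,
    b = smallest mean distance to the points of any other cluster.
    """
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [euclidean(p, q) for j, (q, label) in enumerate(zip(points, labels))
                if label == own and j != i]
        if not same:  # a cluster with a single point scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(euclidean(p, q) for q, label in zip(points, labels) if label == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 1-D points: two tight pairs far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # neighbours grouped together
bad = silhouette_score(points, [0, 1, 0, 1])   # neighbours split apart
print(round(good, 3), round(bad, 3))
```

Grouping the neighbouring points together gives a score close to 1, while splitting them across clusters gives a negative score, matching the interpretation given in the list above.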
Application of Unsupervised learning
Unsupervised learning can be used to solve a wide variety of problems, including:
o Anomaly detection: Unsupervised learning can identify unusual patterns or deviations
from normal behavior in data, enabling the detection of fraud, intrusion, or system
failures.
o Scientific discovery: Unsupervised learning can uncover hidden relationships and
patterns in scientific data, leading to new hypotheses and insights in various scientific
fields.
o Recommendation systems: Unsupervised learning can identify patterns and similarities
in user behavior and preferences to recommend products, movies, or music that align
with their interests.
o Customer segmentation: Unsupervised learning can identify groups of customers with
similar characteristics, allowing businesses to target marketing campaigns and improve
customer service more effectively.
o Image analysis: Unsupervised learning can group images based on their content,
facilitating tasks such as image classification, object detection, and image retrieval.
Advantages of Unsupervised learning
o It does not require training data to be labeled.
o Dimensionality reduction can be easily accomplished using unsupervised learning.
o Capable of finding previously unknown patterns in data.
o Unsupervised learning can help you gain insights from unlabeled data that you might not
have been able to get otherwise.
o Unsupervised learning is good at finding patterns and relationships in data without being
told what to look for. This can help you learn new things about your data.
Disadvantages of Unsupervised learning
o Difficult to measure accuracy or effectiveness due to lack of predefined answers during
training.
o The results often have lesser accuracy.
o The user needs to spend time interpreting and labeling the classes that result from the
grouping.
o Unsupervised learning can be sensitive to data quality, including missing values, outliers,
and noisy data.
o Without labeled data, it can be difficult to evaluate the performance of unsupervised
learning models, making it challenging to assess their effectiveness.

Supervised vs. Unsupervised

Parameters | Supervised machine learning | Unsupervised machine learning
---------- | --------------------------- | -----------------------------
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled.
Computational Complexity | Simpler method | Computationally complex
Accuracy | Highly accurate | Less accurate
No. of classes | No. of classes is known | No. of classes is not known
Data Analysis | Uses offline analysis | Uses real-time analysis of data
Algorithms used | Linear and logistic regression, KNN, random forest, multi-class classification, decision tree, Support Vector Machine, Neural Network, etc. | K-Means clustering, Hierarchical clustering, Apriori algorithm, etc.
Output | Desired output is given. | Desired output is not given.
Training data | Uses training data to infer the model. | No labeled training data is used.
Complex model | It is not possible to learn larger and more complex models than with supervised learning. | It is possible to learn larger and more complex models with unsupervised learning.
Model | We can test our model. | We can not test our model.
Called as | Supervised learning is also called classification. | Unsupervised learning is also called clustering.
Example | Example: Optical character recognition. | Example: Find a face in an image.
Supervision | Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.

Deep learning algorithm:

Deep learning algorithms are dynamically made to run through several layers of neural
networks, which are nothing but a set of decision-making networks that are pre-trained to
serve a task. Each layer builds a simple representation of its input and passes it on to the
next layer. Most traditional machine learning algorithms work fairly well on datasets with up
to a few hundred features or columns. On unstructured data, however, they mostly fail: even
a simple RGB image with a dimension of 800x1000 has 800 x 1000 x 3 = 2.4 million input
values. It becomes quite unfeasible for a traditional machine learning algorithm to handle
such depths. This is where deep learning comes in.

Deep Learning Algorithms

The Deep Learning Algorithms are as follows:

Convolutional Neural Networks (CNNs)

CNNs, popularly known as ConvNets, consist of several layers and are used mainly for image
processing and object detection. An early CNN, LeNet, was developed by Yann LeCun; the
widely cited LeNet-5 version dates from 1998. Back then, it was developed to recognize digits
and zip code characters. CNNs are widely used for identifying satellite images, medical image
processing, time series forecasting, and anomaly detection.

CNNs process the data by passing it through multiple layers and extracting features using
convolution operations. The Convolutional Layer is followed by a Rectified Linear Unit (ReLU),
which rectifies the feature map. The Pooling layer then takes these rectified feature maps as
its input. Pooling is a down-sampling operation that reduces the dimensions of the feature
map. The resulting 2-D arrays are then flattened into a single, long, continuous, linear
vector. The next layer, called the Fully Connected Layer, takes the flattened vector obtained
from the Pooling Layer as input and identifies the image by classifying it.

Fig 12
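The layer sequence described above (convolution, ReLU, pooling, flattening) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5x5 image and the 2x2 filter are made-up values, the filter is fixed rather than learned, and the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (implemented as cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value with zero (rectified feature map)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; any ragged border is dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def flatten(feature_map):
    """Turn the 2-D array into one long vector for the fully connected layer."""
    return [v for row in feature_map for v in row]


# Hypothetical 5x5 grayscale image and 2x2 filter (illustrative values only).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [1, 3, 2, 0, 1],
]
kernel = [[1, 0], [0, -1]]

features = relu(conv2d(image, kernel))   # 4x4 rectified feature map
pooled = max_pool(features)              # 2x2 after down-sampling
vector = flatten(pooled)                 # input to a fully connected layer
print(vector)
```

In a real CNN the filter values are learned during training and many filters are applied in parallel; the sketch only shows how the shapes shrink from 5x5 to 4x4 to 2x2 to a vector of length 4.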

Long Short Term Memory Networks (LSTMs)

LSTMs are a type of Recurrent Neural Network (RNN) designed to learn and adapt to long-term dependencies. Memorizing and recalling past data over long periods is their default behavior. Because they retain information over time and can keep a memory of previous inputs, they are mainly used in time-series prediction. They have a chain-like structure consisting of four interacting layers that communicate with each other in a distinctive way. Besides time-series prediction, they are used for speech recognition, pharmaceutical development, and music composition.

LSTMs work in a sequence of steps. First, they forget irrelevant details carried over from the previous state. Next, they selectively update certain cell-state values, and finally they output certain parts of the cell state. Below is the diagram of their operation.
Fig 13
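The three steps above (forget, selectively update, output) correspond to the gates of an LSTM cell. The following is a minimal sketch of one time step for a single-unit cell with scalar state, using the standard gate equations; the parameter names in the dictionary and their values are hypothetical and not taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate values
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g    # drop part of the old state, add selected new values
    h = o * math.tanh(c)      # expose part of the cell state as output
    return h, c


# Hypothetical parameters: every weight 0.5, every bias 0.0.
params = {name: 0.5 for name in
          ("wf", "uf", "wi", "ui", "wg", "ug", "wo", "uo")}
params.update({"bf": 0.0, "bi": 0.0, "bg": 0.0, "bo": 0.0})

h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:          # a short input sequence
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

The cell state c is carried from step to step, which is what lets the network keep information about earlier inputs.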

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) have directed connections that form a cycle, which allows the output of the previous step to be used as input in the current step. Earlier inputs are held for a period in the network's internal memory. LSTMs, described above, are RNNs built to strengthen this memorization ability. RNNs are mostly used for image captioning, time-series analysis, handwriting recognition, and machine translation.

An RNN works by feeding the output at time t-1 into the input at time t. Next, the output at time t is fed into the input at time t+1. This process repeats for an input of any length. RNNs also store historical information in this way, and the model size does not increase even when the input size increases. RNNs look something like this when unfolded.
Fig 14
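The unfolding described above can be sketched as a loop in which the hidden state computed at time t-1 is fed back in at time t. This is a minimal single-unit example; the function name and the weight values are hypothetical, and tanh is assumed as the activation.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Unrolled forward pass of a single-unit RNN.

    The same three parameters (w_x, w_h, b) are reused at every time step,
    so the model size does not depend on the length of the input.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)   # previous state feeds the current step
        states.append(h)
    return states


# Hypothetical parameters and two inputs of different length.
print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))
print(rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))
```

Even though only the first input is non-zero, the later states stay above zero: the first input is remembered through the recurrent connection and fades gradually.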

Generative Adversarial Networks (GANs)

GANs are generative deep learning algorithms that create new data instances resembling the training data. A GAN consists of two components, namely a generator that learns to generate fake data and a discriminator that learns to tell this fake data apart from real data. GANs have gained wide usage over time: they are used to sharpen astronomical images and to simulate gravitational lensing for dark-matter research. In video games, they are used to upscale low-resolution 2D textures by recreating them in higher resolutions such as 4K. They are also used for creating realistic cartoon characters, rendering human faces, and rendering 3D objects.

GANs work by generating fake data and learning to distinguish it from real data. During training, the generator produces different kinds of fake data and the discriminator learns to recognize it as false. The discriminator's results are then sent back to the generator and the discriminator to update both models. Consider the below image to visualize the functioning.
Fig 15
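The "results sent back for updating" are usually expressed as two loss values. The sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss; this choice, the function names, and the probabilities are assumptions for illustration and are not specified in the text.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score 1, fake samples 0.

    d_real / d_fake are the discriminator's probabilities of "real"
    for a batch of real and of generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs early and later in training.
confused = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
print(confused, confident)             # the confident discriminator has the lower loss
print(generator_loss([0.1, 0.2]))      # generator is being caught: high loss
print(generator_loss([0.8, 0.9]))      # generator fools the discriminator: low loss
```

Each model is updated to lower its own loss, so an improvement in one raises the loss of the other; this opposition is the "adversarial" part of the name.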

Radial Basis Function Networks (RBFNs)

RBFNs are a specific type of feed-forward neural network that uses radial basis functions as activation functions. They consist of three layers, namely the input layer, the hidden layer, and the output layer, and are mostly used for time-series prediction, regression, and classification.

RBFNs perform these tasks by measuring the similarity of an input to examples from the training data set. An input vector is fed into the input layer and passed on to a hidden layer of RBF neurons. Each hidden neuron contains a Gaussian transfer function whose output decreases as the distance between the input and the neuron's center grows. The output layer computes a linear combination of these radial-basis outputs, using its weights as parameters, and generates the result; for classification it has one node per class of data. Consider the given image below to understand the process thoroughly.
Fig 16
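A forward pass through the three layers can be sketched as follows. The Gaussian form exp(-beta * distance^2), the centers, the widths (beta), and the output weights are all assumptions chosen for illustration; in practice they are fitted to the training data.

```python
import math


def gaussian_rbf(x, center, beta):
    """Gaussian activation: 1.0 at the center, falling toward 0 with distance."""
    dist_sq = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-beta * dist_sq)


def rbfn_forward(x, centers, betas, weights, bias):
    """Hidden layer of RBF neurons followed by a linear output node."""
    hidden = [gaussian_rbf(x, c, b) for c, b in zip(centers, betas)]
    output = sum(w * h for w, h in zip(weights, hidden)) + bias
    return hidden, output


# Hypothetical network with two hidden neurons.
centers = [[1.0, 2.0], [3.0, 2.0]]
betas = [1.0, 1.0]
weights = [2.0, -1.0]
bias = 0.5

hidden, output = rbfn_forward([1.0, 2.0], centers, betas, weights, bias)
print(hidden)   # first neuron fires fully, second only weakly
print(output)
```

The input coincides with the first center, so that neuron's activation is exactly 1.0, while the second neuron, two units away, responds with exp(-4).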

Multilayer Perceptrons (MLPs)

MLPs are the foundation of deep learning technology. They belong to the class of feed-forward neural networks and consist of several layers of perceptrons, each with an activation function. An MLP has one input layer and one output layer, which are fully connected, and one or more hidden layers between them. MLPs are mostly used to build image recognition, speech recognition, and machine translation software.

An MLP starts by feeding the data into the input layer. The neurons of the layers form a graph whose connections pass signals in one direction only. The input is multiplied by the weights that exist between the input layer and the hidden layer. MLPs use activation functions to determine which nodes fire; these include the tanh function, the sigmoid function, and ReLUs. MLPs are trained to learn the correlation between the inputs and the desired outputs in the given data set. See the below image to understand better.
Fig 17
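The one-directional flow and the role of the activation functions can be sketched like this. The layer sizes, weights, and function names are hypothetical; the three activations are the ones named in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Pass the data through the layers in one direction (feed-forward)."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical 2-2-1 network: ReLU hidden layer, sigmoid output layer.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], sigmoid),
]
print(mlp_forward([2.0, 1.0], layers))
print(sigmoid(0.0), math.tanh(0.0), relu(-2.0))
```

Swapping `relu` for `math.tanh` or `sigmoid` in the hidden layer changes only the activation, not the structure of the forward pass.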

Self Organizing Maps (SOMs)

SOMs were invented by Teuvo Kohonen to visualize data and reduce its dimensions through self-organizing artificial neural networks. Data visualization attempts to solve the problem that humans cannot easily visualize high-dimensional data. SOMs help users understand such data with less human involvement and, consequently, less error.

SOMs work by initializing the weights of each node and then choosing a random vector from the given training data. They examine every node to find which weights are most like the input vector. The winning node is called the Best Matching Unit (BMU). SOMs then find the neighborhood of the BMU, and the number of neighbors reduces over time. The closer a node is to the BMU, the more its weights are changed toward the sample vector. Multiple iterations are carried out so that no node close to the BMU is missed. One example is the RGB color combinations that we use in our daily tasks. Consider the below image to understand how they function.
Fig 18
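One training iteration (find the BMU, then pull the BMU and its neighbors toward the sample) can be sketched on RGB colors. This is a simplified example under stated assumptions: the map is one-dimensional with only three nodes, the neighborhood function is Gaussian, and the learning rate and radius are made-up values.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    def dist_sq(w):
        return sum((a - b) ** 2 for a, b in zip(w, sample))
    return min(range(len(weights)), key=lambda k: dist_sq(weights[k]))


def som_update(weights, sample, learning_rate, radius):
    """One SOM iteration on a 1-D map; returns the BMU index and new weights."""
    bmu = best_matching_unit(weights, sample)
    new_weights = []
    for k, w in enumerate(weights):
        grid_dist = abs(k - bmu)              # distance on the map, not in color space
        if grid_dist <= radius:
            influence = (math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                         if radius > 0 else 1.0)
            w = [wi + learning_rate * influence * (si - wi)
                 for wi, si in zip(w, sample)]
        new_weights.append(w)
    return bmu, new_weights


# Hypothetical map of three nodes initialized to pure red, green, and blue.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sample = [0.9, 0.1, 0.0]                      # a slightly orange red

bmu, updated = som_update(weights, sample, learning_rate=0.5, radius=1)
print(bmu)
for node in updated:
    print([round(v, 3) for v in node])
```

Training repeats this step for many samples while the learning rate and the radius are gradually reduced, which is the shrinking neighborhood mentioned above.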

Deep Belief Networks (DBNs)

DBNs are generative models that consist of multiple layers of stochastic, latent variables. The latent variables have binary values and are often called hidden units. A DBN is a stack of Boltzmann Machines: RBM layers are stacked on top of each other, and each layer communicates with both the previous and the subsequent layer. DBNs are used in applications such as video and image recognition and capturing the motion of objects.

DBNs are trained with greedy learning algorithms. The greedy algorithm uses a layer-by-layer approach to learn the top-down, generative weights. DBNs run the steps of Gibbs sampling on the top two hidden layers. They then draw a sample from the visible units using a single pass of ancestral sampling through the rest of the model. Finally, a single bottom-up pass infers the values of the latent variables in every layer.
Fig 19

Restricted Boltzmann Machines (RBMs)


RBMs, popularized by Geoffrey Hinton, are stochastic neural networks that learn a probability distribution over a set of inputs. They are mainly used for dimensionality reduction, regression, classification, and topic modeling, and are considered the building blocks of DBNs. RBMs consist of two layers, namely the visible layer and the hidden layer. Each visible unit is connected to all hidden units, and bias units are connected to the nodes of both layers. RBMs have two phases, namely the forward pass and the backward pass.

In the forward pass, RBMs accept the inputs and translate them into a set of numbers that encodes the inputs. Every input is combined with its individual weight and a bias, and the result is passed to the hidden layer. In the backward pass, the RBM takes that set of numbers and translates it back into reconstructed inputs: each hidden activation is combined with its individual weight and a bias, and the result is pushed to the visible layer, where the activation is carried out and the reconstructed input is generated. To understand this process, consider the below image.
Fig 20
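The two passes can be sketched as below. For simplicity the sketch works with activation probabilities instead of sampling binary states, which a full RBM would do; that simplification, the weight values, and the function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_pass(visible, weights, hidden_bias):
    """Encode the visible units into hidden activation probabilities.

    weights[i][j] connects visible unit i to hidden unit j.
    """
    return [
        sigmoid(sum(visible[i] * weights[i][j] for i in range(len(visible)))
                + hidden_bias[j])
        for j in range(len(hidden_bias))
    ]


def backward_pass(hidden, weights, visible_bias):
    """Reconstruct the visible units from the hidden activations.

    The same weight matrix is reused in the opposite direction.
    """
    return [
        sigmoid(sum(hidden[j] * weights[i][j] for j in range(len(hidden)))
                + visible_bias[i])
        for i in range(len(visible_bias))
    ]


# Hypothetical RBM with 3 visible and 2 hidden units.
weights = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0]]
hidden_bias = [0.0, 0.0]
visible_bias = [0.0, 0.0, 0.0]

v = [1.0, 1.0, 0.0]
h = forward_pass(v, weights, hidden_bias)
v_reconstructed = backward_pass(h, weights, visible_bias)
print(h)
print(v_reconstructed)
```

Training adjusts the weights so that the reconstruction moves closer to the original input.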

Autoencoders

Autoencoders are a special type of neural network in which the input and the output are usually identical. They were designed primarily to solve problems related to unsupervised learning. An autoencoder is trained to replicate the data from its input layer at its output layer, which is why the input and output are generally the same. They are used for tasks such as pharmaceutical discovery, image processing, and population prediction.

An autoencoder consists of three components, namely the encoder, the code, and the decoder. The encoder receives the input and transforms it into a different, smaller representation (the code). The decoder then attempts to reconstruct the original input from the code as accurately as possible. For example, when an image is not clearly visible, it is fed to the autoencoder: the image is first encoded and reduced in size, and then decoded to produce a clarified, reconstructed image that closely resembles the original. To understand this complex process, see the below-provided image.
Fig 21
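The encoder, code, and decoder can be sketched with a tiny linear autoencoder. The weights here are hand-picked for illustration rather than trained, and all names are hypothetical: the encoder averages neighboring pairs of a 4-value input into a 2-value code, and the decoder copies each code value back out twice.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hand-picked (not trained) weights: 4 inputs -> 2-value code -> 4 outputs.
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
DECODER = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]


def encode(x):
    return matvec(ENCODER, x)


def decode(code):
    return matvec(DECODER, code)


def reconstruction_error(x):
    """Mean squared difference between the input and its reconstruction."""
    reconstructed = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, reconstructed)) / len(x)


print(encode([1.0, 1.0, 3.0, 3.0]), decode(encode([1.0, 1.0, 3.0, 3.0])))
print(reconstruction_error([1.0, 1.0, 3.0, 3.0]))   # fits the code exactly
print(reconstruction_error([1.0, 2.0, 3.0, 4.0]))   # detail is lost in the code
```

Training an autoencoder means adjusting the encoder and decoder weights to make this reconstruction error as small as possible over the training data.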

Backpropagation

Backpropagation is one of the important concepts of a neural network. Our task is to classify our data as well as possible. For this, we have to update the weights and biases, but how can we do that in a deep neural network? In the linear regression model, we use gradient descent to optimize the parameters. Similarly, here we also use the gradient descent algorithm, with the gradients computed by Backpropagation.

For a single training example, the Backpropagation algorithm calculates the gradient of the error function with respect to the weights of the network. The error can be written as a function of the network's weights and biases. Backpropagation algorithms are a set of methods used to efficiently train artificial neural networks following a gradient descent approach which exploits the chain rule.

The main features of Backpropagation are the iterative, recursive, and efficient method through which it calculates the updated weights to improve the network until it is able to perform the task for which it is being trained. Backpropagation requires the derivatives of the activation functions to be known at network design time.
Fig 22

Input values
x1=0.05
x2=0.10

Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55

Bias Values
b1=0.35 b2=0.60

Target Values
T1=0.01
T2=0.99

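Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass
To find the value of H1, we first multiply the input values by the weights and add the bias:

\[ H_1 = x_1 w_1 + x_2 w_2 + b_1 = 0.05 \times 0.15 + 0.10 \times 0.20 + 0.35 = 0.3775 \]

To calculate the final result of H1, we apply the sigmoid function:

\[ H_{1final} = \frac{1}{1 + e^{-H_1}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992 \]

We will calculate the value of H2 in the same way as H1:

\[ H_2 = x_1 w_3 + x_2 w_4 + b_1 = 0.05 \times 0.25 + 0.10 \times 0.30 + 0.35 = 0.3925 \]

To calculate the final result of H2, we apply the sigmoid function:

\[ H_{2final} = \frac{1}{1 + e^{-H_2}} = \frac{1}{1 + e^{-0.3925}} = 0.596884378 \]

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we multiply the inputs of the output layer, i.e., the outcomes of H1 and H2, by the weights and add the bias:

\[ y_1 = H_{1final} w_5 + H_{2final} w_6 + b_2 = 0.593269992 \times 0.40 + 0.596884378 \times 0.45 + 0.60 = 1.10590597 \]

To calculate the final result of y1, we apply the sigmoid function:

\[ y_{1final} = \frac{1}{1 + e^{-y_1}} = \frac{1}{1 + e^{-1.10590597}} = 0.75136507 \]

We will calculate the value of y2 in the same way as y1:

\[ y_2 = H_{1final} w_7 + H_{2final} w_8 + b_2 = 0.593269992 \times 0.50 + 0.596884378 \times 0.55 + 0.60 = 1.2249214 \]

To calculate the final result of y2, we apply the sigmoid function:

\[ y_{2final} = \frac{1}{1 + e^{-y_2}} = \frac{1}{1 + e^{-1.2249214}} = 0.772928465 \]

Our target values are 0.01 and 0.99. Our y1 and y2 values do not match our target values T1 and T2.

Now, we will find the total error, which measures the difference between the outputs and the target outputs. The total error is calculated as

\[ E_{total} = \sum \tfrac{1}{2} (target - output)^2 = \tfrac{1}{2} (T_1 - y_{1final})^2 + \tfrac{1}{2} (T_2 - y_{2final})^2 \]

So, the total error is

\[ E_{total} = \tfrac{1}{2} (0.01 - 0.75136507)^2 + \tfrac{1}{2} (0.99 - 0.772928465)^2 = 0.274811083 + 0.023560026 = 0.298371109 \]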
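The forward pass can be checked with a short script. It uses only the input values, weights, biases, and targets listed above; the variable names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Values taken from the worked example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# Hidden layer: weighted sum plus bias, then sigmoid.
h1 = x1 * w1 + x2 * w2 + b1
h2 = x1 * w3 + x2 * w4 + b1
h1_final = sigmoid(h1)
h2_final = sigmoid(h2)

# Output layer: the hidden outputs are the inputs here.
y1 = h1_final * w5 + h2_final * w6 + b2
y2 = h1_final * w7 + h2_final * w8 + b2
y1_final = sigmoid(y1)
y2_final = sigmoid(y2)

# Total squared error over both outputs.
e_total = 0.5 * (t1 - y1_final) ** 2 + 0.5 * (t2 - y2_final) ** 2

print(f"H1={h1:.4f}  H1final={h1_final:.9f}")
print(f"H2={h2:.4f}  H2final={h2_final:.9f}")
print(f"y1={y1:.8f}  y1final={y1_final:.9f}")
print(f"y2={y2:.8f}  y2final={y2_final:.9f}")
print(f"Etotal={e_total:.9f}")
```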


Now, we will backpropagate this error to update the weights using a backward pass.

Backward pass at the output layer


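To update the weights, we calculate the error corresponding to each weight with the help of the total error. The error on weight w is calculated by differentiating the total error with respect to w:

\[ Error_w = \frac{\partial E_{total}}{\partial w} \]

We perform the backward process, so first consider the last weight w5:

\[ Error_{w5} = \frac{\partial E_{total}}{\partial w_5} \tag{1} \]

\[ E_{total} = \tfrac{1}{2} (T_1 - y_{1final})^2 + \tfrac{1}{2} (T_2 - y_{2final})^2 \tag{2} \]

From equation (2), it is clear that we cannot partially differentiate it with respect to w5 directly because w5 does not appear in it. We split equation (1) into multiple terms using the chain rule so that we can easily differentiate it with respect to w5:

\[ \frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial y_{1final}} \times \frac{\partial y_{1final}}{\partial y_1} \times \frac{\partial y_1}{\partial w_5} \tag{3} \]

Now, we calculate each term one by one to differentiate Etotal with respect to w5:

\[ \frac{\partial E_{total}}{\partial y_{1final}} = -(T_1 - y_{1final}) = -(0.01 - 0.75136507) = 0.74136507 \tag{4} \]

\[ \frac{\partial y_{1final}}{\partial y_1} = \frac{e^{-y_1}}{(1 + e^{-y_1})^2} \tag{5} \]

Since y1final = 1/(1 + e^{-y1}), we have e^{-y1} = (1 - y1final)/y1final. Putting this value of e^{-y1} in equation (5):

\[ \frac{\partial y_{1final}}{\partial y_1} = y_{1final} (1 - y_{1final}) = 0.75136507 \times (1 - 0.75136507) = 0.186815602 \tag{6} \]

\[ y_1 = H_{1final} w_5 + H_{2final} w_6 + b_2 \quad \Rightarrow \quad \frac{\partial y_1}{\partial w_5} = H_{1final} = 0.593269992 \tag{7} \]

So, we put the values of equations (4), (6), and (7) in equation (3) to find the final result:

\[ \frac{\partial E_{total}}{\partial w_5} = 0.74136507 \times 0.186815602 \times 0.593269992 = 0.082167041 \]

Now, we will calculate the updated weight w5new with the help of the following formula, where η is the learning rate (the updated weights listed below correspond to η = 0.5):

\[ w_{5new} = w_5 - \eta \times \frac{\partial E_{total}}{\partial w_5} = 0.40 - 0.5 \times 0.082167041 = 0.35891648 \]

In the same way, we calculate w6new, w7new, and w8new, and this will give us the following values:

w5new=0.35891648
w6new=0.408666186
w7new=0.511301270
w8new=0.561370121
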
Backward pass at Hidden layer

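Now, we will backpropagate to our hidden layer and update the weights w1, w2, w3, and w4 as we have done with the w5, w6, w7, and w8 weights.

We will calculate the error at w1 as

\[ Error_{w1} = \frac{\partial E_{total}}{\partial w_1}, \qquad E_{total} = \tfrac{1}{2} (T_1 - y_{1final})^2 + \tfrac{1}{2} (T_2 - y_{2final})^2 \]

From the expression for Etotal, it is clear that we cannot partially differentiate it with respect to w1 directly because w1 does not appear in it. We split the derivative into multiple terms using the chain rule so that we can easily differentiate it with respect to w1:

\[ \frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial H_{1final}} \times \frac{\partial H_{1final}}{\partial H_1} \times \frac{\partial H_1}{\partial w_1} \]

Now, we calculate each term one by one to differentiate Etotal with respect to w1. We again split the first term because there is no H1final term in Etotal:

\[ \frac{\partial E_{total}}{\partial H_{1final}} = \frac{\partial E_1}{\partial H_{1final}} + \frac{\partial E_2}{\partial H_{1final}}, \qquad E_1 = \tfrac{1}{2} (T_1 - y_{1final})^2, \quad E_2 = \tfrac{1}{2} (T_2 - y_{2final})^2 \]

We will again split because in E1 and E2 there is no H1final term. Splitting is done as

\[ \frac{\partial E_1}{\partial H_{1final}} = \frac{\partial E_1}{\partial y_1} \times \frac{\partial y_1}{\partial H_{1final}}, \qquad \frac{\partial E_2}{\partial H_{1final}} = \frac{\partial E_2}{\partial y_2} \times \frac{\partial y_2}{\partial H_{1final}} \]

We again split both because there is no y1 or y2 term in E1 and E2 (they depend on y1final and y2final). We split it as

\[ \frac{\partial E_1}{\partial y_1} = \frac{\partial E_1}{\partial y_{1final}} \times \frac{\partial y_{1final}}{\partial y_1}, \qquad \frac{\partial E_2}{\partial y_2} = \frac{\partial E_2}{\partial y_{2final}} \times \frac{\partial y_{2final}}{\partial y_2} \]

Now, we find the values of these terms. For the first output, using the values already computed at the output layer:

\[ \frac{\partial E_1}{\partial y_{1final}} = -(T_1 - y_{1final}) = 0.74136507, \qquad \frac{\partial y_{1final}}{\partial y_1} = y_{1final} (1 - y_{1final}) = 0.186815602 \]

\[ \frac{\partial E_1}{\partial y_1} = 0.74136507 \times 0.186815602 = 0.138498562 \]

For the second output:

\[ \frac{\partial E_2}{\partial y_{2final}} = -(T_2 - y_{2final}) = -(0.99 - 0.772928465) = -0.217071535 \]

\[ \frac{\partial y_{2final}}{\partial y_2} = \frac{e^{-y_2}}{(1 + e^{-y_2})^2} \]

Putting the value e^{-y2} = (1 - y2final)/y2final in this expression:

\[ \frac{\partial y_{2final}}{\partial y_2} = y_{2final} (1 - y_{2final}) = 0.772928465 \times (1 - 0.772928465) = 0.175510053 \]

\[ \frac{\partial E_2}{\partial y_2} = -0.217071535 \times 0.175510053 = -0.038098236 \]

Since y1 = H1final w5 + H2final w6 + b2 and y2 = H1final w7 + H2final w8 + b2:

\[ \frac{\partial y_1}{\partial H_{1final}} = w_5 = 0.40, \qquad \frac{\partial y_2}{\partial H_{1final}} = w_7 = 0.50 \]

\[ \frac{\partial E_1}{\partial H_{1final}} = 0.138498562 \times 0.40 = 0.055399425, \qquad \frac{\partial E_2}{\partial H_{1final}} = -0.038098236 \times 0.50 = -0.019049119 \]

Putting these values together:

\[ \frac{\partial E_{total}}{\partial H_{1final}} = 0.055399425 + (-0.019049119) = 0.036350306 \]

We have the first term; we still need to figure out the second term:

\[ \frac{\partial H_{1final}}{\partial H_1} = \frac{e^{-H_1}}{(1 + e^{-H_1})^2} \]

Putting the value e^{-H1} = (1 - H1final)/H1final in this expression:

\[ \frac{\partial H_{1final}}{\partial H_1} = H_{1final} (1 - H_{1final}) = 0.593269992 \times (1 - 0.593269992) = 0.241300709 \]

We calculate the partial derivative of the total net input to H1 with respect to w1 the same as we did for the output neuron:

\[ H_1 = x_1 w_1 + x_2 w_2 + b_1 \quad \Rightarrow \quad \frac{\partial H_1}{\partial w_1} = x_1 = 0.05 \]

So, we put the values of the three terms together to find the final result:

\[ \frac{\partial E_{total}}{\partial w_1} = 0.036350306 \times 0.241300709 \times 0.05 = 0.000438568 \]

Now, we will calculate the updated weight w1new with the help of the following formula, where η is the learning rate (the updated weights listed below correspond to η = 0.5):

\[ w_{1new} = w_1 - \eta \times \frac{\partial E_{total}}{\partial w_1} = 0.15 - 0.5 \times 0.000438568 = 0.149780716 \]

In the same way, we calculate w2new, w3new, and w4new, and this will give us the following values:

w1new=0.149780716
w2new=0.19956143
w3new=0.24975114
w4new=0.29950229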

We have updated all the weights. The error on the network was 0.298371109 when we fed forward the inputs 0.05 and 0.1. After the first round of Backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, the output neurons generate 0.015912196 and 0.984065734, i.e., close to our target values of 0.01 and 0.99, when we feed forward 0.05 and 0.1.
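The whole procedure (forward pass, backward pass, weight update) can be sketched as a short script. Two points are assumptions rather than statements from the text: the learning rate of 0.5, which is the value consistent with the updated weights listed above, and the choice to keep the biases b1 and b2 fixed, since only weight updates are shown. All eight gradients are computed from the old weights before any weight is changed.

```python
import math

X = (0.05, 0.10)          # inputs x1, x2
T = (0.01, 0.99)          # targets T1, T2
B1, B2 = 0.35, 0.60       # biases (assumed to stay fixed during training)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w):
    """w = [w1, ..., w8]; returns H1final, H2final, y1final, y2final."""
    h1 = sigmoid(X[0] * w[0] + X[1] * w[1] + B1)
    h2 = sigmoid(X[0] * w[2] + X[1] * w[3] + B1)
    y1 = sigmoid(h1 * w[4] + h2 * w[5] + B2)
    y2 = sigmoid(h1 * w[6] + h2 * w[7] + B2)
    return h1, h2, y1, y2


def total_error(w):
    _, _, y1, y2 = forward(w)
    return 0.5 * (T[0] - y1) ** 2 + 0.5 * (T[1] - y2) ** 2


def backprop_step(w, lr=0.5):
    """One gradient-descent update of all eight weights."""
    h1, h2, y1, y2 = forward(w)
    # dE/dy (net input of each output neuron), via the chain rule.
    d1 = -(T[0] - y1) * y1 * (1.0 - y1)
    d2 = -(T[1] - y2) * y2 * (1.0 - y2)
    # dE/dH (net input of each hidden neuron): sum over both outputs.
    dh1 = (d1 * w[4] + d2 * w[6]) * h1 * (1.0 - h1)
    dh2 = (d1 * w[5] + d2 * w[7]) * h2 * (1.0 - h2)
    grads = [dh1 * X[0], dh1 * X[1], dh2 * X[0], dh2 * X[1],
             d1 * h1, d1 * h2, d2 * h1, d2 * h2]
    return [wi - lr * g for wi, g in zip(w, grads)]


weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
print("initial error:", total_error(weights))

first = backprop_step(weights)
print("after one step:", [round(v, 9) for v in first])
print("error after one step:", total_error(first))

trained = weights
for _ in range(10_000):
    trained = backprop_step(trained)
print("error after 10,000 steps:", total_error(trained))
print("outputs:", forward(trained)[2:])
```

Running the loop longer, or also updating the biases, would change the final numbers; the sketch is meant to mirror the hand calculation rather than to be an efficient training routine.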

Common performance issues in neural networks include overfitting (where the model learns the training data too well and performs poorly on new data), underfitting (when the model is too simple to capture complex patterns in the data), vanishing/exploding gradients (problems with gradient updates during training due to very large or very small values), slow convergence (taking a long time to reach optimal performance), data imbalance (uneven distribution of classes in the data), and high computational cost (requiring significant resources to train large models).

Overfitting:
● Symptoms: High accuracy on training data, low accuracy on test data.
● Causes: Complex model, insufficient training data, lack of regularization techniques.
● Mitigation: Data augmentation, regularization techniques (L1/L2), dropout layers.

Underfitting:
● Symptoms: Low accuracy on both training and test data.
● Causes: Too simple model, not enough training data.
● Mitigation: Increase model complexity, add more layers or neurons.

Vanishing/Exploding Gradients:
● Symptoms: Gradients becoming very small or very large during backpropagation, hindering learning.
● Causes: Deep network architecture, inappropriate activation functions.
● Mitigation: Use activation functions like ReLU, gradient clipping techniques.
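As one example of the mitigations listed above, gradient clipping can be sketched as rescaling the gradient vector whenever its length exceeds a threshold. Clipping by the L2 norm is one common variant and is assumed here; the function name and the threshold are illustrative.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)                 # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]      # direction kept, length reduced


print(clip_by_norm([0.3, 0.4], max_norm=1.0))     # norm 0.5: unchanged
print(clip_by_norm([30.0, 40.0], max_norm=1.0))   # norm 50: scaled down to norm 1
```

Clipping guards against exploding gradients only; vanishing gradients are addressed by the other mitigations, such as the choice of activation function.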

Data Imbalance:
● Symptoms: Model performs poorly on minority classes
● Causes: Uneven distribution of classes in the dataset
● Mitigation: Oversampling, undersampling, cost-sensitive learning
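Random oversampling, the first mitigation listed above, can be sketched as duplicating minority-class samples until every class is as large as the biggest one. The function name, the seed parameter, and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data set: six samples of class "a", two of class "b".
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = ["a", "a", "a", "a", "a", "a", "b", "b"]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies of training data into the test set.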

Other potential performance issues:

● Label noise: Incorrect labels in the training data
● Poor data preprocessing: Not properly cleaning or normalizing data
● Hyperparameter tuning issues: Not finding the best settings for learning rate, batch size, etc.
● Computational resource constraints: Limited hardware to train large models
