Unit 5 Machine Learning
Introduction to Machine Learning
The term Machine Learning was coined in 1959 by Arthur Samuel, an
American pioneer in the field of computer gaming and artificial intelligence,
who described it as the field that “gives computers the ability to learn without
being explicitly programmed”.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational
definition: “A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if its performance on
T, as measured by P, improves with experience E.”
Machine Learning is the latest buzzword floating around. It deserves the
attention, as it is one of the most interesting subfields of Computer Science.
Machine Learning is a subset of AI that uses data to solve tasks. The solvers
are models trained on data: they learn from the information provided to them.
The methods used to train them draw on probability theory and linear algebra.
ML algorithms use our data to learn and automatically solve predictive tasks.
The goal of machine learning generally is to understand the structure of data
and fit that data into models that can be understood and utilized by people.
Although machine learning is a field within computer science, it differs from
traditional computational approaches. In traditional computing, algorithms are
sets of explicitly programmed instructions used by computers to calculate or
solve problems. Machine learning algorithms instead allow computers to
train on data inputs and use statistical analysis to output values that fall
within a specific range. Because of this, machine learning helps computers
build models from sample data in order to automate decision-making
processes based on data inputs.
There are many examples of ML in use:
Prediction: Machine learning can be used in prediction systems. For
example, to compute the probability that a loan will default, the system
needs to classify the available data into groups.
Image recognition: Machine learning can be used for face detection in an
image. There is a separate category for each person in a database of
several people.
Speech recognition: This is the translation of spoken words into text. It
is used in voice searches and more. Voice user interfaces include voice
dialing, call routing, and appliance control. It can also be used for simple
data entry and the preparation of structured documents.
Medical diagnosis: ML models are trained to recognize cancerous tissues.
Financial industry and trading: Companies use ML in fraud investigations
and credit checks.
Machine learning implementations are classified into three major categories,
depending on the nature of the learning “signal” or “response” available to a
learning system, as follows:
1. Supervised learning: The algorithm learns from example data and
associated target responses, which can consist of numeric values or string
labels such as classes or tags, in order to later predict the correct response
when posed with new examples. This approach is similar to human learning
under the supervision of a teacher. The teacher provides good examples for
the student to memorize, and the student then derives general rules from
these specific examples.
2. Unsupervised learning: The algorithm learns from plain examples without
any associated response, leaving it to the algorithm to determine the data
patterns on its own. This type of algorithm tends to restructure the data into
something else, such as new features that may represent a class or a new
series of uncorrelated values. Such algorithms are quite useful in providing
humans with insights into the meaning of data and new useful inputs to
supervised machine learning algorithms.
As a kind of learning, it resembles the methods humans use to figure out
that certain objects or events are from the same class, such as by observing
the degree of similarity between objects. Some recommendation systems
that we find on the web in the form of marketing automation are based on
this type of learning.
3. Reinforcement learning: Here the algorithm is presented with examples
that lack labels, as in unsupervised learning. However, each example can be
accompanied by positive or negative feedback according to the solution the
algorithm proposes. Reinforcement learning is connected to applications for
which the algorithm must make decisions (so the product is prescriptive, not
just descriptive, as in unsupervised learning), and the decisions bear
consequences. In the human world, it is just like learning by trial and error.
Errors help us learn because they carry a penalty (cost, loss of time,
regret, pain, and so on), teaching us that a certain course of action is less
likely to succeed than others. An interesting example of reinforcement
learning occurs when computers learn to play video games by themselves.
In this case, an application presents the algorithm with examples of specific
situations, such as having the gamer stuck in a maze while avoiding an
enemy. The application lets the algorithm know the outcome of the actions it
takes, and learning occurs as it tries to avoid what it discovers to be
dangerous and to pursue survival. The company Google DeepMind has created a
reinforcement learning program that plays old Atari video games. When
watching the video, notice how the program is initially clumsy and unskilled
but steadily improves with training until it becomes a champion.
Statistical-based learning
Statistical learning in Artificial Intelligence is a set of tools for machine
learning that uses statistics and functional analysis. In simple words, statistical
learning means understanding from training data and predicting on unseen data.
Statistical learning is used to build predictive models based on the data.
Statistical learning can be used to build applications for computer vision, text
analytics, voice recognition, etc.
Naïve Bayes Classifier Algorithm
o Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional
training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms. It helps in building fast machine learning
models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object belonging to a class.
o Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
The name Naïve Bayes combines two words, Naïve and Bayes, which can be
described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For
example, if a fruit is identified on the basis of color, shape, and taste,
then a red, spherical, and sweet fruit is recognized as an apple. Each
feature individually contributes to identifying it as an apple, without
depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Weather dataset (Outlook and Play) used in the worked example below:
      Outlook     Play
0     Rainy       Yes
1     Sunny       Yes
2     Overcast    Yes
3     Overcast    Yes
4     Sunny       No
5     Rainy       Yes
6     Sunny       Yes
7     Overcast    Yes
8     Rainy       No
9     Sunny       No
10    Sunny       Yes
11    Rainy       No
12    Overcast    Yes
13    Overcast    Yes
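For reference, the two ideas behind the name can be written compactly. The symbols y (the class) and x_1, ..., x_n (the features) are introduced here only for illustration; they are not used elsewhere in these notes.

```latex
% Bayes' theorem for a class y and a feature vector x = (x_1, ..., x_n)
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}

% The "naive" assumption: features are independent given the class
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
```

With a single feature, as in the weather example that follows, the product has only one term and the classifier reduces to a direct application of Bayes' theorem.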
Working of Naïve Bayes' Classifier:
The working of the Naïve Bayes classifier can be understood with the help of
the example below:
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should
play or not on a particular day according to the weather conditions. To solve
this problem, we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first convert the dataset above into a frequency table.
Frequency table for the weather conditions:
Weather     Yes    No
Overcast    5      0
Rainy       2      2
Sunny       3      2
Total       10     4
Likelihood table for the weather conditions:
Weather     No             Yes             P(Weather)
Overcast    0              5               5/14 = 0.36
Rainy       2              2               4/14 = 0.29
Sunny       2              3               5/14 = 0.36
All         4/14 = 0.29    10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.36
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = (3/10) * (10/14) / (5/14) = 3/5 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.36
So P(No|Sunny) = (2/4) * (4/14) / (5/14) = 2/5 = 0.40
As the calculation shows, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
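The same calculation can be sketched in a few lines of Python. This is only an illustration: the names DATA and posterior are made up for this example, and exact fractions are used so that rounding of intermediate values does not affect the result.

```python
from collections import Counter
from fractions import Fraction

# The 14 (Outlook, Play) records from the weather dataset above.
DATA = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

def posterior(outlook, label, data=DATA):
    """P(label | outlook) via Bayes' theorem, computed from raw counts."""
    n = len(data)
    label_counts = Counter(play for _, play in data)
    outlook_counts = Counter(o for o, _ in data)
    joint = sum(1 for o, play in data if o == outlook and play == label)
    likelihood = Fraction(joint, label_counts[label])   # P(outlook | label)
    prior = Fraction(label_counts[label], n)            # P(label)
    evidence = Fraction(outlook_counts[outlook], n)     # P(outlook)
    return likelihood * prior / evidence

p_yes = posterior("Sunny", "Yes")
p_no = posterior("Sunny", "No")
print("P(Yes|Sunny) =", p_yes, " P(No|Sunny) =", p_no)
```

Working with exact fractions gives 3/5 and 2/5, which sum to 1 as two complementary posteriors should.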
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting
the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It often performs well in multi-class prediction compared to other
algorithms.
o It is a popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
o Naive Bayes assumes that all features are independent or unrelated, so it
cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is
an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment
analysis.
Genetic algorithm (GA)
A genetic algorithm is a search-based algorithm used for solving optimization
problems in machine learning. This algorithm is important because it can find
good solutions to difficult problems that would otherwise take a very long time
to solve. It has been used in various real-life applications such as data
centres, electronic circuit design, code-breaking, image processing, and
artificial creativity.
The following are some of the basic terminologies that can help us to
understand genetic algorithms:
Population: This is a subset of all the probable solutions that can solve
the given problem.
Chromosomes: A chromosome is one of the solutions in the population.
Gene: This is an element in a chromosome.
Allele: This is the value given to a gene in a specific chromosome.
Fitness function: This is a function that takes a candidate solution as
input and returns a score describing how suitable that solution is.
Genetic operators: In genetic algorithms, the best individuals mate to
produce offspring that are expected to be better than the parents. Genetic
operators are used for changing the genetic composition of this next
generation.
A genetic algorithm (GA) is a heuristic search algorithm used to solve search
and optimization problems. It belongs to the family of evolutionary
algorithms. Genetic algorithms employ the concepts of genetics and natural
selection to provide solutions to problems.
These algorithms are more directed than random search algorithms because
they use information from earlier generations to steer the search towards the
best performing region of the solution space.
GAs are also based on the behaviour of chromosomes and their genetic structure.
Every chromosome plays the role of providing a possible solution. The fitness
function scores the characteristics of every individual within the
population. The greater the fitness value, the better the solution.
Advantages of genetic algorithm
It has excellent parallel capabilities.
It can optimize various problems such as discrete functions, multi-
objective problems, and continuous functions.
It provides answers that improve over time.
A genetic algorithm does not need derivative information.
Genetic algorithms use the evolutionary generational cycle to produce high-
quality solutions. They use various operations that extend or replace the
population to provide solutions with improved fitness.
Genetic algorithms follow the following phases to solve complex optimization
problems:
Initialization
The genetic algorithm starts by generating an initial population. This initial
population is a set of candidate solutions to the given problem. The most
popular technique for initialization is the use of random binary strings.
Fitness assignment
The fitness function helps in establishing the fitness of all individuals in the
population. It assigns a fitness score to every individual, which further
determines the probability of being chosen for reproduction. The higher the
fitness score, the higher the chances of being chosen for reproduction.
Selection
In this phase, individuals are selected for the reproduction of offspring. The
selected individuals are then arranged in pairs for reproduction.
These individuals pass on their genes to the next generation.
The main objective of this phase is to establish the region with high chances of
generating the best solution to the problem (better than the previous generation).
The genetic algorithm uses the fitness proportionate selection technique to
ensure that useful solutions are used for recombination.
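Fitness proportionate selection is often pictured as a roulette wheel in which each individual owns a slice proportional to its fitness. The sketch below is one minimal way to write it, assuming non-negative fitness values; the function name roulette_select and the sample population are hypothetical.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    pick = rng.random() * total          # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness               # right edge of this individual's slice
        if pick < running:
            return individual
    return population[-1]                # guard against floating-point round-off

rng = random.Random(0)                   # fixed seed so the demo is repeatable
population = ["A", "B", "C"]
fitnesses = [1.0, 0.0, 3.0]
counts = {name: 0 for name in population}
for _ in range(1000):
    counts[roulette_select(population, fitnesses, rng)] += 1
print(counts)
```

In this demo "C" should be drawn roughly three times as often as "A", and "B", whose fitness is zero, is never drawn.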
Reproduction
This phase involves the creation of a child population. The algorithm employs
variation operators that are applied to the parent population. The two main
operators in this phase include crossover and mutation.
1. Crossover: This operator swaps the genetic information of two parents to
produce offspring. It is performed on parent pairs that are selected
randomly to generate a child population of the same size as the parent
population.
2. Mutation: This operator adds new genetic information to the new child
population. This is achieved by flipping some bits in the chromosome.
Mutation helps the search escape local optima and enhances
diversification. The following image shows how mutation is done.
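The two operators can be sketched for chromosomes stored as lists of bits. Single-point crossover and per-bit flipping are assumed here as common variants; the function names and the mutation rate are illustrative only.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)   # cut strictly inside the list
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

rng = random.Random(1)
child_a, child_b = single_point_crossover([0] * 8, [1] * 8, rng)
mutated = bit_flip_mutation(child_a, 0.1, rng)
print(child_a, child_b, mutated)
```

Crossover only recombines bits that already exist in the parents, while mutation is the operator that can introduce a bit value missing from the whole population.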
Replacement
Generational replacement takes place in this phase, which is a replacement of
the old population with the new child population. The new population is
expected to have higher fitness scores than the old population, which is an
indication that an improved solution has been generated.
Termination
After replacement has been done, a stopping criterion is used to provide the
basis for termination. The algorithm terminates once a solution reaches the
threshold fitness. It identifies this solution as the best solution in the
population.
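The phases above can be put together in a small sketch. It assumes a toy "OneMax" problem (maximize the number of 1-bits), which is not part of these notes, and every parameter value is an arbitrary choice. Two simplifications are also assumptions: one is added to each fitness so that every selection weight is positive, and the best individual seen so far is remembered across generations.

```python
import random

def one_max(chromosome):
    """Toy fitness function: the number of 1-bits (higher is better)."""
    return sum(chromosome)

def run_ga(n_bits=20, pop_size=30, mutation_rate=0.02, max_generations=200, seed=0):
    rng = random.Random(seed)
    # Initialization: random binary strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(max_generations):
        # Termination: stop once the threshold fitness is reached.
        if one_max(best) == n_bits:
            break
        # Fitness assignment (+1 keeps every selection weight positive).
        weights = [one_max(c) + 1 for c in population]
        children = []
        while len(children) < pop_size:
            # Selection: fitness-proportionate choice of two parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            # Reproduction: single-point crossover, then bit-flip mutation.
            point = rng.randint(1, n_bits - 1)
            child = mother[:point] + father[point:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        # Replacement: the child population replaces the old one.
        population = children
        best = max(population + [best], key=one_max)
    return best

best = run_ga()
print(one_max(best), best)
```

Because the search is stochastic, the fitness reached depends on the seed and the parameters; the sketch makes no promise of finding the optimum.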
Application areas
Genetic algorithms are applied in the following fields:
Transport: Genetic algorithms are used in the traveling salesman
problem to develop transport plans that reduce the cost of travel and the
time taken. They are also used to develop an efficient way of delivering
products.
DNA Analysis: They are used in DNA analysis to establish the DNA
structure using spectrometric information.
Multimodal Optimization: They are used to provide multiple optimum
solutions in multimodal optimization problems.
Aircraft Design: They are used to develop parametric aircraft designs.
The parameters of the aircraft are modified and upgraded to provide
better designs.
Economics: They are used in economics to describe various models such
as game theory, the cobweb model, asset pricing, and schedule
optimization.
Limitations of genetic algorithms
They are not effective in solving simple problems.
A poor implementation may make the algorithm converge to a
solution that is not optimal.
The quality of the final solution is not guaranteed.
Repeated calculation of fitness values can make some problems
computationally expensive.
Neural Networks
Neural networks reflect the behaviour of the human brain, allowing computer
programs to recognize patterns and solve common problems in the fields of AI,
machine learning, and deep learning.
Neural networks are artificial systems inspired by biological neural
networks. These systems learn to perform tasks by being exposed to various
datasets and examples without any task-specific rules. The idea is that the
system generates identifying characteristics from the data it has been
passed, without being given a pre-programmed understanding of
these datasets.
Neural networks are based on computational models for threshold logic.
Threshold logic is a combination of algorithms and mathematics. Work on
neural networks focuses either on the study of the brain or on the application
of neural networks to artificial intelligence. The work has led to improvements
in finite automata theory.
The idea of ANNs is based on the belief that the working of the human brain
can be imitated by making the right connections, using silicon and wires in
place of living neurons and dendrites.
The human brain is composed of about 86 billion nerve cells called neurons.
Each is connected to thousands of other cells by axons. Stimuli from the
external environment or inputs from sensory organs are accepted by dendrites.
These inputs create electric impulses, which quickly travel through the neural
network. A neuron can then either send the message on to another neuron to
handle the issue or not send it forward.
ANNs are composed of multiple nodes, which imitate the biological neurons of
the human brain. The neurons are connected by links and they interact with each
other. The nodes can take input data and perform simple operations on the data.
The result of these operations is passed to other neurons. The output at each
node is called its activation or node value.
Each link is associated with a weight. ANNs are capable of learning, which
takes place by altering weight values. The following illustration shows a simple
ANN −
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we have to
understand what a neural network consists of. A neural network consists of a
large number of artificial neurons, termed units, arranged in a sequence of
layers. Let us look at the various types of layers available in an artificial
neural network.
Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by
the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
The artificial neural network takes the inputs (Xi), computes their weighted
(Wi) sum, and adds a bias (B). This computation is represented in the form of a
transfer function.
The weighted total is then passed as an input to an activation function to
produce the output. Activation functions decide whether a node should fire or
not. Only the nodes that fire make it to the output layer. Different
activation functions are available, and the choice depends on the sort of task
we are performing.
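Written out, the transfer function and activation described above take the following form. The symbols X_i, W_i and B come from the text; n (the number of inputs), y_in (the weighted total) and f (the activation function) are names assumed here for illustration.

```latex
% Transfer function: weighted sum of the inputs plus the bias
y_{in} = \sum_{i=1}^{n} W_i X_i + B

% Output: the activation function applied to the weighted total
y = f(y_{in})
```

A step function that outputs 1 when y_in exceeds a threshold and 0 otherwise is the simplest example of an activation that decides whether a node "fires".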
Artificial Neural Networks (ANN) and Biological Neural Networks (BNN) -
Difference
Speed
   ANN: Faster in processing information. Response time is in nanoseconds.
   BNN: Slower in processing information. The response time is in
   milliseconds.
Processing
   ANN: Serial processing.
   BNN: Massively parallel processing.
Size & Complexity
   ANN: Less size and complexity. It does not perform complex pattern
   recognition tasks.
   BNN: A highly complex and dense network of interconnected neurons,
   containing neurons of the order of 10^11 with 10^15 interconnections.
Storage
   ANN: Information storage is replaceable, meaning new data replaces old
   data.
   BNN: Information storage is adaptable, meaning new information is added by
   adjusting the interconnection strengths without destroying old
   information.
Fault tolerance
   ANN: Fault intolerant. Corrupted information cannot be retrieved if the
   system fails.
   BNN: Fault tolerant. Information is spread across the interconnections, so
   it can still be retrieved after a partial failure.
Control Mechanism
   ANN: There is a control unit for controlling computing activities.
   BNN: No specific control mechanism external to the computing task.
Mathematical model of an artificial neural network (ANN).
The following diagram represents the general model of ANN followed by its
processing.
For the above general model of artificial neural network, the net input can be
calculated as follows:
$y_{in} = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_m w_m$
i.e., net input $y_{in} = \sum_{i=1}^{m} x_i w_i$.
The output can be calculated by applying the activation function over the net
input.
$Y = F(y_{in})$
Output = function (net input calculated)
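The two formulas translate directly into code. In this sketch the sigmoid is assumed as the activation function F purely for illustration, and the input and weight values are arbitrary.

```python
import math

def net_input(inputs, weights):
    """Net input: x1*w1 + x2*w2 + ... + xm*wm."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))

def sigmoid(value):
    """One common choice for the activation function F."""
    return 1.0 / (1.0 + math.exp(-value))

def neuron_output(inputs, weights, activation=sigmoid):
    """Output Y = F(net input)."""
    return activation(net_input(inputs, weights))

y_in = net_input([1.0, 0.5, -1.0], [0.4, 0.2, 0.5])
y = neuron_output([1.0, 0.5, -1.0], [0.4, 0.2, 0.5])
print(y_in, y)
```

Passing a different function as the activation argument changes F without touching the net-input calculation.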
Types of ANN
There are many types of neural networks, some available now and some still in
the development stage. They can be classified depending on their structure,
data flow, the neurons used and their density, the layers and their depth,
activation filters, etc.
Feed Forward Neural Networks
This neural network is one of the simplest forms of ANN, where the data or the
input travels in one direction. The data passes through the input nodes and
exits at the output nodes. This neural network may or may not have hidden
layers. In simple words, the signal propagates forward only, with no feedback
path, usually through a classifying activation function.
In the figure below is a single-layer feed-forward network. Here, the sum of
the products of inputs and weights is calculated and fed to the output. If this
sum is above a certain value, the threshold (usually 0), the neuron fires with
an activated output (usually 1); if it does not fire, the deactivated value
(usually -1) is emitted.
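A single-layer feed-forward pass with this firing rule can be sketched as follows. The weights are arbitrary illustrative values, and the function names are made up for this example.

```python
def threshold_unit(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum exceeds the threshold, otherwise -1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

def single_layer_forward(inputs, weight_rows, threshold=0.0):
    """One output per row of weights; data flows from input to output only."""
    return [threshold_unit(inputs, row, threshold) for row in weight_rows]

# Two inputs feeding two output neurons.
outputs = single_layer_forward([1.0, -2.0], [[0.5, 0.1], [-1.0, 0.3]])
print(outputs)
```

The first neuron's weighted sum is positive, so it fires; the second neuron's sum is negative, so it emits the deactivated value.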
Recurrent Neural Networks
A recurrent network is a feedback network: it has feedback paths, which means
the signal can flow in both directions using loops. This makes it a non-linear
dynamic system, which changes continuously until it reaches a state of
equilibrium.
Recurrent networks are feedback networks with closed loops.
Following are two types of recurrent networks.
Fully recurrent network − It is the simplest neural network architecture
because all nodes are connected to all other nodes and each node works
as both input and output.
Jordan network − It is a closed loop network in which the output will
go to the input again as feedback as shown in the following diagram.
Applications of Recurrent Neural Networks
Text processing such as auto-suggest, grammar checks, etc.
Text to speech processing
Image tagger
Sentiment Analysis
Translation
A recurrent neural network is designed to save the output of a layer and feed
it back to the input to help predict the outcome of the layer. The first layer
is typically a feed-forward layer, followed by a recurrent layer in which some
information from the previous time step is remembered by a memory function.
Forward propagation is applied in this case, and the network stores the
information it needs for future use. If the prediction is wrong, the learning
rate is used to make small changes during back propagation, so that the
network gradually moves towards the right prediction.
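The "memory" can be illustrated with a single recurrent unit whose new state depends on both the current input and its previous state. The tanh activation, the scalar weights w_x and w_h, and the input sequence are assumptions made for this sketch; training by back propagation is not shown.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, bias):
    """One time step: the new state mixes the current input with the old state."""
    return math.tanh(w_x * x_t + w_h * h_prev + bias)

def run_rnn(sequence, w_x=0.5, w_h=0.8, bias=0.0, h0=0.0):
    """Feed a sequence through the recurrent unit and return every hidden state."""
    states = []
    h = h0
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, bias)   # output is fed back as h_prev
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Only the first input is non-zero, yet the later states stay above zero: the unit still carries a fading trace of what it saw earlier.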
Advantages of Recurrent Neural Networks
1. They can model sequential data, where each sample can be assumed to
depend on earlier ones.
2. They can be used with convolution layers to extend the effective pixel
neighbourhood.
Disadvantages of Recurrent Neural Networks
1. Gradient vanishing and exploding problems
2. Training recurrent neural nets could be a difficult task
3. Difficult to process long sequential data using ReLU as an activation
function.
Single Layer ANN
A single-layer neural network represents the simplest form of neural
network, in which there is only one layer of input nodes that send weighted
inputs to a subsequent layer of receiving nodes, or in some cases, one receiving
node. This single-layer design was part of the foundation for systems which
have now become much more complex.
One of the early examples of a single-layer neural network was called a
“perceptron.” The perceptron computes a function of its inputs and is modelled
on single neurons in the physiology of the human brain. In some senses,
perceptron models are much like “logic gates” fulfilling individual functions: a
perceptron will either send a signal, or not, based on the weighted inputs.
Another type of single-layer neural network is the single-layer binary linear
classifier, which can isolate inputs into one of two categories.
Single-layer neural networks can also be thought of as part of a class of feed
forward neural networks, where information only travels in one direction,
through the inputs, to the output. Again, this defines these simple networks in
contrast to immensely more complicated systems, such as those that use back
propagation or gradient descent to function.
Multi-Layer ANN
A multi-layer neural network contains more than one layer of artificial neurons
or nodes. They differ widely in design. It is important to note that while single-
layer neural networks were useful early in the evolution of AI, the vast majority
of networks used today have a multi-layer model.
Multi-layer neural networks can be set up in numerous ways. Typically, they
have at least one input layer, which sends weighted inputs to a series of hidden
layers, and an output layer at the end. These more sophisticated setups are also
associated with nonlinear builds using sigmoid and other functions to direct the
firing or activation of artificial neurons. While some of these systems may be
built physically, with physical materials, most are created with software
functions that model neural activity.
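A forward pass through a small multi-layer network can be sketched as below. The network has two inputs, one hidden layer of two sigmoid neurons and one output neuron. The weights are hand-picked for illustration (not trained) so that the network computes XOR, a function a single layer cannot represent.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def dense_layer(inputs, weights, biases):
    """Each neuron: sigmoid(weighted sum of inputs + bias)."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked (not trained) weights, chosen only for illustration.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]   # roughly OR and NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                   # roughly AND of the two hidden units
OUTPUT_B = [-30.0]

def xor_network(x1, x2):
    hidden = dense_layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return dense_layer(hidden, OUTPUT_W, OUTPUT_B)[0]

results = {(a, b): round(xor_network(a, b)) for a in (0, 1) for b in (0, 1)}
print(results)
```

The hidden layer is what makes this possible: each hidden neuron draws one straight boundary, and the output neuron combines the two.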
Convolutional neural networks (CNNs), so useful for image processing and
computer vision, as well as recurrent neural networks, deep networks and deep
belief systems are all examples of multi-layer neural networks. CNNs, for
example, can have dozens of layers that work sequentially on an image. All of
this is central to understanding how modern neural networks function.
Learning by Training ANN
Learning, in artificial neural network, is the method of modifying the weights
of connections between the neurons of a specified network. Learning in ANN
can be classified into three categories namely supervised learning,
unsupervised learning, and reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a
teacher. This learning process is dependent on the teacher.
During the training of ANN under supervised learning, the input vector is
presented to the network, which will give an output vector. This output vector
is compared with the desired output vector. An error signal is generated, if
there is a difference between the actual output and the desired output vector.
On the basis of this error signal, the weights are adjusted until the actual output
is matched with the desired output.
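The error-driven weight adjustment described above can be sketched with the classic perceptron learning rule, used here as one simple instance of supervised training. The AND truth table, the learning rate and the epoch limit are illustrative choices.

```python
def predict(weights, bias, inputs):
    """Actual output of the network for one input vector."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Adjust weights from the error signal until every output matches its target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)   # error signal
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:   # actual output now matches the desired output
            break
    return weights, bias

# Desired outputs (the "teacher") for the logical AND of two inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
print(weights, bias)
```

This rule is only guaranteed to settle when the classes can be separated by a straight line, which is the case for AND.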
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of
similar type are combined to form clusters. When a new input pattern is
applied, then the neural network gives an output response indicating the class
to which the input pattern belongs.
There is no feedback from the environment as to what the desired output should
be or whether it is correct or incorrect. Hence, in this type of learning, the
network itself must discover the patterns and features in the input data, and
the relation between the input data and the output.
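Cluster formation without a teacher can be illustrated with a one-dimensional k-means procedure. It is used here as a simple stand-in for clustering in general, not as a neural network; the data points and starting centres are made up.

```python
def assign(point, centers):
    """Index of the nearest cluster centre."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

def k_means_1d(points, centers, iterations=10):
    """Group unlabeled numbers around the given starting centres."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[assign(p, centers)].append(p)
        # Move each centre to the mean of its members (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]      # no labels are supplied
centers = k_means_1d(points, centers=[0.0, 6.0])
new_point_cluster = assign(4.9, centers)     # class of a new input pattern
print(centers, new_point_cluster)
```

No desired output is ever given; the two groups emerge from the similarity between the points alone.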
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen
the network based on critic information. This learning process is similar to
supervised learning; however, much less information is available.
During the training of a network under reinforcement learning, the network
receives some feedback from the environment. This makes it somewhat similar
to supervised learning. However, the feedback obtained here is evaluative, not
instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network adjusts the weights to get
better critic information in future.
Supervised vs Unsupervised Learning:
Neural networks often learn via supervised learning. Supervised machine
learning involves an input variable x and an output variable y. The algorithm
learns from a training dataset: it iteratively makes predictions on the data
and is corrected against the known answers. The learning stops when the
algorithm reaches an acceptable level of performance.
Unsupervised machine learning has input data X and no corresponding output
variables. The goal is to model the underlying structure of the data in order
to understand more about the data. The keywords for supervised machine
learning are classification and regression. For unsupervised machine learning,
the keywords are clustering and association.
Applications of Neural Networks
Neural networks are used in several key sectors, including finance, healthcare,
and automotive. Because these artificial neurons function in a way similar to
the human brain, they can be used for image recognition, character recognition
and stock market predictions. The diverse applications of neural networks are
as follows:
1. Facial Recognition
Facial recognition systems serve as robust systems of surveillance. A
recognition system matches a human face against stored digital images. Such
systems are used in offices for selective entry: the system authenticates a
human face and matches it with the list of IDs present in its database.
Convolutional Neural Networks (CNN) are used for facial recognition and
image processing. A large number of pictures is fed into the database for
training a neural network. The collected images are further processed for
training.
Sampling layers in the CNN are used for proper evaluation, and models are
optimized for accurate recognition results.
2. Stock Market Prediction
Investments are subject to market risks, and it is very difficult to predict
upcoming changes in the highly volatile stock market. Neural networks are one
of the tools now used to model its ever-changing bullish and bearish phases.
For stock prediction in real time, a Multilayer Perceptron (MLP) can be
employed. An MLP comprises multiple layers of nodes, and each layer is fully
connected to the nodes of the succeeding layer. The stock's past performance,
annual returns, and non profit ratios are considered for building the MLP model.
3. Social Media
No matter how cliche it may sound, social media has altered the normal course
of life. Artificial Neural Networks are used to study the behaviour of
social media users. Data shared every day via virtual conversations is collected
and analysed for competitive analysis.
Neural networks model the behaviour of social media users. After analysing
individuals' behaviour on social media networks, the data can be linked to
people's spending habits. A Multilayer Perceptron ANN is used to mine data
from social media applications.
An MLP can forecast social media trends. Its predictions are evaluated with
error measures such as Mean Absolute Error (MAE), Root Mean Squared Error
(RMSE), and Mean Squared Error (MSE). The MLP takes into consideration several
factors, such as a user's favourite Instagram pages, bookmarked choices, etc.
These factors are considered as inputs for training the MLP model.
In the ever-changing dynamics of social media applications, artificial neural
networks can be a good fit for user data analysis.
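The three error measures named above are straightforward to compute. The sketch below uses made-up actual and predicted values purely to show the arithmetic.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, mse, rmse)
```

Squaring makes MSE and RMSE penalize the single large error (2.0) more heavily than MAE does.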
4. Aerospace
Aerospace Engineering is an expansive term that covers developments in
spacecraft and aircraft. Fault diagnosis, high performance auto piloting,
securing the aircraft control systems, and modelling key dynamic simulations
are some of the key areas where neural networks are applied. Time delay neural
networks can be employed for modelling non-linear time dynamic systems.
Time delay neural networks are used for position-independent feature
recognition. An algorithm built on time delay neural networks can
recognize patterns. (The pattern recognizers are built automatically by the
network by copying the original data from feature units.)
Time delay neural networks are also used to provide stronger dynamics to the
NN models. As passenger safety is of utmost importance inside an aircraft,
algorithms built using neural network systems help ensure accuracy in the
autopilot system. As most of the autopilot functions are automated, it is
important to ensure that security is maximized.
5. Defence
Defence is the backbone of every country. Every country’s state in the
international domain is assessed by its military operations. Neural Networks
also shape the defence operations of technologically advanced countries. The
United States of America, Britain, and Japan are some countries that use
artificial neural networks for developing an active defence strategy.
Neural networks are used in logistics, armed attack analysis, and for object
location. They are also used in air patrols, maritime patrol, and for controlling
automated drones. The defence sector is getting the much needed kick of
artificial intelligence to scale up its technologies.
Convolutional Neural Networks (CNN) are employed for determining the
presence of underwater mines. Underwater mines are explosive devices placed in
water to damage or destroy ships and submarines. Autonomous vehicles such as
the Unmanned Airborne Vehicle (UAV) and the Unmanned Undersea Vehicle (UUV)
use convolutional neural networks for image processing.
Convolutional layers form the basis of Convolutional Neural Networks. These
layers use different filters to extract the features that differentiate between
images. Layers also have bigger filters that filter channels for image extraction.
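As a rough illustration of what a single filter in a convolutional layer does, the sketch below slides a small filter over a tiny image and records the weighted sum at each position. The image, the filter values, and the function name convolve2d are assumptions chosen for the example. Like most CNN libraries, it does not flip the filter (strictly speaking, this is cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the filter.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 3x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The filter responds only where the brightness changes, which is how different filters pick out different image features.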
6. Healthcare
The age old saying goes like “Health is Wealth”. Modern day individuals are
leveraging the advantages of technology in the healthcare sector. Convolutional
Neural Networks are actively employed in the healthcare industry for X ray
detection, CT Scan and ultrasound.
As CNN is used in image processing, the medical imaging data retrieved from
aforementioned tests is analyzed and assessed based on neural network
models. Recurrent Neural Network (RNN) is also being employed for the
development of voice recognition systems.
Voice recognition systems are used these days to keep track of the patient’s
data. Researchers are also employing generative neural networks for drug
discovery. Matching different categories of drugs is a hefty task, and
generative neural networks help break it down. They can
be used for combining different elements, which forms the basis of drug
discovery.
7. Signature Verification and Handwriting Analysis
Signature verification, as the self-explanatory term suggests, is used for verifying
an individual’s signature. Banks, and other financial institutions use signature
verification to cross check the identity of an individual.
Usually a signature verification software is used to examine the signatures. As
cases of forgery are pretty common in financial institutions, signature
verification is an important factor that seeks to closely examine the authenticity
of signed documents.
Artificial Neural Networks are used for verifying the signatures. ANN are
trained to recognize the difference between real and forged signatures. ANNs
can be used for the verification of both offline and online signatures.
For training an ANN model, varied datasets are fed to the model. The data
thus fed helps the ANN model to differentiate between real and forged
signatures. The ANN model employs image processing for extraction of features.
Handwriting analysis plays an integral role in forensics. The analysis is further
used to evaluate the variations in two handwritten documents. The process of
spilling words on a blank sheet is also used for behavioural
analysis. Convolutional Neural Networks (CNN) are used for handwriting
analysis and handwriting verification.
8. Weather Forecasting
Forecasts made by meteorological departments were often less accurate
before artificial intelligence came into use. Weather forecasting is primarily
undertaken to anticipate the upcoming weather conditions beforehand. In the
modern era, weather forecasts are even used to predict the possibilities of
natural disasters.
Multilayer Perceptron (MLP), Convolutional Neural Network (CNN) and
Recurrent Neural Network (RNN) models are used for weather forecasting.
Traditional ANN multilayer models can also be used to predict climatic
conditions 15 days in advance. A combination of different types of neural
network architecture can be used to predict air temperatures.
Various inputs like air temperature, relative humidity, wind speed and solar
radiation are considered for training neural network based
models. Combination models (MLP+CNN), (CNN+RNN) usually work
better in the case of weather forecasting.
Hebbian Learning
Donald Hebb introduced the neuroscientific concept of Hebbian
learning in his 1949 publication The Organization of Behaviour. Also known
as Hebb’s Rule or Cell Assembly Theory, Hebbian Learning attempts to
connect the psychological and neurological underpinnings of learning.
The basis of the theory is when our brains learn something new, neurons are
activated and connected with other neurons, forming a neural network. These
connections start off weak, but each time the stimulus is repeated, the
connections grow stronger and stronger, and the action becomes more intuitive.
A good example is the act of learning to drive. When we start out, everything
we do is incredibly deliberate. We remind ourselves to turn on our indicator, to
check our blind spot, and so on. However, after years of experience, these
processes become so automatic that we perform them without even thinking.
According to Hebb’s rule, the weights increase in proportion to
the product of input and output. It means that in a Hebb network, if two neurons
are interconnected, then the weights associated with these neurons can be
increased by changes in the synaptic gap. This network is suitable for bipolar
data. The Hebbian learning rule is generally applied to logic gates.
The weights are updated as:
w(new) = w(old) + x*y
Training Algorithm for Hebbian Learning Rule
The training steps of the algorithm are as follows:
1) Initially, the weights are set to zero, i.e. w(i) = 0 for all inputs i = 1 to n,
where n is the total number of input neurons.
2) Let s be the input training vector. The activation function for inputs is
generally set as an identity function, so x(i) = s(i).
3) The activation for the output is set to the target, y = t.
4) The weights and bias are adjusted to:
w(i)(new) = w(i)(old) + x(i)*y
b(new) = b(old) + y
5) Steps 2 to 4 are repeated for each input vector and target output.
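The training steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the weight update w(new) = w(old) + x*y given earlier and a bias that is updated by adding y (which is what the bias column of the worked AND example in this section shows). The function name hebb_train and the sample layout are assumptions.

```python
def hebb_train(samples):
    """Train a single-output Hebb net.

    `samples` is a list of (inputs, target) pairs with bipolar values (+1 / -1).
    """
    n = len(samples[0][0])
    weights = [0] * n          # all weights start at zero
    bias = 0                   # bias starts at zero
    for inputs, target in samples:
        x = list(inputs)       # identity activation on the inputs: x(i) = s(i)
        y = target             # output activation is set to the target
        for i in range(n):
            weights[i] += x[i] * y   # w(new) = w(old) + x*y
        bias += y                    # assumed bias update: b(new) = b(old) + y
    return weights, bias

# Bipolar AND function, as used in the example of this section.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias = hebb_train(and_samples)
print(weights, bias)  # [2, 2] -2
```

Each training pair is presented exactly once, so the rule needs a single pass over the data.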
Example of Hebbian Learning Rule
Let us implement logical AND function with bipolar inputs using Hebbian
Learning
Input Input Bias Target
X1 X2 b y
1 1 1 1
1 -1 1 -1
-1 1 1 -1
-1 -1 1 -1
X1 and X2 are inputs, b is the bias taken as 1, the target value is the output of
logical AND operation over inputs.
1) Initially, the weights are set to zero and bias is also set as zero.
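w1 = w2 = b = 0
2) First input vector is taken as [x1 x2 b] = [1 1 1] and target value is 1.
The new weights will be:
w1(new) = w1(old) + x1*y = 0 + 1*1 = 1
w2(new) = w2(old) + x2*y = 0 + 1*1 = 1
b(new) = b(old) + y = 0 + 1 = 1
3) The above weights are the final new weights. When the second input is
passed, these become the initial weights.
4) Take the second input = [1 -1 1]. The target is -1.
w1(new) = 1 + 1*(-1) = 0
w2(new) = 1 + (-1)*(-1) = 2
b(new) = 1 + (-1) = 0
5) Similarly, the other inputs and weights are calculated.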
The table below shows all the inputs:
Inputs    Bias  Target Output  Weight Changes  Bias Changes  New Weights
X1   X2   b     y              ∆w1   ∆w2       ∆b            W1   W2   b
1    1    1     1              1     1         1             1    1    1
1    -1   1     -1             -1    1         -1            0    2    0
-1   1    1     -1             1     -1        -1            1    1    -1
-1   -1   1     -1             1     1         -1            2    2    -2
Hebb Net for AND Function
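The rows of the table above can be re-derived step by step, and the final weights can be checked against the AND targets. This sketch assumes a decision rule that the text does not state explicitly: the output is taken as +1 when the net input is positive and -1 otherwise. The names hebb_trace and classify are illustrative.

```python
def hebb_trace(samples):
    """Return one (changes, new_weights) pair per training sample."""
    w1, w2, b = 0, 0, 0
    rows = []
    for (x1, x2), y in samples:
        changes = (x1 * y, x2 * y, y)      # (delta w1, delta w2, delta b)
        w1, w2, b = w1 + changes[0], w2 + changes[1], b + changes[2]
        rows.append((changes, (w1, w2, b)))
    return rows

def classify(x1, x2, w1, w2, b):
    # Assumed decision rule: sign of the net input, with threshold 0.
    net = b + x1 * w1 + x2 * w2
    return 1 if net > 0 else -1

and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
rows = hebb_trace(and_samples)
for changes, new_weights in rows:
    print(changes, new_weights)

w1, w2, b = rows[-1][1]                    # final weights: 2, 2, -2
predictions = [classify(x1, x2, w1, w2, b) for (x1, x2), _ in and_samples]
print(predictions)  # [1, -1, -1, -1]
```

With the final weights, only the input (1, 1) gives a positive net input, so the network reproduces the AND function.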
Perceptron Learning Algorithm
Perceptron networks are single-layer feed-forward networks. These are also
called Single Perceptron Networks. The Perceptron consists of an input layer, a
hidden layer, and an output layer.
The input layer is connected to the hidden layer through weights, which may
be -1, +1 or 0. The activation function used is a binary step function for the
input layer and the hidden layer.
The output is
y = f(yin)
where yin is the net input. The activation function, with threshold θ, is:
f(yin) = 1 if yin > θ
f(yin) = 0 if -θ <= yin <= θ
f(yin) = -1 if yin < -θ
The weight updating takes place between the hidden layer and the output layer
to match the target output. The error is calculated based on the actual output and
the desired output.
If the output matches the target then no weight updating takes place. The
weights are initially set to 0 or 1 and adjusted successively till an optimal
solution is found.
The weights in the network can be set to any values initially. If the training
patterns are linearly separable, Perceptron learning will converge to a weight
vector that gives the correct output for all input training patterns, and this
learning happens in a finite number of steps.
The Perceptron rule can be used for both binary and bipolar inputs.
Learning Rule for Single Output Perceptron
1) Let there be n training input vectors x(n), each associated with a
target value t(n).
2) Initialize the weights and bias. Set them to zero for easy calculation.
3) Let the learning rate α be 1.
4) The input layer has identity activation function so x(i) = s(i).
5) To calculate the output of the network, compute the net input:
yin = b + x(1)*w(1) + x(2)*w(2) + ... + x(n)*w(n)
6) The activation function is applied over the net input to obtain an output:
y = f(yin)
7) Now based on the output, compare the desired target value (t) and the actual
output. If y is not equal to t, adjust the weights and bias:
w(i)(new) = w(i)(old) + α*t*x(i)
b(new) = b(old) + α*t
Otherwise the weights and bias are left unchanged.
8) Continue the iteration until there is no weight change. Stop once this
condition is achieved.
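The learning rule for a single output can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes the usual perceptron update (add learning_rate * target * input to each weight, and learning_rate * target to the bias, whenever the output differs from the target), which is consistent with the worked AND example in this section. It also assumes a three-valued step activation with a threshold. The max_epochs cap is an added safety assumption so that the loop always terminates.

```python
def activation(net, threshold=0):
    # Three-valued step function: 1 above the threshold, -1 below -threshold, else 0.
    if net > threshold:
        return 1
    if net < -threshold:
        return -1
    return 0

def train_perceptron(samples, learning_rate=1, threshold=0, max_epochs=100):
    """Return (weights, bias, epochs_used) for a single-output perceptron."""
    n = len(samples[0][0])
    weights = [0] * n
    bias = 0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for inputs, target in samples:
            net = bias + sum(x * w for x, w in zip(inputs, weights))
            output = activation(net, threshold)
            if output != target:
                # Update only when the output does not match the target.
                for i in range(n):
                    weights[i] += learning_rate * target * inputs[i]
                bias += learning_rate * target
                changed = True
        if not changed:
            return weights, bias, epoch   # a full epoch with no weight change
    return weights, bias, max_epochs

and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias, epochs_used = train_perceptron(and_samples)
print(weights, bias, epochs_used)  # [1, 1] -1 2
```

For the bipolar AND data, the weights stop changing during the second epoch, which is the stopping condition of step 8.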
Learning Rule for Multiple Output Perceptron
1) Let there be n training input vectors x(n), each associated with a
target vector t(n).
2) Initialize the weights and bias. Set them to zero for easy calculation.
3) Let the learning rate α be 1.
4) The input layer has identity activation function so x(i) = s(i).
5) To calculate the output of each output unit from j = 1 to m, the net input is:
yin(j) = b(j) + x(1)*w(1,j) + x(2)*w(2,j) + ... + x(n)*w(n,j)
6) The activation function is applied over the net input to obtain an output:
y(j) = f(yin(j))
7) Now based on the output, compare the desired target value (t) and the actual
output and make weight adjustments. If y(j) is not equal to t(j):
w(i,j)(new) = w(i,j)(old) + α*t(j)*x(i)
b(j)(new) = b(j)(old) + α*t(j)
Otherwise the weights and bias are left unchanged.
w(i,j) is the weight of the connection link between the ith input and the jth
output neuron, and t(j) is the target output for the output unit j.
8) Continue the iteration until there is no weight change. Stop once this
condition is achieved.
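The multiple-output rule can be sketched in the same way, with one column of weights and one bias per output unit. This is an illustrative sketch under the same assumptions as the single-output case: an update of learning_rate * target * input applied on a mismatch, a three-valued step activation, and a max_epochs safety cap. Training one network on AND and OR together is an example chosen here, not one from the text.

```python
def activation(net, threshold=0):
    if net > threshold:
        return 1
    if net < -threshold:
        return -1
    return 0

def train_multi_output(samples, n_outputs, learning_rate=1, threshold=0, max_epochs=100):
    """weights[i][j] links input i to output unit j; biases[j] belongs to output j."""
    n_inputs = len(samples[0][0])
    weights = [[0] * n_outputs for _ in range(n_inputs)]
    biases = [0] * n_outputs
    for _ in range(max_epochs):
        changed = False
        for inputs, targets in samples:
            for j in range(n_outputs):
                net = biases[j] + sum(inputs[i] * weights[i][j] for i in range(n_inputs))
                if activation(net, threshold) != targets[j]:
                    # Adjust only the weights that feed the mismatching output unit.
                    for i in range(n_inputs):
                        weights[i][j] += learning_rate * targets[j] * inputs[i]
                    biases[j] += learning_rate * targets[j]
                    changed = True
        if not changed:
            break
    return weights, biases

# Each target is (AND, OR) of the two bipolar inputs.
samples = [((1, 1), (1, 1)), ((1, -1), (-1, 1)), ((-1, 1), (-1, 1)), ((-1, -1), (-1, -1))]
weights, biases = train_multi_output(samples, n_outputs=2)
print(weights, biases)  # [[1, 1], [1, 1]] [-1, 1]
```

Each output unit is trained independently of the others, so the AND column ends with the same weights as a single-output perceptron trained on AND alone.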
Example of Perceptron Learning Rule
Implementation of AND function using a Perceptron network for bipolar inputs
and output.
The input pattern will be x1, x2 and bias b. Let the initial weights be 0 and bias
be 0. The threshold is set to zero and the learning rate is 1.
AND Gate
X1 X2 Target
1 1 1
1 -1 -1
-1 1 -1
-1 -1 -1
1) X1=1, X2=1 and target output = 1
w1=w2=wb=0 and x1=x2=b=1, t=1
Net input = yin = wb + x1*w1 + x2*w2 = 0 + 1*0 + 1*0 = 0
As the threshold is zero, the activation function is:
y = 1 if yin > 0
y = 0 if yin = 0
y = -1 if yin < 0
From here we get output = 0. Now check if output (y) = target (t).
y = 0 but t = 1, which means that these are not the same, hence weight updating
takes place:
w1(new) = w1(old) + t*x1 = 0 + 1*1 = 1
w2(new) = w2(old) + t*x2 = 0 + 1*1 = 1
wb(new) = wb(old) + t = 0 + 1 = 1
The new weights are 1, 1, and 1 after the first input vector is presented.
2) X1=1, X2=-1, b=1 and target = -1, w1=1, w2=1, wb=1
Net input = yin = wb + x1*w1 + x2*w2 = 1 + 1*1 + (-1)*1 = 1
The output for net input = 1 will be 1, since yin > 0.
Therefore again, target = -1 does not match the actual output = 1. Weight
updates take place:
w1(new) = 1 + (-1)*1 = 0
w2(new) = 1 + (-1)*(-1) = 2
wb(new) = 1 + (-1) = 0
Now the new weights are w1 = 0, w2 = 2 and wb = 0.
Similarly, by continuing with the next set of inputs, we get the following table:
Input     Bias  Target  Net Input  Calculated Output  Weight Changes   New Weights
X1   X2   b     t       yin        Y                  ∆w1  ∆w2  ∆b     W1   W2   wb
EPOCH 1
1    1    1     1       0          0                  1    1    1      1    1    1
1    -1   1     -1      1          1                  -1   1    -1     0    2    0
-1   1    1     -1      2          1                  1    -1   -1     1    1    -1
-1   -1   1     -1      -3         -1                 0    0    0      1    1    -1
EPOCH 2
1    1    1     1       1          1                  0    0    0      1    1    -1
1    -1   1     -1      -1         -1                 0    0    0      1    1    -1
-1   1    1     -1      -1         -1                 0    0    0      1    1    -1
-1   -1   1     -1      -3         -1                 0    0    0      1    1    -1
An epoch is one full cycle of the input patterns fed to the system. Epochs are
repeated until no weight change is required, at which point the iteration stops.
Back-propagation Learning
Back-propagation (backward propagation) is an important mathematical tool for
improving the accuracy of predictions in data mining and machine learning.
Essentially, back-propagation is an algorithm used to calculate derivatives
quickly.
Artificial neural networks use back-propagation as a learning algorithm to
compute the gradient of the error with respect to the weights, which gradient
descent then uses to update them. Desired outputs are
compared to achieved system outputs, and then the systems are tuned by
adjusting connection weights to narrow the difference between the two as much
as possible. The algorithm gets its name because the weights are updated
backwards, from output towards input.
The difficulty of understanding exactly how changing weights and biases affects
the overall behaviour of an artificial neural network was one factor that held
back wider use of neural networks, arguably until the early
2000s when computers provided the necessary insight. Today, back-propagation
algorithms have practical applications in many areas of artificial intelligence
(AI), including optical character recognition (OCR), natural language
processing (NLP) and image processing.
Because back-propagation requires a known, desired output for each input value
in order to calculate the loss function gradient, it is usually classified as a type
of supervised machine learning. Along with classifiers such as
Naïve Bayesian filters and decision trees, the back-propagation algorithm has
emerged as an important part of machine learning applications that
involve predictive analytics.
Back-propagation is the essence of neural network training. It is the method of
fine-tuning the weights of a neural network based on the error rate obtained in
the previous epoch (i.e., iteration). Proper tuning of the weights allows you to
reduce error rates and make the model reliable by increasing its generalization.
Back-propagation in neural network is a short form for “backward propagation
of errors.” It is a standard method of training artificial neural networks. This
method helps calculate the gradient of a loss function with respect to all the
weights in the network.
How Back-propagation Algorithm Works
The back-propagation algorithm in a neural network computes the gradient of the
loss function with respect to each weight by the chain rule. It efficiently
computes the gradient one layer at a time, unlike a naive direct computation.
It computes the gradient, but it does not define how the gradient is used. It
generalizes the computation in the delta rule.
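The chain-rule computation can be illustrated on the smallest possible network: one input, one hidden neuron and one output neuron. The sketch below computes the gradient layer by layer, moving backwards, and then compares it with a finite-difference estimate. The sigmoid activation, the squared-error loss, and the numeric values are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x, target):
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    return 0.5 * (output - target) ** 2

def gradients(w1, w2, x, target):
    # Forward pass: keep the intermediate activations.
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    # Backward pass: apply the chain rule one layer at a time.
    d_out = (output - target) * output * (1 - output)   # dL/d(net input of output)
    grad_w2 = d_out * hidden                             # dL/dw2
    d_hidden = d_out * w2 * hidden * (1 - hidden)        # dL/d(net input of hidden)
    grad_w1 = d_hidden * x                               # dL/dw1
    return grad_w1, grad_w2

def numerical_gradient(f, value, eps=1e-6):
    # Central finite difference, used only as an independent check.
    return (f(value + eps) - f(value - eps)) / (2 * eps)

w1, w2, x, target = 0.5, -0.3, 1.5, 1.0   # hypothetical values
grad_w1, grad_w2 = gradients(w1, w2, x, target)
check_w1 = numerical_gradient(lambda w: loss(w, w2, x, target), w1)
check_w2 = numerical_gradient(lambda w: loss(w1, w, x, target), w2)
print(grad_w1, check_w1)
print(grad_w2, check_w2)
```

The quantity d_out is computed once and reused for the earlier layer. That reuse is what makes the layer-by-layer computation cheaper than differentiating the loss separately for every weight.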
Consider the following back-propagation neural network example to
understand:
[Figure: How Back-propagation Algorithm Works]
1. Inputs X arrive through the pre-connected path.
2. The input is modelled using real weights W. Usually the weights are
randomly selected.
3. Calculate the output for every neuron from the input layer, to the hidden
layers, to the output layer.
4. Calculate the error in the outputs:
ErrorB = Actual Output - Desired Output
5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
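The steps above can be sketched as a small training loop. This is a minimal illustration under several assumptions that are not in the text: a network with one hidden layer, sigmoid activations, a squared-error loss, plain gradient descent with a fixed learning rate, and the OR function as training data. The names forward and train are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Step 3: compute every neuron's output, from the input layer to the output layer.
    hidden = [sigmoid(b_hidden[j] + sum(x[i] * w_hidden[i][j] for i in range(len(x))))
              for j in range(len(b_hidden))]
    output = sigmoid(b_out + sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, output

def train(samples, n_hidden=2, learning_rate=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 2: weights are randomly selected.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    history = []
    for _ in range(epochs):                      # step 6: keep repeating
        total = 0.0
        for x, target in samples:                # step 1: inputs arrive
            hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
            error = output - target              # step 4: actual - desired
            total += 0.5 * error ** 2
            # Step 5: travel back from the output layer to the hidden layer.
            delta_out = error * output * (1 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= learning_rate * delta_out * hidden[j]
                b_hidden[j] -= learning_rate * delta_hidden[j]
                for i in range(n_in):
                    w_hidden[i][j] -= learning_rate * delta_hidden[j] * x[i]
            b_out -= learning_rate * delta_out
        history.append(total)
    return history

or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
history = train(or_samples)
print(history[0], history[-1])   # the summed error should shrink over the epochs
```

Back-propagation itself only supplies the delta terms. The lines that subtract learning_rate times a gradient are the gradient descent step, which is a separate choice.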
Why Do We Need Back-propagation?
The most prominent advantages of back-propagation are:
Back-propagation is fast, simple and easy to program.
It has no parameters to tune apart from the number of inputs.
It is a flexible method as it does not require prior knowledge about the
network
It is a standard method that generally works well
It does not need any special mention of the features of the function to be
learned.
What is a Feed Forward Network?
A feed-forward neural network is an artificial neural network where the nodes
never form a cycle. This kind of neural network has an input layer, hidden
layers, and an output layer. It is the first and simplest type of artificial neural
network.
Types of Back-propagation Networks
Two Types of Back-propagation Networks are:
Static Back-propagation
Recurrent Back-propagation
Static back-propagation:
It is one kind of back-propagation network which produces a mapping of a
static input for static output. It is useful to solve static classification issues like
optical character recognition.
Recurrent Back-propagation:
Recurrent back-propagation in data mining is fed forward until a fixed value is
achieved. After that, the error is computed and propagated backward.
The main difference between these two methods is that the mapping is rapid
in static back-propagation, while it is non-static in recurrent back-propagation.