Understanding Artificial Neural Networks

Uploaded by

bluecrystal4475
Copyright
© All Rights Reserved

Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are computer systems designed to mimic how the human
brain processes information. Just like the brain uses neurons to process data and make
decisions, ANNs use artificial neurons to analyze data, identify patterns and make predictions.
These networks consist of layers of interconnected neurons that work together to solve
complex problems. The key idea is that ANNs can "learn" from the data they process, just as
our brain learns from experience. They are used in various applications from recognizing
images to making personalized recommendations.

• An artificial neural network is a crude digital simulation of the human brain
• Human brain – approx. 86 billion neurons
• Each neuron is connected to thousands of others
• Parts of a neuron
– Cell body
– Dendrites – receive input signals
– Axon – gives output
• ANN – made up of artificial neurons
– Digitally modeled biological neurons
• Each input into the neuron has its own weight associated with it
• As each input enters the neuron, it is multiplied by its weight.

• For n inputs and n weights, each input is multiplied by its weight and the products are summed:

$z = x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n$
• An activation function (typically non-linear) is then applied to the weighted sum plus a
bias
• The combination of summation and activation function is called a node.
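
The node described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of sigmoid as the activation, and the example input, weight and bias values are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical values chosen only for illustration.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
bias = 0.2

# z = 1.0*0.5 + 2.0*(-0.25) + 3.0*0.1 + 0.2 = 0.5
output = neuron_output(inputs, weights, bias)
print(output)
```

Swapping `sigmoid` for another activation function changes the node's output range but not the weighted-sum step.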
Key Components of an ANN
• Input Layer: This is where the network receives information. For example, in an image
recognition task, the input could be an image.
• Hidden Layers: These layers process the data received from the input layer. Additional
hidden layers generally allow the network to learn more complex patterns, with each hidden
layer transforming the data into a more abstract representation.
• Output Layer: This is where the final decision or prediction is made. For example, after
processing an image, the output layer might decide whether it’s a cat or a dog.

Working of Artificial Neural Networks


ANNs work by learning patterns in data through a process called training. During training, the
network adjusts itself to improve its accuracy by comparing its predictions with the actual
results.
Let's see how the learning process works:
• Input Layer: Data such as an image, text or number is fed into the network through the
input layer.
• Hidden Layers: Each neuron in the hidden layers performs some calculation on the input,
passing the result to the next layer. The data is transformed and abstracted at each layer.
• Output Layer: After passing through all the layers, the network gives its final prediction,
such as classifying an image as a cat or a dog.
The process of backpropagation is used to adjust the weights between neurons. When the
network makes a mistake, the weights are updated to reduce the error and improve the next
prediction.

Training and Testing:

• During training, the network is shown examples like images of cats and learns to recognize
patterns in them.
• After training, the network is tested on new data that it did not see during training to
check its performance. The better the network generalizes from its training examples, the
more accurately it will predict new data.

How do Artificial Neural Networks learn?

• Artificial Neural Networks (ANNs) learn by training on a set of data. For example, to teach
an ANN to recognize a cat, we show it thousands of labeled images. The network processes
these images and learns to identify the features that define a cat.
• During training, the network’s prediction for each image is compared to the actual label
(whether it's a cat or not). If it makes an incorrect prediction, the network adjusts by
fine-tuning the weights of the connections between neurons using a process called
backpropagation. This involves correcting the weights based on the difference between the
predicted and actual result.
• This process repeats until the network can recognize a cat in an image with minimal error.
Once the network has been trained, we test it by providing new images to see if it can
correctly identify cats. Essentially, through repeated training and feedback, the network
becomes better at identifying patterns and making predictions.
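
The predict, compare and adjust loop described above can be sketched on a toy problem. Recognizing cats needs far more data and a larger network, so this example assumes a much simpler stand-in task: a single sigmoid neuron learning the logical AND of two inputs. The function names, learning rate and number of epochs are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(examples, learning_rate=0.5, epochs=2000):
    """Repeatedly predict, compare with the label, and nudge the weights."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            z = sum(x * w for x, w in zip(inputs, weights)) + bias
            prediction = sigmoid(z)
            # For a sigmoid neuron with cross-entropy loss, the gradient with
            # respect to each weight is (prediction - label) * input.
            error = prediction - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


def predict(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy "dataset": the label is 1 only when both inputs are 1.
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_neuron(AND_EXAMPLES)
predictions = [predict(inputs, weights, bias) for inputs, _ in AND_EXAMPLES]
print(predictions)
```

With only one neuron there are no hidden layers to propagate errors through, so this shows the weight-adjustment idea rather than full backpropagation.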

Both dendrites and axons are parts of a neuron (nerve cell), the basic unit of the nervous
system.
They help in sending and receiving messages in the body.

Dendrites:

• Look like: Short, branch-like structures around the cell body.
• Function: They receive signals (messages) from other neurons and carry them toward the
cell body.
• Think of it as: The neuron’s “input wires.”

Axons:

• Look like: A long, single fiber extending from the cell body.
• Function: It carries electrical signals away from the cell body to other neurons, muscles,
or glands.
• Think of it as: The neuron’s “output wire.”

What is an Activation Function?

An activation function is a mathematical function used in artificial neural networks to decide
whether a neuron should be activated (fired) or not.

It takes the input signal (a number) and transforms it into an output that can be used by the next
layer in the network.
Why We Use Activation Functions

| Purpose | Explanation |
|---|---|
| 1. Introduce non-linearity | Real-world data and problems are complex and non-linear. Without activation functions, a neural network would behave like a simple linear model and couldn’t learn complex patterns. |
| 2. Control neuron output | It limits or shapes the output (e.g., between 0 and 1 or -1 and 1), making learning stable. |
| 3. Helps in learning complex tasks | Enables the network to learn relationships like speech, images, and patterns. |
| 4. Allows backpropagation | Activation functions make it possible to calculate gradients, which are needed to train the model. |

Activation functions are important in neural networks because they introduce non-linearity and
help the network to learn complex patterns. Let's see some common activation functions used
in ANNs:

1. Sigmoid Function: Outputs values between 0 and 1. It is used in binary classification tasks
like deciding if an image is a cat or not.

2. ReLU (Rectified Linear Unit): A popular choice for hidden layers, it returns the input if
positive and zero otherwise. It helps to mitigate the vanishing gradient problem.

3. Tanh (Hyperbolic Tangent): Similar to sigmoid but outputs values between -1 and 1. It is
used in hidden layers when a broader, zero-centered range of outputs is needed.

4. Softmax: Converts raw outputs into probabilities that sum to 1. It is used in the final
layer of a network for multi-class classification tasks.

5. Leaky ReLU: A variant of ReLU that returns a small negative value for negative inputs
instead of zero, which helps in preventing “dead neurons” during training.

These functions determine how strongly each neuron activates, which helps the network to
recognize patterns and make predictions.
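
The five functions listed above can be written directly from their definitions. The following is a small sketch using only Python's standard library. The function names and the leaky ReLU slope of 0.01 are assumptions chosen for illustration (the slope is a tunable setting, not a fixed part of the definition).

```python
import math


def sigmoid(z):
    """Output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Return the input if positive, zero otherwise."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope."""
    return z if z > 0 else slope * z


def tanh(z):
    """Output in (-1, 1)."""
    return math.tanh(z)


def softmax(values):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), relu(-2.0), leaky_relu(-2.0), tanh(0.0))
print(softmax([1.0, 2.0, 3.0]))
```

Unlike the other four, softmax works on a whole layer's outputs at once, which is why it takes a list rather than a single number.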

Gradient Descent Algorithm in Machine Learning

Gradient descent is the backbone of the learning process for various algorithms, including
linear regression, logistic regression, support vector machines, and neural networks. It is a
fundamental optimization technique that minimizes the cost function of a model by iteratively
adjusting the model parameters to reduce the difference between predicted and actual values,
improving the model's performance. Let's see its role in machine learning:

1. Training Machine Learning Models


Neural networks are trained using Gradient Descent (or its variants) in combination
with backpropagation. Backpropagation computes the gradients of the loss function with respect
to each parameter (weights and biases) in the network by applying the chain rule. The process
involves:
• Forward Propagation: Computes the output for a given input by passing data through the
layers.
• Backward Propagation: Uses the chain rule to calculate gradients of the loss with respect to
each parameter (weights and biases) across all layers.

Gradients are then used by Gradient Descent to update the parameters layer-by-layer, moving
toward minimizing the loss function.
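
The parameter update described above is commonly written as a single rule. The symbols are assumptions introduced here for illustration: $\theta$ stands for any weight or bias, $L$ for the loss, and $\eta$ for the learning rate (step size).

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
$$

The minus sign moves each parameter against its gradient, which is the direction in which the loss decreases for a sufficiently small $\eta$.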
2. Minimizing the Cost Function

The algorithm minimizes a cost function, which quantifies the error or loss of the model's
predictions compared to the true labels.
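
As one concrete example of a cost function, the sketch below computes the mean squared error between a list of predictions and a list of true values. The function name and the sample numbers are assumptions for illustration. Other cost functions, such as cross-entropy, are also common.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions and true values.
predicted = [2.5, 0.0, 2.0]
actual = [3.0, -0.5, 2.0]

# Squared errors are 0.25, 0.25 and 0.0, so the mean is 0.5 / 3.
cost = mean_squared_error(predicted, actual)
print(cost)
```

A cost of zero would mean every prediction matches its target exactly.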

Different Variants of Gradient Descent


There are several variants of gradient descent that differ in how much of the training data is
used to compute the gradient for each parameter update. Here are some popular variants:
1. Batch Gradient Descent
In batch gradient descent, the entire training dataset is used to compute the gradient and
update the model parameters (such as weights and biases) at each iteration. Because every
update uses the exact gradient of the cost function, the updates are stable, and the method is
effective for convex or relatively smooth error surfaces, where it moves steadily toward an
optimal solution in the direction of the negative gradient. However, processing the entire
training dataset for every update makes it slow for large datasets, resulting in longer
training times and higher computational costs.
2. Stochastic Gradient Descent (SGD)
In SGD, only one training example is used to compute the gradient and update the parameters
at each iteration. This can be faster than batch gradient descent but may lead to more noise in
the updates.

3. Mini-batch Gradient Descent


In mini-batch gradient descent, a small batch of training examples is used to compute the
gradient and update the parameters at each iteration. This can be a good compromise between
batch gradient descent and stochastic gradient descent, as it can be faster than batch gradient
descent and less noisy than stochastic gradient descent.
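
The three variants differ only in how many examples feed each update, so one function with a `batch_size` parameter can sketch all of them. The example below makes several assumptions for illustration: a one-variable linear model y = w*x + b, a mean squared error cost, noise-free synthetic data generated from y = 2x + 1, and hypothetical values for the learning rate, epoch count and random seed.

```python
import random


def fit_line(data, batch_size, learning_rate=0.1, epochs=500, seed=0):
    """Fit y = w*x + b by gradient descent on mean squared error.

    batch_size == len(data) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Gradients of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic data from the line y = 2x + 1 (an assumption for the example).
DATA = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]

batch_w, batch_b = fit_line(DATA, batch_size=len(DATA))  # one update per epoch
sgd_w, sgd_b = fit_line(DATA, batch_size=1)              # 20 updates per epoch
mini_w, mini_b = fit_line(DATA, batch_size=5)            # 4 updates per epoch
print(batch_w, batch_b, sgd_w, sgd_b, mini_w, mini_b)
```

On this clean toy data all three settings recover roughly the same line. On noisy real data, the smaller batch sizes would show more variation between updates.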

Backpropagation in Neural Network

Backpropagation, short for Backward Propagation of Errors, is a key algorithm used to train
neural networks by minimizing the difference between predicted and actual outputs. It works
by propagating errors backward through the network, using the chain rule of calculus to
compute gradients and then iteratively updating the weights and biases. Combined with
optimization techniques like gradient descent, backpropagation enables the model to reduce
loss across epochs and effectively learn complex patterns from data.
Backpropagation plays a critical role in how neural networks improve over time. Here's why:
1. Efficient Weight Update: It computes the gradient of the loss function with respect to
each weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers
and complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated
and the model can adjust itself to optimize its performance.

Working of Back Propagation Algorithm


The backpropagation algorithm involves two main steps: the forward pass and the backward
pass.
1. Forward Pass
In the forward pass, the input data is fed into the input layer. These inputs, combined with
their respective weights, are passed to the hidden layers. For example, in a network with two
hidden layers (h1 and h2), the output from h1 serves as the input to h2. Before applying an
activation function, a bias is added to the weighted inputs.
Each hidden layer computes the weighted sum (`a`) of its inputs, then applies an activation
function like ReLU (Rectified Linear Unit) to obtain the output (`o`). The output of the last
hidden layer is passed to the output layer, where an activation function such as softmax
converts the weighted outputs into probabilities for classification.
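
The forward pass described above can be sketched as follows. To keep the example short it assumes a single hidden layer rather than two, and the function names and all weight and bias values are hypothetical. The variables `a` and `o` follow the notation used in the text.

```python
import math


def relu(z):
    return max(0.0, z)


def softmax(values):
    shift = max(values)  # for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Weighted sum plus bias for every neuron; one weight row per neuron."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    a = dense(inputs, hidden_w, hidden_b)  # weighted sums of the hidden layer
    o = [relu(v) for v in a]               # hidden layer outputs
    scores = dense(o, out_w, out_b)        # weighted sums of the output layer
    return softmax(scores)                 # class probabilities


# Hypothetical network: 2 inputs, 2 hidden neurons, 2 output classes.
hidden_w = [[0.5, -1.0], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
out_w = [[1.0, 0.5], [-1.0, 1.0]]
out_b = [0.0, 0.0]

# a = [-1.5, 2.0], o = [0.0, 2.0], scores = [1.0, 2.0]
probabilities = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(probabilities)
```

Adding a second hidden layer would mean one more `dense` call followed by `relu` before the output layer.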
2. Backward Pass
In the backward pass, the error (the difference between the predicted and actual output) is
propagated back through the network to adjust the weights and biases. One common method
for error calculation is the Mean Squared Error (MSE), given by:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{Predicted Output}_i - \text{Actual Output}_i)^2$

where n is the number of outputs being compared. For a single output this reduces to
(Predicted Output − Actual Output)^2.
Once the error is calculated, the network adjusts weights using gradients, which are computed
with the chain rule. These gradients indicate how much each weight and bias should be
adjusted to minimize the error in the next iteration. The backward pass continues layer by
layer, ensuring that the network learns and improves its performance. The activation function,
through its derivative, plays a crucial role in computing these gradients during
backpropagation.
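
The backward pass can be sketched on a very small network. The example assumes 2 inputs, 2 sigmoid hidden neurons, 1 sigmoid output, and the squared error of a single training example as the loss. All names, starting weights and the learning rate are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, row)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)) + b_out)
    return hidden, output


def loss(x, target, params):
    _, output = forward(x, params)
    return (output - target) ** 2


def backward(x, target, params):
    """Apply the chain rule from the output back to the first layer."""
    _, _, w_out, _ = params
    hidden, output = forward(x, params)
    # d(loss)/d(output sum): loss derivative times sigmoid derivative.
    delta_out = 2 * (output - target) * output * (1 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Pass the error back through each output weight and hidden sigmoid.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def apply_step(params, grads, learning_rate):
    """One gradient descent update of every weight and bias."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = grads
    return (
        [[w - learning_rate * g for w, g in zip(row, g_row)]
         for row, g_row in zip(w_hidden, g_wh)],
        [b - learning_rate * g for b, g in zip(b_hidden, g_bh)],
        [w - learning_rate * g for w, g in zip(w_out, g_wo)],
        b_out - learning_rate * g_bo,
    )


X, TARGET = [1.0, 0.5], 1.0
PARAMS = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.3, -0.1], 0.0)

loss_before = loss(X, TARGET, PARAMS)
new_params = apply_step(PARAMS, backward(X, TARGET, PARAMS), learning_rate=0.5)
loss_after = loss(X, TARGET, new_params)
print(loss_before, loss_after)
```

The `output * (1 - output)` and `h * (1 - h)` factors are the derivative of the sigmoid, which is where the activation function's derivative enters the gradient.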

Types of Artificial Neural Network


1. Feedforward Neural Network (FNN)
Feedforward Neural Networks are one of the simplest types of ANNs. In this network, data
flows in one direction from the input layer to the output layer, passing through one or more
hidden layers. There are no loops or cycles, which means the data doesn’t return to any earlier
layers. This type of network is typically trained with backpropagation and is mainly used for
basic classification and regression tasks.

2. Convolutional Neural Network (CNN)


Convolutional Neural Networks (CNNs) are designed to process data that has a grid-like
structure such as images. They include convolutional layers that apply filters to extract
important features from the data such as edges or textures. This makes CNNs effective in image
and speech recognition as they can identify patterns and structures in complex data.
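
The idea of a filter extracting a feature can be sketched with a tiny hand-written example. It assumes a 4x4 grayscale "image" stored as a list of lists and a hypothetical 2x2 filter that responds to vertical edges. In a real CNN the filter values are learned during training rather than written by hand. The filter is slid over the image without flipping it, as is usual in deep learning.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Dark left half (0) and bright right half (1): one vertical edge in the middle.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Responds where brightness increases from left to right.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, which is where the edge sits.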

3. Radial Basis Function Network (RBFN)


Radial Basis Function Networks use hidden neurons whose output depends on the distance between
the input and a center point, so each neuron responds most strongly to inputs near its center.
These networks consist of two processing layers: one that maps the input to radial basis
functions and another that combines them to produce the output. They are used for
classification and regression tasks, especially for approximating an underlying pattern or
trend in the data.
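
The two layers described above can be sketched as follows. The example assumes a Gaussian as the radial basis function, which is a common choice but not the only one, and uses hypothetical centers, width and output weights. In practice these values would be fitted to data.

```python
import math


def gaussian_rbf(x, center, width):
    """Equals 1 at the center and falls toward 0 as x moves away from it."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2 * width ** 2))


def rbf_network(x, centers, width, out_weights, out_bias):
    # Layer 1: one radial basis function per center.
    hidden = [gaussian_rbf(x, center, width) for center in centers]
    # Layer 2: weighted sum of the hidden responses.
    return sum(h * w for h, w in zip(hidden, out_weights)) + out_bias


# Hypothetical parameters chosen for illustration.
CENTERS = [[0.0, 0.0], [1.0, 1.0]]
WIDTH = 1.0
OUT_WEIGHTS = [1.0, -1.0]
OUT_BIAS = 0.0

# At [0, 0] the first unit gives 1 and the second gives exp(-1).
result = rbf_network([0.0, 0.0], CENTERS, WIDTH, OUT_WEIGHTS, OUT_BIAS)
print(result)
```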

4. Recurrent Neural Network (RNN)


Recurrent Neural Networks are designed to handle sequential data such as time-series or text.
Unlike feedforward networks, RNNs have feedback loops that pass the hidden state from one
time step back into the network at the next time step, giving the network memory. This feature
helps RNNs to make predictions based on the context provided by previous data, making them
well suited for tasks like speech recognition, language modeling and forecasting.
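
The memory provided by the feedback loop can be sketched with a single recurrent neuron. The example assumes scalar inputs, a tanh activation, and hypothetical weight values. Practical RNNs use vectors and learned weight matrices, but the recurrence has the same shape.

```python
import math


def rnn_forward(sequence, w_input, w_hidden, bias, initial_state=0.0):
    """Process a sequence one step at a time, carrying a hidden state."""
    state = initial_state
    states = []
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        state = math.tanh(w_input * x + w_hidden * state + bias)
        states.append(state)
    return states


# A single "pulse" followed by zeros: the state keeps an echo of the pulse.
states = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
print(states)
```

The second and third inputs are both zero, yet the states differ because each one still carries information from the first input.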
