
Feed Forward Neural Network Overview

The document discusses feed forward neural networks and deep learning concepts. It provides details on: 1) The architecture of feed forward neural networks, including input, hidden, and output layers connected in a forward direction without loops. 2) How backpropagation works by calculating gradients to fine-tune weights and reduce errors through multiple iterations. 3) Common loss functions used in neural networks like mean squared error, likelihood, and log loss, and how they evaluate model performance. 4) Gradient descent optimization algorithms and types including batch, stochastic, and mini-batch gradient descent. 5) The importance of the sigmoid activation function in allowing neural networks to learn non-linear and complex problems.

Uploaded by

Mrunal Bhilare

Week – 5 (Deep Learning)

Q. 1) Explain the architecture of Feed Forward Neural Network or Multilayer Perceptron.
(12 marks)

Ans: - Feed Forward Neural Networks, also known as Deep Feed Forward Networks or
Multilayer Perceptrons, are among the most basic deep learning models. For example,
Convolutional Neural Networks (which are used extensively in computer vision
applications) and Recurrent Neural Networks are based on these networks. Search engines,
machine translation, and mobile applications all rely on deep learning technologies.
Deep learning works by simulating the human brain in terms of identifying and creating
patterns from various types of input. A feed forward neural network is a key component
of this technology since it aids software developers with pattern recognition and
classification, non-linear regression, and function approximation.

A feed forward neural network is a type of artificial neural network in which the
connections between nodes do not form a loop. Often referred to as a multilayered
network of neurons, feed forward neural networks are so named because all information
flows in a forward manner only. The data enters the input nodes, travels through the
hidden layers, and eventually exits the output nodes. The network is devoid of links
that would allow the information exiting the output node to be sent back into the
network. The purpose of feed forward neural networks is to approximate functions.

Here’s how it works:

A classifier can be described by the formula y = f*(x), which assigns the input x to
the category y.

The feed forward network defines a mapping y = f(x; θ) and learns the value of the
parameters θ that most closely approximates the function f*.

Fig: - Feed Forward Neural Network


A Feed Forward Neural Network’s Layers:

The following are the components of a feed forward neural network:

Input Layer:

It contains the neurons that receive input. The data is subsequently passed on to the
next layer. The input layer’s total number of neurons is equal to the number of
variables in the dataset.

Hidden Layer:

This is the intermediate layer, which is concealed between the input and output layers.
This layer has a large number of neurons that perform alterations on the inputs. They
then communicate with the output layer.

Output Layer:

It is the last layer, and its form depends on the model’s construction. The output
layer produces the predicted feature, which is compared with the desired outcome that
is known during training.

Neurons weights:

Weights are used to describe the strength of a connection between neurons. A weight is
a real number and may be positive or negative; it is not restricted to the range 0 to 1.
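
The layer-by-layer flow described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the 2-3-1 layer sizes, the weight values, the bias terms, and the names sigmoid, layer_forward, and feed_forward are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one fully connected layer: sigmoid(w . x + b) for each neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs

def feed_forward(x, theta):
    """Pass x through every layer in order; information only moves forward."""
    activation = x
    for weights, biases in theta:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical parameters theta: 2 inputs -> 3 hidden neurons -> 1 output.
theta = [
    ([[0.5, 0.4], [0.3, 0.8], [0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, 0.2, 0.5]], [0.05]),                               # output layer
]

y = feed_forward([1.0, 2.0], theta)
print(y)  # a list holding one value between 0 and 1
```

Here theta plays the role of θ in y = f(x; θ): changing the numbers in theta changes the function the network computes, while the forward-only structure stays the same.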

Q. 2) What is Backpropagation and how does the Backpropagation algorithm work?
(6 marks)

Ans: - Backpropagation is the essence of neural network training. It is the method of
fine-tuning the weights of a neural network based on the error rate obtained in the
previous epoch (i.e., iteration). Proper tuning of the weights allows you to reduce error
rates and make the model reliable by increasing its generalization.

Backpropagation in neural network is a short form for “backpropagation of errors”. It is
a standard method of training artificial neural networks. This method helps to calculate
the gradient of a loss function with respect to all the weights in the network.

The Backpropagation algorithm in neural network computes the gradient of the loss
function with respect to each weight by the chain rule. It efficiently computes the
gradient one layer at a time, unlike a naive direct computation. It computes the
gradient, but it does not define how the gradient is used. It generalizes the
computation in the delta rule.

Consider the following Backpropagation neural network example diagram to understand:

Fig: - Working of Backpropagation Algorithm

1. Inputs X arrive through the preconnected path.
2. Input is modeled using real weights W. The weights are usually randomly
selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers,
to the output layer.
4. Calculate the error in the outputs.

Error = Actual Output - Desired Output

5. Travel back from the output layer to the hidden layer to adjust the weights such
that the error is decreased.

Keep repeating the process until the desired output is achieved.
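
The five steps can be followed in code on a very small network. The sketch below is only an illustration under stated assumptions: a 2-2-1 network with sigmoid neurons and no bias terms, a loss of 0.5 * error², a learning rate of 0.5, fixed starting weights, and a hypothetical function name train_step. None of these choices come from the notes above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, w_out, lr):
    """One forward pass and one backward pass for a 2-2-1 sigmoid network.

    Returns the error (actual output - desired output) measured before
    the update. The weight lists are modified in place.
    """
    # Step 3: forward pass, input layer -> hidden layer -> output layer.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

    # Step 4: error in the output.
    error = output - target

    # Step 5: travel backwards, applying the chain rule layer by layer.
    # With loss = 0.5 * error**2, the gradient at the output neuron's input is:
    delta_out = error * output * (1.0 - output)
    # Hidden deltas are computed with the output weights from before the update.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]

    # Adjust every weight a small step against its gradient.
    for j in range(len(w_out)):
        w_out[j] -= lr * delta_out * hidden[j]
    for j in range(len(w_hidden)):
        for i in range(len(x)):
            w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
    return error

# Steps 1-2: inputs X and starting weights W (fixed here so the run is
# repeatable; in practice they are usually chosen at random).
x, target = [1.0, 0.5], 1.0
w_hidden = [[0.2, 0.4], [0.6, 0.1]]
w_out = [0.3, 0.7]

# Keep repeating the process.
errors = [train_step(x, target, w_hidden, w_out, lr=0.5) for _ in range(200)]
print(abs(errors[0]), abs(errors[-1]))  # the second number is the smaller one
```

Backpropagation itself is only the part that computes delta_out and delta_hidden; the two update loops are gradient descent using those gradients.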


Q. 3) What is Loss Function? Explain types of loss function. (6 marks)

Ans: - At its core, a loss function is incredibly simple: It’s a method of evaluating how
well your algorithm models your dataset. If your predictions are totally off, your loss
function will output a higher number. If they’re pretty good, it’ll output a lower
number. As you change pieces of your algorithm to try and improve your model, your loss
function will tell you if you’re getting anywhere.

Types of loss functions: -

A few of the most popular loss functions currently being used, from simple to more
complex are: -

1. Mean square error:

Mean squared error (MSE) is the workhorse of basic loss functions; it’s easy to
understand and implement and generally works pretty well. To calculate MSE, you
take the difference between your predictions and the ground truth, square it, and
average it out across the whole dataset.
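
As a small worked example, MSE can be computed directly from that description. The function name mean_squared_error and the numbers below are made up for illustration.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and ground truth."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)

# Made-up values: the differences are -0.5, 0.5, 0.0 and 1.0.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]

mse = mean_squared_error(predictions, targets)
print(mse)  # 0.375
```

The single error of 1.0 contributes 1.0 to the sum while the two errors of 0.5 contribute only 0.25 each, which shows how squaring penalizes larger errors more heavily.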

2. Likelihood loss:

The likelihood function is also relatively simple, and is commonly used in
classification problems. The function takes the predicted probability for each input
example and multiplies them. And although the output isn’t exactly human-interpretable,
it’s useful for comparing models.

For example, consider a model that outputs probabilities of [0.4, 0.6, 0.9, 0.1] for the
ground truth labels of [0, 1, 1, 0]. The likelihood loss would be computed as

(0.6) * (0.6) * (0.9) * (0.9) = 0.2916.

Since the model outputs probabilities for TRUE (or 1) only, when the ground truth
label is 0 we take (1-p) as the probability. In other words, we multiply the model’s
outputted probabilities together for the actual outcomes.
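
The calculation above can be checked with a short sketch. The function name likelihood is an assumption for this example; the probabilities and labels are the ones used in the text.

```python
def likelihood(probabilities, labels):
    """Multiply the probability the model gave to each actual outcome."""
    result = 1.0
    for p, label in zip(probabilities, labels):
        # The model outputs the probability of label 1, so use (1 - p) for label 0.
        result *= p if label == 1 else (1.0 - p)
    return result

value = likelihood([0.4, 0.6, 0.9, 0.1], [0, 1, 1, 0])
print(round(value, 4))  # 0.2916
```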

3. Log loss (Cross Entropy Loss):

Log loss is a loss function also used frequently in classification problems, and is one
of the most popular measures for Kaggle competitions. It’s just a straightforward
modification of the likelihood function with logarithms.

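For N examples with true labels y_i and predicted probabilities p_i, log loss is:

Log loss = -(1/N) * Σ [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

This is the same idea as the regular likelihood function, but with logarithms added in,
so the product becomes a sum (averaged, with the sign flipped so that lower is better).
You can see that when the actual class is 1, the second half of the function disappears,
and when the actual class is 0, the first half drops. That way, we just end up adding up
the log of the predicted probability for the ground truth class.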

The cool thing about the log loss function is that it has a kick: It penalizes heavily
for being very confident and very wrong. For example, when the true label = 1, the loss
skyrockets as the predicted probability for label = 0 approaches 1.
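
The penalty for confident wrong answers is easy to see numerically. The sketch below assumes the standard binary cross-entropy form of log loss (the negative average of y * log(p) + (1 - y) * log(1 - p)) and assumes every predicted probability lies strictly between 0 and 1; the function name log_loss is chosen for this example.

```python
import math

def log_loss(probabilities, labels):
    """Binary cross-entropy; each probability must be strictly between 0 and 1."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        # Only one of the two terms is non-zero for any given example.
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Same predictions and labels as the likelihood example.
loss = log_loss([0.4, 0.6, 0.9, 0.1], [0, 1, 1, 0])
print(loss)  # about 0.308, which equals -log(0.2916) / 4

# True label is 1 in both cases; only the model's confidence differs.
unsure = log_loss([0.4], [1])
confident_and_wrong = log_loss([0.01], [1])
print(unsure, confident_and_wrong)  # the second value is several times larger
```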

Q. 4) What is Gradient descent? Explain the types of Gradient descent.

(3 marks)

Ans: - Gradient descent is an optimization algorithm which is commonly used to train
machine learning models and neural networks. Training data helps these models learn
over time, and the cost function within gradient descent specifically acts as a barometer,
gauging its accuracy with each iteration of parameter updates. Until the function is close
to or equal to zero, the model will continue to adjust its parameters to yield the smallest
possible error.

Types of Gradient Descent: -

1. Batch gradient descent :

Batch gradient descent sums the error for each point in a training set, updating the
model only after all training examples have been evaluated. This process is referred
to as a training epoch. While this batching provides computation efficiency, it can
still have a long processing time for large training datasets as it needs to store all
of the data in memory. Batch gradient descent also usually produces a stable error
gradient and convergence, but sometimes that convergence point isn’t the most
ideal, finding a local minimum rather than the global one.

2. Stochastic gradient descent :

Stochastic gradient descent (SGD) processes the examples in the dataset one at a time
and updates the model's parameters after each training example. Since you only need to
hold one training example at a time, it is easier to store in memory. While these
frequent updates can offer more detail and speed, they can result in losses in
computational efficiency when compared to batch gradient descent. Its frequent
updates can result in noisy gradients, but this can also be helpful in escaping a
local minimum and finding the global one.

3. Mini-batch gradient descent :

Mini-batch gradient descent combines concepts from both batch gradient descent
and stochastic gradient descent. It splits the training dataset into small batches
and performs updates on each of those batches. This approach strikes a
balance between the computational efficiency of batch gradient descent and the
speed of stochastic gradient descent.
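
The three types differ only in how many examples are used for each update, so one sketch with a batch_size parameter can cover all of them. Everything here is an assumption for illustration: the one-parameter model y = w * x, the noise-free toy data generated from y = 3x, the learning rate, the number of epochs, and the function name gradient_descent.

```python
import random

def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x by minimising mean squared error.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch with respect to w.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # one parameter update per batch
    return w

# Toy data generated from y = 3x, so the best value of w is 3.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]

w_batch = gradient_descent(xs, ys, batch_size=len(xs))  # 1 update per epoch
w_sgd = gradient_descent(xs, ys, batch_size=1)          # 4 updates per epoch
w_mini = gradient_descent(xs, ys, batch_size=2)         # 2 updates per epoch
print(w_batch, w_sgd, w_mini)  # all three end up very close to 3
```

Because this toy data has no noise and a single minimum, all three variants reach the same answer; the differences in stability and noise described above show up on larger, noisier problems.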

Q. 5) Why the Sigmoid function is important in neural networks?

(3 marks)

Ans: - If we use a linear activation function in a neural network, then this model can only
learn linearly separable problems. However, with the addition of just one hidden layer
and a sigmoid activation function in the hidden layer, the neural network can learn a
non-linearly separable problem. Using a non-linear function produces non-linear
boundaries and hence, the sigmoid function can be used in neural networks for learning
complex decision functions. Activation functions are usually chosen to be monotonically
increasing, which is why periodic functions such as sin(x) or cos(x) are rarely used as
activation functions. The activation function should also be defined and continuous
everywhere in the space of real numbers, and gradient-based training needs it to be
differentiable (the sigmoid is differentiable over the entire space of real numbers).

Typically the backpropagation algorithm is used together with gradient descent to learn
the weights of a neural network. To derive this algorithm, the derivative of the
activation function is required. The fact that the sigmoid function is monotonic,
continuous and differentiable everywhere, coupled with the property that its derivative
can be expressed in terms of itself, makes it easy to derive the update equations for
learning the weights in a neural network when using the backpropagation algorithm.
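
The property that the derivative can be expressed in terms of the function itself is, for the sigmoid s(x) = 1 / (1 + e^(-x)), the identity s'(x) = s(x) * (1 - s(x)). A minimal sketch, with function names chosen for this example, checks the identity against a numerical estimate of the slope:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative written in terms of the sigmoid itself: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25

# Compare with a central-difference estimate of the slope at x = 1.
h = 1e-6
numeric = (sigmoid(1.0 + h) - sigmoid(1.0 - h)) / (2 * h)
print(abs(numeric - sigmoid_derivative(1.0)) < 1e-6)  # True
```

During backpropagation the value s is already available from the forward pass, so the derivative costs only one extra multiplication.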
