Feedforward Neural Network Overview

The document explains the architecture and functioning of feedforward neural networks, detailing their structure, activation functions, and training processes. It covers various types of activation functions, their significance, and the impact on model performance, along with a brief overview of supervised, unsupervised, and reinforcement learning. Additionally, it discusses regularization techniques in machine learning to prevent overfitting and improve model generalization.

Uploaded by

Magam Vijitha


Unit-1

Q1) Explain the architecture and working of a feed-forward neural network.





A Feedforward Neural Network (FNN) is a type of artificial neural network in which
information flows in a single direction, i.e., from the input layer through the hidden layers to the
output layer, without loops or feedback. It is mainly used for pattern recognition tasks such as
image and speech classification.

For example, in a credit scoring system, a bank may use an FNN that analyzes users'
financial profiles, such as income, credit history and spending habits, to determine their
creditworthiness.
Each piece of information flows through the network's layers, where various calculations are
made to produce a final score.
Structure of a Feedforward Neural Network
Feedforward Neural Networks have a structured layered design where data flows sequentially
through each layer.
1. Input Layer: The input layer consists of neurons that receive the input data. Each neuron
in the input layer represents a feature of the input data.
2. Hidden Layers: One or more hidden layers are placed between the input and output
layers. These layers are responsible for learning the complex patterns in the data. Each
neuron in a hidden layer computes a weighted sum of its inputs plus a bias and then applies a
non-linear activation function.
3. Output Layer: The output layer provides the final output of the network. The number of
neurons in this layer corresponds to the number of classes in a classification problem or
the number of outputs in a regression problem.

Each connection between neurons in these layers has an associated weight that is adjusted
during the training process to minimize the error in predictions.
(Figure: Feed Forward Neural Network)
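To make the layer-by-layer computation concrete, here is a minimal NumPy sketch of a forward pass through a network with one hidden layer. The layer sizes, the random weights and the choice of ReLU for the hidden layer and softmax for the output are illustrative assumptions, not values from the text; the point is that each layer computes a weighted sum of its inputs plus a bias and then applies an activation function.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract each row's maximum for numerical stability, then normalize
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    """Forward pass of a feedforward network with one hidden layer.

    x:  (batch, n_features) values of the input layer
    W1: (n_features, n_hidden) weights, b1: (n_hidden,) biases
    W2: (n_hidden, n_classes) weights,  b2: (n_classes,) biases
    """
    hidden = relu(x @ W1 + b1)        # weighted sum + bias, then non-linear activation
    return softmax(hidden @ W2 + b2)  # output layer: one probability per class

# Hypothetical layer sizes and randomly initialized weights
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 8, 3
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

x = rng.normal(size=(2, n_features))  # two example inputs
probs = forward(x, W1, b1, W2, b2)
print(probs.shape)        # (2, 3)
print(probs.sum(axis=1))  # each row sums to 1
```

Training (described next) only changes the values in W1, b1, W2 and b2; the forward computation itself stays the same.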
Activation Functions
Activation functions introduce non-linearity into the network enabling it to learn and model
complex data patterns.
Common activation functions include:
 Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
 Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
 ReLU: $\mathrm{ReLU}(x) = \max(0, x)$
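The three formulas above translate directly into NumPy. This is a minimal sketch for illustration; tanh is written out from its definition to mirror the formula, although in practice np.tanh is the numerically safer choice for large |x|.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); outputs lie in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)); outputs lie in (-1, 1)
    # Written from the definition for illustration; np.tanh avoids overflow for large |x|
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # ReLU(x) = max(0, x); outputs lie in [0, inf)
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # middle value is 0.5
print(tanh(x))     # middle value is 0.0
print(relu(x))     # [0. 0. 2.]
```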
Training a Feedforward Neural Network
Training a Feedforward Neural Network involves adjusting the weights of the neurons to
minimize the error between the predicted output and the actual output. This process is
typically performed using back propagation and gradient descent.
1. Forward Propagation: During forward propagation the input data passes through the
network and the output is calculated.
2. Loss Calculation: The loss (or error) is calculated using a loss function such as Mean
Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification tasks.
3. Backpropagation: In backpropagation the error is propagated back through the network
to update the weights. The gradient of the loss function with respect to each weight is
calculated and the weights are adjusted using gradient descent.
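The three training steps can be sketched end to end without a framework. The example below is a hypothetical setup (a one-hidden-layer network with tanh units fitting a noisy sine curve, with a learning rate and step count chosen for illustration); it performs forward propagation, computes the MSE loss, backpropagates the gradients with the chain rule and applies a plain gradient descent update.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy regression data: y = sin(x) plus a little noise
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.05 * rng.normal(size=X.shape)

# One hidden layer of 16 tanh units and a linear output unit
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros(1)
lr = 0.05
losses = []

for step in range(2000):
    # 1. Forward propagation
    z1 = X @ W1 + b1
    h = np.tanh(z1)
    y_hat = h @ W2 + b2

    # 2. Loss calculation (mean squared error)
    loss = np.mean((y_hat - y) ** 2)
    losses.append(loss)

    # 3. Backpropagation: apply the chain rule from the loss back to every weight
    n = X.shape[0]
    d_yhat = 2.0 * (y_hat - y) / n     # dL/dy_hat
    dW2 = h.T @ d_yhat                 # dL/dW2
    db2 = d_yhat.sum(axis=0)           # dL/db2
    d_h = d_yhat @ W2.T                # dL/dh
    d_z1 = d_h * (1.0 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1                   # dL/dW1
    db1 = d_z1.sum(axis=0)             # dL/db1

    # Gradient descent: step each parameter against its gradient
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Frameworks such as TensorFlow compute the same gradients automatically, so in practice only the forward pass and the loss need to be written by hand.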

Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function by
iteratively updating the weights in the direction of the negative gradient. Common variants of
gradient descent include:

 Batch Gradient Descent: Updates weights after computing the gradient over the entire
dataset.
 Stochastic Gradient Descent (SGD): Updates weights for each training example
individually.
 Mini-batch Gradient Descent: Updates weights after computing the gradient over a
small batch of training examples.
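The three variants differ only in how many examples are used per update, so one function with a batch_size parameter can cover all of them. This is a minimal sketch assuming a linear regression model with MSE loss and hypothetical noiseless data; the learning rate and number of epochs are illustrative choices.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.05, epochs=100, seed=0):
    """Fit y ~ X @ w + b by gradient descent on the MSE loss.

    batch_size == len(X) -> batch gradient descent
    batch_size == 1      -> stochastic gradient descent (SGD)
    anything in between  -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]         # residuals on this batch
            grad_w = 2.0 * X[idx].T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w                      # move in the direction of the negative gradient
            b -= lr * grad_b
    return w, b

# Hypothetical noiseless data generated from w = [2, -3], b = 0.5
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.5

for bs in (200, 32, 1):  # batch, mini-batch and stochastic updates
    w, b = gradient_descent(X, y, batch_size=bs)
    print(bs, np.round(w, 3), round(b, 3))
```

Smaller batches give more (but noisier) updates per epoch; mini-batches are the usual compromise between the stable steps of batch gradient descent and the frequent updates of SGD.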
Evaluation of Feedforward neural network
Evaluating the performance of the trained model involves several metrics:
 Accuracy: The proportion of correctly classified instances out of the total instances.
 Precision: The ratio of true positive predictions to the total predicted positives.
 Recall: The ratio of true positive predictions to the actual positives.
 F1 Score: The harmonic mean of precision and recall, providing a balance between the
two.
 Confusion Matrix: A table used to describe the performance of a classification model,
showing the true positives, true negatives, false positives and false negatives.
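For binary labels, the metrics above can be computed directly from the four confusion-matrix counts. The following sketch uses small made-up label vectors; scikit-learn's sklearn.metrics module offers equivalent functions (accuracy_score, precision_score, recall_score, f1_score, confusion_matrix) for real projects.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # TP / predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # TP / actual positives
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows = actual class (0, 1), columns = predicted class (0, 1)
    confusion = np.array([[tn, fp], [fn, tp]])
    return accuracy, precision, recall, f1, confusion

# Hypothetical labels: 3 TP, 3 TN, 1 FP, 1 FN
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1, cm = binary_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
print(cm)
```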
Implementation of Feedforward Neural Network
This code demonstrates the process of building, training and evaluating a neural network
model using TensorFlow and Keras to classify handwritten digits from the MNIST dataset.
The model architecture is defined using the Sequential API and consists of:
 a Flatten layer to convert the 2D image input into a 1D array
 a Dense layer with 128 neurons and ReLU activation
 a final Dense layer with 10 neurons and softmax activation to output probabilities for
each digit class.
The model is compiled with:
 Adam optimizer
 Sparse Categorical Crossentropy loss function
 Sparse Categorical Accuracy metric
It is then trained for 5 epochs on the training data.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy

# Load the MNIST dataset and scale pixel values from [0, 255] to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model: 28x28 image -> 784-vector -> 128 ReLU units -> 10 class probabilities
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model (labels are integers 0-9, so the "sparse" loss and metric are used)
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=[SparseCategoricalAccuracy()])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model on the held-out test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'\nTest accuracy: {test_acc}')
Output:
Test accuracy: 0.9767000079154968
____________________________*********_______________________

Q2) What is an activation function? Explain different types with examples.

An activation function in a neural network is a mathematical function applied to the output
of a neuron. It introduces non-linearity, enabling the model to learn and represent complex
data patterns. Without it, even a deep neural network would behave like a simple linear
regression model.
Activation functions determine how strongly a neuron fires, based on the weighted
sum of its inputs plus a bias term. Because they are differentiable (or, like ReLU,
differentiable almost everywhere), they also supply the gradients that backpropagation uses
for weight updates.
(Figure: Activation Functions in Neural Networks)

Why Non-Linearity is Important

 Real-world data is rarely linearly separable.
 Non-linear functions allow neural networks to form curved decision boundaries, making
them capable of handling complex patterns (e.g., classifying apples vs. bananas under
varying colors and shapes).
 They ensure networks can model advanced problems like image recognition, NLP and
speech processing.
Mathematical Example
Consider a neural network with:
 Inputs: i1, i2
 Hidden layer: neurons h1 and h2
 Output layer: one neuron (output)
 Weights: w1, w2, w3, w4, w5, w6
 Biases: b1 for hidden layer, b2 for output layer
(Figure: neural network)

The hidden layer outputs are:
$$h_1 = i_1 w_1 + i_2 w_3 + b_1$$
$$h_2 = i_1 w_2 + i_2 w_4 + b_1$$
The output before activation is:
$$\text{output} = h_1 w_5 + h_2 w_6 + b_2$$
Without activation, these are linear equations.
To introduce non-linearity, we apply a sigmoid activation:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
$$\text{final output} = \sigma(h_1 w_5 + h_2 w_6 + b_2)$$
This gives the final output of the network after applying the sigmoid activation function in the
output layer, introducing the desired non-linearity.
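Plugging in numbers makes the point about linearity concrete. In this sketch all inputs, weights and biases are hypothetical values chosen for illustration, following the bias convention listed above (b1 for the hidden layer, b2 for the output layer). It also checks that, without an activation on the hidden layer, the two layers collapse into a single linear function of i1 and i2.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values for the inputs, weights and biases named in the text
i1, i2 = 0.5, -1.0
w1, w2, w3, w4, w5, w6 = 0.2, -0.4, 0.7, 0.1, 0.6, -0.3
b1, b2 = 0.1, 0.05

h1 = i1 * w1 + i2 * w3 + b1
h2 = i1 * w2 + i2 * w4 + b1
linear_output = h1 * w5 + h2 * w6 + b2
final_output = sigmoid(linear_output)   # non-linear output in (0, 1)

# Without an activation the layers collapse into one linear function of the inputs:
# output = i1*(w1*w5 + w2*w6) + i2*(w3*w5 + w4*w6) + (b1*w5 + b1*w6 + b2)
collapsed = (i1 * (w1 * w5 + w2 * w6) + i2 * (w3 * w5 + w4 * w6)
             + (b1 * w5 + b1 * w6 + b2))

print(h1, h2, linear_output, final_output)
print(math.isclose(linear_output, collapsed))  # True
```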
Types of Activation Functions in Deep Learning
1. Linear Activation Function
The Linear Activation Function is a straight line defined by y = x. No matter how many
layers the neural network contains, if they all use linear activation functions, the output is a
linear combination of the input.
 The range of the output spans from −∞ to +∞.
 In practice, a linear activation function is typically used in only one place, the output
layer (e.g., for regression).
 Using linear activation across all layers limits the network's ability to learn complex
patterns.
Linear activation functions are useful for specific tasks but must be combined with
non-linear functions to enhance the neural network's learning and predictive capabilities.
(Figure: Linear Activation Function or Identity Function, which returns the input as the output)

2. Non-Linear Activation Functions

1. Sigmoid Function
The Sigmoid Activation Function is characterized by its 'S' shape. It is mathematically defined as
$A = \frac{1}{1 + e^{-x}}$. This formula ensures a smooth and continuous output, which is essential for
gradient-based optimization methods.
 It allows neural networks to handle and model complex patterns that linear equations
cannot.
 The output ranges between 0 and 1, which makes it useful for binary classification.
 The function has a steep gradient when x is roughly between -2 and 2. In this region,
small changes in the input x cause significant changes in the output y, which is critical
during the training process.

(Figure: Sigmoid or Logistic Activation Function graph)
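The claim about the steep region can be checked numerically. The derivative of the sigmoid is σ(x)(1 − σ(x)); this sketch evaluates it at a few sample points, which are arbitrary choices for illustration. The gradient peaks at 0.25 at x = 0 and shrinks quickly as |x| grows, which is the saturation behind the vanishing-gradient issue discussed later.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigma(x) = sigma(x) * (1 - sigma(x)); largest (0.25) at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.4f}  gradient={sigmoid_grad(x):.4f}")
```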

2. Tanh Activation Function
The tanh function (hyperbolic tangent function) is a scaled and shifted version of the sigmoid that
stretches the output range to (-1, 1). It is defined as:
$$f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1$$
Alternatively, it can be expressed using the sigmoid function:
$$\tanh(x) = 2\,\sigma(2x) - 1$$
 Value Range: Outputs values from -1 to +1.
 Non-linear: Enables modeling of complex data patterns.
 Use in Hidden Layers: Commonly used in hidden layers because its output is zero-centered,
which makes learning easier for subsequent layers.

(Figure: Tanh Activation Function)
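Both expressions for tanh can be verified numerically against NumPy's built-in np.tanh. This is a quick sanity-check sketch over an arbitrary grid of inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
via_formula = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # (2 / (1 + e^(-2x))) - 1
via_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0           # 2 * sigmoid(2x) - 1

print(np.allclose(via_formula, np.tanh(x)))   # True
print(np.allclose(via_sigmoid, np.tanh(x)))   # True
print(np.tanh(x).mean())  # approximately 0: outputs are zero-centered for symmetric inputs
```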

3. ReLU (Rectified Linear Unit) Function

ReLU activation is defined by $A(x) = \max(0, x)$: if the input x is positive,
ReLU returns x; if the input is negative, it returns 0.
 Value Range: [0, ∞), meaning the function only outputs non-negative values.
 Nature: It is a non-linear activation function, allowing neural networks to learn complex
patterns, and its simple gradient (0 or 1) makes backpropagation efficient.
 Advantage over other Activations: ReLU is less computationally expensive than tanh and
sigmoid because it involves simpler mathematical operations. Since negative inputs map to
zero, only some neurons are active at a time, which makes the network sparse and
efficient to compute.
(Figure: ReLU Activation Function)

4. Leaky ReLU
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$
 Leaky ReLU is similar to ReLU but allows a small negative slope (α, e.g., 0.01)
instead of zero.
 Helps avoid the “dying ReLU” problem, where neurons get stuck outputting zero.
 Range: (−∞, ∞).
 Preferred in some cases for better gradient flow.
(Figure: Leaky ReLU Activation Function)
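A minimal NumPy sketch of Leaky ReLU and its gradient, assuming the common default α = 0.01 mentioned above. The gradient for x ≤ 0 is α rather than 0, which is what lets such units keep receiving weight updates.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Slope is 1 for x > 0 and alpha (not 0) for x <= 0
    return np.where(x > 0, 1.0, alpha)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(leaky_relu(x))       # negative inputs are scaled by alpha
print(leaky_relu_grad(x))  # gradient never drops to exactly zero
```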

3. Exponential-Based Activation Functions

1. Softmax Function
The Softmax function is designed to handle multi-class classification problems. It transforms
raw output scores from a neural network into probabilities:
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
It squashes the output value of each class into the range 0 to 1 while ensuring that the sum of all
probabilities equals 1.
 Softmax is a non-linear activation function.
 The Softmax function assigns a probability to each class, helping to
identify which class the input belongs to.
(Figure: Softmax Activation Function)
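A minimal NumPy sketch of softmax for a single vector of scores. The logits are made-up values; subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result, because the common factor cancels in the ratio.

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j)
    # Subtracting max(z) leaves the result unchanged but avoids overflow in exp
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])          # hypothetical raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())                    # probabilities in (0, 1) that sum to 1
print(int(np.argmax(probs)))                 # predicted class: 0
print(softmax(np.array([1000.0, 1000.0])))   # [0.5 0.5], no overflow
```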

2. SoftPlus Function
The Softplus function is defined mathematically as: $A(x) = \log(1 + e^{x})$.
This ensures that the output is always positive and that the function is differentiable at all
points, which is an advantage over the traditional ReLU function.
 Nature: The Softplus function is non-linear.
 Range: The function outputs values in the range (0, ∞), similar to ReLU, but without the
hard zero threshold that ReLU has.
 Smoothness: Softplus is a smooth, continuous function, meaning it avoids the sharp corner of
ReLU at zero (where ReLU's slope jumps from 0 to 1), which can sometimes lead to problems
during optimization.
(Figure: Softplus Activation Function)
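A short sketch of Softplus in NumPy. It uses np.logaddexp(0, x), which computes log(e^0 + e^x) = log(1 + e^x) without overflowing for large x. The finite-difference check at the end illustrates that the derivative of Softplus is the sigmoid function, so its gradient is smooth everywhere.

```python
import numpy as np

def softplus(x):
    # log(1 + e^x), computed as logaddexp(0, x) to avoid overflow for large x
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(softplus(x))    # always > 0; close to max(0, x) far from zero
print(softplus(0.0))  # log(2), about 0.693

# Central finite difference: the slope of softplus matches sigmoid(x)
eps = 1e-6
numeric_grad = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)
print(np.allclose(numeric_grad, sigmoid(x), atol=1e-6))  # True
```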

Impact of Activation Functions on Model Performance

The choice of activation function has a direct impact on the performance of a neural
network in several ways:
1. Convergence Speed: Functions like ReLU often allow faster training because they mitigate the
vanishing gradient problem, while Sigmoid and Tanh can slow down convergence in
deep networks.
2. Gradient Flow: Activation functions like ReLU maintain better gradient flow, helping
deeper layers learn effectively. In contrast, Sigmoid saturates for large |x| and produces small
gradients, hindering learning in deep layers.
3. Model Complexity: Softmax is used in the output layer to handle multi-class
problems, whereas simpler functions like ReLU or Leaky ReLU are typically used in the hidden
layers.
__________________________ ******___________________________

Q3) Write short notes on supervised, unsupervised, and reinforcement learning.

Supervised, Unsupervised, and Reinforcement Learning (Short Notes with Examples)

1. Supervised Learning

Supervised learning is a type of machine learning where the model is trained using labeled
data. Each training example consists of an input and a corresponding correct output. The goal
is to learn a mapping from inputs to outputs so that the model can predict outcomes for new
data.
Common tasks: Classification and Regression
Examples:

 Predicting student results (pass/fail) based on marks and attendance

 Email spam detection (spam or not spam)
 House price prediction using features like area, location, and number of rooms

Algorithms: Linear Regression, Logistic Regression, Decision Tree, KNN, SVM

2. Unsupervised Learning

Unsupervised learning uses unlabeled data. The model tries to find hidden patterns,
structures, or relationships in the data without any predefined output.

Common tasks: Clustering and Association

Examples:

 Customer segmentation in marketing

 Grouping students based on performance
 Market basket analysis (items frequently bought together)

Algorithms: K-Means Clustering, Hierarchical Clustering, Apriori Algorithm, PCA
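As an illustration of clustering, here is a minimal NumPy sketch of K-Means on hypothetical, unlabeled 2-D data drawn from two groups. The number of clusters, the number of iterations and the random initialization are illustrative choices; in practice a library implementation such as sklearn.cluster.KMeans (which uses smarter initialization) would normally be used.

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means: alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Hypothetical unlabeled data: two well-separated groups of 50 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(5, 5), scale=0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 2))  # one centroid near (0, 0), the other near (5, 5)
```

No labels are used anywhere: the grouping emerges only from the distances between points.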

3. Reinforcement Learning

Reinforcement learning is a learning method where an agent interacts with an environment
and learns by trial and error. The agent receives rewards or penalties based on its actions
and aims to maximize the total reward over time.

Key elements: Agent, Environment, Action, Reward

Examples:

 Game playing (Chess, Ludo, Video games)

 Robot navigation
 Traffic signal control systems

Algorithms: Q-Learning, SARSA, Deep Q-Network (DQN)
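To illustrate the agent, environment, action and reward loop, here is a minimal tabular Q-Learning sketch. The environment (a 5-state corridor where only reaching the right end gives a reward) and all hyperparameters are hypothetical choices for illustration. After training, the greedy policy read from the Q-table should be to move right in every state.

```python
import numpy as np

# Hypothetical environment: a corridor of 5 states (0..4). The agent starts in
# state 0; reaching state 4 ends the episode with reward +1, every other step gives 0.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action_idx):
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes (and to break ties), else exploit
            if rng.random() < epsilon or Q[state, 0] == Q[state, 1]:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

Q = q_learning()
greedy = ["left" if a == 0 else "right" for a in Q[:GOAL].argmax(axis=1)]
print(greedy)  # learned greedy action in states 0-3
```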

✅ Summary:

 Supervised Learning: Learns from labeled data

 Unsupervised Learning: Finds patterns in unlabeled data
 Reinforcement Learning: Learns by interacting with the environment using rewards

___________________________*******_____________________________________

Regularization in Machine Learning

Last Updated : 11 Dec, 2025





Regularization is a technique used in machine learning to prevent overfitting, which
otherwise causes models to perform poorly on unseen data. By adding a penalty for
complexity, regularization encourages simpler and more generalizable models.
 Prevents overfitting: Adds constraints to the model to reduce the risk of memorizing noise
in the training data.
 Improves generalization: Encourages simpler models that perform better on new, unseen
data.
(Figure: Regularization in Machine Learning)

Types of Regularization
There are three main types of regularization techniques, each applying penalties in different
ways to control model complexity and improve generalization.
1. Lasso Regression
A regression model that uses the L1 regularization technique is called LASSO (Least
Absolute Shrinkage and Selection Operator) regression. It adds the sum of the absolute values
of the coefficients as a penalty term to the loss function (L). This penalty can shrink
some coefficients to exactly zero, which helps select only the important features and ignore
the less important ones.

$$\text{Cost} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{i=1}^{m}|w_i|$$

Where
 m: Number of Features
 n: Number of Examples
 yi: Actual Target Value
 ŷi: Predicted Target Value
 wi: Coefficients of the features
 λ: Regularization parameter that controls the strength of regularization
Note: These formulas apply to linear models. In neural networks, the number of weights is
much larger than the number of features, but the same regularization principles (L1, L2) still
apply on all weights.
Let's see how to implement this using Python:
 X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42) :
Generates a regression dataset with 100 samples, 5 features and some noise.
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) :
Splits the data into 80% training and 20% testing sets.
 lasso = Lasso(alpha=0.1): Creates a Lasso regression model with regularization strength
alpha set to 0.1.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Synthetic regression data: 100 samples, 5 features, small noise
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L1-regularized linear regression with strength alpha = 0.1
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

y_pred = lasso.predict(X_test)

mse = mean_squared_error(y_test, y_pred)

print(f"Mean Squared Error: {mse}")

print("Coefficients:", lasso.coef_)
Output:

(Output image: Lasso Regression)

The output shows the model's prediction error (MSE) and the learned coefficients. L1
regularization shrinks the coefficients and, with a large enough alpha, can set some of them
exactly to zero.
2. Ridge Regression
A regression model that uses the L2 regularization technique is called Ridge regression. It
adds the sum of the squared coefficients as a penalty term to the loss function (L). It
handles multicollinearity by shrinking the coefficients of correlated features instead of
eliminating them.

$$\text{Cost} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{i=1}^{m} w_i^2$$

Where,
 n: Number of examples or data points
 m: Number of features i.e predictor variables
 yi: Actual target value for the ith example
 ŷi: Predicted target value for the ith example
 wi: Coefficients of the features
 λ: Regularization parameter that controls the strength of regularization
Let's see how to implement this using Python:
 ridge = Ridge(alpha=1.0): Creates a Ridge regression model with regularization strength
alpha set to 1.0.
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L2-regularized linear regression with strength alpha = 1.0
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)

mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Coefficients:", ridge.coef_)
Output:

(Output image: Ridge Regression)

The output shows the MSE, which measures model performance; a lower MSE means more accurate
predictions. The coefficients reflect the regularized feature weights.
3. Elastic Net Regression
Elastic Net Regression combines L1 and L2 regularization: it adds both the absolute norm of the
weights and the squared norm of the weights to the loss, with an extra hyperparameter that
controls the ratio of L1 to L2 regularization.

$$\text{Cost} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\left(\alpha\sum_{i=1}^{m}|w_i| + (1-\alpha)\sum_{i=1}^{m} w_i^2\right)$$

Where
 n: Number of examples (data points)
 m: Number of features (predictor variables)
 yi: Actual target value for the ith example
 ŷi: Predicted target value for the ith example
 wi: Coefficients of the features
 λ: Regularization parameter that controls the strength of regularization
 α: Mixing parameter where 0 ≤ α ≤ 1: α = 1 corresponds to Lasso (L1)
regularization, α = 0 corresponds to Ridge (L2) regularization, and values between 0 and
1 provide a balance of both L1 and L2 regularization
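The three cost functions can be written out directly to see how the penalties differ. This sketch uses made-up targets, predictions and weights, and follows the α convention stated above (α = 1 gives Lasso, α = 0 gives Ridge). Note that scikit-learn scales its objectives differently: Lasso and ElasticNet use 1/(2n) in front of the squared error (and ElasticNet puts a factor of 0.5 on its L2 term), while Ridge uses the unscaled sum of squared errors, so scikit-learn's alpha values are not interchangeable with λ here.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def lasso_cost(y, y_hat, w, lam):
    return mse(y, y_hat) + lam * np.sum(np.abs(w))   # L1 penalty

def ridge_cost(y, y_hat, w, lam):
    return mse(y, y_hat) + lam * np.sum(w ** 2)      # L2 penalty

def elastic_net_cost(y, y_hat, w, lam, alpha):
    # alpha = 1 -> pure L1 (Lasso), alpha = 0 -> pure L2 (Ridge)
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return mse(y, y_hat) + lam * (alpha * l1 + (1 - alpha) * l2)

# Hypothetical targets, predictions and weights
y = np.array([3.0, -1.0, 2.0])
y_hat = np.array([2.5, -0.5, 2.0])
w = np.array([1.0, -2.0, 0.0])
lam = 0.1

print(mse(y, y_hat))                             # (0.25 + 0.25 + 0) / 3
print(lasso_cost(y, y_hat, w, lam))              # MSE + 0.1 * (|1| + |-2| + |0|)
print(ridge_cost(y, y_hat, w, lam))              # MSE + 0.1 * (1 + 4 + 0)
print(elastic_net_cost(y, y_hat, w, lam, 0.5))   # MSE + 0.1 * (0.5 * 3 + 0.5 * 5)
```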
Let's see how to implement this using Python:
 model = ElasticNet(alpha=1.0, l1_ratio=0.5) : Creates an Elastic Net model with
regularization strength alpha=1.0 and L1/L2 mixing ratio 0.5.
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# alpha: overall regularization strength; l1_ratio: share of the L1 penalty (0.5 = equal mix)
model = ElasticNet(alpha=1.0, l1_ratio=0.5)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error:", mse)

print("Coefficients:", model.coef_)
Output:
(Output image: Elastic Net Regression)

The output shows MSE which measures how far off predictions are from actual values (lower
is better) and coefficients show feature importance.
Benefits of Regularization
Now, let’s see various benefits of regularization which are as follows:
 Prevents Overfitting: Regularization helps models focus on underlying patterns instead of
memorizing noise in the training data.
 Enhances Performance: Preventing excessive weighting of outliers or irrelevant features
helps improve overall model accuracy.
 Stabilizes Models: Reduces sensitivity to minor data changes which ensures consistency
across different data subsets.
 Prevents Complexity: Keeps model from becoming too complex which is important for
limited or noisy data.
 Handles Multicollinearity: Reducing the magnitudes of correlated coefficients helps
improve model stability.
 Promotes Consistency: Ensures reliable performance across different datasets which
reduces the risk of large performance shifts.
Model Selection for Machine Learning
Last Updated : 06 Aug, 2025





Machine learning (ML) is a field that enables computers to learn patterns from data and make
predictions without being explicitly programmed. However, one of the most crucial aspects
of machine learning is selecting the right model for a given problem. This process is called
model selection. The choice of model significantly affects the accuracy, efficiency and
reliability of predictions. A poorly chosen model can overfit or underfit, and may even
increase computational costs.
In this article, we explore the process of model selection, its
importance and the techniques used to determine the best-performing machine learning model for
different problems.
Importance of Model Selection
Model selection is a key step in machine learning because it affects how well a system can
learn from data and make accurate predictions. Different models process data in different ways,
and choosing the right one ensures that the system works efficiently. A model that is too
simple cannot capture important details and has poor accuracy, while a model that is too
complex may overfit, doing very well on training data but failing on new data. The goal is to
find a model that learns patterns effectively without being too simple or too complex.
 Proper model selection involves experimenting with different models and comparing their
performance using evaluation metrics such as accuracy, precision, recall or mean squared
error. These metrics help in determining which model is best suited for a given task.
 Apart from performance metrics, other factors such as training time, dataset size and
available computing power also play a crucial role in choosing the right model.
 Selecting an appropriate model not only improves prediction accuracy but also enhances
efficiency, making the system faster and more reliable. This ensures that AI-driven
applications perform well in real-world scenarios.
Steps in Model Selection
Understanding the Problem and Data
Before selecting a model, it is important to first analyze the problem we are trying to solve.
If the goal is to predict continuous values like house prices, it is a regression problem. If
the task involves predicting categorical labels, such as distinguishing between spam and
non-spam emails, it is a classification problem. If the objective is to group similar data
points, like segmenting customers based on behavior, it is a clustering problem.
It is also important to understand the nature of the dataset itself: check for
missing values, the number of numerical and categorical variables and the distribution of
the data. Understanding both the type of problem and the dataset helps in choosing the most
suitable machine learning model.
Selecting Suitable Models
After understanding the problem, we choose candidate models that could solve it.
Different types of models work better for different kinds of problems:
 For Regression: Linear Regression, Decision Trees, Random Forest, Neural Networks.
 For Classification: Logistic Regression, Support Vector Machines (SVM), k-Nearest
Neighbors (k-NN), Neural Networks.
 For Clustering: k-Means, Hierarchical Clustering, DBSCAN.
Model Evaluation
Once we have identified candidate models, we rank each one according to how well it
performs. The most common method is to split the dataset into two parts.
 Training Set: The data used to train a machine learning model by learning patterns and
relationships.
 Testing Set: This checks how well a model performs over new, unseen data.
We use k-fold cross-validation to further improve the evaluation. In k-fold cross-validation,
the data is split into k subsets. The model is trained on k-1 subsets and tested on the
remaining one, repeating the process k times. This way, our evaluation is not biased by a
particular train-test split.
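The k-fold procedure can be sketched in a few lines of NumPy. This hypothetical helper only produces the index splits; scikit-learn's sklearn.model_selection.KFold provides the same behaviour for real projects.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each of the k folds is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)   # k roughly equal subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

A model would be trained on each train_idx subset and scored on the matching test_idx subset; the k scores are then averaged.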
Different machine learning problems require different evaluation metrics.
 For Regression Problems: We make use of Mean Squared Error (MSE), Mean Absolute
Error (MAE) and R-squared.
 For Classification Problems: We make use of Accuracy, Precision, Recall and F1-score.
After evaluating the models, we compare them to identify the one that best balances performance
and computational efficiency.
Model Selection Techniques in Machine Learning
Grid Search
One of the simplest and most commonly used model selection techniques is grid search. In
this approach, different combinations of hyperparameters are tried systematically and the one
that gives the best performance is chosen. It can be effective, but its main drawback is that it
is computationally intensive, especially for complex models with many parameters.
Random Search
Unlike grid search, random search doesn't check all possible combinations. Instead, it
randomly samples a subset of the hyperparameter combinations. The random search method
often runs much faster than grid search and can achieve comparably good results.
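Both search strategies are available in scikit-learn as GridSearchCV and RandomizedSearchCV. The following sketch tunes the alpha of a Ridge model on synthetic data; the candidate grid, the log-uniform sampling range and n_iter=10 are illustrative assumptions. The grid search evaluates all 5 listed values, while the random search evaluates 10 values sampled from a continuous range.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# Grid search: try every value in a fixed list, scored with 5-fold cross-validation
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
                    cv=5, scoring="neg_mean_squared_error")
grid.fit(X, y)

# Random search: sample a fixed number of alpha values from a log-uniform distribution
rand = RandomizedSearchCV(Ridge(), param_distributions={"alpha": loguniform(1e-3, 1e2)},
                          n_iter=10, cv=5, scoring="neg_mean_squared_error",
                          random_state=0)
rand.fit(X, y)

# Scores are negated MSE ("higher is better"), so flip the sign for reporting
print("grid search best:", grid.best_params_, -grid.best_score_)
print("random search best:", rand.best_params_, -rand.best_score_)
```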
Bayesian Optimization
Bayesian optimization is a smarter approach to model selection. Instead of just randomly
searching for the best hyperparameters, it uses probability models to predict which
parameters are likely to perform best and focuses on evaluating those. This method is
efficient and often finds better results than grid or random search.
Cross-Validation Based Selection
This method involves using cross-validation to evaluate multiple models and selecting the
one with the best average performance. Instead of relying on a single train-test split, cross-
validation divides the dataset into multiple parts and trains the model on different subsets.
This helps to ensure that the model's performance is not just due to a specific split of data. By
averaging the results from different splits, we get a more reliable estimate of how well the model
will perform on new, unseen data. This approach reduces the risk of overfitting to one particular
split and helps in selecting a good model.
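Cross-validation based selection can be sketched with scikit-learn's cross_val_score, comparing the regularized models from earlier on the same folds. The synthetic dataset and the candidate list are illustrative assumptions; the model with the lowest mean cross-validated MSE is selected. scikit-learn follows a "higher is better" convention for scores, so MSE is returned as a negative value and negated here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # identical folds for every model

candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge(alpha=1.0)": Ridge(alpha=1.0),
    "Lasso(alpha=0.1)": Lasso(alpha=0.1),
    "ElasticNet(alpha=1.0)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

mean_mse = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    mean_mse[name] = -scores.mean()   # average MSE over the 5 folds
    print(f"{name:22s} mean CV MSE = {mean_mse[name]:.2f}")

best = min(mean_mse, key=mean_mse.get)
print("selected:", best)
```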

Dropout in Neural Networks

Last Updated : 12 Jul, 2025





The concept of Neural Networks is inspired by the neurons in the human brain, and scientists
wanted a machine to replicate the same process. This paved the way for one of the most
important topics in Artificial Intelligence. A Neural Network (NN) is based on a collection of
connected units or nodes called artificial neurons, which loosely model the neurons in a
biological brain. Since such a network is created artificially in machines, we refer to it as
an Artificial Neural Network (ANN). This article assumes that you have a decent knowledge of
ANNs. Now, let us look more closely at Dropout in ANNs.

Problem: When a fully-connected layer has a large number of neurons, co-adaptation is more
likely to happen. Co-adaptation refers to when multiple neurons in a layer extract the same,
or very similar, hidden features from the input data. This can happen when the connection
weights for two different neurons are nearly identical.

This poses two different problems to our model:

 Wasted computation, since the same output is computed more than once.
 If many neurons are extracting the same features, it adds more significance to those
features for our model. This leads to overfitting if the duplicate extracted features are
specific to only the training set.
Solution to the problem: As the title suggests, we use dropout while training the NN to
minimize co-adaptation. In dropout, we randomly shut down some fraction of a layer's
neurons at each training step by zeroing out the neuron values. The fraction of neurons to be
zeroed out is known as the dropout rate, $r_d$. The remaining neurons have their values
multiplied by $\frac{1}{1 - r_d}$ so that the expected sum of the neuron values remains the
same.
The two images represent dropout applied to a layer of 6 units, shown at multiple training
steps. The dropout rate is 1/3, and the remaining 4 neurons at each training step have their
values scaled by 1.5 (= 1/(1 − 1/3)). In effect, each step trains a randomly sampled
sub-network rather than the whole network at once. This reduces co-adaptation, so the
neurons learn the hidden features better.
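The scaling rule can be sketched in NumPy as "inverted dropout". This is a minimal illustration, not the TensorFlow implementation: it drops each unit independently with probability rate, so the number of dropped units varies from step to step (rather than exactly 2 out of 6 as in the example), and survivors are scaled by 1/(1 − rate). At inference time the layer simply passes its input through.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units during training and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return x                                 # at inference the layer is the identity
    keep_mask = rng.random(x.shape) >= rate      # each unit kept with probability 1 - rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(6)                                   # a layer of 6 units, as in the example
out = dropout(x, rate=1/3, rng=rng)
print(out)  # each entry is 0 or approximately 1.5
```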
Why does dropout work?
 By using dropout, in every iteration you work on a smaller neural network than the
full one, which has a regularizing effect.
 Dropout helps shrink the squared norm of the weights, which tends to reduce
overfitting.
Dropout can be applied to a network using TensorFlow APIs as follows:

import tensorflow as tf

# rate: float between 0 and 1, the fraction of the input units to drop (0.5 is an example value)
dropout_layer = tf.keras.layers.Dropout(rate=0.5)
