BHARATIYA VIDYA BHAVAN’S
SARDAR PATEL INSTITUTE OF TECHNOLOGY
Bhavan’s Campus, Munshi Nagar, Andheri (West), Mumbai – 400058-India
Department of Computer Engineering
Name: Adwait Patkhedkar
UID no.: 2023300174
Division: C
Batch: B
Course Code: CS303
Experiment 5: To implement Supervised Learning Algorithm (BPN)
Program
AIM: To implement Supervised Learning Algorithm (Backpropagation
Network)
THEORY:
Supervised learning is a type of machine learning where a model is
trained using labeled data. The training dataset consists of
input–output pairs, where the input features are mapped to a known
target or label. The goal of supervised learning is to learn a function
that can predict the correct output for new, unseen inputs.
It is mainly categorized into:
● Classification – predicting discrete labels (e.g., spam or not
spam).
● Regression – predicting continuous values (e.g., house prices).
The performance of a supervised model is evaluated using metrics
such as accuracy, precision, recall, or mean squared error, depending
on the problem type.
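As a small illustration of the metrics named above, the following
sketch computes accuracy, precision, recall and mean squared error in
plain Python. The labels and values are made-up toy data, and the
helper names (classification_metrics, mean_squared_error) are
illustrative assumptions, not part of the experiment code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def mean_squared_error(y_true, y_pred):
    """Average of squared differences, used for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy classification data (hypothetical)
labels = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 1]
acc, prec, rec = classification_metrics(labels, predictions)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")

# Toy regression data (hypothetical)
prices = [2.0, 3.0, 5.0]
estimates = [2.5, 3.0, 4.0]
mse = mean_squared_error(prices, estimates)
print(f"mse={mse:.4f}")
```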
Backpropagation, short for "backward propagation of errors", is a
method used to train neural networks. Its goal is to reduce the
difference between the model’s predicted output and the actual output
by adjusting the weights and biases in the network.
It works iteratively, adjusting the weights and biases to minimize the
cost function. In each epoch the model updates these parameters by
moving them against the error gradient, which lowers the loss. It is
typically paired with an optimization algorithm such as gradient
descent or stochastic gradient descent. The gradient itself is
computed with the chain rule from calculus, which lets the error at
the output be traced back through every layer of the network.
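The idea of following the error gradient can be sketched on the
smallest possible case: one weight, one training sample and a squared
error cost. The sample values, the learning rate and the function
names below are assumptions chosen for illustration only.

```python
def cost(w, x, t):
    """Squared error of the prediction y = w * x against target t."""
    return (w * x - t) ** 2


def cost_gradient(w, x, t):
    """Chain rule: dC/dw = dC/dy * dy/dw, with y = w * x."""
    return 2.0 * (w * x - t) * x


w = 0.0            # initial weight
x, t = 2.0, 4.0    # toy sample: the ideal weight is 2.0
eta = 0.1          # learning rate (assumed)
history = []
for epoch in range(20):
    history.append(cost(w, x, t))
    w -= eta * cost_gradient(w, x, t)   # step against the gradient

print(f"learned w={w:.6f}, first cost={history[0]:.4f}, last cost={history[-1]:.2e}")
```

Each update moves w a fraction of the way toward the value that makes
the cost zero, so the recorded cost shrinks from epoch to epoch.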
In BPN, the process works in two main phases:
1. Forward Pass – Inputs are multiplied by weights, summed with the
biases, and passed through activation functions across layers to
generate the final output.
2. Backward Pass – The error between predicted and actual output
is computed and propagated backward through the network. Using
the chain rule, gradients are calculated to update weights and
biases.
By repeating these steps over many iterations, the network gradually
minimizes error and improves prediction accuracy.
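The two phases can be sketched for a single neuron. This is a minimal
illustration, assuming a tanh activation and a squared error; the
function names and the toy values are hypothetical. The backward pass
is cross-checked against a finite-difference estimate of the same
gradient.

```python
import math


def forward(x, w, b):
    """Forward pass: weighted sum plus bias, then activation (tanh assumed)."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(a)


def squared_error(x, w, b, t):
    """E = 1/2 * (t - o)^2 for one sample."""
    return 0.5 * (t - forward(x, w, b)) ** 2


def backward(x, w, b, t):
    """Backward pass via the chain rule: dE/dw_i = dE/do * do/da * da/dw_i."""
    o = forward(x, w, b)
    dE_do = -(t - o)        # derivative of 0.5 * (t - o)^2 w.r.t. o
    do_da = 1.0 - o * o     # derivative of tanh(a) w.r.t. a
    grad_w = [dE_do * do_da * xi for xi in x]   # da/dw_i = x_i
    grad_b = dE_do * do_da                      # da/db = 1
    return grad_w, grad_b


# Toy values (hypothetical)
x, w, b, t = [1.0, -1.0], [0.4, 0.6], 0.3, 1.0
grad_w, grad_b = backward(x, w, b, t)

# Cross-check the first weight's gradient with a central finite difference
h = 1e-6
numeric = (squared_error(x, [w[0] + h, w[1]], b, t)
           - squared_error(x, [w[0] - h, w[1]], b, t)) / (2 * h)
print(f"analytic={grad_w[0]:.6f} numeric={numeric:.6f}")
```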
SIGMOID ACTIVATION FUNCTION
The sigmoid function is a commonly used activation function in neural
networks. It has an S-shaped curve and is defined as
f(x) = 1 / (1 + exp(-x))
It maps any real-valued input into the range (0, 1), making it useful
for problems where outputs need to be interpreted as probabilities.
The function is smooth and differentiable, which allows
gradient-based optimization methods like backpropagation to work
effectively.
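A minimal sketch of the sigmoid and its derivative in plain Python is
given here for reference. The function names are illustrative, and the
two-branch form is one common way to avoid overflow for large negative
inputs, not something prescribed by this experiment.

```python
import math


def sigmoid(x):
    """Standard sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # safe: x < 0, so exp(x) < 1
    return z / (1.0 + z)


def sigmoid_derivative(x):
    """Derivative expressed through the output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f} sigmoid={sigmoid(x):.4f} derivative={sigmoid_derivative(x):.4f}")
```

The derivative is largest at x = 0 and shrinks toward zero for large
|x|, which is why gradients become small when a unit saturates.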
BIPOLAR SIGMOID ACTIVATION FUNCTION
The bipolar sigmoid activation function is a smooth, S-shaped
nonlinear function that maps inputs to the range (–1, +1). It is useful
for neural networks because it introduces nonlinearity, is
continuously differentiable as backpropagation requires, and can
represent both positive and negative activations. Because its output
is centred on zero, it often converges faster than the standard
sigmoid.
It is a rescaled hyperbolic tangent:
f(x) = (1 - exp(-x)) / (1 + exp(-x)) = tanh(x/2)
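The relation between the bipolar sigmoid and tanh, and the derivative
form 0.5 * (1 - f(x)^2) used with it, can be checked numerically. The
sketch below is only a sanity check in plain Python; the function
names and the sample points are assumptions.

```python
import math


def bipolar_sigmoid(x):
    """Bipolar sigmoid (1 - exp(-x)) / (1 + exp(-x)), range (-1, +1)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))


def bipolar_sigmoid_derivative_from_output(o):
    """Derivative written in terms of the output o = f(x): 0.5 * (1 - o^2)."""
    return 0.5 * (1.0 - o * o)


h = 1e-6
max_identity_gap = 0.0
max_derivative_gap = 0.0
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    o = bipolar_sigmoid(x)
    # Identity: bipolar sigmoid equals tanh(x / 2)
    max_identity_gap = max(max_identity_gap, abs(o - math.tanh(x / 2.0)))
    # Derivative: compare with a central finite difference
    numeric = (bipolar_sigmoid(x + h) - bipolar_sigmoid(x - h)) / (2 * h)
    analytic = bipolar_sigmoid_derivative_from_output(o)
    max_derivative_gap = max(max_derivative_gap, abs(numeric - analytic))

print(f"identity gap={max_identity_gap:.2e}, derivative gap={max_derivative_gap:.2e}")
```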
The graph of bipolar sigmoid over real domain
ALGORITHM
Initialize weights and biases
Forward Pass
For each layer, compute net input:
a_j = sum(w_ij * x_i) + b_j
Apply bipolar sigmoid activation function:
o_j = (1 - exp(-a_j)) / (1 + exp(-a_j))
Pass output to the next layer until final output is obtained.
Error Calculation
Compute error at the output layer:
E = 1/2 * sum((y - y_hat)^2)
where y = actual output, y_hat = predicted output (use -1 and +1 for
bipolar targets)
Backward Pass
Error term (delta) at output layer:
delta_j = (y_j - y_hat_j) * 0.5 * (1 - y_hat_j^2)
Error term for hidden layers:
delta_h = (sum(delta_j * w_hj)) * 0.5 * (1 - o_h^2)
Weight Update
Update each weight using learning rate eta:
w_ij(new) = w_ij(old) + eta * delta_j * o_i
Update bias:
b_j(new) = b_j(old) + eta * delta_j
Repeat steps Forward Pass → Error Calculation → Backward Pass →
Weight Update for all training samples until error converges or
maximum epochs are reached.
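The steps above can be sketched as one per-sample update for a network
with a single hidden layer and one output unit. This is a minimal
plain-Python illustration of the listed formulas, not the program used
in this experiment: the function name train_step, the list-based
weight layout and the single repeated training sample are assumptions.

```python
import math


def bipolar_sigmoid(a):
    return (1.0 - math.exp(-a)) / (1.0 + math.exp(-a))


def train_step(x, y, w_hidden, b_hidden, w_out, b_out, eta):
    """One online update; returns the error E measured before the update.

    Assumed layout: w_hidden[i][h] is the weight from input i to hidden
    unit h, w_out[h] is the weight from hidden unit h to the output.
    The lists are modified in place.
    """
    n_in, n_hidden = len(x), len(b_hidden)

    # Forward pass: net input, then bipolar sigmoid, layer by layer
    o_hidden = []
    for h in range(n_hidden):
        a_h = sum(w_hidden[i][h] * x[i] for i in range(n_in)) + b_hidden[h]
        o_hidden.append(bipolar_sigmoid(a_h))
    a_out = sum(w_out[h] * o_hidden[h] for h in range(n_hidden)) + b_out[0]
    y_hat = bipolar_sigmoid(a_out)

    # Error calculation: E = 1/2 * (y - y_hat)^2
    error = 0.5 * (y - y_hat) ** 2

    # Backward pass: deltas use the weights from before the update
    delta_out = (y - y_hat) * 0.5 * (1.0 - y_hat ** 2)
    delta_hidden = [delta_out * w_out[h] * 0.5 * (1.0 - o_hidden[h] ** 2)
                    for h in range(n_hidden)]

    # Weight update: w(new) = w(old) + eta * delta * input to that weight
    for h in range(n_hidden):
        w_out[h] += eta * delta_out * o_hidden[h]
        for i in range(n_in):
            w_hidden[i][h] += eta * delta_hidden[h] * x[i]
        b_hidden[h] += eta * delta_hidden[h]
    b_out[0] += eta * delta_out
    return error


# Illustrative starting values and one bipolar sample (x, target)
w_hidden = [[0.4, 0.1], [0.6, 0.4]]
b_hidden = [0.3, -0.1]
w_out = [-0.1, 0.5]
b_out = [-0.3]
errors = [train_step([1.0, -1.0], 1.0, w_hidden, b_hidden, w_out, b_out, eta=0.5)
          for _ in range(50)]
print(f"first error={errors[0]:.4f}, last error={errors[-1]:.4f}")
```

Repeating the step on one sample only shows that the error falls; a
real run would loop over all training samples in every epoch.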
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
class VerboseNeuralNetwork:
    def __init__(self):
        # Initialize weights based on the network diagram.
        # Input-to-hidden weights. As used by np.dot(X, W1) in forward(),
        # W1[i][j] is the weight from input x(i+1) to hidden unit z(j+1).
        self.W1 = np.array([[0.4, 0.1],   # weights from x1 to z1, z2
                            [0.6, 0.4]])  # weights from x2 to z1, z2
        # Hidden-to-output weights
        self.W2 = np.array([[-0.1, 0.5]])  # weights from z1, z2 to Y
        # Biases
        self.b1 = np.array([0.3, -0.1])  # bias to z1, z2
        self.b2 = np.array([-0.3])  # bias to Y
        # Learning rate
        self.learning_rate = 0.5
        # Store training history
        self.training_history = []
        self.epoch_losses = []  # loss for each epoch, used for plotting

    def bipolar_sigmoid(self, x):
        """Activation used here: tanh(x) = (e^x - e^-x)/(e^x + e^-x).

        Note: the bipolar sigmoid (1 - e^-x)/(1 + e^-x) equals tanh(x/2),
        so tanh(x) is the same curve with twice the steepness.
        """
        # Clip x to prevent overflow
        x = np.clip(x, -500, 500)
        return np.tanh(x)

    def bipolar_sigmoid_derivative(self, x):
        """Derivative 1 - tanh^2, where x is the activation output tanh(a)."""
        return 1 - x**2

    def forward(self, X):
        """Forward propagation with detailed intermediate values"""
        # Input to hidden layer
        self.z1_input = np.dot(X, self.W1) + self.b1
        self.z1_output = self.bipolar_sigmoid(self.z1_input)
        # Hidden to output layer
        self.y_input = np.dot(self.z1_output, self.W2.T) + self.b2
        self.y_output = self.bipolar_sigmoid(self.y_input)
        return self.y_output

    def backward(self, X, y_true, y_pred):
        """Backward propagation with weight changes tracking"""
        m = X.shape[0]  # number of samples
        # Store old weights for change calculation
        old_W1 = self.W1.copy()
        old_W2 = self.W2.copy()
        old_b1 = self.b1.copy()
        old_b2 = self.b2.copy()
        # Calculate output layer error
        output_error = y_pred - y_true
        output_delta = output_error * self.bipolar_sigmoid_derivative(y_pred)
        # Calculate hidden layer error (uses W2 from before the update)
        hidden_error = output_delta.dot(self.W2)
        hidden_delta = hidden_error * self.bipolar_sigmoid_derivative(self.z1_output)
        # Update weights and biases (batch average over the m samples)
        # Output layer updates
        self.W2 -= self.learning_rate * (self.z1_output.T.dot(output_delta)).T / m
        self.b2 -= self.learning_rate * np.sum(output_delta, axis=0) / m
        # Hidden layer updates
        self.W1 -= self.learning_rate * X.T.dot(hidden_delta) / m
        self.b1 -= self.learning_rate * np.sum(hidden_delta, axis=0) / m
        # Calculate weight changes
        delta_W1 = self.W1 - old_W1
        delta_W2 = self.W2 - old_W2
        delta_b1 = self.b1 - old_b1
        delta_b2 = self.b2 - old_b2
        return delta_W1, delta_W2, delta_b1, delta_b2

    def train_verbose(self, X, y, epochs, show_every=1):
        """Train with verbose output showing all intermediate values"""
        print("Training Neural Network with Bipolar Sigmoid Activation\n")
        print("=" * 150)
        for epoch in range(epochs):
            epoch_data = []
            total_loss = 0
            # Store weights before training for this epoch
            epoch_W1 = self.W1.copy()
            epoch_W2 = self.W2.copy()
            epoch_b1 = self.b1.copy()
            epoch_b2 = self.b2.copy()
            # Forward propagation for all samples
            y_pred = self.forward(X)
            # Calculate loss (mean squared error over the batch)
            loss = np.mean((y_pred - y) ** 2)
            total_loss = loss
            # Backward propagation
            delta_W1, delta_W2, delta_b1, delta_b2 = self.backward(X, y, y_pred)
            # Store detailed information for each sample
            for i in range(len(X)):
                # Calculate intermediate values for this sample,
                # using the weights from the start of the epoch
                zin1 = X[i][0] * epoch_W1[0][0] + X[i][1] * epoch_W1[1][0] + epoch_b1[0]
                zin2 = X[i][0] * epoch_W1[0][1] + X[i][1] * epoch_W1[1][1] + epoch_b1[1]
                z1 = np.tanh(zin1)
                z2 = np.tanh(zin2)
                yin = z1 * epoch_W2[0][0] + z2 * epoch_W2[0][1] + epoch_b2[0]
                y_out = np.tanh(yin)
                # Predicted output (thresholded at 0)
                y_predicted = 1 if y_out > 0 else -1
                row_data = {
                    'Epoch': epoch + 1,
                    'Sample': i + 1,
                    'x1': X[i][0],
                    'x2': X[i][1],
                    't': y[i][0],
                    'zin1': f"{zin1:.4f}",
                    'zin2': f"{zin2:.4f}",
                    'z1': f"{z1:.4f}",
                    'z2': f"{z2:.4f}",
                    'yin': f"{yin:.4f}",
                    'Y': f"{y_out:.4f}",
                    'Y_pred': y_predicted,
                    'bias_z1': f"{epoch_b1[0]:.4f}",
                    'bias_z2': f"{epoch_b1[1]:.4f}",
                    'bias_Y': f"{epoch_b2[0]:.4f}",
                    'Loss': f"{loss:.6f}"
                }
                epoch_data.append(row_data)
            # Show results for this epoch
            if epoch % show_every == 0:
                print(f"\nEPOCH {epoch + 1}")
                print("-" * 150)
                # Create DataFrame for better formatting
                df = pd.DataFrame(epoch_data)
                print(df.to_string(index=False))
                print("\nWeight Changes:")
                print("ΔW1 (input to hidden):")
                print(f" [{delta_W1[0][0]:+.6f}, {delta_W1[0][1]:+.6f}]")
                print(f" [{delta_W1[1][0]:+.6f}, {delta_W1[1][1]:+.6f}]")
                print(f"ΔW2 (hidden to output): [{delta_W2[0][0]:+.6f}, {delta_W2[0][1]:+.6f}]")
                print(f"Δb1 (hidden biases): [{delta_b1[0]:+.6f}, {delta_b1[1]:+.6f}]")
                print(f"Δb2 (output bias): [{delta_b2[0]:+.6f}]")
                print("\nCurrent Weights:")
                print(f"W1: {self.W1}")
                print(f"W2: {self.W2}")
                print(f"b1: {self.b1}")
                print(f"b2: {self.b2}")
                print(f"Total Loss: {total_loss:.6f}")
                print("=" * 150)
            # Store history
            self.training_history.extend(epoch_data)
            self.epoch_losses.append(total_loss)  # Store loss for plotting
            # Check for convergence
            if total_loss < 1e-6:
                print(f"\nConverged at epoch {epoch + 1} with loss {total_loss:.8f}")
                break

    def predict(self, X):
        """Make predictions"""
        output = self.forward(X)
        return np.where(output > 0, 1, -1)

    def test_network(self):
        """Test the final trained network"""
        print("\n" + "="*50)
        print("FINAL NETWORK TESTING")
        print("="*50)
        X_test = np.array([[-1, -1], [1, 1], [-1, 1], [1, -1]])
        y_test = np.array([[-1], [-1], [1], [1]])
        print("\nFinal Test Results:")
        print("-" * 80)
        print("x1 x2 Target Y_output Y_predicted Correct")
        print("-" * 80)
        correct = 0
        for i in range(len(X_test)):
            y_out = self.forward(X_test[i:i+1])[0][0]
            y_pred = 1 if y_out > 0 else -1
            is_correct = "✓" if y_pred == y_test[i][0] else "✗"
            if y_pred == y_test[i][0]:
                correct += 1
            print(f"{X_test[i][0]:2} {X_test[i][1]:2} {y_test[i][0]:2} "
                  f"{y_out:+.4f} {y_pred:2} {is_correct}")
        print("-" * 80)
        print(f"Accuracy: {correct}/{len(X_test)} = {correct/len(X_test)*100:.1f}%")

    def plot_training_progress(self):
        """Plot training loss over epochs (only the loss panel is populated)"""
        if not self.epoch_losses:
            print("No training data available for plotting.")
            return
        # Create subplots
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle('Neural Network Training Progress', fontsize=16)
        epochs = range(1, len(self.epoch_losses) + 1)
        # 1. Loss curve
        ax1.plot(epochs, self.epoch_losses, 'b-', linewidth=2, marker='o', markersize=3)
        ax1.set_title('Training Loss Over Time')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Mean Squared Error')
        ax1.grid(True, alpha=0.3)
        ax1.set_yscale('linear')  # Using actual scale as requested
        plt.tight_layout()
        plt.show()
        # Additional plot: Decision boundary visualization (no code follows in this listing)

# Training data (XOR problem, bipolar encoding)
X = np.array([[-1, -1],
              [1, 1],
              [-1, 1],
              [1, -1]])
y = np.array([[-1],
              [-1],
              [1],
              [1]])
# Create and train the neural network
nn = VerboseNeuralNetwork()
print("Initial Network Configuration:")
print(f"W1 (input to hidden):\n{nn.W1}")
print(f"W2 (hidden to output):\n{nn.W2}")
print(f"b1 (hidden bias): {nn.b1}")
print(f"b2 (output bias): {nn.b2}")
print(f"Learning rate: {nn.learning_rate}")
# Train with verbose output (show_every=1 prints every epoch)
print("\nStarting training...")
nn.train_verbose(X, y, epochs=100, show_every=1)
# Test the final network
nn.test_network()
# Plot training progress and results
print("\nGenerating training progress plots...")
nn.plot_training_progress()
# Create summary statistics
print("\n" + "="*50)
print("TRAINING SUMMARY")
print("="*50)
if nn.training_history:
    df_history = pd.DataFrame(nn.training_history)
    final_epoch = df_history['Epoch'].max()
    # 'Loss' is stored as a formatted string, so convert it back to float
    final_loss = float(df_history[df_history['Epoch'] == final_epoch]['Loss'].iloc[0])
    print(f"Total epochs trained: {final_epoch}")
    print(f"Final loss: {final_loss:.8f}")
    print("Network successfully learned the XOR-like function!")
RESULT:
CONCLUSION:
In this experiment, I built and trained a neural network using the
backpropagation algorithm. Through a repeating cycle of a forward pass to make
a prediction and a backward pass to correct for the error, the model learned
systematically from the dataset. Plotting the average error after each epoch
showed a consistent and steady decrease over time. After many epochs of
training, the error bottomed out and stabilized, which indicated that the
model had converged. At that point it had learned the non-linear pattern
required to solve the problem, validating the effectiveness of the algorithm.