XOR Backpropagation Neural Network

The document provides a Python implementation of a Back Propagation Network to solve the XOR function using binary inputs and outputs. It includes the definition of the sigmoid activation function, the training process over 10,000 epochs, and displays the network's predictions, weights, biases, and accuracy after training. The final accuracy achieved by the model is 100%.

Uploaded by shankar

7. Write a program to show Back Propagation Network for XOR function with Binary Input and Output

import numpy as np

np.random.seed(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(y):
    # y is already sigmoid(x)
    return y * (1.0 - y)

# XOR truth table: inputs X and targets Y
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

input_size = 2
hidden_size = 2
output_size = 1
lr = 0.5        # learning rate
epochs = 10000  # number of training iterations

W1 = np.random.uniform(-1.0, 1.0, (input_size, hidden_size))
B1 = np.zeros((1, hidden_size))
W2 = np.random.uniform(-1.0, 1.0, (hidden_size, output_size))
B2 = np.zeros((1, output_size))

for epoch in range(epochs):
    # Forward pass
    Z1 = np.dot(X, W1) + B1              # (4, hidden_size)
    A1 = sigmoid(Z1)                     # hidden activations
    Z2 = np.dot(A1, W2) + B2             # (4, 1)
    A2 = sigmoid(Z2)                     # output activations
    loss = np.mean(0.5 * (Y - A2) ** 2)

    # Backward pass
    dA2 = A2 - Y                         # derivative of MSE wrt A2
    dZ2 = dA2 * sigmoid_deriv(A2)        # (4,1)
    dW2 = np.dot(A1.T, dZ2) / X.shape[0]
    dB2 = np.mean(dZ2, axis=0, keepdims=True)
    dA1 = np.dot(dZ2, W2.T)              # (4, hidden_size)
    dZ1 = dA1 * sigmoid_deriv(A1)
    dW1 = np.dot(X.T, dZ1) / X.shape[0]
    dB1 = np.mean(dZ1, axis=0, keepdims=True)

    # Gradient descent update
    W2 -= lr * dW2
    B2 -= lr * dB2
    W1 -= lr * dW1
    B1 -= lr * dB1

    if epoch % 1000 == 0 or epoch == epochs - 1:
        print(f"Epoch {epoch:5d} Loss: {loss:.6f}")

# Evaluate the trained network on the four XOR inputs
print("\nTrained predictions on XOR inputs:")
Z1 = np.dot(X, W1) + B1
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2) + B2
A2 = sigmoid(Z2)
print(np.hstack((X, Y, A2, np.round(A2))))

print("\nWeights and biases:")
print("W1:\n", W1)
print("B1:\n", B1)
print("W2:\n", W2)
print("B2:\n", B2)

preds = np.round(A2)
accuracy = np.mean(preds == Y)
print(f"\nAccuracy (after rounding): {accuracy * 100:.1f}%")

Output:

Epoch     0 Loss: 0.143150
Epoch  1000 Loss: 0.124974
Epoch  2000 Loss: 0.124862
Epoch  3000 Loss: 0.124477
Epoch  4000 Loss: 0.121065
Epoch  5000 Loss: 0.070273
Epoch  6000 Loss: 0.018014
Epoch  7000 Loss: 0.008040
Epoch  8000 Loss: 0.004882
Epoch  9000 Loss: 0.003431
Epoch  9999 Loss: 0.002619

Trained predictions on XOR inputs:
[[0.         0.         0.         0.07022023 0.        ]
 [0.         1.         1.         0.92223364 1.        ]
 [1.         0.         1.         0.92231458 1.        ]
 [1.         1.         0.         0.06270003 0.        ]]

Weights and biases:
W1:
 [[-5.50637624  5.67926358]
 [ 5.69407248 -5.46834499]]
B1:
 [[2.81493644 2.78774586]]
W2:
 [[-6.15130337]
 [-6.15441065]]
B2:
 [[9.01782245]]

Accuracy (after rounding): 100.0%
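
As a sanity check on the printed results, the reported weights and biases can be plugged back into a forward pass. The sketch below copies the values from the output above; the use of the @ operator in place of np.dot is a stylistic choice of this sketch, not part of the original program. The predictions should come out close to the ones listed above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights and biases copied from the printed output above.
W1 = np.array([[-5.50637624, 5.67926358],
               [5.69407248, -5.46834499]])
B1 = np.array([[2.81493644, 2.78774586]])
W2 = np.array([[-6.15130337],
               [-6.15441065]])
B2 = np.array([[9.01782245]])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Forward pass only: no training happens here.
A1 = sigmoid(X @ W1 + B1)
A2 = sigmoid(A1 @ W2 + B2)

print(A2.ravel())            # raw outputs in (0, 1)
print(np.round(A2).ravel())  # rounded outputs
```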

Explanation:

import numpy as np
Imports the NumPy library under the short name np, so NumPy functions and
arrays can be accessed with the np. prefix.

np.random.seed(42)
Sets the random-number generator seed to 42 so any subsequent random
numbers (like initial weights) are reproducible every run.
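
A small sketch of what seeding buys you: drawing the same shape twice after resetting the seed should give identical arrays. The names first and second are hypothetical and exist only for this illustration.

```python
import numpy as np

np.random.seed(42)
first = np.random.uniform(-1.0, 1.0, (2, 2))

# Resetting the seed restarts the random sequence from the same point.
np.random.seed(42)
second = np.random.uniform(-1.0, 1.0, (2, 2))

print(np.array_equal(first, second))
```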

def sigmoid(x):
Starts the definition of a function named sigmoid that takes one argument
x.

return 1.0 / (1.0 + np.exp(-x))
Computes the sigmoid activation σ(x) = 1 / (1 + e^{-x}) elementwise for
input x; maps real values into the range (0,1).
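
To make the (0,1) range concrete, here is a minimal sketch that evaluates the same sigmoid on a few sample inputs. The inputs -5, 0 and 5 are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1.
values = sigmoid(np.array([-5.0, 0.0, 5.0]))
print(values)
```

Note the symmetry σ(-x) = 1 - σ(x): the first and last values printed add up to 1.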

def sigmoid_deriv(y):
Starts the definition of a function named sigmoid_deriv that expects y,
which should already be sigmoid(x).

return y * (1.0 - y)
Returns the derivative of sigmoid with respect to its input, using the
identity σ'(x) = σ(x) * (1 - σ(x)). This expects y = σ(x).
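
The identity can be checked numerically. The sketch below compares sigmoid_deriv against a central finite difference of sigmoid; the step size h and the test points are assumptions chosen for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(y):
    # y is already sigmoid(x)
    return y * (1.0 - y)

x = np.linspace(-4.0, 4.0, 9)  # arbitrary test points
h = 1e-5                       # finite-difference step (assumed)

numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid_deriv(sigmoid(x))  # pass the sigmoid output, not x

print(np.max(np.abs(numeric - analytic)))
```

Passing x itself to sigmoid_deriv instead of sigmoid(x) would silently give wrong gradients, which is why the comment in the function matters.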

X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)
Creates the input matrix X as a NumPy array with four rows (samples) and
two columns (features). Each row is one XOR input pair. dtype=float
ensures numeric (floating point) math.

Y = np.array([[0], [1], [1], [0]], dtype=float)
Creates the target/output column vector Y with four rows corresponding to
XOR outputs. It has shape (4,1) and is float for gradient math.

input_size = 2
Stores the number of input neurons/features (2) in a variable used for
shaping weights.

hidden_size = 2
Stores the number of hidden neurons (2). Two hidden units are sufficient
to represent XOR.
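
One way to see why two hidden units are enough is to build such a network by hand. The sketch below is not the trained network from this program: it assumes hard-threshold (step) units instead of sigmoids, and the hand-picked weights (hidden unit 1 acts as OR, hidden unit 2 as NAND, the output as AND) are hypothetical values chosen for illustration.

```python
import numpy as np

def step(z):
    # Hard threshold: 1.0 where z > 0, else 0.0 (assumed activation for this sketch).
    return (z > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked parameters (hypothetical, not learned).
W1_hand = np.array([[1.0, -1.0],
                    [1.0, -1.0]])
B1_hand = np.array([[-0.5, 1.5]])   # column 0: OR, column 1: NAND
W2_hand = np.array([[1.0],
                    [1.0]])
B2_hand = np.array([[-1.5]])        # output: AND of the two hidden units

hidden = step(X @ W1_hand + B1_hand)
out = step(hidden @ W2_hand + B2_hand)
print(out.ravel())
```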

output_size = 1
Stores the number of output neurons (1) — the network predicts a single
scalar per input.

lr = 0.5 # learning rate
Sets the learning rate lr to 0.5; this scales the size of each gradient
descent update.

epochs = 10000 # number of training iterations
Sets the number of training iterations to 10,000. Because every iteration
uses all four samples, each one is a full pass over the dataset (an epoch).

W1 = np.random.uniform(-1.0, 1.0, (input_size, hidden_size))
Initializes the input-to-hidden weight matrix W1 with random values
drawn uniformly from the half-open interval [-1.0, 1.0). Its shape is (2,2):
rows correspond to input features, columns to hidden neurons.

B1 = np.zeros((1, hidden_size))
Initializes the hidden-layer bias B1 as a row vector of zeros with shape
(1,2). This will broadcast across the 4 samples when added.

W2 = np.random.uniform(-1.0, 1.0, (hidden_size, output_size))
Initializes the hidden-to-output weight matrix W2 randomly in [-1.0, 1.0)
with shape (2,1): rows are hidden units, column is the single output unit.

B2 = np.zeros((1, output_size))
Initializes the output-layer bias B2 as zeros with shape (1,1).

for epoch in range(epochs):


Begins the training loop that will run epochs times; epoch counts from 0 to
epochs-1. Each iteration performs one forward and backward pass over
the full dataset (batch gradient descent).

Z1 = np.dot(X, W1) + B1 # (4, hidden_size)
Computes the pre-activation of the hidden layer: Z1 = X · W1 + B1.
np.dot(X, W1) multiplies shape (4,2) × (2,2) → (4,2); adding B1 (1,2) uses
broadcasting to add the bias to every sample.
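
The broadcasting step can be seen in isolation with easy-to-read numbers. In this sketch the weight and bias values are hypothetical (all-ones weights and biases of 10 and 20), chosen only so the effect of the bias on each row is obvious.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W1_demo = np.ones((2, 2))            # hypothetical weights
B1_demo = np.array([[10.0, 20.0]])   # hypothetical bias row, shape (1, 2)

product = np.dot(X, W1_demo)         # shape (4, 2)
Z1_demo = product + B1_demo          # B1_demo is added to every row

print(product.shape, B1_demo.shape, Z1_demo.shape)
print(Z1_demo)
```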

A1 = sigmoid(Z1) # hidden activations


Applies the sigmoid activation elementwise to Z1, producing hidden-layer
activations A1 with shape (4,2).

Z2 = np.dot(A1, W2) + B2 # (4, 1)
Computes the pre-activation of the output layer: Z2 = A1 · W2 + B2.
Shapes: (4,2) × (2,1) → (4,1); add bias B2 (1,1) via broadcasting.

A2 = sigmoid(Z2) # output activations


Applies sigmoid to Z2 to get the network's predicted outputs A2 (4,1),
values in (0,1).

loss = np.mean(0.5 * (Y - A2) ** 2)
Computes the scalar loss (mean squared error scaled by 0.5): for each sample
compute 0.5*(target - output)^2, then take the mean across samples. The 0.5
simplifies derivatives.
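
A quick worked value helps when reading the training log. If the network outputs 0.5 for every input, this loss is exactly 0.125, which is close to the plateau around 0.1249 seen in the early epochs of the output and may suggest the network was still predicting roughly 0.5 everywhere at that stage. The constant prediction array below is a hypothetical input for illustration.

```python
import numpy as np

Y = np.array([[0], [1], [1], [0]], dtype=float)

# Hypothetical predictions: an undecided network that outputs 0.5 for every sample.
A2_flat = np.full((4, 1), 0.5)

loss_flat = np.mean(0.5 * (Y - A2_flat) ** 2)
print(loss_flat)
```
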
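
dA2 = A2 - Y # derivative of MSE wrt A2
Computes the derivative of the per-sample loss w.r.t. the network output A2.
For 0.5*(Y-A2)^2, the derivative is A2 - Y. Shape (4,1). The 1/N factor from
the mean is applied later, when the weight and bias gradients are averaged
over the samples.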

dZ2 = dA2 * sigmoid_deriv(A2) # (4,1)


Applies the chain rule: derivative w.r.t. pre-activation Z2 equals dA2 *
σ'(Z2). Since sigmoid_deriv expects the sigmoid output, we pass A2.
Elementwise multiply gives shape (4,1).

dW2 = np.dot(A1.T, dZ2) / X.shape[0]
Computes the gradient of the loss w.r.t. W2: A1^T · dZ2 yields shape
(2,1). Dividing by X.shape[0] (4) averages the gradient across samples
(batch gradient).

dB2 = np.mean(dZ2, axis=0, keepdims=True)
Computes gradient of the loss w.r.t. bias B2 by averaging dZ2 across
samples, resulting in shape (1,1). keepdims=True preserves 2D shape for
broadcasting consistency.

dA1 = np.dot(dZ2, W2.T) # (4, hidden_size)
Backpropagates the gradient to the hidden activations: dA1 = dZ2 ·
W2^T. Shapes: (4,1) × (1,2) → (4,2). This represents how changes in
hidden activations change loss.

dZ1 = dA1 * sigmoid_deriv(A1)


Applies elementwise multiplication with the derivative of the sigmoid to
get gradient w.r.t. pre-activation Z1. sigmoid_deriv(A1) returns shape
(4,2), so dZ1 is (4,2).

dW1 = np.dot(X.T, dZ1) / X.shape[0]
Computes gradient w.r.t. W1 as X^T · dZ1 with shapes (2,4) × (4,2) →
(2,2), then divides by 4 to average over samples.

dB1 = np.mean(dZ1, axis=0, keepdims=True)
Computes gradient w.r.t. hidden bias B1 by averaging dZ1 across samples,
returning shape (1,2).
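
The four gradient formulas above can be checked against finite differences of the loss. The sketch below is one possible gradient check, not part of the original program: the helper names loss_fn, analytic_grads and numeric_grad, the random starting parameters, and the step size h are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss_fn(W1, B1, W2, B2, X, Y):
    A1 = sigmoid(X @ W1 + B1)
    A2 = sigmoid(A1 @ W2 + B2)
    return np.mean(0.5 * (Y - A2) ** 2)

def analytic_grads(W1, B1, W2, B2, X, Y):
    # Same formulas as in the training loop.
    A1 = sigmoid(X @ W1 + B1)
    A2 = sigmoid(A1 @ W2 + B2)
    dZ2 = (A2 - Y) * A2 * (1.0 - A2)
    dW2 = A1.T @ dZ2 / X.shape[0]
    dB2 = np.mean(dZ2, axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)
    dW1 = X.T @ dZ1 / X.shape[0]
    dB1 = np.mean(dZ1, axis=0, keepdims=True)
    return dW1, dB1, dW2, dB2

def numeric_grad(param, loss_of_params, h=1e-6):
    # Central difference, perturbing one entry of param at a time (in place).
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + h
        plus = loss_of_params()
        param[idx] = old - h
        minus = loss_of_params()
        param[idx] = old
        grad[idx] = (plus - minus) / (2.0 * h)
    return grad

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)  # arbitrary seed for the check
params = [rng.uniform(-1, 1, (2, 2)), rng.uniform(-1, 1, (1, 2)),
          rng.uniform(-1, 1, (2, 1)), rng.uniform(-1, 1, (1, 1))]

current_loss = lambda: loss_fn(*params, X, Y)
analytic = analytic_grads(*params, X, Y)
max_err = max(np.max(np.abs(numeric_grad(p, current_loss) - g))
              for p, g in zip(params, analytic))
print(max_err)
```

A very small maximum difference indicates that the backward-pass formulas agree with the loss they are meant to differentiate.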

W2 -= lr * dW2
Updates the output weights W2 by subtracting the learning-rate-scaled
gradient (gradient descent step).

B2 -= lr * dB2
Updates the output bias B2 similarly.

W1 -= lr * dW1
Updates input-to-hidden weights W1 with gradient descent.

B1 -= lr * dB1
Updates hidden bias B1 with gradient descent.
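
The update rule is easier to see on a one-parameter problem. This toy sketch minimizes f(w) = (w - 3)^2 with the same "parameter -= lr * gradient" pattern; the function, the starting point, the learning rate of 0.1 and the 50 steps are all hypothetical choices, unrelated to the XOR network.

```python
import numpy as np

lr_toy = 0.1              # hypothetical learning rate for this toy problem
w = np.array([0.0])       # hypothetical starting parameter

for _ in range(50):
    grad = 2.0 * (w - 3.0)   # derivative of f(w) = (w - 3)^2
    w -= lr_toy * grad       # same in-place update pattern as W2 -= lr * dW2

print(w)
```

Each step moves w a fraction of the way toward the minimum at 3, so the distance to the minimum shrinks geometrically.
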
if epoch % 1000 == 0 or epoch == epochs - 1:
Checks whether to print progress: either every 1000 epochs, or the very
last epoch.

print(f"Epoch {epoch:5d} Loss: {loss:.6f}")


If the condition is met, prints the current epoch number and the loss
formatted to 6 decimal places so you can observe training progress.
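
The logging condition and the format specifiers can be tried on their own. In this sketch the loss value 0.14315 is a stand-in number, and logged and line are hypothetical names.

```python
epochs = 10000

# Epochs at which the condition in the training loop is true.
logged = [e for e in range(epochs) if e % 1000 == 0 or e == epochs - 1]
print(logged)

# {:5d} right-aligns the integer in 5 characters; {:.6f} prints 6 decimals.
line = f"Epoch {0:5d} Loss: {0.14315:.6f}"
print(line)
```

The list has 11 entries (0, 1000, ..., 9000, and 9999), which matches the 11 loss lines in the output.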

print("\nTrained predictions on XOR inputs:")


After training finishes, prints a header line announcing that final
predictions follow.

Z1 = np.dot(X, W1) + B1
Recomputes hidden pre-activations using the final trained W1 and B1.

A1 = sigmoid(Z1)
Computes final hidden activations.

Z2 = np.dot(A1, W2) + B2
Computes final output pre-activations.

A2 = sigmoid(Z2)
Computes the final network outputs for the training inputs (raw values in
(0,1)).

print(np.hstack((X, Y, A2, np.round(A2))))
Horizontally stacks and prints the input X, target Y, raw output A2, and
rounded output np.round(A2) (0 or 1). This shows inputs, expected
outputs, predicted probabilities, and final discrete predictions side-by-side.
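
A short sketch of how the stacked table gets its five columns. The raw outputs used here are rounded stand-ins for the printed predictions, not freshly computed values.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
A2_demo = np.array([[0.07], [0.92], [0.92], [0.06]])  # stand-in raw outputs

# Column widths 2 + 1 + 1 + 1 give a (4, 5) table; all arrays must have 4 rows.
table = np.hstack((X, Y, A2_demo, np.round(A2_demo)))
print(table.shape)
print(table)
```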

print("\nWeights and biases:")


Prints a header announcing that the learned weights and biases will be
displayed.

print("W1:\n", W1)
Prints the final W1 matrix (input→hidden weights).

print("B1:\n", B1)
Prints the final B1 bias row for the hidden layer.

print("W2:\n", W2)
Prints the final W2 matrix (hidden→output weights).

print("B2:\n", B2)
Prints the final B2 bias scalar for the output layer.

preds = np.round(A2)
Rounds the final raw outputs A2 to 0 or 1 and stores them in preds.

accuracy = np.mean(preds == Y)
Computes accuracy as the mean of the boolean array preds == Y.
Booleans convert to 1/0, so the mean is the fraction of correct predictions.
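
To see the boolean-mean trick with a non-trivial result, the sketch below uses hypothetical predictions with one deliberate mistake (the third sample).

```python
import numpy as np

Y = np.array([[0], [1], [1], [0]], dtype=float)
preds_demo = np.array([[0], [1], [0], [0]], dtype=float)  # hypothetical: third one is wrong

matches = preds_demo == Y          # boolean array, True where the prediction is correct
accuracy_demo = np.mean(matches)   # True counts as 1, False as 0

print(matches.ravel())
print(f"Accuracy: {accuracy_demo * 100:.1f}%")
```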

print(f"\nAccuracy (after rounding): {accuracy * 100:.1f}%")


Prints the accuracy as a percentage with one decimal place.
