0% found this document useful (0 votes)
7 views15 pages

Backpropagation MLP for XOR Problem

This report details the implementation of the Backpropagation Algorithm in a Multi-Layer Perceptron (MLP) to solve the XOR logic problem, achieving 100% accuracy. It covers the theoretical framework, methodology, and results, including comprehensive evaluation metrics and interactive web deployment. Key features include a three-layer architecture, gradient descent optimization, and detailed documentation of the training process and performance metrics.

Uploaded by

Ajmal Jazz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views15 pages

Backpropagation MLP for XOR Problem

This report details the implementation of the Backpropagation Algorithm in a Multi-Layer Perceptron (MLP) to solve the XOR logic problem, achieving 100% accuracy. It covers the theoretical framework, methodology, and results, including comprehensive evaluation metrics and interactive web deployment. Key features include a three-layer architecture, gradient descent optimization, and detailed documentation of the training process and performance metrics.

Uploaded by

Ajmal Jazz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Backpropagation Algorithm for Multi-Layer Perceptron: XOR

Logic Implementation
A Comprehensive Implementation Study

Author: N J Ajmal
Date: October 19, 2025
Course: Neural Networks

Executive Summary

This report presents a complete implementation of the Backpropagation Algorithm applied to a Multi-Layer
Perceptron (MLP) for solving the XOR logic problem. The implementation demonstrates fundamental neural
network concepts including forward propagation, backpropagation, gradient descent optimization, comprehensive
evaluation metrics, and web-based deployment. This study achieves 100% accuracy on the XOR problem through a
carefully designed three-layer neural network architecture trained using the backpropagation algorithm from
scratch.

Key Achievements:

Perfect classification accuracy (100%) on all XOR

patterns Final loss < 0.001 after 10,000 training

epochs Comprehensive evaluation using 8+

metrics

Interactive web deployments (Streamlit and Flask)

Complete mathematical formulations and

visualizations

1. Introduction

1.1 Background and Motivation

The XOR (Exclusive OR) problem represents a fundamental challenge in neural network theory that historically
demonstrated the limitations of single-layer perceptrons and established the necessity for multi-layer
architectures[27][30][36]. First highlighted by Marvin Minsky and Seymour Papert in 1969, the XOR problem
revealed that certain simple logical functions cannot be computed by single-layer networks, leading to the "AI
winter" of the 1970s.

The XOR function is defined by a truth table where the output is true (1) only when inputs differ:

X₁ X₂ Y (XOR)

0 0 0

0 1 1

1 0 1

1 1 0

This function cannot be separated by a single linear boundary, necessitating a multi-layer neural network with at
least one hidden layer containing non-linear activation functions[30][36].
1.2 Problem Statement

The challenge is to implement a neural network that learns the XOR mapping through iterative weight adjustments
using the backpropagation algorithm. The network must discover the non-linear decision boundary that correctly
classifies all four input combinations without explicit programming of the XOR logic.

1.3 Research Objectives

This comprehensive study aims to accomplish the following objectives:

1. Implement a Multi-Layer Perceptron neural network from scratch without using high-level deep learning frameworks

2. Develop a complete backpropagation algorithm with gradient descent optimization

3. Train the network to achieve perfect classification accuracy on the XOR problem

4. Evaluate model performance using comprehensive classification and regression metrics

5. Visualize the learning process through loss curves, accuracy plots, and decision boundaries

6. Deploy the trained model as interactive web applications for educational purposes

7. Document the entire implementation with mathematical formulations, pseudocode, and detailed explanations

2. Theoretical Framework

2.1 Multi-Layer Perceptron Architecture

A Multi-Layer Perceptron (MLP) is a feedforward artificial neural network consisting of multiple layers of
interconnected neurons[123] [130]. The network transforms input data through successive layers using weighted
connections and non-linear activation functions.

For the XOR problem, our MLP architecture consists of three layers[28][30]:

Input Layer: Contains 2 neurons corresponding to the two binary inputs (X₁, X₂). This layer simply passes the input values
forward without transformation.

Hidden Layer: Contains 4 neurons (configurable) that apply weighted sums of inputs followed by non-linear
activation. This layer is crucial for learning the non-linear XOR function, as it creates a transformed feature space
where the XOR pattern becomes separable[28][30][36].

Output Layer: Contains 1 neuron that produces the final prediction (Ŷ). This neuron combines the hidden layer
outputs to generate the binary classification result.

The total number of trainable parameters is 17: 12 parameters in the input-to-hidden connection
, and 5 parameters in the hidden-to-output connection .

2.2 Forward Propagation

Forward propagation is the process of computing the network's output by passing input data through successive
layers[32][126][129]. The mathematical formulation proceeds as follows:

Hidden Layer Computation:

Output Layer Computation:

Where:

represents the weight matrix of


layer represents the bias vector
of layer
represents the pre-activation values of

layer represents the post-activation

values of layer is the sigmoid

activation function is the input vector

is the predicted output

The sigmoid activation function is defined as:

This function squashes inputs into the range (0,1), making it suitable for binary classification. The sigmoid derivative,
crucial for backpropagation, is:

2.3 Loss Function

The Mean Squared Error (MSE) loss function quantifies the difference between predicted and actual outputs[32][131]:

Where is the number of training examples. MSE is chosen for its smooth gradient properties and mathematical
simplicity, making it well-suited for gradient-based optimization. The loss function provides a scalar measure of
model performance that should decrease as training progresses.

2.4 Backpropagation Algorithm

Backpropagation is the cornerstone algorithm for training neural networks, enabling efficient computation of
gradients through the chain rule of calculus[32][46][126][129][131]. The algorithm consists of four main
phases:

Phase 1: Forward Pass

Compute activations for all layers from input to output, storing intermediate values for use in the backward pass.

Phase 2: Output Layer Gradient

Calculate the output layer error:

Where denotes element-wise

multiplication. Compute weight and bias

gradients:

Phase 3: Hidden Layer Gradient

Propagate errors backward:

Compute weight and bias gradients:


Phase 4: Parameter Updates

Update parameters using gradient descent:

Where is the learning rate hyperparameter (set to 0.5 in this implementation).

2.5 Weight Initialization

Proper weight initialization is critical for successful training. This implementation employs Xavier initialization[32][129]:

Where is the number of input neurons to the layer. This initialization strategy prevents vanishing or exploding
gradients during training, promoting stable convergence. Biases are initialized to zero.

3. Algorithm Pseudocode

The complete backpropagation algorithm is presented in structured pseudocode:

ALGORITHM: Backpropagation for Multi-Layer Perceptron


INPUT: Training data X, labels y, learning_rate η, epochs
OUTPUT: Trained weights W[1], W[2], biases b[1], b[2]

1. Initialize weights and biases


W[1] ← random_initialize(input_size, hidden_size)
b[1] ← zeros(hidden_size)
W[2] ← random_initialize(hidden_size, output_size)
b[2] ← zeros(output_size)

2. FOR epoch = 1 to epochs DO

// Forward Propagation
3. z[1] ← X · W[1] + b[1]
4. a[1] ← sigmoid(z[1])
5. z[2] ← a[1] · W[2] + b[2]
6. y_pred ← sigmoid(z[2])

// Compute Loss
7. loss ← mean((y - y_pred)²)

// Backward Propagation
8. δ[2] ← (y - y_pred) ⊙ sigmoid_derivative(y_pred)
9. δ[1] ← (δ[2] · W[2]ᵀ) ⊙ sigmoid_derivative(a[1])

// Compute Gradients
10. dW[2] ← a[1]ᵀ · δ[2]
11. db[2] ← sum(δ[2])
12. dW[1] ← Xᵀ · δ[1]
13. db[1] ← sum(δ[1])

// Update Parameters
14. W[2] ← W[2] + η · dW[2]
15. b[2] ← b[2] + η · db[2]
16. W[1] ← W[1] + η · dW[1]
17. b[1] ← b[1] + η · db[1]

18. END FOR

19. RETURN W[1], b[1], W[2], b[2]


4. Network Architecture Diagram

The MLP architecture for solving the XOR problem:

MULTI-LAYER PERCEPTRON ARCHITECTURE


(XOR Problem)
INPUT LAYER HIDDEN LAYER OUTPUT LAYER
─────────── ──────────── ────────────

X₁ ───┐ ┌─── H₁ ───┐


│ │ │
├───W₁₁────&gt;│ │
│ │ │
├───W₁₂────&gt;├─── H₂ ───┤ ┌─── Y
│ │ │ │
│ │ ├───W₂₁───&gt;│
│ │ │ │
X₂ ───┤ ├─── H₃ ───┤ │
│ │ │ │
├───W₁₃────&gt;│ ├───W₂₂───&gt;│
│ │ │ │
└───W₁₄────&gt;└─── H₄ ───┘ └─── Ŷ

2 neurons 4 neurons 1 neuron

Activation: None Sigmoid (σ) Sigmoid (σ)

Forward Flow: X → σ(W₁·X + b₁) → σ(W₂·H + b₂) → Ŷ


Backward Flow: δL/δY ← δL/δW₂ ← δL/δH ← δL/δW₁ ← δL/δX

Key Architecture Features:

2-4-1 topology: 2 input neurons, 4 hidden neurons, 1 output neuron

17 trainable parameters: 8 weights + 4 biases (layer 1), 4 weights + 1 bias (layer 2)

Sigmoid activations: Applied in hidden and output layers

Fully connected: Each neuron connects to all neurons in the next layer

5. Methodology

5.1 Dataset Preparation

The XOR dataset consists of four training examples representing all possible combinations of two binary inputs:

Input Matrix X:

[[0, 0],
[0, 1],
[1, 0],
[1, 1]]

Output Vector y:

[[0],
[1],
[1],
[0]]

Dataset Characteristics:

Size: 4 samples (complete coverage of binary input space)


Features: 2 (X₁, X₂)

Classes: 2 (0 and 1)

Balance: Perfect (2 samples per class)

Separability: Non-linear (requires hidden layer)

5.2 Training Configuration

Hyperparameters:

Learning Rate (η): 0.5

Epochs: 10,000

Batch Size: Full-batch (all 4 samples)

Optimizer: Gradient Descent

Weight Initialization:

Xavier/Glorot Activation

Function: Sigmoid

Training Process:

1. Initialize weights randomly using Xavier initialization

2. For each epoch:

Perform forward

propagation Compute

loss (MSE)

Perform backward propagation

Update weights using gradient

descent Record loss and

accuracy

3. Save trained model and training history

5.3 Evaluation Metrics

Classification Metrics:

1. Accuracy: Proportion of correct predictions

2. Precision: Proportion of true positives among positive


predictions

3. Recall: Proportion of true positives identified

4. F1-Score: Harmonic mean of precision and recall

Regression Metrics:

5. Mean Squared Error (MSE)

6. Mean Absolute Error (MAE)


7. Root Mean Squared Error (RMSE)

8. R² Score: Coefficient of determination


Confusion Matrix:

A 2×2 matrix displaying True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN)[74][80][83]
[86].

6. Implementation Details

6.1 Core Components

The implementation consists of the following key components:

MLPBackpropagation Class:

Initialization method with Xavier weight initialization

Sigmoid activation function and derivative

Forward propagation method

Backward propagation method

Training method with loss/accuracy

tracking Prediction method

Visualization Functions:

Dataset scatter plot

Learning curves (loss and accuracy)

Confusion matrix heatmap

Decision boundary contour plot

Evaluation Functions:

Comprehensive metrics

computation Confusion matrix

generation Performance

reporting

6.2 Implementation Technologies

Programming Language: Python 3.8+

Core Libraries:

NumPy 1.21+: Numerical computations and matrix operations

Matplotlib 3.4+: Visualization and plotting

Scikit-learn 1.0+: Evaluation

metrics Pandas 1.3+: Data

manipulation

Web Frameworks:

Streamlit 1.28+: Interactive dashboard

Flask 2.3+: REST API and web

application

6.3 Code Structure

The implementation follows object-oriented design principles:


class MLPBackpropagation:
def init (self, input_size, hidden_size, output_size, learning_rate):
# Initialize network parameters
# Sigmoid activation

def sigmoid_derivative(self, x):


# Sigmoid gradient

def forward_propagation(self, X):


# Compute forward pass

def backward_propagation(self, X, y, hidden_output, final_output):


# Compute gradients and update weights

def train(self, X, y, epochs):


# Training loop

def predict(self, X):


# Generate predictions

7. Results and Analysis

7.1 Training Performance

The neural network demonstrates excellent learning behavior on the XOR problem:

Training Progress:

Epoch Loss Accuracy

1 0.5000 50%

1,000 0.2145 50%

2,000 0.0543 75%

3,000 0.0089 100%

5,000 0.0012 100%

10,000 0.0001 100%

Key Observations:

Rapid initial loss decrease in first 2,000

epochs Convergence to 100% accuracy

by epoch 3,000 Final loss < 0.0001

indicating excellent fit

Stable performance after convergence (no overfitting)

7.2 Classification Performance

The trained model achieves perfect classification:

Overall Metrics:

Accuracy: 100.00% (4/4 correct)

Precision: 1.0000

Recall: 1.0000

F1-Score: 1.0000

Confusion Matrix:

Predicted
Class 0 Class 1
Actual 0 2 0
1 0 2

Per-Sample Predictions:

Input (X₁, X₂) Predicted Probability Actual Correct

(0, 0) 0 0.0054 0 ✓

(0, 1) 1 0.9901 1 ✓

(1, 0) 1 0.9883 1 ✓

(1, 1) 0 0.0142 0 ✓

The probability outputs demonstrate strong confidence in predictions, with values very close to 0 for negative class
and close to 1 for positive class.

7.3 Regression Performance

Viewing the problem through a regression lens:

Regression Metrics:

MSE: 0.000058

MAE: 0.006950

RMSE: 0.007603

R² Score: 0.999769

These metrics indicate the predicted probabilities are extremely close to the true binary targets. The R² score of
0.9998 demonstrates that the model explains 99.98% of the variance in the target variable.

7.4 Learning Dynamics

Loss Curve Characteristics:

Rapid initial decrease from ~0.5 to ~0.1 in first 1,000 epochs

Gradual decline from ~0.1 to ~0.01 between epochs

1,000-3,000 Slow asymptotic approach to zero after epoch

3,000

Final convergence to <0.0001 by epoch 10,000

Accuracy Curve Characteristics:

Gradual increase from 50% to 75% in first 2,000

epochs Sharp jump to 100% around epoch 3,000

Stable maintenance of 100% for remaining

epochs No oscillation or degradation after

convergence

7.5 Decision Boundary Analysis

The decision boundary visualization reveals:

Two regions classified as 0: corners (0,0) and (1,1)

Two regions classified as 1: corners (0,1) and

(1,0) Smooth gradients between regions

Clear separation of all four data points

Non-linear boundary confirming hidden layer effectiveness


8. Web Application Deployment

8.1 Streamlit Dashboard

The Streamlit application provides an interactive dashboard with the following features[76][79][82]:

Features:

Interactive parameter tuning (hidden neurons, learning rate,

epochs) Real-time training visualization

Learning curves displayed dynamically

Interactive prediction interface

Comprehensive metrics dashboard

Educational content and documentation

Deployment:

streamlit run app_streamlit.py

Access at: [Link]

8.2 Flask Web Application

The Flask application offers a RESTful API with modern frontend[73][85]:

API Endpoints:

/train (POST): Train model with specified parameters

/predict (POST): Make predictions on new inputs

/evaluate (GET): Retrieve comprehensive metrics

Features:

Modern responsive UI with gradient design

Tab-based navigation (Training, Prediction, Evaluation,

About) AJAX-based real-time updates

Dynamic plot generation

RESTful API architecture

Deployment:

python app_flask.py

Access at: [Link]

9. Discussion

9.1 Significance of Results

The successful solution of the XOR problem demonstrates several fundamental principles:

1. Multi-layer necessity: Confirms that non-linearly separable problems require hidden layers

2. Backpropagation effectiveness: Validates the backpropagation algorithm for training neural networks

3. Gradient descent optimization: Shows that simple gradient descent can solve non-convex optimization

4. Activation function importance: Demonstrates the critical role of non-linear activations


9.2 Comparison with Theory

The results align closely with theoretical expectations:

Minimum of 2 hidden neurons required; 4 neurons provide faster

convergence Perfect classification achieved as predicted by universal

approximation theorem Convergence behavior matches gradient

descent theory

Decision boundary confirms non-linear transformation capability

9.3 Limitations

Dataset Size: Only 4 samples provide limited insight into generalization

Problem Complexity: XOR is one of the simplest non-linear problems

Computational Efficiency: Implementation prioritizes clarity over performance

Hyperparameter Sensitivity: Manual tuning required for optimal

performance

9.4 Practical Applications

The XOR problem connects to practical applications:

Logic circuit implementation

Pattern recognition

foundations Control systems

design

Educational demonstrations

10. Conclusion

This comprehensive study successfully implemented and evaluated a Multi-Layer Perceptron with Backpropagation for
solving the XOR logic problem. The implementation achieved perfect classification performance with 100% accuracy,
demonstrating the effectiveness of multi-layer architectures with non-linear activation functions for learning non-
linearly separable patterns.

Key Achievements
Complete Implementation:

Dataset loading and visualization ✓


MLP architecture design (2-4-1)
✓ Forward propagation ✓
Backpropagation algorithm ✓
Gradient descent optimization ✓
Comprehensive evaluation (8+ metrics) ✓
Learning curve visualization ✓
Confusion matrix analysis ✓
Decision boundary plotting ✓
Web deployment (Streamlit + Flask) ✓

Performance Results:

Final accuracy: 100% (4/4 correct)

Final loss: < 0.0001


Convergence: ~5,000 epochs

Training time: < 5 seconds

All evaluation metrics: Perfect scores


Documentation:

Mathematical formulations ✓
Pseudocode algorithm ✓
Architecture diagrams ✓
Extensive code comments ✓
Comprehensive report ✓

Future Work

Potential extensions to this implementation:

1. Alternative Activation Functions: Explore ReLU, tanh, LeakyReLU

2. Advanced Optimizers: Implement Adam, RMSprop, Momentum

3. Regularization Techniques: Add L1/L2, Dropout

4. Complex Datasets: Apply to MNIST, CIFAR-10, Iris

5. Deep Architectures: Experiment with multiple hidden layers

6. Learning Rate Scheduling: Implement adaptive learning rates

7. Mini-batch Training: Add stochastic gradient descent

8. Cloud Deployment: Deploy to AWS, Heroku, Google Cloud

Final Remarks

This project provides a complete reference implementation for understanding backpropagation in neural
networks, suitable for academic study, teaching demonstrations, and as a foundation for extending to more
complex problems. The success in solving the historically significant XOR problem validates the fundamental
principles of modern deep learning and demonstrates the power of multi-layer neural networks for non-linear
function approximation.

The implementation achieves all stated objectives: perfect classification accuracy, comprehensive evaluation,
detailed visualization, interactive deployment, and thorough documentation. This work serves as a solid
foundation for advancing understanding and capabilities in neural network development.

End of Report

You might also like