MLP for MNIST Handwritten Digit Classification

Uploaded by

abdelrahman.oshebaa

Introduction

Handwritten digit classification is a foundational problem in machine learning and computer vision, with
applications ranging from automated postal services to digitized document processing.(3) The Modified
National Institute of Standards and Technology (MNIST) dataset, introduced by LeCun et al. (1998), has served as a
benchmark for over two decades. It offers 70,000 grayscale images of handwritten digits from 0 to 9 that capture
the variability of real-world handwriting.(3) The task demands robust models that capture complex
spatial patterns while remaining computationally tractable.(11)
A single-layer perceptron can classify only linearly separable patterns; it cannot classify patterns that are not
linearly separable. Multi-Layer Perceptrons (MLPs), a class of fully connected feedforward neural networks,
emerged as an early and enduring solution to this problem.(1) By using non-linear activation functions and
hierarchical layers, MLPs map flattened pixel intensities (784 inputs) to class probabilities over 10 outputs
(digits 0 to 9), learning through backpropagation to minimize prediction errors.(1)(3) Backpropagation, popularized
by Rumelhart et al. (1986), enabled MLPs to approximate complex functions, which established
their usefulness in digit recognition.(1) Although Convolutional Neural Networks (CNNs) later surpassed MLPs in accuracy
by exploiting spatial hierarchies (LeCun et al., 1998), MLPs remain pivotal for understanding neural network
fundamentals and benchmarking algorithmic innovation.(11)

Background
Classifying MNIST handwritten digits is an essential computer vision and machine learning problem, with uses in
postal sorting automation, bank check processing, and optical character recognition.(3)

A Multi-Layer Perceptron (MLP) is a fully connected feedforward neural network designed to model non-linear
relationships through its input, hidden, and output layers. Using non-linear activation functions such as
sigmoid, tanh, or ReLU, MLPs can learn sophisticated patterns in data. We use ReLU because it is cheap to compute,
adds non-linearity, mitigates vanishing gradients for positive inputs, and promotes sparse activations. For MNIST, an MLP
accepts flattened pixel values (a 784-dimensional input vector) and produces a probability for each of the 10 digit
classes, 0 through 9.(3)

The network learns by updating its weights via backpropagation, which computes the gradients that gradient descent
uses to decrease prediction error. MLPs suit MNIST because they can express non-linear patterns, but they have
limitations: they are computationally costly, sensitive to hyperparameter settings (e.g., number of layers,
learning rate, neurons per layer), and prone to overfitting.(5)(7) In spite of these
limitations, MLPs are a starting point for neural networks and a basis for comparison with more sophisticated approaches
such as CNNs.(11)
This project uses an MLP to identify handwritten digits in the MNIST dataset. The goal is to investigate the
impact of architectural choices and activation functions. By analyzing accuracy and training dynamics, the study aims to
demonstrate the feasibility of MLPs for this task and how to maximize their performance.(3)

Literature Review
The MNIST dataset, introduced by LeCun et al. (1998), standardized the evaluation of handwritten digit classification by
providing a benchmark of 70,000 grayscale images (28x28 pixels) of digits 0–9. It became a cornerstone for
evaluating machine learning models due to its accessibility and its representation of real-world handwriting variations.
Although closely associated with convolutional neural networks (CNNs), MNIST has also facilitated the exploration of
Multi-Layer Perceptrons (MLPs), establishing a baseline for comparing architectural innovations in neural networks.(3)

Rumelhart et al. (1986) popularized backpropagation as a training algorithm for neural networks, enabling MLPs to learn
complex mappings through gradient descent. By minimizing prediction errors iteratively, this work laid the foundation for
applying MLPs to digit recognition and other pattern recognition tasks. Backpropagation became a cornerstone of neural
network training, allowing MLPs to model non-linear relationships in data through hidden layers.(1)

LeCun et al. (1989) demonstrated early applications of neural networks to handwritten character recognition, exploring both
MLPs and CNNs. Their work highlighted the potential of neural networks to automate digit classification, though
computational limitations of the era constrained scalability. This study set the stage for later advancements in optimizing
MLPs for image-based tasks like MNIST.(2)

In their 1998 work, LeCun et al. developed LeNet-5, a CNN architecture that achieved state-of-the-art accuracy on MNIST.
While LeNet-5 showcased the superiority of CNNs for spatial data, it also underscored MLPs’ role as a simpler, more
interpretable alternative for educational purposes and baseline comparisons.(3)

Simard et al. (2003) investigated preprocessing techniques such as normalization and deskewing to enhance MLP
performance on MNIST. Their findings revealed that while MLPs benefited from these methods, their dense connectivity
led to higher computational costs compared to CNNs, emphasizing the trade-off between simplicity and efficiency.(8)

Glorot and Bengio (2010) addressed training challenges in deep MLPs by proposing Xavier initialization, a weight
initialization method that stabilized gradient flow during backpropagation. This innovation mitigated vanishing/exploding
gradient issues, making MLPs more viable for training on datasets like MNIST and enabling deeper architectures.(5)
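
As a rough illustration of the initialization idea described above, the sketch below draws weights for one dense layer from the uniform variant of the Glorot/Xavier scheme. The function name `xavier_uniform` and the 784-to-128 layer shape are assumptions chosen for illustration; the implementation later in this report uses small scaled random weights rather than this scheme.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization for a (fan_in, fan_out) weight matrix."""
    # Bound chosen so that Var(W) = 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(784, 128, rng)          # hypothetical first hidden layer
target_var = 2.0 / (784 + 128)             # variance of U(-limit, limit) is limit**2 / 3
print(W.shape, W.var(), target_var)
```

The sample variance of `W` should sit close to `target_var`, which is the property that keeps activation and gradient magnitudes roughly stable from layer to layer.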

Kingma and Ba (2014) introduced the Adam optimizer, an adaptive learning rate algorithm that accelerated MLP
convergence on MNIST. By dynamically adjusting learning rates based on gradient moments, Adam improved training
efficiency and reliability, becoming a widely adopted optimization tool for neural networks.(10)
Goodfellow et al. (2016) critiqued MLPs’ limitations in handling high-dimensional data like images, noting their dense
connectivity led to parameter explosion and inefficiency compared to CNNs. Their analysis highlighted MLPs’ role as a
foundational model rather than a state-of-the-art solution for tasks requiring spatial invariance.(11)
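
To make the parameter cost of dense connectivity concrete, the short sketch below counts weights and biases for a stack of fully connected layers. The helper name `dense_param_count` is hypothetical; the first layer list matches the 784-128-64-10 architecture used in the Implementation section, and the second is an assumed larger RGB input included only for comparison.

```python
def dense_param_count(layer_sizes):
    """Total weights plus biases for consecutive fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

mnist_mlp = dense_param_count([784, 128, 64, 10])
# Assumed comparison: one 128-unit dense layer on a flattened 224x224 RGB image
large_input = dense_param_count([224 * 224 * 3, 128])
print(mnist_mlp)     # 100480 + 8256 + 650 = 109386
print(large_input)   # 19267712
```

Most of the parameters sit in the first layer, because every input pixel connects to every hidden unit; this is why the count grows so quickly with image size.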

Methodology
The Multi-Layer Perceptron (MLP) learns through an iterative process that involves:
1. Forward propagation.
2. Loss calculation.
3. Backward propagation (gradient computation).
4. Parameter updates using Adam optimization.3
5. Performance evaluation.

Workflow and Mathematical Intuition

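Epoch Initialization
Randomizes sample order to prevent sequence bias
Ensures different batch compositions each epoch
Batch Processing (Mini-Batch)
Divides the training samples into mini-batches of batch_size=32. The standard 60,000-image MNIST training split
gives 1,875 batches per epoch; the 80/20 split of all 70,000 images used in the Implementation section leaves
56,000 training samples, or 1,750 batches per epoch.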
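
The shuffling and batching steps above can be sketched as a small generator. The names `iterate_minibatches` and `batches_per_epoch` and the toy arrays are assumptions for illustration, not part of the report's code.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover every sample once."""
    permutation = rng.permutation(X.shape[0])   # new random order on each call
    for start in range(0, X.shape[0], batch_size):
        idx = permutation[start:start + batch_size]
        yield X[idx], y[idx]

def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches, counting a final partial batch if there is one."""
    return -(-n_samples // batch_size)          # ceiling division

rng = np.random.default_rng(0)
X_demo = np.arange(100, dtype=float).reshape(50, 2)   # toy data: 50 samples
y_demo = np.arange(50)
sizes = [len(xb) for xb, _ in iterate_minibatches(X_demo, y_demo, 16, rng)]
print(sizes)                          # [16, 16, 16, 2]
print(batches_per_epoch(60_000, 32))  # 1875
```

Calling the generator once per epoch gives a different batch composition each time, and the last batch is simply smaller when the sample count is not a multiple of the batch size.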
A. Forward Pass
Layer-by-layer calculation
Dense Layer
$$Z = XW + b$$
ReLU Activation
$$A = \max(0, Z)$$
Softmax Activation (final layer)
$$\mathrm{softmax}(Z)_i = \frac{e^{Z_i}}{\sum_j e^{Z_j}}$$
Output: Probability distribution for each of 10 classes
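
A minimal NumPy sketch of this forward pass, assuming a single 128-unit hidden layer and random stand-in inputs (both chosen only to keep the example short), shows how the shapes flow from 784 inputs to 10 class probabilities.

```python
import numpy as np

def relu(Z):
    """Element-wise max(0, Z)."""
    return np.maximum(0, Z)

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = Z - Z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((4, 784))                                   # 4 stand-in "images"
W1, b1 = 0.01 * rng.standard_normal((784, 128)), np.zeros((1, 128))
W2, b2 = 0.01 * rng.standard_normal((128, 10)), np.zeros((1, 10))

A1 = relu(X @ W1 + b1)            # dense layer followed by ReLU
probs = softmax(A1 @ W2 + b2)     # dense layer followed by softmax
print(probs.shape, probs.sum(axis=1))
```

Each row of `probs` is a distribution over the 10 digit classes, so the row sums are 1 up to floating-point error.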

B. Loss Calculation
Categorical Cross-Entropy:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(p_{i,c})$$

Measures difference between predicted probabilities and true labels
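
As a worked example of the loss, the sketch below evaluates the cross-entropy for two samples and three classes. The function name and the toy probabilities are assumptions; the sum runs over classes and the mean over samples, following the 1/N form of the formula.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-15):
    """Mean over samples of -sum_c y_c * log(p_c); clipping avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]], dtype=float)     # one-hot labels
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])            # predicted probabilities
loss = categorical_cross_entropy(y_true, y_pred)
print(loss)   # -(ln 0.7 + ln 0.8) / 2, approximately 0.29
```

Only the probability assigned to the true class contributes to each sample's term, so a confident correct prediction drives the loss toward zero.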

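C. Backward Pass
Output Layer Gradient
Simplified derivative for softmax + cross-entropy:
$$\frac{\partial L}{\partial Z^{[L]}} = \hat{Y} - Y$$
Hidden Layer Gradients:
ReLU derivative:
$$\frac{\partial A}{\partial Z} = \begin{cases} 1 & \text{if } Z > 0 \\ 0 & \text{otherwise} \end{cases}$$
Chain rule application, where m is the number of samples in the batch:
$$\frac{\partial L}{\partial W^{[l]}} = \frac{1}{m} \left(A^{[l-1]}\right)^{T} \frac{\partial L}{\partial Z^{[l]}}$$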
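
The simplified output-layer gradient can be checked numerically. The sketch below compares the analytic expression with a central finite difference on random stand-in logits; all names and sizes are assumptions. Because the loss here is averaged over N samples, the analytic gradient carries a 1/N factor, which the text above places in the 1/m of the weight gradient instead.

```python
import numpy as np

def softmax(Z):
    shifted = Z - Z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_fn(Z, Y):
    """Mean categorical cross-entropy of softmax(Z) against one-hot Y."""
    return -np.mean(np.sum(Y * np.log(softmax(Z)), axis=1))

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 10))                   # stand-in logits
Y = np.eye(10)[rng.integers(0, 10, size=5)]        # random one-hot labels

analytic = (softmax(Z) - Y) / Z.shape[0]           # (Y_hat - Y) / N

numeric = np.zeros_like(Z)
h = 1e-6
for i in range(Z.shape[0]):
    for j in range(Z.shape[1]):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += h
        Z_minus[i, j] -= h
        numeric[i, j] = (loss_fn(Z_plus, Y) - loss_fn(Z_minus, Y)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

A very small `max_diff` indicates that the closed-form gradient and the finite-difference estimate agree.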
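D. Parameter Updates (Adam Optimization)
Update biased moment estimates
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
Compute bias-corrected estimates
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
Update parameters, where $\theta$ is a parameter, $\eta$ the learning rate, and $\epsilon$ a small constant:
$$\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$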
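
A single Adam update can be sketched as a pure function. The name `adam_step` and the toy objective f(theta) = theta^2 are assumptions for illustration; the default hyperparameters match those in the Implementation section.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated (theta, m, v) after one Adam step at time step t >= 1."""
    m = beta1 * m + (1 - beta1) * grad            # biased first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # biased second moment
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = theta**2, whose gradient is 2 * theta
theta0 = np.array([5.0])
zeros = np.zeros_like(theta0)
theta_after_one, _, _ = adam_step(theta0, 2 * theta0, zeros, zeros, t=1, lr=0.01)

theta, m, v = theta0.copy(), zeros.copy(), zeros.copy()
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(theta_after_one, theta)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by almost exactly the learning rate regardless of the gradient's magnitude.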

Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Load MNIST (70,000 images, 784 pixels each) and scale pixels to [0, 1]
mnist = fetch_openml('mnist_784', version=1, parser='auto')
X = mnist["data"].to_numpy() / 255.0
y = mnist["target"].to_numpy().astype(int).reshape(-1, 1)

# One-hot encode the labels into 10 columns
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

class DenseLayer:
    def __init__(self, n_inputs, n_neurons, activation=None):
        # Small random weights and zero biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        self.activation = activation

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.dot(inputs, self.weights) + self.biases
        if self.activation == 'relu':
            self.output = np.maximum(0, self.output)
        elif self.activation == 'softmax':
            # Subtract the row-wise max for numerical stability
            exp_values = np.exp(self.output - np.max(self.output, axis=1, keepdims=True))
            self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        return self.output

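class NeuralNetwork:
    def __init__(self):
        self.layers = []

    def add_layer(self, layer):
        self.layers.append(layer)

    def forward(self, X):
        # Feed the input through every layer in order
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def predict(self, X):
        # Predicted class is the index of the largest probability
        return np.argmax(self.forward(X), axis=1)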

class AdamOptimizer:
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.lr = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.m = {}
        self.v = {}
        self.t = 0

    def update(self, layers):
        self.t += 1
        for i, layer in enumerate(layers):
            if i not in self.m:
                self.m[i] = np.zeros_like(layer.weights)
                self.v[i] = np.zeros_like(layer.weights)

            # Biased first and second moment estimates of the weight gradient
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * layer.dweights
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * (layer.dweights ** 2)

            # Bias-corrected estimates
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)

            layer.weights -= self.lr * m_hat / (np.sqrt(v_hat) + self.epsilon)
            # Biases take a plain gradient step, without moment estimates
            layer.biases -= self.lr * layer.dbiases

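def categorical_crossentropy(y_true, y_pred):
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # np.mean averages over all N*C entries, so the value returned is 1/C
    # (here 1/10) of the per-sample loss L defined in the Methodology section
    return -np.mean(y_true * np.log(y_pred))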

model = NeuralNetwork()
model.add_layer(DenseLayer(784, 128, activation='relu'))
model.add_layer(DenseLayer(128, 64, activation='relu'))
model.add_layer(DenseLayer(64, 10, activation='softmax'))

optimizer = AdamOptimizer(learning_rate=0.001)
epochs = 10
batch_size = 32
history = {'train_acc': [], 'test_acc': [], 'loss': []}
for epoch in range(epochs):
    # Shuffle the training set at the start of every epoch
    permutation = np.random.permutation(X_train.shape[0])
    X_train_shuffled = X_train[permutation]
    y_train_shuffled = y_train[permutation]
    epoch_loss = []

    for i in range(0, X_train.shape[0], batch_size):
        X_batch = X_train_shuffled[i:i+batch_size]
        y_batch = y_train_shuffled[i:i+batch_size]

        # Forward pass and loss
        output = model.forward(X_batch)
        loss = categorical_crossentropy(y_batch, output)
        epoch_loss.append(loss)

        # Output-layer gradient for softmax + cross-entropy
        # (summed over the batch here, not divided by the batch size)
        error = output - y_batch
        model.layers[-1].dweights = np.dot(model.layers[-2].output.T, error)
        model.layers[-1].dbiases = np.sum(error, axis=0, keepdims=True)

        # Backpropagate through the hidden layers (ReLU derivative is 1 where output > 0)
        for l in range(len(model.layers)-2, -1, -1):
            error = np.dot(error, model.layers[l+1].weights.T) * (model.layers[l].output > 0)
            inputs = X_batch if l == 0 else model.layers[l-1].output
            model.layers[l].dweights = np.dot(inputs.T, error)
            model.layers[l].dbiases = np.sum(error, axis=0, keepdims=True)

        optimizer.update(model.layers)

    history['loss'].append(np.mean(epoch_loss))

    # Accuracy on the full training and test sets after each epoch
    train_preds = model.predict(X_train)
    test_preds = model.predict(X_test)
    history['train_acc'].append(np.mean(train_preds == np.argmax(y_train, axis=1)))
    history['test_acc'].append(np.mean(test_preds == np.argmax(y_test, axis=1)))

    print(f"Epoch {epoch+1}/{epochs} - Loss: {history['loss'][-1]:.4f} | "
          f"Train Acc: {history['train_acc'][-1]:.4f} | Test Acc: {history['test_acc'][-1]:.4f}")

test_predictions = model.predict(X_test)
final_accuracy = np.mean(test_predictions == np.argmax(y_test, axis=1))
print(f"\nFinal Test Accuracy: {final_accuracy:.4f}")

def plot_metrics(history):
    n_epochs = len(history['loss'])
    plt.figure(figsize=(15, 5))

    # Left panel: training and test accuracy per epoch
    plt.subplot(1, 2, 1)
    plt.plot(history['train_acc'], label='Train Accuracy', marker='o')
    plt.plot(history['test_acc'], label='Test Accuracy', marker='o')
    plt.title('Accuracy Evolution', fontsize=14)
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)
    plt.xticks(range(n_epochs), range(1, n_epochs+1))

    # Right panel: mean training loss per epoch
    plt.subplot(1, 2, 2)
    plt.plot(history['loss'], label='Loss', color='red', marker='o')
    plt.title('Training Loss', fontsize=14)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    plt.xticks(range(n_epochs), range(1, n_epochs+1))

    plt.tight_layout()
    plt.savefig('training_metrics.png', dpi=300)
    plt.show()

plot_metrics(history)

def plot_samples(X, y, preds, num=12):
    plt.figure(figsize=(15, 7))
    plt.suptitle('Sample Predictions', fontsize=16)
    # Pick distinct random test images
    indices = np.random.choice(len(X), num, replace=False)

    for i, idx in enumerate(indices):
        plt.subplot(3, 4, i+1)
        plt.imshow(X[idx].reshape(28, 28), cmap='gray')
        # Green title for a correct prediction, red for a wrong one
        plt.title(f"Pred: {preds[idx]}\nTrue: {np.argmax(y[idx])}",
                  color='green' if preds[idx] == np.argmax(y[idx]) else 'red')
        plt.axis('off')

    plt.tight_layout()
    plt.savefig('sample_predictions.png', dpi=300)
    plt.show()

plot_samples(X_test, y_test, test_predictions)

def visualize_pixel_values(index=1):
    img = X_test[index].reshape(28, 28)
    true_label = np.argmax(y_test[index])
    pred_label = model.predict(X_test[index:index+1])[0]

    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(111)
    ax.imshow(img, cmap='gray')
    ax.set_title(f"True: {true_label} | Pred: {pred_label}", fontsize=14)

    n_rows, n_cols = img.shape
    thresh = img.max() / 2.5

    # Annotate every pixel with its intensity; white text on dark pixels
    for row in range(n_rows):
        for col in range(n_cols):
            val = round(img[row][col], 2) if img[row][col] != 0 else 0
            ax.annotate(str(val), xy=(col, row),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if img[row][col] < thresh else 'black',
                        fontsize=8)
    plt.axis('off')
    plt.show()

visualize_pixel_values(index=1)

# Confusion matrix: rows are true digits, columns are predicted digits
cm = confusion_matrix(np.argmax(y_test, axis=1), test_predictions)
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix', fontsize=16)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.savefig('confusion_matrix.png', dpi=300)
plt.show()

# Save the trained weights and biases of all three layers
np.savez('mnist_model.npz',
         weights0=model.layers[0].weights,
         biases0=model.layers[0].biases,
         weights1=model.layers[1].weights,
         biases1=model.layers[1].biases,
         weights2=model.layers[2].weights,
         biases2=model.layers[2].biases)
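
A saved archive with this key layout could be reloaded for inference roughly as follows. This is a sketch under stated assumptions: `load_params` and `predict` are hypothetical helpers, and the example writes random stand-in weights to a temporary file (rather than reading the real mnist_model.npz) so that it runs on its own.

```python
import os
import tempfile
import numpy as np

def load_params(path, n_layers=3):
    """Read (weights, biases) pairs stored under weights{i} / biases{i} keys."""
    with np.load(path) as data:
        return [(data[f"weights{i}"], data[f"biases{i}"]) for i in range(n_layers)]

def predict(params, X):
    """ReLU on hidden layers; argmax of the final logits equals argmax of softmax."""
    A = X
    for i, (W, b) in enumerate(params):
        Z = A @ W + b
        A = np.maximum(0, Z) if i < len(params) - 1 else Z
    return np.argmax(A, axis=1)

# Stand-in archive with the same key names and the 784-128-64-10 shapes
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
demo = {}
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
    demo[f"weights{i}"] = 0.01 * rng.standard_normal((n_in, n_out))
    demo[f"biases{i}"] = np.zeros((1, n_out))
path = os.path.join(tempfile.mkdtemp(), "demo_model.npz")
np.savez(path, **demo)

params = load_params(path)
preds = predict(params, rng.random((5, 784)))
print(preds)
```

Skipping the softmax at inference is a small shortcut: softmax is monotonic within each row, so the predicted class is unchanged.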

Results
The model achieved a final test accuracy of 97.10% and a cross-entropy loss of 0.0025. Training accuracy plateaued at
97.97%.
Figure 1. Training and test accuracy, and training loss, over 10 epochs.

Figure 2. Sample Predictions on the Test Set.

Each image shows a digit from the test set along with its predicted and true labels. Most predictions match the
true labels, indicating good model performance.

Figure 3. Confusion Matrix of digit classifier

Each row represents the actual digit, and each column represents the predicted digit. Diagonal values show correct
predictions, while off-diagonal values indicate misclassification.
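
Given that row/column convention, per-digit recall and overall accuracy can be read off a confusion matrix as sketched below. The 3-class matrix is made-up toy data for illustration, not the report's result, and `per_class_recall` is a hypothetical helper.

```python
import numpy as np

def per_class_recall(cm):
    """Fraction of each true class (row) that was predicted correctly (diagonal)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Toy matrix: rows are true classes, columns are predicted classes
cm_demo = np.array([[48, 1, 1],
                    [2, 45, 3],
                    [0, 5, 45]])
recall = per_class_recall(cm_demo)
overall_accuracy = np.trace(cm_demo) / cm_demo.sum()
print(recall)             # [0.96 0.9  0.9 ]
print(overall_accuracy)   # 0.92
```

Comparing the recall values across rows shows which digits the classifier confuses most often, which the single overall accuracy figure hides.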

Figure 4. Pixel Intensity Heatmap of a Digit "4".

This visualization shows grayscale intensity values (from 0 to 1) of each pixel in the image. The model correctly
predicted the digit as 4.

Conclusion
The Multi-Layer Perceptron (MLP) trained consistently and achieved a test accuracy of
97.10%. With the Adam optimizer and the categorical cross-entropy loss function, the training and test metrics
over 10 epochs show efficient gradient updates and consistent learning dynamics. The close alignment of
training accuracy (97.97%) and test accuracy suggests that the structure with two hidden layers, one of
128 neurons and one of 64 neurons, with ReLU activation functions, strikes a balance between
generalization and capacity. This performance suggests that techniques such as adaptive moment
estimation and softmax-based probability outputs are effective for digit classification.

References
1. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back propagating errors.
Nature.
2. LeCun, Y., Boser, B., Denker, J. S., et al. (1989). Backpropagation applied to handwritten zip code recognition.
Neural Computation.
3. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document
recognition. Proceedings of the IEEE.
4. Simard, P. Y., Steinkraus, D., & Platt, J. C. (2003). Best practices for convolutional neural networks applied to
visual document analysis. ICDAR.
5. Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks.
AISTATS.
6. Tang, Y. (2013). Deep learning using linear support vector machines. ICML Workshop.
7. Srivastava, N., Hinton, G., Krizhevsky, A., et al. (2014). Dropout: A simple way to prevent neural networks from
overfitting. JMLR.
8. Simard, P. Y., Steinkraus, D., & Platt, J. C. (2003). Best practices for convolutional neural networks applied to
visual document analysis. ICDAR.
9. Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. NeurIPS.
10. Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. ICLR.
11. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
12. Cireşan, D., Meier, U., & Schmidhuber, J. (2012). Multi-column deep neural networks for image classification.
CVPR.
