MLP for MNIST in TensorFlow 2

The document outlines a Python script that implements a multi-layer perceptron (MLP) using TensorFlow 2 for classifying handwritten digits from the MNIST dataset. It details the model architecture, training process, and evaluation metrics, achieving a test accuracy of approximately 97-98%. Additionally, it discusses data preprocessing, the use of the Adam optimizer, and visualizes training loss over epochs.

Uploaded by

palanivel

Multi-Layer Perceptron (MLP) Implementation in TensorFlow 2

This Python script implements a multi-layer perceptron (MLP), a type of
feedforward neural network, using TensorFlow 2 to perform multi-class
classification on the MNIST dataset of handwritten digits (0-9). The MLP
consists of two hidden layers and an output layer. Unlike a single-layer
perceptron, it can capture non-linear patterns in the data.
Below is a detailed breakdown of the code, its components, and the rationale
behind its design.
1. Importing Libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
• tensorflow: The core library for building, training, and evaluating neural
networks in TensorFlow 2.
• numpy: Provides efficient numerical operations for array manipulation,
used here for preprocessing and data handling.
• matplotlib.pyplot: Used to visualize the training loss over epochs, aiding
in performance analysis.
• mnist: A built-in TensorFlow module providing access to the MNIST
dataset, which contains 60,000 training and 10,000 test grayscale images
(28x28 pixels) of handwritten digits (0-9).
2. Loading and Preprocessing the Data
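mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Flatten each 28x28 image to 784 values and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
# One-hot encode the integer labels (10 classes)
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)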
• Loading the MNIST Dataset:
o The mnist.load_data() function returns two tuples: (x_train,
y_train) for training (60,000 samples) and (x_test, y_test) for
testing (10,000 samples).
o Each image is a 28x28 grayscale array, and each label is an integer
(0-9) representing the digit.
• Preprocessing the Input Data (x_train, x_test):
o reshape(-1, 784): Flattens each 28x28 image into a 1D array of 784
values (28 × 28 = 784) to match the input requirements of a fully
connected layer.
o astype("float32"): Converts pixel values from 8-bit unsigned integers
(uint8) to 32-bit floats, the default dtype for Keras layer computations.
o / 255.0: Normalizes pixel values from the range [0, 255] to [0, 1],
which improves training convergence by scaling inputs to a
consistent range.
• Preprocessing the Labels (y_train, y_test):
o to_categorical(y_train, 10): Converts integer labels (e.g., 5) into
one-hot encoded vectors (e.g., [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]). This is
necessary for multi-class classification with categorical
cross-entropy loss.
• Purpose of Preprocessing:
o Flattening ensures compatibility with the dense layers.
o Normalization and type conversion enhance training stability and
efficiency.
o One-hot encoding aligns the labels with the model’s output format
(probabilities for 10 classes).
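The preprocessing steps above can be tried without downloading MNIST. The following is a minimal NumPy sketch on a small synthetic batch; the random images, the example labels, and the use of np.eye for one-hot encoding are illustrative assumptions, not part of the original script (which uses to_categorical).

```python
import numpy as np

# Synthetic stand-in for a few MNIST images: uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([5, 0, 9, 3, 5])  # hypothetical integer labels

# Flatten to 784 values per image, convert to float32, scale to [0, 1]
x = images.reshape(-1, 784).astype("float32") / 255.0

# One-hot encode: row i of the 10x10 identity matrix encodes class i
y = np.eye(10, dtype="float32")[labels]

print(x.shape, x.dtype, y.shape)
print(y[0])  # the one-hot vector for label 5
```

Indexing the identity matrix is only one way to build one-hot vectors; it is shown here because it makes the encoding explicit.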
3. Defining Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 100
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 256 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
• learning_rate = 0.001: Controls the step size of weight updates during
optimization. A smaller learning rate ensures stable convergence but
may require more epochs.
• training_epochs = 20: Specifies 20 complete passes through the training
dataset, balancing training time and model performance.
• batch_size = 100: Divides the training data into mini-batches of 100
samples for gradient updates, offering a trade-off between
computational efficiency and gradient accuracy.
• n_hidden_1 = 256, n_hidden_2 = 256: Defines the number of neurons in
the first and second hidden layers, respectively. These layers enable the
model to learn complex, non-linear patterns.
• n_input = 784: Matches the flattened image size (28 × 28).
• n_classes = 10: Corresponds to the 10 digit classes (0-9).
• Design Choice:
o The choice of 256 neurons per hidden layer provides sufficient
capacity to model the complexity of MNIST digits while keeping
computational costs manageable.
o A batch size of 100 is a common choice for MNIST, balancing
memory usage and training stability.
4. Building the Model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden_1, activation='sigmoid',
                          input_shape=(n_input,)),
    tf.keras.layers.Dense(n_hidden_2, activation='sigmoid'),
    tf.keras.layers.Dense(n_classes)  # raw logits, no activation
])
• The model is constructed using Keras' Sequential API, which allows
stacking layers in a linear sequence:
o First Hidden Layer: A Dense (fully connected) layer with 256 units,
sigmoid activation, and an input shape of (784,). The sigmoid
function maps inputs to the open interval (0, 1), introducing non-linearity.
o Second Hidden Layer: Another Dense layer with 256 units and
sigmoid activation, further transforming the features learned in
the first layer.
o Output Layer: A Dense layer with 10 units (one per digit class),
outputting raw logits (no activation applied here, as softmax is
handled implicitly by the loss function).
• Architecture Overview:
o This MLP has three layers: two hidden layers (256 neurons each)
and one output layer (10 neurons).
o Unlike a single-layer perceptron, the hidden layers allow the
model to learn hierarchical feature representations, making it
capable of handling non-linearly separable data.
o The sigmoid activation is used for historical reasons and simplicity,
though modern MLPs often use ReLU for faster convergence.
• Parameters:
o First layer: (784 × 256 + 256 biases) = 200,960 parameters.
o Second layer: (256 × 256 + 256 biases) = 65,792 parameters.
o Output layer: (256 × 10 + 10 biases) = 2,570 parameters.
o Total: 269,322 trainable parameters.
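The architecture and parameter counts above can be checked with a small NumPy sketch of the forward pass. The weight initialization (small random normal values), the helper names, and the random input batch are assumptions made for illustration; Keras uses its own initializers and the original script does not contain this code.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 256, 10]  # input, hidden 1, hidden 2, output

# One (weights, biases) pair per Dense layer
params = [
    (rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(x, params):
    """Sigmoid on both hidden layers, raw logits from the output layer."""
    h = x
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W, b = params[-1]
    return h @ W + b

x = rng.random((4, 784))  # hypothetical batch of 4 flattened images
logits = forward(x, params)

param_counts = [W.size + b.size for W, b in params]
print(logits.shape, param_counts, sum(param_counts))
```

Counting array elements this way reproduces the per-layer totals listed above, since each Dense layer holds one weight per input-output pair plus one bias per unit.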
5. Compiling the Model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
• optimizer=Adam(...): Uses the Adam optimizer, an adaptive gradient-based
method that combines momentum and RMSProp. It is well-suited for
deep learning due to its efficiency and robustness.
• learning_rate=0.001: A standard choice for Adam, balancing speed and
stability.
• loss=CategoricalCrossentropy(from_logits=True): Computes the
categorical cross-entropy loss between one-hot encoded labels and the
model’s logits output. The from_logits=True setting applies softmax
internally to convert logits to probabilities, ensuring numerical stability.
• metrics=['accuracy']: Tracks the classification accuracy (fraction of
correctly classified samples) during training and evaluation.
• Why Adam?: Compared to SGD (used in the single-layer perceptron),
Adam adapts the learning rate for each parameter, leading to faster
convergence, especially for deeper networks like this MLP.
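To illustrate why computing the loss directly from logits helps numerical stability, here is a minimal NumPy sketch of softmax cross-entropy using the log-sum-exp shift. It is not the Keras implementation; the function name and the example logits are assumptions chosen to include one extreme row that would overflow a naive softmax.

```python
import numpy as np

def cross_entropy_from_logits(logits, one_hot):
    """Mean categorical cross-entropy computed from raw logits."""
    # Subtracting the row maximum leaves softmax unchanged but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot * log_probs).sum(axis=1).mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])  # second row has extreme values
labels = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

loss = cross_entropy_from_logits(logits, labels)
print(loss)
```

Applying softmax first and taking the logarithm afterwards would evaluate exp(1000) for the second row, which overflows in floating point; working in log space avoids that.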
6. Training the Model
avg_set = []    # training loss per epoch
epoch_set = []  # epoch numbers, starting at 1

class LossHistory(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        avg_set.append(logs['loss'])
        epoch_set.append(epoch + 1)
        if epoch % 1 == 0:  # always true, so every epoch is printed
            print(f"Epoch: {epoch + 1:04d}, cost={logs['loss']:.9f}")

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=training_epochs,
                    verbose=0,
                    callbacks=[LossHistory()])

• Initialization:
o avg_set: Stores the average training loss for each epoch, used for
plotting.
o epoch_set: Stores epoch numbers (1 to 20) for plotting.
• Custom Callback (LossHistory):
o A custom Keras callback that runs at the end of each epoch.
o Records the epoch’s loss (logs['loss']) in avg_set.
o Stores the epoch number (epoch + 1) in epoch_set.
o Prints the epoch number and loss every epoch (e.g., Epoch: 0001,
cost=0.123456789).
• Training Process:
o model.fit: Trains the model for 20 epochs, processing the training
data in mini-batches of 100 samples.
o verbose=0: Disables default Keras logging to rely on the custom
callback for output.
o The training loop updates the model’s weights using
backpropagation and the Adam optimizer, minimizing the
categorical cross-entropy loss.
• Batch Processing:
o The training set (60,000 samples) is divided into 600 batches
(60,000 / 100 = 600).
o Each epoch processes all 600 batches, computing gradients and
updating weights for each batch.
• Output:
o The callback prints the loss for each epoch, providing insight into
the model’s learning progress.
o Example output: Epoch: 0001, cost=0.123456789, showing how
the loss decreases as training progresses.
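The batch arithmetic above can be sketched in plain Python. The batch_slices helper is hypothetical and only shows how index ranges are formed; by default Keras also shuffles the training data each epoch, which this sketch omits.

```python
import math

n_samples = 60000
batch_size = 100
training_epochs = 20

# Number of weight updates per epoch and over the whole run
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * training_epochs

def batch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs covering the data in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

print(steps_per_epoch, total_updates)
print(list(batch_slices(250, 100)))  # the last batch is smaller
```

With 60,000 samples and a batch size of 100 every batch is full; the 250-sample call shows what happens when the batch size does not divide the dataset evenly.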

7. Evaluating the Model

test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"MODEL accuracy: {test_accuracy:.4f}")
• Evaluation:
o model.evaluate: Computes the loss and accuracy on the test set
(10,000 samples).
o verbose=0: Suppresses detailed evaluation logs, returning only the
final metrics.
• Output:
o Prints the test accuracy (e.g., MODEL accuracy: 0.9780), indicating
the fraction of correctly classified test samples.
o The test loss is also computed but not printed, though it could be
accessed via test_loss.
• Significance:
o The test accuracy reflects the model’s generalization to unseen
data, a critical measure of performance.
o For MNIST, an MLP with this architecture typically achieves ~97-
98% accuracy, competitive but slightly below modern
convolutional neural networks (CNNs).
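As a rough illustration of what the accuracy metric measures, the sketch below compares the arg-max of the logits with the arg-max of the one-hot labels. The function name and the tiny three-class example are assumptions for illustration; this is not the Keras metric itself.

```python
import numpy as np

def accuracy_from_logits(logits, one_hot_labels):
    """Fraction of rows whose highest logit matches the true class."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical logits for 4 samples and 3 classes
logits = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.2, 0.1],
                   [0.0, 0.5, 1.5],
                   [1.2, 0.3, 0.9]])
labels = np.eye(3)[[1, 0, 2, 2]]  # true classes: 1, 0, 2, 2

acc = accuracy_from_logits(logits, labels)
print(acc)
```

Softmax preserves the ordering of the logits, so the predicted class can be read from the raw outputs without converting them to probabilities first.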
8. Plotting the Training Loss
plt.plot(epoch_set, avg_set, 'o', label='MLP Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
• Visualization:
o Uses Matplotlib to plot the training loss (avg_set) against epoch
numbers (epoch_set).
o The 'o' argument adds markers at each data point, making the plot
easier to interpret.
o Labels the y-axis as “cost” (loss) and the x-axis as “epoch.”
o Adds a legend to identify the plot as the “MLP Training phase.”
• Purpose:
o The plot visualizes the model’s learning curve, showing how the
loss decreases over epochs.
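o A downward trend indicates successful training. A plateau can suggest
insufficient capacity or too small a learning rate, and spikes can
suggest too large a learning rate. Overfitting cannot be diagnosed from
the training loss alone; that requires a validation curve for comparison.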
• Expected Outcome:
o The loss typically decreases steadily, reflecting the model’s ability
to fit the training data.
9. What the Code Does
The script implements a multi-layer perceptron to classify MNIST handwritten
digits (0-9). Key features include:
• Architecture: Two hidden layers (256 neurons each with sigmoid
activation) and an output layer (10 neurons, logits output).
• Training: Uses the Adam optimizer to minimize categorical cross-entropy
loss over 20 epochs with a batch size of 100.
• Data Processing: Flattens and normalizes MNIST images and converts
labels to one-hot encoded vectors.
• Evaluation: Measures test accuracy to assess generalization (typically
~97-98%).
• Visualization: Plots the training loss to monitor convergence.
• Improvements Over Single-Layer Perceptron:
o The addition of hidden layers enables the MLP to learn non-linear
relationships, unlike the single-layer perceptron, which is limited
to linearly separable data.
o The Adam optimizer and sigmoid activations improve training
efficiency and expressiveness compared to the simpler SGD-based
single-layer model.
10. Additional Context
• MNIST Dataset:
o A benchmark dataset in machine learning, consisting of 70,000
grayscale images (60,000 training, 10,000 test) of handwritten
digits.
o Each image is 28x28 pixels, and labels are integers (0-9).
o Widely used for testing classification algorithms due to its
simplicity and well-understood properties.
• Why MLP?:
o MLPs operate on fixed-length feature vectors, such as flattened MNIST
images, and serve as a foundational neural network model.
o While not as powerful as CNNs for image data, MLPs are simpler
to implement and understand, making them ideal for educational
purposes.
• Limitations:
o The use of sigmoid activations can lead to vanishing gradients,
slowing training compared to ReLU.
o The model may not achieve state-of-the-art accuracy (~99%+ with
CNNs) due to its inability to exploit spatial structure in images.
o Overfitting is a risk with deeper networks, though the MNIST
dataset is relatively robust to this with sufficient training data.
• Potential Improvements:
o Replace sigmoid with ReLU activation for faster convergence.
o Increase the number of epochs or adjust the learning rate for
better performance.
o Use a CNN for higher accuracy on image data.
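The vanishing-gradient limitation and the suggested switch to ReLU can be illustrated by comparing the derivatives of the two activations. This is a minimal sketch in plain Python; the helper names and the sample inputs are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

for z in (0.0, 2.0, 5.0):
    print(f"z={z}: sigmoid'={sigmoid_grad(z):.4f}, relu'={relu_grad(z):.1f}")
```

Because backpropagation multiplies these local derivatives across layers, two sigmoid layers scale the gradient by at most 0.25 × 0.25 = 0.0625, while ReLU passes it through unchanged for positive pre-activations.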
11. Example Output
During training, the script produces output like:
Epoch: 0001, cost=0.623456789
Epoch: 0002, cost=0.345678901
...
Epoch: 0020, cost=0.098765432
MODEL accuracy: 0.9780
The plot displays a decreasing loss curve, confirming that the model learns
effectively. The final test accuracy (~97-98%) indicates strong performance on
the MNIST classification task.
