Naveen Swamy
Deep Learning from Research to Production using Apache MXNet*
Senior Software Engineer
Amazon AI
*
Outline
• Review of Deep Learning Concepts
• DL in Computer Vision
• Apache MXNet
• RNN using Apache MXNet
• Deep Learning Inference
[Figure: a deep network for image recognition. Input layer (raw pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners & contours) → 3rd hidden layer (object parts) → output (object identity)]
• Originally inspired by human biological neural systems.
• A system that learns important features from experience.
• Layers of neurons learning concepts.
• Deep learning != deep understanding
Deep Learning
Source: Ian Goodfellow et al., Deep Learning Book
[Figure: example output classes CAR, PERSON, DOG]
Deep Learning Training
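[Figure: data flows forward through the network to a prediction ("dog?"); comparing it with the label ("dog") gives an error that flows backward]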
• Pass data through the network – forward pass
• Define an objective – loss function
• Send the error back – backward pass
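The three steps above can be sketched for a single linear neuron. This is a minimal illustration, not MXNet code: it assumes a half squared-error loss, and the function names and the input, label, weight, and learning-rate values are hypothetical.

```python
def forward(w, b, x):
    """Forward pass: output of a single linear neuron."""
    return w * x + b


def loss_fn(y_pred, y_true):
    """Objective: half squared error between prediction and label."""
    return 0.5 * (y_pred - y_true) ** 2


def backward(w, b, x, y_true):
    """Backward pass: gradients of the loss w.r.t. w and b via the chain rule."""
    error = forward(w, b, x) - y_true  # dLoss/dy_pred
    return error * x, error            # (dLoss/dw, dLoss/db)


# Hypothetical example: one input, one label, one weight and one bias.
x, y_true = 2.0, 1.0
w, b, lr = 0.25, 0.0, 0.1

loss_before = loss_fn(forward(w, b, x), y_true)  # 0.125
grad_w, grad_b = backward(w, b, x, y_true)       # (-1.0, -0.5)
w, b = w - lr * grad_w, b - lr * grad_b          # step against the gradient
loss_after = loss_fn(forward(w, b, x), y_true)   # about 0.03125
print(loss_before, loss_after)
```

One forward pass, one loss evaluation, and one backward pass followed by a weight update is exactly one iteration of the training loop; repeating it drives the loss down.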
Model: the output of training a neural network
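[Figure: a small network with inputs X1, X2, hidden units h1, h2, and output y; weights w1 = w2 = w3 = w4 = w6 = 0.5 and w5 = 0.4. The forward pass gives y′ = 0.9 for target y = 1.0, so the loss is l = y − y′ = 0.1, which the backward pass sends back through the network.]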
Deep Learning Inference
• Real-time Inference: tasks that require an immediate result.
• Batch Inference: tasks that run over large data sets.
o Pre-computation, e.g. for recommender systems.
o Backfilling with state-of-the-art models.
o Testing new models on historical data.
[Figure: a trained model runs a forward pass on new input and predicts "dog"]
Types of Learning
• Supervised Learning – uses labeled training data to map input data to outputs.
• Classification: output is a discrete category
• Regression: output is a continuous value
Examples: Image classification, Speech Recognition, Machine translation
• Unsupervised Learning – learns patterns from unlabeled data.
Examples: Clustering, Association discovery.
• Active Learning – semi-supervised, with a human in the loop who labels selected examples.
• Reinforcement Learning – learns from the environment, using rewards and feedback.
Steps in Training
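• Pre-process the data
• Define the neural network
• Define the loss function
• Training loop:
o Feed a batch of data (forward pass)
o Measure training accuracy and loss
o Backprop: find the gradients of the loss
o Loss & optimization: update the weights, $W = W + \Delta W$, with $\Delta W = -\eta \, \partial L / \partial W$ for learning rate $\eta$
• Validate the model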
Optimization
• Find the parameters that minimize the loss function
• Gradient Descent: iteratively update the parameters in the direction that decreases the objective (loss) function
Stochastic Gradient Descent
Each parameter update uses only one BATCH (mini-batch) of the training data, not the whole data set:
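while True:  # in practice: until a step budget or convergence criterion is met
    # sample a mini-batch of training examples
    data_batch = sample_training_data(data, batch_size)
    # gradient of the loss w.r.t. the weights, estimated on this batch only
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
    # parameter update: step against the gradient
    weights -= step_size * weights_grad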
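A runnable version of the loop above, as a sketch under stated assumptions: it fits a one-feature linear model y = w·x + b with mean squared error. `sample_training_data`, `evaluate_gradient`, and `loss_fun` are hypothetical helpers defined here with the pseudocode's names; the synthetic data (y = 3x + 1), step size, batch size, and step count are also assumptions. A fixed number of steps replaces `while True`, and a held-out validation set is checked at the end, as in the training steps listed earlier.

```python
import random

random.seed(0)

# Hypothetical synthetic data from y = 3x + 1, split into training and validation sets.
points = [(x / 100.0, 3.0 * (x / 100.0) + 1.0) for x in range(200)]
random.shuffle(points)
train, valid = points[:160], points[160:]


def sample_training_data(data, batch_size):
    """Hypothetical helper: draw a random mini-batch (without replacement)."""
    return random.sample(data, batch_size)


def loss_fun(batch, weights):
    """Mean squared error of the linear model on a batch."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


def evaluate_gradient(loss_fun, batch, weights):
    """Hypothetical helper: analytic MSE gradient w.r.t. (w, b) on this batch only.

    loss_fun is accepted to mirror the pseudocode; the formula below is specific to MSE.
    """
    w, b = weights
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return [grad_w, grad_b]


weights = [0.0, 0.0]
step_size, batch_size = 0.1, 16

for _ in range(5000):
    data_batch = sample_training_data(train, batch_size)
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
    # Parameter update: move each weight against its gradient.
    weights = [wi - step_size * gi for wi, gi in zip(weights, weights_grad)]

print("weights:", weights, "validation loss:", loss_fun(valid, weights))
```

Each update looks at only 16 of the 160 training points, which is what makes the method "stochastic"; the validation loss is computed on points the updates never saw.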
Overfitting/Underfitting
• Underfitting: the model performs poorly even on the training data.
• Remedies: add new features, add feature combinations (Cartesian products of features, e.g. nth-degree polynomial terms), reduce regularization.
• Overfitting: the model performs well on the training data but poorly on the validation data.
• Remedy: use regularization.
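As a sketch of the regularization remedy, assuming L2 regularization (weight decay), which is one common choice: the penalty lam * sum(w**2) is added to the data loss, so each weight's gradient gains a 2 * lam * w term that pulls it toward zero. The function names, `lam`, and the example numbers are hypothetical.

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Total objective: data loss plus an L2 penalty on the weights."""
    return data_loss + lam * sum(w * w for w in weights)


def l2_regularized_grad(data_grad, weights, lam):
    """Gradient of the total objective: each weight gets an extra 2 * lam * w term."""
    return [g + 2.0 * lam * w for g, w in zip(data_grad, weights)]


weights = [2.0, -1.0]
data_grad = [0.0, 0.0]  # hypothetical: the data loss alone is already at a minimum
lam, lr = 0.1, 0.5

print(l2_regularized_loss(0.0, weights, lam))  # 0.5
for _ in range(3):
    grad = l2_regularized_grad(data_grad, weights, lam)
    weights = [w - lr * g for w, g in zip(weights, grad)]
print(weights)  # each step scales the weights by 1 - 2 * lr * lam = 0.9
```

Even when the data loss gives no gradient, the penalty keeps shrinking large weights, which limits how closely the model can fit noise in the training data.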
Dropout
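• During training, keep each neuron active with some probability p and set it to zero otherwise.
• This forces all neurons to learn useful features instead of relying on a few others.
• Dropout is applied only during training, not at test time.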
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting", JMLR 2014
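A minimal sketch of dropout on a list of activations. It uses the common "inverted dropout" variant, which is an assumption here: kept activations are scaled by 1/p during training so nothing needs rescaling at test time (the paper cited above instead scales the weights by p at test time). The function name and signature are hypothetical.

```python
import random


def dropout(activations, p, training, rng=random):
    """Inverted dropout (assumed variant): keep each activation with probability p.

    Kept values are scaled by 1/p so the expected activation is unchanged;
    at test time the input passes through untouched.
    """
    if not training:
        return list(activations)
    return [a / p if rng.random() < p else 0.0 for a in activations]


rng = random.Random(42)
acts = [1.0] * 10_000
train_out = dropout(acts, p=0.8, training=True, rng=rng)
test_out = dropout(acts, p=0.8, training=False)

print(sum(1 for a in train_out if a == 0.0) / len(acts))  # fraction dropped, close to 0.2
print(sum(train_out) / len(acts))                         # mean activation, close to 1.0
print(test_out == acts)                                   # True: no dropout at test time
```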
Perceptron
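[Figure: inputs x1 … xn with weights w1 … wn, plus a constant input 1 with weight b, feed a single output y]
$$z = b + \sum_{i=1}^{n} w_i x_i, \qquad y = \begin{cases} 1 & \text{if } z \ge \text{threshold} \\ -1 & \text{otherwise} \end{cases}$$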
• Weighted linear combination of the inputs and weights
• Linear decision boundary, produced by a non-linear (step) output
• Linear offset = bias
• Learn W and b
Perceptron
[Figure: a perceptron classifying email as Spam or Ham]
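A sketch of the perceptron computation above together with the classic perceptron learning rule. It assumes labels in {+1, −1}, a threshold of 0 (the bias b absorbs any other threshold), and a hypothetical linearly separable toy dataset (logical AND) standing in for spam/ham features; the function names are illustrative.

```python
def predict(w, b, x):
    """Perceptron output: 1 if the weighted sum plus bias is >= 0, else -1."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= 0 else -1


def train_perceptron(data, n_features, lr=1.0, epochs=20):
    """Perceptron learning rule: on each mistake, move w and b toward the true label."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, b, x) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


# Hypothetical toy data: logical AND with labels +1 (true) and -1 (false).
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(data, n_features=2)
print([predict(w, b, x) for x, _ in data])  # [-1, -1, -1, 1]
```

The rule is guaranteed to stop making mistakes only when the classes are linearly separable, which is why a single perceptron cannot learn XOR.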
Multi-Layer Perceptron
a.k.a. Fully Connected Networks, Feed-Forward Networks, or Dense Layers
Activation / Non-Linearity
• Many real-world problems cannot be expressed with a linear function.
• Without a non-linearity, the model computes only a linear function no matter how many layers it has, because a composition of linear functions is itself linear.
Source: Andrew Ng’s DeepLearning course
[Figure: P ∧ Q can be separated by a linear function; P ⊕ Q needs a non-linear one]
P   Q   P ∧ Q   P ⊕ Q
T   T   T       F
T   F   F       T
F   T   F       T
F   F   F       F
Credits: Cyrus Vahid
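To make the XOR point concrete: no single linear unit separates the true and false rows of P ⊕ Q, but one hidden layer with a ReLU non-linearity can. The weights below are hand-picked for illustration (an assumption, not learned), with true encoded as 1 and false as 0.

```python
def relu(z):
    """Rectified linear unit: the non-linearity."""
    return max(0.0, z)


def xor_mlp(p, q):
    """Two hidden ReLU units with hand-picked (not learned) weights, then a linear output."""
    h1 = relu(p + q)        # counts how many inputs are true
    h2 = relu(p + q - 1.0)  # non-zero only when both inputs are true
    return h1 - 2.0 * h2


for p in (1, 0):
    for q in (1, 0):
        print(p, q, xor_mlp(p, q))
# 1 1 0.0
# 1 0 1.0
# 0 1 1.0
# 0 0 0.0
```

Removing `relu` collapses the network to h1 - 2*h2 = -(p + q) + 2, a linear function that cannot produce the XOR column.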
Convolutional Neural Networks
• ConvNets exploit the spatial correlation in images.
• This vastly reduces the number of learnable parameters compared with fully connected layers.
• Each neuron is connected only to a small region of the previous layer.
• A ConvNet architecture uses 3 types of layers: CONV, POOL, and FC.
• Each layer transforms a 3D input volume into a 3D output volume of neuron activations.
Source: Andrej Karpathy’s cs231n
[Figure: ConvNet arrangement]
Convolutional layer
• Slide a kernel across the input image to get an output volume of activations.
• A kernel (or filter) is a spatially small tensor that extends through the full depth of the input.
• Each neuron in the output is connected to a small patch of the input.
• Each layer has multiple filters, each learning a different feature.
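A minimal sketch of one convolutional filter sliding over a single-channel input with stride 1 and no padding (both assumptions; a real layer also sums over the full input depth and adds a bias). Like most deep learning frameworks, it computes cross-correlation, i.e. the kernel is not flipped. The function name, the input, and the vertical-edge kernel are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output neuron sees only a kh x kw patch of the input.
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1]] * 4
# Hypothetical 2x2 vertical-edge detector; its 4 weights are shared across all positions.
kernel = [[-1, 1],
          [-1, 1]]

for row in conv2d(image, kernel):
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The filter responds only where the edge is, and it does so with 4 shared weights, whereas a fully connected layer from a 4x4 input to a 3x3 output would need 16 x 9 = 144.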
Pooling
• Downsamples the input, keeping the most important features
• Reduces spatial size, number of parameters, and compute.
• MAX, AVG, and L2-norm pooling
Source: Andrej Karpathy’s cs231n
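A sketch of max pooling, assuming the common 2x2 window with stride 2 on a hypothetical 4x4 activation map: each output keeps the largest value in its window, halving the height and width. The function name is illustrative.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over size x size windows moved by `stride` (no padding)."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


# Hypothetical 4x4 feature map.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(max_pool2d(fmap))  # [[4, 2], [2, 8]]
```

Pooling has no learnable weights; replacing `max` with an average gives AVG pooling.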
Outline
• Apache MXNet Framework
• RNN using Apache MXNet
• Deep Learning Inference
Apache MXNet - Background
• Apache (incubating) open source project
• Framework for building and training DNNs
• Created by academia (CMU and UW)
• Adopted by AWS as DNN framework of choice, Nov 2016
http://mxnet.apache.org
Why MXNet
Apache MXNet Customer Momentum
MXNet Gluon – Imperative & Fast
Simple, Easy-to-Understand Code
§ Neural networks can be defined using simple, clear, concise code
§ Plug-and-play neural network building blocks – including predefined layers, optimizers, and initializers
Flexible, Imperative Structure
§ Eliminates rigidity of neural network model definition and brings together the model with the training algorithm
§ Intuitive, easy-to-debug, familiar code
Dynamic Graphs
§ Neural networks can change in shape or size during the training process to address advanced use cases where the size of data fed is variable
§ Important area of innovation in Natural Language Processing (NLP)
High Performance
§ There is no sacrifice with respect to training speed
§ When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
Credits: Cyrus Vahid
GluonCV: a Deep Learning Toolkit for Computer Vision
https://gluon-cv.mxnet.io
50+ Pre-trained models, with training scripts, datasets, tutorials
Credits: Thomas Delteil
GluonCV: pre-trained models, and help choosing among them
Credits: Thomas Delteil
GluonCV: example code
Credits: Thomas Delteil
Outline
• RNN using Apache MXNet
• Deep Learning Inference
Sequence Models with Apache MXNet
Sandeep Krishnamurthy
Amazon AI
Agenda
• Motivation – Sequence Models
• Natural Language Processing – Basics
• Recurrent Neural Networks (RNN)
• Long Short Term Memory (LSTM)
• Lab – Sentiment Analysis
Motivation – Sequence Models
Applications with Sequence Data
Speech recognition: “Alexa. Tell me a joke!”
Sentiment classification: “There is nothing to like in this movie.”
Machine translation: “Voulez-vous chanter avec moi?” → “Do you want to sing with me?”
Image vs. Sequence Data
Context
Pattern Recognition
Credits - https://tinyurl.com/yc8twa8o; https://tinyurl.com/y725b7bg
Natural Language Processing - Basics
Words - One Hot Vectors
• Vocabulary or Dictionary: list of all the words
• One-hot encoded vector: represents each word as a vector of vocabulary size with a 1 at the word's index and 0 elsewhere
Credits - https://tinyurl.com/ya4bu5wv
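A minimal sketch of building a vocabulary and one-hot vectors, assuming a hypothetical toy vocabulary built from a single sentence (real vocabularies hold thousands of words). The function names are illustrative, not a library API.

```python
def build_vocab(tokens):
    """Map each distinct word to an integer index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Vector of vocabulary size with a 1 at the word's index and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


# Hypothetical toy corpus: a single sentence.
tokens = "i want a glass of orange juice".split()
vocab = build_vocab(tokens)
print(vocab["orange"])           # 5
print(one_hot("orange", vocab))  # [0, 0, 0, 0, 0, 1, 0]
```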
Words – One Hot Vectors - Problems
Orange => Fruit
Apple => _____?
• Problems
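• Relations are not captured: any two different one-hot vectors are orthogonal, so "orange" is no closer to "apple" than to any other word
• No Generalization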
Credits - https://tinyurl.com/ydf72mq6
10,000-word vocabulary
Words - Embeddings
• A 4-dimensional (4-feature) embedding
• Each word is a vector of these 4 features
Orange => Fruit
Apple => Fruit
Credits - https://tinyurl.com/ydf72mq6
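A sketch contrasting the two representations with cosine similarity. The word indices in the 10,000-word vocabulary and the 4-feature embedding values are made-up illustrative numbers, not taken from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def one_hot(index, size=10_000):
    """One-hot vector in a 10,000-word vocabulary."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Hypothetical vocabulary indices.
orange_1h, apple_1h = one_hot(6257), one_hot(456)

# Made-up 4-feature embeddings, for illustration only.
orange_emb = [0.95, 0.97, 0.00, 0.01]
apple_emb = [0.97, 0.95, 0.02, 0.00]
king_emb = [0.02, 0.01, 0.93, 0.95]

print(cosine(orange_1h, apple_1h))              # 0.0: one-hot vectors share nothing
print(round(cosine(orange_emb, apple_emb), 3))  # close to 1: similar words
print(round(cosine(orange_emb, king_emb), 3))   # close to 0: unrelated words
```

With embeddings, what a model learns about "orange" (e.g. "orange juice") can carry over to "apple", because their vectors are close; with one-hot vectors it cannot.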
Words – Embeddings (contd…)
Credits - https://tinyurl.com/yagpo8og
Embeddings – In Practice
• Pre-trained embeddings: trained on billions of words using different techniques
• Examples: GloVe, Word2Vec, FastText
• GloVe: ratios of word-word co-occurrence probabilities carry information about word meaning
Language Model
• Estimates the probability of a sequence of words (a sentence)
Credits - https://tinyurl.com/zytlcy5
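A language model typically factors the sentence probability with the chain rule, P(w1, …, wT) = P(w1) · P(w2 | w1) · … · P(wT | w1, …, wT−1). As a minimal sketch, assuming a count-based bigram model (each word conditioned only on the previous one) and a hypothetical three-sentence corpus, with <s> and </s> marking sentence boundaries:

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, with <s> and </s> marking sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that some next word follows
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def sentence_prob(sentence, unigrams, bigrams):
    """P(sentence) = product over t of P(w_t | w_{t-1}), maximum-likelihood estimates."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob


# Hypothetical toy corpus.
corpus = ["the cat sat", "the cat slept", "the dog sat"]
uni, bi = train_bigram(corpus)
print(sentence_prob("the cat sat", uni, bi))  # 1 * 2/3 * 1/2 * 1 = 1/3
print(sentence_prob("sat the cat", uni, bi))  # 0.0: unseen word order
```

Neural language models such as RNNs and LSTMs estimate the same conditional probabilities, but condition on the whole history instead of only the previous word.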
Summary
• Representing words: One Hot Vector, Embeddings
• Sentences and beyond: Language Modelling
Notations
Representing a sentence
x: Harry Potter and Hermione Granger invented a new spell.
$x^{<1>}, x^{<2>}, x^{<3>}, \ldots, x^{<T>}$
$x^{<1>} \ldots x^{<T>}$: embedding vectors for each word in the sentence, where T is the number of words
Sequence Model - Problem Formulation
• Given a sequence X: $x^{<1>}, x^{<2>}, x^{<3>}, \ldots, x^{<T>}$
• Predict Y
• Y -> Discrete (e.g. Sentiment Classification)
• Y -> Sequence (e.g. Language Translation)
Recurrent Neural Networks (RNN)
Sequence Data – MLP - Problems
• Variable-length inputs (an MLP expects a fixed input size)
• Memory / Context
• Too complex, too many parameters
Credits - https://tinyurl.com/ydf72mq6
RNN
• Terminologies
• Time Step
• RNN Cell
• Unrolled
• Variable Length Inputs
• Memory / Context
• Fewer parameters (the same weights are reused at every time step)
Credits - https://tinyurl.com/q6dcybc/
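A minimal sketch of an RNN unrolled over time, assuming scalar inputs, a scalar hidden state, a tanh activation, and hypothetical weights (a real cell uses weight matrices). The same three parameters are reused at every time step, which is why one cell handles inputs of any length, and the hidden state carries context forward.

```python
import math

# Hypothetical parameters, shared across all time steps.
W_H, W_X, B = 0.5, 1.0, 0.0


def rnn_cell(h_prev, x_t):
    """One time step: the new hidden state mixes the previous state and the current input."""
    return math.tanh(W_H * h_prev + W_X * x_t + B)


def rnn_forward(xs, h0=0.0):
    """Unroll the cell over a sequence of any length; return all hidden states."""
    states, h = [], h0
    for x_t in xs:
        h = rnn_cell(h, x_t)
        states.append(h)
    return states


short = rnn_forward([1.0, 0.0])
longer = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0])
print(len(short), len(longer))  # 2 5: same parameters, different lengths
print(longer[1] > 0)            # True: the first input still affects later states
```

Because the state is multiplied by W_H at every step, the influence of the first input shrinks quickly, which is the long-range dependency problem discussed next.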
Context
RNN – Long Range Dependencies
The cat, which ate its fill, ….., was sleeping on the bed.
The cats, which ate their fill, ….., were sleeping on the bed.
Problem: if the sequence is long, it is hard to “memorize” or “capture” the long-range singular/plural dependencies.
Possible solution: keep some memory by passing extra information along from step to step.
Long Short Term Memory (LSTM)
• 3 Inputs: previous hidden state, previous memory (cell) state, x<t>
• 2 Outputs: memory, new hidden state
Credits - https://tinyurl.com/q6dcybc/
Long Short Term Memory (LSTM)
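• 3 Inputs: hidden state, memory (cell) state, x<t>
• Information highway: the memory (cell) state flows from step to step, changed only through gated updates
• Forget gate: decides what to discard from the memory
• Input gate: decides how much of the candidate memory to add
• Output gate: decides how much of the memory becomes the new hidden state
• 2 Outputs: memory (long-term dependencies), new hidden state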
Credits - https://tinyurl.com/q6dcybc/
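One LSTM time step can be sketched as below, following the standard LSTM equations with scalar states and hypothetical weights (each gate has its own weight for the previous hidden state, its own weight for the input, and a bias). The forget gate scales the old memory, the input gate scales the candidate memory, and the output gate decides how much of the squashed memory becomes the new hidden state.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM step with scalar states; p maps each gate to (w_h, w_x, bias)."""
    def pre(gate):
        w_h, w_x, b = p[gate]
        return w_h * h_prev + w_x * x_t + b

    f = sigmoid(pre("forget"))       # how much of the old memory to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate memory content
    o = sigmoid(pre("output"))       # how much of the memory to expose
    c = f * c_prev + i * g           # information highway: additive memory update
    h = o * math.tanh(c)             # new hidden state
    return h, c


# Hypothetical parameters: a large forget-gate bias keeps the memory nearly intact.
params = {
    "forget": (0.0, 0.0, 5.0),
    "input": (0.0, 1.0, -2.0),
    "candidate": (0.0, 1.0, 0.0),
    "output": (0.0, 0.0, 1.0),
}

h, c = 0.0, 0.0
for x_t in [3.0, 0.0, 0.0, 0.0]:  # one strong input, then silence
    h, c = lstm_step(h, c, x_t, params)
print(round(c, 3))  # memory of the first input is largely preserved
```

Because the memory is updated by multiplication with f and addition rather than by repeated squashing, information such as "the subject is singular" can survive many steps.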
Summary
• Representing a sentence: $x^{<1>} \ldots x^{<T>}$
• Recurrent Neural Networks (RNN): Context
• Long Short Term Memory (LSTM): Information highway, Gates, Memory
Lab Time!
Sentiment analysis
A popular application of Natural Language Processing (NLP) that classifies text or speech as expressing positive or negative sentiment.
Credits- https://tinyurl.com/y7wsgp4b
Dataset
• http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
• ~12,500 positive movie reviews
• ~12,500 negative movie reviews
• ~17,500 samples for training
• ~7,500 samples for testing
Positive review: "this was an awesome movie!"
Negative review: "i just could not watch it till the end."
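The 17,500 / 7,500 split above is a 70/30 split of the 25,000 labeled reviews. As a sketch, assuming the reviews are already loaded as (text, label) pairs (the loading step, the function name, and the toy data here are hypothetical), a shuffled split could look like:

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split them into training and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 25,000 labeled reviews (label 1 = positive, 0 = negative).
reviews = [("this was an awesome movie!", 1),
           ("i just could not watch it till the end.", 0)] * 12_500
train, test = train_test_split(reviews)
print(len(train), len(test))  # 17500 7500
```

Shuffling before splitting avoids a test set made of only one class when the source files list all positive reviews before all negative ones.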
Gluon NLP Toolkit
• https://gluon-nlp.mxnet.io/
• Extension on top of Gluon
• State of the Art (SOTA) deep learning models
• Scripts
• Model Zoo
• High-Level APIs and Abstractions tailored for NLP
• Vocabulary, Embedding
• Data Loading, Data Pre-Processing, Data Iterators
• Operators, Loss Functions
So what does a deployed model look like?
[Figure: a model hosted behind a model server, reached over the Internet by mobile, desktop, and IoT clients]
Credits: Hagay Lupesko
The Undifferentiated Heavy Lifting of Model Serving
• Performance
• Availability
• Networking
• Monitoring
• Model Decoupling
• Cross Framework
• Cross Platform
Model Server for MXNet
Credits: Hagay Lupesko
Model Archive
[Figure: the Model Export CLI packages the trained network, model signature, custom code, and auxiliary assets into a model archive]
Credits: Hagay Lupesko
Containerization
[Figure: pull or build the MMS image from a Dockerfile, push it, and launch MMS containers (MXNet Model Server running on MXNet and Netty) across a container cluster]
Lightweight virtualization, isolation, runs anywhere
Credits: Hagay Lupesko
MXNet Model Server
• Machine learning model server
• Serves MXNet and ONNX models
• Automated HTTP endpoints setup
• Auto-scales to all available CPUs and GPUs
• Pre-built and configured containers
• CLI to package model artifacts for serving
• Open source project under AWS Labs
http://modelserver.io
Credits: Hagay Lupesko
Apache MXNet Social
YouTube: /apachemxnet
Twitter: @apachemxnet
Reddit: r/mxnet
Medium: /apache-mxnet
Resources/References
• Apache MXNet – Flexible and efficient deep learning.
• https://github.com/apache/incubator-mxnet
• https://github.com/TalkAI/apache-mxnet-odsc-2018
• Apache MXNet Gluon Tutorials
• The Deep Learning Book
• MXNet – Using pre-trained models
• Amazon Elastic MapReduce
• https://medium.com/apache-mxnet
• https://twitter.com/apachemxnet
Thank You