Apache MXNet ODSC West 2018

Deep Learning from Research to Production using Apache MXNet
Naveen Swamy
Senior Software Engineer
Amazon AI

Outline
• Review of Deep Learning Concepts
• DL in Computer Vision
• Apache MXNet
• RNN using Apache MXNet
• Deep Learning Inference

Deep Learning
• Originally inspired by human biological neural systems.
• A system that learns important features from experience.
• Layers of neurons learning concepts.
• Deep learning != deep understanding
[Figure: raw pixels enter the input layer; the 1st hidden layer detects edges, the 2nd corners & contours, the 3rd object parts; the output is the object identity (CAR, PERSON, DOG)]
Source: Ian Goodfellow et al., Deep Learning Book

Deep Learning Training
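• Pass data through the network – forward pass
• Define an objective – loss function
• Send the error back – backward pass
• Model: the output of training a neural network
[Figure: data flows forward through the network to a prediction ("dog"), the prediction is compared with the label ("dog") to give an error, and the error flows backward]
[Figure: worked example with inputs X1, X2, hidden units h1, h2, and weights w1 = w2 = w3 = w4 = w6 = 0.5, w5 = 0.4; the forward pass gives y` = 0.9 against y = 1.0, so loss = y – y` = 0.1, which the backward pass sends back]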

Deep Learning Inference
• Real-time inference: tasks that require an immediate result.
• Batch inference: tasks where you need to run on large data sets.
o Pre-computations are necessary, e.g. recommender systems.
o Backfilling with state-of-the-art models.
o Testing new models on historic data.
[Figure: input passes forward through the trained model, which outputs "dog"]

Types of Learning
• Supervised Learning – Uses labeled training data to map input data to outputs.
o Classification: output is one of a set of discrete categories
o Regression: output is a continuous value
Examples: image classification, speech recognition, machine translation
• Unsupervised Learning – Learns patterns from unlabeled data.
Examples: clustering, association discovery
• Active Learning – Semi-supervised, with a human in the loop.
• Reinforcement Learning – Learns from the environment, using rewards and feedback.

Steps in Training
• Pre-process data
• Define neural network
• Define loss function
• Training loop:
o Feed a batch of data
o Loss & optimization (backprop): find gradients, update weights, W = W + ΔW
o Measure training accuracy and loss
• Validate model

Optimization
• Find the parameters that minimize the loss function.
• Gradient Descent: iteratively update the parameters to move toward the optimal value of the objective function.
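As a minimal sketch of the gradient descent bullet above, the following uses a made-up one-parameter objective, (w − 3)², chosen only because its minimum is known. The function names, step size, and number of steps are illustrative assumptions, not part of the original slides.

```python
def loss(w):
    """Toy objective with its minimum at w = 3 (hypothetical example)."""
    return (w - 3.0) ** 2


def loss_grad(w):
    """Analytic derivative of the toy objective."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, step_size=0.1, num_steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(num_steps):
        w -= step_size * loss_grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here; on real loss surfaces the step size usually needs tuning.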

Stochastic Gradient Descent
A single iteration of the parameter update runs through a BATCH of the training data.
while True:
    # sample a mini-batch from the training data
    data_batch = sample_training_data(data, batch_size)
    # gradient of the loss on this batch with respect to the weights
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
    # step against the gradient
    weights -= step_size * weights_grad
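The loop above can be made runnable on a toy problem. This sketch fits y = w·x + b to noise-free points generated from y = 2x + 1; the data, the mean-squared-error loss, the hyperparameters, and the simplified helper signatures are all assumptions made for illustration.

```python
import random


def sample_training_data(data, batch_size, rng):
    """Draw a mini-batch without replacement."""
    return rng.sample(data, batch_size)


def evaluate_gradient(batch, w, b):
    """Gradient of mean squared error for the model y_hat = w * x + b."""
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


rng = random.Random(0)
# Hypothetical noise-free training set drawn from y = 2x + 1.
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-20, 21)]

w, b = 0.0, 0.0
step_size, batch_size = 0.05, 8
for _ in range(2000):
    batch = sample_training_data(data, batch_size, rng)
    grad_w, grad_b = evaluate_gradient(batch, w, b)
    w -= step_size * grad_w
    b -= step_size * grad_b

print(round(w, 3), round(b, 3))
```

A fixed iteration count replaces `while True` so the sketch terminates; in practice training stops on a validation criterion.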

Overfitting/Underfitting
• Underfitting: the model performs poorly on the training data.
o Remedies: add new features, increase feature cross products (e.g. nth-degree polynomial terms), reduce regularization.
• Overfitting: the model performs well on the training data but does not perform well on the validation data.
o Remedy: use regularization.

Dropout
• Keep each neuron active with some probability p.
• Forces all neurons to learn.
• Dropout is applied only during training, not at test time.
Srivastava, Nitish, et al. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014
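A minimal sketch of the dropout bullets above, in plain Python. It uses the common "inverted dropout" variant, which rescales kept activations by 1/p during training so that nothing changes at test time; that variant, and all names here, are assumptions rather than something the slide specifies.

```python
import random


def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: keep each unit with probability keep_prob."""
    if not training:
        return list(activations)  # nothing is dropped at test time
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)  # rescale so the expected value is unchanged
        else:
            out.append(0.0)  # this unit is dropped for this pass
    return out


rng = random.Random(42)
acts = [1.0] * 10000
train_out = dropout(acts, keep_prob=0.8, training=True, rng=rng)
test_out = dropout(acts, keep_prob=0.8, training=False, rng=rng)
print(sum(train_out) / len(train_out))
```

With many units, the mean activation during training stays close to the test-time mean, which is the point of the rescaling.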

Perceptron
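• Weighted linear combination of inputs and weights
• Linear/non-linear decision
• Linear offset = bias
• Learn W and b
[Figure: inputs x_1 … x_n with weights w_1 … w_n, plus a constant input 1 with weight b, feed a single unit that outputs y]
$$z = b + \sum_{i} w_i \, x_i$$
$$y = \begin{cases} 1, & \text{if } z \ge \text{threshold} \\ -1, & \text{otherwise} \end{cases}$$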
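The perceptron on this slide can be sketched in a few lines of plain Python. The slide says only "Learn W and b"; the classic perceptron update rule, the logical-AND training set, and the function names below are assumptions chosen for illustration.

```python
def predict(x, w, b, threshold=0.0):
    """Return 1 if the weighted sum reaches the threshold, otherwise -1."""
    z = b + sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1 if z >= threshold else -1


def train_perceptron(samples, num_inputs, epochs=100, lr=1.0):
    """Classic perceptron rule: adjust w and b only on misclassified samples."""
    w, b = [0.0] * num_inputs, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in samples:
            if predict(x, w, b) != label:
                w = [w_i + lr * label * x_i for w_i, x_i in zip(w, x)]
                b += lr * label
                mistakes += 1
        if mistakes == 0:  # every sample is classified correctly
            break
    return w, b


# Logical AND with labels in {-1, 1}; linearly separable, so training converges.
and_samples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(and_samples, num_inputs=2)
predictions = [predict(x, w, b) for x, _ in and_samples]
print(w, b, predictions)
```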

Multi Layer Perceptron
a.k.a. Fully Connected Networks, Feed Forward Networks, or Dense Layers

Activation / Non-Linearity
• Many real-world problems cannot be expressed with a linear function.
• Without non-linearity, the model is just computing a linear function.
Source: Andrew Ng’s DeepLearning course
[Figure: P ∧ Q can be separated by a linear function; P ⨁ Q is non-linear]
P  Q  P ∧ Q  P ⨁ Q
T  T    T      F
T  F    F      T
F  T    F      T
F  F    F      F
Credits: Cyrus Vahid
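To illustrate why the non-linearity matters, this sketch computes exclusive-or with two hidden ReLU units, then removes the ReLU to show that the same stack collapses into a single linear function. The weights are hand-picked for illustration and ReLU is an assumed choice of activation; neither comes from the slides.

```python
def relu(v):
    """Rectified linear unit, one common non-linearity."""
    return max(0.0, v)


def xor_net(p, q):
    """Two hidden ReLU units with hand-picked (hypothetical) weights."""
    h1 = relu(p + q)        # grows with the number of active inputs
    h2 = relu(p + q - 1.0)  # fires only when both inputs are active
    return h1 - 2.0 * h2


def stacked_linear(p, q):
    """Same layers without the non-linearity."""
    h1 = p + q
    h2 = p + q - 1.0
    return h1 - 2.0 * h2    # simplifies to 2 - (p + q): one linear function


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [xor_net(p, q) for p, q in inputs]
linear_outputs = [stacked_linear(p, q) for p, q in inputs]
print(xor_outputs, linear_outputs)
```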

Convolutional Neural Networks
• Convnets exploit the spatial correlation in images.
• They vastly reduce the number of learnable parameters.
• Neurons are connected to a small region of the previous layer.
• The convnet architecture uses 3 types of layers: CONV, POOL, FC.
• Each layer transforms a 3D input volume into a 3D output volume of neuron activations.
[Figure: convnet arrangement]
Source: Andrej Karpathy’s cs231n

Convolutional layer
• Slide a kernel across the input image to get an output volume of activations.
• A kernel (filter) is a spatially small tensor that extends through the full input depth.
• Each neuron in the output is connected to a small patch of the input.
• Multiple filters at a given layer each learn a different feature.
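The sliding-kernel idea can be sketched in plain Python. For brevity this assumes a single-channel image, stride 1, and no padding, whereas the slide describes kernels spanning the full input depth; the image and the edge-detecting kernel are made up for illustration.

```python
def conv2d(image, kernel):
    """Slide a 2D kernel over a 2D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Each output value depends only on a small input patch.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to left-to-right brightness increase
feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the brightness changes, which is the sense in which a filter "learns a feature".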

Pooling
• Downsamples the input, keeping the most important features.
• Reduces spatial size, parameters, and compute.
• Variants: MAX, AVG, and L2-NORM pooling.
Source: Andrej Karpathy’s cs231n
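A minimal sketch of MAX pooling on a single 2D feature map, assuming a 2x2 window and stride 2 (a common configuration, not one stated on the slide). The input values are made up.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool2d(fm)
print(pooled)
```

The 4x4 input becomes 2x2, so later layers see a quarter of the spatial positions.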

Outline
• Apache MXNet Framework

Apache MXNet - Background
• Apache (incubating) open source project
• Framework for building and training DNNs
• Created by academia (CMU and UW)
• Adopted by AWS as DNN framework of choice, Nov 2016
http://mxnet.apache.org

Apache MXNet Customer Momentum

MXNet Gluon – Imperative & Fast
Simple, Easy-to-Understand Code
• Neural networks can be defined using simple, clear, concise code
• Plug-and-play neural network building blocks – including predefined layers, optimizers, and initializers
Flexible, Imperative Structure
• Eliminates rigidity of neural network model definition and brings together the model with the training algorithm
• Intuitive, easy-to-debug, familiar code
Dynamic Graphs
• Neural networks can change in shape or size during the training process to address advanced use cases where the size of data fed is variable
• Important area of innovation in Natural Language Processing (NLP)
High Performance
• There is no sacrifice with respect to training speed
• When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
Credits: Cyrus Vahid

GluonCV: a Deep Learning Toolkit for Computer Vision
https://gluon-cv.mxnet.io
50+ Pre-trained models, with training scripts, datasets, tutorials
Credits: Thomas Delteil

GluonCV: pre-trained models, help to choose

GluonCV: example code

Outline

Sequence Models with Apache MXNet
Sandeep Krishnamurthy
Amazon AI

Agenda
• Motivation – Sequence Models
• Natural Language Processing – Basics
• Recurrent Neural Networks (RNN)
• Long Short Term Memory (LSTM)
• Lab – Sentiment Analysis

Motivation – Sequence Models

Applications with Sequence Data
• Speech recognition: “Alexa. Tell me a joke!”
• Sentiment classification: “There is nothing to like in this movie.”
• Machine translation: “Voulez-vous chanter avec moi?” => “Do you want to sing with me?”

Image vs. Sequence Data
Context
Pattern Recognition
Credits - https://tinyurl.com/yc8twa8o; https://tinyurl.com/y725b7bg

Natural Language Processing - Basics

Words - One Hot Vectors
• Vocabulary or Dictionary: list of all the words
• One hot encoded vector: Word representation
Credits - https://tinyurl.com/ya4bu5wv
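A minimal sketch of a vocabulary and one-hot vectors in plain Python. The four-word vocabulary and the helper names are made up for illustration.

```python
def build_vocab(words):
    """Map each distinct word to an index (alphabetical order)."""
    return {word: index for index, word in enumerate(sorted(set(words)))}


def one_hot(word, vocab):
    """Vector of zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


vocab = build_vocab(["apple", "orange", "fruit", "movie"])
apple = one_hot("apple", vocab)
orange = one_hot("orange", vocab)
dot = sum(a * o for a, o in zip(apple, orange))
print(apple, orange, dot)
```

The dot product of any two distinct one-hot vectors is 0, so this representation carries no notion of similarity between words.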

Words – One Hot Vectors - Problems
10,000-word vocabulary
Orange => Fruit
Apple => _____?
• Problems
o Relations are not captured
o No generalization
Credits - https://tinyurl.com/ydf72mq6

Words - Embeddings
• 4-dimensional (4-feature) embedding
• Each word is a vector of these 4 features
Orange => Fruit
Apple => Fruit
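To make the 4-feature idea concrete, this sketch compares words by cosine similarity. The feature names and every number in the table are invented for illustration; real embeddings are learned and their dimensions are not individually interpretable.

```python
import math

# Hypothetical 4-feature embeddings; names and values are made up.
# Features: (is_fruit, is_royal, is_animal, is_action)
embeddings = {
    "orange": [0.95, 0.01, 0.02, 0.00],
    "apple": [0.97, 0.02, 0.01, 0.01],
    "king": [0.01, 0.93, 0.03, 0.02],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_fruit = cosine_similarity(embeddings["orange"], embeddings["apple"])
sim_other = cosine_similarity(embeddings["orange"], embeddings["king"])
print(sim_fruit, sim_other)
```

Unlike one-hot vectors, related words end up close together, which is what lets a model generalize from "orange" to "apple".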

Words – Embeddings (contd…)
Credits - https://tinyurl.com/yagpo8og

Embeddings – In Practice
• Pre-trained embeddings: trained on billions of words, using different techniques
• Examples: GloVe, Word2Vec, FastText
• GloVe: ratios of word-word co-occurrence probabilities carry information and meaning

Language Model
• Estimates the probability of a sequence of words (a statement)
Credits - https://tinyurl.com/zytlcy5
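One simple way to estimate the probability of a word sequence is a bigram model, which multiplies the probability of each word given the previous one. The slides do not say which kind of language model is meant, so the bigram choice, the tiny corpus, and the `<s>`/`</s>` boundary tokens below are all assumptions.

```python
from collections import Counter

# Hypothetical toy corpus with sentence boundary tokens.
corpus = [
    ["<s>", "i", "like", "this", "movie", "</s>"],
    ["<s>", "i", "like", "this", "book", "</s>"],
    ["<s>", "i", "watch", "this", "movie", "</s>"],
]

context_counts = Counter()  # how often each word appears as the previous word
bigram_counts = Counter()   # how often each (previous, current) pair appears
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        context_counts[prev] += 1
        bigram_counts[(prev, word)] += 1


def sentence_probability(sentence):
    """Product of P(word | previous word) over the sentence."""
    prob = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context
        prob *= bigram_counts[(prev, word)] / context_counts[prev]
    return prob


p_seen = sentence_probability(["<s>", "i", "like", "this", "movie", "</s>"])
p_unseen = sentence_probability(["<s>", "i", "like", "movie", "</s>"])
print(p_seen, p_unseen)
```

Without smoothing, any sentence containing an unseen word pair gets probability 0, which is one reason neural language models are attractive.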

Summary
• Representing words: One Hot Vector, Embeddings
• Sentences and beyond: Language Modelling

Representing a sentence
x: Harry Potter and Hermione Granger invented a new spell.
x<1> x<2> x<3> … x<T>
x<1> … x<T> => Embedding vectors for each word in the sentence

Sequence Model - Problem Formulation
• Given a sequence X: x<1> x<2> x<3> … x<T>
• Predict Y
o Y -> Discrete (Ex: Sentiment Classification)
o Y -> Sequence (Ex: Language Translation)

Recurrent Neural Networks (RNN)

Sequence Data – MLP - Problems
• Variable Length Inputs
• Memory / Context
• Too complex, too many parameters

RNN
• Terminology
o Time step
o RNN cell
o Unrolled
• Variable-length inputs
• Memory / context
• Fewer parameters
[Figure: an RNN cell unrolled over time steps, passing context forward]
Credits - https://tinyurl.com/q6dcybc/
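The terms on this slide can be sketched with a scalar RNN in plain Python: one cell, applied once per time step, with the hidden state carrying context forward. Using scalars instead of vectors and matrices, the tanh activation, and the weight values are simplifying assumptions.

```python
import math


def rnn_cell(x_t, h_prev, w_xh, w_hh, b_h):
    """One time step: new hidden state from the current input and previous state."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b_h)


def run_rnn(sequence, w_xh=0.5, w_hh=0.8, b_h=0.0):
    """Unroll the same cell, with the same weights, over every time step."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = rnn_cell(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


short = run_rnn([1.0, 0.0])                 # 2 time steps
longer = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0])  # 5 time steps, same parameters
print(short, longer)
```

The same three parameters handle sequences of any length, and the first input still influences the last state, although its influence fades with each step.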

RNN – Long Range Dependencies
Cat ate stomach full, …, was sleeping on the bed.
Cats ate stomach full, …, were sleeping on the bed.
Problem: Assuming the sequence is long, it will be hard to “memorize” or “capture” the long-range singular/plural dependencies.
Solution: Have some memory by passing on some extra information?

Long Short Term Memory (LSTM)
• 3 inputs: previous hidden state, memory (cell) state, x<t>
• Information highway
• Forget gate
• Input gate – candidate output
• Output gate
• 2 outputs: memory (long-term dependencies), new hidden state
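A minimal sketch of one LSTM step using the standard gate equations, with scalar state for readability. The slide names the gates but gives no formulas, so the equations, the parameter layout, and the values below are assumptions based on the usual LSTM formulation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x_t + w_h * h_prev + bias

    f = sigmoid(pre("forget"))       # how much old memory to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate memory content
    o = sigmoid(pre("output"))       # how much memory to expose
    c_t = f * c_prev + i * g         # memory update: the "information highway"
    h_t = o * math.tanh(c_t)         # new hidden state
    return h_t, c_t


# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for _ in range(50):
    h, c = lstm_cell(0.5, h, c, params)
print(h, c)
```

With these settings the memory value survives 50 time steps almost unchanged, which is the mechanism behind the long-range dependencies mentioned on the slide.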

Summary
• Representing a sentence: x<1> … x<T>
• Recurrent Neural Networks (RNN): Context
• Long Short Term Memory (LSTM): Information highway, Gates, Memory

Sentiment analysis
A popular application of Natural Language Processing (NLP) that
classifies text or speech into a positive or negative feeling.
Credits- https://tinyurl.com/y7wsgp4b

Dataset
• http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
• ~12,500 positive movie reviews
• ~12,500 negative movie reviews
• ~17,500 samples for training
• ~7,500 samples for testing
Positive review: "this was an awesome movie!"
Negative review: "i just could not watch it till the end."

Gluon NLP Toolkit
• https://gluon-nlp.mxnet.io/
• Extension on top of Gluon
• State of the Art (SOTA) deep learning models
o Scripts
o Model Zoo
• High-level APIs and abstractions tailored for NLP
o Vocabulary, Embedding
o Data Loading, Data Pre-Processing, Data Iterators
o Operators, Loss Functions

So what does a deployed model look like?
[Figure: Mobile, Desktop, and IoT clients reach a Model Server, which hosts the Model, over the Internet]
Credits: Hagay Lupesko

The Undifferentiated Heavy Lifting of Model Serving
• Performance
• Availability
• Networking
• Monitoring
• Model Decoupling
• Cross Framework
• Cross Platform
Model Server for MXNet

Model Archive
[Figure: the Model Export CLI packages the Trained Network, Model Signature, Custom Code, and Auxiliary Assets into a Model Archive]

Containerization
• Lightweight virtualization, isolation, runs anywhere
[Figure: pull or build from the MMS Dockerfile, push, and launch MMS Containers (MXNet Model Server, MXNet, Netty) on a Container Cluster]

MXNet Model Server
• Machine learning model server
• Serves MXNet and ONNX models
• Automated HTTP endpoints setup
• Auto-scales to all available CPUs and GPUs
• Pre-built and configured containers
• CLI to package model artifacts for serving
• Open source project under AWS Labs
http://modelserver.io

Apache MXNet Social
YouTube: /apachemxnet
Twitter: @apachemxnet
Reddit: r/mxnet
Medium: /apache-mxnet

Resources/References
• Apache MXNet – Flexible and efficient deep learning.
• https://github.com/apache/incubator-mxnet
• https://github.com/TalkAI/apache-mxnet-odsc-2018
• Apache MXNet Gluon Tutorials
• The Deep Learning Book
• MXNet – Using pre-trained models
• Amazon Elastic MapReduce
• https://medium.com/apache-mxnet
• https://twitter.com/apachemxnet
