serveRless
computing for R
EARL 2019 – London
by Thomas Laber
Agenda
01 The Problem: Building a scalable and flexible pipeline to deploy R models
02 Serverless: What does this buzzword actually mean?
03 Architecture: A solution architecture for Azure
The Problem
How can we build a cost-effective data science pipeline that allows data scientists using R to easily put their models into production, that scales well, and that is easy to maintain?
[Image: rare picture of the fabled "eierlegende Wollmilchsau" (German idiom, literally an "egg-laying wool-milk-sow": one thing that does everything)]
What we want
A serverless data science architecture
[Diagram, labels only:
• Batch Training (Pool 1, Pool 2, …, Pool n): get training data, docker pull, store models
• Model storage (Project 1, Project 2, …, Project n)
• Batch Scoring (Trigger) and Realtime Scoring (REST, Auto-Scaling): get scoring data, read models, write results]
The Solution
"Just like wireless internet has wires somewhere, serverless architectures still have servers somewhere. What ‘serverless’ really means is that, as a developer, you don’t have to think about those servers. You just focus on code."
(AWS) Components of a serverless architecture [Link]
Why serverless?
The promise: Focus on coding, not maintenance
• NO ADMINISTRATION: No server provisioning and maintenance is necessary. Hardware and OS are abstracted away.
• SCALE ON DEMAND: Scaling is automatic and part of the service.
• PAY-PER-USE: Billing is based on actual compute resources used. No compute used, no costs.
• FASTER TURNAROUND: Spinning up new environments is quick and allows for faster experimentation.
The Evolution of the Cloud
Cloud provider versus customer roles for managing cloud services
Service models, from left to right:
• Enterprise IT (legacy IT)
• Infrastructure as a Service (Virtual Machines)
• Platform as a Service (Containers)
• Functions as a Service (Functions)
Stack layers shown for every model, top to bottom: Applications, Scalability, Security, OS, Virtualization, Servers, Storage, Networking, Data Centers
[Figure: each column marks which of these layers are customer managed and which are provider managed.]
What now?
VMs vs containers vs functions
Criterion | Virtual Machines | Containers | Functions
Unit of Scale | machine | application | function
Abstraction | hardware | operating system | language runtime
Packaging | image | container file | code
Configure | machine, storage, network, OS | servers, applications, scaling | run code when needed
Execution | multi-threaded, multi-task | multi-threaded, single-task | single-threaded, single-task
Runtime | hours to months | minutes to days | microseconds to seconds
Unit of Cost | per VM per hour | per container per hour | per memory/second per request
Amazon | EC2 | Fargate | Lambda
Azure | Azure VM | Container Instances | Azure Functions
Google | Google Compute Engine | Google Kubernetes Engine | Cloud Functions
Three ways to interact with Cloud Providers
Terminology
• Browser (GUI)
• Command Line Interface (CLI), with IaC (Infrastructure as Code) templates. Azure: ARM templates. AWS: CloudFormation.
• REST calls (excellent tool: Postman)
Cost Comparison
Serverless can be cheap, but it depends on the workload.
Big lock-in potential!
Example: You create a Linux container group with a 1 vCPU, 1 GB configuration once daily during a month (30 days). The duration of each container group is 5 minutes (300 seconds).
Memory duration: 1 container group × 300 seconds × 1 GB × € 0.0000014 per GB-s × 30 days = € 0.013
vCPU duration: 1 container group × 300 seconds × 1 vCPU × € 0.0000129 per vCPU-s × 30 days = € 0.116
TOTAL = € 0.013 + € 0.116 = € 0.13
Example: You create a Linux container group with a 1 vCPU, 2 GB configuration 50 times daily during a month (30 days). The container group duration is 150 seconds.
Memory duration: 50 container groups × 150 seconds × 2 GB × € 0.0000014 per GB-s × 30 days = € 0.63
vCPU duration: 50 container groups × 150 seconds × 1 vCPU × € 0.0000129 per vCPU-s × 30 days = € 2.903
TOTAL = € 0.63 + € 2.903 = € 3.53
Example: You create a Linux container group with a 4 vCPU, 8 GB configuration 2 times daily during a month (30 days). The container group duration is 1 hour (= 3600 seconds).
Memory duration: 2 container groups × 3600 seconds × 8 GB × € 0.0000014 per GB-s × 30 days = € 2.419
vCPU duration: 2 container groups × 3600 seconds × 4 vCPU × € 0.0000129 per vCPU-s × 30 days = € 11.146
TOTAL = € 2.419 + € 11.146 = € 13.56
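The three cost examples all follow the same arithmetic: billed seconds times resource size times the per-second price. A minimal Python sketch of that calculation is shown here, assuming the per-second prices quoted on the slides; the `Workload` class and `monthly_cost` function are illustrative names, not part of any Azure tooling.

```python
from dataclasses import dataclass

# Prices as quoted in the examples (EUR); actual prices vary by region and date.
PRICE_PER_GB_SECOND = 0.0000014
PRICE_PER_VCPU_SECOND = 0.0000129


@dataclass
class Workload:
    runs_per_day: int      # container groups started per day
    duration_seconds: int  # runtime of each container group
    vcpus: int
    memory_gb: int
    days: int = 30


def monthly_cost(w: Workload) -> dict:
    """Return memory, vCPU and total cost in EUR for the billing period."""
    billed_seconds = w.runs_per_day * w.duration_seconds * w.days
    memory = billed_seconds * w.memory_gb * PRICE_PER_GB_SECOND
    vcpu = billed_seconds * w.vcpus * PRICE_PER_VCPU_SECOND
    return {"memory": memory, "vcpu": vcpu, "total": memory + vcpu}


if __name__ == "__main__":
    scenarios = {
        "once daily, 5 min, 1 vCPU / 1 GB": Workload(1, 300, 1, 1),
        "50x daily, 150 s, 1 vCPU / 2 GB": Workload(50, 150, 1, 2),
        "2x daily, 1 h, 4 vCPU / 8 GB": Workload(2, 3600, 4, 8),
    }
    for name, workload in scenarios.items():
        cost = monthly_cost(workload)
        print(f"{name}: EUR {cost['total']:.2f}")
```

Varying `runs_per_day` and `duration_seconds` shows quickly where a workload stops being cheap compared with a VM billed per hour.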
Two Use Cases
Model training and scoring have different architecture requirements
TRAINING
• Usually long running tasks
• Resource intensive
• Mostly in batch mode
SCORING (OUR FOCUS TODAY)
• Mostly short running tasks
• Resource usage low
• Either ad hoc or on schedule
Serverless Options
We primarily looked at the following options:
• AWS Lambda
• Azure Functions
• Azure Container Instances
Requirements
There are many ways to realize a serverless scoring architecture, each with different pros and cons.
Pipeline: Trigger → Deploy Resources → Load model → Serve Model → Score
• Must support at least time/HTTP triggers
• Loading from blobs
• Custom runtime support necessary (!)
• Optionally serving scores with HTTP
Function as a Service
AWS Lambda
Additional layers (compiled packages can be a headache…) [Link]
R/
└── library/
    ├── package 1/
    ├── package 2/
    ├── package …/
    └── package n/
Base layer (Philipp Schirmer) [Link]
R/
├── bin/
├── lib/
├── library/
└── …
bootstrap
runtime.r
A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB.
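The two limits quoted above (at most 5 layers, 250 MB unzipped in total) can be checked before deploying. The following Python sketch is a hypothetical pre-flight check, not an AWS API; `check_deployment` is an invented name, and it assumes that 250 MB is counted as 250 × 1024 × 1024 bytes.

```python
MAX_LAYERS = 5
# Assumption: the 250 MB limit is interpreted as 250 * 1024 * 1024 bytes.
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def check_deployment(function_bytes: int, layer_bytes: list) -> list:
    """Return a list of human-readable limit violations (empty if none).

    function_bytes: unzipped size of the function code
    layer_bytes:    unzipped size of each layer, one entry per layer
    """
    problems = []
    if len(layer_bytes) > MAX_LAYERS:
        problems.append(
            f"{len(layer_bytes)} layers used, at most {MAX_LAYERS} allowed"
        )
    total = function_bytes + sum(layer_bytes)
    if total > MAX_UNZIPPED_BYTES:
        problems.append(
            f"unzipped size {total} bytes exceeds {MAX_UNZIPPED_BYTES} bytes"
        )
    return problems


if __name__ == "__main__":
    mb = 1024 * 1024
    # Hypothetical sizes: a small function, an R base layer, one package layer.
    print(check_deployment(2 * mb, [120 * mb, 90 * mb]))
```

With R, the base layer alone takes a large share of the budget, which is why layers with many compiled packages hit the size limit quickly.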
Function as a Service
Azure Functions
Functions V2 runtime:
• Host process: host (.NET, .NET Core)
• Language worker process: function code (C#, Java, …)
• Between the two: gRPC, a modern open source high performance RPC framework (Neal Fultz), with Protocol Buffers, Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data (Dirk Eddelbuettel)
Why Azure Container?
Containers give us maximum flexibility regarding runtime and reduce vendor lock-in
PROS
• Supports arbitrary runtimes
• No problems with compiled libraries
• Lots of supported triggers in combination with Logic Apps
• Low vendor lock-in
• Pay-as-you-go
CONS
• More setup involved compared to FaaS such as AWS Lambda
• Higher startup times compared to FaaS, depending on the image
Docker Basics
Terminology
Dockerfile → (docker build) → Docker Image → (docker run) → Docker Container
Example Dockerfile:
FROM rocker/rstudio:3.5.3
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY * ./TravisR/
WORKDIR /usr/src/app/TravisR
Azure Container + Logic App
Our setup currently looks like this:
01 Write Code: Use RStudio or the IDE of your choice to write some arbitrary R code, ideally as a package
02 Create Dockerfile: Use serveRless to create a Dockerfile or do it yourself
03 git Repo: Push your package and the Dockerfile to GitHub
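Step 02 can be automated by templating the Dockerfile. The following Python sketch shows one way this might look; it is not the serveRless package's actual interface. `render_dockerfile` is a hypothetical helper, and its defaults simply mirror the Dockerfile shown on the Docker Basics slide.

```python
def render_dockerfile(
    package_dir: str,
    base_image: str = "rocker/rstudio:3.5.3",
    app_root: str = "/usr/src/app",
) -> str:
    """Render a Dockerfile that copies an R package into the image.

    package_dir: directory name for the package inside the image
    base_image:  image providing R (default taken from the slide)
    app_root:    parent directory created inside the image
    """
    lines = [
        f"FROM {base_image}",
        f"RUN mkdir -p {app_root}",
        f"WORKDIR {app_root}",
        f"COPY * ./{package_dir}/",
        f"WORKDIR {app_root}/{package_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the same five instructions as the example Dockerfile.
    print(render_dockerfile("TravisR"), end="")
```

Writing the returned string to a file named `Dockerfile` in the repository root is then enough for a registry build task to pick it up.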
04 Create Container Registry: Create a resource group and, more importantly, an Azure Container Registry (ACR)
05 Create ACR Task: Create a registry task which builds a Docker image based on your GitHub repo
06 Create Container Instance: Create an Azure Container Instance (ACI) pulling the Docker image from the ACR
07 Manage it: Use a Logic App or single REST calls to start and stop it
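For step 07, starting or stopping a container group through REST comes down to an authenticated POST against the Azure Resource Manager endpoint. The Python sketch below only builds the request URL and sends nothing. The `api-version` value is an assumption and should be checked against the current Azure documentation; the subscription, resource group and container group names are placeholders.

```python
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-10-01"  # assumption: verify against the current REST reference


def container_group_action_url(
    subscription_id: str,
    resource_group: str,
    container_group: str,
    action: str,
) -> str:
    """Build the URL for a start/stop POST on an Azure container group."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.ContainerInstance/containerGroups/"
        f"{quote(container_group, safe='')}/{action}"
        f"?api-version={API_VERSION}"
    )


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(
        container_group_action_url(
            "00000000-0000-0000-0000-000000000000", "my-rg", "scoring", "start"
        )
    )
```

The request itself needs an `Authorization: Bearer <token>` header; a tool like Postman is convenient for trying the call by hand before wiring it into a workflow.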
Logic App
A serverless workflow orchestration tool with GUI for prototyping
[Screenshots: Logic App Designer, Logic App templating]
Scoring Workflow
Template workflow for a wide range of scenarios
• Specify trigger type (time, HTTP, email, etc.)
• Create container resources based on spec (CPU + RAM + nodes)
• Check if container group is spawned successfully
• Delete container after work is done
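The workflow steps above follow a create, poll, delete pattern. The Python sketch below shows that control flow with the cloud calls injected as plain functions, so it runs without any Azure account. The state names ("Succeeded", "Failed") and the function `run_scoring_workflow` are assumptions for illustration, not the Logic App template itself.

```python
import time
from typing import Callable


def run_scoring_workflow(
    create: Callable[[], None],
    get_state: Callable[[], str],
    delete: Callable[[], None],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a container group, wait for it to finish, then delete it.

    Returns the final state, or "TimedOut" if no terminal state was
    reached within max_polls checks. The container group is deleted in
    every case so that no resources keep running (and billing).
    """
    create()
    try:
        for _ in range(max_polls):
            state = get_state()
            if state in ("Succeeded", "Failed"):  # assumed terminal states
                return state
            sleep(poll_seconds)
        return "TimedOut"
    finally:
        delete()


if __name__ == "__main__":
    # Stand-ins for the real cloud calls.
    fake_states = iter(["Pending", "Running", "Succeeded"])
    result = run_scoring_workflow(
        create=lambda: print("create container group"),
        get_state=lambda: next(fake_states),
        delete=lambda: print("delete container group"),
        sleep=lambda seconds: None,
    )
    print(result)
```

Putting the delete step in `finally` mirrors the last bullet: cleanup happens whether scoring succeeded, failed, or timed out.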
serveRless Package
We want to build a package to help automate this setup
IDEA
Build an R package that allows R users to deploy their code in a serverless setup
STATUS
• Build prototype to test setup
• Build RStudio Addin
• R Package serveRless
Many thanks to Hong Ooi for his awesome work supporting R in Azure!
And of course Christoph Bodner and Florian Schwendinger, who are not here today.
Short Demo
Questions?
Thank you for your attention!
Feel free to reach out:
[Link]/in/thomas-laber
[Link]
[Link]