x
DPDK Load Balancers
RSS H/W LOAD BALANCER –
DPDK S/W LOAD BALANCER –
L4 LOAD BALANCERS –
L7 LOAD BALANCERS
NOV 2018
Contact
• Vincent, Jay L - Your Contact For Load Balancer Follow up
[Link]@[Link]
• M Jay - For Feedback & For DPDK Cookbook
• [Link]@[Link]
2
Contributions And Acknowledgements
• M Jay Jay L Vincent John M Morgan
• Bob Ghaffari Jianwei Ma Ray Kinsella
• Walter Gilmore DPVS Community Brian Aherne
• Jim St Leger Tim O’Driscoll Hongjun Ni
• Jokul Li Steve Cunming Liu, Yu Y
• Edward Verplanke DPDK Community Network Builders
• Sujata Tibrewala Heqing Zhu Bill Carlson Alejandro Corredor
• Ed Pullin Zhang Pan1 Mike Glynn And Many others..
• Great Documentation in [Link]
• [Link]
• And Many others
3
Agenda
• RSS Load Balancer – NIC
• DPDK s/w Load Balancer – Cache Sharing
• L4 Load Balancer – Fast Forwarder
• Balance The Load? Or Benefit From Locality? - Stickiness
• Why to have Man in the Middle? – Direct Response
• Quiz - Can you have 2 devices with same IP addresses in same subnet?
• Direct Response, Full NAT – What? Where?
• One Arm Versus Two Arms Load Balancer – Devil is in the Details
• Call To Action
4
Receive Side Scaling – NIC Load Balancing**
NIC H/W Distributes Packets to different CPUs through queues
5
Receive Side Scaling – NIC Load Balancing
Pro: No software touching the packets for distribution decision
Con: Scope Is Only For Packet Types That Are Supported by NIC H/W RSS
6
DPDK S/W Load Balancer
Since S/W, packet types not supported by NIC also can be supported
7
DPDK S/W Load Balancer
• I/O Core Decides which Worker Core to Balance the Load with.
• In this case worker core transitions the cache line from “M” state to “S” state
• Exclusive to Shared
• Packet lookup (5 tuple) – I/O Core Cache Pulls The Packet (Header) – “E” state
• Worker core transitions the cache line to “S” state. – “E” state to “S” state
• Transition from Modified State (More Expensive)
• In case I/O Core wanted to store meta data as the result of lookup – “M” state
• Worker core transitions the cache line from “Modifed” State in I/O Core - Expensive.
Use Prefetch Hint
8
L4 Load Balancer – Kernel Space – Full NAT
RSI-IP
RS2-
IP
VIP
VIP – “Virtual IP” = “Service IP; ”RS1, RS2 – “Real Server IP”
The benefit of Full NAT – Servers can be used as is. No change needed
9
What About DPDK Based Load Balancer?
• DPVS - DPDK Based Load Balancer
• DP – From DPDK VS – From Linux Virtual Server
• [Link]
• Runs on commodity Linux Servers
• Easily Scales Up / Down – Simply Add / Remove Servers
• DPVS matches the packets to their required services and matches/ balances to “Real” servers
• Optimized for packet processing performance – DPDK Based
• Tested With 10 / 25 / 40 Gig NICs
• Control Plane using IPVS (IP Virtual Server) -
[Link]
10
DPVS - Data Path Benefiting From DPDK
11
DPVS Features & DPDK High Performance Technology
Major features of DPVS including: DPDK High performance Techniques:
Kernel by-pass (user space implementation)
• L4 Load Balancer, including FNAT, DR mode, etc.
Share-nothing, per-CPU for key data (Lockless)
• Different schedule algorithm like RR, WLC, WRR, etc.
RX Steering and CPU affinity (avoid context switch)
• User-space Lite IP stack (IPv4, Routing, ARP, ICMP ...).
Batching TX/RX
• SNAT mode for Internet access from internal network.
Zero Copy (avoid packet copy and syscalls).
• Support KNI, VLAN, Bonding for different IDC environment. Polling instead of interrupt.
• Security aspect, support TCP syn-proxy, Conn-Limit, black-list. lockless message for high performance ICP.
* QoS: Traffic Control other techs enhanced by DPDK.
[Link]
12
Control Path Benefiting From IP Virtual Server (IPVS)
• IPVS Implements transport-layer load
balancing inside the Linux kernel, so called
Layer-4 switching.
• IPVS running on a host acts as a load
balancer at the front of a cluster of real
servers.
• IPVS can direct requests for TCP/UDP
based services to the real servers
• Thus making services of the real servers to
appear as a virtual service on a single IP
address
[Link]
13
DPVS - Data Path (DPDK) + Control Path (IPVS)
ONE ARM
• Both clients & Servers
• On same side
• [Link] alone enough
TWO ARMS
• Both clients & Servers
• On same side
• [Link] & [Link]
14
One- arm and Two-arm Terminology
• Two-arm
• The term two-arm means, you have clients in one side of load-balancer (LB)
• And servers (RS) in another side,
• LB forwards packets between its two logical network interfaces.
• For example, WAN-to-LAN load balancing.
• One-arm
• On the other hand, one-arm means all clients and servers are in same side of load-balancer.
• LB forwards traffic through the same logical network interface.
• Logical interface (or device) could be
• Physical DPDK interface
• DPVS virtual devices like
• bonding,
• vlan and
• Tunnel devices.
15
DPVS With IPVS - Block Diagram
16
Spread To Servers? Or Benefit From Stickiness?
• Flow Pinning has the benefit of Locality
• Caching Benefits thereof.
• When Spreading the Load, say for instance Round Robin
• You trade locality for overall utilization.
• What Other Scheduling Algorithms Are Out There?
• Least Connected
• Tagging Weight To The Server
That Is About Scheduling. What About Data Flow? 17
Forwarding Methods - Thin Request. Bulky Response
• For Single Request, server returns multiple objects.
• So Why Have Load Balancer in the path of bulky Response?
• Why not Load Balancer Become Transparent During Server Response?
• That is Direct Response Server – DRS
• How?
Why To Have Man In The Middle During Response?
18
Request Is Served By Load Balancer But Not Response
Ready For Quiz?
19
Quiz
• Quiz
• Can you have 2 devices with same IP addresses in same subnet?
• If no, Why not?
• If yes, how can it?
20
Load Balancer & Server Farms Have Same Service IP Address
VIP
VIP
PRO: Great For Scalability
Consideration: Real Servers must host the service at VIP (rather than in RS IP) 21
Role Of Dummy Interface In Direct Routing
• Direct Routing:
• Packets from end users are forwarded directly to the real server. The IP packet is not
modified
• So, the real servers must be configured to accept traffic for the virtual server's
IP address.
• This can be done using a dummy interface or packet filtering to redirect traffic
addressed to the virtual server's IP address to a local port.
• The real server may send replies directly back to the end user.
• Thus, the Load Balancer does not need to be in the return path.
[Link].arp_ignore=1
22
[Link].arp_ignore=1
23
Tunnelling Versus Direct Response
• IP-IP Encapsulation (Tunnelling):
• The main advantage of using tunnelling is that real servers can be on a different
networks.
• Allows packets addressed to an IP address to be redirected to another address,
possibly on a different network.
• In the context of layer 4 switching the behavior is very similar to that of direct
routing
• Except that when packets are forwarded they are encapsulated in an IP packet,
rather than just manipulating the Ethernet frame.
24
Direct Response, Full NAT – What? Where?
• Direct Response Use Case Scope:
• Higher bandwidth During Response Compared To Request
• Configuring Server Is OK – apr_ignore=1
• Service should be hosted in VIP Server address
• Full NAT Use Case Scope
• Off The Shelf Server should be used as it is.
• No change to Server Configuration
Hosting at Default RS IP address
• Note that Rewrite happens for both Source IP Address and Destination IP Address
25
Note: Reference IPVS Documentation. Some of the above data may be dated. 26
One Arm Vs Two Arm – Devil Is In The Details
• One arm: The real servers can send reply packets directly to the end users without them
needing to be altered by the Load Balancer.
• Thus, we saw that LB does not need to be the gateway for the real servers.
• Two arm: However, in some situations (say Two Arm), for instance
• because the load balancer really is the gateway to the real server's network.
• In such case, it is desirable to route return packets from the real servers via LB.
• The source address of these packets will be the VIP.
• However the VIP belongs to an interface on the LB.
• Thus, it will drop the packets as being bogus.
• There are several approaches to this problem.
• Probably the best is to apply a kernel patch supplied by Julian Anastasov.
• This adds proc entries that allow this packet-dropping behaviour to be disabled on a per-interface basis.
• This patch can be obtained from [Link]
27
Current Status - DPVS
• 10 Gig NIC
• Successfully builds.
• Verified FNAT works Successfully
• Verified DRS works Successfully
• 40 Gig NIC
• Successfully builds – Screenshot on next page
28
Git checkout 9b16……ae83d
29
Status – 40 Gig NIC – Successful Build Of Load Balancer
30
40 Gig NIC Load Balancer – Next Steps
• Community Is Considering to upgrade the DPDK Version to “DPDK 17.11.4 (LTS).
New patches would be released then.
31
L4, L7 Load Balancers – Next Step
• Interested to reduce infrastructure cost?
• With L4 Load Balancers?
• With L7 Load Balancers?
32
Call To Action
• Want To Benefit From Cost Reduction of Network Appliances?.
• Interested in Do it yourself white box load balancer?
• Much of this work is being driven by innovative Cloud Service Providers in PRC
• To reduce cost of network appliances in their infrastructure
33
Contact
• Jay L Vincent - Your Contact For Load Balancer Follow up
• [Link]@[Link]
• M Jay - For Foils Feedback
• [Link]@[Link]
34
**Acknowledgements & References
• [Link]
kernel-networking-walkthrough
• [Link]
• [Link]
• [Link]
• [Link]
35