High Availability Techniques for CCNP ENCOR
Junmei Zhang
Technical Marketing Eng.
Samer Theodossy
Principal Engineer
TECCRS-2001 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 4
[Figure: enterprise reference network. Headquarters with access, distribution, and core switches; data center with Nexus switches, WAAS and WAAS Central Manager, Cisco ACE, and wireless LAN controllers; Internet edge with Internet routers, firewalls, RA-VPN firewall, guest wireless LAN controller, DMZ switch and servers, and web and email security appliances. A regional site, remote sites (access switch or switch stack, WAN router, WAAS), and teleworker/mobile worker (hardware and software VPN) connect over WAN routers, MPLS WANs, and the Internet.]
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• Foundations of the Structured Network Design
• High Availability Architectures:
• Enterprise Wired LAN
• Enterprise Wireless LAN
• Enterprise Data Center
• High Availability System Recovery Analysis
Enterprise-Class Availability
Campus Systems Approach to High Availability
• Network-level redundancy
• Enhanced management
• Human ear notices the difference in voice within 150–200 msec
• 10 consecutive G.711 packets lost (200 msec at the common 20 msec packetization interval)
• Video loss is even more noticeable
[Figure: application tiers. Mission Critical Apps: Databases, Order-Entry, CRM, ERP. Next-Generation Apps: Video Conf., Unified Messaging, Global Outsourcing, E-Business, Wireless Ubiquity.]
Cisco HA Evolution
No Redundancy
• No redundant units
• Failure on Supervisor causes reload
• Line cards reload on failure
• Outage: 10's of minutes
Redundancy with RPR
• Adding redundant units
• Failure on Active Sup causes switchover
• Standby unit is in STANDBY_COLD state
• Line cards reload after switchover
• Startup configuration synchronized to peer
• Outage: several minutes
Redundancy with RPR+
• Adding redundant units
• Failure on Active Sup causes switchover
• Standby unit is in STANDBY_WARM state
• Line cards reload after switchover
• Startup configuration synchronized to peer
• Running configuration synchronized to peer and applied after switchover
• Outage: several seconds
Redundancy with SSO
• Adding redundant units
• Failure on Active Sup causes switchover
• Standby unit is in STANDBY_HOT state
• Line cards stay up after switchover
• Startup configuration and running configuration synchronized to peer and applied
• Outage: order of milliseconds
Defining Levels of Availability*
Monitoring tools by layer:
• Application, Presentation, Session: custom application scripts (HTML, TCL, Python, many others)
• Transport, Network: ICMP Ping, IP Traceroute, Bidirectional Forwarding Detection, IP SLA
Total service downtime components: Notification Time, Diagnosis Time, Dispatch Time, Repair Time
• Automation: trouble ticketing, technology/database
Up time:
• Redundant network design and resiliency features
• Required for very high availability
What to Automate?
Device Monitoring
Configuration
Provisioning
Main Operational Challenges
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• Availability Modeling
• Stateful Switchover, Non-Stop Forwarding, and Non-Stop Routing
• Stackwise480 and Stackwise
• In Service Software Upgrades
Why Use System and Network Availability Modeling?
• Planning and Engineering
• Architecture validation
• Design tradeoff analysis/decisions
• Request for Proposal (RFP)
• Service Level Agreement (SLA)
Predicted Availability Ratings Are Not Guarantees
• Predicted Availability ratings are not guarantees of network availability.
• Ratings are based on industry-standard methodologies and statistical analysis.
• Useful in making design decisions and comparing different options.
Predicted Availability Rating
Function of Mean Time Between Failure and Mean Time to Repair
Availability Equation
Increase MTBF
Decrease MTTR
Predicted Availability Equation (Basic)
Availability Equation
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
• No spare: 74,116 hrs / (74,116 hrs + 24 hrs) = 0.999676, or 2 hrs 50.3 min of downtime per year
• Spare available: 74,116 hrs / (74,116 hrs + 4 hrs) = 0.999946, or 28 min of downtime per year
• Redundancy (sub-second): 74,116 hrs / (74,116 hrs + .00833) = 0.999999, or .526 min of downtime per year
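The basic equation is easy to check by hand. The following is a minimal sketch, assuming a year of 365.25 days (8,766 hours), which appears to match the rounding of the downtime figures quoted for the no-spare and spare-available cases; the function names are illustrative only.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: 8766 hours per year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Predicted availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# MTBF of 74,116 hours with the two MTTR values used in the examples.
for label, mttr in [("no spare", 24.0), ("spare available", 4.0)]:
    a = availability(74_116, mttr)
    print(f"{label}: availability={a:.6f}, "
          f"downtime={annual_downtime_minutes(a):.1f} min/yr")
```

Shortening MTTR from 24 hours to 4 hours moves the result from roughly three hours to roughly half an hour of downtime per year, which is the "Decrease MTTR" lever in numeric form.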
The Redundancy Effect
Single Points of Failure (Blocks in Series)
• Unit 1, Linecard: 99.999%, ~5 min/yr
• Unit 2, Supervisor: 99.999%, ~5 min/yr
• Availability = 99.998%
• Downtime = ~10 min/yr
The Redundancy Effect
Single Points of Failure (Blocks in Series)
• Unit 1 at 99.999% (~5 min/yr) in series with a Supervisor at 99.999% (~5 min/yr)
• Availability = 99.998%
• Downtime = ~10 min/yr
Redundant Components (Blocks in Parallel)
• Redundant Supervisors, each at 99.999% (~5 min/yr)
• Availability = 99.999999%
• Downtime = ~0.0053 min/yr
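The series and parallel rules can be sketched in a few lines. This is an idealized model, assuming independent failures and instantaneous switchover. For two 99.999% blocks in parallel it gives a more optimistic figure than the 99.999999% quoted for the redundant case, which presumably reflects factors this sketch does not model.

```python
from math import prod

MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: 365.25-day year


def series(*avail: float) -> float:
    """Blocks in series: every block must be up, so availabilities multiply."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Blocks in parallel: the system is down only if every block is down."""
    return 1.0 - prod(1.0 - a for a in avail)


def downtime_min_per_year(a: float) -> float:
    """Expected minutes of downtime per year."""
    return (1.0 - a) * MINUTES_PER_YEAR


linecard = supervisor = 0.99999
single = series(linecard, supervisor)        # two single points of failure
dual_sup = parallel(supervisor, supervisor)  # idealized redundant supervisors
print(f"series:   {single:.6%}, {downtime_min_per_year(single):.1f} min/yr")
print(f"parallel: {dual_sup:.10%}")
```

The series result reproduces the 99.998% and ~10 min/yr figures: two blocks that each cost about five minutes a year cost about ten minutes together.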
Example of Predicted Availability Rating (No Redundancy)
Example of Predicted Availability Rating (With Redundancy)
Example of Predicted Availability Rating (Catalyst 3850 No Redundancy)
Example of Predicted Availability Rating (With Component Redundancy)
Example of Predicted Availability Rating (With Stackwise480 Redundancy, Single Attached)
• Catalyst WS-C3850-48F
Part                 MTBF (hours)  MTTR    Switchover time  Combined MTBF  Combined Availability  Annual Downtime
Catalyst C3850-48F   241,050       4 hrs.  --               241,050        99.99834062%           --
Example of Predicted Availability Rating (Catalyst 4507R+E Non Redundant)
Chassis X Power Supply X Line Card X Supervisor Module X SFP Uplink = System MTBF
Example of Predicted Availability Rating (Catalyst 4507R+E With Redundancy)
WS-C4507R+E
Example of Predicted Availability Rating (Catalyst 6800XL Non Redundant)
Chassis X Fan Tray X Power Supply X Line Card X Supervisor Module X SFP Uplink = System MTBF
Example of Predicted Availability Rating (Catalyst 6800XL With Redundancy)
Chassis X Combined Power Supply X Combined Line Card X Combined Supervisor Module X Combined SFP Uplink = System MTBF
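One common way to combine component figures into a system MTBF for a non-redundant chassis is to add failure rates. The sketch below assumes constant, independent failure rates; that may not be the exact method behind the tables in this session. Every MTBF value in it is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def series_mtbf(component_mtbf_hours: dict) -> float:
    """Series system with constant failure rates: rates (1/MTBF) add up."""
    return 1.0 / sum(1.0 / m for m in component_mtbf_hours.values())


# Hypothetical MTBF figures in hours, for illustration only.
components = {
    "chassis": 1_000_000,
    "power_supply": 500_000,
    "line_card": 250_000,
    "supervisor": 200_000,
    "sfp_uplink": 2_000_000,
}

system_mtbf = series_mtbf(components)
# 4-hour MTTR, as in the spare-available example.
system_availability = system_mtbf / (system_mtbf + 4.0)
print(f"system MTBF = {system_mtbf:,.0f} hours, "
      f"availability = {system_availability:.6f}")
```

The system MTBF always comes out lower than the weakest component's MTBF, which is why every additional non-redundant part in the path reduces the predicted rating.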
Choosing the Right Platform and Network Design
It is More Than Just Predicted Availability Ratings
• Design to business requirements
• Use Predicted Availability ratings as part of your overall design considerations
• Common factors that dictate platform selection:
• Backplane throughput and performance
• Interface types and port densities
• Scalability for future growth/investment protection
• Software upgrade procedures
• Software feature support
• Simplicity / Ease of Use
Control Plane and Data Plane
Control Plane
CPU, Software , Memory
EIGRP OSPF BGP SNMP
FIB
SRC A DST B
A B
Stateful Switchover (SSO)
Redundant Supervisors – IOS
Active – Standby Model
Active Supervisor
• Control Plane
• Console access
• Manages configurations
• Manages chassis environmentals
• L2 – L3 protocols
• Data Plane
• Hardware-based switching
[Figure: Active Supervisor and Standby Supervisor, each with a control plane and a data plane, linked by CF and RF]
Stateful Switchover Mode – IOS
SSO-Aware and SSO-Compliant IOS Applications
Cisco IOS on the Active Supervisor:
• SSO-Compliant Applications: Routing Protocols, NetFlow, Cisco Discovery Protocol, …and more
• SSO-Aware Applications: Forwarding Information Base, IEEE 802.1x, PAgP / LACP, …and more
• Redundancy Facility
• Checkpointing Facility
SSO Compliant Redundancy Clients
IOS Partial List Example
Router# show redundancy clients
clientID = 0 clientSeq = 0 RF_INTERNAL_MSG
clientID = 1319 clientSeq = 1 Cat6k Platform Swove
clientID = 5030 clientSeq = 2 Redundancy Mode RF
Other clients in the list include:
• EEM Server RF CLIENT
• Frame Relay
• IPROUTING NSF RF
• Cat6k Inline Power
• CTS HA
SSO by itself Does Not Provide Redundancy for the Routing Protocols
Graceful Restart, Non-Stop Forwarding and Non-Stop Routing
• Non-Stop Forwarding was developed by Cisco to maintain traffic forwarding by a router experiencing a control plane switchover event. The router synchronizes its Forwarding Information Base between the Active and Standby Route Processors and signals its routing neighbors to continue forwarding traffic while routing topology information is exchanged.
• The IETF developed standards-based implementations similar to Cisco NSF.
• The IETF implementations use different terminology, including the term "Graceful Restart" to describe the signaling used between the routers.
• Graceful Restart (GR) and Non-Stop Forwarding (NSF) are terms often used interchangeably.
• Graceful Restart/Non-Stop Forwarding and Non-Stop Routing (NSR) all allow the forwarding of data packets to continue along known routes while the routing protocol information is being restored (in the case of Graceful Restart) or refreshed (in the case of Non-Stop Routing) following a processor switchover.
• Each routing protocol has its own unique implementation and signaling mechanisms.
Routing Protocol Redundancy With NSF
Active Supervisor Engine Slot 1 Standby Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table EIGRP RIB OSPF RIB ARP Table
Prefix Next Hop Prefix Next Hop IP MAC Prefix Next Hop Prefix Next Hop IP MAC
FIB Table
SSO FIB Table
Routing Protocol Redundancy With NSF
Active Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table
- - - - - -
- - - - - -
- - - - - -
FIB Table
[Link] aabbcc:ddee32
[Link] adbb32:d34e43
[Link] aa25cc:ddeee8
Routing Protocol Redundancy With NSF
Active Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table
[Link] [Link] - - - -
[Link] [Link] - - - -
[Link] [Link] - - - -
FIB Table
[Link] aabbcc:ddee32
[Link] adbb32:d34e43
[Link] aa25cc:ddeee8
Routing Protocol Redundancy With NSF
Active Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table
FIB Table
[Link] aabbcc:ddee32
[Link] adbb32:d34e43
[Link] aa25cc:ddeee8
Non Stop Forwarding Router Roles
• Non-Stop Forwarding (NSF) allows a router to continue forwarding data along routes that are already known, while the routing protocol information is being restored
• NSF Aware router or NSF Helper router*
• A router running NSF-compatible software, capable of assisting a neighbor router in performing an NSF restart
• NSF Capable router
• A router configured to perform an NSF restart, therefore able to rebuild routing information from a neighboring NSF-aware or NSF-capable router
* NSF Helper: this term is used in IETF terminology
[Figure: an NSF Capable device with redundant supervisors and its NSF Aware neighbors]
NSF/SSO Switchover Operation – IOS
[Figure: numbered switchover sequence, starting with (1) Active Supervisor fails. On the newly active supervisor, the RP CPU runs the control plane (OSPF, EIGRP, IS-IS, BGP) over the control path and rebuilds the Routing Information Base and ARP table. The Cisco IOS CEF tables (FIB table and adjacency table, Global Epoch = 1) feed the hardware FIB and adjacency tables in the data plane forwarding path.]
Example CEF entries:
• FIB table: Prefix 10.2, Next Hop [Link], Interface Vlan 10, Epoch 01
• Adjacency table: Next Hop [Link], MAC AA-BB-.., Epoch 01
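The epoch column in the CEF tables is what lets forwarding continue on old entries while routing reconverges. The following is a simplified model of that idea, not the IOS implementation: a switchover bumps the global epoch, relearned routes are stamped with the new epoch, and anything still carrying the old epoch is purged after convergence. The class, prefixes, and next hops are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fib:
    global_epoch: int = 1
    entries: dict = field(default_factory=dict)  # prefix -> (next_hop, epoch)

    def install(self, prefix: str, next_hop: str) -> None:
        """Install or refresh a route, stamping it with the current epoch."""
        self.entries[prefix] = (next_hop, self.global_epoch)

    def switchover(self) -> None:
        """Keep every entry for forwarding, but bump the epoch so all are stale."""
        self.global_epoch += 1

    def purge_stale(self) -> None:
        """After reconvergence, drop entries never refreshed in the new epoch."""
        self.entries = {
            p: v for p, v in self.entries.items() if v[1] == self.global_epoch
        }


fib = Fib()
fib.install("10.2.0.0/16", "10.1.1.2")
fib.install("10.3.0.0/16", "10.1.1.6")
fib.switchover()                          # forwarding continues on stale entries
forwarding_during_restart = set(fib.entries)
fib.install("10.2.0.0/16", "10.1.1.2")    # relearned from an NSF-aware neighbor
fib.purge_stale()                         # 10.3.0.0/16 was not relearned
```

In this model, both prefixes remain usable for the whole restart window; only the one that no neighbor re-advertises disappears once the purge runs.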
Non-Stop Forwarding
OSPF Implementation Example
[Figure: two message sequences between an NSF Capable and an NSF Aware router.
Sequence using fast hellos: restart announce with Fast Hello (2 sec interval, RS bit set) answered by Fast Hello (2 sec interval, RS bit clear); OSPF discovery with the same fast hello exchange; database exchange by out-of-band sync of Database Description packets; LSA requests/updates; Hello (RS bit clear).
Sequence using Grace LSAs: graceful-restart announce with LS Update (Grace LSA) answered by LS ACK (Grace LSA); Hello exchange; Database Description exchange; LSA requests/updates; Hello.]
NSF Configuration - IOS
Capable Vs Helper Configuration
• Configuration is required to enable “NSF Capable”
• Configuration is NOT required to enable “NSF Helper” with default settings
• Helper supports both types on the device
router eigrp 1
 nsf
!
router ospf 1
 nsf ietf
!
router isis 1
 nsf cisco
NSF Interoperability
Interoperability between different Cisco devices
• The Graceful Restart extensions used in NX-OS are based on the IETF RFCs, except for EIGRP, which is Cisco proprietary and can interoperate with Cisco NSF.
• This implies that for OSPFv2, OSPFv3, and BGP the GR extensions are compatible with versions of IOS that use the RFC-based extensions
[Figure: OSPF peers configured with "router ospf 1 / graceful-restart"; the neighbor configured with "router ospf 1 / nsf ietf" is marked ✔, shown next to a neighbor configured with "router ospf 1 / nsf cisco"]
Non-Stop Routing (NSR)
Routing Protocol Redundancy With NSR
Active Supervisor Engine Slot 1 Standby Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table EIGRP RIB OSPF RIB ARP Table
Prefix Next Hop Prefix Next Hop IP MAC Prefix Next Hop Prefix Next Hop IP MAC
[Link] [Link] 192.168.0 [Link] [Link] aabbcc:ddee32 [Link] [Link] 192.168.0 [Link] [Link] aabbcc:ddee32
[Link] [Link] [Link] [Link] [Link] adbb32:d34e43 [Link] [Link] [Link] [Link] [Link] adbb32:d34e43
[Link] [Link] [Link] [Link] [Link] aa25cc:ddeee8 [Link] [Link] [Link] [Link] [Link] aa25cc:ddeee8
FIB Table
SSO FIB Table
Routing Protocol Redundancy With NSR
Active Supervisor Engine Slot 2
EIGRP RIB OSPF RIB ARP Table
FIB Table
[Link] aabbcc:ddee32
[Link] adbb32:d34e43
[Link] aa25cc:ddeee8
NSR Deployment Scenario
Case Study: MPLS VPN Provider Edge
[Figure: MPLS VPN topology with CE routers and a P router]
NSR Configuration - IOS
• Configuration is required to enable NSR
router eigrp 1
nsr
!
router ospf 1
nsr
!
router isis 1
nsr
Comparing NSF and NSR
High Availability at Different Layers
Standalone Chassis, Redundant Core: Redundant Supervisors, Yes or No?
• Redundant topologies with equal cost multi-paths (ECMP) provide sub-second convergence
• NSF/SSO provides superior availability in environments with non-redundant paths
• RP convergence is dependent on IGP and tuning
[Figure: redundant Catalyst 6500 core; chart of seconds of lost voice]
Design Considerations for NSF/SSO
Where Does It Make Sense?
• Access switch is the single point of failure in best practices HA design
• Supervisor failure is the most common cause of access switch service outages
• Recommended design with NSF/SSO provides for sub-600 msec recovery of voice and data traffic
[Figure: chart of seconds of lost voice]
Catalyst 9300 Series
Cisco Stackwise-480
Stacking Cable – Close-up
Cable Lengths
• 0.5m
• 1m
• 3m
Understanding the Stack Ring
• 6 rings in total
• 3 rings go East
• 3 rings go West
Is math really an opinion?
[Figure: ASIC connected to two stack interfaces]
Understanding Spatial Reuse
Doubling the capacity of my stack
• Assuming 4 x 24-port 9300 switches
• Destination stripping: the packet travels ½ the rings and is taken out of the stack by the destination
[Figure: ring of switches 1 to 4, shown before and after destination stripping]
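The bandwidth arithmetic behind the ring count and spatial reuse can be sketched as follows. The 40 Gbps per-ring rate is an assumption, not stated in this section; it is chosen because it is consistent with the StackWise-480 name. The hop function illustrates why a destination-stripped packet crosses at most half the ring.

```python
RINGS = 6        # from the stack ring description: 3 east + 3 west
RING_GBPS = 40   # assumption: per-ring rate, not stated in this section

raw_gbps = RINGS * RING_GBPS
# Destination stripping frees the rest of the ring for other traffic,
# which is treated here as doubling the usable capacity.
with_spatial_reuse_gbps = raw_gbps * 2


def hops(src: int, dst: int, stack_size: int) -> int:
    """Shortest hop count between two stack members on a bidirectional ring."""
    clockwise = (dst - src) % stack_size
    return min(clockwise, stack_size - clockwise)


print(raw_gbps, with_spatial_reuse_gbps)
print(max(hops(1, d, 4) for d in range(1, 5)))  # worst case in a 4-switch stack
```

With four members, the worst case is two hops, so half of the ring stays free for other flows.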
Stack Ring Healing
• Detection is by hardware
• Software is notified immediately
• Ring Wrap initiated immediately (1-2ms)
Example shows 4 x 24-port Cat9K switches
For Recovery –
IOS XE Software Internals Overview
[Figure: IOS XE software architecture running on the kernel, grouped into the Infra Domain, LC Domain, and RP Domain. Labeled blocks include IOSd, Forwarding & Feature Mgr (FFM), Stack Manager (3K), Wireless Controller, Interface Manager, Platform Manager, System Manager, Forwarding Engine Driver, Packet Delivery Service, platform drivers and UADP ASIC drivers, low-level APIs, and infrastructure services: HA, consolidated logging, availability framework, licensing, service location, internal IPC, transports (TCP/SCTP/UDP), and external libraries/utilities.]
UADP
Designed for Flexibility
• UADP provides an unparalleled degree of flexibility in an access switch
• Excellent for encapsulations, which often need recirculation
VXLAN as a protocol had not even been invented when the UADP ASIC was designed …
[Figure: VXLAN encapsulation format, field widths in bits.
Underlay:
• Outer MAC Header (14 bytes, plus 4 optional bytes): Source MAC (48), VLAN Type 0x8100 (16), VLAN ID (16), Ether Type 0x0800 (16)
• Outer IP Header (20 bytes): IP header misc. data (72), Protocol 0x11 (UDP) (8), Header Checksum (16), source IP = Src RLOC IP Address (32), Dest. IP (32)
• UDP Header (8 bytes): source port (16), a hash of the inner L2/L3/L4 headers of the original frame that enables entropy for ECMP load balancing; VXLAN Port UDP 4789 (16); UDP Length (16); Checksum 0x0000 (16)
• VXLAN header: VXLAN Flags RRRRIRRR (8); allows 64K possible SGTs
Overlay:
• Inner (Original) MAC Header
• Inner (Original) IP Header
Callout fragments: "… of 256 Bytes", "Up to 250 frames across … stages at one time…"]
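To make the header fields concrete, here is a minimal sketch that packs an 8-byte VXLAN header and derives the outer UDP source port from a hash of the inner headers. Several details are assumptions rather than anything taken from this session: the 16-bit group policy field and its G flag follow the VXLAN Group Policy style layout, CRC32 stands in for whatever hash the hardware actually uses, and the port is mapped into the dynamic range 49152-65535.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789  # destination port shown in the header diagram


def vxlan_header(vni: int, group_policy_id: int = 0) -> bytes:
    """Pack an 8-byte VXLAN header.

    Assumed layout: 8 flag bits, 8 reserved bits, a 16-bit group policy ID,
    a 24-bit VNI, and 8 reserved bits.
    """
    flags = 0x08  # 'I' bit set: the VNI field is valid (RRRRIRRR)
    if group_policy_id:
        flags |= 0x80  # assumption: 'G' bit marks a group policy ID as present
    return struct.pack("!BBHI", flags, 0, group_policy_id, vni << 8)


def udp_source_port(inner_headers: bytes) -> int:
    """Derive the outer UDP source port from a hash of the inner headers.

    CRC32 is a stand-in hash for illustration only.
    """
    return 49152 + (zlib.crc32(inner_headers) % 16384)


header = vxlan_header(vni=5000, group_policy_id=17)
port = udp_source_port(b"inner L2/L3/L4 header bytes of one flow")
print(header.hex(), port)
```

Because the source port depends only on the inner headers, every packet of a flow takes the same underlay path while different flows spread across equal-cost paths. A 16-bit policy field is also what yields 64K possible group tags.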
• Switches boot.
Discovery exits
Stack Active Election
Rules of Election
Define Stack Roles
minimal Downtime
Catalyst 9K Stack similarity to Catalyst 6500
[Figure: Active (A) and Standby (S) roles shown side by side for the two platforms]
Show switch with SSO
Stack Mac follows Active initially
Switch# show switch
Switch/Stack Mac Address : 2037.06cf.0e80
                                   H/W   Current
Switch# Role    Mac Address     Priority Version State
------------------------------------------------------------
*1      Active  2037.06cf.0e80  10       V01     Ready
 2      Standby 2037.06cf.3380  8        V00     Ready
 3      Member  2037.06cf.1400  6        V00     Ready
 4      Member  2037.06cf.3000  4        V00     Ready
* Indicates which member is providing the "stack identity" (aka "stack MAC")
Show switch detail output
Switch# show switch detail
Switch/Stack Mac Address : 2037.06cf.0e80
H/W Current
Switch# Role Mac Address Priority Version State
------------------------------------------------------------
*1 Active 2037.06cf.0e80 10 V01 Ready
2 Standby 2037.06cf.3380 8 V00 Ready
3 Member 2037.06cf.1400 6 V00 Ready
4 Member 2037.06cf.3000 4 V00 Ready
Stack Port Information
         Stack Port Status        Neighbors
Switch#  Port 1     Port 2     Port 1   Port 2
--------------------------------------------------------
  1        OK         OK         2        4
  2        OK         OK         3        1
  3        OK         OK         4        2
  4        OK         OK         1        3
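The neighbor columns are enough to confirm that the stack is cabled as a full ring. The following is a minimal sketch of such a check, with the neighbor table typed in by hand from the output above. The function name and data layout are hypothetical, and parsing live CLI output is left out.

```python
def is_full_ring(neighbors: dict) -> bool:
    """True if following Port 1 neighbors visits every switch once and
    returns to the starting switch."""
    if not neighbors:
        return False
    start = next(iter(neighbors))
    seen, current = set(), start
    while current not in seen:
        seen.add(current)
        nxt = neighbors[current][0]  # Port 1 neighbor
        if nxt is None or nxt not in neighbors:
            return False  # open port or unknown member: ring is broken
        current = nxt
    return current == start and len(seen) == len(neighbors)


# Switch number -> (Port 1 neighbor, Port 2 neighbor)
stack = {1: (2, 4), 2: (3, 1), 3: (4, 2), 4: (1, 3)}
print(is_full_ring(stack))
```

A stack with an unplugged cable would show a missing neighbor on one port and fail this check, which is the half-ring condition that reduces available stack bandwidth.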
Catalyst 9000 – HA State Machine
• Active starts RP Domain locally 2min timer
Show redundancy states
Switch# show redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 1
Redundancy Mode (Operational) = SSO
Redundancy Mode (Configured) = SSO
Redundancy State = SSO
Manual Swact = enabled
Communications = Up
client count = 76
client_notification_TMR = 360000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 0
keep_alive threshold = 9
RF debug mask = 0
Callouts:
• my state = 13 -ACTIVE: terminal state for the Active unit
• peer state = 8 -STANDBY HOT: terminal state for the Standby unit for SSO
• Unit ID: slot number of the Active unit
• Communications: communication channel status between the Active/Standby RP units
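A monitoring script can turn this output into a simple readiness check. The sketch below assumes the "key = value" layout shown above and treats three fields as the readiness criteria; that choice is a heuristic for illustration, not an official definition of SSO readiness.

```python
def parse_redundancy_states(text: str) -> dict:
    """Turn 'key = value' lines into a dict; lines without '=' are ignored."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def sso_ready(fields: dict) -> bool:
    """Heuristic: SSO operational, standby hot, and the peer channel up."""
    return (
        fields.get("Redundancy Mode (Operational)") == "SSO"
        and "STANDBY HOT" in fields.get("peer state", "")
        and fields.get("Communications") == "Up"
    )


sample = """\
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Redundancy Mode (Operational) = SSO
Communications = Up
"""
print(sso_ready(parse_redundancy_states(sample)))
```

Checking the peer state rather than only the configured mode matters because a standby that is still booting reports SSO as configured long before it reaches STANDBY HOT.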
Show Redundancy Command Output…
Switch#sh redundancy
Redundant System Information :
------------------------------
Available system uptime = 29 weeks, 2 days, 11 hours, 47 minutes
Switchovers system experienced = 2
Standby failures = 0
Last switchover reason = user_forced
StackWise Virtual Architecture
Extending StackWise Architecture
Does it look familiar? (VSS)
[Figure: Dist-1 formed by SW-1 and SW-2, two Cat 9k switches joined by 40G/10G links]
StackWise Virtual Architecture
Resilient Software Design
[Figure: Dist-1 formed by SW-1 and SW-2, two Cat 9k switches joined by 40G/10G links]
StackWise Virtual Architecture
Simplified. Scalable.
[Figure: Core, Distribution, and Access layers; Dist-1 is formed by SW-1 and SW-2 (Cat 9k) joined by 40G/10G links]
• Cisco StackWise Virtual supports Unified control and management plane architecture
This operation requires a reload of the system. Do you want to proceed? [y/n]y
2 install_activate: Reloading the box to complete activation of the SMU...
SMU Deployment Experience with Cisco DNA Center
• Download SMU to APIC-EM file server
• Analyze SMU impact
• Test SMU on pilot setup
• Schedule SMU deployment
[Figure: Network Admin, Cisco DNA Center App, SMU and ReadMe, APIC-EM server, file server, [Link]]
Stackable Best Practices
Stacking Convergence
Multi-Layer Access (not a recommended design)
• Active unit with uplink failure introduces two failures
• Active control plane
• Uplink interface
• When the Active fails, the Standby will take over.
• Upstream, HSRP / GLBP will detect link down, and D2 will start answering to the virtual MAC 0000.0c07.ac00
• Downstream traffic is re-routed to D2 via L3 link
[Figure: distribution switches D1 (HSRP ACTIVE) and D2 (HSRP STANDBY) with vIP [Link] and vMAC 0000.0c07.ac00, advertising summary subnets; L2 links down to an access stack S1, S2, S3 (Active, Standby) acting as a single logical switch; two hosts with IP [Link], MAC [Link].aa01 and [Link].aa03, GW [Link], ARP 0000.0c07.ac00]
Stacking Convergence
Multi-Layer Access
• Active unit failure (without uplink)
[Figure: distribution switches D1 (HSRP ACTIVE) and D2 (HSRP STANDBY) with vIP [Link] and vMAC 0000.0c07.ac00, advertising summary subnets]
Catalyst 9300 StackWise
Routed Access
[Figure: routed access design advertising summary subnets]
Changing Stack Mac on Cat9K Switches
• By default the timer value is set to indefinite (0)
• System continues to keep selected stack mac after switchover
• Avoids protocol flapping
• How to change it
• A new command introduced: switch#stack-mac update force

Before switchover:
Catalyst9k#show switch
Switch/Stack Mac Address : 2037.06cf.0e80
Mac persistency wait time: Indefinite
                                   H/W   Current
Switch# Role    Mac Address     Priority Version State
------------------------------------------------------------
*1      Active  2037.06cf.0e80  10       V01     Ready
 2      Standby 2037.06cf.3380  8        V00     Ready
 3      Member  2037.06cf.1400  6        V00     Ready
 4      Member  2037.06cf.3000  4        V00     Ready

After switch 1 is removed (the slide overlays the stack MAC values 2037.06cf.0e80 and 2037.06cf.3380):
Catalyst9k#show switch
Switch/Stack Mac Address : 2037.06cf.0e80
Mac persistency wait time: Indefinite
                                   H/W   Current
Switch# Role    Mac Address     Priority Version State
------------------------------------------------------------
*1      Member  0000.0000.0000  10       V01     Removed
 2      Active  2037.06cf.3380  8        V00     Ready
 3      Member  2037.06cf.1400  6        V00     Ready
 4      Member  2037.06cf.3000  4        V00     Ready
Key Recommendations for Stacking
• Run the stack in full ring mode to get full bandwidth
• Configure the Active switch priority and Standby switch priority
• Predetermine which switch is the Active and which Standby will become the Active should the Active fail
• Simplifies operations
• Configure Active and Standby unit without uplinks if possible
• If deploying a stack of 4 or more switches, keep the Active and Standby switches without uplinks; this will simplify the convergence and reduce the outage time
• Do not change the stack-mac timer value
• By default the value is 0 (indefinite)
• Avoids protocol flapping
• There is a command to change the stack-mac when needed
ISSU Overview
• ISSU provides a mechanism to perform software upgrades and downgrades without taking the switch out of service
• Leverages the capabilities of NSF and SSO to allow the switch to forward traffic during Supervisor IOS upgrade (or downgrade)
• Key technology is the ISSU Infrastructure
• Allows SSO between different versions
[Figure: Catalyst 9400 with Active Sup and Standby Sup in SSO, and line cards]
In Service Software Upgrades
Streamlined Process for Software Upgrades/Downgrades
1 ISSU Loadversion
2 ISSU Runversion
3 ISSU Acceptversion (Optional)
4 ISSU Commitversion
Stateful Switchover Mode – IOS
ISSU Client and Versioning Infrastructure
[Figure: ISSU versioning on the Active Supervisor running Cisco IOS Version 1. ISSU clients shown as HA-compliant and HA-aware applications: Routing Protocols, Forwarding Information Base, NetFlow, Port Manager, Cisco Discovery Protocol, PAgP / LACP, and more, together with the Redundancy Facility and the Checkpointing Facility. The peers agree on a message version (V1 out of V1, V2, V3); if compatible (Y), the upgrade can proceed using message exchange and message transformation (MSG V1 to MSG V3); if not (N), it cannot.]
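The version agreement in the figure above can be sketched in a few lines of Python. This is only an illustration of the idea (pick a message version both supervisors understand, otherwise do not proceed); the function names and the use of integer version numbers are assumptions for the example, not Cisco APIs.

```python
# Illustrative sketch only: function names and integer versions are assumptions.

def negotiate_version(active_versions, standby_versions):
    """Return the highest message version both peers support, or None."""
    common = set(active_versions) & set(standby_versions)
    return max(common) if common else None


def can_proceed(active_versions, standby_versions):
    """An SSO-based upgrade can proceed only if a common version exists."""
    return negotiate_version(active_versions, standby_versions) is not None


# Active runs the old image and speaks V1 only; the standby runs a newer
# image that still understands V1, V2 and V3, so the peers agree on V1.
print("agreed version:", negotiate_version({1}, {1, 2, 3}))
print("can proceed:", can_proceed({1}, {1, 2, 3}))
print("no common version:", can_proceed({1}, {2, 3}))
```

When the newer side speaks a higher version internally, it transforms messages to the agreed version before exchanging them, which is the "message transformation" step in the figure.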
ISSU Dual Supervisor – Catalyst 9400
ISSU Process
Dual Supervisors
• ISSU Process leverages SSO/NSF Architecture
[Figure: Start ISSU; Active Supervisor and Standby Supervisor in SSO, with Line Card]
C9K ISSU
Dual Supervisor ISSU
3 Step Process (granular control on the upgrade process, with ability to rollback)
• install add file <tftp/ftp/flash/disk:*.bin>
• install activate issu
• install commit
1 Step Process (single command to perform complete ISSU)
• install add file <tftp/ftp/flash/disk:*.bin> activate issu commit
C9K ISSU Workflow
Dual Supervisor ISSU
1. ISSU started, image is expanded on Active and Standby (S1 Active on V1, S2 Standby on V1)
2. Standby reloads with the new V2 image (S1 Active on V1, S2 Standby on V2)
   If S2 fails to become standby it will revert back to step 1
3. Auto-switchover causes S2 to become new active and S1 reloads with the new V2 image (S2 Active on V2, S1 Standby on V2)
4. ‘Commit’ keyword stops the abort timer
5. ISSU complete
An expired abort timer will revert to Step 2 and then Step 1
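The workflow above behaves like a small state machine with a rollback path. A minimal sketch in Python follows; the event names and the rule that the abort timer can only fire before the commit are assumptions made for the illustration, and the model ignores everything the real install process checks.

```python
# Simplified model of the five-step ISSU workflow. Event names are
# hypothetical; this is not how IOS-XE implements the process.

STEP_NAMES = {
    1: "image expanded on active and standby",
    2: "standby reloaded with new image",
    3: "auto-switchover done, old active reloaded with new image",
    4: "commit issued, abort timer stopped",
    5: "ISSU complete",
}


def next_step(step, event):
    """Return the step reached from `step` when `event` occurs."""
    if event == "ok":
        return min(step + 1, 5)
    if event == "standby_failed" and step <= 2:
        return 1  # S2 failed to become standby: revert to step 1
    if event == "abort_timer_expired" and step in (2, 3):
        return 1  # rollback passes back through step 2 to step 1
    return step  # after the commit the abort timer no longer applies


def run(events, start=1):
    step = start
    for event in events:
        step = next_step(step, event)
    return step


final = run(["ok", "ok", "ok", "ok"])
print(final, "-", STEP_NAMES[final])
rolled_back = run(["ok", "ok", "abort_timer_expired"])
print(rolled_back, "-", STEP_NAMES[rolled_back])
```

The point of the model is the asymmetry: before the commit every failure leads back to the original image, after the commit nothing does.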
Stackwise Virtual - ISSU
C9K ISSU
Stackwise Virtual ISSU and Dual Supervisor ISSU
3 Step Process (granular control on the upgrade process, with ability to rollback)
• install add file <tftp/ftp/flash/disk:*.bin>
• install activate issu
• install commit
1 Step Process (single command to perform complete ISSU)
• install add file <tftp/ftp/flash/disk:*.bin> activate issu commit
Stackwise Virtual ISSU
ISSU Process
[Figure: Install ISSU on a pair of Catalyst 9500-24Q switches joined by a Stackwise-Virtual Link and a Dual-Active Detection Link. Each member moves from 16.8.1 to 16.8.2 in turn, with an auto-switchover in between; the first and second steps each cause sub-second traffic convergence.]
Enhanced Fast Software Upgrade – Catalyst 9000
Achieving High Availability on Catalyst 9300
Enhanced Fast Software Upgrade
• eFSU provides a mechanism to upgrade and downgrade the software image by
[Figure: Control-Plane RIB and forwarding entries]
Enhanced Fast Software Upgrade
Regular Upgrade Vs Enhanced Fast Software Upgrade Process
[Figure: regular upgrade compared with Enhanced Fast Software Upgrade; labels shown: 16.10.1*, < 30 seconds of traffic impact]
Enhanced Fast Software Upgrade
CLI commands
• One step command which activates the fast software upgrade and commits it
9300# install add file flash:cat9k_iosxe.BLD_V1610 activate reloadfast enhanced commit
Enhanced Fast Software Upgrade – VSS system
VSS Software Upgrade on Catalyst 6500
Enhanced Fast Software Upgrade (EFSU)
Preparation Steps
1. Before ISSU software upgrade, VSS Switch-1 and Switch-2 will be running the old software image.
2. Install the new image to the same location on the file systems of both Supervisors
3. Make sure the boot register is configured for auto boot (0x2102)
Execute Upgrade
1. ISSU Loadversion
2. ISSU Runversion
3. ISSU Acceptversion (Optional)
4. ISSU Commitversion
[Figure: Switch-1 and Switch-2 (WS-X6708-10G line cards) joined by the VSL, moving from old version to new version across steps 1 to 4. R = Reload, SO = Switchover. States shown: VSS Active, VSS Standby Hot, STANDBY COLD. Available capacity is shown as 50% or 100% per step for SW1 and SW2.]
VSS Quad SUP SSO - Catalyst 6500
• In Chassis Standby SUP in each Switch
• This will keep the unit up and
[Figure: ICA supervisors as SSO Active and SSO Standby]
EFSU Quad Sup
Normal Quad Sup Upgrade Vs Staggered Quad Sup Upgrade
Normal Quad Sup Upgrade
1. ISSU Loadversion (Whole Standby Sw2 chassis reload)
2. ISSU Runversion (whole active Sw1 chassis reload)
4. ISSU Commitversion (whole Standby Sw1 chassis reload)
Staggered Quad Sup Upgrade
1. ISSU Loadversion (2nd Sup on Standby Chassis - ICS)
2. ISSU Loadversion – Step 2 (Switchover with the Standby Chassis, LCs reload)
4. ISSU Commitversion (ICS on new Standby Chassis)
5. ISSU Commitversion – Step 2 (Reload on the new Standby Chassis LC)
[Figure: available capacity (50% / 100%) per step for SW 1 and SW 2, four steps for the normal upgrade and five for the staggered upgrade]
Cisco IOS ISSU Summary
• ISSU is a software upgrade /downgrade procedure
• Changes the risk assessment criteria
• Minimizes the impact of upgrades/downgrades
• Allows for a trial period with automated rollback
• Less downtime
• Both software versions must be ISSU compatible in order to achieve an SSO-based upgrade
• Software version compatibility includes
• 18 month rolling window between software releases of the same train
• Same license level required between versions
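The two compatibility rules above (same train within an 18 month rolling window, same license level) can be expressed as a simple pre-check. The sketch below is a rough screening aid only: the dictionary keys, the release dates and the license string are made-up example values, and the authoritative answer always comes from the vendor's ISSU compatibility information.

```python
from datetime import date

ROLLING_WINDOW_MONTHS = 18  # from the summary above


def months_between(a, b):
    """Whole calendar months between two dates, ignoring the day of month."""
    return abs((a.year - b.year) * 12 + (a.month - b.month))


def issu_candidate(rel_a, rel_b):
    """Screen two releases; keys 'train', 'released', 'license' are hypothetical."""
    return (
        rel_a["train"] == rel_b["train"]
        and rel_a["license"] == rel_b["license"]
        and months_between(rel_a["released"], rel_b["released"])
        <= ROLLING_WINDOW_MONTHS
    )


# Made-up example releases, for illustration only.
old = {"train": "A", "released": date(2018, 1, 1), "license": "advantage"}
new = {"train": "A", "released": date(2019, 3, 1), "license": "advantage"}
too_old = {"train": "A", "released": date(2017, 1, 1), "license": "advantage"}

print("old -> new:", issu_candidate(old, new))
print("too_old -> new:", issu_candidate(too_old, new))
```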
Graceful Insertion and Removal - GIR
Graceful Insertion and Removal for Catalyst 9000
Isolation of Switch from network
One command!
Pre-change System Snapshot
Graceful Insertion and Removal for Catalyst 9000
Return Switch into network
Stop Maintenance
One command!
Pre-change System Snapshot
Graceful Insertion and Removal
Isolation of Switch from network
Graceful Insertion and Removal
Default and Customizable Templates
Graceful Insertion and Removal
Snapshots
• Automatic Snapshots
• Snapshots are automatically generated when entering and exiting maintenance mode
Switch#show system snapshots compare before_maintenance after_maintenance
================================================================================
Feature Tag .before_maintenance .after_maintenance
================================================================================
[interface]
troubleshooting
GIR Summary
• GIR used to isolate a switch
• Maintenance
• HW upgrade
• SW upgrade
• Works well in an L3 end to end network
• Order of Maintenance is
• EGP -> IGP (in parallel) -> L2 shutdown
• HSRP/VRRP can be leveraged without causing issue on switchover
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• Foundations of the Structured Network Design
• High Availability Architectures:
• Enterprise Wired LAN
• Enterprise Data Center
• Enterprise Wireless LAN
Dana Daum
Communications Architect
Maren Kostede
Technical Solutions Architect
Junmei Zhang
Technical Marketing Eng.
Samer Theodossy
Principal Engineer
[Figure: enterprise reference architecture. Headquarters, regional site, remote sites and teleworker / mobile worker connect over MPLS WANs and the Internet edge (routers, firewalls, RA-VPN, DMZ, web and email security appliances) to core, distribution and access switches, wireless LAN controllers, WAAS, and the data center.]
Hierarchical network design
High availability using modularity, hierarchy, and structure
Hierarchical network design
• Core
• Connectivity, availability and scalability
• Distribution
• Aggregation for wiring and traffic flows
• Policy and network control point (FHRP, L3 summarization)
• Access
• Physical – Ethernet wired 10/100/1000(802.3z)/mGig(802.3bz);
802.3af(PoE), 802.3at(PoE+), and Cisco Universal POE (UPOE)
• Policy enforcement – security: 802.1x, port security, DAI, IPSG, DHCP
snooping; identification: CDP/LLDP; QoS: policing, marking, queuing
• Traffic control – IGMP snooping, broadcast control
Hierarchical network design
Do I need a core layer?
• It is a question of operational complexity and a question of scale
• n x (n-1) scaling
• Routing peers
• Fiber, line cards and port counts ($,€,£)
• Capacity planning considerations
• Easier to track traffic flows from a block to the common core than to ‘n’ other blocks
• Geographic factors may also influence the design
• Multi-building interconnections may have fiber limitations
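To see why the n x (n-1) term matters, the short Python sketch below compares a full mesh of distribution blocks with the same blocks attached to a core. It assumes one logical switch per block and a two-switch core; both are simplifications chosen for the example, not figures from the slides.

```python
def full_mesh_link_ends(n):
    """Interfaces needed when each of n blocks connects to every other block."""
    return n * (n - 1)


def full_mesh_links(n):
    """Point-to-point links in a full mesh (each link has two ends)."""
    return n * (n - 1) // 2


def core_links(n, core_switches=2):
    """Links when each block connects only to an assumed pair of core switches."""
    return n * core_switches


print("blocks  mesh-ends  mesh-links  core-links")
for n in (3, 4, 6, 10):
    print(f"{n:6d}  {full_mesh_link_ends(n):9d}  "
          f"{full_mesh_links(n):10d}  {core_links(n):10d}")
```

The mesh grows quadratically while the core design grows linearly, and the same shape applies to routing peers per block.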
Structured campus network design
What we are trying to avoid!
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• Foundations of the Structured Network Design
• Modularity, Hierarchy, and Structure
• Leveraging Hardware-Based Path Restoration
• High Availability Architectures
• High Availability System Recovery Analysis
Optimizing network convergence
Failure detection and recovery
• Optimal high availability network design attempts to
leverage ‘local’ switch fault detection and recovery
• Design should leverage the hardware capabilities of
the switches to detect and recover traffic flows
based on these ‘local’ events
• Design principle –
Hardware failure detection and recovery is both
faster and more deterministic
• Design principle –
Software failure detection mechanisms provide a
secondary, not primary, fault detection and recovery
mechanism in the optimal design
Optimizing network convergence
Layer 1 link redundancy and failure detection
• Direct point to point fiber provides for fast failure detection
• IEEE 802.3z and 802.3ae link negotiation define the use of Remote Fault
Indicator & Link Fault Signaling mechanisms
• IOS debounce timer –
• GigE and 10GigE fiber ports: 10 msec
• Minimum for copper: 300 msec
• Design principle
Understand how hardware choices and tuning impact fault detection and
response to link failures
Optimizing network convergence
Layer 2 software fault detection (e.g. UDLD)
• While 802.3z and 802.3ae link negotiation provide for L1 fault detection, hardware ASIC failures can still occur
• UDLD provides an L2 based keep-alive mechanism that confirms bi-directional L2 connectivity
• Each switch port configured for UDLD will send UDLD protocol packets (at L2) containing the port’s own device / port ID, and the neighbor’s device / port IDs seen by UDLD on that port
• If the port does not see its own device / port ID echoed in the incoming UDLD packets, the link is considered unidirectional and is shut down
• Design principle –
Redundant fault detection mechanisms required (SW as a backup to HW where possible)
[Figure: UDLD keepalives exchanged over the Tx/Rx fiber pair between two switches]
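The echo check described above can be illustrated with a toy model in Python. The packet layout, device names and port names are invented for the example, and real UDLD also involves timers and neighbor aging that this sketch leaves out.

```python
# Toy model of the UDLD echo check. Packet fields and IDs are hypothetical.

def build_udld_packet(sender_id, neighbors_seen):
    """A UDLD packet carries the sender's ID and the IDs it has heard."""
    return {"sender": sender_id, "echo": list(neighbors_seen)}


def is_bidirectional(own_id, received_packet):
    """The link is bidirectional if the neighbor echoes our own ID back."""
    return own_id in received_packet["echo"]


A = ("SwitchA", "Te1/1")
B = ("SwitchB", "Te1/1")

# Healthy link: B has heard A, so B's packets echo A's ID.
healthy = build_udld_packet(B, neighbors_seen=[A])

# Fault: A's transmit strand is broken. B never hears A, so B's packets
# (which A still receives) carry no echo of A's ID.
faulty = build_udld_packet(B, neighbors_seen=[])

for name, pkt in (("healthy", healthy), ("faulty", faulty)):
    state = "up" if is_bidirectional(A, pkt) else "unidirectional, shut down"
    print(name, "->", state)
```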
Optimizing network convergence
Layer 2 and 3 – Why use routed interfaces?
L3 routed interfaces allow faster convergence than L2 switchport with an associated L3 SVI
[Link].042 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/1, changed state to down
[Link].050 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet3/1, changed state to down
[Link].050 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust GigabitEthernet3/1
[Link].813 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/1, changed state to down
[Link].821 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet2/1, changed state to down
[Link].069 UTC: %LINK-3-UPDOWN: Interface Vlan301, changed state to down
[Link].069 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust Vlan301
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• High Availability Architectures:
• Enterprise Wired LAN
• Multilayer Campus Distribution and HA Considerations
• Simplified Distribution and HA Advantages
• Extending HA Advantages by Simplifying Virtualization
• Enterprise Data Center
• Enterprise Wireless LAN
• High Availability System Recovery Analysis
Optimizing the Layer 2 design – spanning tree
VLANs spanning access switches:
• At least some VLANs span multiple access switches
• Layer 2 and 3 running over link between distribution
• More typical of a “classic” data center design
Unique VLANs per access switch:
• Each access switch has unique VLANs
• Layer 3 link between distribution
• More typical of a campus LAN design
Optimizing the Layer 2 design
Non-STP-blocking topologies converge fastest
Optimizing the Layer 2 design
PVST+, Rapid PVST+, MST
• PVST+ (pre 802.1D-2004) - traditional spanning tree
• Rapid-PVST+ (802.1w) greatly improves the restoration times for any VLAN that requires a topology convergence due to link UP
• Rapid-PVST+ also greatly improves convergence time over BackboneFast for any indirect link failures
• Rapid PVST+
• Scales to large size (up to 16,000 logical ports)
• Easy to implement, proven, scales
• MST (802.1s)
• Permits very large scale STP implementations (up to 75,000 logical ports)
Optimizing the Layer 2 design
Complex topologies take longer to converge
• Time to converge is dependent on the protocol
implemented – 802.1D, 802.1s, or 802.1w
• It is also dependent on –
• Size and shape of the L2 topology (how deep is the tree)
• Number of VLANs being trunked across each link
• Number of logical ports in the VLAN on each switch
• Non-congruent topologies take longer to converge.
Restricting the topology is necessary to reduce
convergence times
• Prune all unnecessary VLANs from trunk configuration
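The effect of pruning on the number of logical ports can be estimated with a small calculation. The sketch assumes the common counting rule that every VLAN carried on a trunk counts as one logical port and every access port counts as one; the trunk and VLAN counts are hypothetical, and the 16,000 figure is the Rapid PVST+ scale quoted earlier in this deck.

```python
RAPID_PVST_LIMIT = 16_000  # logical ports, as quoted for Rapid PVST+


def stp_logical_ports(vlans_per_trunk, access_ports):
    """Assumed rule: one logical port per VLAN per trunk, plus one per access port."""
    return sum(vlans_per_trunk) + access_ports


# Hypothetical distribution switch: 100 trunks and 48 access ports.
unpruned = stp_logical_ports([200] * 100, access_ports=48)  # all VLANs on all trunks
pruned = stp_logical_ports([50] * 100, access_ports=48)     # only the VLANs needed

for label, count in (("unpruned", unpruned), ("pruned", pruned)):
    verdict = "over" if count > RAPID_PVST_LIMIT else "within"
    print(f"{label}: {count} logical ports ({verdict} the limit)")
```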
Optimizing the Layer 2 design
STP toolkit – PortFast and BPDU guard
• PortFast is configured on edge ports to allow them to quickly move to forwarding, bypassing listening and learning, and to avoid TCN (Topology Change Notification) messages
• BPDU guard can prevent loops by moving PortFast
configured interfaces that receive BPDUs to errdisable state
• BPDU guard prevents ports configured with PortFast from
being incorrectly connected to another switch
• When enabled globally, BPDU guard applies to all interfaces
that are in an operational PortFast state
Switch(config-if)#spanning-tree portfast
Switch(config-if)#spanning-tree bpduguard enable
1w2d: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port FastEthernet3/1 with BPDU Guard enabled. Disabling port.
1w2d: %PM-4-ERR_DISABLE: bpduguard error detected on Fa3/1, putting Fa3/1 in err-disable state
Optimizing the Layer 2 design
STP best practices for campus
Layer 2 access with Layer 3 distribution
First hop redundancy protocols (FHRP)
• HSRP, GLBP, and VRRP are used to provide a resilient
default gateway / first hop address to end stations
• A group of routers act as a single logical router providing
first hop router redundancy
• Protect against multiple failures
• Distribution switch failure
• Uplink failure
• Default recovery is ~10 Seconds
First Hop Redundancy
Sub-Second Timers Improve Convergence
! HSRP
interface Vlan4
 ip address [Link] [Link]
 standby 1 ip [Link]
 standby 1 timers msec 250 msec 750
 standby 1 priority 150
 standby 1 preempt
 standby 1 preempt delay minimum 180
! GLBP
interface Vlan4
 ip address [Link] [Link]
 glbp 1 ip [Link]
 glbp 1 timers msec 250 msec 750
 glbp 1 priority 150
 glbp 1 preempt
 glbp 1 preempt delay minimum 180
! VRRP
interface Vlan4
 ip address [Link] [Link]
 vrrp 1 description Master VRRP
 vrrp 1 ip [Link]
 vrrp 1 timers advertise msec 250
 vrrp 1 preempt delay minimum 180
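A quick calculation shows what the `timers msec 250 msec 750` lines buy. The sketch compares the HSRP default timers (3 second hello, 10 second hold) with the tuned values from the configuration above; treating the hold time as the worst-case failure detection time is a simplification, and the dictionary layout is just for the example.

```python
# Hello / hold timers in milliseconds.
HSRP_DEFAULT = {"hello_ms": 3000, "hold_ms": 10000}  # HSRP defaults
TUNED = {"hello_ms": 250, "hold_ms": 750}            # timers msec 250 msec 750


def detection_seconds(timers):
    """Worst-case time to declare the active gateway down (hold timer expiry)."""
    return timers["hold_ms"] / 1000.0


def hellos_per_hold(timers):
    """How many hello intervals fit into one hold time."""
    return timers["hold_ms"] // timers["hello_ms"]


for label, timers in (("default", HSRP_DEFAULT), ("tuned", TUNED)):
    print(f"{label}: detect in ~{detection_seconds(timers)} s "
          f"({hellos_per_hold(timers)} hellos per hold time)")
```

Both settings tolerate the same number of lost hellos; the tuned timers simply shrink the interval, which is why the control plane must be able to keep up with sub-second hellos.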
HSRP preemption—why it is desirable
• Spanning tree root and HSRP primary should be aligned
• When spanning tree root is re-introduced, traffic will take a two-hop path to HSRP active
• HSRP preemption will allow HSRP to follow the spanning tree topology
FHRP design considerations
Preempt delay needs to be longer than boot time
• HSRP is not always aware of the status of the entire switch and network
• Ensure that you provide enough time for the entire system to be up – diagnostics (full or partial), L1 (line cards), L2 (STP), L3 (IGP convergence)
• Tune delay and preempt delay conservatively as the network is already forwarding data
• ‘standby delay’ controls how long the interface needs to be up before HSRP starts, and ‘preempt delay’ controls how long to wait after HSRP establishes a neighbor relationship. You should configure both.
interface Vlan402
 . . .
 standby delay minimum 60 reload 600
 standby 1 ip [Link]
 standby 1 timers msec 250 msec 750
 standby 1 priority 110
 standby 1 preempt delay minimum 60 reload 600
 standby 1 authentication ese
 standby 1 name HSRP-Voice
 hold-queue 2048 in
Sub-second timer considerations
HSRP, GLBP, OSPF, PIM
• Evaluate your network before implementing any sub-second timers
• Certain events can impact the ability of the switch to process sub-
second timers
• Application of large ACL
• OIR of line cards in Catalyst 6500/6800
• The volume of control plane traffic can also impact the ability to process
• 250 / 750 msec GLBP & HSRP timers are only valid in designs with fewer than 150 VLAN instances (Catalyst 6x00 in the distribution)
• Spanning Tree size
FHRP design considerations—
asymmetric routing (unicast flooding)
• Alternating HSRP Active between distribution
switches can be used for upstream load balancing
• This can cause a problem with unicast flooding
• ARP timer defaults to four hours and CAM timer
defaults to five minutes
• ARP entry is valid, but no matching L2 CAM table entry exists
• In many cases when the HSRP standby needs to
forward a frame, it will have to unicast flood the
frame since its CAM table is empty
FHRP design considerations—
asymmetric routing (unicast flooding) solutions
• Using ‘V’ based design with unique voice and data VLANs
per access switch, this problem has no user impact
• Don’t deploy stacking switches (i.e. daisy-chained switches) that depend on spanning tree for managing interconnects in the stack
• Tune ARP timer to 270 seconds and leave CAM timer at default; if there are more than 10,000 ARP entries, change the CAM timers instead
• Deploy Multichassis EtherChannel with a virtual switching system (VSS or vPC) in the distribution block
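The reasoning behind the 270 second ARP timer is easiest to see as arithmetic. The sketch below uses the defaults stated earlier (ARP four hours, CAM five minutes) and computes how long an ARP entry can outlive its CAM entry on the standby switch; the function name is made up for the example.

```python
ARP_DEFAULT_S = 4 * 60 * 60  # four hours
CAM_DEFAULT_S = 5 * 60       # five minutes
ARP_TUNED_S = 270            # recommended tuning from the slide above


def flooding_window_s(arp_timeout_s, cam_aging_s):
    """Seconds during which a valid ARP entry may have no matching CAM entry."""
    return max(0, arp_timeout_s - cam_aging_s)


print("default timers:", flooding_window_s(ARP_DEFAULT_S, CAM_DEFAULT_S), "s")
print("ARP tuned to 270 s:", flooding_window_s(ARP_TUNED_S, CAM_DEFAULT_S), "s")
```

With the ARP timer below the CAM aging time, the ARP refresh exchange re-populates the CAM table before the entry ages out, so the switch no longer has to unicast flood.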
Even with faster convergence from RPVST+ we still have to wait for FHRP convergence
• FHRP protocol based forwarding topologies
• Load balancing based on Per-Port or Per-VLAN
• Convergence is dependent on multiple factors –
• Load balancing –
• Asymmetric forwarding
[Figure: Layer 2 access below FHRP Active and FHRP Standby distribution switches; chart values shown: 50, 50, 20, 10, 9.1]
Simplification with routed access design
After: Layer 3 distribution with Layer 3 access
IGP IGP
Layer 2
Routed access advantages
Simplified control plane
• No STP feature placement (root bridge, loopguard, …)
• No default gateway redundancy setup/tuning (HSRP, VRRP, GLBP ...)
• No matching of STP/HSRP priority
• No asymmetric flooding
• No L2/L3 multicast topology inconsistencies
• No Trunking Configuration Required
Routed access advantages
Simplified network recovery
• Routed access network recovery is
dependent on L3 re-route
• Time to restore upstream traffic flows
is based on ECMP re-route
• Time to detect link failure
• Process the removal of the lost routes
from the SW RIB
• Update the HW FIB
• Time to restore downstream flows is
based on a routing protocol re-route
• Time to detect link failure
• Time to determine new route
• Process the update for the SW RIB
• Update the HW FIB
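The two recovery paths above are sums of sequential steps, which a few lines of Python make explicit. Only the 10 msec link detection value comes from this deck (the fiber debounce figure); every other number is a placeholder invented for the illustration and is not a measurement.

```python
def restore_time_ms(components):
    """Total restore time when the steps run one after another."""
    return sum(components.values())


# Upstream: ECMP re-route. Values other than detection are placeholders.
upstream = {
    "detect_link_failure": 10,     # fiber debounce figure quoted earlier
    "remove_routes_from_rib": 20,  # placeholder
    "update_hw_fib": 30,           # placeholder
}

# Downstream: routing protocol re-route adds a route computation step.
downstream = {
    "detect_link_failure": 10,
    "determine_new_route": 50,     # placeholder
    "update_sw_rib": 20,           # placeholder
    "update_hw_fib": 30,           # placeholder
}

print("upstream restore:", restore_time_ms(upstream), "ms")
print("downstream restore:", restore_time_ms(downstream), "ms")
```

Whatever the real values are on a given platform, the structure shows why downstream recovery takes longer: it has one more step, and that step depends on the routing protocol.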
Routed access advantages
Faster convergence times
• RPVST+ convergence times dependent on FHRP tuning
• Proper design and tuning can achieve sub-second times
• EIGRP converges <200 msec
• OSPF converges <200 msec with LSA and SPF tuning
[Figure: upstream convergence time in seconds (axis 0 to 2) for RPVST+ with FHRP, OSPF, and EIGRP]
Routed access advantages
A single router per subnet: simplified multicast
Layer 2 access has two multicast routers per access subnet, RPF checks
and split roles between routers
Routed access has a single multicast router which simplifies multicast
topology and avoids RPF check altogether
Routed access advantages
Ease of troubleshooting
• Failure differences
• Routed topologies fail closed, i.e. neighbor loss
• Layer 2 topologies fail open, i.e. broadcast and unknowns flooded
switch#sh ip cef [Link]
[Link]/24
 nexthop [Link] TenGigabitEthernet9/4
Why isn’t routed access deployed everywhere?
Routed access design constraints
Campus wired LAN design
Option 2: Layer 3 routed access (BRKCRS-3036)
Traditional multilayer campus design
Simplified end-to-end VSS design
Data Center
Comparison – standalone (multilayer) versus VSS
Unified system architecture
Catalyst VSS setup
LAN distribution layer
Prerequisites:
• Switches running same software with feature support (C4K:3.6E, C6K:15.2(1)SY1)
• Links to be used for VSLs up with CDP communication
1) C6K - Enable Easy VSS feature, convert, and reload VSS
VSS-Sw1# switch virtual easy
VSS-Sw1# switch convert mode easy links ?
• The switch now renumbers from y/z to x/y/z
• When process is complete, save configuration when prompted, switch reloads and forms VSS.
Configured Router mac address is different from operational value. Change will take effect after config is saved and the entire Virtual Switching System (Active and Standby) is reloaded.
[Figure: VSS-Sw1 and VSS-Sw2 joined by VSL]
VSS dual supervisor inter-chassis redundancy
• VSS dual supervisor (single sup per chassis) supports inter-chassis SSO redundancy.
• Single in-chassis supervisor - SSO Active or Standby role.
[Figure: two virtual-switches joined by VSL; NSF recovery with reduced capacity]
Catalyst quad-supervisor NSF/SSO redundancy
Inter-Chassis Sup Redundancy
and capacity with dual redundancy domain –
• Payload overhead
6500-VS4O#show switch virtual redundancy | inc Switch|Mode|Current|Fabric
VSL Uptime : 1 day, 1 hour, 17 minutes
VSL Control Link : Te2/3/1
6500E/6800/4500E VSS dual sup – VSL design
Two Cisco recommended designs
Profile 1 – Two VSL links on Supervisor
• Cost-effective solution to leverage both uplinks. Continue to use non-VSL capable linecard for 10G core connection.
• Redundant fibers connect through common fabric and ASICs, which could result in a vulnerability in system stability.
• Optimal and preset VSL parameters – Load-Balancing, QoS, HA, Traffic-engg, Dual-Active etc.
• Restricted to bundling 2 x VSL ports or 20G switching capacity on a per virtual-switch node basis.
Profile 2 – Diversified VSL between Supervisor and VSL capable Linecard
• Redundant and diversified fibers between supervisor and next-gen VSL capable linecards.
• Same design as Profile 1 but increases system reliability as each VSL port is diversified across different fabric/ASICs.
• Optimal and preset VSL parameters – Load-Balancing, QoS, HA, Traffic-engg, Dual-Active etc.
• Flexible to scale up to 8 x VSL for high-density systems to aggregate uplink, service modules, single-home etc.
6500E/6800 VSS quad-supervisor VSL design
Sup2T/6T quad-supervisor NSF/SSO VSL redundancy
• Same design profile – 1 dual sup
• Flexible to increase VSL capacity
• Continue to leverage existing non-VSL 10G linecard for uplink connection
• Retains all original VSL benefits
• Vulnerable design during any supervisor self-recovery fault incident
[Figure: SW1 and SW2 joined by VSL, with supervisors Sup-1 to Sup-4; RPR-WARM]
6500E/6800 VSS quad-supervisor VSL design
Sup2T/6T quad-supervisor NSF/SSO VSL redundancy
SSO advantage
Recommended: Full-Mesh VSL on Quad-Sup
[Figure: SW-1 and SW-2 with supervisors Sup-1 to Sup-4 in a full-mesh VSL; ASIC to port mapping for front panel ports, with Internal Stub ASIC – 1 shown against ports 1–8]
Cisco Catalyst platforms and transitions
Where is VSS?
[Figure: platform transitions from access switching to backbone switching. Existing platforms: Cisco Catalyst 2960X/XR Series, 3850 copper, 4500E Series, 3850F/4500-X, 6840-X/6880-X. Newer platforms: Cisco Catalyst 9200 Series, 9300 Series, 9400 Series, 9500 Series.]
“How can I simplify my distribution without VSS?”
StackWise Virtual
• Both StackWise Virtual members must have consistent Cisco IOS-XE and license
[Figure: StackWise Virtual pair of WS-3850-48XS switches in the distribution layer, joined by SVL with Fast Hello, above the access layer]
Cisco StackWise Virtual (SWV) setup
LAN distribution layer
1) Prepare standalone switches for SWV
3850-D1#conf t
3850-D1(config)# stackwise-virtual
3850-D1(config-stackwise-vir)# domain <1-255>
3850-D2#conf t
3850-D2(config)# stackwise-virtual
3850-D2(config-stackwise-vir)# domain <1-255>
Note: Maximum of 8 SVL member links and 4 dual active detection links
Virtual Switch Link capacity planning
• Plan VSL capacity to reduce congestion points, handle failures and specific configurations
• Supported VSL interface types:
• Catalyst 6500E/6800 : 10G and 40G
• Catalyst 4500E/4500X : 1G and 10G
• Catalyst 3850 : 1G, 10G, and 40G
VSS – multi-homed physical connections
• Redundant network paths per system deliver the best architectural approach
VSS – Multichassis EtherChannel
• MEC enables:
• Simplified STP loop-free network topology
• Consistent L3 control-plane and network design as traditional
Standalone mode system
• Deterministic sub-second network recovery
Simplified STP network topology with VSS
• VSS simplifies STP. VSS does not eliminate STP.
Never disable STP.
• Multiple parallel Layer 2 network paths create a looped STP topology
• VSS with MEC builds a single loop-free network that utilizes all available links.
• Distributed EtherChannel minimizes STP complexity compared to a standalone distribution design
• The STP toolkit should still be deployed to safeguard the multilayer network
STP BLK Port
Loop-free L2 EtherChannel
Traditional distribution design
Redundant design with sub-optimal topology and complex operation
Stabilize network topology with several L2 features:
• STP Primary and Backup Root Bridge
• Rootguard
• Loopguard or Bridge Assurance
• STP Edge Protection
Protocol restricted forwarding topology
• STP FWD/ALT/BLK Port
• Single Active FHRP Gateway
• Asymmetric forwarding
• Unicast Flood
Protocol-dependent network recovery:
• PVST/RPVST+ and FHRP Tuning
Resiliency versus performance/scale tradeoff: HSRP
FHRP Active FHRP Standby
• Multichassis EtherChannel based forwarding topologies
• Per-Flow Load Balancing based on Layer 2 to Layer 4 + VLANs
Resiliency versus performance/scale tradeoff: VSS
• Multichassis EtherChannel based forwarding topologies
• Per-Flow Load Balancing based on Layer 2 to Layer 4 + VLANs
VSS-SW1
• Hardware-Based Fault Detection and Recovery
• Deterministic network convergence with simplistic approach
PIM timers also need tuning
• Multicast recovery depends on PIM DR failure detection in the Layer 2 network
• PIM routers exchange the PIM expiration time in query messages
• DR failure detection: ~90 seconds (30 sec. hello * 3 multiplier)
• Tune the PIM query interval to sub-second, as with FHRP, for faster multicast convergence
• Sub-second protocol timers must be avoided on SSO-capable networks
interface Vlan2
 ip pim sparse-mode
 ip pim query-interval 250 msec
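The detection arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the same multiplier of 3 that the slide uses for the default timers.

```python
def dr_failure_detection_seconds(query_interval_s: float, multiplier: int = 3) -> float:
    """Time to declare the PIM DR dead: query (hello) interval times the multiplier."""
    return query_interval_s * multiplier


# Default 30-second query interval from the slide.
default_detection = dr_failure_detection_seconds(30.0)

# Tuned interval from the slide's config: "ip pim query-interval 250 msec".
tuned_detection = dr_failure_detection_seconds(0.25)

print(f"default: {default_detection} s, tuned: {tuned_detection} s")
```

With the 250 msec interval, detection drops from about 90 seconds to under one second, which is why the tuning matters in a standalone design and why it conflicts with SSO-capable systems.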
Simplified and robust multicast network design
using VSS
• Single PIM DR system in the Layer 2 network processes IGMP from host receivers
• Doubles multicast forwarding performance across all Multichassis EtherChannel member links
• Optimize the multicast network with PIM stub configuration
interface Vlan2
 ip pim passive
Multichassis EtherChannel load sharing
• The MEC hash algorithm is computed independently by each virtual switch to load-share across its local physical ports.
EtherChannel section.
1 8 X X X X X X X
2 4 4 X X X X X X
3 3 3 2 X X X X X
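The rows above (8; 4 4; 3 3 2) are consistent with spreading eight hash buckets across the member links of a bundle. A minimal Python sketch of that distribution follows; the eight-bucket constant and the function name are assumptions made for illustration, not taken from platform documentation.

```python
HASH_BUCKETS = 8  # assumed bucket count, inferred from the table rows above


def buckets_per_link(n_links: int) -> list[int]:
    """Spread the hash buckets over n_links member links as evenly as possible."""
    if not 1 <= n_links <= HASH_BUCKETS:
        raise ValueError("a bundle needs between 1 and 8 member links")
    base, extra = divmod(HASH_BUCKETS, n_links)
    # The first `extra` links carry one additional bucket each.
    return [base + 1 if i < extra else base for i in range(n_links)]


for n in range(1, HASH_BUCKETS + 1):
    shares = buckets_per_link(n)
    even = "even" if len(set(shares)) == 1 else "uneven"
    print(n, shares, even)
```

Only bundles of 1, 2, 4, or 8 links come out even, which matches the power-of-2 bundling advice elsewhere in this section.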
Optimize EtherChannel load balancing
• Load-share egress data traffic based on the input hash
Default: src-dst-ip vlan
Recommended: src-dst-mixed-ip-port
• Optimal load-sharing results with:
• Bucket-based load sharing: bundle member links in powers of 2 (2/4/8)
• Multiple variations of input for the hash (L2 to L4)
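To see why adding Layer 4 ports to the hash input helps, the sketch below maps flows to member links with and without port information. SHA-256 is only a stand-in for the switch's hardware hash, which this deck does not describe, and the addresses, ports, and function name are hypothetical.

```python
import hashlib


def pick_link(fields: tuple, n_links: int) -> int:
    """Map a flow's hash-input fields to a member link index (stand-in hash)."""
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


# 64 flows between the same two hosts, differing only in source port.
flows = [("10.1.1.10", "10.2.2.20", 1024 + i, 443) for i in range(64)]

# IP-only input: every flow between the two hosts hashes to the same link.
links_ip_only = {pick_link((src, dst), 4) for src, dst, _, _ in flows}

# IP plus L4 ports: the same flows spread across the bundle.
links_ip_port = {pick_link((src, dst, sp, dp), 4) for src, dst, sp, dp in flows}

print(len(links_ip_only), len(links_ip_port))
```

The IP-only case pins all traffic between two hosts to a single member link, while the mixed IP and port input gives the hash more variation to work with.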
Summary: Multichassis EtherChannel performs
better in any network design
• Network recovery mechanics vary between distribution designs:
• Standalone – protocol and timer dependent
• VSS – hardware dependent
• VSS logical distribution system:
• Single P2P STP Topology
• Single Layer 3 gateway
[Chart: Convergence (sec), 0 to 1, L2-FHRP versus L2-MEC]
VSS-enabled campus core design
• Extend VSS architectural benefits to campus
core layer network
• VSS enabled core increases capacity,
optimizes network topologies and simplifies
system operations
• Key VSS-enabled core best practices:
• Protect network availability and capacity with Catalyst 6800 Sup6T Quad-Sup NSF/SSO
• Simplify network topology and routing database with a single MEC
• Leverage self-engineered VSS and MEC capabilities for deterministic network fault detection and recovery
Data Center
VSS core network design alternatives
VSL VSL
VSL VSL
Catalyst 6500/6800 VSS-enabled campus
design
ECMP forwarding table construction
• ACTIVE switch is responsible for:
• Constructing two software tables: Routing Information Base (RIB) and Forwarding Information Base (FIB)
• Hardware FIB inserts entries for ECMP routes using locally attached links
• If all local links fail, the FIB is programmed to forward across the VSL link as a last resort
[Figure: SW1 (ACTIVE) and SW2 (HOT_STANDBY) with uplinks T1/2/1, T1/2/1, T2/2/1, T2/2/2. The system-wide unicast ECMP software RIB and software FIB each hold four ECMP entries; the Switch-1 and Switch-2 hardware FIBs each hold two entries. Legend: unicast forwarding path, multicast forwarding path]
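The local-link preference described on this slide can be sketched as a small selection function. Everything here is illustrative: the data layout and function name are hypothetical, and the second SW1 interface is assumed to be T1/2/2 (the extracted figure shows T1/2/1 twice).

```python
# System-wide software FIB: four ECMP paths, two per chassis (interface names assumed).
SOFTWARE_FIB = [
    {"switch": 1, "interface": "T1/2/1"},
    {"switch": 1, "interface": "T1/2/2"},  # assumed name
    {"switch": 2, "interface": "T2/2/1"},
    {"switch": 2, "interface": "T2/2/2"},
]


def hardware_fib(switch_id: int, link_up: dict) -> list[str]:
    """Program only locally attached, operational ECMP paths; fall back to the VSL."""
    local = [
        path["interface"]
        for path in SOFTWARE_FIB
        if path["switch"] == switch_id and link_up.get(path["interface"], False)
    ]
    return local if local else ["VSL"]


all_up = {path["interface"]: True for path in SOFTWARE_FIB}
sw1_uplinks_down = dict(all_up, **{"T1/2/1": False, "T1/2/2": False})

print(hardware_fib(1, all_up))            # two local entries on SW1
print(hardware_fib(1, sw1_uplinks_down))  # last resort: forward across the VSL
```

Each chassis forwards on its own uplinks first, so the VSL carries data traffic only when a switch has lost every local path.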
Summary – optimizing core performance (1/2)
HW Driven Forwarding Topology & High Availability Unicast Forwarding Path
Multicast Forwarding Path
VSS-Core
Standalone-Core
VSS-Dist
Standalone-Dist
Summary – optimizing core performance (2/2)
HW Driven Forwarding Topology & High Availability Unicast Forwarding Path
Multicast Forwarding Path
Standalone-Core
Standalone-Core
VSS-Dist
Standalone-Dist
Simple core network design delivers
deterministic network recovery
• Routing protocol independent network convergence in large scale campus core
tuning required
• Hardware-based fault detection and recovery in MEC/EC designs
[Chart: Convergence (sec), 0 to 2.5, at 500, 1000, 5000, 10000, 15000, 20000, and 25000 on the x-axis, for ECMP (w/o PIC), ECMP (with PIC), and MEC; figure labels: T1/2/1, T1/2/1, T2/2/1, T2/2/2]
VSS core simplifies multicast operation, improves
performance and redundancy (1/2)
• Standalone core needs anycast MSDP peering for RP redundancy
• ECMP builds a single multicast forwarding path and relies on protocol-based fault detection and recovery
[Figure: core with two PIM RPs; labels: AnyCast - MSDP, Single OIL, PIM Join]
VSS core simplifies multicast operation, improves
performance and redundancy (2/2)
Single Logical
• VSS based Catalyst systems enables PIM PIM RP Core
Simplified multicast network design delivers
deterministic network recovery
• ECMP multicast recovery depends on mroute scale and can take seconds.
• MEC/EC multicast recovery is hardware-based, scale-independent, and sub-second.
[Chart: Convergence (sec), 0 to 6, at 100, 500, 1000, and 5000 on the x-axis, for ECMP versus MEC/EC]
Implementing non-stop forwarding
• Catalyst 4500E, 4500X and 6500E/6800 deployed in VSS mode must have NSF enabled. No configuration is required on the NSF helper system
• NSF capability must be manually enabled for all Layer 3 routing protocols:
• EIGRP, OSPF, ISIS, BGP, MPLS etc.
• In a VRF environment, NSF must be manually enabled on each per-VRF IGP instance
• Multicast NSF capability is ON by default
[Chart: Inter-Chassis NSF/SSO Recovery Analysis; Convergence (sec), 0 to 16, without NSF versus with NSF]
Sub-second protocol timers and NSF/SSO
• NSF is intended to provide availability through route convergence avoidance
• Fast IGP timers are intended to provide availability through fast route convergence
interface Port-Channel 10
 ip ospf dead-interval minimal hello-multiplier 4
• In an NSF environment the dead timer must be greater than:
• SSO recovery + routing protocol restart + time to send first hello
• Recommendation:
• Do not configure aggressive timers for Layer 2 protocols, e.g. Fast UDLD
• Do not configure aggressive timers for Layer 3 protocols, e.g. OSPF Fast Hello, BFD. Keep all protocol timers at default settings
[Figure: Core, Dist (VSL), and Access (Catalyst 2K/3K/4K) layers; chart axis 0 to 0.2]
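The dead-timer rule above is a simple inequality, sketched here in Python. The recovery figures in the example are hypothetical placeholders, not measurements from this deck; the 40-second value is the usual OSPF default dead interval on broadcast links, and 1 second corresponds to the fast-hello setting.

```python
def dead_timer_is_nsf_safe(
    dead_timer_s: float,
    sso_recovery_s: float,
    protocol_restart_s: float,
    first_hello_s: float,
) -> bool:
    """True when the neighbor's dead timer outlasts the whole NSF/SSO recovery window."""
    return dead_timer_s > sso_recovery_s + protocol_restart_s + first_hello_s


# Hypothetical recovery budget: 2 s SSO + 5 s protocol restart + 1 s to first hello.
budget = dict(sso_recovery_s=2.0, protocol_restart_s=5.0, first_hello_s=1.0)

default_timer_ok = dead_timer_is_nsf_safe(40.0, **budget)  # default dead interval
fast_hello_ok = dead_timer_is_nsf_safe(1.0, **budget)      # "dead-interval minimal"

print(default_timer_ok, fast_hello_ok)
```

Under these assumed numbers the default timer survives a switchover, while the 1-second dead interval expires first and the neighbor tears down the adjacency that NSF was trying to preserve.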
Campus wired LAN design
Option 3: Layer 2 access with “simplified” distribution (BRKCRS-1500)
Logical topology: L3 core/dist., L2 dist./acc.
• Leading campus design for easy configuration and operation when using stacking or similar technology (VSS, StackWise Virtual)
• Flexibility to support Layer 2 services within distribution blocks, without FHRPs
• Easy to scale and manage
Survives device and link failures
• Select appropriate VSS capable system that fits in network and solution requirements
• Plan and design VSL with appropriate capacity, diversification and redundancy
• Keep Layer 2 and Layer 3 protocol timers at factory default. Do not enable protocols with
aggressive timers
• Configure redundant dual active trusted ePAgP neighbors (L2/L3)
Agenda
• Designing High Availability Networks for the Enterprise
• System Hardware and Software Resiliency
• High Availability Architectures:
• Enterprise Wired LAN
• Multilayer Campus Distribution and HA Considerations
• Simplified Distribution and HA Advantages
• Extending HA Advantages by Simplifying Virtualization
• Enterprise Data Center
• Enterprise Wireless LAN
Hop-by-hop network virtualization
Multi-VRF architecture overview
• Two preset network setup:
• Hop-by-hop network segmentation with logical connection
• Build control and data-plane over each logical connection
Hop-by-hop network virtualization
Data-plane isolation
Multi-VRF: Campus network design alternatives
Standalone devices
Multi-VRF: Campus network design alternatives
Cisco VSS
M-VRF: Per-hop VPN control plane complexity
ECMP unicast and multicast adjacencies comparison (1 of 4)
Standalone Design
10 VRF Sample Design
M-VRF: Per-hop VPN control plane complexity
ECMP unicast and multicast adjacencies comparison (2 of 4)
M-VRF: Per-hop VPN control plane complexity
ECMP unicast and multicast adjacencies comparison (3 of 4)
M-VRF: Per-hop VPN control plane complexity
ECMP unicast and multicast adjacencies comparison (4 of 4)
VSS-ECMP Design
10 VRF Sample Design
Multi-VRF: MEC design simplifies complexity
EC/MEC unicast and multicast adjacencies comparison (2 of 4)
Multi-VRF: MEC design simplifies complexity
EC/MEC unicast and multicast adjacencies comparison (3 of 4)
Multi-VRF: MEC design simplifies complexity
EC/MEC unicast and multicast adjacencies comparison (4 of 4)
LSP
Core
IP/MPLS
LSP LSP
Distribution
LSP LSP LSP
IP
Simplified underlay = simplified overlay (before)
P/PE P/PE
VPN PE Management
MP-iBGP PE Systems
Simplified underlay = simplified overlay (after)
P/PE
VPN PE Management
MP-iBGP PE Systems
MPLS before VSS
IGP Tuning
OSPF LSA/SPF Tuning P/PE P/PE
BGP Tunings
MP-iBGP Multipath
Control/Management/Forwarding Complexity
MPLS VSS benefits summary
IGP Tuning
OSPF LSA/SPF Tuning
P/PE
BGP Tunings Scale-independent Recovery
Control/Management/Forwarding Complexity
[Figure: enterprise reference network showing headquarters, regional site, remote sites, and teleworker/mobile worker connected over MPLS WANs and the Internet, with data center, Internet edge, DMZ, and WAN aggregation blocks]
What’s different in your network today versus a
decade ago? How does it affect availability?
• Mobility: Bring Your Own Device; devices in the workspace
• IoT: auto-detect non-user devices; devices everywhere
• Cyber Security: networking and security; advanced threats
Key Challenges for Traditional Networks
What if you could do this?
Cisco Software-Defined Access
• Enables:
• Host mobility
• Network segmentation
• Role-based access control
• It is an overlay network to the network underlay
• Control plane based on LISP
• Data plane based on VXLAN
• Policy plane based on TrustSec
[Figure: physical topology with border nodes and edge nodes; logical Layer 2 overlay and logical Layer 3 overlay]
Software-Defined Access Design Guide - CVD
[Link]
SD-Access
Why overlays?
SD-Access
Types of overlays
Campus wired LAN design
Option 4: Cisco Software-Defined Access (BRKCRS-1501, many others)
Logical topology: L2/L3, flexible overlays
• Uses advantages of a routed access physical design, with a Layer 2 capable logical overlay design
• Provisioning and policy automation
• Integrates wireless into the same policy
• Requires automation to simplify configuration
Survives device and link failures
Missed One?
Access Cisco Software-Defined Sessions are available online
@[Link]
Cisco Live Barcelona - Session Map
Tuesday (Jan 29) Wednesday (Jan 30) Thursday (Jan 31) Friday (Feb 01)
08:00-11:00 11:00-13:00 13:00-15:00 15:00-18:00 08:00-11:00 11:00-13:00 13:00-15:00 15:00-18:00 08:00-11:00 11:00-13:00 13:00-15:00 15:00-18:00 08:00-11:00 11:00-13:00 13:00-15:00 15:00-18:00
BRKCLD-2412 BRKCRS-3811
Cross-Domain Policy SD-Access Policy
SD-Access resources
Related Sessions (Reference)
• Cisco SD-Access - 8H Technical Seminar - TECCRS-3810: Monday, Jan 28 8:30 AM - 6:45 PM
• Cisco SD-Access - Technology Deep Dive - BRKCRS-3810: Tuesday, Jan 29 2:30 PM - 4:00 PM
• Cisco SD-Access - Scaling to Hundreds of Sites - BRKCRS-2825: Wednesday, Jan 30 2:30 PM - 4:00 PM
• Cisco SD-Access - Connecting Multiple Sites - BRKCRS-2815: Wednesday, Jan 30 11:00 AM - 1:00 PM
• Cisco SD-Access – Integrating Existing Network - BRKCRS-2812: Friday, Feb 01 11:30 AM - 1:30 PM
Campus wired LAN design options—summary
• Traditional Multilayer Campus: BRKCRS-2031
• Layer 3 Routed Access: BRKCRS-3036
• L2 Access / Simplified Distribution: BRKCRS-1500
• SD-Access / Fabric for Campus: BRKCRS-1501 (and many others)
Logical
topology OR
Physical
topology:
2 core
2 dist./acc.
On-line library at [Link]
How do I get there?
Successful deployments… …start with a plan.
Dana Daum Maren Kostede
Technical Solutions Architect
Communications Architect
Junmei Zhang
Technical Marketing Eng.
Samer Theodossy
Principal Engineer
… a typical day of a connected life…
No Wireless == No Network Access
Section Objective
The goal of this section is to show you how to design and deploy a highly available wireless network to reduce network downtime
Wireless High Availability concepts
• Good news: all the High Availability concepts and best practices we have seen for wired are
applicable to wireless access as well
• Bad news: wireless is not wired
Ch 1 Ch 6 Ch 11
Thin air…..
We use the air to transmit packets, it’s a shared media, it’s unlicensed….enough?
Agenda
• High Availability (HA), the theory of operations:
• What to do at the Radio Frequency layer?
• Controller HA for different Deployment Modes:
Centralized (Cloud/non-Cloud)
SD-Access
FlexConnect
Mobility Express
• HA Design and Deployment Practices
• Wireless Assurance: proactively monitor your network!
• Key takeaways
RF HA – how to build redundancy at the RF layer?
• "My power is half of my brother MacBook"
• "My antenna gain is 4 times smaller"
• "I try to connect to 5GHz and stay connected until the signal is REALLY bad"
• "and then move to another BSSID if it is REALLY better"
Adaptive 802.11r, FastLane, iOS Analytics
Radio Frequency (RF) High Availability
• Tools
• What you use is less important than how you use it
• Use the same tool to compare results
RF High Availability: Cisco RRM
• What’s RRM
• DCA—Dynamic Channel Assignment
• TPC—Transmit Power Control
• CHDM—Coverage Hole Detection and Mitigation
RF High Availability: Cisco RRM
• RRM DCA in action
A rogue AP is detected on channel 11
Channel change is triggered to improve the RF
[Figure labels: channels 1 and 11]
RF High Availability: Cisco RRM
RRM Coverage Hole Detection and Mitigation (CHDM) in action
RRM will determine the optimal power plan based on AP layout
If an AP fails…
[Figure labels: 5GHz serving, 2.4-5GHz monitoring, 2.4GHz serving]
Summary
“RF Matters”
… adding a Wireless Controller (functionality)
Private or public
Cloud
Wireless Controller modes fitting different
requirements
• Centralized: Ease of deployment and simplified management for large campuses. Cloud and non-Cloud options.
• SDA-Wireless: Policy segmentation and consistent wired-wireless management
• FlexConnect: Eliminate the need for a Controller at every site for a distributed deployment. Cloud and non-Cloud options.
• Mobility Express: Controller-less deployment for distributed deployments and small sites
Configure / Set up: From a web browser or Cisco wireless app, use the setup wizard to enable multiple APs simultaneously
[Figure labels: LAN, Campus Fabric, WAN]
Cisco Wireless Controller Options
Catalyst 9800 Series Controller (launched Nov 2018): 200 APs, 1000 APs, 2000 APs, 3000 APs, 6000 APs
AireOS WLCs:
• Mobility Express: 50-100 APs
• WLC 3504: 150 APs
• WLC 5520: 1500 APs
• WLC 8540: 6000 APs
Cisco Catalyst 9800 Series – Wireless benefits
Powered by IOS XE
Open and Programmable
Trustworthy Solutions
Modular operating system
• On-Prem, Private/Public cloud, Embed wireless on a 9k switch
• AWS GovCloud ready
• Scale as you grow
• Software updates with no disruption
• Rolling AP upgrades
• Seamlessly add new AP models
• Detect encrypted threats with Encrypted Traffic Analytics (ETA)
• Integration with StealthWatch
• Automated macro/micro segmentation with SDA
• WPA3 Support* (*Future)
High Availability
Reducing downtime for Upgrades and Unplanned Events
Centralized Mode High Availability: SSO and N+1
Requirements: No license needed on secondary Controller
Benefits:
• 1:1 box redundancy
Wireless Controller HA -
Centralized Mode
N+1 Redundancy
• Administrator statically assigns APs a primary, secondary, and/or tertiary controller
• Assigned from controller interface (per AP) or Prime Infrastructure (template-based)
• You need to specify Name and IP if WLCs are not in the same Mobility Group
• Pros:
• Predictability: easier operational management
• Support for L3 network between WLCs
• Flexible redundancy design options: 1:1, N:1, N:N:1
• WLCs can be of different HW and SW (*)
• “Fallback” option in the case of failover
• Can overload APs on controllers (using AP priority)
• Cons:
• Stateless redundancy. There is a network downtime when the WLC fails
• More upfront planning and configuration
[Figure: WLAN-Controller-A, WLAN-Controller-B, and WLAN-Controller-C reached over an IP network by three access points with these assignments:
Primary: WLAN-Controller-1, Secondary: WLAN-Controller-2, Tertiary: WLAN-Controller-3
Primary: WLAN-Controller-2, Secondary: WLAN-Controller-3, Tertiary: WLAN-Controller-1
Primary: WLAN-Controller-3, Secondary: WLAN-Controller-2, Tertiary: WLAN-Controller-1]
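The primary/secondary/tertiary assignment amounts to an ordered preference list per AP. The Python sketch below models only that ordering and the fallback behavior; it is not the CAPWAP join logic itself, and the function name and the reachable-set representation are hypothetical.

```python
from typing import Optional


def select_controller(preferences: list[str], reachable: set[str]) -> Optional[str]:
    """Return the first reachable controller in primary, secondary, tertiary order."""
    for name in preferences:
        if name in reachable:
            return name
    return None  # no configured controller is reachable


# One AP's static assignment, using the controller names from the figure.
ap_preferences = ["WLAN-Controller-1", "WLAN-Controller-2", "WLAN-Controller-3"]

normal = select_controller(ap_preferences, {"WLAN-Controller-1", "WLAN-Controller-2", "WLAN-Controller-3"})
primary_down = select_controller(ap_preferences, {"WLAN-Controller-2", "WLAN-Controller-3"})
# With fallback enabled, the AP returns to its primary once it is reachable again.
primary_back = select_controller(ap_preferences, {"WLAN-Controller-1", "WLAN-Controller-2"})

print(normal, primary_down, primary_back)
```

Because the move between controllers is stateless, each change of the selected controller in this model corresponds to a client-visible outage, which is the trade-off listed under Cons.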
N+1 Redundancy
Global backup Controllers
Configuration > AP Join >…
Controller Series
Catalyst 9800
N+1 Redundancy
AP Failover mechanism
< 30-45 sec (*)
N+1 Redundancy
AP Fast Heartbeat
< 30-45 sec (*)
N+1 Redundancy
AP Primary Discovery Request Timer
• The access point periodically sends primary discovery requests to the Primary WLC to
know when it is back online. Default is 120 sec.
• If AP Fallback is enabled (default), the AP automatically rejoins the Primary controller
N+1 Redundancy
AP Failover Priority
• Assign priorities to APs: Critical, High, Medium, Low
[Figure: failed WLC and overloaded backup WLC; the critical AP fails over and a medium-priority AP is dropped]
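The behavior in the figure, where a critical AP displaces a medium-priority AP on an overloaded backup controller, can be sketched as follows. This is an illustrative model only: the function name, AP names, and the capacity of two are hypothetical, and the real controller's admission logic may differ.

```python
PRIORITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def join_backup(joined: list, capacity: int, new_ap: tuple):
    """Try to join new_ap (name, priority); return (joined_after, dropped_ap_or_None)."""
    if len(joined) < capacity:
        return joined + [new_ap], None
    # Controller is full: find the lowest-priority AP currently joined.
    lowest_index = min(range(len(joined)), key=lambda i: PRIORITY_RANK[joined[i][1]])
    lowest = joined[lowest_index]
    if PRIORITY_RANK[new_ap[1]] > PRIORITY_RANK[lowest[1]]:
        kept = joined[:lowest_index] + joined[lowest_index + 1:]
        return kept + [new_ap], lowest
    return joined, new_ap  # newcomer does not outrank anyone, so it is refused


backup = [("ap-lobby", "medium"), ("ap-floor1", "high")]
after, dropped = join_backup(backup, capacity=2, new_ap=("ap-er", "critical"))
print(after, dropped)
```

In this model the critical AP always finds a place as long as some lower-priority AP is joined, which is the point of assigning priorities before oversubscribing a backup controller.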
N+1 Redundancy
Typical Design
< 30-45 sec Geo separated DC
Centralized Mode -
Stateful Switch Over
(SSO)
High Availability (Client SSO)
A direct physical connection between the Active and Standby Redundancy Ports, or Layer 2 connectivity between them, is required to provide stateful redundancy within or across data centers
Sub-second failover and zero SSID outage
[Figure: Redundancy Port connectivity between the Active and Hot-Standby wireless controllers: Gigabit SFP RP ports on C9800-40-K9 and C9800-80-K9, connected directly or via L2; HA interfaces connected through vswitches and a switch]
Stateful Switchover (SSO) < 1 sec
• HA pairing is possible only between controllers of the same hardware type and software version
• True box-to-box High Availability, i.e. 1:1
• One WLC in Active state and a second WLC in Hot Standby state
• The Standby continuously monitors the health of the Active WLC via a dedicated link
Stateful Switchover (SSO)
Failover sequence ACTIVE STANDBY
ACTIVE
High Availability
Cisco Catalyst 9800 Wireless Controller Differentiators
Reducing downtime for Upgrades and Unplanned Events
• Controller Software Update: Software Maintenance Updates (SMU^)
• Hot Patch (No Wireless Controller reboot)
• Cold Patch: HA install on SSO Pair
• Auto Install on Standby
• Access Point Updates: New AP Model & AP updates*
• Rolling AP Update (No Wireless Controller Reboot)
• AP Device Pack: New AP Model
• Flexible Per-Site, Per-Model Updates
Wireless Controller HA –
Catalyst 9800 only
Future
SMU on MD
Release only
Wireless Controller SMU
Wireless Controller SMU installation options:
• Hot Patch (No Wireless Controller reboot)
• Cold Patch (Wireless Controller Reboot)
• Auto Install on Standby
Catalyst 9800 SMU Cold Patch + AP Service Pack
Follows the ISSU path: both the Standby and Active controllers are reloaded, but there is no impact to AP and client sessions.
[Figure: Active and Standby controllers, each receiving the SMU]
Rolling AP Upgrade: Choose how aggressive…
N=4 Neighbor APs N=8 Neighbor APs N=24 Neighbor APs
N+1 Rolling AP Upgrade
Wireless Controller image upgrade using N+1 staging Controller
[Figure labels: AP, Trigger Rolling Upgrade, Version: X, Version: X+1, Mobility Group]
Software Defined-Access Wireless
Software Defined Access: Bringing Intent Based
Networking to Life
Cisco DNA Center: Policy, Automation, Analytics
• Automated Network Fabric: Single Fabric for Wired & Wireless with simple Automation
• Identity-Based Policy & Segmentation: Decouples Security & QoS from VLAN and IP Address
• Insights & Telemetry: Analytics and Insights into User and Application behavior
• User Mobility: Policy stays with User
[Figure labels: B, B, C, Outside, SDA Extension, IoT Network, Employee Network]
Catalyst 9800 SD-Access Wireless
Cisco DNA Center
Controller Appliance or
Private Cloud
SD-WAN
(Viptela)
c c
MPLS | Metro
SD-Access 4G/5G/LTE | Internet
Software Defined-Access: Roles and Terminology
• Cisco DNA Controller: Enterprise SDN Controller for Automation & Assurance. GUI management abstraction via multiple Service Apps
• Identity Services: NAC & ID Systems (e.g. ISE) for dynamic Endpoint to Group mapping and Policy definition
• Control-Plane (CP) Node: Map System that manages Endpoint to Device relationships
[Figure labels: Cisco DNA Controller, ISE / AD, Fabric Mode WLC, Fabric Border]
Platforms supporting SD-Access Wireless
Optimized for Distributed Branches Small and Medium Campus Medium and Large Campus
Wireless Controller HA
FlexConnect Mode
FlexConnect quick recap…
Controller
Cluster
Central Site
Clients at locally switched SSIDs stay connected
at Controller/WAN outage
AAA/ Prime
RADIUS
WAN
WAN
Outage
Wireless Controller Access Point
Branch Office
CAPWAP Control – UDP 5246
CAPWAP Data – UDP 5247
Impact of WAN Outage or Controller Failure
1. Controller failure:
N+1 HA Design:
• No impact for locally switched SSIDs
• FlexConnect AP will search for backup WLC and resume client sessions with centrally switched SSIDs.
1:1 HA Design with Client SSO:
• No impact for centrally switched SSIDs: centrally and locally switched SSIDs stay up.
2. WAN failure / Controller not reachable:
• Access Point will continue to transmit/receive data on locally switched SSIDs.
• Connected clients stay connected
• Fast roaming is possible for clients with CCKM/OKC/802.11r support
• New clients can connect if local RADIUS or authentication is provided.
• Lost features: RRM, wIDS, location, WebAuth, NAC
[Figure: controller cluster at the central site (1), WAN (2), and FlexConnect branch office with local switching]
Wireless Controller HA
Mobility Express
Cisco Mobility Express: Controller Function
embedded into the access point
Mobility Express Overview
AIR-AP1852I-B-K9
MASTER
AIR-AP2702I-B-K9
Mobility Express: Master AP Redundancy
• If Master AP fails, another Mobility Express capable AP is elected
automatically.
• Newly elected Master AP has same IP and config as original Master AP.
• Election Priorities
1. Most capable Access Points. 3800 > 2800 > 1800.
2. AP with least client load
3. In case of tie, election based on lowest MAC Address
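The election priorities above can be sketched as a three-part sort key. This is only an illustration of the ordering described on the slide, not the actual Mobility Express implementation; the `AccessPoint` class, the AP names and the numeric ranks are assumptions made for the example.

```python
from dataclasses import dataclass

# Capability ranking taken from the slide: 3800 > 2800 > 1800.
# The numeric values are illustrative; only their order matters.
CAPABILITY_RANK = {"3800": 3, "2800": 2, "1800": 1}


@dataclass(frozen=True)
class AccessPoint:
    name: str          # hypothetical label for the example
    series: str        # e.g. "3800", "2800", "1800"
    client_count: int  # current client load
    mac: str           # MAC address as colon-separated hex


def mac_to_int(mac: str) -> int:
    """Convert a MAC address string to an integer for comparison."""
    return int(mac.replace(":", ""), 16)


def elect_master(candidates):
    """Pick the next Master AP: most capable, then least loaded, then lowest MAC."""
    eligible = [ap for ap in candidates if ap.series in CAPABILITY_RANK]
    if not eligible:
        raise ValueError("no Mobility Express capable AP available")
    return min(
        eligible,
        key=lambda ap: (-CAPABILITY_RANK[ap.series], ap.client_count, mac_to_int(ap.mac)),
    )


aps = [
    AccessPoint("ap-1852", "1800", 0, "00:11:22:33:44:01"),
    AccessPoint("ap-2802-a", "2800", 10, "00:11:22:33:44:09"),
    AccessPoint("ap-2802-b", "2800", 10, "00:11:22:33:44:05"),
    AccessPoint("ap-2702", "2700", 0, "00:11:22:33:44:00"),  # not in the capability list
]
print(elect_master(aps).name)
```

Here the two 2800-series APs beat the 1800 on capability, tie on client load, and the one with the lower MAC address wins.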
Mobility Express Master Election Process
[Diagram: Master AP election among AIR-AP1852I-B-K9, AIR-AP2802I-B-K9, AIR-AP1852I-B-K9 and AIR-AP2702I-B-K9]
Mobility Express WLAN Deployment Options
Single Office | Distributed Office | Distributed Enterprise
Agenda
• High Availability (HA), the theory of operations:
• What to do at the Radio Frequency layer?
• Controller HA for different Deployment Modes:
Centralized (Cloud/non-Cloud)
SD-Access
FlexConnect
Mobility Express
• HA Design and Deployment Practices
• Wireless Assurance: proactively monitor your network!
• Key takeaways
HA Best Practices: Connecting an AP to the
wired network
Recommendations:
Create redundancy throughout the access layer by
connecting APs to different switches/stack
members/linecards
If the AP is in Local mode, configure the port as
access with STP PortFast, BPDU guard, etc.
If the AP is in Flex mode and Local Switching,
configure the port as trunk and allow only the
VLANs you need
HA Best Practices: Connecting a Single
Controller to the wired network
HA Best Practices: Connecting HA pair to the
wired network
HA Best Practices: Connecting a Client SSO
Controller Cluster to the wired network (VSS)
Option 2: to VSS pair
HA Best Practices: Connecting a Client SSO
Controller Cluster to the wired network (HSRP)
Option 3: to HSRP pair
HA Deployment Best
Practices
Focus on Campus
HA Deployment Best Practices
Campus
HA Deployment Best Practices for Campus
[Diagram: four campus HA options side by side: N+1 (Primary and Secondary Controller), SSO, SSO + 1, SSO + SSO; each SSO pair is an Active WLC and a Standby WLC connected at L2]
HA Deployment Best Practices
Campus
• What is the acceptable downtime for your business applications?
• No downtime? Go with AireOS Stateful Switchover
• Is 30 seconds to a few minutes acceptable? Go with N+1 to have more deployment flexibility
• What is the downtime to upgrade an HA pair and how can it be minimized?
• What is the recommended HA deployment in a multi-site Campus?
1. Use 2-Tier Redundancy (SSO and N+1) HA deployment
• Use SSO in the main site (Primary WLC)
• Use Secondary/Tertiary in redundancy sites
2. For max resiliency use SSO in all sites
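The downtime question above can be read as a simple decision rule. The sketch below is only an illustration of that rule; the function name is hypothetical, and treating anything under 30 seconds as "needs SSO" is an assumption, since the slide only names the two ends of the range.

```python
# Threshold taken from the slide ("30 sec to few minutes").
N_PLUS_1_MIN_TOLERANCE_S = 30


def recommend_controller_ha(acceptable_downtime_s: float) -> str:
    """Return 'SSO' or 'N+1' based on the downtime the business can tolerate."""
    if acceptable_downtime_s < 0:
        raise ValueError("acceptable downtime cannot be negative")
    if acceptable_downtime_s >= N_PLUS_1_MIN_TOLERANCE_S:
        # Tens of seconds to minutes are tolerable: N+1 gives more flexibility.
        return "N+1"
    # Assumption: anything tighter than 30 seconds is treated as "no downtime".
    return "SSO"


for seconds in (0, 10, 30, 180):
    print(seconds, recommend_controller_ha(seconds))
```

In a multi-site campus the two answers can be combined, as the slide recommends: SSO at the main site and N+1 style secondary/tertiary controllers at the other sites.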
Multi-site Campus: Combine SSO with N+1
SSO pair can act as the Primary Controller and be deployed with single Secondary and Tertiary WLC
Network downtime:
• No network downtime for single controller failure in the Primary DC
Recommendations:
• Make sure that AP Fallback is enabled
• Use AP Failover priority in case of oversubscription of the backup WLC
• Useful to reduce downtime for SSO pair software upgrade
AP Config:
Primary WLC – [Link]
Secondary WLC – [Link]
Tertiary WLC – [Link]
[Diagram: DC 1 / Main Data Centre with Primary SSO pair (9.6.61.x/24), Secondary WLC, PI and ISE; DC 2 with Tertiary WLC; IP network; Campus Access]
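The interaction between the primary/secondary/tertiary AP configuration and AP Fallback can be sketched as follows. This is a simplified model for illustration only: the function name and controller labels are hypothetical, and real APs also consider factors such as failover priority and controller load that are not modelled here.

```python
def select_controller(configured, reachable, current=None, fallback_enabled=True):
    """Pick the controller an AP should be joined to.

    configured: ordered list [primary, secondary, tertiary]
    reachable:  set of controllers that currently answer
    current:    controller the AP is joined to now, if any
    """
    # First reachable controller in configured order, or None if none answer.
    preferred = next((wlc for wlc in configured if wlc in reachable), None)
    if current in reachable and not fallback_enabled:
        # Without AP Fallback the AP stays where it is while that WLC is alive.
        return current
    return preferred


configured = ["wlc-primary", "wlc-secondary", "wlc-tertiary"]

# Primary SSO pair unreachable: the AP moves to the secondary.
print(select_controller(configured, {"wlc-secondary", "wlc-tertiary"}))

# Primary is back: with AP Fallback the AP returns to it, otherwise it stays.
print(select_controller(configured, set(configured), current="wlc-secondary"))
print(select_controller(configured, set(configured), current="wlc-secondary",
                        fallback_enabled=False))
```

This is why the slide asks for AP Fallback to be enabled: without it, APs would remain on the backup controller after the primary recovers.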
Multi-Site Campus: SSO everywhere!
assigning primary, secondary, tertiary to the APs.
Max level of High Availability: no network downtime upon controller failure
AP Config:
Primary WLC – [Link]
Secondary WLC – [Link]
Tertiary WLC – [Link]
[Diagram: DC 1 and DC 2 with PI, Tertiary SSO pair (9.6.63.x) and Campus Access]
HA Deployment Best
Practice
Focus on Branch
HA Deployment Best Practices: Branch Key Design Questions
Local Controller (Appliance/virtual):
• Specific per branch configuration
• Independence from WAN quality
• Reduced configuration on switches
• Full feature support
• L3 roaming supported
Mobility Express:
• Specific per branch configuration
• Independence from WAN quality
• Low hardware footprint (Controller running on Access Point)
FlexConnect:
• Single pane of Mgmt. & Troubleshooting
• Reduced branch footprint
• Built-in resiliency
• Perfect fit for centralized IT Team
HA questions:
• Is the branch independent from the Central site from an operational perspective?
• What is the traffic flow of your application? Are the APP servers centrally located?
• Is there a local Internet breakout? How do you authenticate new users if WAN/Controller is down? Where is the AAA server located?
FlexConnect Branch Summary
“Central Controller Cluster for thousands of Sites and Access Points”
Key Facts:
• “Cloud Controller” (private or public)
• Ease of Operations: single point of configuration for up to 6000 APs
When to use:
• Perfect for centralized IT Team
High Availability:
• If controller not reachable: Local Data path stays UP and Clients stay connected, you can use AAA survivability
• SSO at central site provides control plane survivability
[Diagram: Data Centre with Campus Services (ISE, PI, WLC SSO pair) connected over the WAN to the branch]
Local Controller (remote WLC at the branch)
When to use:
• Branch/local IT staff requires configuration outside of corporate standard
High Availability:
• Full features available if WAN is down
• Use N+1 or SSO for site controller redundancy
• Local Authentication, DHCP, DNS required for full WAN Independency
Keep in Mind:
• Need to manage each site individually
• Prime Infrastructure should be considered for central manageability
[Diagram: PI reachable over the WAN; remote location with WLC and Local Services: AAA, DHCP, DNS]
Mobility Express Branch Summary
“Quick and Easy setup, no additional Hardware, WAN Independency”
Key Facts:
• It’s a Wireless Controller running on an Access Point!
When to use:
• WAN independency is required and low hardware footprint is desired.
• Ideal for new deployments using 18xx/28xx/38xx Series Access Points
High Availability:
• Self-Healing redundancy
• Independent from WAN
• Local AAA, DHCP, DNS for full WAN independency
Keep in Mind:
• Switchport as Trunk if SSID/VLAN separation needed
• Per branch configuration and management
• Consider adding Prime Infrastructure or Cisco DNA Center for central management
[Diagram: Data Centre with Campus Services (ISE, PI) connected over the WAN to a remote location with Mobility Express APs and Local Services: AAA, DHCP, DNS]
Cisco Wireless
Assurance:
Proactively Monitor your
Network
Cisco DNA Center can manage all wireless deployment modes for Automation and Assurance
Cisco DNA Center
• SDA-Wireless: Policy Segmentation and consistent wired-wireless management
• Centralized: Ease of Deployment and management for large campuses
• FlexConnect: Eliminate the need for a Controller at every Site for a distributed deployment
• Mobility Express: Simplified Controller-less deployment for distributed deployments and small sites
Configure / Set up: From a web browser or Cisco wireless app, use the setup wizard to enable multiple APs simultaneously
Continuous Verification
Configs, Changes, Routing, Security
Services, Compliance, Audits
Successful Rollouts, Operational Continuity
Corrective Actions
Guided Remediation, Automated Updates
System Optimization
IT Productivity
Purpose-Built for Cisco DNA Assurance
• HTTP 2.0/gRPC based
• Anomaly Event, RF Stat, PCAP, Spectrum
• Scheduled and Automated
• Supported from AireOS 8.5
• Real-Time client event
• KPI Parity with AireOS
• Immediate Event Update
• Embedded Wireless in Cat9300
• HTTPS for Automation and reporting
• PnP-based Provisioning
• Fully Managed by Cisco DNAC
*Available with 16.10.1s and Cisco DNAC 1.2.8 or later
Cisco DNA Center: Built ground-up for Assurance
• Client RF stats, Onboarding state and location (<5 sec)
• Client Onboarding Health with Sankey charts for better analysis
• Near-Real time Client tracking
• Floor reassignment to make 1800s sensor mobile
• Speed tests to validate Cloud app connectivity
• IP SLA tests for Real-time AppX assessment
• Live and In-Service capture of Onboarding failures with PCAPs
• Spectrum Analyzer for analyzing Interference sources
• On-Demand AP stats for Wi-Fi troubleshooting
Wireless Assurance: Client Onboarding
1. Actionable Dashboards: Onboarding Sankey charts for better analysis
2. Real-time Correlation: Correlate Onboarding events with poor RF and client location for RCA
3. Intelligent Capture: Onboarding failures with In-service PCAPs
Wireless Assurance: Sensors to monitor SLAs
Sensor based SLA Monitoring
1. Simulate Client perspective: 1800s Sensor is mobile with floor re-assignment
2. Active Testing: Test the cloud app performance and Real-time AppX assessment
3. SLA Dashboard: Onboarding, Network Services, Cloud App Performance and IP SLA
Cisco Sensors: Intelligence of Cisco DNA
Assurance to the edge
2. Client 360: Historical Time travel with client RF correlated with the Onboarding events
3. Intelligent Capture: On-Demand AP stats for Wi-Fi troubleshooting
Know your clients!
Client Insights – Apple iOS Analytics
1 Device Profile
Client shares these details:
1. Model, e.g. iPhone 7
2. OS details, e.g. iOS 11
Support per device-group Policies and Analytics
2 Wi-Fi Analytics
Client shares these details:
1. BSSID
2. RSSI
3. Channel #
Insights into the client's view of the network
3 Assurance
Client shares these details:
Error code for why it previously disconnected
Provide clarity into the reliability of connectivity
Cisco DNA Wireless Assurance
• Be proactive: Use Sensor-based verification for
critical services!
• Know your clients: Cisco/Apple WiFi iOS
Analytics.
• Intelligent Capture: Whose fault is it? “always
on” packet capture – helping to differentiate
between an RF and an application/client issue.
• Go back in time: What happened yesterday/last
week?
• Actionable Insights: Provide guidance on how
[Diagram: Cisco DNA Center: Policy, Automation, Analytics]
I would like to leave you
with…
Key Takeaways
Selected additional Wireless Sessions…
• Cisco DNA Wireless Assurance: Isolate problems for faster troubleshooting - BRKEWN-2034
• Tuesday, Jan 29, 2:30 PM - 4:00 PM | Hall 8.0, Session Room A108
• Cisco DNA Center Assurance and Analytics– Reducing Time to resolution using Big Data and
Machine Learning - BRKNMS-2542
• Wednesday, Jan 30, 2:30 PM - 4:00 PM | Hall 8.0, Session Room D134
• Improve Enterprise WLAN Spectrum Quality with Cisco's advanced RF capacities (RRM, CleanAir,
ClientLink, etc) - BRKEWN-3010
• Wednesday, Jan 30, 8:30 AM - 10:30 AM | Hall 8.0, Session Room A103
• Thursday, Jan 31, 11:00 AM - 1:00 PM | Hall 8.0, Session Room C126
Dana Daum Maren Kostede
Technical Solutions Architect
Communications Architect
Junmei Zhang
Technical Marketing Eng.
Samer Theodossy
Principal Engineer
Data Center HA Section Objective
• Focus is on Enterprise Data Center Network
• High Availability design options and best practices
• High Availability operational best practices
• Same principle: The Enterprise Campus Network High Availability concepts
are applicable to Data Center network
• Same goal: minimize network downtime
Agenda
• Enterprise Data Center High Availability (DC HA)
• DC Switch NX-OS HA Architecture and HA Features
• DC Network HA Design and Operational Best Practices
Legacy DC with vPC
Programmable Fabric
Application Centric Infrastructure (ACI)
Programmable Network
• Key Takeaways
NX-OS HA Architecture
• Fully distributed modular design
• Control-plane & data-plane separation
• Service restart-ability
• Non-disruptive SSO* & ISSU
[Diagram: platform-dependent hardware-related modules, system-infrastructure modules and feature modules; Feature API, Management Infrastructure, HA Infrastructure, API, Hardware Drivers, Netstack, Kernel]
*SSO only available on dual-sup Nexus 7x00 and 9500
NX-OS Service Restart-ability
• Stateful Restart with Persistent Storage Service (PSS)
• Checkpoints states to PSS
• Recover states from PSS upon restart
NX-OS NSF with Stateful Fault Recovery
Restart process!
Software RIB
TCP/UDP
HSRP
OSPF
LACP
BGP
IPv6
STP
PIM
etc.
Graceful restart Graceful restart
HA Manager
Linux Kernel
LC - NSF
NX-OS NSF Configuration
• The Nexus products are “NSF Capable” by default
for all the routing protocols in all NX-OS software releases.
• No additional configuration is required unless
you need to modify the default NSF timers.
version 4.2(1)
feature ospf
<snip>
router ospf 1
graceful-restart
graceful-restart grace-period 60
area [Link] authentication message-digest
NX-OS Software Maintenance Upgrade
Direction
• Non-Disruptive Bug Fix for re-startable/stateful processes
• Works with or without ISSU
• For Operationally Impacting Bugs with no workaround
• Platform and process specific
• Limited number of Patches supported
• Not every bug will have a patch
• May be disruptive
NX-OS Stateful Process Restart & Patching
• NX-OS services Checkpoint their runtime state to the Persistent Storage Service
[Diagram: Control-Plane processes (OSPF 1, EIGRP, BGP, BGP, HSRP 1, OTV, vPC, HSRP 2) and Management Infrastructure above the Data-Plane; a BGP process is restarted]
Software Patching: CLI procedure
Dual Supervisor Standard ISSU
N7K# install all kickstart bootdisk:7.2-kickstart system bootdisk:7.3-system
Release 7.3
Fixed Switch (ToR) Standard ISSU
• Control plane is inactive during Reload supervisor
reload while data plane is Control-Plane
Data-Plane
Fixed Switch (ToR) Enhanced ISSU (N3K/N9K)
#install all nxos [Link]
• Container B is spawned to boot up with NX-OS (V2)
• Container B becomes Active
• Container A is destroyed
• NX-OS is upgraded to V2 with ~3-5 seconds impact to Control plane traffic
[Diagram: Container A running NX-OS (V1) and Container B running NX-OS (V2) on the Host OS (Linux)]
In-Service Software Upgrade
ISSU type | Platforms | Control plane impact | Data plane impact
Standard ISSU | Dual supervisor modular switch: N9500, N7700, N7000 | <3-5 sec | 0/no service disruption
Standard ISSU | Fixed switch: N9300, N3000, N5500, N5600, N6000 | < 120 sec | 0/no service disruption
Enhanced ISSU | Fixed switch: N9300, N3000 | <3-5 sec | 0/no service disruption
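For planning a maintenance window, the table above can be kept as a small lookup. The sketch below only restates the values from the slide; the dictionary layout and function name are hypothetical.

```python
# Values copied from the ISSU reference table above.
ISSU_IMPACT = [
    {"issu": "standard", "platforms": {"N9500", "N7700", "N7000"},
     "control_plane": "<3-5 sec", "data_plane": "0/no service disruption"},
    {"issu": "standard", "platforms": {"N9300", "N3000", "N5500", "N5600", "N6000"},
     "control_plane": "< 120 sec", "data_plane": "0/no service disruption"},
    {"issu": "enhanced", "platforms": {"N9300", "N3000"},
     "control_plane": "<3-5 sec", "data_plane": "0/no service disruption"},
]


def expected_issu_impact(issu_type: str, platform: str):
    """Return (control plane impact, data plane impact) for an ISSU type and platform."""
    for row in ISSU_IMPACT:
        if row["issu"] == issu_type and platform in row["platforms"]:
            return row["control_plane"], row["data_plane"]
    raise KeyError(f"{issu_type} ISSU is not listed for {platform}")


print(expected_issu_impact("standard", "N9300"))
print(expected_issu_impact("enhanced", "N9300"))
```

The two lookups for the N9300 show the point of enhanced ISSU on fixed switches: the control-plane gap shrinks from under 120 seconds to a few seconds.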
NX-OS ISSU Best Practices
• For Layer 2 and Layer 3 protocols with sensitive timers, the timeout value should be
increased. Otherwise, the upgrade will be disruptive.
• Best practices for vPC ToR:
Make sure that both vPC peers are in the same mode (traditional ISSU mode or enhanced
ISSU mode)
Connect hosts using a port-channel to a pair of vPC ToR switches
If ToR vPC is STP root bridge: enable peer-switch to avoid STP root change during ISSU
If ToR vPC is not STP root bridge: enable all ports as edge/edge trunk ports
Graceful Insertion and Removal for NXOS
Isolation of Switch from network
vPC vPC
One command!
Pre-change System Snapshot
Graceful Insertion and Removal for NXOS
Return of Switch into network
vPC vPC
One command!
Post-change System Snapshot
Configuration Profiles
• Maintenance-mode profile is applied when entering GIR mode.
• Normal-mode profile is applied when GIR mode is exited.
Graceful Insertion and Removal Feature
Graceful Removal with Isolate command
• New CLI 'isolate' in all Unicast Protocols
• Make Nexus undesirable for all transit traffic
• Maintain Protocol Adjacencies
• Send route withdrawals/worse metrics
• Local route states are maintained.
• Multicast follows Unicast for RPF
• Feature available: N5K/6K: 7.3(0)N1(1); N7K: 7.3(0)D1(1); N9K/N3K: 7.0(3)I2(1)
//Sends route withdraws
router bgp 33
  isolate
//Poisons the routes by sending highest metric
router eigrp 1
  isolate
//Advertises max-metric router-lsa
router ospf 1
  isolate
//Refreshes LSPs with overload-bit on
router isis 1
  isolate
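When a switch runs several routing instances, the per-protocol `isolate` stanzas shown above follow one pattern, which a small script can generate. This is a sketch only: the helper name and the instance list are hypothetical, and the output is meant to be reviewed before it is pasted into a device.

```python
# Effect of 'isolate' per protocol, as described on the slide.
ISOLATE_EFFECT = {
    "bgp": "Sends route withdraws",
    "eigrp": "Poisons the routes by sending highest metric",
    "ospf": "Advertises max-metric router-lsa",
    "isis": "Refreshes LSPs with overload-bit on",
}


def build_isolate_config(instances, undo=False):
    """Render 'router <protocol> <tag>' stanzas with 'isolate' (or 'no isolate')."""
    lines = []
    for protocol, tag in instances:
        if protocol not in ISOLATE_EFFECT:
            raise ValueError(f"isolate behaviour not listed for {protocol!r}")
        lines.append(f"router {protocol} {tag}")
        lines.append("  no isolate" if undo else "  isolate")
    return "\n".join(lines)


# Hypothetical set of routing instances on one switch.
instances = [("bgp", "33"), ("ospf", "1")]
print(build_isolate_config(instances))
print(build_isolate_config(instances, undo=True))
```

Because adjacencies stay up while the switch is isolated, removing the `isolate` keyword later is enough to make the switch attractive for transit traffic again.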
GIR – Platform Specifics
Nexus 7K:
• From 7.2(0)D1(1): only shutdown mode is supported
Supported features: BGP/BGPv6, EIGRP/EIGRPv6, ISIS/ISISv6, OSPF/OSPFv3, RIP, FabricPath (spine switch), vPC/vPC+, Interfaces
• From 7.3(0)D1(1): default mode is isolate, shutdown is optional mode
Supported features: BGP/BGPv6, EIGRP/EIGRPv6, ISIS/ISISv6, OSPF/OSPFv3, RIP, FabricPath (spine switch), vPC/vPC+ (shutdown only), Interfaces (shutdown only)
Nexus 9K/3K:
• From 7.0(3)I2(1): default mode is isolate, shutdown is optional mode
Supported features: BGP/BGPv6, EIGRP/EIGRPv6, ISIS/ISISv6, OSPF/OSPFv3, PIM (on vPC), RIP, vPC (shutdown only), Interfaces (shutdown only)
Putting it all Together
What to use? GIR Mode? Patching? ISSU? All of them?
ISSU ✓ X ✓
GIR + Cold Boot ✓ X ✓
GIR + Disruptive Installer ✓ X ✓
SMU Restart ✓ X X
GIR + SMU Reload ✓ X X
GIR X ✓ X
Data Center Fabric Technology Evolution
• 2008 and before: STP
• 2009: vPC
• 2010: FabricPath
• 2014: VXLAN F&L
• 2015-2019: VXLAN EVPN and ACI
Cisco Data Center Network Solutions
DB DB
High Availability Design Principle
Structure, Modularity, and Hierarchy
• Structured design
• Allows you to manage and understand traffic flows, and network failure behavior
• Modular design
• Allows for easier evolution and change to the network
• Hierarchical design
• Provides for improved scalability
• Separates network services into manageable building blocks
High Availability Design Principle
Structure, Modularity, and Hierarchy
• Optimize the interaction of the physical redundancy with the network
protocols
• Provide the necessary amount of redundancy
• Pick the right protocol for the requirement
• Optimize the tuning of the protocol
• Optimize network convergence failure detection and recovery
• Optimal high availability network design attempts to leverage ‘local’ switch fault
detection and recovery
• Design should leverage the hardware capabilities of the switches to detect and
recover traffic flows based on these ‘local’ events
vPC Feature Overview
vPC Terminology
[Diagram terms: Layer 3 cloud, vPC peer, vPC peer-keepalive link, vPC domain, vPC peer link, vPC, orphan device, S3; P and S mark the primary and secondary vPC peer]
vPC is supported on Cisco Nexus switches (N5k, N6k, N7k, N9k, N3k)
vPC Failure Scenario
vPC Peer-Keepalive Link up & vPC Peer-Link down
• vPC peer-link failure (link loss):
• vPC peer-keepalive up
• Status of other vPC peer known
[Diagram: vPC peers S1 and S2 in primary (P) and secondary (S) roles, keepalive heartbeat over the vPC peer-keepalive link, downstream switches SW3 and SW4]
Legacy DC HA Design with vPC
• Core Layer
• Layer 3 ECMP for multipath redundancy
[Diagram: Core (S1, S2), Aggregation (S3, S4), Access (S5, S6)]
Legacy DC HA Design with vPC
• Aggregation Layer
• HSRP / VRRP / GLBP with vPC for active/active gateway
• Use default FHRP timers
[Diagram: Core (S1, S2), Aggregation (S3, S4), Access (S5, S6)]
Legacy DC HA Design with vPC
• Access Layer
• Connect to a pair of Aggregation switches via Layer 2 port-channel
• Redundant uplinks
[Diagram: Core (S1, S2), Aggregation (S3, S4), Access (S5, S6)]
Legacy DC HA Design with vPC
• Access Layer
• Double-sided vPC connecting to Aggregation layer
• Higher resilience
• Different vPC domain ID
[Diagram: Core, Aggregation (vPC Domain 10)]
VSS vs vPC
Catalyst Nexus
VSS Design vs vPC Design
Catalyst VSS Nexus vPC
vPC – Layer 2 Data Center Interconnect (DCI)
[Diagram: DC 1 and DC 2 interconnected at Layer 2 over long-distance dark fiber with a vPC port-channel between vPC domain 11 and vPC domain 21, 802.1AE optional; each site shows CORE, AGGR and ACCESS layers, with vPC domain 10 in DC 1 and vPC domain 20 in DC 2]
Port type legend:
N: Network port
E: Edge or portfast
-: Normal port type
B: BPDUguard
F: BPDUfilter
R: Rootguard
vPC Hitless Role Change Feature
Without vPC Hitless Role Change With vPC Hitless Role Change
• Traffic interruption. • No traffic interruption.
Graceful Insertion and Removal Example
FHRP with vPC Switch Isolation using GIR
• Use automatic profile to go into GIR.
//Enter maintenance mode using the system mode maintenance command:
switch# configure terminal
switch(config)# system mode maintenance
Following configuration will be applied:
router ospf 100
  isolate
vpc domain 2
  shutdown
[Diagram: Core Network above the L3/L2 boundary; the unicast routing protocol is isolated and the vPC is shut down]
Legacy DC HA Design with vPC Key Takeaways
• To minimize Legacy DC down time:
• Follow vPC design best practices
• Follow vPC configuration best practices
• Follow vPC operation best practices
vPC References
Cisco Data Center Network Solutions
Classic Ethernet & VPC | Programmable Fabric | Application Centric Infrastructure | Programmable Network
• Standards-based
• VXLAN BGP EVPN
• Forwarding & Multi-Tenancy
• Disaggregated Management
• Open NX-OS
Programmable Fabric Underlay HA Design
• Structured design with Spine, Leaf and Border Leaf
• Allows you to manage traffic flow, network failure
• Layer 3 IP fabric with point-to-point link:
• Better stability, faster convergence
• Redundant links with ECMP
• Scale out spine leaf design
[Diagram: External Layer-3 Network connected to Border Leaf VTEPs, Spine layer, Leaf VTEPs]
Programmable Fabric Underlay HA Design
• Structured design with Border Spine and Leaf
• Layer 3 IP fabric with point-to-point link:
• Better stability, faster convergence
• Redundant links with ECMP
[Diagram: External Layer-3 Network connected to Border Spines, Leaf layer, Pod 1]
Programmable Fabric Overlay HA Design
• VXLAN EVPN based overlay
• Same “Anycast” SVI IP/MAC is enabled at all VTEPs/ToRs
pinning to GW
• Enable host mobility
SVI IP Address
MAC: 0000.1111.2222
IP: [Link]
[Diagram: External Layer-3 Network, border VTEPs, Spine layer, Leaf VTEPs, overlay]
Programmable Fabric Host HA Design
• Host connects to a pair of vPC leaf VTEP directly (recommended)
recommended)
• Redundant host uplinks
[Diagram: External Layer-3 Network, Border Leaf VTEPs, Spine layer, Leaf VTEPs, overlay]
vPC Leaf VTEP Best Practices for HA
• vPC leaf VTEP best practices
• Enable peer-gateway
• Enable peer switch
• Enable IP ARP Sync
vpc domain 100
  peer-switch
  peer-keepalive destination [Link] source [Link]
  delay restore 150
  peer-gateway
  ip arp synchronize
  ipv6 nd synchronize
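A quick way to audit existing leaf configurations against the three recommendations above is to check for the corresponding commands. The sketch below is a simple text match over a saved configuration, not a parser for NX-OS; the function name and the sample configuration are assumptions for illustration.

```python
# The three knobs recommended on the slide, as they appear under 'vpc domain'.
RECOMMENDED_VPC_COMMANDS = ("peer-switch", "peer-gateway", "ip arp synchronize")


def missing_vpc_best_practices(config_text: str):
    """Return the recommended vPC commands that do not appear in the config text."""
    present = {line.strip() for line in config_text.splitlines()}
    return [cmd for cmd in RECOMMENDED_VPC_COMMANDS if cmd not in present]


# Hypothetical saved configuration of one vPC leaf VTEP.
sample_config = """
vpc domain 100
  peer-switch
  delay restore 150
  peer-gateway
"""

print(missing_vpc_best_practices(sample_config))
```

An empty list means all three commands are present; anything else is the list of commands still to add under the vPC domain.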
vPC Delay Restore and Source Hold Down Timer
[Diagram labels: Spine, Anycast VTEP advertisement, vPC peer-link, Leaf 1, Leaf 2, recovering device; control plane adjacencies not fully established; connection not recovered yet; host-to-leaf connection not recovered yet]
vPC Leaf VTEP HA Best Practices
• vPC leaf VTEP best practices
o Enable layer 3 link between the two vPC VTEPs to connect them in the underlay network so that when one VTEP loses all its uplinks, it can still learn the routes through its vPC peer, and forward the traffic via its peer
o Layer 3 link can be dedicated link or via point-to-point VLAN SVI over vPC peer-link
[Diagram: Underlay Network with IP ECMP Load Sharing above vPC-VTEP-1 and vPC-VTEP-2, which share a vPC port-channel to the host]
Programmable Fabric External Routing HA Design
The two Border Leaf VTEPs are independent of each other.
They each individually exchange routes with the external routing devices, and advertise the external routes into the EVPN fabric.
Distributed Anycast Gateway on the internal VTEPs
BGP multi-pathing needs to be enabled on the internal VTEP to leverage both border leaf VTEPs
[Diagram: Spine route reflectors (RR), VXLAN overlay with EVPN MP-BGP, Border Leaf VTEPs and Leaf VTEPs; routing protocol of choice toward external IP routing in the Global Default VRF Instance or User Space VRF Instances]
Programmable Fabric External Routing HA Design
[Diagram: external IP routing, VXLAN overlay with EVPN MP-BGP, border VTEPs, and Leaf VTEPs that each host the Anycast Gateway]
Programmable Fabric Multi-X Connectivity (DCI)
VXLAN Multi-Site (2017+)
[Diagram: Fabric #1 (Domain 1) and Fabric #2 (Domain 2), each with its own EVPN control plane, overlay and data plane over VTEPs, interconnected through a DCI segment running BGP EVPN with its own data plane]
VXLAN Multi-Site
Main Use Cases
Scale-Up Model to Build a Large Intra-DC Network
• Anycast Border Gateways (supported since day 1) and recommended for interconnecting VXLAN EVPN fabrics
• VPC Border Gateways (supported since 9.2(1))
TECCRS-2001 © 2019 Cisco and/or its affiliates. All rights reserved. Cisco Public 458
VXLAN Multi-Site VPC Border Gateways
DCI Use Cases
[Figure: four BGW VTEPs sharing a Multi-Site VIP, with individual addresses PIP-BGW2, PIP-BGW3 and PIP-BGW4]
(‘evpn multisite fabric-tracking’ command)
If all the Site-Internal interfaces are detected as down:
1. The isolated BGW stops advertising PIP/VIP addresses toward the Site-External network
2. The remaining BGWs perform new DF elections for Site-Internal
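The isolation behavior can be sketched in Python as below. This is only a model under stated assumptions: the dictionary layout, the function names and the documentation-range addresses are hypothetical, and the actual DF election algorithm is not modeled, only which BGWs remain eligible.

```python
def is_isolated(site_internal_up):
    """A BGW is fabric-isolated when none of its Site-Internal interfaces is up."""
    return not any(site_internal_up)

def advertised_addresses(bgw):
    """Addresses the BGW still advertises toward the Site-External network."""
    if is_isolated(bgw["site_internal_up"]):
        return []                      # stop advertising both PIP and VIP
    return [bgw["pip"], bgw["vip"]]

def df_candidates(bgws):
    """BGWs that take part in the new DF election (election logic not modeled)."""
    return [b["name"] for b in bgws if not is_isolated(b["site_internal_up"])]

# Hypothetical site with two BGWs; BGW1 has lost every Site-Internal interface.
site = [
    {"name": "BGW1", "pip": "192.0.2.1", "vip": "192.0.2.100",
     "site_internal_up": [False, False]},
    {"name": "BGW2", "pip": "192.0.2.2", "vip": "192.0.2.100",
     "site_internal_up": [True, False]},
]
```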
VXLAN Multi-Site
Failure Detection on Anycast BGWs – DCI Isolation
[Figure: BGW nodes connected through Site-External interfaces to the DC Core (Layer-3 Unicast)]
The Site-External interfaces on BGW nodes are also tracked to determine their status (‘evpn
Graceful Insertion and Removal Example
VXLAN BGP EVPN Leaf vPC VTEP Isolation with GIR
• Use automatic profile to go into GIR.
[Figure: spines S10 and S20 in the VXLAN BGP EVPN fabric]
//Enter maintenance mode using the system mode maintenance command:
switch# configure terminal
switch(config)# system mode maintenance
Following configuration will be applied:
ip pim isolate
router bgp 1
  isolate
router ospf UNDERLAY
  isolate
vpc domain 1000
  shutdown
Graceful Insertion and Removal Example
VXLAN BGP EVPN Leaf vPC VTEP Isolation with GIR
• Use automatic profile to come out of GIR.
[Figure: spines S10 and S20 in the VXLAN BGP EVPN fabric]
//Exit maintenance mode using the no system mode maintenance command:
switch# configure terminal
switch(config)# no system mode maintenance
Following configuration will be applied:
vpc domain 1000
  no shutdown
router ospf UNDERLAY
  no isolate
router bgp 1
  no isolate
no ip pim isolate
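The two profiles mirror each other: leaving maintenance mode negates the same commands in reverse order. A minimal Python sketch of that relationship follows, using the commands shown on these slides; the helper names `render` and `negate` are assumptions for illustration and are not part of NX-OS.

```python
# (context, command) pairs in the order applied when entering maintenance mode.
ISOLATE_STEPS = [
    (None, "ip pim isolate"),
    ("router bgp 1", "isolate"),
    ("router ospf UNDERLAY", "isolate"),
    ("vpc domain 1000", "shutdown"),
]

def render(steps):
    """Flatten (context, command) pairs into indented configuration lines."""
    lines = []
    for context, command in steps:
        if context:
            lines.append(context)
            lines.append("  " + command)
        else:
            lines.append(command)
    return lines

def negate(steps):
    """Build the exit profile: same commands, negated, in reverse order."""
    return [(context, "no " + command) for context, command in reversed(steps)]

enter_maintenance = render(ISOLATE_STEPS)
exit_maintenance = render(negate(ISOLATE_STEPS))
```

Reversing the order means vPC is restored first and the routing protocols are un-isolated afterwards.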
Graceful Insertion and Removal Example
VXLAN BGP EVPN Spine RR Isolation with GIR
• Use automatic profile to go into GIR.
Programmable Fabric HA Takeaways
• Programmable fabric HA design
• Spine leaf L3 IP fabric with ECMP
• VXLAN EVPN fabric with Anycast GW
• vPC for host
• Multiple DC fabric HA design
• VXLAN Multi-Site
• Follow configuration and operational best practices to minimize downtime for different failure scenarios
Programmable Fabric Resources
Agenda
• Enterprise Data Center High Availability (DC HA)
• DC Switch NX-OS HA Architecture and HA Features
• DC Network HA Design and Operational Best Practices
• Legacy DC with vPC
• Programmable Fabric
• Application Centric Infrastructure (ACI)
• Programmable Network
• Key Takeaways
Cisco Data Center Network Solutions
Classic Ethernet & vPC | Programmable Fabric | Application Centric Infrastructure | Programmable Network
• VXLAN-based
• Forwarding, Multi-Tenancy & Security
• Integrated Controller with Enhanced APIs
ACI Fabric Underlay HA Design
• Zero-touch provisioning
• Structured design
• Layer 3 IP fabric with point-to-point links:
  • Better stability, faster convergence
• Redundant links with ECMP
• Scale-out spine-leaf design
  • Better scalability and availability
[Figure: ACI fabric managed by the Application Policy Infrastructure Controller, with an SVI IP address label showing MAC 0000.1111.2222]
ACI Fabric Overlay HA Design
• eVXLAN EVPN-based overlay
[Figure: ACI fabric managed by the Application Policy Infrastructure Controller]
ACI Fabric Host vPC HA Design
[Figure: ACI spine nodes and the ACI fabric]
• Differences between ACI vPC and standard vPC
  • No Peer Link is required
  • Peer communication/path recovery happens via the Fabric
  • CFS (Cisco Fabric Services) is replaced by IFS (ACI Fabric Services), which is based on Zero Message Queue (ZMQ)
  • Forwarding selection (which peer will forward a frame
  • Within the Fabric the vPC interfaces use an anycast VTEP which is active on both vPC peers
[Figure: vPC in ACI Fabric; two leaf VTEPs share a vPC anycast VTEP and communicate via ACI Fabric Services (ZMQ), with a host or switch attached]
ACI Port Tracking Policy for Uplink Failure Detection
• The port tracking policy specifies:
  • Number of uplink connections that trigger the policy
  • A delay timer for bringing the leaf switch access ports back up
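The two policy parameters can be sketched as a small state machine in Python. This is a simplified model, not the APIC implementation: the names `PortTrackingPolicy`, `min_active_uplinks` and `restore_delay_s` are hypothetical, and the threshold comparison (trigger when active uplinks fall to the configured number or below) is an assumption.

```python
from dataclasses import dataclass

@dataclass
class PortTrackingPolicy:
    min_active_uplinks: int   # assumed: trigger when active uplinks <= this value
    restore_delay_s: float    # delay before access ports are brought back up

class LeafPortTracker:
    def __init__(self, policy):
        self.policy = policy
        self.access_ports_up = True
        self._recovered_at = None   # time the uplinks came back, if waiting

    def update(self, active_uplinks, now):
        """Feed the current uplink count; returns whether access ports are up."""
        if active_uplinks <= self.policy.min_active_uplinks:
            self.access_ports_up = False      # hosts fail over to another leaf
            self._recovered_at = None
        elif not self.access_ports_up:
            if self._recovered_at is None:
                self._recovered_at = now      # start the restore delay timer
            if now - self._recovered_at >= self.policy.restore_delay_s:
                self.access_ports_up = True
                self._recovered_at = None
        return self.access_ports_up

tracker = LeafPortTracker(PortTrackingPolicy(min_active_uplinks=0, restore_delay_s=120))
```

The restore delay gives the leaf time to rebuild its fabric-facing state before hosts send traffic to it again.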
ACI Fabric Convergence Improvement
• Convergence improved from sub-second to 200 ms in ACI 3.1
  • With new Cloudscale ASIC N9Ks
• Failure scenarios with convergence improvement:
  • Fabric (between leaf and spine): link failure, Spine reload/upgrade, Spine linecard reload, Leaf reload/upgrade, power failure of Spine
  • Access link/node with vPC or port-channel
  • External (Border Leaf) connectivity (L3Out): link failure, Border Leaf reload/upgrade
• Achieved by special ASIC capability and software design
ACI Fabric Convergence Improvement
• Failure scenarios not covered:
  • Double failure, L2/L3 multicast, copper links, process crashes on Leaf/Spine/Border Leaf, etc.
• Convergence for traffic from an EP to the ACI fabric depends on how fast the EP is able to divert traffic to another ACI Leaf
• Convergence for traffic from an external node to the ACI fabric depends on how fast the external node is able to divert traffic to ACI
Fabric Fast Convergence - Enable LBX
Access Link with vPC or PortChannel Fast Convergence - Debounce Policy Configuration
Reduce the debounce timer from the default 100 ms to 10 ms under Fabric Access Policy for faster convergence
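The trade-off behind the debounce timer can be shown with a small Python sketch. It is a simplified model under the assumption that a link-down event is reported only if the signal stays down for the whole debounce interval; the function name is hypothetical.

```python
def link_down_reported_at(down_at_ms, up_again_at_ms, debounce_ms):
    """Return when link-down is reported, or None if the flap is absorbed.

    up_again_at_ms is None when the link never recovers.
    """
    report_at = down_at_ms + debounce_ms
    if up_again_at_ms is not None and up_again_at_ms < report_at:
        return None          # signal returned before the timer expired
    return report_at

# A hard failure is detected 90 ms sooner with the 10 ms timer.
hard_failure_default = link_down_reported_at(0, None, debounce_ms=100)
hard_failure_tuned = link_down_reported_at(0, None, debounce_ms=10)

# A 50 ms flap is absorbed by the default timer but reported by the tuned one.
flap_default = link_down_reported_at(0, 50, debounce_ms=100)
flap_tuned = link_down_reported_at(0, 50, debounce_ms=10)
```

A shorter timer speeds up failure detection at the cost of reacting to brief flaps that the default would have hidden.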
ACI Fabric Fast Convergence Best Practices
• Always use vPC
• Distribute scale
• 100 L3out per Leaf
• 50 BD per Leaf
• Use static EPG instead of L2out
ACI Fabric Maintenance Mode
• New decommission option from ACI 3.0(1k)
  • Helps isolate the switch in the ACI fabric while keeping management access to the switch
• Prior to ACI 3.0(1k): decommission options are Regular or Remove from controller
  • The switch reboots and is wiped of all configuration
Spine Node Maintenance Mode Decommission
• IS-IS on the spine advertises routes with max metric
• OSPF, EIGRP and BGP do graceful shutdown on the IPN/GOLF link
[Figure: spine connected to GOLF and IPN; IPN ports are still up but the OSPF neighbor is down]
Spine Node Insertion Recommission
• The spine switch reboots and is wiped of all configuration
• After the switch comes up and is discovered by APIC, the policy is programmed on the switch
• After the switch configuration is done, the switch establishes IS-IS, OSPF and BGP peers and then joins the active forwarding path. Max metric is set for 10 minutes during startup, so the switch is less preferred for internal traffic during those 10 minutes.
Leaf Node Maintenance Mode Decommission
• IS-IS on the leaf node advertises routes with max metric
• OSPF, EIGRP and BGP do graceful shutdown
• vPC shuts down Keep-Alive & Peer Link
• Shut down all front panel ports and directly connected IFC ports (cuts the laser on the port)
[Figure: leaf with max metric set, traffic going through different paths, vPC shut down and front panel ports shut down]
Leaf Node Insertion Recommission
• The switch reboots and is wiped of all configuration
• After the switch comes up and is discovered by APIC, the policy is programmed on the switch
• After the switch configuration is done, the switch establishes IS-IS, OSPF and BGP peers and then joins the active forwarding path. Max metric is set for 10 minutes during startup, so the switch is less preferred for internal traffic during those 10 minutes.
• There is a 2-minute delay before the vPC ports are brought up.
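The two recommission timers can be put on one timeline with a short Python sketch. The slide does not say from which event each timer is measured, so this sketch assumes, purely for illustration, that both start when the routing peers are established; the names are hypothetical.

```python
MAX_METRIC_HOLD_S = 10 * 60     # max metric advertised for 10 minutes
VPC_BRINGUP_DELAY_S = 2 * 60    # vPC ports held down for 2 minutes

def leaf_state(seconds_since_peering):
    """State of a recommissioned leaf, assuming both timers share one start."""
    return {
        "max_metric_advertised": seconds_since_peering < MAX_METRIC_HOLD_S,
        "vpc_ports_up": seconds_since_peering >= VPC_BRINGUP_DELAY_S,
    }

# Sample points: 1 minute, 5 minutes and 11 minutes after peering.
timeline = {t: leaf_state(t) for t in (60, 300, 660)}
```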
ACI 3.0 Release
ACI Multi-Site
[Figure: Site 1 (Availability Zone ‘A’) and Site 2 (Availability Zone ‘B’) connected by an Inter-Site Network carrying VXLAN with MP-BGP EVPN; the Multi-Site Orchestrator is reached through REST API and GUI]
• Separate ACI Fabrics with independent APIC clusters
• No latency limitation between Fabrics
• ACI Multi-Site Orchestrator pushes cross-fabric configuration to multiple APIC clusters providing scoping of all configuration changes
• MP-BGP EVPN control plane between sites
• Data Plane VXLAN encapsulation across sites
• End-to-end policy definition and enforcement
ACI Multi-Site
Main Use Cases
ACI Policy Upgrade
• Ability to upgrade all switches and controllers in the fabric from one place, with a single click
  • Requires the upload of the new controller and switch image
  • Then, create a firmware group
  • Finally, create maintenance groups as needed to define which switches get upgraded at what time
• Controllers are upgraded through a different “Controller Firmware” Policy
  • Controllers are kicked off at the same time (sort of like a single maintenance group) and upgrade sequentially.
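One common way to lay out maintenance groups, shown here as a Python sketch, is to place the two members of each redundant pair in different groups so that one member keeps forwarding while the other upgrades. This grouping rule is an assumption for illustration rather than something the slide prescribes, and the node names are hypothetical.

```python
def split_into_groups(redundant_pairs):
    """Put the two members of every redundant pair into different groups."""
    group_a, group_b = [], []
    for first, second in redundant_pairs:
        group_a.append(first)
        group_b.append(second)
    return group_a, group_b

# Hypothetical fabric: two spines and two vPC leaf pairs.
pairs = [("spine-201", "spine-202"), ("leaf-101", "leaf-102"), ("leaf-103", "leaf-104")]
group_a, group_b = split_into_groups(pairs)
```

Upgrading `group_a` and `group_b` in separate windows leaves one path through every pair at all times.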
ACI Maintenance Group Logic
ACI HA Key Takeaways
• ACI is a turnkey solution for Data Center fabric with built in HA and full
automation
• ACI integrates all the best practices and lessons we learned from previous
technologies
ACI Fabric Resources
Cisco Data Center Network Solutions
Classic Ethernet & vPC | Programmable Fabric | Application Centric Infrastructure | Programmable Network
• Open NX-OS
• Enhanced APIs and Automation Ecosystem (DevOps)
Nexus Device Programmability
• Power on Auto Provisioning (PoAP)
• On-box Python Scripting
• NX-OS Software Development Kit (SDK)
• Configuration Management Tools
Cisco Nexus Power on Auto Provisioning (PoAP)
[Figure: PoAP process on a Nexus switch with its default gateway. Step 1, power-up phase: start the Power On Auto-Provisioning process. Step 5: reboot if needed; switch up and running with the downloaded image and config]
Deploy and Manage POAP Using DCNM
Deploy using POAP Script
• Download POAP script from github:
• https://[Link]/datacenter/nexus9000/blob/master/nx-os/poap/[Link]
Nexus 9000 Programmability
On-box Python
• Python script can be run in interactive or non-interactive mode
Python Use Cases
[Figure: “Off-Box” Python, where Python runs on a Linux server and reaches the NX-OS device over SSH/NETCONF; “On-Box” Python, where Python runs on the NX-OS device itself]
Auto Back-up Use Case
“On-Box” Python and EEM
[Figure: EEM on a Nexus 93xx]
Cisco Nexus 9000 Python SDK User Guide:
[Link]
sdk-user-guide-and-api-reference
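A minimal sketch of what the on-box backup script could look like follows. It assumes the NX-OS on-box `cli` module is available on the switch; the timestamped file name, the `bootflash:` target and the off-box stand-in for `cli` are illustrative assumptions, and the EEM applet that would trigger the script is not shown.

```python
import datetime

try:
    from cli import cli              # provided by the NX-OS on-box Python interpreter
except ImportError:
    def cli(command):                # off-box stand-in so the sketch can be dry-run
        return "DRY-RUN: " + command

def backup_command(now):
    """Build a copy command with a timestamped file name (hypothetical naming)."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return "copy running-config bootflash:backup-%s.cfg" % stamp

def run_backup(now=None):
    """Run the backup and return the command together with the CLI output."""
    now = now or datetime.datetime.now()
    command = backup_command(now)
    return command, cli(command)
```

The timestamp keeps every backup in its own file, so an earlier copy is never overwritten.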
NX-OS SDK (Software Development Kit)
• NX-OS SDK enables on-box custom applications to access NX-OS native functionality
[Figure: on a Nexus 9K, custom applications (Python, C++ etc.) and existing 3rd-party Linux applications run in the native Linux shell or guest shell and use the Linux networking stack, alongside the NX-OS CLI]
Nexus Programmability
Configuration Management Tools
NX-OS Programmability Resources
Data Center HA
Key Takeaways
High Availability Enterprise Data Center Design
Key Principles
• Follow HA design and operational best practices to minimize network
downtime
Maren Kostede
Dana Daum Technical Solutions Architect
Communications Architect
Junmei Zhang
Technical Marketing Eng.
Samer Theodossy
Principal Engineer
Reconvergence
Effect on “Mission-Critical”, Real-Time Operations
• And how it would have looked with … standard HSRP timers …
Reconvergence
Effect on “Mission-Critical”, Real-Time Operations
• And how it would have looked with … 3-second reconvergence …
Reconvergence
Effect on “Mission-Critical”, Real-Time Operations
• And how it would have looked with … 500-msec re-convergence …
Published design guides
[Link]/go/cvd
Thank you