
Nexus 9000 Tahoe Platform Insights

The document provides an overview of the Cisco Nexus 9K architecture, focusing on the Tahoe platform's Cloud Scale ASICs and their components, including slice architecture and packet flow mechanisms. It discusses various troubleshooting tools, including Ethanalyzer and SPAN, and highlights the importance of monitoring and health-checking features like On-Board Failure Logging and Control-Plane Policing. Additionally, it covers the use of consistency checkers and the Virtual TAC Assistant for efficient troubleshooting in network environments.


#CiscoEngage

Cisco Nexus 9K Architecture


Seminar
Introduction and Troubleshooting on Tahoe Platform

Eason Xiao(肖思源)
Technical Consulting Engineer

Agenda
• Cloud Scale ASIC Overview
• Cloud Scale Packet Flow
• Monitor and Health-Check
• Troubleshooting Tools
• Unicast Forwarding On Tahoe ASICs
• General NX-OS troubleshooting

#CiscoEngage ©2022 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cloud Scale ASIC Overview
Cloud Scale/Tahoe Characteristics
• The ASICs are built on a 16 nm FinFET Plus (16FF+) process
• The ASICs offer high-density 100G and 40G ports
• They target both standalone and ACI platforms
• The ASICs use a slice architecture with a shared-memory output queuing system
• Multiple additional features are geared towards Tetration, NetFlow, and SDN

Cloud Scale ASIC Components
• Each ASIC has a similar architecture and is made of 3 major components:
✓ Slice Component
✓ Global Component (aka Slice Interconnect)
✓ IO Component

What Is a Slice?
• Self-contained forwarding complex controlling a subset of ports on a single ASIC
• Separated into Ingress and Egress functions
• Ingress of each slice is connected to the egress of all slices
• Slice interconnect provides non-blocking any-to-any interconnection between slices

Tahoe ASIC Components

Cloud Scale Platforms
• Nexus 9300-EX, FX/FX2, GX
• Premier TOR platforms
• Full Cloud Scale functionality
• ACI leaf / standalone leaf or spine
• FX option with MACSEC using LS1800FX silicon
• FX2 option with key enhancements using LS3600FX2 silicon
• GX option with 400G and SRv6

Cloud Scale Platforms
• Nexus 9500 with X9700-EX and X9700-FX Modules
• Switching modules for Nexus 9500 modular chassis
• Full Cloud Scale functionality
• ACI spine / standalone aggregation or spine
• FX option with MACSEC using LS1800FX silicon

Nexus 9300-EX Switch Architectures

How To Check Slice Number

Cloud Scale Packet Flow
Packet Walk (TOR) – IP Unicast
[Figure: unicast packet walk on a 93180YC-EX. The frame is received on Slice 0 and transmitted on Slice 1.]
• Slice 0, Ingress Forwarding Controller: MAC → Parser → Packet Lookup Pipeline
• Slice Interconnect
• Slice 1, Egress Forwarding Controller: Buffering/Queuing/Scheduling → Egress Rewrites/Policy (RW SMAC/DMAC, TTL) → MAC
• Figure callouts, in pipeline order: Receive Frame, FCS Checking, VLAN Checking; MTU Check, Extract Header Fields, Generate Lookup Key; Forwarding Lookup, Ingress ACL, Traffic Classification, Load Balancing, Flow Table; Receive from slice interconnect, Buffer packet in queue, AFD drops, Scheduling; FCS Generation

Packet Walk (TOR) – IP Unicast

• Approximate Fair Drop (AFD) – Maintains buffer headroom per queue to maximize burst absorption
• Dynamic Packet Prioritization (DPP) – Prioritizes short-lived flows to expedite flow setup and completion

Flexible Forwarding Tiles
• Provide fungible pool of table entries for lookups
• Number of tiles and number of entries in each tile varies between ASICs
• Variety of functions, including:
• IPv4/IPv6 unicast longest-prefix match (LPM)
• IPv4/IPv6 unicast host-route table (HRT)
• IPv4/IPv6 multicast (*,G) and (S,G)
• MAC address/adjacency tables
• ECMP tables
• ACI policy

Classification TCAM
• Dedicated TCAM for packet classification
• Capacity varies depending on platform
• Leveraged by variety of features:
• RACL / VACL / PACL
• L2/L3 QOS
• SPAN / SPAN ACL
• NAT
• COPP
• Flow table filter

TCAM Region Resizing
• Default carving allocates 100% of TCAM and enables:
• Ingress / Egress RACL
• Ingress QOS
• SPAN
• SPAN ACLs
• Flow table filter
• Reserved regions

• Based on features required, user can resize TCAM regions to adjust scale
• To increase size of a region, some other region must be sized smaller

• Region sizes defined at initialization – changing allocation requires system reboot


• Configure all regions to desired size (“hardware access-list tcam region”), save configuration, and reload
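
The trade-off described above can be sketched in Python. This is a minimal model, not NX-OS behaviour: the region names, the sizes and TOTAL_ENTRIES are hypothetical values chosen for illustration. The only rule it encodes is the one from the slide: the TCAM is fully allocated, so a region can grow only after another region has been shrunk.

```python
# Hypothetical TCAM carving; region names and sizes are illustrative only.
TOTAL_ENTRIES = 4096


def resize(regions, name, new_size, total=TOTAL_ENTRIES):
    """Return a new carving with region `name` set to `new_size`.

    Raises ValueError if the proposed carving would exceed the TCAM capacity.
    """
    if name not in regions:
        raise KeyError(f"unknown region: {name}")
    if new_size < 0:
        raise ValueError("region size cannot be negative")
    proposed = dict(regions)
    proposed[name] = new_size
    used = sum(proposed.values())
    if used > total:
        raise ValueError(
            f"over-allocated by {used - total} entries; shrink another region first"
        )
    return proposed


# Default carving in this model uses 100% of the TCAM.
carving = {"ing-racl": 1536, "egr-racl": 1792, "ing-qos": 256, "span": 512}

try:
    resize(carving, "ing-racl", 2048)          # no free entries, so this fails
except ValueError as err:
    print(err)

smaller = resize(carving, "egr-racl", 1280)    # free 512 entries first
resized = resize(smaller, "ing-racl", 2048)    # now the larger region fits
print(resized)
```

On a real switch the new sizes take effect only after the configuration is saved and the system is reloaded, as noted above.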

Buffering

Queuing and Scheduling

Packet Walk (TOR) – Multicast
[Figure: multicast packet walk across Slices 0 to 3.]
• Slice 0, Ingress Forwarding Controller (MAC → Parser → Packet Lookup Pipeline): (*,G) or (S,G) lookup, RPF check, MET pointer
• Slice Interconnect: replication to all slices
• Egress Forwarding Controller on each egress slice (Replication → Buffering/Queuing/Scheduling → Egress Rewrites/Policy → MAC): MET lookup, replication to the local OIL, policy and rewrites for each copy
• A slice with no receivers drops the packet

Monitor and Health-Check
Hardware Diagnostics
• Diagnostic tests status and testing intervals:

On-Board Failure Logging (OBFL)
• OBFL logs failure data to persistent storage
• Persistent storage: non-volatile flash memory on the modules, accessible for future analysis
• Enabled by default for all features
• Because OBFL flash supports a limited number of read-write operations, choose a key set of features for logging
Control-Plane Policing (CoPP)
• Choose either the strict (default), moderate, lenient or dense policy.
• CoPP is performed per forwarding engine. Configure rates so that the aggregate traffic doesn’t overwhelm the CPU.
• Monitor drop counters continuously and make sure every drop can be justified.
• Remember: CoPP configuration is an ongoing process.
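
Because policing is applied per forwarding engine, the worst-case rate that can reach the CPU for a class is the configured rate multiplied by the number of forwarding engines. A small Python sketch of that arithmetic follows; the class names, rates, engine count and CPU budget are all hypothetical values, not taken from any CoPP profile.

```python
def aggregate_pps(per_engine_pps, forwarding_engines):
    """Worst-case packets per second toward the CPU, per class."""
    return {cls: rate * forwarding_engines for cls, rate in per_engine_pps.items()}


def exceeds_budget(per_engine_pps, forwarding_engines, cpu_budget_pps):
    """True if the summed worst-case rate is above the assumed CPU budget."""
    total = sum(aggregate_pps(per_engine_pps, forwarding_engines).values())
    return total > cpu_budget_pps


# Hypothetical per-engine policer rates (pps) for three classes.
policy = {"critical": 1000, "management": 500, "default": 100}

print(aggregate_pps(policy, 4))         # rates as seen by the CPU with 4 engines
print(exceeds_budget(policy, 4, 5000))  # compare against a 5000 pps budget
```

With four engines the example policy can deliver up to 6400 pps in aggregate, so it exceeds the assumed 5000 pps budget even though each per-engine rate looks modest.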

Control-Plane Policing (CoPP)
• Check the violated counters under each class
• Run ‘clear copp statistics’ and check again
• Record the result!

Hardware Rate-Limiters (HWRL)
• Rate-limiters prevent packets that are redirected to the CPU because of an exception (e.g., ACL Log or Layer 3 Glean) from overwhelming the CPU
• Clear stats with “clear hardware rate-limiter …”

Monitor and Health Check - Summary
• OBFL helps to keep an eye on the system’s events and exceptions, which is critical for analysis.
• Monitoring resource usage is critical and helps to implement precautionary measures.
• Fine-tune CoPP and HWRL to protect the control plane and ensure stability.
• Never underestimate the power of syslog (show logging log), interface counters and errors (show interface), memory/CPU usage (show process memory/CPU) or the system LED.

Troubleshooting Tools
Ethanalyzer
Process and Configuration
(1) Identify Capture Interface
• mgmt – captures traffic on the mgmt0 interface
• inband – captures traffic sent to and received from the control plane/CPU
(2) Configure Filter
• Display-Filter – captures all traffic but displays only the traffic meeting the criteria
• Capture-Filter – captures only the traffic meeting the criteria
(3) Define Stop Criteria
• By default, the capture stops after 10 frames. This can be changed with the limit-captured-frames option; 0 means no limit, so the capture runs until the user presses Ctrl+C
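
The difference between the two filter types, and how it interacts with the frame limit, can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not Ethanalyzer code: the `capture` function and the frame labels are hypothetical, and it assumes the limit counts captured frames rather than displayed ones, as the wording "stops after capturing 10 frames" suggests.

```python
def capture(frames, capture_filter=None, display_filter=None, limit=10):
    """Model a capture run; limit=0 means no limit.

    Returns (captured, displayed). A capture filter decides what is captured
    at all; a display filter only decides what is shown.
    """
    captured, displayed = [], []
    for frame in frames:
        if limit and len(captured) >= limit:
            break                          # stop criterion reached
        if capture_filter and not capture_filter(frame):
            continue                       # never captured, never counted
        captured.append(frame)
        if display_filter is None or display_filter(frame):
            displayed.append(frame)
    return captured, displayed


def is_stp(frame):
    return frame == "stp"


# Hypothetical traffic mix: 20 frames, half of them STP.
frames = ["stp", "arp", "stp", "ospf"] * 5

cap_c, shown_c = capture(frames, capture_filter=is_stp)
cap_d, shown_d = capture(frames, display_filter=is_stp)
print(len(cap_c), len(shown_c))   # capture filter: every captured frame is shown
print(len(cap_d), len(shown_d))   # display filter: limit is used up by all traffic
```

In this model the display filter shows only 5 STP frames before the default limit of 10 captured frames is reached, which is one reason a display filter is often combined with a limit of 0.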
#CiscoLive © 2022 Cisco and/or its affiliates. All rights reserved. Cisco Public
Ethanalyzer
Introduction
• Built-in tool to analyze the traffic sent and received by the CPU. Helpful for troubleshooting high CPU or control-plane issues such as HSRP failover or OSPF adjacency flaps.
• Based on tshark code
• Two filtering approaches for configuring a packet capture

Ethanalyzer
Putting It All Together

N9K# ethanalyzer local interface inband display-filter "stp" limit-captured-frames 0 capture-ring-buffer filesize 200 write bootflash:stp_ring.pcap display autostop files 5

• Captures on the inband interface
• Uses a display-filter searching for “stp” frames
• Sets limit-captured-frames to zero to allow continuous capturing of frames
• Uses a capture-ring-buffer to create a new file every 200 KB
• Writes files to bootflash:stp_ring.pcap, adding a timestamp as a prefix
• Stops automatically (autostop) after 5 files have been created

SPAN to CPU
Introduction and Configuration
• Switch Port ANalyzer (SPAN) mirrors the traffic from source ports/VLANs to destination port(s).
monitor session 1
source interface eth1/1
destination interface eth1/6

• In SPAN to CPU, the destination port is the CPU in the switch.
monitor session 1
source interface eth1/1
destination interface sup-eth 0

SPAN to CPU
Things to Know
• All SPAN replication is done in hardware with no impact on the CPU
• SPAN packets to the CPU are rate-limited, and excess packets are dropped in the inband path. Use the “hardware rate-limiter span …” command to change the rate.
• SPAN is not supported for management ports

Consistency Checkers
What does it do?
Consistency Checkers compare the software state against the hardware state for consistency, and report PASSED or FAILED.
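
The idea can be sketched in Python as a comparison of two tables. This is a conceptual illustration only: the `consistency_check` function, the route tables and the interface names are hypothetical and do not reflect how NX-OS stores or reads its software and hardware state.

```python
def consistency_check(software, hardware):
    """Compare two state tables and return (verdict, mismatches).

    An entry missing on either side counts as a mismatch.
    """
    mismatches = []
    for key in sorted(set(software) | set(hardware)):
        sw, hw = software.get(key), hardware.get(key)
        if sw != hw:
            mismatches.append((key, sw, hw))
    return ("PASSED" if not mismatches else "FAILED"), mismatches


# Hypothetical prefix -> egress interface tables.
sw_routes = {"10.1.1.0/24": "Eth1/1", "10.1.2.0/24": "Eth1/2"}
hw_routes = {"10.1.1.0/24": "Eth1/1", "10.1.2.0/24": "Eth1/3"}

print(consistency_check(sw_routes, sw_routes))
print(consistency_check(sw_routes, hw_routes))
```

The second call reports FAILED together with the prefix whose software and hardware entries differ, which is the kind of pointer that narrows down where programming went wrong.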

Consistency Checkers
Example – Unicast Route and vPC
• Consistency-Checker for an IP address. The same can be used for a prefix.

• Consistency-Checker for vPC

Virtual TAC Assistant
Commands Cascading
• What is it?
• It takes the output and parameters from one command, passes them to the next command as inputs, and cascades them through the entire troubleshooting sequence.
• How does it help with troubleshooting?
• Speeds up troubleshooting
• Avoids missing commands
• Avoids entering wrong command inputs
• No need to know the procedure or methodology
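
Command cascading can be pictured as a pipeline in which every step reads a shared context and adds its own findings to it. The Python sketch below illustrates only that pattern; it is not how the Virtual TAC Assistant is implemented, and the step functions and their canned lookup tables are hypothetical.

```python
def run_cascade(steps, context):
    """Run each step in order, merging its outputs into the shared context."""
    for step in steps:
        context = {**context, **step(context)}
    return context


def lookup_mac(ctx):
    # Hypothetical stand-in for a MAC table lookup.
    table = {("000a.000a.000a", 100): "Eth1/1"}
    return {"port": table[(ctx["mac"], ctx["vlan"])]}


def check_port(ctx):
    # Hypothetical stand-in for an interface status check; it consumes the
    # "port" value produced by the previous step.
    status = {"Eth1/1": "up"}
    return {"port_state": status[ctx["port"]]}


result = run_cascade([lookup_mac, check_port], {"mac": "000a.000a.000a", "vlan": 100})
print(result)
```

The user supplies only the MAC address and VLAN; the port found by the first step becomes the input of the second without being retyped, which is where the time saving and the protection against wrong inputs come from.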

Virtual TAC Assistant
L2 MAC – Command Options

N9K-C93180YC-EX-1# show troubleshoot ?
  l2  Display l2 information
  l3  Display l3 information

N9K-C93180YC-EX-1# show troubleshoot l2 ?
  mac           MAC address
  port-channel  Switched Port Channel

N9K-C93180YC-EX-1# show troubleshoot l2 mac ?
  E.E.E              Address (Option 1)
  EE-EE-EE-EE-EE-EE  Address (Option 2)
  [Link]             Address (Option 3)
  [Link]             Address (Option 4)

N9K-C93180YC-EX-1# show troubleshoot l2 mac 000a.000a.000a vlan 100 ?
  <CR>
  >       Redirect it to a file
  >>      Redirect it to a file in append mode
  detail  Print detailed debugging info for mac/interface
  |       Pipe command output to filter

Virtual TAC Assistant
L2 & L3 – Command Options

N9K-C93180YC-EX-1# show troubleshoot l3 ?
  ipv4  Choose IPv4 address
  ipv6  Choose IPv6 address

N9K-C93180YC-EX-1# show troubleshoot l3 ipv4 [Link] ?
  src-ip  Source IP for routing hash CLI
  vrf     Check routes for a specific VRF

N9K-C93180YC-EX-1# show troubleshoot l3 ipv4 [Link] vrf default ?
  >   Redirect it to a file
  >>  Redirect it to a file in append mode
  |   Pipe command output to filter

Virtual TAC Assistant
Example - L3 IPv4

Virtual TAC Assistant
Example – ECMP Hardware Programming Failure Detection

Port ACL / Router ACL
Tool and Requirements
• For intermittent packet loss, specifically in scenarios where the exact packet count can be defined, Router ACL (RACL) and Port ACL (PACL) can be useful tools
• Requires TCAM allocation for PACL, followed by a switch reload

ELAM Capture
• Useful tool to see the packet flow in the hardware
• Steps:
• Configure the ELAM capture trigger
• Start the ELAM capture
• Send the packets to trigger the capture
• View the ELAM capture report
• NOTE: Customers must always use ELAM with TAC assistance. If the TAC engineer needs assistance from Development to analyze the ELAM output, the output must be captured with the “detail” keyword.

ELAM configuration post NX-OS 7.0(3)I7(2)

ELAM Capture (cont.)

[Figure: two N9K-C93180YC-EX switches, each shown with interface E1/1 and an address that appears only as [Link] in the source]

Unicast Forwarding On Tahoe ASICs
Day in the life of a MAC learn notification
• There is NO hardware learning on Tahoe platforms
• All learning is done through software.
• Packets carrying MAC addresses that still need to be learnt trigger notifications
• The notification sent from the HW when the MAC address is not found in the tiles is controlled by the bloom filter.
• After processing by MTM and L2FM, the learnt MAC is finally written into the hardware.
• The tiles assigned to L2 store the MAC addresses programmed in the HW
MTM: MAC Table Manager
L2FM: L2 Feature Manager
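
The slide states only that the bloom filter controls the learn notifications. One plausible reading, treated here strictly as an assumption, is that it suppresses repeat notifications for a MAC address that has already been reported while software is still programming it. The Python sketch below shows a generic Bloom filter used that way; the class, its sizes and the `should_notify` helper are hypothetical and say nothing about the actual ASIC design.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0                      # bit array stored in one integer

    def _positions(self, item):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def should_notify(seen, mac, vlan):
    """Return True the first time a (vlan, mac) pair is reported."""
    key = f"{vlan}/{mac}"
    if key in seen:
        return False                       # probably reported already
    seen.add(key)
    return True


seen = BloomFilter()
print(should_notify(seen, "000a.000a.000a", 100))   # first sighting
print(should_notify(seen, "000a.000a.000a", 100))   # repeat is suppressed
```

A Bloom filter trades accuracy for a small, fixed memory footprint: it never forgets an entry that was added, but it can occasionally report an unseen address as already present.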
Layer 2-Troubleshooting
• show mac address-table

• show hardware mac address-table 1

• show system internal l2fm info summary

• show system internal l2fm l2dbg macdb address [Link] vlan X

• slot 1 show system internal mtm l2dbg macdb address [Link] vlan X

• slot 1 show hardware internal tah sdk-l2 entries

Path of the Packet – Layer 3
Troubleshooting communication failure
for traffic flowing through Nexus 9300

Path of the Packet
L3 Flow – Check SW/HW FIB

Check the Forwarding Information Base (FIB) in software

Check the Forwarding Information Base (FIB) in hardware and make sure the results match

Path of the Packet
L3 Flow – Route Programmed in ASIC
Entry in Tahoe-Sugarbowl Routing Table (HW)
Table #1 is for the default VRF. To find the table number for other VRFs, use the “show hardware internal tah L3 v4host” command

Path of the Packet
L3 Flow – Adjacency Programmed in ASIC
Adjacency information in software

Adjacency information in hardware; make sure the results match

Layer 3 - troubleshooting
show ip route xxxx
show forwarding route xxxxxx

show ip adjacency xxxxxx
show forwarding adjacency xxxxxx

attach module 1
show hardware internal tah l3 xxxxxx table x
show hardware internal tah l3 v4host
show hardware internal tah l3 adjacency xxxxxx

show consistency-checker forwarding

General NX-OS troubleshooting
Layer 1 and Transceivers
• Connect the cable/media at both ends, insert the transceivers completely, and use the following commands to verify speed, duplex, capabilities, supported modes and DOM values.
• show interface eth x/y transceiver details
• show interface eth x/y capabilities
• show interface brief - check for the interface tuple display and others
• show interface eth x/y status
• Enable auto-negotiation at both ends.
• Check for any transparent device or circuit in the middle
• Have you checked transceiver compatibility? Review the Transceiver Compatibility Matrix at [Link]
• Internal event-history commands can help determine which device initiated the link-down first.
• show hardware internal tah link-events fp-port x
• show tech-support tah-usd | no-more
Crash and Outage
show cores
show logging onboard
show logging onboard kernel-trace
show logging onboard stack-trace
DGFG-ZQ-PB-SW-N9K-07_205K09_41# show process log
VDC Process         PID    Normal-exit Stack Core  Log-create-time
--- --------------- ------ ----------- ----- ----- ---------------
1   tahusd          31066  N           Y     Y     Mon Nov 16 [Link] 20   <<<<<<<<<<

core://<module-number>/<process-id>[/instance-num]
copy core://1/31066/1 ftp:
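
The core URI follows directly from the module number and PID reported by show process log. A small Python helper makes the mapping explicit; the `core_uri` function is a hypothetical convenience for illustration, not part of NX-OS.

```python
def core_uri(module, pid, instance=None):
    """Build core://<module-number>/<process-id>[/instance-num]."""
    uri = f"core://{module}/{pid}"
    if instance is not None:
        uri += f"/{instance}"              # the instance number is optional
    return uri


# Values from the tahusd entry above: module 1, PID 31066, instance 1.
print(core_uri(1, 31066, 1))
print(core_uri(1, 31066))
```

The first call reproduces the source of the copy command shown above.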

System Management
Choose the right NX-OS version
[Link]se/b_Minimum_and_Recommended_Cisco_NX-OS_Releases_for_Cisco_Nexus_9000_Series_Switches.html

System Management
Scalability
[Link]y/guide-933/b-Cisco_Nexus-[Link]

Thank you
