Objective 10.3 – Troubleshoot Common NSX component issues

Principles

  1. Differentiate NSX Edge logging and troubleshooting commands
  2. Verify NSX Controller cluster status and roles
  3. Verify NSX Controller node connectivity
  4. Check NSX Controller API service
  5. Validate VXLAN and Logical Router mapping tables
  6. List Logical Router instances and statistics
  7. Verify Logical Router interface and route mapping tables
  8. Verify active controller connections
  9. View Bridge instances and learned MAC addresses
  10. Display Logical Router instances
  11. Verify NSX Manager services status
  12. View Logical Interfaces and routing tables
  13. Analyze NSX Edge statistics

References

  1. NSX Administration Guide

http://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_admin.pdf

  2. NSX Command Line Interface Reference

http://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_cli.pdf

  3. NSX vSphere API Guide

http://pubs.vmware.com/NSX-6/topic/com.vmware.ICbase/PDF/nsx_604_api.pdf

Differentiate NSX Edge logging and troubleshooting commands

Logging

show log [follow | reverse]

show log routing [follow | reverse]

follow Keep the log display open and append new entries as they arrive

reverse Show the log in reverse chronological order

Troubleshooting

debug packet [display|capture] interface <interface> [expression]

See Objective 10.1 for details

show process <list | monitor>

The monitor option behaves like the Linux “top” command

list List all currently running processes on the NSX Edge
monitor Continuously monitor the list of processes

show rpfilter

Shows the reverse path filter setting. The possible modes are:

  1. Disable - no reverse path check is performed
  2. Strict - confirms the source address is reachable via the same interface on which the packet arrived
  3. Loose - confirms the source address is reachable via any interface
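
To make the difference between the three modes concrete, here is a minimal Python sketch that models the check against a toy routing table. The routes, the interface names (vNic_0, vNic_1) and the function names are hypothetical and only illustrate the logic; this is not how the Edge itself implements the filter.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("172.16.10.0/24"), "vNic_0"),
    (ipaddress.ip_network("172.16.20.0/24"), "vNic_1"),
]


def best_route_interface(source):
    """Return the interface of the longest-prefix route to source, or None."""
    addr = ipaddress.ip_address(source)
    matches = [(net.prefixlen, ifname) for net, ifname in ROUTES if addr in net]
    return max(matches)[1] if matches else None


def rpf_accepts(mode, source, arrival_interface):
    """Model the disable / strict / loose reverse path checks."""
    if mode == "disable":
        return True  # no reverse path check at all
    route_if = best_route_interface(source)
    if mode == "loose":
        return route_if is not None  # source reachable via any interface
    if mode == "strict":
        return route_if == arrival_interface  # must match the arrival interface
    raise ValueError(f"unknown mode: {mode}")


# A packet from 172.16.10.5 arriving on the "wrong" interface:
print(rpf_accepts("strict", "172.16.10.5", "vNic_1"))
print(rpf_accepts("loose", "172.16.10.5", "vNic_1"))
```

In this model, strict mode rejects the packet while loose mode accepts it, which is the kind of asymmetric-routing case that makes the drop counter in show rpfstats grow.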

show rpfstats

Shows the reverse path filter statistics e.g.

show rpfstats

RPF drop packet count: 231

show service all

Show the status of all services

show service <service>

Show the detailed status of one of the services listed under “show service all”

show system memory

Shows the summary of memory utilization e.g.

show system memory

MemTotal: 498676 kB

MemFree: 249400 kB

MemAvailable: 388780 kB
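
If the output is captured (for example over SSH), the utilization can be worked out from these three values. The sketch below is one possible approach in Python; the function name parse_meminfo and the choice of MemTotal minus MemAvailable as "in use" are assumptions, not something the Edge CLI provides.

```python
def parse_meminfo(text):
    """Parse 'Key: value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


# Sample taken from the output shown above.
SAMPLE = """\
MemTotal: 498676 kB
MemFree: 249400 kB
MemAvailable: 388780 kB
"""

mem = parse_meminfo(SAMPLE)
# Assumption: memory "in use" is everything that is not available.
used_percent = 100 * (mem["MemTotal"] - mem["MemAvailable"]) / mem["MemTotal"]
print(f"{used_percent:.1f}% of memory in use")
```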

show system storage

Shows the disk usage details for an NSX Edge e.g.

show system storage

Filesystem Size Used Avail Use% Mounted on

/dev/root 444M 362M 59M 87% /

tmpfs 244M 164K 244M 1% /run

/dev/sda2 43M 2.0M 39M 5% /var/db

/dev/sda3 27M 413K 25M 2% /var/dumpfiles

/dev/sda4 32M 1.6M 29M 6% /var/log
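
A captured copy of this table can be scanned for filesystems that are filling up. The following Python sketch assumes the column layout shown above; the 80% threshold and the function names are hypothetical.

```python
def parse_storage(text):
    """Return one dict per filesystem row of df-style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six columns and a numeric Use% value; the header
        # ("Mounted on" splits into two words) does not match.
        if len(fields) == 6 and fields[4].endswith("%") and fields[4][:-1].isdigit():
            rows.append({
                "filesystem": fields[0],
                "use_percent": int(fields[4][:-1]),
                "mount": fields[5],
            })
    return rows


def over_threshold(rows, threshold):
    """Filesystems whose usage is at or above threshold percent."""
    return [row for row in rows if row["use_percent"] >= threshold]


# Sample taken from the output shown above.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/root 444M 362M 59M 87% /
tmpfs 244M 164K 244M 1% /run
/dev/sda2 43M 2.0M 39M 5% /var/db
/dev/sda3 27M 413K 25M 2% /var/dumpfiles
/dev/sda4 32M 1.6M 29M 6% /var/log
"""

rows = parse_storage(SAMPLE)
for row in over_threshold(rows, 80):  # 80 is an arbitrary example threshold
    print(row["mount"], row["use_percent"])
```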

show tech-support

Shows the system information used by technical support, that is, everything contained in the tech-support tarball

export tech-support scp url

Exports the system diagnostics to a specific location via Secure Copy Protocol (SCP)

NSX Manager

show dfw vnic vnicID

Show all filters configured on the specified vNIC

show dfw vnic 5026c7cd-b6f3-f4bc-e533-3d4b255c6277.000

Vnic Name web-sv-01a – Network adapter 1

Vnic Id 5026c7cd-b6f3-f4bc-e533-3d4b255c6277.000

Mac Address 00:50:56:a6:7a:a2

Port Group Id dvportgroup-198

Filters nic-54466-eth0-vmware-sfw.2

show dfw vm vmID

Show the vNICs protected by distributed firewall on the specified virtual machine

show dfw vm vm-36

Datacenter: ABC Medical

Cluster: Compute Cluster A

Host: esxcomp-01a.corp.local

VM: web-sv-01a

Virtual Nics List:

1.

Vnic Name web-sv-01a – Network adapter 1

Vnic Id 5026c7cd-b6f3-f4bc-e533-3d4b255c6277.000

Filters nic-54466-eth0-vmware-sfw.2

show dfw cluster (all | clusterID)

Show clusters protected by distributed firewall

show dfw cluster all

no. cluster name cluster id datacenter name firewall status

1 compute cluster b domain-c27 abc medical enabled

2 compute cluster a domain-c25 abc medical enabled

3 management and edge cluster domain-c7 abc medical enabled

show dfw cluster domain-c25

datacenter: abc medical

cluster: compute cluster a

no. host name host id installation status

1 esxcomp-01a.corp.local host-29 ready

2 esxcomp-02a.corp.local host-34 ready

show dfw host hostID

Show the VMs protected by distributed firewall on the specified host

show dfw host host-29

Datacenter: ABC Medical

Cluster: Compute Cluster A

Host: esxcomp-01a.corp.local

No. VM Name VM Id Power Status

1 web-sv-01a vm-36 on

2 br-sv-02a vm-32 off

Show VPN service details

show service ipsec (cacerts | certs | crls | pubkeys | sa | sp)

dc1pgw01-0> show service ipsec

----------------------------------------------------------------

vShield Edge IPSec Service Status:

IPSec Server is running.

AESNI is enabled.

Total Sites: 1, 0 UP, 1 Down

Total Tunnels: 1, 0 UP, 1 Down

----------------------------------

Site: 172.16.200.1_172.16.20.0/24-any_172.16.30.0/24

Channel: PeerIp: %any LocalIP: 172.16.200.1 Version: IKEv1 Status: DOWN

Tunnel: PeerSubnet: 172.16.30.0/24 LocalSubnet: 172.16.20.0/24 Status: DOWN Reason: UNKNOWN
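
When many Edges have to be checked, the two "Total" summary lines are the quickest thing to test. Below is a small Python sketch that extracts them from captured output; the regular expression is written against the sample above and the function name is hypothetical.

```python
import re

# Matches lines such as "Total Sites: 1, 0 UP, 1 Down".
SUMMARY = re.compile(r"Total (Sites|Tunnels): (\d+), (\d+) UP, (\d+) Down")


def ipsec_summary(text):
    """Return total/up/down counts keyed by 'Sites' and 'Tunnels'."""
    summary = {}
    for kind, total, up, down in SUMMARY.findall(text):
        summary[kind] = {"total": int(total), "up": int(up), "down": int(down)}
    return summary


# Sample taken from the output shown above.
SAMPLE = """\
Total Sites: 1, 0 UP, 1 Down
Total Tunnels: 1, 0 UP, 1 Down
"""

summary = ipsec_summary(SAMPLE)
# Keep only the entries that report something down.
down = {kind: counts["down"] for kind, counts in summary.items() if counts["down"]}
print(down)
```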

ESXi

Show DLR Routing Instance configured on an ESXi Host

List Logical Routers

net-vdr --instance -l

[root@dc1esx01:~] net-vdr --instance -l

VDR Instance Information :

---------------------------

Vdr Name: edge-10

Vdr Id: 0x00001388

Number of Lifs: 5

Number of Routes: 4

Number of Hold Pkts: 0

Number of Neighbors: 5

State: Enabled

Controller IP: 192.168.1.104

Control Plane IP: 192.168.1.101

Control Plane Active: Yes

Num unique nexthops: 1

Generation Number: 0

Edge Active: Yes

Pmac: 00:00:00:00:00:00

List Routes for a particular Router

net-vdr -R -l <dlr name>

[root@dc1esx01:~] net-vdr -R -l edge-10

VDR edge-10 Route Table

Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]

Legend: [H: Host], [B: Blackhole], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface HitCount

----------- ------- ------- ----- --- ------ ------ --------- --------

0.0.0.0 0.0.0.0 172.16.10.1 UG 3 AUTO 40654 138800000002 1704

172.16.10.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 40654 138800000002 16

172.16.11.0 255.255.255.0 0.0.0.0 UCI 3 MANUAL 40654 13880000000a 3812

172.16.12.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 40654 13880000000b 5
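
To reason about which entry a given destination would hit, the table can be loaded and searched with a longest-prefix match. This Python sketch assumes the nine-column layout shown above; the function names are hypothetical, and the lookup only models standard longest-prefix selection rather than the host's actual forwarding code.

```python
import ipaddress

# Sample rows taken from the route table shown above.
SAMPLE = """\
Destination GenMask Gateway Flags Ref Origin UpTime Interface HitCount
0.0.0.0 0.0.0.0 172.16.10.1 UG 3 AUTO 40654 138800000002 1704
172.16.10.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 40654 138800000002 16
172.16.11.0 255.255.255.0 0.0.0.0 UCI 3 MANUAL 40654 13880000000a 3812
172.16.12.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 40654 13880000000b 5
"""


def parse_routes(text):
    """Parse route rows into dicts; header and separator lines are skipped."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue
        try:
            network = ipaddress.ip_network(f"{fields[0]}/{fields[1]}")
        except ValueError:
            continue  # not an address/mask pair, e.g. the header row
        routes.append({
            "network": network,
            "gateway": fields[2],
            "flags": fields[3],
            "interface": fields[7],
        })
    return routes


def lookup(routes, destination):
    """Return the most specific route containing destination, or None."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in r["network"]]
    return max(candidates, key=lambda r: r["network"].prefixlen, default=None)


routes = parse_routes(SAMPLE)
print(lookup(routes, "172.16.11.25")["interface"])
print(lookup(routes, "8.8.8.8")["gateway"])
```

An address in 172.16.11.0/24 resolves to the connected interface, while anything outside the connected networks falls through to the default route via 172.16.10.1.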

Verify NSX Controller cluster status and roles

nsx-controller # show control-cluster status

Type Status Since

--------------------------------------------------------------------------------

Join status: Join complete 07/26 16:29:34

Majority status: Connected to cluster majority 08/07 13:34:03

Restart status: This controller can be safely restarted 08/07 13:33:53

Cluster ID: 99b4d063-0e52-4b4f-8e0e-60ccc88803e3

Node UUID: 99b4d063-0e52-4b4f-8e0e-60ccc88803e3

Role Configured status Active status

--------------------------------------------------------------------------------

api_provider enabled activated

persistence_server enabled activated

switch_manager enabled activated

logical_manager enabled activated

directory_server enabled activated

Cluster status from vnet-controller:

----------------------------

Active cluster members

isMaster: true

uuid=99b4d063-0e52-4b4f-8e0e-60ccc88803e3, ip=192.168.1.104

Configured cluster members

uuid=99b4d063-0e52-4b4f-8e0e-60ccc88803e3, ip=192.168.1.104
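
The healthy state in this output is "Join complete", "Connected to cluster majority", and every role enabled and activated. A captured copy can be checked for that with a few lines of Python. This is only a sketch: the role list is taken from the sample above, and the function names are hypothetical.

```python
# Roles as they appear in the sample output above.
EXPECTED_ROLES = (
    "api_provider",
    "persistence_server",
    "switch_manager",
    "logical_manager",
    "directory_server",
)


def has_majority(text):
    """True if the Majority status line reports a connection to the majority."""
    return any(
        line.startswith("Majority status:") and "Connected to cluster majority" in line
        for line in text.splitlines()
    )


def role_problems(text):
    """Return (role, state) pairs for roles that are not enabled/activated."""
    seen = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] in EXPECTED_ROLES:
            seen[fields[0]] = (fields[1], fields[2])
    # A role missing from the output is reported with state None.
    return [
        (role, seen.get(role))
        for role in EXPECTED_ROLES
        if seen.get(role) != ("enabled", "activated")
    ]


SAMPLE = """\
Join status: Join complete 07/26 16:29:34
Majority status: Connected to cluster majority 08/07 13:34:03
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated
"""

print(has_majority(SAMPLE), role_problems(SAMPLE))
```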

nsx-controller # show control-cluster role

Listen-IP Master? Last-Changed Count

api_provider Not configured Yes 08/07 13:34:04 58

persistence_server N/A Yes 08/07 13:34:04 60

switch_manager 127.0.0.1 Yes 08/07 13:34:04 58

logical_manager N/A Yes 08/07 13:34:04 58

directory_server N/A Yes 08/07 13:34:04 58

NSX Manager

show cluster all

No. Cluster Name Cluster Id Datacenter Name Firewall Status

1 Compute Cluster A domain-c25 ABC Medical Enabled

2 Management and Edge Cluster domain-c7 ABC Medical Enabled

3 Compute Cluster B domain-c27 ABC Medical Enabled

show controller list all

NAME IP State

controller-4 192.168.110.203 RUNNING

controller-3 192.168.110.202 RUNNING

controller-1 192.168.110.201 RUNNING

Verify NSX Controller node connectivity

nsx-controller # show control-cluster connections

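role port listening open conns

--------------------------------------------------------

api_provider api/443 Y 2

--------------------------------------------------------

persistence_server server/2878 - 0

client/2888 Y 2

election/3888 - 0

--------------------------------------------------------

switch_manager ovsmgmt/6632 Y 0

openflow/6633 Y 0

--------------------------------------------------------

system cluster/7777 Y 0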

Check NSX Controller API service

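See the api_provider row in the “show control-cluster connections” output above: the API service should be listening (Y) on port 443. The api_provider role should also show as enabled and activated in “show control-cluster status”.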
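
The same check can be scripted against captured "show control-cluster connections" output. The Python sketch below assumes the table layout shown earlier, where a role name appears only on the first row of each group; the function names are hypothetical.

```python
def parse_connections(text):
    """Parse the connections table, carrying the role name onto continuation rows."""
    rows = []
    role = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and "/" in fields[1]:
            # First row of a group: role, name/port, listening, open conns.
            role, fields = fields[0], fields[1:]
        elif not (len(fields) == 3 and "/" in fields[0] and role):
            continue  # header, separator, or blank line
        name, port = fields[0].split("/")
        rows.append({
            "role": role,
            "name": name,
            "port": int(port),
            "listening": fields[1] == "Y",
            "open_conns": int(fields[2]),
        })
    return rows


def api_service_listening(rows):
    """True if the api_provider role is listening on port 443."""
    return any(
        r["role"] == "api_provider" and r["port"] == 443 and r["listening"]
        for r in rows
    )


SAMPLE = """\
role port listening open conns
api_provider api/443 Y 2
persistence_server server/2878 - 0
client/2888 Y 2
election/3888 - 0
switch_manager ovsmgmt/6632 Y 0
openflow/6633 Y 0
system cluster/7777 Y 0
"""

rows = parse_connections(SAMPLE)
print(api_service_listening(rows))
```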

Validate VXLAN and Logical Router mapping tables

List of VXLANs/Logical Switches

NSX Manager: show logical-switch list all

NAME UUID VNI Trans Zone Name Trans Zone ID

DC1-TENANT-EDGE-A 64f3093a-821a-4bbc-b054-7fe40f0a0e56 5000 DC1_Local vdnscope-1

LB b9986c7b-32a8-4b4e-adb6-bbfe7e98a6ae 5001 DC1_Local vdnscope-1

HA 63903c1d-b282-41c4-a2a7-2f32b89eddbb 5002 DC1_Local vdnscope-1

test b6aee4ca-d115-48f6-be59-587e768700ad 5003 DC1_Local vdnscope-1

Controller: show control-cluster logical-switches vni-table

nsx-controller # show control-cluster logical-switches vni-table

VNI Controller BUM-Replication ARP-Proxy Connections VTEPs Active

5000 192.168.1.104 Enabled Enabled 1 1 true

5001 192.168.1.104 Enabled Enabled 0 0 true

5002 192.168.1.104 Enabled Enabled 1 1 true

5003 192.168.1.104 Enabled Enabled 0 0 false
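
Validating the mapping amounts to comparing the VNIs that NSX Manager knows about with the ones in the controller's VNI table. One way to do that offline is sketched below in Python; it assumes both outputs have been captured as text, that the transport zone name contains no spaces, and the function names are hypothetical.

```python
# Samples taken from the two outputs shown above.
MANAGER = """\
NAME UUID VNI Trans Zone Name Trans Zone ID
DC1-TENANT-EDGE-A 64f3093a-821a-4bbc-b054-7fe40f0a0e56 5000 DC1_Local vdnscope-1
LB b9986c7b-32a8-4b4e-adb6-bbfe7e98a6ae 5001 DC1_Local vdnscope-1
HA 63903c1d-b282-41c4-a2a7-2f32b89eddbb 5002 DC1_Local vdnscope-1
test b6aee4ca-d115-48f6-be59-587e768700ad 5003 DC1_Local vdnscope-1
"""

CONTROLLER = """\
VNI Controller BUM-Replication ARP-Proxy Connections VTEPs Active
5000 192.168.1.104 Enabled Enabled 1 1 true
5001 192.168.1.104 Enabled Enabled 0 0 true
5002 192.168.1.104 Enabled Enabled 1 1 true
5003 192.168.1.104 Enabled Enabled 0 0 false
"""


def manager_vnis(text):
    """VNIs from 'show logical-switch list all' (third field from the end)."""
    vnis = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[-3].isdigit():
            vnis.add(int(fields[-3]))
    return vnis


def controller_vnis(text):
    """Map VNI -> Active flag from the controller vni-table."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].isdigit():
            table[int(fields[0])] = fields[6] == "true"
    return table


on_manager = manager_vnis(MANAGER)
on_controller = controller_vnis(CONTROLLER)
missing = sorted(on_manager - set(on_controller))
inactive = sorted(vni for vni, active in on_controller.items() if not active)
print("missing on controller:", missing)
print("inactive on controller:", inactive)
```

For the sample data every VNI is present on the controller, and VNI 5003 is the only one reported as inactive.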

VTEP table for a VNI

Controller: show control-cluster logical-switches vtep-table <vni>

nsx-controller # show control-cluster logical-switches vtep-table 5000

VNI IP Segment MAC Connection-ID

5000 192.168.250.52 192.168.250.0 00:50:56:6b:37:64 5

5000 192.168.150.52 192.168.150.0 00:50:56:60:1e:dd 2

List Logical Router instances and statistics

List of logical-routers

NSX Manager: show logical-router list all

show logical-router list all

Edge Id Vdr Name Vdr Id #Lifs

edge-5 edge-5 0x00001388 0

Controller: show control-cluster logical-routers instance all

LR-Id LR-Name Universal Service-Controller Egress-Locale In-Sync Sync-Category

0x1388 edge-5 false 192.168.1.104 local Yes NORMAL
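
Note that the same router is shown as 0x00001388 by NSX Manager and as 0x1388 by the controller. When matching instances across the two outputs, it helps to compare the numeric value rather than the string, as in this small Python sketch (the function name is hypothetical).

```python
def normalize_lr_id(value):
    """Convert a hexadecimal logical router ID string to an integer."""
    return int(value, 16)


manager_id = normalize_lr_id("0x00001388")     # as printed by NSX Manager
controller_id = normalize_lr_id("0x1388")      # as printed by the controller
print(manager_id == controller_id, manager_id)
```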

Controller: show control-cluster logical-routers stats

messages.query 2144
messages.update 64
messages.flush 1
messages.notification 0

Verify Logical Router interface and route mapping tables

View all the interfaces connected to the logical router

If no interfaces are connected to the router, an error is shown

show control-cluster logical-routers interface-summary logicalRouterID

LR-Id LR-Name Hosts[] Edge-Connection Service-Controller
1 perftest 10.24.106.158 10.24.105.58

Show all routes learned by a logical router

show control-cluster logical-routers routes logicalRouterID

LR-Id Destination Next-Hop
1 70.70.70.0/24 10.0.1.2
1 80.80.80.0/24 10.0.0.2

Verify active controller connections

show control-cluster core stats

messages.received 31576

messages.received.dropped 0

messages.transmitted 31604

messages.transmit.dropped 0

messages.processing.dropped 0

connections.up 11

connections.down 9

connections.timeout 0

connections.active 2

connections.sharding.subscribed 2
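
The counters worth watching here are the "dropped" ones, which are all zero in this sample, together with connections.active. A captured copy can be filtered with a short Python sketch such as the one below; the function name and the rule of flagging any non-zero "dropped" counter are assumptions.

```python
def parse_counters(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


# Sample taken from the output shown above.
SAMPLE = """\
messages.received 31576
messages.received.dropped 0
messages.transmitted 31604
messages.transmit.dropped 0
messages.processing.dropped 0
connections.up 11
connections.down 9
connections.timeout 0
connections.active 2
connections.sharding.subscribed 2
"""

counters = parse_counters(SAMPLE)
# Assumption: any non-zero "dropped" counter deserves a closer look.
dropped = {name: value for name, value in counters.items()
           if "dropped" in name and value > 0}
print(counters["connections.active"], dropped)
```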

View Bridge instances and learned MAC addresses

Show bridge instance information for a logical router

show control-cluster logical-routers bridges logicalRouterID bridgeID

Either logicalRouterID or bridgeID can be “all”

LR-Id Bridge-Id Host Active
1 1001 10.24.106.158 true

Show bridge MAC records for a bridge of a logical router

show control-cluster logical-routers bridge-mac logicalRouterID bridgeID

Either logicalRouterID or bridgeID can be “all”

LR-Id Bridge-Id Mac Vlan-Id Vxlan-Id Port-Id Source
1 1001 01:00:00:01:00:00 0 65535 1 vxlan

Display Logical Router instances

List of logical-routers

NSX Manager: show logical-router list all

show logical-router list all

Edge Id Vdr Name Vdr Id #Lifs

edge-5 edge-5 0x00001388 0

Verify NSX Manager services status

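NSX Manager GUI -> Summary (the Summary page of the NSX Manager appliance web interface lists each NSX Manager service and its status)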

View Logical Interfaces and routing tables

See “Verify Logical Router interface and route mapping tables”

Analyze NSX Edge statistics

vSphere Web Client: Networking & Security -> NSX Edges -> select an Edge -> Monitor