Objective 2.1 – Compare and Contrast the Benefits of Running VMware NSX on Physical Network Fabrics

Principles

  • Differentiate physical network topologies
    • Differentiate physical network trends
    • Understand the purpose of a Spine node
    • Understand the purpose of a Leaf node
  • Differentiate virtual network topologies
    • Enterprise
    • Service Provider Multi-Tenant
    • Multi-Tenant Scalable
  • Given a specific physical topology, determine what challenges could be addressed by a VMware NSX implementation
  • Differentiate physical/virtual QoS implementation
  • Differentiate single/multiple vSphere Distributed Switch (vDS)/Distributed Logical Router
    implementations
  • Differentiate NSX Edge High Availability (HA)/Scale-out implementations
  • Differentiate Separate/Collapsed vSphere Cluster topologies
  • Differentiate Layer 3 and Converged cluster infrastructures

References

  • vmware-nsx-datasheet.pdf

http://www.vmware.com/files/pdf/products/nsx/VMware-NSX-Datasheet.pdf

  • vmware-nsx-network-virtualization-platform-white-paper

http://www.vmware.com/files/pdf/products/nsx/VMware-NSX-Network-Virtualization-Platform-WP.pdf

  • VMware-SDDC-Micro-Segmentation-White-Paper.pdf

http://blogs.vmware.com/networkvirtualization/files/2014/06/VMware-SDDC-Micro-Segmentation-White-Paper.pdf

  • vmw-nsx-network-virtualization-design-guide.pdf

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf

Differentiate physical network topologies

Differentiate physical network trends

The traditional hierarchical three-tier network topology (Core, Distribution, Edge) is giving way to a flatter two-layer Spine/Leaf topology.

The Core and Distribution layers are effectively being collapsed into a single layer – the Spine. Edge devices are now connected as Leaf nodes to this new collapsed core as shown below.

SDDC and Cloud computing have led to a large increase in East-West Intra-Datacenter traffic. Spine/Leaf architectures are entirely routed and permit Equal Cost Multipath (ECMP) to be deployed to improve bandwidth utilisation inside the Datacenter core.
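
As an illustration of why the fabric lends itself to ECMP, the short Python sketch below models a small Spine/Leaf fabric. The four-Spine/eight-Leaf sizing is an arbitrary assumption for the example: every Leaf connects to every Spine, so any Leaf-to-Leaf flow has one equal-cost path per Spine for the fabric to load-balance across.

  # Illustrative model of a Spine/Leaf fabric; the sizes are assumptions, not NSX requirements.
  from itertools import product

  SPINES = [f"spine-{i}" for i in range(1, 5)]   # 4 Spine nodes (assumed)
  LEAVES = [f"leaf-{i}" for i in range(1, 9)]    # 8 Leaf (top-of-rack) nodes (assumed)

  # Full mesh of Layer 3 point-to-point links between Leaves and Spines
  links = set(product(LEAVES, SPINES))

  def ecmp_paths(src_leaf, dst_leaf):
      """Equal-cost paths between two Leaves: one two-hop path via each Spine."""
      return [(src_leaf, spine, dst_leaf) for spine in SPINES
              if (src_leaf, spine) in links and (dst_leaf, spine) in links]

  print(len(ecmp_paths("leaf-1", "leaf-5")))     # -> 4, i.e. one equal-cost path per Spine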

Benefits of Spine/Leaf Architectures:

  • Improved Scalability

By limiting the dependence on Layer 2 in the physical fabric, the usual drawbacks of Spanning Tree and large VLAN configurations are reduced, and logical Layer 2 domains can expand beyond the 4094 VLAN limit (a short calculation follows this list).

  • Better network utilisation

Due to better scalability and mobility of workload placement, networks are better utilised and oversubscription is reduced, i.e. hot-spots can be managed to better distribute the load across the Spine and Leaf nodes

  • Reduced configuration overhead

Once the core network has been configured the amount of network reconfiguration needed to expand capacity is reduced.

  • Standardised Design

The Spine/Leaf architecture is standardised, thereby making it relatively straightforward to expand or contract the network according to needs
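
To put the scalability point above into numbers, the short calculation below contrasts the 12-bit VLAN ID space with VXLAN's 24-bit segment ID. It is a simple worked example, not NSX-specific code.

  # VLAN IDs are 12 bits wide; IDs 0 and 4095 are reserved, leaving 4094 usable VLANs.
  usable_vlans = 2**12 - 2           # 4094
  # The VXLAN Network Identifier (VNI) is 24 bits wide.
  vxlan_segments = 2**24             # 16,777,216 logical segments
  print(usable_vlans, vxlan_segments)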

Understand the purpose of a Spine node

Spine nodes form the backbone of the network, providing connectivity to the Leaf nodes

  • Links to Leaf nodes are all Layer 3 point-to-point routed connections
  • Spine nodes are not connected to one another
  • Provide a seamless interconnect between racks

Understand the purpose of a Leaf node

Leaf nodes provide connectivity to physical servers. Each Leaf node is connected to every Spine node, providing a full mesh. Uplinks to Spine switches can be bundled using Link Aggregation Control Protocol (LACP) to increase bandwidth.

Differentiate virtual network topologies

Enterprise

In an Enterprise topology, various applications can share subnets located behind a common DLR. An Edge Services Gateway can be used to demarcate the boundary with the physical network, although this is not strictly necessary.

Service Provider Multi-Tenant

In a Service Provider environment, Tenant networks can be segregated by placing them behind an Edge Services Gateway as shown below.

Multi-Tenant Scalable

The following provides a more scalable Service Provider topology by removing the dependence on the number of available interfaces on a single ESG for the DLR uplinks. Each DLR peers with a sub-interface on the ESG. BGP and static routes are supported in NSX 6.1, with OSPF supported from 6.1.3 onwards.

Differentiate physical/virtual QoS implementation

In a Virtualised Environment, the Hypervisor is a trust boundary for the QoS packet marking. Therefore traffic reclassification at the server facing Leaf port is not necessary. Policing is still carried out in the Physical infrastructure. Traffic types carried in the virtual environment include:

  • Tenant
  • Storage
  • Management

NSX supports L2 CoS and L3 DSCP markings. DSCP markings from a VM can be trusted or over-written at the Logical Switch level. The resultant DSCP value is always carried in the outer IP header of VXLAN encapsulated frames.
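
The sketch below is a simplified, pure-Python illustration of the marking behaviour described above: the DSCP value sits in the upper six bits of the IP ToS byte, and the Logical Switch either trusts the VM's marking or overwrites it before the value is carried in the outer VXLAN header. The field names, VTEP addresses and VNI used here are assumptions for illustration only.

  # Illustration of DSCP handling for a VXLAN-encapsulated frame (not an NSX implementation).
  DSCP_EF = 46                                   # e.g. Expedited Forwarding

  def dscp_to_tos(dscp, ecn=0):
      """DSCP occupies the upper 6 bits of the 8-bit ToS byte; ECN the lower 2."""
      return (dscp << 2) | ecn

  inner = {"src": "172.16.10.11", "dst": "172.16.10.12", "tos": dscp_to_tos(DSCP_EF)}

  def vxlan_encapsulate(inner_pkt, vni, trust_inner_dscp, ls_dscp=0):
      """Build a simplified outer header: either trust the VM's DSCP marking
      or overwrite it with the value configured at the Logical Switch level."""
      outer_tos = inner_pkt["tos"] if trust_inner_dscp else dscp_to_tos(ls_dscp)
      return {"outer_src": "10.0.1.1",           # source VTEP address (assumed)
              "outer_dst": "10.0.2.1",           # destination VTEP address (assumed)
              "outer_tos": outer_tos, "vni": vni, "payload": inner_pkt}

  print(vxlan_encapsulate(inner, vni=5001, trust_inner_dscp=True)["outer_tos"])   # 184 (46 << 2)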

Differentiate single/multiple vSphere Distributed Switch (vDS)/Distributed Logical Router implementations

Single/Multiple DVS Implementations

NSX-enabled vSphere clusters may be connected either to a common Distributed Switch or to separate switches. In order for workloads to communicate they must be in the same Transport Zone, regardless of whether they are configured on a single switch or on dedicated switches.

When a vSphere cluster is prepared for NSX, NSX Manager allows the user to select the Distributed Switch to be used for that cluster. It is therefore possible to have the Edge and Tenant (Compute) clusters configured on dedicated switches if desired.
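
As a rough sketch of how Transport Zone membership could be checked programmatically, the snippet below queries the NSX-v Manager REST API for its Transport Zones ("VDN scopes") and the clusters each one spans. The manager address, credentials and the exact XML element names are assumptions for illustration; verify them against the NSX API guide for the version in use.

  import requests
  import xml.etree.ElementTree as ET

  NSX_MANAGER = "https://nsxmgr.corp.local"        # assumed manager address
  AUTH = ("admin", "password")                     # assumed credentials

  # Lab only: certificate verification disabled for the self-signed manager cert
  resp = requests.get(f"{NSX_MANAGER}/api/2.0/vdn/scopes", auth=AUTH, verify=False)
  resp.raise_for_status()

  for scope in ET.fromstring(resp.content).findall("vdnScope"):
      name = scope.findtext("name")
      clusters = [c.findtext("cluster/name") for c in scope.findall("clusters/cluster")]
      print(f"Transport Zone {name}: clusters {clusters}")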

Advantages of separating Edge and Compute switches:

  • Operational Flexibility

Administration tasks can be assigned to teams based on their role e.g. Compute Team and Edge/Infrastructure Team

  • Flexible Uplink Topology

Uplink connectivity can be tailored on a per Cluster/DVS basis

  • vMotion Boundary Control

The vMotion boundary is the DVS. Therefore VMs can never cross from the Compute to Edge cluster or vice versa even though they are in the same Transport Zone.

  • Flexible VTEP Configuration

The number of VTEPs, VLAN and IP Address scheme can be set on a per-cluster basis

  • VLAN Isolation

VLANs configured on the Edge cluster do not have to be presented to the Compute cluster and vice versa

Single/Multiple DLR Implementations

A DLR may be connected either directly to the physical network or via an Edge Service Gateway. When deploying multiple DLRs, note that the following Topologies are not supported.

Hierarchical DLR

DLRs should not be connected in a hierarchical manner as shown below. An ESG should be used in place of “DLR Instance 3”.

Shared DLR -> ESG Peering

DLRs are not permitted to share a VXLAN for peering purposes as shown below. Each DLR should connect to the ESG over a separate VXLAN. Where more than 9 DLRs are needed, either the Scalable Topology or the ESG Trunk Interfaces discussed above may be used.

Differentiate NSX Edge High Availability (HA)/Scale-out implementations

Active/Standby – Stateful Edge Services

In the Active/Standby HA model, an NSX Edge is deployed with HA enabled as shown below.

  • Only one ESG forwards traffic at a time with the second unit in Standby mode
  • Keepalive messages are exchanged between the Active and Standby Edges
  • Standby Edge does not receive routing or forwarding information
  • Provides Stateful Edge Services e.g. Firewall

  • Keepalives are exchanged over an Internal Interface on the Edge
  • At least 1 Internal Interface is required for HA to work

Active

Under normal operating conditions, both Control and Data plane traffic traverse the Active Edge.

  • External routing adjacencies established with northbound router over External VLAN
  • Internal routing adjacencies established with DLR over the Transit VXLAN
  • ECMP configured on 2 x northbound connections for resilience
    • Each uplink should be on a separate VLAN

Failover

Following failover, all traffic switches to the previously Standby Edge.

  • Dead Interval Timer default = 15s (a configuration sketch follows this list)
    • Minimum = 6s
    • Min Recommended = 9s
  • Traffic continues to flow during failover with last known good configuration on Standby Edge
  • Synchronised services include:
    • Firewall
    • Load Balancer
    • NAT

  • Traffic must be re-directed to the activated Edge by both the DLR and northbound routers
  • Graceful Restart allows an Edge to re-establish routing adjacencies with the DLR and Physical Router while traffic continues to be forwarded, rather than the adjacencies being brought down
  • Hold-Down Timers:
    • Must be long enough for routing services to start on standby Edge
    • Timeout leads to adjacency dropping and hence second outage
    • Timers should match Physical Router and DLR
  Routing Protocol Timer     Default (seconds)   Recommended (seconds)
  OSPF Hello Interval        10                  30
  OSPF Dead Interval         40                  120
  BGP Keepalive              60                  60
  BGP Hold Down              180                 180
  • MAC Address change following failover means:
    • Activated Edge sends Gratuitous ARPs (GARPs) on peering Interfaces to allow Physical Router and DLR to update MAC Tables
  • It is recommended to enable vSphere HA + DRS
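
The sketch below shows one way the HA Dead Interval referenced in the list above could be set to the recommended minimum of 9 seconds through the NSX-v Manager REST API. The manager address, credentials, Edge ID and XML field names are assumptions for illustration; check them against the NSX API guide for the version in use.

  import requests

  NSX_MANAGER = "https://nsxmgr.corp.local"        # assumed manager address
  AUTH = ("admin", "password")                     # assumed credentials
  EDGE_ID = "edge-1"                               # assumed Edge ID

  ha_config = """<highAvailability>
      <enabled>true</enabled>
      <declareDeadTime>9</declareDeadTime>         <!-- recommended minimum dead interval -->
      <vnic>1</vnic>                               <!-- internal interface used for keepalives -->
  </highAvailability>"""

  # Lab only: certificate verification disabled for the self-signed manager cert
  resp = requests.put(f"{NSX_MANAGER}/api/4.0/edges/{EDGE_ID}/highavailability/config",
                      data=ha_config,
                      headers={"Content-Type": "application/xml"},
                      auth=AUTH, verify=False)
  resp.raise_for_status()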

Active/Active – ECMP

Equal Cost MultiPath operates in an Active/Active manner providing higher bandwidth and faster convergence compared with Active/Standby HA.

Active

ECMP is stateless, meaning that traffic cannot pass through stateful services such as the Firewall, because traffic flows may be asymmetric, i.e. return traffic may arrive on a different Edge from the one the original traffic left through.

  • Supports up to 8 equal cost paths
  • Up to 80Gbps per tenant from DLR -> ESG
  • Reduced traffic outages due to increased resiliency
  • DLR distributes traffic based on a hash of the Source and Destination IP (illustrated in the sketch below)
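
The snippet below illustrates the idea behind that hashing behaviour: packets of the same flow always resolve to the same Edge, while different flows are spread across the active Edges. The hash function and the four-Edge setup are illustrative assumptions, not NSX's actual algorithm.

  # Illustrative equal-cost next-hop selection by hashing source and destination IP.
  import ipaddress
  import zlib

  ACTIVE_EDGES = ["E1", "E2", "E3", "E4"]          # assumed 4 ECMP Edges (up to 8 supported)

  def pick_edge(src_ip, dst_ip):
      """Deterministically map a flow (src/dst IP pair) onto one of the active Edges."""
      key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
      return ACTIVE_EDGES[zlib.crc32(key.to_bytes(4, "big")) % len(ACTIVE_EDGES)]

  print(pick_edge("172.16.10.11", "8.8.8.8"))      # same flow -> same Edge every time
  print(pick_edge("172.16.10.12", "8.8.8.8"))      # a different flow may hash to another Edge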

Failover

Traffic outages are reduced due to the increased resiliency on offer with 8 simultaneously active paths.

  • When E1 fails:
    • Traffic continues to flow towards it until the hold down time expires
    • Adjacencies recalculated by DLR and Physical Router
    • Flows redirected to remaining nodes
  • Minimum Routing Protocol Timers:
    • Hello Interval: 1 Second
    • Hold Down: 3 Seconds
  Routing Protocol Timer     Default (seconds)   Recommended (seconds)
  OSPF Hello Interval        10                  1
  OSPF Dead Interval         40                  3
  BGP Keepalive              60                  1
  BGP Hold Down              180                 3
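
The recommended values in both timer tables keep the dead or hold-down timer at a small multiple (at least three times) of the hello or keepalive interval; the short check below simply works that relationship through and is illustrative only.

  # Recommended (hello/keepalive, dead/hold-down) pairs taken from the tables above.
  recommended = {
      "OSPF Active/Standby": (30, 120),
      "BGP Active/Standby":  (60, 180),
      "OSPF ECMP":           (1, 3),
      "BGP ECMP":            (1, 3),
  }
  for protocol, (hello, dead) in recommended.items():
      assert dead >= 3 * hello                      # dead/hold-down is at least 3x hello/keepalive
      print(f"{protocol}: dead/hold-down is {dead // hello}x the hello/keepalive interval")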

Deployment Considerations

  • Deploy a minimum of 2 ECMP enabled Edges for HA
  • Anti-Affinity rules must be created manually so that ECMP Edges do not share an ESXi host (see the sketch after this list)
  • Use the Distributed Firewall (DFW) and a one-armed Load Balancer for stateful services
  • Use vSphere HA + DRS to maintain resiliency
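
One way the anti-affinity requirement above could be automated is with pyVmomi, as sketched below. The vCenter address, credentials, cluster name and Edge VM names are assumptions for illustration; adjust them to the environment and confirm the rule afterwards in the cluster's DRS settings.

  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  # Connect to vCenter (hostname and credentials are assumptions)
  context = ssl._create_unverified_context()
  si = SmartConnect(host="vcenter.corp.local", user="administrator@vsphere.local",
                    pwd="password", sslContext=context)
  content = si.RetrieveContent()

  def find_by_name(vimtype, name):
      """Look up a managed object of the given type by its display name."""
      view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
      obj = next(o for o in view.view if o.name == name)
      view.DestroyView()
      return obj

  cluster = find_by_name(vim.ClusterComputeResource, "Edge-Cluster")                      # assumed name
  edges = [find_by_name(vim.VirtualMachine, n) for n in ("NSX-Edge-0", "NSX-Edge-1")]     # assumed names

  # DRS anti-affinity rule keeping the ECMP Edges on separate hosts
  rule = vim.cluster.AntiAffinityRuleSpec(name="separate-ecmp-edges", enabled=True, vm=edges)
  spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
  cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

  Disconnect(si)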

Control VM Failure and Recovery

Aggressive routing protocol timer configuration in ECMP mode affects DLR control VM failover

  • ECMP Edges bring Routing Adjacencies down within 3 seconds of Control VM failure
  • Edge Forwarding Tables flushed => N->S traffic flow stops
  • Add a Static Route on ESG with higher Administrative Distance and redistribute into routing protocol to mitigate
    • Static Route becomes active until routing adjacencies are re-established

Static Routes are not required on the DLR. In the event of Control VM failure the forwarding tables remain unchanged and traffic continues to flow. Once routing adjacencies are re-established, any routing updates are pushed to the Controller and hence to the ESXi host kernels. A routing protocol is still required on the DLR to maintain South->North routing and to avoid black holes, because the Control VM uses routing protocol hellos to detect Edge failures.

  • Static Route configuration may be automated so that every time a new workload network is added, the Edges are updated accordingly (see the sketch after this list)
  • Summary routes may also be used to minimise the routes required as shown below
  • Static Routes only required where ECMP and aggressive hold timers used
  • Not needed in an Active/Standby configuration because the standby Control VM recovers before routing adjacencies go down
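
The sketch below shows how the floating static route described above might be pushed to an ECMP Edge through the NSX-v Manager REST API, using a summary prefix and a high administrative distance so that it only takes effect while the dynamic adjacencies are down. The manager address, credentials, Edge ID, prefix, next hop and XML schema are assumptions; validate them against the NSX API guide.

  import requests

  NSX_MANAGER = "https://nsxmgr.corp.local"          # assumed manager address
  AUTH = ("admin", "password")                       # assumed credentials
  EDGE_ID = "edge-2"                                 # assumed ECMP Edge ID

  static_routes = """<staticRouting>
      <staticRoutes>
          <route>
              <network>172.16.0.0/16</network>       <!-- assumed summary of workload networks -->
              <nextHop>192.168.10.2</nextHop>        <!-- assumed DLR address on the transit VXLAN -->
              <adminDistance>250</adminDistance>     <!-- higher than the dynamic protocol, so used only as a fallback -->
          </route>
      </staticRoutes>
  </staticRouting>"""

  # Lab only: certificate verification disabled for the self-signed manager cert
  resp = requests.put(f"{NSX_MANAGER}/api/4.0/edges/{EDGE_ID}/routing/config/static",
                      data=static_routes,
                      headers={"Content-Type": "application/xml"},
                      auth=AUTH, verify=False)
  resp.raise_for_status()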

Avoiding Dual Failure of Edge and Control VM

In the scenario below, the active DLR Control VM is located on the same host as an ECMP Edge. When the host fails, both the Control VM and Edge devices go down simultaneously.

In this scenario, the Control VM is unable to detect the Edge failure because it is itself down. Therefore the DLR continues to forward traffic to the failed Edge until the Standby Control VM is activated. Mitigate in one of the following ways:

  • Deploy Control VM and Edges on different ESXi hosts + use anti-affinity rules as appropriate
  • Deploy Control VM on Compute cluster instead
  • Deploy Control VMs on NSX enabled Management Cluster

Differentiate Separate/Collapsed vSphere Cluster topologies

A typical NSX deployment consists of the following components:

  • Management Stack

vCenter, NSX Manager, Controllers, Cloud Management Platform (CMP)

  • Compute Stack

Tenant compute resources

  • Edge Stack

NSX Edges, DLRs

The above stacks may be deployed in one of two ways:

  • Separate Cluster Topology
  • Collapsed Cluster Topology

Separate Cluster Topology

This is the recommended topology for larger workload environments and has separate ESXi clusters dedicated for each function.

  • 1 x Management Cluster
  • 1 x Edge Cluster
  • 1+ Compute Cluster(s)

Collapsed Cluster Topology

The collapsed topology lowers cost but increases management overhead and is composed of the following:

  • 1 x Management Cluster
  • 1 x Compute/Edge Cluster

In this configuration the Compute/Edge cluster combines the tenant workloads with the NSX Edges, while the management components remain on a dedicated Management cluster.

vCenter Design

Two broad categories: Small/Medium and Large.

Small/Medium

  • In either case there is a one-to-one relationship between vCenter and NSX Manager
  • NSX Controllers are always deployed on the vCenter connected to the NSX Manager
  • Typically a single vCenter is sufficient for Small/Medium deployments as shown below

Large

In large deployments it may be necessary to deploy multiple vCenters. An enterprise may for example have a dedicated cluster for management purposes. In this case a separate vCenter is deployed on the management cluster as shown below.

  • Management vCenter
    • NSX-vCenter-A
      • NSX Manager-A
      • NSX Controllers [deployed on Edge cluster A]
      • Compute Clusters A
    • NSX-vCenter-B
      • NSX Manager-B
      • NSX Controllers [deployed on Edge cluster B]
      • Compute Clusters B

The “Management VC” above may be located on a dedicated cluster. The additional NSX vCenters in turn manage a separate set of resources outside of the Management vCenter domain. Therefore NSX Manager is registered with the relevant NSX vCenter instead of the Management vCenter. Advantages of this approach include:

  • Separation of duties: the Management vCenter remains outside of the NSX sphere
  • Reduced inter-dependence: each NSX environment can be maintained and upgraded independently of the others
  • NSX Domains not directly impacted by upgrades to Management vCenter as long as vCenter to NSX Manager compatibility is maintained

Differentiate Layer 3 and Converged Cluster Infrastructures

Layer 3 Infrastructure

Traditional Layer 3 infrastructures are highly modular in nature. Edge devices are aggregated at the Distribution Layer, which also marks the L2/L3 boundary. Therefore traffic between racks connected to different Distribution switches must be routed, confining L2 networks to a group of hosts in a Pod. This approach works well when most of the traffic is North/South, to and from the datacentre.

Converged Cluster

VXLAN networking, combined with the advent of Spine/Leaf network architectures, allows the span of Layer 2 networks to be extended beyond a group of Pods. The scaling limits imposed by the hierarchical nature of traditional networks give way to a flatter topology in which a logical Layer 2 network may span a datacentre or even multiple sites. North/South traffic is directed through a set of dedicated hosts and switches at the edge of the network, also known as “Border Leafs”.