Principles
- Given a scenario, compare and contrast proper HA uses
- Determine service availability during an Edge High Availability failover
- Differentiate NSX Edge High Availability and vSphere High Availability
- Configure NSX Edge High Availability
- Configure heartbeat settings
- Configure management IP addresses
- Modify an existing Edge High Availability deployment
- Determine resource pool requirements for a given Edge High Availability configuration
- Configure Equal-Cost Multi-Path Routing (ECMP)
- Determine ECMP timers
- Understand process flows
- Combine ECMP with other stateful services
References
- NSX Administration Guide
http://pubs.vmware.com/NSX-62/topic/com.vmware.ICbase/PDF/nsx_62_admin.pdf
- NSX Design Guide
Given a scenario, compare and contrast proper HA uses
- Edges can be deployed Active/Active or Active/Passive
- Active/Passive: NSX Edge HA
- Active/Active: ECMP
- When using ECMP, stateful services (Firewall, LB, VPN) cannot be used
- ECMP improves throughput by balancing workloads across multiple edges
- Active/Passive allows the use of stateful services
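The throughput benefit of ECMP comes from spreading flows across several active Edges. The following is a minimal sketch of hash-based path selection, not the algorithm NSX actually uses: the 5-tuple layout, the SHA-256 hash, and the names `pick_edge`, `edges`, and `flows` are all assumptions made for illustration.

```python
import hashlib

def pick_edge(flow, edges):
    """Pick an Edge for a flow by hashing its 5-tuple (illustrative only)."""
    # flow = (src_ip, dst_ip, protocol, src_port, dst_port) -- assumed layout
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer and map it to an Edge.
    index = int.from_bytes(digest[:4], "big") % len(edges)
    return edges[index]

# Hypothetical Edge names and 200 synthetic TCP flows to one destination.
edges = ["edge-1", "edge-2", "edge-3", "edge-4"]
flows = [("10.0.0.%d" % i, "192.0.2.10", 6, 40000 + i, 443) for i in range(1, 201)]

# Each flow is pinned to one Edge; different flows land on different Edges.
placement = {flow: pick_edge(flow, edges) for flow in flows}
counts = {edge: sum(1 for e in placement.values() if e == edge) for edge in edges}
print(counts)
```

Because the forward and return directions of a connection can hash to different Edges, no single Edge is guaranteed to see both halves of a session, which is one way to see why stateful Edge services do not fit an ECMP design.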
Determine service availability during an Edge High Availability failover
- HA downtime is not zero as failover might require some services to be restarted
- Stateful services that require restart include Firewall and Load Balancer
- VPN services are interrupted during failover and clients must reconnect
- Dynamic routing may be affected during failover
- If the two HA appliances cannot communicate with each other, a split-brain scenario may occur in which both Edges become active
- Primary maintains a heartbeat with the secondary through an internal interface
- Default declare dead time (heartbeat timeout) = 15s
- If heartbeat times out, secondary appliance becomes active and starts services
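The heartbeat timeout behaviour described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the Edge's real implementation: the class name `StandbyMonitor`, the explicit `now` timestamps, and the choice of `>=` for the timeout comparison are all hypothetical.

```python
DECLARE_DEAD_TIME = 15  # seconds; the default value quoted in these notes

class StandbyMonitor:
    """Toy model of a standby appliance watching for heartbeats."""

    def __init__(self, declare_dead_time=DECLARE_DEAD_TIME, now=0.0):
        self.declare_dead_time = declare_dead_time
        self.last_heartbeat = now
        self.active = False

    def heartbeat(self, now):
        # Record the time the most recent heartbeat arrived from the peer.
        self.last_heartbeat = now

    def tick(self, now):
        # Take over once the peer has been silent for the full dead time.
        if not self.active and now - self.last_heartbeat >= self.declare_dead_time:
            self.active = True
        return self.active

monitor = StandbyMonitor()
monitor.heartbeat(now=10.0)
state_at_20 = monitor.tick(now=20.0)  # 10 s of silence: still standby
state_at_25 = monitor.tick(now=25.0)  # 15 s of silence: becomes active
print(state_at_20, state_at_25)       # False True
```

Lowering the dead time shortens the failover window in this model, but it also makes a brief loss of heartbeat traffic more likely to trigger a takeover and therefore a split-brain condition.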
Differentiate NSX Edge High Availability and vSphere High Availability
- NSX Edge HA:
- Ensures active/standby edges are placed on different hosts
- Manual vMotion takes precedence and can result in edges on the same host
- If a previously active Edge that failed comes back online, NSX Manager force-syncs its configuration, but the Edge remains in standby mode
- If the failed Edge does not come back up, it is taken out of service and a new Edge must be deployed manually
- NSX Edge HA is compatible with vSphere HA
- If the host running an NSX Edge fails, the Edge is restarted on the standby host
- Use anti-affinity rules to ensure Edges are automatically placed on different hosts
Configure NSX Edge High Availability
Configure heartbeat settings
- Manage -> Settings -> High Availability
- Settings:
- vNIC: Internal Interface for HA or “any”
- Multiple Edges can be connected to the same Logical Switch assigned for HA
- NSX Manager assigns unique link-local IPs from 169.254.0.0/16 to the HA interfaces
- Declare dead time: default = 15s
- Logging/Level: Enable HA Logging and set desired log level
Configure management IP addresses
- Management IPs:
- Override default IPs assigned by NSX Manager
- If using a common VXLAN/VLAN across multiple Edges, ensure uniqueness
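When several Edges share one HA network and the default addresses are overridden, the uniqueness requirement can be checked before the settings are applied. The sketch below uses only Python's standard `ipaddress` module; the function name `find_duplicate_ha_ips`, the Edge names, and the addresses are hypothetical examples, not values taken from NSX.

```python
import ipaddress
from collections import defaultdict

def find_duplicate_ha_ips(edge_ips):
    """Return {ip: [edges]} for every HA management IP used more than once."""
    owners = defaultdict(list)
    for edge, cidrs in edge_ips.items():
        for cidr in cidrs:
            # Strip the prefix length so only the host address is compared.
            addr = ipaddress.ip_interface(cidr).ip
            owners[addr].append(edge)
    return {str(addr): names for addr, names in owners.items() if len(names) > 1}

# Hypothetical overrides; edge-c was copied from edge-a by mistake.
edge_ips = {
    "edge-a": ["192.168.50.1/30", "192.168.50.2/30"],
    "edge-b": ["192.168.50.5/30", "192.168.50.6/30"],
    "edge-c": ["192.168.50.1/30", "192.168.50.2/30"],
}

duplicates = find_duplicate_ha_ips(edge_ips)
print(duplicates)
# {'192.168.50.1': ['edge-a', 'edge-c'], '192.168.50.2': ['edge-a', 'edge-c']}
```

An empty result means no address is shared between Edges on the common HA network.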
Modify an existing Edge High Availability deployment
- HA configuration specified while installing NSX Edge can be modified
- From NSX 6.2.3 and later, enabling HA on an existing Edge fails if sufficient resources cannot be reserved for the second Edge Appliance VM
Determine resource pool requirements for a given Edge High Availability configuration
- Placing Edges on dedicated storage that is not over-provisioned improves performance
- Place primary & secondary appliances on separate resource pools and datastores
- All Edge services run on the active appliance
- In a cross-vCenter environment, both edges must be in the same vCenter
Configure Equal-Cost Multi-Path Routing (ECMP)
See Objective 2.1 – Differentiate NSX Edge High Availability (HA)/Scale-out implementations
Combine ECMP with other stateful services
- Use a one-armed load balancer to provide stateful services when ECMP is deployed
- Use Distributed Firewall to secure E-W traffic when ECMP is deployed
- VPN services require a second tier of NSX Edge between the ECMP routers and the DLR; this compromises throughput because all traffic must then pass through a single Edge