First, I followed the lab from Nick Carlton’s blog, but I had trouble getting the multi-site setup to work.
Then, I found this article on LinkedIn that is detailed and easy to follow: Deploying Cisco VXLAN EVPN Multi-Site.
Both articles deploy Border Gateways (BGWs) on a spine-and-leaf fabric in each of two DCs. In the first article, the BGWs of the two sites connect directly to each other. In the second, they connect through an additional transit network.
The second setup helped me understand multi-site setups better.
Before proceeding with the multi-site setup, there are a few concepts I need to clarify.
IPv4 BGP is not required
- VTEP/NVE serves as a tunnel interface.
- BGP functions as the control plane.
- The VTEP IP is advertised using BGP EVPN route type 3 across the fabric (SPINE & LEAF).
- LEAF nodes utilize this information to establish tunnels between VTEPs.
- For symmetric IRB, L3 routes are advertised as BGP EVPN route type 5. These are carried in the L2VPN EVPN address family, not in IPv4 unicast.
- Consequently, BGP IPv4 sessions are not necessary within the fabric. Underlay reachability is handled by the IGP.
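To make the control-plane role concrete, here is a minimal Python sketch of how a leaf could derive its ingress-replication (flood) list from Route Type 3 routes. The `EvpnRoute` record and the 10.0.0.x VTEP addresses are hypothetical simplifications. Real EVPN routes carry many more attributes. Note that nothing in the logic depends on an IPv4 unicast route.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    """Hypothetical, simplified view of an EVPN route learned over the l2vpn evpn session."""
    route_type: int     # 2 = MAC/IP, 3 = inclusive multicast (IMET), 5 = IP prefix
    originator_ip: str  # VTEP (NVE source-interface) IP of the advertising switch
    vni: int


def build_flood_lists(routes, local_vtep, local_vnis):
    """Map each local L2VNI to the remote VTEPs learned from Route Type 3."""
    flood_lists = defaultdict(set)
    for route in routes:
        if route.route_type != 3:
            continue  # only Type 3 announces "I am a VTEP for this VNI"
        if route.originator_ip == local_vtep:
            continue  # never build a tunnel to ourselves
        if route.vni in local_vnis:
            flood_lists[route.vni].add(route.originator_ip)
    return dict(flood_lists)


routes = [
    EvpnRoute(3, "10.0.0.1", 10010),  # our own Type 3: skipped
    EvpnRoute(3, "10.0.0.2", 10010),
    EvpnRoute(3, "10.0.0.2", 10020),
    EvpnRoute(2, "10.0.0.3", 10010),  # a MAC/IP route alone adds no flood-list entry
    EvpnRoute(3, "10.0.0.4", 10030),  # VNI not configured locally
]
print(build_flood_lists(routes, local_vtep="10.0.0.1", local_vnis={10010, 10020}))
# {10010: {'10.0.0.2'}, 10020: {'10.0.0.2'}}
```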
Common LEAF BGP Config:
router bgp 65000
  template peer SPINE
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      send-community extended
    address-family l2vpn evpn
      send-community extended
  neighbor 1.1.1.1
    inherit peer SPINE
You can configure BGP without an IPv4 unicast peering between LEAF and SPINE. Many tutorials include the IPv4 unicast address family in the BGP configuration, but the setup does not require it:
router bgp 65000
  template peer SPINE
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
  neighbor 1.1.1.1
    inherit peer SPINE
With this configuration, the command:
sh bgp ipv4 unicast sum
should return no neighbors, because no IPv4 unicast session exists between LEAF and SPINE.
Meanwhile, the command:
sh bgp l2vpn evpn sum
should still show the EVPN session between SPINE and LEAF.
In the lab, I initially set up IPv4 BGP across the fabric. While configuring the BGW, I also advertised BGW routes into the IPv4 BGP. This led to the BGW routes propagating throughout the fabric, even though the fabric already had reachability through IGP (OSPF).
As a result, the NVE peers at the remote site flapped, and the BGW routes appeared and disappeared intermittently.
This behavior suggests a routing loop or race condition within the fabric. The likely cause is overlapping routing protocols (OSPF and iBGP carrying the same prefixes) or unintended route propagation.
VTEP Reachability is Crucial
In VXLAN, the primary concern is that the VTEPs can reach each other:
- Within the site: Utilize IGP (e.g., OSPF).
- For remote sites: Implement BGP with IPv4 unicast.
In the first lab:
- Site A’s BGW is connected to Site B’s BGW.
- BGP sessions are established for both IPv4 and EVPN.
In the second lab:
- A DCI-CORE switch sits between the sites, in a third AS.
- The DCI-CORE switch should not be aware of EVPN. It only carries IPv4 routes so that the BGWs' VTEP and BGW IPs can be exchanged between sites.
It turns out that BGW reachability and EVPN peering are distinct concepts:
- BGP IPv4 is utilized to maintain BGW reachability across sites.
- Once the BGWs can reach each other, they establish BGP EVPN sessions directly with one another.
In the first lab, this distinction was not clear because the BGWs were directly connected.
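The dependency between the two steps can be illustrated with a small Python sketch. Before a BGW can bring up the EVPN session to a remote BGW's loopback, its IPv4 table must contain a route covering that address. In this lab, that route is learned via eBGP from the DCI-CORE. The routing-table contents and the unreachable address 200.1.1.1 are hypothetical. The lookup is a plain longest-prefix match using the standard `ipaddress` module.

```python
import ipaddress


def lookup(rib, destination):
    """Return (prefix, next_hop) of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else (str(best[0]), best[1])


def can_peer_evpn(rib, peer_loopback):
    """EVPN peering to a remote BGW needs IPv4 reachability to its loopback first."""
    return lookup(rib, peer_loopback) is not None


# Hypothetical IPv4 RIB on the DC1 BGW
dc1_bgw_rib = {
    "10.11.13.0/24": "connected",     # link to DCI-CORE
    "12.12.12.12/32": "10.11.13.13",  # remote BGW loopback0, learned via eBGP from DCI-CORE
}
print(can_peer_evpn(dc1_bgw_rib, "12.12.12.12"))  # True
print(can_peer_evpn(dc1_bgw_rib, "200.1.1.1"))    # False: no route, the session stays down
```

The DCI-CORE only ever appears as an IPv4 next hop here. The EVPN session itself runs loopback to loopback between the BGWs.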
Asymmetric IRB does not work on NX9Kv
- In theory, asymmetric IRB operates similarly to symmetric IRB, but it does not utilize L3VNI.
- The configuration of the VLAN SVI and anycast gateway is done in the same manner as with symmetric IRB.
Expected Behavior:
- The ingress leaf should route the traffic locally from the source VLAN into the destination VLAN, then bridge it across the fabric using the destination VNI.
- For instance, traffic from VNI 10010 to VNI 10020 should be routed into the VNI 10020 segment on the ingress leaf and encapsulated with VNI 10020.
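The difference between the two IRB modes comes down to which VNI ends up in the VXLAN header for routed traffic. The Python sketch below models that choice. The function name is hypothetical. The VLAN-to-VNI mapping and the L3VNI 10100 are taken from this lab's configuration. Under this model, the captured ARP requests carrying VNI 10010 (the source VNI) match neither mode.

```python
def vni_on_wire(mode, dst_vlan, vlan_to_l2vni, l3vni=None):
    """Return the VNI the ingress leaf should use for inter-subnet (routed) traffic.

    asymmetric: route locally into the destination VLAN, then bridge with its L2VNI.
    symmetric:  route into the VRF and send with the VRF's L3VNI.
    """
    if mode == "asymmetric":
        return vlan_to_l2vni[dst_vlan]
    if mode == "symmetric":
        if l3vni is None:
            raise ValueError("symmetric IRB needs an L3VNI for the VRF")
        return l3vni
    raise ValueError(f"unknown IRB mode: {mode}")


vlan_to_l2vni = {10: 10010, 20: 10020}
# Host in VLAN 10 (VNI 10010) talking to a host in VLAN 20 (VNI 10020)
print(vni_on_wire("asymmetric", 20, vlan_to_l2vni))              # 10020
print(vni_on_wire("symmetric", 20, vlan_to_l2vni, l3vni=10100))  # 10100
```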
Issue:
- In Wireshark, the ARP requests are visible, but they are encapsulated with VNI 10010. The VLAN SVI is not routing as intended, and the wrong VNI prevents ARP replies from coming back.
Debugging:
I examined sh bgp l2vpn evpn, and the routes contain the necessary information:
- Route type 3 for the VTEP.
- Route type 2 for the MAC-VRF table.
- The VTEP's IP ARP table shows both the source and the destination.
The Missing Element:
- ARP resolution is not occurring correctly, resulting in traffic not being mapped to the appropriate VNI.
NX9Kv Bugs
Interface NVE Not Recognized
Occasionally, the int nve1 command is not recognized, even with the NV Overlay feature enabled, and the NVE interface configuration disappears. Restarting the switch sometimes resolves the issue, but it is time-consuming.
Disabling and re-enabling the nv overlay feature can temporarily fix the issue, but some configuration may be lost.
A quicker workaround is:
no feature nv overlay
copy start run
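This re-applies the saved startup configuration, which re-enables the feature and restores the NVE settings. It assumes the working configuration was saved beforehand. It is faster than rebooting or a full reconfiguration.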
Extended Boot Time & Frequent Reboots
The switch suffers from long boot times and frequent reboots. These are often accompanied by recurring “kernel panic” messages caused by memory exhaustion. Boots succeed only when enough memory is available.
Increasing the RAM from 8 GB to 10 GB significantly shortens the boot time and stops the reboots.
L3VNI Configuration Problem
Configuring the L3VNI as follows:
vrf context XXXX
  vni 10200 l3
crashes the interface module, and all interfaces disappear.
To resolve the issue, configure the VNI without the l3 keyword and then restart the switch:
vrf context XXXX
  vni 10200
Multi-Site Setup
Each site is built on a standard spine-and-leaf architecture with its own Border Gateway devices (BGWs). A DCI-CORE switch connects the two data centers.
The BGW configuration resembles that of a Leaf device but includes additional settings to support multi-site operations.
The BGW must be configured with every L2VNI that should be extended to other sites. It also needs the L3VNIs if L3 routes are to be exchanged.
The following is the DC1-BGW setup:
Basic Config
Enable Features
feature bgp
feature ospf
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
feature fabric forwarding
nv overlay evpn
Create L2VNI
evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10020 l2
    rd auto
    route-target import auto
    route-target export auto
Create VLAN and Associate to L2VNI
vlan 10
  vn-segment 10010
vlan 20
  vn-segment 10020
Config SVI with Anycast Gateway
vrf context VXLAN
fabric forwarding anycast-gateway-mac aaaa.aaaa.aaaa
interface Vlan10
  no shutdown
  vrf member VXLAN
  ip address 172.30.10.254/24
  fabric forwarding mode anycast-gateway
interface Vlan20
  no shutdown
  vrf member VXLAN
  ip address 172.30.20.254/24
  fabric forwarding mode anycast-gateway
Config NVE (VTEP)
interface loopback0
  ip address 11.11.11.11/32
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp
Establish BGP EVPN connection with the SPINE
router bgp 65000
  template peer Intra-DC
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  neighbor 1.1.1.1
    inherit peer Intra-DC
Config L3VNI
vlan 100
  vn-segment 10100
vrf context VXLAN
+ vni 10100
+ rd auto
+ address-family ipv4 unicast
+   route-target both auto
+   route-target both auto evpn
interface Vlan100
  no shutdown
  vrf member VXLAN
  ip forward
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp
+ member vni 10100 associate-vrf
BGW-Specific Config
Enable Multi-Site Gateway
We assign a site ID to the BGW with evpn multisite border-gateway, where “1” represents site 1 and “2” represents site 2. Any number can be used, since the ID only distinguishes BGWs at different sites. All BGWs within the same site use the same ID.
delay-restore time sets how long the BGW waits after its links recover before bringing its multisite BGW address back into service. A 30-second value speeds up recovery, but it is intended for lab testing only.
Site1
evpn multisite border-gateway 1
  delay-restore time 30
Site2
evpn multisite border-gateway 2
  delay-restore time 30
Multisite tracking
We use dci-tracking for the interface that connects to external fabrics, such as the DCI-CORE or a remote BGW.
interface Ethernet1/1
  no switchport
  ip address 10.11.13.11/24 tag 1
  no shutdown
+ evpn multisite dci-tracking
We use fabric-tracking for the interface that connects to the fabric, such as the SPINE or a local BGW.
interface Ethernet1/2
  no switchport
  ip address 10.1.11.11/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  no shutdown
+ evpn multisite fabric-tracking
Config BGW on VTEP
- Create and assign a loopback interface for the border gateway.
- Enable multisite ingress replication for all VNIs (both L3 and L2).
interface loopback100
  ip address 100.1.1.1/32 tag 1
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
+ multisite border-gateway interface loopback100
  member vni 10010
+   multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10020
+   multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10100 associate-vrf
+   multisite ingress-replication
BGW Reachability to remote site
In the lab, loopback0 serves as the VTEP IP, and loopback100 is the BGW IP. Both are referenced on the NVE interface.
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0 <<
  multisite border-gateway interface loopback100 <<
  member vni 10010
    multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10020
    multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10100 associate-vrf
    multisite ingress-replication
Ethernet1/1 is the interface that connects to the external fabric (DCI-CORE for this lab).
interface Ethernet1/1 <<
  no switchport
  ip address 10.11.13.11/24
  no shutdown
  evpn multisite dci-tracking
First, we will tag these interfaces.
BGW -> DCI-CORE
interface Ethernet1/1
  no switchport
- ip address 10.11.13.11/24
+ ip address 10.11.13.11/24 tag 1
  no shutdown
  evpn multisite dci-tracking
VTEP Loopback
interface loopback0
- ip address 11.11.11.11/32
+ ip address 11.11.11.11/32 tag 1
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
BGW Loopback
interface loopback100
- ip address 100.1.1.1/32
+ ip address 100.1.1.1/32 tag 1
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
If there are multiple BGWs at the site, they must all be configured with the same BGW IP (100.1.1.1/32).
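The shared-IP rule is easy to check mechanically. The hypothetical Python check below models each BGW by its site ID (the value from evpn multisite border-gateway) and its loopback100 address. It reports any site whose BGWs disagree, and any BGW IP reused across sites. The BGW names and the site-2 address 100.2.2.2 are placeholders.

```python
from collections import defaultdict


def check_bgw_ips(bgws):
    """bgws: {name: (site_id, bgw_ip)}. Return a list of human-readable problems."""
    ips_by_site = defaultdict(set)
    for _name, (site_id, bgw_ip) in bgws.items():
        ips_by_site[site_id].add(bgw_ip)

    problems = []
    # All BGWs of one site must share the same BGW (loopback100) IP.
    for site_id, ips in sorted(ips_by_site.items()):
        if len(ips) > 1:
            problems.append(f"site {site_id}: BGWs use different IPs {sorted(ips)}")

    # The same BGW IP must not appear in two different sites.
    owners = defaultdict(set)
    for site_id, ips in ips_by_site.items():
        for bgw_ip in ips:
            owners[bgw_ip].add(site_id)
    for bgw_ip, sites in sorted(owners.items()):
        if len(sites) > 1:
            problems.append(f"BGW IP {bgw_ip} is shared by sites {sorted(sites)}")
    return problems


bgws = {
    "DC1-BGW1": (1, "100.1.1.1"),
    "DC1-BGW2": (1, "100.1.1.1"),
    "DC2-BGW1": (2, "100.2.2.2"),  # hypothetical site-2 address
}
print(check_bgw_ips(bgws))  # []
```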
Second, we use a route-map to redistribute all tagged networks into BGP and establish an IPv4 BGP session between the BGW and the DCI-CORE.
route-map VXLAN permit 10
  match tag 1
router bgp 65000
  address-family ipv4 unicast
    redistribute direct route-map VXLAN
  neighbor 10.11.13.13 << this is DCI-CORE
    remote-as 64999
    update-source Ethernet1/1
    address-family ipv4 unicast
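The filter applied by redistribute direct route-map VXLAN together with match tag 1 can be modeled in a few lines of Python. Only connected prefixes carrying tag 1 are handed to BGP and advertised to the DCI-CORE. Everything else hits the route-map's implicit deny. The connected-route list is a hypothetical snapshot of the DC1 BGW built from the addresses above. The sketch models the filter only, not NX-OS internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectedRoute:
    prefix: str
    interface: str
    tag: int = 0  # 0 = no tag configured on the interface address


def redistribute_direct(connected, match_tag):
    """Mimic 'route-map VXLAN permit 10 / match tag 1': permit tagged routes and
    drop everything else, as the implicit deny at the end of a route-map does."""
    return [r.prefix for r in connected if r.tag == match_tag]


connected = [
    ConnectedRoute("11.11.11.11/32", "loopback0", tag=1),    # VTEP IP
    ConnectedRoute("100.1.1.1/32", "loopback100", tag=1),    # BGW IP
    ConnectedRoute("10.11.13.0/24", "Ethernet1/1", tag=1),   # link to DCI-CORE
    ConnectedRoute("10.1.11.0/24", "Ethernet1/2"),           # fabric link, untagged
]
print(redistribute_direct(connected, match_tag=1))
# ['11.11.11.11/32', '100.1.1.1/32', '10.11.13.0/24']
```

Tagging at the interface keeps the decision next to the address itself, so the route-map never has to list prefixes.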
BGP EVPN connections between BGWs
- peer-type fabric-external for BGW -> BGW peering
- rewrite-evpn-rt-asn to rewrite the ASN in the auto-derived route targets (65000->65001)
router bgp 65000
  template peer Inter-DC
    remote-as 65001
    update-source loopback0
+   ebgp-multihop 5 << needed because the eBGP peers are loopbacks, not directly connected
+   peer-type fabric-external
    address-family l2vpn evpn
      send-community
      send-community extended
+     rewrite-evpn-rt-asn
  neighbor 12.12.12.12 << this is Remote BGW
    inherit peer Inter-DC
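The Python sketch below shows why rewrite-evpn-rt-asn matters. With route-target import/export auto, the L2VNI route target is derived from the local ASN and the VNI. As a result, the two sites export different values for the same VNI. The model makes two assumptions. First, both ASNs are 2-byte, as in this lab, so the auto RT is `<ASN>:<VNI>`. Second, the rewrite is simplified to "replace the ASN part of a received route target with the local ASN". It is an illustration, not the exact NX-OS algorithm.

```python
def auto_rt(asn, vni):
    """Auto-derived route target, assuming a 2-byte ASN: '<ASN>:<VNI>'."""
    if not 0 < asn < 65536:
        raise ValueError("this sketch only models 2-byte ASNs")
    return f"{asn}:{vni}"


def rewrite_rt_asn(rt, local_asn):
    """Simplified rewrite: keep the VNI part, replace the ASN part with the local ASN."""
    _asn, vni = rt.split(":")
    return f"{local_asn}:{vni}"


def is_imported(received_rt, import_rts):
    return received_rt in import_rts


# DC2 (AS 65001) exports VNI 10010; DC1 (AS 65000) imports with 'route-target import auto'.
received = auto_rt(65001, 10010)    # '65001:10010'
dc1_import = {auto_rt(65000, 10010)}  # {'65000:10010'}
print(is_imported(received, dc1_import))                         # False
print(is_imported(rewrite_rt_asn(received, 65000), dc1_import))  # True
```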
Expected Outputs
Run the command:
sh nve peer
The tunnels should form with:
At the BGW:
- LEAFs
- BGWs (both local and remote)
At the LEAF:
- LEAFs
- Local BGW
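The expected peers can be written down as a small Python model, which makes it easy to diff against what the switch actually shows. The model follows the rules above: a BGW sees local LEAFs plus local and remote BGWs, while a LEAF sees only its own site. The device names and the two-leaf layout are hypothetical.

```python
def expected_nve_peers(device, topology):
    """Return the set of NVE peers 'device' should see in 'sh nve peer'.

    topology: {name: {"site": int, "role": "leaf" or "bgw"}}
    - A BGW peers with local LEAFs, local BGWs and remote BGWs.
    - A LEAF peers with local LEAFs and local BGWs; remote sites stay behind the BGW.
    """
    me = topology[device]
    peers = set()
    for name, info in topology.items():
        if name == device:
            continue
        if info["site"] == me["site"]:
            peers.add(name)  # local LEAFs and local BGWs
        elif me["role"] == "bgw" and info["role"] == "bgw":
            peers.add(name)  # remote BGWs are visible only from a BGW
    return peers


topology = {
    "DC1-LEAF1": {"site": 1, "role": "leaf"},
    "DC1-LEAF2": {"site": 1, "role": "leaf"},
    "DC1-BGW":   {"site": 1, "role": "bgw"},
    "DC2-LEAF1": {"site": 2, "role": "leaf"},
    "DC2-BGW":   {"site": 2, "role": "bgw"},
}
print(sorted(expected_nve_peers("DC1-BGW", topology)))    # ['DC1-LEAF1', 'DC1-LEAF2', 'DC2-BGW']
print(sorted(expected_nve_peers("DC1-LEAF1", topology)))  # ['DC1-BGW', 'DC1-LEAF2']
```

Subtracting the peers actually listed by the switch from this set gives the tunnels that failed to form.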
Troubleshooting
BGP EVPN
sh bgp l2vpn evpn sum
This verifies that all BGP EVPN sessions are established.
If a connection is not established, follow these troubleshooting steps:
sh cdp nei
sh ip int br
This helps identify which device each interface is connected to, making it easier to spot if a wrong network or IP is assigned to an interface.
sh run bgp
Check that the correct interface is used as the session source (update-source loX) and that the session points at the correct neighbor (neighbor x.x.x.x).
Use the command sh bgp l2vpn evpn to verify EVPN routes.
- The key focus is Route Type 3, which is used to create tunnels between VTEPs.
- If a Route Type 3 route is present, the corresponding VTEP should also appear in the output of sh nve peer.
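This cross-check is a set comparison. The Python sketch assumes you have already pulled two lists out of the CLI output, by hand or with your own parser. The first is the originator IPs of the Route Type 3 entries from sh bgp l2vpn evpn. The second is the peer IPs listed by sh nve peer. The addresses are hypothetical. Only one direction is checked, because NVE peers can also be learned from other route types.

```python
import ipaddress


def vteps_without_tunnel(type3_originators, nve_peer_ips, local_vtep):
    """Remote VTEPs that advertise a Route Type 3 but are missing from 'sh nve peer'."""
    remote = set(type3_originators) - {local_vtep}
    missing = remote - set(nve_peer_ips)
    return sorted(missing, key=ipaddress.ip_address)  # numeric, not string, order


type3 = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "100.1.1.1"]
nve_peers = ["10.0.0.2", "100.1.1.1"]
print(vteps_without_tunnel(type3, nve_peers, local_vtep="10.0.0.1"))  # ['10.0.0.3']
```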
The route table typically contains many entries, so concentrate on the section that starts with Route Distinguisher: xxxxx (L2VNI yyyy).
...
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.0.1:10 (L2VNI 10010)
...
We can also filter the output by VNI using the following command:
sh bgp l2vpn evpn vni-id 10010
If routes are not propagating within the fabric, verify that route-reflector-client is configured, typically on the SPINE.
Alternatively:
sh bgp l2vpn evpn x.x.x.x
This can confirm if the route has been advertised to the peer.
If the route has been advertised but the receiver does not have it, check for a mismatch in:
- VNI
- VRF
- route-target (often the route-target configuration was simply forgotten)
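The three candidates can be checked side by side with a small Python sketch. The dictionaries stand in for values read from sh run on the advertising and receiving switches. Their field names and values are hypothetical. A route is imported only if at least one of its route targets matches an import route target on the receiver.

```python
def import_mismatches(advertised, receiver):
    """Compare what the sender attaches to a route with what the receiver is configured for.

    advertised: {"vni": int, "vrf": str, "route_targets": set of str}
    receiver:   {"vnis": set of int, "vrfs": set of str, "import_rts": set of str}
    """
    problems = []
    if advertised["vni"] not in receiver["vnis"]:
        problems.append(f"VNI {advertised['vni']} is not configured on the receiver")
    if advertised["vrf"] not in receiver["vrfs"]:
        problems.append(f"VRF {advertised['vrf']} is not configured on the receiver")
    if not advertised["route_targets"] & receiver["import_rts"]:
        problems.append("no route-target in common: the route is not imported")
    return problems


advertised = {"vni": 10100, "vrf": "VXLAN", "route_targets": {"65000:10100"}}
receiver = {"vnis": {10010, 10020, 10100}, "vrfs": {"VXLAN"}, "import_rts": set()}
print(import_mismatches(advertised, receiver))
# ['no route-target in common: the route is not imported']
```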
IP reachability
Use ping to test reachability to every relevant IP and verify that the IGP is working properly.
To debug OSPF:
sh ip ospf int br
This helps verify which interfaces are being advertised in OSPF.
sh ip ospf nei
This is useful for identifying if any OSPF peers are missing.
NVE Interface
If the NVE interface is down, the detail view usually shows the reason. Check the status with:
sh nve interface nve1 detail
Common causes include:
- No source-interface assigned.
- Links are down on the dci-tracking or fabric-tracking interfaces.
- The interface is still shut down because no shutdown was never configured explicitly.
- A conflicting vPC setting.
VNI
sh nve vni
Codes: CP - Control Plane        DP - Data Plane
       UC - Unconfigured         SA - Suppress ARP
       SU - Suppress Unknown Unicast
       Xconn - Crossconnect
       MS-IR - Multisite Ingress Replication
Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    n/a               Up    CP   L2 [10]            SA
nve1      10020    n/a               Up    CP   L2 [20]            SA
nve1      10100    n/a               Up    CP   L3 [VXLAN]
- For L2VNI: Ensure the VNI is properly assigned to the corresponding VLAN.
- For L3VNI: Ensure the VNI is correctly associated with the appropriate VRF.
If some VNIs are missing, the usual cause is that the VNI was never associated with the NVE interface (member vni).
Forgot to assign the VNI to a VLAN: Type [BD/VRF] will not show the related VLAN, such as [10] for 10010.
Forgot to assign the VNI to a VRF: Type [BD/VRF] will not show the related VRF, such as [VXLAN] for 10100.
Forgot to configure the route-target: the VNI might appear as Up but will not work correctly.
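These checks can be scripted against saved sh nve vni output. The Python sketch below parses the rows shown above and reports three kinds of problem: expected VNIs that are missing, VNIs without a VLAN or VRF in Type [BD/VRF], and VNIs that are not Up. The exact way NX-OS renders a missing binding is an assumption. The parser treats an absent or empty bracket as "not bound". The route-target case cannot be seen in this output and still needs the BGP checks.

```python
import re

ROW = re.compile(
    r"^(?P<intf>nve\d+)\s+(?P<vni>\d+)\s+(?P<mcast>\S+)\s+(?P<state>\S+)\s+"
    r"(?P<mode>\S+)\s+(?P<type>L2|L3)\s*(?:\[(?P<binding>[^\]]*)\])?\s*(?P<flags>.*)$"
)


def check_nve_vni(output, expected_vnis):
    """Flag problems in 'sh nve vni' output: missing, unbound, or non-Up VNIs."""
    seen = {}
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m:
            seen[int(m["vni"])] = m
    problems = []
    for vni in sorted(expected_vnis - set(seen)):
        problems.append(f"{vni}: missing (not a member vni under nve1?)")
    for vni, m in sorted(seen.items()):
        if not (m["binding"] or "").strip():
            what = "VLAN" if m["type"] == "L2" else "VRF"
            problems.append(f"{vni}: no {what} shown in Type [BD/VRF]")
        if m["state"] != "Up":
            problems.append(f"{vni}: state is {m['state']}")
    return problems


SAMPLE = """\
Interface VNI Multicast-group State Mode Type [BD/VRF] Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1 10010 n/a Up CP L2 [10] SA
nve1 10020 n/a Up CP L2 [20] SA
nve1 10100 n/a Up CP L3 [VXLAN]
"""
print(check_nve_vni(SAMPLE, {10010, 10020, 10100}))  # []
```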