
221 - EVPN VXLAN Multi-Site 1

ccie notes · Hugo, DevOps Engineer in London

First, I followed the lab from Nick Carlton’s blog, but I had trouble getting the multi-site setup to work.

Then, I found this article on LinkedIn that is detailed and easy to follow: Deploying Cisco VXLAN EVPN Multi-Site.

Both articles set up Border Gateway switches (BGWs) on a spine-and-leaf architecture in two DCs. However, the BGWs in the first article connect directly to each other, while those in the second article connect through an additional network.

The second setup helped me understand multi-site setups better.

Before proceeding with the multi-site setup, there are a few concepts I need to clarify.

IPv4 BGP is not required
#

  • VTEP/NVE serves as a tunnel interface.
  • BGP functions as the control plane.
  • The VTEP IP is advertised using BGP EVPN route type 3 across the fabric (SPINE & LEAF).
  • LEAF nodes utilize this information to establish tunnels between VTEPs.
  • For symmetric IRB, L3 routes are advertised using BGP EVPN route type 5, which is not classified as an IPv4 route.
  • Consequently, BGP IPv4 connections are not necessary within the fabric.

Common LEAF BGP Config:

router bgp 65000
  template peer SPINE
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      send-community extended
    address-family l2vpn evpn
      send-community extended
  neighbor 1.1.1.1
    inherit peer SPINE

It is indeed possible to configure BGP without establishing an IPv4 unicast peer relationship between LEAF and SPINE. While many tutorials include the IPv4 unicast address family in the BGP configuration, it is not a requirement for the setup.

router bgp 65000
  template peer SPINE
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
  neighbor 1.1.1.1
    inherit peer SPINE

We expected that the command:

sh bgp ipv4 unicast sum

would return empty, indicating that no IPv4 unicast peering exists between LEAF and SPINE.

Conversely, the command:

sh bgp l2vpn evpn sum

should show the peer relationship with SPINE/LEAF.

In the lab, I initially set up IPv4 BGP across the fabric. While configuring the BGW, I also advertised BGW routes into the IPv4 BGP. This led to the BGW routes propagating throughout the fabric, even though the fabric already had reachability through IGP (OSPF).

As a result, NVE peers at the remote site experienced fluctuations, with connections going up and down, and BGW routes appearing and disappearing intermittently.

This behavior suggests a potential looping or race condition within the fabric, likely due to overlapping routing protocols or unintended route propagation.

VTEP Reachability is Crucial
#

In VXLAN, the primary focus is ensuring that VTEP endpoints are reachable:

  • Within the site: Utilize IGP (e.g., OSPF).
  • For remote sites: Implement BGP with IPv4 unicast.
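For reference, a minimal per-switch underlay sketch that makes the VTEP loopback reachable inside the site. The OSPF process ID, area, and addresses are assumed values in the style of this lab, not taken from it:

```
feature ospf
router ospf 1
interface loopback0
  ip address 1.1.1.1/32
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
interface Ethernet1/1
  no switchport
  ip address 10.1.1.1/30
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  no shutdown
```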

In the first lab:

  • Site A’s BGW is connected to Site B’s BGW.
  • BGP sessions are established for both IPv4 and EVPN.

In the second lab:

  • A DCI-CORE switch is positioned between the sites (a third AS).
  • The DCI-CORE switch should not be aware of EVPN; it should only manage IPv4 routes to facilitate the exchange of BGW VTEP and BGW IPs between sites.

It turns out that BGW reachability and EVPN peering are distinct concepts:

  • BGP IPv4 is utilized to maintain BGW reachability across sites.
  • Once the BGWs can reach each other, they can establish BGP EVPN connections directly with one another.

In the first lab, this distinction was not clear because the BGWs were directly connected.

Asymmetric IRB does not work on NX9Kv
#

  • In theory, asymmetric IRB operates similarly to symmetric IRB, but it does not utilize L3VNI.
  • The configuration of the VLAN SVI and anycast gateway is done in the same manner as with symmetric IRB.
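A minimal sketch of the asymmetric IRB leaf config that was tested. The VLAN/VNI numbers, VRF name, and addresses are assumed to match the rest of this lab; the key point is that there is no L3VNI and no associate-vrf member, and every leaf must carry every VLAN/VNI:

```
vlan 10
  vn-segment 10010
vlan 20
  vn-segment 10020
vrf context VXLAN
fabric forwarding anycast-gateway-mac aaaa.aaaa.aaaa
interface Vlan10
  no shutdown
  vrf member VXLAN
  ip address 172.30.10.254/24
  fabric forwarding mode anycast-gateway
interface Vlan20
  no shutdown
  vrf member VXLAN
  ip address 172.30.20.254/24
  fabric forwarding mode anycast-gateway
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp
```

The only difference from the symmetric config is the absence of the L3VNI (no vlan 100 / vni 10100 / member vni 10100 associate-vrf).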

Expected Behavior:

  • Traffic directed to the leaf switch should be routed locally to the appropriate VNI and then switched to the destination VNI.
  • For instance, traffic from VNI 10010 to VNI 10020 should be routed locally into the destination VLAN and leave encapsulated with VNI 10020.

Issue:

  • In Wireshark, ARP requests are visible, but they indicate VNI 10010. The VLAN SVI is not functioning as intended, and the incorrect VNI is hindering ARP responses.

Debugging: I examined sh bgp l2vpn evpn, and the routes contain the necessary information:

  • Route type 3 for the VTEP.
  • Route type 2 for the MAC-VRF table.
  • The VTEP displays both the destination and source in the IP ARP table.

The Missing Element:

  • ARP resolution is not occurring correctly, resulting in traffic not being mapped to the appropriate VNI.

NX9Kv Bugs
#

Interface NVE Not Recognized
#

Occasionally, the int nve1 command may not be recognized, even with the NV Overlay feature enabled. The NVE interface configuration may disappear, indicating a malfunction. Restarting the switch sometimes resolves the issue, but it can be time-consuming.

Disabling and re-enabling the nv overlay feature can temporarily fix the issue but may result in some configuration loss.

A quicker workaround is:

no feature nv overlay
copy start run

This method is faster than rebooting or full reconfiguration.

Extended Boot Time & Frequent Reboots
#

The switch experiences extended boot times and frequent reboots, often triggered by recurring “kernel panic” messages due to memory depletion. Successful boots only occur when enough memory is available.

Upgrading the RAM from 8GB to 10GB significantly improves boot time and prevents further reboots.

L3VNI Configuration Problem
#

When configuring L3VNI as follows:

vrf context XXXX
  vni 10200 l3

This results in the interface module crashing, leading to the disappearance of all interfaces.

To resolve the issue, configure the VNI without the l3 keyword and then restart the switch.
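The working form, matching the L3VNI config used later in this lab, simply drops the l3 keyword:

```
vrf context XXXX
  vni 10200
```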

Multi-Site Setup
#

Border Gateway Devices (BGWs) are deployed at each site, with a DCI-CORE connecting the two data centers, each built on a standard spine-and-leaf architecture.

The BGW configuration resembles that of a Leaf device but includes additional settings to support multi-site operations.

The BGW must be configured with all L2VNIs that are extended to other sites, as well as L3VNIs if L3 route exchange is required.

The following is the DC1-BGW setup:

Basic Config
#

Enable Features

feature ospf
feature bgp
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
feature fabric forwarding
nv overlay evpn

Create L2VNI

evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10020 l2
    rd auto
    route-target import auto
    route-target export auto

Create VLAN and Associate to L2VNI

vlan 10
  vn-segment 10010
vlan 20
  vn-segment 10020

Config SVI with Anycast Gateway

vrf context VXLAN
fabric forwarding anycast-gateway-mac aaaa.aaaa.aaaa
interface Vlan10
  no shutdown
  vrf member VXLAN
  ip address 172.30.10.254/24
  fabric forwarding mode anycast-gateway
interface Vlan20
  no shutdown
  vrf member VXLAN
  ip address 172.30.20.254/24
  fabric forwarding mode anycast-gateway

Config NVE (VTEP)

interface loopback0
  ip address 1.1.1.1/32
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp

Establish BGP EVPN connection with the SPINE

router bgp 65001
  template peer Intra-DC
    remote-as 65001
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
  neighbor 1.1.1.1
    inherit peer Intra-DC

Config L3VNI

vlan 100
  vn-segment 10100
 vrf context VXLAN
+  vni 10100
+  rd auto
+  address-family ipv4 unicast
+    route-target both auto
+    route-target both auto evpn
interface Vlan100
  no shutdown
  vrf member VXLAN
  ip forward
 interface nve1
   no shutdown
   host-reachability protocol bgp
   source-interface loopback0
   member vni 10010
     ingress-replication protocol bgp
   member vni 10020
     ingress-replication protocol bgp
+  member vni 10100 associate-vrf

BGW-Specific Config
#

Enable Multi-Site Gateway

We assign a border gateway site ID to each BGW: "1" represents site 1 and "2" represents site 2. Any number can be used, as long as all BGWs within the same site share the same ID and each site uses a different one.

The delay-restore timer defines how long the BGW waits after recovering from a failure before restoring its advertisements. A short 30-second delay is applied here for faster recovery, but this is intended solely for lab testing.

Site1

evpn multisite border-gateway 1
  delay-restore time 30

Site2

evpn multisite border-gateway 2
  delay-restore time 30

Multisite tracking

We use dci-tracking for the interface that connects to external fabrics, such as the DCI-CORE or a remote BGW.

 interface Ethernet1/1
   no switchport
   ip address 10.11.12.11/24 tag 1
   no shutdown
+  evpn multisite dci-tracking

We use fabric-tracking for the interface that connects to the fabric, such as SPINE or a local BGW.

 interface Ethernet1/2
   no switchport
   ip address 10.1.11.11/24
   ip ospf network point-to-point
   ip router ospf 1 area 0.0.0.0
   no shutdown
+  evpn multisite fabric-tracking

Config BGW on VTEP

  • Create and assign a loopback interface for the border gateway.
  • Enable multisite ingress replication for all VNIs (both L2 and L3).

interface loopback100
  ip address 100.1.1.1/32 tag 1
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
 interface nve1
   no shutdown
   host-reachability protocol bgp
   source-interface loopback0
+  multisite border-gateway interface loopback100
   member vni 10010
+    multisite ingress-replication
     ingress-replication protocol bgp
   member vni 10020
+    multisite ingress-replication
     ingress-replication protocol bgp
   member vni 10100 associate-vrf
+    multisite ingress-replication

BGW Reachability to remote site

In the lab, loopback0 serves as the VTEP IP, while loopback100 is the BGW IP, which is configured on the NVE interface.

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0 <<
  multisite border-gateway interface loopback100 <<
  member vni 10010
    multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10020
    multisite ingress-replication
    ingress-replication protocol bgp
  member vni 10100 associate-vrf
    multisite ingress-replication

Ethernet1/1 is the interface that connects to the external fabric (DCI-CORE for this lab).

interface Ethernet1/1 <<
  no switchport
  ip address 10.11.13.11/24
  no shutdown
  evpn multisite dci-tracking

First, we will tag these interfaces.

BGW -> DCI-CORE

 interface Ethernet1/1
   no switchport
-  ip address 10.11.13.11/24
+  ip address 10.11.13.11/24 tag 1
   no shutdown
   evpn multisite dci-tracking

VTEP Loopback

 interface loopback0
-  ip address 11.11.11.11/32
+  ip address 11.11.11.11/32 tag 1
   ip ospf network point-to-point
   ip router ospf 1 area 0.0.0.0

BGW Loopback

 interface loopback100
-  ip address 100.1.1.1/32
+  ip address 100.1.1.1/32 tag 1
   ip ospf network point-to-point
   ip router ospf 1 area 0.0.0.0

If there are multiple BGWs at a site, they should all be configured with the same BGW loopback IP (100.1.1.1/32).

Second, we use a route-map to redistribute all tagged networks and establish a BGP IPv4 connection between the BGW and the DCI-CORE.

route-map VXLAN permit 10
  match tag 1
router bgp 65000
  address-family ipv4 unicast
    redistribute direct route-map VXLAN
  neighbor 10.11.13.13 << this is DCI-CORE
    remote-as 64999
    update-source Ethernet1/1
    address-family ipv4 unicast

BGP EVPN connections between BGWs

  • peer-type fabric-external for BGW -> BGW peering
  • rewrite-evpn-rt-asn to rewrite the route-target ASN to the local ASN (65000 -> 65001)

 router bgp 65000
   template peer Inter-DC
     remote-as 65001
     update-source loopback0
+    peer-type fabric-external
     address-family l2vpn evpn
       send-community
       send-community extended
+      rewrite-evpn-rt-asn
   neighbor 12.12.12.12 << this is Remote BGW
     inherit peer Inter-DC

Expected Outputs
#

Running the command:

sh nve peer

NVE tunnels should form with the following peers:

At the BGW:

  • LEAFs
  • BGWs (both local and remote)

At the LEAF:

  • LEAFs
  • Local BGW

Troubleshooting
#

BGP EVPN
#

sh bgp l2vpn evpn sum

This ensures all BGP EVPN connections are established.

If a connection is not established, follow these troubleshooting steps:

sh cdp nei
sh ip int br

This helps identify which device each interface is connected to, making it easier to spot if a wrong network or IP is assigned to an interface.

sh run bgp

Check whether the correct interface is used for the connection (update-source loX) and whether it peers with the correct neighbor (neighbor x.x.x.x).

Use the command sh bgp l2vpn evpn to verify EVPN routes.

  • The key focus is to check for Route Type 3, which is used to create tunnels between VTEPs.
  • If Route Type 3 is present, it should be reflected in the output of sh nve peer.

The route table typically contains many entries; concentrate on the section that includes Route Distinguisher: xxxxx (L2VNI yyyy).

...
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.0.1:10 (L2VNI 10010)
...

We can also filter the output by VNI using the following command:

sh bgp l2vpn evpn vni-id 10010

If routes are not propagating within the fabric, verify that route-reflector-client is configured; it is typically set on the SPINE.
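On the SPINE, that typically looks like the following sketch. The AS number, template name, and neighbor IP are placeholders, not values from this lab:

```
router bgp 65000
  template peer LEAF
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client
  neighbor 2.2.2.2
    inherit peer LEAF
```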

Alternatively:

sh bgp l2vpn evpn x.x.x.x

This can confirm if the route has been advertised to the peer.

If the route has been advertised but the receiver side does not have it, potential issues could be a mismatch in:

  • VNI
  • VRF
  • route-target << most likely the route-target was not configured

IP reachability
#

Use ping to test reachability for all IPs, and verify that the IGP is functioning properly.

To debug OSPF:

sh ip ospf int br

This helps verify which interfaces are being advertised in OSPF.

sh ip ospf nei

This is useful for identifying if any OSPF peers are missing.

NVE Interface
#

If the NVE interface is “down”, the detailed view usually shows the reason. Check the status with:

sh nve interface nve1 detail

Common reasons for this issue include:

  • No source-interface assigned.
  • Links down on the dci-tracking or fabric-tracking interfaces.
  • The interface is still shut down (no shutdown was not configured explicitly).
  • Conflicting vPC settings.

VNI
#

sh nve vni
Codes: CP - Control Plane        DP - Data Plane
       UC - Unconfigured         SA - Suppress ARP
       SU - Suppress Unknown Unicast
       Xconn - Crossconnect
       MS-IR - Multisite Ingress Replication

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    n/a               Up    CP   L2 [10]            SA
nve1      10020    n/a               Up    CP   L2 [20]            SA
nve1      10100    n/a               Up    CP   L3 [VXLAN]

  • For L2VNI: Ensure the VNI is properly assigned to the corresponding VLAN.
  • For L3VNI: Ensure the VNI is correctly associated with the appropriate VRF.

If some VNIs are missing, it’s usually due to forgetting to associate the VNI at the NVE.

Forgot to assign the VNI to a VLAN: Type [BD/VRF] will not show the related VLAN, such as [10] for 10010.

Forgot to assign the VNI to a VRF: Type [BD/VRF] will not show the related VRF, such as [VXLAN] for 10100.

Forgot to config the route-target: The VNI might appear as up but will not function correctly.