Welcome to Part 2 of the blog series on NSX GRE tunnels, where we discuss GRE tunnel failure detection. This is a continuation of the previous article, where we implemented GRE tunnels and configured BGP over GRE, and I highly recommend checking out Part 1 before continuing.
Part 1 – BGP over GRE:
https://vxplanet.com/2024/07/19/nsx-gre-tunnels-part-1-bgp-over-gre/
GRE tunnels are stateless in nature. This means that a GRE tunnel endpoint doesn’t maintain any information about the state of the remote endpoint unless a keepalive mechanism is in place. Remember, when we built the GRE tunnels between Zone A and Zone B in the previous post, we didn’t enable GRE keepalives on the tunnels. That means the GRE tunnel stays up even when an edge node in either zone fails, resulting in traffic blackholing.
In this article, we will try out the options below for failure detection on the GRE tunnels, review the observations and decide which option to implement for the fictitious customer Corp-XYZ.
- Routing protocol (BGP) timers based failure detection
- Routing protocol (BGP) with GRE keep-alive based failure detection
- Routing protocol (BGP) with BFD based failure detection
Let’s get started:
Routing protocol (BGP) timers based failure detection
We used BGP as the routing protocol over GRE for dynamic route advertisement between Zone A and Zone B. BGP keepalives are sent between the edge nodes across the GRE tunnel every 60 seconds (default). The hold down timer is negotiated to 3 x the keepalive timer, which is 180 seconds. This means that if an edge node goes down, the remote edge node (in the other zone) detects the failure only when the hold down timer expires, at which point BGP flushes the routes learned from the failed peer from the BGP table. However, traffic over the GRE tunnel can be blackholed from the time the edge node fails until the hold down timer expires.
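The timer arithmetic above can be sketched in a few lines of Python. This is only a rough model, not NSX behaviour verified in code: the function name `bgp_blackhole_window` is hypothetical, and it assumes the hold down timer restarts on every received keepalive and ignores keepalive jitter.

```python
from __future__ import annotations


def bgp_blackhole_window(keepalive_s: int = 60, hold_multiplier: int = 3) -> tuple[int, int]:
    """Return (shortest, longest) seconds between a peer failure and hold timer expiry.

    Assumption: the hold timer restarts whenever a keepalive is received,
    and keepalives arrive exactly every `keepalive_s` seconds (no jitter).
    """
    hold_s = keepalive_s * hold_multiplier
    # Longest wait: the peer fails right after its keepalive was received,
    # so the full hold timer still has to run down.
    longest = hold_s
    # Shortest wait: the peer fails just before its next keepalive was due,
    # so one keepalive interval of the hold timer has already elapsed.
    shortest = hold_s - keepalive_s
    return shortest, longest


shortest, longest = bgp_blackhole_window()
print(f"hold timer: {longest}s, blackhole window: about {shortest}s to {longest}s")
```

With the default timers from this article, the model gives a blackhole window of roughly 120 to 180 seconds. Passing smaller values (for example `bgp_blackhole_window(10)`) shows how tuning the timers shortens the window without removing it.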
Let’s test this:
We are on edge01 in Zone A, and we have BGP neighborship with edge03 and edge04 in Zone B over GRE tunnels.
Let’s simulate a failure of edge04 in Zone B.
Because the GRE tunnels are stateless, we still see them as up.
As soon as the BGP hold down timer expires, the neighbor adjacency to edge04 is lost.
At this moment, the BGP routes learned from edge04 are flushed, and the forwarding table is updated. We see that the remote testing networks in Zone B are no longer reachable through edge04 but have a redundant path through edge03.
Cons : We need to wait for the BGP hold down timer to expire, which is 180 seconds by default in NSX. This wait time can lead to traffic blackholing on the tunnel. Although the timers are tunable, this approach still results in slower convergence.
Routing protocol (BGP) with GRE keep-alive based failure detection
Now let’s enable GRE keepalives on the GRE tunnels that we created. The default GRE keepalive timer is 10 seconds (tunable) with a dead timer of 30 seconds. This means that if a GRE tunnel endpoint doesn’t receive a keepalive acknowledgement from its peer within the dead timer interval, the tunnel is marked down.
The tunnel keepalive statistics show information about the keepalives that are sent to and received from the endpoints.
We are on edge01 in Zone A, and we have BGP neighborship with edge03 and edge04 in Zone B over GRE tunnels.
Let’s simulate a failure of edge04 in Zone B.
As the GRE dead timer expires (30 seconds), we see that the GRE tunnels from edge01 (Zone A) and edge02 (Zone A) to edge04 (Zone B) go down.
We are now 60 seconds past the failure of edge04, but we still see that the BGP neighbor status is UP. This means that the tunnel status has not signaled BGP, and no BGP notifications have been triggered regarding a link state failure. This is because GRE tunnels are just logical links.
As the BGP hold down timer expires (180 seconds), we see that the neighbor adjacency with edge04 is brought down and the BGP table is updated.
At this moment, the BGP routes learned from edge04 are flushed, and the forwarding table is updated. We see that the remote testing networks in Zone B are no longer reachable through edge04 but have a redundant path through edge03.
Cons : We saw that GRE keepalives did not signal BGP about the GRE tunnel failure, and we still needed to wait for the BGP hold down timer to expire (180 seconds). Similar to the previous scenario, this wait time can lead to traffic blackholing on the tunnel.
Routing protocol (BGP) with BFD based failure detection
Now let’s switch off the GRE keepalives and enable BFD (Bidirectional Forwarding Detection) on the BGP neighbor configuration. This needs to be performed on all the BGP neighbor statements in both zones.
By default, the BFD keepalives are sent every 500ms with a dead interval of 1500ms. That means, whenever there is a failure on an edge node, BFD should detect the failure and signal BGP as soon as the BFD dead timer expires (1500ms).
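As a rough illustration of the setting described above, here is a sketch of what a BFD-enabled BGP neighbor body might look like, together with the resulting detection time. The field names (`bfd`, `enabled`, `interval`, `multiple`) and the path in the comment are assumptions based on the NSX Policy API and should be checked against the API reference for your NSX version. The helper names and the neighbor address are hypothetical, and the code only builds a dictionary without sending any request.

```python
import json


def bfd_neighbor_body(neighbor_address: str, interval_ms: int = 500, multiple: int = 3) -> dict:
    """Build a hypothetical BGP neighbor body with BFD enabled.

    Assumed target (verify before use):
    PATCH /policy/api/v1/infra/tier-0s/<tier0-id>/locale-services/<ls-id>/bgp/neighbors/<neighbor-id>
    A real request may need further fields (for example the remote AS number).
    """
    return {
        "neighbor_address": neighbor_address,
        "bfd": {
            "enabled": True,
            "interval": interval_ms,  # BFD transmit interval in milliseconds
            "multiple": multiple,     # missed packets before the session is declared down
        },
    }


def bfd_detection_time_ms(body: dict) -> int:
    """Dead interval = transmit interval x multiplier."""
    bfd = body["bfd"]
    return bfd["interval"] * bfd["multiple"]


body = bfd_neighbor_body("192.0.2.1")  # placeholder address from the documentation range
print(json.dumps(body, indent=2))
print(f"BFD detection time: {bfd_detection_time_ms(body)} ms")
```

With an interval of 500ms and a multiplier of 3, the computed detection time matches the 1500ms dead interval mentioned above.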
We are currently on edge01 in Zone A, and we have BGP neighborship with edge03 and edge04 in Zone B over GRE tunnels.
We also have the BFD sessions from edge01 (Zone A) to edge04 (Zone B) established successfully.
Now let’s simulate a failure of edge04 in Zone B. We see that the BFD session to edge04 went down as soon as the dead timer expired.
BFD has signaled BGP about the failure, and the neighbor adjacency with edge04 is brought down.
At this moment, the BGP routes learned from edge04 are flushed, and the forwarding table is updated. We see that the remote testing networks in Zone B are no longer reachable through edge04 but have a redundant path through edge03.
Pros : We observed quick failure detection (in less than 2 seconds), and the routing protocol was signaled to flush out the failed paths and update the forwarding table.
Conclusion
Based on our findings across the three scenarios, we recommend using BFD with BGP over GRE for the GRE implementation that we did for the fictitious customer Corp-XYZ. We also decided to keep the GRE keepalives disabled, as we don’t require multiple heartbeat mechanisms over the GRE tunnel.
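To put the three scenarios side by side, here is a small Python model of the observations. It is a simplification built on this article's default timers: the function `convergence` and the constants are hypothetical names, and the model assumes that only the BGP hold down timer or BFD can bring the BGP session down, while GRE keepalives affect the tunnel state alone.

```python
from typing import Optional

BGP_HOLD_S = 180.0   # default BGP hold down timer
GRE_DEAD_S = 30.0    # default GRE keepalive dead timer
BFD_DETECT_S = 1.5   # default BFD dead interval (500ms x 3)


def convergence(gre_keepalive: bool, bfd: bool) -> dict:
    """Return when the tunnel and the BGP session go down after a peer failure."""
    # Without GRE keepalives the tunnel is stateless and never goes down.
    tunnel_down_s: Optional[float] = GRE_DEAD_S if gre_keepalive else None
    # Tunnel state does not signal BGP, so only BFD shortens BGP detection.
    bgp_down_s = BFD_DETECT_S if bfd else BGP_HOLD_S
    return {"tunnel_down_s": tunnel_down_s, "bgp_down_s": bgp_down_s}


scenarios = {
    "BGP timers only": convergence(gre_keepalive=False, bfd=False),
    "BGP + GRE keepalive": convergence(gre_keepalive=True, bfd=False),
    "BGP + BFD": convergence(gre_keepalive=False, bfd=True),
}

for name, result in scenarios.items():
    if result["tunnel_down_s"] is None:
        tunnel = "stays up"
    else:
        tunnel = f"down after {result['tunnel_down_s']:.0f}s"
    print(f"{name:<20} tunnel {tunnel:<15} routes flushed within {result['bgp_down_s']}s")
```

The model makes the trade-off explicit: GRE keepalives change what the tunnel status shows, but only BFD changes how quickly the failed paths leave the forwarding table.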
Now that’s a wrap!!!
I hope this two-part blog series on GRE tunnels with a fictitious customer use case gave you a good understanding of the design and configuration needed to start implementing your own use cases. Feel free to reach out via comments or email if you have any questions.
See you soon with a new NSX topic!!!
I hope the article was informative. Thanks for reading.