There are a number of terms used in today’s high-availability networks, such as NSF (Non-Stop Forwarding), GR (Graceful Restart) and NSR (Non-Stop Routing). Companies these days want 99.5% availability of their networks, and these high-availability features play a vital role in that. But have you ever wondered what the difference between all these terms is? Adding to the confusion, different vendors use the terms differently.
Let’s try to understand what these terms basically mean and whether there is any commonality between the terms used by different vendors. We will compare Cisco and Juniper here.
Modern high-performance routers physically separate the forwarding plane and the control plane, and each has its own memory and processors. The control plane runs the routing protocols and derives a forwarding table (FIB). The FIB is handed to the forwarding plane, which is then responsible for actually forwarding packets through the router. The advantage of this physical separation is that when the router is congested, i.e. a huge amount of traffic is flowing through it, the forwarding plane becomes very busy but this does not affect the control plane’s ability to process new routing information. Similarly, if the control plane becomes clogged due to route flapping or any other issue, the forwarding plane can continue forwarding packets, because it has a copy of the FIB that it previously received from the control plane. This is called Non-Stop Forwarding (NSF).
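To make the idea of "the forwarding plane has its own copy of the FIB" concrete, here is a minimal Python sketch. It is only a toy model, not how any vendor implements it: the class names `ControlPlane` and `ForwardingPlane`, the interface name and the exact-match lookup are all assumptions made for illustration (a real FIB does longest-prefix matching in hardware).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ForwardingPlane:
    """Toy forwarding plane: forwards using its own private copy of the FIB."""
    fib: Dict[str, str] = field(default_factory=dict)

    def install_fib(self, fib: Dict[str, str]) -> None:
        # Store a copy, so later control-plane trouble cannot touch it.
        self.fib = dict(fib)

    def forward(self, prefix: str) -> Optional[str]:
        # Simplification: exact match on the prefix string, no longest-prefix match.
        return self.fib.get(prefix)


@dataclass
class ControlPlane:
    """Toy control plane: learns routes and hands a FIB to the forwarding plane."""
    routes: Dict[str, str] = field(default_factory=dict)
    alive: bool = True

    def learn(self, prefix: str, next_hop: str) -> None:
        if not self.alive:
            raise RuntimeError("control plane is down, cannot process routing updates")
        self.routes[prefix] = next_hop

    def push_fib(self, forwarding_plane: ForwardingPlane) -> None:
        forwarding_plane.install_fib(self.routes)


# Usage: the control plane fails, yet forwarding continues on the last FIB.
cp = ControlPlane()
fp = ForwardingPlane()
cp.learn("10.0.0.0/24", "ge-0/0/1")    # hypothetical prefix and interface
cp.push_fib(fp)
cp.alive = False                       # control plane crashes or is overloaded
next_hop = fp.forward("10.0.0.0/24")   # still "ge-0/0/1"
```

Notice that the forwarding plane never asks the control plane anything at forwarding time. That is also why the FIB can go stale: nothing refreshes it until a control plane is back.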
Now you might be thinking that this is not a good architecture, because the router keeps forwarding on a path that may be invalid or no longer optimal; there might be a better path somewhere that the router is not using. So why do I need NSF?
Well, you need NSF so that routers can use redundant control planes. Cisco calls its control planes Route Processors and Juniper calls them Routing Engines. With two route processors or routing engines, the router can switch from the primary to the backup control plane without disrupting forwarding. The FIB could still become invalid during the period between the primary control plane going down and the backup control plane taking over, but this is acceptable for the time being 😉
So the problem now is how to make this switchover from the primary to the backup control plane shorter, so that the FIB is less likely to hold invalid information. Routers do this by maintaining a copy of the active configuration on the backup processor/routing engine as well. Cisco calls this process Stateful Switchover (SSO) and Juniper calls it Graceful Routing Engine Switchover (GRES). 🙂
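Here is a small Python sketch of that idea: every configuration change made on the active control plane is replicated to the standby, so the standby can take over without first having to load anything. Treat it as an illustration under assumptions, not as the real SSO or GRES mechanism; the names `ControlPlaneUnit`, `RedundantRouter`, `RE0` and `RE1` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ControlPlaneUnit:
    """One route processor / routing engine (hypothetical model)."""
    name: str
    config: Dict[str, str] = field(default_factory=dict)


class RedundantRouter:
    """Router with an active and a standby control plane."""

    def __init__(self) -> None:
        self.active = ControlPlaneUnit("RE0")
        self.standby = ControlPlaneUnit("RE1")

    def commit(self, key: str, value: str) -> None:
        # Apply the change on the active unit and replicate it right away,
        # so the standby always holds the current configuration.
        self.active.config[key] = value
        self.standby.config[key] = value

    def switchover(self) -> ControlPlaneUnit:
        # The standby is already in sync, so promotion is just a role swap.
        self.active, self.standby = self.standby, self.active
        return self.active


# Usage
router = RedundantRouter()
router.commit("hostname", "edge1")
new_active = router.switchover()
# new_active.name is "RE1" and it already carries {"hostname": "edge1"}
```

The design point is that the synchronization cost is paid continuously, at commit time, rather than at failure time when every second counts.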
So what is Non-Stop Routing (NSR) then?
OK, as I stated above, the control plane has Stateful Switchover at its disposal to decrease the switchover time. The problem, however, is that once the router does the switchover, all the routing protocol adjacencies (OSPF, LDP, IS-IS etc.) go down. When an adjacency goes down, the neighboring routers by principle inform their own neighbors of this mishap, and those routers in turn update their neighbors further down the chain. This whole process destabilizes the network and increases CPU load on all routers. The same happens again when the backup control plane comes up. So you guessed it right: the purpose of NSR is to minimize this instability.
Initially, to control this instability, the GR (Graceful Restart) principle was proposed: on a control plane switchover, the loss of the adjacency is not reported to the rest of the network immediately. Instead, the neighbors of the restarting router keep the adjacency and wait for a certain period of time (called the grace interval) while the router recovers, and this saves the network from the impact. However, for GR to work, all the neighbors must support it, which may not be the case everywhere, for example on small routers in enterprise networks. So NSR was proposed.
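One way to picture the grace interval is from the side of a GR-capable neighbor (often called a helper). The Python sketch below is a deliberately simplified model and not the actual OSPF, IS-IS or BGP procedure; the class name `HelperNeighbor`, its methods and the 120-second value are assumptions for illustration. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelperNeighbor:
    """Neighbor that holds an adjacency during the peer's grace interval."""
    grace_interval: float                 # seconds (illustrative value below)
    adjacency_up: bool = True
    restart_seen_at: Optional[float] = None

    def peer_restarting(self, now: float) -> None:
        # Start the grace timer but keep the adjacency and keep forwarding.
        self.restart_seen_at = now

    def peer_recovered(self, now: float) -> None:
        # Recovery inside the grace interval: nothing is announced to anyone.
        if (self.restart_seen_at is not None
                and now - self.restart_seen_at <= self.grace_interval):
            self.restart_seen_at = None

    def tick(self, now: float) -> bool:
        """Return True when the failure must finally be announced to the network."""
        if self.restart_seen_at is None:
            return False
        if now - self.restart_seen_at > self.grace_interval:
            self.adjacency_up = False
            self.restart_seen_at = None
            return True
        return False


# Usage: the peer comes back after 90 s, inside a 120 s grace interval.
helper = HelperNeighbor(grace_interval=120.0)
helper.peer_restarting(now=0.0)
announced_early = helper.tick(now=60.0)   # False, still waiting
helper.peer_recovered(now=90.0)
announced_late = helper.tick(now=200.0)   # False, adjacency never dropped
```

If the peer had not come back within the interval, `tick` would return True once and the usual failure handling would start. The sketch also shows the dependency that limits GR: all of this logic lives in the neighbor.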
In NSR, the router’s backup routing engine/processor also keeps the routing protocol state (OSPF, LDP, IS-IS etc.), and because this information is already on the backup, the switchover is transparent to neighbors. So why doesn’t this affect small routers? Because NSR is vendor-specific and internal to the router, so unlike GR, the neighboring router doesn’t have to support it.
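The difference between a switchover with and without NSR can be sketched in a few lines of Python. Again this is a toy model under assumptions: `RoutingEngine`, `Router`, the `nsr_enabled` flag and the "FULL" state label are illustrative names, not a vendor API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RoutingEngine:
    """Holds routing protocol adjacency state: neighbor address -> state."""
    adjacencies: Dict[str, str] = field(default_factory=dict)


class Router:
    def __init__(self, nsr_enabled: bool) -> None:
        self.nsr_enabled = nsr_enabled
        self.active = RoutingEngine()
        self.backup = RoutingEngine()

    def adjacency_established(self, neighbor: str) -> None:
        self.active.adjacencies[neighbor] = "FULL"
        if self.nsr_enabled:
            # NSR: mirror the protocol state to the backup as it changes.
            self.backup.adjacencies[neighbor] = "FULL"

    def switchover(self) -> None:
        # The old active engine is gone; the backup takes over with
        # whatever protocol state it already holds.
        self.active, self.backup = self.backup, RoutingEngine()

    def neighbor_sees_reset(self, neighbor: str) -> bool:
        # Without preserved state, the adjacency has to be rebuilt from scratch.
        return self.active.adjacencies.get(neighbor) != "FULL"


# Usage: same failure, two outcomes.
without_nsr = Router(nsr_enabled=False)
without_nsr.adjacency_established("10.0.0.2")
without_nsr.switchover()
reset_without_nsr = without_nsr.neighbor_sees_reset("10.0.0.2")   # True

with_nsr = Router(nsr_enabled=True)
with_nsr.adjacency_established("10.0.0.2")
with_nsr.switchover()
reset_with_nsr = with_nsr.neighbor_sees_reset("10.0.0.2")         # False
```

Nothing in this model runs on the neighbor, which is the whole point: the neighbor can be any router from any vendor, with or without GR support.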
Different vendors use all these terms differently. Juniper, for example, calls its graceful restart implementation Graceful Restart, whereas Cisco calls its implementation Non-Stop Forwarding Awareness. Also, people often consider Juniper’s GRES and GR to be the same thing; however, as you read above, they are two different things.
So, that’s all for NSF, GR and NSR. I hope you find this information useful and that I was able to lessen your confusion. If you still have any questions, please let me know. 🙂