MC-LAG (Multi-Chassis LAG)

Today’s networks demand a great deal of redundancy, both at the link level and at the device level. Service providers employ various methods to deliver this level of service to their customers, and one of them is the Link Aggregation Group (LAG), where multiple Ethernet links are combined into a single aggregated group, increasing bandwidth and providing redundancy. The LAG achieves Layer 2 transparency by using a single MAC address for all of the device’s ports in the LAG group. LAG uses a control protocol called LACP for its operation.

Beyond simply aggregating multiple links (2, 4 or 8) into one LAG, vendors came up with proprietary solutions to build device-level redundancy into the LAG as well, and the result was Multi-Chassis LAG (MC-LAG). As these are proprietary solutions, most vendors support the concept but with small differences from one another. Cisco calls its solution Multichassis EtherChannel, while Alcatel calls it MC-LAG.

The CE (Customer Edge) device is completely unaware that the Ethernet links belonging to its LAG (EtherChannel in Cisco terms) are connected to two separate PE (Provider Edge) devices. We can assume the PE devices here are Alcatel 7750s. The two PE routers each have one LAG connected to the same CE device. At any given time, only one PE router’s LAG ports are active and carrying traffic; the other PE router’s LAG ports are on standby and only become active when a failure is detected on the active links. The PE routers run an election to decide which is active and which is standby.

[Figure 1: CE dual-homed through a single LAG to two PE routers]

As the figure above shows, from the CE’s perspective all four ports belonging to the LAG appear to connect to a single service provider device (in reality, two PEs). All four ports are members of the LAG, but only two are up at a time; the other two remain down. On both PE routers we first create a regular LAG towards the CE device, and on top of that we configure MC-LAG separately, defining the MC-LAG peer (i.e. the second PE’s address) and the LACP parameters. The MC-LAG control protocol information is exchanged between the PE routers; this exchange drives the active/standby selection and ensures that only one PE router’s ports are active and carrying traffic at any time. The MC-LAG control protocol runs only between the MC-LAG peers. Both PE routers send exactly the same {Admin Key, System ID, System Priority} values in their LACP packets.

Link Aggregation Control Protocol (LACP) detects multiple links available between two devices and bundles them so that their bandwidth can be used as an aggregate. The two sides detect each other’s availability by exchanging LACP packets. One end is the Actor, while the other end is the Partner. During LACP negotiation, the tuple (Admin Key, System ID, System Priority) identifies the LAG instance, so all participating ports on a device must carry the same values for these three fields.
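
To make this more concrete, here is a rough, unverified sketch of the relevant pieces on an Alcatel-Lucent 7750 (SR OS) PE. The peer address, ports, keys and system ID below are purely illustrative; the important point is that both PEs must be configured with the same lacp-key, system-id and system-priority so that the CE sees a single LACP partner.

configure lag 1
    mode access
    port 1/1/1
    port 1/1/2
    lacp active administrative-key 32768
    no shutdown
exit

configure redundancy multi-chassis
    peer 10.0.0.2 create
        mc-lag
            lag 1 lacp-key 32768 system-id 00:00:00:aa:bb:cc system-priority 100
            no shutdown
        exit
        no shutdown
    exit
exit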

 

Regards

Mohit Mittal

 

OSPFv3 for IPv6 alone or IPv4 as well?

We know that OSPFv2 is a link-state routing protocol developed by the IETF as a robust IP routing protocol suitable for large networks, designed to carry IPv4 routes. OSPF was first documented as a standard by John Moy in RFC 1131, and further improvements were made in OSPF version 2. OSPF was later extensively modified by the IETF to support IPv6, and that version is called OSPFv3. But did you know that OSPFv3 can be used to carry IPv4 routes as well?

Before going into it, let’s review some differences between OSPFv2 and OSPFv3

  • OSPFv3 introduces new LSA types
  • OSPFv3 has a different packet format
  • OSPFv3 adjacencies are formed over link-local IPv6 addresses
  • OSPFv3 runs per link rather than per subnet
  • OSPFv3 supports multiple instances on a single link, and interfaces can have multiple IPv6 addresses
  • OSPFv3 uses the multicast addresses FF02::5 (all OSPF routers) and FF02::6 (all OSPF DRs)
  • OSPFv3 neighbor authentication is done with IPsec (AH)
  • The OSPFv3 Router ID (RID) must be manually configured and is still a 32-bit number

OK, now coming back to the original question. If an organization wanted to use OSPF as both its IPv4 and IPv6 routing protocol, it would typically use OSPFv2 for IPv4 routing and OSPFv3 for IPv6 routing. This gives the organization dual control planes for the dual forwarding protocols, so a problem in either routing domain does not affect the other IP version. The same separation could also be achieved by running two completely different routing protocols; for instance, an organization could use OSPFv2 for IPv4 and IS-IS in single-protocol, single-topology mode for IPv6. The IETF, however, has continued to develop OSPFv3 so that it is now capable of working with multiple address families, in much the same way that Multiprotocol BGP (MP-BGP) can function as both an IPv4 and an IPv6 routing protocol.
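
As a point of comparison, a minimal sketch of that traditional split on Cisco IOS (one OSPFv2 process for IPv4 and one classic-syntax OSPFv3 process for IPv6, with purely illustrative addressing) might look like this:

! OSPFv2 process for IPv4
router ospf 1
 network 192.0.2.0 0.0.0.255 area 0
!
! Classic-syntax OSPFv3 process for IPv6
ipv6 router ospf 1
 router-id 1.1.1.1
!
interface GigabitEthernet0/0
 ip address 192.0.2.1 255.255.255.0
 ipv6 address 2001:db8:0:1::1/64
 ipv6 ospf 1 area 0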

Once again, Cisco changed the IOS configuration commands required for OSPFv3. The new OSPFv3 configuration uses the “ospfv3” keyword in place of the earlier “ipv6 router ospf” routing-process command and “ipv6 ospf” interface commands. OSPFv3 is still enabled on the interfaces, much as with the previous commands, but the biggest change is in the configuration of the routing process. The new syntax looks more like the multi-address-family configuration of BGP: you have both an IPv4 and an IPv6 address-family section under “router ospfv3”. The new syntax for multi-address-family configuration under the OSPFv3 routing process is:

ipv6 unicast-routing
ipv6 cef
!
router ospfv3 <process-id>
 router-id <router-id>
 auto-cost reference-bandwidth 1000
 address-family ipv6 unicast
  area 0 range <range>
  area 1 range <range>
 address-family ipv4 unicast
  area 0 range <IPv4 range>
  area 1 range <IPv4 range>
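
The interface side of the new-style configuration then attaches both address families to the same OSPFv3 process. A rough sketch, with an illustrative interface and addressing:

interface GigabitEthernet0/0
 ip address 192.0.2.1 255.255.255.0
 ipv6 address 2001:db8:0:1::1/64
 ospfv3 1 ipv4 area 0
 ospfv3 1 ipv6 area 0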

So OSPF has now evolved into a fully dual-protocol, multi-address-family routing protocol, and organizations have multiple options for deploying it. They can stick with OSPFv2 for IPv4 and use OSPFv3 for IPv6 only, keeping the two control planes separate, or they can combine IPv4 and IPv6 into a single OSPFv3 process that works equally well for both IP protocols.

 

Regards

Mohit Mittal

Route-Reflection in JunOS

Let’s talk about one important concept in Route-reflection configuration in Junos.

To start with, there are two main IPv4 routing tables in Junos: inet.0 and inet.3. inet.0 is the main global routing table, while inet.3 is used for MPLS Layer 3 VPNs; it stores the egress address of each MPLS label-switched path (LSP), the LSP name, and the outgoing interface name. Only BGP accesses the inet.3 routing table, and BGP uses both inet.0 and inet.3 to resolve next-hop addresses.

Now let’s configure route reflection in the network. We will be using two PEs and one RR.

[Figure: MPLS network with two PEs and one RR]

Config on PE1:

PE1-re0> show configuration protocols bgp
local-address 10.198.123.204;
group L3VPN-RRs {
    type internal;
    family inet-vpn {
        unicast;
    }
    authentication-algorithm md5;
    authentication-key-chain BGP-L3VPN-key-chain;
    export L3VPN-Export;
    vpn-apply-export;
    neighbor 10.198.123.235;    <<< loopback / router ID of the RR
}

Config on PE2:

PE-2-re0> show configuration protocols bgp
local-address 10.198.123.205;
group L3VPN-RRs {
    type internal;
    family inet-vpn {
        unicast;
    }
    authentication-algorithm md5;
    authentication-key-chain BGP-L3VPN-key-chain;
    export L3VPN-Export;
    vpn-apply-export;
    neighbor 10.198.123.235;
}

Config on RR (relevant configs only):

RR.re0> show configuration logical-systems l3vpn-RR
interfaces {
    lo0 {
        unit 3 {
            family inet {
                filter {
                    input Protect-RE;
                }
                address 10.198.123.235/32;
            }
        }
    }
}
protocols {
    bgp {
        local-address 10.198.123.235;
        mtu-discovery;
        log-updown;
        family inet-vpn {
            unicast;
        }
        group l3vpn-client-group {
            type internal;
            authentication-algorithm md5;
            authentication-key-chain BGP-L3VPN-key-chain;
            cluster 10.198.123.235;
            neighbor 10.198.123.204;
            neighbor 10.198.123.205;
        }
.
.
.
.
routing-options {
    graceful-restart {
        restart-duration 500;
    }
    router-id 10.198.123.235;
    autonomous-system 65004;
}

BGP sessions are established between the PEs and the RR:

PE-2-re0> show bgp summary | match 10.198.123.235
10.198.123.235       65004      19154     12204       0       5 3d 23:20:04 Establ

PE-1-re0> show bgp summary | match 10.198.123.235
10.198.123.235       65004     19154     12326       0       1 3d 23:20:38 Establ

RR-re0> show bgp summary logical-system l3vpn-RR | match 10.198.123.204
10.198.123.204       65004     12336     19179       0     34 3d 23:25:10 Establ

RR-re0> show bgp summary logical-system l3vpn-RR | match 10.198.123.205
10.198.123.205       65004     12212     19179       0     10 3d 23:24:31 Establ

PE-1 is advertising routes towards the RR with the next-hop address set to its own loopback. All well and good.

PE-1-re0> show route advertising-protocol bgp 10.198.123.235
Data-VPN.inet.0: 22 destinations, 22 routes (22 active, 0 holddown, 0 hidden)
Restart Complete
  Prefix                    Nexthop              MED     Lclpref    AS path
* 10.12.204.128/32          Self                         100        I
* 10.12.240.0/30            Self                         100        65012 I
* 10.12.240.128/32          Self                         100        65012 I
* 10.204.12.0/30            Self                         100        I

M10i-L3VPN.inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
Restart Complete
  Prefix                    Nexthop              MED     Lclpref    AS path
* 10.0.0.240/30             Self                         100        65020 I
* 100.100.100.0/30          Self                         100        I

But wait a minute: we are not seeing any routes in the BGP table on the RR.

RR-re0> show route receive-protocol bgp 10.198.123.204 logical-system l3vpn-RR
inet.0: 96 destinations, 96 routes (96 active, 0 holddown, 0 hidden)
Restart Complete
bgp.l3vpn.0: 89 destinations, 178 routes (0 active, 0 holddown, 178 hidden)
Restart Complete

Why is this? Well, it is fundamentally an issue with how things were set up.

As mentioned above, the inet.3 table stores the egress addresses of MPLS label-switched paths (LSPs), and BGP uses it to resolve next-hop addresses, which in our case are the loopback IPs of the PEs. However, since the RR is not in the forwarding path, no MPLS LSPs are configured on it, and therefore it has no inet.3 entries. That is why all the entries in the output above are hidden: BGP cannot resolve the next-hop IPs in inet.3.

There are a number of ways to resolve this, and I will discuss two of them here. The simplest and most widely used method is to configure a static route for the loopback subnet under the inet.3 RIB, as below.

[edit logical-systems l3vpn-RR routing-options]
RR.re0# load merge terminal relative
[Type ^D at a new line to end input]
rib inet.3 {
    static {
        route 10.198.123.0/24 {
            discard;
            metric 65535;
        }
    }
}
load complete
[edit logical-systems l3vpn-RR routing-options]
RR.re0# commit
re0:
configuration check succeeds
re0:
commit complete

Once you configure this, the inet.3 table is populated with the static entry, BGP can use it to resolve the next-hop IP address for each route, and all the entries become visible in the routing table.

RR.re0> show route logical-system l3vpn-RR table inet.3
inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both
10.198.123.0/24   *[Static/5] 00:00:08, metric 65535
Discard
[edit logical-systems l3vpn-RR routing-options]
RR.re0# run show route logical-system l3vpn-RR table bgp.l3vpn.0
bgp.l3vpn.0: 89 destinations, 178 routes (89 active, 0 holddown, 0 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both
.
.
.
.
10.198.123.204:12:10.0.0.240/30
                   *[BGP/170] 4d 04:25:15, localpref 100, from 10.198.123.204
                      AS path: 65020 I, validation-state: unverified
                      to Discard
                    [BGP/170] 05:34:11, localpref 100, from 10.198.123.238
                      AS path: 65020 I, validation-state: unverified
                      to Discard
10.198.123.204:12:100.100.100.0/30
                   *[BGP/170] 4d 04:25:15, localpref 100, from 10.198.123.204
                      AS path: I, validation-state: unverified
                      to Discard
                    [BGP/170] 05:34:11, localpref 100, from 10.198.123.238
                      AS path: I, validation-state: unverified
                      to Discard
10.198.123.204:116:10.0.0.24/30
                   *[BGP/170] 4d 04:25:15, localpref 100, from 10.198.123.204
                      AS path: I, validation-state: unverified
                      to Discard
                    [BGP/170] 05:34:11, localpref 100, from 10.198.123.238
                      AS path: I, validation-state: unverified
                      to Discard

Another option is to let next-hop resolution for inet.3 use the routes already computed in inet.0, with the configuration below.

[edit logical-systems l3vpn-RR routing-options]
RR.re0# show
graceful-restart {
    restart-duration 500;
}
router-id 10.198.123.235;
autonomous-system 65004;
resolution {
    rib inet.3 {
        resolution-ribs inet.0;
    }
}

Both of these methods are valid; which one you use depends on your network. With the second method you can additionally configure a prefix list to limit this to only the specific networks you want to exchange.

So that’s all for today. I hope I was able to make it easy for you to understand. Let me know if you have any comments or queries. 🙂

R
Mohit

MPLS TTL

I have written about MPLS in previous blogs, but this time I want to highlight one crucial concept for MPLS in a service provider environment. I hope you already know how TTL (Time to Live) works in an IP network and how it protects the network by preventing a mis-routed packet from looping indefinitely inside the service provider or enterprise network. We also know that the traceroute command uses TTL to discover the intermediate hops towards a destination.

As you may know, inside an MPLS core the routers do not forward on IP information: customer IP packets are encapsulated with an MPLS header, and the TTL field in the IP header is effectively hidden. So how does TTL prevent loops in an MPLS environment? MPLS needs a loop prevention mechanism just like any other forwarding protocol. Rather than modifying the IP header of every packet that passes through an interface on an LSR (Label Switch Router, i.e. the P and PE routers) within the cloud, the ingress router copies the IP packet’s TTL value into the TTL field of the label being pushed onto the packet as it enters the MPLS cloud, which avoids having to touch the IP header information on the ingress packets.

As the packet traverses the MPLS cloud, each LSR decrements the TTL within the MPLS header, just like in a typical IP network. When the packet reaches the egress edge LSR, the E-LSR pops the label, decrements the current TTL value by one for that hop, and copies the result back into the header of the IP packet that will be forwarded on.

Service providers usually provide some sort of L3 service, and their customers sit outside of the MPLS network. With MPLS’s default configuration (i.e. copying the IP TTL into the MPLS TTL field), the traceroute command lets customers see every next hop in the path that their packets traverse. This can cause headaches for customers, who do not need to know what the provider’s topology consists of, and it also poses a potential security risk for the service provider itself.

The output below shows what it looks like for a customer to run the traceroute command with the default TTL copy behavior of MPLS.

1 192.168.15.1 44 msec 12 msec 24 msec

2 192.168.14.4 [MPLS: Labels 16/16 Exp 0] 80 msec 496 msec 88 msec

3 192.168.37.3 [MPLS: Label 16 Exp 0] 48 msec 60 msec 48 msec

4 192.168.37.6 56 msec 60 msec *

As you can see, the provider’s LSRs (the intermediate next hops along the LSP) are present in the output. Again, the SP may not want the customer to have that kind of insight into its network, and the customer may not want to wade through those next hops while troubleshooting issues.

Thankfully, there is a solution. Under Cisco IOS we can use the “no mpls ip propagate-ttl [local | forwarded]” command to suppress copying of the TTL from the IP packet’s header. In Junos, no-propagate-ttl under [edit protocols mpls] does the same thing. When this is configured, the ingress edge LSR assigns a fixed TTL value of 255 to the label instead of copying the IP packet’s current TTL.
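
For reference, a minimal sketch of both knobs (the IOS command is entered in global configuration mode, and the Junos statement lives under [edit protocols mpls]):

! Cisco IOS: stop copying the IP TTL into the MPLS header for forwarded (customer) traffic
Router(config)# no mpls ip propagate-ttl forwarded

# Junos: equivalent behaviour
[edit]
user@PE1# set protocols mpls no-propagate-ttl
user@PE1# commit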

As a result, the entire MPLS network will look as though it is just a single hop to the customer. As you can see below 🙂

1 192.168.15.1 28 msec 36 msec 16 msec

2 192.168.37.6 108 msec 108 msec *

The “local” option in the command above applies to traffic originated by the LSR itself, such as when a service provider engineer logs into a router and runs traceroute, whereas the “forwarded” option applies to traffic that originates outside the MPLS cloud, usually within the customer sites. Disabling TTL propagation only for forwarded traffic therefore hides the LSRs from customers while still letting the SP engineer see the LSRs within the MPLS cloud.

So that’s all for MPLS TTL. I hope you liked this blog 🙂

Regards

Mohit

MPLS Explicit and Implicit null Label

I have often seen people confuse the MPLS explicit null and implicit null label values (I was one of them), so in this post I will try to ease your confusion 🙂

Before getting into these label values, I hope you are aware of the MPLS PHP (Penultimate Hop Popping) feature. If you are not, let me explain….

Take a look at the picture below: the ingress LSR and egress LSR are the routers with customer connections on either side, and the LSRs in the middle of the figure are the core MPLS routers.

[Figure: MPLS label operations between the ingress LSR, core LSRs and egress LSR]

In normal MPLS operation, an IPv4 packet arriving at the egress LSR still carries an MPLS label on top of the IP header. The egress LSR therefore has to perform two operations and two lookups: one in the MPLS forwarding table and one in the IP routing table, in order to send the packet out of the appropriate customer-facing interface. These two lookups increase memory and CPU consumption on the egress LSR. To avoid them, the egress LSR advertises a special label value of 3 to the next-to-last LSR (called the penultimate LSR). Label 3 is called the IPv4 Implicit Null label; when an LSR has received label 3 as the advertised label for a route, instead of swapping labels it pops the header, i.e. it removes the top label, before forwarding the packet on to the egress.

This procedure is called Penultimate Hop Popping (PHP).

So what is Explicit Null?

OK, when a packet or Ethernet frame is encapsulated in MPLS, you have the option of copying the IP precedence or 802.1p bits into the three CoS bits of the MPLS header, i.e. the EXP bits.

If the pop is performed at the penultimate LSR, the EXP bits in the MPLS header are no longer available as a reference for queuing, and the packet is queued on the outgoing interface according to the CoS behavior of the underlying payload (for an IPv4 packet, the ToS field). An explicit null label (value 0 for IPv4), on the other hand, leaves an MPLS header in place until the packet reaches the egress, preserving the CoS behavior across the entire LSP.
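
If you want the egress router to advertise explicit null (label 0) instead of implicit null, so that the label and its EXP bits stay on the packet all the way to the egress, the knobs look roughly like this (a sketch; exact behaviour and placement can vary by platform and release):

! Cisco IOS: advertise explicit null instead of implicit null via LDP
Router(config)# mpls ldp explicit-null

# Junos: advertise explicit null for LSPs terminating on this router
[edit]
user@PE1# set protocols mpls explicit-null
user@PE1# commit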

So that’s all for these reserved label values. If you still have any queries, please let me know.

Regards

Mohit

Juniper Northstar — WAN-SDN Controller

The biggest misconception with SDN today, I think, is that in order to run SDN we need an Open vSwitch-equivalent switch speaking the OpenFlow protocol between the controller and the switch. Most software vendors promote their products in this category; hardware vendors, however, who have spent considerable money building hardware platforms, are not just selling OpenFlow-capable switches. They are using SDN to build other applications that centrally control and influence the network from a single point without using OpenFlow.

Juniper has one such product in the WAN-SDN controller category, named NorthStar. I happened to assess it recently for my telco.

NorthStar comes in two flavours, Controller and Planner. The Controller enables granular visibility and control of IP/MPLS tunnels in large service provider and enterprise networks.

NorthStar Planner is more of a modelling tool that helps you understand the effect on your network of adding or deleting LSPs, node or link failures, and so on, before you actually provision the changes.

The NorthStar Controller relies on PCEP (Path Computation Element Protocol) to deploy paths on the PCC routers. The path setup itself is performed through RSVP-TE signaling, which is enabled in the network and allows labels to be assigned from the ingress router to the egress router. Signaling is triggered by the ingress routers. The PCE client (PCC) runs on routers that use a version of Junos supporting PCEP.

The NorthStar Controller provisions PCEP on all PE devices (PCCs) and uses PCEP to retrieve the current status of the existing tunnels (LSPs) running in the network. With a view of the whole network state and its bandwidth demands, the NorthStar Controller can compute optimal paths and provide the attributes that the PCC uses to signal the LSP.

Example Topology

[Figure: NorthStar Controller example topology]

If your network uses point-to-multipoint LSPs, you need at least Junos 15.1F6 on the ingress PE to view P2MP LSPs on the NorthStar controller; the egress PE can be on any version. NorthStar also establishes an iBGP link-state (BGP-LS) session with the ingress PE, over which link-state and TE information is shared.

Home Page:

[Figure: NorthStar Controller home page]

With NorthStar, we can view all the LSPs in the network from the point of view of a PCEP-capable ingress PE, and from there we can initiate new LSPs or delegate existing LSPs from the PE to the controller so that it manages them.

NorthStar lets you create a single LSP, multiple LSPs at once, and even LSPs that are diverse by site or link, which is very useful for primary/backup paths that need protection against a single source of failure.

Diverse LSP Provisioning

[Figure: NorthStar Controller diverse LSP provisioning]

There are three types of TE LSPs used with PCEP (see the configuration sketch after this list):

  • CLI-controlled LSPs—The LSPs that do not have the lsp-external-controller pccd statement configured are called CLI-controlled LSPs. Although these LSPs are under local router control, the PCC updates the connected PCE with information about the CLI-controlled LSPs during the initial LSP synchronization process.
  • PCE-controlled LSPs—The LSPs that have the lsp-external-controller pccd statement configured are called PCE-controlled LSPs or delegated LSPs. The PCC delegates the PCC-initiated LSPs to the main PCE for external path computation.
  • Externally-provisioned LSPs (or PCE-initiated LSPs)—The LSPs that have the lsp-provisioning statement configured are called PCE-initiated LSPs. A PCE-initiated LSP is dynamically created by an external PCE; as a result, there is no LSP configuration present on the PCC.
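
To give a flavour of the PCC side, here is a set-style sketch of what the Junos configuration could look like. The PCE name, controller address and LSP name are purely illustrative, and the exact options vary by Junos release:

set protocols pcep pce northstar destination-ipv4-address 10.0.0.100
set protocols pcep pce northstar destination-port 4189
set protocols pcep pce northstar pce-type active stateful
set protocols pcep pce northstar lsp-provisioning

set protocols mpls lsp-external-controller pccd
set protocols mpls label-switched-path PE1-to-PE2 to 10.0.0.2
set protocols mpls label-switched-path PE1-to-PE2 lsp-external-controller pccd

Here lsp-external-controller pccd on the individual LSP makes it a delegated (PCE-controlled) LSP, while lsp-provisioning on the PCEP session is what permits the PCE-initiated LSPs described above.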

In its current version, NorthStar is really impressive; the only thing lacking at the moment is the ability to create P2MP LSPs, which is a must for broadcast applications in an NG-MVPN environment. Juniper plans to add this in upcoming releases by the end of 2017.

I am sure service providers will consider using NorthStar in their IP/MPLS networks where they already use traffic-engineering LSPs, to gain more flexibility and control over their traffic and bandwidth demands.

Let me know your views on it, and whether you would be interested in deploying it in your network 🙂

R

Mohit Mittal