The next fashion

By now just about everyone has realized that OpenFlow is just vaporware. Technically, there was never any content behind the hype. The arguments used to promote OpenFlow’s revolutionary properties were simply born of ignorance of all the previous technologies that used the exact same design ideas, from SS7 to PBB-TE.

Rumor has it that even the most religious OpenFlow supporters from Mountain View to Tokyo have realized that OpenFlow is pretty much dead. If you look back at it, it was a pretty silly set of assumptions to start with: that hardware design and not software is the limiting factor in network devices; and that you can define a low-level forwarding language based on the concept of a TCAM match that is going to be efficient across general purpose CPUs, ASICs and NPUs. Both assumptions can easily be proven false.

But OpenFlow’s promise was “too good to be true”. So a lot of people preferred to ignore any hard questions in search of the illusory promises of a revolution in networking. By now though, everyone gets it.

As an industry, what is the expected reaction to the OpenFlow hangover? One would expect a more down-to-earth approach. Instead we get “Segment Routing”: another “magic wand” proposal that is being presented by a bunch of industry luminaries, and as such it really must be “the next thing”.

Segment routing consists of using an “IP loose source routing header” approach to indicate the intermediate hops that a packet should traverse. So instead of creating state in network devices that corresponds to a FEC, state is carried in the packet header.

The argument here is that this is supposed to simplify the network by reducing the state that needs to be created in network elements. But of course, this is before you consider that the same state still needs to be managed: FECs still need to be calculated, bandwidth allocated, fast failover needs to be handled, etc. Not to mention the fact that carrying state in the packet would make the network really hard to debug.

Networking technology is about networking: the ability to build a distributed system among pieces that are provided by different vendors. There are cases in which centralized computations may be useful, but signaling is not an operation that needs to be or benefits from being centralized.

Distributed signaling is a technology that works well. It allows central policy control when required but it also allows for distributed decisions such as local repair. Reinventing signaling with a hop-by-hop header, which is what the segment routing crowd is proposing, hardly seems an answer to any real problem. It does however promise to attempt to reinvent all the basic functionality that has been developed on MPLS over the past 15+ years.

I can’t wait until the next fashion comes up… hopefully with something a bit more original and more concerned about providing new functionality rather than re-inventing the wheel.

The Target data breach

According to news reports, credit card information from Target’s point-of-sale systems was stolen after hackers gained access to the systems of an HVAC contractor that had remote access to Target’s network.

Network virtualization is an important tool that can be used to prevent (or at the very least place barriers to) similar attacks in the future. Increasingly, retail stores deploy multiple applications that must be accessible remotely. HVAC systems are an example, but retail locations also often support signage applications (advertisement panels), wifi guest networks, etc.

Most of these applications will contain a mix of physical systems in the branch, applications running in the data-center, as well as remote access for contractors.

From a network segmentation perspective, it is important to be able to create virtual networks that can span the WAN and the data-center. The obvious technology choice for network virtualization in the branch is to use MPLS L3VPN. It is a technology that is supported in CE devices and that can be deployed over an enterprise or carrier managed private network.

The branch office CE will need to be configured with multiple VLANs, per virtual-network, where physical systems reside. In order to have a solution that is manageable these VLANs should be associated with a VRF in order to prevent unauthorized traffic. It is also possible that the branch will require servers that run virtual-machines that should be associated with different virtual-networks.

In the data-center, it is important to be able to interoperate with the WAN virtual-networks. That is where a technology such as OpenContrail shines, giving the network admin the ability to extend a Neutron virtual-network across the WAN.

Note that the data-center in question could be a private data-center, a remotely hosted application or a contractor. All of these use cases can be achieved by using networking technology built on interoperable standards.


Interoperability testing

Recently, one customer prospect asked the Contrail team to build a POC lab using only non-Juniper network gear. The team managed to find a Cisco ASR 900 as a loaner device and we had to make that device work as a data-center gateway.

Typically we use the Juniper MX as the data-center gateway in our clusters. When you use an MX, the system somehow feels dated. It does feel like a 10+ year old design, which it is. But it is incredibly solid and feature rich. So one ends up accepting that it feels a bit dated as a tradeoff for its “swiss army knife” powers.

The Cisco ASR 900 belongs to the 1k family and runs IOS as a user space process on Linux. I’d not used IOS in 3 years. My first impression was: this artifact belongs in the Computer History Museum. In fact the CHM (which is a fantastic museum) has several pieces on exhibition that are more recent than 1984, the year IOS debuted.

And IOS (even the version 15 in this loaner box) is a history trip. You get to see a routing table that predates classless internet addresses, and the config still outputs “bgp no-synchronization” despite the fact that IGP synchronization was already an obsolete concept in 1995 when I first started using IOS.

It took us forever to get to the configuration we needed. The online documentation is visibly incorrect, to the point where some examples contain names of configuration objects which are presented as “some_name” in one line and “some name” a few lines below. IOS “show running-config” is this magic entity that “eats up” configuration that is typed in when it believes it to be a default; “show” commands are broken. It is just awful.

Cisco does have second and third operating systems: IOS XR and NXOS, which I don’t have direct experience using myself. However, judging from the configuration guides available online, they look like a small incremental improvement rather than systems that re-thought some of the initial assumptions of IOS. They hardly look like next-generation systems.

The problem is that, whether one likes it or not, Cisco is the thought leader of the networking industry. If Cisco doesn’t really believe in software engineering as a discipline, putting all emphasis on domain expertise, that thought is copied and carried over. If Cisco does system-level test only (often only automated as an after-thought), that is the methodology adopted across the industry. Thus if Cisco does a poor job, then typically the rest of the industry follows.

The interesting bit here is not to bash a specific company or product; my apologies for the direct criticism which is necessarily unfair. The question is whether that particular observation (my shock after a long time not using this software) has greater significance to the industry.

In a previous post I’ve argued that networking companies sell software cheaply compared to enterprise software. If one considers “average sales price” (ASP), networking gear has an approximately 65% margin. Servicing the hardware devices themselves will probably require a 30% margin: let’s not forget that the vendor must do qualification, repair defective units and handle a large number of logistical challenges.

Thus the argument goes that networking software is actually inexpensive and that there is very little room for a vendor to enter the software market along with a whitebox strategy. The major flaw in that logic is the assumption that the quality of such software is good or at least acceptable.

I’m starting to question whether there isn’t indeed the market opportunity to build a networking software centric company that is successful. Not based on price but solely based on software quality (or lack thereof).

Egress Selection

A very creative use of BGP L3VPN technology is the use of multiple routing table topologies in order to select the egress point for different types of traffic by service providers. Over time several network architects have explained to me when and how to apply this technique.

One example that comes to mind was a design used by a service provider in Europe with several gaming companies as customers. While any reasonable ISP core network is usually uncongested, peering points often experience congestion. Part of it is the difficulty of getting capacity; some of it is that smaller outfits run with very little wiggle room in terms of capital and fail to upgrade their links on time; and it does seem that at least a decent amount of it is intentional. ISPs both cooperate and compete, and they often put pressure on each other by intentionally delaying the upgrade of peering capacity.

Gaming is all about latency. Web browsing and file transfers, which make up the bulk of the traffic, are reasonably latency tolerant. But gaming applications are very latency sensitive and gamers are willing to spend money online. Thus if you do carry gaming traffic in your network it is desirable to be able to steer this traffic through the shortest path and make sure that bulk traffic is sometimes steered across longer paths in order to avoid congesting links.

For instance, if one peers with network A at points P1, P2 and P3, and P1 is getting congested, it makes sense to steer bulk traffic that would otherwise go to P1 to points P2 and P3. The most scalable way to accomplish this is to have an MPLS core and steer the traffic at ingress. This requires having multiple routing tables at the ingress router and the ability to use FBF (filter-based forwarding) to select what types of traffic to assign to each routing table.

BGP L3VPN is the ideal technology for this problem. Using L3VPN it is possible to export all the routing tables from the peers, before route selection in the main routing table. In the ingress point, multiple topologies can be constructed using this original routing information.

As an example, assume that the ISP has as potentially congested peers CP1 and CP2. In the peering gateways that face these peers one would place the peering sessions on a VRF (specific to the peer) and route leak the BGP routes into inet.0 by using a rib-group. Note that it is possible to filter the routing information from being installed in the forwarding table in order to conserve forwarding memory. The next-hop advertised by BGP L3VPN is an MPLS next-hop by default; thus the IP forwarding information is not required for egress bound traffic.

In gateways that act as ingress points for traffic (which may overlap with the ones that peer with congested peers), one would configure a bulk VRF table that imports routes from congested peers. By manipulating the local-pref attribute, it is possible to make the congested egress points less preferable, overriding the IGP decision. An FBF rule can direct bulk traffic to this routing-instance, with a default route of “next-table inet.0” in order to fall back to standard route selection for peers that are not subject to special treatment.

Over time I’ve seen a few variants of this design. Some ISPs allow some of their customers to perform upstream selection and always traverse one of their upstreams which they believe provides better service even if a preferred route is available from another peer.

One interesting application of OpenContrail is that it can allow a network operator to apply the same technique for an application, rather than a customer circuit. With OpenContrail it is possible to place an application (running in either bare-metal or a virtual machine) into a VRF. This can be, for instance, the front-end that is responsible for generating video traffic or gaming updates.

While OpenContrail doesn’t have the same policy language capability available in JunOS, since it is open source, the control plane code can be customized by anyone in order to perform a specialized path selection decision that satisfies a particular application.

An OpenContrail deployment resembles an L3VPN PE/gateway router. Typically 2 servers (for redundancy) run the control plane, which can interoperate directly with L3VPN capable routers at the network edge. These control plane servers can typically control up to 2k compute servers running applications (virtualized or not). Encapsulation can be MPLS over GRE end-to-end or MPLS over GRE within the data-center followed by MPLS over MPLS on the WAN, by using a pair of data-center gateways.

By reusing BGP L3VPNs, OpenContrail reuses not only technology that has been “battle tested” in large deployments but also a lot of network design tricks that have been discovered over time.

Application specific networking

Linux namespaces and OpenContrail can be combined to create application specific networks where a specific process on a Linux host can be associated with a virtual-network.

With OpenContrail this application can be placed directly in a Layer 3 VPN which can extend across the WAN; load-balancing can be performed via “floating-ip” addresses associated with multiple instances of the application; routing between virtual-networks is performed in a fully distributed manner; ACLs can be configured; and flow records are collected in a time-series data-base for subsequent analysis.

As an example, let’s start an apache web server in a virtual-network on a Linux machine.

The first step is to install OpenContrail on the machine. This can be achieved via a binary distribution or by compiling the source code and manually installing.

This script contains all the steps involved in preparing a build VM, compiling the code and starting the software. While it has been developed to run as part of devstack it can be executed independently. OpenContrail can run independently of OpenStack/CloudStack. For a production deployment, you will need to start the configuration and control components of OpenContrail in a couple of servers (for redundancy). For test purposes one can run the configuration, control and virtual router components in the same server.

Once the software is installed, one needs to define application instances and their networking properties using the contrail API.

This script contains an example of how to define an instance and associate it with a virtual-network. The contrail configuration management component allocates an IP address and mac address for the application instance. Currently the API uses the terminology “virtual-machine” and “virtual-machine-interface” but the implementation supports any kind of application instance.

Next we need to define a networking namespace on the server running the application:

ip link add veth0 type veth peer name veth1
ip netns add service-1
ip link set veth0 netns service-1
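
The three commands above can also be wrapped so that they are easy to preview before touching the host. This is only a sketch: the names `run`, `setup_service_ns` and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration and are not part of OpenContrail or iproute2.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three steps shown above.
# $1 = namespace name
# $2 = veth end that is moved into the namespace
# $3 = veth end that stays on the host (later handed to the vrouter)
setup_service_ns() {
    ns="$1"; inner="$2"; outer="$3"
    run ip link add "$inner" type veth peer name "$outer" &&
    run ip netns add "$ns" &&
    run ip link set "$inner" netns "$ns"
}
```

Calling `setup_service_ns service-1 veth0 veth1` with `DRY_RUN=1` set only prints the commands; without it they are executed and, as with the commands above, need root privileges.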


And associate the “veth1” interface with the “virtual-machine-interface” created on the OpenContrail configuration API. This script takes as arguments the VM uuid and the VMI uuid created above, followed by veth1.

Once this is done, the veth1 interface should be visible in the OpenContrail vrouter agent (http://server-ip:8085/Snh_ItfReq?name=veth1).

Now we need to configure the veth0 mac and set the peer up. It is important that the veth0 mac address be the same address as defined for the “virtual-machine-interface” on the contrail API. Incoming traffic will be rejected by the Linux kernel otherwise.

ip netns exec service-1 ifconfig veth0 hw ether xx:xx:xx:xx:xx:xx
ip link set veth1 up
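
Since a wrong mac address fails silently (the kernel just drops the traffic), it may be worth checking the value copied from the API before applying it. The following is a minimal sketch; `is_valid_mac` and `set_ns_mac` are hypothetical helper names, and the sketch uses the iproute2 form `ip link set dev ... address ...` in place of ifconfig.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like a 48-bit MAC address
# written as six colon-separated pairs of hex digits.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Hypothetical wrapper: set the MAC address of a device inside a namespace,
# refusing to run if the address is malformed.
# $1 = namespace, $2 = device inside the namespace, $3 = MAC address
set_ns_mac() {
    ns="$1"; dev="$2"; mac="$3"
    if ! is_valid_mac "$mac"; then
        echo "refusing to set invalid MAC address: $mac" >&2
        return 1
    fi
    ip netns exec "$ns" ip link set dev "$dev" address "$mac"
}
```

For example, `set_ns_mac service-1 veth0 "$mac"`, where `$mac` holds the address allocated for the “virtual-machine-interface”. The check only validates the format; it cannot tell whether the address matches the one on the contrail API.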

Now we can set up the networking on the namespace:

ip netns exec service-1 dhclient veth0
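
To confirm that the DHCP step actually assigned an address, one can look at the interface from inside the namespace. A small sketch, assuming the one-line output format of `ip -o -4 addr show`; the helper names `parse_ipv4` and `ns_ipv4` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the address/prefix field of every IPv4 entry.
parse_ipv4() {
    awk '$3 == "inet" { print $4 }'
}

# Hypothetical helper: print the IPv4 address(es) of a device in a namespace.
# $1 = namespace, $2 = device inside the namespace
ns_ipv4() {
    ip netns exec "$1" ip -o -4 addr show dev "$2" | parse_ipv4
}
```

`ns_ipv4 service-1 veth0` should print the address that the contrail configuration component allocated for the instance; empty output suggests the DHCP exchange did not complete.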

And start an application:

APACHE_LOG_DIR=/var/log/httpd APACHE_RUN_USER=www-data APACHE_RUN_GROUP=www-data sudo -E ip netns exec service-1 /usr/sbin/apache2

Once this is done any other host in the virtual-network should be able to access this HTTP server. OpenContrail network policies can then be used to define connectivity between virtual-networks as well as access control lists (ACLs). And OpenContrail’s standards based interoperability means that the virtual network can be extended from the overlay to a WAN via any RFC 4364 capable router.

OpenContrail is a great replacement for the current combination of haproxy, vrrp, ipchains, dnsmasq and who knows what else is currently necessary to deploy a load balanced application.