Virtualized network appliances

The business model of a network equipment vendor works roughly as follows: assume that you have a piece of equipment that has a Cost of Goods Sold (COGS) of 1 unit; multiply by 10 to get the list price for an enterprise product, or by 20 in the case of a service provider product. Then discount by 60% or 80% respectively to get an Average Sales Price (ASP) of 4 units.

That would yield a margin of 75%. Unfortunately the numbers never quite add up to that. There is always a bit of a cost overrun, and large volume customers always end up with a few extra discount points. If you dig into the financial reports of a major vendor, margins hover around 65%.
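The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the multipliers and discounts are the rough figures quoted above, not data from any vendor.

```python
def average_sales_price(cogs, list_multiplier, discount):
    """List price is COGS times a multiplier; ASP is what is left after the discount."""
    list_price = cogs * list_multiplier
    return list_price * (1.0 - discount)


def gross_margin(asp, cogs):
    """Fraction of the sales price that remains after paying for the goods."""
    return (asp - cogs) / asp


# Enterprise product: 10x list price, 60% discount.
enterprise_asp = average_sales_price(cogs=1.0, list_multiplier=10, discount=0.60)
# Service provider product: 20x list price, 80% discount.
provider_asp = average_sales_price(cogs=1.0, list_multiplier=20, discount=0.80)

print(f"enterprise ASP: {enterprise_asp:.2f} units")
print(f"service provider ASP: {provider_asp:.2f} units")
print(f"gross margin: {gross_margin(enterprise_asp, 1.0):.0%}")
```

Both paths land on the same sales price of about 4 units, which is where the theoretical 75% comes from.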

Out of this 65% gross margin, the revenue must be allocated to 3 major buckets: cost of sales and marketing, R&D and bottom line. Investors are looking for an operating margin of 20%; sales can easily consume more than 30% of revenue; R&D must be funded because when you sell the equipment you also sell a perpetual software license and customers expect maintenance and new functionality for a period of at least 5 years after purchase.

Yes, an equipment vendor will “price” a $6,500 PC for $65,000; but that number is entirely fictional. The average customer pays $22k, out of which $6k could be COGS, leaving $16k for a perpetual software license, which is not at all outrageous when compared with licenses of enterprise software. There is some wiggle room in this business model: one could sell the gear at COGS + 30% and then monetize the software with a license plus a yearly subscription. But personally, I don’t expect the math to change by much between physical and virtual appliances. There is not that much wiggle room.

Equipment vendors pay a lot of attention to cost of goods. As we’ve described above, it underlies the entire economic model. It is the first decision to be taken in a new project: the curiously named “not to exceed” cost (you can add cost overrun to “death and taxes” when it comes to life’s certainties).

A physical appliance is always going to be more efficient than a virtualized one when it comes to cost of goods and power efficiency. At one end of the spectrum you have a forwarding device (e.g. router/switch) which is probably going to be 10-20x more cost efficient in the non-virtualized form. At the other end, one has boxes like security appliances that tend to be made out of general purpose CPUs (although probably using something more cost/power efficient than Intel/AMD). Virtualizing a particular appliance will increase cost, often in a non-trivial way, when compared to its physical counterpart.

So why do it ? What is the idea behind virtualizing network equipment ?

There are two complementary answers: resource utilization and agility in terms of time to deploy a service.

The first one is the usual Time Division Multiplexing vs Time-sharing equation. If one deploys a firewall service, to take an example, as a distributed appliance in every branch office, then the firewall capacity required is the sum of every branch office’s maximum throughput, although the utilization of each is likely to be very low on average. A virtualized offering deployed in a metro/regional POP can be scaled for the maximum aggregated throughput that is actually seen in the network. The delta can be very significant. Note that this concentrated service could be deployed with either physical or virtual appliances in order to get the same benefits. Subscriber management architectures (DSL, Wireless) tend to already follow this design, aggregating a very large number of subscribers to a concentrated set of resources.
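To make the delta concrete, here is a small sketch that compares the two sizing rules. The branch names and throughput samples are invented for illustration; the only point is that the sum of per-site peaks is larger than the peak of the aggregate whenever the sites do not all peak at the same time.

```python
# Hypothetical throughput samples (Mbps) for three branch offices,
# taken at the same four points in time.
branch_traffic = {
    "branch-a": [10, 80, 20, 15],
    "branch-b": [70, 10, 15, 20],
    "branch-c": [15, 20, 90, 10],
}


def distributed_capacity(traffic):
    """One appliance per branch: each must be sized for its own peak."""
    return sum(max(samples) for samples in traffic.values())


def pooled_capacity(traffic):
    """One shared pool in the POP: sized for the peak of the aggregate."""
    aggregate = [sum(interval) for interval in zip(*traffic.values())]
    return max(aggregate)


print("distributed:", distributed_capacity(branch_traffic), "Mbps")
print("pooled:", pooled_capacity(branch_traffic), "Mbps")
```

With these made-up samples the distributed design needs roughly twice the capacity of the pooled one, regardless of whether the pool is built from physical or virtual appliances.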

The other key factor to consider is agility. Even when less efficient, an environment where one doesn’t have to wait for physical appliances to be procured and provisioned may pay for itself. An infrastructure that is pre-positioned and is ready to accept new services can allow for rapid deployment of new service offerings.

This however requires a completely different approach to provisioning and management of network services. One model that can potentially deliver on the agility promise is one where the service is entirely managed by the tenant. This is after all the service model of AWS, Meraki and other players that have been re-imagining the space.

My fear is that carriers may be tempted to marry the cost inefficiencies of appliance virtualization with their traditional management solutions in the form of complex operational procedures glued together by the OSS/BSS. I hope that by openly discussing the cost structure of network equipment we can at least bury the PCs-are-cheap-so-by-using-servers-I’m-going-to-reduce-cost argument. That should not be the goal, as the numbers do not add up.


Networks are about connectivity

The service that OpenContrail provides to virtual-machines is quite simple. I always find myself struggling trying to explain simple concepts. Please bear with me…

A virtual machine exists to run applications; the goal of an orchestration system such as OpenStack is to enable application developers to deploy and manage application components in a self-service manner. That means that the resources that a virtual machine provides (memory, CPU, storage and network) should be simple for the application developer to understand.

When it comes to storage, the answer is simple. A virtual-machine has access to block storage, object stores and distributed file-systems (e.g. NFS). When it comes to networking there are plenty of unnecessarily complex models being proposed; networking doesn’t have to be complex.

From an application developer’s perspective, a network is a collection of devices that can communicate with each other, each device having a unique identifier (ideally a hostname). It is useful to be able to group some of these devices by their characteristics, such as who is allowed to manage them and what their role is. We can call these groups “sub-networks”. In practical terms each of these sub-networks corresponds to an application tier.

Application stacks are made of several different components with different management characteristics. Typically the database layer is a centralized service serving multiple applications; an application may include multiple application servers, some specific to that user visible application, others following a SOA model. Each of these “subnets” contains the collection of virtual-machines that implement that service, and needs connectivity to all the other collections of devices that it needs to communicate with.

A virtual-network in OpenContrail models that “subnet”: it is an IP subnet plus a set of policies that determine the connectivity of that subnet. That is it.

In what way is this new ? For all the talk around Software Defined Networks, most people are still trying to model networks in terms of bridges and routers and other artifacts that are irrelevant to the goal of providing connectivity to the application.

Taking a physical appliance and instantiating it as a virtual-machine doesn’t necessarily make the network simpler to manage. It may actually create additional issues, since now all these appliances have to be managed. These virtual appliances will have lower capacity than the existing physical ones; they will need to address issues such as availability and reliability.

What OpenContrail does is use a logical model of the connectivity. With this logical model we create an implementation that provides connectivity to virtual-machines based only on the ingress and egress hypervisor software, on top of any IP capable physical network. It gets rid of routers and switches in the application network topology.

Both hardware and software are poor choices to define networks around. Networks are about connectivity.

With the OpenContrail vrouter as the hypervisor switch, a virtual-machine interface (e.g. the tap interface used by KVM) is associated with an isolated forwarding table (VRF). In the absence of any network policy configuration, this VRF contains host routes for all other virtual machines in the same virtual network. When network policies are configured the OpenContrail control-node automatically informs the vrouter of any virtual-machine host routes or external routes that this VRF has connectivity to. Network policy expresses connectivity and traffic filtering policies. For instance, it is trivial to create a policy such that only a specific TCP port is forwarded between two different virtual-networks. Traffic forwarding and policy enforcement are performed directly at the ingress hypervisor. There is no need to bounce the traffic to a “virtual router” VM that adds latency and management complexity.
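A toy model may help make the “subnet plus policy” idea tangible. The sketch below is not OpenContrail code and does not use its API: the class names, the network names and the addresses are all invented, and a policy is simplified to “these two networks may talk on one TCP port, in either direction”.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class VirtualNetwork:
    """An IP subnet plus the addresses of the VMs attached to it."""
    name: str
    subnet: str
    hosts: set = field(default_factory=set)

    def attach(self, address):
        if ip_address(address) not in ip_network(self.subnet):
            raise ValueError(f"{address} is outside {self.subnet}")
        self.hosts.add(address)


@dataclass(frozen=True)
class Policy:
    """Connects two virtual networks, but only for one TCP port."""
    network_a: str
    network_b: str
    tcp_port: int

    def connects(self, name):
        return name in (self.network_a, self.network_b)

    def other(self, name):
        return self.network_b if name == self.network_a else self.network_a


def vrf_routes(name, networks, policies):
    """Host routes in the VRF: own VMs plus VMs of policy-connected networks."""
    routes = set(networks[name].hosts)
    for policy in policies:
        if policy.connects(name):
            routes |= networks[policy.other(name)].hosts
    return routes


def is_forwarded(src, dst, tcp_port, policies):
    """Decision taken at the ingress hypervisor for traffic between networks."""
    if src == dst:
        return True
    return any(p.connects(src) and p.other(src) == dst and p.tcp_port == tcp_port
               for p in policies)


web = VirtualNetwork("web", "10.0.1.0/24")
web.attach("10.0.1.3")
db = VirtualNetwork("db", "10.0.2.0/24")
db.attach("10.0.2.5")
networks = {"web": web, "db": db}
db_policy = Policy("web", "db", tcp_port=3306)

print(sorted(vrf_routes("web", networks, [])))
print(sorted(vrf_routes("web", networks, [db_policy])))
print(is_forwarded("web", "db", 3306, [db_policy]), is_forwarded("web", "db", 22, [db_policy]))
```

Without a policy the “web” VRF only knows about its own virtual machines; adding the policy makes the “db” host routes visible, and the port check is applied where the traffic enters, with no intermediate router VM in the model.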

OpenContrail and OpenDaylight

On and off there is a discussion of potentially integrating OpenContrail with OpenDaylight. This may sound reasonable from 10,000 feet, but once we look at the technology the problems become apparent.

First, let’s start with what OpenContrail does:

  • OpenContrail provides a virtual network service to virtual-machines managed by an orchestration system such as OpenStack;
  • OpenContrail uses an orchestration system such as OpenStack to provide virtualized network services to both data-center virtual networks and L3VPNs.

The first challenge in integrating OpenContrail with OpenDaylight is that the latter doesn’t have a VM scheduler that can start and manage virtual machines; it lacks some of the critical functionality of an orchestration system.

It is however attempting to solve the same problems that OpenStack already solves:

  • It is trying to provide services APIs competing with Firewall-as-a-Service, Load-balancer-as-a-Service, VPN-as-a-service APIs from OpenStack;
  • It is trying to enable multiple plugins that manage an underlay providing L2 services; a problem already solved by the OpenStack ML2 plugin.
  • It is attempting to decouple the virtual appliances service configuration from the network topology: a problem that OpenContrail already solves.

From a problem definition standpoint, it seems to me that OpenDaylight is on a full collision course with OpenStack; the latter has already solved the problem of how to integrate orchestration and network, especially if used with OpenContrail, which allows the underlay to be treated as a simple layer 3 network that doesn’t need to be managed.

One can always argue that it would be desirable to have neutral APIs for networking that are not tied to OpenStack; on CloudStack, the other orchestration system that OpenContrail integrates with, using OpenStack APIs isn’t really a big deal: the storage subsystem supports Swift for instance. And CloudStack by itself is already quite far along in solving the same problem: managing services independently of managing the network topology.

It is hard to understand the role of OpenDaylight unless it becomes a full-fledged orchestration system; this is not as bizarre an idea as it may sound. Open source orchestration systems are still in their infancy when compared to the proprietary systems that run large scale data-centers; current orchestration systems do a poor job of handling transient failures, for instance: software failures in the compute-node always leave both my OpenStack and CloudStack clusters with VMs in the wrong state that require a manual reset.

At the moment, with OpenDaylight being a small subset of OpenStack/CloudStack it would be technically unfeasible to integrate OpenContrail.

How does it integrate with the OSS ?

I’ve been recently in a set of conversations at work about service enablement in carrier networks. Somewhere in the middle of the discussion someone inevitably asks: “How does your proposal integrate with the OSS ?” and down the rabbit hole we go…

The dissonance between my personal views and what seems to be the majority of the industry starts with the name of the conversation. The meeting invitation will invariably be about “NFV” (that is, Network Function Virtualization). I strongly dislike the acronym. In my mind it conveys the meaning that the issue is how one migrates existing network equipment to an x86 based CPU as a cost saving measure. Defining the problem that way misses the forest for the trees, in my opinion.

When it comes to wireline services, carriers derive most of their revenue from enterprise customers. These customers are now being served by a range of infrastructure (e.g. Amazon), software (e.g. Salesforce, Workday) and services companies that are capable of rolling out dozens of new offerings a year.

Defining the problem as “Virtualization” misses the mark. The problem definition is how carriers can manage to roll out new value added services in time frames that are comparable to over-the-top offerings. They may never achieve the agility of an Amazon or Google, but they need to at least be within an acceptable range if that enterprise revenue stream is not going to be fully captured by over-the-top players.

The correct problem definition is, in my opinion, “how do you roll out a new service every 3 months ?”.

It is easy to come up with some self-evident “DOs” and “DON’Ts”. Chief on the “to be avoided” list is: form a design team to create a proposal on how to build an infrastructure that can support any arbitrary service.

Realistically, the only way to achieve the goal stated above is to take an experimental approach: take one concrete service, remove all obstacles out of the way and get to the point where the new service can be validated by customers. Scale is a problem to be solved only for services that customers are willing to pay for; over-designing a solution before validation is not cost effective.

There is a set of principles that quickly become evident by chasing a concrete goal:

  • One can’t wait for hardware to be procured; the initial instances of a service must be deployed in a small set of regional resource pools (i.e. data-centers) that consist of general purpose hardware;
  • The service must be self-service and automatically provisioned; humans can’t be involved in the provisioning and operation: 3 months is not enough time to devise operational procedures and train an operations staff.

The latter point brings us back to the OSS. A significant piece of the answer around OSS integration is that the service should not be managed or operated in a traditional way. The standard mode of operations requires an integration cycle that is 12 to 18 months at best; that is just not acceptable in the current environment. As much as that may be a culture shock, tools need to satisfy business goals.

Of course there still needs to be a billing system. There still needs to be a way for customer support to understand what services are enabled for a particular customer and what is their status. But these will most likely have to be developed as a parallel system to the existing OSS.

This is not a very popular answer. It is however the logical conclusion of following our problem definition.

The reality is that cloud based services and wireless devices have been slowly eroding the traditional value that carriers bring to their larger enterprise customers. “NFV” doesn’t do anything to address that problem. Rapid service enablement may…

Adding a BGP knob to OpenContrail

This article is intended as a simple tutorial on how to add a configuration option to the OpenContrail control-node that controls the behavior of the BGP implementation.

The configuration node (the schema-transformer process) automatically assigns route-targets to routing-instances that are created in order to implement network virtualization. These route-targets are assigned from the space corresponding to the autonomous-system that the data-center cluster is on (64512 by default).

The user can additionally define an external route target for networks that extend beyond the data-center boundary (e.g. the public network). For this tutorial we will implement a knob that strips out route target communities that contain private ASes.
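Before touching the real code, it may help to pin down what “strip route targets that contain private ASes” means. The following Python sketch is only an illustration of the intended behavior, not the control-node implementation (which is C++). It assumes route targets written in the textual “target:<asn>:<number>” form with an AS number in the first field, and it uses the private AS ranges reserved by RFC 6996; the function names are invented.

```python
# Private-use AS numbers reserved by RFC 6996 (2-byte and 4-byte ranges).
PRIVATE_AS_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_private_as(asn):
    return any(low <= asn <= high for low, high in PRIVATE_AS_RANGES)


def parse_route_target(text):
    """Split 'target:<asn>:<number>' into (asn, number).

    Assumption: only the AS-based form is handled here; route targets
    that carry an IP address in the first field are out of scope.
    """
    kind, asn, number = text.split(":")
    if kind != "target":
        raise ValueError(f"not a route target: {text}")
    return int(asn), int(number)


def strip_private_route_targets(route_targets):
    """Return the route targets whose AS number is not in a private range."""
    return [rt for rt in route_targets
            if not is_private_as(parse_route_target(rt)[0])]


advertised = strip_private_route_targets(["target:64512:8000001", "target:1:100"])
print(advertised)
```

With the default cluster AS of 64512, the automatically assigned route targets fall in the private range and would be removed, while an externally defined target with a public AS would be kept.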

In order to get started let’s create a sandbox and initialize it. In a Unix shell type the following commands:

The first step towards defining a new knob is to add it to the schema. OpenContrail auto-generates the REST API that stores the configuration and makes it available through the IF-MAP server. It also generates the API client library that can be used to set the configuration parameters. The BGP related schema is present in controller/src/schema/bgp_schema.xsd.

Use your favorite text editor to modify this file and add a new XSD element called “rtarget-strip-private” in the type BgpSessionAttributes. This is the data type that is associated with bgp peering sessions.

After editing this file, execute the command scons controller/src/api-lib. This command builds the Python client api library that we will use later on to set the new configuration parameter. You can poke around at the generated code:

grep rtarget-strip-private build/debug/api-lib/vnc_api/gen/*

This should yield a couple of references to the newly added configuration parameter.

In order to use this new parameter we will modify the helper script typically used to configure BGP peering sessions with an external BGP speaker (controller/src/config/utils/ As an example, we will add the new attribute only when configuring BGP sessions for a ‘router_type’ of “mx”.

At this point, we need to generate a test api-server in order to validate our code. The following sequence of shell commands will build and start a mock api-server on port 50000.

The next step is to create a test script that will configure 2 BGP peers and a session between them. This can be achieved by creating a new file (controller/src/config/api-server/tests/ that executes our test case.

After executing this script, we should be able to see the contents of the “bgp-routers” table using the command:

wget -O- http://localhost:50000/bgp-routers | python -mjson.tool

This should display 2 entries, each with the router name specified in our test script. Examine the “cn-test” router by using the “wget” command with the “href” of the “cn-test” router. It will display the new knob (rtarget-strip-private).

Implementing the BGP component

Before proceeding with the implementation we need to look into how BGP update generation works. In the OpenContrail implementation, the routing table is responsible for calculating the desired advertisement attributes (the RibOut attributes). The RibOut class has a dual purpose: it holds the head of the update queue, and it groups the peers that advertise this table with the same export policy, represented by the RibExportPolicy class.

Additional background information on the design of the BGP component can be found here.

Since our new knob modifies the BGP attributes that are generated, the first thing that we need to do is to add a new member to the RibExportPolicy class (rtarget_strip_private) and modify the constructor to include the new parameter.
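The reason the knob belongs in the export policy is the grouping described earlier: peers that share a policy share a RibOut and therefore receive identical updates. The Python sketch below illustrates that idea only. It is not the C++ class; the “encoding” and “as_number” fields and the peer names are placeholders invented for the example, and only rtarget_strip_private comes from this tutorial.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPolicySketch:
    """Stand-in for an export policy; frozen so it can be used as a dict key."""
    encoding: str                        # hypothetical field
    as_number: int                       # hypothetical field
    rtarget_strip_private: bool = False  # the new knob


def group_peers(peers):
    """Map each distinct export policy to the peers that share it."""
    groups = defaultdict(list)
    for name, policy in peers:
        groups[policy].append(name)
    return dict(groups)


peers = [
    ("mx-1", ExportPolicySketch("bgp", 64512, rtarget_strip_private=True)),
    ("cn-2", ExportPolicySketch("bgp", 64512)),
    ("cn-3", ExportPolicySketch("bgp", 64512)),
]
groups = group_peers(peers)
for policy, members in groups.items():
    print(policy, members)
```

Because the knob takes part in the equality of the policy, the peer that has it enabled ends up in its own group and can be sent different attributes from the other two.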

By modifying the constructor we can discover that the “policy_” initializer in BgpPeer needs to be modified in order to pass the new parameter.

The remainder of the invocations of the RibExportPolicy constructor should receive the new parameter as “false”. There is one invocation in the production code, while initializing an Xmpp peer, and several calls in the test code while initializing test scenarios. These will need to be edited to get the code to compile again.

Now we are probably ready to implement our knob. Following the Test Driven Development methodology the first step is to build our unit test. We want to build a test that has a test case with a route that only has non-private AS route targets and is unaffected by this change; and a test case where a route-target community is stripped from the route advertisement.

The UnitTest code is in the file controller/src/bgp/l3vpn/test/

It works by creating an InetVpnTable and a corresponding RibOut with an export policy where the rtarget_strip_private option is true. We use a mock BgpMessageBuilder to record the attributes that are encoded to the peer. The BgpMessageBuilder interface is called when an update is removed from the queue and needs to be encoded on the wire via either BGP or XML encoding. In this case, our mock records the communities that are advertised to the test peer.

The test cases themselves work by creating a route with a specific list of route targets, adding the route to the table and verifying the list of communities seen by the message builder interface.
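The shape of such a test can be sketched in Python with a recording mock. This is a stand-in for the idea only, not the C++ unit test: every class and function name below is invented, the route targets are plain “target:<asn>:<number>” strings, and only the 2-byte private AS range is checked. The export filter is passed in as a parameter so that the same test can be run against the behavior before and after the knob exists.

```python
class RecordingMessageBuilder:
    """Mock builder: remembers the communities it was asked to encode."""

    def __init__(self):
        self.communities = None

    def encode(self, route_targets):
        self.communities = list(route_targets)


class RibOutSketch:
    """Stand-in for the export path: filter the attributes, then encode them."""

    def __init__(self, builder, export_filter):
        self.builder = builder
        self.export_filter = export_filter

    def advertise(self, route_targets):
        self.builder.encode(self.export_filter(route_targets))


def no_filter(route_targets):
    """Behavior before the knob is implemented: advertise everything."""
    return route_targets


def strip_private(route_targets):
    """Behavior once implemented: drop targets whose AS is 64512-65534."""
    return [rt for rt in route_targets
            if not 64512 <= int(rt.split(":")[1]) <= 65534]


def advertised(export_filter, route_targets):
    """Push one route through the sketch and return what the mock recorded."""
    builder = RecordingMessageBuilder()
    RibOutSketch(builder, export_filter).advertise(route_targets)
    return builder.communities


public_only = ["target:1:100", "target:2:200"]
mixed = ["target:64512:8000001", "target:1:100"]

# A route with only non-private targets is unaffected either way.
assert advertised(no_filter, public_only) == public_only
assert advertised(strip_private, public_only) == public_only

# The matching case is not satisfied until the stripping logic exists.
assert advertised(no_filter, mixed) != ["target:1:100"]
assert advertised(strip_private, mixed) == ["target:1:100"]
```

Recording what reaches the encoder, rather than inspecting internal state, keeps the test focused on what a peer would actually receive.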

Once this test is compiled, the “Match” test case should fail since we have not yet implemented the functionality. We can do that now:

Rerun the Unit Test. It should now pass.

You can find the code used in this example in the “tutorial” branch of the contrail-controller repository in GitHub.