Kubernetes and OpenContrail

I’ve been working over the last couple of weeks on integrating OpenContrail as a networking implementation for Kubernetes and have reached the point where I have a prototype working with a multi-tier application example.

Kubernetes provides 3 basic constructs used in deploying applications:

  • Pod
  • Replication Controller
  • Service

A Pod is a container environment that can execute one or more applications; each Pod executes on a host as one (typically) or more Docker processes sharing the same environment, including networking. A Replication Controller (RC) is a collection of Pods with the same execution characteristics. RCs ensure that the specified number of replicas of a given Pod template are running.

A Service exposes a collection of Pods through a single IP endpoint, typically load-balanced across multiple backends.
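
To make these constructs a bit more concrete, the sketch below models an RC and a Service as plain Python dictionaries that mirror the general shape of the corresponding Kubernetes objects; the field names and the image name are simplified and illustrative, not tied to any specific API version.

```python
# Illustrative models of the basic constructs; field names are simplified
# and do not track any specific Kubernetes API version.

pod_template = {
    "labels": {"name": "frontend"},   # identifies the application tier
    "containers": [
        {"name": "frontend", "image": "example/frontend", "port": 80},
    ],
}

replication_controller = {
    "kind": "ReplicationController",
    "name": "frontend-rc",
    "replicas": 3,                    # desired number of Pod replicas
    "selector": {"name": "frontend"}, # which Pods this RC manages
    "template": pod_template,         # Pod template used to create replicas
}

service = {
    "kind": "Service",
    "name": "frontend",
    "selector": {"name": "frontend"}, # Pods that back this service
    "port": 80,
    "portalIP": "10.254.42.1",        # the single VIP clients connect to
}
```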

Kubernetes comes with several application deployment examples. For the purpose of prototyping, I decided to use the K8PetStore example. It creates a 4-tier application: load-generator, frontend, redis-master and redis-slave. Each of these tiers, except for the redis-master, can be deployed as multiple instances.

With OpenContrail, we decided to create a new daemon that listens to the kubernetes API using the kubernetes controller framework. This daemon creates virtual networks on demand for each application tier and connects them together using the “labels” present in the deployment template.
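
As a rough sketch of what that daemon does (the actual implementation uses the kubernetes controller framework; this Python version with the kubernetes client is only meant to convey the idea, and contrail_create_virtual_network is a hypothetical stand-in for the OpenContrail configuration API):

```python
# Sketch: watch replication controllers and create one virtual network per
# collection. `contrail_create_virtual_network` is a hypothetical placeholder
# for the OpenContrail configuration API; it is not the real client.
from kubernetes import client, config, watch

def contrail_create_virtual_network(name, subnet):
    """Hypothetical stand-in for OpenContrail virtual-network creation."""
    print(f"create virtual-network {name} addressed out of {subnet}")

def main():
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for event in watch.Watch().stream(v1.list_replication_controller_for_all_namespaces):
        rc = event["object"]
        if event["type"] != "ADDED":
            continue
        labels = rc.metadata.labels or {}
        # One virtual network per collection, named after the "name" label.
        contrail_create_virtual_network(labels.get("name", rc.metadata.name),
                                        "10.0.0.0/16")

if __name__ == "__main__":
    main()
```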

A plugin script running on the minion then connects the container veth-pair to the OpenContrail vrouter rather than the docker0 bridge.
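
The plumbing done by that plugin is roughly the following; the sketch drives the standard ip/nsenter tools, and attach_to_vrouter is a hypothetical placeholder for the vrouter port-add step, whose exact interface is not shown here.

```python
# Rough sketch of the per-container plumbing done by the plugin script.
# `attach_to_vrouter` is a hypothetical placeholder for the vrouter port-add.
import subprocess

def run(cmd):
    subprocess.check_call(cmd.split())

def attach_to_vrouter(host_ifname, ip_address):
    """Hypothetical: register the host-side veth end with the vrouter agent."""
    print(f"attach {host_ifname} ({ip_address}) to the vrouter")

def plumb(container_pid, ip_address):
    host_if, cont_if = "veth-host0", "veth-cont0"          # illustrative names
    run(f"ip link add {host_if} type veth peer name {cont_if}")
    # Move the container end into the container's network namespace.
    run(f"ip link set {cont_if} netns {container_pid}")
    run(f"nsenter -t {container_pid} -n ip addr add {ip_address}/16 dev {cont_if}")
    run(f"nsenter -t {container_pid} -n ip link set {cont_if} up")
    run(f"ip link set {host_if} up")
    # Hand the host end to the vrouter instead of the docker0 bridge.
    attach_to_vrouter(host_if, ip_address)
```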

The network manager daemon does the following:

  • For each collection (i.e. group of Pods managed by an RC) it creates a virtual-network. All these virtual networks are addressed out of the cloud private space (10.0.0.0/16 in my example).
  • Each Pod is assigned a unique address in the private space (10.0.x.x) and by default can only communicate with other Pods in the same collection.
  • When a service is defined over a collection of Pods, that service implies the creation of a new virtual network in the services space (a.k.a. the Portal network in kubernetes).
  • Each pod in a service is assigned the floating-ip address corresponding to the PortalIP (i.e. the service VIP); thus traffic sent to the service is equal-cost load-balanced across the multiple backends.
  • In the k8petstore example, the collections use the kubernetes labels “name” and “uses” to specify which tiers communicate with each other; the network manager automatically creates network access control policies that allow the respective Pods to communicate. The policies are provisioned such that when collection X declares that it “uses” collection Y, X is allowed to communicate with Y’s virtual IP address (as sketched below).
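
The label-to-policy mapping mentioned in the last item can be sketched as follows, with collections and policies expressed as plain Python data rather than the actual OpenContrail API objects (the example collection set only approximates the k8petstore tiers):

```python
# Sketch: derive access control policies from the "name"/"uses" labels.
# Policies are plain dictionaries here, not real OpenContrail objects.
def policies_from_labels(collections):
    """collections: list of dicts carrying the 'name' and 'uses' labels."""
    policies = []
    for c in collections:
        for target in c.get("uses", []):
            policies.append({
                "from-network": c["name"],      # source virtual network
                "to-network": target,           # target service network
                "allow": "traffic to the target's virtual IP",
            })
    return policies

# Approximation of the k8petstore tiers, for illustration only.
collections = [
    {"name": "load-generator", "uses": ["frontend"]},
    {"name": "frontend", "uses": ["redis-master", "redis-slave"]},
    {"name": "redis-slave", "uses": ["redis-master"]},
    {"name": "redis-master"},
]
for policy in policies_from_labels(collections):
    print(policy)
```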

The current prototype is very interesting in highlighting how a tool like kubernetes makes application deployment easy, fast and reproducible, and how network micro-segmentation can fit in in a way that is transparent to the application owner while providing isolation and access control.

The OpenContrail kubernetes network-manager can automate the deployment of the network since it is exposed to the collection (RC) and service definitions. While advanced users may want to customize the settings, the defaults can be more useful and powerful than an API such as AWS VPC or OpenStack Neutron.

One important difference, from a usage perspective, vs. traditional OpenStack + OpenContrail deployments is that in the kubernetes prototype the system simply allocates private IP addresses that are unique within the cloud, while isolating pods of different collections from each other. For instance, in our example, if the frontend Pod has the address 10.0.0.2, redis-master has the private address 10.0.0.3 and the redis-master service has the VIP (a.k.a. Portal IP) 10.254.42.1, the topology is set up such that:

  • The frontend network contains 10.0.0.2 (but is unable to forward traffic directly to 10.0.0.3);
  • The frontend network is connected to the redis-master service network (which contains the floating-ip address 10.254.42.1).
  • The redis-master network contains 10.0.0.3.

Traffic from the frontend to the service VIP is forwarded to the OpenContrail vrouter in the minion where the service is executing (with an ECMP decision if multiple replicas are running). The destination IP is then translated to the private IP of the redis-master instance. Return traffic flows through the redis-master service network, which has routing table entries for the frontend network.

With OpenContrail, the kubernetes cloud no longer uses the “kube-proxy”. The default kubernetes deployment uses a TCP proxy between the host and the container running on a private address on the docker0 bridge. This creates a need for the service/pod definition to allocate host ports and prevents two docker containers that want the same host port from executing on the same minion. OpenContrail removes this restriction; the host network is completely out of the picture of the virtual-networks used by the pods and services.

I’m looking forward to rewriting the prototype of the network-manager daemon over the next few weeks and adding functionality such as the ability to deploy source-nat for optional outbound access from the containers to the internet, as well as LBaaS for scenarios where fine-grained control of the load-balancing decision is desirable.

The trajectory of Open Daylight

When the Open Daylight project started, it was clear that the intent on the part of IBM and RedHat was to replicate the success of Linux.

Linux is today a de facto monopoly in server operating systems. It is monetized by RedHat (and in smaller part by Canonical) and it essentially allowed the traditional I.T. companies such as IBM, Oracle and HP to neutralize Sun Microsystems, which was, in the late 90s and early 2000s, the platform of choice for Web application deployment.

Whether the initial target of these companies was Sun or Microsoft, the fact is that, by coming together in support of an open source project that had previously been a hobby of university students, they inaugurated the era of corporate open source.

This was followed by a set of successful startup companies that used open source as a way both to create a much deeper engagement with their customers and to market their products. By originating and curating an open source project, a startup can achieve a much greater reach than before. The open source product becomes a free trial license, later monetized in production deployments that typically need support. Open source also provides a way to engage with the consumers of the product by allowing them to become involved, contribute and help define the direction. It has been very effective in examples such as MySQL, Puppet Labs and Docker.

This landscape became more complex with the advent of OpenStack and the OpenStack foundation, which not only includes a set of open source projects but also serves as a strong marketing organization. The OpenStack foundation, with its steep membership fees, was initially a mechanism by which Rackspace was able to share the costs of the marketing initiatives around its public cloud in its, so far, unsuccessful attempt to compete with Amazon Web Services (AWS).

It also created a group of very strange bedfellows. The companies investing the most in OpenStack are the giants of the I.T. world which, most likely, initially targeted AWS and VMWare as the common adversary. They are, however, increasingly finding themselves in a situation where they are most interested in competing with each other in terms of providing private cloud solutions to enterprises, either hosted or on-premise. This landscape keeps evolving, and the most interesting question at the moment is how these companies are going to be able to cooperate and compete with each other.

They must cooperate in order to create a bigger ecosystem around private cloud that can match the public cloud ecosystems of AWS, Azure and GCE. They must compete in order to differentiate their offerings. Otherwise we are left with two options for monetization: either it is all about services and Mirantis gets the cake, or (less likely) installing and operating a private cloud becomes a shrink-wrapped product and RedHat wins.
Either way, non-differentiation implies a winner-takes-all monopoly.

It was at the height of the OpenStack euphoria that Open Daylight was conceived. The assumption of its I.T.-centric founders was that the networking vendors, which have the domain expertise, would collectively develop the software, which would then be monetized as part of an OpenStack distribution through software support.

I had the opportunity to ask one of the people involved in the creation of the project for his thoughts on monetization and I got a clear answer: “we expect network vendors to monetize switches just like server manufacturers monetize NICs, video cards, storage”.

It is not surprising that the most technically savvy of the network vendors have adopted an approach of participating in ODL, in order to be aware of what is going on, but focus their energies on other strategic initiatives.

Network gear is a software business. Switches themselves are mostly manufactured by ODMs using third-party silicon. Routers, especially service provider routers, require much more flexible forwarding engines than switches, typically network processors. But those are also available from Broadcom (although with lower capacity and capability than the special-purpose silicon of the top vendors).

Being a software business allows network vendors to hit gross margins of 60%+. ODMs have margins that are much much lower. A networking company must have an independent (and differentiated) software strategy. This is a business imperative.

Some vendors, typically the weakest in their engineering resources, have tried to build such a strategy on top of ODL: use ODL as a marketing machine, since it has a budget at least an order of magnitude higher than any single networking vendor would allocate to “SDN”. At first glance this sounds like a sensible strategy: build your own differentiated product wrapped in the ODL marketing aura.

The problem is that it transformed ODL into a frankenstein of multiple vendor projects with very little or no relationship to each other. The several controllers being driven by the multiple vendors participating in ODL are all different. One vendor had recent press in which it was describing 3 distinct controllers. Given the lack of commonality between the goals of these different projects, it is completely unclear what ODL is at the technical level. This is not entirely surprising, given that the term “SDN” has no technical definition in itself.

At this point ODL becomes a brand, not an open source project. And a tainted one at that. Given that none of the multiple controllers can currently be seen in any qualification testing in either carrier or enterprise networks, one can only conclude that there isn’t a single one that quite works yet. Software engineering projects are built on timeframes that are longer than the hype cycles generated by a large marketing machine. This inevitably leads to disappointment.

That disappointment is growing and will likely snowball.

The meaning of Cloud

The term “Cloud” refers to a software development and delivery methodology that consists of decomposing applications into multiple services (a.k.a. “micro-services”) such that each service can be made resilient and scaled horizontally, by running multiple instances of each service. “Cloud” also implies a set of methodologies for application delivery (how to host the application) and application management (how to ensure that each component is behaving appropriately). The name, “Cloud”, seems to only capture the delivery piece of the equation, which is the proverbial tip of the iceberg.
An example that would help us break down the jargon into something a bit more concrete: a simple web application that lets a user add entries to a database table (e.g. “customers”) and perform some simple queries over this table. This is what is known as a CRUD application, from the initials of Create, Read, Update, Delete.
The “classic” version of this application would be a VisualBasic/Access (in the pre-Web days), .NET/SQLServer or Ruby on Rails/MySQL application. The software component is responsible for generating a set of forms/web pages for the user to input their data, executing some validation and accessing the database. In most application development frameworks (e.g. RoR), this example can be made to work in hours or a few days.
One minor issue with our example above is that, typically, not every user in an organization has the same level of access. Even when that is the case, there is often the need to audit who does what. Thus our application also needs a “user” table and some sort of access control. Not a big deal: a couple more forms and a new database table of users is created.
Until someone else creates another CRUD application (e.g. to manage inventory) that also needs users and access control rules. Clearly, components that are common to multiple applications should be shared. Let’s assume that our developers build a simple web API that uses an LDAP backend for authentication and manages a set of access control list rules. Both our CRUD applications can use this authentication service to go from username/password to a cookie, and then query the authorization service to map that cookie to access permissions within the application.
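
A minimal sketch of what such a shared authentication/authorization service could look like, using Flask for illustration; the LDAP bind is a placeholder and the token and permission stores are kept in memory, so none of this is meant as a production design.

```python
# Illustrative only: a shared authentication/authorization web API.
# The LDAP check is a placeholder and state is kept in memory.
import secrets
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
TOKENS = {}                      # cookie -> username
PERMISSIONS = {                  # username -> per-application permissions
    "alice": {"customers": ["read", "write"], "inventory": ["read"]},
}

def ldap_bind(username, password):
    """Placeholder for the real LDAP bind against the corporate directory."""
    return False

@app.route("/login", methods=["POST"])
def login():
    creds = request.get_json()
    if not ldap_bind(creds["username"], creds["password"]):
        abort(401)
    cookie = secrets.token_hex(16)
    TOKENS[cookie] = creds["username"]
    return jsonify({"cookie": cookie})

@app.route("/authorize")
def authorize():
    user = TOKENS.get(request.args.get("cookie", ""))
    if user is None:
        abort(401)
    # Map the cookie back to the user's access permissions per application.
    return jsonify(PERMISSIONS.get(user, {}))
```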
By now we have a reasonable description of what a simple “classic” application looks like from a development standpoint. In our example, each of the CRUD applications and the authentication service consists of a single VM built manually by the development team. These VMs are then handed off to the system administration group, which configures monitoring and network access.
The above roughly describes the state of the art in many enterprises, except that the number and complexity of the applications is significantly larger. And the “customer” and “inventory” applications are typically not developed in house; they are often components of CRM software suites built by third parties. They only serve in our story as examples.
The key issues with our “classic” application are:
  •   Reliability
  •   Scale
  •   Agility
Of these three, scale is often not the major concern unless the application is being delivered to a large audience. That is the case in both consumer and SaaS markets but less so in enterprise. We can think of scale in terms of the number of concurrent sessions that the application needs to serve. In many cases this number is low and does not warrant a significant investment.
Reliability comes from two different vectors: the correctness of the software itself (which we can think of as a function of the test coverage); and the availability of the infrastructure.
The “classical” approach to infrastructure availability has been to try to make the infrastructure as resilient to failure as possible. A significant factor behind this approach is the handoff point between software and infrastructure management. If those responsible for running the application (infrastructure teams) are not aware of the design or availability requirements of the application, they can only assume the worst-case scenario.
For the infrastructure to completely mask power, network, server and disk failures without understanding the application semantics is so prohibitively expensive as to be considered practically impossible. Still, infrastructure teams are typically measured in terms of uptime. They attempt to mask single disk failures and single server failures with virtual machine restart, which does have an impact on the application. Network card or switch failures can also be masked with server link redundancy. It is common to have a goal of 99.999% availability.
That begs the question of what happens the rest of the time. Five nines still allows roughly 5 minutes of downtime per year per server, and the more commonly achieved 99.9% allows close to 9 hours. The problem with statistical averages is that as the number of application servers increases, so do the failures. Assuming 1000 application servers, each unavailable for those few hours a year, and a uniform distribution of failures, one can expect roughly one failure to be in progress at any particular point in time, despite the significant resource and performance cost of infrastructure-based availability.
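
The back-of-the-envelope arithmetic behind those numbers, with the 1000-server fleet as an assumed example:

```python
# Downtime implied by an availability target, and the aggregate effect across
# an assumed fleet of 1000 servers.
HOURS_PER_YEAR = 365 * 24
FLEET = 1000

for availability in (0.999, 0.99999):
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    # Average number of servers down at any instant, assuming failures are
    # spread uniformly over the year.
    avg_down = downtime_hours * FLEET / HOURS_PER_YEAR
    print(f"{availability:.3%}: {downtime_hours:.2f} h/server/year, "
          f"on average {avg_down:.2f} servers down at any time")
```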
It also turns out that masking failures ends up making the impact of a failure worse from a software reliability perspective. Events that happen less frequently may not be tested, which may then lead to catastrophic failures such as an application losing transactions or leaving data in an invalid state.
The “cloud” approach to availability is to expose a (micro)service directly to infrastructure failures but hide them at the (micro)service level; this means that the authentication service in our example above would be responsible for serving its APIs successfully independently of data-center, power, server or disk failures. It goes further: it stipulates that one should purposely trigger failures on the production infrastructure in order to verify that the services are still operational.
Google, for instance, simulates large scale disasters yearly in order to ensure that its services remain operational in the event of an earthquake or other large natural disaster that could affect its infrastructure. Netflix created a software tool called “chaos monkey” whose job is to randomly kill production servers as well as produce other types of havoc.
This is not as crazy as it seems: users care about the total availability of the system of which software reliability is the most important component. Application software is more complex in terms of functionality than the infrastructure it runs upon and thus more prone to failure.
The financial crisis of 2008 highlighted the “black swan” effect: the consequences of events with very low probability but catastrophic impact, which tend to disappear in statistical risk models such as 99.999% availability. The “cloud” philosophy is to increase the probability of failure in order to avoid “black swans”.
One reasonable criticism of this approach is that it creates more complex software architectures with additional failure modes.
Perhaps instead of discussing “cloud” one should focus on modern software engineering practices. These have the goal of taking software from an ad-hoc, artisanal mindset and transforming it into a first class engineering discipline. Modern software engineering has the dual goals of maximizing agility and reliability; its cornerstone is testing. And testing requires a very different hand-off and service model between developers and infrastructure.
Modern software engineering practices typically imply release cycles within a range from 2 weeks to 3 months. The intent is to release to production incremental sets of changes that are tested and from which one can gather real world feedback.
Software is expected to be tested in:
  • unit test
  • integration test
  • system test
  • Q/A and staging
  • A/B testing
While unit and integration tests happen on developer workstations (or in a cloud application that pre-verifies all proposed commits), system test, staging, A/B testing and troubleshooting require the ability to create production-like application environments that are an exact mimic of the production configuration. Testing against triggered infrastructure failures is typically a requirement of both system and Q/A testing.
A software release cycle implies a carousel of application execution environments that execute simultaneously. If release X is the stable release in production, there may be release X+1 soaking in production for A/B testing; potentially multiple environments running release X+2 for Q/A with simulated traffic; and system test environments on the pre-release version X+3. Development may need to go back and create a system test environment for an arbitrary version previous to X in order to do troubleshooting.
This development methodology requires that all interactions with the infrastructure be based on version-controlled deployment templates (e.g. CloudFormation, Heat, etc.) that exercise an API. Trouble tickets or GUIs are not desirable ways to interact with the infrastructure because they do not provide a repeatable and version-controlled method to describe the resources that are in play.
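
As a trivial sketch of what “version-controlled interaction through an API” means in practice: the environment is described as data that lives in the repository, and a hypothetical orchestrator_deploy call (standing in for a CloudFormation/Heat style API) is the only way it gets instantiated.

```python
# Sketch: environments are versioned data; instantiating one is an API call,
# never a ticket or a GUI. `orchestrator_deploy` is a hypothetical stand-in
# for a CloudFormation/Heat style API.
qa_environment = {
    "release": "X+2",
    "purpose": "qa-simulated-traffic",
    "instances": {
        "frontend":     {"count": 3, "image": "frontend:X+2"},
        "auth-service": {"count": 2, "image": "auth:X+2"},
        "database":     {"count": 1, "image": "db:X+2"},
    },
}

def orchestrator_deploy(template, stack_name):
    """Hypothetical deployment API; returns an identifier for the stack."""
    print(f"deploying {stack_name} for release {template['release']}")
    return stack_name

# Every environment in the release carousel is just another invocation with
# a different, versioned template.
orchestrator_deploy(qa_environment, "qa-x2")
```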
In summary, cloud is the result of the prescribed approach of modern software engineering practices that attempt to improve reliability and agility for software. The main driver to adopt a cloud infrastructure is to serve a community of application developers and their requirements.

IPv6

Recently, I’ve heard several people suggest that the advent of IPv6 changes the requirements for data-center virtual network solutions; for instance, the claim that network overlays are no longer necessary. The assumption is that once an instance has a globally unique IP address, all requirements are met.

In my view, this analysis fails in two dimensions:

  • In the assumption that it is desirable to give instances direct internet access (via a globally routed address);
  • In the assumption that overlay solutions are deployed to solve address translation related problems;

Neither of these assumptions hold when examined in detail.

While there are IaaS use cases of users that just want to be able to fire up a single virtual-machine and use it as a personal server, the most interesting use case for IaaS or PaaS platforms is to deploy applications.

These applications serve content for a specific virtual IP address registered in the DNS and/or global load-balancers; that doesn’t mean that this virtual IP should be associated with any specific instance. There is a layer of load-balancing that maps the virtual IP onto the specific instance(s) serving the content. Typically this is done with a load-balancer in proxy mode.

As an aside, enabling IPv6 on the load-balancer is typically the best approach to make an application IPv6-ready. While IPv6 doesn’t add functionality to IPv4 (other than more addresses), it does add an additional burden in terms of manageability and troubleshooting; dual-stack applications or dual-stack networks are twice the operational load, with no benefit compared to terminating the IPv6 session at the load-balancer.
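
As a toy illustration of what terminating IPv6 at the load-balancer means, the sketch below accepts connections on an IPv6 listener and proxies each one to an IPv4-only backend; the addresses and ports are made up and no real load-balancer works this simply.

```python
# Toy sketch: terminate IPv6 at the "load-balancer" and proxy to an IPv4-only
# backend so the application never needs to speak IPv6. Addresses are made up.
import asyncio

BACKEND = ("10.0.0.2", 8080)      # IPv4-only application instance

async def pipe(reader, writer):
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle(client_reader, client_writer):
    backend_reader, backend_writer = await asyncio.open_connection(*BACKEND)
    await asyncio.gather(pipe(client_reader, backend_writer),
                         pipe(backend_reader, client_writer))

async def main():
    # Listen on the IPv6 wildcard address; clients connect over IPv6.
    server = await asyncio.start_server(handle, host="::", port=8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```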

Back to our application: this is typically implemented as a set of multiple instances, often with specialized functions: front-ends (web-page generation), business logic and caches, databases. The reasonable default, from a data security perspective, is to disallow internet access to and from these instances. RFC1918 addresses come in as a real benefit, since they prevent someone from accidentally routing traffic directly.

In several IaaS platforms that I work closely with, as soon as the platform is operational there is an incident with VMs generating DDoS attacks from the cloud (usually benefiting from high bandwidth internet connectivity). My current recommendation is for applications to use LBaaS for inbound traffic plus a jump host with a SOCKS proxy for outbound access. That still leaves the LB and jump host as threat vectors, but it is easier to monitor and manage than having direct internet connectivity from the VMs with either floating-ip (bi-directional) or source-nat.

Thus, to me at least, globally routed addresses are hardly a feature. Let’s now examine the rationale for using overlays in data-center networks.

The first thing we need to notice is that overlays are replacing vlan-based designs (based on IEEE 802.1D) that already support the main functionality of the overlay: the separation between the identifier and the “locator”, the identifier being the IP address of the instance and the locator being the ethernet address in the case of 802.1D or the IP address in the case of an IP-based overlay.

The reason this separation is needed is because transport sessions, configurations and policies are tied to IP addresses. Assigning an IP address based on the server that is currently executing an instance doesn’t work for a multitude of reasons:

  • Instances need to exist independently of their mapping to servers;
  • Many configuration based systems are tied to IP address rather than DNS;
  • Network based policies are tied to IP addresses;

Operators deploy applications by defining a set of relationships between instances; this is typically done in an orchestration tool such as CloudFormation or OpenStack Heat. When instances are defined, and typically before they are executed and scheduled, IP addresses are assigned; these IP addresses are then used in configuration files and access control lists.

When an instance is scheduled, it is assigned temporarily to a server. Even in scenarios where live-migration of the instance is not supported, the “logical” concept of the Nth cache server for application “foo” must be able to outlast its scheduler assignment. The server can die, for instance, or the instance may have to be rescheduled because of interference from other workloads.

Of course, it is possible to construct a different mapping between identity and location: one that is name-based, for instance. One example would be for every service (in the SOA sense of the word) to publish its transient address/port in a directory; given the limitations of the default Docker networking implementation, some operators are doing just that. The drawback is that it forces one to tweak every single application in order to use this directory for name resolution and policy enforcement. This provides the equivalent functionality of what the network overlay is doing.
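
The name-based alternative amounts to something like the toy registry below: every service publishes its transient address under a name, and every consumer has to be changed to resolve through the directory instead of relying on stable addresses (purely illustrative).

```python
# Toy service directory: publish/resolve by name instead of relying on stable,
# policy-bearing IP addresses. Purely illustrative.
import time

class ServiceDirectory:
    def __init__(self):
        self._entries = {}   # name -> list of (address, port, registered_at)

    def publish(self, name, address, port):
        self._entries.setdefault(name, []).append((address, port, time.time()))

    def resolve(self, name):
        instances = self._entries.get(name)
        if not instances:
            raise LookupError(f"no instance registered for {name}")
        # Naive choice; a real directory would health-check and load-balance.
        address, port, _ = instances[-1]
        return address, port

directory = ServiceDirectory()
directory.publish("cache-7", "172.17.0.5", 49153)   # transient docker address
print(directory.resolve("cache-7"))                  # ('172.17.0.5', 49153)
```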

There is at least one very large cloud provider that does just that. They have a directory that provides both name resolution and authorization. The tradeoff is that all services must speak the same RPC protocol, and no externally developed application can be easily deployed inside the infrastructure. The RPC protocol has become what most of us think of as the TCP/IP layer: the common interoperability layer.

Overall, it doesn’t seem to me that IPv6 changes the landscape; it just creates an additional manageability burden in the case of dual-stack deployments that extend all the way to the application. If one does terminate the IPv6 session at the load-balancer, however, it should be mostly a NOP.

2014 in review

Looking back at 2014, it feels like a lot of progress was achieved in the past year in both the cloud infrastructure and NFV infrastructure markets. Some of that progress is technical, some is in terms of increased understanding of the key business and technical aspects. This post is my attempt to capture some changes I’ve observed from my particular vantage point.

This December marks the second anniversary of the acquisition of Contrail Systems by Juniper Networks. In the last year the Contrail team managed to deploy the Contrail network virtualization solution in several marquee customers; to solidify the image of the OpenContrail project as a production-ready implementation of the AWS VPC functionality; but, probably more importantly, to help transform attitudes at Juniper (and in the industry) regarding NFV.

In the late 90s and early 2000s, the carrier wireline business went through a significant change with the deployment of provider managed virtual networks (using BGP L3VPN). From a business perspective, this was essentially outsourcing the network connectivity for distributed enterprises. Instead of a mesh of frame relay circuits managed by the enterprise, carriers provide a managed service that includes the circuit but also the IP connectivity. This is a service that has proven to be fairly profitable for carriers and a key technology for networking vendors such as Juniper. The company’s best selling product (the MX) earned its stripes in this application.

Carrier wireline will go through a similar change in the next few years, this time by outsourcing network-based services that are still present at either the branch office or central offices: security (firewalls), VPN access, NAT, wireless controllers, etc. The business case is very similar to managed connectivity. I believe it became clear during 2014 that these virtualized services are going to run on OpenStack clusters and that the OpenStack network implementation is going to have a role similar to the one the L3VPN PE had in connectivity.

The timeline for this transformation also seems to be accelerating. Some carriers with a more competitive outlook for their wireline business are planning trials with live traffic for Q1/Q2 2015; some of these projects are at a rather advanced stage.

Across all the major carriers, I see a demand for a production-ready OpenStack networking implementation now. The consensus seems to be that there are two alternatives available on the NFV market: OpenContrail and Alcatel’s Nuage. Not surprisingly, both of these solutions were built on the technologies that made connectivity outsourcing successful: BGP L3VPN and EVPN. Technical experience provides an advantage in delivering production capable solutions.

While OpenStack seems to be solidifying its position in both the public cloud space and in NFV solutions, 2014 is the year that containers (which docker greatly helped popularize) went mainstream for SaaS/enterprise developers. A significant percentage of the operators considering docker/container solutions that I spoke with recently are still uncertain about which orchestration system they will use. Most are considering something other than OpenStack. I expect that this space will keep changing very rapidly in 2015.
For the OpenContrail project this implies the need to integrate with multiple additional orchestration systems.

The “enterprise switching” space has also advanced significantly in 2014 when it comes to perception. I often explain to network engineers that OpenContrail implements the functionality that was traditionally present in the aggregation switch in a traditional 3-tier design. This is where, traditionally, access control policies and network-based services were applied to traffic transitioning between administrative domains.

The need to transition from an aggregation switch to a solution such as Contrail comes from the fact that increased bandwidth requirements force network engineers to opt for a CLOS fabric design. As the fabric bandwidth increases it is important to simplify the role of the fabric switch node. These switches are becoming increasingly commoditized to the point where most switch vendors offer pretty much the same product, with the variation being the software. Often network engineers attempt to reduce the functionality running in that software to the minimum.

2015 is likely to bring additional movement towards switch commoditization. There is still space for “premium” switching solutions, but my understanding is that most industry observers would expect this to fall into an 80-20 rule, with 80% of the market preferring an OCP-like switch.

Against this backdrop, the Contrail product is finally starting to excite the Juniper sales-teams. On a per-server basis, the potential revenue of the Juniper Contrail solution is in line with selling switch ports, over a 3-year time interval. Software has better margins, and Contrail is a differentiated product with one viable competitor in each of the markets it plays in. My expectation is that we will see in 2015 a much greater interest from the parent company in the Contrail business unit; <smirk> which will undoubtedly be a mixed blessing </smirk>.

Simultaneously, I believe that in 2015 OpenContrail will become much less of a Juniper project and much more of a partnership of different vendors and cloud operators. From the perspective of Juniper’s commercial interests that is not a bad thing. It is much preferable to have a smaller share of a bigger pie than 100% of a small one.

OpenContrail seems to be at a juncture where it’s starting to attract significant interest from people that have become disillusioned with other approaches that lack its problem statement and execution focus. The challenge will be to retain the latter properties while creating a “bigger tent” where others can meaningfully participate and achieve both their technical and business goals. It promises to be both challenging and a great learning opportunity.

The Paris OpenStack Summit

I had the opportunity to attend last week’s OpenStack summit. With 4500 attendees, it demonstrates that OpenStack is the clear mindshare leader for organizations interested in building cloud infrastructure. It is also significant to note that approximately half of the participants came from Europe, which demonstrates that the “Old World” is not far behind the “New” when it comes to the desire to adopt cloud technology.

Parallel to the summit, the OpenContrail community organized both a user group meeting and an Advisory Board meeting. Both of these events ended up focusing the discussion on operations. While the user group presentations typically started with a description of the goals of the project, most of the discussion in the room focused on topics such as automating and documenting deployment, provisioning, software upgrades and troubleshooting.

As a software developer, one often tends to focus on expanding the feature set. In both of these events there was a clear message that the user community takes reliability, scale and performance as the main reasons they adopted OpenContrail, but is grappling with operational aspects. This means, on the one hand, that testing, specifically unit testing of each component, is absolutely key to maintaining users’ confidence; and, on the other, that the developer community needs to do more in order to enable users to automate deployment, upgrades and monitoring.

Some of that additional effort may simply be to better organize existing documents that describe, for instance, upgrade procedures so that they are easily available (and editable). Most will require additional interaction between users and developers so that, for instance, we can build a list of parameters that are important for an operator to be able to monitor.

The advisory group session also covered operational concerns; in addition, several of the members brought up issues related to data security audits. A majority of the cloud deployments using OpenContrail target business applications where it is important to understand aspects such as which network isolation guarantees are in place, to be able to easily deploy certified security appliances, and to audit and monitor configuration changes.

While connectivity is a solved problem, OpenContrail is essentially an automation tool and must be able to focus on addressing these twin issues of operations and data security/auditing. With that goal in mind the advisory group agreed to create 2 working groups with participation of both operators and developers in order to start chipping away at the problem.

At the OpenStack summit itself, in regards to networking, the general consensus seems to be that the reference implementation of Neutron has scaling and stability problems; OpenContrail is one of the few solutions generally recognized to be production-worthy that implement the Neutron API. From a purely analytical perspective that would make OpenContrail the ideal candidate to be the reference implementation for Neutron:

  • It is 100% open source (under the Apache v2 License);
  • It is built on a proven control plane architecture which is prevalent in Service Provider networks (BGP L3VPN / EVPN);
  • It includes a special purpose light-weight message bus built on top of XMPP, rather than relying on AMQP;
  • It uses a purpose built forwarding plane rather than a patchwork of OVS, ip-tables, dnsmasq, etc…

In short, it addresses the most common architectural issues with the current reference implementation.

The concerns most often raised about OpenContrail are that the size of the community is relatively small and that currently the work is mostly sponsored by a single company. While I certainly accept the first criticism, the size of the Neutron community isn’t necessarily playing in its favor currently. That is because the Neutron community seems to be largely fragmented into a large number of different groups, typically pursuing very different visions and ideas of how to implement networking. In terms of the latter, while Juniper Networks seems to be the only company currently offering commercial support for OpenContrail, there are now multiple companies that offer services such as custom development of OpenContrail projects.

Interestingly enough, most of my time at the summit was spent in discussions with a few OpenStack vendors that are looking for a commercial implementation of the Neutron API they can bundle with their products. While it is far from certain that these vendors will select OpenContrail, there seems to be a common realization that OpenStack deployments need a scalable solution now.

Other vendors were, on the other hand, promoting the message that the Neutron API itself is the problem and that a new API must be developed, although no proposals were really being put forth. I find this position pretty difficult to understand given the obvious similarities between the Neutron and AWS VPC APIs. Clearly, there is plenty of evidence that the AWS VPC API serves the needs of most IaaS/SaaS users.

Docker networking

When docker launches a linux container it will, by default, assign it a private IP address out of RFC 1918 space. It connects this container to the host OS using a bridged interface (docker0). Connectivity between the outside world and the container depends on NAT.

Outbound traffic is NATed using the host’s IP address. Inbound traffic requires explicit port mapping rules that map a port on the host to a port in the container. Given that one typically runs multiple containers on the same host, there needs to be a mapping between a host port (in the dynamic port range) and a service port on the container.

For example, the HTTP service port (80) in container-1 will be mapped to host port 49153, while container-2 would see its HTTP port mapped to host port 49154. Ports that are not explicitly mapped cannot receive incoming traffic. Also, containers within the same host will see different IP address/port combinations for a given service than containers on different hosts (not very ‘cloudy’).

This is the reason why using a network virtualization solution such as OpenContrail is so appealing. OpenContrail replaces docker’s networking implementation, which can be disabled by using --net=none. It provides each container with its own IP address in the overlay, without the need for port mapping.
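
For illustration, disabling docker's own networking from the Python SDK looks roughly like the snippet below (the image name is arbitrary); the subsequent veth and vrouter plumbing is then performed by the OpenContrail plugin, as sketched in the Kubernetes prototype above.

```python
# Illustration: start a container with docker networking disabled; the overlay
# plugin then wires up the interface and assigns the overlay address.
import docker

client = docker.from_env()
container = client.containers.run(
    "nginx",               # arbitrary example image
    detach=True,
    network_mode="none",   # equivalent of --net=none: no docker0, no NAT
)
# At this point the container only has a loopback interface; the plugin
# creates the veth pair, moves one end into the container's namespace and
# attaches the other end to the vrouter.
print(container.id)
```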

OpenContrail can then be used to provide network-based isolation, network-based policies, floating-ip addresses, load-balancing and traffic monitoring capabilities.

The basic construct used by OpenContrail is a virtual-network. Virtual-networks are used to map a set of instances that have a common administrative characteristic. Typically an application tier maps to a virtual network.

Administrators can then define which virtual-networks are allowed to communicate with each other and control which services (ports) are allowed between a set of networks. For instance, one may define that a set of front-end instances can send their logs to a management network running syslog servers by establishing a TCP connection to the appropriate port, but cannot ssh into the same servers.
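
Expressed as plain data (not the actual OpenContrail policy schema), that example might look like the following sketch:

```python
# Illustrative policy between two virtual networks: allow syslog from the
# front-end tier to the management network, deny ssh. Plain data only; this
# is not the real OpenContrail policy schema.
frontend_to_management = {
    "source-network": "frontend",
    "destination-network": "management",
    "rules": [
        {"protocol": "tcp", "port": 514, "action": "pass"},  # syslog over TCP
        {"protocol": "tcp", "port": 22,  "action": "deny"},  # no ssh
    ],
    "default-action": "deny",
}
```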

With floating-ips an external address can be assigned to a specific container; with the LBaaS API the floating-ip (or external address) can be associated with a set of back-ends.

The OpenContrail vrouter records all traffic flows to a centralized time-series database. The analytics component can then access this database and serve queries that provide visibility into the traffic patterns of the network.

OpenContrail can be used with and without OpenStack. The same OpenContrail install can span both domains, providing a consistent API as well as distributed routing functionality.