One of the key decisions in designing a compute infrastructure is how to handle networking.
For platforms designed to deliver applications, it is now widely accepted that application developers need a platform that can execute and manage containers (rather than VMs).
When it comes to networking, however, the choices are less clear cut: in what scenarios are designs based on a single network layer preferable to overlay networks?
The answer to this question is not simply a matter of “encapsulation overhead”: while some overlay networking projects do exhibit poor performance, production-ready solutions such as OpenContrail have throughput and PPS characteristics similar to those of the Linux kernel bridge implementation. Even without an overlay, an internal bridge is still needed to demux the containers’ virtual-ethernet interface pairs.
The key aspect to consider is operational complexity!
From a bottom-up perspective, one can argue that a network design with no encapsulation, which simply uses an address prefix per host (e.g. a /22), provides the simplest possible solution to operate. That is indeed the case if one assumes that discovery, failover and authentication can be handled entirely at the “session” layer (in OSI-model terms).
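To make the per-host prefix idea concrete, here is a minimal sketch of how such an allocator might carve fixed-size host pools out of a cluster supernet. The supernet `10.128.0.0/16` and the /22 size are illustrative assumptions, not taken from any particular deployment:

```python
import ipaddress

def per_host_prefixes(supernet: str, prefix_len: int = 22):
    """Carve fixed-size per-host address pools out of a cluster supernet."""
    net = ipaddress.ip_network(supernet)
    return list(net.subnets(new_prefix=prefix_len))

# A /16 supernet yields 2^(22-16) = 64 per-host /22 pools.
pools = per_host_prefixes("10.128.0.0/16", 22)
```

Each host owns one prefix, so the physical fabric only needs to route 64 static prefixes; no per-container state ever touches the switches.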
I’m familiar with a particular compute infrastructure where this is the case: all communication between containers uses a common “RPC” layer which provides discovery, authentication and authorization and a common presentation layer.
In the scenario I’m familiar with, this works well because every single application component was written from scratch.
The key takeaway for me from observing this infrastructure in operation was not really whether an overlay is used. The key lesson, in my mind, is that it is possible to operate the physical switching infrastructure independently of the problems of discovery, application failover and authentication. Those with a background in traditional enterprise network operations can fully appreciate how decoupling the switching infrastructure from these problems leads not only to simpler operations but also to higher throughput in the network fabric.
In environments where not all application components use the same common RPC layer, or where a “belt and suspenders” approach is desirable, discovery, authorization and failover are functions that the infrastructure is expected to deliver to the applications.
Whenever that is the case, using an overlay network design provides the simplest (from an operations standpoint) way to implement this functionality because it offers a clear separation between the application layer and the physical infrastructure.
In the work we’ve been doing with OpenContrail around Kubernetes/OpenShift, OpenContrail provides:
- access control between application tiers and services (a.k.a. micro-segmentation);
- service discovery and failover, by mapping the service’s virtual IP address (aka ClusterIP) to the instances (Pods) that implement the service;
- traffic monitoring and reporting on a per app-tier basis.
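The second item above, mapping a service’s virtual IP to its backing Pods, can be sketched as a toy in-memory model. The VIP and Pod addresses below are invented for illustration; in a real cluster this state is maintained by the control plane, not an application dict:

```python
import random

# Hypothetical control-plane state: each service VIP (ClusterIP)
# maps to the Pod IPs currently implementing that service.
services = {
    "10.96.0.10": ["10.244.1.5", "10.244.2.7", "10.244.3.2"],
}

def resolve(vip, healthy):
    """Return one healthy backend for a VIP, or None if none remain."""
    backends = [ip for ip in services.get(vip, []) if healthy(ip)]
    return random.choice(backends) if backends else None

# Failover: if Pod 10.244.1.5 dies, traffic is spread over the survivors.
alive = {"10.244.2.7", "10.244.3.2"}
backend = resolve("10.96.0.10", lambda ip: ip in alive)
```

The point is that the VIP never changes while the set of backends churns; the physical fabric is unaware of any of this.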
OpenContrail introduces its own software stack, which has an inherent operational complexity; but this functionality can be operated independently of, and is entirely agnostic to, the switching infrastructure. It operates the same way on a private cloud environment or on any public cloud.
Using an overlay allows OpenContrail to carry, with each packet, the “network context” needed for access control and for traffic accounting and analysis. It also provides a clean separation between the virtual IP address space and the physical infrastructure.
For instance, address pools do not have to be pre-allocated on a per-node basis. And while IPv6 can bring an almost inexhaustible address pool, that does not address the problem of discovery and failover for the virtual IP addresses associated with a service (a.k.a. ClusterIPs in Kubernetes).
In addition, standards-based overlays such as OpenContrail can carry the virtual network segment information across multiple clusters or different infrastructures. A simple example: one can expose multiple external services (e.g. an Oracle DB and a DB2 system) to a single cluster while maintaining independent access control (that does not depend on listing individual IP addresses).
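The access-control point can be illustrated with a small sketch: policy is keyed on virtual-network tags carried in the overlay, not on enumerated IP addresses. The tag names and the rule table are hypothetical, purely to show the shape of the idea:

```python
# Hypothetical policy table keyed on (source tag, destination tag).
# Because the overlay carries the network context with each packet,
# rules never need to enumerate individual container IP addresses.
policies = {
    ("frontend", "oracle-db"): True,   # frontend tier may reach Oracle
    ("frontend", "db2"):       False,  # but not the DB2 system
}

def allowed(src_tag: str, dst_tag: str) -> bool:
    """Default-deny policy check between two virtual network segments."""
    return policies.get((src_tag, dst_tag), False)
```

Adding or removing Pods behind either tag changes nothing in the policy table, which is what keeps the two external services independently controlled.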
From an operational perspective, it is imperative to separate the physical infrastructure, which will probably differ on a cluster-by-cluster basis, from the services provided by the cluster. When discovery, authorization and failover are not a pure session-layer implementation, it makes sense to use a network overlay.