Recently, I’ve heard several people suggest that the advent of IPv6 changes the requirements for data-center virtual network solutions; for instance, that network overlays are no longer necessary. The underlying assumption is that once an instance has a globally unique IP address, all requirements are met.
In my view, this analysis fails in two dimensions:
- In the assumption that it is desirable to give instances direct internet access (via a globally routed address);
- In the assumption that overlay solutions are deployed to solve address-translation problems.
Neither of these assumptions holds when examined in detail.
While there are IaaS use cases where a user just wants to fire up a single virtual machine and use it as a personal server, the most interesting use case for IaaS and PaaS platforms is deploying applications.
These applications serve content for a specific virtual IP address registered in DNS and/or in global load-balancers; that doesn’t mean this virtual IP should be associated with any specific instance. There is a layer of load-balancing that maps the virtual IP onto the specific instance(s) serving the content, typically a load-balancer in proxy mode.
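The virtual-IP-to-instances mapping can be sketched as follows; a minimal, illustrative model (all class names and addresses are invented, and a real load-balancer would also health-check and weigh its backends):

```python
import itertools

class VirtualService:
    """A stable virtual IP fronting a changing set of instances."""

    def __init__(self, vip, backends):
        self.vip = vip                    # stable, DNS-registered address
        self.backends = list(backends)    # transient instance addresses
        self._rr = itertools.cycle(self.backends)

    def pick_backend(self):
        # Round-robin selection: each client connection terminates on the
        # proxy and is re-originated toward one of the backends.
        return next(self._rr)

svc = VirtualService("203.0.113.10", ["10.0.0.11", "10.0.0.12"])
```

The point of the sketch is that clients only ever see `svc.vip`; which instance answers is an internal detail that can change at any time.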
As an aside, enabling IPv6 on the load-balancer is typically the best approach to making an application IPv6-ready. While IPv6 doesn’t add functionality over IPv4 (other than more addresses), it does add a burden in terms of manageability and troubleshooting: dual-stack applications or dual-stack networks are twice the operational load, with no benefit compared to terminating the IPv6 session at the load-balancer.
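The "terminate IPv6 at the edge" idea can be sketched with a single dual-stack listening socket: the front door accepts both IPv4 and IPv6 clients, while everything behind it stays IPv4-only. This is an illustrative, Linux-centric sketch, not a production listener:

```python
import socket

def dual_stack_listener(port=0):
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # Clear IPV6_V6ONLY so IPv4 clients arrive on the same socket as
    # IPv4-mapped addresses (::ffff:a.b.c.d).
    s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    s.bind(("::", port))   # port 0: let the OS pick one for the example
    s.listen(16)
    return s

listener = dual_stack_listener()
```

Behind this socket, the proxy can speak plain IPv4 to the backends, which is exactly why the rest of the stack never needs to go dual-stack.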
Back to our application: it is typically implemented as a set of instances, often with specialized functions: front-ends (web-page generation), business logic and caches, and databases. The reasonable default, from a data-security perspective, is to disallow internet access to and from these instances. RFC 1918 addresses are a real benefit here, since they prevent someone from accidentally routing traffic to them directly.
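That property is easy to check programmatically; a small sketch using Python's standard `ipaddress` module (the helper name and addresses are invented; note that `is_private` also covers a few non-RFC-1918 special ranges):

```python
import ipaddress

def is_internal(addr: str) -> bool:
    """True if addr is not globally routable (RFC 1918 and similar)."""
    return ipaddress.ip_address(addr).is_private

# The application tiers above would all live in private space, e.g.:
tiers = {"front-end": "10.0.1.10", "cache": "10.0.2.10", "db": "10.0.3.10"}
```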
In several IaaS platforms that I work closely with, as soon as the platform became operational there was an incident involving VMs generating DDoS attacks from the cloud (usually taking advantage of its high-bandwidth internet connectivity). My current recommendation is for applications to use LBaaS for inbound traffic, plus a jump host with a SOCKS proxy for outbound access. That still leaves the load-balancer and jump host as threat vectors, but it is easier to monitor and manage than giving the VMs direct internet connectivity via either a floating IP (bi-directional) or source NAT.
Thus, to me at least, globally routed addresses are hardly a feature. Let’s now examine the rationale for using overlays in data-center networks.
The first thing to notice is that overlays are replacing VLAN-based designs (IEEE 802.1Q bridging) that already provide the overlay’s main functionality: the separation between identifier and locator. The identifier is the IP address of the instance; the locator is the Ethernet address in the bridged case, or the server’s IP address in the case of an IP-based overlay.
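The identifier/locator split can be sketched as a simple mapping table, similar in spirit to what a VXLAN-style overlay maintains (all names and addresses here are invented for illustration):

```python
# instance IP (identifier) -> IP of the server currently hosting it (locator)
overlay_map = {
    "10.0.0.11": "192.0.2.21",
    "10.0.0.12": "192.0.2.22",
}

def locate(instance_ip: str) -> str:
    # An encapsulating device would tunnel the packet to this locator;
    # the instance's own address never changes.
    return overlay_map[instance_ip]

# Rescheduling an instance to another server only rewrites the mapping:
overlay_map["10.0.0.11"] = "192.0.2.23"
```

Sessions, configurations, and policies keyed on `10.0.0.11` are unaffected by the move; only the tunnel endpoint changes.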
The reason this separation is needed is that transport sessions, configurations, and policies are tied to IP addresses. Assigning an IP address based on the server currently executing an instance doesn’t work, for several reasons:
- Instances need to exist independently of their mapping to servers;
- Many configuration-based systems are tied to IP addresses rather than DNS names;
- Network based policies are tied to IP addresses;
Operators deploy applications by defining a set of relationships between instances, typically in an orchestration tool such as AWS CloudFormation or OpenStack Heat. IP addresses are assigned when instances are defined, typically before they are scheduled and executed; these addresses are then used in configuration files and access-control lists.
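A minimal sketch of that definition-time flow (this is illustrative Python, not CloudFormation or Heat syntax; the subnet, instance names, and ACL format are invented):

```python
import ipaddress

# Addresses are allocated when the template is defined, before any
# instance has been scheduled onto a server.
subnet = ipaddress.ip_network("10.0.0.0/24")
hosts = subnet.hosts()

instances = {name: str(next(hosts)) for name in ["web-1", "app-1", "db-1"]}

# An access-control rule written against those pre-assigned identifiers:
# (source, destination, destination port)
acl = [(instances["app-1"], instances["db-1"], 5432)]
```

Because the ACL is baked in terms of these identifiers, they must stay valid no matter where the instances eventually land.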
When an instance is scheduled, it is assigned temporarily to a server. Even where live migration is not supported, the “logical” concept of the Nth cache server of application “foo” must be able to outlive its scheduling assignment: the server can die, or the instance may have to be rescheduled because of interference from other workloads.
Of course, it is possible to construct a different, name-based mapping between identity and location. One example is for every service (in the SOA sense of the word) to publish its transient address/port in a directory; given the limitations of the default Docker networking implementation, some operators are doing just that. This provides functionality equivalent to what the network overlay does, but the drawback is that it forces one to modify every single application to use this directory for resolution and policy enforcement.
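The name-based alternative can be sketched as a tiny publish/resolve directory (all service names, addresses, and function names are invented for illustration):

```python
# service name -> its current, transient (address, port)
registry: dict[str, tuple[str, int]] = {}

def publish(service: str, addr: str, port: int) -> None:
    """Called by each service instance on start-up or reschedule."""
    registry[service] = (addr, port)

def resolve(service: str) -> tuple[str, int]:
    # Every client must be modified to call this instead of using a
    # stable IP; policy enforcement would also have to hook in here.
    return registry[service]

publish("cache-7.foo", "10.0.0.42", 11211)
```

The identifier here is the name, and the (address, port) pair plays the role of the locator; rescheduling a service is just another `publish` call.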
There is at least one very large cloud provider that does just that. It has a directory that provides both name resolution and authorization. The tradeoff is that all services must speak the same RPC protocol, and no externally developed application can easily be deployed inside the infrastructure. The RPC protocol has become what most of us think of as the TCP/IP layer: the common interoperability layer.
Overall, it doesn’t seem to me that IPv6 changes the landscape; it just adds a manageability burden in the case of a dual-stack deployment through the whole application. If one terminates the IPv6 session at the load-balancer, however, it should be mostly a no-op.