Building Sandcastles

Every few years, the networking industry produces a promising new technology proposal built around the concept of controlling individual flows of traffic. The analogy this invokes in my mind is trying to build a sandcastle by moving individual grains of sand. The appeal of “fine-grained” control is illusory: the sand moves.

In telecommunications, the challenge has always been how to “bucketize” traffic in a way that allows one to make statistical inferences about it. Traffic flows in data networks are transient. No system is ever going to be built to control individual flows on a one-off basis. A flow must be classified into a bucket by a policy for any meaningful control to be possible.

Thus, rather than talk about flows, the interesting question is what policies can be defined and expressed in a network. This is not a new debate. When MPLS was designed, the architects of the protocol had the clarity of thought to build the protocol around controlling traffic “trunks” (Forwarding Equivalence Classes, to use the proper terminology). This was at a time when Ipsilon was actively promoting technology proposals centered around controlling individual traffic flows, without ever quite explaining how these would be managed in aggregate.

Assuming one can express policies that classify flows into meaningful buckets, there is no possible rationale for attempting to perform flow classification in a centralized way. It is perfectly reasonable to have a central location to configure such policies, or even to help compute some of the implementation details. But classifying flows is a function that must be implemented in as distributed a manner as possible in any system designed to do anything more than be an educational tool.

It is encouraging that, a few years into the OpenFlow meme, the majority of the people who initially ignored “networking 101” have managed to rediscover it. Hopefully in a couple more years we can collectively move on.


Virtual Router

I’m often asked how the OpenContrail vrouter differs from other approaches such as OVS. The answer is that the vrouter is designed to accomplish different goals, and as such the approach taken is quite different.

As Parantap Lahiri so clearly explains, traditional data-center designs use the aggregation layer as the cornerstone of the network. The aggregation layer was traditionally the L2 to L3 boundary, providing the levers that manage the network: inter-subnet routing, access-control lists, service insertion and traffic monitoring. This design works until the point where the aggregate bandwidth requirements change drastically.

Traditional data-center networks rely on I/O locality: data and storage traffic is contained within a rack as much as possible, taking advantage of the fact that within the rack there is no oversubscription, while there is often a 20:1 or 10:1 oversubscription factor from the rack to the aggregation layer. This reduces the cost of the network.

The problem is that in a modern data-center, servers, power and cooling drive approximately 80% of the cost, and I/O locality stands in inverse relationship to server utilization. It is intuitive that if one can distribute compute load arbitrarily across a larger pool of machines, then much higher utilization can be achieved. This requires that the network never be the bottleneck.

I’ve been asked in the past whether there is a cluster size large enough that, beyond it, no further efficiency can be gained. That may be the case, but to my knowledge no one has yet built one. The people who have 10,000-machine clusters are busy trying to grow them by an order of magnitude in order to improve utilization and avoid resource fragmentation.

Back to networking: in a Clos fabric design there is no aggregation layer, so its functions have to be provided elsewhere. That is the mission of the OpenContrail vrouter: to provide the ability to route traffic between networks, with the necessary levels of policy control, in a distributed way.

This is a very different application from the one OVS is typically used for. The typical OVS deployment is designed to provide the L2 service that brings traffic from the server to the L2/L3 boundary, which is typically a virtual machine that uses the Linux kernel’s capabilities to forward traffic between networks.

The OpenContrail vrouter treats the virtual machine interface as the L2 domain. It then associates that virtual interface with an instance-specific routing table. From then on it uses a standards-based approach to provide virtual networks. This allows it to interoperate directly with existing network equipment and forward traffic across networks without the need for intermediate gateways.

I believe that the OpenContrail vrouter represents the happy marriage of the zero-touch provisioning that is required in a modern data-center with the accumulated experience of how to implement network virtualization at scale. BGP L3VPNs have been deployed both in some of the largest service provider networks and in large-scale enterprises. The resiliency, power and scalability of this approach have been proven beyond any doubt in many multivendor networks.

Using devstack plus OpenContrail

devstack is a set of scripts that can download, compile and deploy an OpenStack development environment. The OpenContrail team has created a GitHub fork of the devstack repository which can also download and install the components that are part of OpenContrail. This article contains an example of how devstack can be used to test the OpenStack plus OpenContrail integration.

The first step in creating a development environment is to install a server. For the purposes of this article I used a virtual machine running Ubuntu 13.10. This VM is running on an OpenStack cluster which has been configured to support nested virtualization.

Once the server is available and a user is configured, one can clone the devstack repository using the command:


git clone https://github.com/dsetia/devstack

This creates a subdirectory called devstack which includes the script stack.sh that can be used to install the development environment.

In order to enable OpenContrail, we need to create the file localrc in the devstack directory with the OpenContrail-specific settings (the service passwords and the configuration that enables the Contrail services and Neutron plugin).

After this file is created, we execute “./stack.sh” in the devstack directory. It will prompt for passwords for several services and will automatically generate them if they are not provided. The script clones both the OpenStack and OpenContrail repositories, compiles them, and downloads additional packages for the operating system.

Once the script completes, OpenStack will be running and the network plugin will be set to the OpenContrail network virtualization plugin.

In order to test the network virtualization functionality, we can create a devstack exercise based on exercises/neutron-adv-test.sh. Download the test script and place it in the exercises directory with the name contrail-tutorial.sh. This script creates a new tenant “demo1” and two virtual networks. Two virtual machines are also created: the first with a single interface in network “demo1-net1”; the second with two virtual interfaces, one in network “demo1-net1” and the other in network “demo1-net2”.
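For reference, the networks the exercise builds can also be expressed directly against the Neutron API. The sketch below (not part of the original exercise) uses the python-neutronclient library; the credentials, the Keystone URL and the demo1-net1 prefix are assumptions for this devstack setup, the 10.20.0.0/24 prefix for demo1-net2 matches the introspect output shown later, and booting the two virtual machines is omitted.

from neutronclient.v2_0 import client

# Credentials and auth URL are assumptions for a local devstack setup.
neutron = client.Client(username='demo1', password='devstack',
                        tenant_name='demo1',
                        auth_url='http://127.0.0.1:5000/v2.0/')

# Create the two virtual networks and one IPv4 subnet in each.
for name, cidr in [('demo1-net1', '10.10.0.0/24'),   # prefix assumed
                   ('demo1-net2', '10.20.0.0/24')]:  # as seen in the introspect output later
    net = neutron.create_network({'network': {'name': name}})
    neutron.create_subnet({'subnet': {'network_id': net['network']['id'],
                                      'ip_version': 4,
                                      'cidr': cidr}})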

In order to get the test environment into the desired state, execute the following sequence of commands, which create the tenant, the networks and the virtual machines, respectively:

./exercises/contrail-tutorial.sh -t
./exercises/contrail-tutorial.sh -n
./exercises/contrail-tutorial.sh -v

We can now use the Horizon UI to log in to the virtual-machine consoles. In my test setup, in order to be able to use Horizon remotely, the configuration file

/etc/apache2/sites-available/horizon.conf

must be modified such that it includes the following changes:


<Directory />
Options FollowSymLinks
AllowOverride None
Require all granted
</Directory>

<Directory /opt/stack/horizon/>
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
allow from all
Require all granted
</Directory>

Using Horizon’s Instances tab, we can select the virtual machine “demo1-server1” and access its console. After logging in, it should be possible to ping the second virtual machine, since it also has an interface on the same virtual network.

Now, let’s log in to “demo1-server2”. By default only a single interface, “eth0”, is initialized by this image. We can use the following command to bring up the second interface:

sudo udhcpc -i eth1

In order to verify the routing table, issue the command “netstat -rn”. You will notice that there are two default routes, one via each interface. That is probably undesirable. OpenContrail does not yet implement the Neutron “host routes” property of a “subnet”; according to RFC 3442, when DHCP classless static routes are used, the Router option (option 3) is ignored. But we may also want to refrain from advertising a default gateway even when no static routes are present. Let’s modify the OpenContrail code as an exercise.

The first step is to look at the data model. OpenContrail models IP subnet configuration as an association between a virtual-network object and an IP address management (IPAM) object.

The file “controller/src/config/schema/vnc_cfg.xsd” contains the following:

<xsd:complexType name="IpamSubnetType">
  <xsd:all>
    <xsd:element name="subnet" type="SubnetType"/>
    <xsd:element name="default-gateway" type="IpAddressType"/>
    <xsd:element name="advertise-default" type="xsd:boolean"/>
  </xsd:all>
</xsd:complexType>
<xsd:complexType name="VnSubnetsType">
  <xsd:all>
    <xsd:element name="ipam-subnets" type="IpamSubnetType" maxOccurs="unbounded"/>
  </xsd:all>
</xsd:complexType>
[...]
<xsd:element name="virtual-network-network-ipam" type="VnSubnetsType"/>
<!--#IFMAP-SEMANTICS-IDL
    Link('virtual-network-network-ipam',
         'virtual-network', 'network-ipam', ['ref']) -->

The “Link” definition between the virtual-network and network-ipam objects uses the type “VnSubnetsType”, which is a sequence of “IpamSubnetType” objects. Each entry currently contains the IP prefix and the default gateway defined for that subnet. In order to control the advertisement of the default route, we can add the “advertise-default” element shown above.

The code for the OpenContrail api-server is generated from this schema, as is the API library that is used to interface between Neutron and OpenContrail. The DHCP configuration is ultimately implemented by the “vrouter” user-space agent (“vnswad”).
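As an illustration of this data model, the sketch below (not from the original article) shows how a subnet is attached to a virtual network through the generated Python API library. The credentials, api-server address and object names are assumptions for this devstack setup.

from vnc_api import vnc_api

# Connect to the api-server; credentials and address are assumptions.
client = vnc_api.VncApi(username='admin', password='devstack',
                        tenant_name='demo1', api_server_host='127.0.0.1')

project = client.project_read(fq_name=['default-domain', 'demo1'])

# Create an IPAM object (the name is illustrative) to associate with the network.
ipam = vnc_api.NetworkIpam('demo1-ipam', parent_obj=project)
client.network_ipam_create(ipam)

# The subnet definition lives on the link between the two objects: a
# VnSubnetsType holding a list of IpamSubnetType entries. Once the schema
# change above has been regenerated, IpamSubnetType also accepts advertise_default.
subnet = vnc_api.IpamSubnetType(subnet=vnc_api.SubnetType('10.20.0.0', 24),
                                default_gateway='10.20.0.1')

vn = vnc_api.VirtualNetwork('demo1-net2', parent_obj=project)
vn.add_network_ipam(ipam, vnc_api.VnSubnetsType([subnet]))
client.virtual_network_create(vn)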

The configuration information is stored by the api-server in an IF-MAP server. The OpenContrail “control-node” process receives that information and calculates the minimum graph that each agent is interested in, based on which virtual machines are executing on which compute nodes (as communicated by the vrouter agent in the form of a subscription message). The C++ code in the control-node and vrouter agent that understands the configuration datatypes is also auto-generated from the schema.

Running “scons controller/src/schema” generates the new datatypes.

By default, we still want subnets to have a default route. The most common use case is to have a single virtual-interface per virtual-machine. Let’s modify the neutron contrail plugin in order to unconditionally set “advertise-default” to “true”. For the purposes of this tutorial we can then modify this flag on the “demo1-net2” network via the OpenContrail API directly.

In the file neutron/neutron/plugins/juniper/contrail/ctdb/config_db.py, the following change sets advertise-default to “true”:

    def _subnet_neutron_to_vnc(self, subnet_q):
        [...]
        subnet_vnc = vnc_api.IpamSubnetType(subnet=sub_net,
                                            default_gateway=default_gw,
                                            advertise_default=True)
        return subnet_vnc

This preserves the current default behavior once the knob is implemented in the vrouter agent.

In order to modify the agent, we need to touch two points: the code that retrieves the configuration objects and stores them in the agent’s “operational” database, and the DHCP service code.

The first change is relatively mechanical: a patch that copies the “advertise_default” field from the auto-generated configuration code into the agent’s internal type “VnIpam”.

The DHCP service code then needs the advertise_default parameter to be passed into an additional temporary data structure, along with the change to the actual implementation: the modified if statement that decides whether to insert the default route option (DHCP_OPTION_ROUTER in the code) is what actually changes the behavior.

In order to test the new code we need to reset the current state via the “unstack.sh” script and then re-run “stack.sh”.

When the recompilation finishes and the OpenStack and OpenContrail services start again, we want to create the tenant and networks for our test via “exercises/contrail-tutorial.sh” with the options “-t” and “-n”, but not yet start the virtual machines.

Before starting the virtual-machines, we will use the following script to turn off the advertise default setting on the “demo1-net2” virtual-network:
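The script itself is not reproduced here; a minimal sketch of what it could look like, using the vnc_api Python library, is shown below. The credentials and api-server address are assumptions for this devstack setup, and the set_advertise_default setter only exists after the schema change has been regenerated.

from vnc_api import vnc_api

# Connect to the OpenContrail api-server; credentials are assumptions.
client = vnc_api.VncApi(username='admin', password='devstack',
                        tenant_name='demo1', api_server_host='127.0.0.1')

# Read the virtual network; the subnet definitions live in the 'attr'
# (a VnSubnetsType) of each link to a network-ipam object.
vn = client.virtual_network_read(
    fq_name=['default-domain', 'demo1', 'demo1-net2'])

for ref in vn.get_network_ipam_refs() or []:
    for ipam_subnet in ref['attr'].ipam_subnets:
        ipam_subnet.set_advertise_default(False)

# Store the modified object back through the api-server.
client.virtual_network_update(vn)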

We should be able to observe that the parameter is turned off by looking at the information present in the control-node. Pointing a browser at the URL http://devstack-server-ip:8083/Snh_IFMapTableShowReq?table_name=virtual-network-network-ipam should display both subnets for the demo1 project, where demo1-net2 displays:

<iq>
	<value>
		<ipam-subnets>
			<subnet>
				<ip-prefix>10.20.0.0</ip-prefix>
				<ip-prefix-len>24</ip-prefix-len>
			</subnet>
			<default-gateway>10.20.0.1</default-gateway>
			<advertise-default>false</advertise-default>
		</ipam-subnets>
	</value>
</iq>

At this point we want to start the virtual machines again, using the contrail-tutorial.sh script with the “-v” option. After the instances are up, log in via Horizon to the demo1-server2 console and issue the DHCP client request via:

sudo udhcpc -i eth1

The interface should receive a DHCP response that does not contain the Router (default gateway) option.

While this is a limited example, I hope it illustrates the process of developing and testing functionality within OpenContrail, as well as its integration with an orchestration system.