Building a REST API for ExaBGP

Over the last couple of years there has been a trend to extend layer 3 to the top-of-rack (TOR) switch. This gives a more stable and scalable design than the classic layer 2 network design. One major disadvantage of extending layer 3 to the TOR switch is IP mobility. In the classic L2 design a live migration of a VM to a different compute host in a different rack was simple; when L3 is extended to the TOR, IP mobility isn't that simple anymore. A solution might be to let the VM host advertise a unique service IP for a particular VM when it becomes active on that host. A great tool for this use case is ExaBGP.

ExaBGP does not modify the route table on the host itself; it only announces routes to its neighbours. After ExaBGP starts, the routes it advertises can be influenced by sending messages to STDIN.

Below is the config used by the ExaBGP daemon.

Most of this is pretty self-explanatory; the important stuff happens on lines 9-11. These lines start a script, and all output of this script is parsed by ExaBGP.

The script provides a REST API which outputs the announce and withdraw commands for ExaBGP on STDOUT.

For testing purposes I created a simple setup within KVM with two hosts: docker1, which runs ExaBGP, and firewall-1, which runs the BIRD BGP daemon. There is an L2 segment between these hosts over which the BGP peering is created.

The Python script is only 75 lines long.

The heavy lifting of the web service is handled by a powerful library for creating a web server in Python. I am a network engineer with very limited Python experience, but creating the script only took me a couple of hours.
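The core idea is small enough to sketch. Below is a minimal, hypothetical helper (names are illustrative, not the actual 75-line script) that builds the announce/withdraw strings the web service would print to STDOUT for ExaBGP to pick up:

```python
# Hypothetical sketch of the command builder; the real script exposes this
# via REST routes and prints the result to STDOUT, which ExaBGP parses.
def build_command(action, prefix, nexthop, community=None, med=None):
    """Build an ExaBGP API command such as
    'announce route 10.1.1.1/32 next-hop 192.168.1.1'."""
    cmd = "{} route {} next-hop {}".format(action, prefix, nexthop)
    if community is not None:
        cmd += " community [{}]".format(community)
    if med is not None:
        cmd += " med {}".format(med)
    return cmd

if __name__ == "__main__":
    # A REST handler would print this line; ExaBGP reads it on its STDIN.
    print(build_command("announce", "10.1.1.1/32", "192.168.1.1"))
```

The same builder covers withdraws by passing "withdraw" as the action, which is why the API can stay so small.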

The script in action

We start by starting the ExaBGP daemon.

By default the service is started on port 8080.

The BGP neighbor is also shown as established by BIRD.

Adding a route is as simple as doing a curl on the host on which ExaBGP is running.

ExaBGP receives the announce message.

The BGP daemon on the firewall also knows the route.

The REST API also accepts communities and MEDs.

This is shown by the BIRD daemon as well.

Withdrawing routes can also be done easily with a curl statement.

And the route is gone.

At the moment there is only limited input validation. The REST API does check whether the IP address entered is valid, but no other checks are implemented at this moment. I might add more if the need arises.

The script and configs used in this blog can be found on my GitHub.


Using the Python UCS library

Recently some VCE vBlocks have been taken into production at my current job. Although VCE installs everything for you, they didn't configure all the required production VLANs. The VLANs need to be added to various components in the vBlock:

  • Nexus 9000
  • Nexus 1000V
  • UCS-FI

Configuring them on the Nexus devices is pretty straightforward, but configuring them on the FI is a chore for the operations team: first add the VLAN to the system and then add the VLAN to every vNIC template.

As I am still trying to improve my Python skills, I wrote a script to do this for me by adding a VLAN from the CLI.

It starts with downloading the Python SDK from Cisco and installing it on your management system. After installation you are good to go and you can start writing your own scripts. The documentation provided is not very elaborate, but sufficient for a script like this.

First some modules need to be loaded. Besides the ones required for the UCS-related stuff, I add a few to make the script “nice”: argparse is a library to support command line options, and getpass allows entering passwords without showing them on screen.

The argument parser is created.

This argument parser adds a number of command line options:

  • --fi the IP or hostname of the fabric interconnect
  • --add to add a VLAN
  • --del to remove a VLAN
  • --id the VLAN id (the number)
  • --name the VLAN name

When one of the options is missing, an error is raised and some help text is provided. Argparse also prevents you from providing both add and del together.
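A minimal sketch of how these options can be wired up with argparse (the actual script may differ in details; the mutually exclusive group is what rejects --add and --del together):

```python
import argparse

def build_parser():
    """Build the CLI parser for the VLAN add/remove script (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Add or remove a VLAN on a UCS fabric interconnect")
    parser.add_argument("--fi", required=True,
                        help="IP or hostname of the fabric interconnect")
    # Exactly one of --add / --del must be given, never both.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--add", action="store_true", help="add a VLAN")
    # 'del' is a Python keyword, so store it under a different attribute name
    group.add_argument("--del", dest="delete", action="store_true",
                       help="remove a VLAN")
    parser.add_argument("--id", required=True, help="the VLAN id (the number)")
    parser.add_argument("--name", required=True, help="the VLAN name")
    return parser
```

Calling `build_parser().parse_args()` with a missing or conflicting option makes argparse print the help text and exit, which is the behaviour described above.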

Lines 9 and 10 prompt for the username and password. Getpass prevents the password from being echoed on screen.

Lines 1 and 2 store the entered values for the VLAN ID and VLAN name in more recognizable variable names. A try/except structure is started and a handle to the UCS is created. All actions on the UCS will be done via this handle. The first thing to do now is log in with the supplied credentials and the IP address or hostname of the FI.

Line 9 retrieves every vNIC template on the system. This is simply done by retrieving all objects of the class “vnicLanConnTempl”; this string is the output of VnicLanConnTempl.ClassId(). The hardest part of writing scripts for UCS is determining the required ClassId. The easiest way to do this, in my opinion, is to dump the XML from the UCSM GUI and find the required classes. Open the UCS GUI, select the object you want some info about, press the right mouse button and select Copy XML.

Copy XML

The XML for this object is placed on the clipboard.

This is a lot of information, but the most important part is vnicLanConnTempl, the ClassId of this object. It is also obvious that the children of vnicLanConnTempl are the VLANs which are allowed on this template. So we already know that objects of ClassId vnicEtherIf need to be added if we want to modify the allowed VLANs.

Line 11 retrieves the LanCloud. Under the LanCloud all objects related to L2 are stored. In line 12 the LanCloud is used as the starting point of a search for the VLAN with the name which needs to be added. Whether the VLAN is present determines if it may be added or deleted later on in the script.

This part of the script handles adding a VLAN to the UCS. Lines 4 and 5 check if the VLAN already exists. When this is true, the script logs a message and continues with a logout. If the VLAN is not found, another try/except structure is created. On line 9 the second UCS API command in the script, AddManagedObject, is used. This command adds an object below another object; in this case we are adding a VLAN below the LanCloud. The parameters used to create the VLAN are the name and the id.

When the addition of the VLAN is successful, another try/except is started. This one adds the VLAN to the vNICs obtained earlier. For some reason the Dn of the new VnicEtherIf needs to be supplied as one of the parameters. I have not been able to find a list of required parameters for the various ClassIds.

The format of the Dn was again obtained from the XML retrieved from the GUI. One important thing to notice is the True value in the AddManagedObject call. This prevents the API from raising an error if the VLAN is already part of the allowed VLANs on the vNIC.

The last lines close the various try statements.

The final section of the script handles the removal of the VLAN from the vNICs and from the LanCloud. Line 4 searches for all VnicEtherIfs with the name of the VLAN which needs to be removed. The base for this search is the set of vNICs obtained earlier. Lines 5-7 remove all these VnicEtherIfs in one operation, but only if there is at least one. Lines 9 and 10 do the same for the VLAN itself.

The last lines close the try/except and log out from the UCS.

Seeing the script in action

The best way is to keep the UCS GUI open while executing the script, so you can see the VLANs magically appear when this simple script runs.

Finding the smallest subnet for two hosts

@netmanchris asked for a method to determine the smallest common subnet for two hosts. Below is my solution, based on the Python netaddr library.

On lines 2 and 3 the IPs are used to create an IPNetwork. On line 4 a list is created containing all supernets of the IPNetwork. As the list is ordered from large to small, it needs to be reversed. By looping over each of the supernets and checking whether the second IP is part of the subnet, the common subnet is determined.
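The same logic can be sketched with the standard library's ipaddress module (the original solution uses netaddr, but the approach is identical: widen the prefix until the second host fits):

```python
import ipaddress

def smallest_common_subnet(ip1, ip2):
    """Return the smallest IPv4 subnet that contains both hosts."""
    other = ipaddress.ip_address(ip2)
    # Widen the mask one bit at a time: /32, /31, ... down to /0.
    # The first network that also contains ip2 is the smallest common subnet.
    for prefixlen in range(32, -1, -1):
        net = ipaddress.ip_network("{}/{}".format(ip1, prefixlen),
                                   strict=False)
        if other in net:
            return net
```

For example, `smallest_common_subnet("192.168.1.1", "192.168.1.130")` returns the /24, because the two hosts fall into different /25s.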

The Python netaddr library is very versatile and can help you with various tedious IP operations.

POAP and Ansible integration part 4

In the last part of the series I will look at the boot process of a POAP installation.
The first thing to do is to run the playbook to populate the tftpboot folder and create all the files.

There were no changes required for the DHCP server, but as I had removed all files from the TFTP root, all files were created, or copied in the case of the NXOS files.

On my dev system at home I didn't have the NXOS files available, so I just created bogus files for demonstration purposes. The boot process below did use the correct software images.
Now that all files are in place and the DHCP server is ready, it is time to start the POAP process.
To get a switch that has already been configured back into POAP mode, a special boot option needs to be configured. Save the config and reboot the switch.

The system boots with software 6.0.2.U2.2 (line 13) and POAP is enabled (line 26).

Obviously we do not want to abort POAP, so we wait until the device sends a DHCP request on its management port, which happens on line 9; after about 25 seconds the switch decides to use this offer and continues the process (line 10).

The bootfile is downloaded and execution starts. I am not sure why line 23 states that the MD5SUM is not verified, because an incorrect MD5 in the file results in a failed boot process. All other messages are self-explanatory.

The switch reboots after the successful POAP process, comes up with the specified software version, and we are able to log in with the username specified in the configuration file.


As you can see, POAP is very powerful for quickly upgrading and configuring a large number of new switches. It would also be possible to modify the playbook to reuse the configuration of a failed switch. Imagine sending a replacement switch to the datacenter: the field engineer replaces the switch, you change one line in a YAML file and run the playbook, and the POAP files are prepared and the DHCP server is reconfigured and restarted.

POAP and Ansible integration part 3

The third part of the series will be about all the files required for the boot process. The boot process follows the diagram below.


All configuration files required for the boot process are generated by the TFTPD role in the playbook. The tasks associated with this role are defined in the YAML file.

The bootfile is a Python script based on an example which can be downloaded from CCO when you have the correct entitlement. I have modified the Python script a bit and removed one bug which prevented the script from recognizing the switch as a Nexus 3048. In the script all the details for the POAP process are specified:

  • software version
  • configuration file
  • download credentials
  • transfer method
  • download server

As I wanted to be flexible with the software version, I used the templating system of Ansible to generate custom py files for booting.

The handler called when the py file changes is used to create the actual py file provided via the DHCP offer. To the actual file an extra line is added containing the md5sum of the file without this extra line. When executed by the switch, the Python script will remove the line with the md5sum, calculate the md5sum, and verify the script. The handlers are specified in a separate YAML file.

The handler for the py file is add md5. This handler executes a bash script to calculate the md5, add it to the file, and store it as a new py file without the md5 suffix. This is the file downloaded by the switch during the POAP process, and it needs to be supplied by the DHCP server as the boot file.
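The effect of the handler can be sketched in a few lines of Python (the real handler is a bash script; this only illustrates the transformation and the switch-side check):

```python
import hashlib

def add_md5_line(script_text):
    """Prepend a '#md5sum="..."' line computed over the file WITHOUT
    that line, which is what the handler produces for the bootfile."""
    digest = hashlib.md5(script_text.encode()).hexdigest()
    return '#md5sum="{}"\n'.format(digest) + script_text

def verify_md5(script_with_line):
    """What the switch-side script does: strip the md5 line,
    recompute the digest over the rest, and compare."""
    first, rest = script_with_line.split("\n", 1)
    claimed = first.split('"')[1]
    return hashlib.md5(rest.encode()).hexdigest() == claimed
```

Any change to the bootfile after the handler ran makes `verify_md5` fail, which is why hand-editing the generated file breaks the boot process.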

When the md5 of the bootfile matches the md5 included in the file, the complete script is executed. In the script the transfer method is specified. If another method than TFTP is used for the transfer of the configuration and software, credentials need to be specified. Please be aware that these credentials will be sent unencrypted to the switch when the py file is transferred. The files to be transferred are also specified in the script. In this example the name of the configuration file to be downloaded is derived from the serial number, as can be seen in line 8 of the roles/tftpd/tasks/main.yml file.

The next task is to create the actual configuration files. This is pretty straightforward; more about this can be found in a previous blog on this site. The only special thing is the handler generate md5, which calculates the md5sum of the configuration file and places this value in a text file. This text file has the same name as the configuration file with an .md5 suffix. The format of the string is md5sum=12345abcdef. The Python script executed by the POAP process will download these files automatically and verify the MD5SUM.

The last task in the playbook copies all the NXOS images to the TFTP server. Again a handler is called to create the md5 files, just like with the configuration files.

It is important to realize that Ansible is idempotent. It will always strive to keep everything in a consistent state regardless of how many times a playbook is run. This also means that files generated by Ansible must not be changed by hand; the next time the playbook is run, the changes made by hand will be lost.

In the last blog in the series I will show how everything works together and the switch will do a POAP.

Part 4

POAP and Ansible integration part 2

In this part of the series I will discuss the isc-dhcpd server configuration. isc-dhcpd is a DHCP server which is available on most Linux distributions. It has many options, but for this setup only a minimal configuration is required.

The directory layout for the ansible-playbook for the DHCPD role:

The tasks for the DHCPD role are defined in roles/dhcpd/tasks/main.yml.

In roles/dhcpd/vars/main.yml the basic settings for the DHCP server are configured.

In my lab I used two scopes and one range to allocate addresses from. These settings are used in the dhcpd.conf.j2 template to create the main dhcpd.conf.

At the end of the configuration an additional configuration file called static_clients is included, in which the reservations for the static (POAP) clients are defined. I have placed these in a separate file for a reason. In a normal environment there would be at least two DHCP servers. Each server would be responsible for a part of the subnet to allocate addresses from, or there would be a master/slave relation between the two servers, which requires different configurations on both. The reservations, however, must be the same on both servers.

This template is used by the task Generate dhcpd main config files. The handler instructs Ansible to restart the DHCPD service, but only when the configuration has changed.
The next task includes an additional YAML file, globals_poap_clients.yml, with data about the various POAP clients. The file is placed in a different directory than the normal vars directory belonging to the role, because it will also be used by the TFTPD role.

This file specifies two Nexus devices. The data is used in the task create client dhcpd config files and fed to the template for the POAP clients.

This configuration will provide the following for each POAP client:

  • Hostname
  • Bootfile
  • Bootserver

Settings like the IP address/mask/gateway/DNS are provided via the global scope. The IP details specified in the YAML file will be used for the generation of the actual switch configuration files.

Normally reservations are made based on the MAC address. In this setup I have chosen to make the reservation based on the serial number of the switch. This is possible because the serial is used as the client identifier in the DHCP request. The serial of a new switch is often more easily obtained than the MAC address, and I hate entering MAC addresses as each vendor/tool requires a different format.

It took a Wireshark capture to get it working, because Cisco prepends the client identifier with an ASCII NUL. That is why the \000 in front of the {{client.serial}} is required on line 5.
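As an illustration, a host entry in the static_clients template might look roughly like this (field and variable names are hypothetical, and line numbering will differ from the original template):

```
host {{ client.name }} {
    option dhcp-client-identifier "\000{{ client.serial }}";
    option host-name "{{ client.name }}";
    filename "poap_{{ client.name }}.py";
    next-server {{ dhcp.bootserver }};
}
```

The leading \000 in the dhcp-client-identifier is the ASCII NUL Cisco prepends to the serial, so without it the reservation never matches.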

Again, when DHCP settings have changed, for example after adding a POAP client, the DHCPD service will be restarted by Ansible.

After running the playbook the configuration for the DHCP server is generated.

Overall the DHCP server configuration is pretty simple. In my lab the DHCP server is running on the same host as the Ansible scripts; in a real-world deployment these will most likely be separate remote servers. How to configure Ansible to connect to remote DHCP servers is beyond the scope of this series, but can easily be found on the internet.

This was part 2 of the series. In part 3 I will discuss all the various files which need to be generated to make POAP work.

Part 1

POAP and Ansible integration part 1

Everyone who has ever installed a Nexus switch is familiar with the following message.

I always pressed y and was done with it. Since I have been using Ansible to create config files and to deploy Linux clients, I have been wondering if I could do it all with Ansible. In a number of blogs I will describe how to set up everything and never touch your console cable anymore. Please follow me on Twitter for the other blogs on this subject.

The flowchart for the setup is below.


Everything is specified in a number of YAML files. In the YAML files, details about the POAP clients are specified, like serial number, desired software version, hardware platform and IP details. The basic DHCP server configuration parameters are also specified in a YAML file.

The YAML files are used to create the following files via the templating system:

  • isc-dhcpd configuration files
  • bootfiles for the Nexus devices
  • configuration files for the Nexus devices

The creation of all these files has been split into two roles.


In the next blog post I will describe the DHCPD role, which is responsible for the isc-dhcpd service.


Using Ansible to create config files

Recently I had to roll out a number of access switches. In the past I created the config files with either Excel/Word via a mail merge or with custom Perl scripts. Both methods were not ideal: mail merge is inflexible, and although I know my way around in Perl, my colleagues often do not. After reading the excellent Ansible blog by Kirk Byers I gave it a try.

Ansible is primarily a tool, like Chef and Puppet, for server management. To make Ansible do something, it has a concept named playbooks. A playbook defines which roles a specific host has. Each role has its specific tasks which need to be executed on that host. For example, a host has a role as DNS server. Tasks associated with this role could be making sure the latest version of BIND is installed and all the zone files are up to date, but also creating the zone files by means of a template system. This template system will be used to create the configuration files in this example.

Almost all files used by Ansible are written in the YAML format.
Below is the playbook used in this example.

Normally the tasks indicated by the roles would be executed on a remote host (remember the DNS server from above). For this example the files are generated on the same host as the one the Ansible script is run on, but this could also be a remote TFTP server, for example.
The tasks belonging to the switch role of localhost are defined in a separate YAML file.

The task executed on the local host creates files based on the Jinja2 template. The variables being used are also defined in a YAML file. The template is completed by looping over each item of the dictionary access_switches.

The contents of the Jinja2 file.

This is a fairly simple Jinja2 file and is easy to read even without knowledge of the Jinja2 language. Everything between double curly brackets is a variable which is replaced with the actual value. Everything enclosed by a curly bracket and a percent sign is a function of the Jinja2 templating system, in this case a simple include for very static things like VLANs and SNMP stuff.
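A fragment in that style might look like this (variable and file names are illustrative, not the exact ones from the post):

```
hostname {{ item.value.hostname }}
!
interface Vlan{{ item.value.mgmt_vlan }}
 ip address {{ item.value.ip }} {{ item.value.mask }}
!
{% include 'static_config.j2' %}
```

The double-curly expressions are filled per switch from the access_switches dictionary, while the include pulls in the static part shared by all configurations.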
The directory layout for an Ansible script is very important; all files are expected to be found in specific directories. Below is the layout for this tutorial. Don't worry about the router subdirectory for the moment.

The magic happens by running the playbook.

And the configuration files can be found in the config directory.

Although it might seem a lot of work to create all these YAML and Jinja2 files just to generate a couple of configuration files, it can save a lot of work later on. Imagine that you have generated 40 configurations and all of a sudden there is an additional VLAN which needs to be included in all of them. Now it is just a matter of modifying one single file and regenerating all the configuration files by simply running the playbook again.

ERSPAN on the Nexus7000

To troubleshoot some performance issues, a SPAN port was required on a Nexus 7000. Of course the port to span was not located on the same switch as the SPAN destination.

On the Nexus 7000 it is not possible to use an RSPAN VLAN as a SPAN destination; it can only be used as a SPAN source. So this was not an option.

ERSPAN can be used as a SPAN destination, but the N7K where the ERSPAN traffic needed to be decapsulated and sent to the monitoring tool didn't have the correct software to do this. So again not a feasible solution.

However, it is possible to give the monitoring tool the IP address of the ERSPAN destination and place it in a segment reachable by the N7K generating the ERSPAN traffic.

The basic configuration looks like this:

In the admin VDC the source IP for the ERSPAN traffic needs to be specified.
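For illustration, the pieces involved look roughly like this on NX-OS (session id, interface and addresses are examples, not the original configuration):

```
! In the VDC generating the ERSPAN traffic
monitor session 10 type erspan-source
  erspan-id 100
  vrf default
  destination ip 10.0.0.50
  source interface Ethernet1/1 both
  no shut

! In the admin (default) VDC: the ERSPAN origin address
monitor erspan origin ip-address 10.0.0.1 global
```

The destination ip is the address you hand to the monitoring VM, which simply captures the GRE-encapsulated traffic arriving at it.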

I am not sure why this is needed in the admin VDC.
Give a simple Linux VM the IP and capture the data with tcpdump.

ERSPAN uses the GRE protocol to encapsulate the packets and send them to the collector, so we filter on GRE.
Opening the file in Wireshark shows us the data received. In the red box the ERSPAN traffic can be seen, and in the blue box the actual encapsulated packets.


Recently I have been following the VMware VCP-NV course and have been reading about the VXLAN MP-BGP EVPN control plane. In this post I will give a very brief overview of the layer 2 operation of both solutions.
Multicast no longer required

In previous implementations of VXLAN, BUM (Broadcast, Unknown unicast, Multicast) traffic was sent via multicast to all VTEPs which might be interested in these packets. Multicast, especially L3 multicast, is rare in a datacenter, and the dependency on multicast was a huge limitation for the adoption of VXLAN in the datacenter.

Both solutions no longer require multicast to handle BUM traffic. Via the control plane, each VTEP knows about the other VTEPs interested in traffic for a particular VNI. BUM traffic is replicated by the local VTEP as multiple unicast packets to all other VTEPs.

NSX also has a hybrid mode. In hybrid mode, BUM traffic destined for VTEPs in the local VTEP segment is sent via a local multicast group. Traffic towards remote VTEP segments is forwarded to the forwarder in the remote segment, which replicates the packet as a multicast packet in its local segment.

Besides learning which VTEPs are interested in a particular VXLAN segment, the control plane is also used to propagate MAC reachability between the VTEPs. The control plane removes the need for a flood-and-learn mechanism for MAC learning.

Open vs closed control plane
The control plane used by NSX is a proprietary protocol. The VTEPs on the ESXi servers can only work with the NSX controllers. At this moment the only hardware switch which can be part of the NSX VXLAN cloud is from Arista. The controllers used by NSX are VMs running on the control cluster. These are dedicated ESXi machines running the various control functions within an NSX deployment. At least three controllers are required, and they should not run on the same ESXi host.
The MP-BGP VXLAN solution is based on open standards. The EVPN address family of BGP is used to propagate all the required information, like VNI and MAC reachability, between the VTEPs. Vendors like Juniper, Huawei, Cisco and Alcatel-Lucent already support this. Although it would be possible to create a full mesh of iBGP sessions between the VTEPs, it seems logical to use BGP route reflectors for scalability.

EANTC has done some interoperability testing with the vendors above and made the white paper available.

In a next post I will describe how routing has been implemented by NSX and by the VXLAN MP-BGP solution.