Cisco ACI and Nutanix Foundation Discovery

Nutanix uses IPv6 multicast for its Foundation discovery. On a flat Layer 2 network this is no problem. However, when attempting this on ACI you need to enable a specific setting for it to work.

Nutanix itself doesn’t know which setting to enable. If you ask them, they only give you instructions for enabling the IPv6 multicast settings on a ‘normal’ Cisco network, which is useless for those running ACI. I can also imagine that on a production network you don’t want to experiment with several settings.

For us it worked by enabling an ND policy on the Bridge Domain. For this you need to know which VLAN is configured on the Nutanix node itself.

This setting can be found by going to Networking, Bridge Domains and selecting the correct bridge domain. From there, go to the L3 networking settings. At the bottom of that page you’ll find the ND policy option. Select the default policy, submit it, and discovery should work.
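For those who prefer the API over the GUI, the same association can be made with a REST call. Below is a minimal sketch, not a definitive recipe: the tenant name “Prod” and bridge domain name “BD-Nutanix” are made-up examples, and fvRsBDToNdP is the relation that (to my knowledge) ties an ND interface policy to a bridge domain:

<!-- POST to https://<apic>/api/mo/uni/tn-Prod.xml -->
<!-- "Prod" and "BD-Nutanix" are example names; use your own -->
<fvBD name="BD-Nutanix">
  <!-- attach the default IPv6 ND interface policy to the BD -->
  <fvRsBDToNdP tnNdIfPolName="default"/>
</fvBD>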

There might be other solutions for this. Please share them if you know them.

Cisco ACI upgrade from 1.2 to 1.3

So last week I attempted an upgrade of our ACI environment from version 1.2 to version 1.3. I know 2.0 is already available, but it does not offer anything we need at this point; the upgrade to 1.3 was done because of an annoying bug.

A minor upgrade shouldn’t be a big issue, but apparently it was.

We started the upgrade normally: upload the new software and start the APIC upgrade. Easy. Just follow the upgrade instructions and you’ll be fine. The APICs all install the new software and reboot when required. ACI is even smart enough to wait for a rebooting controller to come back online and pass all the health checks before continuing. You can’t do anything wrong.

 

At least, that’s what we thought. After installing the new version we couldn’t reach the APICs over HTTPS anymore. After some troubleshooting we had the following information:

  • Ping doesn’t work
  • SSH does work
  • HTTP(S) doesn’t work

We kept digging. A colleague of mine tried to access the APIC from a server in the same network as the APICs (we use the out-of-band addresses on the APICs). That worked. We were baffled, but this behaviour told us it had to be a policy on the APIC itself. Fortunately, the server in the same subnet gave us HTTPS access to ACI again, which made it possible to troubleshoot. However, since we’re both fairly new at this and weren’t the ones who implemented the network, we didn’t know where to look.

Fortunately the supplier did know where to look and helped us fix the problem. It was indeed a policy; I’ll come back to it in a bit.

 

Unfortunately this issue was my own fault. It is documented in the release notes for version 1.2(2), which I only glanced over when preparing for the change. The actual text from Cisco is:

When upgrading to the 1.2(2) release, a non-default out-of-band contract applied to the out-of-band node management endpoint group can cause unexpected connectivity issues to the APICs. This is because prior to the 1.2(2) release, the default out-of-band contract that was associated with the out-of-band endpoint group would allow all default port access from any address. In 1.2(2), when a contract is provided on the out-of-band node management endpoint group, the default APIC out-of-band contract source address changes from any source address to only the local subnet that is configured on the out-of-band node management address. Thus, if an incorrectly configured out-of-band contract is present that had no impact in 1.2(1) and prior releases, upgrading to the 1.2(2) release can cause a loss of access to the APICs from the non-local subnets.

These release notes can be found on the Cisco website.

For all of you preparing to upgrade from 1.2(1) to a higher version, please keep this one in mind, as it will bite you.

To check whether you will encounter this, go to Tenants > mgmt > Node Management EPGs > Out of Band EPG – default.

Here you can see whether the default contract is used. In our case a non-default contract was specified. You can look up this contract under: Tenants > mgmt > Out of Band Contracts > Name of your contract

You need to allow HTTPS in this contract to be able to reach the APIC.
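If you have shell access to an APIC you can also check this from the CLI. A rough sketch, assuming the moquery tool is available in your release (it was in ours):

# list all out-of-band contracts (class vzOOBBrCP)
moquery -c vzOOBBrCP

# show which contract the out-of-band management EPG provides
moquery -c mgmtRsOoBProv

Look for an HTTPS (TCP 443) filter in the subjects of whatever contract shows up here.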

Unfortunately I can’t post any screenshots here, as the referenced environment is a production environment which I’m not allowed to show. But if you have any questions or need clarification, please let me know.

OTV terminology

Within OTV there are several terms that can be confusing. This is a list of all the terms I’ve encountered so far, with an explanation.

  • OTV edge device: This is the switch that performs all the OTV operations. Layer 2 frames enter the switch and, if they need to be transported over the overlay network, they are encapsulated here and sent out of the join interface. All the OTV configuration is done on these devices (with the possible exception of the MTU configuration).
  • OTV internal interface: This is the interface on the edge device that points toward the datacenter network. It is a Layer 2 interface and it needs to carry all the VLANs that are extended across the overlay network, so it must be an (802.1Q) trunk port.
  • OTV join interface: This is the interface on the edge device that points toward the routed network that carries the data to the other datacenter(s).
  • OTV overlay interface: A logical interface on the edge device on which a large part of the OTV configuration is done. It takes care of encapsulating the Layer 2 frames into Layer 3 packets.
  • Transport network: This is the IP network that carries the encapsulated Layer 2 frames to the other datacenter(s). The main requirement is plain IP connectivity between the join interfaces. It must also support jumbo frames, or you need to reduce the MTU in your network so the additional OTV overhead does not cause problems (OTV does not support fragmentation).
  • Overlay network: The logical network that connects the OTV devices.
  • Site: Most often a datacenter, but nothing prevents you from connecting a campus to an OTV network, so it might be a campus as well.
  • Site VLAN: The VLAN used by OTV edge devices within a site to communicate. This VLAN must not be extended across the transport network, and all OTV edge devices within a site need to agree on it. Because the VLAN is not extended, you could in theory use the same VLAN at every site.
  • Authoritative Edge Device: When using multiple OTV edge devices an AED is required. This is the device responsible for forwarding the traffic of a specific VLAN. The AED role is assigned per VLAN, so with two OTV edge devices you can use both actively, but for different VLANs. (Only one edge device is allowed to forward traffic for a specific VLAN because BPDUs are removed at the OTV edge; if both devices were to forward traffic this might cause loops or MAC flapping.)
  • Site-ID: An identifier for the site that all edge devices in that site must agree upon.
  • OTV shim: An OTV-specific header added to the encapsulated data which, among other things, identifies the VLAN from which the frame originated (and to which it must be restored).

I might encounter other OTV-specific terms, and if I do I will list them here.
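To tie these terms to actual configuration, here is a minimal sketch of a single OTV edge device on a Nexus 7000 using multicast transport. All VLAN numbers, interface names, addresses and multicast groups are made up for the example:

feature otv

! Site VLAN and site-ID: all edge devices in this site must agree on these
otv site-vlan 99
otv site-identifier 0000.0000.0001

! Join interface: points toward the transport network
interface Ethernet1/1
  ip address 10.0.0.1/30
  ip igmp version 3
  no shutdown

! Internal interface: a dot1q trunk carrying the VLANs to be extended
interface Ethernet1/2
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  no shutdown

! Overlay interface: most of the OTV configuration lives here
interface Overlay1
  otv join-interface Ethernet1/1
  otv control-group 239.1.1.1
  otv data-group 232.1.1.0/28
  otv extend-vlan 100-110
  no shutdown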

Overlay Transport Virtualization basics

I’ve had the privilege of working with OTV on several occasions over the last couple of years and I’ve fallen in love with the technology. Its simplicity is striking, even though the configuration can be complex in some cases.

OTV is designed solely as a datacenter interconnect (DCI) technology, and as such it is very good at it. Even though other solutions are available based on newer technologies like VXLAN (for example ACI), OTV still has its merits (I believe).

The goal of OTV is to extend Layer 2 segments across multiple datacenters. For years this was a no-go area for many companies, limiting their (virtual) environments to a single datacenter and severely limiting their options for scalability and robustness. The reason many companies did not want to extend their Layer 2 segments across datacenters is that it introduced a real risk of catastrophic failure: with an extended VLAN between datacenters, a broadcast storm or a spanning-tree misconfiguration could bring down your entire environment. To be honest, I’ve seen this happen on several occasions.

What makes OTV so special is that it enables you to extend your Layer 2 segments without these dangers. It is built in such a way that each datacenter remains a separate failure domain. It does this by blocking BPDUs at the datacenter edge and using an IP transport network. The Layer 2 frames are encapsulated in a GRE-like manner and transported to the other datacenter (only when needed), where they are decapsulated and sent on their way.

OTV has several smart constructs to limit the amount of traffic between datacenters, for example ARP suppression, in which ARP requests are answered locally when possible. Only if the system that needs to respond to the ARP is in another datacenter and is not yet known to the OTV edge switches will the request be forwarded to the other datacenter(s).
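On the Nexus 7000 this is configured on the overlay interface. A small sketch (whether it is on by default differs per release, so check the documentation for yours):

interface Overlay1
  ! answer ARP requests locally from the OTV ARP cache where possible
  otv suppress-arp-nd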

There is a lot more to tell about OTV, and I will probably create more posts about it in the future.

Cisco FabricPath

Cisco FabricPath is the answer to Spanning Tree. Whereas Spanning Tree offers no multipathing, takes inefficient paths, converges slowly, scales poorly and carries a high risk of a total network meltdown, FabricPath has none of these issues: it is scalable, supports multipathing, converges very fast, selects efficient paths and is easy to configure.

The reason we need a protocol like STP is that Ethernet has no way to detect whether a frame should be dropped; there is no TTL like in IP, so a looped frame would traverse the network endlessly. FabricPath does not need STP because it solves these Ethernet challenges differently.

First of all, FabricPath does not forward frames in the same manner as a traditional network. When a frame enters a FabricPath network it is encapsulated with a FabricPath header. This header contains its own source and destination addresses: the Outer MAC Destination Address (ODA) and the Outer MAC Source Address (OSA), which are based on the devices in the FabricPath network. When a frame enters the FabricPath network, the ingress switch does a lookup on the destination MAC address. If the destination is local it handles the frame locally; if not, the lookup yields the switch ID of the destination switch the frame should be sent to. That switch ID is put into the ODA, and the FabricPath network now knows where to send the frame. The FabricPath header also contains a TTL field, which solves the endless-looping problem mentioned earlier.

But how does the FabricPath switch know to which switch it should send the frame? By MAC address learning, on the same principle as normal switches, but with two exceptions: addresses are only learned when there is a bi-directional traffic flow, and instead of the outgoing port, the destination switch ID is recorded.

Bi-directional, or conversational, learning of MAC addresses works as follows:

  • For frames received on edge ports (the ports to which end systems or ‘normal’ Ethernet networks are connected) the MAC address is learned as usual.
  • Frames received on core ports (the ports that are part of the FabricPath network) only have their source MAC address learned if the destination MAC address is already in the MAC address table as a local address.
  • Broadcast frames do not trigger address learning, but they do update existing entries.
  • Multicast frames do trigger MAC learning.
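You can see the result of conversational learning on the switch itself. A quick sketch of the verification commands I would use (output obviously differs per environment):

! remote MAC addresses point to a switch ID instead of a physical port
show mac address-table dynamic

! the FabricPath routing table: best paths toward each switch ID
show fabricpath route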

An important part of FabricPath is the way switches find each other. This is based on an implementation of the IS-IS routing protocol, slightly modified to support all the FabricPath features.
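There is nothing to configure for this IS-IS instance; it starts automatically once FabricPath is enabled on the core ports. To verify that the switches have found each other, something like this should work (a sketch; exact output differs per NX-OS release):

! FabricPath IS-IS neighbours
show fabricpath isis adjacency

! the switch IDs known in the fabric
show fabricpath switch-id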

Configuring FabricPath is easy. It requires a Nexus switch with the correct license (Enhanced Layer 2) and hardware support for FabricPath. If you have those available, you can configure it with the following commands:

! Install and enable the FabricPath feature set
install feature-set fabricpath
feature-set fabricpath
! Put the VLANs that should be carried by FabricPath in fabricpath mode
vlan <vlan-id>
  mode fabricpath
! Turn the links toward the FabricPath network into core ports
interface <interface>
  switchport mode fabricpath

And that’s it really.
