Welcome to Mat's Cloud: May 2017

How DevOps and Cloud raise the importance of System Integrator

System Integrators, buckle up: DevOps is coming, and if you play your cards right, your role is about to get crazy important.

Let me start this post by telling a story. It's a story that involves a stubborn customer, 3 big vendors and a Cloud. The reason I need to start this way is simple: the same scenario with different "players" has happened so many times in the last few years that someone should sum up what we've all learned (or haven't, in some cases). I guess this would be a great place for a disclaimer, so I'll quote my favourite disclaimer ever, from South Park: "All Customers and events in this post, even those based on real people, are entirely fictional."

The story starts with the Customer learning that Cloud is cool, and starting to want one. The problem is that there is no manual on Google on how to build a personalised private cloud. No problem: why not just promote (rename) your head systems engineer to Cloud Architect and follow his ideas and his experience? "But… he's got no experience," you'll say, and you'd be right. What he does have is a lot of vendors at his disposal, with fancy PowerPoint presentations explaining how cool and awesome OpenStack is. Everything is awesome (check out the LEGO Movie soundtrack, but be warned: it will stay in your head for days), now let's go build a cloud!

Now comes the tricky part. A majority of the bigger customers prefer working directly with the Vendors' Professional Services, in order to lead the deployment themselves and hold someone accountable for any missing functionality or problems with the product. While there is logic to this philosophy, it applies more to legacy infrastructure, where there is little integration complexity between different technologies. So… why doesn't it apply to the Cloud? In the past few years I have seen many different scenarios where a customer followed this strategy and either rolled back the entire environment, or is still struggling to get a Lab working. For now, let's just state the 3 most obvious reasons you should not use this strategy:

"Lost in Translation" bug in the Integration phase: Just like your Network, Systems and Apps engineers don't really understand each other when you put them in the same room and make them collaborate, Vendors of different types also won't be able to collaborate easily. Don't think that you'll be able to lead this collaboration; you will most definitely end up with different vendors pointing fingers at each other when asked why the integration is not working.
Support: Each Vendor will support their own product, but no one will give you support for the Integration, which is the most complex part. A Cloud environment is difficult to build, but easy to operate. If something goes wrong, it goes REALLY wrong, and you will need an expert who understands the integration of the components in depth. It's impossible to demand this from your Cloud Architect or a Lead Engineer. Once the environment was initially built, they most probably didn't spend their afternoons reviewing the Plugins/Drivers/Manual code modifications done in the implementation phase, and they will not be able to troubleshoot anything.
Upgrades/Modifications: Now imagine the moment you realize that your OpenStack/SDN/Orchestrator/etc. is obsolete, and you need to Upgrade. How does Upgrading each of the components impact the stability of the entire system? You will basically go back to the 1st problem each time you need to modify anything in your Cloud.

What should be your strategy when deploying the Cloud?

The answer is rather simple actually. You need a partner, most likely a System Integrator, with a strong partnership with all of the Vendors whose products you wish to include in your Cloud environment. Here are the main reasons to involve a System Integrator:

They also had the "Lost in Translation" problem, but it was most likely a long time ago. By now their specialists from different areas know how to talk to each other, and they can even help you teach your own employees how to do the same.
Disputes between the different Vendors stop being your problem. The System Integrator is more likely to figure out why the Integration isn't working, and will either work with a Vendor to resolve the issue or engineer custom code within the solution and support it for you, keeping you fully informed along the way.

Conclusion

This is not an easy task for the System Integrator, but as soon as everyone starts understanding how the new system should work, it will be so much easier to deploy a stable, fully supported and Upgradable Cloud environment, without needing the kind of Engineering department that companies like Google, AWS and Facebook have to manage their clouds.

Cisco ACI Unknown Unicast: Hardware Proxy vs Flooding Mode

Before we start, let's once again make sure we fully understand what a Bridge Domain is. The bridge domain can be compared to a giant distributed switch. Cisco ACI preserves the Layer 2 forwarding semantics even if the traffic is routed on the fabric. The TTL is not decremented for Layer 2 traffic, and the MAC addresses of the source and destination endpoints are preserved.

When you configure the Bridge Domain in the ACI, you need to decide what you want to do with the ARP packets, and what you want to do with the Unknown L2 Unicast. You can basically:

Enable ARP Flooding, or not.
Choose between the two L2 Unknown Unicast modes: Flood and Hardware Proxy.
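
As a rough illustration of these two knobs, here is a minimal Python sketch that builds a JSON body for a bridge domain. It assumes the APIC object model exposes them on the `fvBD` class as `arpFlood` ("yes"/"no") and `unkMacUcastAct` ("proxy"/"flood"); check those names against your APIC version before relying on them. The helper `bridge_domain_payload` and the BD names are hypothetical, and nothing is sent to a controller.

```python
import json


def bridge_domain_payload(name, l2_unknown_unicast="proxy", arp_flood=False):
    """Build an fvBD-style JSON body for a bridge domain (illustrative only).

    l2_unknown_unicast: "proxy" (hardware proxy) or "flood".
    arp_flood: True to flood ARP in the bridge domain.
    """
    if l2_unknown_unicast not in ("proxy", "flood"):
        raise ValueError("l2_unknown_unicast must be 'proxy' or 'flood'")
    return {
        "fvBD": {
            "attributes": {
                # Attribute names are assumed from the APIC object model.
                "name": name,
                "unkMacUcastAct": l2_unknown_unicast,
                "arpFlood": "yes" if arp_flood else "no",
            }
        }
    }


# Hypothetical BD using the defaults: hardware proxy, no ARP flooding.
default_bd = bridge_domain_payload("bd-default")

# Hypothetical BD that behaves like a classic L2 switch: flood both.
flooding_bd = bridge_domain_payload("bd-flooding", "flood", arp_flood=True)

print(json.dumps(default_bd, indent=2))
print(json.dumps(flooding_bd, indent=2))
```

Keeping the two settings as separate parameters mirrors the fact that ARP flooding and the L2 Unknown Unicast mode are chosen independently on the Bridge Domain.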

Hardware Proxy

By default, Layer 2 unknown unicast traffic is sent to the spine proxy. This behaviour is controlled by the hardware proxy option associated with a bridge domain: if the destination is not known, send the packet to the spine proxy; if the spine proxy also does not know the address, discard the packet (default mode).

The advantage of the hardware proxy mode is that no flooding occurs in the fabric. The potential disadvantage is that the fabric has to learn all the endpoint addresses.

With Cisco ACI, however, this is not a concern for virtual and physical servers that are part of the fabric: the database is built for scalability to millions of endpoints. If the fabric had to learn all the IP addresses coming from the Internet, though, it would clearly not scale.

Flooding Mode

Alternatively, you can enable flooding mode: if the destination MAC address is not known, flood in the bridge domain. By default, ARP traffic is not flooded but sent to the destination endpoint. By enabling ARP flooding, ARP traffic is also flooded. A good use case for enabling ARP flooding would be when the Default Gateway resides outside of the ACI Fabric. This non-optimal configuration will require ARP Flooding enabled on the BD.

This mode of operation is equivalent to that of a regular Layer 2 switch, except that in Cisco ACI this traffic is transported in the fabric as a Layer 3 frame with all the benefits of Layer 2 multi-pathing, fast convergence, and so on.
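
The two unknown-unicast behaviours described above can be sketched as a small decision function. This is a toy model, not how the hardware is implemented: the function `forward_l2`, the dictionaries standing in for the leaf's local table and the spine proxy database, and the returned strings are all made up for illustration.

```python
from enum import Enum


class UnknownUnicastMode(Enum):
    HARDWARE_PROXY = "proxy"
    FLOOD = "flood"


def forward_l2(dst_mac, leaf_table, spine_proxy_db, mode):
    """Return a textual forwarding decision for a frame arriving at a leaf."""
    # Known destination: forwarded directly, whatever the mode is.
    if dst_mac in leaf_table:
        return f"forward to {leaf_table[dst_mac]}"
    # Unknown destination in flooding mode: flood in the bridge domain.
    if mode is UnknownUnicastMode.FLOOD:
        return "flood in bridge domain"
    # Unknown destination in hardware proxy mode: ask the spine proxy.
    if dst_mac in spine_proxy_db:
        return f"forward via spine proxy to {spine_proxy_db[dst_mac]}"
    # The spine proxy does not know the address either: discard.
    return "drop"


# Toy tables with made-up MAC addresses and leaf names.
leaf_table = {"00:00:00:00:00:0a": "local port eth1/1"}
spine_proxy_db = {"00:00:00:00:00:0b": "leaf-102"}

known_local = forward_l2("00:00:00:00:00:0a", leaf_table, spine_proxy_db,
                         UnknownUnicastMode.HARDWARE_PROXY)
known_to_spine = forward_l2("00:00:00:00:00:0b", leaf_table, spine_proxy_db,
                            UnknownUnicastMode.HARDWARE_PROXY)
unknown_proxy = forward_l2("00:00:00:00:00:0c", leaf_table, spine_proxy_db,
                           UnknownUnicastMode.HARDWARE_PROXY)
unknown_flood = forward_l2("00:00:00:00:00:0c", leaf_table, spine_proxy_db,
                           UnknownUnicastMode.FLOOD)

for decision in (known_local, known_to_spine, unknown_proxy, unknown_flood):
    print(decision)
```

The only place the two modes differ is the branch taken when the leaf does not know the destination MAC, which is exactly the "L2 Unknown Unicast" setting on the Bridge Domain.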

Hardware proxy on one side, and unknown unicast and ARP flooding on the other, are two opposite modes of operation. With hardware proxy disabled and without unknown unicast and ARP flooding, Layer 2 switching would not work.

The choice between hardware proxy and flooding does not have any impact on what the mapping database actually learns; the mapping database is always populated for Layer 2 entries regardless of this configuration.
